When companies want to build on digital business models, their outdated data warehouse and messaging architectures often turn out to be a hindrance. A new, streaming-driven approach helps increase the flexibility, agility and scalability of the infrastructure – especially when data from the mainframe also needs to be included.


Kai Wähner is Field CTO at Confluent, which complements Kafka with over 120 connectors and data stream processing as well as enterprise-grade security and governance – including for mainframes. Apache Kafka is an open-source distributed streaming system used for stream processing, real-time data pipelines and large-scale data integration. Kafka was developed at LinkedIn in 2011 to handle real-time data feeds. Since then, it has rapidly evolved from a messaging queue into a full-fledged event streaming platform capable of handling more than a million messages per second and trillions of messages per day.

With real-time data analysis technologies, it is now possible to integrate mainframes – often dismissed as isolated silos – into modern streaming platforms. Besides the open-source software Apache Kafka, which is used in particular for processing data streams, these technologies include, for example, Apache Kudu (a fast analytics engine for Hadoop) and Spark Streaming, a framework for writing applications on streaming data.

Since the streaming platforms were developed without mainframes in mind, a viable way is needed to ingest mainframe data in a format that the streaming platforms support. There are several options that make this possible in principle. We asked Kai Wähner which ones exist and what is important when choosing between them.

Mr. Wähner, what exactly is the streaming that Confluent implements on the basis of Kafka? The term has become rather fuzzy through its widespread use among end users.

In traditional environments, data is tied to its applications in data silos. The applications are in turn tightly integrated with other applications via tailor-made connectors, so any flexible, comprehensive use of the data is complex and cumbersome.

Data streaming decouples applications from each other and gets the data flowing, which is particularly important in widely distributed environments, for example at branch banks or retail chains. Events – which can be anything from an online purchase to a system malfunction – generate data in real time, which is then transmitted to the streaming platform. The platform stores this data unchanged, in the order in which it was received, which makes it the central source of information.
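
To make this concrete, here is a minimal sketch (in Java, using the standard Apache Kafka producer client) of how an application might publish such an event to the platform. The broker address, topic name, key and payload are purely illustrative assumptions, not details from the interview.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PurchaseProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // illustrative broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The event is appended unchanged to the topic's log in arrival order;
            // topic name, key and JSON payload are hypothetical examples.
            producer.send(new ProducerRecord<>("purchases", "customer-42",
                    "{\"item\":\"book\",\"amount\":19.99}"));
        }
    }
}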

In addition, the data is assigned to so-called topics, which can be tailored to the needs of the applications. A modern streaming platform can also easily connect to traditional data sources such as databases, linking event data with inventory data. When applications access the streaming platform, they subscribe to the topics they need and thus automatically receive the most up-to-date data that is relevant to them, in real time. The applications themselves then bring this data into the format that suits them, so a complex ETL process is no longer necessary.
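
The consumer side of that subscription model can be sketched as follows, again in Java with the standard Kafka consumer client; the consumer group and topic names are illustrative assumptions.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PurchaseConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "recommendation-service");            // hypothetical consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("purchases"));                // subscribe to the topics the app needs
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Here the application would map the raw event into its own format.
                    System.out.printf("%s -> %s%n", record.key(), record.value());
                }
            }
        }
    }
}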

Virtually any application can be connected using suitable connectors, and connectors for both legacy and cloud-native systems are often already provided by a modern streaming platform.
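
As one illustration of how such a connector might be set up, the following sketch registers a JDBC source connector with a Kafka Connect worker via its REST API. It assumes a Connect worker on localhost:8083 and the Confluent JDBC connector plugin; the database URL, column and topic names are hypothetical, and a real mainframe integration would likely use other connectors or parameters.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        // Hypothetical connector configuration: pull rows from a legacy Db2 table
        // into Kafka topics prefixed with "legacy-".
        String config = """
            {
              "name": "legacy-db-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:db2://legacy-host:50000/CUSTDB",
                "mode": "incrementing",
                "incrementing.column.name": "ID",
                "topic.prefix": "legacy-"
              }
            }
            """;
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}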

So this is a data infrastructure focused on data in motion: why does mainframe data even belong on such a streaming platform? After all, mainframes actually implement the concept of central data management.

The fact that mainframes and other legacy platforms can be integrated into streaming environments such as Confluent, which is based on Apache Kafka, is one of the main advantages of this approach: banks, insurance companies, retailers and large industrial companies hold immensely large and valuable data sets in their mainframe applications. In addition, these mainframe applications are usually highly functional and powerful, and have been tried and tested over many years. It is neither necessary to do without them, nor would they be easy to detach.

On the one hand, the constantly arising real-time events, for example from retail or IoT environments, should be stored and made available in precisely documented form; on the other hand, the existing inventory data should also serve as a source of knowledge. It is often only the combined use of both that generates new insights and enables new business models.
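
One way to combine both kinds of data is a stream-table join, sketched below with the Kafka Streams API: a stream of real-time order events is enriched with the latest state of the matching inventory record. The topic names, keys and plain-string values are simplifying assumptions for illustration.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class OrderEnrichment {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-enrichment");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");      // real-time events
        KTable<String, String> customers = builder.table("customers");  // inventory/master data, keyed by customer ID

        // Enrich each incoming order with the current state of the matching customer record.
        orders.join(customers, (order, customer) -> order + " | " + customer)
              .to("orders-enriched");

        new KafkaStreams(builder.build(), props).start();
    }
}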

