Project Metamorphosis: Unveiling the next-gen event streaming platformLearn More

ksqlDB: The Missing Link Between Real-Time Data and Big Data Streaming

Is event streaming or batch processing more efficient in data processing? Is an IoT system the same as a data analytics system, and a fast data system the same as a big data system? These questions stem from two philosophies of data analytics architecture: state orientation and event orientation. This post will reflect on both concepts and their evolution, as well as highlight the most recent event streaming capabilities found in ksqlDB.

State orientation vs. event orientation

In 1887, Dutch paleoanthropologist Eugène Dubois found the fossil remains of “Java Man” (Anthropopithecus erectus) in Indonesia. He soon proclaimed that he had found the missing link, the evolutionary species that served as the connection between apes and Homo sapiens. Later, new discoveries appeared such as “Lucy” (Australopithecus afarensis)—also thought to be the longed-for link at one point—but the idea of a linear evolution of the human species was eventually abandoned during the 20th century. It was replaced by the theory of a branching tree evolution, where development was believed to be diversified into different species with similar characteristics that coexisted over time.

Within data analysis technologies, also subject to continuous evolution, we can distinguish two main currents that have inspired data architecture: state orientation and event orientation.

Under the first paradigm is the classic three-layer architecture: service-oriented architecture (SOA), microservices, and traditional business intelligence (BI) systems, in which data is exposed in the form of services and BI universes or operational APIs, which are consumed by applications or analysts. In this variant, data is requested (pulled) to know the state of an element or set of elements at a given time.

Under the second paradigm is event-driven architecture (EDA), or fast data, as well as the centralized architectures of IoT and streaming analytics. In this case, the data is delivered (pushed) in the form of events that report changes in state. Jerry Mathew explains these concepts in detail.

Data analysis and architecture complexity

The technological implementation of state orientation versus event orientation varies.

In the case of state orientation, it is based on a database system that acts as a source of truth. This system’s data is exposed to applications through synchronous calls or queries via SQL or an API.

With event orientation, fundamentally the event manager or queue manager allows the applications to store and distribute events in an asynchronous way. In the IoT world, there are also local Data Distribution Service (DDS) technologies that allow for peer-to-peer networks to distribute events, which are very useful in applications that want to avoid centralized dependencies.

If the application requires state orientation and event orientation simultaneously, it is necessary to implement both in the architecture. Any modern IoT or big data platform typically incorporates elements for both event streaming and batch processing with push and pull query mechanisms, at the expense of introducing complexity into the architecture.

The first pre-hominids in databases

This section will focus on the evolution of event streaming technologies and data analytics into new intermediate architectures, which adopt elements of both event streaming and batch processing.

Similar to what some believe are the first pre-hominids like Ardipithecus ramidus, which maintained the characteristics of an ape but began to adopt the first human traits, databases are beginning to develop capabilities for reporting events and vice versa.

  • Databases with SQL push notifications are available. Platforms such as Microsoft SQL Server or Google’s Firebase allow you to send notifications of changes to clients to store a cache of the database. It is very useful for getting mobile applications to work without connectivity.
  • Database replicators are proliferating (e.g., Debezium, Qlik Replicate™, or Oracle GoldenGate). These allow changes to be propagated to another database in the backend. Its usefulness is not in providing high availability but in implementing ETL (extract, transform, load) processes, replicating the data in another destination to apply a change.
  • Event managers adopt SQL syntax. Normally, they incorporate processing engines based on programming languages such as Scala, Python, or Java. To facilitate adoption among SQL developers, they’ll also incorporate programming frameworks in SQL. In IoT, HiveMQ is a SQL manager for MQTT, a popular messaging manager for telemetry, which is already part of the OASIS standards.

ksqlDB: The missing link to event streaming databases

As we continue on this evolutionary path, Confluent, founded by the original co-creators of Apache Kafka®, introduced a new database for event streaming called ksqlDB. This technology turns the well-known streaming SQL engine into a powerful event streaming database with real-time querying functions.

Imagine, for example, an IoT application used to collect changes in the state of a vehicle’s sensors. ksqlDB allows, by means of SQL code, the creation of stream-type tables that register in an immutable way a log of all the events. In addition, it incorporates numerous connectors for automatic synchronization with very diverse sources.

But if we want to know the status of the sensors at a given time, ksqlDB can materialize the previous stream in a view or table, in which the key is the ID of each sensor. Thus, the most up-to-date sensor value appears in different columns. In this way, the application can obtain data via SQL using a classic pull query to know the status of a sensor:

SELECT instant_velocity, current_latitude, current_longitude
FROM sensor_assets
WHERE vehicleid = '19012015' AND sensorid = "01031975";

But it is also possible to launch the query to continuously subscribe to status changes by simply adding the EMIT CHANGES statement:

SELECT instant_velocity, current_latitude, current_longitude
FROM sensor_assets
WHERE vehicleid = '19012015' AND sensorid = "01031975"

In this way, the application continuously receives the events of state changes, which allows for alerting or application of a machine learning model.

ksqlDB also enables high performance, scalability, high availability, fault tolerance, and low latencies. Is ksqlDB just another pre-hominid in our technological evolution, or is it the real missing link between big data and fast data systems?

Open source innovation is accelerating exponentially, and ksqlDB is an ongoing effort to simplify your architecture.

Get started with ksqlDB

Ready to check ksqlDB out? Head over to to get started, where you can follow the quick start, read the docs, and learn more!

A version of this post was originally published by Guillermo Gavilán on the Empresas blog.

Guillermo Gavilán is a telecom engineer at Telefónica, where he serves as manager of the BI & Big Data Infrastructures and Data Governance Team. He started out 20 years ago as a developer of intelligent network services, where he acquired experience in real-time systems. His expertise is in telecom networks and IT technologies and architectures, and for the last five years, he has been focusing on big data and analytics. He is also a hobbyist musician and an enthusiastic father.

Did you like this blog post? Share it now

Subscribe to the Confluent blog

More Articles Like This

Announcing ksqlDB 0.11.0

We’re pleased to announce ksqlDB 0.11.0, which takes a big step forward toward improved production stability. This is becoming increasingly important as companies like Bolt and PushOwl use ksqlDB for […]

I’ve Got the Key, I’ve Got the Secret. Here’s How Keys Work in ksqlDB 0.10.

ksqlDB 0.10 includes significant changes and improvements to how keys are handled. This is part of a series of enhancements that began with support for non-VARCHAR keys and will ultimately […]

How PushOwl Uses ksqlDB to Scale Their Analytics and Reporting Use Cases

Using a declarative SQL-like interface, ksqlDB makes it easy to integrate event streaming applications into any tech stack. This article illustrates how ksqlDB was added to PushOwl’s Python tech stack, […]

Sign Up Now

Start your 3-month trial. Get up to $200 off on each of your first 3 Confluent Cloud monthly bills


上の「新規登録」をクリックすることにより、当社がお客様の個人情報を以下に従い処理することを理解されたものとみなします : プライバシーポリシー

上記の「新規登録」をクリックすることにより、お客様は以下に同意するものとします。 サービス利用規約 Confluent からのマーケティングメールの随時受信にも同意するものとします。また、当社がお客様の個人情報を以下に従い処理することを理解されたものとみなします: プライバシーポリシー

Get Confluent Cloud

Get up to $200 off on each of your first 3 Confluent Cloud monthly bills

Choose one sign-up option below


  • AWS
  • Azure
  • Google Cloud

  • Billed through your Cloud provider*
  • Stream only on 1 cloud
*Billing admin role needed


  • Billed through your Cloud provider*
  • Stream only on 1 cloud
  • Billing admin role needed

*Billing admin role needed


  • Pay with a credit card
  • Stream across multiple clouds


  • Pay with a credit card
  • Stream across multiple clouds

上の「新規登録」をクリックすることにより、当社がお客様の個人情報を以下に従い処理することを理解されたものとみなします : プライバシーポリシー

上記の「新規登録」をクリックすることにより、お客様は以下に同意するものとします。 サービス利用規約 Confluent からのマーケティングメールの随時受信にも同意するものとします。また、当社がお客様の個人情報を以下に従い処理することを理解されたものとみなします: プライバシーポリシー

単一の Kafka Broker の場合には永遠に無料

商用版の機能を単一の Kafka Broker で無期限で使用できるソフトウェアです。2番目の Broker を追加すると、30日間の商用版試用期間が自動で開始します。この制限を単一の Broker へ戻すことでリセットすることはできません。

  • tar
  • zip
  • deb
  • rpm
  • docker
  • kubernetes
  • ansible

上の「無料ダウンロード」をクリックすることにより、当社がお客様の個人情報をプライバシーポリシーに従い処理することを理解されたものとみなします。 プライバシーポリシー

以下の「ダウンロード」をクリックすることにより、お客様は以下に同意するものとします。 Confluent ライセンス契約 Confluent からのマーケティングメールの随時受信にも同意するものとします。また、お客様の個人データが以下に従い処理することにも同意するものとします: プライバシーポリシー

このウェブサイトでは、ユーザーエクスペリエンスの向上に加え、ウェブサイトのパフォーマンスとトラフィック分析のため、Cookie を使用しています。また、サイトの使用に関する情報をソーシャルメディア、広告、分析のパートナーと共有しています。