
Jan 19, 2021 | Reading time: 4 min

Better to Be Wrong Than Vague: Apache Kafka and Software Architecture Predictions for 2021


On a recent episode of Streaming Audio, Gwen Shapira, Michael Noll, and Ben Stopford joined me to hold forth about the near future of Apache Kafka® and software architecture in general. House rules were that predictions could cover any topic, but they had to be “precise” in the spirit of Bob Metcalfe, who, back in the 90s, famously predicted a particular day the dot-com bubble would pop, under the theory that it was better to be wrong than vague. At least we all felt pretty sure that our predictions couldn’t possibly be worse than the ones being made at the same time in 2019.

10 million partitions in a single production Kafka cluster

We began, fairly enough, with Kafka itself. Gwen started us off by predicting that by the end of the year it will be possible to run a Kafka cluster with 10 million partitions, facilitated by some consequential architectural changes: KIP-405 (Tiered Storage) and KIP-500 (ZooKeeper removal). These KIPs enable growth in the number of partitions by moving data out of the cluster proper and enabling metadata to be managed in a more scalable and robust way.

Double the size of a Kafka cluster in seconds

Ben predicted being able to double the size of a Kafka cluster in seconds, a task specifically enabled by Tiered Storage. Tiered Storage is often understood simply as a cost-savings and storage play, which is fair enough. People using Kafka as a system of record tend to want longer retention periods, and Tiered Storage is an obvious enough improvement to the economics of that architecture, but it doesn’t stop there. You also get quick autoscaling. Because so much state gets offloaded to your friendly neighborhood cloud object store, when you go to scale brokers, there is significantly less data to move around.
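To make the scaling argument concrete, here is a rough back-of-the-envelope sketch. It assumes an evenly balanced cluster, ignores replication and leader placement, and treats tiered data as never needing to be copied between brokers. The function name `data_to_move_tb`, the 100 TB total, and the 5% local fraction are illustrative assumptions, not figures from the episode or from KIP-405.

```python
def data_to_move_tb(total_tb: float, local_fraction: float,
                    old_brokers: int, new_brokers: int) -> float:
    """Estimate the data (in TB) copied to new brokers after an even rebalance.

    Hypothetical model: data is spread evenly over brokers, and only the
    locally stored fraction has to be copied; tiered data stays in the
    object store and is not moved.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    if old_brokers <= 0 or new_brokers < old_brokers:
        raise ValueError("this model only covers growing a non-empty cluster")
    # Data that still lives on broker disks.
    local_tb = total_tb * local_fraction
    # After an even rebalance, the added brokers hold this share of local data.
    moved_share = (new_brokers - old_brokers) / new_brokers
    return local_tb * moved_share


if __name__ == "__main__":
    # Assumed scenario: 100 TB of retained data, cluster doubled from 6 to 12 brokers.
    without_tiering = data_to_move_tb(100.0, 1.0, 6, 12)
    with_tiering = data_to_move_tb(100.0, 0.05, 6, 12)
    print(f"without tiered storage: {without_tiering:.1f} TB to move")
    print(f"with tiered storage:    {with_tiering:.1f} TB to move")
```

Under these assumed numbers, doubling the cluster means copying about 50 TB without Tiered Storage and about 2.5 TB with it, which is the difference between a long rebalance and a short one.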

Another architectural benefit is a potential performance boost: data in the object store tier is accessed over the network, with the presumption that it is accessed less frequently than data still on disk. If one plays one’s cards right, that local hotset can fit entirely into the broker’s page cache, making I/O on the hotset a vastly faster proposition. Remember, when it comes to data access patterns, the power law works for you; you don’t work for it.
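As a small illustration of the power-law point, the sketch below estimates what share of reads a local hotset would serve if segment popularity followed a Zipf distribution. The Zipf model, the exponent of 1.0, the segment counts, and the function name `hotset_hit_ratio` are all assumptions made for illustration; real access patterns will differ.

```python
def hotset_hit_ratio(num_segments: int, hot_segments: int,
                     exponent: float = 1.0) -> float:
    """Share of reads served by the `hot_segments` most popular segments.

    Hypothetical model: the segment with popularity rank k is read with
    probability proportional to 1 / k**exponent (a Zipf distribution).
    """
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    if not 0 <= hot_segments <= num_segments:
        raise ValueError("hot_segments must be between 0 and num_segments")
    # Unnormalized popularity weight for each rank, most popular first.
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_segments + 1)]
    return sum(weights[:hot_segments]) / sum(weights)


if __name__ == "__main__":
    # Assumed scenario: keep the hottest 10% of 1,000 segments on local disk.
    ratio = hotset_hit_ratio(num_segments=1000, hot_segments=100)
    print(f"share of reads served locally: {ratio:.1%}")
```

With these assumed parameters, keeping the hottest 10% of segments local serves roughly 69% of reads, so a hotset small enough to fit in the page cache can still absorb most of the traffic.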

Streaming everywhere

Michael’s prediction, for which there was consensus (see what I did there, KIP-595?), was the continued growth of Kafka-like streaming features in products across the data landscape: from relational stalwarts like Oracle, to Redis, to traditional messaging systems like RabbitMQ. Users have increasingly come to expect features that will let them work with real-time, unbounded datasets, and vendors tend to notice things that users expect.

Given that this transition to understanding systems “events first” is well underway and already looks rooted in the emerging software architecture consensus, I will see Michael’s prediction for the year and raise him another couple of decades: I predict event streaming will be seen as the dominant paradigm of this generation’s software architectures.

Multi-paradigm products

So it’s clear that event streaming is happening, but another question is how it can best be added to existing database products, since many existing tools came to life before this paradigm was yet a thing. To begin with, companies each have their own idea for how streaming should even be defined, as Michael has seen firsthand with his work on the committee writing the SQL standard’s streaming extension. And it can be hard to retrofit an existing product built under batch- or state-oriented assumptions, particularly when one wants to operate it at scale. As Gwen pointed out, it may even require completely new data structures to make a truly successful multi-paradigm solution.

As a side note, the broader Kafka ecosystem is making its own claim on multi-paradigm status, since it began with streaming and later added ksqlDB, which brings database concepts and SQL itself into an event-driven system.

Conclusion

So our money is on the table. I must say that I would be surprised if, when I’m starting to roll into my Christmas playlist in October of 2021, streaming isn’t even more on the minds of those in the industry than it is now. I don’t know that it will be completely mainstream, but if you’re not doing it already by then, or at least thinking seriously about it, you might start to feel a bit behind the zeitgeist. It would also be surprising if by that same time the effects of KIP-500 and the rest of the gang haven’t started to make their mark in the community’s collective imagination, as we continue to think about what we might build with Kafka next.

Interested in more?

If you want to hear the episode for yourself, have a listen to Streaming Audio and make sure to subscribe through Apple Podcasts or wherever fine podcasts are sold.


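Tim Berglund is a teacher, author, and leader of developer relations at StarTree. He is a frequent speaker at conferences in the United States and around the world, the co-presenter of O'Reilly training videos on topics ranging from Git to distributed systems, and the author of Gradle Beyond the Basics. He tweets as @tlberglund, blogs at http://timberglund.com, and co-hosts the podcast at http://devrelrad.io. He lives in Littleton, Colorado, USA, with his wife of many years and their youngest child (the other two children are practically grown).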
