Reflecting on Event Streaming as Confluent Turns Five – Part 2

When people ask me the very top-level question, “Why do people use Kafka?”, I usually lead with the story in my last post, where I talked about how Apache Kafka® is helping us deliver on the promises the cloud made to us a decade ago. But I follow it up quickly with a second and potentially unrelated pattern: real-time data pipelines. These provide a different set of motivations for using an event streaming platform than scaling and microservices: specifically, the need to produce analytics results and business insights faster than the next day, which is the cadence most of us have taken for granted since early in our careers.

If it’s not real-time data, it’s old data

When I was a younger developer (well, when I was a younger developer, I was writing firmware on small microcontrollers whose “database” consisted of 200 bytes of RAM, but stick with me here), relational databases had only recently become mature and stable data infrastructure platforms. Around that same time, the disciplines, tooling, and consulting companies that would come to define business analytics were just being formed. The pattern was that every night, your ETL process would dump that day’s transactional activity into your newfangled analytic data warehouse.

It wasn’t just the best we could do; it was a revolutionary achievement that put powerful new insights into the hands of business leaders faster than they had ever had them before. This was in the days when “please allow four to six weeks for delivery” was still a recent echo in the air, and next-day analytics sounded really fast. Batch was good enough.

Until it wasn’t. Business is much more globalized than it was 30 years ago, and a nightly cadence doesn’t make as much sense when it’s not clear when “night” is for a global enterprise. Consumer expectations have also shifted dramatically from the era of four-to-six-weeks to “wait, that’s not on Prime?” In other words, working with yesterday’s data just might not be possible. You are probably being asked to deliver more than that.

You see signs of this tension in shrinking ETL batch times: overnight was the original gold standard, then data architects figured out how to run batches hourly, then they started trying 15-minute batches, and so on. This is a nice evolutionary trajectory, but eventually it shows signs of a strained paradigm. And the last thing you want is to be responsible for delivering on executives’ demands when the tools you have available to you are under stress.

Batch vs. real-time streams of data

So, businesses need data-driven insights based on things that are happening right now, and that’s where real-time data pipelines come in. Whereas it’s practically impossible to pull data from a database and cut your batch times down to minutes, once you have an event streaming platform in place, it is relatively trivial to get those results in seconds.

Perhaps the easiest-to-understand use case here is fraud detection. It’s just not valuable to know that fraud took place in a transaction you cleared yesterday; maybe you can provide the district attorney with evidence months down the line, but your money is gone. Instead, you need to know that it’s happening right now, inside the loop of the transaction clearing process itself, so you can refuse the transaction in real time. This has to be real time. Why? Because on the other end of this transaction, there is a human who needs immediate confirmation that a trade has been made or that a transaction has successfully gone through. Industry heavyweights like Capital One use event streaming on Kafka for this very task.
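To make the “inside the loop” idea a little more concrete, here is a minimal sketch in Python. It is not how Capital One or any real clearing system works: the `Transaction` type, the `VelocityCheck` rule, and its thresholds are all hypothetical, and a plain Python list stands in for the Kafka topic a real pipeline would consume from.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """Hypothetical event shape; a real payload would carry far more fields."""
    card_id: str
    amount: float
    timestamp: float  # seconds since an arbitrary start


class VelocityCheck:
    """Hypothetical rule: decline a card that makes too many attempts in a short window."""

    def __init__(self, max_per_window: int = 3, window_seconds: float = 60.0):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        # card_id -> timestamps of recent attempts (declined attempts count too)
        self._recent = defaultdict(deque)

    def decide(self, tx: Transaction) -> str:
        recent = self._recent[tx.card_id]
        # Drop attempts that have aged out of the window.
        while recent and tx.timestamp - recent[0] >= self.window_seconds:
            recent.popleft()
        recent.append(tx.timestamp)
        return "DECLINE" if len(recent) > self.max_per_window else "APPROVE"


def clear_stream(events, check):
    """Decide on each event as it arrives, before the transaction clears."""
    for tx in events:
        yield tx, check.decide(tx)


if __name__ == "__main__":
    # A list stands in for the event stream (an assumption for illustration).
    events = [
        Transaction("card-A", 20.0, 0.0),
        Transaction("card-A", 35.0, 5.0),
        Transaction("card-B", 12.0, 6.0),
        Transaction("card-A", 500.0, 8.0),
        Transaction("card-A", 900.0, 9.0),   # fourth card-A attempt within 9 seconds
        Transaction("card-A", 15.0, 120.0),  # the window has emptied by now
    ]
    for tx, decision in clear_stream(events, VelocityCheck()):
        print(tx.card_id, tx.amount, decision)
```

Run as written, this approves the first three card-A attempts, declines the fourth, and approves the one that arrives two minutes later. The rule itself is beside the point; what matters is where it runs. The decision is made per event, while the transaction is still in flight, rather than in a report the next morning.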

Of course, there are countless uses for real-time data pipelines beyond fraud detection in the finance and retail industries. For example, the software stack of prescription benefits provider Express Scripts was originally built around a mainframe. And hey, plenty of things are, and sometimes they work just fine. However, mainframes can be costly, and they often do not lend themselves to architectures we would describe as “agile” or “low latency” or any of the other things we normally like. Accordingly, Express Scripts has refactored its data architecture into a low-latency data pipeline based on the Confluent Platform. They’re a great example of a business that couldn’t easily remain competitive using technology paradigms of decades past. They responded appropriately, and are reaping the benefits.

Many more examples are coming to light every day. If you’d like to learn more about how to build this kind of thing and how real-time pipelines fit into broader business concerns, I can’t think of a better use of your time (if you will pardon a moment of promotion) than attending Kafka Summit, where Capital One and Express Scripts shared their stories last year, and where many more developers are set to share their experiences in a few weeks.

If you’re convinced but your boss isn’t, I’ve got you covered. I hope to see you there.

Tim Berglund is a teacher, author, and technology leader with Confluent, where he serves as the senior director of developer experience. He can frequently be found speaking at conferences in the U.S. and all over the world. He is the co-presenter of various O’Reilly training videos on topics ranging from Git to distributed systems, and is the author of Gradle Beyond the Basics. He tweets as @tlberglund and lives in Littleton, CO, U.S., with the wife of his youth and their youngest child, the other two having mostly grown up.
