This blog post is written jointly by Stephan Ewen, CTO of data Artisans, and Neha Narkhede, CTO of Confluent. You can also find this post at the Confluent blog.
The open source stream processing space is currently exploding, with more systems becoming available and presenting users with many alternatives. In the Apache Software Foundation alone, there are now more than 10 stream processing projects, some in incubation and others graduated to top-level project status.
While the availability of alternatives benefits the industry and the users of these systems by enabling competition and thus encouraging innovation, it can also be quite confusing: with all these options, which one is right for me, both now and in the future? Stream processors can be evaluated on several dimensions, including performance (throughput and latency), integration with other systems, ease of use, fault tolerance guarantees, etc., but making such a comparison is not the topic of this post (and we are certainly biased).
For some time now, the Apache Kafka project has served as a common denominator in most open source stream processors as the de facto storage layer for storing and moving potentially large volumes of data in a streaming fashion with low latency. Recently, the Kafka community introduced Kafka Streams, a stream processing library that ships as part of Apache Kafka. With the addition of Kafka Streams and Kafka Connect, Kafka has now added significant stream processing capabilities.
In this post, we focus on how Flink and Kafka Streams compare with each other on stream processing, and we attempt to provide clarity on that question. Flink and Kafka Streams were created with different use cases in mind. While they have some overlap in their applicability, they are designed to solve orthogonal problems and have very different sweet spots and placement in the data infrastructure stack.
Following Ted’s keynote, we’ll present a panel discussion on “Large Scale Streaming in Production”. As stream processing systems become more mainstream, companies are looking to empower their users to take advantage of this technology. We welcome leading stream processing experts Xiaowei Jiang (Alibaba), Monal Daxini (Netflix, Inc.), Maxim Fateev (Uber) and Ted Dunning (MapR Technologies) on stage to talk about the challenges they have faced and the solutions they have discovered while implementing stream processing systems at very large scales. The panel will be moderated by Jamie Grier (data Artisans).
Moreover, we are looking forward to Maxim Fateev’s talk “Beyond the Watermark: On-Demand Backfilling in Flink”. Flink’s time-progress model is built around a single watermark, which is incompatible with Uber’s business need for generating aggregates retroactively. Maxim’s talk covers Uber’s solution for on-demand backfilling.
Jamie Grier, Director of Applications Engineering at data Artisans, gave an in-depth Apache Flink® demonstration at OSCON 2016 in Austin, TX. A recording is available on YouTube if you’d like to see the complete demo.
For our readers out there who are new to Apache Flink®, it’s worth repeating a simple yet powerful point: Flink enables stateful stream processing with production-grade reliability and accuracy guarantees. No longer should streaming applications be synonymous with estimates, that is, with imprecise systems that must be coupled with a batch processor to ensure reliability. Instead, they can deliver robust and correct computations in real time.