Month: August 2016

Apache Flink and Apache Kafka Streams

A comparison and guideline for users

This blog post is written jointly by Stephan Ewen, CTO of data Artisans, and Neha Narkhede, CTO of Confluent. You can also find this post at the Confluent blog.

The open source stream processing space is currently exploding, with more and more systems becoming available and presenting users with many alternatives. In the Apache Software Foundation alone, there are now more than 10 stream processing projects, some in incubation and others graduated to top-level project status.

While the availability of alternatives benefits the industry and the users of these systems by enabling competition and thus encouraging innovation, it can also be quite confusing: with all these options, which one is right for me both now and in the future? Stream processors can be evaluated on several dimensions, including performance (throughput and latency), integration with other systems, ease of use, and fault tolerance guarantees, but making such a comparison is not the topic of this post (and we are certainly biased).

For some time now, the Apache Kafka project has served as a common denominator in most open source stream processors as the de facto storage layer for storing and moving potentially large volumes of data in streaming fashion with low latency. Recently, the Kafka community introduced Kafka Streams, a stream processing library that ships as part of Apache Kafka. With the addition of Kafka Streams and Kafka Connect, Kafka has now added significant stream processing capabilities.

In this post, we focus on how Flink and Kafka Streams compare with each other for stream processing, and we attempt to bring some clarity to that question. Flink and Kafka Streams were created with different use cases in mind. While their applicability overlaps in places, they are designed to solve orthogonal problems and have very different sweet spots and placement in the data infrastructure stack.


Flink Forward 2016: Announcing keynotes and panel discussion

We are very excited to announce Ted Dunning as a keynote speaker for Flink Forward 2016! Ted is the VP of Incubator at the Apache Software Foundation, the Chief Application Architect at MapR Technologies, and a mentor on many recent projects. His keynote, “How Can We Take Flink Forward?”, will be presented on the second day of the conference.

Following Ted’s keynote, we’ll present a panel discussion on “Large Scale Streaming in Production”. As stream processing systems become more mainstream, companies are looking to empower their users to take advantage of this technology. We welcome leading stream processing experts Xiaowei Jiang (Alibaba), Monal Daxini (Netflix, Inc.), Maxim Fateev (Uber) and Ted Dunning (MapR Technologies) on stage to talk about the challenges they have faced and the solutions they have discovered while implementing stream processing systems at very large scale. The panel will be moderated by Jamie Grier (data Artisans).

The welcome keynote on Monday, September 12, will be given by data Artisans’ co-founders Kostas Tzoumas and Stephan Ewen. They will talk about “The maturing data streaming ecosystem and Apache Flink’s accelerated growth”. In this talk, Kostas and Stephan discuss several large-scale stream processing use cases that data Artisans has seen over the past year.

Moreover, we are looking forward to Maxim Fateev’s talk “Beyond the Watermark: On-Demand Backfilling in Flink”. Flink’s time-progress model is built around a single watermark, which is incompatible with Uber’s business need for generating aggregates retroactively. Maxim’s talk covers Uber’s solution for on-demand backfilling.

Don’t miss the latest developments, best practices and use cases on Apache Flink. Register for Flink Forward 2016.

Robust Stream Processing with Apache Flink®: A Simple Walkthrough

Jamie Grier, Director of Applications Engineering at data Artisans, gave an in-depth Apache Flink® demonstration at OSCON 2016 in Austin, TX. A recording is available on YouTube if you’d like to see the complete demo.

For readers who are new to Apache Flink®, a simple yet powerful point is worth repeating: Flink enables stateful stream processing with production-grade reliability and accuracy guarantees. Streaming applications no longer have to be synonymous with estimates, that is, imprecise systems that must be coupled with a batch processor to ensure reliability. Instead, they can be robust and correct computations made in real time.