The data Artisans team was very much impressed by this year’s Flink Forward speaker sessions, and the speakers delivered tons of detail on Apache Flink® use cases and benchmarks. Here, we’ll share just a small selection of our favorite insights from the presentations.
Bouygues Telecom, one of the largest telecom networks in France, is running 30 production applications powered by Flink and is processing 10 billion raw events per day. As of Flink Forward 2015, they were live with 5 Flink applications, so we're looking forward to hearing about their 180 Flink applications in 2017.
Berlin's surprise 32°C September weather (90°F for those of you Stateside) has come and gone, and there was a lot happening in the last few weeks of summer. Here are a few of the highlights.
Apache Flink® to the enterprise
In order to make Flink more accessible to organizations seeking enterprise support, data Artisans announced the dA Platform, a data Artisans-certified distribution of Flink bundled with 24x7x365 support. Get in touch with us if you’d like to learn more.
And we were thrilled to see that Lightbend included Flink in its Fast Data Platform. September was a month of great progress in growing the Flink community and broadening the user base.
From September 26-29, 2016, the big data community meets at Strata + Hadoop World in NYC. This year, data Artisans will take part in the conference in several ways. For the first time, we will have a booth (#P2), where we will be demonstrating Apache Flink® and our brand-new dA Platform. Stop by to connect with Apache Flink experts and learn more about implementing enterprise-grade streaming data applications in production.
A team of original Apache Flink® contributors founded data Artisans in 2014 because we believed that existing data processing frameworks weren’t adequately addressing the needs of organizations and their engineering teams. From the global saturation of smartphones, to the rapid adoption of the Internet of Things and connected devices, the very nature of data and how it is generated had evolved far more quickly than the tools available to manage that data.
While most of the Continent was away on holiday, it was a productive August for data Artisans and for the Apache Flink® community. Here are highlights from the past month, and we can’t wait to see what the rest of 2016 has in store.
Apache Flink 1.1
There were many long-awaited features included in the Flink community’s 1.1 release, which was supported by 95 contributors. If you haven’t already, we recommend that you browse the release notes. Here are a few of the highlights:
Metrics: In most conversations we have about Flink, someone will mention the need for deeper operational metrics, and the community delivered a comprehensive metrics system that makes it easy to gather metrics and expose them to external systems.
Table API and SQL: The architecture of the Table API has been reworked completely in Flink 1.1, and the API has been integrated with Apache Calcite™.
Say hello to members of our team in India, Germany, and the USA
Here’s a quick rundown of where you can find members of the data Artisans team in September 2016. If you want to see where we’ll be traveling throughout the rest of the year, check out our Events page.
We hope to get to meet many members of the Apache Flink® and stream processing communities in person this month.
VLDB (New Delhi, India), Sept 5-9: data Artisans software engineer Kostas Kloudas will be presenting an Apache Flink® training. Attendees will have a chance to get hands-on with Flink during the session.
Strata + Hadoop World (New York City, USA), Sept 26-29: Kostas Tzoumas, Director of Applications Engineering Jamie Grier, and Product Manager Mike Winters will attend this year’s Strata + Hadoop World NYC. More to come about where you can find us inside the conference.
This blog post is written jointly by Stephan Ewen, CTO of data Artisans, and Neha Narkhede, CTO of Confluent. You can also find this post at the Confluent blog.
The open source stream processing space is currently exploding, with more and more systems becoming available and presenting users with many alternatives. In the Apache Software Foundation alone, there are now more than 10 stream processing projects, some in incubation and others graduated to top-level project status.
While the availability of alternatives benefits the industry and the users of these systems by enabling competition and thus encouraging innovation, it can also be quite confusing: with all these options, which one is right for me, both now and in the future? Stream processors can be evaluated on several dimensions, including performance (throughput and latency), integration with other systems, ease of use, fault tolerance guarantees, etc., but making such a comparison is not the topic of this post (and we are certainly biased).
For some time now, the Apache Kafka project has served as a common denominator in most open source stream processors as the de facto storage layer for storing and moving potentially large volumes of data in a streaming fashion with low latency. Recently, the Kafka community introduced Kafka Streams, a stream processing library that ships as part of Apache Kafka. With the addition of Kafka Streams and Kafka Connect, Kafka has now added significant stream processing capabilities.
In this post, we focus on how Flink and Kafka Streams compare with each other on stream processing, and we attempt to bring some clarity to that question. Flink and Kafka Streams were created with different use cases in mind. While they have some overlap in their applicability, they are designed to solve orthogonal problems and have very different sweet spots and placement in the data infrastructure stack.
Following Ted's keynote, we'll present a panel discussion on "Large Scale Streaming in Production". As stream processing systems become more mainstream, companies are looking to empower their users to take advantage of this technology. We welcome leading stream processing experts Xiaowei Jiang (Alibaba), Monal Daxini (Netflix, Inc.), Maxim Fateev (Uber), and Ted Dunning (MapR Technologies) on stage to talk about the challenges they have faced and the solutions they have discovered while implementing stream processing systems at very large scales. The panel will be moderated by Jamie Grier (data Artisans).
Moreover, we are looking forward to Maxim Fateev’s talk “Beyond the Watermark: On-Demand Backfilling in Flink“. Flink’s time-progress model is built around a single watermark, which is incompatible with Uber’s business need for generating aggregates retroactively. Maxim’s talk covers Uber’s solution for on-demand backfilling.
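To make the constraint concrete, here is a toy, single-threaded Python model of event-time tumbling windows driven by one watermark. It is not Flink's API and not Uber's solution: the window size, the tuple format, and the function names are hypothetical, and window firing is simplified to "a window is complete once the watermark reaches its end". It shows why, in a model with a single forward-only watermark, an event whose window the watermark has already passed can only be dropped, which is what retroactive aggregation has to work around.

```python
from collections import defaultdict

WINDOW_SIZE = 10  # hypothetical tumbling-window length, in event-time units


def window_start(ts):
    """Start of the tumbling window that contains timestamp ts."""
    return ts - ts % WINDOW_SIZE


def process(stream):
    """Toy event-time windowing driven by a single watermark.

    Stream items are ("event", key, ts) or ("watermark", ts).
    Returns (emitted, dropped); emitted holds (key, window_start, count).
    """
    open_windows = defaultdict(int)  # (key, window_start) -> running count
    watermark = float("-inf")        # event-time progress; never moves backwards
    emitted, dropped = [], []
    for item in stream:
        if item[0] == "event":
            _, key, ts = item
            start = window_start(ts)
            if start + WINDOW_SIZE <= watermark:
                # The watermark already passed this window's end: too late.
                dropped.append(item)
            else:
                open_windows[(key, start)] += 1
        else:
            watermark = max(watermark, item[1])
            # Fire and discard every open window the watermark has now passed.
            for key, start in sorted(open_windows):
                if start + WINDOW_SIZE <= watermark:
                    emitted.append((key, start, open_windows.pop((key, start))))
    return emitted, dropped


stream = [
    ("event", "a", 1), ("event", "a", 4), ("event", "b", 12),
    ("watermark", 10),
    ("event", "a", 7),   # belongs to window [0, 10), which has already fired
    ("event", "b", 15),
    ("watermark", 20),
]
emitted, dropped = process(stream)
print(emitted)  # [('a', 0, 2), ('b', 10, 2)]
print(dropped)  # [('event', 'a', 7)]
```

Here the event `("event", "a", 7)` arrives after the watermark has reached 10, so its window has already been emitted and the event is discarded. Because the single watermark only moves forward, recomputing that window later needs a mechanism outside this model, which is the problem the talk addresses.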
Jamie Grier, Director of Applications Engineering at data Artisans, gave an in-depth Apache Flink® demonstration at OSCON 2016 in Austin, TX. A recording is available on YouTube if you’d like to see the complete demo.
For our readers out there who are new to Apache Flink®, it's worth repeating a simple yet powerful point: Flink enables stateful stream processing with production-grade reliability and accuracy guarantees. No longer should streaming applications be synonymous with estimates, that is, imprecise systems that must be coupled with a batch processor to ensure reliability; rather, they should deliver robust and correct computations in real time.
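For readers who want an intuition for how state and accuracy fit together, here is a deliberately simplified, single-process Python sketch of the idea behind checkpoint-based recovery: operator state and the input position are snapshotted together, so after a crash the job restores both and replays from the recorded position, and each input ends up reflected exactly once in the state. Flink does this with distributed, asynchronous snapshots across parallel operators and relies on a replayable source; the function name, the checkpoint interval, and the crash simulation below are hypothetical.

```python
from collections import Counter


def count_with_recovery(events, checkpoint_every=3, fail_at=None):
    """Count events per key, optionally simulating one crash at input position fail_at."""
    state = Counter()                # operator state: running count per key
    offset = 0                       # position in the replayable input
    checkpoint = (Counter(), 0)      # last consistent snapshot: (state copy, offset)
    crashed = False
    while offset < len(events):
        if offset == fail_at and not crashed:
            crashed = True
            # Crash: in-memory state is lost. Restore the snapshot and rewind
            # the input to the offset recorded alongside it.
            state = Counter(checkpoint[0])
            offset = checkpoint[1]
            continue
        state[events[offset]] += 1
        offset += 1
        if offset % checkpoint_every == 0:
            # Snapshot state and input position together.
            checkpoint = (Counter(state), offset)
    return state


events = ["a", "b", "a", "c", "a", "b", "c"]
print(count_with_recovery(events) == count_with_recovery(events, fail_at=5))  # True
```

The design point is that the snapshot pairs the state with the input offset. Rewinding the input without also restoring the state would count the replayed events twice.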
We are thrilled to announce that the accepted talks for Flink Forward 2016 are now available on the conference website. Flink Forward 2016 takes place September 12-14 in Berlin, Germany, bringing together the open source stream processing community.