Reflections on Flink Forward 2015
Posted on Oct 28th, 2015 by Kostas Tzoumas
Flink Forward 2015 was the inaugural conference of the Apache Flink™ community and took place at the beautiful Kulturbrauerei in Berlin, a former brewery turned into a fantastic event space. Overall, we are both delighted and overwhelmed by the success of the conference and by how rapidly the Flink community is growing and connecting.
Around 250 participants had the opportunity to attend a total of 33 technical talks (organized in 2 parallel sessions) and to participate in free Flink training sessions. The talks were selected by a program committee comprising five Flink PMC members: Marton Balassi, Stephan Ewen, Vasia Kalavri, Henry Saputra, and Kostas Tzoumas.
The slides of all the talks are available online. Video recordings will be added soon.
Flink Forward featured two keynotes, one by Kostas Tzoumas and Stephan Ewen from data Artisans, and one by William Vambenepe from Google. Both keynotes emphasized that stream processing is more than just “fast data”: it is a new paradigm for programming data-intensive applications, one that embraces the unbounded, continuous nature of data as it is produced in the real world. Instead of ignoring continuous data or managing it implicitly with batch frameworks or hybrid (lambda) architectures, developers can now use modern stream processing frameworks to get timely answers from their data. Since the value of data is closely tied to its freshness, a streaming-first infrastructure increases the actual value of the insights we derive from our data signals. Both keynotes asserted that stream processing technology has now matured, citing Apache Flink™ (with an emphasis on the upcoming Flink 0.10 release) and Google Cloud Dataflow. The two systems have much in common, and their communities are cooperating closely to provide compatibility between the two frameworks.
The movement towards stream processing was very visible at the conference, with several talks dedicated to how companies have put Apache Flink™ into production for real-time data processing, as well as talks on how the Flink framework handles streaming data internally:
- Mohamed Amine Abdessemed (Bouygues Telecom): Real-time data integration with Flink & Apache Kafka
- Ignacio Mulas Viela (Ericsson): Applying Kappa architecture in the telecom industry
- Anwar Rizal (Amadeus): Implementing Streaming Decision Tree Using Approximative Algorithms in Flink
- Christian Kreuzfeld (Otto Group): Static vs Dynamic Stream Processing
- Alexander Kolb (Otto Group): Flink? Yet another streaming framework?
- Marton Balassi (Hungarian Academy of Sciences): Stateful Stream processing
- Till Rohrmann (data Artisans): Fault Tolerance and Recovery of Flink Jobs
- Aljoscha Krettek (data Artisans): Notions of Time – How Apache Flink™ Handles Time and Windows
- Assaf Araki (Intel): Real Time Analytics at Scale – Smart Data Pipes for the Internet of Things
- Matthias Sax (HU Berlin): A tale of Squirrels and Storms
- Maximilian Michels (data Artisans): Google Cloud Dataflow on top of Apache Flink™
- Ufuk Celebi (data Artisans): Stream and Batch Processing in One System — Apache Flink™’s Streaming Data Flow Engine
- Albert Bifet (Huawei): Apache SAMOA: Mining Big Data Streams with Apache Flink™
Stream processing is, of course, only one of the things that people are doing with Flink. Michael Häusler from ResearchGate (rightly) argued that batch is not dead. He shared ResearchGate’s methodology for choosing a framework that makes simple things easy: comparing how different solutions handle the same simple task. Other talks that focused on system evaluation and performance included:
- Fabian Hueske (data Artisans): Cascading on Apache Flink™
- Slim Baltagi (Capital One): Flink and Spark: Similarities and Differences
- Dongwon Kim (POSTECH): A comparative performance evaluation of Flink
- Christopher Hillman (University of Dundee): Beyond MapReduce, Scientific data processing in real-time
- Vyacheslav Zholudev (ResearchGate): Flink – a convenient abstraction layer for YARN?
Other focal points of the conference were machine learning, interactive analytics, graph processing, and the integration of Flink with other pieces of the Big Data infrastructure:
- Vasia Kalavri (KTH): Automatic Detection of Web Trackers at Telefonica Research
- Mikio Braun (Zalando): Procedural Programming vs. Data Flow
- Martin Junghans (University of Leipzig): Gradoop: Scalable Graph Analytics with Apache Flink
- Moon soo Lee (NFLabs): Data science lifecycle with Apache Flink™ and Apache Zeppelin
- Sebastian Schelter (TU Berlin): Declarative Machine Learning with the Samsara DSL
- Stefano Bortoli & Flavio Pompermaier (OKKAM): A Semantic Big Data Companion
- Kamal Hakimzadeh (KTH): Karamel – Reproducing distributed systems and experiments on cloud
- Jim Dowling (SICS): Interactive Flink Analytics with Hopsworks and Apache Zeppelin
- Nam-Luc Tran (Euranova): Stale Synchronous Parallel Iterations on Flink
- Romeo Kienzler and Simon Laws (IBM): Apache Flink™ Cluster Deployment on Docker using Docker-Compose
- Suneel Marthi (Red Hat): BigPetStore: A Comprehensive Blueprint for Apache Flink™
- Marc Schwering (MongoDB): Using Flink with MongoDB to Enhance Relevancy in Personalization
The training sessions were very well attended, with hands-on training on Flink’s DataStream, DataSet, and Gelly APIs. As always, you can access the latest data Artisans training material for Apache Flink™ online.
Of course, not everything about Flink Forward 2015 was perfect. As this was the first installment of the conference, glitches such as spotty internet or registration mix-ups did happen at times, but they were quickly resolved. A more important issue that we noticed was the lack of diversity in the community, and especially the very small percentage of women among speakers and attendees. While this is an issue for the tech community at large and not specific to the Flink community, we would like to start an open discussion about it as early as possible and see how we can improve in the future.