Konstantin Knauf and Maximilian Bode of Munich-based TNG Technology Consulting are no strangers to stateful stream processing. They’ve been working closely with Apache Flink® for well over a year now, and in their Flink Forward 2016 talk in Berlin, they detailed their experience building an anonymization platform for one of Germany’s largest mobile network providers, processing billions of records in near-real-time every day. And in April 2017, TNG is partnering with data Artisans to provide Apache Flink training for more than 20 of their developers, further spreading Flink expertise within the company.
A Primer On Accessing Flink Application State Directly
Ufuk Celebi (@iamuce) is a co-founder and software engineer at data Artisans.
2016 was the year that stateful, event-time, and event-at-a-time stream processing arrived as the paradigm for high-throughput, low-latency, and accurate computations. Streaming has now been adopted by a wide range of organizations in production. So what comes next?
We believe that 2017 will be the year to realize the full potential of streaming application state.
Streaming application state is the set of all variables that are updated when processing events. Keeping state reliably and consistently across the processing of individual events is what makes a streaming framework actually useful: it is required for all interesting operations, like windowing, joins, statistics, state machines, and so on.
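To make that definition concrete, here is a minimal sketch in plain Python. It deliberately does not use Flink's API, and the names `KeyState` and `RunningAverage` are hypothetical, chosen only for illustration. It shows an operator that keeps two variables per key (a count and a running total) and updates them each time an event for that key is processed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class KeyState:
    """Variables kept for one key and updated on every event for that key."""
    count: int = 0
    total: float = 0.0


class RunningAverage:
    """Hypothetical operator: emits the running average per key.

    This is an illustration of the idea of keyed state only, not Flink code.
    """

    def __init__(self):
        # The "application state": one KeyState per key seen so far.
        self.state = defaultdict(KeyState)

    def process(self, key, value):
        # Update the state for this key, then emit a result derived from it.
        s = self.state[key]
        s.count += 1
        s.total += value
        return s.total / s.count


# Events arrive one at a time as (key, value) pairs.
events = [("sensor-a", 10.0), ("sensor-b", 4.0), ("sensor-a", 20.0)]
op = RunningAverage()
outputs = [op.process(key, value) for key, value in events]
```

The result for the third event depends on the first one, which is only possible because the count and total for `sensor-a` survive between events. A real streaming framework additionally has to keep such variables consistent across failures and restarts, which this sketch does not attempt.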
Apache Flink’s support for streaming application state is already advanced: its checkpoint-based fault tolerance mechanism is lightweight and guarantees exactly-once semantics in the event of failure, and its savepoints feature makes it possible to deploy code updates without losing an application’s current progress. State can be very large, and updates to it can be event-time aware.
However, there was always one issue with application state in Flink: it wasn’t available to external applications outside of the streaming framework. It was still necessary to send the result of a streaming computation to a database or key-value store to make it accessible for querying.
But that limitation of application state is being addressed. Apache Flink’s Queryable State, a new feature first introduced to Flink’s master branch in August 2016 and included in the version 1.2.0 release in February 2017, provides a turnkey mechanism for external access to application state and an API for submitting queries against this state.
That’s quite some star power behind the product, and we expected that the co-founders’ fan base would drive a surge of traffic to the site as soon as it launched in November 2016. As the team tasked with building a product–from scratch–that could handle high user volume from the start and scale efficiently, we had to make key early decisions about Drivetribe’s architecture.
In this post, we’ll walk you through how and why we built drivetribe.com using Apache Flink® and other technologies, and we’ll also discuss how our approach enables a better experience for the end users.
First, a little bit about Drivetribe. After creating an account, users join ‘Tribes’, which are topic-specific groups hosted either by one of the three co-founders or by other bloggers, experts, and motoring enthusiasts:
Last month, software engineer Scott Kidder (@hexdumpster) of Mux published a popular post on his company’s blog titled “Discovering Anomalies in Real-Time with Apache Flink”. The post covered Mux’s process for adding anomaly-detection alerting to its product, starting with evaluating different streaming frameworks through to which Apache Flink® operators are used in the application.
Lucky for all of us, the blog post was just a preview of things to come, and Scott will be presenting on the topic at the first-ever Flink Forward in San Francisco on April 11. From his talk’s abstract:
Mux uses Apache Flink to identify anomalies in the distribution & playback of digital video for major video streaming websites. Scott Kidder will describe the Apache Flink deployment at Mux leveraging Docker, AWS Kinesis, Zookeeper, HDFS, and InfluxDB. Deploying a Flink application in a zero-downtime production environment can be tricky, so unit- & behavioral-testing, application packaging, upgrade, and monitoring strategies will be covered as well.
San Francisco-based Mux is a Y Combinator graduate whose product provides monitoring and analytics for streaming video. The company was founded by experts in the space from the likes of Brightcove and Zencoder, and Scott himself has been working with video for more than 10 years. By the way, they’re hiring.
Introducing Flink 1.2.0's integration with DC/OS and Mesos
Till Rohrmann (@stsffap) is an Engineering Lead at data Artisans. This post also appeared on the DC/OS blog. Thanks to Jörg Schad, Judith Malnick, and Ravi Yadav from Mesosphere for their help with writing and editing.
If you’re interested in learning more, check out this post from Mesosphere about partner integrations in DC/OS 1.9 or register for this webinar co-hosted by Mesosphere and data Artisans to discuss running Flink on DC/OS.
Last December, data Artisans organized the first-ever Apache Flink® user survey. We asked the community where they were running Flink, and here’s what we found:
Just under 30% of respondents were running Flink on Apache Mesos either on-premise or in the cloud. Notably, Flink hadn’t even provided official support for Mesos until this month’s Flink 1.2.0 release. This 30% is a testament to Mesos’ popularity.
Rescalable state, queryable state, async I/O, low-level stream operations, SQL improvements, and more
On Monday, February 6, the Apache Flink® community announced the project’s 1.2.0 release. We at data Artisans would like to extend a sincere thanks to the 122 members of the Flink community who contributed to 1.2.0. The release included contributors employed by Alibaba, Amazon, Cloudera, King, and many other enterprises.
At data Artisans, we spend most of our waking hours thinking about and working on Flink, and so there’s lots that we’re excited about in the 1.2.0 release. In this post, members of the data Artisans engineering team will share their thoughts on just a subset of the release’s new features.
Last week, we published the first of two blog posts recapping the results of the 2016 Apache Flink® user survey. In part 1, we shared a selection of graphs summarizing responses to the survey’s multiple choice questions. In part 2, we’ll look at responses to the survey’s open-ended questions:
What new features or functionality would you like to see in Flink?
Please briefly describe the application(s) your team is building or plans to build with Flink.
Are there any other challenges (when working with Flink) not listed (in the previous question) that you’d like to mention?
What other sources / sinks not included in the list (that was provided in the survey), if any, are important for your Flink application?
We welcome any final comments about any aspect of Flink.
At the end of 2016, data Artisans organized the first-ever Apache Flink® user survey in order to better understand Flink usage in the community, asking for feedback about both common patterns and the most-needed Flink features.
The results are in, and we’ll be sharing them in a two-post series. This first post will include a summary of answers to the survey’s multiple-choice questions, and the second post will include written answers to open-ended questions that respondents gave us permission to share anonymously.
For context, here’s some general information about the survey:
We collected responses between 18 Nov 2016 and 13 Dec 2016
The survey was distributed via the Apache Flink mailing lists, the data Artisans Twitter account, and Apache Flink meetup groups around the world
In total, 119 respondents from 21 different countries answered at least 1 question; note that each graph includes a count of respondents for that particular question
If you’d like to download a single file with all 5 of the graph images from this post, you can do so here.
First, a fun one: where in the world are Flink users? The Flink community has long been a global one: 27% of respondents are based in the United States, with many more throughout continental Europe, South America, and Asia.
We’re in the home stretch of 2016, and November was another action-packed month for the Apache Flink® community and the data Artisans team.
Here’s a recap of November’s most exciting highlights.
Announcing Flink Forward 2017
Flink Forward is coming to San Francisco! For the first time ever, the annual Apache Flink user conference will expand beyond Berlin with a 2-day event at the Hotel Kabuki in Japantown on 10-11 April, 2017. The call for papers is open, so submit your talk or register today.
And the Berlin event will return to Kulturbrauerei on 11-13 September, 2017. Registration is available now, and stay tuned for the call for papers.
Flink Training in Frankfurt (Munich and Hamburg coming up soon)
Last month, the data Artisans team hosted the first of three Flink training sessions–this time in Frankfurt–in coordination with codecentric. The Munich workshop happens tomorrow, Tuesday 13 December, and timing for our Hamburg event is still TBD. Interested in setting up a Flink training for your organization? Learn more and get in touch.
Apache Flink User Survey: Results Coming Soon
In late November and early December, data Artisans ran the first-ever Apache Flink user survey. There have been over 100 responses so far, and we’ll be publishing a summary of results to share with the community before the end of 2016. We’re excited to share feedback from Flink users around the world.
We’re excited to announce that Apache Flink® enthusiasts have two events to look forward to in 2017, both fully packed with the latest and greatest on Flink.
Flink Forward, the premier Flink conference, is coming to Berlin for a third time on September 11-13, 2017. But before our annual event in Germany, we invite the data stream processing community to the first-ever Flink Forward San Francisco on April 10-11, 2017.
In this first edition of Flink Forward San Francisco, we’ll connect with the already-thriving Flink community in the Bay Area and beyond. Our mission is to foster innovation and discussion with developers around the world in order to push Apache Flink to the next level.
The call for submissions is already open and you are invited to share your knowledge, use cases, and best practices with the Apache Flink community and to shape the program of the first edition of Flink Forward San Francisco! Submit your talk here.
Flink Forward San Francisco will take place at Hotel Kabuki, in the heart of Japantown in the city center. Participants are invited to join one day of hands-on Flink training sessions on April 10 followed by one day of speaker sessions on April 11. The speaker sessions will be made up of technical talks covering Flink in the enterprise, Flink system internals, ecosystem integrations with Flink, and the future of the platform.
From September 11-13, we welcome the data stream processing community to the third edition of Flink Forward Berlin at Kulturbrauerei in the heart of Prenzlauer Berg in Berlin.
Participants are invited to join one day of hands-on Flink training sessions on September 11 followed by two days of speaker sessions on September 12-13. Tickets are on sale now, and you can purchase your Early Bird Ticket here.
In 2017, we seek to provide a platform for developers, architects, engineering managers, and C-level executives to gain in-depth insights on Apache Flink. We hope that you’ll join us.