
Flink Forward SF Preview: “Queryable State or How to Build a Billing System Without a Database” with TNG

Konstantin Knauf and Maximilian Bode of Munich-based TNG Technology Consulting are no strangers to stateful stream processing. They’ve been working closely with Apache Flink® for well over a year now, and in their Flink Forward 2016 talk in Berlin, they detailed their experience building an anonymization platform for one of Germany’s largest mobile network providers, processing billions of records in near-real-time every day. In April 2017, TNG is also partnering with data Artisans to provide Apache Flink training for more than 20 of its developers, further spreading Flink expertise within the company.

At Flink Forward San Francisco, Konstantin and Max will present “Queryable State or How to Build a Billing System Without a Database”, walking through a sample use case for queryable state, one of Flink 1.2.0’s new features (which we also covered at a high level in a blog post this week).


Queryable State in Apache Flink® 1.2.0: An Overview & Demo

A Primer On Accessing Flink Application State Directly

Ufuk Celebi (@iamuce) is a co-founder and software engineer at data Artisans. 

2016 was the year that stateful, event-time, and event-at-a-time stream processing arrived as the paradigm for high-throughput, low-latency, and accurate computations. Streaming has now been adopted by a wide range of organizations in production. So what comes next?

We believe that 2017 will be the year to realize the full potential of streaming application state.

Streaming application state is the set of all variables that are updated when processing events. Keeping state reliably and consistently across the processing of individual events is what makes a streaming framework actually useful: it is required for all interesting operations, like windowing, joins, statistics, state machines, and so on.

Apache Flink’s support for streaming application state is already advanced: its checkpoint-based fault tolerance mechanism is lightweight and guarantees exactly-once semantics in the event of failure, and its savepoints feature makes it possible to deploy code updates without losing an application’s current progress. State can be very large, and it can be updated in an event-time-aware manner.
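To make the exactly-once idea a little more concrete, here is a minimal Python sketch, assuming a toy job that counts events per key from a replayable input log. It is not Flink’s checkpointing algorithm (Flink takes distributed snapshots across a parallel dataflow), and the class and method names are hypothetical. It only illustrates the core principle: state and input position are snapshotted together, so after a failure the job restores both and replays from the recorded position, and every event is reflected in the state exactly once.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Snapshot = Tuple[int, Dict[str, int]]


@dataclass
class CountingJob:
    """Hypothetical toy job: counts events per key from a replayable log."""

    offset: int = 0  # log position up to which events are reflected in state
    counts: Dict[str, int] = field(default_factory=dict)

    def process(self, log: List[str], upto: int) -> None:
        # Consume events from the current offset up to (but excluding) `upto`.
        while self.offset < min(upto, len(log)):
            key = log[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1

    def checkpoint(self) -> Snapshot:
        # Snapshot input position and state together, as one consistent unit.
        return self.offset, dict(self.counts)

    @classmethod
    def restore(cls, snapshot: Snapshot) -> "CountingJob":
        offset, counts = snapshot
        return cls(offset=offset, counts=dict(counts))


log = ["a", "b", "a", "c", "a"]
job = CountingJob()
job.process(log, 2)              # processes "a", "b"
snap = job.checkpoint()          # offset 2, counts {"a": 1, "b": 1}
job.process(log, 4)              # processes "a", "c", then the job "fails"
job = CountingJob.restore(snap)  # roll back to the last checkpoint
job.process(log, len(log))       # replays "a", "c", then processes the last "a"
print(job.counts)                # {'a': 3, 'b': 1, 'c': 1}
```

Keeping the in-memory counts and re-reading the log from the start would double-count events, and restoring the counts without rewinding the offset would lose them; pairing state with input position is what avoids both.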

However, there was always one issue with application state in Flink: it wasn’t available to external applications outside of the streaming framework. It was still necessary to send the result of a streaming computation to a database or key-value store to make it accessible for querying.

But that limitation of application state is being addressed. Apache Flink’s Queryable State, a new feature first introduced to Flink’s master branch in August 2016 and included in the version 1.2.0 release in February 2017, provides a turnkey mechanism for external access to application state and an API for submitting queries against this state.
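As a rough illustration of what this changes, here is a minimal Python sketch, assuming a single-process stand-in for a stateful operator that keeps a running total per customer (for example, a running bill). The class name, the state name `running-bill`, and the `query` method are hypothetical and are not Flink’s API; in Flink 1.2.0 the keyed state is partitioned across the cluster and external clients look values up by job, state name, and key through a dedicated client. The sketch only shows the shape of the idea: readers consult the operator’s live state directly instead of a database that the job writes into.

```python
from typing import Dict, Optional


class QueryableRunningTotal:
    """Hypothetical stand-in for a keyed, stateful operator whose state
    can be read from outside the job (not Flink's actual API)."""

    def __init__(self, state_name: str) -> None:
        self.state_name = state_name
        self._totals: Dict[str, int] = {}  # per-key state, updated per event

    def on_event(self, customer: str, amount_cents: int) -> None:
        # The streaming job updates its state as each event arrives.
        self._totals[customer] = self._totals.get(customer, 0) + amount_cents

    def query(self, customer: str) -> Optional[int]:
        # External point lookup against the live state; None for unknown keys.
        return self._totals.get(customer)


# Registry of queryable state by name, standing in for the lookup service.
registry: Dict[str, QueryableRunningTotal] = {}
billing = QueryableRunningTotal("running-bill")
registry[billing.state_name] = billing

for customer, amount in [("alice", 500), ("bob", 300), ("alice", 200)]:
    billing.on_event(customer, amount)

print(registry["running-bill"].query("alice"))  # 700
print(registry["running-bill"].query("carol"))  # None
```

The sketch leaves out what makes this hard in practice, such as routing a query to the parallel instance that owns the key and reading state while it is being updated.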


Drivetribe’s Modern Take On CQRS With Apache Flink®

The Architecture And The Developer Experience

This is a guest post from Aris Koliopoulos, a senior software engineer at London-based Drivetribe.

Drivetribe is the world’s digital hub for motoring. The platform was created by former Top Gear presenters Jeremy Clarkson, Richard Hammond, and James May, and the company has raised funding from the likes of 21st Century Fox, Atomico, and Breyer Capital.

That’s quite some star power behind the product, and we expected that the co-founders’ fan base would drive a surge of traffic to the site as soon as it launched in November 2016. As the team tasked with building a product from scratch that could handle high user volume from the start and scale efficiently, we had to make key early decisions about Drivetribe’s architecture.

In this post, we’ll walk you through how and why we built Drivetribe using Apache Flink® and other technologies, and we’ll also discuss how our approach enables a better experience for end users.

1. An Introduction to Drivetribe
2. Architecture Overview
3. Apache Flink in the Drivetribe Stack
4. The Developer Experience with Flink

An Introduction to Drivetribe

First, a little bit about Drivetribe. After creating an account, users join ‘Tribes’, which are topic-specific groups hosted either by one of the three co-founders or by other bloggers, experts, and motoring enthusiasts.


Flink Forward San Francisco Preview: Real-time Anomaly Detection with Mux

Last month, software engineer Scott Kidder (@hexdumpster) of Mux published a popular post on his company’s blog titled “Discovering Anomalies in Real-Time with Apache Flink”. The post covered Mux’s process for adding anomaly-detection alerting to its product, from the evaluation of different streaming frameworks through to the specific Apache Flink® operators used in the application.

(Figure: Mux’s event pipeline, a diagram from the Mux blog post)

Luckily for all of us, the blog post was just a preview of things to come: Scott will be presenting on the topic at the first-ever Flink Forward San Francisco on April 11. From his talk’s abstract:

Mux uses Apache Flink to identify anomalies in the distribution & playback of digital video for major video streaming websites. Scott Kidder will describe the Apache Flink deployment at Mux leveraging Docker, AWS Kinesis, Zookeeper, HDFS, and InfluxDB. Deploying a Flink application in a zero-downtime production environment can be tricky, so unit- & behavioral-testing, application packaging, upgrade, and monitoring strategies will be covered as well.

His talk is timely, both because of Flink’s increasing popularity in real-time alerting and anomaly detection applications and because it covers deployment, an area where much is happening in the Flink community.

San Francisco-based Mux is a Y Combinator graduate whose product provides monitoring and analytics for streaming video. The company was founded by experts in the space from the likes of Brightcove and Zencoder, and Scott himself has been working with video for more than 10 years. By the way, they’re hiring.

To see Scott’s talk along with 25 others, buy your ticket to Flink Forward San Francisco today.

P.S. For more recommended reading, Scott also published a nice piece about using Flink’s Amazon Kinesis connector and building the connector from source. Check it out.

Running Apache Flink® Everywhere: Flink on DC/OS and Apache Mesos

Introducing Flink 1.2.0's integration with DC/OS and Mesos

Till Rohrmann (@stsffap) is an Engineering Lead at data Artisans. This post also appeared on the DC/OS blog. Thanks to Jörg Schad, Judith Malnick, and Ravi Yadav from Mesosphere for their help with writing and editing. 

If you’re interested in learning more, check out this post from Mesosphere about partner integrations in DC/OS 1.9 or register for this webinar co-hosted by Mesosphere and data Artisans to discuss running Flink on DC/OS.

Last December, data Artisans organized the first-ever Apache Flink® user survey. We asked the community where they were running Flink, and here’s what we found:

Just under 30% of respondents were running Flink on Apache Mesos, either on-premise or in the cloud. Notably, Flink didn’t even provide official support for Mesos until this month’s Flink 1.2.0 release, so this 30% is a testament to Mesos’ popularity.


Apache Flink® Community Announces 1.2.0 Release

Rescalable state, queryable state, async I/O, low-level stream operations, SQL improvements, and more

On Monday, February 6, the Apache Flink® community announced the project’s 1.2.0 release. We at data Artisans would like to extend a sincere thanks to the 122 members of the Flink community who contributed to 1.2.0. The release included contributors employed by Alibaba, Amazon, Cloudera, King, and many other enterprises.

At data Artisans, we spend most of our waking hours thinking about and working on Flink, and so there’s lots that we’re excited about in the 1.2.0 release. In this post, members of the data Artisans engineering team will share their thoughts on just a subset of the release’s new features.

For a complete overview, be sure to check out the changelog on the project site.

And in the coming weeks, we’ll be writing about 1.2.0 features in more detail here on the data Artisans blog.


Apache Flink® User Survey 2016 Results, Part 2

Last week, we published the first of two blog posts recapping the results of the 2016 Apache Flink® user survey. In part 1, we shared a selection of graphs summarizing responses to the survey’s multiple-choice questions. In part 2, we’ll look at responses to the survey’s open-ended questions:

  • What new features or functionality would you like to see in Flink?
  • Please briefly describe the application(s) your team is building or plans to build with Flink.
  • Are there any other challenges (when working with Flink) not listed (in the previous question) that you’d like to mention?
  • What other sources / sinks not included in the list (that was provided in the survey), if any, are important for your Flink application?
  • We welcome any final comments about any aspect of Flink.


Apache Flink® User Survey 2016 Results, Part 1

(You can find part 2 here)

At the end of 2016, data Artisans organized the first-ever Apache Flink® user survey in order to better understand Flink usage in the community, asking for feedback about both common patterns and the most-needed Flink features.

The results are in, and we’ll be sharing them in a two-post series. This first post will include a summary of answers to the survey’s multiple-choice questions, and the second post will include written answers to open-ended questions that respondents gave us permission to share anonymously.

For context, here’s some general information about the survey:

  • We collected responses between 18 Nov 2016 and 13 Dec 2016
  • The survey was distributed via the Apache Flink mailing lists, the data Artisans Twitter account, and Apache Flink meetup groups around the world
  • In total, 119 respondents from 21 different countries answered at least 1 question; note that each graph includes a count of respondents for that particular question

If you’d like to download a single file with all 5 of the graph images from this post, you can do so here.

First, a fun one: where in the world are Flink users? The Flink community has long been a global one, with 27% of respondents based in the United States and many more throughout continental Europe, South America, and Asia.


November 2016 in Review: Flink Forward 2017, Amazon EMR + Google Dataproc, and kicking off the Flink training series in Germany

We’re in the home stretch of 2016, and November was another action-packed month for the Apache Flink® community and the data Artisans team.

Here’s a recap of November’s most exciting highlights.

Announcing Flink Forward 2017

Flink Forward is coming to San Francisco! For the first time ever, the annual Apache Flink user conference will expand beyond Berlin with a 2-day event at the Hotel Kabuki in Japantown on 10-11 April, 2017. The call for papers is open, so submit your talk or register today.

And the Berlin event will return to Kulturbrauerei on 11-13 September, 2017. Registration is available now, and stay tuned for the call for papers.

Flink in Amazon EMR and Google Cloud Dataproc

Flink is now natively supported in Amazon EMR 5.1.0, and Google included support for Flink 1.1.3 in its November 29 Cloud Dataproc release. We’re excited to see Flink become available in an increasing number of commercial distributions.

Flink Training in Frankfurt (Munich and Hamburg coming up soon)

Last month, in coordination with codecentric, the data Artisans team hosted the first of three Flink training sessions; this one took place in Frankfurt. The Munich workshop happens tomorrow, Tuesday 13 December, and timing for our Hamburg event is still TBD. Interested in setting up a Flink training for your organization? Learn more and get in touch.

Apache Flink User Survey: Results Coming Soon

In late November and early December, data Artisans ran the first-ever Apache Flink user survey. There have been over 100 responses so far, and we’ll be publishing a summary of results to share with the community before the end of 2016. We’re excited to share feedback from Flink users around the world.

Community and Conference Circuit

CEO Kostas Tzoumas hosted a session at Big Data London, CTO Stephan Ewen gave a keynote at Apache Big Data Europe, Director of Applications Engineering Jamie Grier gave a workshop at QCon San Francisco, software engineer Aljoscha Krettek hosted a session at Big Data Spain, software engineers Robert Metzger and Maximilian Michels presented at a Flink meetup in the San Francisco Bay Area, and data Artisans hosted a Flink meetup at our Berlin office (setting an attendance record for the Berlin group). It was quite a month! We very much enjoyed meeting members of the Flink user community in person.

On the data Artisans Blog

Check out Savepoints, Part 2 to see an example of how to update a streaming application using Flink’s savepoints (of course, we recommend starting with Part 1).

And for a debunking of commonly-held myths in the stream processing space, we recommend this post from data Artisans’ CEO Kostas Tzoumas.


Hello, San Francisco! And nice to see you again, Berlin.

By @danibentrup

We’re excited to announce that Apache Flink® enthusiasts have two events to look forward to in 2017, both packed with the latest and greatest on Flink.

Flink Forward, the premier Flink conference, is coming to Berlin for a third time on September 11-13, 2017. But before our annual event in Germany, we invite the data stream processing community to the first-ever Flink Forward San Francisco on April 10-11, 2017.

In this first edition of Flink Forward San Francisco, we’ll connect with the already-thriving Flink community in the Bay Area and beyond. Our mission is to foster innovation and discussion with developers around the world in order to push Apache Flink to the next level.

The call for submissions is already open, and you are invited to share your knowledge, use cases, and best practices with the Apache Flink community and to help shape the program of the first edition of Flink Forward San Francisco! Submit your talk here.

Flink Forward San Francisco will take place at Hotel Kabuki, in the heart of Japantown in the city center. Participants are invited to join one day of hands-on Flink training sessions on April 10 followed by one day of speaker sessions on April 11. The speaker sessions will be made up of technical talks covering Flink in the enterprise, Flink system internals, ecosystem integrations with Flink, and the future of the platform.

Last but not least, tickets are on sale here.

From September 11-13, we welcome the stream data processing community to the third edition of Flink Forward Berlin at Kulturbrauerei in the heart of Prenzlauer Berg in Berlin.

Participants are invited to join one day of hands-on Flink training sessions on September 11 followed by two days of speaker sessions on September 12-13. Tickets are on sale now, and you can purchase your Early Bird Ticket here.

In 2017, we seek to provide a platform for developers, architects, engineering managers, and C-level executives to gain in-depth insights on Apache Flink. We hope that you’ll join us.