Needless to say, we here at data Artisans spend a lot of time thinking about stream processing. Even cooler: we spend a lot of time helping others think about stream processing and how to apply streaming to data problems in their organizations.
A good first step in this process is understanding common misconceptions about the modern stream processing space (and in a rapidly changing space that is high in its hype cycle, there are many misconceptions worth talking about).
We’ve selected six of them to walk through in this post, and since Apache Flink® is the open-source stream processing framework that we’re most familiar with, we’ll provide examples in the context of Flink.
Last month, we gave a high-level overview of Apache Flink® savepoints and touched on why and how you’d reprocess data in a streaming application. If you haven’t already read that post, or if you aren’t familiar with Flink’s savepoints, we recommend starting there.
A common use for savepoints is to fix a bug or make an improvement to a streaming application, a task whose requirements are in some sense similar to those of an F1 pit stop: every second of downtime counts, and the car needs to be back on the track as quickly as possible without sacrificing the driver's current position in the race (the 'application state').
In this post, we’ll walk you through the process of updating a streaming application and deploying an improved version without losing application state and with minimal downtime.
How do I update a running job?
Imagine that you have a streaming application that observes a stream of events sent out by an alarm system. There are three types of events: ActivateAlarm, DeactivateAlarm, and MotionDetection.
Each event is associated with a different room in a facility and has a timestamp. The job of our application is to trigger an alarm if a MotionDetection event is observed for a room for which the last received event was ActivateAlarm. The dataflow of such an application implemented as a Flink job would look like the figure below.
Putting this dataflow into Scala source code is pretty straightforward with Flink’s DataStream API:
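The original code listing is not reproduced here, so below is a minimal sketch of the per-room alarm logic in plain Scala. The type names (ActivateAlarm, DeactivateAlarm, MotionDetection, Alarm) and the handle function are assumptions for illustration, not the post's actual code; the comment at the end shows roughly how the logic would be wired into a Flink job with the DataStream API's keyed state.

```scala
// Sketch only: the event and alarm types below are assumed names,
// not taken from the original post.
sealed trait Event { def room: String; def timestamp: Long }
case class ActivateAlarm(room: String, timestamp: Long) extends Event
case class DeactivateAlarm(room: String, timestamp: Long) extends Event
case class MotionDetection(room: String, timestamp: Long) extends Event
case class Alarm(room: String, timestamp: Long)

// Per-room transition: given whether the room's alarm is currently armed
// and the next event, emit any triggered alarms and the updated armed flag.
def handle(armed: Boolean, e: Event): (Seq[Alarm], Boolean) = e match {
  case _: ActivateAlarm            => (Seq.empty, true)
  case _: DeactivateAlarm          => (Seq.empty, false)
  case m: MotionDetection if armed => (Seq(Alarm(m.room, m.timestamp)), armed)
  case _                           => (Seq.empty, armed)
}

// In the Flink job, the same logic would run inside a keyed stateful
// operator, roughly like this (sketch, requires flink-streaming-scala):
//
//   env.addSource(alarmEventSource)
//     .keyBy(_.room)
//     .flatMapWithState[Alarm, Boolean] { (e, armed) =>
//       val (alarms, next) = handle(armed.getOrElse(false), e)
//       (alarms, Some(next))
//     }
//     .print()
```

Keeping the transition function pure and keying the stream by room means Flink manages the per-room armed flag as keyed state, which is exactly what a savepoint captures and restores when the job is later updated.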
In this first edition of Flink Forward San Francisco, we are looking to connect with the already thriving Flink community in the Bay Area and beyond. Our mission is to foster innovation and discussion with developers around the world in order to push Apache Flink to the next level.
The Call for Papers will open soon, and you are invited to share your knowledge, use cases, and best practices with the Apache Flink community and shape the program of the first edition of Flink Forward San Francisco!
Flink Forward San Francisco will take place on April 10-11, 2017. Participants are invited to join one day of hands-on Flink training sessions on April 10, followed by one day of speaker sessions on April 11, dedicated to technical talks on how Flink is used in the enterprise, Flink system internals, ecosystem integrations with Flink, and the future of the platform.