Python Streaming Pipelines with Beam on Flink
Python is popular among data scientists and engineers for data processing tasks, but the big data ecosystem has traditionally been rather JVM-centric. Often Java (or Scala) is the only viable option for implementing data processing pipelines, which can pose an adoption barrier for organizations that have already invested in other language ecosystems. The Apache Beam project provides a unified programming model for data processing, and its ongoing portability effort aims to enable multiple language SDKs (currently Java, Python and Go) on a common set of runners. Python streaming on the Apache Flink runner is one example of this combination. Let’s take a look at how the Flink runner translates the Beam model into the native DataStream (or DataSet) API, how the runner is changing to support portable pipelines, how Python user code execution is coordinated with gRPC-based services, and how a sample pipeline runs on Flink.
Thomas is a Software Engineer on the Streaming Platform team at Lyft, working with Apache Flink. He previously worked at a number of other technology companies in the San Francisco Bay Area, including DataTorrent, where he was a co-founder of the Apex project. Thomas is the Apache Apex PMC Chair, a committer to Apache Beam, and a contributor to several other ASF projects. He has also presented at international big data conferences and is the author of the book “Learning Apache Apex”.
Aljoscha Krettek, data Artisans
Aljoscha Krettek is a PMC member at Apache Flink and a co-founder and software engineer at data Artisans (https://data-artisans.com). He studied Computer Science at TU Berlin and has worked at IBM Germany and at the IBM Almaden Research Center in San Jose. In Flink, Aljoscha mainly works on the Streaming API. The most recent additions to the windowing and state APIs were designed and implemented by him. Aljoscha has spoken about stream processing and Apache Flink at Hadoop Summit, Flink Forward and several meetups.