Ecosystem Track

Universal Machine Learning with Apache Beam

Apache Beam is a unified batch and streaming programming model that runs on various execution backends, such as Apache Flink, Apache Spark, Apache Samza, Apache Gearpump, Apache Hadoop, and Google Cloud Dataflow.

Until recently, Java was the predominant language for writing Beam jobs. Thanks to the Beam portability project, you can now write your pipelines in the language of your choice (Java/Scala/Python/Go/SQL). The benefit is simple: you can use not only your favorite programming language to write data processing pipelines, but also all of its libraries.

After a brief introduction to Apache Beam, we explain how cross-language portability was made possible. We then showcase this portability with TFX, a Python library for machine learning with TensorFlow.

This talk is for everyone who wants to learn about Apache Beam, its API, and its portability layer. No machine learning knowledge required.

Authors

Robert Bradshaw
Google Cloud

Robert Bradshaw is a software engineer at Google, developing tools for petabyte-scale data processing, most recently working on Apache Beam. He is also active in the open source community, having led the Cython project since its inception and being a long-time contributor to the open source mathematics software Sage. He received a Ph.D. in Mathematics from the University of Washington and currently resides in Stockholm, Sweden.

Maximilian Michels
Open-Source Software Engineer

Max is an independent software engineer and PMC member of Apache Flink and Apache Beam. During his studies at Free University of Berlin and Istanbul University, he worked at Zuse Institute Berlin on Scalaris, a distributed transactional database. Inspired by the principles of distributed systems and open source, he helped to develop Apache Flink at data Artisans and, in the course of that work, joined the Apache Beam community. After maintaining the SQL layer of the distributed database CrateDB, he is now working on the cross-language portability aspects of Apache Beam.
