Keynote Track

Apache Flink + Apache Beam: Expanding the horizons of Big Data

"Over the past few months, the Apache Flink and Apache Beam communities have been busy developing an industry-leading solution for authoring batch and streaming pipelines in Python. This was made possible by a significant effort to revamp Beam’s portability framework, build the corresponding Flink Runner, and simplify Flink’s artifact distribution and deployment mechanisms.

The “killer big-data app” enabled by this integration is production TensorFlow pipelines. Building production machine learning pipelines that process large distributed data sets can get complex. In this talk, we will describe a set of open source libraries developed at Google that simplify and unify the pre- and post-processing stages of a production TensorFlow pipeline. These libraries are authored on Beam’s Python SDK and can be run on Apache Flink at scale.
Last but not least, we will describe how Beam and Flink aim to bring the power of big data to newer audiences, in particular developers who use the Go programming language."

Authors

Anand Iyer
Product Manager, Google Cloud

Anand Iyer is a Product Manager at Google Cloud Platform, focused on delivering industry-leading big-data solutions that delight users. He is particularly passionate about the intersection of data, machine learning, and open source. Prior to Google, he gained experience building and delivering big-data platforms at Cloudera and LinkedIn. He holds a master’s in computer science from Stanford and a bachelor’s from the University of Arizona.