Ecosystem Track

Powering Yelp’s Data Pipeline Infrastructure with Apache Flink

Last year, during the Flink Forward conference, a group of Yelp engineers saw in Apache Flink a perfect candidate to solve many of the pressing challenges that Yelp's fast-growing data pipeline was encountering. One year later, Yelp is running hundreds of streaming Flink jobs. These range from simple data transformations to complex stateful stream joins and stream SQL queries. In this talk, I'll introduce the challenges that the Yelp data pipeline was facing and how Flink tackled them. I'll then focus on integrating Apache Flink at scale into an existing multi-regional, Kafka-based streaming ecosystem. Finally, I'll discuss the challenges we encountered along the way and some of the best practices we learned while building and scaling the infrastructure for Flink.

Authors

Enrico Canzonieri
Tech Lead, Yelp Inc.

Enrico works as a tech lead on the distributed systems team at Yelp, designing, building and maintaining streaming and real-time processing infrastructure. He has been working on real-time processing systems since 2013, and he is one of the maintainers of Yelp's Kafka deployment, which moves tens of terabytes of data and tens of billions of messages every day. Enrico loves designing robust, scalable software solutions for stream processing and building tools that make application developers' interaction with the infrastructure as simple as possible. Enrico has previously spoken about Apache Kafka at Berlin Buzzwords, Techsummit.io and a Kafka meetup.
