Research Track

Stream Join in Flink: from Discrete to Continuous

As a distributed stream processing engine, Flink provides users with convenient operators to manipulate data on the fly. Among these operators, join is arguably the most complicated, as it requires cross-analyzing multiple sources simultaneously. In this talk, we aim to give a comprehensive introduction to stream joins in Flink. Specifically, we'll first provide an overview of the different join types and which of them are currently supported by the Flink DataStream and Table & SQL APIs. Then we'll discuss some key points to consider when performing distributed stream joins. After that, we'll focus on the rationale and implementation details of the time-windowed join launched in version 1.4. Since there are still many improvements to be made, we'll end the talk by sharing some proposals for future work.

Authors

Xingcan Cui
Shandong University

Xingcan Cui, who is interested in databases and stream processing, is a committer of the Apache Flink project. He has just finished his Ph.D. studies under the supervision of Prof. Xiaohui Yu at Shandong University, China, and will continue his research as a postdoc at York University, Canada.
