Use Case Track

Streaming topic model training and inference with Apache Flink

Analysing streams of text data to extract topics is an important task for gaining insights
that can be leveraged in subsequent workflows. For example, extracting topics from text that is
continuously ingested into a search engine can be useful for tagging documents with important
keywords or concepts to be used at search time. Another use case is analysing support
tickets to gain insight into the most common problems customers face.
In this talk we illustrate how to use Flink's dynamic processing capabilities to continuously train
topic models from unlabelled text and use those models to extract topics from the data itself.
These topic models are built by leveraging distributed representations of words and documents.

Authors

Suneel Marthi
Amazon

Suneel is a member of the Apache Software Foundation and a PMC member on Apache OpenNLP, Apache Mahout and Apache Streams. He has presented at Hadoop Summit, Apache Big Data, Flink Forward, Berlin Buzzwords, and Big Data Tech Warsaw. He is a Principal Engineer at Amazon Web Services.

Joey Frazee
Databricks

Joey Frazee is a Solutions Architect at Databricks, an Apache Software Foundation member, and a contributor to the Apache Streams and Apache NiFi projects. He was previously a graduate student in statistics and linguistics at the University of Texas, a data scientist and director of engineering at People Pattern, and an IoT specialist at Hortonworks. He has presented at Flink Forward, Big Data Warsaw, and elsewhere.
