“Play it again, Sam”: Bookmarking, Slicing, and Replaying Unbounded Data Streams for Analytics Applications

Pravega is a novel storage system that exposes the data stream as a first-class abstraction, as opposed to objects and files. With Pravega, a stream is a consistently ordered, durable, available and elastic series of data events. Pravega is designed to ingest, store ...

Ecosystem
Speakers Raúl Gracia-Tinedo
(Dell EMC)
View Video & Slides

A streaming Quantitative Analytics engine

The application of Quantitative Analytics to trades for the generation of Risk and P&L metrics has traditionally followed a batch-based approach. Regulatory changes impose increasing demand for compute on financial institutions along with a growing demand for real-time ...

Use Case
Speakers Dr Raj Subramani
(Flumaion Ltd)
View Video & Slides

A Year in Flink

Stream processing still evolves and changes at a speed that can make it hard to keep up with the developments. Being at the forefront of stream processing technology, the evolution of Apache Flink has mirrored many of these developments and continues to ...

Keynote
Speakers Aljoscha Krettek (data Artisans) Till Rohrmann (data Artisans) View Video & Slides

Anomaly Detection Engine for Cloud Activities using Flink

Microsoft Cloud App Security provides organizations with enterprise-grade protection for cloud applications. One of the main capabilities of CAS is the real-time detection of threats like compromised accounts, insider threats and ransomware, based on abnormal user activity.
In this ...

Use Case
Speakers Yonatan Most (Microsoft) Avihai Berkovitz (Microsoft) View Video & Slides

Approximate standing queries on Stream Processing

Data analytics in its infancy has taken off with the development of SQL. Yet, at web-scale, even simple analytics queries can prove challenging within (Distributed-) Stream Processing environments. Two such examples are Count and Count Distinct. Because of the key-oriented nature of ...

Research
Speakers Tobias Lindener
(RISE/KTH)
View Video & Slides

Assisting millions of active users in real-time

Nowadays many companies are becoming data-rich and data-intensive. They have millions of users generating billions of interactions and events per day. These massive streams of complex events can be processed and reacted upon to e.g. offer new products, next best actions, ...

Use Case
Speakers Krzysztof Zarzycki (GetIndata) Alexey Brodovshuk (Kcell) View Video & Slides

Automating Flink Deployments to Kubernetes

Deploying Flink jobs while maintaining state requires a number of CLI tasks that need to be performed. As error-prone as that is when done manually, in any serious software project you'll rely on a continuous integration pipeline to automate it. Summing ...

Operations
Speakers Marc Rooding (ING) Niels Dennissen (ING) View Video & Slides

data Artisans Platform: Enterprise-Ready Stream Processing with Apache Flink

Use Case
Speakers Robert Metzger
(data Artisans)
View Video & Slides

data Artisans Product Announcement

Use Case
Speakers Igal Shilman
(data Artisans)
View Video & Slides

Data lossless event time streaming processing for revenue calculation

One of the main characteristics of a good streaming pipeline is correctness for event-time processing. The real challenges arise when such a pipeline must be resilient to different types of failures. In this talk, we describe how Criteo runs Flink on one of ...

Use Case
Speakers Oleksandr Nitavskyi
(Criteo)
View Video & Slides

Democratizing data in GO-JEK

At GO-JEK, we build products that help millions of Indonesians commute, shop, eat and pay, daily. Data at GO-JEK doesn’t grow linearly with the business, but exponentially, as people start building new products and logging new activities on top of the ...

Ecosystem
Speakers Rohil Surana (GoJek) Prakhar Mathur (GoJek) View Video & Slides

Deploying a secured Flink cluster on Kubernetes

One of the main sources of concern when switching to the container paradigm is security. When dealing with large amounts of sensitive customer data, it’s very important to be able to guarantee that the data is transported safely between the different ...

Ecosystem
Speakers Edward Alexander Rojas Clavijo
(IBM)
View Video & Slides

Detecting Patterns in Event Streams with Flink SQL

For a long time, complex event processing (CEP) and stream analytics have been treated as distinct classes of stream processing applications. While CEP workloads identify patterns from event streams in near real-time, stream analytics queries ingest, enrich, and aggregate high-volume streams. Both ...

Use Case
Speakers Dawid Wysakowicz
(data Artisans)
View Video & Slides

Efficient Window Aggregation with Stream Slicing

Computing aggregates over windows is at the core of virtually every stream processing job. Typical stream processing applications involve overlapping windows and, therefore, cause redundant computations. Several techniques prevent this redundancy by sharing partial aggregates among windows. However, these techniques do not ...

Research
Speakers Jonas Traub (TU Berlin) Philipp Grulich (German Research Centre for Artificial Intelligence) View Video & Slides

Elastic Streams at Scale

One of the big operational challenges when running streaming applications is to cope with varying workloads. Variations, e.g. daily cycles, seasonal spikes or sudden events, require that allocated resources are constantly adapted. Otherwise, service quality deteriorates or money is wasted. Apache ...

Technology Deep Dive
Speakers Till Rohrmann (data Artisans) Joerg Schad (Mesosphere) View Video & Slides

Exploiting Apache Flink’s Stateful Operators

Flink’s stateful processing allows enriching the event data with data acquired from previous events. To achieve this, a KeyedStream is used to distribute state and its processing by key. Sometimes, though, an event contains not a single but multiple keys, requiring ...

Use Case
Speakers Olga Slenders (ING) Gijsbert van Vliet (ING) View Video & Slides

Failure is not fatal: what is your recovery story?

Failures are inevitable. How can we recover a Flink job from an outage? How do we reprocess data from the outage period? What are the implications for downstream consumers? These are important questions that we need to answer when running Flink for critical data ...

Technology Deep Dive
Speakers Steven Wu
(Netflix)
View Video & Slides

Flink as a Library (and still as a Framework)

Containerized deployments have taken the world by storm. Containers make your application portable across different machines and operating systems. They allow applications to scale in a matter of seconds. And they significantly simplify and speed up deployments, which decreases development and operating ...

Operations
Speakers Gary Yao
(data Artisans)
View Video & Slides

Flink Positive/Flinking Positive

So you're on the hook for millions of transactions per minute. You've already considered all the buzzwords? Blockchain? Machine learning? Microserverless? You're out of buzzwords? Desperate for something that just works? Bonus points for looking indie on Hacker News? ...

Use Case
Speakers Caito Scherr (New Relic) Nikolas Davis (New Relic) View Video & Slides

Flink SQL in Action

SQL is the lingua franca of data processing, and everybody working with data knows SQL. Apache Flink provides SQL support for querying and processing batch and streaming data. Flink's SQL support powers large-scale production systems at Alibaba, Huawei, and Uber. Based ...

Operations
Speakers Timo Walther
(data Artisans)
View Video & Slides

Hardware-efficient Stream Processing

In the era of big data and AI, many data-intensive applications, such as streaming, exhibit requirements that cannot be satisfied by traditional batch processing models. In response, distributed stream processing systems, such as Spark Streaming or Apache Flink, exploit the resources of ...

Research
Speakers George Theodorakis
(Imperial College)
View Video & Slides

How to keep our flock happy with Apache Flink on AWS

Data is at the very core of how Rovio builds and operates its games. What does data mean for Rovio: how is it processed, and how do we gain value from it? In this talk we take a deep dive into the Rovio analytics pipeline and ...

Use Case
Speakers Henri Heiskanen
(Rovio)
View Video & Slides

Improving throughput and latency with Flink’s network stack

Flink's network stack is designed with two goals in mind: (a) having low latency for data passing through, and (b) achieving an optimal throughput. It already achieves a good trade-off between these two but we will continue to tune it further ...

Technology Deep Dive
Speakers Nico Kruber
(data Artisans)
View Video & Slides

Lessons learned from Migrating to a Stateful Streaming Framework

In modern applications of streaming frameworks, stateful streaming is arguably one of the most important use cases. Flink, as a well-supported streaming framework for stateful streaming, readily helps developers spend less effort on system deployment and focus more on the business logic. ...

Use Case
Speakers Wei-Che (Tony) Wei
(Appier)
View Video & Slides

Managing Flink operations at GoJek

At GO-JEK, we build products that help millions of Indonesians commute, shop, eat and pay, daily. The Data Engineering team is responsible for creating a reliable data infrastructure across all of GO-JEK’s 18+ products. We use Flink extensively to provide real-time streaming aggregation ...

Operations
Speakers Ravi Suhag (GoJek) Sumanth Nakshatrithaya (GoJek) View Video & Slides

Matchmaking in multiplayer games with Apache Flink

King's streaming platform processes over a hundred billion events daily to provide real-time analytics and personalization capabilities for some of the largest mobile games in the world. The platform’s newest experimental addition uses existing infrastructure, player state, machine learning and global windows ...

Use Case
Speakers Vladimír Schäfer
(King)
View Video & Slides

MERA: Trading precision for performance

Sudden spikes in load can be a source of disaster for stream processors. These spikes can reveal latent bottlenecks in otherwise well-balanced configurations and through them introduce backpressure, increase latency and reduce overall throughput. This problem is far from being solved. While ...

Research
Speakers Niklas Semmler
(TU Berlin)
View Video & Slides

Monitoring Flink with Prometheus

Prometheus is a cloud-native monitoring system prioritizing reliability and simplicity – and Flink works really well with it! This session will show you how to leverage the Flink metrics system together with Prometheus to improve the observability of your jobs. There will be ...

Operations
Speakers Maximilian Bode
(TNG Tech)
View Video & Slides

Our successful journey with Flink

At Trackunit we have based our telematics IoT processing pipeline on Flink. We started out on version 1.2 and are now on 1.5. In this session I will share the lessons learned going from one giant Flink job to many small ones and some of ...

Operations
Speakers Lasse Nedergaard
(Trackunit A/S)
View Video & Slides

Python Streaming Pipelines with Beam on Flink

Python is popular amongst data scientists and engineers for data processing tasks. The big data ecosystem has traditionally been rather JVM-centric. Often Java (or Scala) is the only viable option to implement data processing pipelines. That sometimes poses an adoption barrier ...

Ecosystem
Speakers Thomas Weise (Lyft) Aljoscha Krettek (data Artisans) View Video & Slides

Real-time driving score service using Flink

SK telecom presents how to build and operate a session-based streaming application using Flink. A driving score service essentially calculates a driving score of a user's driving session considering speeding, rapid acceleration and rapid deceleration during the session. At SK telecom, ...

Use Case
Speakers Dongwon Kim
(SK Telecom)
View Video & Slides

Real-time Processing of Noisy Data from Connected Vehicles

Modern vehicles are capable of producing large volumes of data from dozens of sensors. We will demonstrate the use of map matching to deal with noisy GPS information, and how to develop and deploy real-time sensor data processing applications on Flink using ...

Use Case
Speakers Robin Slomkowski
(HERE Technologies)
View Video & Slides

Running Flink Data-Connectors at Scale

Data is a core part of our infrastructure here at Yelp, with tens of billions of messages per day flowing across our streaming pipelines, empowering us to solve core business problems. To reliably connect and route this massive amount of data across ...

Ecosystem
Speakers Vipul Singh
(Yelp)
View Video & Slides

Runtime Improvement for Flink Batch Processing

The original intention of Flink is to be a unified computing engine for both streaming and batch. Although the streaming mode is now widely used and considered the best streaming solution, the batch processing mode is still underdeveloped. We ...

Technology Deep Dive
Speakers Feng Wang
(Alibaba)
View Video & Slides

Stream Join in Flink: from Discrete to Continuous

As a distributed stream processing engine, Flink provides users with convenient operators to manipulate data on the fly. Among all these operators, join could be the most complicated one as it requires the capability to cross-analyze various sources simultaneously. In this talk, ...

Research
Speakers Xingcan Cui
(Shandong University)
View Video & Slides

Stream Loops on Flink: Reinventing the wheel for the streaming era

You have probably heard that stream processing subsumes batch workloads, a valid but not yet fully implemented claim. Our lab research aims to fulfil this dream and delve further into the deep world of iterative processes, a fundamental building block for graph ...

Research
Speakers Paris Carbone
(KTH Royal Institute of Technology in Stockholm)
View Video & Slides

Streaming Digital Fingerprints

Authorisation is typically associated with a single act of “logging in”. But nobody likes to log in too often, so most websites have a “remember me” option. But it’s not very safe to be constantly logged in. How to accommodate contradictory goals ...

Use Case
Speakers Sebastian Czarnota
(Centrum Bezpieczeństwa Cyfrowego S.A.)
View Video & Slides

Streaming ETL with Flink and Elasticsearch

At Intellify we have implemented a system where we can create Flink apps for streaming ETL into normalized datasets in Elasticsearch, with schemas specified in Avro. Our data comes in via a single Kafka topic, but in different shapes depending on the ...

Ecosystem
Speakers Jared Stehler
(Intellify Learning)
View Video & Slides

Streaming topic model training and inference with Apache Flink

Analysing streams of text data to extract topics is an important task for getting useful insights to be leveraged in subsequent workflows. For example, extracting topics from text to be continuously ingested into a search engine can be useful to ...

Use Case
Speakers Suneel Marthi (Amazon) Joey Frazee (Databricks) View Video & Slides

Taming large-state to join datasets for Personalization

Streaming engines like Apache Flink are redefining ETL and data processing. Data can be extracted, transformed, filtered and written out in real-time with an ease matching that of batch processing. However the real challenge of matching the prowess of batch ETL remains ...

Use Case
Speakers Shriya Arora
(Netflix)
View Video & Slides

The Apache Way! … ?

To quote http://www.apache.org/foundation - “The mission of the Apache Software Foundation (ASF) is to provide software for the public good. We do this by providing services and support for many like-minded software project communities of individuals who choose ...

Ecosystem
Speakers Isabel Drost-Fromm
(Europace AG)
View Video & Slides

The convergence of stream processing and microservice architecture

Two of the main architectural trends in software development this decade have been the move to streaming data processing and the move to microservice architecture.

Both of these architectures are driven by the needs of managing and mining knowledge ...

Keynote
Speakers Viktor Klang
(Lightbend)
View Video & Slides

Threading Needles in a Haystack: Sessionizing the Uber firehose in realtime

On Uber's Marketplace team, we're tasked with efficiently matching our riders and driver partners in real time. To that end, we employ various systems within the ride sharing marketplace such as dynamic pricing (popularly known as surge), demand modeling ...

Use Case
Speakers Amey Chaugule
(Uber)
View Video & Slides

Tuning Flink for Robustness and Performance

Flink's stateful stream processing engine presents a huge variety of optional features and configuration choices to the user. Figuring out the "optimal" choices for any production environment and use-case can therefore often be challenging. In this talk, ...

Technology Deep Dive
Speakers Stefan Richter
(data Artisans)
View Video & Slides

Unified Engine for Data Processing and AI

Flink started with the mission to unify batch and stream processing. We believe that Flink’s architecture is uniquely positioned to be a great engine for streaming, batch and AI workloads at the same time. We will talk about the work we ...

Keynote
Speakers Xiaowei Jiang
(Alibaba)
View Video & Slides

Universal Machine Learning with Apache Beam

Apache Beam is a unified batch and streaming programming model. Apache Beam runs on various execution backends, such as Apache Flink, Apache Spark, Apache Samza, Apache Gearpump, Apache Hadoop, and Google Cloud Dataflow.

Up until recently, Java was the predominant ...

Ecosystem
Speakers Robert Bradshaw (Google Cloud) Maximilian Michels (Open-Source Software Engineer) View Video & Slides

Unlocking the next wave of applications with Stream Processing

Stream processing has helped to turn many monolithic database-centric applications into fast, scalable, and flexible real-time applications. However, there are still entire classes of applications that are built against databases, because today's stream processing model is not yet rich enough ...

Keynote
Speakers Stephan Ewen
(data Artisans)
View Video & Slides

Upgrading Apache Flink Applications: State of the Union

Apache Flink streaming applications are typically designed to run indefinitely for long periods of time. As with all long-running services, the applications need to be maintained and upgraded, including improvements to adapt to changing business logic and bug fixes. With this in ...

Technology Deep Dive
Speakers Gordon Tai
(data Artisans)
View Video & Slides

Upshot: distributed tracing using Flink

Distributed tracing is used to analyze performance and error cases in service-oriented architectures. The Observability team at Airbnb recently created Upshot, a data pipeline that uses Flink to analyze over 40 million trace events per minute. Summaries of the resulting data are ...

Use Case
Speakers Brian Wolfe
(Airbnb)
View Video & Slides

Using a sharded Akka distributed data cache as a Flink pipelines integration buffer

A common and reliable way to buffer streaming data in between Flink pipelines is a pair of Flink Kafka Source and Sink. However, in some low-latency streaming firehose use-cases this option is not the best choice: a) backlog will quickly accumulate in ...

Technology Deep Dive
Speakers Andrew Torson
(Walmart Labs)
View Video & Slides

Using Apache Flink for Smart Cities: Warsaw case study

Mining large streams of real-time data has recently grown into one of the key challenges for the Big Data community in both industry and academia. At the same time, the concept of the Smart City has gained significant acclaim by providing user-oriented services ...

Use Case
Speakers Piotr Wawrzyniak (Orange Polska S.A.) Jarosław Legierski (Orange Polska S.A.) View Video & Slides