Improvements to Flink and its application in Alibaba Search
This is a guest post from Xiaowei Jiang, Senior Director of Alibaba’s search infrastructure team.The post is adapted from Alibaba’s presentation at Flink Forward 2016, and you can see the original talk from the conference here.
Alibaba is the largest e-commerce retailer in the world. Our annual sales in 2015 totalled $394 billion–more than eBay and Amazon combined. Alibaba Search, our personalized search and recommendation platform, is a critical entry point for our customers and is responsible for much of our online revenue, and so the search infrastructure team is constantly exploring ways to improve the product.
What makes for a great search engine on an e-commerce site? Results that, in real-time, are as relevant and accurate as possible for each user. At Alibaba’s scale, this is a non-trivial problem, and it’s difficult to find technologies that are capable of handling our use cases.
Apache Flink® is one such technology, and Alibaba is using Blink, a system based on Flink, to power critical aspects of its search infrastructure and to deliver relevance and accuracy to end users. In this post, I’ll walk through Flink’s role in Alibaba search and outline the reasons we chose to work with Flink on the search infrastructure team.
Stream processing is commonly associated with ‘data in motion’, powering systems that make sense of and respond to data in nearly the same instant it’s created. The most frequently discussed streaming topics, such as latency and throughput or watermarks and handling of late data, focus on the present rather than the past.
In reality, though, there are a number of cases where you’ll need to reprocess data that your streaming application has already processed before. Some examples include:
Deployment of a new version of your application with a new feature, a bug fix, or a better machine learning model
A/B testing different versions of an application using the same source data streams, starting the test from the same point in time without sacrificing prior state
Evaluating and carrying out the migration of applications to newer releases of the processing framework or to a different cluster
Apache Flink’s savepoint feature enables all of the above and is one of the unique points that distinguishes Flink from other distributed open source stream processors.
The data Artisans team was very much impressed by this year’s Flink Forward speaker sessions, and the speakers delivered tons of detail on Apache Flink® use cases and benchmarks. Here, we’ll share just a small selection of our favorite insights from the presentations.
Bouygues Telecom, one of the largest telecom networks in France, is running 30 production applications powered by Flink and is processing 10 billion raw events per day. As of Flink Forward 2015, they were live with 5 Flink applications, so we’re looking forward to hearing about their 180 Flink applications in 2017. (All Slides, Talk) Read more
Berlin’s surprise 32° September weather (90° F for those of you Stateside) has come and gone, and there was lots happening in the last few weeks of summer. Here are a few of the highlights.
Apache Flink® to the enterprise
In order to make Flink more accessible to organizations seeking enterprise support, data Artisans announced the dA Platform, a data Artisans-certified distribution of Flink bundled with 24x7x365 support. Get in touch with us if you’d like to learn more.
And we were thrilled to see that Lightbend included Flink in its Fast Data Platform. September was month of great progress in growing the Flink community and broadening the user base.