From September 26-29, 2016 the big data community meets at Strata + Hadoop World in NYC. This year data Artisans will take part in several ways at the conference. For the first time we will have a booth at the conference (#P2), and will be demonstrating Apache Flink® and our brand new dA Platform. Stop by to connect with Apache Flink experts and learn more about implementing enterprise-grade streaming data applications in production.
A team of original Apache Flink® contributors founded data Artisans in 2014 because we believed that existing data processing frameworks weren’t adequately addressing the needs of organizations and their engineering teams. From the global saturation of smartphones, to the rapid adoption of the Internet of Things and connected devices, the very nature of data and how it is generated had evolved far more quickly than the tools available to manage that data.
While most of the Continent was away on holiday, it was a productive August for data Artisans and for the Apache Flink® community. Here are highlights from the past month, and we can’t wait to see what the rest of 2016 has in store.
Apache Flink 1.1
There were many long-awaited features included in the Flink community’s 1.1 release, which was supported by 95 contributors. If you haven’t already, we recommend that you browse the release notes. Here are a few of the highlights:
- Metrics: In most conversations we have about Flink, someone will mention the need for deeper operational metrics, and the community delivered a comprehensive metrics system that makes it easy to gather metrics and expose them to external systems.
- Table API and SQL: The architecture of the Table API has been reworked completely in Flink 1.1, and the API has been integrated with Apache Calcite™.
- Connectors: Flink 1.1 adds support for continuous file processing, Amazon Kinesis source & sink connectors, and a sink connector for Apache Cassandra™.
Say hello to members of our team in India, Germany, and the USA
Here’s a quick rundown of where you can find members of the data Artisans team in September 2016. If you want to see where we’ll be traveling throughout the rest of the year, check out our Events page.
We hope to get to meet many members of the Apache Flink® and stream processing communities in person this month.
- VLDB (New Delhi, India), Sept 5-9: data Artisans software engineer Kostas Kloudas will be presenting an Apache Flink® training. Attendees will have a chance to get hands-on with Flink during the session.
- Flink Forward (Berlin, Germany), Sept 12-14: Have we mentioned that we’re excited about Flink Forward? The entire data Artisans team will be in attendance. Hope to see you there.
- Data Driven NYC (New York City, USA), Sept 27: For all of our East Coast friends, data Artisans CEO Kostas Tzoumas will be a speaker at this month’s event.
- Strata + Hadoop World (New York City, USA), Sept 26-29: Kostas Tzoumas, Director of Applications Engineering Jamie Grier, and Product Manager Mike Winters will attend this year’s Strata + Hadoop World NYC. More to come about where you can find us inside the conference.
A comparison and guideline for users
This blog post is written jointly by Stephan Ewen, CTO of data Artisans, and Neha Narkhede, CTO of Confluent. You can also find this post at the Confluent blog.
The open source stream processing space is currently exploding, with more systems becoming available and presenting users with many alternatives. In the Apache Software Foundation alone, there are now more than 10 stream processing projects, some in incubation and others graduated to top-level project status.
While the availability of alternatives benefits the industry and the users of these systems by enabling competition and thus encouraging innovation, it can also be quite confusing: with all these options, which one is right for me both now and in the future? Stream processors can be evaluated on several dimensions, including performance (throughput and latency), integration with other systems, ease of use, fault tolerance guarantees, etc., but making such a comparison is not the topic of this post (and we are certainly biased).
For some time now, the Apache Kafka project has served as a common denominator in most open source stream processors as the de facto storage layer for storing and moving potentially large volumes of data in streaming fashion with low latency. Recently, the Kafka community introduced Kafka Streams, a stream processing library that ships as part of Apache Kafka. With the addition of Kafka Streams and Kafka Connect, Kafka has now added significant stream processing capabilities.
In this post, we focus on how Flink and Kafka Streams compare with each other on stream processing, and we attempt to provide clarity on that question. Flink and Kafka Streams were created with different use cases in mind. While they have some overlap in their applicability, they are designed to solve orthogonal problems and have very different sweet spots and placement in the data infrastructure stack.
We are very excited to announce Ted Dunning as a keynote speaker for Flink Forward 2016! Ted is the VP of Incubator at the Apache Software Foundation, the Chief Application Architect at MapR Technologies and a mentor on many recent projects. His keynote, “How Can We Take Flink Forward?”, will be presented on the second day of the conference.
Following Ted’s keynote, we’ll present a panel discussion on “Large Scale Streaming in Production”. As stream processing systems become more mainstream, companies are looking to empower their users to take advantage of this technology. We welcome leading stream processing experts Xiaowei Jiang (Alibaba), Monal Daxini (Netflix, Inc.), Maxim Fateev (Uber) and Ted Dunning (MapR Technologies) on stage to talk about the challenges they have faced and the solutions they have discovered while implementing stream processing systems at very large scales. The panel will be moderated by Jamie Grier (data Artisans).
The welcome keynote on Monday, September 12, will be given by data Artisans’ co-founders Kostas Tzoumas and Stephan Ewen. They will talk about “The maturing data streaming ecosystem and Apache Flink’s accelerated growth”. In this talk, Kostas and Stephan discuss several large-scale stream processing use cases that data Artisans has seen over the past year.
Moreover, we are looking forward to Maxim Fateev’s talk “Beyond the Watermark: On-Demand Backfilling in Flink”. Flink’s time-progress model is built around a single watermark, which is incompatible with Uber’s business need for generating aggregates retroactively. Maxim’s talk covers Uber’s solution for on-demand backfilling.
Don’t miss the latest developments, best practices and use cases on Apache Flink. Register here: flink-forward.org/registration
Jamie Grier, Director of Applications Engineering at data Artisans, gave an in-depth Apache Flink® demonstration at OSCON 2016 in Austin, TX. A recording is available on YouTube if you’d like to see the complete demo.
For our readers out there who are new to Apache Flink®, it’s worth repeating a simple yet powerful point: Flink enables stateful stream processing with production-grade reliability and accuracy guarantees. No longer should streaming applications be synonymous with estimates–imprecise systems that must be coupled with a batch processor to ensure reliability–but rather, robust and correct computations made in real-time.
We are thrilled to announce that the accepted talks for Flink Forward 2016 are now available on the conference website. Flink Forward 2016 takes place on September 12-14 in Berlin, Germany, bringing together the open source stream processing community.
This blog post introduces RBEA (Rule-Based Event Aggregator), the scalable real-time analytics platform developed by King’s Streaming Platform team. This new platform opens the doors to the world of stream analytics for our data scientists across the company. Here, we will describe what motivated us to build RBEA, how the system works, and how it is implemented on Apache Flink™.
A data Artisans perspective
In this post, we would like to shed some light upon Apache Beam, the new Apache Incubator project that Google initiated with us and other partners. We would like to highlight our involvement in Beam and how we see the relationship between Beam and Flink developing in the future. See also Google’s perspective on how Beam and Flink relate.