Author: Kostas Tzoumas

Stream Processing Myths Debunked

Six Common Streaming Misconceptions

By @kostas_tzoumas and @wints

Needless to say, we here at data Artisans spend a lot of time thinking about stream processing. Even cooler: we spend a lot of time helping others think about stream processing and how to apply streaming to data problems in their organizations.

A good first step in this process is understanding misconceptions about the modern stream processing space. Because the space is changing rapidly and sits high in its hype cycle, there are many misconceptions worth talking about.

We’ve selected six of them to walk through in this post, and since Apache Flink® is the open-source stream processing framework that we’re most familiar with, we’ll provide examples in the context of Flink.

Myth 1: There’s no streaming without batch (the Lambda Architecture)
Myth 2: Latency and Throughput: Choose One
Myth 3: Micro-batching means better throughput
Myth 4: Exactly once? Completely impossible.
Myth 5: Streaming only applies to “real-time”
Myth 6: So what? Streaming is too hard anyway.

Announcing the dA Platform, our distribution of Apache® Flink®

A team of original Apache Flink® contributors founded data Artisans in 2014 because we believed that existing data processing frameworks weren’t adequately addressing the needs of organizations and their engineering teams. From the global saturation of smartphones to the rapid adoption of the Internet of Things and connected devices, the very nature of data and how it is generated had evolved far more quickly than the tools available to manage that data.

Flink 1.0: General availability and pushing the envelope in open source stream processing

We are delighted to see that the Flink community has announced the availability of Apache Flink™ 1.0. This release is one of the largest Flink releases ever, with about 64 individuals resolving more than 450 JIRA issues. Most importantly, it marks the beginning of the Flink 1.x.y series, which initiates backwards compatibility for all minor releases moving forward. We see this release as the most important milestone in the project since Flink graduated from the Apache Incubator one year ago. Additionally, we see this release as (1) validating production-readiness for Flink, and (2) significantly pushing the envelope in stream processing with features that are unique in the open source world.

How Apache Flink™ enables new streaming applications

Part I: The power of event time and out of order stream processing

Stream data processing is booming in popularity, as it promises better insights from fresher data, as well as a radically simplified pipeline from data ingestion to analytics. Data production in the real world has always been a continuous process (for example, web server logs, user activity in mobile applications, database transactions, or sensor readings). As others have noted, until now most pieces of the data infrastructure stack were built on the underlying assumption that data is finite and static. To bridge this fundamental gap between continuous data production and the limitations of older “batch” systems, companies have been introducing complex and fragile end-to-end pipelines. Modern data streaming technology alleviates the need for such complex solutions by modeling and processing data in the form in which it is produced: a stream of real-world events.

Flink 0.10: A significant step forward in open source stream processing

We are delighted to see that the Apache Flink™ community has announced the availability of Apache Flink™ 0.10. The 0.10 release is one of the largest Flink releases ever, with about 80 individuals resolving more than 400 JIRA issues.

While these numbers are impressive on their own, in Flink 0.10, the whole is truly greater than the sum of its parts. The combination of new features advances Flink to be a data stream processor that truly stands out in the open source space and that significantly eases the effort to bring streaming jobs into production. While the official release announcement provides an extensive list of new features, this blog post focuses on those features that jointly improve the experience of developing and operating stream processing applications.
