Apache Flink® User Survey 2016 Results, Part 1
(You can find part 2 here.)
At the end of 2016, data Artisans organized the first-ever Apache Flink® user survey to better understand how the community uses Flink, asking for feedback about both common usage patterns and the most-needed Flink features. The results are in, and we’ll be sharing them in a two-post series. This first post summarizes answers to the survey’s multiple-choice questions, and the second post includes written answers to open-ended questions that respondents gave us permission to share anonymously. For context, here’s some general information about the survey:
- We collected responses between 18 Nov 2016 and 13 Dec 2016
- The survey was distributed via the Apache Flink mailing lists, the data Artisans Twitter account, and Apache Flink meetup groups around the world
- In total, 119 respondents from 21 different countries answered at least 1 question; note that each graph includes a count of respondents for that particular question
Flink Usage
Next, we’ll look at a few basic Flink usage metrics.
- Just over ⅓ of respondents either have or had a Flink application running in production
- An overwhelming majority (91%) use Flink’s DataStream API, while just over half (55%) use the DataSet API as well
- Java is the most popular language for developing in Flink (77%), and more than half (57%) use Scala
- More than half of respondents (52%) use at least one of Flink’s libraries
Flink Satisfaction and Evaluation Criteria
We asked users to share overall Flink satisfaction as well as satisfaction with different components of Flink by selecting one of: Completely Satisfied, Very Satisfied, Moderately Satisfied, Slightly Satisfied, or Not At All Satisfied.
- Overall satisfaction: 70% of Flink users are either Completely Satisfied or Very Satisfied with Flink
- Component-specific satisfaction: “Throughput and Latency” (89%) and “Event time handling” (85%) had the highest percentages of respondents who are either Completely Satisfied or Very Satisfied. “Support for SQL and Python” (21%) and “Monitoring & Operations” (19%) are at the bottom of this list.
Flink Ecosystem
Next, let’s get a sense of how Flink fits into the broader ecosystem.
- First, what else did users evaluate as alternatives when choosing a stream processor? Spark Streaming led the way (86%), a logical result given the popularity of Apache Spark as a batch processor, followed by Apache Storm (53%), the first widely known open-source distributed stream processor.
- When it comes to getting data in and out of Flink, Apache Kafka is the clear leader, with 77% of respondents using Kafka as a source or sink. Next on the list is HDFS, in use by 57% of respondents.
- As of late 2016, on-premises deployments using YARN (45%) and standalone mode (41%) were most popular among respondents, but it’s worth noting that the resource manager space is evolving quickly, and Flink 1.2 will introduce improved support for Mesos and other deployment models.
- Cloudera (32%) and Hortonworks (28%) were the two most commonly used commercial software distributions, with no company holding a clear majority.
Flink User Profile
Lastly, here are some of the characteristics of Flink users who responded to the survey.
- A majority of respondents identified their role as “Engineering / Application Development” (54%), with “Data / Systems Architecture” next on the list at 22%.
- More than ⅔ of respondents (69%) develop on Unix / Linux, while just over half (51%) develop on a Mac. Note that respondents could submit more than one answer to this question.
- “Software” was by a large margin the most common industry among respondents (51%), followed by “Internet” (29%), “Telecomm” (15%), and “Finance” (10%). The size of respondents’ organizations varies, with the largest share (25%) in a company with more than 100 but fewer than 1000 employees. Over ⅓ of respondents (34%) work in an organization with 1000 or more employees.