data Artisans Platform

Getting Started with data Artisans Platform on Azure Kubernetes Service


In previous posts, we walked you through how to get data Artisans Platform up and running in Google Cloud Platform and Amazon Web Services. This time, we are focusing on Microsoft Azure and its hosted Kubernetes solution: Azure Kubernetes Service (AKS).


data Artisans Platform is an enterprise-ready, real-time stream processing platform that brings together open source Apache Flink and other useful components, making the lifecycle management of stateful applications easy and frictionless. It is built on Kubernetes as the underlying resource manager and therefore supports a wide range of deployment scenarios: from all major cloud vendors (Google Cloud Platform, Amazon Web Services and Microsoft Azure) to on-premises and hybrid-cloud deployments using Red Hat OpenShift or vanilla Kubernetes.



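Before getting started, please make sure you have access to an Azure subscription, have the Azure CLI, kubectl, Helm and Docker installed, and have downloaded the data Artisans Platform tarball.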

Step 1: Spinning up an AKS Cluster for your data Artisans Platform deployment.

Because we will be creating several resources, let's first create a resource group to keep things tidy, and then create our AKS cluster within it. As a rule of thumb, the cluster should have at least three nodes, each with at least 6GB of memory.
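The commands for this step are not reproduced here. The following is a minimal sketch, assuming the standard `az group create` and `az aks create` commands. It builds both invocations and checks the three-node, 6GB rule of thumb before returning them. The resource group name, cluster name, region and default VM size are hypothetical choices, and the small memory table covers only a few example VM sizes.

```python
import shlex
from typing import List

# Hypothetical names: replace with your own resource group, cluster name and region.
RESOURCE_GROUP = "da-platform-rg"
CLUSTER_NAME = "da-platform-aks"
LOCATION = "westeurope"

# Memory in GiB for a few Azure VM sizes; sizes not listed here are treated as unknown (0).
VM_MEMORY_GIB = {
    "Standard_B2s": 4,
    "Standard_DS2_v2": 7,
    "Standard_D2s_v3": 8,
}
MIN_NODES = 3
MIN_MEMORY_GIB = 6


def meets_rule_of_thumb(node_count: int, vm_size: str) -> bool:
    """At least three nodes, each with at least 6 GB of memory."""
    return node_count >= MIN_NODES and VM_MEMORY_GIB.get(vm_size, 0) >= MIN_MEMORY_GIB


def cluster_commands(node_count: int = 3, vm_size: str = "Standard_D2s_v3") -> List[List[str]]:
    """Build the `az` invocations for the resource group and the AKS cluster."""
    if not meets_rule_of_thumb(node_count, vm_size):
        raise ValueError(f"{node_count} x {vm_size} is below the recommended cluster size")
    return [
        ["az", "group", "create", "--name", RESOURCE_GROUP, "--location", LOCATION],
        ["az", "aks", "create",
         "--resource-group", RESOURCE_GROUP,
         "--name", CLUSTER_NAME,
         "--node-count", str(node_count),
         "--node-vm-size", vm_size,
         "--generate-ssh-keys"],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in cluster_commands():
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the argument lists can be passed to `subprocess.run(cmd, check=True)`.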

The Azure CLI enables RBAC (role-based access control) on new AKS clusters by default, so once the cluster has been provisioned, all we need to do is configure kubectl to access it:
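As a sketch of this step, assuming the standard `az aks get-credentials` command: it merges the cluster credentials into the local kubeconfig, and `kubectl get nodes` is a quick way to confirm that the connection works. The resource group and cluster names are the same hypothetical values used above.

```python
import shlex
from typing import List

# Hypothetical names: use the values you passed to `az aks create`.
RESOURCE_GROUP = "da-platform-rg"
CLUSTER_NAME = "da-platform-aks"


def kubectl_setup_commands(resource_group: str = RESOURCE_GROUP,
                           cluster: str = CLUSTER_NAME) -> List[List[str]]:
    return [
        # Merge the cluster credentials into ~/.kube/config.
        ["az", "aks", "get-credentials", "--resource-group", resource_group, "--name", cluster],
        # Sanity check: every node should eventually report STATUS "Ready".
        ["kubectl", "get", "nodes"],
    ]


if __name__ == "__main__":
    for cmd in kubectl_setup_commands():
        print(shlex.join(cmd))
```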

Step 2: Getting Helm and Tiller going to set up data Artisans Platform on your AKS Cluster.

Helm is a package manager for Kubernetes with two parts: a client (Helm) and a server-side component that runs inside your Kubernetes cluster (Tiller). With RBAC enabled, Tiller needs a dedicated service account with the right roles and permissions to access cluster resources. So the next step is to create this service account and a cluster role binding that associates it with the cluster-admin role.
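A minimal sketch of the service account, the cluster role binding and the Tiller installation, assuming Helm 2 (Tiller was removed in Helm 3, so `helm init` does not exist there). The service account name `tiller` is a common convention, not a requirement.

```python
import shlex
from typing import List


def tiller_setup_commands(account: str = "tiller", namespace: str = "kube-system") -> List[List[str]]:
    """Helm 2 only: create Tiller's service account, bind it, and install Tiller."""
    return [
        # Dedicated service account for Tiller.
        ["kubectl", "create", "serviceaccount", account, "--namespace", namespace],
        # Bind the service account to the built-in cluster-admin ClusterRole.
        ["kubectl", "create", "clusterrolebinding", account,
         "--clusterrole", "cluster-admin",
         f"--serviceaccount={namespace}:{account}"],
        # Install Tiller into the cluster, running under that service account.
        ["helm", "init", "--service-account", account],
    ]


if __name__ == "__main__":
    for cmd in tiller_setup_commands():
        print(shlex.join(cmd))
```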

If the previous commands were successful, we should be able to see Tiller running when executing the following:
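One way to check this programmatically is to ask kubectl for the Tiller pods as JSON and inspect their status. The sketch below assumes that `helm init` labels the Tiller pods with `app=helm,name=tiller` (the Helm 2 default). The parsing function only looks at the `status.phase` and `containerStatuses[].ready` fields of the standard Pod object.

```python
from typing import Any, Dict

# Command whose JSON output is parsed below (assumes the default Helm 2 Tiller labels).
TILLER_PODS_CMD = ["kubectl", "get", "pods", "--namespace", "kube-system",
                   "--selector", "app=helm,name=tiller", "--output", "json"]


def tiller_running(pods: Dict[str, Any]) -> bool:
    """True if at least one Tiller pod is Running and all of its containers are ready."""
    for pod in pods.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        if status.get("phase") == "Running" and containers and all(c.get("ready") for c in containers):
            return True
    return False
```

To use it, run `TILLER_PODS_CMD` with `subprocess.run(..., capture_output=True, text=True, check=True)` and pass `json.loads(result.stdout)` to `tiller_running`.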

Helm charts allow their configuration values to be set and overridden. To help with the installation, the distribution includes a tool that generates a .yaml file that configures data Artisans Platform for your environment. To finish this step, navigate to the main directory of the data Artisans Platform tarball and execute:

After generating this file, we can use it alongside the included Helm chart to complete the installation:
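The install command itself has the usual Helm 2 shape: a release name, the generated values file and the chart. In this sketch, the release name `da-platform`, the file name `values.yaml` and the `<path-to-chart>` placeholder are hypothetical. Substitute the actual name of the generated file and the path of the chart shipped in the tarball.

```python
import shlex
from typing import List


def helm_install_command(chart: str, values_file: str = "values.yaml",
                         release: str = "da-platform") -> List[str]:
    # Helm 2 syntax: the release name is passed with --name.
    return ["helm", "install", "--name", release, "--values", values_file, chart]


if __name__ == "__main__":
    # "<path-to-chart>" stands for the chart included in the tarball.
    print(shlex.join(helm_install_command("<path-to-chart>")))
```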

Step 3: Accessing and using the Web UI.

Part of the magic of setting up data Artisans Platform is getting to use Application Manager, the core orchestration and lifecycle management component. To access it, we will simply configure a port forward to its container. Depending on your environment's requirements, you might want to set up more robust access management using, for instance, an Ingress.
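As a sketch, assuming `kubectl port-forward` against a Kubernetes service: the Application Manager service name depends on your Helm release (look it up with `kubectl get services`), and the local port 8080 and remote port 80 used here are hypothetical.

```python
import shlex
from typing import List


def port_forward_command(service: str, local_port: int = 8080,
                         remote_port: int = 80, namespace: str = "default") -> List[str]:
    # Forward localhost:<local_port> to port <remote_port> of the given service.
    for port in (local_port, remote_port):
        if not 0 < port < 65536:
            raise ValueError(f"invalid port: {port}")
    return ["kubectl", "port-forward", "--namespace", namespace,
            f"service/{service}", f"{local_port}:{remote_port}"]


if __name__ == "__main__":
    # "<appmanager-service>" is a placeholder for the actual service name.
    print(shlex.join(port_forward_command("<appmanager-service>")))
```

While the command runs, the Web UI would then be reachable at `http://localhost:8080`.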

What now?

Application Manager is now reachable through the port forward, and the time has come to create our first Deployment and launch a Flink job! There are a couple of example jobs hosted on Maven Central that we can use to get started. To create a new Deployment, head over to the Web UI and click the “Create Deployment” button (Figure 1 (1)).
Fill in the form that appears with the following values:
Once this is done, the Deployment you just created should appear in the Deployment list (Figure 1 (2)). To launch a Flink cluster on Kubernetes for the sample TopSpeedWindowing job, select the newly created Deployment from the list and press “Start” (Figure 2 (4)).
From here, Application Manager is your oyster: you can follow the status of your Deployment (Figure 2 (3)) and explore details such as event logs, job history and, once they have been taken, savepoints (Figure 2 (6)). In addition, you can dive into Grafana and Kibana to monitor and debug your application (Figure 2 (5)).


Advanced: Using Azure Blob Storage for Checkpoints and Savepoints.

To use Azure Storage to persist checkpoints and savepoints, we need to set up a storage account and dedicated blob containers. To keep things simple, this walkthrough uses a general-purpose account with locally redundant storage (LRS) replication.
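A minimal sketch of the account creation, assuming `az storage account create` with the `Standard_LRS` SKU. The `StorageV2` kind (general-purpose v2) is our assumption of a sensible default, and the account name, resource group and region are placeholders. The function also enforces Azure's naming rule for storage accounts (3 to 24 lowercase letters and digits) before building the command.

```python
import re
import shlex
from typing import List


def storage_account_command(account: str, resource_group: str, location: str) -> List[str]:
    # Storage account names must be 3-24 characters of lowercase letters and digits.
    if not re.fullmatch(r"[a-z0-9]{3,24}", account):
        raise ValueError(f"invalid storage account name: {account!r}")
    return ["az", "storage", "account", "create",
            "--name", account,
            "--resource-group", resource_group,
            "--location", location,
            "--sku", "Standard_LRS",  # locally redundant storage
            "--kind", "StorageV2"]    # general-purpose v2 (assumed)


if __name__ == "__main__":
    # Hypothetical names.
    print(shlex.join(storage_account_command("daplatformstore", "da-platform-rg", "westeurope")))
```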

The storage account we just created has an associated access key (<sAccountKey>), which we first need to retrieve and then pass as a parameter when creating the containers:
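Here is a sketch of both operations, assuming the standard `az storage account keys list` and `az storage container create` commands. The `--query "[0].value" --output tsv` options print only the first key, so it can be captured directly as <sAccountKey>. The container names `checkpoints` and `savepoints` are hypothetical.

```python
import shlex
from typing import Iterable, List


def account_key_command(account: str, resource_group: str) -> List[str]:
    # Prints only the value of the first key, suitable for capturing into a variable.
    return ["az", "storage", "account", "keys", "list",
            "--account-name", account, "--resource-group", resource_group,
            "--query", "[0].value", "--output", "tsv"]


def container_commands(account: str, account_key: str,
                       containers: Iterable[str] = ("checkpoints", "savepoints")) -> List[List[str]]:
    # One blob container per purpose, authenticated with the account key.
    return [["az", "storage", "container", "create", "--name", name,
             "--account-name", account, "--account-key", account_key]
            for name in containers]


if __name__ == "__main__":
    print(shlex.join(account_key_command("daplatformstore", "da-platform-rg")))
    for cmd in container_commands("daplatformstore", "<sAccountKey>"):
        print(shlex.join(cmd))
```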

Now that storage is sorted out, we will build a custom Docker image of Flink with the required dependencies to access and use it. We start by creating a core-site.xml Hadoop configuration file:
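The file needs a single Hadoop property that maps the storage account to its key. Hadoop's Azure (WASB) support reads it from `fs.azure.account.key.<account>.blob.core.windows.net`. This sketch generates that file with Python's standard XML library. The account name is hypothetical, and reading the key from the `AZURE_STORAGE_KEY` environment variable is just one way of keeping it out of the script.

```python
import os
import xml.etree.ElementTree as ET


def core_site_xml(account: str, account_key: str) -> str:
    """Render a minimal core-site.xml holding the storage account key."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = f"fs.azure.account.key.{account}.blob.core.windows.net"
    ET.SubElement(prop, "value").text = account_key
    return '<?xml version="1.0"?>\n' + ET.tostring(configuration, encoding="unicode") + "\n"


if __name__ == "__main__":
    # Hypothetical account name; the key comes from the environment if set.
    key = os.environ.get("AZURE_STORAGE_KEY", "REPLACE_WITH_ACCOUNT_KEY")
    print(core_site_xml("daplatformstore", key))
```

Redirecting the printed output to `core-site.xml` produces the file used in the next steps.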

Storing the account key as plain text is the least secure option, so we recommend protecting it in any scenario other than a demonstration. One option is to store this file in a Kubernetes secret and mount it into the containers using Application Manager’s integration with Kubernetes.
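Creating the secret itself is a standard kubectl operation, sketched below. The secret name `hadoop-core-site` and the namespace are hypothetical, and `--from-file` stores the file under its base name as the key. Mounting the secret into the Flink containers is configured through Application Manager, so that part is not shown here.

```python
import shlex
from typing import List


def secret_command(secret_name: str = "hadoop-core-site", path: str = "core-site.xml",
                   namespace: str = "default") -> List[str]:
    # Generic secret whose single key is the file's base name (core-site.xml).
    return ["kubectl", "create", "secret", "generic", secret_name,
            f"--from-file={path}", "--namespace", namespace]


if __name__ == "__main__":
    print(shlex.join(secret_command()))
```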
Next, we create a Dockerfile that includes instructions to retrieve the azure-storage connector and hadoop-azure, which will enable our application to consume Azure Storage services.
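The following sketch renders such a Dockerfile. It downloads both jars from Maven Central into Flink's `lib` directory and ships `core-site.xml` next to them. Several details are assumptions:

* the base image (a placeholder) has Flink under `/opt/flink` and Hadoop classes on its classpath;
* `RUN` instructions execute as root;
* the jar versions match the Hadoop version bundled with the image;
* `/opt/flink/conf/hadoop` is a hypothetical location, exposed through `HADOOP_CONF_DIR`, which Flink checks when looking for Hadoop configuration.

```python
from typing import List, Tuple

# Placeholders/assumptions: pick a base image and versions that match your Flink build.
BASE_IMAGE = "<your-flink-base-image>"
JARS = [
    ("org.apache.hadoop", "hadoop-azure", "2.8.3"),
    ("com.microsoft.azure", "azure-storage", "<azure-storage-version>"),
]
HADOOP_CONF_DIR = "/opt/flink/conf/hadoop"  # hypothetical location


def maven_central_url(group: str, artifact: str, version: str) -> str:
    """Maven Central layout: group path / artifact / version / artifact-version.jar."""
    return (f"https://repo1.maven.org/maven2/{group.replace('.', '/')}/"
            f"{artifact}/{version}/{artifact}-{version}.jar")


def render_dockerfile(base_image: str, jars: List[Tuple[str, str, str]]) -> str:
    lines = [f"FROM {base_image}"]
    for group, artifact, version in jars:
        # Put each connector jar on Flink's classpath.
        lines.append(f"ADD {maven_central_url(group, artifact, version)} /opt/flink/lib/")
    # Files fetched by ADD from a URL get mode 600; make them readable for the Flink user.
    lines.append("RUN chmod a+r /opt/flink/lib/*.jar")
    # Ship core-site.xml and point Flink's Hadoop integration at its directory.
    lines.append(f"COPY core-site.xml {HADOOP_CONF_DIR}/core-site.xml")
    lines.append(f"ENV HADOOP_CONF_DIR={HADOOP_CONF_DIR}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(BASE_IMAGE, JARS), end="")
```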

Make sure that core-site.xml and Dockerfile are under the same directory and then trigger the build:
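Because both files have to be in the build context, this sketch checks for them before building the `docker build` command. The image name `flink-azure` and tag `1.0` are hypothetical and are reused in the registry steps below.

```python
import shlex
from pathlib import Path
from typing import List

REQUIRED_FILES = ("Dockerfile", "core-site.xml")


def missing_build_files(context: str = ".") -> List[str]:
    """Names of required files that are not present in the build context."""
    return [name for name in REQUIRED_FILES if not (Path(context) / name).is_file()]


def build_command(image: str = "flink-azure", tag: str = "1.0", context: str = ".") -> List[str]:
    missing = missing_build_files(context)
    if missing:
        raise FileNotFoundError(f"missing in build context: {', '.join(missing)}")
    return ["docker", "build", "--tag", f"{image}:{tag}", context]


if __name__ == "__main__":
    missing = missing_build_files()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print(shlex.join(build_command()))
```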

The last step is to host the image in Azure Container Registry (ACR). For this, we will create a registry:
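A sketch assuming `az acr create` with the Basic SKU, plus `az acr show` to print just the `loginServer` value. The registry name is a placeholder, and the check reflects ACR's naming rule (5 to 50 alphanumeric characters).

```python
import re
import shlex
from typing import List


def acr_create_command(registry: str, resource_group: str, sku: str = "Basic") -> List[str]:
    # Registry names must be 5-50 alphanumeric characters.
    if not re.fullmatch(r"[A-Za-z0-9]{5,50}", registry):
        raise ValueError(f"invalid registry name: {registry!r}")
    return ["az", "acr", "create", "--resource-group", resource_group,
            "--name", registry, "--sku", sku]


def login_server_command(registry: str) -> List[str]:
    # Prints only the loginServer field of the registry.
    return ["az", "acr", "show", "--name", registry, "--query", "loginServer", "--output", "tsv"]


if __name__ == "__main__":
    print(shlex.join(acr_create_command("daplatformregistry", "da-platform-rg")))
    print(shlex.join(login_server_command("daplatformregistry")))
```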

From the output of these commands, note the <loginServer> value. After logging in to the registry, so that Docker can authenticate against it, we use that value to tag the image and push it to the registry instance:
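The three operations are sketched below, assuming `az acr login`, `docker tag` and `docker push`. The remote image reference is `<loginServer>/<image>:<tag>`. The image name and tag are the same hypothetical values used in the build step.

```python
import shlex
from typing import List


def push_commands(registry: str, login_server: str,
                  image: str = "flink-azure", tag: str = "1.0") -> List[List[str]]:
    remote = f"{login_server}/{image}:{tag}"
    return [
        # Authenticate the local Docker client against the registry.
        ["az", "acr", "login", "--name", registry],
        # Give the local image a name that points at the registry.
        ["docker", "tag", f"{image}:{tag}", remote],
        ["docker", "push", remote],
    ]


if __name__ == "__main__":
    for cmd in push_commands("daplatformregistry", "<loginServer>"):
        print(shlex.join(cmd))
```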

Take a breath. Now, we still need to give our AKS cluster permissions to access the ACR Docker registry we just pushed our image to.
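On current Azure CLI versions, `az aks update --attach-acr` grants the cluster's identity the AcrPull role on the registry. Older CLI versions required creating that role assignment manually. This sketch uses the newer flag and the hypothetical names from the earlier steps.

```python
import shlex
from typing import List


def attach_acr_command(cluster: str, resource_group: str, registry: str) -> List[str]:
    # Grants the AKS cluster pull access (AcrPull) on the given registry.
    return ["az", "aks", "update", "--name", cluster,
            "--resource-group", resource_group, "--attach-acr", registry]


if __name__ == "__main__":
    print(shlex.join(attach_acr_command("da-platform-aks", "da-platform-rg", "daplatformregistry")))
```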

And we are all set! To override the configuration of your Application Manager deployment to use the custom Flink image we just created, either edit the Deployment configuration directly in the UI (Figure 3 (8)) or use the following cURL command:
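The cURL command is not reproduced here. As a rough sketch of the kind of request involved, the following builds a PATCH request with Python's standard library. It points the Deployment at the custom image and sets Flink's `state.checkpoints.dir` and `state.savepoints.dir` to `wasb://` URIs in the containers created earlier. The endpoint path `/api/v1/deployments/<id>` and the field names under `spec.template.spec` are assumptions, so check them against the Application Manager API reference before use. The deployment ID and names are placeholders.

```python
import json
import urllib.request
from typing import Any, Dict


def wasb_uri(container: str, account: str, path: str = "") -> str:
    """Address a path inside a blob container through Hadoop's WASB file system."""
    return f"wasb://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def image_patch_request(base_url: str, deployment_id: str, registry: str,
                        repository: str, tag: str, account: str) -> urllib.request.Request:
    body: Dict[str, Any] = {"spec": {"template": {"spec": {
        # Assumed field names for the Flink image; verify against the API reference.
        "flinkImageRegistry": registry,
        "flinkImageRepository": repository,
        "flinkImageTag": tag,
        # Standard Flink configuration keys pointing at the blob containers.
        "flinkConfiguration": {
            "state.checkpoints.dir": wasb_uri("checkpoints", account, "flink"),
            "state.savepoints.dir": wasb_uri("savepoints", account, "flink"),
        },
    }}}}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/deployments/{deployment_id}",  # assumed endpoint
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
```

With the port forward running, `urllib.request.urlopen(request)` would send it.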

After following these steps, you should be able to successfully checkpoint and trigger savepoints (Figure 2 (7)) to Azure Blob Storage.



Now that you have a fully operating data Artisans Platform setup on AKS, we encourage you to play around and experiment with it! The official documentation will give you more detailed insights into the full scope of how Application Manager can be used to deploy, manage and debug your Flink streaming applications. Do not hesitate to reach out to data Artisans for feedback, questions or any other requests!