Introduction
This blog will show you how to deploy an Apache Kafka cluster on Kubernetes. We assume you already have a Kubernetes cluster set up and running.
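As a quick sanity check (not specific to this guide), you can verify that your cluster is reachable with:

kubectl cluster-info
kubectl get nodes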
Apache Kafka is a distributed streaming platform that lets you publish and subscribe to streams of records, similar to an enterprise messaging system.
There are a few concepts we need to know:
- Producer: an app that publishes messages to a topic in the Kafka cluster.
- Consumer: an app that subscribes to a topic for messages in the Kafka cluster.
- Topic: a stream of records.
- Record: a data block containing a key, a value, and a timestamp.
We borrowed some ideas from defuze.org and updated our cluster accordingly.
Pre-start
Zookeeper is required to run a Kafka cluster.
To deploy Zookeeper easily, we use a popular Zookeeper image from Docker Hub, digitalwonderland/zookeeper. We can create a deployment file, zookeeper.yml, which deploys a single Zookeeper server.
If you want to scale the Zookeeper cluster, you can duplicate the code block within the same file and change the configuration to the correct values. When scaling to two servers, you also need to add ZOOKEEPER_SERVER_2=zoo2 to the container env of zookeeper-deployment-1 (a sketch of a second deployment follows the deploy command below).
zookeeper.yml
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: zookeeper-deployment-1
spec:
  template:
    metadata:
      labels:
        app: zookeeper-1
    spec:
      containers:
      - name: zoo1
        image: digitalwonderland/zookeeper
        ports:
        - containerPort: 2181
        env:
        - name: ZOOKEEPER_ID
          value: "1"
        - name: ZOOKEEPER_SERVER_1
          value: zoo1
We can deploy this by:
kubectl create --filename zookeeper.yml
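For example, a second Zookeeper deployment in the same file could look like the sketch below (names and values are illustrative; note the extra ZOOKEEPER_SERVER_2 entry, which must be added to zookeeper-deployment-1 as well):

---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: zookeeper-deployment-2
spec:
  template:
    metadata:
      labels:
        app: zookeeper-2
    spec:
      containers:
      - name: zoo2
        image: digitalwonderland/zookeeper
        ports:
        - containerPort: 2181
        env:
        - name: ZOOKEEPER_ID
          value: "2"
        - name: ZOOKEEPER_SERVER_1
          value: zoo1
        - name: ZOOKEEPER_SERVER_2
          value: zoo2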
It’s good to have a service for the Zookeeper cluster. The file zookeeper-service.yml below creates one. If you scale up the Zookeeper cluster, you also need to scale up the services accordingly (see the sketch after the listing).
zookeeper-service.yml
---
apiVersion: v1
kind: Service
metadata:
  name: zoo1
  labels:
    app: zookeeper-1
spec:
  ports:
  - name: client
    port: 2181
    protocol: TCP
  - name: follower
    port: 2888
    protocol: TCP
  - name: leader
    port: 3888
    protocol: TCP
  selector:
    app: zookeeper-1
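A matching service for a second server could look like this (a sketch mirroring zookeeper-service.yml, assuming the second deployment is labeled app: zookeeper-2):

---
apiVersion: v1
kind: Service
metadata:
  name: zoo2
  labels:
    app: zookeeper-2
spec:
  ports:
  - name: client
    port: 2181
    protocol: TCP
  - name: follower
    port: 2888
    protocol: TCP
  - name: leader
    port: 3888
    protocol: TCP
  selector:
    app: zookeeper-2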
Deploy Kafka cluster
Service
We first need to create a Kubernetes service to front our Kafka cluster deployment. Kafka has no designated leader at the server level, so we can talk to any broker; because of that, we can direct our traffic to any of the Kafka servers.
Let’s say we want to route all our traffic to our first Kafka server, the one labeled id: "1". We can create a file like this for the Kafka service.
kafka-service.yml
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-service
  labels:
    name: kafka
spec:
  ports:
  - port: 9092
    name: kafka-port
    protocol: TCP
  selector:
    app: kafka
    id: "1"
  type: LoadBalancer
After the service is created, we can get the external IP of the Kafka service by:
kubectl get service kafka-service
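If you only want the IP itself (for scripting), a jsonpath query works too, assuming your cloud provider exposes the load balancer as an IP rather than a hostname:

kubectl get service kafka-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}'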
Kafka Cluster
There is already a well-maintained Kafka image on Docker Hub. In this blog, we use the image wurstmeister/kafka to simplify the deployment.
kafka-cluster.yml
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: kafka-broker1
spec:
  template:
    metadata:
      labels:
        app: kafka
        id: "1"
    spec:
      containers:
      - name: kafka
        image: wurstmeister/kafka
        ports:
        - containerPort: 9092
        env:
        - name: KAFKA_ADVERTISED_PORT
          value: "9092"
        - name: KAFKA_ADVERTISED_HOST_NAME
          value: $SERVICE_EXTERNAL_IP
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: zoo1:2181
        - name: KAFKA_BROKER_ID
          value: "1"
        - name: KAFKA_CREATE_TOPICS
          value: topic1:3:3
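After substituting $SERVICE_EXTERNAL_IP with the external IP we obtained earlier, we can deploy this file the same way as Zookeeper:

kubectl create --filename kafka-cluster.yml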
If you want to scale up the Kafka cluster, you can always duplicate the deployment block in this file, changing KAFKA_BROKER_ID to another value (the deployment name and the id label must be unique as well).
KAFKA_CREATE_TOPICS is optional. Setting it to topic1:3:3 creates topic1 with 3 partitions and 3 replicas.
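If you need more than one topic, the image also accepts a comma-separated list, e.g. (topic names here are illustrative):

- name: KAFKA_CREATE_TOPICS
  value: topic1:3:3,topic2:1:1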
Test Setup
We can test the Kafka cluster with a tool named kafkacat, which can act as both a producer and a consumer.
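If kafkacat isn’t installed yet, it is available from common package managers (package names may vary by platform):

# macOS
brew install kafkacat
# Debian/Ubuntu
sudo apt-get install kafkacat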
To publish system logs to topic1, we can type:
tail -f /var/log/system.log | kafkacat -b $EXTERNAL_IP:9092 -t topic1
To consume the same logs, we can type:
kafkacat -b $EXTERNAL_IP:9092 -t topic1
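kafkacat picks producer or consumer mode automatically based on stdin, but you can be explicit. For example, to consume topic1 from the beginning and exit once the end of the topic is reached:

kafkacat -C -b $EXTERNAL_IP:9092 -t topic1 -o beginning -e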
Upgrade Kafka
Blue-Green update
Kafka itself supports rolling upgrades; you can find more detail at this page.
Since we can access Kafka through any broker in the cluster, we can upgrade one pod at a time. Say our Kafka service routes traffic to broker1: we can upgrade all the other broker instances first, then change the service to route traffic to any upgraded broker, and finally upgrade broker1.
We can upgrade a broker by replacing the image with the version we want, e.g. image: wurstmeister/kafka:$NEW_VERSION, and then running:
kubectl replace --filename kafka-cluster.yml
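You can watch the replacement roll out before moving on to the next broker (assuming the broker being upgraded is the deployment named kafka-broker2):

kubectl rollout status deployment/kafka-broker2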
After applying the same procedure to all other brokers, we can edit our service by:
kubectl edit service kafka-service
Change id: "1" to the id of an upgraded broker, then save and quit. All new connections will be established through the new broker.
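For example, if broker 2 has already been upgraded, the selector section of the service would end up looking like this:

selector:
  app: kafka
  id: "2"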
Finally, we can upgrade broker1 using the steps above. Note that this will kill the existing producer and consumer connections to broker1.