Practice Exercise: Kafka Visualization
Objectives
The objective of this laboratory exercise is to understand event-driven processing using Apache Kafka. You will learn how to produce and consume events and explore the core concepts of event-driven architecture.
Prerequisites
- A web browser with access to https://softwaremill.com/kafka-visualisation/
Exercises
Start by opening the Kafka Visualization tool at https://softwaremill.com/kafka-visualisation/. Then work through each exercise below.
Exercise 1: Understanding partitions
- Configure the visualization to 3 partitions and a replication factor of 2 (the code sketch after this list shows the equivalent setup on a real cluster).
- Start producing messages to the topic.
- Observe how the messages are distributed across the partitions.
- Try increasing the number of partitions to 5.
- Observe how the messages are now distributed across the partitions.
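For reference, here is a minimal sketch of how the same topic configuration could be created on a real Kafka cluster with the official Java AdminClient. It assumes a broker reachable at localhost:9092; the topic name demo-topic is a placeholder.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumes a broker is reachable on localhost:9092.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // "demo-topic" is a placeholder: 3 partitions, replication factor 2,
            // matching the configuration used in this exercise.
            NewTopic topic = new NewTopic("demo-topic", 3, (short) 2);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```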
Partitions play a critical role in Apache Kafka, enabling its scalability, performance, and fault tolerance.
Scalability
Kafka partitions allow a topic to be scaled horizontally by distributing its data across multiple brokers in a cluster. This means that Kafka can handle more data and throughput as more brokers are added to the cluster.
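To make the distribution concrete, the sketch below illustrates the hash-modulo idea behind key-based partition assignment. Note that Kafka's actual default partitioner applies a murmur2 hash to the serialized key; String.hashCode() here is only a stand-in for illustration.

```java
public class PartitionSketch {
    // Simplified sketch of key-based partition assignment. Kafka's real
    // default partitioner uses a murmur2 hash of the serialized key bytes;
    // hashCode() here only illustrates the hash-modulo idea.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // A given key always maps to the same partition for a fixed partition
        // count, but the mapping shifts when partitions are added, which is
        // what the visualization shows when you go from 3 to 5 partitions.
        System.out.println(partitionFor("order-42", 3));
        System.out.println(partitionFor("order-42", 5));
    }
}
```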
Performance
Kafka partitions also improve performance by allowing multiple consumers to read from a topic in parallel. Each consumer can be assigned to one or more partitions, and can read from them independently. This can significantly improve the overall throughput of the system.
Fault tolerance
Kafka partitions also enhance fault tolerance when each partition is replicated across multiple brokers. If a broker fails, its partitions are automatically served by replicas on the remaining brokers, so data survives the failure as long as at least one replica remains available.
Exercise 2: Understanding brokers
- Configure the visualization to 3 partitions and a replication factor of 2.
- Start producing messages to the topic.
- Observe how the messages are distributed across the brokers.
- Try disabling one of the brokers.
- Observe how the messages are now distributed across the remaining brokers.
Kafka brokers are responsible for storing and replicating data within a Kafka cluster. They also provide a communication layer between producers and consumers.
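As a companion to this exercise, the following sketch (assuming a local cluster at localhost:9092) uses the Java AdminClient to list the brokers the cluster currently knows about; a broker you disable drops out of this list.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

import java.util.Collection;
import java.util.Properties;

public class ListBrokers {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // describeCluster() returns the brokers currently in the cluster;
            // a failed or disabled broker disappears from this list.
            Collection<Node> nodes = admin.describeCluster().nodes().get();
            for (Node n : nodes) {
                System.out.printf("broker id=%d host=%s:%d%n",
                        n.id(), n.host(), n.port());
            }
        }
    }
}
```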
Exercise 3: Understanding replication factor
- Configure the visualization to 3 partitions and a replication factor of 2.
- Start producing messages to the topic.
- Observe how the messages are replicated across the brokers.
- Try increasing the replication factor to 3.
- Observe how the messages are now replicated across the brokers.
The replication factor in Apache Kafka is the number of copies of a topic partition that are stored on different brokers in the cluster. It is a critical setting that affects both the availability and durability of Kafka.
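To see the replication factor reflected in topic metadata, a sketch like the following (assuming kafka-clients 3.x, a broker at localhost:9092, and the placeholder topic demo-topic) prints each partition's leader, replicas, and in-sync replicas; the number of replicas per partition equals the replication factor.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

import java.util.List;
import java.util.Properties;

public class DescribeReplicas {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(List.of("demo-topic"))
                    .allTopicNames().get().get("demo-topic");
            for (TopicPartitionInfo p : desc.partitions()) {
                // Each partition has one leader plus follower replicas; the
                // replica count per partition equals the replication factor.
                System.out.printf("partition=%d leader=%s replicas=%s isr=%s%n",
                        p.partition(), p.leader(), p.replicas(), p.isr());
            }
        }
    }
}
```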
Availability
- Replication allows Kafka to tolerate broker failures. When a broker fails, the leader for the affected partitions is elected from among the replicas. This ensures that messages can continue to be produced and consumed even if one or more brokers are unavailable.
Durability
- Replication also protects against data loss. If a broker fails, the data is still available on the remaining replicas, so a message acknowledged by the cluster is not lost even if a broker fails right after the write (provided the producer waited for replica acknowledgement, e.g. with acks=all).
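The durability guarantee therefore depends in part on the producer's acknowledgement setting. Below is a minimal sketch, assuming a local broker and the placeholder topic demo-topic, of a producer configured with acks=all so that a send is only acknowledged once the in-sync replicas have the message:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class DurableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all: the leader waits for the in-sync replicas before
        // acknowledging, so an acknowledged message survives a broker failure.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        }
    }
}
```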
Exercise 4: Understanding producers
- Configure the visualization to 3 partitions and a replication factor of 2.
- Start producing messages to the topic.
- Observe how the messages are produced to the topic.
- Try increasing the producer rate.
- Observe how the messages are now produced to the topic.
A Kafka producer is a client application that publishes (writes) events to a Kafka cluster. Producers are important because they are responsible for getting data into Kafka, which is the first step in any Kafka-based application.
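Outside the visualization, a producer is only a few lines with the official Java client. Here is a minimal sketch, assuming a broker at localhost:9092 and the placeholder topic demo-topic, that sends keyed events and prints the partition each one lands on:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class LabProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("demo-topic", "key-" + i, "event-" + i);
                // send() is asynchronous; get() blocks for the broker's ack so
                // we can print which partition and offset the record landed on.
                RecordMetadata meta = producer.send(record).get();
                System.out.printf("sent %s to partition %d at offset %d%n",
                        record.value(), meta.partition(), meta.offset());
            }
        }
    }
}
```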
Exercise 5: Understanding consumers
- Configure the visualization to 3 partitions and a replication factor of 2.
- Start consuming messages from the topic.
- Observe how the messages are consumed from the topic.
- Try increasing the consumer rate.
- Observe how the messages are now consumed from the topic.
The role of a consumer in Apache Kafka is to read and process messages from topics. Consumers are essential components of Kafka-based applications, as they are the ones that actually consume the data that is produced to Kafka.
Consumers are important for a number of reasons:
- They allow applications to scale horizontally. By adding more consumers to a consumer group, applications can process more messages in parallel.
- They provide fault tolerance. If a consumer fails, the other consumers in the group will continue to process messages.
- They allow applications to consume data at their own pace. Consumers can pull messages from Kafka at a rate that is appropriate for their needs.
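Outside the visualization, the same behaviour can be observed with the official Java client. Below is a minimal sketch, assuming a broker at localhost:9092, the placeholder topic demo-topic, and the placeholder group id lab-group; running several copies of this program shows the group splitting partitions among its members:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class LabConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Consumers sharing a group.id split the topic's partitions among
        // themselves; start several instances of this program to see it.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "lab-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic"));
            while (true) {
                // poll() pulls records at the consumer's own pace.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            r.partition(), r.offset(), r.value());
                }
            }
        }
    }
}
```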
Conclusion
By completing these exercises, you will gain a better understanding of the key concepts in Kafka: partitions, brokers, replication factor, producers, and consumers.