Demystifying Kafka: Key Concepts You Need to Know
Apache Kafka is an open-source distributed event streaming platform. Originally created at LinkedIn, it is now maintained by the Apache Software Foundation. Kafka is well known for handling high-throughput, low-latency data streams, which is why it appears so often in contemporary data architectures. Its scalability, reliability, and smooth handling of real-time data have made it extremely popular.
Components of Kafka:
Broker: Brokers are the servers that run Kafka; they receive messages from producers, store them, and serve them to consumers. A Kafka cluster typically consists of several brokers.
Event: Events are the messages sent to and received from a Kafka broker. The broker stores these messages on disk as raw bytes.
Producer and Consumer: Producers are the services that publish events to the Kafka broker, while consumers are the services that read those events. The same service can act as both a producer and a consumer of Kafka messages.
Topic: A topic is a named category that distinguishes which type of event is stored in Kafka; producers write to topics and consumers subscribe to them.
Partition: A topic is divided into partitions. Spreading a topic across multiple partitions allows reads and writes to happen in parallel, which increases throughput.
Replication Factor: A backup copy of a partition is called a replica. The replication factor of a topic tells the Kafka cluster how many replicas of each partition to maintain. For example, a topic with partitions set to 1 and a replication factor of 2 would have two copies of that partition, containing the same data, kept on different brokers in the cluster.
Offset: The offset is an index, kept inside Kafka, that points to the most recent message a consumer has read. It tracks which events the consumer has already processed. If a consumer fails, this offset number tells it the precise point at which it must resume consuming events.
Zookeeper: A full Kafka cluster is created by combining Apache Kafka with ZooKeeper, a centralized coordination service. ZooKeeper tracks the status of the Kafka broker nodes and maintains client quotas (the amount of data that a producer or consumer is permitted to write or read).
Consumer Group: A group of consumers can team up to consume messages from a set of topics; this collection of consumers is called a consumer group. Two consumers in the same group that subscribe to the same topic are assigned different partitions, so neither receives the same messages as the other. When several consumers are interested in the same topic, consumer groups help increase the rate of consumption, as the sketch after this list shows.
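As a minimal sketch of how groups and offsets fit together, the console tools that ship with Kafka can be used directly. This assumes a broker on localhost:9092; the topic name orders and group name order-processors are illustrative, and Windows users would use the .bat equivalents under bin\windows\:
# Consume as part of a group; run this in two terminals to see partitions split between members
bin/kafka-console-consumer.sh --topic orders --group order-processors --bootstrap-server localhost:9092
# Describe the group: per partition, the group's current offset, the log-end offset, and the lag
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group order-processors
The describe output shows, for each partition, how far the group has read (its current offset) versus how far the log extends; the difference is the group's lag.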
Step-by-Step Guide: Creating Kafka Topics with Partitions and a Replication Factor, and Publishing and Consuming Messages
- To start the ZooKeeper server (run the commands from the Kafka installation directory)
For Windows:
bin\windows\zookeeper-server-start.bat config\zookeeper.properties
For macOS:
bin/zookeeper-server-start.sh config/zookeeper.properties
- To start the Kafka server
For Windows:
bin\windows\kafka-server-start.bat config\server.properties
For macOS:
bin/kafka-server-start.sh config/server.properties
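Once the broker is running, a quick sanity check (assuming the default listener on localhost:9092) is to list the topics it knows about:
For Windows:
bin\windows\kafka-topics.bat --bootstrap-server localhost:9092 --list
For macOS:
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list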
- To create a new Kafka topic on a Kafka broker running locally
For Windows:
bin\windows\kafka-topics.bat --bootstrap-server localhost:9092 --create --topic <name of topic> --partitions <number of partitions> --replication-factor <replication factor>
For macOS:
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic <name of topic> --partitions <number of partitions> --replication-factor <replication factor>
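For example, the following call creates a topic named orders (the name and numbers are illustrative) with three partitions and a replication factor of 1, the highest replication factor possible on a single-broker setup; --describe then confirms the layout:
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic orders --partitions 3 --replication-factor 1
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic orders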
- To produce messages to a Kafka Topic
For Windows:
bin\windows\kafka-console-producer.bat --topic <topicName> --bootstrap-server localhost:9092
For macOS:
bin/kafka-console-producer.sh --topic <topicName> --bootstrap-server localhost:9092
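For example, with the illustrative orders topic from the previous step, every line typed at the > prompt is published as one event:
bin/kafka-console-producer.sh --topic orders --bootstrap-server localhost:9092
>order-1 created
>order-2 shipped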
- To consume messages from a Kafka topic
For Windows:
bin\windows\kafka-console-consumer.bat --topic <topicName> --from-beginning --bootstrap-server localhost:9092
For macOS:
bin/kafka-console-consumer.sh --topic <topicName> --from-beginning --bootstrap-server localhost:9092
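Continuing the illustrative orders example, the --from-beginning flag makes the consumer replay all retained messages from the start of each partition instead of only new ones:
bin/kafka-console-consumer.sh --topic orders --from-beginning --bootstrap-server localhost:9092
order-1 created
order-2 shipped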
Features of Kafka:
Scalability: Kafka is designed for horizontal scaling. By distributing partitions across several brokers, it can manage enormous volumes of data and scale smoothly as workloads grow.
High Throughput: Kafka can process millions of messages per second with low latency, making it well suited to real-time applications.
Durability and Fault Tolerance: By replicating partitions across brokers, Kafka keeps data safe even in the event of hardware failures.
Distributed Architecture: Kafka’s distributed architecture improves performance and flexibility by enabling producers, brokers, and consumers to operate independently.
Message Retention: In contrast to conventional messaging systems, Kafka keeps messages on disk for a configurable amount of time, so consumers that arrive late can still replay past events (an example follows this list).
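As a sketch of how this retention window is tuned (topic name illustrative), it can be set per topic through the topic-level retention.ms config; 604800000 milliseconds is seven days:
bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name orders --add-config retention.ms=604800000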
How Kafka Works:
Producers Publish Messages:
- Producers publish messages to specific topics.
- The messages are distributed across the topic’s partitions.
Kafka Brokers Store Messages:
- Kafka brokers persist messages to disk.
- Messages are retained for a configurable amount of time (e.g., seven days), so consumers can read them at their own pace.
Consumers Read Messages:
- Consumers subscribe to one or more topics and read messages from their partitions.
- To balance the load, Kafka ensures that each consumer in a consumer group reads from a different set of partitions (demonstrated below).
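To see this balancing in practice, start the console consumer in two terminals with the same --group value against a multi-partition topic (the topic and group names are illustrative); Kafka assigns each member a different subset of partitions, so every published message is delivered to exactly one of the two consumers:
bin/kafka-console-consumer.sh --topic orders --group demo-group --bootstrap-server localhost:9092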
Kafka Use Cases:
- Real-time analytics: powering dashboards with live data insights.
- Microservices communication: enabling decoupled services to talk to each other.
- Log aggregation: centralizing logs from several systems.
- Stream processing: using Kafka Streams for real-time data transformations.
Conclusion:
Organizations that manage large, real-time data streams choose Kafka for its resilience and adaptability. Whether it powers analytics or event-driven systems, Kafka sits at the core of contemporary data infrastructure.