Topics: a topic is a particular stream of data.
-It is similar to a table in a database.
-Any number of topics can be created.
-A topic is identified by its name.
Topics are split into partitions.
-Each partition is ordered (e.g. P0, P1, P2).
-Each message within a partition gets an incremental id called the OFFSET.
-Order is guaranteed only within a partition.
-Data is kept only for a limited time (the retention period).
-Once data is written to a partition, it can't be changed.
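The points above can be sketched in plain Python (no Kafka client, just an illustration of per-partition, append-only logs with incremental offsets):

```python
# Minimal sketch: a topic as a set of partitions, where each appended
# message gets an incremental offset within its own partition.
class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = {p: [] for p in range(num_partitions)}  # P0, P1, ...

    def append(self, partition, message):
        log = self.partitions[partition]
        offset = len(log)      # offsets are 0, 1, 2, ... per partition
        log.append(message)    # append-only: existing records never change
        return offset

t = Topic("Topic-1", 3)
t.append(0, "a")   # offset 0 in P0
t.append(0, "b")   # offset 1 in P0
t.append(1, "x")   # offset 0 in P1 -- offsets are independent per partition
```

Note how ordering (and the offset sequence) exists only inside each partition, not across the topic as a whole.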
Broker: a Kafka cluster is composed of multiple brokers (servers).
-Each broker is identified by an id (an integer).
-Each broker contains certain topic partitions.
Topic Replication factor
Kafka is a distributed system, and in any distributed system there is a chance that a machine shuts down abruptly. The replication factor exists to keep the data safe when that happens.
Suppose you have 3 brokers (101, 102, 103) and a topic (Topic-1) with 2 partitions and a replication factor of 2. Partition-0 of Topic-1 will be on broker 101 and Partition-1 on broker 102; the replica of Partition-0 will be on 102 and the replica of Partition-1 on 103. So in a scenario where broker 102 goes down, the data is still available on brokers 101 and 103.
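The example above can be reproduced with a small sketch. The round-robin placement scheme here is a simplification I'm assuming for illustration (Kafka's real assignment is more involved), but it yields exactly the layout described:

```python
# Hypothetical sketch of spreading partition replicas over brokers:
# the leader sits on one broker, replicas on the following ones (wrapping).
def assign(brokers, num_partitions, replication_factor):
    assignment = {}
    for p in range(num_partitions):
        assignment[p] = [brokers[(p + r) % len(brokers)]
                         for r in range(replication_factor)]
    return assignment

layout = assign([101, 102, 103], num_partitions=2, replication_factor=2)
# layout == {0: [101, 102], 1: [102, 103]}
# If broker 102 dies, Partition-0 survives on 101 and Partition-1 on 103.
```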
Leaders in a partition: at any given time only ONE broker can be the leader for a given partition. Only that leader can receive and serve data for the partition; the other brokers just synchronize the data. Therefore each partition has one leader and multiple in-sync replicas (ISR).
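A toy sketch of the failover idea (this is not Kafka's actual election protocol, which Zookeeper coordinates, just the principle that a surviving in-sync replica takes over):

```python
# Sketch: each partition has one leader plus in-sync replicas.
# If the leader's broker dies, the first surviving replica is promoted.
def elect_leader(replicas, dead_brokers):
    for broker in replicas:          # replicas listed in preference order
        if broker not in dead_brokers:
            return broker
    raise RuntimeError("no in-sync replica available")

assert elect_leader([101, 102], dead_brokers=set()) == 101   # normal leader
assert elect_leader([101, 102], dead_brokers={101}) == 102   # failover
```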
How does data get into Kafka? From producers, which write data to topics. Producers automatically know which broker and partition to write to. In the case of broker failure, they automatically recover.
The producer can choose to receive an acknowledgment of data writes:
acks = 0 => the producer won't wait for acknowledgment (possible data loss).
acks = 1 => the producer will wait for the leader to acknowledge (limited data loss).
acks = all => the leader and the replicas acknowledge (no data loss).
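As a sketch, these settings are just producer configuration. Shown here as a plain dict; with a client library such as kafka-python they would be passed to the producer constructor (the broker address is a placeholder assumption):

```python
# Hypothetical producer settings illustrating the acks trade-off.
producer_config = {
    "bootstrap_servers": "localhost:9092",  # any one broker bootstraps the cluster
    "acks": "all",  # 0: don't wait, 1: leader ack only, "all": leader + replicas
}
```

Choosing `acks` is a durability-versus-latency trade-off: `0` is fastest but can silently lose writes, `"all"` is safest but waits for every in-sync replica.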
The producer can choose to send a key with the message (number, string, etc.). If the key is null, data is sent in a round-robin fashion. If a key is sent, then all messages for that key will always go to the same partition. A key is used when you need message ordering for a specific field. Key hashing is used to determine which partition the data will reside in.
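The key-to-partition mapping can be sketched like this. Kafka's default partitioner actually uses a murmur2 hash; md5 here is just a deterministic stand-in to show the idea that the same key always lands on the same partition:

```python
import hashlib

# Sketch of key-based partitioning (md5 as a stand-in for Kafka's murmur2).
def partition_for(key, num_partitions):
    if key is None:
        raise ValueError("null keys are distributed round-robin instead")
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest, "big") % num_partitions

# The same key always maps to the same partition, preserving per-key order:
assert partition_for("user-42", 3) == partition_for("user-42", 3)
```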
Consumers read data from a topic. Consumers automatically know which broker to read from, and in the case of broker failure they recover automatically. Data is read in order within each partition. Kafka stores the offset at which a consumer has been reading (like a checkpoint). The committed offsets live in a Kafka topic named __consumer_offsets. If the consumer dies in between, it will be able to resume from where it left off thanks to the committed offsets.
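The checkpoint behaviour can be sketched in plain Python (the names `committed` and `poll` are illustrative, not Kafka API):

```python
# Sketch: committed offsets as a checkpoint. A consumer that dies and
# restarts resumes from the last committed offset, not from the beginning.
committed = {}  # (group, topic, partition) -> next offset to read

def poll(log, group, topic, partition, batch=2):
    key = (group, topic, partition)
    start = committed.get(key, 0)
    records = log[start:start + batch]
    committed[key] = start + len(records)  # commit after processing
    return records

log = ["m0", "m1", "m2", "m3"]
first = poll(log, "g1", "Topic-1", 0)    # ["m0", "m1"]
# ...consumer crashes and restarts; the committed offset survives...
second = poll(log, "g1", "Topic-1", 0)   # ["m2", "m3"]
```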
Every Kafka broker is called a "bootstrap server": after connecting to one broker you are connected to the entire cluster, because each broker holds metadata about all the brokers, topics, and partitions.
Zookeeper manages the brokers (it keeps the list of them). It also helps perform leader election for the partitions. Zookeeper sends notifications to Kafka in case of any changes (e.g. a new topic, a broker dies, topics are deleted, etc.). Kafka can't work without Zookeeper.