InterviewSolution
| 1. |
What is the purpose of partitions in Kafka? |
|
Answer» From the broker's perspective, partitions allow a single topic to be spread across multiple servers, so a topic can hold more data than any one server can store. Suppose you have three brokers and need to store 10 TB of data in a topic. One option is to create the topic with a single partition and store all 10 TB on one broker. Another is to create the topic with three partitions and distribute the 10 TB among all three brokers. From the consumer's perspective, a partition is a unit of parallelism. |
|
| 2. |
Describe the partitioning key in Kafka. |
|
Answer» In Kafka terminology, messages are referred to as records. Each record has a key and a value, and the key is optional. Each topic has one or more partitions. A partition is a simple data structure: an append-only sequence of records, ordered by the time they were appended. Once a record is written to a partition, it is given an offset, a sequential id that reflects the record's position in the partition and uniquely identifies it within that partition. By default, the Kafka producer uses the record's key to determine which partition the record is written to, and it always chooses the same partition for two records with the same key (as long as the number of partitions does not change). This matters because records may have to reach consumers in the order in which they were produced. For example, when a customer purchases an eBook from your webshop and then cancels the transaction, you want those events to arrive in the order they were created. If the cancellation event arrives before the purchase event, the cancellation is rejected as invalid (since the purchase has not yet been registered in the system), and the system then records the purchase and sends the product to the customer, losing you money. To solve this and guarantee ordering, you can use the customer id as the key of these records, which ensures that all of a customer's events land in the same partition. |
|
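The keyed partitioning described above can be sketched in a few lines of Python. This is a minimal illustration, not the producer's actual code: CRC32 stands in for the hash (the Java producer's default partitioner hashes the key bytes with murmur2), and the customer ids, event values, and the `choose_partition` helper are made up for the example.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index.

    Assumption: CRC32 is only a stand-in hash for this sketch; Kafka's
    Java producer uses murmur2 on the key bytes.
    """
    return zlib.crc32(key) % num_partitions


# Hypothetical events, keyed by customer id, in the order they were produced.
events = [
    (b"customer-42", "purchase eBook"),
    (b"customer-7", "purchase eBook"),
    (b"customer-42", "cancel eBook"),
]

NUM_PARTITIONS = 3
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in events:
    # Records with the same key always map to the same partition.
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, value))
```

Both `customer-42` events end up in the same partition list, in the order they were produced, which is the ordering guarantee the answer relies on.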
| 3. |
What are the benefits of using clusters in Kafka? |
|
Answer» A Kafka cluster is a group of brokers that share the load between them. In ZooKeeper-based deployments, brokers do not keep track of cluster membership themselves and rely on ZooKeeper to maintain the cluster state; ZooKeeper is also used to elect the controller broker. A single Kafka broker can handle hundreds of thousands of reads and writes per second, and each broker can store terabytes of messages without degrading performance. Running several brokers as a cluster therefore increases both throughput and capacity. |
|
| 4. |
What do you mean by Kafka schema registry? |
|
Answer» A Schema Registry is a service that runs alongside a Kafka cluster, is shared by producers and consumers, and stores Avro schemas. Avro schemas make serialization and deserialization straightforward, and the registry lets you configure compatibility rules between the schemas that producers and consumers use. The Schema Registry is used to ensure that the schema used by the consumer is compatible with the schema used by the producer. With the Confluent Schema Registry, producers only need to send the schema ID with each message, not the whole schema. The consumer uses the schema ID to look up the matching schema in the Schema Registry. |
|
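The schema-ID mechanism can be sketched as follows. This is a hedged, self-contained illustration: `InMemorySchemaRegistry`, `encode`, and `decode` are hypothetical names, the registry is a plain dictionary instead of a real service, and JSON bytes stand in for Avro binary encoding. The 5-byte prefix (a zero magic byte followed by a 4-byte big-endian schema ID) follows the Confluent wire format.

```python
import json
import struct


class InMemorySchemaRegistry:
    """Hypothetical stand-in for a schema registry service."""

    def __init__(self):
        self._ids_by_schema = {}
        self._schemas_by_id = {}

    def register(self, schema: dict) -> int:
        # Registering the same schema twice returns the same ID.
        canonical = json.dumps(schema, sort_keys=True)
        if canonical not in self._ids_by_schema:
            schema_id = len(self._ids_by_schema) + 1
            self._ids_by_schema[canonical] = schema_id
            self._schemas_by_id[schema_id] = schema
        return self._ids_by_schema[canonical]

    def lookup(self, schema_id: int) -> dict:
        return self._schemas_by_id[schema_id]


def encode(schema_id: int, payload: bytes) -> bytes:
    # Magic byte 0, then the schema ID as a 4-byte big-endian integer.
    return struct.pack(">bI", 0, schema_id) + payload


def decode(message: bytes):
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != 0:
        raise ValueError("unknown magic byte")
    return schema_id, message[5:]


registry = InMemorySchemaRegistry()
purchase_schema = {
    "type": "record",
    "name": "Purchase",
    "fields": [{"name": "customer_id", "type": "string"}],
}

# Producer side: register the schema once, then send only its ID.
schema_id = registry.register(purchase_schema)
message = encode(schema_id, json.dumps({"customer_id": "42"}).encode())

# Consumer side: read the ID from the message and fetch the schema.
received_id, body = decode(message)
reader_schema = registry.lookup(received_id)
```

The message carries five bytes of overhead no matter how large the schema is, which is why sending the ID is cheaper than sending the schema with every record.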
| 5. |
What are the use cases of Kafka monitoring? |
|
Answer» Following are the use cases of Kafka monitoring:
|
|
| 6. |
Tell me about some of the real-world usages of Apache Kafka. |
|
Answer» Following are some of the real-world usages of Apache Kafka:
|
|
| 7. |
What are some of the disadvantages of Kafka? |
|
Answer» Following are the disadvantages of Kafka:
|
|
| 8. |
What do you mean by geo-replication in Kafka? |
|
Answer» Geo-replication is a Kafka feature that allows messages in one cluster to be copied to other data centers or cloud regions. It involves replicating all of the data and storing it around the globe if necessary. Geo-replication can be accomplished with Kafka's MirrorMaker tool, and it is one way of ensuring that data is backed up. |
|
| 9. |
How do you start a Kafka server? |
|
Answer» First, download the most recent version of Kafka and extract it. To run Kafka, the local environment must have Java 8+ installed. The following commands must be run in order to start the Kafka server and ensure that all services are started in the correct order:
|
|
| 10. |
What does it mean if a replica is not an In-Sync Replica for a long time? |
|
Answer» A replica that has been out of the ISR for a long time indicates that the follower is unable to fetch data at the same rate as the leader. |
|
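A rough way to picture ISR membership is a time-based lag check. The sketch below is an assumption-laden simplification: the `Follower` class and `in_sync_replicas` function are hypothetical, and the 30-second threshold is only an example value (brokers expose a comparable setting as `replica.lag.time.max.ms`).

```python
from dataclasses import dataclass


@dataclass
class Follower:
    broker_id: int
    # Last time (ms) this follower had fetched everything up to the
    # leader's log end. Hypothetical field for this sketch.
    last_caught_up_ms: int


def in_sync_replicas(followers, now_ms, max_lag_ms):
    """Return the ids of followers that caught up recently enough."""
    return [
        f.broker_id
        for f in followers
        if now_ms - f.last_caught_up_ms <= max_lag_ms
    ]


followers = [
    Follower(broker_id=1, last_caught_up_ms=99_000),  # 1 s behind
    Follower(broker_id=2, last_caught_up_ms=60_000),  # 40 s behind
]
isr = in_sync_replicas(followers, now_ms=100_000, max_lag_ms=30_000)
```

Here broker 2 drops out of the ISR because it has not caught up with the leader within the allowed window, which is the situation the answer describes.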
| 11. |
What is the maximum size of a message that Kafka can receive? |
|
Answer» By default, the maximum size of a Kafka message is 1 MB (megabyte). The limit can be changed in the broker settings. Kafka, however, is optimized for small messages, on the order of 1 KB. |
|
| 12. |
What do you understand about a consumer group in Kafka? |
|
Answer» A consumer group in Kafka is a collection of consumers that work together to consume data from the same topic or set of topics. A consumer group essentially represents the name of an application. Consumers in Kafka typically belong to a specific consumer group. To consume messages as part of a consumer group with the console consumer, use the '--group' option. |
|
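How the consumers in a group share a topic can be sketched with a simple round-robin assignment. This is an illustration only: `assign_partitions` is a hypothetical helper, and Kafka's real assignors handle rebalancing, multiple topics, and stickiness that this sketch ignores. What it does show is that within one group each partition is read by exactly one consumer.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the members of one consumer group."""
    assignment = {c: [] for c in consumers}
    members = sorted(consumers)
    for i, partition in enumerate(sorted(partitions)):
        # Hand out partitions one at a time, cycling through the members.
        assignment[members[i % len(members)]].append(partition)
    return assignment


six_over_three = assign_partitions(range(6), ["c1", "c2", "c3"])
# {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}

# With more consumers than partitions, the extra consumer stays idle.
three_over_four = assign_partitions([0, 1, 2], ["a", "b", "c", "d"])
```

The second call shows why adding consumers beyond the number of partitions does not increase parallelism.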
| 13. |
Why is Topic Replication important in Kafka? What do you mean by ISR in Kafka? |
|
Answer» Topic replication is critical for building Kafka deployments that are both durable and highly available. When one broker fails, topic replicas on other brokers remain available, so data is not lost and the Kafka deployment is not disrupted. The replication factor specifies the number of copies of each partition that are kept across the Kafka cluster. Replication takes place at the partition level and is configured at the topic level. A replication factor of two, for example, keeps two copies of every partition of the topic. Each partition has an elected leader, and other brokers store copies that can take over if necessary. The replication factor cannot exceed the total number of brokers in the cluster. An In-Sync Replica (ISR) is a replica that is up to date with the partition's leader. |
|
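The relationship between replication factor and broker count can be sketched with a toy placement function. `place_replicas` is a hypothetical helper, and the simple "next broker in the list" rule is an assumption made for readability; Kafka's own assignment logic differs in detail. The sketch only illustrates that each partition gets as many copies as the replication factor, each on a different broker.

```python
def place_replicas(num_partitions, replication_factor, brokers):
    """Assign each partition's replicas to distinct brokers."""
    if replication_factor > len(brokers):
        # Two copies of a partition on one broker would add no safety.
        raise ValueError("replication factor exceeds number of brokers")
    placement = {}
    for p in range(num_partitions):
        placement[p] = [
            brokers[(p + r) % len(brokers)]
            for r in range(replication_factor)
        ]
    return placement


placement = place_replicas(
    num_partitions=3, replication_factor=2, brokers=[0, 1, 2]
)
# {0: [0, 1], 1: [1, 2], 2: [2, 0]}
```

With three brokers and a replication factor of two, losing any single broker still leaves one copy of every partition available.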
| 14. |
Explain the concept of Leader and Follower in Kafka. |
|
Answer» In Kafka, each partition has one server that acts as the Leader and zero or more servers that act as Followers. The Leader handles all read and write requests for the partition, while the Followers passively replicate the Leader. If the Leader fails, one of the Followers takes over leadership. Because the leaders of different partitions are spread across brokers, the load on the servers is balanced. |
|
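Leader failover can be sketched as picking a surviving replica. The `elect_leader` function below is a hypothetical simplification that assumes the new leader is the first replica, in assignment order, that is both alive and in sync; it leaves out everything a real controller does around the election.

```python
def elect_leader(replicas, isr, failed):
    """Pick the first surviving in-sync replica, in assignment order.

    replicas: broker ids assigned to the partition, preferred leader first
    isr:      set of broker ids currently in sync with the old leader
    failed:   set of broker ids that are down
    """
    for broker in replicas:
        if broker in isr and broker not in failed:
            return broker
    return None  # no eligible replica: the partition has no leader


# Broker 1 led the partition and has just failed.
new_leader = elect_leader(replicas=[1, 2, 3], isr={1, 2, 3}, failed={1})

# If broker 2 had fallen out of sync, broker 3 would take over instead.
fallback_leader = elect_leader(replicas=[1, 2, 3], isr={1, 3}, failed={1})
```

When no in-sync follower survives, the function returns `None`, which corresponds to the partition being unavailable until a replica comes back.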
| 15. |
Can we use Kafka without Zookeeper? |
Answer»
|
|
| 16. |
What do you mean by zookeeper in Kafka and what are its uses? |
|
Answer» Apache ZooKeeper is a naming registry for distributed applications as well as a distributed, open-source configuration and synchronization service. It keeps track of the status of the Kafka cluster's nodes, as well as Kafka topics, partitions, and so on. Kafka brokers use ZooKeeper to maintain and coordinate the cluster. When the topology of the cluster changes, for example when brokers or topics are added or removed, ZooKeeper notifies all nodes. It notifies the cluster when a new broker joins and when a broker fails. ZooKeeper also supports leader election for topic partitions, determining which broker will be the leader for a given partition (and serve read and write operations from producers and consumers) and which brokers hold replicas of the same data. When the brokers receive a notification from ZooKeeper, they immediately begin to coordinate with one another and elect any new partition leaders that are required. This safeguards against the unexpected loss of a broker. |
|
| 17. |
What do you mean by a Partition in Kafka? |
|
Answer» Kafka topics are divided into partitions, each of which contains records in a fixed order. Each record in a partition is assigned a unique offset. A single topic can have multiple partition logs, which allows several consumers to read from the same topic at the same time. Partitions parallelize a topic by splitting its data among multiple brokers. Replication in Kafka is done at the partition level, and a replica is a redundant copy of a topic partition. Each partition usually has one or more replicas, which means that its messages are duplicated across several Kafka brokers in the cluster. For each partition, one replica serves as the leader while the others act as followers. The leader replica handles all read and write requests for the partition, while the followers replicate the leader. If the leader's server goes down, one of the followers takes over as the leader. To spread the load, aim for a good balance of leaders, with each broker leading roughly the same number of partitions. |
|
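The "records in a fixed order, each with a unique offset" idea can be sketched as an append-only list. `PartitionLog` is a hypothetical class for illustration; it keeps everything in memory and treats the offset as a list index, which ignores retention and on-disk segments.

```python
class PartitionLog:
    """Toy append-only log for a single partition."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        # The offset is the record's position in the log.
        offset = len(self._records)
        self._records.append(record)
        return offset

    def read(self, offset, max_records=10):
        # Consumers read forward from an offset; records are never changed.
        return self._records[offset:offset + max_records]


log = PartitionLog()
offsets = [log.append(r) for r in ["purchase", "cancel", "refund"]]
tail = log.read(offset=1)
```

A consumer that remembers the last offset it processed can resume from there, and two consumers can read the same log at different offsets without affecting each other.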
| 18. |
Explain the four core APIs that Kafka uses. |
|
Answer» Following are the four core APIs that Kafka uses:
|
|
| 19. |
What are the major components of Kafka? |
|
Answer» Following are the major components of Kafka:
|
|
| 20. |
What are the traditional methods of message transfer? How is Kafka better than them? |
|
Answer» Following are the traditional methods of message transfer:
|
|
| 21. |
What are some of the features of Kafka? |
|
Answer» Following are the key features of Kafka:
|
|