1.

What is the purpose of partitions in Kafka?

Answer»

From the Kafka broker's perspective, partitions allow a single topic to be spread across numerous servers. This lets you store more data in a single topic than a single server can hold. If you have three brokers and need to store 10 TB of data in a topic, one option is to create a topic with only one partition and store all 10 TB on one broker. Another alternative is to create a topic with three partitions and distribute the 10 TB of data among all the brokers. From the consumer's perspective, a partition is the unit of parallelism.
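
As a minimal sketch of the second option (assuming a recent Kafka with the --bootstrap-server flag and a broker on localhost:9092; the topic name is illustrative), such a topic could be created with:

# Create a topic whose data is spread across 3 partitions:
$ bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
    --topic my-topic --partitions 3 --replication-factor 1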

2.

Describe partitioning key in Kafka.

Answer»

In Kafka terminology, messages are referred to as records. Each record has a key and a value, with the key being optional. The record's key is used for record partitioning. Each topic has one or more partitions. A partition is a straightforward data structure: an append-only sequence of records, ordered chronologically by the time they were appended. Once a record is written to a partition, it is given an offset, a sequential id that reflects the record's position in the partition and uniquely identifies it within it.

Partitioning is done using the record's key. By default, the Kafka producer uses the record's key to determine which partition the record should be written to. The producer will always choose the same partition for two records with the same key.

This is important because we may have to deliver records to consumers in the same order they were produced. When a customer purchases an eBook from your webshop and subsequently cancels the transaction, you want these events to arrive in the order they were created. If a cancellation event arrives before the purchase event, the cancellation will be rejected as invalid (since the purchase has not yet been registered in the system); the system will then record the purchase and ship the product to the customer (losing you money). To solve this problem and ensure ordering, you might use the customer id as the key of these Kafka records. This guarantees that all of a customer's purchase events end up in the same partition.
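
As a hedged sketch of how keys steer records to partitions, the console producer can send keyed records (the topic name, key separator, and sample records below are illustrative):

# Read key:value pairs from stdin and publish them as keyed records:
$ bin/kafka-console-producer.sh --bootstrap-server localhost:9092 \
    --topic orders --property parse.key=true --property key.separator=:
# Both records share the key "customer42", so they land in the same
# partition and are therefore consumed in order:
customer42:purchase-ebook
customer42:cancel-order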

3.

What are the benefits of using clusters in Kafka?

Answer»

A Kafka cluster is basically a group of multiple brokers. Clusters are used to maintain load balance. Because Kafka brokers are stateless, they rely on ZooKeeper to keep track of their cluster state. A single Kafka broker instance can manage hundreds of thousands of reads and writes per second, and each broker can handle terabytes of messages without compromising performance. ZooKeeper is also used to elect the Kafka broker leader. Thus, having a cluster of Kafka brokers significantly increases performance.
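
A minimal sketch of adding a second broker to a local cluster, following the standard quickstart approach (the file name, port, and log directory are illustrative; each broker needs a unique broker.id, listener port, and log directory):

# Copy the sample config and adjust it for broker 1, e.g.:
#   broker.id=1
#   listeners=PLAINTEXT://:9093
#   log.dirs=/tmp/kafka-logs-1
$ cp config/server.properties config/server-1.properties
# Start the additional broker; it joins the cluster via ZooKeeper:
$ bin/kafka-server-start.sh config/server-1.properties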

4.

What do you mean by Kafka schema registry?

Answer»

A Schema Registry is present for both producers and consumers in a Kafka cluster, and it holds Avro schemas. For easy serialization and de-serialization, Avro schemas enable the configuration of compatibility parameters between producers and consumers. The Kafka Schema Registry is used to ensure that the schema used by the consumer and the schema used by the producer are identical. When using the Confluent Schema Registry with Kafka, producers only need to submit the schema ID, not the whole schema. The consumer looks up the matching schema in the Schema Registry using the schema ID.
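
A hedged sketch of talking to the Confluent Schema Registry's REST API (assuming it runs on localhost:8081; the subject name is illustrative):

# List all registered subjects:
$ curl http://localhost:8081/subjects
# Fetch the latest schema version registered for a subject:
$ curl http://localhost:8081/subjects/orders-value/versions/latest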

5.

What are the use cases of Kafka monitoring?

Answer»

Following are the use cases of Kafka monitoring (see the JMX sketch after this list):

  • Track system resource consumption: Monitoring can be used to keep track of system resources such as memory, CPU, and disk utilization over time.
  • Monitor threads and JVM usage: Kafka relies on the Java garbage collector to free up memory; monitoring JVM and garbage-collection activity helps keep the Kafka cluster healthy and responsive.
  • Keep an eye on broker, controller, and replication statistics so that the statuses of partitions and replicas can be adjusted as needed.
  • Identifying which applications are causing excessive demand and locating performance bottlenecks helps solve performance issues quickly.
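
A hedged sketch of sampling broker metrics with the bundled JmxTool (this assumes the broker was started with JMX enabled, e.g. JMX_PORT=9999; the object name is one of Kafka's standard broker metrics):

# Poll the broker's incoming-message rate over JMX:
$ bin/kafka-run-class.sh kafka.tools.JmxTool \
    --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
    --object-name kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec
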
6.

Tell me about some of the real-world usages of Apache Kafka.

Answer»

Following are some of the real-world usages of Apache Kafka:

  • As a message broker: Due to its high throughput, Kafka is capable of managing a huge volume of similar types of messages or data. Kafka can be used as a publish-subscribe messaging system that allows data to be read and published in a convenient manner.
  • To monitor operational data: Kafka can be used to keep track of metrics related to certain technologies, such as security logs.
  • Website activity tracking: Kafka can be used to verify that data is transferred and received successfully by websites. Kafka can handle the massive amounts of data created by websites for each page and for the activities of users.
  • Data logging: Kafka's data replication between nodes can be used to restore data on nodes that have failed. Kafka may also be used to collect data from a variety of logs and make it available to consumers.
  • Stream processing with Kafka: Kafka may be used to handle streaming data: data that is read from one topic, processed, and then written to another. Users and applications will have access to a new topic containing the processed data.
7.

What are some of the disadvantages of Kafka?

Answer»

Following are the disadvantages of Kafka:

  • Kafka's performance degrades if there is message tweaking. Kafka works well when the message does not need to be updated.
  • Wildcard topic selection is not supported by Kafka. It is necessary to match the exact topic name.
  • When dealing with huge messages, brokers and consumers compress and decompress the messages, which reduces Kafka's throughput and performance.
  • Certain message paradigms, including point-to-point queues and request/reply, are not supported by Kafka.
  • Kafka does not have a complete set of monitoring tools.
8.

What do you mean by geo-replication in Kafka?

Answer»

Geo-Replication is a Kafka feature that allows messages in one cluster to be copied across multiple data centers or cloud regions. Geo-replication entails replicating all of the files and storing them around the globe if necessary. Geo-replication can be accomplished with Kafka's MirrorMaker tool. Geo-replication is a technique for ensuring data backup.
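
A minimal sketch using the (legacy) MirrorMaker tool, assuming the consumer config points at the source cluster and the producer config at the target cluster (both file names and the topic pattern are illustrative):

# Mirror every topic matching "orders.*" from the source to the target cluster:
$ bin/kafka-mirror-maker.sh \
    --consumer.config config/source-consumer.properties \
    --producer.config config/target-producer.properties \
    --whitelist "orders.*"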

9.

How do you start a Kafka server?

Answer»

First, we extract Kafka once we have downloaded the most recent version. To run Kafka, we must make sure that our local environment has Java 8+ installed.

The following commands must be run in order to start the Kafka server and to ensure that all services start in the correct order:

  • Start the ZooKeeper service:
$ bin/zookeeper-server-start.sh config/zookeeper.properties
  • To start the Kafka broker service, open a new terminal and run:
$ bin/kafka-server-start.sh config/server.properties
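
Once both services are running, a quick sanity check (the topic name is illustrative) is to create and list a topic from a third terminal:

# Create a test topic, then list all topics to confirm the broker is up:
$ bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092
$ bin/kafka-topics.sh --list --bootstrap-server localhost:9092
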
10.

What does it mean if a replica is not an In-Sync Replica for a long time?

Answer»

A replica that has been out of the ISR for a long period of time indicates that the follower is unable to fetch data at the same rate as the leader.
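
What counts as "too slow" is governed by the broker setting replica.lag.time.max.ms; a follower that has not caught up within that window is removed from the ISR. A hedged sketch of tuning it in server.properties (the value shown is illustrative):

# config/server.properties
# Followers that fail to catch up to the leader within this many
# milliseconds are dropped from the ISR:
replica.lag.time.max.ms=30000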

11.

What is the maximum size of a message that Kafka can receive?

Answer»

By default, the maximum size of a Kafka message is 1 MB (megabyte). The size can be modified in the broker settings. Kafka is, however, optimized to handle small messages of around 1 KB.
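
A hedged sketch of raising the limit for a single topic via its max.message.bytes config (the topic name and size are illustrative; the broker-wide equivalent is message.max.bytes in server.properties):

# Allow messages up to 2 MB on this topic:
$ bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter \
    --entity-type topics --entity-name my-topic \
    --add-config max.message.bytes=2097152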

12.

What do you understand about a consumer group in Kafka?

Answer»

A consumer group in Kafka is a collection of consumers who work together to ingest data from the same topic or range of topics. A consumer group essentially represents the name of an application. Consumers in Kafka often fall into one of these groups. To consume messages as part of a consumer group, the --group flag must be used (see the sketch below).
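
A hedged sketch (the topic and group names are illustrative): running the same command in two terminals starts two consumers in one group, which split the topic's partitions between them; kafka-consumer-groups.sh then shows each member's assignments and lag:

# Consume as part of the consumer group "order-app":
$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic orders --group order-app
# Inspect the group's partition assignments and consumer lag:
$ bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
    --describe --group order-app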

13.

Why is Topic Replication important in Kafka? What do you mean by ISR in Kafka?

Answer»

Topic replication is critical for constructing Kafka deployments that are both durable and highly available. When one broker fails, topic replicas on other brokers remain available, ensuring that data is not lost and that the Kafka deployment is not disrupted. The replication factor specifies the number of copies of a topic that are kept across the Kafka cluster. It is configured at the topic level and takes place at the partition level. A replication factor of two, for example, will keep two copies of a topic for each partition.

Each partition has an elected leader, and other brokers store a copy that can be used if necessary. Logically, the replication factor cannot exceed the cluster's total number of brokers. An In-Sync Replica (ISR) is a replica that is up to date with the partition's leader.
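
A hedged sketch (the topic name is illustrative; assumes a cluster with at least two brokers): create a replicated topic and inspect its leaders and ISRs:

# Keep two copies of every partition:
$ bin/kafka-topics.sh --create --bootstrap-server localhost:9092 \
    --topic payments --partitions 3 --replication-factor 2
# The Isr column lists the replicas currently in sync with each leader:
$ bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic payments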

14.

Explain the concept of Leader and Follower in Kafka.

Answer»

In Kafka, each partition has one server that acts as a Leader and one or more servers that act as Followers. The Leader is in charge of all read and write requests for the partition, while the Followers passively replicate the leader. If the Leader fails, one of the Followers takes over as leader. This balances the load across the servers.

15.

Can we use Kafka without Zookeeper?

Answer»
  • As of version 2.8, Kafka can be used without ZooKeeper. The release of Kafka 2.8.0 in April 2021 gave us all the opportunity to try it out without ZooKeeper. However, this version is not yet production-ready and lacks some key features (a startup sketch follows this list).
  • In previous versions, bypassing ZooKeeper and connecting directly to the Kafka broker was not possible, because when ZooKeeper is down, the cluster cannot fulfill client requests.
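
A hedged sketch of the ZooKeeper-less (KRaft) early-access mode shipped with 2.8, assuming the distribution's bundled config/kraft/server.properties sample:

# Generate a cluster ID and format the storage directory for KRaft mode:
$ bin/kafka-storage.sh random-uuid
$ bin/kafka-storage.sh format -t <uuid-from-previous-step> -c config/kraft/server.properties
# Start the broker with no ZooKeeper running:
$ bin/kafka-server-start.sh config/kraft/server.properties
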
16.

What do you mean by zookeeper in Kafka and what are its uses?

Answer»

Apache ZooKeeper is a naming registry for distributed applications as well as a distributed, open-source configuration and synchronization service. It keeps track of the status of the Kafka cluster nodes, as well as Kafka topics, partitions, and so on.

ZooKeeper is used by Kafka brokers to maintain and coordinate the Kafka cluster. When the topology of the Kafka cluster changes, such as when brokers or topics are added or removed, ZooKeeper notifies all nodes. For example, ZooKeeper notifies the cluster when a new broker joins, as well as when a broker fails. ZooKeeper also enables leader elections for topic partitions, determining which broker will be the leader for a given partition (and serve read and write operations from producers and consumers) and which brokers hold replicas of the same data. When the cluster of brokers receives a notification from ZooKeeper, they immediately begin to coordinate with one another and elect any new partition leaders that are required. This safeguards against the unexpected absence of a broker.
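
A hedged sketch of inspecting the metadata ZooKeeper keeps for Kafka (assumes ZooKeeper on localhost:2181; older ZooKeeper CLIs may require typing the commands at an interactive prompt instead):

# List the IDs of the brokers currently registered in the cluster:
$ bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids
# Show which broker is currently the elected controller:
$ bin/zookeeper-shell.sh localhost:2181 get /controller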

17.

What do you mean by a Partition in Kafka?

Answer»

Kafka topics are divided into partitions, each of which contains records in a fixed order. A unique offset is assigned to each record in a partition. A single topic can have multiple partition logs, which allows several consumers to read from the same topic at the same time. Partitions allow topics to be parallelized by splitting a single topic's data among numerous brokers.

Replication in Kafka is done at the partition level. A replica is the redundant element of a topic partition. Each partition usually has one or more replicas, meaning that partitions contain messages that are duplicated across several Kafka brokers in the cluster.

One server acts as the leader of each partition (replica), while the others act as followers. The leader replica handles all read-write requests for the partition, while the followers replicate the leader. If the leader server goes down, one of the followers takes over as leader. To spread the load, we should aim for a good balance of leaders, with each broker leading an equal number of partitions.

18.

Explain the four core API architecture that Kafka uses.

Answer»

Following are the four core APIs that Kafka uses (command-line counterparts are sketched after the list):

  • Producer API:
    The Producer API in Kafka allows an application to publish a stream of records to one or more Kafka topics.
  • Consumer API:
    An application can subscribe to one or more Kafka topics using the Kafka Consumer API. It also enables the application to process streams of records generated in relation to such topics.
  • Streams API:
    The Kafka Streams API allows an application to use a stream processing architecture to process data in Kafka. An application can use this API to take input streams from one or more topics, process them using streams operations, and generate output streams to transmit to one or more topics. The Streams API allows you to convert input streams into output streams in this manner.
  • Connect API:
    The Kafka Connect API connects Kafka topics to applications and external systems. This opens up possibilities for constructing and managing the operations of producers and consumers, as well as establishing reusable links between these solutions. A connector, for example, may capture all database updates and ensure that they are made available in a Kafka topic.
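
As a hedged illustration, the Kafka distribution ships command-line counterparts to some of these APIs (the topic name is illustrative): the console tools wrap the Producer and Consumer APIs, and connect-standalone.sh runs a Connect worker with the sample file-connector configs bundled with Kafka:

# Producer API counterpart: publish records typed on stdin to a topic:
$ bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic demo
# Consumer API counterpart: subscribe to the topic and read from the start:
$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic demo --from-beginning
# Connect API: run a standalone worker with a sample file-source connector:
$ bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties
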
19.

What are the major components of Kafka?

Answer»

Following are the major components of Kafka:

  • Topic:
    • A Topic is a category or feed in which records are saved and published.
    • Topics are used to organize all of Kafka's records. Consumer apps read data from topics, whereas producer applications write data to them. Records published to the cluster remain in the cluster for the duration of a configurable retention period.
    • Kafka keeps records in the log, and it's up to the consumers to keep track of their position in the log (the "offset"). As messages are read, a consumer typically advances its offset linearly. The consumer is, however, in charge of its own position and can consume messages in any order; when reprocessing records, for example, a consumer can reset to an older offset.
  • Producer:
    • A Kafka producer is a data source for one or more Kafka topics that optimizes, writes, and publishes messages. Partitioning allows Kafka producers to serialize, compress, and load balance data among brokers.
  • Consumer:
    • Data is read by consumers by reading messages from topics to which they have subscribed. Consumers will be divided into groups. Each consumer in a consumer group will be responsible for reading a subset of the partitions of each topic to which it subscribes.
  • Broker:
    • A Kafka broker is a server that works as part of a Kafka cluster (in other words, a Kafka cluster is made up of a number of brokers). Multiple brokers typically work together to form a Kafka cluster, which provides load balancing, reliable redundancy, and failover. The cluster is managed and coordinated by brokers using Apache ZooKeeper. Without sacrificing performance, each broker instance can handle read and write volumes of hundreds of thousands of messages per second (and gigabytes of messages). Each broker has its own ID and can be in charge of one or more topic log partitions.
    • ZooKeeper is also used by Kafka brokers for leader elections, in which a broker is chosen to lead the handling of client requests for a particular partition of a topic. Connecting to any broker will bring a client up to speed with the entire Kafka cluster. A minimum of three brokers should be used to achieve reliable failover; the higher the number of brokers, the more reliable the failover.
20.

What are the traditional methods of message transfer? How is Kafka better from them?

Answer»

Following are the traditional methods of message transfer:

  • Message Queuing:- 
    A point-to-point technique is used in the message queuing pattern. A message in the queue will be destroyed once it has been consumed, similar to how a message is removed from the server once it has been delivered in the Post Office Protocol. Asynchronous messaging is possible with these queues.
    If a network problem delays a message's delivery, such as when a consumer is unavailable, the message will be held in the queue until it can be sent. This means that messages aren't always sent in the same order. Instead, they are delivered on a first-come, first-served basis, which can improve efficiency in some situations.
  • Publisher - Subscriber Model:- 
    The publish-subscribe pattern entails publishers producing ("publishing") messages in multiple categories and subscribers consuming published messages from the various categories to which they are subscribed. Unlike point-to-point messaging, a message is only removed once it has been consumed by all category subscribers.
    Kafka caters to a single consumer abstraction that encompasses both of the aforementioned: the consumer group. Following are the benefits of using Kafka over the traditional message transfer techniques:
    • Scalable: A cluster of devices is used to partition and streamline the data thereby, scaling up the storage capacity.
    • Faster: Thousands of clients can be served by a single Kafka broker as it can manage megabytes of reads and writes per second.
    • Durability and Fault-Tolerant: The data is kept persistent and tolerant to any hardware failures by copying the data in the clusters.
21.

What are some of the features of Kafka?

Answer»

Following are the key features of Kafka:

  • Kafka is a messaging system built for high throughput and fault tolerance.
  • Kafka has a built-in partitioning system known as a Topic.
  • Kafka includes a replication feature as well.
  • Kafka provides a queue that can handle large amounts of data and move messages from one sender to another.
  • Kafka can also save the messages to storage and replicate them across the cluster.
  • For coordination and synchronization with other services, Kafka collaborates with ZooKeeper.
  • Apache Spark is well supported by Kafka.