59 + Interview Questions in Kafka in Big Data Page 1 InterviewSolution

1.	How do we design consumer groups in Kafka for high throughput?
Answer» Let’s consider a scenario where we need to read data from the Kafka topic and only after some custom validation, we can add data into some data storage system. To achieve this we would develop some consumer application which will subscribe to the topic. This ensures that our application will start receiving MESSAGES from the topic on which data validation and storage process would run eventually. Now we come across a scenario where messages publishing rate to topic exceed the rate at which it is consumed by our consumer application. If we go with a single consumer then we MAY fall behind keeping our system updated with incoming messages. The solution to this PROBLEM is by adding more consumers. This will scale up the consumption of topics. This can be easily achieved by creating a consumer group, the consortium under which similar behaviour consumers would reside which can read messages from the same topic by splitting the WORKLOAD. Consumers from the same group usually get their partition of the topic which eventually scales up message consumption and throughput. In case if we have a single consumer for a given topic with 4 partitions then it will read messages from all partitions : The ideal architecture for the above scenario is as below when we have four consumers reading messages from INDIVIDUAL partition : Even in the case of more consumers then partition results in consumer sitting idle, which is also not good architecture design: There is another scenario as well where we can have more than one consumer groups subscribed to the same topic:

1.

How do we design consumer groups in Kafka for high throughput?

Answer»

Let’s consider a scenario where we need to read data from the Kafka topic and only after some custom validation, we can add data into some data storage system. To achieve this we would develop some consumer application which will subscribe to the topic. This ensures that our application will start receiving MESSAGES from the topic on which data validation and storage process would run eventually. Now we come across a scenario where messages publishing rate to topic exceed the rate at which it is consumed by our consumer application.

If we go with a single consumer then we MAY fall behind keeping our system updated with incoming messages. The solution to this PROBLEM is by adding more consumers. This will scale up the consumption of topics. This can be easily achieved by creating a consumer group, the consortium under which similar behaviour consumers would reside which can read messages from the same topic by splitting the WORKLOAD. Consumers from the same group usually get their partition of the topic which eventually scales up message consumption and throughput. In case if we have a single consumer for a given topic with 4 partitions then it will read messages from all partitions :

The ideal architecture for the above scenario is as below when we have four consumers reading messages from INDIVIDUAL partition :

Even in the case of more consumers then partition results in consumer sitting idle, which is also not good architecture design:

There is another scenario as well where we can have more than one consumer groups subscribed to the same topic:

26.	Package which need to import in java/scala?
Answer» IMPORT org.apache.kafka.clients.consumer.ConsumerRecord import org.apache.kafka.common.serialization.StringDeserializer import org.apache.spark.streaming.kafka010._ import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

35.	What is Kafka cluster and what is the key benefits of creating Kafka cluster?
Answer» Kafka cluster is a group of more than one broker. Kafka cluster has a zero DOWNTIME, when we do the expansion of cluster. This cluster use to MANAGE the persistence and replication of message data. This cluster offer’s strong durability due to cluster centric design. In the Kafka cluster, one of the brokers SERVES as the controller, which is responsible for managing the states of partitions and REPLICAS and for performing ADMINISTRATIVE tasks like reassigning partitions.

36.	What ensures load balancing in Kafka?
Answer» The leader and follower nodes serve the purpose of load balancing in Kafka. As we know that leader does the actual writing/reading of data in a given partition while follower systems do the same in passive mode. This ensures data gets REPLICATED ACROSS different nodes. In case of any failure DUE to any reasoning system, SOFTWARE upgrade data remains available. If the leader system goes down for any reason then follower system which was working in the passive mode now becomes a leader and ensure data remains available to EXTERNAL system irrespective of any internal outage. The load balancer does the same thing, it distributes loads across multiple systems in caseload gets increased. In the same way, balances load by replicating messages on different systems and when the leader system goes down, the other follower system becomes the leader and ensure data is available to subscribers system.

38.	Within the producer can you explain when will you experience QueueFullException occur?
Answer» As the name suggests this is the exception which occurs when producer SYSTEMS SENDING more messages above and beyond the capacity of the broker system then brokers would not be able to handle the same. The QUEUE gets full at broker end so no incoming request can be handled any more. As producer systems do not have any information on the capacity of the broker system RESULTS in such exceptions. The messages GET overflowed at broker end. To avoid such a scenario we should have multiple systems working as a broker system so messages can be evenly distributed across multiple systems. The clusters environment where we have multiple nodes servicing the message processing avoid such exceptions to occur. The clustering, partitioning helps in avoiding any such exceptions.

50.	What is Broker and how Kafka utilize broker for communication?
Answer» BROKER are the system which is responsible to maintaining the publish data. Each broker may have one or more than one partition. Kafka contain MULTIPLE broker to MAIN the LOAD balancer. Kafka broker are stateless E.g.: Let’s say there are N partition in a TOPIC and there is N broker, then each broker has 1 partition.

Explore topic-wise InterviewSolutions in .

How do we design consumer groups in Kafka for high throughput?

What is the poll loop in Kafka?

In a consumer group, what is the process of assigning a partition to a particular consumer?

How is multi-tenancy achieved in Kafka?

Explain the anatomy of the Kafka topic?

What are the main features of Kafka that make it suitable for data integration and data processing in real-time?

How Kafka fit in microservices architecture?

Explain producer API in Kafka?

what is the consumer group in Kafka?

What is the difference between a shared message queue and traditional publisher-subscriber message queue?

Explain steps for Kafka installation?

What is the main difference between Kafka and Flume?

How do we send large messages with Kafka?

How do we achieve FIFO behaviour in Kafka?

What does series in Kafka?

What is partition key in Kafka?

How do we start the Kafka server?

What is geo-replication in Kafka?

What is the core API in Kafka?

How Apache Kafka is different then rabbitMQ?

What is Kafka Mirror Maker?

What is a producer in Kafka? What are the different types of Kafka producer APIs? How does Kafka producer write data to a topic containing multiple partitions?

Suggest some use cases or scenarios where Kafka is a good fit? What are the use cases in which you would prefer to use a messaging system other than Kafka?

How can Kafka producer maintain exactly once semantics?

What is meant by Consumer Lag? How can you monitor it?

Package which need to import in java/scala?

Maven dependencies needed for Kafka? Below maven dependency is enough to configure the Kafka ecosystem in the application

Role of zookeeper in Kafka?

What is a architecture of Zookeeper?

What is the use case where Kafka doesn’t fit?

What are the key advantages of using Kafka?

What is the working principle of Kafka?

What is a role of consumer in Kafka?

How producer works in the Kafka?

What is Kafka cluster and what is the key benefits of creating Kafka cluster?

What ensures load balancing in Kafka?

What are the main advantages of using Kafka?

Within the producer can you explain when will you experience QueueFullException occur?

What is replication critical in Kafka environment?

What is leader and follower in Kafka environment?

What is an offset in Kafka?

What is ZooKeeper in Kafka? Can we use Kafka without ZooKeeper?

what are the different components of Kafka?

What is Kafka and what are other alternatives to Kafka?

What is meant by Kafka producer Acknowledgement? What are the different types of acknowledgment settings provided by Kafka?

What is an offset in Kafka? What are the different ways to commit an offset? Where does Kafka maintain offset?

What is meant by fault tolerance? How does Kafka provide fault tolerance?

What is Dumb Broker/Smart Producer vs Smart Broker/Dumb Consumer? What model does Apache Kafka follow?

Let’s say that a producer is writing records to a Kafka topic at 10000 messages/sec while the consumer is only able to read 2500 messages per second. What are the different ways in which you can scale up your consumer?

What is Broker and how Kafka utilize broker for communication?