How do we design consumer groups in Kafka for high throughput?

Answer:

Let's consider a scenario where we need to read data from a Kafka topic, run some custom validation on it, and only then write it into a data storage system. To do this, we would develop a consumer application that subscribes to the topic. Once subscribed, the application starts receiving messages from the topic, and the validation and storage steps run on each of them. Now suppose the rate at which messages are published to the topic exceeds the rate at which our consumer application can consume them.
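To see why this is a problem, here is a simplified model of consumer lag (the number of published messages not yet processed), assuming constant publish and consume rates and zero lag at the start. The function name `lag_after` and the rates used are hypothetical, chosen only for illustration.

```python
def lag_after(produce_rate: float, consume_rate: float, seconds: float) -> float:
    """Consumer lag (unprocessed messages) after `seconds`, starting from zero lag.

    Simplified model: rates are constant, and lag never drops below zero.
    """
    return float(max(0, (produce_rate - consume_rate) * seconds))


# Hypothetical rates: 1,000 msg/s published, 600 msg/s validated and stored.
print(lag_after(1000, 600, 60))  # 24000.0 messages behind after one minute
```

As long as the publish rate stays above the consume rate, the lag keeps growing linearly, so a single consumer never catches up.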

If we stick with a single consumer, it may fall further and further behind the incoming messages. The solution is to add more consumers, which scales up consumption of the topic. This is done by creating a consumer group: a set of consumers with the same behaviour (sharing the same group ID) that read from the same topic and split the workload between them. Within a group, each partition of the topic is assigned to exactly one consumer, so adding consumers (up to the number of partitions) increases consumption throughput. If we have a single consumer for a topic with 4 partitions, that consumer reads messages from all 4 partitions.
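A rough sizing rule follows from this: run enough consumers to match the publish rate, but no more than the number of partitions, because within one group a partition is read by only one consumer. The sketch below assumes every partition carries an equal share of the traffic and every consumer processes messages at the same rate; `consumers_needed` and the numbers are hypothetical.

```python
import math


def consumers_needed(produce_rate: float, per_consumer_rate: float, partitions: int) -> int:
    """Consumers worth running in one group to keep up with the producers.

    Assumes evenly loaded partitions and identical consumers. Within a group
    each partition goes to a single consumer, so the useful count is capped
    at the number of partitions.
    """
    wanted = math.ceil(produce_rate / per_consumer_rate)
    return min(wanted, partitions)


print(consumers_needed(1000, 400, 4))  # 3 (ceil(2.5) = 3)
print(consumers_needed(1000, 100, 4))  # 4 (10 wanted, capped by 4 partitions)
```

When the cap is hit (the second call), adding consumers to the group no longer helps; the topic needs more partitions, or each consumer needs to process faster.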

The ideal architecture for this scenario has four consumers in the group, each reading messages from its own partition.

Adding more consumers than there are partitions, however, leaves the extra consumers sitting idle, which is not a good architecture design either.
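The three layouts described above (one, four, and five consumers on a 4-partition topic) can be reproduced with a toy round-robin assignment. This is a simplified illustration, not Kafka's own assignor code; Kafka ships several assignment strategies (for example range, round-robin and sticky), and `assign_round_robin` is a hypothetical helper.

```python
from __future__ import annotations


def assign_round_robin(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Spread a topic's partitions over the consumers of one group.

    Simplified round-robin: each partition goes to exactly one consumer,
    so consumers beyond the partition count end up with nothing.
    """
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


for n in (1, 4, 5):
    group = [f"consumer-{i}" for i in range(n)]
    assignment = assign_round_robin(4, group)
    idle = [c for c, parts in assignment.items() if not parts]
    print(n, assignment, "idle:", idle)

# 1 {'consumer-0': [0, 1, 2, 3]} idle: []
# 4 {'consumer-0': [0], 'consumer-1': [1], 'consumer-2': [2], 'consumer-3': [3]} idle: []
# 5 {'consumer-0': [0], 'consumer-1': [1], 'consumer-2': [2], 'consumer-3': [3], 'consumer-4': []} idle: ['consumer-4']
```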

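Another scenario is to have more than one consumer group subscribed to the same topic. Each group tracks its own offsets, so every group receives all the messages published to the topic, independently of the other groups.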
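The toy in-memory topic below sketches how multiple groups coexist: the topic keeps a single log, and each group keeps its own position (offset) in it. It is not the Kafka client API; `TopicLog`, its methods, and the group names are hypothetical, the topic has only one partition, and offsets are committed automatically on every poll for simplicity.

```python
from __future__ import annotations

from collections import defaultdict


class TopicLog:
    """Toy single-partition topic: an append-only log plus one offset per group."""

    def __init__(self) -> None:
        self.records: list[str] = []
        # group id -> offset of the next record that group will read
        self.committed: defaultdict[str, int] = defaultdict(int)

    def produce(self, value: str) -> None:
        self.records.append(value)

    def poll(self, group_id: str, max_records: int = 10) -> list[str]:
        start = self.committed[group_id]
        batch = self.records[start:start + max_records]
        self.committed[group_id] = start + len(batch)  # auto-commit, for simplicity
        return batch


log = TopicLog()
for value in ("a", "b", "c"):
    log.produce(value)

print(log.poll("validation-service"))  # ['a', 'b', 'c']
print(log.poll("audit-service"))       # ['a', 'b', 'c'] (independent offset)
print(log.poll("validation-service"))  # [] (this group already consumed them)
```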
