Kafka Interview Questions
This section covers frequently asked Kafka interview questions, each with a detailed answer to sharpen your knowledge and support exam preparation.
| 1. |
What are Znodes in Kafka Zookeeper? How many types of Znodes are there? |
|
Answer» The nodes in a ZooKeeper tree are called znodes. Znodes maintain a stat structure that includes version numbers for data changes and ACL changes, along with timestamps. ZooKeeper uses the version number and timestamp to validate its cache and to coordinate updates. Each time the data in a znode changes, its version number increases. There are three different types of znodes:
- Persistent znodes: the default type; they remain in ZooKeeper until they are explicitly deleted, even after the session that created them ends.
- Ephemeral znodes: automatically deleted when the session that created them ends; they cannot have children.
- Sequential znodes: ZooKeeper appends a monotonically increasing counter to the requested name; a znode can be sequential and either persistent or ephemeral. |
|
| 2. |
Differentiate between Kafka streams and Spark Streaming. |
|
Answer» Kafka Streams is a client library for building stream-processing applications that read from and write to Kafka topics, while Spark Streaming is a component of Apache Spark for processing data streams on a Spark cluster. The following table illustrates the key differences:

| Kafka Streams | Spark Streaming |
| --- | --- |
| A lightweight library embedded in your application; no separate processing cluster is required. | Runs on a Spark cluster; cluster infrastructure must be provisioned and managed. |
| Processes records one event at a time, giving millisecond-level latency. | Processes data in micro-batches, which typically results in higher latency. |
| Works only with Kafka as the source and sink. | Can consume from and write to many sources and sinks (Kafka, HDFS, sockets, and so on). |
| Fault tolerance comes from Kafka's own replication and changelog topics. | Fault tolerance comes from RDD lineage and checkpointing. |
|
| 3. |
How will you change the retention time in Kafka at runtime? |
|
Answer» A topic's retention time can be configured in Kafka. The default retention time for a topic is seven days. We can set the retention time while creating a new topic. When a topic is created, the broker property log.retention.hours is used to set the retention time. When the configuration of a currently operating topic needs to be modified, a command-line tool such as kafka-topics.sh must be used; the right command depends on the Kafka version in use.
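As a sketch (the topic name `sample-topic` and the broker/ZooKeeper addresses are assumptions), the retention of a live topic can be changed dynamically with the kafka-configs.sh tool in recent Kafka versions, or with kafka-topics.sh --alter in older ones:

```shell
# Newer Kafka versions: alter the topic-level retention.ms at runtime
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name sample-topic \
  --alter --add-config retention.ms=604800000   # 7 days in milliseconds

# Older versions: the same change via the (since-deprecated) topics tool
bin/kafka-topics.sh --zookeeper localhost:2181 --alter \
  --topic sample-topic --config retention.ms=604800000
```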
|
|
| 4. |
What do you mean by BufferExhaustedException and OutOfMemoryException in Kafka? |
|
Answer» When the producer cannot allocate memory for a record because the buffer is full, a BufferExhaustedException is thrown. The exception is thrown if the producer is in non-blocking mode and the rate of production exceeds, for long enough, the rate at which data is drained from the buffer, so that the allocated buffer is depleted. An OutOfMemoryException may arise if the consumers are handling huge messages, or if there is a spike in the number of messages arriving faster than the rate of downstream processing. As a result, the message queue fills up and consumes memory space. |
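The producer-side settings that govern this buffer are `buffer.memory` (total bytes available for buffering unsent records) and `max.block.ms` (how long a send may block before an exception is raised). A minimal sketch using the console producer (the values and topic name are illustrative assumptions, not recommendations):

```shell
# Producer settings that control buffering behavior (illustrative values):
#   buffer.memory - total bytes the producer may use to buffer unsent records
#   max.block.ms  - how long send() may block before raising an exception
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 \
  --topic sample-topic \
  --producer-property buffer.memory=33554432 \
  --producer-property max.block.ms=5000
```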
|
| 5. |
Can the number of partitions for a topic be changed in Kafka? |
|
Answer» Currently, Kafka does not allow you to reduce the number of partitions for a topic. The partitions can be increased but not decreased. The alter command in Apache Kafka allows you to change the behavior of a topic and its associated configurations, and it can be used to add extra partitions. To increase the number of partitions to five, use the following command: ./bin/kafka-topics.sh --alter --zookeeper localhost:2181 --topic sample-topic --partitions 5 |
|
| 6. |
What do you mean by graceful shutdown in Kafka? |
|
Answer» The Kafka cluster will automatically detect any broker shutdown or failure. In that case, new leaders are elected for the partitions previously led by that broker. This can happen as a result of a server failure, or even when a server is brought down intentionally for maintenance or configuration changes. When a server is taken down on purpose, Kafka provides a graceful mechanism for stopping the server rather than killing it. When a server is shut down gracefully:
- It syncs all of its logs to disk, so that no log recovery is needed when it restarts; this makes restarts faster.
- Before shutting down, it migrates the leadership of any partitions it leads to other replicas, so each partition is unavailable only for a few milliseconds.
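A minimal sketch of how this looks operationally (controlled shutdown is governed by the real broker property `controlled.shutdown.enable`, which defaults to true in modern versions):

```shell
# server.properties: leadership migration happens automatically on shutdown
# when controlled shutdown is enabled (the default in modern versions):
#   controlled.shutdown.enable=true

# Stop the broker gracefully instead of killing the process:
bin/kafka-server-stop.sh
```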
|
|
| 7. |
How will you expand a cluster in Kafka? |
|
Answer» To add a server to a Kafka cluster, it only needs to be assigned a unique broker id, and Kafka must be started on that server. However, a new server will not be assigned any data partitions until a new topic is created. As a result, when a new machine is introduced to the cluster, some existing data must be migrated to it. To relocate some partitions to the new broker, we use the partition reassignment tool. Kafka will make the new server a follower of each partition it is migrating, allowing it to fully replicate the data on that partition. When all of the data has been replicated, the new server joins the in-sync replica set (ISR), and one of the existing replicas deletes the data it holds for that partition. |
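The partition reassignment tool is kafka-reassign-partitions.sh. A sketch (the topic name, broker ids, and JSON file name are assumptions; newer versions take --bootstrap-server, older ones --zookeeper):

```shell
# expand.json: move sample-topic's partition 0 onto brokers 1, 2, and new broker 3
# {"version":1,
#  "partitions":[{"topic":"sample-topic","partition":0,"replicas":[1,2,3]}]}

# Start the reassignment...
bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file expand.json --execute

# ...and check on its progress until it completes
bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file expand.json --verify
```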
|
| 8. |
What do you mean by an unbalanced cluster in Kafka? How can you balance it? |
|
Answer» Adding new brokers to an existing Kafka cluster is as simple as assigning each one a unique broker id, listeners, and log directory in the server.properties file. However, these brokers will not be allocated any data partitions from the cluster's existing topics, so they won't be doing much work unless partitions are moved to them or new topics are created. A cluster is referred to as unbalanced if it has either of the following problems:

Leader Skew: Consider a topic with three partitions and a replication factor of three across three brokers. The leader receives all reads and writes on a partition. Followers send fetch requests to the leader in order to receive its most recent messages; they exist solely for redundancy and fail-over purposes. Now consider a broker that has failed. The failed broker may have been the leader for numerous partitions. For each of those partitions, one of its followers on the other brokers is promoted to leader. Because fail-over to an out-of-sync replica is not allowed, a follower must be in sync with the leader in order to be promoted. If another broker then goes down, all of the leaders can end up on the same broker, leaving no redundancy. Even when the two failed brokers come back up, the partitions regain redundancy, but the leaders stay concentrated on the surviving broker. This leaves the Kafka brokers with a leader imbalance: a cluster is in a leader-skewed state when a node is the leader for more partitions than (number of partitions / number of brokers).

Solving the leader skew problem: Kafka offers the ability to reassign leadership to the preferred replicas. This can be accomplished in one of two ways:
- Set the broker configuration auto.leader.rebalance.enable=true, so that the controller automatically transfers leadership back to the preferred replicas.
- Run the preferred leader election tool manually (kafka-preferred-replica-election.sh in older versions, kafka-leader-election.sh in newer ones).
Broker Skew: Consider a Kafka cluster with nine brokers and a topic named "sample_topic" whose partitions are distributed unevenly, with brokers 3, 4, and 5 each holding more of its partitions than the others. The topic "sample_topic" is then skewed on brokers 3, 4, and 5, because a broker is considered skewed when the number of partitions it holds for a given topic is greater than the average. Solving the broker skew problem: the partition reassignment tool can be used to fix it:
- Generate a candidate reassignment with the tool's --generate option, listing the topic and the full set of brokers to spread it over.
- Apply the proposed assignment with --execute.
- Check progress with --verify until the reassignment completes.
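Leader rebalancing can be triggered from the command line with Kafka's election tool; a sketch (newer versions ship kafka-leader-election.sh, older ones kafka-preferred-replica-election.sh; the broker address is an assumption):

```shell
# Trigger preferred leader election for all topic partitions,
# moving leadership back to each partition's preferred replica
bin/kafka-leader-election.sh --bootstrap-server localhost:9092 \
  --election-type preferred --all-topic-partitions
```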
|
| 9. |
What are the guarantees that Kafka provides? |
|
Answer» Following are the guarantees that Kafka provides:
- Messages sent by a producer to a particular topic partition are appended in the order they are sent: if message M1 is sent before M2 by the same producer, M1 will have a lower offset and appear earlier in the log.
- A consumer instance sees records in the order in which they are stored in the log.
- For a topic with replication factor N, Kafka tolerates up to N-1 server failures without losing any records committed to the log.
|
|
| 10. |
What do you understand about log compaction and quotas in Kafka? |
|
Answer» Log compaction is a mechanism through which Kafka ensures that, for each topic partition, at least the last known value for each message key within the log of data is retained. This allows state to be restored after an application crash or system failure, and it allows caches to be refreshed after an application restarts during operational maintenance. Because of log compaction, any consumer processing the log from the beginning will see at least the final state of all records in the order in which they were written. A Kafka cluster can apply quotas on produce and fetch requests as of Kafka 0.9. Quotas are byte-rate limits that are set for each client-id, a logical identifier for a request-making application. A single client-id can therefore span numerous producer and consumer instances, and the quota is applied to them all as a single unit. Quotas prevent a single application from monopolizing broker resources and causing network saturation by consuming extremely large amounts of data. |
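The effect of compaction can be sketched outside Kafka: given a stream of key:value records, compaction keeps only the latest value per key, and the retained records keep their original relative (offset) order. A minimal shell simulation (the key:value sample data is made up):

```shell
# Five records, where keys k1 and k2 are each written twice.
# Compaction keeps only the latest value per key, ordered by the
# "offset" (line number) of that latest record.
printf 'k1:v1\nk2:v1\nk1:v2\nk3:v1\nk2:v2\n' |
  awk -F: '{ last[$1] = $2; idx[$1] = NR }   # remember latest value and its offset
           END { for (k in last) print idx[k], k ":" last[k] }' |
  sort -n | cut -d' ' -f2
# prints:
#   k1:v2
#   k3:v1
#   k2:v2
```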
|
| 11. |
Tell me about some of the use cases where Kafka is not suitable. |
|
Answer» Following are some of the use cases where Kafka is not suitable:
- On-the-fly data transformations: Kafka is a transport and streaming layer, not a tool for heavy per-message data conversion; transformation logic belongs in stream-processing frameworks built on top of it.
- Use as a general-purpose data store: although Kafka retains data, it is not a replacement for a database with rich query capabilities.
- When a simple task queue is all that is required: a lighter-weight message broker is a better fit, since Kafka's partitioned log and operational overhead are unnecessary for simple job queuing.
|
|
| 12. |
Describe message compression in Kafka. What is the need of message compression in Kafka? Also mention if there are any disadvantages of it. |
|
Answer» In Kafka, producers transmit data to brokers in JSON format. The JSON format stores data in string form, which can result in the same field names being stored repeatedly across records in a Kafka topic, increasing the amount of disk space used. As a result, data is compressed before messages are delivered to Kafka in order to save disk space. Because message compression is performed on the producer side, no changes to the consumer or broker setup are required. It is advantageous because of the following factors:
- It reduces the size of messages stored on disk, lowering storage cost.
- It reduces the network bandwidth consumed between producers, brokers, and consumers.
- Smaller payloads can reduce the latency of sending messages over the network.
Message compression has the following disadvantages:
- Producers must spend CPU cycles compressing messages, and consumers must spend CPU cycles decompressing them, which increases CPU utilization on both sides.
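The space saving on repetitive JSON records can be sketched with ordinary gzip, which is one of the compression codecs Kafka supports (the sample record below is made up):

```shell
# 1000 near-identical JSON records: compare raw vs gzip-compressed size
raw=$(for i in $(seq 1 1000); do
        echo '{"user":"alice","action":"click","page":"/home"}'
      done)
rawsize=$(printf '%s\n' "$raw" | wc -c)
gzsize=$(printf '%s\n' "$raw" | gzip -c | wc -c)
echo "raw=${rawsize} bytes, gzipped=${gzsize} bytes"
# gzip exploits the repeated field names, so the compressed size is
# a small fraction of the raw size
```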
|
|
| 13. |
What do you mean by confluent kafka? What are its advantages? |
|
Answer» Confluent is an Apache Kafka-based data streaming platform: a full-scale streaming platform capable of not just publish-and-subscribe, but also data storage and processing within the stream. Confluent Kafka is a more comprehensive Apache Kafka distribution. It enhances Kafka's integration capabilities by including tools for optimizing and managing Kafka clusters, as well as methods for securing the streams. Kafka is easy to set up and operate because of the Confluent Platform. Confluent's software comes in three varieties:
- Confluent Open Source: a freely downloadable distribution of Kafka plus open-source components such as connectors, a REST proxy, and a schema registry;
- Confluent Enterprise: a commercial, self-managed distribution with additional management and monitoring tooling;
- Confluent Cloud: a fully managed Kafka service.
Following are the advantages of Confluent Kafka:
- It bundles components such as the Schema Registry, connectors, and a REST proxy that make integrating Kafka with other systems easier.
- It provides tooling for managing, monitoring, and securing Kafka clusters.
- It reduces the operational burden of running Kafka, especially in its fully managed cloud form.
|
|
| 14. |
Differentiate between Kafka and Flume. |
|
Answer» Apache Flume is a dependable, distributed, and available service for efficiently aggregating, collecting, and moving massive amounts of log data. Its architecture is versatile and simple, based on streaming data flows, and it is written in Java. It features its own query processing engine, allowing it to transform each fresh batch of data before sending it to its intended sink, and it is designed to be adaptable. The following table illustrates the differences between Kafka and Flume:

| Kafka | Flume |
| --- | --- |
| A general-purpose distributed publish-subscribe messaging system. | A special-purpose tool for moving log data, typically into Hadoop. |
| Pull-based: consumers pull messages from the brokers at their own pace. | Push-based: sources push events through channels to sinks. |
| Replicates messages across brokers, so data survives a broker failure. | Does not replicate events; if a Flume agent's node fails, events held in its channel can be lost. |
| Can feed many independent consumers and downstream systems. | Typically tightly integrated with Hadoop/HDFS sinks. |
|
| 15. |
What do you understand about Kafka MirrorMaker? |
|
Answer» The MirrorMaker is a standalone utility for copying data from one Apache Kafka cluster to another. The MirrorMaker reads data from topics in the source cluster and writes it to a topic with the same name in the destination cluster. The source and destination clusters are separate entities that can have different partition counts and offset values. |
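A sketch of invoking the classic MirrorMaker tool (the two properties file names are assumptions; the consumer config points at the source cluster, the producer config at the destination):

```shell
# Mirror every topic (whitelist ".*") from the source cluster
# to the destination cluster
bin/kafka-mirror-maker.sh \
  --consumer.config source-cluster.properties \
  --producer.config destination-cluster.properties \
  --whitelist ".*"
```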
|
| 16. |
Differentiate between Kafka and Java Messaging Service(JMS). |
|
Answer» The following table illustrates the differences between Kafka and Java Messaging Service:

| Kafka | JMS |
| --- | --- |
| Consumers pull messages from the broker at their own pace. | Messages are typically pushed by the provider to the consumers. |
| Messages are retained for a configured period even after being consumed, so they can be re-read. | Messages are removed once they have been acknowledged by the consumer. |
| Designed for very high throughput and horizontal scaling through partitioned topics. | Suited to traditional enterprise messaging with moderate throughput. |
| Guarantees ordering within a partition. | Ordering guarantees depend on the provider and destination type. |
|
| 17. |
Describe in what ways Kafka enforces security. |
|
Answer» The security given by Kafka is made up of three parts:
- Encryption: data transferred between the Kafka brokers and the clients can be encrypted using SSL/TLS, so that it cannot be read in transit.
- Authentication: clients can be required to prove their identity to the brokers, using SSL certificates or SASL mechanisms.
- Authorization: once a client is authenticated, access control lists (ACLs) determine which operations it is allowed to perform on which topics.
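A minimal broker-side sketch of these three parts (the hostnames, keystore path, and principal are placeholders; older Kafka versions use a differently named authorizer class):

```shell
# server.properties (illustrative fragment)
# 1. Encryption and 2. Authentication over an SSL listener:
#    listeners=SSL://broker1:9093
#    ssl.keystore.location=/path/to/kafka.server.keystore.jks
#    ssl.keystore.password=<keystore-password>
#    ssl.client.auth=required
# 3. Authorization through ACLs:
#    authorizer.class.name=kafka.security.authorizer.AclAuthorizer

# Grant a principal read access to a topic:
bin/kafka-acls.sh --bootstrap-server broker1:9093 \
  --add --allow-principal User:alice \
  --operation Read --topic sample-topic
```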
|
|
| 18. |
Differentiate between Redis and Kafka. |
|
Answer» The following table illustrates the differences between Redis and Kafka:

| Redis | Kafka |
| --- | --- |
| An in-memory data store that can serve as a database, cache, and message broker. | A distributed event-streaming platform built around a persistent, partitioned log. |
| Pub/sub messages are not persisted: subscribers that are offline miss them. | Messages are persisted on disk and can be replayed by consumers. |
| Best suited for short messages and low-latency data access. | Designed for high-throughput streams and large volumes of data. |
| A channel has no built-in notion of partitioned, parallel consumption. | Partitions let many consumers in a group process a topic in parallel. |
|
| 19. |
What are the parameters that you should look for while optimising kafka for optimal performance? |
|
Answer» Two major kinds of measurements are taken into account while tuning for optimal performance: latency measures, which relate to the amount of time it takes to process one event, and throughput measures, which refer to the number of events that can be processed in a given length of time. Most systems are tuned for one of the two, latency or throughput, whereas Kafka can balance both. Optimizing Kafka's performance involves tuning at the following levels:
- Tuning the Kafka producers: for example batch size, linger time, compression, and acknowledgements;
- Tuning the Kafka brokers: for example the number of partitions per topic and the replication settings;
- Tuning the Kafka consumers: for example fetch sizes and the number of consumers in a consumer group.
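The producer-side knobs can be sketched with the console producer (the values and topic name are illustrative assumptions, not recommendations; each setting trades latency against throughput or durability):

```shell
# Throughput-oriented producer settings (illustrative values):
#   batch.size       - bytes to batch per partition before sending
#   linger.ms        - how long to wait for a batch to fill
#   compression.type - compress batches, trading CPU for bandwidth
#   acks             - durability vs latency trade-off
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 \
  --topic sample-topic \
  --producer-property batch.size=65536 \
  --producer-property linger.ms=10 \
  --producer-property compression.type=lz4 \
  --producer-property acks=all
```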
|
|
| 20. |
Differentiate between Rabbitmq and Kafka. |
|
Answer» Following are the differences between Kafka and RabbitMQ:
Based on Architecture:
RabbitMQ:
- A general-purpose message broker built around queues and exchanges; producers send messages to exchanges, which route them to queues.
- Follows the traditional broker-centric, largely push-based messaging model.
Kafka:
- A distributed event-streaming platform built around a partitioned, replicated, append-only log.
- Consumers pull messages and track their own position (offset) in the log.
Manner of Handling Messages:
- RabbitMQ removes a message from the queue once it has been acknowledged, whereas Kafka retains messages for a configured period even after consumption, so the same message can be re-read by many consumers.
Based on Approach:
- RabbitMQ pushes messages to consumers, while Kafka consumers pull messages from the brokers at their own pace.
Based on Performance:
- Kafka is designed for very high throughput, on the order of millions of messages per second across a cluster, whereas RabbitMQ typically handles lower throughput per node, on the order of tens of thousands of messages per second.
|
| 21. |
What is a Replication Tool in Kafka? Explain some of the replication tools available in Kafka. |
|
Answer» The Kafka Replication Tool is used to create a high-level design for the replica maintenance process. The following are some of the replication tools available:
- Create Topic Tool: creates a topic with a default number of partitions and replication factor, using Kafka's default scheme for replica assignment.
- List Topic Tool: lists the information (such as partition, leader, and replica details) for a given list of topics, or for all topics if none is specified.
- Add Partition Tool: adds more partitions to an existing topic, along with replica assignment for the new partitions.
|
|
| 22. |
What do you mean by multi-tenancy in Kafka? |
|
Answer» Multi-tenancy is a software operation mode in which multiple instances of one or more programs operate independently of one another in a shared environment. The instances are physically integrated but logically separated. The level of logical isolation in a system that supports multi-tenancy must be comprehensive, while the level of physical integration can vary. Kafka is multi-tenant because it allows many topics to be configured for data consumption and production on the same cluster, with quotas and ACLs available to isolate tenants from one another. |
|