1.

Describe the partitioning key in Kafka.

Answer»

In Kafka terminology, messages are referred to as records. Each record has a key and a value, and the key is optional. The record's key is used to partition records. Each topic has one or more partitions. A partition is a simple data structure: an append-only sequence of records, ordered by the time they were appended. Once a record is written to a partition, it is given an offset, a sequential id that reflects the record's position in the partition and uniquely identifies it within that partition.
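To make the append-only log and its offsets concrete, here is a minimal in-memory sketch. The `Partition` class and its `append` and `read` methods are hypothetical names chosen for illustration; they are not part of any Kafka client API.

```python
class Partition:
    """Toy model of a partition: an append-only list of (key, value) records."""

    def __init__(self):
        self._records = []

    def append(self, key, value) -> int:
        # The offset is simply the record's position in the log.
        offset = len(self._records)
        self._records.append((key, value))
        return offset

    def read(self, offset: int):
        # An offset uniquely identifies one record within this partition.
        return self._records[offset]


partition = Partition()
first = partition.append(None, "record without a key")   # the key is optional
second = partition.append("customer-1", "purchase")
```

Records are never modified or reordered after they are appended, which is why an offset remains a stable identifier.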

Partitioning is done using the record's key. By default, the Kafka producer hashes the record's key to determine which partition the record should be written to. As long as the number of partitions in the topic does not change, the producer will always choose the same partition for two records with the same key.
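The idea of mapping a key to a partition can be sketched as a hash taken modulo the partition count. This is only an illustration: `choose_partition` is a hypothetical helper, and `zlib.crc32` is a stand-in for whatever hash function the real producer uses, so the partition numbers it yields will not match Kafka's.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index (stand-in hash, not Kafka's)."""
    return zlib.crc32(key) % num_partitions


# The same key maps to the same partition while the partition count is fixed.
p1 = choose_partition(b"customer-42", 6)
p2 = choose_partition(b"customer-42", 6)
```

Because the result depends on `num_partitions`, adding partitions to a topic can change where a given key lands.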

This is important because we may have to deliver records to consumers in the same order in which they were produced. Suppose a customer purchases an eBook from your webshop and subsequently cancels the transaction: you want these events to arrive in the order they were created. If you receive the cancellation event before the purchase event, the cancellation will be rejected as invalid (since the purchase has not yet been registered in the system), and the system will then record the purchase and send the product to the customer (and lose you money). To solve this problem and guarantee ordering, you can use the customer id as the key of these Kafka records. This ensures that all of a customer's purchase events land in the same partition, where their order is preserved.
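The eBook scenario can be simulated in a few lines. This is a sketch under stated assumptions: the partition count, the `partition_for` and `produce` helpers, and the `zlib.crc32` hash are all hypothetical stand-ins, and the "topic" is just a dictionary of lists rather than a real broker.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for the topic


def partition_for(customer_id: str) -> int:
    # Stand-in hash; a real producer uses its own partitioner.
    return zlib.crc32(customer_id.encode("utf-8")) % NUM_PARTITIONS


def produce(events):
    """Append each (customer_id, event) pair to the partition chosen by its key."""
    partitions = defaultdict(list)
    for customer_id, event in events:
        partitions[partition_for(customer_id)].append((customer_id, event))
    return partitions


events = [
    ("customer-1", "purchase"),
    ("customer-2", "purchase"),
    ("customer-1", "cancel"),
]
partitions = produce(events)

# All events for customer-1 sit in one partition, in the order they were sent.
log = partitions[partition_for("customer-1")]
customer_1_events = [event for cid, event in log if cid == "customer-1"]
```

Events for different customers may interleave or land in different partitions, but the purchase for a given customer always precedes that customer's cancellation.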


