Answer» Kafka and Flume are both offerings from the Apache Software Foundation, but there are some key differences. Below is an overview of each to understand the differences:
Kafka - Kafka is a distributed publisher-subscriber messaging system. It enables subscribers to read exactly the messages they are interested in: a subscriber system subscribes to the topics (different categories of messages) it cares about. (A minimal producer/consumer sketch follows this list.)
- Kafka allows even late-joining consumers to read messages, because messages are persisted until they expire. This is why it is called a pull framework.
- Kafka persists data for a configurable period of time, so the same data can be reprocessed any number of times by any number of consumer groups, and, crucially, the rate of those events will not overload the databases or the processes trying to load data into them.
- Kafka can be used by any system to connect to other systems that need enterprise-level messaging (website activity tracking, operational metrics, stream processing, and so on). It is a general-purpose publisher-subscriber system and can work with any producer or subscriber system.
- Kafka is highly flexible and scalable. One of its key advantages is that it is easy to add a large number of consumers without affecting performance and without downtime.
- High availability makes it recoverable in case of downtime.
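To make the publisher-subscriber and pull semantics concrete, here is a minimal sketch using the standard Kafka Java client. The broker address (localhost:9092), topic name (web-clicks), and group id (dashboards) are illustrative assumptions, not values from the answer above.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class KafkaPubSubSketch {
    public static void main(String[] args) {
        // Producer: publishes messages to a topic; any number of systems can produce.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("web-clicks", "user-42", "clicked /home"));
        }

        // Consumer: subscribes to the topic it is interested in and PULLS records.
        // A late-joining consumer group can still read messages that have not yet expired.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "dashboards");                 // illustrative group id
        consumerProps.put("auto.offset.reset", "earliest");          // start from the oldest retained message
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("web-clicks"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
            }
        }
    }
}
```

Because each consumer group tracks its own offsets and pulls at its own pace, more consumer groups can be added without any change on the producer side.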
Flume - Flume was built to ingest data into Hadoop. It is tightly integrated with Hadoop's monitoring system, file system, file formats, and utilities, and most Flume development is aimed at keeping it compatible with Hadoop.
- Flume is a push framework, which implies data loss when consumers cannot keep up. Its primary purpose is delivering messages to HDFS and HBase. (A sample agent configuration follows this list.)
- Flume is not as scalable as Kafka: adding more consumers means changing the topology of the Flume pipeline configuration and replicating the channel to deliver messages to another sink. This does not scale well when you have a very large number of consumers, and since the Flume topology has to be changed, it also requires some downtime.
- Flume does not replicate events: if a Flume agent fails, you lose the events held in its channel.
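As a concrete illustration of the Flume push model, below is a minimal sketch of a Flume 1.x agent configuration that tails a log file and pushes events into HDFS. The agent name (a1), log file path, and NameNode address are assumptions for the example.

```properties
# Hypothetical agent "a1": one source, one memory channel, one HDFS sink
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Source: tail an application log file (assumed path)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Channel: in-memory buffer; events sitting here are lost if the agent fails
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: push events into HDFS (assumed NameNode address)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/logs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```

Such an agent would typically be started with `bin/flume-ng agent --conf conf --conf-file flume.conf --name a1`. Note that the memory channel in this sketch is exactly where events can be lost on agent failure, as mentioned in the last point above.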
When to use:
1. Flume: when working with non-relational data sources, such as log files, that are to be streamed into Hadoop. Kafka: when you need a highly reliable and scalable enterprise-level system to connect many different systems (including Hadoop).
2. Kafka for Hadoop: Kafka acts like a pipeline that collects data in real time and pushes it to Hadoop. Hadoop processes it internally and then, as required, either serves it to other consumers (dashboards, BI, and so on) or stores it for further processing. (A small CLI sketch after the table shows several consumer groups reading the same stream.)
The table below summarizes the differences:
| Kafka | Flume |
|---|---|
| Apache Kafka is a general-purpose tool for multiple producers and consumers. | Apache Flume is a special-purpose tool for specific applications. |
| It replicates the events. | It does not replicate the events. |
| Kafka supports data streams for multiple applications. | Flume is specific to Hadoop and big data analysis. |
| Apache Kafka can process and monitor data in distributed systems. | Apache Flume gathers data from distributed systems into a centralized data store. |
| Kafka supports large sets of publishers, subscribers, and multiple applications. | Flume supports a large set of source and destination types to land data on Hadoop. |
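To illustrate how one persisted Kafka stream can feed several independent consumers (for example a dashboard and a Hadoop ingestion job), here is a sketch using the CLI tools that ship with the Kafka distribution. The topic name, retention value, and group names are assumptions; a single local broker at localhost:9092 is assumed.

```bash
# Create a topic and keep its messages for 7 days (retention.ms is in milliseconds)
bin/kafka-topics.sh --create --topic web-clicks \
  --bootstrap-server localhost:9092 \
  --partitions 3 --replication-factor 1 \
  --config retention.ms=604800000

# Two independent consumer groups read the same stream at their own pace;
# a late-joining group can still replay everything within the retention window.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic web-clicks --group dashboards --from-beginning
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic web-clicks --group hadoop-ingest --from-beginning
```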