1.

What is Spark Streaming and how is it implemented in Spark?

Answer»

Spark Streaming is one of the most important features provided by Spark. It is a Spark API extension that supports stream processing of data from different sources.

  • Data from sources like Kafka, Kinesis, and Flume is processed and pushed to various destinations such as databases, dashboards, machine learning APIs, or simply file systems. The incoming stream is divided into small batches (micro-batches) which are then processed.
  • Spark Streaming supports highly scalable, fault-tolerant continuous stream processing, commonly used in cases like fraud detection, website monitoring, clickstream analysis, IoT (Internet of Things) sensors, etc.
  • Spark Streaming first divides the data from the data stream into batches of a configurable interval (e.g., X seconds), called DStreams or Discretized Streams. Internally, a DStream is nothing but a sequence of RDDs. The Spark application processes these RDDs using various Spark APIs, and the results of this processing are again returned as batches. The following diagram explains the workflow of the Spark Streaming process.
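The micro-batching idea above can be illustrated with a minimal sketch in plain Python (this is not the Spark API; the helper names `make_batches` and `word_counts` are invented for illustration). Each batch plays the role of one RDD in a DStream, and the per-batch function plays the role of an RDD transformation:

```python
# Conceptual sketch of discretized streams, in plain Python (not PySpark).
# A DStream is internally a sequence of RDDs, each holding the records that
# arrived during one batch interval; here a "batch" is just a list of lines.
from collections import Counter

def make_batches(records, batch_size):
    """Slice an incoming stream of records into fixed-size micro-batches,
    mimicking how Spark Streaming discretizes a stream into per-interval RDDs."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]

def word_counts(batch):
    """Per-batch processing step, analogous to applying RDD transformations
    (flatMap + reduceByKey in real Spark word-count examples)."""
    return Counter(word for line in batch for word in line.split())

stream = ["spark streaming", "spark rdd", "streaming rdd rdd"]
batches = make_batches(stream, batch_size=2)    # two micro-batches
results = [word_counts(b) for b in batches]     # one result per batch
print(results)
```

In real Spark Streaming, the batch interval is set when creating the streaming context, and the results of each batch can be pushed to a database, dashboard, or file system as described above.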
