InterviewSolution
1. How is Caching relevant in Spark Streaming?
Answer: Spark Streaming divides the incoming data stream into batches of X seconds, and the resulting sequence of batches is called a DStream (discretized stream), which is internally a sequence of RDDs, one per batch. DStreams let developers cache the stream's data in memory. This is very useful when the same DStream data is used for multiple computations, because each batch is then computed once and reused instead of being recomputed for every computation. Caching is done with the cache() method, or with the persist() method and an appropriate persistence level; persisting a DStream persists every RDD in it. For input streams that receive data over the network, such as Kafka, Flume, etc., the default persistence level replicates the data to two nodes to provide fault tolerance.
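The sketch below does not use Spark. It models, in plain Python, why caching pays off when one batch feeds several computations. `MicroBatch`, its `cache()` and `collect()` methods, and `two_computations()` are hypothetical names chosen for illustration, and the "mark now, store on first use" behaviour is a simplified stand-in for how Spark lazily persists RDDs. In real PySpark Streaming code the corresponding step would be calling `cache()` or `persist(...)` on the DStream itself, for example `lines.persist(StorageLevel.MEMORY_AND_DISK)`, assuming `lines` is an existing DStream.

```python
from typing import Callable, List, Optional, Tuple


class MicroBatch:
    """Hypothetical stand-in for one batch of a DStream (an RDD in real Spark)."""

    def __init__(self, source: Callable[[], List[int]]) -> None:
        self._source = source                    # recomputes the batch from its input
        self._persisted = False                  # set by cache()
        self._data: Optional[List[int]] = None   # in-memory copy, once materialized
        self.source_reads = 0                    # how often the input was recomputed

    def cache(self) -> "MicroBatch":
        # Only marks the batch; the data is stored on first use (simplified lazy persist).
        self._persisted = True
        return self

    def collect(self) -> List[int]:
        if self._data is not None:               # cache hit: no recomputation
            return self._data
        self.source_reads += 1                   # cache miss: recompute from the source
        data = self._source()
        if self._persisted:
            self._data = data
        return data


def two_computations(use_cache: bool) -> Tuple[int, int, int]:
    """Run two computations over the same batch; return (sum, even count, source reads)."""
    batch = MicroBatch(lambda: list(range(1, 11)))  # hypothetical 10-record batch
    if use_cache:
        batch.cache()
    total = sum(batch.collect())                           # computation 1
    evens = sum(1 for x in batch.collect() if x % 2 == 0)  # computation 2
    return total, evens, batch.source_reads


if __name__ == "__main__":
    print(two_computations(use_cache=False))  # (55, 5, 2)
    print(two_computations(use_cache=True))   # (55, 5, 1)
```

Both runs produce the same results, but without caching the batch is recomputed once per computation, while with caching it is computed once and then served from memory.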
The main advantages of caching are: