1. What is caching or persistence in Spark? How are the two different from each other? What are the various storage levels for persisting RDDs?
Answer» Caching or persistence is an optimization technique that saves the result of an RDD evaluation to memory or disk. An RDD can be involved in multiple transformations and actions, and without caching, each action re-evaluates the RDD's entire lineage. This wastes both time and compute, and it can easily be avoided by caching or persisting the RDD. The difference between cache() and persist() in Spark is that cache() always uses the default storage level (MEMORY_ONLY), while persist() lets you choose from a range of storage levels. There are five main storage levels in Spark:

- MEMORY_ONLY: store the RDD as deserialized objects in memory; partitions that do not fit are recomputed when needed (this is the default, and what cache() uses).
- MEMORY_AND_DISK: keep the RDD in memory, spilling partitions that do not fit to disk.
- MEMORY_ONLY_SER: store the RDD as serialized bytes in memory, trading extra CPU for a smaller memory footprint.
- MEMORY_AND_DISK_SER: like MEMORY_ONLY_SER, but partitions that do not fit in memory are spilled to disk.
- DISK_ONLY: store the RDD partitions only on disk.
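The recomputation problem that caching solves can be sketched without a Spark cluster. The toy class below (hypothetical, for illustration only; `LazyDataset`, `collect`, and the `evaluations` counter are not Spark APIs) mimics an RDD's behavior: each action re-runs the deferred computation unless the dataset has been marked for caching, just as `rdd.cache()` or `rdd.persist()` would do in Spark.

```python
# Conceptual sketch (NOT Spark): a lazy dataset that re-runs its
# computation on every action unless cached, mirroring RDD semantics.
class LazyDataset:
    def __init__(self, compute):
        self._compute = compute      # deferred computation, like an RDD lineage
        self._cached = None
        self._want_cache = False
        self.evaluations = 0         # how many times compute actually ran

    def cache(self):
        # Like RDD.cache(): lazy — only marks the dataset; the result is
        # materialized the first time an action runs.
        self._want_cache = True
        return self

    def collect(self):
        # An "action": serves the cached result if present, else recomputes.
        if self._cached is not None:
            return self._cached
        self.evaluations += 1
        result = self._compute()
        if self._want_cache:
            self._cached = result
        return result

uncached = LazyDataset(lambda: [x * x for x in range(5)])
uncached.collect()
uncached.collect()
assert uncached.evaluations == 2   # lineage re-evaluated for every action

cached = LazyDataset(lambda: [x * x for x in range(5)]).cache()
cached.collect()
cached.collect()
assert cached.evaluations == 1     # computed once, then served from memory
```

In real PySpark the equivalent calls would be `rdd.cache()` or `rdd.persist(StorageLevel.MEMORY_AND_DISK)` before the first action, so that subsequent actions reuse the stored partitions instead of replaying the lineage.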