InterviewSolution
| 1. |
Describe the Spark Memory model. What is the difference between Off-heap and On-heap memory? |
|
Answer» On-heap memory refers to objects stored on the JVM heap and subject to JVM garbage collection (GC). Off-heap memory holds objects outside the Java heap in serialized form; it is managed by the application rather than by the garbage collector. Spark uses this approach because it avoids frequent GC pauses and gives tight control over the lifecycle of objects. The trade-off is that the application must implement its own logic for allocating and releasing memory, as Spark does. Since version 1.6, Spark has followed the Unified Memory model, in which Storage memory and Execution memory share one region and each can borrow the other’s free space. By default, Spark uses on-heap memory only. Its size is configured with the parameter ‘spark.executor.memory’ when the job is submitted. The on-heap memory area is divided into four parts: Reserved memory, User memory, Storage memory, and Execution memory.
Off-heap memory can be enabled by setting the parameter ‘spark.memory.offHeap.enabled’ to true and giving ‘spark.memory.offHeap.size’ a positive value. This memory area consists of only two parts: Storage memory and Execution memory. When off-heap memory is enabled, an executor uses both on-heap and off-heap memory. |
|
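To make the split concrete, here is a rough sketch in plain Python that estimates how an executor's memory would be divided under the Unified Memory model. It is not Spark code. The function `estimate_layout` and the class `MemoryLayout` are hypothetical names made up for this illustration. The numbers it uses (a fixed 300 MiB reserved region, `spark.memory.fraction` = 0.6, `spark.memory.storageFraction` = 0.5) are assumed to match the defaults of recent Spark releases, so check the configuration page for your version. Real executors also see slightly less heap than `spark.executor.memory`, so treat the result as an approximation.

```python
from dataclasses import dataclass

MIB = 1024 * 1024
# Assumption: Spark's unified memory manager sets aside a fixed 300 MiB.
RESERVED_BYTES = 300 * MIB


@dataclass(frozen=True)
class MemoryLayout:
    """Hypothetical container for the estimated regions, all in bytes."""
    reserved: int
    user: int
    storage: int
    execution: int
    offheap_storage: int
    offheap_execution: int


def estimate_layout(heap_bytes, memory_fraction=0.6, storage_fraction=0.5,
                    offheap_bytes=0):
    """Estimate on-heap and off-heap regions for one executor.

    heap_bytes       ~ spark.executor.memory
    memory_fraction  ~ spark.memory.fraction (assumed default 0.6)
    storage_fraction ~ spark.memory.storageFraction (assumed default 0.5)
    offheap_bytes    ~ spark.memory.offHeap.size (0 means off-heap disabled)
    """
    if heap_bytes <= RESERVED_BYTES:
        raise ValueError("heap must be larger than the reserved region")
    usable = heap_bytes - RESERVED_BYTES
    # Unified region shared by Storage and Execution memory.
    unified = int(usable * memory_fraction)
    # The storage/execution boundary is soft: either side may borrow
    # free space from the other at runtime.
    storage = int(unified * storage_fraction)
    execution = unified - storage
    # Whatever is left of the usable heap is User memory.
    user = usable - unified
    # Off-heap has only Storage and Execution parts.
    offheap_storage = int(offheap_bytes * storage_fraction)
    offheap_execution = offheap_bytes - offheap_storage
    return MemoryLayout(RESERVED_BYTES, user, storage, execution,
                        offheap_storage, offheap_execution)


# Example: a 4 GiB executor heap with 1 GiB of off-heap memory enabled.
layout = estimate_layout(4 * 1024 * MIB, offheap_bytes=1024 * MIB)
print(layout)
```

The four on-heap fields always add up to the heap size passed in, and the two off-heap fields add up to the off-heap size, which mirrors the four-part and two-part descriptions in the answer.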