1. What is the working of DAG in Spark?
Answer» DAG stands for Directed Acyclic Graph, a graph with a finite set of vertices and edges. The vertices represent RDDs and the edges represent the operations to be performed on those RDDs in sequence. When an action is called, the DAG is submitted to the DAG Scheduler, which splits the graph into stages of tasks based on the transformations applied to the data: narrow transformations are grouped into a single stage, while wide (shuffle) transformations introduce a stage boundary. The stage view has the details of the RDDs belonging to that stage, and the Task Scheduler then launches the tasks of each stage on the executors.
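A minimal sketch of this behaviour, assuming a local Spark installation (the application name and sample data are illustrative): the narrow transformations flatMap and map stay within one stage, while the wide reduceByKey forces a shuffle, so the DAG Scheduler splits the job into two stages once an action is called.

```scala
import org.apache.spark.sql.SparkSession

object DagStagesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dag-stages-sketch")   // illustrative name
      .master("local[*]")             // assumes local mode
      .getOrCreate()
    val sc = spark.sparkContext

    val lines  = sc.parallelize(Seq("spark builds a dag", "a dag of stages"))
    val words  = lines.flatMap(_.split(" "))   // narrow: no shuffle
    val pairs  = words.map(w => (w, 1))        // narrow: no shuffle
    val counts = pairs.reduceByKey(_ + _)      // wide: shuffle => stage boundary

    // Nothing runs until an action; collect() submits the DAG to the DAG Scheduler.
    counts.collect().foreach(println)

    // toDebugString prints the lineage; the indentation marks shuffle/stage boundaries.
    println(counts.toDebugString)

    spark.stop()
  }
}
```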
Each RDD keeps a pointer to one or more parent RDDs, along with metadata about its relationship to each parent. For example, for the operation val childB = parentA.map() on an RDD, childB keeps a reference to parentA; this chain of references is called the RDD lineage.
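A short sketch of the lineage idea, mirroring the childB = parentA.map() example above (the RDD names are illustrative): childB records a dependency on parentA, which is what Spark uses to recompute lost partitions.

```scala
import org.apache.spark.sql.SparkSession

object RddLineageSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-lineage-sketch")  // illustrative name
      .master("local[*]")             // assumes local mode
      .getOrCreate()
    val sc = spark.sparkContext

    val parentA = sc.parallelize(1 to 5)
    val childB  = parentA.map(_ * 2)   // childB keeps a pointer to parentA

    // Each dependency references the parent RDD that this RDD was derived from.
    childB.dependencies.foreach(dep => println(dep.rdd))

    // Print the full lineage (RDD graph) of childB.
    println(childB.toDebugString)

    spark.stop()
  }
}
```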