1.

What are the lineage graph and the DAG?

Answer:

Lineage graph:

It is the graph of how parent RDDs are connected to the RDDs derived from them. It records which RDDs each RDD depends on and which transformations were applied to produce it.

For example:

val rdd1 = rdd.map(x => x * 2)   // assuming rdd is an RDD[Int]

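Here rdd1 does not hold its own copy of the data. It keeps a reference to its parent RDD (rdd) and to the function passed to map; this chain of references is the lineage. If a partition of rdd1 is lost because of a failure while computing, Spark uses the lineage to recompute just that partition from its parent.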
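To make the idea concrete without a cluster, here is a minimal toy model in plain Scala. It is not Spark's API: the names Node, Source, Mapped, lose and computeCount are all hypothetical. Each derived node keeps only a reference to its parent and the function applied to it, so a lost result can be rebuilt by replaying the chain. (In Spark itself, rdd1.toDebugString prints the real lineage of an RDD.)

```scala
// Toy model of RDD lineage (hypothetical names, not Spark's API).
sealed trait Node[A] {
  def compute(): Seq[A]
  def lineage: List[String] // names of the steps, newest first
}

// Stands in for data read from stable storage, which is never lost.
final case class Source[A](data: Seq[A]) extends Node[A] {
  def compute(): Seq[A] = data
  def lineage: List[String] = List("source")
}

// A derived node stores its parent and the function, not a copy of the data.
final class Mapped[A, B](parent: Node[A], f: A => B) extends Node[B] {
  private var cache: Option[Seq[B]] = None
  var computeCount = 0 // how many times the result was (re)built

  def compute(): Seq[B] = cache.getOrElse {
    computeCount += 1
    val result = parent.compute().map(f) // replay the lineage from the parent
    cache = Some(result)
    result
  }

  def lineage: List[String] = "map" :: parent.lineage

  // Simulate losing the in-memory result, e.g. after an executor failure.
  def lose(): Unit = { cache = None }
}

val rdd  = Source(Seq(1, 2, 3))
val rdd1 = new Mapped(rdd, (x: Int) => x * 2)
val first = rdd1.compute()
rdd1.lose()
val recovered = rdd1.compute() // rebuilt from rdd through the lineage
```

Spark recomputes only the lost partitions; this toy treats the whole collection as a single partition to keep the idea visible.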

DAG:

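DAG stands for Directed Acyclic Graph. In Spark, the DAG is the collection of all the RDDs in a job together with the transformations that connect them. It is built up as the user creates RDDs and applies transformations to them. When an action is performed, the DAG is handed to the DAG scheduler, which divides it into stages: a new stage starts at every wide dependency that needs a shuffle (for example reduceByKey), and each stage is then split into tasks, one per partition. Because the DAG records how every RDD is derived, it also supports fault tolerance, since lost data can be recomputed.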
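The stage split can be sketched as a small, hypothetical model (Dep, Narrow, Wide, Step and splitIntoStages are illustrative names, not Spark classes). It assumes the job is a simple linear chain; Spark's DAG scheduler works on a real graph, but the rule is the same: narrow dependencies such as map and filter are pipelined into one stage, and each wide dependency that requires a shuffle, such as reduceByKey, starts a new stage.

```scala
// Hypothetical toy model of how a DAG is cut into stages.
sealed trait Dep
case object Narrow extends Dep // e.g. map, filter: no shuffle needed
case object Wide extends Dep   // e.g. reduceByKey: needs a shuffle

final case class Step(name: String, dep: Dep)

// Walk a linear chain of steps; each wide dependency closes the current
// stage and opens a new one. Narrow steps join the current stage.
def splitIntoStages(steps: List[Step]): List[List[String]] =
  steps
    .foldLeft(List(List.empty[String])) { (stages, step) =>
      step.dep match {
        case Wide if stages.head.nonEmpty => List(step.name) :: stages
        case _                            => (step.name :: stages.head) :: stages.tail
      }
    }
    .map(_.reverse) // steps were prepended, so restore their order
    .reverse        // stages were prepended too

// A word-count style job: one shuffle, therefore two stages.
val stages = splitIntoStages(List(
  Step("textFile", Narrow),
  Step("flatMap", Narrow),
  Step("map", Narrow),
  Step("reduceByKey", Wide),
  Step("filter", Narrow)
))
// stages == List(List(textFile, flatMap, map), List(reduceByKey, filter))
```

A chain with no wide dependency at all stays in a single stage, which matches Spark pipelining narrow transformations together.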

Difference between the lineage graph and the DAG:

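The lineage graph deals only with RDDs and the transformations between them, so it describes the computation up to the transformations. The DAG, by contrast, shows the different stages of a Spark job, which the DAG scheduler creates once an action is called, so it covers the complete job, i.e. transformations plus actions.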
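This split between "lineage only" and "a job with stages" comes from lazy evaluation: transformations just record steps, and nothing runs until an action. The toy class below is a hypothetical sketch, not Spark's API (LazyRDD, ops and sourceReads are illustrative names): map and filter merely record another step, and only the action collect() causes the source to be read.

```scala
// Hypothetical toy model of lazy transformations vs. actions.
final class LazyRDD[A](run: () => Seq[A], val ops: List[String]) {
  // Transformations: return a new node that remembers one more step.
  def map[B](f: A => B): LazyRDD[B] =
    new LazyRDD(() => run().map(f), "map" :: ops)
  def filter(p: A => Boolean): LazyRDD[A] =
    new LazyRDD(() => run().filter(p), "filter" :: ops)
  // Action: the only place where the recorded chain is executed.
  def collect(): Seq[A] = run()
}

var sourceReads = 0 // counts how often the source data is actually read
val base = new LazyRDD(() => { sourceReads += 1; Seq(1, 2, 3, 4) }, List("source"))
val derived = base.map(_ * 10).filter(_ > 15)

val readsBeforeAction = sourceReads // still 0: only the lineage exists so far
val result = derived.collect()      // the action runs the whole chain once
```

Calling collect() again would read the source a second time; in this toy, as with an uncached RDD, nothing is kept between actions.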


