InterviewSolution
1. What is the key Spark Driver component that handles the execution of Big Data?
Answer»
DAGScheduler is the scheduling layer of Apache Spark that implements stage-oriented scheduling. SparkContext hands a logical execution plan to DAGScheduler, which translates it into a physical plan: a set of stages that are submitted as TaskSets for execution.

TaskScheduler is responsible for submitting tasks for execution in a Spark application. It tracks the executors in the application through the executorHeartbeatReceived and executorLost methods, which report active and lost executors, respectively. Spark ships with the following TaskSchedulers:

- TaskSchedulerImpl, the default TaskScheduler (which the two YARN-specific TaskSchedulers below extend).
- YarnScheduler, for Spark on YARN in client deploy mode.
- YarnClusterScheduler, for Spark on YARN in cluster deploy mode.
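To make "stage-oriented scheduling" concrete, here is a toy model of how a logical plan is cut into stages at shuffle (wide-dependency) boundaries. This is an illustrative sketch only; the `Node`, `needs_shuffle`, and `split_into_stages` names are made up for this example and are not Spark's actual API.

```python
# Toy model: a logical plan is a DAG of operators; a stage is a run of
# operators that can be pipelined, and a new stage begins wherever an
# operator must read its parent's output through a shuffle.

class Node:
    def __init__(self, name, parents=(), needs_shuffle=False):
        self.name = name                  # operator name, e.g. "map"
        self.parents = parents            # upstream nodes in the plan
        self.needs_shuffle = needs_shuffle  # reads parents via a shuffle?

def split_into_stages(final_node):
    """Walk the plan from the final node; cut a new stage whenever a
    shuffle dependency is crossed (parent stages are emitted first)."""
    stages = []

    def build(node):
        stage = []
        todo = [node]
        while todo:
            n = todo.pop()
            stage.append(n.name)
            for p in n.parents:
                if n.needs_shuffle:
                    build(p)      # shuffle boundary: parent forms its own stage
                else:
                    todo.append(p)  # narrow dependency: pipeline into this stage
        stages.append(stage)

    build(final_node)
    return stages

# Example plan: textFile -> map -> reduceByKey (shuffle) -> collect
text = Node("textFile")
mapped = Node("map", parents=(text,))
reduced = Node("reduceByKey", parents=(mapped,), needs_shuffle=True)
final = Node("collect", parents=(reduced,))

# Two stages: the map side before the shuffle, then the reduce side.
print(split_into_stages(final))
# → [['map', 'textFile'], ['collect', 'reduceByKey']]
```

In real Spark, the same cut happens at ShuffleDependency boundaries: `textFile` and `map` run in one stage, and `reduceByKey` starts the next.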
SchedulerBackend is a pluggable interface that supports different cluster managers. Cluster managers differ in their task-scheduling modes and resource-offer mechanisms, and Spark abstracts those differences behind the SchedulerBackend contract.
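The pluggable-backend idea can be sketched as follows: the task scheduler matches pending tasks against resource offers, while each backend decides how offers are produced for its cluster manager. The class and method names here (`SchedulerBackend`, `resource_offers`, `schedule`) are illustrative stand-ins, not Spark's real interfaces.

```python
# Sketch of a pluggable scheduler backend: the scheduling logic is shared,
# and only the source of resource offers varies per cluster manager.
from abc import ABC, abstractmethod

class SchedulerBackend(ABC):
    @abstractmethod
    def resource_offers(self):
        """Return a list of (executor_id, free_cores) offers."""

class LocalBackend(SchedulerBackend):
    """Single-process backend: all cores live on the driver."""
    def __init__(self, cores):
        self.cores = cores
    def resource_offers(self):
        return [("driver", self.cores)]

class StandaloneBackend(SchedulerBackend):
    """Cluster backend: offers come from remote executors."""
    def __init__(self, executors):
        self.executors = executors   # {executor_id: free_cores}
    def resource_offers(self):
        return list(self.executors.items())

def schedule(tasks, backend):
    """Assign each task one core from the backend's offers, FIFO."""
    assignments = []
    offers = backend.resource_offers()
    for task in tasks:
        for i, (exec_id, cores) in enumerate(offers):
            if cores > 0:
                assignments.append((task, exec_id))
                offers[i] = (exec_id, cores - 1)
                break
    return assignments

# Same scheduling code, different backends:
print(schedule(["t0", "t1", "t2"],
               StandaloneBackend({"exec-1": 2, "exec-2": 1})))
# → [('t0', 'exec-1'), ('t1', 'exec-1'), ('t2', 'exec-2')]
```

The design point is that the scheduler never knows which cluster manager it is talking to; swapping YARN for standalone or local mode only swaps the backend implementation.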
Together, these components are responsible for translating Spark user code into the actual Spark jobs executed on the cluster.