1. What happens when a Spark Job is submitted?

Answer»

Below are the steps a Spark job follows once it is submitted:

  • A standalone application starts and instantiates a SparkContext instance; only at that point can the application be called a driver.
  • The driver program asks the cluster manager for resources to launch executors.
  • The cluster manager launches executors.
  • The driver process runs through the user application. 
  • Depending on the actions and transformations over RDDs, tasks are sent to the executors.
  • Executors run the tasks and save the results.
  • If any worker crashes, its tasks will be sent to different executors to be processed again.
  • Throughout this process, the driver implicitly converts the code containing transformations and actions into a logical directed acyclic graph (DAG), as the sketch after this list illustrates.
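
To make these steps concrete, here is a minimal Scala driver sketch. The app name and HDFS input path are placeholders for illustration, not part of the original answer:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountDriver {
  def main(args: Array[String]): Unit = {
    // Instantiating a SparkContext is what turns this application into the driver.
    val conf = new SparkConf().setAppName("word-count")   // hypothetical app name
    val sc   = new SparkContext(conf)

    // Transformations (flatMap, map, reduceByKey) are lazy: they only extend
    // the logical DAG; no work is sent to the executors yet.
    val counts = sc.textFile("hdfs:///tmp/input.txt")     // placeholder path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // The action (collect) triggers scheduling: the DAG is split into stages,
    // tasks are serialized and shipped to executors, and the results come
    // back to the driver.
    counts.collect().foreach(println)

    sc.stop()
  }
}
```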

Spark automatically deals with failed or slow machines by re-executing failed or slow tasks. For example, if a node running a partition of a map() operation crashes, Spark will rerun it on another node; and even if a node does not crash but is simply much slower than the other nodes, Spark can preemptively launch a "speculative" copy of the task on another node and take its result if that copy finishes first.
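
Speculative execution is off by default and is controlled through configuration properties. A minimal sketch of enabling it, where the values shown for the tuning knobs are Spark's documented defaults and the app name is a placeholder:

```scala
import org.apache.spark.SparkConf

// Enable speculative execution so Spark relaunches slow ("straggler") tasks.
val conf = new SparkConf()
  .setAppName("speculation-demo")             // hypothetical app name
  .set("spark.speculation", "true")           // off by default
  .set("spark.speculation.interval", "100ms") // how often to check for stragglers
  .set("spark.speculation.multiplier", "1.5") // "slow" = over 1.5x the median task time
  .set("spark.speculation.quantile", "0.75")  // wait for 75% of tasks before checking
```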


