1.

Explain the anatomy of a MapReduce job run.

Answer»

A MapReduce job run involves the following components: the client, which submits the MapReduce job; the Resource Manager, which coordinates the allocation of compute resources on the cluster; the Node Managers, which launch and monitor the compute containers; the Hadoop Distributed File System (HDFS), which is used for sharing job resources between the other components; and the Application Master, which coordinates the tasks running in the MapReduce job.

The job run begins when the client (job submitter) asks the Resource Manager for a new application ID. The client also checks that the specified output directory does not already exist, and computes the input splits for the job. The resources needed to run the job, including the application jar, are copied to HDFS. Finally, the client submits the job to the Resource Manager.
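For concreteness, below is a minimal driver sketch of this client-side submission step using the standard Hadoop Job API. The class names (WordCountDriver, WordCountMapper, WordCountReducer) and the argument handling are illustrative assumptions, not part of the question.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");

        // The application jar is one of the resources copied to HDFS during submission.
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(WordCountMapper.class);     // illustrative mapper class
        job.setCombinerClass(WordCountReducer.class);  // optional combiner
        job.setReducerClass(WordCountReducer.class);   // illustrative reducer class

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input splits are computed from this input path;
        // the output path must not already exist, or submission fails.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submits the job to the Resource Manager and waits for completion.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}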

The Resource Manager then allocates a container and launches the Application Master in it. The Application Master determines the number of map and reduce tasks that need to be launched and requests containers for them from the Resource Manager, which in turn directs the Node Managers to launch the containers where the tasks run. Once the tasks are started, the Application Master keeps track of them; if any task fails or gets stuck, it relaunches that task in another container. Requests for map tasks are made first and with a higher priority than those for reduce tasks, since all the map tasks must complete before the sort phase of the reduce can start. As each map task completes, its output is partitioned (when there are multiple reducers) and sorted, passed through the combiner (if one is configured), and then shuffled to the reducer(s), which merge the sorted map outputs. The output of the reducers is written to HDFS.
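The map and reduce tasks themselves are ordinary user-defined classes run inside those containers. Below is a minimal word-count sketch of the two task types referenced by the driver above; the class and field names are illustrative assumptions.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Each map task processes one input split, record by record.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            // Map output is partitioned by key, sorted, optionally combined,
            // and then shuffled to the reducers.
            context.write(word, ONE);
        }
    }
}

// Each reduce task merges the sorted map outputs for its partition
// and writes the final result to HDFS.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}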


