1.

How does Hadoop use the HDFS staging directory as well as local directories during a job run?

Answer»

YARN requires a staging directory in HDFS for temporary files created by running jobs, and local directories on each node for storing the various scripts that are generated to start up the job's containers (which run the MapReduce tasks).

Staging directory:

  1. When a user executes a MapReduce job, they usually invoke a job client to configure and launch the job.
  2. As part of job execution, the job client first checks whether there is a staging directory under the user's name in HDFS. If not, the staging directory is created under /user/<username>/.staging.
  3. In addition to the job-related files, the JAR file that contains the job (for example hadoop-mapreduce-client-jobclient.jar) is also placed in the .staging directory after being renamed to job.jar.
  4. Once the staging directory is set up, the job client submits the job to the resource manager.
  5. The job client also reports the job's progress back to the console (e.g. map 5%, reduce 0%).
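The staging layout described in the steps above can be sketched in a few lines of Python. This is only an illustration of how the paths are composed, not Hadoop's own code: the function names, the sample user, and the sample job ID are hypothetical, the /user root is taken from the description above (a cluster may configure a different root), and the list of file names is an assumption about what a MapReduce client typically uploads.

```python
import posixpath

# Assumption: the staging root is /user, as in the description above;
# a real cluster may be configured with a different root.
STAGING_ROOT = "/user"

# Assumption: files a MapReduce client typically uploads for a job.
JOB_FILES = ["job.jar", "job.xml", "job.split", "job.splitmetainfo"]


def staging_dir(username, root=STAGING_ROOT):
    """Return the per-user staging directory, e.g. /user/alice/.staging."""
    return posixpath.join(root, username, ".staging")


def job_staging_files(username, job_id, root=STAGING_ROOT):
    """Return the HDFS paths expected under the job's staging subdirectory."""
    job_dir = posixpath.join(staging_dir(username, root), job_id)
    return [posixpath.join(job_dir, name) for name in JOB_FILES]


if __name__ == "__main__":
    # Hypothetical user and job ID, used only for illustration.
    for path in job_staging_files("alice", "job_1700000000000_0001"):
        print(path)
```

Running the script prints one path per staged file, which can be handy when checking by hand what a job left behind in HDFS.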

Local directory:

  1. The resource manager service selects a node manager on one of the cluster's nodes to launch the ApplicationMaster process, which is always the very first container to be created in a YARN job.
  2. The resource manager chooses the node manager depending on the resources available at the time the job is launched. You cannot specify the node on which to start the job.
  3. The node manager service then generates various scripts in its local cache directory to execute the ApplicationMaster container.
  4. The ApplicationMaster's directories are stored in the location that you have specified for the node manager's local directories with the yarn.nodemanager.local-dirs configuration property in yarn-site.xml.
  5. The node manager stores its localized files in the following directory structure:

[yarn.nodemanager.local-dirs]/usercache/$user/appcache/application_${app_id}
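As a rough sketch of how that directory structure expands, the Python snippet below builds the per-application cache path under each configured local directory. It assumes yarn.nodemanager.local-dirs is given as a comma-separated list; the function name, the sample directories, the user, and the application ID are hypothetical and only serve the illustration.

```python
import posixpath


def app_cache_dirs(local_dirs, user, app_id):
    """Expand a comma-separated local-dirs value into per-application cache paths."""
    # Split the configured value and drop empty entries and stray whitespace.
    roots = [d.strip() for d in local_dirs.split(",") if d.strip()]
    return [
        posixpath.join(root, "usercache", user, "appcache", app_id)
        for root in roots
    ]


if __name__ == "__main__":
    # Hypothetical configuration value and IDs, used only for illustration.
    configured = "/data1/yarn/local,/data2/yarn/local"
    for path in app_cache_dirs(configured, "alice", "application_1700000000000_0001"):
        print(path)
```

With several local directories configured, each of them gets its own usercache tree, so the same application can have a cache directory on more than one disk.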


