1.

You observed that the number of spilled records from map tasks far exceeds the number of map output records. Your child heap size is 1 GB and your io.sort.mb value is set to 1000 MB. How would you tune your io.sort.mb value to achieve the maximum memory-to-I/O ratio?

Answer»

Basically, the dfs.hosts file contains the details of all DataNodes, and the NameNode allows connections only from the nodes listed in it. This is the default include mechanism used by the NameNode. Together, dfs.hosts and dfs.hosts.exclude are used to re-commission and decommission DataNodes.

Hadoop provides a decommission feature to take a set of existing DataNodes out of service. The nodes to be removed should be listed in an exclude file, and that file's path should be specified through the dfs.hosts.exclude configuration parameter. You can find an example below.

Examples:

Modify conf/hdfs-site.xml and add:

  <property>
    <name>dfs.hosts</name>
    <value>/opt/hadoop/Bibhu/conf/datanode-allow.list</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/opt/hadoop/Bibhu/conf/datanode-deny.list</value>
  </property>
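For illustration, both list files are plain text with one DataNode hostname (or IP address) per line; the hostnames below are hypothetical:

  # /opt/hadoop/Bibhu/conf/datanode-allow.list
  datanode1.example.com
  datanode2.example.com
  datanode3.example.com

  # /opt/hadoop/Bibhu/conf/datanode-deny.list (node to be decommissioned)
  datanode3.example.com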

Decommissioning cannot happen immediately because it requires replication of a potentially large number of blocks, and we do not want the cluster to be overwhelmed by this one job. Decommissioning progress can be monitored on the NameNode web UI or in the Cloudera UI. Until all blocks are replicated, the node's status remains "Decommission in progress"; when decommissioning is done, the status changes to "Decommissioned". The node can be removed once decommissioning is finished.
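The same status can also be checked from the command line; a minimal sketch using the standard dfsadmin report (surrounding output abbreviated here):

  # Run on the NameNode; prints one block of details per DataNode
  $HADOOP_HOME/bin/hadoop dfsadmin -report
  # Each node's block includes a line such as:
  #   Decommission Status : Decommission in progress
  # which changes to "Decommissioned" once all its blocks are replicated.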

We can use the command below. Even without creating a dfs.hosts file or making any entries in it, you can run the hadoop dfsadmin -refreshNodes command on the NameNode:

# $HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes

-refreshNodes forces the NameNode to re-read its include and exclude files, updating the set of DataNodes that are allowed to connect to the NameNode.
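Re-commissioning a node follows the reverse of the same steps. A minimal sketch, assuming the file paths from the example above (restarting the DataNode daemon is the commonly used final step, not something the refresh itself performs):

  # Remove the node's hostname from datanode-deny.list, then re-apply:
  $HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes
  # Then restart the DataNode daemon on that host so it re-registers
  # with the NameNode.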


