InterviewSolution
1. Explain how the NameNode gets to know all the available DataNodes in the Hadoop cluster.
Answer» In a Hadoop cluster, the DataNode is where the actual data is kept. Every DataNode sends a heartbeat message to the NameNode every 3 seconds to confirm that it is alive. If the NameNode does not receive a heartbeat from a particular DataNode for about 10 minutes, it considers that DataNode to be dead. The NameNode then initiates replication of the dead DataNode's blocks to other DataNodes that are still active. DataNodes can also talk to each other to rebalance data, to move and copy blocks around, and to keep the replication factor at the configured level across the cluster. The intervals that drive this dead-node detection are configurable, as sketched below.
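The timeout is derived from two hdfs-site.xml settings: with the default values shown here the NameNode declares a DataNode dead after 2 x dfs.namenode.heartbeat.recheck-interval + 10 x dfs.heartbeat.interval, i.e. roughly 10.5 minutes. This is a minimal sketch assuming a Hadoop 2.x or later configuration; the values below are the defaults and normally do not need to be changed.

Example: heartbeat-related properties in hdfs-site.xml
<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value> <!-- DataNode heartbeat interval, in seconds -->
</property>
<property>
  <name>dfs.namenode.heartbeat.recheck-interval</name>
  <value>300000</value> <!-- NameNode recheck interval, in milliseconds (5 minutes) -->
</property>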
You can get a block report using the HDFS commands below.

Example: filesystem check on HDFS
hadoop fsck /
# hadoop fsck /hadoop/container/pbibhu
FSCK ended at Thu Oct 20 20:49:59 CET 2011 in 7516 milliseconds
The filesystem under path '/hadoop/container/pbibhu' is HEALTHY

The NameNode is the node which stores the filesystem metadata, that is, information such as the list of file names, owner, permissions, timestamps, size, replication factor and the list of blocks for each file. The metadata records which file maps to which block locations and which blocks are stored on which DataNode. When a DataNode stores a block, it maintains a checksum for that block as well. When any data is written to HDFS, the checksum value is written at the same time, and when the data is read back the checksum is verified by default. The DataNodes update the NameNode with their block information periodically and verify the checksum values before reporting. If the checksum value is not correct for a particular block, that block is considered to have disk-level corruption; the DataNode skips that block while reporting to the NameNode, so the NameNode gets to know about the disk-level corruption on that DataNode and takes the necessary steps, for example replicating the block from its alternate locations to other active DataNodes to bring the replication factor back to the normal level.

DataNodes can also be listed in a dfs.hosts (include) file, which contains the list of hosts that are permitted to connect to the NameNode.

Example: add this property to hdfs-site.xml
<property>
  <name>dfs.hosts</name>
  <value>/home/hadoop/includes</value>
</property>

includes:
hostname1
hostname2
hostname3

If the include file is empty, all hosts are permitted. Note that this is not a definitive list of active DataNodes: the NameNode only considers those DataNodes from which it actually receives heartbeats. The include file can be re-read without restarting the NameNode, as shown below.
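A minimal sketch of how the include file is refreshed (the path /home/hadoop/includes is just the example value used above): after adding or removing hostnames in the file, the NameNode can be told to re-read its host lists with the standard dfsadmin command.

Example:
# edit /home/hadoop/includes, then make the NameNode re-read the include/exclude files
hdfs dfsadmin -refreshNodes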
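Finally, to see which DataNodes the NameNode currently knows about, the dfsadmin report is the usual check. This is a sketch of typical usage; the exact output format and the availability of the -live/-dead filters depend on the Hadoop version.

Example:
hdfs dfsadmin -report        # cluster summary plus one section per DataNode
hdfs dfsadmin -report -live  # only DataNodes that are currently sending heartbeats
hdfs dfsadmin -report -dead  # DataNodes the NameNode has marked as dead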