Explain in brief about Hadoop's rack topology.

Answer»

A rack is a collection of servers, sized to your requirements, that are all connected through the same network switch. If that switch fails, every machine in the rack goes out of service and the rack as a whole is considered down.

To mitigate this, Apache Hadoop introduced Rack Awareness. With Rack Awareness, the Name Node prefers Data Nodes on the same rack as the client making the request, or on a nearby rack. The Name Node maintains the rack ID of every Data Node and uses that rack information when deciding which Data Nodes to use. Hadoop follows certain rules when placing replicas across racks, as described below.

The replicas of a block should not all be stored on the same rack. By spreading them across racks, the Rack Awareness algorithm improves fault tolerance while keeping latency low.
The default replication factor is 3, so the Rack Awareness algorithm places the replicas as follows:

The first replica of the block is stored on a Data Node in the local rack (the writer's own node, if the writer runs on a Data Node).
The second replica is stored on a Data Node in a different (remote) rack.
The third replica is stored on a different Data Node in the same remote rack as the second.
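
The placement for a replication factor of 3 can be sketched in a few lines of Python. This is only an illustration of the policy in which the second and third replicas share one rack that differs from the rack of the first replica. It is not Hadoop's actual implementation, and the function name, node names, and rack paths are all hypothetical.

```python
import random

def choose_replica_targets(writer, node_to_rack, rng=random):
    """Return three Data Nodes: the writer's node, then two distinct
    nodes that share one remote rack (illustrative sketch only)."""
    local_rack = node_to_rack[writer]

    # Group every node that is NOT on the writer's rack by its rack ID.
    remote = {}
    for node, rack in node_to_rack.items():
        if rack != local_rack:
            remote.setdefault(rack, []).append(node)

    # The 2nd and 3rd replicas share a rack, so it needs at least two nodes.
    eligible = sorted(r for r, nodes in remote.items() if len(nodes) >= 2)
    if not eligible:
        raise ValueError("need a remote rack with at least two Data Nodes")

    remote_rack = rng.choice(eligible)
    second, third = rng.sample(sorted(remote[remote_rack]), 2)
    return [writer, second, third]

# Hypothetical cluster layout: node name -> rack ID.
topology = {
    "dn1": "/rack1", "dn2": "/rack1",
    "dn3": "/rack2", "dn4": "/rack2",
    "dn5": "/rack3", "dn6": "/rack3",
}
targets = choose_replica_targets("dn1", topology)
print(targets, sorted({topology[n] for n in targets}))
```

Because the three replicas always span two racks, losing any single rack still leaves at least one copy of the block.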

Rack Awareness is used in Hadoop for the following reasons:

It improves data availability and reliability, because replicas of the same block are available on different racks.
It improves cluster performance: two of the three replicas sit on the same rack, so a write crosses racks only once, and a read can be served from the replica closest to the reader.
It reduces cross-rack network traffic. In particular, with rack awareness YARN can optimize MapReduce job performance, because it tries to schedule each task on a node that holds a replica of the data or, failing that, on a node in the same rack.
As per the rack policy, the Name Node assigns the 2nd and 3rd replicas of a block to Data Nodes in a rack different from the one holding the first replica. This protects the data even against the failure of an entire rack, but only if Hadoop has been configured with rack awareness.
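
One common way to configure rack awareness is a topology script, registered through the `net.topology.script.file.name` property in core-site.xml. Hadoop calls the script with one or more host names or IP addresses as arguments and expects one rack path per argument on standard output. The sketch below assumes a hard-coded lookup table. The addresses and rack paths are hypothetical, and a real script would more likely read them from a file.

```python
#!/usr/bin/env python3
import sys

# Hypothetical host-to-rack table (assumption for illustration only).
RACK_BY_HOST = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
    "10.0.2.22": "/dc1/rack2",
}

# Rack name Hadoop itself falls back to when no topology is configured.
DEFAULT_RACK = "/default-rack"

def resolve(hosts):
    """Map each host name or IP address to its rack path, in input order."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]

if __name__ == "__main__":
    # One rack path per argument, in the same order as the arguments.
    print("\n".join(resolve(sys.argv[1:])))
```

Hosts missing from the table fall into the default rack, so the table has to cover every Data Node for the cluster to be fully rack aware.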
