InterviewSolution
| 1. |
What is the usual block size on an HDFS? Can we make it much larger say 1 GB? What are the advantages that a block provides over a file system? |
|
Answer» The usual block size on HDFS is 128 MB. The size of the HDFS block is KEPT large enough to minimize the seek cost. When the block size is large enough the time to transfer data will be significantly longer than the time to seek the start of a block. As data transfer is much higher than the disk seek rate it is optimal to keep the block size large. The seek time is usually kept as 1% of transfer time. e.g. If seek time around 10 ms and the data transfer rate is 100MB/s then block size comes to around 128 MB. However, this doesn’t mean that the block size can be made indefinitely large. Map tasks operate on one block (assuming split size is equal to block size) at a time. Having a huge block size will result in fewer splits and hence less number of mappers which will reduce the advantage that can be gained by parallelly working on multiple blocks. Having a block abstraction for a distributed file system has many benefits.
|
|