1.

Point out the wrong statement.
(a) Hadoop works better with a small number of large files than a large number of small files
(b) CombineFileInputFormat is designed to work well with small files
(c) CombineFileInputFormat does not compromise the speed at which it can process the input in a typical MapReduce job
(d) None of the mentioned

This question is from the MapReduce Formats topic in the MapReduce Types and Formats section of Hadoop.

Answer:

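The correct option is (d) None of the mentioned.

Explanation: Statements (a), (b) and (c) are all true. FileInputFormat generates splits so that each split is all or part of a single file. If the files are very small ("small" means significantly smaller than an HDFS block) and there are many of them, each map task processes very little input, and there are many map tasks (one per file), each of which imposes extra bookkeeping overhead. This is why (a) holds. CombineFileInputFormat was designed to work well with small files (b): instead of one split per file, it packs many files into each split so that each mapper has more to process. Because it takes node and rack locality into account when deciding which blocks to place in the same split, it does not compromise the speed at which it can process the input in a typical MapReduce job (c).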
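To make the split arithmetic concrete, here is a simplified Python simulation (not Hadoop code) contrasting one-split-per-file with packing many small files into shared splits. The names `InputFile`, `per_file_splits` and `combined_splits`, and the 128 MB block size, are assumptions for illustration. The sketch deliberately ignores the node and rack locality grouping that the real CombineFileInputFormat performs, as well as Hadoop's minimum split size settings.

```python
from dataclasses import dataclass

# Assumed HDFS block size (128 MB), used as both block size and max split size.
BLOCK_SIZE = 128 * 1024 * 1024


@dataclass
class InputFile:
    """Hypothetical stand-in for a file in HDFS: a name and a size in bytes."""
    name: str
    size: int


def per_file_splits(files, block_size=BLOCK_SIZE):
    """FileInputFormat-style splitting: each split is all or part of one file.

    A file smaller than a block still gets a split (and so a map task) of its own.
    """
    splits = []
    for f in files:
        offset = 0
        while True:
            length = min(block_size, f.size - offset)
            splits.append([(f.name, offset, length)])
            offset += length
            if offset >= f.size:
                break
    return splits


def combined_splits(files, max_split_size=BLOCK_SIZE):
    """Greedy, CombineFileInputFormat-style packing of whole files into splits.

    Simplification: ignores node/rack locality and keeps oversized files whole.
    """
    splits, current, current_size = [], [], 0
    for f in files:
        # Close the current split once adding this file would exceed the limit.
        if current and current_size + f.size > max_split_size:
            splits.append(current)
            current, current_size = [], 0
        current.append((f.name, 0, f.size))
        current_size += f.size
    if current:
        splits.append(current)
    return splits


if __name__ == "__main__":
    MB = 1024 * 1024
    small_files = [InputFile(f"part-{i:04d}", 1 * MB) for i in range(1000)]
    print(len(per_file_splits(small_files)))  # 1000: one map task per file
    print(len(combined_splits(small_files)))  # 8: about 128 files per split
```

With 1,000 one-megabyte files, the per-file strategy yields 1,000 map tasks, while greedy packing needs only 8, which is the bookkeeping saving the explanation describes.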
