Explore topic-wise InterviewSolutions in Hadoop.

This section offers curated multiple-choice questions on MapReduce Types and Formats in Hadoop to sharpen your knowledge and support exam preparation.

1.

Point out the wrong statement.
(a) Hadoop works better with a small number of large files than a large number of small files
(b) CombineFileInputFormat is designed to work well with small files
(c) CombineFileInputFormat does not compromise the speed at which it can process the input in a typical MapReduce job
(d) None of the mentioned

Answer»

The correct option is (d) None of the mentioned

Easy explanation: All three statements hold. Hadoop works better with a small number of large files: if files are very small (“small” means significantly smaller than an HDFS block) and there are a lot of them, each map task processes very little input, and there are a lot of tasks (one per file), each of which imposes extra bookkeeping overhead. CombineFileInputFormat packs many such small files into each split and, because it takes node and rack locality into account, does not compromise processing speed.

2.

The split size is normally the size of a ________ block, which is appropriate for most applications.
(a) Generic
(b) Task
(c) Library
(d) HDFS

Answer»

Correct option is (d) HDFS

For explanation: FileInputFormat splits only large files (here “large” means larger than an HDFS block).

3.

Point out the correct statement.
(a) The minimum split size is usually 1 byte, although some formats have a lower bound on the split size
(b) Applications may impose a minimum split size
(c) The maximum split size defaults to the maximum value that can be represented by a Java long type
(d) All of the mentioned

Answer»

The correct choice is (a) The minimum split size is usually 1 byte, although some formats have a lower bound on the split size

For explanation: The maximum split size has an effect only when it is less than the block size, forcing splits to be smaller than a block.
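
To make these bounds concrete, here is a minimal sketch using the new MapReduce API's FileInputFormat helpers; the job name and the 64 MB figure are illustrative values only, not recommendations.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeConfig {
    public static Job configure() throws Exception {
        Job job = Job.getInstance(new Configuration(), "split-size-demo");
        // Minimum split size in bytes; usually 1 byte unless a format imposes a larger lower bound.
        FileInputFormat.setMinInputSplitSize(job, 1L);
        // Maximum split size in bytes; it defaults to Long.MAX_VALUE and only has an
        // effect when set below the HDFS block size, forcing splits smaller than a block.
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
        return job;
    }
}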

4.

Which of the following writes MapFiles as output?
(a) DBInpFormat
(b) MapFileOutputFormat
(c) SequenceFileAsBinaryOutputFormat
(d) None of the mentioned

Answer»

Correct answer is (b) MapFileOutputFormat

The explanation: MapFileOutputFormat writes MapFiles as output, whereas SequenceFileAsBinaryOutputFormat writes keys and values in raw binary format into a SequenceFile container.
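
As a minimal sketch of pointing a job at this output format (the key/value classes and the output path are illustrative placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat;

public class MapFileOutputDemo {
    public static Job configure() throws Exception {
        Job job = Job.getInstance(new Configuration(), "mapfile-output-demo");
        // MapFileOutputFormat writes MapFiles: sorted SequenceFiles with an accompanying index.
        job.setOutputFormatClass(MapFileOutputFormat.class);
        job.setOutputKeyClass(Text.class);          // keys must be WritableComparable
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/mapfile-out")); // hypothetical path
        return job;
    }
}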

5.

Which of the following is the default output format?
(a) TextFormat
(b) TextOutput
(c) TextOutputFormat
(d) None of the mentioned

Answer»

The correct choice is (c) TextOutputFormat

To explain: TextOutputFormat is the default output format; its keys and values may be of any type, because TextOutputFormat turns them into strings by calling toString() on them.
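
A minimal sketch of configuring the default output format explicitly; the separator property name is assumed from the Hadoop 2 configuration keys, and the comma separator is purely illustrative (the default is a tab):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class TextOutputDemo {
    public static Job configure() throws Exception {
        Configuration conf = new Configuration();
        // Assumed property name for the key-value separator written by TextOutputFormat.
        conf.set("mapreduce.output.textoutputformat.separator", ",");
        Job job = Job.getInstance(conf, "text-output-demo");
        // Shown explicitly for illustration; TextOutputFormat is already the default OutputFormat.
        job.setOutputFormatClass(TextOutputFormat.class);
        return job;
    }
}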

6.

___________ is an input format for reading data from a relational database, using JDBC.
(a) DBInput
(b) DBInputFormat
(c) DBInpFormat
(d) All of the mentioned

Answer»

The correct choice is (b) DBInputFormat

The best explanation: DBInputFormat uses JDBC to read rows from a relational database table and presents each row as an input record to the mapper.
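
A minimal sketch of wiring DBInputFormat to a database over JDBC; the driver class, JDBC URL, credentials, table and column names, and the IdRecord class are all hypothetical placeholders:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class DbInputDemo {

    // Minimal record type mapped to a hypothetical table with a single "id" column.
    public static class IdRecord implements Writable, DBWritable {
        private int id;
        public void readFields(ResultSet rs) throws SQLException { id = rs.getInt("id"); }
        public void write(PreparedStatement ps) throws SQLException { ps.setInt(1, id); }
        public void readFields(DataInput in) throws IOException { id = in.readInt(); }
        public void write(DataOutput out) throws IOException { out.writeInt(id); }
    }

    public static Job configure() throws Exception {
        Configuration conf = new Configuration();
        // JDBC driver, URL, and credentials are placeholders.
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://dbhost/mydb", "user", "password");
        Job job = Job.getInstance(conf, "db-input-demo");
        job.setInputFormatClass(DBInputFormat.class);
        // Read the "id" column of table "records": no WHERE clause, ordered by id.
        DBInputFormat.setInput(job, IdRecord.class, "records", null, "id", "id");
        return job;
    }
}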

7.

__________ is a variant of SequenceFileInputFormat that converts the sequence file’s keys and values to Text objects.
(a) SequenceFile
(b) SequenceFileAsTextInputFormat
(c) SequenceAsTextInputFormat
(d) All of the mentioned

Answer»

Correct choice is (b) SequenceFileAsTextInputFormat

Best explanation: SequenceFileAsTextInputFormat converts the sequence file’s keys and values to Text objects by calling toString() on them, which makes sequence files suitable input for Streaming programs.
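
A minimal sketch of selecting this input format in the new MapReduce API (the job name is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileAsTextInputFormat;

public class SequenceAsTextDemo {
    public static Job configure() throws Exception {
        Job job = Job.getInstance(new Configuration(), "seqfile-as-text-demo");
        // Keys and values of the underlying sequence file arrive in the mapper as Text,
        // so the mapper can be declared as Mapper<Text, Text, ...>.
        job.setInputFormatClass(SequenceFileAsTextInputFormat.class);
        return job;
    }
}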

8.

__________ class allows you to specify the InputFormat and Mapper to use on a per-path basis.
(a) MultipleOutputs
(b) MultipleInputs
(c) SingleInputs
(d) None of the mentioned

Answer»

The correct option is (b) MultipleInputs

Easy explanation: One input might be tab-separated plain text and another a binary sequence file; even if they are in the same format, they may have different representations and therefore need to be parsed differently, which is why the InputFormat and Mapper can be specified per path.
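
A minimal sketch of the per-path wiring; the input paths and the two mapper classes passed in are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MultipleInputsDemo {
    // textMapper and seqMapper are hypothetical mappers that parse the two
    // representations into a common intermediate key/value type.
    public static Job configure(Class<? extends Mapper> textMapper,
                                Class<? extends Mapper> seqMapper) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multiple-inputs-demo");
        // Tab-separated plain text on one path...
        MultipleInputs.addInputPath(job, new Path("/data/plain"),
                TextInputFormat.class, textMapper);
        // ...and a binary sequence file on another, each with its own mapper.
        MultipleInputs.addInputPath(job, new Path("/data/binary"),
                SequenceFileInputFormat.class, seqMapper);
        return job;
    }
}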

9.

Point out the wrong statement.
(a) Hadoop sequence file format stores sequences of binary key-value pairs
(b) SequenceFileAsBinaryInputFormat is a variant of SequenceFileInputFormat that retrieves the sequence file’s keys and values as opaque binary objects
(c) SequenceFileAsTextInputFormat is a variant of SequenceFileInputFormat that retrieves the sequence file’s keys and values as opaque binary objects
(d) None of the mentioned

Answer»

The correct choice is (c) SequenceFileAsTextInputFormat is a variant of SequenceFileInputFormat that retrieves the sequence file’s keys and values as opaque binary objects.

Explanation: SequenceFileAsBinaryInputFormat is the variant that reads keys and values from SequenceFiles as opaque binary (raw) objects; SequenceFileAsTextInputFormat converts them to Text objects instead.
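
A minimal sketch of selecting the binary variant; the map output classes shown are illustrative, since interpreting the raw bytes is entirely up to application code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileAsBinaryInputFormat;

public class SequenceAsBinaryDemo {
    public static Job configure() throws Exception {
        Job job = Job.getInstance(new Configuration(), "seqfile-as-binary-demo");
        // Keys and values reach the mapper as opaque BytesWritable blobs.
        job.setInputFormatClass(SequenceFileAsBinaryInputFormat.class);
        job.setMapOutputKeyClass(BytesWritable.class);
        job.setMapOutputValueClass(BytesWritable.class);
        return job;
    }
}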

10.

_________ is the output produced by TextOutputFormat, Hadoop’s default OutputFormat.
(a) KeyValueTextInputFormat
(b) KeyValueTextOutputFormat
(c) FileValueTextInputFormat
(d) All of the mentioned

Answer»

The correct answer is (b) KeyValueTextOutputFormat

Explanation: TextOutputFormat writes lines of tab-separated key-value text; to interpret such files correctly when they are used as input to another job, KeyValueTextInputFormat is appropriate.
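
A minimal sketch of reading such tab-separated output back in with KeyValueTextInputFormat; the separator property name is assumed from the Hadoop 2 configuration keys:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

public class KeyValueTextDemo {
    public static Job configure() throws Exception {
        Configuration conf = new Configuration();
        // Assumed property name; a tab is the default and matches what TextOutputFormat writes.
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "\t");
        Job job = Job.getInstance(conf, "keyvalue-text-demo");
        // Each line is split at the first separator into a Text key and a Text value.
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        return job;
    }
}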

11.

The key, a ____________, is the byte offset within the file of the beginning of the line.
(a) LongReadable
(b) LongWritable
(c) LongWritable
(d) All of the mentioned

Answer»

Right answer is (b) LongWritable

For explanation I would say: The value is the contents of the line, excluding any line terminators (newline, carriage return), and is packaged as a Text object.
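
A minimal sketch of a mapper whose signature matches these types; the LineLengthMapper name and the emitted byte-length metric are purely illustrative:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input key: byte offset of the line within the file (LongWritable).
// Input value: the line contents without terminators (Text).
public class LineLengthMapper
        extends Mapper<LongWritable, Text, LongWritable, IntWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Emit the offset together with the line's length in bytes, for illustration only.
        context.write(offset, new IntWritable(line.getLength()));
    }
}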

12.

Point out the correct statement.
(a) With TextInputFormat and KeyValueTextInputFormat, each mapper receives a variable number of lines of input
(b) With StreamXmlRecordReader, the page elements can be interpreted as records for processing by a mapper
(c) The number depends on the size of the split and the length of the lines
(d) All of the mentioned

Answer»

The correct option is (d) All of the mentioned

The explanation: Large XML documents that are composed of a series of “records” can be broken into these records using simple string or regular-expression matching to find the start and end tags of records.

13.

_________ is the base class for all implementations of InputFormat that use files as their data source.
(a) FileTextFormat
(b) FileInputFormat
(c) FileOutputFormat
(d) None of the mentioned

Answer»

The correct choice is (b) FileInputFormat

For explanation I would say: FileInputFormat provides an implementation for generating splits for the input files.
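
A minimal sketch of using the base class to declare a job's input paths; the paths are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class InputPathDemo {
    public static Job configure() throws Exception {
        Job job = Job.getInstance(new Configuration(), "input-path-demo");
        // Besides generating splits, FileInputFormat defines which paths make up the input.
        FileInputFormat.addInputPath(job, new Path("/data/in"));
        FileInputFormat.addInputPath(job, new Path("/data/more/in")); // paths accumulate
        return job;
    }
}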

14.

___________ takes node and rack locality into account when deciding which blocks to place in the same split.
(a) CombineFileOutputFormat
(b) CombineFileInputFormat
(c) TextFileInputFormat
(d) None of the mentioned

Answer»

Right answer is (b) CombineFileInputFormat

The explanation is: Because CombineFileInputFormat takes node and rack locality into account when deciding which blocks to place in the same split, it does not compromise the speed at which it can process the input in a typical MapReduce job.
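
A minimal sketch using CombineTextInputFormat, the text-oriented concrete subclass of CombineFileInputFormat; the 128 MB figure is illustrative, and using the generic max-split-size setting to cap the combined split is an assumption:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class CombineInputDemo {
    public static Job configure() throws Exception {
        Job job = Job.getInstance(new Configuration(), "combine-input-demo");
        // Packs many small files into each split, preferring blocks on the same node or rack.
        job.setInputFormatClass(CombineTextInputFormat.class);
        // Assumed to cap the size of each combined split (value is illustrative).
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
        return job;
    }
}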

15.

Which of the following is the only way of running mappers?
(a) MapReducer
(b) MapRunner
(c) MapRed
(d) All of the mentioned

Answer»

The correct answer is (b) MapRunner

Best explanation: MapRunner is the default implementation of the MapRunnable interface in the old MapReduce API; it runs the mapper by calling its map() method for each record in the input split.

16.

An ___________ is responsible for creating the input splits, and dividing them into records.
(a) TextOutputFormat
(b) TextInputFormat
(c) OutputInputFormat
(d) InputFormat

Answer»

Right choice is (d) InputFormat

Best explanation: As a MapReduce application writer, you don’t need to deal with InputSplits directly, as they are created by an InputFormat.
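
A minimal sketch of a custom InputFormat; the WholeLineInputFormat name is hypothetical, split creation is inherited from FileInputFormat, and record iteration is delegated to the stock LineRecordReader:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class WholeLineInputFormat extends FileInputFormat<LongWritable, Text> {
    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // getSplits() comes from FileInputFormat; this method turns each split into records.
        return new LineRecordReader();
    }
}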

17.

______________ is another implementation of the MapRunnable interface that runs mappers concurrently in a configurable number of threads.
(a) MultithreadedRunner
(b) MultithreadedMap
(c) MultithreadedMapRunner
(d) SinglethreadedMapRunner

Answer»

Correct choice is (c) MultithreadedMapRunner

Easiest explanation: MultithreadedMapRunner runs the mapper in a configurable number of threads within a single map task; a RecordReader is still used to iterate over the records and hand key-value pairs to the map function.
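
The question refers to the old-API MapRunnable and MultithreadedMapRunner; below is a minimal sketch of the analogous new-API facility, MultithreadedMapper, where the delegate mapper (the LineLengthMapper sketched earlier) and the thread count are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class MultithreadedDemo {
    public static Job configure() throws Exception {
        Job job = Job.getInstance(new Configuration(), "multithreaded-map-demo");
        // MultithreadedMapper runs the delegate mapper in several threads within one map task.
        job.setMapperClass(MultithreadedMapper.class);
        MultithreadedMapper.setMapperClass(job, LineLengthMapper.class); // delegate from the earlier sketch
        MultithreadedMapper.setNumberOfThreads(job, 8);
        return job;
    }
}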

18.

An input _________ is a chunk of the input that is processed by a single map.
(a) textformat
(b) split
(c) datanode
(d) all of the mentioned

Answer»

The correct option is (b) split

For explanation: Each split is divided into records, and the map processes each record (a key-value pair) in turn.

19.

Point out the wrong statement.
(a) If V2 and V3 are the same, you only need to use setOutputValueClass()
(b) The overall effect of a Streaming job is to perform a sort of the input
(c) A Streaming application can control the separator that is used when a key-value pair is turned into a series of bytes and sent to the map or reduce process over standard input
(d) None of the mentioned

Answer»

The correct answer is (d) None of the mentioned

Explanation: If a combine function is used, it has the same form as the reduce function, except that its output types are the intermediate key and value types (K2 and V2), so they can feed the reduce function.

20.

In _____________ the default job is similar, but not identical, to the Java equivalent.
(a) Mapreduce
(b) Streaming
(c) Orchestration
(d) All of the mentioned

Answer»

The correct choice is (b) Streaming

The best explanation: MapReduce has a simple model of data processing, and Streaming exposes that model to non-Java programs; its default job is therefore similar, but not identical, to the Java equivalent.

21.

Point out the correct statement.
(a) The reduce input must have the same types as the map output, although the reduce output types may be different again
(b) The map input key and value types (K1 and V1) are different from the map output types
(c) The partition function operates on the intermediate key
(d) All of the mentioned

Answer»

Right answer is (d) All of the mentioned

To elaborate: In practice, the partition is determined solely by the key (the value is ignored).
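
A minimal sketch of a partitioner that, as stated, operates only on the intermediate key; the FirstCharPartitioner name and its routing rule are purely illustrative:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// The partitioner sees the intermediate types (K2, V2), but only the key is consulted.
public class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // The value is ignored; route by the first character of the key.
        if (key.getLength() == 0) {
            return 0;
        }
        return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
    }
}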

22.

___________ generates keys of type LongWritable and values of type Text.
(a) TextOutputFormat
(b) TextInputFormat
(c) OutputInputFormat
(d) None of the mentioned

Answer»

Correct answer is (b) TextInputFormat

For explanation I would say: TextInputFormat’s keys, being byte offsets within the file, are LongWritable, and its values, the line contents, are Text. If K2 and K3 are the same, you don’t need to call setMapOutputKeyClass().
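
A minimal sketch of where these type declarations go during job setup; the Text/IntWritable choices are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class TypeDeclarationDemo {
    public static Job configure() throws Exception {
        Job job = Job.getInstance(new Configuration(), "type-declaration-demo");
        // Final (reduce) output types, K3 and V3.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Map output types, K2 and V2; these calls are only needed when K2/V2
        // differ from K3/V3, so here they could be omitted.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        return job;
    }
}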