Explore topic-wise InterviewSolutions.

This section includes InterviewSolutions on each topic, offering curated multiple-choice questions to sharpen your knowledge and support exam preparation. Choose a topic below to get started.

1.

The _____________ can also be used to distribute both jars and native libraries for use in the map and/or reduce tasks.
(a) DistributedLog
(b) DistributedCache
(c) DistributedJars
(d) None of the mentioned

Answer: (b) DistributedCache

Explanation: Cached libraries can be loaded via System.loadLibrary or System.load.
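
A minimal sketch of how this looks with the classic org.apache.hadoop.mapred API; the paths below are hypothetical:

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheSetup {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(CacheSetup.class);
            // Ship a jar and put it on the task classpath.
            DistributedCache.addFileToClassPath(new Path("/apps/lib/helper.jar"), conf);
            // Ship a native library; the '#' fragment names the symlink
            // created in the task's working directory.
            DistributedCache.addCacheFile(new URI("/apps/native/libfoo.so#libfoo.so"), conf);
            DistributedCache.createSymlink(conf);
            // Inside a task, the cached library can then be loaded with
            // System.load(...) or System.loadLibrary(...).
        }
    }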

2.

____________ is the primary interface by which user-job interacts with the JobTracker.
(a) JobConf
(b) JobClient
(c) JobServer
(d) All of the mentioned

Answer: (b) JobClient

Explanation: JobClient provides facilities to submit jobs, track their progress, access component-tasks' reports and logs, get the MapReduce cluster status information, and so on.
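
A hedged sketch of submitting a job through JobClient (the input/output paths are made up):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SubmitJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SubmitJob.class);
            conf.setJobName("example");
            FileInputFormat.setInputPaths(conf, new Path("/data/in"));   // hypothetical
            FileOutputFormat.setOutputPath(conf, new Path("/data/out")); // hypothetical
            // Submits the job and polls its progress until it completes.
            JobClient.runJob(conf);
        }
    }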

3.

The standard output (stdout) and error (stderr) streams of the task are read by the TaskTracker and logged to _________
(a) ${HADOOP_LOG_DIR}/user
(b) ${HADOOP_LOG_DIR}/userlogs
(c) ${HADOOP_LOG_DIR}/logs
(d) None of the mentioned

Answer: (b) ${HADOOP_LOG_DIR}/userlogs

Explanation: The TaskTracker captures each task's stdout and stderr streams and writes them to per-attempt log files under ${HADOOP_LOG_DIR}/userlogs.

4.

During the execution of a streaming job, the names of the _______ parameters are transformed.
(a) vmap
(b) mapvim
(c) mapreduce
(d) mapred

Answer: (d) mapred

Explanation: To get the values in a streaming job's mapper/reducer, use the parameter names with underscores: the dots in mapred.* names are transformed into underscores (e.g. mapred.job.id becomes mapred_job_id).
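
To illustrate (a sketch only; a streaming mapper can be any executable, shown here in Java):

    public class StreamingEnvDemo {
        public static void main(String[] args) {
            // Inside a streaming task, mapred.task.id is exposed as the
            // environment variable mapred_task_id.
            String taskId = System.getenv("mapred_task_id");
            System.err.println("running as task " + taskId);
        }
    }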

5.

Jobs can enable task JVMs to be reused by specifying the job configuration _________
(a) mapred.job.recycle.jvm.num.tasks
(b) mapissue.job.reuse.jvm.num.tasks
(c) mapred.job.reuse.jvm.num.tasks
(d) all of the mentioned

Answer: (c) mapred.job.reuse.jvm.num.tasks

Explanation: JVM reuse avoids repeated JVM start-up cost; for jobs with many short-lived tasks, performance improvements of over 50% have been reported with mapred.job.reuse.jvm.num.tasks.
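
A short sketch of enabling JVM reuse in a job driver; the old API also exposes a typed setter for the same property:

    import org.apache.hadoop.mapred.JobConf;

    public class JvmReuse {
        public static void main(String[] args) {
            JobConf conf = new JobConf(JvmReuse.class);
            // -1 lets each JVM run an unlimited number of (sequential) tasks
            // of the same job; a positive value caps reuse, and 1 (the
            // default) disables it.
            conf.setNumTasksToExecutePerJvm(-1);
            // Equivalent: conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
        }
    }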

6.

Point out the wrong statement.
(a) The task tracker has a local directory to create a localized cache and localized job
(b) The task tracker can define multiple local directories
(c) The job tracker cannot define multiple local directories
(d) None of the mentioned

Answer: (d) None of the mentioned

Explanation: When the job starts, the task tracker creates a localized job directory relative to the local directory specified in the configuration.

7.

Map output larger than ___________ percent of the memory allocated to copying map outputs will be written directly to disk without first staging through memory.
(a) 10
(b) 15
(c) 25
(d) 35

Answer: (c) 25

Explanation: Map output above this threshold is written directly to disk without first staging through memory.

8.

____________ specifies the number of segments on disk to be merged at the same time.
(a) mapred.job.shuffle.merge.percent
(b) mapred.job.reduce.input.buffer.percent
(c) mapred.inmem.merge.threshold
(d) io.sort.factor

Answer: (d) io.sort.factor

Explanation: io.sort.factor limits the number of open files and compression codecs during the merge.
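
For example, a driver might raise the merge factor for jobs that spill many segments (the value is illustrative):

    import org.apache.hadoop.mapred.JobConf;

    public class MergeFactor {
        public static void main(String[] args) {
            JobConf conf = new JobConf(MergeFactor.class);
            // Merge up to 100 on-disk segments at a time instead of the
            // default 10: more open files per merge, but fewer merge passes.
            conf.setInt("io.sort.factor", 100);
        }
    }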

9.

Point out the correct statement.
(a) The number of sorted map outputs fetched into memory before being merged to disk
(b) The memory threshold for fetched map outputs before an in-memory merge is finished
(c) The percentage of memory relative to the maximum heap size in which map outputs may not be retained during the reduce
(d) None of the mentioned

Answer: (a) The number of sorted map outputs fetched into memory before being merged to disk

Explanation: When the reduce begins, map outputs are merged to disk until those that remain are under the resource limit this defines.

10.

Which of the following parameters is the threshold for the accounting and serialization buffers?
(a) io.sort.spill.percent
(b) io.sort.record.percent
(c) io.sort.mb
(d) None of the mentioned

Answer: (a) io.sort.spill.percent

Explanation: When that percentage of either buffer has filled, its contents are spilled to disk in the background.
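
A short sketch of tuning the two buffers together (values are illustrative):

    import org.apache.hadoop.mapred.JobConf;

    public class SortBuffers {
        public static void main(String[] args) {
            JobConf conf = new JobConf(SortBuffers.class);
            conf.setInt("io.sort.mb", 200);            // total sort buffer, in MB
            conf.set("io.sort.spill.percent", "0.80"); // start spilling at 80% full
        }
    }
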
11.

Maximum virtual memory of the launched child-task is specified using _________
(a) mapv
(b) mapred
(c) mapvim
(d) All of the mentioned

Answer: (b) mapred

Explanation: Admins can also specify the maximum virtual memory of the launched child-task, and any sub-process it launches recursively, using mapred.child.ulimit.
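
Assuming the property meant here is mapred.child.ulimit (its value is in kilobytes), a hedged sketch:

    import org.apache.hadoop.mapred.JobConf;

    public class ChildLimits {
        public static void main(String[] args) {
            JobConf conf = new JobConf(ChildLimits.class);
            // Cap the child task (and anything it spawns) at roughly 2 GB
            // of virtual memory; the unit is kilobytes.
            conf.set("mapred.child.ulimit", "2097152");
        }
    }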

12.

The ___________ executes the Mapper/Reducer task as a child process in a separate JVM.
(a) JobTracker
(b) TaskTracker
(c) TaskScheduler
(d) None of the mentioned

Answer: (b) TaskTracker

Explanation: The TaskTracker spawns each Mapper/Reducer task as a child process in a separate JVM; the child task inherits the environment of the parent TaskTracker.

13.

__________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution.
(a) JobConfig
(b) JobConf
(c) JobConfiguration
(d) All of the mentioned

Answer: (b) JobConf

Explanation: JobConf is typically used to specify the Mapper, combiner (if any), Partitioner, Reducer, InputFormat, OutputFormat and OutputCommitter implementations.
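
A sketch of that wiring, using the identity classes bundled with the old API so it stays self-contained:

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class WireJob {
        public static JobConf configure() {
            JobConf conf = new JobConf(WireJob.class);
            conf.setMapperClass(IdentityMapper.class);
            conf.setReducerClass(IdentityReducer.class);
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);
            conf.setOutputKeyClass(LongWritable.class); // TextInputFormat keys
            conf.setOutputValueClass(Text.class);       // and line values
            return conf;
        }
    }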

14.

Point out the wrong statement.
(a) It is legal to set the number of reduce-tasks to zero if no reduction is desired
(b) The outputs of the map-tasks go directly to the FileSystem
(c) The MapReduce framework does not sort the map-outputs before writing them out to the FileSystem
(d) None of the mentioned

Answer: (d) None of the mentioned

Explanation: With zero reduces, the outputs of the map-tasks go directly to the FileSystem, into the output path set by setOutputPath(Path), without being sorted by the framework.

15.

____________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer.
(a) OutputCompactor
(b) OutputCollector
(c) InputCollector
(d) All of the mentioned

Answer: (b) OutputCollector

Explanation: Hadoop MapReduce comes bundled with a library of generally useful mappers, reducers, and partitioners.
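
A minimal sketch of a Mapper emitting pairs through the OutputCollector (the classic word-count pattern):

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class TokenMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                output.collect(word, ONE); // hand each pair to the framework
            }
        }
    }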

16.

Which of the following partitions the key space?
(a) Partitioner
(b) Compactor
(c) Collector
(d) All of the mentioned

Answer: (a) Partitioner

Explanation: Partitioner controls the partitioning of the keys of the intermediate map-outputs.
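
For instance, a hedged sketch of a custom Partitioner that routes keys by their first character (the logic is purely illustrative):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    public class FirstCharPartitioner implements Partitioner<Text, IntWritable> {
        public void configure(JobConf job) { } // no setup needed here

        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String s = key.toString();
            char c = s.isEmpty() ? '\0' : s.charAt(0);
            return (c & Integer.MAX_VALUE) % numPartitions;
        }
    }
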
17.

Point out the correct statement.
(a) The right number of reduces seems to be 0.95 or 1.75
(b) Increasing the number of reduces increases the framework overhead
(c) With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish
(d) All of the mentioned

Answer: (c) With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish

Explanation: With 1.75 the faster nodes will finish their first round of reduces and launch a second wave of reduces, doing a much better job of load balancing.
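
As a quick arithmetic sketch of that rule of thumb (the node and slot counts are made up):

    public class ReduceCount {
        public static void main(String[] args) {
            int nodes = 20;             // hypothetical cluster size
            int reduceSlotsPerNode = 2; // hypothetical slots per node
            // 0.95: a single wave, all reduces start as maps finish.
            int oneWave = (int) (0.95 * nodes * reduceSlotsPerNode);  // 38
            // 1.75: two waves, for better load balancing.
            int twoWaves = (int) (1.75 * nodes * reduceSlotsPerNode); // 70
            System.out.println(oneWave + " vs " + twoWaves);
            // A driver would then call conf.setNumReduceTasks(oneWave).
        }
    }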

18.

Which of the following is the default Partitioner for MapReduce?
(a) MergePartitioner
(b) HashedPartitioner
(c) HashPartitioner
(d) None of the mentioned

Answer: (c) HashPartitioner

Explanation: The total number of partitions is the same as the number of reduce tasks for the job.
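
Its essential logic can be sketched as follows (mirroring the stock implementation):

    import org.apache.hadoop.io.Text;

    public class HashDemo {
        public static void main(String[] args) {
            Text key = new Text("example");
            int numReduceTasks = 4;
            // Hash the key, clear the sign bit, then take the remainder
            // modulo the number of reduce tasks.
            int partition = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
            System.out.println("key goes to partition " + partition);
        }
    }
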
19.

The framework groups Reducer inputs by key in the _________ stage.
(a) sort
(b) shuffle
(c) reduce
(d) none of the mentioned

Answer: (a) sort

Explanation: The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.

20.

The number of reduces for the job is set by the user via _________
(a) JobConf.setNumTasks(int)
(b) JobConf.setNumReduceTasks(int)
(c) JobConf.setNumMapTasks(int)
(d) All of the mentioned

Answer: (b) JobConf.setNumReduceTasks(int)

Explanation: Reducer has 3 primary phases: shuffle, sort and reduce.

21.

The right level of parallelism for maps seems to be around _________ maps per node.
(a) 1-10
(b) 10-100
(c) 100-150
(d) 150-200

Answer: (b) 10-100

Explanation: Task setup takes a while, so it is best if the maps take at least a minute to execute.

22.

Applications can use the ____________ to report progress and set application-level status messages.
(a) Partitioner
(b) OutputSplit
(c) Reporter
(d) All of the mentioned

Answer: (c) Reporter

Explanation: Reporter is also used to update counters, or just to indicate that the tasks are alive.
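
A hedged sketch of a map method doing both; the counter enum is a made-up application counter:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class ReportingMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, LongWritable> {
        enum Stats { RECORDS } // hypothetical application counter

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, LongWritable> output,
                        Reporter reporter) throws IOException {
            reporter.incrCounter(Stats.RECORDS, 1);         // update a counter
            reporter.setStatus("processing offset " + key); // status message
            output.collect(value, key);
        }
    }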

23.

Point out the wrong statement.
(a) The Mapper outputs are sorted and then partitioned per Reducer
(b) The total number of partitions is the same as the number of reduce tasks for the job
(c) The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format
(d) None of the mentioned

Answer: (d) None of the mentioned

Explanation: All intermediate values associated with a given output key are subsequently grouped by the framework and passed to the Reducer(s) to determine the final output.

24.

Users can control which keys (and hence records) go to which Reducer by implementing a custom _________
(a) Partitioner
(b) OutputSplit
(c) Reporter
(d) All of the mentioned

Answer: (a) Partitioner

Explanation: Users can further control the grouping by specifying a Comparator via JobConf.setOutputKeyComparatorClass(Class).
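
Wiring both into a job might look like this sketch (FirstCharPartitioner is the illustrative class from question 16; Text.Comparator is a real library comparator):

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;

    public class GroupingSetup {
        public static void main(String[] args) {
            JobConf conf = new JobConf(GroupingSetup.class);
            // Route keys to reduces with a custom Partitioner.
            conf.setPartitionerClass(FirstCharPartitioner.class);
            // Control how keys are sorted/grouped at the reducer.
            conf.setOutputKeyComparatorClass(Text.Comparator.class);
        }
    }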

25.

The Hadoop MapReduce framework spawns one map task for each __________ generated by the InputFormat for the job.
(a) OutputSplit
(b) InputSplit
(c) InputSplitStream
(d) All of the mentioned

Answer: (b) InputSplit

Explanation: Mapper implementations are passed the JobConf for the job via the JobConfigurable.configure(JobConf) method and can override it to initialize themselves.
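
A hedged sketch of overriding configure(JobConf) to pick up a job parameter before any input is processed (the configuration key is made up):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    public class ConfigurableTask extends MapReduceBase {
        private boolean caseSensitive;

        @Override
        public void configure(JobConf job) {
            // Called once per task with the job's JobConf;
            // "example.case.sensitive" is a hypothetical key.
            caseSensitive = job.getBoolean("example.case.sensitive", true);
        }
    }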

26.

Point out the correct statement.
(a) Mapper maps input key/value pairs to a set of intermediate key/value pairs
(b) Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods
(c) Mapper and Reducer interfaces form the core of the job
(d) None of the mentioned

Answer: (d) None of the mentioned

Explanation: The transformed intermediate records do not need to be of the same type as the input records.

27.

The Mapper implementation processes one line at a time via the _________ method.
(a) map
(b) reduce
(c) mapper
(d) reducer

Answer: (a) map

Explanation: The Mapper outputs are sorted and then partitioned per Reducer.

28.

_________ is the main configuration file of HBase.
(a) hbase.xml
(b) hbase-site.xml
(c) hbase-site-conf.xml
(d) none of the mentioned

Answer: (b) hbase-site.xml

Explanation: Set the data directory to an appropriate location by editing hbase-site.xml in the HBase home folder (e.g. /usr/local/HBase).

29.

________ communicates with the client and handles data-related operations.
(a) Master Server
(b) Region Server
(c) HTable
(d) All of the mentioned

Answer: (b) Region Server

Explanation: A Region Server handles read and write requests for all the regions under it.

30.

The ________ class provides the getValue() method to read the values from its instance.
(a) Get
(b) Result
(c) Put
(d) Value

Answer: (b) Result

Explanation: Get the result by passing your Get class instance to the get method of the HTable class. This method returns a Result object, which holds the requested result.
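
A hedged sketch with the classic HTable client; the table, family, and qualifier names are made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ReadRow {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "employee"); // hypothetical table
            Get get = new Get(Bytes.toBytes("row1"));
            get.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"));
            Result result = table.get(get); // Result holds the requested row
            byte[] value = result.getValue(Bytes.toBytes("personal"),
                                           Bytes.toBytes("name"));
            System.out.println(Bytes.toString(value));
            table.close();
        }
    }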

31.

The __________ class adds HBase configuration files to its object.
(a) Configuration
(b) Collector
(c) Component
(d) None of the mentioned

Answer: (a) Configuration

Explanation: You can create a configuration object using the create() method of the HBaseConfiguration class.
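
For example (create() also picks up hbase-site.xml from the classpath):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ConfDemo {
        public static void main(String[] args) {
            // Loads hbase-default.xml and hbase-site.xml into the object.
            Configuration conf = HBaseConfiguration.create();
            System.out.println(conf.get("hbase.zookeeper.quorum"));
        }
    }
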
32.

Point out the wrong statement.
(a) To read data from an HBase table, use the get() method of the HTable class
(b) You can retrieve data from the HBase table using the get() method of the HTable class
(c) While retrieving data, you can get a single row by id, get a set of rows by a set of row ids, or scan an entire table or a subset of rows
(d) None of the mentioned

Answer: (d) None of the mentioned

Explanation: You select which parts of an HBase table to retrieve using the add method variants (such as addFamily and addColumn) of the Get class.

33.

You can delete a column family from a table using the _________ method of the HBaseAdmin class.
(a) delColumn()
(b) removeColumn()
(c) deleteColumn()
(d) all of the mentioned

Answer: (c) deleteColumn()

Explanation: The alter command can also be used to delete a column family.
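
A hedged sketch (the table and family names are made up; older HBase releases require disabling the table first):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class DropFamily {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            admin.disableTable("employee");            // hypothetical table
            admin.deleteColumn("employee", "contact"); // drop the family
            admin.enableTable("employee");
            admin.close();
        }
    }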

34.

Which of the following is not a table scope operator?
(a) MEMSTORE_FLUSH
(b) MEMSTORE_FLUSHSIZE
(c) MAX_FILESIZE
(d) All of the mentioned

Answer: (a) MEMSTORE_FLUSH

Explanation: Using alter, you can set and remove table scope operators such as MAX_FILESIZE, READONLY, MEMSTORE_FLUSHSIZE, DEFERRED_LOG_FLUSH, etc.

35.

Point out the correct statement.
(a) You can add a column family to a table using the method addColumn()
(b) Using alter, you can also create a column family
(c) Using disable-all, you can truncate a column family
(d) None of the mentioned

Answer: (a) You can add a column family to a table using the method addColumn()

Explanation: Columns can also be added through HBaseAdmin.
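
Correspondingly, a hedged sketch of adding a family through HBaseAdmin (names are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class AddFamily {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            admin.disableTable("employee"); // hypothetical table
            admin.addColumn("employee", new HColumnDescriptor("contact"));
            admin.enableTable("employee");
            admin.close();
        }
    }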

36.

_______ can change the maximum number of cells of a column family.
(a) set
(b) reset
(c) alter
(d) select

Answer: (c) alter

Explanation: alter is the command used to make changes to an existing table.
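
In the Java client, the shell's alter corresponds to modifying the column descriptor; a hedged sketch (names made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class AlterFamily {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            HColumnDescriptor col = new HColumnDescriptor("contact");
            col.setMaxVersions(5); // max number of cell versions retained
            admin.modifyColumn("employee", col);
            admin.close();
        }
    }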