Explore topic-wise InterviewSolutions in Hadoop (HDFS – Hadoop Distributed File System).

This section includes InterviewSolutions, each offering curated multiple-choice questions to sharpen your knowledge and support exam preparation. Choose a topic below to get started.

1.

_________ is a pluggable Map/Reduce scheduler for Hadoop which provides a way to share large clusters. (a) Flow Scheduler (b) Data Scheduler (c) Capacity Scheduler (d) None of the mentioned (Topic: Hadoop Archives)

Answer»

The right choice is (c) Capacity Scheduler.

To elaborate: the Capacity Scheduler supports multiple queues, and a job is submitted to a particular queue.
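
Queue selection with the Capacity Scheduler is driven by the queue name attached to the job. A minimal sketch in Java, assuming a queue named "analytics" has already been defined in the scheduler configuration (the queue name and the rest of the job setup are illustrative assumptions; on older MR1 clusters the property is mapred.job.queue.name rather than mapreduce.job.queuename):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class QueueSubmission {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // "analytics" is a hypothetical queue assumed to exist in the
            // Capacity Scheduler configuration (capacity-scheduler.xml).
            conf.set("mapreduce.job.queuename", "analytics");
            Job job = Job.getInstance(conf, "queue-submission-example");
            // ... set mapper/reducer/input/output as usual, then submit:
            // job.waitForCompletion(true);
        }
    }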

2.

The __________ guarantees that excess resources taken from a queue will be restored to it within N minutes of its need for them. (a) capacitor (b) scheduler (c) datanode (d) none of the mentioned (Topic: Hadoop Archives)

Answer»

The right option is (b) scheduler.

Easiest explanation: free resources can be allocated to any queue beyond its guaranteed capacity.

3.

Point out the wrong statement. (a) The Hadoop archive exposes itself as a file system layer (b) Hadoop archives are immutable (c) Archive renames, deletes and creates return an error (d) None of the mentioned (Topic: Hadoop Archives)

Answer»

The correct answer is (d) None of the mentioned.

Best explanation: all the FS shell commands work on archives, but with a different URI.

4.

Using Hadoop Archives in __________ is as easy as specifying a different input filesystem than the default file system. (a) Hive (b) Pig (c) MapReduce (d) All of the mentioned (Topic: Hadoop Archives)

Answer»

The correct choice is (c) MapReduce.

Easy explanation: because a Hadoop archive is exposed as a file system, MapReduce can use all the logical input files in the archive as input.
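
A minimal sketch of pointing a MapReduce job at an archive instead of a plain HDFS directory; the archive and output paths are hypothetical and the rest of the job setup is omitted:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class HarAsInput {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "har-as-input");
            // Hypothetical archive created earlier with the archive tool; the
            // har:// URI makes the archive look like just another input file system.
            FileInputFormat.addInputPath(job, new Path("har:///user/demo/input.har"));
            FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));
            // ... set mapper/reducer classes as for any other job, then submit.
        }
    }

From the job's point of view the har:// path behaves like any other input file system.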

5.

_________ is the name of the archive you would like to create. (a) archive (b) archiveName (c) name (d) none of the mentioned (Topic: Hadoop Archives)

Answer»

The correct option is (b) archiveName.

Easy explanation: the name should have a *.har extension.

6.

Point out the correct statement. (a) A Hadoop archive maps to a file system directory (b) Hadoop archives are special format archives (c) A Hadoop archive always has a *.har extension (d) All of the mentioned (Topic: Hadoop Archives)

Answer»

The correct choice is (d) All of the mentioned.

For explanation: a Hadoop archive directory contains metadata (in the form of _index and _masterindex) and data (part-*) files.
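
A small sketch of browsing an archive through the har:// scheme; the archive path is hypothetical. The listing shows the logical files stored in the archive, not the underlying _index, _masterindex and part-* files:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListHarContents {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical archive; the har:// scheme layers the archive
            // over the default underlying file system.
            URI harUri = URI.create("har:///user/demo/input.har");
            FileSystem harFs = FileSystem.get(harUri, conf);
            for (FileStatus status : harFs.listStatus(new Path(harUri))) {
                System.out.println(status.getPath());
            }
        }
    }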

7.

__________ controls the partitioning of the keys of the intermediate map-outputs. (a) Collector (b) Partitioner (c) InputFormat (d) None of the mentioned (Topic: Data Flow)

Answer» The correct choice is (b) Partitioner.

The explanation is: the output of the mapper is sent to the partitioner.
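
A minimal custom Partitioner sketch; the routing rule is purely illustrative (by default Hadoop uses HashPartitioner):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Routes keys to reduce tasks by the first character of the key.
    public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numReduceTasks) {
            if (numReduceTasks == 0) {
                return 0;
            }
            String k = key.toString();
            char first = k.isEmpty() ? '\0' : k.charAt(0);
            return Character.toLowerCase(first) % numReduceTasks;
        }
    }

It would be wired into a job with job.setPartitionerClass(FirstLetterPartitioner.class).
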
8.

On a tasktracker, the map task passes the split to the createRecordReader() method on InputFormat to obtain a _________ for that split. (a) InputReader (b) RecordReader (c) OutputReader (d) None of the mentioned (Topic: Data Flow)

Answer» The right choice is (b) RecordReader.

To explain, I would say: the RecordReader loads data from its source and converts it into key-value pairs suitable for reading by the mapper.

9.

The default InputFormat is __________ which treats each line of the input as a new value, with the byte offset as the associated key. (a) TextFormat (b) TextInputFormat (c) InputFormat (d) All of the mentioned (Topic: Data Flow)

Answer»

The right option is (b) TextInputFormat.

Explanation: a RecordReader is little more than an iterator over records, and the map task uses one to generate record key-value pairs.
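
A small mapper sketch showing the key/value types the default TextInputFormat delivers (the byte offset as a LongWritable key and the line as a Text value); what the mapper emits here is purely illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // With TextInputFormat, the RecordReader hands the mapper (offset, line) pairs.
    public class LineLengthMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Emit each line together with its length in bytes.
            context.write(new Text(line), new LongWritable(line.getLength()));
        }
    }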

10.

The InputFormat class calls the ________ function, which computes the splits for each file and then sends them to the jobtracker. (a) puts (b) gets (c) getSplits (d) all of the mentioned (Topic: Data Flow)

Answer» The right answer is (c) getSplits.

Easy explanation: the storage locations of the splits are used to schedule map tasks to process them on the tasktrackers.

11.

Point out the wrong statement. (a) The map function in Hadoop MapReduce has the following general form: map: (K1, V1) -> list(K2, V2) (b) The reduce function in Hadoop MapReduce has the following general form: reduce: (K2, list(V2)) -> list(K3, V3) (c) MapReduce has a complex model of data processing: inputs and outputs for the map and reduce functions are key-value pairs (d) None of the mentioned (Topic: Data Flow)

Answer»

The correct option is (c) MapReduce has a complex model of data processing: inputs and outputs for the map and reduce functions are key-value pairs.

Easy explanation: MapReduce is a relatively simple model to implement in Hadoop.
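
The classic word count illustrates those general forms. A minimal sketch where K1 is the byte offset (LongWritable), V1 is the line (Text), K2/K3 are words (Text) and V2/V3 are counts (IntWritable):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // map: (K1, V1) -> list(K2, V2)
    class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    context.write(new Text(token), ONE);
                }
            }
        }
    }

    // reduce: (K2, list(V2)) -> list(K3, V3)
    class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }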

12.

The JobTracker pushes work out to available _______ nodes in the cluster, striving to keep the work as close to the data as possible. (a) DataNodes (b) TaskTracker (c) ActionNodes (d) All of the mentioned (Topic: Data Flow)

Answer»

The right choice is (b) TaskTracker.

Best explanation: a heartbeat is sent from the TaskTracker to the JobTracker periodically so the JobTracker can tell whether the node is alive or dead.

13.

The daemons associated with the MapReduce phase are ________ and task-trackers. (a) job-tracker (b) map-tracker (c) reduce-tracker (d) all of the mentioned (Topic: Data Flow)

Answer» The right choice is (a) job-tracker.

Explanation: MapReduce jobs are submitted to the job-tracker.

14.

Point out the correct statement. (a) Data locality means movement of the algorithm to the data instead of data to the algorithm (b) When the processing is done on the data, the algorithm is moved across the Action Nodes rather than data to the algorithm (c) Moving computation is more expensive than moving data (d) None of the mentioned (Topic: Data Flow)

Answer»

The correct answer is (a) Data locality means movement of the algorithm to the data instead of data to the algorithm.

Best explanation: the MapReduce data flow exploits data locality by scheduling computation on the nodes where the data already resides.

15.

________ is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks. (a) Hive (b) MapReduce (c) Pig (d) Lucene (Topic: Data Flow)

Answer»

The right option is (b) MapReduce.

Easiest explanation: MapReduce is the heart of Hadoop.

16.

Applications can use the _________ provided to report progress or just indicate that they are alive. (a) Collector (b) Reporter (c) Dashboard (d) None of the mentioned (Topic: Java Interface)

Answer»

The right choice is (b) Reporter.

To explain, I would say: in scenarios where the application takes a significant amount of time to process individual key/value pairs, this is crucial, since otherwise the framework might assume that the task has timed out and kill it.
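
A minimal sketch using the old org.apache.hadoop.mapred API, where the Reporter is passed straight into map(); the expensive per-record work is an assumed placeholder:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Mapper that reports progress while doing slow per-record work,
    // so the framework does not assume the task has timed out.
    public class SlowRecordMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        public void map(LongWritable offset, Text line,
                        OutputCollector<Text, LongWritable> output, Reporter reporter)
                throws IOException {
            // ... hypothetical expensive processing of `line` would go here ...
            reporter.setStatus("processing offset " + offset.get());
            reporter.progress();  // tell the framework this task is still alive
            output.collect(new Text(line), new LongWritable(offset.get()));
        }
    }

Calling reporter.progress() (or emitting a record) resets the task's timeout clock.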

17.

The output of the reduce task is typically written to the FileSystem via ____________ (a) OutputCollector (b) InputCollector (c) OutputCollect (d) All of the mentioned (Topic: Java Interface)

Answer»

The correct answer is (a) OutputCollector.

Explanation: in the reduce phase, the reduce(Object, Iterator, OutputCollector, Reporter) method is called for each <key, (list of values)> pair in the grouped inputs.
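
A minimal old-API (org.apache.hadoop.mapred) reducer sketch; the summing logic is illustrative, the point is that results reach the FileSystem through the OutputCollector handed to reduce():

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class SumValuesReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            // Written via the OutputCollector, which the framework backs
            // with the job's configured output format on the FileSystem.
            output.collect(key, new IntWritable(sum));
        }
    }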

18.

Reducer is input the grouped output of a ____________ (a) Mapper (b) Reducer (c) Writable (d) Readable (Topic: Java Interface)

Answer»

The correct answer is (a) Mapper.

To explain, I would say: in the shuffle phase the framework fetches, for each Reducer, the relevant partition of the output of all the Mappers via HTTP.

19.

Interface ____________ reduces a set of intermediate values which share a key to a smaller set of values. (a) Mapper (b) Reducer (c) Writable (d) Readable (Topic: Java Interface)

Answer»

The correct answer is (b) Reducer.

The explanation: Reducer implementations can access the JobConf for the job.

20.

Point out the wrong statement. (a) The framework calls the reduce method for each pair in the grouped inputs (b) The output of the Reducer is re-sorted (c) The reduce method reduces values for a given key (d) None of the mentioned (Topic: Java Interface)

Answer» The right choice is (b) The output of the Reducer is re-sorted.

The best explanation: the output of the Reducer is not re-sorted.

21.

_____________ is used to read data from byte buffers. (a) write() (b) read() (c) readwrite() (d) all of the mentioned (Topic: Java Interface)

Answer» The correct answer is (b) read().

To explain, I would say: the readFully() method can also be used instead of read().
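
A short sketch of reading HDFS data into a byte buffer with FSDataInputStream; the path is hypothetical, and the positioned readFully() call assumes the file is at least as long as the buffer:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadIntoBuffer {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            byte[] buffer = new byte[4096];
            // Hypothetical HDFS path, for illustration only.
            try (FSDataInputStream in = fs.open(new Path("/user/demo/data.txt"))) {
                // read() returns however many bytes were available (possibly fewer).
                int bytesRead = in.read(buffer, 0, buffer.length);
                // readFully() reads exactly buffer.length bytes from position 0;
                // it throws EOFException if the file is shorter (assumed not here).
                in.readFully(0, buffer, 0, buffer.length);
                System.out.println("first read returned " + bytesRead + " bytes");
            }
        }
    }
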
22.

______________ provides a method to copy bytes from an input stream to any other stream in Hadoop. (a) IOUtils (b) Utils (c) IUtils (d) All of the mentioned (Topic: Java Interface)

Answer»

The correct answer is (a) IOUtils.

The explanation: the static copyBytes() method of the IOUtils class copies data from one stream to another.

23.

In order to read any file in HDFS, an instance of __________ is required. (a) filesystem (b) datastream (c) outstream (d) inputstream (Topic: Java Interface)

Answer» The correct option is (a) filesystem.

The best explanation: FileSystem.open() returns an FSDataInputStream, which is then used to read data from the file.
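
Putting the last two questions together, a minimal sketch that obtains a FileSystem instance, opens the file, and streams it to stdout with IOUtils.copyBytes(); the file path and namenode URI are hypothetical:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class CatHdfsFile {
        public static void main(String[] args) throws Exception {
            // Hypothetical file and namenode URI, for illustration only.
            String uri = "hdfs://localhost:9000/user/demo/data.txt";
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(uri), conf);  // the required FileSystem instance
            FSDataInputStream in = null;
            try {
                in = fs.open(new Path(uri));                        // stream used to read the file
                IOUtils.copyBytes(in, System.out, 4096, false);     // copy from one stream to another
            } finally {
                IOUtils.closeStream(in);
            }
        }
    }
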
24.

Point out the correct statement. (a) The framework groups Reducer inputs by keys (b) The shuffle and sort phases occur simultaneously, i.e. while outputs are being fetched they are merged (c) Since JobConf.setOutputKeyComparatorClass(Class) can be used to control how intermediate keys are grouped, these can be used in conjunction to simulate secondary sort on values (d) All of the mentioned (Topic: Java Interface)

Answer»

The correct answer is (d) All of the mentioned.

For explanation I would say: if the equivalence rules for grouping the intermediate keys are different from those for grouping keys before reduction, then one may specify a Comparator via JobConf.setOutputValueGroupingComparator(Class).
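
A sketch of that secondary-sort wiring with the old JobConf API. The composite-key layout (a Text of the form "naturalKey<TAB>valuePart") and both comparator classes are illustrative assumptions; the two setter calls at the bottom are the mechanism the question refers to:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.mapred.JobConf;

    public class SecondarySortConfig {

        // Orders the whole composite key, so values arrive at the reducer sorted.
        public static class FullKeyComparator extends WritableComparator {
            public FullKeyComparator() { super(Text.class, true); }
            @Override
            public int compare(WritableComparable a, WritableComparable b) {
                return a.toString().compareTo(b.toString());
            }
        }

        // Groups reducer input only by the natural key (the part before the tab).
        public static class NaturalKeyGroupingComparator extends WritableComparator {
            public NaturalKeyGroupingComparator() { super(Text.class, true); }
            @Override
            public int compare(WritableComparable a, WritableComparable b) {
                String ka = a.toString().split("\t", 2)[0];
                String kb = b.toString().split("\t", 2)[0];
                return ka.compareTo(kb);
            }
        }

        public static void configure(JobConf conf) {
            conf.setOutputKeyComparatorClass(FullKeyComparator.class);                // full sort order
            conf.setOutputValueGroupingComparator(NaturalKeyGroupingComparator.class); // grouping for reduce()
        }
    }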

25.

For ________ the HBase Master UI provides information about the HBase Master uptime. (a) HBase (b) Oozie (c) Kafka (d) All of the mentioned (Topic: Introduction to HDFS)

Answer»

The correct choice is (a) HBase.

Easy explanation: the HBase Master UI provides information about the number of live, dead and transitional servers, logs, ZooKeeper information, debug dumps, and thread stacks.

26.

During start up, the ___________ loads the file system state from the fsimage and the edits log file. (a) DataNode (b) NameNode (c) ActionNode (d) None of the mentioned (Topic: Introduction to HDFS)

Answer»

The right choice is (b) NameNode.

Explanation: HDFS is implemented in Java; any computer that can run Java can host a NameNode or DataNode.

27.

For YARN, the ___________ Manager UI provides host and port information. (a) Data Node (b) NameNode (c) Resource (d) Replication (Topic: Introduction to HDFS)

Answer» The correct option is (c) Resource.

The explanation: all the metadata related to HDFS, including information about the data nodes, files stored on HDFS, replication, and so on, is stored and maintained on the NameNode.

28.

Point out the correct statement. (a) The Hadoop framework publishes the job flow status to an internally running web server on the master nodes of the Hadoop cluster (b) Each incoming file is broken into 32 MB by default (c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance (d) None of the mentioned (Topic: Introduction to HDFS)

Answer»

The right answer is (a) The Hadoop framework publishes the job flow status to an internally running web server on the master nodes of the Hadoop cluster.

Best explanation: the web interface for the Hadoop Distributed File System (HDFS) shows information about the NameNode itself.

29.

HDFS is implemented in _____________ programming language. (a) C++ (b) Java (c) Scala (d) None of the mentioned (Topic: Introduction to HDFS)

Answer» The correct option is (b) Java.

To explain, I would say: HDFS is implemented in Java, and any computer which can run Java can host a NameNode/DataNode on it.

30.

HDFS provides a command line interface called __________ used to interact with HDFS. (a) “HDFS Shell” (b) “FS Shell” (c) “DFS Shell” (d) None of the mentioned (Topic: Introduction to HDFS)

Answer»

The correct option is (b) “FS Shell”.

For explanation: the File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS).
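
The same commands are normally typed at the command line (for example, hadoop fs -ls /user/demo), but the FS shell is also an ordinary Hadoop Tool and can be driven from Java. A small sketch with a hypothetical path:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FsShell;
    import org.apache.hadoop.util.ToolRunner;

    public class FsShellFromJava {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Equivalent to running `hadoop fs -ls /user/demo`; the path is hypothetical.
            int exitCode = ToolRunner.run(conf, new FsShell(conf), new String[] {"-ls", "/user/demo"});
            System.exit(exitCode);
        }
    }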

31.

________ is the slave/worker node and holds the user data in the form of Data Blocks. (a) DataNode (b) NameNode (c) Data block (d) Replication (Topic: Introduction to HDFS)

Answer»

The correct choice is (a) DataNode.

To explain, I would say: a DataNode stores data in the Hadoop file system. A functional filesystem has more than one DataNode, with data replicated across them.

32.

The need for data replication can arise in various scenarios like ____________ (a) Replication Factor is changed (b) DataNode goes down (c) Data Blocks get corrupted (d) All of the mentioned (Topic: Introduction to HDFS)

Answer» The right answer is (d) All of the mentioned.

Best explanation: data is replicated across different DataNodes to ensure a high degree of fault tolerance.
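
Scenario (a), a changed replication factor, can be triggered per file through the FileSystem API. A minimal sketch with a hypothetical path; the cluster-wide default comes from dfs.replication, normally 3:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ChangeReplication {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Raise the replication factor of one (hypothetical) file to 5;
            // the NameNode then schedules the extra block copies on other DataNodes.
            boolean changed = fs.setReplication(new Path("/user/demo/important.dat"), (short) 5);
            System.out.println("replication change accepted: " + changed);
        }
    }
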
33.

Which of the following scenarios may not be a good fit for HDFS? (a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file (b) HDFS is suitable for storing data related to applications requiring low latency data access (c) HDFS is suitable for storing data related to applications requiring low latency data access (d) None of the mentioned (Topic: Introduction to HDFS)

Answer»

The correct answer is (a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file.

To elaborate: HDFS can be used for storing archive data since it is cheaper; HDFS allows storing the data on low-cost commodity hardware while ensuring a high degree of fault tolerance.

34.

Point out the wrong statement. (a) Replication Factor can be configured at a cluster level (default is set to 3) and also at a file level (b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode (c) User data is stored on the local file system of DataNodes (d) DataNode is aware of the files to which the blocks stored on it belong (Topic: Introduction to HDFS)

Answer»

The right choice is (d) DataNode is aware of the files to which the blocks stored on it belong.

For explanation I would say: it is the NameNode, not the DataNode, that knows which files the blocks stored on a DataNode belong to.

35.

Point out the correct statement. (a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks (b) Each incoming file is broken into 32 MB by default (c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance (d) None of the mentioned (Topic: Introduction to HDFS)

Answer»

The correct choice is (a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks.

Explanation: there can be any number of DataNodes in a Hadoop cluster.

36.

________ NameNode is used when the Primary NameNode goes down. (a) Rack (b) Data (c) Secondary (d) None of the mentioned (Topic: Introduction to HDFS)

Answer»

The right option is (c) Secondary.

The best explanation: the Secondary NameNode is used to improve availability and reliability.

37.

HDFS works in a __________ fashion. (a) master-worker (b) master-slave (c) worker/slave (d) all of the mentioned (Topic: Introduction to HDFS)

Answer»

The correct option is (a) master-worker.

Explanation: the NameNode serves as the master, and each DataNode serves as a worker/slave.

38.

A ________ serves as the master and there is only one NameNode per cluster. (a) Data Node (b) NameNode (c) Data block (d) Replication (Topic: Introduction to HDFS)

Answer» The correct option is (b) NameNode.

To elaborate: all the metadata related to HDFS, including information about the data nodes, files stored on HDFS, replication, and so on, is stored and maintained on the NameNode.