InterviewSolution
This section offers curated multiple-choice questions on the Hadoop ecosystem, with answers and brief explanations, to sharpen your knowledge and support exam preparation.

1. Point out the correct statement.
(a) YARN also extends the power of Hadoop to incumbent and new technologies found within the data center
(b) YARN is the central point of investment for Hortonworks within the Apache community
(c) YARN enhances a Hadoop compute cluster in many ways
(d) All of the mentioned
Answer: (d) All of the mentioned
Explanation: YARN gives ISVs and developers a consistent framework for writing data-access applications that run in Hadoop.

2. ________ is the architectural center of Hadoop that allows multiple data processing engines.
(a) YARN
(b) Hive
(c) Incubator
(d) Chukwa
Answer: (a) YARN
Explanation: YARN is the prerequisite for Enterprise Hadoop, providing resource management and a central platform to deliver consistent operations, security, and data governance tools across Hadoop clusters.

3. ___________ is used to decommission more than one RegionServer at a time by creating sub-znodes.
(a) /hbase/master
(b) /hbase/draining
(c) /hbase/passive
(d) None of the mentioned
Answer: (b) /hbase/draining
Explanation: /hbase/draining lets you decommission multiple RegionServers without the risk of regions being temporarily moved to a RegionServer that will itself be decommissioned later.

4. ZooKeeper keeps track of cluster state, such as the ______ table location.
(a) DOMAIN
(b) NODE
(c) ROOT
(d) All of the mentioned
Answer: (c) ROOT
Explanation: ZooKeeper stores the location of the ROOT catalog table and also tracks the list of online RegionServers and unassigned regions.

5. _______ has a design policy of using ZooKeeper only for transient data.
(a) Hive
(b) Impala
(c) HBase
(d) Oozie
Answer: (c) HBase
Explanation: If the HBase ZooKeeper data is removed, only transient operations are affected; data can continue to be written to and read from HBase.

6. A ___________ server is a machine that keeps a copy of the state of the entire system and persists this information in local log files.
(a) Master
(b) Region
(c) ZooKeeper
(d) All of the mentioned
Answer: (c) ZooKeeper
Explanation: A very large Hadoop cluster can be supported by multiple ZooKeeper servers.

7. The ________ Master will register its own address in this znode at startup, making this znode the source of truth for identifying which server is the Master.
(a) active
(b) passive
(c) region
(d) all of the mentioned
Answer: (a) active
Explanation: Each inactive Master registers itself as a backup Master by creating a sub-znode.

8. Point out the wrong statement.
(a) All the znodes are prefixed using the default /hbase location
(b) ZooKeeper provides an interactive shell that allows you to explore the ZooKeeper state
(c) The znodes that you'll most often see are the ones that coordinate operations like Region Assignment
(d) All of the mentioned
Answer: (d) All of the mentioned
Explanation: The HBase root znode path is configurable via hbase-site.xml; by default it is /hbase.

9. When a _______ is triggered, the client receives a packet saying that the znode has changed.
(a) event
(b) watch
(c) row
(d) value
Answer: (b) watch
Explanation: ZooKeeper supports the concept of watches: clients can set a watch on a znode and are notified when it changes.

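For context, a minimal sketch of registering a watch with the ZooKeeper Java client; the connection string, session timeout, and znode path are illustrative assumptions, not from the original text.

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeWatchExample {
    public static void main(String[] args) throws Exception {
        // Connect to ZooKeeper; host:port and timeout are placeholder values.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new Watcher() {
            public void process(WatchedEvent event) {
                // Called when the watched znode changes.
                System.out.println("Got " + event.getType() + " on " + event.getPath());
            }
        });
        // Passing true registers the session's default watcher on this znode's data.
        zk.getData("/hbase/master", true, null);
        Thread.sleep(60000); // keep the session alive long enough to receive the packet
    }
}
```

Note that watches are one-shot: once triggered, a watch must be re-registered to receive further notifications.
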
10. Which of the following commands does not operate on tables?
(a) enabled
(b) disabled
(c) drop
(d) all of the mentioned
Answer: (b) disabled
Explanation: The related is_disabled command merely verifies whether a table is disabled; it does not modify the table.

11. You can run Pig in interactive mode using the ______ shell.
(a) Grunt
(b) FS
(c) HDFS
(d) None of the mentioned
Answer: (a) Grunt
Explanation: Invoke the Grunt shell using the "pig" command (for example, $ pig -x local) and then enter your Pig Latin statements and Pig commands interactively at the command line.

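As a rough programmatic counterpart, Pig's Java API (PigServer) runs the same statements you would type at the Grunt prompt; the input file and schema below are assumptions for illustration.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class GruntLikeExample {
    public static void main(String[] args) throws Exception {
        // Local mode, equivalent to starting Grunt with "pig -x local".
        PigServer pig = new PigServer(ExecType.LOCAL);
        pig.registerQuery("A = LOAD 'input.txt' AS (line:chararray);");
        // store() triggers execution, just as STORE (or DUMP) does in Grunt.
        pig.store("A", "output_dir");
    }
}
```
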
12. A ________ is a way of extending Ambari that allows 3rd parties to plug in new resource types along with the APIs.
(a) trigger
(b) view
(c) schema
(d) none of the mentioned
Answer: (b) view
Explanation: A view is an application that is deployed into the Ambari container.

13. The ________ class mimics the behavior of the Main class but gives users a statistics object back.
(a) PigRun
(b) PigRunner
(c) RunnerPig
(d) None of the mentioned
Answer: (b) PigRunner
Explanation: Optionally, you can call the API with an implementation of a progress listener, which the Pig runtime will invoke during execution (a minimal sketch follows).

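A minimal sketch of calling PigRunner, assuming a local script named myscript.pig; passing null skips the optional progress listener mentioned above.

```java
import org.apache.pig.PigRunner;
import org.apache.pig.tools.pigstats.PigStats;

public class PigRunnerExample {
    public static void main(String[] args) {
        // The same arguments the "pig" command-line tool accepts.
        String[] pigArgs = { "-x", "local", "myscript.pig" };
        // PigRunner mimics Main but hands back a PigStats statistics object.
        PigStats stats = PigRunner.run(pigArgs, null);
        System.out.println("Succeeded: " + stats.isSuccessful());
    }
}
```
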
14. Point out the wrong statement.
(a) HCatalog is a table and storage management layer for Hadoop that enables users with different data processing tools
(b) There is a Hive-specific interface for HCatalog
(c) Data is defined using HCatalog's command line interface (CLI)
(d) All of the mentioned
Answer: (b) There is a Hive-specific interface for HCatalog
Explanation: Since HCatalog uses Hive's metastore, Hive can read data in HCatalog directly, so no Hive-specific interface is needed.

15. To register a "watch" on znode data, you need to use the _______ command to access the current content or metadata.
(a) stat
(b) put
(c) receive
(d) gets
Answer: (a) stat
Explanation: ZooKeeper can also notify you of changes to a znode's content or changes to a znode's children.

16. _________ stores its metadata on multiple disks that typically include a non-local file server.
(a) DataNode
(b) NameNode
(c) ActionNode
(d) None of the mentioned
Answer: (b) NameNode
Explanation: HDFS also tolerates failures of the storage servers (called DataNodes) and their disks.

17. Point out the correct statement.
(a) Hive commands are non-SQL statements such as setting a property or adding a resource
(b) Set -v prints a list of configuration variables that are overridden by the user or Hive
(c) Set sets a list of variables that are overridden by the user or Hive
(d) None of the mentioned
Answer: (a) Hive commands are non-SQL statements such as setting a property or adding a resource
Explanation: Commands can be used in HiveQL scripts or directly in the CLI or Beeline.

18. The __________ abstract class has three main methods for loading data, and for most use cases it would suffice to extend it.
(a) Load
(b) LoadFunc
(c) FuncLoad
(d) None of the mentioned
Answer: (b) LoadFunc
Explanation: LoadFunc and StoreFunc implementations should use the Hadoop 0.20 API based classes (InputFormat/OutputFormat and friends); a bare-bones sketch follows.

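A bare-bones LoadFunc sketch, under the assumption that the loader reads text files and does no real work yet; a production loader would convert each record to a tuple in getNext(). It also shows setUdfContextSignature(), which comes up again in question 43.

```java
import java.io.IOException;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;

public class MyLoader extends LoadFunc {
    private String signature;

    @Override
    public void setUdfContextSignature(String signature) {
        // Called by Pig on both the front end and back end with the same
        // unique signature; useful as a key for stashing state in the UDFContext.
        this.signature = signature;
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        // Tell Hadoop where the input lives.
        FileInputFormat.setInputPaths(job, location);
    }

    @Override
    public InputFormat getInputFormat() throws IOException {
        // Delegate splitting and record reading to a Hadoop InputFormat.
        return new TextInputFormat();
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
        // Keep a handle to the reader here if getNext() needs it.
    }

    @Override
    public Tuple getNext() throws IOException {
        // A real loader returns one tuple per call; null signals end of data.
        return null;
    }
}
```
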
19. Point out the wrong statement.
(a) ILLUSTRATE operator is used to review how data is transformed through a sequence of Pig Latin statements
(b) ILLUSTRATE is based on an example generator
(c) Several new private classes make it harder for external tools such as Oozie to integrate with Pig statistics
(d) None of the mentioned
Answer: (c) Several new private classes make it harder for external tools such as Oozie to integrate with Pig statistics
Explanation: Several new public classes make it easier for external tools such as Oozie to integrate with Pig statistics.

20. $ pig -x tez_local … will enable ________ mode in Pig.
(a) Mapreduce
(b) Tez
(c) Local
(d) None of the mentioned
Answer: (d) None of the mentioned
Explanation: Tez local mode is similar to local mode, except that internally Pig invokes the Tez runtime engine.

21. Which of the following will remove the resource(s) from the distributed cache?
(a) delete FILE[S] <filepath>*
(b) delete JAR[S] <filepath>*
(c) delete ARCHIVE[S] <filepath>*
(d) all of the mentioned
Answer: (d) all of the mentioned
Explanation: The delete command is used to remove existing resources from the distributed cache.

22. The __________ command disables, drops, and recreates a table.
(a) drop
(b) truncate
(c) delete
(d) none of the mentioned
Answer: (b) truncate
Explanation: The syntax of truncate is: hbase> truncate 'tablename' (a Java-API equivalent follows).

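For comparison, roughly the same disable-then-truncate sequence through the HBase Java client API; the table name is an assumption.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class TruncateExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName table = TableName.valueOf("mytable"); // placeholder name
            admin.disableTable(table);          // the table must be disabled first
            admin.truncateTable(table, false);  // false = do not preserve region splits
        }
    }
}
```
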
23. Which command is used to disable all the tables matching the given regex?
(a) remove all
(b) drop all
(c) disable_all
(d) all of the mentioned
Answer: (c) disable_all
Explanation: The syntax for the disable_all command is: hbase> disable_all 'r.*'

24. Point out the correct statement.
(a) During the testing phase of your implementation, you can use LOAD to display results to your terminal screen
(b) You can view outer relations as well as relations defined in a nested FOREACH statement
(c) Hadoop properties are interpreted by Pig
(d) None of the mentioned
Answer: (b) You can view outer relations as well as relations defined in a nested FOREACH statement
Explanation: Viewing outer relations is possible using the DESCRIBE operator.

25. Which of the following commands is used to show values of keys used in Pig?
(a) set
(b) declare
(c) display
(d) all of the mentioned
Answer: (a) set
Explanation: All Pig and Hadoop properties can be set, either in the Pig script or via the Grunt command line.

26. Amazon EC2 provides virtual computing environments, known as __________
(a) chunks
(b) instances
(c) messages
(d) none of the mentioned
Answer: (b) instances
Explanation: Using Amazon EC2 eliminates your need to invest in hardware up front.

27. Point out the correct statement.
(a) Hadoop is ideal for the analytical, post-operational, data-warehouse-ish type of workload
(b) HDFS runs on a small cluster of commodity-class nodes
(c) NEWSQL is frequently the collection point for big data
(d) None of the mentioned
Answer: (a) Hadoop is ideal for the analytical, post-operational, data-warehouse-ish type of workload
Explanation: Together with a relational data warehouse, Hadoop can form a very effective data warehouse infrastructure.

28. EC2 can serve as a practically unlimited set of ___________ machines.
(a) virtual
(b) real
(c) distributed
(d) all of the mentioned
Answer: (a) virtual
Explanation: To use EC2, a subscriber creates an Amazon Machine Image (AMI) containing the operating system, application programs, and configuration settings.

29. Point out the wrong statement.
(a) The load/store UDFs control how data goes into Pig and comes out of Pig
(b) LoadCaster has methods to convert byte arrays to specific types
(c) The meaning of getNext() has changed and is called by Pig runtime to get the last tuple in the data
(d) None of the mentioned
Answer: (c) The meaning of getNext() has changed and is called by Pig runtime to get the last tuple in the data
Explanation: The meaning of getNext() has not changed; it is called by the Pig runtime to get the next tuple in the data.

30. The HCatalog interface for Pig consists of ____________ and HCatStorer, which implement the Pig load and store interfaces respectively.
(a) HCLoader
(b) HCatLoader
(c) HCatLoad
(d) None of the mentioned
Answer: (b) HCatLoader
Explanation: HCatLoader accepts a table to read data from; you can indicate which partitions to scan by immediately following the load statement with a partition filter statement, as in the sketch below.

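A short sketch of that load-then-filter pattern, driven from Java via PigServer; the table name, partition column, and the HCatLoader package (which varies across HCatalog releases) are assumptions.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class HCatLoaderExample {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.MAPREDUCE);
        pig.registerQuery(
            "raw = LOAD 'web_logs' USING org.apache.hive.hcatalog.pig.HCatLoader();");
        // A partition filter immediately after the load lets HCatalog
        // prune the scan to only the matching partitions.
        pig.registerQuery("recent = FILTER raw BY datestamp == '20240101';");
        pig.store("recent", "recent_logs");
    }
}
```
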
31. Which of the following operators executes a shell command from the Hive shell?
(a) |
(b) !
(c) ^
(d) +
Answer: (b) !
Explanation: The exclamation operator executes a shell command; for example, !pwd; runs pwd from the Hive CLI.

32. Which of the following operators is used to view the map reduce execution plans?
(a) DUMP
(b) DESCRIBE
(c) STORE
(d) EXPLAIN
Answer: (d) EXPLAIN
Explanation: EXPLAIN displays the logical, physical, and MapReduce execution plans for a relation.

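Programmatically, PigServer exposes the same plans through explain(); the alias and input file here are assumptions.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class ExplainExample {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.LOCAL);
        pig.registerQuery("A = LOAD 'input.txt' AS (line:chararray);");
        // Prints the logical, physical, and MapReduce plans for the alias,
        // just as "EXPLAIN A;" does in the Grunt shell.
        pig.explain("A", System.out);
    }
}
```
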
33. Which of the following has methods to deal with metadata?
(a) LoadPushDown
(b) LoadMetadata
(c) LoadCaster
(d) All of the mentioned
Answer: (b) LoadMetadata
Explanation: Most loader implementations don't need to implement this unless they interact with some metadata system.

34. Which of the following is not a table scope operator?
(a) MEMSTORE_FLUSH
(b) MEMSTORE_FLUSHSIZE
(c) MAX_FILESIZE
(d) All of the mentioned
Answer: (a) MEMSTORE_FLUSH
Explanation: Using alter, you can set and remove table scope operators such as MAX_FILESIZE, READONLY, MEMSTORE_FLUSHSIZE, DEFERRED_LOG_FLUSH, etc.

35. The _________ operator is used to review the schema of a relation.
(a) DUMP
(b) DESCRIBE
(c) STORE
(d) EXPLAIN
Answer: (b) DESCRIBE
Explanation: DESCRIBE returns the schema of a relation.

36. Point out the correct statement.
(a) Invoke the Grunt shell using the "enter" command
(b) Pig does not support jar files
(c) Both the run and exec commands are useful for debugging because you can modify a Pig script in an editor
(d) All of the mentioned
Answer: (c) Both the run and exec commands are useful for debugging because you can modify a Pig script in an editor
Explanation: Both commands promote Pig script modularity as they allow you to reuse existing components.

37. Which of the following commands is used to dump the log container?
(a) logs
(b) log
(c) dump
(d) all of the mentioned
Answer: (a) logs
Explanation: Usage: yarn logs -applicationId <application ID>

38. Amazon ___________ is a Web service that provides real-time monitoring to Amazon's EC2 customers.
(a) AmWatch
(b) CloudWatch
(c) IamWatch
(d) All of the mentioned
Answer: (b) CloudWatch
Explanation: CloudWatch monitors EC2 resources in real time, collecting metrics such as CPU utilization, disk reads and writes, and network traffic.

39. ________ systems are scale-out file-based (HDD) systems moving to more uses of memory in the nodes.
(a) NoSQL
(b) NewSQL
(c) SQL
(d) All of the mentioned
Answer: (a) NoSQL
Explanation: NoSQL systems make the most sense whenever the application is based on data with varying data types and the data can be stored in key-value notation.

40. EC2 capacity can be increased or decreased in real time from as few as one to more than ___________ virtual machines simultaneously.
(a) 1000
(b) 2000
(c) 3000
(d) None of the mentioned
Answer: (a) 1000
Explanation: Billing takes place according to the computing and network resources consumed.

41. __________ gets events indicating completion (success/failure) of component tasks.
(a) getJobName()
(b) getJobState()
(c) getPriority()
(d) getTaskCompletionEvents(int startFrom)
Answer: (d) getTaskCompletionEvents(int startFrom)
Explanation: getTaskCompletionEvents(int startFrom) returns the job's task completion events; getPriority(), by contrast, provides scheduling info for the job.

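A sketch of polling those events with the classic mapred API, assuming an already-submitted job whose ID is a placeholder.

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TaskCompletionEvent;

public class TaskEventsExample {
    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        // Placeholder job ID; use the ID of your own submitted job.
        RunningJob job = client.getJob(JobID.forName("job_202401010000_0001"));
        int from = 0;
        TaskCompletionEvent[] events;
        // Fetch completion events in batches, starting from event id "from".
        while ((events = job.getTaskCompletionEvents(from)).length > 0) {
            for (TaskCompletionEvent e : events) {
                System.out.println(e.getTaskAttemptId() + " -> " + e.getTaskStatus());
            }
            from += events.length;
        }
    }
}
```
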
42. Point out the wrong statement.
(a) A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner
(b) The MapReduce framework operates exclusively on <key, value> pairs
(c) Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods
(d) None of the mentioned
Answer: (d) None of the mentioned
Explanation: The MapReduce framework takes care of scheduling tasks, monitoring them, and re-executing the failed tasks.

43. The ____________ method will be called by Pig both in the front end and back end to pass a unique signature to the Loader.
(a) relativeToAbsolutePath()
(b) setUdfContextSignature()
(c) getCacheFiles()
(d) getShipFiles()
Answer: (b) setUdfContextSignature()
Explanation: The signature can be used to store into the UDFContext any information which the Loader needs to store between various method invocations in the front end and back end (see the LoadFunc sketch under question 18).

44. Point out the correct statement.
(a) The HDT tool allows you to work with only the 1.1 version of Hadoop
(b) The HDT tool allows you to work with multiple versions of Hadoop
(c) The HDT tool allows you to work with multiple versions of Hadoop from multiple IDEs
(d) All of the mentioned
Answer: (b) The HDT tool allows you to work with multiple versions of Hadoop
Explanation: The HDT (Hadoop Development Tools) project is currently a member of the Apache Incubator.

45. Point out the wrong statement.
(a) Spark is intended to replace the Hadoop stack
(b) Spark was designed to read and write data from and to HDFS, as well as other storage systems
(c) Hadoop users who have already deployed or are planning to deploy Hadoop YARN can simply run Spark on YARN
(d) None of the mentioned
Answer: (a) Spark is intended to replace the Hadoop stack
Explanation: Spark is intended to enhance, not replace, the Hadoop stack.

46. Which of the following is a shortcut for the DUMP operator?
(a) \de alias
(b) \d alias
(c) \q
(d) None of the mentioned
Answer: (b) \d alias
Explanation: If the alias is omitted, the last defined alias is used.

47. __________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer.
(a) Partitioner
(b) OutputCollector
(c) Reporter
(d) All of the mentioned
Answer: (b) OutputCollector
Explanation: Hadoop MapReduce comes bundled with a library of generally useful mappers, reducers, and partitioners.

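As a reminder of how OutputCollector is used in practice, here is a stock word-count reduce from the old mapred API (a standard example, not taken from this text):

```java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SumReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        // collect() is the generalized facility for emitting (key, value) pairs.
        output.collect(key, new IntWritable(sum));
    }
}
```
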
48. Users can easily run Spark on top of Amazon's __________
(a) Infosphere
(b) EC2
(c) EMR
(d) None of the mentioned
Answer: (b) EC2
Explanation: Users can easily run Spark (and Shark) on top of Amazon's EC2 using the scripts that come with Spark.

49. Which of the following phases occur simultaneously?
(a) Shuffle and Sort
(b) Reduce and Sort
(c) Shuffle and Map
(d) All of the mentioned
Answer: (a) Shuffle and Sort
Explanation: The shuffle and sort phases occur simultaneously; while map-outputs are being fetched, they are merged.

50. Which of the following languages is not supported by Spark?
(a) Java
(b) Pascal
(c) Scala
(d) Python
Answer: (b) Pascal
Explanation: Spark ships with APIs for Java, Scala, and Python; Pascal is not among them.