InterviewSolution
This section offers curated multiple-choice questions on the Hadoop ecosystem, with answers and brief explanations, to sharpen your knowledge and support exam preparation.

1. Point out the correct statement.
(a) YARN also extends the power of Hadoop to incumbent and new technologies found within the data center
(b) YARN is the central point of investment for Hortonworks within the Apache community
(c) YARN enhances a Hadoop compute cluster in many ways
(d) All of the mentioned
Answer: (d) All of the mentioned
Explanation: YARN gives ISVs and developers a consistent framework for writing data-access applications that run in Hadoop.

2. ________ is the architectural center of Hadoop that allows multiple data processing engines.
(a) YARN
(b) Hive
(c) Incubator
(d) Chukwa
Answer: (a) YARN
Explanation: YARN is the prerequisite for Enterprise Hadoop, providing resource management and a central platform to deliver consistent operations, security, and data governance tools across Hadoop clusters.

3. ___________ is used to decommission more than one RegionServer at a time by creating sub-znodes.
(a) /hbase/master
(b) /hbase/draining
(c) /hbase/passive
(d) None of the mentioned
Answer: (b) /hbase/draining
Explanation: /hbase/draining lets you decommission multiple RegionServers without the risk of regions being temporarily moved to a RegionServer that will itself be decommissioned later.

4. ZooKeeper keeps track of cluster state, such as the ______ table location.
(a) DOMAIN
(b) NODE
(c) ROOT
(d) All of the mentioned
Answer: (c) ROOT
Explanation: ZooKeeper stores the location of the ROOT catalog table and also tracks the list of online RegionServers and unassigned regions.

5. _______ has a design policy of using ZooKeeper only for transient data.
(a) Hive
(b) Impala
(c) HBase
(d) Oozie
Answer: (c) HBase
Explanation: If the HBase ZooKeeper data is removed, only transient operations are affected; data can continue to be written to and read from HBase.

6. A ___________ server is a machine that keeps a copy of the state of the entire system and persists this information in local log files.
(a) Master
(b) Region
(c) ZooKeeper
(d) All of the mentioned
Answer: (c) ZooKeeper
Explanation: A very large Hadoop cluster can be supported by multiple ZooKeeper servers.

7. The ________ Master will register its own address in this znode at startup, making this znode the source of truth for identifying which server is the Master.
(a) active
(b) passive
(c) region
(d) all of the mentioned
Answer: (a) active
Explanation: Each inactive Master registers itself as a backup Master by creating a sub-znode.

8. Point out the wrong statement.
(a) All the znodes are prefixed using the default /hbase location
(b) ZooKeeper provides an interactive shell that allows you to explore the ZooKeeper state
(c) The znodes that you'll most often see are the ones that coordinate operations like Region Assignment
(d) All of the mentioned
Answer: (d) All of the mentioned
Explanation: The HBase root znode path is configurable via hbase-site.xml; by default it is /hbase.

9. When a _______ is triggered, the client receives a packet saying that the znode has changed.
(a) event
(b) watch
(c) row
(d) value
Answer: (b) watch
Explanation: ZooKeeper supports the concept of watches: clients can set a watch on a znode and are notified when it changes.

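For context, a minimal sketch of registering a watch with the ZooKeeper Java client; the connection string, session timeout, and znode path are illustrative assumptions, not from the original text.

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeWatchExample {
    public static void main(String[] args) throws Exception {
        // Connect to ZooKeeper; host:port and timeout are placeholder values.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, new Watcher() {
            public void process(WatchedEvent event) {
                // Called when the watched znode changes.
                System.out.println("Got " + event.getType() + " on " + event.getPath());
            }
        });
        // Passing true registers the session's default watcher on this znode's data.
        zk.getData("/hbase/master", true, null);
        Thread.sleep(60000); // keep the session alive long enough to receive the packet
    }
}
```

Note that watches are one-shot: once triggered, a watch must be re-registered to receive further notifications.
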
10. Which of the following commands does not operate on tables?
(a) enabled
(b) disabled
(c) drop
(d) all of the mentioned
Answer: (b) disabled
Explanation: The related is_disabled command merely verifies whether a table is disabled; it does not modify the table.

11. You can run Pig in interactive mode using the ______ shell.
(a) Grunt
(b) FS
(c) HDFS
(d) None of the mentioned
Answer: (a) Grunt
Explanation: Invoke the Grunt shell using the "pig" command (for example, $ pig -x local) and then enter your Pig Latin statements and Pig commands interactively at the command line.

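As a rough programmatic counterpart, Pig's Java API (PigServer) runs the same statements you would type at the Grunt prompt; the input file and schema below are assumptions for illustration.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class GruntLikeExample {
    public static void main(String[] args) throws Exception {
        // Local mode, equivalent to starting Grunt with "pig -x local".
        PigServer pig = new PigServer(ExecType.LOCAL);
        pig.registerQuery("A = LOAD 'input.txt' AS (line:chararray);");
        // store() triggers execution, just as STORE (or DUMP) does in Grunt.
        pig.store("A", "output_dir");
    }
}
```
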
12. A ________ is a way of extending Ambari that allows 3rd parties to plug in new resource types along with the APIs.
(a) trigger
(b) view
(c) schema
(d) none of the mentioned
Answer: (b) view
Explanation: A view is an application that is deployed into the Ambari container.

13. The ________ class mimics the behavior of the Main class but gives users a statistics object back.
(a) PigRun
(b) PigRunner
(c) RunnerPig
(d) None of the mentioned
Answer: (b) PigRunner
Explanation: Optionally, you can call the API with an implementation of a progress listener, which the Pig runtime will invoke during execution (a minimal sketch follows).

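A minimal sketch of calling PigRunner, assuming a local script named myscript.pig; passing null skips the optional progress listener mentioned above.

```java
import org.apache.pig.PigRunner;
import org.apache.pig.tools.pigstats.PigStats;

public class PigRunnerExample {
    public static void main(String[] args) {
        // The same arguments the "pig" command-line tool accepts.
        String[] pigArgs = { "-x", "local", "myscript.pig" };
        // PigRunner mimics Main but hands back a PigStats statistics object.
        PigStats stats = PigRunner.run(pigArgs, null);
        System.out.println("Succeeded: " + stats.isSuccessful());
    }
}
```
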
14. Point out the wrong statement.
(a) HCatalog is a table and storage management layer for Hadoop that enables users with different data processing tools
(b) There is a Hive-specific interface for HCatalog
(c) Data is defined using HCatalog's command line interface (CLI)
(d) All of the mentioned
Answer: (b) There is a Hive-specific interface for HCatalog
Explanation: Since HCatalog uses Hive's metastore, Hive can read data in HCatalog directly, so no Hive-specific interface is needed.

15. To register a "watch" on znode data, you need to use the _______ command to access the current content or metadata.
(a) stat
(b) put
(c) receive
(d) gets
Answer: (a) stat
Explanation: ZooKeeper can also notify you of changes to a znode's content or changes to a znode's children.

16. _________ stores its metadata on multiple disks that typically include a non-local file server.
(a) DataNode
(b) NameNode
(c) ActionNode
(d) None of the mentioned
Answer: (b) NameNode
Explanation: HDFS also tolerates failures of the storage servers (called DataNodes) and their disks.

17. Point out the correct statement.
(a) Hive commands are non-SQL statements such as setting a property or adding a resource
(b) Set -v prints a list of configuration variables that are overridden by the user or Hive
(c) Set sets a list of variables that are overridden by the user or Hive
(d) None of the mentioned
Answer: (a) Hive commands are non-SQL statements such as setting a property or adding a resource
Explanation: Commands can be used in HiveQL scripts or directly in the CLI or Beeline.

18. The __________ abstract class has three main methods for loading data, and for most use cases it would suffice to extend it.
(a) Load
(b) LoadFunc
(c) FuncLoad
(d) None of the mentioned
Answer: (b) LoadFunc
Explanation: LoadFunc and StoreFunc implementations should use the Hadoop 0.20 API based classes (InputFormat/OutputFormat and friends); a bare-bones sketch follows.

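A bare-bones LoadFunc sketch, under the assumption that the loader reads text files and does no real work yet; a production loader would convert each record to a tuple in getNext(). It also shows setUdfContextSignature(), which comes up again in question 43.

```java
import java.io.IOException;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;

public class MyLoader extends LoadFunc {
    private String signature;

    @Override
    public void setUdfContextSignature(String signature) {
        // Called by Pig on both the front end and back end with the same
        // unique signature; useful as a key for stashing state in the UDFContext.
        this.signature = signature;
    }

    @Override
    public void setLocation(String location, Job job) throws IOException {
        // Tell Hadoop where the input lives.
        FileInputFormat.setInputPaths(job, location);
    }

    @Override
    public InputFormat getInputFormat() throws IOException {
        // Delegate splitting and record reading to a Hadoop InputFormat.
        return new TextInputFormat();
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
        // Keep a handle to the reader here if getNext() needs it.
    }

    @Override
    public Tuple getNext() throws IOException {
        // A real loader returns one tuple per call; null signals end of data.
        return null;
    }
}
```
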
19. Point out the wrong statement.
(a) ILLUSTRATE operator is used to review how data is transformed through a sequence of Pig Latin statements
(b) ILLUSTRATE is based on an example generator
(c) Several new private classes make it harder for external tools such as Oozie to integrate with Pig statistics
(d) None of the mentioned
Answer: (c) Several new private classes make it harder for external tools such as Oozie to integrate with Pig statistics
Explanation: Several new public classes make it easier for external tools such as Oozie to integrate with Pig statistics.

20. $ pig -x tez_local … will enable ________ mode in Pig.
(a) Mapreduce
(b) Tez
(c) Local
(d) None of the mentioned
Answer: (d) None of the mentioned
Explanation: Tez local mode is similar to local mode, except that internally Pig invokes the Tez runtime engine.

21. Which of the following will remove the resource(s) from the distributed cache?
(a) delete FILE[S] <filepath>*
(b) delete JAR[S] <filepath>*
(c) delete ARCHIVE[S] <filepath>*
(d) all of the mentioned
Answer: (d) all of the mentioned
Explanation: The delete command is used to remove existing resources from the distributed cache.

22. The __________ command disables, drops, and recreates a table.
(a) drop
(b) truncate
(c) delete
(d) none of the mentioned
Answer: (b) truncate
Explanation: The syntax of truncate is: hbase> truncate 'tablename' (a Java-API equivalent follows).

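For comparison, roughly the same disable-then-truncate sequence through the HBase Java client API; the table name is an assumption.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class TruncateExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName table = TableName.valueOf("mytable"); // placeholder name
            admin.disableTable(table);          // the table must be disabled first
            admin.truncateTable(table, false);  // false = do not preserve region splits
        }
    }
}
```
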
23. Which command is used to disable all the tables matching the given regex?
(a) remove all
(b) drop all
(c) disable_all
(d) all of the mentioned
Answer: (c) disable_all
Explanation: The syntax for the disable_all command is: hbase> disable_all 'r.*'

24. Point out the correct statement.
(a) During the testing phase of your implementation, you can use LOAD to display results to your terminal screen
(b) You can view outer relations as well as relations defined in a nested FOREACH statement
(c) Hadoop properties are interpreted by Pig
(d) None of the mentioned
Answer: (b) You can view outer relations as well as relations defined in a nested FOREACH statement
Explanation: Viewing outer relations is possible using the DESCRIBE operator.

25. Which of the following commands is used to show values of keys used in Pig?
(a) set
(b) declare
(c) display
(d) all of the mentioned
Answer: (a) set
Explanation: All Pig and Hadoop properties can be set, either in the Pig script or via the Grunt command line.

26. Amazon EC2 provides virtual computing environments, known as __________
(a) chunks
(b) instances
(c) messages
(d) none of the mentioned
Answer: (b) instances
Explanation: Using Amazon EC2 eliminates your need to invest in hardware up front.

27. Point out the correct statement.
(a) Hadoop is ideal for the analytical, post-operational, data-warehouse-ish type of workload
(b) HDFS runs on a small cluster of commodity-class nodes
(c) NEWSQL is frequently the collection point for big data
(d) None of the mentioned
Answer: (a) Hadoop is ideal for the analytical, post-operational, data-warehouse-ish type of workload
Explanation: Together with a relational data warehouse, Hadoop can form a very effective data warehouse infrastructure.

28. EC2 can serve as a practically unlimited set of ___________ machines.
(a) virtual
(b) real
(c) distributed
(d) all of the mentioned
Answer: (a) virtual
Explanation: To use EC2, a subscriber creates an Amazon Machine Image (AMI) containing the operating system, application programs, and configuration settings.

29. Point out the wrong statement.
(a) The load/store UDFs control how data goes into Pig and comes out of Pig
(b) LoadCaster has methods to convert byte arrays to specific types
(c) The meaning of getNext() has changed and is called by Pig runtime to get the last tuple in the data
(d) None of the mentioned
Answer: (c) The meaning of getNext() has changed and is called by Pig runtime to get the last tuple in the data
Explanation: The meaning of getNext() has not changed; it is called by the Pig runtime to get the next tuple in the data.

30. The HCatalog interface for Pig consists of ____________ and HCatStorer, which implement the Pig load and store interfaces respectively.
(a) HCLoader
(b) HCatLoader
(c) HCatLoad
(d) None of the mentioned
Answer: (b) HCatLoader
Explanation: HCatLoader accepts a table to read data from; you can indicate which partitions to scan by immediately following the load statement with a partition filter statement, as in the sketch below.

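A short sketch of that load-then-filter pattern, driven from Java via PigServer; the table name, partition column, and the HCatLoader package (which varies across HCatalog releases) are assumptions.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class HCatLoaderExample {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.MAPREDUCE);
        pig.registerQuery(
            "raw = LOAD 'web_logs' USING org.apache.hive.hcatalog.pig.HCatLoader();");
        // A partition filter immediately after the load lets HCatalog
        // prune the scan to only the matching partitions.
        pig.registerQuery("recent = FILTER raw BY datestamp == '20240101';");
        pig.store("recent", "recent_logs");
    }
}
```
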
31. Which of the following operators executes a shell command from the Hive shell?
(a) |
(b) !
(c) ^
(d) +
Answer: (b) !
Explanation: The exclamation operator executes a shell command; for example, !pwd; runs pwd from the Hive CLI.

32. Which of the following operators is used to view the map reduce execution plans?
(a) DUMP
(b) DESCRIBE
(c) STORE
(d) EXPLAIN
Answer: (d) EXPLAIN
Explanation: EXPLAIN displays the logical, physical, and MapReduce execution plans for a relation.

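Programmatically, PigServer exposes the same plans through explain(); the alias and input file here are assumptions.

```java
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class ExplainExample {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.LOCAL);
        pig.registerQuery("A = LOAD 'input.txt' AS (line:chararray);");
        // Prints the logical, physical, and MapReduce plans for the alias,
        // just as "EXPLAIN A;" does in the Grunt shell.
        pig.explain("A", System.out);
    }
}
```
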
33. Which of the following has methods to deal with metadata?
(a) LoadPushDown
(b) LoadMetadata
(c) LoadCaster
(d) All of the mentioned
Answer: (b) LoadMetadata
Explanation: Most loader implementations don't need to implement this unless they interact with some metadata system.

34. Which of the following is not a table scope operator?
(a) MEMSTORE_FLUSH
(b) MEMSTORE_FLUSHSIZE
(c) MAX_FILESIZE
(d) All of the mentioned
Answer: (a) MEMSTORE_FLUSH
Explanation: Using alter, you can set and remove table scope operators such as MAX_FILESIZE, READONLY, MEMSTORE_FLUSHSIZE, DEFERRED_LOG_FLUSH, etc.

35. The _________ operator is used to review the schema of a relation.
(a) DUMP
(b) DESCRIBE
(c) STORE
(d) EXPLAIN
Answer: (b) DESCRIBE
Explanation: DESCRIBE returns the schema of a relation.

36. Point out the correct statement.
(a) Invoke the Grunt shell using the "enter" command
(b) Pig does not support jar files
(c) Both the run and exec commands are useful for debugging because you can modify a Pig script in an editor
(d) All of the mentioned
Answer: (c) Both the run and exec commands are useful for debugging because you can modify a Pig script in an editor
Explanation: Both commands promote Pig script modularity as they allow you to reuse existing components.

37. Which of the following commands is used to dump the log container?
(a) logs
(b) log
(c) dump
(d) all of the mentioned
Answer: (a) logs
Explanation: Usage: yarn logs -applicationId <application ID>

38. Amazon ___________ is a Web service that provides real-time monitoring to Amazon's EC2 customers.
(a) AmWatch
(b) CloudWatch
(c) IamWatch
(d) All of the mentioned
Answer: (b) CloudWatch
Explanation: CloudWatch monitors EC2 resources in real time, collecting metrics such as CPU utilization, disk reads and writes, and network traffic.

39. ________ systems are scale-out file-based (HDD) systems moving to more uses of memory in the nodes.
(a) NoSQL
(b) NewSQL
(c) SQL
(d) All of the mentioned
Answer: (a) NoSQL
Explanation: NoSQL systems make the most sense whenever the application is based on data with varying data types and the data can be stored in key-value notation.

40. EC2 capacity can be increased or decreased in real time from as few as one to more than ___________ virtual machines simultaneously.
(a) 1000
(b) 2000
(c) 3000
(d) None of the mentioned
Answer: (a) 1000
Explanation: Billing takes place according to the computing and network resources consumed.

41. __________ gets events indicating completion (success/failure) of component tasks.
(a) getJobName()
(b) getJobState()
(c) getPriority()
(d) getTaskCompletionEvents(int startFrom)
Answer: (d) getTaskCompletionEvents(int startFrom)
Explanation: getTaskCompletionEvents(int startFrom) returns the job's task completion events; getPriority(), by contrast, provides scheduling info for the job.

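A sketch of polling those events with the classic mapred API, assuming an already-submitted job whose ID is a placeholder.

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TaskCompletionEvent;

public class TaskEventsExample {
    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        // Placeholder job ID; use the ID of your own submitted job.
        RunningJob job = client.getJob(JobID.forName("job_202401010000_0001"));
        int from = 0;
        TaskCompletionEvent[] events;
        // Fetch completion events in batches, starting from event id "from".
        while ((events = job.getTaskCompletionEvents(from)).length > 0) {
            for (TaskCompletionEvent e : events) {
                System.out.println(e.getTaskAttemptId() + " -> " + e.getTaskStatus());
            }
            from += events.length;
        }
    }
}
```
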
42. Point out the wrong statement.
(a) A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner
(b) The MapReduce framework operates exclusively on <key, value> pairs
(c) Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods
(d) None of the mentioned
Answer: (d) None of the mentioned
Explanation: The MapReduce framework takes care of scheduling tasks, monitoring them, and re-executing the failed tasks.

43. The ____________ method will be called by Pig both in the front end and back end to pass a unique signature to the Loader.
(a) relativeToAbsolutePath()
(b) setUdfContextSignature()
(c) getCacheFiles()
(d) getShipFiles()
Answer: (b) setUdfContextSignature()
Explanation: The signature can be used to store into the UDFContext any information which the Loader needs to store between various method invocations in the front end and back end (see the LoadFunc sketch under question 18).

44. Point out the correct statement.
(a) The HDT tool allows you to work with only the 1.1 version of Hadoop
(b) The HDT tool allows you to work with multiple versions of Hadoop
(c) The HDT tool allows you to work with multiple versions of Hadoop from multiple IDEs
(d) All of the mentioned
Answer: (b) The HDT tool allows you to work with multiple versions of Hadoop
Explanation: The HDT (Hadoop Development Tools) project is currently a member of the Apache Incubator.

45. Point out the wrong statement.
(a) Spark is intended to replace the Hadoop stack
(b) Spark was designed to read and write data from and to HDFS, as well as other storage systems
(c) Hadoop users who have already deployed or are planning to deploy Hadoop YARN can simply run Spark on YARN
(d) None of the mentioned
Answer: (a) Spark is intended to replace the Hadoop stack
Explanation: Spark is intended to enhance, not replace, the Hadoop stack.

46. Which of the following is a shortcut for the DUMP operator?
(a) \de alias
(b) \d alias
(c) \q
(d) None of the mentioned
Answer: (b) \d alias
Explanation: If the alias is omitted, the last defined alias is used.

47. __________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer.
(a) Partitioner
(b) OutputCollector
(c) Reporter
(d) All of the mentioned
Answer: (b) OutputCollector
Explanation: Hadoop MapReduce comes bundled with a library of generally useful mappers, reducers, and partitioners.

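As a reminder of how OutputCollector is used in practice, here is a stock word-count reduce from the old mapred API (a standard example, not taken from this text):

```java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SumReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        // collect() is the generalized facility for emitting (key, value) pairs.
        output.collect(key, new IntWritable(sum));
    }
}
```
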
48. Users can easily run Spark on top of Amazon's __________
(a) Infosphere
(b) EC2
(c) EMR
(d) None of the mentioned
Answer: (b) EC2
Explanation: Users can easily run Spark (and Shark) on top of Amazon's EC2 using the scripts that come with Spark.

49. Which of the following phases occur simultaneously?
(a) Shuffle and Sort
(b) Reduce and Sort
(c) Shuffle and Map
(d) All of the mentioned
Answer: (a) Shuffle and Sort
Explanation: The shuffle and sort phases occur simultaneously; while map-outputs are being fetched, they are merged.

50. Which of the following languages is not supported by Spark?
(a) Java
(b) Pascal
(c) Scala
(d) Python
Answer: (b) Pascal
Explanation: Spark ships with APIs for Java, Scala, and Python; Pascal is not among them.