InterviewSolution
This section offers curated multiple-choice questions to sharpen your knowledge and support exam preparation. Choose a topic below to get started.
51. Apache Hadoop Development Tools is an effort undergoing incubation at _________
(a) ADF (b) ASF (c) HCC (d) AFS

Answer» Right answer is (b) ASF. The explanation is: the effort is sponsored by the Apache Incubator PMC of the Apache Software Foundation (ASF).

52. __________ method tells LoadFunc which fields are required in the Pig script.
(a) pushProjection() (b) relativeToAbsolutePath() (c) prepareToRead() (d) none of the mentioned

Answer» Right choice is (a) pushProjection(). To explain: Pig uses the column index requiredField.index to communicate to the LoadFunc which fields are required by the Pig script.

53. _________ function is responsible for consolidating the results produced by each of the Map() functions/tasks.
(a) Reduce (b) Map (c) Reducer (d) All of the mentioned

Answer» The correct option is (a) Reduce. To explain I would say: the Reduce function collates the work and resolves the results.

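The map-then-consolidate flow behind this question can be sketched outside Hadoop. The following Python toy (an illustration, not Hadoop's actual Java API) shows per-split Map() outputs being grouped by key in a shuffle step and then consolidated by a single Reduce function:

```python
from itertools import groupby
from operator import itemgetter

def map_fn(line):
    # emit (word, 1) pairs, as a word-count mapper would
    return [(word, 1) for word in line.split()]

def reduce_fn(key, values):
    # consolidate all values produced for one key
    return (key, sum(values))

splits = ["hadoop maps data", "hadoop reduces data"]
# shuffle phase: collect and sort all intermediate pairs by key
intermediate = sorted(
    (pair for line in splits for pair in map_fn(line)),
    key=itemgetter(0),
)
# reduce phase: one reduce_fn call per distinct key
result = dict(
    reduce_fn(k, [v for _, v in group])
    for k, group in groupby(intermediate, key=itemgetter(0))
)
print(result)  # {'data': 2, 'hadoop': 2, 'maps': 1, 'reduces': 1}
```

The sort before grouping stands in for Hadoop's shuffle, which guarantees each reduce call sees all values for its key.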
54. The output of the _______ is not sorted in the MapReduce framework for Hadoop.
(a) Mapper (b) Cascader (c) Scalding (d) None of the mentioned

Answer» Right choice is (d) None of the mentioned. Best explanation: the output of the reduce task is typically written to the FileSystem, and the output of the Reducer is not sorted.

55. GraphX provides an API for expressing graph computation that can model the __________ abstraction.
(a) GaAdt (b) Spark Core (c) Pregel (d) None of the mentioned

Answer» The correct option is (c) Pregel. To elaborate: GraphX exposes a Pregel-like API for iterative graph-parallel computation.

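The Pregel abstraction named here is vertex-centric: in each superstep, vertices receive messages, update their value, and send messages to their neighbors until no vertex changes. A minimal Python sketch of that loop (a toy, not GraphX's actual Scala API), propagating the maximum vertex value through a graph:

```python
def pregel_max(neighbors, values):
    # neighbors: vertex -> list of adjacent vertices (undirected here)
    # values: vertex -> initial integer value
    values = dict(values)
    active = set(values)            # vertices that changed last superstep
    while active:
        # each active vertex sends its current value to its neighbors
        inbox = {v: [] for v in values}
        for v in active:
            for n in neighbors[v]:
                inbox[n].append(values[v])
        # vertices update to the max of their value and incoming messages
        active = set()
        for v, msgs in inbox.items():
            if msgs and max(msgs) > values[v]:
                values[v] = max(msgs)
                active.add(v)
    return values

graph = {1: [2], 2: [1, 3], 3: [2]}
print(pregel_max(graph, {1: 5, 2: 1, 3: 9}))  # {1: 9, 2: 9, 3: 9}
```

The computation halts when every vertex votes to halt (here: when no value changed), which is exactly the Pregel termination rule.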
56. Point out the wrong statement.
(a) ConcurScheduler detects whether the index is on SSD or not
(b) Memory index supports payloads
(c) Auto-IO-throttling has been added to ConcurrentMergeScheduler, to rate limit IO writes for each merge depending on incoming merge rate
(d) The default codec has an option to control BEST_SPEED or BEST_COMPRESSION for stored fields

Answer» Right choice is (a). To explain: it is ConcurrentMergeScheduler, not "ConcurScheduler", that detects whether the index is on an SSD and does a better job defaulting its settings.

57. Spark is packaged with higher level libraries, including support for _________ queries.
(a) SQL (b) C (c) C++ (d) None of the mentioned

Answer» Right answer is (a) SQL. The explanation: standard libraries increase developer productivity and can be seamlessly combined to create complex workflows.

58. The ___________ can also be used to distribute both jars and native libraries for use in the map and/or reduce tasks.
(a) DataCache (b) DistributedData (c) DistributedCache (d) All of the mentioned

Answer» Correct choice is (c) DistributedCache. To explain I would say: the child JVM always has its current working directory added to the java.library.path and LD_LIBRARY_PATH.

59. The _____________ can also be used to distribute both jars and native libraries for use in the map and/or reduce tasks.
(a) DistributedLog (b) DistributedCache (c) DistributedJars (d) None of the mentioned

Answer» Correct choice is (b) DistributedCache. Easiest explanation: cached libraries can be loaded via System.loadLibrary or System.load.

60. Through the ____________ method, the RecordReader associated with the InputFormat provided by the LoadFunc is passed to the LoadFunc.
(a) getNext() (b) relativeToAbsolutePath() (c) prepareToRead() (d) all of the mentioned

Answer» The correct choice is (c) prepareToRead(). The best explanation: the RecordReader can then be used by the implementation in getNext() to return a tuple representing a record of data back to Pig.

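The loader lifecycle touched on in questions 52 and 60 — pushProjection() telling the loader which columns the script needs, prepareToRead() handing it a record reader, getNext() returning one tuple at a time — can be mimicked in Python. This is an illustrative stand-in, not Pig's actual Java LoadFunc interface:

```python
class ToyLoadFunc:
    """Mimics Pig's LoadFunc call sequence: pushProjection -> prepareToRead -> getNext."""

    def __init__(self):
        self.required = None   # column indexes requested by the script
        self.reader = None     # record reader supplied by the framework

    def pushProjection(self, field_indexes):
        # the framework tells the loader which fields the script actually uses
        self.required = field_indexes

    def prepareToRead(self, reader):
        # the framework passes in the record reader for the current split
        self.reader = reader

    def getNext(self):
        # return the next tuple, trimmed to the projected fields, or None at EOF
        try:
            record = next(self.reader)
        except StopIteration:
            return None
        fields = record.split(",")
        if self.required is not None:
            fields = [fields[i] for i in self.required]
        return tuple(fields)

loader = ToyLoadFunc()
loader.pushProjection([0, 2])                  # script only needs columns 0 and 2
loader.prepareToRead(iter(["a,b,c", "d,e,f"]))
print(loader.getNext(), loader.getNext())      # ('a', 'c') ('d', 'f')
```

Dropping unprojected columns at the loader, as pushProjection enables, avoids deserializing data the script never touches.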
61. Apache _________ is a project that enables development and consumption of REST style web services.
(a) Wives (b) Wink (c) Wig (d) All of the mentioned

Answer» Right choice is (b) Wink. To explain: the core server runtime is based on the JAX-RS (JSR 311) standard.

62. The output descriptor for the table to be written is created by calling ____________
(a) OutputJobInfo.describe (b) OutputJobInfo.create (c) OutputJobInfo.put (d) None of the mentioned

Answer» Correct choice is (b) OutputJobInfo.create. To explain I would say: the implementation of Map takes HCatRecord as an input and the implementation of Reduce produces it as an output.

63. Point out the wrong statement.
(a) Mahout is distributed under a commercially friendly Apache Software license
(b) Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop® and using the MapReduce paradigm
(c) Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms
(d) None of the mentioned

Answer» Correct choice is (d) None of the mentioned, since all three statements are accurate. The explanation is: the goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases.

64. PostingsFormat now uses a __________ API when writing postings, just like doc values.
(a) push (b) pull (c) read (d) all of the mentioned

Answer» Right answer is (b) pull. To explain: this is powerful because you can do things in your postings format that require making more than one pass through the postings, such as iterating over all postings.

65. SolrJ now has first class support for __________ API.
(a) Compactions (b) Collections (c) Distribution (d) All of the mentioned

Answer» Right option is (b) Collections. Explanation: Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene.

66. _________ is the output produced by TextOutputFormat, Hadoop's default OutputFormat.
(a) KeyValueTextInputFormat (b) KeyValueTextOutputFormat (c) FileValueTextInputFormat (d) All of the mentioned

Answer» The correct answer is (b) KeyValueTextOutputFormat. Explanation: to interpret such files correctly, KeyValueTextInputFormat is appropriate.

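The round trip behind this question — TextOutputFormat writes one key-value pair per line separated by a tab, and KeyValueTextInputFormat splits each line back at the first tab — is easy to simulate. The Python below is a sketch of that file format, not the Hadoop classes themselves:

```python
def write_text_output(pairs):
    # like TextOutputFormat: one "key<TAB>value" line per pair
    return "\n".join(f"{k}\t{v}" for k, v in pairs)

def read_key_value_text(text):
    # like KeyValueTextInputFormat: split each line at the FIRST tab only,
    # so tabs inside the value survive
    return [tuple(line.split("\t", 1)) for line in text.splitlines()]

pairs = [("apple", "3"), ("pear", "7")]
assert read_key_value_text(write_text_output(pairs)) == pairs
```

Splitting on only the first tab is the important detail: it keeps keys and values unambiguous even when values contain tabs.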
67. Which of the following class provides a subset of features provided by the Unix/GNU Sort?
(a) KeyFieldBased (b) KeyFieldComparator (c) KeyFieldBasedComparator (d) All of the mentioned

Answer» Correct choice is (c) KeyFieldBasedComparator. To explain I would say: Hadoop has a library class, KeyFieldBasedComparator, that is useful for many applications.

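The Unix-sort-style behaviour alluded to here — ordering records on a selected key field, optionally numerically or in reverse — can be imitated with Python's sort key functions. This is an analogy to `sort -k`, not the Hadoop class itself:

```python
def sort_by_field(lines, field, numeric=False, reverse=False):
    # emulate `sort -k<field>` on whitespace-separated records (1-based field index)
    def key(line):
        value = line.split()[field - 1]
        return float(value) if numeric else value
    return sorted(lines, key=key, reverse=reverse)

records = ["alice 42", "bob 7", "carol 19"]
print(sort_by_field(records, field=2, numeric=True))
# ['bob 7', 'carol 19', 'alice 42']
```

Note that without `numeric=True` the same call would order "19" before "42" before "7", the classic string-versus-number sorting pitfall these options exist to solve.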
68. ___________ executes the pipeline as a series of MapReduce jobs.
(a) SparkPipeline (b) MRPipeline (c) MemPipeline (d) None of the mentioned

Answer» The correct answer is (b) MRPipeline. For explanation I would say: every Crunch data pipeline is coordinated by an instance of the Pipeline interface.

69. __________ is a log collection and correlation software with reporting and alarming functionalities.
(a) Lucene (b) ALOIS (c) Imphal (d) None of the mentioned

Answer» The correct answer is (b) ALOIS. Explanation: this project's Incubator activity was later transferred to another Incubator project – ODE.

70. _____________ will skip the nodes given in the config with the same exit transition as before.
(a) ActionMega handler (b) Action handler (c) Data handler (d) None of the mentioned

Answer» The correct answer is (b) Action handler. Best explanation: currently there is no way to remove an existing configuration; it can only be overridden by passing a different value in the input configuration.

71. Drill analyzes semi-structured/nested data coming from _________ applications.
(a) RDBMS (b) NoSQL (c) NewSQL (d) None of the mentioned

Answer» Correct choice is (b) NoSQL. The best I can explain: modern big data applications such as social, mobile, web and IoT deal with a larger number of users and larger amounts of data than traditional transactional applications.

72. Point out the wrong statement.
(a) Oozie provides a unique callback URL to the task; the task should invoke the given URL to notify its completion
(b) All computation/processing tasks triggered by an mechanism node are remote to Oozie
(c) Oozie workflows can be parameterized
(d) None of the mentioned

Answer» The correct answer is (b). Easy explanation: it is action nodes, not "mechanism nodes", that trigger computation/processing tasks, and all such tasks are executed by the Hadoop MapReduce framework.

73. Hive uses _________ for logging.
(a) logj4 (b) log4l (c) log4i (d) log4j

Answer» Correct choice is (d) log4j. For explanation I would say: by default Hive will use hive-log4j.default in the conf/ directory of the Hive installation.

74. Which of the following data type is not supported by Hive?
(a) map (b) record (c) string (d) enum

Answer» The correct option is (d) enum. The explanation: Hive has no concept of enums.

75. New ____________ type enables indexing and searching of date ranges, particularly multi-valued ones.
(a) RangeField (b) DateField (c) DateRangeField (d) All of the mentioned

Answer» The correct answer is (c) DateRangeField. Explanation: a new ExitableDirectoryReader extends FilterDirectoryReader and enables exiting requests that take too long to enumerate over terms.

76. Mahout provides ____________ libraries for common and primitive Java collections.
(a) Java (b) Javascript (c) Perl (d) Python

Answer» Correct option is (a) Java. The explanation: math operations are focused on linear algebra and statistics.

77. __________ has the world's largest Hadoop cluster.
(a) Apple (b) Datamatics (c) Facebook (d) None of the mentioned

Answer» Correct answer is (c) Facebook. To explain I would say: Facebook has many Hadoop clusters, the largest among them being the one used for data warehousing.

78. ______________ class allows the Map/Reduce framework to partition the map outputs based on certain key fields, not the whole keys.
(a) KeyFieldPartitioner (b) KeyFieldBasedPartitioner (c) KeyFieldBased (d) None of the mentioned

Answer» Right choice is (b) KeyFieldBasedPartitioner. For explanation I would say: the primary key is used for partitioning, and the combination of the primary and secondary keys is used for sorting.

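The partition-on-primary-key, sort-on-(primary, secondary) idea in the explanation above can be sketched in a few lines. This is a Python simulation of the concept; the real KeyFieldBasedPartitioner is a Hadoop Java library class:

```python
def partition_and_sort(records, num_reducers):
    # route each (primary, secondary, value) record by hashing ONLY the primary key,
    # then sort each partition on the full (primary, secondary) key, as the
    # framework would before handing records to a reducer
    buckets = [[] for _ in range(num_reducers)]
    for rec in records:
        primary = rec[0]
        buckets[hash(primary) % num_reducers].append(rec)
    return [sorted(b) for b in buckets]

records = [("us", "b", 1), ("us", "a", 2), ("uk", "z", 3)]
parts = partition_and_sort(records, 2)
# all "us" records land in the same partition, ordered by secondary key
assert any(p[:2] == [("us", "a", 2), ("us", "b", 1)] for p in parts if len(p) >= 2)
```

Because only the primary key feeds the hash, every record sharing that key reaches the same reducer, while the secondary key still controls the order within it.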
79. Lucene provides scalable, high-performance indexing over ______ per hour on modern hardware.
(a) 1 TB (b) 150 GB (c) 10 GB (d) None of the mentioned

Answer» Right choice is (b) 150 GB. Easy explanation: Lucene offers powerful features through a simple API.

80. A __________ represents a distributed, immutable collection of elements of type T.
(a) PCollect (b) PCollection (c) PCol (d) All of the mentioned

Answer» Right choice is (b) PCollection. To elaborate: a PCollection<T> is Crunch's distributed, immutable collection of elements of type T.

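The defining property of a PCollection — each transformation returns a new immutable collection instead of mutating the old one — can be illustrated with a toy Python stand-in for Crunch's Java type. The name parallelDo is borrowed from the Crunch API; everything else here is an assumption for illustration:

```python
class ToyPCollection:
    # immutable wrapper: transformations build new collections
    def __init__(self, elements):
        self._elements = tuple(elements)   # frozen contents

    def parallelDo(self, fn):
        # apply fn to every element, yielding a NEW collection;
        # the original is left untouched
        return ToyPCollection(fn(e) for e in self._elements)

    def materialize(self):
        return list(self._elements)

words = ToyPCollection(["crunch", "pig"])
upper = words.parallelDo(str.upper)
print(words.materialize(), upper.materialize())
# ['crunch', 'pig'] ['CRUNCH', 'PIG']
```

Immutability is what lets a planner safely fuse, reorder, and rerun transformations: no step can observe another step's side effects.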
81. Using Hadoop Archives in __________ is as easy as specifying a different input filesystem than the default file system.
(a) Hive (b) Pig (c) MapReduce (d) All of the mentioned

Answer» Correct choice is (c) MapReduce. Easy explanation: because Hadoop Archives are exposed as a file system, MapReduce is able to use all the logical input files in Hadoop Archives as input.

82. ___________ property allows us to specify a custom dir location pattern for all the writes, and will interpolate each variable.
(a) hcat.dynamic.partitioning.custom.pattern (b) hcat.append.limit (c) hcat.pig.storer.external.location (d) hcatalog.hive.client.cache.expiry.time

Answer» Correct answer is (a) hcat.dynamic.partitioning.custom.pattern. Best explanation: hcat.append.limit, by contrast, allows an HCatalog user to specify a custom append limit.

83. Apache _________ provides direct queries on self-describing and semi-structured data in files.
(a) Drill (b) Mahout (c) Oozie (d) All of the mentioned

Answer» Right answer is (a) Drill. For explanation: users can explore live data on their own as it arrives, versus spending weeks or months on data preparation, modeling, ETL and subsequent schema management.

84. Nodes in the config _____________ must be completed successfully.
(a) oozie.wid.rerun.skip.nodes (b) oozie.wf.rerun.skip.nodes (c) oozie.wf.run.skip.nodes (d) all of the mentioned

Answer» The correct option is (b) oozie.wf.rerun.skip.nodes. Easiest explanation: if no configuration is passed, the existing coordinator/workflow configuration will be used.

85. Point out the wrong statement.
(a) Falcon promotes Javascript Programming
(b) Falcon does not do any heavy lifting but delegates to tools within the Hadoop ecosystem
(c) Falcon handles retry logic and late data processing, and records audit, lineage and metrics
(d) All of the mentioned

Answer» Right answer is (a). Best explanation: Falcon promotes Polyglot Programming, not Javascript Programming.

86. A __________ in a social graph is a group of people who interact frequently with each other and less frequently with others.
(a) semi-cluster (b) partial cluster (c) full cluster (d) none of the mentioned

Answer» Correct option is (a) semi-cluster. To explain: semi-clustering differs from ordinary clustering in that a vertex may belong to more than one semi-cluster.

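The distinguishing feature named in the explanation — a vertex may belong to several semi-clusters at once — is easy to demonstrate with overlapping sets. The data below is illustrative only; this is not the Pregel semi-clustering algorithm itself:

```python
# each semi-cluster is a set of people; note "bob" appears in two of them,
# which ordinary (disjoint) clustering would forbid
semi_clusters = [
    {"alice", "bob", "carol"},
    {"bob", "dave"},
    {"erin", "frank"},
]

def memberships(person):
    # count how many semi-clusters a person belongs to
    return sum(person in c for c in semi_clusters)

assert memberships("bob") == 2     # overlapping membership is allowed
assert memberships("alice") == 1
```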
87. __________ is a REST API for HCatalog.
(a) WebHCat (b) WbHCat (c) InpHCat (d) None of the mentioned

Answer» Right choice is (a) WebHCat. To explain: REST stands for "representational state transfer", a style of API based on HTTP verbs.

88. Hive, Pig, and Cascading all use a _________ data model.
(a) value centric (b) columnar (c) tuple-centric (d) none of the mentioned

Answer» The correct choice is (c) tuple-centric. The best I can explain: Crunch allows developers considerable flexibility in how they represent their data, which makes Crunch the best pipeline platform for developers.

89. ____________ is a distributed machine learning framework on top of Spark.
(a) MLlib (b) Spark Streaming (c) GraphX (d) RDDs

Answer» Correct answer is (a) MLlib. Explanation: MLlib implements many common machine learning and statistical algorithms to simplify large scale machine learning pipelines.

90. Hive also supports custom extensions written in ____________
(a) C# (b) Java (c) C (d) C++

Answer» The correct answer is (b) Java. Easiest explanation: Hive supports custom extensions written in Java, including user-defined functions (UDFs) and serializer-deserializers for reading and optionally writing custom formats.

91. Point out the wrong statement.
(a) Elastic MapReduce (EMR) is Facebook's packaged Hadoop offering
(b) Amazon Web Service Elastic MapReduce (EMR) is Amazon's packaged Hadoop offering
(c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate
(d) All of the mentioned

Answer» Correct answer is (a); EMR is Amazon's offering, not Facebook's. Best explanation: rather than building Hadoop deployments manually on EC2 (Elastic Compute Cloud) clusters, users can spin up fully configured Hadoop installations using simple invocation commands, either through the AWS Web Console or through command-line tools.

92. ________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets.
(a) Pig Latin (b) Oozie (c) Pig (d) Hive

Answer» Correct choice is (c) Pig. Best explanation: Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs.

93. ________________ is a complete FTP Server based on the MINA I/O system.
(a) Giraph (b) Gereition (c) FtpServer (d) Oozie

Answer» The correct choice is (c) FtpServer. For explanation I would say: Giraph, by contrast, is a large-scale, fault-tolerant, Bulk Synchronous Parallel (BSP)-based graph processing framework.

94. ___________ provides Java-based indexing and search technology.
(a) Solr (b) Lucene Core (c) Lucy (d) All of the mentioned

Answer» Correct answer is (b) Lucene Core. The best I can explain: Lucene provides spellchecking, hit highlighting and advanced analysis/tokenization capabilities.

95. _____________ is a software distribution framework based on OSGi.
(a) ACE (b) Abdera (c) Zeppelin (d) Accumulo

Answer» Correct answer is (a) ACE. Easy explanation: ACE allows you to manage and distribute artifacts.

96. Falcon provides ___________ workflow for copying data from source to target.
(a) recurring (b) investment (c) data (d) none of the mentioned

Answer» Right option is (a) recurring. Best explanation: Falcon instruments workflows for dependencies, retry logic, table/partition registration, notifications, etc.

97. Falcon promotes decoupling of data set location from ___________ definition.
(a) Oozie (b) Impala (c) Kafka (d) Thrift

Answer» The correct option is (a) Oozie. Explanation: Falcon uses declarative processing with simple directives, enabling rapid prototyping.

98. In Distributed Mode, the machines are mapped in the __________ file.
(a) groomservers (b) grervers (c) grsvers (d) groom

Answer» The correct choice is (a) groomservers. To explain I would say: Distributed Mode is used when you have multiple machines.

99. Workflow with id __________ should be in SUCCEEDED/KILLED/FAILED.
(a) wfId (b) iUD (c) iFD (d) all of the mentioned

Answer» The correct option is (a) wfId. The best I can explain: a workflow with id wfId should exist.

100. Point out the wrong statement.
(a) Storm is difficult and can be used with only Java
(b) Storm is fast: a benchmark clocked it at over a million tuples processed per second per node
(c) Storm is scalable, fault-tolerant, and guarantees your data will be processed
(d) All of the mentioned

Answer» Right choice is (a). The best explanation: Storm is simple and can be used with any programming language.