
51.

Apache Hadoop Development Tools is an effort undergoing incubation at _________
(a) ADF (b) ASF (c) HCC (d) AFS

Answer» Right answer is (b) ASF

Explanation: Apache Hadoop Development Tools is incubating at The Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.
52.

__________ method tells LoadFunc which fields are required in the Pig script.
(a) pushProjection() (b) relativeToAbsolutePath() (c) prepareToRead() (d) none of the mentioned

Answer» Right choice is (a) pushProjection()

Explanation: Pig uses the column index requiredField.index to tell the LoadFunc which fields the Pig script requires.
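For illustration, a minimal sketch of the loader side of this contract. The class name and the column bookkeeping are assumptions for the example; in Pig's API the projection callback lives on the LoadPushDown interface, which a LoadFunc implements alongside its other methods.

    import java.util.Collections;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import org.apache.pig.LoadPushDown;
    import org.apache.pig.impl.logicalLayer.FrontendException;

    // Illustrative loader; a real implementation would also extend LoadFunc.
    public class ProjectingLoader implements LoadPushDown {
        private final Set<Integer> requiredColumns = new HashSet<>();

        @Override
        public List<OperatorSet> getFeatures() {
            // Advertise that this loader can honor projection push-down.
            return Collections.singletonList(OperatorSet.PROJECTION);
        }

        @Override
        public RequiredFieldResponse pushProjection(RequiredFieldList requiredFieldList)
                throws FrontendException {
            // Pig supplies requiredField.index for each field the script uses.
            for (RequiredField f : requiredFieldList.getFields()) {
                requiredColumns.add(f.getIndex());
            }
            return new RequiredFieldResponse(true); // the request was honored
        }
    }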
53.

_________ function is responsible for consolidating the results produced by each of the Map() functions/tasks.
(a) Reduce (b) Map (c) Reducer (d) All of the mentioned

Answer» The correct option is (a) Reduce

Explanation: The Reduce function collates the work and resolves the results.
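As a concrete example, a minimal summing Reducer using the standard Hadoop MapReduce API (the word-count-style summing logic is just an illustration):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Consolidates the per-key values emitted by the Map tasks into one result per key.
    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get(); // collate the results produced by the mappers
            }
            context.write(key, new IntWritable(sum));
        }
    }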
54.

The output of the _______ is not sorted in the MapReduce framework for Hadoop.
(a) Mapper (b) Cascader (c) Scalding (d) None of the mentioned

Answer» Right choice is (d) None of the mentioned

Explanation: The output of the reduce task is typically written to the FileSystem; the output of the Reducer is not sorted.
55.

GraphX provides an API for expressing graph computation that can model the __________ abstraction.
(a) GaAdt (b) Spark Core (c) Pregel (d) None of the mentioned

Answer» The correct option is (c) Pregel

Explanation: GraphX is Spark's API for graphs and graph-parallel computation, and it exposes a variant of the Pregel abstraction.
56.

Point out the wrong statement.
(a) ConcurScheduler detects whether the index is on SSD or not
(b) Memory index supports payloads
(c) Auto-IO-throttling has been added to ConcurrentMergeScheduler, to rate limit IO writes for each merge depending on incoming merge rate
(d) The default codec has an option to control BEST_SPEED or BEST_COMPRESSION for stored fields

Answer» Right choice is (a) ConcurScheduler detects whether the index is on SSD or not

Explanation: It is ConcurrentMergeScheduler that detects whether the index is on an SSD, and it does a better job defaulting its settings.
57.

Spark is packaged with higher level libraries, including support for _________ queries.
(a) SQL (b) C (c) C++ (d) None of the mentioned

Answer» Right answer is (a) SQL

Explanation: Standard libraries increase developer productivity and can be seamlessly combined to create complex workflows.
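A small sketch of the SQL library in use from Java; the input path, view name, and local master are assumptions for the example:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SqlExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("sql-example")
                    .master("local[*]") // local run, for illustration only
                    .getOrCreate();

            // Register a JSON file as a temporary view, then query it with SQL.
            Dataset<Row> people = spark.read().json("people.json"); // hypothetical input
            people.createOrReplaceTempView("people");
            spark.sql("SELECT name FROM people WHERE age > 30").show();

            spark.stop();
        }
    }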
58.

The ___________ can also be used to distribute both jars and native libraries for use in the map and/or reduce tasks.
(a) DataCache (b) DistributedData (c) DistributedCache (d) All of the mentioned

Answer» Correct choice is (c) DistributedCache

Explanation: The child JVM always has its current working directory added to java.library.path and LD_LIBRARY_PATH.
59.

The _____________ can also be used to distribute both jars and native libraries for use in the map and/or reduce tasks.
(a) DistributedLog (b) DistributedCache (c) DistributedJars (d) None of the mentioned

Answer» Correct choice is (b) DistributedCache

Explanation: Cached libraries can be loaded via System.loadLibrary or System.load.
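A sketch of both uses with the newer Job API (the HDFS paths are hypothetical; older releases expose the same functionality through the DistributedCache class directly):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class CacheSetup {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "cache-example");

            // Put a jar on the classpath of every map/reduce task.
            job.addFileToClassPath(new Path("/libs/lookup.jar"));

            // Ship a native library; the #fragment creates a symlink in the task's
            // working directory, which is already on java.library.path.
            job.addCacheFile(new URI("hdfs:///libs/libmylib.so#libmylib.so"));
            // Inside a task it can then be loaded with System.loadLibrary("mylib").
        }
    }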
60.

Through the ____________ method, the RecordReader associated with the InputFormat provided by the LoadFunc is passed to the LoadFunc.
(a) getNext() (b) relativeToAbsolutePath() (c) prepareToRead() (d) all of the mentioned

Answer» The correct choice is (c) prepareToRead()

Explanation: The implementation can then use the RecordReader in getNext() to return a tuple representing a record of data back to Pig.
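A compact LoadFunc sketch showing the hand-off: Pig builds a RecordReader from the InputFormat returned by getInputFormat(), passes it to prepareToRead(), and getNext() then pulls records from it. The one-line-per-tuple behavior is an assumption for the example.

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.pig.LoadFunc;
    import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.data.TupleFactory;

    public class SimpleLineLoader extends LoadFunc {
        private RecordReader<?, ?> reader;
        private final TupleFactory tuples = TupleFactory.getInstance();

        @Override
        public InputFormat getInputFormat() {
            return new TextInputFormat();
        }

        @Override
        public void setLocation(String location, Job job) throws IOException {
            FileInputFormat.setInputPaths(job, location);
        }

        @Override
        public void prepareToRead(RecordReader reader, PigSplit split) {
            this.reader = reader; // the reader created from getInputFormat()
        }

        @Override
        public Tuple getNext() throws IOException {
            try {
                if (!reader.nextKeyValue()) {
                    return null; // end of this split
                }
                Tuple t = tuples.newTuple(1);
                t.set(0, ((Text) reader.getCurrentValue()).toString()); // one line per tuple
                return t;
            } catch (InterruptedException e) {
                throw new IOException(e);
            }
        }
    }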
61.

Apache _________ is a project that enables development and consumption of REST style web services.
(a) Wives (b) Wink (c) Wig (d) All of the mentioned

Answer» Right choice is (b) Wink

Explanation: The core server runtime is based on the JAX-RS (JSR 311) standard.
62.

The output descriptor for the table to be written is created by calling ____________
(a) OutputJobInfo.describe (b) OutputJobInfo.create (c) OutputJobInfo.put (d) None of the mentioned

Answer» Correct choice is (b) OutputJobInfo.create

Explanation: The implementation of Map takes HCatRecord as an input, and the implementation of Reduce produces it as an output.
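A brief sketch of wiring this into a job; the database, table, and partition values are hypothetical, and the package name assumes a recent HCatalog (org.apache.hive.hcatalog):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
    import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;

    public class HCatWriteSetup {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance();

            // Describe the target table, optionally with a partition spec.
            Map<String, String> partition = new HashMap<>();
            partition.put("ds", "20240101"); // hypothetical partition column/value
            HCatOutputFormat.setOutput(job,
                    OutputJobInfo.create("mydb", "mytable", partition));

            job.setOutputFormatClass(HCatOutputFormat.class);
        }
    }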
63.

Point out the correct statement.
(a) Mahout is distributed under a commercially friendly Apache Software license
(b) Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop® and using the MapReduce paradigm
(c) Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms
(d) All of the mentioned

Answer» Correct choice is (d) All of the mentioned

Explanation: The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases.
64.

PostingsFormat now uses a __________ API when writing postings, just like doc values.
(a) push (b) pull (c) read (d) all of the mentioned

Answer» Right answer is (b) pull

Explanation: This is powerful because you can do things in your postings format that require making more than one pass through the postings, such as iterating over all postings.
65.

SolrJ now has first-class support for the __________ API.
(a) Compactions (b) Collections (c) Distribution (d) All of the mentioned

Answer» Right option is (b) Collections

Explanation: Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene.
66.

_________ is the output produced by TextOutputFormat, Hadoop's default OutputFormat.
(a) KeyValueTextInputFormat (b) KeyValueTextOutputFormat (c) FileValueTextInputFormat (d) All of the mentioned

Answer» The correct answer is (b) KeyValueTextOutputFormat

Explanation: To interpret such files correctly, KeyValueTextInputFormat is appropriate.
67.

Which of the following classes provides a subset of features provided by the Unix/GNU Sort?
(a) KeyFieldBased (b) KeyFieldComparator (c) KeyFieldBasedComparator (d) All of the mentioned

Answer» Correct choice is (c) KeyFieldBasedComparator

Explanation: Hadoop has a library class, KeyFieldBasedComparator, that is useful for many applications.
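A sketch of plugging the comparator into a job; the option string mirrors Unix sort's -k syntax, and the job name is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedComparator;

    public class ComparatorSetup {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Sort on the second key field, numerically and in reverse,
            // like Unix `sort -k2,2nr`.
            conf.set("mapreduce.partition.keycomparator.options", "-k2,2nr");

            Job job = Job.getInstance(conf, "sort-example");
            job.setSortComparatorClass(KeyFieldBasedComparator.class);
        }
    }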
68.

___________ executes the pipeline as a series of MapReduce jobs.
(a) SparkPipeline (b) MRPipeline (c) MemPipeline (d) None of the mentioned

Answer» The correct answer is (b) MRPipeline

Explanation: Every Crunch data pipeline is coordinated by an instance of the Pipeline interface.
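A minimal sketch contrasting the Pipeline implementations (class names from Crunch; the surrounding driver class is hypothetical):

    import org.apache.crunch.Pipeline;
    import org.apache.crunch.impl.mem.MemPipeline;
    import org.apache.crunch.impl.mr.MRPipeline;

    public class PipelineChoice {
        public static void main(String[] args) {
            // MRPipeline plans and runs the pipeline as a series of MapReduce jobs.
            Pipeline mr = new MRPipeline(PipelineChoice.class);

            // MemPipeline runs everything in memory -- handy for unit tests.
            Pipeline mem = MemPipeline.getInstance();

            // ... define reads, transforms and writes on either pipeline ...
            mr.done(); // triggers planning and executes the generated jobs
        }
    }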
69.

__________ is a log collection and correlation software with reporting and alarming functionalities.
(a) Lucene (b) ALOIS (c) Imphal (d) None of the mentioned

Answer» The correct answer is (b) ALOIS

Explanation: ALOIS was developed in the Apache Incubator; its incubation was later retired.
70.

_____________ will skip the nodes given in the config with the same exit transition as before.
(a) ActionMega handler (b) Action handler (c) Data handler (d) None of the mentioned

Answer» The correct answer is (b) Action handler

Explanation: Currently there is no way to remove an existing configuration; it can only be overridden by passing a different value in the input configuration.
71.

Drill analyzes semi-structured/nested data coming from _________ applications.
(a) RDBMS (b) NoSQL (c) NewSQL (d) None of the mentioned

Answer» Correct choice is (b) NoSQL

Explanation: Modern big data applications such as social, mobile, web and IoT applications deal with a larger number of users and a larger amount of data than traditional transactional applications.
72.

Point out the wrong statement.
(a) Oozie provides a unique callback URL to the task; the task should invoke the given URL to notify its completion
(b) All computation/processing tasks triggered by an action node are remote to Oozie
(c) Oozie workflows can be parameterized
(d) None of the mentioned

Answer» The correct answer is (b) All computation/processing tasks triggered by an action node are remote to Oozie

Explanation: All computation/processing tasks are executed by the Hadoop Map/Reduce framework.
73.

Hive uses _________ for logging.
(a) logj4 (b) log4l (c) log4i (d) log4j

Answer» Correct choice is (d) log4j

Explanation: By default Hive uses hive-log4j.default in the conf/ directory of the Hive installation.
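Typical knobs in that file look like the following; the values shown are illustrative, not prescriptive:

    # conf/hive-log4j.properties
    hive.root.logger=INFO,DRFA
    hive.log.dir=/tmp/${user.name}
    hive.log.file=hive.log

    # Logging can also be overridden per session from the command line:
    #   hive --hiveconf hive.root.logger=DEBUG,console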
74.

Which of the following data types is not supported by Hive?
(a) map (b) record (c) string (d) enum

Answer» The correct option is (d) enum

Explanation: Hive has no concept of enums.
75.

New ____________ type enables indexing and searching of date ranges, particularly multi-valued ones.
(a) RangeField (b) DateField (c) DateRangeField (d) All of the mentioned

Answer» The correct answer is (c) DateRangeField

Explanation: A new ExitableDirectoryReader extends FilterDirectoryReader and enables exiting requests that take too long to enumerate over terms.
76.

Mahout provides ____________ libraries for common math operations and primitive Java collections.
(a) Java (b) Javascript (c) Perl (d) Python

Answer» Correct option is (a) Java

Explanation: The math operations are focused on linear algebra and statistics.
77.

__________ has the world’s largest Hadoop cluster.
(a) Apple (b) Datamatics (c) Facebook (d) None of the mentioned

Answer» Correct answer is (c) Facebook

Explanation: Facebook has many Hadoop clusters, the largest of which is used for data warehousing.
78.

______________ class allows the Map/Reduce framework to partition the map outputs based on certain key fields, not the whole keys.
(a) KeyFieldPartitioner (b) KeyFieldBasedPartitioner (c) KeyFieldBased (d) None of the mentioned

Answer» Right choice is (b) KeyFieldBasedPartitioner

Explanation: The primary key is used for partitioning, and the combination of the primary and secondary keys is used for sorting.
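A sketch of configuring the partitioner so that every record sharing the primary key reaches the same reducer (the option string and job name are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedPartitioner;

    public class PartitionerSetup {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Partition on the first key field only, like `sort -k1,1`;
            // secondary key fields are then free to drive the sort order.
            conf.set("mapreduce.partition.keypartitioner.options", "-k1,1");

            Job job = Job.getInstance(conf, "secondary-sort");
            job.setPartitionerClass(KeyFieldBasedPartitioner.class);
        }
    }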
79.

Lucene provides scalable, high-performance indexing over ______ per hour on modern hardware.
(a) 1 TB (b) 150GB (c) 10 GB (d) None of the mentioned

Answer» Right choice is (b) 150GB

Explanation: Lucene offers powerful features through a simple API.
80.

A __________ represents a distributed, immutable collection of elements of type T.
(a) PCollect (b) PCollection (c) PCol (d) All of the mentioned

Answer» Right choice is (b) PCollection

Explanation: PCollection provides a method, parallelDo, that applies a DoFn to each element in the PCollection.
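A short parallelDo sketch in Java; the input and output paths are hypothetical:

    import org.apache.crunch.DoFn;
    import org.apache.crunch.Emitter;
    import org.apache.crunch.PCollection;
    import org.apache.crunch.Pipeline;
    import org.apache.crunch.impl.mr.MRPipeline;
    import org.apache.crunch.types.writable.Writables;

    public class UpperCaseExample {
        public static void main(String[] args) {
            Pipeline pipeline = new MRPipeline(UpperCaseExample.class);
            PCollection<String> lines = pipeline.readTextFile("in.txt"); // hypothetical path

            // parallelDo applies the DoFn to each element, yielding a new PCollection.
            PCollection<String> upper = lines.parallelDo(new DoFn<String, String>() {
                @Override
                public void process(String input, Emitter<String> emitter) {
                    emitter.emit(input.toUpperCase());
                }
            }, Writables.strings());

            pipeline.writeTextFile(upper, "out"); // hypothetical path
            pipeline.done();
        }
    }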
81.

Using Hadoop Archives in __________ is as easy as specifying a different input filesystem than the default file system.
(a) Hive (b) Pig (c) MapReduce (d) All of the mentioned

Answer» Correct choice is (c) MapReduce

Explanation: Because Hadoop Archives are exposed as a file system, MapReduce is able to use all the logical input files in Hadoop Archives as input.
82.

___________ property allows us to specify a custom dir location pattern for all the writes, and will interpolate each variable.
(a) hcat.dynamic.partitioning.custom.pattern
(b) hcat.append.limit
(c) hcat.pig.storer.external.location
(d) hcatalog.hive.client.cache.expiry.time

Answer» Correct answer is (a) hcat.dynamic.partitioning.custom.pattern

Explanation: hcat.append.limit, by contrast, allows an HCatalog user to specify a custom append limit.
83.

Apache _________ provides direct queries on self-describing and semi-structured data in files.
(a) Drill (b) Mahout (c) Oozie (d) All of the mentioned

Answer» Right answer is (a) Drill

Explanation: Users can explore live data on their own as it arrives, versus spending weeks or months on data preparation, modeling, ETL and subsequent schema management.
84.

Nodes in the config _____________ must be completed successfully.
(a) oozie.wid.rerun.skip.nodes (b) oozie.wf.rerun.skip.nodes (c) oozie.wf.run.skip.nodes (d) all of the mentioned

Answer» The correct option is (b) oozie.wf.rerun.skip.nodes

Explanation: If no configuration is passed, the existing coordinator/workflow configuration will be used.
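As a sketch, a rerun that skips already-succeeded nodes could look like this; the node names, host, and <wfId> placeholder are illustrative:

    # job.properties passed to the rerun:
    oozie.wf.rerun.skip.nodes=first-action,second-action

    # Rerun the finished workflow, skipping the nodes listed above:
    oozie job -oozie http://localhost:11000/oozie -rerun <wfId> -config job.properties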
85.

Point out the wrong statement.
(a) Falcon promotes Javascript Programming
(b) Falcon does not do any heavy lifting but delegates to tools within the Hadoop ecosystem
(c) Falcon handles retry logic and late data processing, and records audit, lineage and metrics
(d) All of the mentioned

Answer» Right answer is (a) Falcon promotes Javascript Programming

Explanation: Falcon promotes Polyglot Programming.
86.

A __________ in a social graph is a group of people who interact frequently with each other and less frequently with others.
(a) semi-cluster (b) partial cluster (c) full cluster (d) none of the mentioned

Answer» Correct option is (a) semi-cluster

Explanation: Semi-clustering differs from ordinary clustering in that a vertex may belong to more than one semi-cluster.
87.

__________ is a REST API for HCatalog.
(a) WebHCat (b) WbHCat (c) InpHCat (d) None of the mentioned

Answer» Right choice is (a) WebHCat

Explanation: REST stands for “representational state transfer”, a style of API based on HTTP verbs.
88.

Hive, Pig, and Cascading all use a _________ data model.
(a) value centric (b) columnar (c) tuple-centric (d) none of the mentioned

Answer» The correct choice is (c) tuple-centric

Explanation: Crunch, by contrast, allows developers considerable flexibility in how they represent their data, which makes Crunch the best pipeline platform for developers.
89.

____________ is a distributed machine learning framework on top of Spark.
(a) MLlib (b) Spark Streaming (c) GraphX (d) RDDs

Answer» Correct answer is (a) MLlib

Explanation: MLlib implements many common machine learning and statistical algorithms to simplify large scale machine learning pipelines.
90.

Hive also supports custom extensions written in ____________
(a) C# (b) Java (c) C (d) C++

Answer» The correct answer is (b) Java

Explanation: Hive supports custom extensions written in Java, including user-defined functions (UDFs) and serializer-deserializers for reading and optionally writing custom formats.
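A minimal UDF sketch using the classic org.apache.hadoop.hive.ql.exec.UDF base class; the class name, jar name, and function name are illustrative:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Register from the Hive CLI (illustrative):
    //   ADD JAR my-udfs.jar;
    //   CREATE TEMPORARY FUNCTION my_lower AS 'MyLowerUDF';
    public class MyLowerUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().toLowerCase());
        }
    }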
91.

Point out the wrong statement.
(a) Elastic MapReduce (EMR) is Facebook’s packaged Hadoop offering
(b) Amazon Web Service Elastic MapReduce (EMR) is Amazon’s packaged Hadoop offering
(c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate
(d) All of the mentioned

Answer» Correct answer is (a) Elastic MapReduce (EMR) is Facebook’s packaged Hadoop offering

Explanation: Rather than building Hadoop deployments manually on EC2 (Elastic Compute Cloud) clusters, users can spin up fully configured Hadoop installations using simple invocation commands, either through the AWS Web Console or through command-line tools.
92.

________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets.
(a) Pig Latin (b) Oozie (c) Pig (d) Hive

Answer» Correct choice is (c) Pig

Explanation: Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs.
93.

________________ is a complete FTP server based on the Mina I/O system.
(a) Giraph (b) Gereition (c) FtpServer (d) Oozie

Answer» The correct choice is (c) FtpServer

Explanation: Giraph, by contrast, is a large-scale, fault-tolerant, Bulk Synchronous Parallel (BSP)-based graph processing framework.
94.

___________ provides Java-based indexing and search technology.
(a) Solr (b) Lucene Core (c) Lucy (d) All of the mentioned

Answer» Correct answer is (b) Lucene Core

Explanation: Lucene provides spellchecking, hit highlighting and advanced analysis/tokenization capabilities.
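A compact index-and-search sketch against an in-memory index; the field name and query are illustrative, and ByteBuffersDirectory assumes a recent Lucene release:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.ByteBuffersDirectory;
    import org.apache.lucene.store.Directory;

    public class LuceneExample {
        public static void main(String[] args) throws Exception {
            Directory dir = new ByteBuffersDirectory(); // in-memory, for illustration
            StandardAnalyzer analyzer = new StandardAnalyzer();

            // Index a single document.
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new TextField("body",
                        "Lucene provides Java-based indexing and search", Field.Store.YES));
                writer.addDocument(doc);
            }

            // Search it back.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                TopDocs hits = searcher.search(
                        new QueryParser("body", analyzer).parse("indexing"), 10);
                System.out.println("hits: " + hits.totalHits);
            }
        }
    }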
95.

_____________ is a software distribution framework based on OSGi.
(a) ACE (b) Abdera (c) Zeppelin (d) Accumulo

Answer» Correct answer is (a) ACE

Explanation: ACE allows you to manage and distribute artifacts.
96.

Falcon provides a ___________ workflow for copying data from source to target.
(a) recurring (b) investment (c) data (d) none of the mentioned

Answer» Right option is (a) recurring

Explanation: Falcon instruments workflows for dependencies, retry logic, table/partition registration, notifications, etc.
97.

Falcon promotes decoupling of data set location from ___________ definition.
(a) Oozie (b) Impala (c) Kafka (d) Thrift

Answer» The correct option is (a) Oozie

Explanation: Falcon uses declarative processing with simple directives enabling rapid prototyping.
98.

Servers used in Distributed Mode are mapped in the __________ file.
(a) groomservers (b) grervers (c) grsvers (d) groom

Answer» The correct choice is (a) groomservers

Explanation: Distributed Mode is used when you have multiple machines.
99.

Workflow with id __________ should be in SUCCEEDED/KILLED/FAILED state.
(a) wfId (b) iUD (c) iFD (d) all of the mentioned

Answer» The correct option is (a) wfId

Explanation: A workflow with id wfId must exist.
100.

Point out the wrong statement.
(a) Storm is difficult and can be used with only Java
(b) Storm is fast: a benchmark clocked it at over a million tuples processed per second per node
(c) Storm is scalable, fault-tolerant, and guarantees your data will be processed
(d) All of the mentioned

Answer» Right choice is (a) Storm is difficult and can be used with only Java

Explanation: Storm is simple and can be used with any programming language.