
351.

__________ is a fully integrated, state-of-the-art analytic database architected specifically to leverage the strengths of Hadoop. (a) Oozie (b) Impala (c) Lucene (d) BigTop

Answer» Correct answer is (b) Impala

To elaborate: Impala provides low-latency, interactive SQL queries on data stored in Hadoop, adding scalability and flexibility to the platform.
352.

Which of the following companies shipped Impala? (a) Amazon (b) Oracle (c) MapR (d) All of the mentioned

Answer» Correct option is (d) All of the mentioned

The explanation is: Impala is shipped by Cloudera, MapR, Oracle, and Amazon.
353.

Amazon EMR uses Hadoop processing combined with several __________ products. (a) AWS (b) ASQ (c) AMR (d) AWES

Answer» Correct answer is (a) AWS

For explanation I would say: Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to process large amounts of data efficiently.
354.

Impala is integrated with native Hadoop security and Kerberos for authentication via __________ module. (a) Sentinue (b) Sentry (c) Sentinar (d) All of the mentioned

Answer» Correct option is (b) Sentry

Explanation: Via the Sentry module, you can ensure that the right users and applications are authorized for the right data.
355.

The Amazon EMR default input format for Hive is __________ (a) org.apache.hadoop.hive.ql.io.CombineHiveInputFormat (b) org.apache.hadoop.hive.ql.iont.CombineHiveInputFormat (c) org.apache.hadoop.hive.ql.io.CombineFormat (d) All of the mentioned

Answer» Right option is (a) org.apache.hadoop.hive.ql.io.CombineHiveInputFormat

The best I can explain: You can set the hive.base.inputformat option in Hive to select a different input format.
356.

Impala on Amazon EMR requires _________ running Hadoop 2.x or greater. (a) AMS (b) AMI (c) AWR (d) All of the mentioned

Answer» Correct choice is (b) AMI

The explanation: Impala on Amazon EMR runs on AMIs with Hadoop 2.x or greater; Impala itself is an open source tool in the Hadoop ecosystem for interactive, ad hoc querying using SQL syntax.
357.

InfoSphere ___________ provides you with the ability to flexibly meet your unique information integration requirements. (a) Data Server (b) Information Server (c) Info Server (d) All of the mentioned

Answer» Right answer is (b) Information Server

To explain I would say: IBM InfoSphere Information Server is a market-leading data integration platform which includes a family of products that enable you to understand, cleanse, monitor, transform, and deliver data.
358.

Spark is engineered from the bottom up for performance, running ___________ faster than Hadoop by exploiting in-memory computing and other optimizations. (a) 100x (b) 150x (c) 200x (d) None of the mentioned

Answer» Right choice is (a) 100x

The best explanation: Spark is fast on disk too; it currently holds the world record in large scale on-disk sorting.
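As a rough illustration of the in-memory point, here is a minimal sketch of a Spark job in Java that caches an RDD so a second pass over the data stays in memory instead of re-reading from disk (the application name and HDFS path are made up for the example):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CachedLogCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("CachedLogCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Hypothetical input path; replace with a real dataset.
        JavaRDD<String> lines = sc.textFile("hdfs:///logs/app.log");

        // cache() keeps the RDD in memory, so the second count below
        // does not re-read the file from disk.
        lines.cache();

        long errors = lines.filter(l -> l.contains("ERROR")).count();
        long warnings = lines.filter(l -> l.contains("WARN")).count();
        System.out.println(errors + " errors, " + warnings + " warnings");

        sc.stop();
    }
}
```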
359.

Hadoop has a library class, org.apache.hadoop.mapred.lib.FieldSelectionMapReduce, that effectively allows you to process text data like the Unix ______ utility. (a) Copy (b) Cut (c) Paste (d) Move

Answer» Correct option is (b) Cut

The best I can explain: The map function defined in the class treats each input key/value pair as a list of fields.
360.

Point out the correct statement. (a) The sequence file also can contain a “secondary” key-value list that can be used as file metadata (b) SequenceFile formats share a header that contains some information which allows the reader to recognize its format (c) There are key and value class names that allow the reader to instantiate those classes, via reflection, for reading (d) All of the mentioned

Answer» Correct option is (d) All of the mentioned

The best I can explain: In contrast with other persistent key-value data structures like B-Trees, you can’t seek to a specified key to edit, add or remove it.
361.

How many formats of SequenceFile are present in Hadoop I/O? (a) 2 (b) 3 (c) 4 (d) 5

Answer» Right choice is (b) 3

For explanation I would say: SequenceFile has three available formats: an “Uncompressed” format, a “Record-Compressed” format and a “Block-Compressed” format.
362.

Apache Hadoop ___________ provides a persistent data structure for binary key-value pairs. (a) GetFile (b) SequenceFile (c) Putfile (d) All of the mentioned

Answer» Correct option is (b) SequenceFile

Explanation: SequenceFile is append-only.
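To make the append-only, key-value nature of SequenceFile concrete, here is a small sketch using the standard SequenceFile.Writer/Reader API; the output path numbers.seq is just an example:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("numbers.seq");   // hypothetical output path

        // Write key-value pairs; a SequenceFile can only be appended to.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class))) {
            for (int i = 0; i < 5; i++) {
                writer.append(new IntWritable(i), new Text("value-" + i));
            }
        }

        // Read the pairs back in the order they were written.
        try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                SequenceFile.Reader.file(path))) {
            IntWritable key = new IntWritable();
            Text value = new Text();
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        }
    }
}
```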
363.

Hadoop ___________ is a utility to support running external map and reduce jobs. (a) Orchestration (b) Streaming (c) Collection (d) All of the mentioned

Answer» Right choice is (b) Streaming

The best explanation: These external jobs can be written in various programming languages such as Python or Ruby.
364.

___________ was created to allow you to flow data from a source into your Hadoop environment. (a) Impala (b) Oozie (c) Flume (d) All of the mentioned

Answer» Right choice is (c) Flume

Explanation: In Flume, the entities you work with are called sources, decorators, and sinks.
365.

Point out the wrong statement. (a) TusCAN is a Service Component Architecture implementation (b) Tobago is a JSF based framework for web-applications (c) Traffic Server is a scalable and extensible HTTP proxy server and cache (d) None of the mentioned

Answer» The correct choice is (a) TusCAN is a Service Component Architecture implementation

Best explanation: The project is named Tuscany, not TusCAN; Apache Tuscany provides the Service Component Architecture implementation.
366.

___________ is a distributed data warehouse system for Hadoop. (a) Stratos (b) Tajo (c) Sqoop (d) Lucene

Answer» Correct choice is (b) Tajo

The best explanation: Tajo is a relational, distributed data warehouse system for Apache Hadoop; Sqoop, by contrast, is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
367.

Many people use Kafka as a replacement for a ___________ solution. (a) log aggregation (b) compaction (c) collection (d) all of the mentioned

Answer» Correct answer is (a) log aggregation

Best explanation: Log aggregation typically collects physical log files off servers and puts them in a central place.
368.

Kafka uses key-value pairs in the ____________ file format for configuration. (a) RFC (b) Avro (c) Property (d) None of the mentioned

Answer» Right option is (c) Property

For explanation I would say: These key values can be supplied either from a file or programmatically.
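As a sketch of supplying Kafka's key-value configuration programmatically, the snippet below builds a Properties object and hands it to a producer; the broker address and topic name are assumptions for the example, and the same keys could equally come from a .properties file:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaConfigDemo {
    public static void main(String[] args) {
        // Key-value configuration supplied programmatically.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // "demo-topic" is a hypothetical topic name.
            producer.send(new ProducerRecord<>("demo-topic", "key1", "hello"));
        }
    }
}
```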
369.

Point out the wrong statement. (a) CDH contains the main, core elements of Hadoop (b) In October 2012, Cloudera announced the Cloudera Impala project (c) CDH may be downloaded from Cloudera’s website at no charge (d) None of the mentioned

Answer» The correct option is (d) None of the mentioned

To explain: CDH may be downloaded from Cloudera’s website at no charge, but without technical support or Cloudera Manager.
370.

Point out the wrong statement. (a) InfoSphere DataStage also facilitates extended metadata management and enterprise connectivity (b) Real-Time Integration pack can turn server or parallel jobs into SOA services (c) In 2012 the suite was renamed to InfoSphere Information Server and the product was renamed to InfoSphere DataStage (d) None of the mentioned

Answer» The correct option is (c) In 2012 the suite was renamed to InfoSphere Information Server and the product was renamed to InfoSphere DataStage

Easiest explanation: In 2006 the product was released as part of the IBM Information Server under the Information Management family but was still known as WebSphere DataStage.
371.

The ________ method in the ModelCountReducer class “reduces” the values the mapper collects into a derived value. (a) count (b) add (c) reduce (d) all of the mentioned

Answer» Correct option is (c) reduce

Explanation: In some cases, it can be a simple sum of the values.
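ModelCountReducer itself is not shown here, but a generic reducer of the same shape illustrates the idea: the reduce() method collapses all the values collected for a key into one derived value, in this sketch a simple sum:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Generic stand-in for a reducer like ModelCountReducer: all values the
// mappers emitted for one key are folded into a single derived value.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        total.set(sum);
        context.write(key, total);
    }
}
```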
372.

____________ is an open-source version control system. (a) Stratos (b) Kafka (c) Sqoop (d) Subversion

Answer» The correct answer is (d) Subversion

Best explanation: Apache Subversion (SVN) is an open-source version control system; the Apache Software Foundation has historically hosted project source code, including Hadoop’s, in Subversion.
373.

To configure short-circuit local reads, you will need to enable ____________ on local Hadoop. (a) librayhadoop (b) libhadoop (c) libhad (d) none of the mentioned

Answer» Correct choice is (b) libhadoop

Explanation: Short-circuit reads make use of a UNIX domain socket.
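For illustration, the settings an administrator would normally place in hdfs-site.xml can also be set on a client Configuration object; this sketch assumes a typical domain-socket path and that the native libhadoop library is installed on both the client and the DataNode:

```java
import org.apache.hadoop.conf.Configuration;

public class ShortCircuitConfigDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Turn on short-circuit local reads.
        conf.setBoolean("dfs.client.read.shortcircuit", true);

        // Path of the UNIX domain socket shared by the DataNode and clients
        // (an assumed, typical location).
        conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");

        System.out.println("socket path = " + conf.get("dfs.domain.socket.path"));
    }
}
```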
374.

__________ is an abstraction over Apache Hadoop YARN that reduces the complexity of developing distributed applications. (a) Wave (b) Twill (c) Usergrid (d) None of the mentioned

Answer» Right choice is (b) Twill

To elaborate: Twill allows developers to focus more on their business logic.
375.

Kafka maintains feeds of messages in categories called __________ (a) topics (b) chunks (c) domains (d) messages

Answer» Correct answer is (a) topics

The explanation is: We’ll call processes that publish messages to a Kafka topic producers.
376.

Applications can use the _________ provided to report progress or just indicate that they are alive. (a) Collector (b) Reporter (c) Dashboard (d) None of the mentioned

Answer» Right choice is (b) Reporter

To explain I would say: In scenarios where the application takes a significant amount of time to process individual key/value pairs, this is crucial since the framework might assume that the task has timed-out and kill that task.
377.

Point out the wrong statement. (a) The Mapper outputs are sorted and then partitioned per Reducer (b) The total number of partitions is the same as the number of reduce tasks for the job (c) The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format (d) None of the mentioned

Answer» Right choice is (d) None of the mentioned

Easiest explanation: All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to the Reducer(s) to determine the final output.
378.

The right number of reduces seems to be ____________ (a) 0.90 (b) 0.80 (c) 0.36 (d) 0.95

Answer» Right answer is (d) 0.95

The best explanation: The right number of reduces seems to be 0.95 or 1.75 multiplied by the number of nodes times the maximum number of reduce slots per node.
379.

The number of maps is usually driven by the total size of ____________ (a) inputs (b) outputs (c) tasks (d) None of the mentioned

Answer» Correct option is (a) inputs

To explain I would say: Total size of inputs means the total number of blocks of the input files.
380.

Cloudera ___________ includes CDH and an annual subscription license (per node) to Cloudera Manager and technical support. (a) Enterprise (b) Express (c) Standard (d) All of the mentioned

Answer» Correct option is (a) Enterprise

Easiest explanation: CDH includes the core elements of Apache Hadoop plus several additional key open source projects.
381.

The number of reduces for the job is set by the user via _________ (a) JobConf.setNumTasks(int) (b) JobConf.setNumReduceTasks(int) (c) JobConf.setNumMapTasks(int) (d) All of the mentioned

Answer» Correct choice is (b) JobConf.setNumReduceTasks(int)

To elaborate: Reducer has 3 primary phases: shuffle, sort and reduce.
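A minimal old-API driver sketch showing where setNumReduceTasks() fits; the job name, the choice of 10 reducers, and the commented-out mapper/reducer classes are placeholders rather than anything mandated by the API:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // The user chooses the number of reduce tasks for the job here.
        conf.setNumReduceTasks(10);

        // Hypothetical user-supplied classes would be registered like this:
        // conf.setMapperClass(WordCountMapper.class);
        // conf.setReducerClass(WordCountReducer.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```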
382.

__________ maps input key/value pairs to a set of intermediate key/value pairs. (a) Mapper (b) Reducer (c) Both Mapper and Reducer (d) None of the mentioned

Answer» Right option is (a) Mapper

The explanation is: Maps are the individual tasks that transform input records into intermediate records.
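A minimal sketch of a Mapper that turns each input record (a line of text) into intermediate (word, 1) pairs, the classic word-count transformation; the framework then groups those intermediate pairs by key for the reducers:

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Each input record is one line; emit an intermediate pair per word.
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE);
        }
    }
}
```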
383.

Applications can use the ____________ to report progress and set application-level status messages. (a) Partitioner (b) OutputSplit (c) Reporter (d) All of the mentioned

Answer» The correct answer is (c) Reporter

Best explanation: Reporter is also used to update Counters, or just indicate that they are alive.
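A sketch of how an old-API map task might use the Reporter it is handed; the counter enum and the status text are invented for the example:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SlowRecordMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    enum Records { PROCESSED }   // hypothetical counter group

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // ... some expensive per-record work would go here ...

        reporter.setStatus("processing offset " + key.get()); // status message
        reporter.incrCounter(Records.PROCESSED, 1);           // update a counter
        reporter.progress();                                  // "still alive" signal

        output.collect(new Text(value), new IntWritable(1));
    }
}
```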
384.

The right level of parallelism for maps seems to be around _________ maps per node. (a) 1-10 (b) 10-100 (c) 100-150 (d) 150-200

Answer» Right answer is (b) 10-100

The best explanation: Task setup takes a while, so it is best if the maps take at least a minute to execute.
385.

Point out the wrong statement. (a) If V2 and V3 are the same, you only need to use setOutputValueClass() (b) The overall effect of a Streaming job is to perform a sort of the input (c) A Streaming application can control the separator that is used when a key-value pair is turned into a series of bytes and sent to the map or reduce process over standard input (d) None of the mentioned

Answer» The correct answer is (d) None of the mentioned

Explanation: If a combine function is used then it is the same form as the reduce function, except its output types are the intermediate key and value types (K2 and V2), so they can feed the reduce function.
386.

An input _________ is a chunk of the input that is processed by a single map. (a) textformat (b) split (c) datanode (d) all of the mentioned

Answer» The correct option is (b) split

For explanation: Each split is divided into records, and the map processes each record—a key-value pair—in turn.
387.

Point out the correct statement. (a) The reduce input must have the same types as the map output, although the reduce output types may be different again (b) The map input key and value types (K1 and V1) are different from the map output types (c) The partition function operates on the intermediate key (d) All of the mentioned

Answer» Right answer is (d) All of the mentioned

To elaborate: In practice, the partition is determined solely by the key (the value is ignored).
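As a sketch of that last point, a custom Partitioner computes the partition from the intermediate key alone and ignores the value, in the same spirit as the default HashPartitioner:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// The partition is derived solely from the intermediate key; the value
// plays no part, mirroring what the default HashPartitioner does.
public class KeyHashPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```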
388.

__________ is a variant of SequenceFileInputFormat that converts the sequence file’s keys and values to Text objects. (a) SequenceFile (b) SequenceFileAsTextInputFormat (c) SequenceAsTextInputFormat (d) All of the mentioned

Answer» Correct choice is (b) SequenceFileAsTextInputFormat

Best explanation: SequenceFileAsTextInputFormat performs the conversion by calling toString() on the sequence file’s keys and values, which makes the format convenient for Streaming programs.
389.

___________ generates keys of type LongWritable and values of type Text. (a) TextOutputFormat (b) TextInputFormat (c) OutputInputFormat (d) None of the mentioned

Answer» Correct answer is (b) TextInputFormat

For explanation I would say: With TextInputFormat, the key is the byte offset of the line within the file (a LongWritable) and the value is the contents of the line (a Text).
390.

With ______ we can store data and read it easily with various programming languages. (a) Thrift (b) Protocol Buffers (c) Avro (d) None of the mentioned

Answer» The correct option is (c) Avro

The explanation: Avro is optimized to minimize the disk space needed by our data and it is flexible.
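A small sketch of writing an Avro data file from Java using a hypothetical two-field schema; because the schema travels with the file, an Avro library in another language can read the records back:

```java
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroWriteDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical record schema with two fields.
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
              + "{\"name\":\"name\",\"type\":\"string\"},"
              + "{\"name\":\"age\",\"type\":\"int\"}]}");

        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "alice");
        user.put("age", 30);

        // Write a container file that embeds the schema alongside the data.
        try (DataFileWriter<GenericRecord> writer =
                     new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, new File("users.avro"));
            writer.append(user);
        }
    }
}
```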
391.

NiFi is a dataflow system based on the concepts of ________ programming. (a) structured (b) relational (c) set (d) flow-based

Answer» Right option is (d) flow-based

Explanation: NiFi automates and manages the flow of data between systems; it was developed at the NSA before being donated to the Apache Incubator.
392.

____________ is a query processing and optimization system for large-scale, distributed data analysis. (a) MRQL (b) NiFi (c) OpenAz (d) ODF Toolkit

Answer» Correct answer is (a) MRQL

For explanation: MRQL is built on top of Apache Hadoop, Hama, Spark, and Flink.
393.

Point out the wrong statement. (a) Hadoop has a library package called Aggregate (b) Aggregate allows you to define a mapper plugin class that is expected to generate “aggregatable items” for each input key/value pair of the mappers (c) To use Aggregate, simply specify “-mapper aggregate” (d) None of the mentioned

Answer» Correct option is (c) To use Aggregate, simply specify “-mapper aggregate”

The best I can explain: Statement (c) is wrong because to use Aggregate you specify “-reducer aggregate”, while supplying your own mapper that emits the aggregatable items.
394.

________ is a columnar storage format for Hadoop. (a) MRQL (b) NiFi (c) OpenAz (d) Parquet

Answer» Right choice is (d) Parquet

The explanation is: Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework or programming language.
395.

__________ is a columnar storage format for Hadoop. (a) Ranger (b) Parquet (c) REEF (d) None of the mentioned

Answer» The correct option is (b) Parquet

The explanation is: The Ranger project is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform.
396.

To set an environment variable in a streaming command use ____________ (a) -cmden EXAMPLE_DIR=/home/example/dictionaries/ (b) -cmdev EXAMPLE_DIR=/home/example/dictionaries/ (c) -cmdenv EXAMPLE_DIR=/home/example/dictionaries/ (d) -cmenv EXAMPLE_DIR=/home/example/dictionaries/

Answer» Correct answer is (c) -cmdenv EXAMPLE_DIR=/home/example/dictionaries/

For explanation I would say: Environment variables are set using the -cmdenv option.
397.

Which of the following is only for storage with limited compute? (a) Hot (b) Cold (c) Warm (d) All_SSD

Answer» The correct choice is (b) Cold

Easiest explanation: When a block is cold, all replicas are stored in the ARCHIVE.
398.

During the execution of a streaming job, the names of the _______ parameters are transformed. (a) vmap (b) mapvim (c) mapreduce (d) mapred

Answer» Right answer is (d) mapred

For explanation: The dots (.) in the parameter names become underscores (_), so to read the values in a streaming job’s mapper/reducer you use the names with underscores, e.g. mapred_job_id instead of mapred.job.id.
399.

When a block is warm, some of its replicas are stored in DISK and the remaining replicas are stored in _________ (a) ROM_DISK (b) ARCHIVE (c) RAM_DISK (d) All of the mentioned

Answer» Correct answer is (b) ARCHIVE

Easiest explanation: Warm storage policy is partially hot and partially cold.
400.

The standard output (stdout) and error (stderr) streams of the task are read by the TaskTracker and logged to _________ (a) ${HADOOP_LOG_DIR}/user (b) ${HADOOP_LOG_DIR}/userlogs (c) ${HADOOP_LOG_DIR}/logs (d) None of the mentioned

Answer» Correct answer is (b) ${HADOOP_LOG_DIR}/userlogs

The explanation: The TaskTracker captures each task’s stdout and stderr and writes them under ${HADOOP_LOG_DIR}/userlogs; the child JVM also always has its current working directory added to java.library.path and LD_LIBRARY_PATH.