This section of InterviewSolutions contains curated multiple-choice questions (301–350) on Hadoop and its ecosystem, with answers and brief explanations, to sharpen your knowledge and support exam preparation.

301.

The queue definitions and properties such as ________ and ACLs can be changed at runtime. (a) tolerant (b) capacity (c) speed (d) all of the mentioned

Answer» Correct option is (b) capacity

Explanation: Administrators can add additional queues at runtime, but queues cannot be deleted at runtime.
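For context, queue capacities and ACLs live in capacity-scheduler.xml and can be refreshed at runtime with `yarn rmadmin -refreshQueues`. A minimal illustrative fragment (the queue name `analytics` and the values are made up for this sketch):

```xml
<!-- capacity-scheduler.xml: illustrative queue definition -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,analytics</value>
  </property>
  <property>
    <!-- capacity (percent) can be changed at runtime and reloaded
         with: yarn rmadmin -refreshQueues -->
    <name>yarn.scheduler.capacity.root.analytics.capacity</name>
    <value>30</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.analytics.acl_submit_applications</name>
    <value>analyst_group</value>
  </property>
</configuration>
```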
302.

Point out the correct statement. (a) When there is enough space, block replicas are stored according to the storage type list (b) One_SSD is used for storing all replicas in SSD (c) Hot policy is useful only for single replica blocks (d) All of the mentioned

Answer» Correct option is (a) When there is enough space, block replicas are stored according to the storage type list

Explanation: The first phase of Heterogeneous Storage changed the datanode storage model from a single storage to a collection of storages, each corresponding to a physical storage medium.
303.

_________ is a data migration tool added for archiving data. (a) Mover (b) Hiver (c) Serde (d) None of the mentioned

Answer» Correct option is (a) Mover

Explanation: Mover periodically scans the files in HDFS to check whether the block placement satisfies the storage policy.
304.

The __________ is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc. (a) Manager (b) Master (c) Scheduler (d) None of the mentioned

Answer» Correct option is (c) Scheduler

Explanation: The Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of status for the application.
305.

Point out the correct statement. (a) Each queue has strict ACLs which control which users can submit applications to individual queues (b) Hierarchy of queues is supported to ensure resources are shared among the sub-queues of an organization (c) Queues are allocated a fraction of the capacity of the grid in the sense that a certain capacity of resources will be at their disposal (d) All of the mentioned

Answer» Correct option is (d) All of the mentioned

Explanation: All applications submitted to a queue will have access to the capacity allocated to the queue.
306.

____________ is used for storing one of the replicas in SSD. (a) Hot (b) Lazy_Persist (c) One_SSD (d) All_SSD

Answer» Correct option is (c) One_SSD

Explanation: The remaining replicas are stored in DISK.
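The replica placement implied by these policies can be sketched as a small lookup. The policy names match the HDFS documentation; the placement function itself is an illustrative simplification, not HDFS code:

```python
# Illustrative mapping of HDFS storage policies to per-replica storage
# types (policy names per the HDFS docs; logic is a simplification).
POLICY_PLACEMENT = {
    "Hot":     lambda n: ["DISK"] * n,
    "One_SSD": lambda n: ["SSD"] + ["DISK"] * (n - 1),
    "All_SSD": lambda n: ["SSD"] * n,
    "Warm":    lambda n: ["DISK"] + ["ARCHIVE"] * (n - 1),
    "Cold":    lambda n: ["ARCHIVE"] * n,
}

def placement(policy: str, replicas: int = 3) -> list[str]:
    """Return the storage type chosen for each replica under `policy`."""
    return POLICY_PLACEMENT[policy](replicas)
```

For example, `placement("One_SSD")` gives `['SSD', 'DISK', 'DISK']`: one replica in SSD, the remaining replicas in DISK, exactly as the explanation above describes.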
307.

__________ storage is a solution to decouple growing storage capacity from compute capacity. (a) DataNode (b) Archival (c) Policy (d) None of the mentioned

Answer» Correct option is (b) Archival

Explanation: Nodes with higher-density, less expensive storage and low compute power are becoming available.
308.

Point out the correct statement. (a) Mover is not similar to Balancer (b) hdfs dfsadmin -setStoragePolicy puts a storage policy to a file or a directory (c) addCacheArchive adds archives to be localized (d) none of the mentioned

Answer» Correct option is (c) addCacheArchive adds archives to be localized

Explanation: addArchiveToClassPath(Path archive) adds an archive path to the current set of classpath entries.
309.

Yarn commands are invoked by the ________ script. (a) hive (b) bin (c) hadoop (d) home

Answer» Correct option is (b) bin

Explanation: Running the yarn script without any arguments prints the description for all commands.
310.

Which of the following is a monitoring solution for hadoop? (a) Sirona (b) Sentry (c) Slider (d) Streams

Answer» Correct option is (a) Sirona

Explanation: Apache Sirona provides monitoring; Slider, by contrast, is a collection of tools and technologies to package, deploy, and manage long-running applications on Apache Hadoop YARN clusters.
311.

Which of the following has high storage density? (a) ROM_DISK (b) ARCHIVE (c) RAM_DISK (d) All of the mentioned

Answer» Correct option is (b) ARCHIVE

Explanation: ARCHIVE has high storage density and little compute power; it is intended for archival storage.
312.

Point out the correct statement. (a) You can specify any executable as the mapper and/or the reducer (b) You cannot supply a Java class as the mapper and/or the reducer (c) The class you supply for the output format should return key/value pairs of Text class (d) All of the mentioned

Answer» Correct option is (a) You can specify any executable as the mapper and/or the reducer

Explanation: If you do not specify an input format class, TextInputFormat is used as the default.
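To sketch the executable-as-mapper idea: a streaming mapper is just a program that reads lines on stdin and writes tab-separated key/value pairs to stdout. This hypothetical word-count mapper is illustrative and not tied to any particular job:

```python
#!/usr/bin/env python3
# Minimal Hadoop Streaming-style mapper: reads lines from stdin and
# emits "word<TAB>1" records on stdout. Any executable honouring this
# stdin/stdout contract can serve as the mapper.
import sys

def map_line(line: str) -> list[str]:
    """Turn one input line into a list of 'key\t1' records."""
    return [f"{word}\t1" for word in line.split()]

if __name__ == "__main__":
    for line in sys.stdin:
        for record in map_line(line):
            print(record)
```

A streaming job would then pass this script via the `-mapper` option; the sorted output of all mappers is fed to the reducer the same way.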
313.

Which of the following is used to list out the storage policies? (a) hdfs storagepolicies (b) hdfs storage (c) hd storagepolicies (d) all of the mentioned

Answer» Correct option is (a) hdfs storagepolicies

Explanation: The hdfs storagepolicies command takes no arguments.
314.

Point out the wrong statement. (a) getInstance() creates a new Job with particular cluster (b) getInstance(Configuration conf) creates a new Job with no particular Cluster and a given Configuration (c) getInstance(JobStatus status, Configuration conf) creates a new Job with no particular Cluster and given Configuration and JobStatus (d) all of the mentioned

Answer» Correct option is (a) getInstance() creates a new Job with particular cluster

Explanation: getInstance() creates a new Job with no particular Cluster.
315.

________ text is appropriate for most non-binary data types. (a) Character (b) Binary (c) Delimited (d) None of the mentioned

Answer» Correct option is (c) Delimited

Explanation: Delimited text is the default import format.
316.

Point out the wrong statement. (a) DoFns also have a number of helper methods for working with Hadoop Counters, all named increment (b) The Crunch APIs contain a number of useful subclasses of DoFn that handle common data processing scenarios and are easier to write and test (c) FilterFn class defines a single abstract method (d) None of the mentioned

Answer» Correct option is (d) None of the mentioned

Explanation: Counters are an incredibly useful way of keeping track of the state of long-running data pipelines and detecting any exceptional conditions that occur during processing.
317.

Point out the wrong statement. (a) OpenAz is a browser based mobile phone emulator (b) Ripple is a cross platform and cross runtime testing/debugging tool (c) Ripple currently supports such runtimes as Cordova, WebWorks and the Mobile Web (d) All of the mentioned

Answer» Correct option is (a) OpenAz is a browser based mobile phone emulator

Explanation: Ripple is a browser-based mobile phone emulator designed to aid in the development of HTML5-based mobile applications.
318.

Which of the following storage policy is used for both storage and compute? (a) Hot (b) Cold (c) Warm (d) All_SSD

Answer» Correct option is (a) Hot

Explanation: When a block is hot, all replicas are stored in DISK.
319.

The ____________ requires that paths including and leading up to the directories specified in yarn.nodemanager.local-dirs be set up with appropriate ownership and permissions. (a) TaskController (b) LinuxTaskController (c) LinuxController (d) None of the mentioned

Answer» Correct option is (b) LinuxTaskController

Explanation: LinuxTaskController keeps track of all paths and directories on the datanode.
320.

Point out the wrong statement. (a) A Storage policy consists of the Policy ID (b) The storage policy can be specified using the “dfsadmin -setStoragePolicy” command (c) dfs.storage.policy.enabled is used for enabling/disabling the storage policy feature (d) None of the mentioned

Answer» Correct option is (d) None of the mentioned

Explanation: The effective storage policy can be retrieved by the “dfsadmin -getStoragePolicy” command.
321.

Which of the following commands can be used to get the storage policy of a file or a directory? (a) hdfs dfsadmin -getStoragePolicy path (b) hdfs dfsadmin -setStoragePolicy path policyName (c) hdfs dfsadmin -listStoragePolicy path policyName (d) all of the mentioned

Answer» Correct option is (a) hdfs dfsadmin -getStoragePolicy path

Explanation: The path argument refers to either a directory or a file.
322.

Streaming supports streaming command options as well as _________ command options. (a) generic (b) tool (c) library (d) task

Answer» Correct option is (a) generic

Explanation: Place the generic options before the streaming options; otherwise the command will fail.
323.

Point out the wrong statement. (a) Avro data files are a compact, efficient binary format that provides interoperability with applications written in other programming languages (b) By default, data is compressed while importing (c) Delimited text also readily supports further manipulation by other tools, such as Hive (d) None of the mentioned

Answer» Correct option is (b) By default, data is compressed while importing

Explanation: You can compress your data by using the deflate (gzip) algorithm with the -z or --compress argument, or specify any Hadoop compression codec using the --compression-codec argument.
324.

If you set the inline LOB limit to ________ all large objects will be placed in external storage. (a) 0 (b) 1 (c) 2 (d) 3

Answer» Correct option is (a) 0

Explanation: The size at which lobs spill into separate files is controlled by the --inline-lob-limit argument, which takes a parameter specifying the largest lob size to keep inline, in bytes.
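The inline-LOB decision amounts to a simple threshold check. The function below is an illustrative sketch of that rule, not Sqoop's actual implementation:

```python
def stored_inline(lob_size_bytes: int, inline_lob_limit: int) -> bool:
    """Sketch of the --inline-lob-limit rule: a large object is kept
    inline only if it does not exceed the limit. With a limit of 0,
    no nonempty object qualifies, so everything spills to external
    storage."""
    return lob_size_bytes <= inline_lob_limit
```

With a limit of 0, even a 1-byte object is placed in external storage, which is why option (a) is correct.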
325.

Inline DoFn that splits a line up into words is an inner class of ____________ (a) Pipeline (b) MyPipeline (c) ReadPipeline (d) WritePipe

Answer» Correct option is (b) MyPipeline

Explanation: Inner classes contain references to their parent outer classes, so unless MyPipeline implements the Serializable interface, a NotSerializableException will be thrown when Crunch tries to serialize the inner DoFn.
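Python has a direct analogue of this pitfall: an inner (local) function cannot be pickled for shipment to workers, much as a non-Serializable inner DoFn cannot be shipped by Crunch. A hedged sketch of the same failure mode:

```python
import pickle

def build_tokenizer():
    # An inner function, analogous to an inline inner-class DoFn.
    def tokenize(line):
        return line.split()
    return tokenize

def is_picklable(obj) -> bool:
    """Return True if `obj` survives pickling (i.e., could be shipped
    to a remote worker), False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
```

`is_picklable(build_tokenizer())` is False, while a module-level function pickles fine; distributed frameworks in both languages hit the same wall when closures capture their enclosing scope.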
326.

________ does not support the notion of enclosing characters that may include field delimiters in the enclosed string. (a) Imphala (b) Oozie (c) Sqoop (d) Hive

Answer» Correct option is (d) Hive

Explanation: Even though Hive supports escaping characters, it does not handle escaping of the new-line character.
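The difference matters whenever a field value contains the delimiter itself. The sketch below uses Python's csv module to show enclosing characters rescuing an embedded comma, while a naive delimiter split (the behaviour of a reader without enclosure support) breaks the field apart:

```python
import csv
import io

# A record whose second field contains the field delimiter.
raw = '1,"Doe, John",NY\n'

# csv honours the enclosing double quotes, so the embedded comma survives.
parsed = next(csv.reader(io.StringIO(raw)))

# A naive delimiter split breaks the enclosed field into two pieces.
naive = raw.strip().split(",")
```

Here `parsed` recovers the three intended fields, while `naive` yields four, having split "Doe, John" in two.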
327.

Point out the correct statement. (a) SAMOA provides a collection of distributed streaming algorithms (b) REEF is a cross platform and cross runtime testing/debugging tool (c) Sentry is a highly modular system for providing fine grained role (d) All of the mentioned

Answer» Correct option is (a) SAMOA provides a collection of distributed streaming algorithms

Explanation: SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression.
328.

The scale-out computing fabric that eases the development of Big Data applications is ___________ (a) MRQL (b) NiFi (c) REEF (d) Ripple

Answer» Correct option is (c) REEF

Explanation: REEF stands for Retainable Evaluator Execution Framework.
329.

Point out the wrong statement. (a) HDFS is designed to support small files only (b) Any update to either the FsImage or EditLog causes each of the FsImages and EditLogs to get updated synchronously (c) NameNode can be configured to support maintaining multiple copies of the FsImage and EditLog (d) None of the mentioned

Answer» Correct option is (a) HDFS is designed to support small files only

Explanation: HDFS is designed to support very large files.
330.

__________ support storing a copy of data at a particular instant of time. (a) Data Image (b) Datanots (c) Snapshots (d) All of the mentioned

Answer» Correct option is (c) Snapshots

Explanation: One use of the snapshot feature may be to roll back a corrupted HDFS instance to a previously known good point in time.
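Conceptually, a snapshot captures the state of the namespace at an instant so that later corruption can be rolled back. A toy illustration in plain Python (nothing HDFS-specific; the class and names are invented for this sketch):

```python
# Toy model of snapshot/rollback: a "directory" is a dict from path to
# contents; a snapshot is a point-in-time copy of that dict.
class SnapshottableDir:
    def __init__(self):
        self.files = {}
        self.snapshots = {}

    def create_snapshot(self, name: str) -> None:
        self.snapshots[name] = dict(self.files)  # point-in-time copy

    def rollback(self, name: str) -> None:
        # Restore the known-good state, discarding later corruption.
        self.files = dict(self.snapshots[name])
```

Writing to `files` after `create_snapshot("s0")` and then calling `rollback("s0")` restores the earlier contents, mirroring the recovery use case described above.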
331.

_________ command is used to copy files or directories recursively. (a) dtcp (b) distcp (c) dcp (d) distc

Answer» Correct option is (b) distcp

Explanation: Usage of the distcp command: hadoop distcp <srcurl> <desturl>.
332.

Point out the wrong statement. (a) Apache Avro™ is a data serialization system (b) Avro provides simple integration with dynamic languages (c) Avro provides rich data structures (d) All of the mentioned

Answer» Correct option is (d) All of the mentioned

Explanation: Code generation is not required to read or write data files, nor to use or implement RPC protocols in Avro.
333.

Point out the wrong statement. (a) TIMESTAMP is only available starting with Hive 0.10.0 (b) DECIMAL introduced in Hive 0.11.0 with a precision of 38 digits (c) Hive 0.13.0 introduced user definable precision and scale (d) All of the mentioned

Answer» Correct option is (a) TIMESTAMP is only available starting with Hive 0.10.0

Explanation: TIMESTAMP is available starting with Hive 0.8.0.
334.

Integral literals are assumed to be _________ by default. (a) SMALL INT (b) INT (c) BIG INT (d) TINY INT

Answer» Correct option is (b) INT

Explanation: Integral literals are assumed to be INT by default; if the number exceeds the range of INT it is interpreted as a BIGINT, and a type postfix (Y, S, or L) selects TINYINT, SMALLINT, or BIGINT explicitly.
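The rule can be sketched as a small type-inference function. The postfix meanings (Y, S, L) follow the Hive documentation; the function itself is illustrative, not Hive code:

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1

# Hive postfixes: Y -> TINYINT, S -> SMALLINT, L -> BIGINT.
POSTFIX_TYPES = {"Y": "TINYINT", "S": "SMALLINT", "L": "BIGINT"}

def hive_integral_type(literal: str) -> str:
    """Infer the Hive type of an integral literal: INT by default,
    BIGINT if the value exceeds INT range, or the postfixed type."""
    suffix = literal[-1].upper()
    if suffix in POSTFIX_TYPES:
        return POSTFIX_TYPES[suffix]
    value = int(literal)
    return "INT" if INT_MIN <= value <= INT_MAX else "BIGINT"
```

So `100` is an INT, `100Y` a TINYINT, and `9999999999` (which exceeds the 32-bit range) a BIGINT.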
335.

Which of the following writes MapFiles as output? (a) DBInpFormat (b) MapFileOutputFormat (c) SequenceFileAsBinaryOutputFormat (d) None of the mentioned

Answer» Correct option is (b) MapFileOutputFormat

Explanation: MapFileOutputFormat writes MapFiles as output; SequenceFileAsBinaryOutputFormat, by contrast, writes keys and values in raw binary format into a SequenceFile container.
336.

Which of the following is the default output format? (a) TextFormat (b) TextOutput (c) TextOutputFormat (d) None of the mentioned

Answer» Correct option is (c) TextOutputFormat

Explanation: TextOutputFormat's keys and values may be of any type, since it turns them into strings by calling toString().
337.

___________ is an input format for reading data from a relational database, using JDBC. (a) DBInput (b) DBInputFormat (c) DBInpFormat (d) All of the mentioned

Answer» Correct option is (b) DBInputFormat

Explanation: DBInputFormat uses JDBC, so it can read from any database with a JDBC driver.
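The core idea of DBInputFormat is that each map task reads one slice of a table through a bounded query. The following sketch simulates that slicing with sqlite3 from the standard library; the table, split count, and helper name are invented for illustration:

```python
import sqlite3

def read_split(conn, table, split, num_splits, total_rows):
    """Read the rows belonging to one 'input split' using LIMIT/OFFSET,
    the same chunking idea DBInputFormat applies over JDBC."""
    per_split = -(-total_rows // num_splits)  # ceiling division
    offset = split * per_split
    cur = conn.execute(
        f"SELECT id FROM {table} ORDER BY id LIMIT ? OFFSET ?",
        (per_split, offset),
    )
    return [row[0] for row in cur.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(10)])
# Two "map tasks", each reading its own non-overlapping slice.
splits = [read_split(conn, "t", s, 2, 10) for s in range(2)]
```

Each split reads a disjoint range of rows, so the splits together cover the whole table exactly once, which is what lets multiple map tasks scan a table in parallel.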
338.

The _________ Server assigns regions to the region servers and takes the help of Apache ZooKeeper for this task. (a) Region (b) Master (c) Zookeeper (d) All of the mentioned

Answer» Correct option is (b) Master

Explanation: The Master Server maintains the state of the cluster by negotiating the load balancing.
339.

___________ is a web and social mashup engine. (a) ServiceMix (b) Samza (c) Rave (d) All of the mentioned

Answer» Correct option is (c) Rave

Explanation: Samza is a stream processing system for running continuous computation on infinite streams of data.
340.

Point out the correct statement. (a) To create a Mahout service, one has to write Thrift files that describe it, generate the code in the destination language (b) Thrift is written in Java (c) Thrift is a lean and clean library (d) None of the mentioned

Answer» Correct option is (c) Thrift is a lean and clean library

Explanation: The predefined serialization styles include binary, HTTP-friendly, and compact binary.
341.

___________ provides multiple language implementations of the Advanced Message Queuing Protocol (AMQP). (a) RTA (b) Qpid (c) RAT (d) All of the mentioned

Answer» Correct option is (b) Qpid

Explanation: RAT became part of the new Apache Creadur TLP.
342.

Communication between the clients and the servers is done with a simple, high-performance, language-agnostic _________ protocol. (a) IP (b) TCP (c) SMTP (d) ICMP

Answer» Correct option is (b) TCP

Explanation: A Java client is provided for Kafka, but clients are available in many languages.
343.

Which of the following uses JSON for encoding of data? (a) TCompactProtocol (b) TDenseProtocol (c) TBinaryProtocol (d) None of the mentioned

Answer» Correct option is (d) None of the mentioned

Explanation: TJSONProtocol uses JSON for encoding of data.
344.

Which of the following format is similar to TCompactProtocol? (a) TCompactProtocol (b) TDenseProtocol (c) TBinaryProtocol (d) TSimpleJSONProtocol

Answer» Correct option is (b) TDenseProtocol

Explanation: TDenseProtocol is similar to TCompactProtocol but strips the meta information from what is transmitted and adds it back in at the receiving end.
345.

________ is a write-only protocol that cannot be parsed by Thrift. (a) TCompactProtocol (b) TDenseProtocol (c) TBinaryProtocol (d) TSimpleJSONProtocol

Answer» Correct option is (d) TSimpleJSONProtocol

Explanation: TSimpleJSONProtocol drops metadata and writes plain JSON, which is suitable for parsing by scripting languages but cannot be parsed back by Thrift.
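The tradeoff can be illustrated in plain Python: a "simple" encoding keeps only field values, while a metadata-carrying encoding (in the spirit of TJSONProtocol) also records field ids and types, which is what makes round-tripping possible. Both encoders below are illustrative sketches, not Thrift's actual wire formats:

```python
import json

# A struct modelled as (field_id, type, name, value) tuples.
FIELDS = [(1, "str", "name", "impala"), (2, "i32", "port", 21000)]

def simple_json(fields):
    """TSimpleJSONProtocol-style: values only. Field ids and types are
    lost, so a Thrift reader cannot reconstruct the struct."""
    return json.dumps({name: value for _, _, name, value in fields})

def metadata_json(fields):
    """TJSONProtocol-style: keep field ids and types so the struct can
    be parsed back into its typed form."""
    return json.dumps({str(fid): {typ: value} for fid, typ, _, value in fields})
```

The simple form is pleasant for scripting languages to consume, but only the metadata-carrying form retains enough information to deserialize, which is why TSimpleJSONProtocol is write-only.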
346.

Stratos will be a polyglot _________ framework. (a) Daas (b) PaaS (c) Saas (d) Raas

Answer» Correct option is (b) PaaS

Explanation: PaaS provides developers with a cloud-based environment for developing, testing, and running scalable applications.
347.

The key _________ command, which is traditionally a bash script, is also re-implemented as hadoop.cmd. (a) start (b) hadoop (c) had (d) hadstrat

Answer» Correct option is (b) hadoop

Explanation: HDInsight is the framework for the Microsoft Azure cloud implementation of Hadoop, where the key commands are re-implemented as .cmd scripts.
348.

Which of the following features is not provided by Impala? (a) SQL functionality (b) ACID (c) Flexibility (d) None of the mentioned

Answer» Correct option is (b) ACID

Explanation: Impala combines all of the benefits of other Hadoop frameworks, including flexibility, scalability, and cost-effectiveness, with the performance, usability, and SQL functionality necessary for an enterprise-grade analytic database.
349.

The simplest way to do authentication is using the _________ command of Kerberos. (a) auth (b) kinit (c) authorize (d) all of the mentioned

Answer» Correct option is (b) kinit

Explanation: HTTP web-consoles should be served by a principal different from the RPC principal.
350.

Impala is an integrated part of a ____________ enterprise data hub. (a) MicroSoft (b) IBM (c) Cloudera (d) All of the mentioned

Answer» Correct option is (c) Cloudera

Explanation: Impala is open source (Apache License), so you can self-support in perpetuity if you wish.