This section presents curated multiple-choice questions, each with an answer and a short explanation, to sharpen your knowledge and support exam preparation.

251.

Types that may be null must be defined as a ______ of that type and Null within Avro.
(a) Union
(b) Intersection
(c) Set
(d) All of the mentioned

Answer» Correct choice is (a) Union

Explanation: A null in a field that is not so defined will result in an exception during the save. No changes need to be made to the Hive schema to support this, as all fields in Hive can be null.
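As a minimal sketch (using the Apache Avro Java library; the record and field names are made up for illustration), a nullable field is declared as a union of null and the field's type:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class NullableUnionDemo {
    public static void main(String[] args) {
        // "name" is a union of null and string, so null values are legal for it.
        String schemaJson =
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":[\"null\",\"string\"],\"default\":null}]}";
        Schema schema = new Schema.Parser().parse(schemaJson);
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", null);  // with a plain "string" type this would fail at save time
        System.out.println(user);
    }
}
```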
252.

A key of type ___________ is generated which is used later to join n-grams with their heads and tails in the reducer phase.
(a) GramKey
(b) Primary
(c) Secondary
(d) None of the mentioned

Answer» The correct answer is (a) GramKey

Explanation: The GramKey is a composite key made up of a string n-gram fragment as the primary key and a secondary key used for grouping and sorting in the reduce phase.
253.

The ____________ is an iterator which reads through the file and returns objects using the next() method.
(a) DatReader
(b) DatumReader
(c) DatumRead
(d) None of the mentioned

Answer» Correct choice is (b) DatumReader

Explanation: The DatumReader reads the content through the DataFileReader implementation.
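A minimal sketch of that iteration in Java, assuming an existing Avro data file named users.avro (the file name is a placeholder):

```java
import java.io.File;
import java.io.IOException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

public class AvroIterate {
    public static void main(String[] args) throws IOException {
        // The DatumReader deserializes records; the DataFileReader walks the file.
        DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
        try (DataFileReader<GenericRecord> fileReader =
                 new DataFileReader<>(new File("users.avro"), datumReader)) {
            while (fileReader.hasNext()) {
                GenericRecord record = fileReader.next();  // objects come back via next()
                System.out.println(record);
            }
        }
    }
}
```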
254.

Which of the following commands runs the HDFS secondary namenode?
(a) secondary namenode
(b) secondarynamenode
(c) secondary_namenode
(d) none of the mentioned

Answer» Right option is (b) secondarynamenode

Explanation: The secondary NameNode periodically merges the fsimage and edits log files and keeps the edits log size within a limit.
255.

Point out the correct statement.
(a) The HDFS architecture is compatible with data rebalancing schemes
(b) Datablocks support storing a copy of data at a particular instant of time
(c) HDFS currently supports snapshots
(d) None of the mentioned

Answer» Right choice is (a) The HDFS architecture is compatible with data rebalancing schemes

Explanation: A scheme might automatically move data from one DataNode to another if the free space on a DataNode falls below a certain threshold.
256.

The files that are written by the _______ job are valid Avro files.
(a) Avro
(b) Map Reduce
(c) Hive
(d) All of the mentioned

Answer» Right option is (c) Hive

Explanation: If you copy these files out, you will likely want to rename them with a .avro extension.
257.

HDFS supports the ____________ command to fetch a Delegation Token and store it in a file on the local system.
(a) fetdt
(b) fetchdt
(c) fsk
(d) rec

Answer» The correct answer is (b) fetchdt

Explanation: The delegation token can later be used to access a secure server from a non-secure client.
258.

The ___________ machine is a single point of failure for an HDFS cluster.
(a) DataNode
(b) NameNode
(c) ActionNode
(d) All of the mentioned

Answer» The correct option is (b) NameNode

Explanation: If the NameNode machine fails, manual intervention is necessary. Currently, automatic restart and failover of the NameNode software to another machine is not supported.
259.

Point out the correct statement.
(a) The HCatLoader and HCatStorer interfaces are used with Pig scripts to read and write data in HCatalog-managed tables
(b) HCatalog is not thread safe
(c) HCatLoader is used with Pig scripts to read data from HCatalog-managed tables
(d) All of the mentioned

Answer» The correct option is (d) All of the mentioned

Explanation: HCatLoader is accessed via a Pig load statement.
260.

____________ is used when you want the sink to be the input source for another operation.
(a) Collector Tier Event
(b) Agent Tier Event
(c) Basic
(d) All of the mentioned

Answer» Correct choice is (b) Agent Tier Event

Explanation: All agents in a specific tier can be given the same name within a single configuration file. Clients send events to agents, and each agent hosts a number of Flume components.
261.

Point out the correct statement.
(a) Cassandra accommodates expensive, consumer SSDs extremely well
(b) Cassandra re-writes or re-reads existing data, and never overwrites the rows in place
(c) Cassandra uses a storage structure similar to a Log-Structured Merge Tree
(d) None of the mentioned

Answer» The correct option is (c) Cassandra uses a storage structure similar to a Log-Structured Merge Tree

Explanation: A log-structured engine that avoids overwrites and uses sequential IO to update data is essential for writing to hard disks (HDD) and solid-state disks (SSD).
262.

During start-up, the ___________ loads the file system state from the fsimage and the edits log file.
(a) DataNode
(b) NameNode
(c) ActionNode
(d) None of the mentioned

Answer» Right choice is (b) NameNode

Explanation: HDFS is implemented in Java; any computer that can run Java can host a NameNode or DataNode.
263.

The _________________ property allows users to override the specified expiry time.
(a) hcat.desired.partition.num.splits
(b) hcatalog.hive.client.cache.expiry.time
(c) hcatalog.hive.client.cache.disabled
(d) hcat.append.limit

Answer» Correct answer is (b) hcatalog.hive.client.cache.expiry.time

Explanation: This property is an int and specifies a number of seconds.
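A small sketch of overriding it through Hadoop's Configuration API (the 300-second value is an arbitrary example):

```java
import org.apache.hadoop.conf.Configuration;

public class ClientCacheExpiry {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // The property is an int number of seconds; 300 is an arbitrary example value.
        conf.setInt("hcatalog.hive.client.cache.expiry.time", 300);
        System.out.println(conf.getInt("hcatalog.hive.client.cache.expiry.time", -1));
    }
}
```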
264.

___________ is where you would land a flow (or possibly multiple flows joined together) into an HDFS-formatted file system.
(a) Collector Tier Event
(b) Agent Tier Event
(c) Basic
(d) All of the mentioned

Answer» Correct choice is (a) Collector Tier Event

Explanation: A number of predefined source adapters, as well as a command exit, allow you to use any executable command to feed the flow of data.
265.

A _________ grants initial permissions, and subsequently a user may or may not be given the permission to grant/revoke permissions.
(a) keyspace
(b) superuser
(c) sudouser
(d) none of the mentioned

Answer» Correct answer is (b) superuser

Explanation: Object permission management is based on internal authorization.
266.

Using a ___________ file means you don't have to override the SSL_CERTFILE environment variable every time.
(a) qlshrc
(b) cqshrc
(c) cqlshrc
(d) none of the mentioned

Answer» The correct answer is (c) cqlshrc

Explanation: This applies when cqlsh is used with SSL encryption.
267.

Variable substitution is disabled by using ___________
(a) set hive.variable.substitute=false;
(b) set hive.variable.substitutevalues=false;
(c) set hive.variable.substitute=true;
(d) all of the mentioned

Answer» Right answer is (a) set hive.variable.substitute=false;

Explanation: Variable substitution is on by default (hive.variable.substitute=true).
268.

HiveServer2, introduced in Hive 0.11, has a new CLI called __________
(a) BeeLine
(b) SqlLine
(c) HiveLine
(d) CLilLine

Answer» The correct answer is (a) BeeLine

Explanation: Beeline is a JDBC client based on SQLLine.
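Because Beeline is JDBC-based, any JDBC program can reach HiveServer2 the same way. A minimal sketch, assuming the hive-jdbc driver is on the classpath; the host, port, and credentials are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 URL; Beeline connects to the same kind of URL.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```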
269.

Which of the following statements sets hiveconf variables as normal?
(a) set -v x=myvalue
(b) set x=myvalue
(c) reset x=myvalue
(d) none of the mentioned

Answer» The correct option is (b) set x=myvalue

Explanation: The hiveconf variables are set as normal by set x=myvalue.
270.

Point out the wrong statement.
(a) There are four namespaces for variables in Hive
(b) Custom variables can be created in a separate namespace with define
(c) Custom variables can also be created in a separate namespace with hivevar
(d) None of the mentioned

Answer» Correct answer is (a) There are four namespaces for variables in Hive

Explanation: The three namespaces for variables are hiveconf, system, and env.
271.

HCatalog is installed with Hive, starting with Hive release ___________
(a) 0.10.0
(b) 0.9.0
(c) 0.11.0
(d) 0.12.0

Answer» Correct option is (c) 0.11.0

Explanation: hcat commands can be issued as hive commands, and vice versa.
272.

Point out the correct statement.
(a) Bigtop provides an integrated smoke testing framework, alongside a suite of over 10 test files
(b) Bigtop includes tools and a framework for testing at various levels
(c) Bigtop components support only one operating system
(d) All of the mentioned

Answer» Correct choice is (b) Bigtop includes tools and a framework for testing at various levels

Explanation: Bigtop is used for both initial deployments and upgrade scenarios for the entire data platform, not just the individual components.
273.

Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________
(a) MapReduce, Hive and HBase
(b) MapReduce, MySQL and Google Apps
(c) MapReduce, Hummer and Iguana
(d) MapReduce, Heron and Trumpet

Answer» The correct answer is (a) MapReduce, Hive and HBase

Explanation: To use Hive with HBase you'll typically want to launch two clusters, one to run HBase and the other to run Hive.
274.

Point out the wrong statement.
(a) Hadoop's processing capabilities are huge and its real advantage lies in the ability to process terabytes and petabytes of data
(b) Hadoop uses a programming model called "MapReduce"; all the programs should conform to this model in order to work on the Hadoop platform
(c) The programming model, MapReduce, used by Hadoop is difficult to write and test
(d) All of the mentioned

Answer» Correct answer is (c) The programming model, MapReduce, used by Hadoop is difficult to write and test

Explanation: The programming model, MapReduce, used by Hadoop is simple to write and test.
275.

Point out the wrong statement.
(a) Abstract base class that holds a reference to a Blob or a Clob
(b) ACCESSORTYPE is the type used to access this data in a streaming fashion
(c) CONTAINERTYPE is the type used to hold this data (e.g., BytesWritable)
(d) None of the mentioned

Answer» The correct option is (d) None of the mentioned

Explanation: DATATYPE is the type being held (e.g., a byte array).
276.

A ________ is used to manage the efficient barrier synchronization of the BSPPeers.
(a) GroomServers
(b) BSPMaster
(c) Zookeeper
(d) None of the mentioned

Answer» Right answer is (c) Zookeeper

Explanation: A Groom Server is a process that performs BSP tasks assigned by the BSPMaster.
277.

A __________ server and a data node should be run on one physical node.
(a) groom
(b) web
(c) client
(d) all of the mentioned

Answer» Right option is (a) groom

Explanation: Each groom is designed to run with HDFS or other distributed storage.
278.

Point out the correct statement.
(a) Snappy is licensed under the GNU Public License (GPL)
(b) BgCIK needs to create an index when it compresses a file
(c) The Snappy codec is integrated into Hadoop Common, a set of common utilities that supports other Hadoop subprojects
(d) None of the mentioned

Answer» The correct option is (c) The Snappy codec is integrated into Hadoop Common, a set of common utilities that supports other Hadoop subprojects

Explanation: You can use Snappy as an add-on for recent versions of Hadoop that do not yet provide Snappy codec support.
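A minimal sketch of requesting Snappy-compressed job output in a MapReduce driver (the output path is a placeholder; mapper and reducer setup is omitted):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SnappyOutputJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "snappy-output");
        // Ask for compressed output and pick the Snappy codec from Hadoop Common.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/out"));  // placeholder path
        // ... set mapper, reducer, and input paths as usual before job.waitForCompletion(true).
    }
}
```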
279.

Which of the following supports splittable compression?
(a) LZO
(b) Bzip2
(c) Gzip
(d) All of the mentioned

Answer» Correct answer is (a) LZO

Explanation: LZO enables the parallel processing of compressed text file splits by your MapReduce jobs.
280.

Which of the following compression formats is similar to Snappy compression?
(a) LZO
(b) Bzip2
(c) Gzip
(d) All of the mentioned

Answer» Right option is (a) LZO

Explanation: LZO is only really desirable if you need to compress text files.
281.

Node names and transitions must conform to the pattern =[a-zA-Z][-_a-zA-Z0-9]*= and can be up to __________ characters long.
(a) 10
(b) 15
(c) 20
(d) 25

Answer» Right option is (c) 20

Explanation: Action nodes trigger the execution of a computation/processing task.
282.

Point out the wrong statement.
(a) Records, enums and fixed are named types
(b) Unions may immediately contain other unions
(c) A namespace is a dot-separated sequence of such names
(d) All of the mentioned

Answer» Correct choice is (b) Unions may immediately contain other unions

Explanation: Unions may not immediately contain other unions.
283.

Point out the correct statement.
(a) The Avro file dump utility analyzes ORC files
(b) Streams are compressed using a codec, which is specified as a table property for all streams in that table
(c) The ODC file dump utility analyzes ORC files
(d) All of the mentioned

Answer» Right choice is (b) Streams are compressed using a codec, which is specified as a table property for all streams in that table

Explanation: The codec can be Snappy, Zlib, or none.
284.

________ permits data written by one system to be efficiently sorted by another system.
(a) Complex Data type
(b) Order
(c) Sort Order
(d) All of the mentioned

Answer» Correct answer is (c) Sort Order

Explanation: Avro binary-encoded data can be efficiently ordered without deserializing it to objects.
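For instance, Avro's BinaryData.compare orders encoded bytes directly against a schema, with no object deserialization; a small sketch comparing two zig-zag-encoded longs:

```java
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryData;

public class AvroBinaryCompare {
    public static void main(String[] args) {
        Schema longSchema = Schema.create(Schema.Type.LONG);
        byte[] one = {0x02};  // zig-zag varint encoding of 1L
        byte[] two = {0x04};  // zig-zag varint encoding of 2L
        // Compare the raw encodings under the schema's sort order.
        int cmp = BinaryData.compare(one, 0, two, 0, longSchema);
        System.out.println(cmp < 0);  // true: 1 sorts before 2
    }
}
```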
285.

Serialization of string columns uses a ________ to form unique column values.
(a) Footer
(b) STRIPES
(c) Dictionary
(d) Index

Answer» Right choice is (c) Dictionary

Explanation: The dictionary is sorted to speed up predicate filtering and improve compression ratios.
286.

The Microsoft .NET Library for Avro provides data serialization for the Microsoft ___________ environment.
(a) .NET
(b) Hadoop
(c) Ubuntu
(d) None of the mentioned

Answer» The correct answer is (a) .NET

Explanation: The Microsoft .NET Library for Avro implements the Apache Avro compact binary data interchange format for serialization in the Microsoft .NET environment.
287.

The __________ is the node responsible for all reads and writes for the given partition.
(a) replicas
(b) leader
(c) follower
(d) isr

Answer» Correct option is (b) leader

Explanation: Each node will be the leader for a randomly selected portion of the partitions.
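A hedged sketch using Kafka's AdminClient to see which broker leads each partition of a topic (the broker address and the topic name "events" are placeholders):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class PartitionLeaders {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc =
                admin.describeTopics(Collections.singleton("events")).all().get().get("events");
            // The leader replica handles all reads and writes for its partition.
            desc.partitions().forEach(p ->
                System.out.println("partition " + p.partition() + " -> leader " + p.leader()));
        }
    }
}
```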
288.

Sqoop uses _________ to fetch data from an RDBMS and store it on HDFS.
(a) Hive
(b) Map reduce
(c) Imphala
(d) BigTOP

Answer» Right option is (b) Map reduce

Explanation: While fetching, Sqoop throttles the number of mappers accessing data on the RDBMS so as not to overwhelm the database.
289.

Which of the following will prefix the query string with parameters?
(a) SET hive.exec.compress.output=false
(b) SET hive.compress.output=false
(c) SET hive.exec.compress.output=true
(d) All of the mentioned

Answer» The correct choice is (a) SET hive.exec.compress.output=false

Explanation: Use the lzop command-line utility or your own custom Java code to generate .lzo.index files for the .lzo files.
290.

Point out the correct statement.
(a) Interface FieldMapping is used for mapping of fields
(b) Interface FieldMappable is used for mapping of fields
(c) Sqoop is nothing but NoSQL to Hadoop
(d) Sqoop internally uses the ODBC interface so it should work with any JDBC compatible database

Answer» Right option is (b) Interface FieldMappable is used for mapping of fields

Explanation: The FieldMappable interface describes a class capable of returning a map of the object's fields to their values.
291.

Sqoop direct mode does not support imports of ______ columns.
(a) BLOB
(b) LONGVARBINARY
(c) CLOB
(d) All of the mentioned

Answer» Right answer is (d) All of the mentioned

Explanation: Use JDBC-based imports for these columns; do not supply the --direct argument to the import tool.
292.

Point out the correct statement.
(a) The number of sorted map outputs fetched into memory before being merged to disk
(b) The memory threshold for fetched map outputs before an in-memory merge is finished
(c) The percentage of memory relative to the maximum heap size in which map outputs may not be retained during the reduce
(d) None of the mentioned

Answer» The correct choice is (a) The number of sorted map outputs fetched into memory before being merged to disk

Explanation: When the reduce begins, map outputs will be merged to disk until those that remain are under the resource limit this defines.
293.

_________ is useful for iterating the properties when all deprecated properties for currently set properties need to be present.
(a) addResource
(b) setDeprecatedProperties
(c) addDefaultResource
(d) none of the mentioned

Answer» Correct option is (b) setDeprecatedProperties

Explanation: setDeprecatedProperties sets all deprecated properties that are not currently set but have a corresponding new property that is set.
294.

________ checks whether the given key is deprecated.
(a) isDeprecated
(b) setDeprecated
(c) isDeprecatedif
(d) all of the mentioned

Answer» The correct answer is (a) isDeprecated

Explanation: The method returns true if the key is deprecated and false otherwise.
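A quick illustration in Java: fs.default.name is a well-known key that Hadoop deprecated in favor of fs.defaultFS:

```java
import org.apache.hadoop.conf.Configuration;

public class DeprecationCheck {
    public static void main(String[] args) {
        // isDeprecated consults Hadoop's built-in deprecation table.
        System.out.println(Configuration.isDeprecated("fs.default.name"));  // true
        System.out.println(Configuration.isDeprecated("fs.defaultFS"));     // false
    }
}
```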
295.

Point out the correct statement.
(a) The sqoop command-line program is a wrapper which runs the bin/hadoop script shipped with Hadoop
(b) If $HADOOP_HOME is set, Sqoop will use the default installation location for Cloudera's Distribution for Hadoop
(c) The active Hadoop configuration is loaded from $HADOOP_HOME/conf/, unless the $HADOOP_CONF_DIR environment variable is unset
(d) None of the mentioned

Answer» Correct option is (a) The sqoop command-line program is a wrapper which runs the bin/hadoop script shipped with Hadoop

Explanation: If you have multiple installations of Hadoop present on your machine, you can select the Hadoop installation by setting the $HADOOP_HOME environment variable.
296.

Point out the wrong statement.
(a) The task tracker has a local directory to create a localized cache and localized job
(b) The task tracker can define multiple local directories
(c) The job tracker cannot define multiple local directories
(d) None of the mentioned

Answer» The correct answer is (d) None of the mentioned

Explanation: When the job starts, the task tracker creates a localized job directory relative to the local directory specified in the configuration.
297.

MapReduce underwent a complete overhaul in Hadoop version _________
(a) 0.21
(b) 0.23
(c) 0.24
(d) 0.26

Answer» Right answer is (b) 0.23

Explanation: The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons.
298.

The CapacityScheduler has a predefined queue called _________
(a) domain
(b) root
(c) rear
(d) all of the mentioned

Answer» The correct answer is (b) root

Explanation: All queues in the system are children of the root queue.
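A small sketch of how child queues hang off root in capacity-scheduler.xml terms, expressed with Hadoop's Configuration API (the queue names and capacities are made up):

```java
import org.apache.hadoop.conf.Configuration;

public class CapacityQueues {
    public static void main(String[] args) {
        // Mirrors capacity-scheduler.xml: every queue is declared under "root".
        Configuration conf = new Configuration(false);
        conf.set("yarn.scheduler.capacity.root.queues", "dev,prod");   // made-up queue names
        conf.set("yarn.scheduler.capacity.root.dev.capacity", "40");
        conf.set("yarn.scheduler.capacity.root.prod.capacity", "60");
        for (String q : conf.getStrings("yarn.scheduler.capacity.root.queues")) {
            System.out.println("root." + q);
        }
    }
}
```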
299.

The _________ tool can list all the available database schemas.
(a) sqoop-list-tables
(b) sqoop-list-databases
(c) sqoop-list-schema
(d) sqoop-list-columns

Answer» The correct option is (b) sqoop-list-databases

Explanation: Sqoop also includes a primitive SQL execution shell (the sqoop-eval tool).
300.

The ____________ is the ultimate authority that arbitrates resources among all the applications in the system.
(a) NodeManager
(b) ResourceManager
(c) ApplicationMaster
(d) All of the mentioned

Answer» Correct answer is (b) ResourceManager

Explanation: The ResourceManager and the per-node slave, the NodeManager (NM), form the data-computation framework.
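A minimal YARN client sketch that asks the ResourceManager for the cluster's running NodeManagers (assumes a reachable cluster configuration on the classpath):

```java
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterNodes {
    public static void main(String[] args) throws Exception {
        // The client talks to the ResourceManager; each report is one NodeManager.
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new YarnConfiguration());
        yarn.start();
        for (NodeReport node : yarn.getNodeReports(NodeState.RUNNING)) {
            System.out.println(node.getNodeId() + " capacity " + node.getCapability());
        }
        yarn.stop();
    }
}
```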