Explore topic-wise InterviewSolutions.

This section includes InterviewSolutions, each offering curated multiple-choice questions to sharpen your knowledge and support exam preparation. Choose a topic below to get started.

401.

Jobs can enable task JVMs to be reused by specifying the job configuration _________(a) mapred.job.recycle.jvm.num.tasks(b) mapissue.job.reuse.jvm.num.tasks(c) mapred.job.reuse.jvm.num.tasks(d) all of the mentioned

Answer» The correct choice is (c) mapred.job.reuse.jvm.num.tasks

The explanation is: Many tasks see performance improvements of over 50% when JVM reuse is enabled via mapred.job.reuse.jvm.num.tasks.
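As an illustrative sketch (Python stdlib only; the helper name `jvm_reuse_property` is made up for this example), the corresponding mapred-site.xml entry can be generated like so:

```python
import xml.etree.ElementTree as ET

def jvm_reuse_property(num_tasks=-1):
    # Build the mapred-site.xml <property> entry that enables JVM reuse;
    # -1 means a JVM may be reused by any number of tasks of the same job.
    # (Illustrative helper, not a Hadoop API.)
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "mapred.job.reuse.jvm.num.tasks"
    ET.SubElement(prop, "value").text = str(num_tasks)
    return ET.tostring(prop, encoding="unicode")

snippet = jvm_reuse_property()
```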
402.

During merging, __________ now always checks the incoming segments for corruption before merging.(a) LocalWriter(b) IndexWriter(c) ReadWriter(d) All of the mentioned

Answer» The correct answer is (b) IndexWriter

The explanation: Lucene supports random-writable and advance-able sparse bitsets.
403.

Point out the wrong statement.(a) Sqoop is used to import complete database(b) Sqoop is used to import selected columns from a particular table(c) Sqoop is used to import selected tables(d) All of the mentioned

Answer» Correct answer is (d) All of the mentioned

For explanation I would say: Apache Sqoop is a tool which allows users to import data from relational databases to HDFS and export data from HDFS back to relational databases.
404.

Maximum virtual memory of the launched child-task is specified using _________(a) mapv(b) mapred(c) mapvim(d) All of the mentioned

Answer» The correct answer is (b) mapred

Best explanation: Admins can also specify the maximum virtual memory of the launched child-task, and any sub-process it launches recursively, using mapred.
405.

Microsoft uses a Sqoop-based connector to help transfer data from _________ databases to Hadoop.(a) PostgreSQL(b) SQL Server(c) Oracle(d) MySQL

Answer» The correct answer is (b) SQL Server

The explanation: Sqoop is a command-line interface application for transferring data between relational databases and Hadoop.
406.

____________ specifies the number of segments on disk to be merged at the same time.(a) mapred.job.shuffle.merge.percent(b) mapred.job.reduce.input.buffer.percent(c) mapred.inmem.merge.threshold(d) io.sort.factor

Answer» The correct option is (d) io.sort.factor

The explanation is: io.sort.factor limits the number of open files and compression codecs during the merge.
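The merge-pass behaviour that io.sort.factor controls can be sketched in plain Python (`heapq` from the stdlib; `merge_runs` is an illustrative name, not a Hadoop API):

```python
import heapq

def merge_runs(runs, factor=10):
    # Merge sorted runs at most `factor` at a time, mimicking how
    # io.sort.factor caps the number of segments (open files) merged
    # in a single pass during the MapReduce sort phase.
    runs = [list(r) for r in runs]
    while len(runs) > 1:
        # Merge the first `factor` runs into one intermediate run.
        group, runs = runs[:factor], runs[factor:]
        runs.append(list(heapq.merge(*group)))
    return runs[0] if runs else []
```

A smaller factor means more passes over the data but fewer files open at once.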
407.

__________ provides a Couchbase Server-Hadoop connector by means of Sqoop.(a) MemCache(b) Couchbase(c) Hbase(d) All of the mentioned

Answer» The correct option is (b) Couchbase

Easy explanation: Exports can be used to put data from Hadoop into a relational database.
408.

Point out the correct statement.(a) HDFS provides low latency access to single rows from billions of records (Random access)(b) HBase sits on top of the Hadoop File System and provides read and write access(c) HBase is a distributed file system suitable for storing large files(d) None of the mentioned

Answer» Right choice is (b) HBase sits on top of the Hadoop File System and provides read and write access

The best explanation: One can store the data in HDFS either directly or through HBase. Data consumer reads/accesses the data in HDFS randomly using HBase.
409.

Point out the wrong statement.(a) There are no XML configuration files in Thrift(b) Thrift gives cross-language serialization with lower overhead than alternatives such as SOAP due to use of binary format(c) No framework to code is a feature of Thrift(d) None of the mentioned

Answer» Right answer is (d) None of the mentioned

The explanation: There are no build dependencies or non-standard software. No mix of incompatible software licenses.
410.

Which of the following parameter is the threshold for the accounting and serialization buffers?(a) io.sort.spill.percent(b) io.sort.record.percent(c) io.sort.mb(d) None of the mentioned

Answer» Right answer is (a) io.sort.spill.percent

For explanation I would say: When the percentage of either buffer has filled, their contents will be spilled to disk in the background.
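A toy model of that spill behaviour, assuming a record-count buffer rather than Hadoop's byte-level accounting buffers (`collect` is an illustrative name):

```python
def collect(records, buffer_size=8, spill_percent=0.80):
    # Sketch of the map-side collect buffer: once the buffer fills past
    # io.sort.spill.percent of its capacity, its contents are sorted and
    # spilled in the background (here, appended to a list of "spill files").
    buffer, spills = [], []
    threshold = int(buffer_size * spill_percent)
    for rec in records:
        buffer.append(rec)
        if len(buffer) >= threshold:
            spills.append(sorted(buffer))
            buffer = []
    if buffer:  # final flush of whatever remains
        spills.append(sorted(buffer))
    return spills
```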
411.

Sqoop is an open source tool written at ________(a) Cloudera(b) IBM(c) Microsoft(d) All of the mentioned

Answer» Right answer is (a) Cloudera

Explanation: Sqoop allows users to import data from their relational databases into HDFS and vice versa.
412.

_________ will overwrite any existing data in the table or partition.(a) INSERT WRITE(b) INSERT OVERWRITE(c) INSERT INTO(d) None of the mentioned

Answer» Right answer is (b) INSERT OVERWRITE

To elaborate: INSERT OVERWRITE replaces any existing data, whereas INSERT INTO will append to the table or partition, keeping the existing data intact.
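The two semantics can be modelled in a few lines of Python (`insert` is an illustrative helper, not a Hive API):

```python
def insert(table, rows, overwrite=False):
    # Toy model of Hive semantics: INSERT OVERWRITE replaces the
    # partition's existing rows, INSERT INTO appends to them.
    return list(rows) if overwrite else table + list(rows)
```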
413.

HBase is ________ and defines only column families.(a) Row Oriented(b) Schema-less(c) Fixed Schema(d) All of the mentioned

Answer» Correct option is (b) Schema-less

For explanation I would say: HBase doesn’t have the concept of fixed columns schema.
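A minimal sketch of this schema-less model in Python (`HTable` here is an illustrative class, not the HBase client API):

```python
class HTable:
    # Sketch of HBase's schema-less model: the table fixes only the
    # column families; qualifiers within a family are created on the
    # fly, per row, at write time.
    def __init__(self, families):
        self.families = set(families)
        self.rows = {}

    def put(self, rowkey, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(rowkey, {})[(family, qualifier)] = value

    def get(self, rowkey, family, qualifier):
        return self.rows[rowkey][(family, qualifier)]
```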
414.

_______ is a lossless data compression library that favors speed over compression ratio.(a) LOZ(b) LZO(c) OLZ(d) All of the mentioned

Answer» Correct choice is (b) LZO

Explanation: lzo and lzop need to be installed on every node in the Hadoop cluster.
415.

Which of the following is a multi-threaded server using non-blocking I/O?(a) TNonblockingServer(b) TSimpleServer(c) TSocket(d) None of the mentioned

Answer» The correct choice is (a) TNonblockingServer

The explanation is: Java implementation uses NIO channels.
416.

________ uses blocking socket I/O for transport.(a) TNonblockingServer(b) TSimpleServer(c) TSocket(d) None of the mentioned

Answer» Correct choice is (c) TSocket

Explanation: TSocket uses blocking socket I/O and is the transport typically used on the client side.
417.

Point out the wrong statement.(a) From a usability standpoint, LZO and Gzip are similar(b) Bzip2 generates a better compression ratio than does Gzip, but it’s much slower(c) Gzip is a compression utility that was adopted by the GNU project(d) None of the mentioned

Answer» Correct option is (a) From a usability standpoint, LZO and Gzip are similar

To explain I would say: From a usability standpoint, Bzip2 and Gzip are similar.
418.

Which of the following statement will create a column with varchar datatype?(a) CREATE TABLE foo (bar CHAR(10))(b) CREATE TABLE foo (bar VARCHAR(10))(c) CREATE TABLE foo (bar CHARVARYING(10))(d) All of the mentioned

Answer» Right option is (b) CREATE TABLE foo (bar VARCHAR(10))

Explanation: The varchar datatype was introduced in Hive 0.12.0.
419.

__________ is a single-threaded server using standard blocking I/O.(a) TNonblockingServer(b) TSimpleServer(c) TSocket(d) None of the mentioned

Answer» Right choice is (b) TSimpleServer

Easy explanation: TSimpleServer is useful for testing.
420.

________ is a multi-threaded server using standard blocking I/O.(a) TNonblockingServer(b) TThreadPoolServer(c) TSimpleServer(d) None of the mentioned

Answer» The correct choice is (b) TThreadPoolServer

Easiest explanation: TThreadPoolServer uses a pool of worker threads, each serving one client connection over blocking I/O.
421.

Which of the following is the slowest compression technique?(a) LZO(b) Bzip2(c) Gzip(d) All of the mentioned

Answer» Right answer is (b) Bzip2

To explain I would say: Of all the available compression codecs in Hadoop, Bzip2 is by far the slowest.
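The ratio-versus-speed trade-off can be checked with Python's stdlib gzip and bz2 modules (`compare_codecs` is an illustrative helper):

```python
import bz2
import gzip

def compare_codecs(data: bytes):
    # Compress the same payload with gzip and bzip2 (both in the Python
    # stdlib). Bzip2 usually achieves a better ratio, which it pays for
    # with much slower compression and decompression.
    return {"gzip": len(gzip.compress(data)),
            "bz2": len(bz2.compress(data))}
```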
422.

The _________ codec from Google provides modest compression ratios.(a) Snapcheck(b) Snappy(c) FileCompress(d) None of the mentioned

Answer» Correct answer is (b) Snappy

To explain: Snappy has fast compression and decompression speeds.
423.

_____________ are used between blocks to permit efficient splitting of files for MapReduce processing.(a) Codec(b) Data Marker(c) Synchronization markers(d) All of the mentioned

Answer» Right answer is (c) Synchronization markers

Explanation: Avro includes a simple object container file format.
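A toy version of such a container format in Python (the marker value and helper names are made up; real Avro uses a random 16-byte sync marker recorded in the file header):

```python
def write_container(blocks, sync=b"\x00SYNC\x00"):
    # Sketch of an Avro-style object container file: a sync marker is
    # written between blocks so a reader can seek to an arbitrary byte
    # offset, scan forward to the next marker, and start a MapReduce
    # split on a clean block boundary.
    return sync.join(blocks)

def read_from_offset(data, offset, sync=b"\x00SYNC\x00"):
    # Resynchronize: skip to the first marker at/after `offset` and
    # return the remaining complete blocks.
    pos = data.find(sync, offset)
    if pos < 0:
        return []
    return data[pos + len(sync):].split(sync)
```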
424.

_______ can change the maximum number of cells of a column family.(a) set(b) reset(c) alter(d) select

Answer» The correct choice is (c) alter

Explanation: Alter is the command used to make changes to an existing table.
425.

The __________ codec uses Google’s Snappy compression library.(a) null(b) snappy(c) deflate(d) none of the mentioned

Answer» The correct choice is (b) snappy

Explanation: Snappy is a compression library developed at Google, and, like many technologies that come from Google, Snappy was designed to be fast.
426.

Which of the following performs compression using zlib?(a) TZlibTransport(b) TFramedTransport(c) TMemoryTransport(d) None of the mentioned

Answer» Correct choice is (a) TZlibTransport

The best explanation: TZlibTransport is used in conjunction with another transport. Not available in the Java implementation.
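The layering idea can be sketched with Python's stdlib zlib (`TZlibLikeTransport` is an illustrative class, not Thrift's actual API):

```python
import zlib

class TZlibLikeTransport:
    # Sketch of a zlib-compressing transport layered over another
    # transport (here, a plain in-memory buffer), in the spirit of
    # Thrift's TZlibTransport.
    def __init__(self, inner: bytearray):
        self.inner = inner

    def write(self, payload: bytes):
        self.inner += zlib.compress(payload)

    def read_all(self) -> bytes:
        return zlib.decompress(bytes(self.inner))
```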
427.

Point out the correct statement.(a) BooleanSerializer is used to parse string representations of boolean values into boolean scalar types(b) BlobRef is a wrapper that holds a BLOB either directly(c) BooleanParse is used to parse string representations of boolean values into boolean scalar types(d) All of the mentioned

Answer» Correct choice is (b) BlobRef is a wrapper that holds a BLOB either directly

The best I can explain: BlobRef is used for reference to a file that holds the BLOB data.
428.

__________ encapsulates a set of delimiters used to encode a record.(a) LargeObjectLoader(b) FieldMapProcessor(c) DelimiterSet(d) LobSerializer

Answer» Correct answer is (c) DelimiterSet

Best explanation: Delimiter set is created with the specified delimiters.
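A minimal Python sketch of the idea (`DelimiterSet` here mimics the concept; it is not Sqoop's Java class):

```python
class DelimiterSet:
    # Sketch of Sqoop's notion of a delimiter set: the characters used
    # to encode one record as text (field separator, record terminator,
    # and an enclosing character for values that contain a delimiter).
    def __init__(self, field=",", record="\n", enclose='"'):
        self.field, self.record, self.enclose = field, record, enclose

    def format_record(self, values):
        cells = []
        for v in values:
            v = str(v)
            if self.field in v or self.record in v:
                v = self.enclose + v + self.enclose
            cells.append(v)
        return self.field.join(cells) + self.record
```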
429.

ClobRef is a wrapper that holds a CLOB either directly or a reference to a file that holds the ______ data.(a) CLOB(b) BLOB(c) MLOB(d) All of the mentioned

Answer» Right option is (a) CLOB

The explanation: Create a ClobRef based on parsed data from a line of text.
430.

Which of the following work is done by BigTop in Hadoop framework?(a) Packaging(b) Smoke Testing(c) Virtualization(d) All of the mentioned

Answer» Correct option is (d) All of the mentioned

Easiest explanation: Bigtop aims at comprehensive packaging, testing, and configuration of the leading open source big data components.
431.

Point out the wrong statement.(a) Bigtop-0.5.0 : Builds the 0.5.0 release(b) Bigtop-trunk-HBase builds the HCatalog packages only(c) There are also jobs for building virtual machine images(d) All of the mentioned

Answer» Right answer is (b) Bigtop-trunk-HBase builds the HCatalog packages only

The best explanation: Bigtop-trunk-HBase builds the HBase packages only; the HCatalog packages are built by the Bigtop-trunk-HCatalog job. Bigtop also provides vagrant recipes, raw images, and (work-in-progress) docker recipes for deploying Hadoop from zero.
432.

Which of the following operating system is not supported by BigTop?(a) Fedora(b) Solaris(c) Ubuntu(d) SUSE

Answer» The correct choice is (b) Solaris

For explanation I would say: Bigtop components power the leading Hadoop distros and support many Operating Systems, including Debian/Ubuntu, CentOS, Fedora, SUSE and many others.
433.

One supported data type that deserves special mention is ____________(a) money(b) counters(c) smallint(d) tinyint

Answer» Correct answer is (b) counters

The best explanation: Synchronization on counters is done on the RegionServer, not in the client.
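The server-side synchronization idea can be sketched with a lock in Python (`CounterStore` is illustrative, not an HBase API):

```python
import threading

class CounterStore:
    # Sketch of server-side counter semantics: increments are
    # synchronized on the server (as on an HBase RegionServer), so
    # concurrent clients never lose updates.
    def __init__(self):
        self._counters = {}
        self._lock = threading.Lock()

    def increment(self, key, delta=1):
        with self._lock:
            new = self._counters.get(key, 0) + delta
            self._counters[key] = new
            return new

# Four concurrent "clients" each apply 1000 increments.
store = CounterStore()
workers = [threading.Thread(target=lambda: [store.increment("hits") for _ in range(1000)])
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```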
434.

Point out the wrong statement.(a) Where time-ranges are very wide (e.g., year-long report) and where the data is voluminous, summary tables are a common approach(b) Coprocessors act like RDBMS triggers(c) HBase does not currently support ‘constraints’ in traditional (SQL) database parlance(d) None of the mentioned

Answer» The correct option is (c) HBase does not currently support ‘constraints’ in traditional (SQL) database parlance

For explanation: The advised usage for Constraints is in enforcing business rules for attributes in the table.
435.

The _________ suffers from the monotonically increasing rowkey problem.(a) rowkey(b) columnkey(c) counterkey(d) all of the mentioned

Answer» Correct answer is (a) rowkey

To explain I would say: Attention must be paid to the number of buckets because this will require the same number of scans to return results.
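Salting the rowkey is the usual mitigation; a toy sketch in Python (the hash here is illustrative, not what HBase or Phoenix actually use):

```python
def salted_key(rowkey: str, buckets: int) -> str:
    # Prefix the key with a deterministic salt so sequential writes
    # spread across `buckets` regions; a read must then issue one scan
    # per bucket, which is the cost noted in the explanation above.
    salt = sum(rowkey.encode()) % buckets  # toy hash for illustration
    return f"{salt:02d}-{rowkey}"
```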