Explore topic-wise InterviewSolutions.

This section includes InterviewSolutions, each offering curated multiple-choice questions to sharpen your knowledge and support exam preparation. Choose a topic below to get started.

401.

Jobs can enable task JVMs to be reused by specifying the job configuration _________(a) mapred.job.recycle.jvm.num.tasks(b) mapissue.job.reuse.jvm.num.tasks(c) mapred.job.reuse.jvm.num.tasks(d) all of the mentioned

Answer» The correct choice is (c) mapred.job.reuse.jvm.num.tasks

The explanation is: Many tasks see performance improvements of over 50% when JVM reuse is enabled via mapred.job.reuse.jvm.num.tasks.
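As an illustrative sketch (Python stdlib only; the helper name `jvm_reuse_property` is made up for this example), the corresponding mapred-site.xml entry can be generated like so:

```python
import xml.etree.ElementTree as ET

def jvm_reuse_property(num_tasks=-1):
    # Build the mapred-site.xml <property> entry that enables JVM reuse;
    # -1 means a JVM may be reused by any number of tasks of the same job.
    # (Illustrative helper, not a Hadoop API.)
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "mapred.job.reuse.jvm.num.tasks"
    ET.SubElement(prop, "value").text = str(num_tasks)
    return ET.tostring(prop, encoding="unicode")

snippet = jvm_reuse_property()
```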
402.

During merging, __________ now always checks the incoming segments for corruption before merging.(a) LocalWriter(b) IndexWriter(c) ReadWriter(d) All of the mentioned

Answer» The correct answer is (b) IndexWriter

The explanation: Lucene supports random-writable and advance-able sparse bitsets.
403.

Point out the wrong statement.(a) Sqoop is used to import complete database(b) Sqoop is used to import selected columns from a particular table(c) Sqoop is used to import selected tables(d) All of the mentioned

Answer» Correct answer is (d) All of the mentioned

For explanation I would say: Apache Sqoop is a tool which allows users to import data from relational databases to HDFS and export data from HDFS back to relational databases.
404.

Maximum virtual memory of the launched child-task is specified using _________(a) mapv(b) mapred(c) mapvim(d) All of the mentioned

Answer» The correct answer is (b) mapred

Best explanation: Admins can also specify the maximum virtual memory of the launched child-task, and any sub-process it launches recursively, using mapred.
405.

Microsoft uses a Sqoop-based connector to help transfer data from _________ databases to Hadoop.(a) PostgreSQL(b) SQL Server(c) Oracle(d) MySQL

Answer» The correct answer is (b) SQL Server

The explanation: Sqoop is a command-line interface application for transferring data between relational databases and Hadoop.
406.

____________ specifies the number of segments on disk to be merged at the same time.(a) mapred.job.shuffle.merge.percent(b) mapred.job.reduce.input.buffer.percent(c) mapred.inmem.merge.threshold(d) io.sort.factor

Answer» The correct option is (d) io.sort.factor

The explanation is: io.sort.factor limits the number of open files and compression codecs during the merge.
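The merge-pass behaviour that io.sort.factor controls can be sketched in plain Python (`heapq` from the stdlib; `merge_runs` is an illustrative name, not a Hadoop API):

```python
import heapq

def merge_runs(runs, factor=10):
    # Merge sorted runs at most `factor` at a time, mimicking how
    # io.sort.factor caps the number of segments (open files) merged
    # in a single pass during the MapReduce sort phase.
    runs = [list(r) for r in runs]
    while len(runs) > 1:
        # Merge the first `factor` runs into one intermediate run.
        group, runs = runs[:factor], runs[factor:]
        runs.append(list(heapq.merge(*group)))
    return runs[0] if runs else []
```

A smaller factor means more passes over the data but fewer files open at once.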
407.

__________ provides a Couchbase Server-Hadoop connector by means of Sqoop.(a) MemCache(b) Couchbase(c) Hbase(d) All of the mentioned

Answer» The correct option is (b) Couchbase

Easy explanation: Exports can be used to put data from Hadoop into a relational database.
408.

Point out the correct statement.(a) HDFS provides low latency access to single rows from billions of records (Random access)(b) HBase sits on top of the Hadoop File System and provides read and write access(c) HBase is a distributed file system suitable for storing large files(d) None of the mentioned

Answer» Right choice is (b) HBase sits on top of the Hadoop File System and provides read and write access

The best explanation: One can store the data in HDFS either directly or through HBase. Data consumer reads/accesses the data in HDFS randomly using HBase.
409.

Point out the wrong statement.(a) There are no XML configuration files in Thrift(b) Thrift gives cross-language serialization with lower overhead than alternatives such as SOAP due to use of binary format(c) No framework to code is a feature of Thrift(d) None of the mentioned

Answer» Right answer is (d) None of the mentioned

The explanation: There are no build dependencies or non-standard software. No mix of incompatible software licenses.
410.

Which of the following parameter is the threshold for the accounting and serialization buffers?(a) io.sort.spill.percent(b) io.sort.record.percent(c) io.sort.mb(d) None of the mentioned

Answer» Right answer is (a) io.sort.spill.percent

For explanation I would say: When the percentage of either buffer has filled, their contents will be spilled to disk in the background.
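A toy model of that spill behaviour, assuming a record-count buffer rather than Hadoop's byte-level accounting buffers (`collect` is an illustrative name):

```python
def collect(records, buffer_size=8, spill_percent=0.80):
    # Sketch of the map-side collect buffer: once the buffer fills past
    # io.sort.spill.percent of its capacity, its contents are sorted and
    # spilled in the background (here, appended to a list of "spill files").
    buffer, spills = [], []
    threshold = int(buffer_size * spill_percent)
    for rec in records:
        buffer.append(rec)
        if len(buffer) >= threshold:
            spills.append(sorted(buffer))
            buffer = []
    if buffer:  # final flush of whatever remains
        spills.append(sorted(buffer))
    return spills
```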
411.

Sqoop is an open source tool written at ________(a) Cloudera(b) IBM(c) Microsoft(d) All of the mentioned

Answer» Right answer is (a) Cloudera

Explanation: Sqoop allows users to import data from their relational databases into HDFS and vice versa.
412.

_________ will overwrite any existing data in the table or partition.(a) INSERT WRITE(b) INSERT OVERWRITE(c) INSERT INTO(d) None of the mentioned

Answer» Right answer is (b) INSERT OVERWRITE

To elaborate: INSERT OVERWRITE replaces any existing data, whereas INSERT INTO will append to the table or partition, keeping the existing data intact.
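The two semantics can be modelled in a few lines of Python (`insert` is an illustrative helper, not a Hive API):

```python
def insert(table, rows, overwrite=False):
    # Toy model of Hive semantics: INSERT OVERWRITE replaces the
    # partition's existing rows, INSERT INTO appends to them.
    return list(rows) if overwrite else table + list(rows)
```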
413.

HBase is ________ and defines only column families.(a) Row Oriented(b) Schema-less(c) Fixed Schema(d) All of the mentioned

Answer» Correct option is (b) Schema-less

For explanation I would say: HBase doesn’t have the concept of fixed columns schema.
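A minimal sketch of this schema-less model in Python (`HTable` here is an illustrative class, not the HBase client API):

```python
class HTable:
    # Sketch of HBase's schema-less model: the table fixes only the
    # column families; qualifiers within a family are created on the
    # fly, per row, at write time.
    def __init__(self, families):
        self.families = set(families)
        self.rows = {}

    def put(self, rowkey, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(rowkey, {})[(family, qualifier)] = value

    def get(self, rowkey, family, qualifier):
        return self.rows[rowkey][(family, qualifier)]
```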
414.

_______ is a lossless data compression library that favors speed over compression ratio.(a) LOZ(b) LZO(c) OLZ(d) All of the mentioned

Answer» Correct choice is (b) LZO

Explanation: lzo and lzop need to be installed on every node in the Hadoop cluster.
415.

Which of the following is a multi-threaded server using non-blocking I/O?(a) TNonblockingServer(b) TSimpleServer(c) TSocket(d) None of the mentioned

Answer» The correct choice is (a) TNonblockingServer

The explanation is: Java implementation uses NIO channels.
416.

________ uses blocking socket I/O for transport.(a) TNonblockingServer(b) TSimpleServer(c) TSocket(d) None of the mentioned

Answer» Correct choice is (c) TSocket

Explanation: TSocket uses blocking socket I/O and is the transport typically used on the client side.
417.

Point out the wrong statement.(a) From a usability standpoint, LZO and Gzip are similar(b) Bzip2 generates a better compression ratio than does Gzip, but it’s much slower(c) Gzip is a compression utility that was adopted by the GNU project(d) None of the mentioned

Answer» Correct option is (a) From a usability standpoint, LZO and Gzip are similar

To explain I would say: From a usability standpoint, Bzip2 and Gzip are similar.
418.

Which of the following statement will create a column with varchar datatype?(a) CREATE TABLE foo (bar CHAR(10))(b) CREATE TABLE foo (bar VARCHAR(10))(c) CREATE TABLE foo (bar CHARVARYING(10))(d) All of the mentioned

Answer» Right option is (b) CREATE TABLE foo (bar VARCHAR(10))

Explanation: The varchar datatype was introduced in Hive 0.12.0.
419.

__________ is a single-threaded server using standard blocking I/O.(a) TNonblockingServer(b) TSimpleServer(c) TSocket(d) None of the mentioned

Answer» Right choice is (b) TSimpleServer

Easy explanation: TSimpleServer is useful for testing.
420.

________ is a multi-threaded server using standard blocking I/O.(a) TNonblockingServer(b) TThreadPoolServer(c) TSimpleServer(d) None of the mentioned

Answer» The correct choice is (b) TThreadPoolServer

Easiest explanation: TThreadPoolServer uses a pool of worker threads, each serving one client connection over blocking I/O.
421.

Which of the following is the slowest compression technique?(a) LZO(b) Bzip2(c) Gzip(d) All of the mentioned

Answer» Right answer is (b) Bzip2

To explain I would say: Of all the available compression codecs in Hadoop, Bzip2 is by far the slowest.
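The ratio-versus-speed trade-off can be checked with Python's stdlib gzip and bz2 modules (`compare_codecs` is an illustrative helper):

```python
import bz2
import gzip

def compare_codecs(data: bytes):
    # Compress the same payload with gzip and bzip2 (both in the Python
    # stdlib). Bzip2 usually achieves a better ratio, which it pays for
    # with much slower compression and decompression.
    return {"gzip": len(gzip.compress(data)),
            "bz2": len(bz2.compress(data))}
```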
422.

The _________ codec from Google provides modest compression ratios.(a) Snapcheck(b) Snappy(c) FileCompress(d) None of the mentioned

Answer» Correct answer is (b) Snappy

To explain: Snappy has fast compression and decompression speeds.
423.

_____________ are used between blocks to permit efficient splitting of files for MapReduce processing.(a) Codec(b) Data Marker(c) Synchronization markers(d) All of the mentioned

Answer» Right answer is (c) Synchronization markers

Explanation: Avro includes a simple object container file format.
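A toy version of such a container format in Python (the marker value and helper names are made up; real Avro uses a random 16-byte sync marker recorded in the file header):

```python
def write_container(blocks, sync=b"\x00SYNC\x00"):
    # Sketch of an Avro-style object container file: a sync marker is
    # written between blocks so a reader can seek to an arbitrary byte
    # offset, scan forward to the next marker, and start a MapReduce
    # split on a clean block boundary.
    return sync.join(blocks)

def read_from_offset(data, offset, sync=b"\x00SYNC\x00"):
    # Resynchronize: skip to the first marker at/after `offset` and
    # return the remaining complete blocks.
    pos = data.find(sync, offset)
    if pos < 0:
        return []
    return data[pos + len(sync):].split(sync)
```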
424.

_______ can change the maximum number of cells of a column family.(a) set(b) reset(c) alter(d) select

Answer» The correct choice is (c) alter

Explanation: Alter is the command used to make changes to an existing table.
425.

The __________ codec uses Google’s Snappy compression library.(a) null(b) snappy(c) deflate(d) none of the mentioned

Answer» The correct choice is (b) snappy

Explanation: Snappy is a compression library developed at Google, and, like many technologies that come from Google, Snappy was designed to be fast.
426.

Which of the following performs compression using zlib?(a) TZlibTransport(b) TFramedTransport(c) TMemoryTransport(d) None of the mentioned

Answer» Correct choice is (a) TZlibTransport

The best explanation: TZlibTransport is used in conjunction with another transport. Not available in the Java implementation.
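The layering idea can be sketched with Python's stdlib zlib (`TZlibLikeTransport` is an illustrative class, not Thrift's actual API):

```python
import zlib

class TZlibLikeTransport:
    # Sketch of a zlib-compressing transport layered over another
    # transport (here, a plain in-memory buffer), in the spirit of
    # Thrift's TZlibTransport.
    def __init__(self, inner: bytearray):
        self.inner = inner

    def write(self, payload: bytes):
        self.inner += zlib.compress(payload)

    def read_all(self) -> bytes:
        return zlib.decompress(bytes(self.inner))
```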
427.

Point out the correct statement.(a) BooleanSerializer is used to parse string representations of boolean values into boolean scalar types(b) BlobRef is a wrapper that holds a BLOB either directly(c) BooleanParse is used to parse string representations of boolean values into boolean scalar types(d) All of the mentioned

Answer» Correct choice is (b) BlobRef is a wrapper that holds a BLOB either directly

The best I can explain: BlobRef is used for reference to a file that holds the BLOB data.
428.

__________ encapsulates a set of delimiters used to encode a record.(a) LargeObjectLoader(b) FieldMapProcessor(c) DelimiterSet(d) LobSerializer

Answer» Correct answer is (c) DelimiterSet

Best explanation: Delimiter set is created with the specified delimiters.
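A minimal Python sketch of the idea (`DelimiterSet` here mimics the concept; it is not Sqoop's Java class):

```python
class DelimiterSet:
    # Sketch of Sqoop's notion of a delimiter set: the characters used
    # to encode one record as text (field separator, record terminator,
    # and an enclosing character for values that contain a delimiter).
    def __init__(self, field=",", record="\n", enclose='"'):
        self.field, self.record, self.enclose = field, record, enclose

    def format_record(self, values):
        cells = []
        for v in values:
            v = str(v)
            if self.field in v or self.record in v:
                v = self.enclose + v + self.enclose
            cells.append(v)
        return self.field.join(cells) + self.record
```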
429.

ClobRef is a wrapper that holds a CLOB either directly or a reference to a file that holds the ______ data.(a) CLOB(b) BLOB(c) MLOB(d) All of the mentioned

Answer» Right option is (a) CLOB

The explanation: Create a ClobRef based on parsed data from a line of text.
430.

Which of the following work is done by BigTop in Hadoop framework?(a) Packaging(b) Smoke Testing(c) Virtualization(d) All of the mentioned

Answer» Correct option is (d) All of the mentioned

Easiest explanation: Bigtop aims at comprehensive packaging, testing, and configuration of the leading open source big data components.
431.

Point out the wrong statement.(a) Bigtop-0.5.0 : Builds the 0.5.0 release(b) Bigtop-trunk-HBase builds the HCatalog packages only(c) There are also jobs for building virtual machine images(d) All of the mentioned

Answer» Right answer is (b) Bigtop-trunk-HBase builds the HCatalog packages only

The best explanation: Bigtop-trunk-HBase builds the HBase packages only; the HCatalog packages are built by the Bigtop-trunk-HCatalog job. Bigtop also provides vagrant recipes, raw images, and (work-in-progress) docker recipes for deploying Hadoop from zero.
432.

Which of the following operating system is not supported by BigTop?(a) Fedora(b) Solaris(c) Ubuntu(d) SUSE

Answer» The correct choice is (b) Solaris

For explanation I would say: Bigtop components power the leading Hadoop distros and support many Operating Systems, including Debian/Ubuntu, CentOS, Fedora, SUSE and many others.
433.

One supported data type that deserves special mention is ____________(a) money(b) counters(c) smallint(d) tinyint

Answer» Correct answer is (b) counters

The best explanation: Synchronization on counters is done on the RegionServer, not in the client.
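The server-side synchronization idea can be sketched with a lock in Python (`CounterStore` is illustrative, not an HBase API):

```python
import threading

class CounterStore:
    # Sketch of server-side counter semantics: increments are
    # synchronized on the server (as on an HBase RegionServer), so
    # concurrent clients never lose updates.
    def __init__(self):
        self._counters = {}
        self._lock = threading.Lock()

    def increment(self, key, delta=1):
        with self._lock:
            new = self._counters.get(key, 0) + delta
            self._counters[key] = new
            return new

# Four concurrent "clients" each apply 1000 increments.
store = CounterStore()
workers = [threading.Thread(target=lambda: [store.increment("hits") for _ in range(1000)])
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```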
434.

Point out the wrong statement.(a) Where time-ranges are very wide (e.g., year-long report) and where the data is voluminous, summary tables are a common approach(b) Coprocessors act like RDBMS triggers(c) HBase does not currently support ‘constraints’ in traditional (SQL) database parlance(d) None of the mentioned

Answer» The correct option is (c) HBase does not currently support ‘constraints’ in traditional (SQL) database parlance

For explanation: The advised usage for Constraints is in enforcing business rules for attributes in the table.
435.

The _________ suffers from the monotonically increasing rowkey problem.(a) rowkey(b) columnkey(c) counterkey(d) all of the mentioned

Answer» Correct answer is (a) rowkey

To explain I would say: Attention must be paid to the number of buckets because this will require the same number of scans to return results.
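Salting the rowkey is the usual mitigation; a toy sketch in Python (the hash here is illustrative, not what HBase or Phoenix actually use):

```python
def salted_key(rowkey: str, buckets: int) -> str:
    # Prefix the key with a deterministic salt so sequential writes
    # spread across `buckets` regions; a read must then issue one scan
    # per bucket, which is the cost noted in the explanation above.
    salt = sum(rowkey.encode()) % buckets  # toy hash for illustration
    return f"{salt:02d}-{rowkey}"
```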