InterviewSolution
This section includes InterviewSolutions, each offering curated multiple-choice questions to sharpen your knowledge and support exam preparation.
| 401. |
Jobs can enable task JVMs to be reused by specifying the job configuration _________ (a) mapred.job.recycle.jvm.num.tasks (b) mapissue.job.reuse.jvm.num.tasks (c) mapred.job.reuse.jvm.num.tasks (d) all of the mentioned |
|
Answer» The correct choice is (c) mapred.job.reuse.jvm.num.tasks. The explanation is: many tasks see performance improve by over 50% when JVM reuse is enabled with mapred.job.reuse.jvm.num.tasks. |
|
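As a rough illustration, JVM reuse is controlled through the old mapred API; this is a minimal sketch assuming a classic MapReduce 1 job, and the value -1 (unlimited reuse) is only an example.

```java
import org.apache.hadoop.mapred.JobConf;

public class JvmReuseExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // -1 lets an unbounded number of tasks share one JVM;
        // any value > 1 caps how many tasks each JVM may run.
        conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
        // Equivalent typed setter in the old mapred API:
        conf.setNumTasksToExecutePerJvm(-1);
    }
}
```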
| 402. |
During merging, __________ now always checks the incoming segments for corruption before merging. (a) LocalWriter (b) IndexWriter (c) ReadWriter (d) All of the mentioned |
|
Answer» The correct answer is (b) IndexWriter. The explanation: Lucene's IndexWriter now checks incoming segments for corruption before merging; Lucene also supports random-writable and advance-able sparse bitsets. |
|
| 403. |
Point out the wrong statement. (a) Sqoop is used to import a complete database (b) Sqoop is used to import selected columns from a particular table (c) Sqoop is used to import selected tables (d) All of the mentioned |
|
Answer» Correct answer is (d) All of the mentioned. For explanation I would say: Apache Sqoop is a tool which allows users to import data from relational databases to HDFS and export data from HDFS to a relational database. |
|
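To make the import of selected columns concrete, here is a hedged Java sketch that drives a Sqoop 1 import programmatically; it assumes the org.apache.sqoop.Sqoop entry point from Sqoop 1.4, and the connection string, table, and column names are placeholders.

```java
import org.apache.sqoop.Sqoop;

public class SqoopImportExample {
    public static void main(String[] args) {
        // Import only selected columns of one table into HDFS.
        // Host, database, table, and credentials are placeholders.
        String[] importArgs = {
            "import",
            "--connect", "jdbc:mysql://db.example.com/sales",
            "--username", "etl_user",
            "--table", "orders",
            "--columns", "id,customer_id,total",
            "--target-dir", "/data/sales/orders"
        };
        int exitCode = Sqoop.runTool(importArgs);
        System.exit(exitCode);
    }
}
```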
| 404. |
Maximum virtual memory of the launched child-task is specified using _________ (a) mapv (b) mapred (c) mapvim (d) All of the mentioned |
|
Answer» The correct answer is (b) mapred. Best explanation: Admins can also specify the maximum virtual memory of the launched child-task, and any sub-process it launches recursively, using mapred.child.ulimit. |
|
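A minimal sketch of setting that limit, assuming the legacy mapred.child.ulimit property (whose value is in kilobytes); the 2 GB figure is only an example.

```java
import org.apache.hadoop.conf.Configuration;

public class ChildUlimitExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Cap the virtual memory of each child task and any
        // sub-process it spawns at about 2 GB (value in kilobytes).
        conf.setLong("mapred.child.ulimit", 2L * 1024 * 1024);
    }
}
```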
| 405. |
Microsoft uses a Sqoop-based connector to help transfer data from _________ databases to Hadoop. (a) PostgreSQL (b) SQL Server (c) Oracle (d) MySQL |
|
Answer» The correct answer is (b) SQL Server. The explanation: Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. |
|
| 406. |
____________ specifies the number of segments on disk to be merged at the same time. (a) mapred.job.shuffle.merge.percent (b) mapred.job.reduce.input.buffer.percent (c) mapred.inmem.merge.threshold (d) io.sort.factor |
|
Answer» The correct option is (d) io.sort.factor. The explanation is: io.sort.factor limits the number of open files and compression codecs during the merge. |
|
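A small sketch of tuning the merge width; the value 100 is an arbitrary example, not a recommendation.

```java
import org.apache.hadoop.conf.Configuration;

public class SortFactorExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Merge up to 100 on-disk segments in a single pass instead
        // of the conservative default of 10, reducing merge rounds.
        conf.setInt("io.sort.factor", 100);
    }
}
```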
| 407. |
__________ provides a Couchbase Server-Hadoop connector by means of Sqoop. (a) MemCache (b) Couchbase (c) Hbase (d) All of the mentioned |
|
Answer» The correct option is (b) Couchbase. Easy explanation: Couchbase provides the connector; Sqoop exports can likewise be used to put data from Hadoop into a relational database. |
|
| 408. |
Point out the correct statement. (a) HDFS provides low latency access to single rows from billions of records (Random access) (b) HBase sits on top of the Hadoop File System and provides read and write access (c) HBase is a distributed file system suitable for storing large files (d) None of the mentioned |
|
Answer» Right choice is (b) HBase sits on top of the Hadoop File System and provides read and write access. The best explanation: One can store the data in HDFS either directly or through HBase. A data consumer reads/accesses the data in HDFS randomly using HBase. |
|
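A hedged sketch of that random read/write path through the HBase client API; the table, family, and row names are made up.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseRandomAccess {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {
            // Random write: one row keyed by user id.
            Put put = new Put(Bytes.toBytes("user42"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
            table.put(put);
            // Random read of the same row, backed by data in HDFS.
            Result r = table.get(new Get(Bytes.toBytes("user42")));
            System.out.println(Bytes.toString(
                r.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}
```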
| 409. |
Point out the wrong statement. (a) There are no XML configuration files in Thrift (b) Thrift gives cross-language serialization with lower overhead than alternatives such as SOAP due to use of binary format (c) No framework to code is a feature of Thrift (d) None of the mentioned |
|
Answer» Right answer is (d) None of the mentioned. The explanation: There are no build dependencies or non-standard software, and no mix of incompatible software licenses. |
|
| 410. |
Which of the following parameters is the threshold for the accounting and serialization buffers? (a) io.sort.spill.percent (b) io.sort.record.percent (c) io.sort.mb (d) None of the mentioned |
|
Answer» Right answer is (a) io.sort.spill.percent. For explanation I would say: When either buffer reaches this percentage of its capacity, its contents are spilled to disk in the background. |
|
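A short sketch of the two buffer settings working together; both values shown are just the common defaults.

```java
import org.apache.hadoop.conf.Configuration;

public class SpillThresholdExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // 100 MB in-memory buffer for sorting map output.
        conf.setInt("io.sort.mb", 100);
        // Spill to disk once a buffer is 80% full, so the map task
        // keeps writing while the spill proceeds in the background.
        conf.setFloat("io.sort.spill.percent", 0.80f);
    }
}
```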
| 411. |
Sqoop is an open source tool written at ________ (a) Cloudera (b) IBM (c) Microsoft (d) All of the mentioned |
|
Answer» Right answer is (a) Cloudera. Explanation: Sqoop was originally developed at Cloudera; it allows users to import data from their relational databases into HDFS and vice versa. |
|
| 412. |
_________ will overwrite any existing data in the table or partition. (a) INSERT WRITE (b) INSERT OVERWRITE (c) INSERT INTO (d) None of the mentioned |
|
Answer» Right answer is (b) INSERT OVERWRITE. To elaborate: INSERT OVERWRITE replaces the existing data, whereas INSERT INTO will append to the table or partition, keeping the existing data intact. |
|
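A hedged JDBC sketch contrasting the two statements against HiveServer2; it assumes hive-jdbc on the classpath, and the host, tables, and partition values are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveOverwriteExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 address, database, and table names are placeholders.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hive.example.com:10000/default");
             Statement stmt = conn.createStatement()) {
            // Replaces all existing rows in the target partition.
            stmt.execute("INSERT OVERWRITE TABLE sales PARTITION (dt='2024-01-01') "
                       + "SELECT * FROM staging_sales");
            // Appends, keeping the existing rows intact.
            stmt.execute("INSERT INTO TABLE sales PARTITION (dt='2024-01-01') "
                       + "SELECT * FROM late_arrivals");
        }
    }
}
```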
| 413. |
HBase is ________ and defines only column families. (a) Row Oriented (b) Schema-less (c) Fixed Schema (d) All of the mentioned |
|
Answer» Correct option is (b) Schema-less. For explanation I would say: HBase doesn’t have the concept of a fixed column schema; only column families are defined up front. |
|
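A sketch of that schema-less model using the older HBase 1.x admin API (HTableDescriptor and HColumnDescriptor are deprecated in HBase 2); the table and family names are invented.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SchemalessTableExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Only column families are declared up front; column
            // qualifiers can be added freely at write time, per row.
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("events"));
            desc.addFamily(new HColumnDescriptor("meta"));
            desc.addFamily(new HColumnDescriptor("payload"));
            admin.createTable(desc);
        }
    }
}
```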
| 414. |
_______ is a lossless data compression library that favors speed over compression ratio. (a) LOZ (b) LZO (c) OLZ (d) All of the mentioned |
|
Answer» Correct choice is (b) LZO. Explanation: lzo and lzop need to be installed on every node in the Hadoop cluster. |
|
| 415. |
Which of the following is a multi-threaded server using non-blocking I/O? (a) TNonblockingServer (b) TSimpleServer (c) TSocket (d) None of the mentioned |
|
Answer» The correct choice is (a) TNonblockingServer. The explanation is: The Java implementation uses NIO channels. |
|
| 416. |
________ uses blocking socket I/O for transport. (a) TNonblockingServer (b) TSimpleServer (c) TSocket (d) None of the mentioned |
|
Answer» Correct choice is (c) TSocket. Explanation: TSocket is a blocking socket transport; it is TNonblockingServer that requires TFramedTransport instead. |
|
| 417. |
Point out the wrong statement. (a) From a usability standpoint, LZO and Gzip are similar (b) Bzip2 generates a better compression ratio than does Gzip, but it’s much slower (c) Gzip is a compression utility that was adopted by the GNU project (d) None of the mentioned |
|
Answer» Correct option is (a) From a usability standpoint, LZO and Gzip are similar. To explain I would say: From a usability standpoint, Bzip2 and Gzip are similar. |
|
| 418. |
Which of the following statements will create a column with varchar datatype? (a) CREATE TABLE foo (bar CHAR(10)) (b) CREATE TABLE foo (bar VARCHAR(10)) (c) CREATE TABLE foo (bar CHARVARYING(10)) (d) All of the mentioned |
|
Answer» Right option is (b) CREATE TABLE foo (bar VARCHAR(10)). Explanation: The varchar datatype was introduced in Hive 0.12.0. |
|
| 419. |
__________ is a single-threaded server using standard blocking I/O. (a) TNonblockingServer (b) TSimpleServer (c) TSocket (d) None of the mentioned |
|
Answer» Right choice is (b) TSimpleServer. Easy explanation: TSimpleServer is useful for testing. |
|
| 420. |
________ is a multi-threaded server using standard blocking I/O. (a) TNonblockingServer (b) TThreadPoolServer (c) TSimpleServer (d) None of the mentioned |
|
Answer» The correct choice is (b) TThreadPoolServer. Easiest explanation: TThreadPoolServer hands each client connection to a worker thread from a pool and uses standard blocking I/O. |
|
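To tie the Thrift server questions (415, 419 and 420) together, here is a hedged Java sketch; MyService and MyServiceHandler stand in for a Thrift-generated service and its implementation, and the ports are arbitrary.

```java
import org.apache.thrift.server.TNonblockingServer;
import org.apache.thrift.server.TServer;
import org.apache.thrift.server.TSimpleServer;
import org.apache.thrift.server.TThreadPoolServer;
import org.apache.thrift.transport.TNonblockingServerSocket;
import org.apache.thrift.transport.TServerSocket;

public class ThriftServers {
    public static void main(String[] args) throws Exception {
        // MyService.Processor and MyServiceHandler stand in for code the
        // Thrift compiler would generate from a .thrift definition.
        MyService.Processor<MyServiceHandler> processor =
                new MyService.Processor<>(new MyServiceHandler());

        // TSimpleServer: single-threaded, standard blocking I/O; testing only.
        TServer simple = new TSimpleServer(
                new TSimpleServer.Args(new TServerSocket(9090)).processor(processor));

        // TThreadPoolServer: multi-threaded, standard blocking I/O;
        // each connection is served by a worker thread from a pool.
        TServer pooled = new TThreadPoolServer(
                new TThreadPoolServer.Args(new TServerSocket(9091)).processor(processor));

        // TNonblockingServer: non-blocking I/O on Java NIO channels;
        // clients must wrap their transport in TFramedTransport.
        TServer nonblocking = new TNonblockingServer(
                new TNonblockingServer.Args(new TNonblockingServerSocket(9092)).processor(processor));

        nonblocking.serve();  // serve() blocks; run one server per process
    }
}
```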
| 421. |
Which of the following is the slowest compression technique? (a) LZO (b) Bzip2 (c) Gzip (d) All of the mentioned |
|
Answer» Right answer is (b) Bzip2. To explain I would say: Of all the available compression codecs in Hadoop, Bzip2 is by far the slowest. |
|
| 422. |
The _________ codec from Google provides modest compression ratios. (a) Snapcheck (b) Snappy (c) FileCompress (d) None of the mentioned |
|
Answer» Correct answer is (b) Snappy. To explain: Snappy has fast compression and decompression speeds. |
|
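A minimal sketch of choosing Snappy for intermediate map output, assuming native Snappy libraries are installed on the cluster nodes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;

public class SnappyMapOutputExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Compress intermediate map output with Snappy: a modest
        // ratio, but very fast to compress and decompress.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                      SnappyCodec.class, CompressionCodec.class);
    }
}
```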
| 423. |
_____________ are used between blocks to permit efficient splitting of files for MapReduce processing. (a) Codec (b) Data Marker (c) Synchronization markers (d) All of the mentioned |
|
Answer» Right answer is (c) Synchronization markers. Explanation: Avro includes a simple object container file format; the synchronization markers between its blocks are what make the files splittable. |
|
| 424. |
_______ can change the maximum number of cells of a column family. (a) set (b) reset (c) alter (d) select |
|
Answer» The correct choice is (c) alter. Explanation: alter is the command used to make changes to an existing table. |
|
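A hedged equivalent of the shell's alter command via the HBase 1.x admin API; the table name, family, and version count are placeholders.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class AlterColumnFamilyExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Roughly the shell command:
            //   alter 'events', NAME => 'payload', VERSIONS => 5
            HColumnDescriptor family = new HColumnDescriptor("payload");
            family.setMaxVersions(5);  // maximum cell versions retained
            admin.modifyColumn(TableName.valueOf("events"), family);
        }
    }
}
```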
| 425. |
The __________ codec uses Google’s Snappy compression library. (a) null (b) snappy (c) deflate (d) none of the mentioned |
|
Answer» The correct choice is (b) snappy. Explanation: Snappy is a compression library developed at Google, and, like many technologies that come from Google, Snappy was designed to be fast. |
|
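A hedged sketch connecting this with question 423: writing an Avro object container file whose blocks are Snappy-compressed and separated by sync markers; the schema and output path are placeholders, and Snappy support needs the snappy-java dependency on the classpath.

```java
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroSnappyExample {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\"," +
            "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            // Blocks are compressed with Snappy and separated by
            // synchronization markers, which is what lets MapReduce
            // split the file without scanning it from the start.
            writer.setCodec(CodecFactory.snappyCodec());
            writer.create(schema, new File("/tmp/events.avro"));
            GenericRecord rec = new GenericData.Record(schema);
            rec.put("id", 1L);
            writer.append(rec);
        }
    }
}
```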
| 426. |
Which of the following performs compression using zlib? (a) TZlibTransport (b) TFramedTransport (c) TMemoryTransport (d) None of the mentioned |
|
Answer» Correct choice is (a) TZlibTransport. The best explanation: TZlibTransport is used in conjunction with another transport and is not available in the Java implementation. |
|
| 427. |
Point out the correct statement. (a) BooleanSerializer is used to parse string representations of boolean values into boolean scalar types (b) BlobRef is a wrapper that holds a BLOB either directly or as a reference to a file (c) BooleanParse is used to parse string representations of boolean values into boolean scalar types (d) All of the mentioned |
|
Answer» Correct choice is (b) BlobRef is a wrapper that holds a BLOB either directly or as a reference to a file. The best I can explain: when not held directly, BlobRef stores a reference to a file that holds the BLOB data. |
|
| 428. |
__________ encapsulates a set of delimiters used to encode a record. (a) LargeObjectLoader (b) FieldMapProcessor (c) DelimiterSet (d) LobSerializer |
|
Answer» Correct answer is (c) DelimiterSet. Best explanation: A DelimiterSet is created with the specified delimiters. |
|
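A small sketch of constructing one, assuming Sqoop 1's org.apache.sqoop.lib.DelimiterSet and its (field, record, enclose, escape, isEncloseRequired) constructor.

```java
import org.apache.sqoop.lib.DelimiterSet;

public class DelimiterSetExample {
    public static void main(String[] args) {
        // Fields separated by commas, records by newlines, values
        // optionally enclosed in double quotes, backslash as escape.
        DelimiterSet delims = new DelimiterSet(',', '\n', '"', '\\', false);
        System.out.println(delims);
    }
}
```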
| 429. |
ClobRef is a wrapper that holds a CLOB either directly or a reference to a file that holds the ______ data. (a) CLOB (b) BLOB (c) MLOB (d) All of the mentioned |
|
Answer» Right option is (a) CLOB. The explanation: A ClobRef can be created based on parsed data from a line of text. |
|
| 430. |
Which of the following tasks is done by BigTop in the Hadoop framework? (a) Packaging (b) Smoke Testing (c) Virtualization (d) All of the mentioned |
|
Answer» Correct option is (d) All of the mentioned. Easiest explanation: Bigtop aims at comprehensive packaging, testing, and configuration of the leading open source big data components. |
|
| 431. |
Point out the wrong statement. (a) Bigtop-0.5.0 builds the 0.5.0 release (b) Bigtop-trunk-HBase builds the HCatalog packages only (c) There are also jobs for building virtual machine images (d) All of the mentioned |
|
Answer» Right answer is (b) Bigtop-trunk-HBase builds the HCatalog packages only. The best explanation: the Bigtop-trunk-HBase job builds the HBase packages, not HCatalog; Bigtop also provides vagrant recipes, raw images, and (work-in-progress) docker recipes for deploying Hadoop from zero. |
|
| 432. |
Which of the following operating systems is not supported by BigTop? (a) Fedora (b) Solaris (c) Ubuntu (d) SUSE |
|
Answer» The correct choice is (b) Solaris. For explanation I would say: Bigtop components power the leading Hadoop distros and support many operating systems, including Debian/Ubuntu, CentOS, Fedora, SUSE and many others. |
|
| 433. |
One supported data type that deserves special mention is ____________ (a) money (b) counters (c) smallint (d) tinyint |
|
Answer» Correct answer is (b) counters. The best explanation: Synchronization on counters is done on the RegionServer, not in the client. |
|
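A hedged sketch of a counter update through the HBase client; the table and column names are made up.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CounterExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("metrics"))) {
            // Atomically add 1 to the cell; the RegionServer performs
            // the synchronization, so no client-side locking is needed.
            long hits = table.incrementColumnValue(
                Bytes.toBytes("page:/home"),   // row key
                Bytes.toBytes("stats"),        // column family
                Bytes.toBytes("hits"),         // qualifier
                1L);
            System.out.println("hits = " + hits);
        }
    }
}
```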
| 434. |
Point out the wrong statement. (a) Where time-ranges are very wide (e.g., year-long report) and where the data is voluminous, summary tables are a common approach (b) Coprocessors act like RDBMS triggers (c) HBase does not currently support ‘constraints’ in traditional (SQL) database parlance (d) None of the mentioned |
|
Answer» The correct option is (c) HBase does not currently support ‘constraints’ in traditional (SQL) database parlance. For explanation: HBase does support constraints, and the advised usage for Constraints is in enforcing business rules for attributes in the table. |
|
| 435. |
The _________ suffers from the monotonically increasing rowkey problem. (a) rowkey (b) columnkey (c) counterkey (d) all of the mentioned |
|
Answer» Correct answer is (a) rowkey. To explain I would say: salting the rowkey into buckets avoids hotspotting, but attention must be paid to the number of buckets because reads will require the same number of scans to return results. |
|
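One common mitigation is salting the rowkey into a fixed number of buckets; this hedged sketch is illustrative only, and the bucket count is arbitrary.

```java
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedRowKey {
    private static final int BUCKETS = 8;

    // Prefix a monotonically increasing key (e.g. a timestamp) with a
    // deterministic salt byte so writes spread across BUCKETS regions.
    static byte[] salted(long timestamp) {
        byte bucket = (byte) Math.floorMod(Long.hashCode(timestamp), BUCKETS);
        return Bytes.add(new byte[] { bucket }, Bytes.toBytes(timestamp));
    }

    public static void main(String[] args) {
        System.out.println(Bytes.toStringBinary(salted(System.currentTimeMillis())));
        // The trade-off: reading back a time range now takes BUCKETS
        // scans, one per salt prefix, merged on the client side.
    }
}
```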