InterviewSolution
This section offers curated multiple-choice questions, each with an answer and a brief explanation, to sharpen your knowledge and support exam preparation.
201. Point out the correct statement.
(a) Apache Avro is a framework that allows you to serialize data in a format that has a schema built in
(b) The serialized data is in a compact binary format that doesn’t require proxy objects or code generation
(c) Including schemas with the Avro messages allows any application to deserialize the data
(d) All of the mentioned

Answer» (d) All of the mentioned. Explanation: Instead of using generated proxy libraries and strong typing, Avro relies heavily on the schemas that are sent along with the serialized data.

202. Which of the following is a configuration management system?
(a) Alex
(b) Puppet
(c) Acem
(d) None of the mentioned

Answer» (b) Puppet. Explanation: Administrators may use configuration management systems such as Puppet and Chef to manage processes.

203. Avro schemas describe the format of the message and are defined using ______________
(a) JSON
(b) XML
(c) JS
(d) All of the mentioned

Answer» (a) JSON. Explanation: The schema, written in JSON, is typically saved in a file (conventionally with an .avsc extension).

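To make this concrete, here is a minimal sketch of parsing a JSON schema with the Avro Java API; the `User` schema, its fields, and the class name are illustrative assumptions rather than part of the questions above.

```java
import org.apache.avro.Schema;

public class ParseSchemaDemo {
    public static void main(String[] args) {
        // A hypothetical record schema, written inline as JSON.
        String json = "{"
            + "\"type\": \"record\", \"name\": \"User\", \"fields\": ["
            + "  {\"name\": \"id\",   \"type\": \"long\"},"
            + "  {\"name\": \"name\", \"type\": \"string\"}"
            + "]}";
        Schema schema = new Schema.Parser().parse(json);
        System.out.println(schema.getName());   // User
        System.out.println(schema.getFields()); // fields, in declaration order
    }
}
```
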
204. Point out the correct statement.
(a) RAID is turned off by default
(b) Hadoop is designed to be a highly redundant distributed system
(c) Hadoop has a networked configuration system
(d) None of the mentioned

Answer» (b) Hadoop is designed to be a highly redundant distributed system. Explanation: Redundancy is built in at the software level through block replication, so hardware redundancy such as RAID is unnecessary on data nodes.

205. ___________ mode allows you to suppress alerts for a host, service, role, or even the entire cluster.
(a) Safe
(b) Maintenance
(c) Secure
(d) All of the mentioned

Answer» (b) Maintenance. Explanation: Maintenance mode can be useful when you need to take actions in your cluster and do not want to see the alerts that will be generated due to those actions.

206. Which of the following is a primitive data type in Avro?
(a) null
(b) boolean
(c) float
(d) all of the mentioned

Answer» (d) all of the mentioned. Explanation: Avro's primitive types are null, boolean, int, long, float, double, bytes and string; primitive type names are also defined type names.

207. Point out the correct statement.
(a) Records use the type name “record” and support three attributes
(b) Enum are represented using JSON arrays
(c) Avro data is always serialized with its schema
(d) All of the mentioned

Answer» (a) Records use the type name “record” and support three attributes. Explanation: A record is encoded by encoding the values of its fields in the order that they are declared.

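The following hedged sketch shows this encoding in action with Avro's generic Java API: the container file embeds the schema in its header (so no generated proxy classes are needed to read it back), and each record is encoded field by field in declaration order. The schema, file name, and values are assumptions for illustration.

```java
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class WriteAvroDemo {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"long\"},"
          + "{\"name\":\"name\",\"type\":\"string\"}]}");

        GenericRecord user = new GenericData.Record(schema);
        user.put("id", 1L);
        user.put("name", "alice");

        // The data file stores the schema in its header, so any reader can
        // deserialize the records without generated code.
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, new File("users.avro"));
            writer.append(user); // fields encoded in declaration order
        }
    }
}
```
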
208. Which of the following interfaces is implemented by Sqoop for recording?
(a) SqoopWrite
(b) SqoopRecord
(c) SqoopRead
(d) None of the mentioned

Answer» (b) SqoopRecord. Explanation: SqoopRecord is the interface implemented by the classes that Sqoop's orm.ClassWriter generates.

209. Records are terminated by a __________ character.
(a) RECORD_DELIMITER
(b) FIELD_DELIMITER
(c) FIELD_LIMITER
(d) None of the mentioned

Answer» (a) RECORD_DELIMITER. Explanation: The RecordParser class parses a record containing one or more fields.

210. _________ supports null values for all types.
(a) SmallObjectLoader
(b) FieldMapProcessor
(c) DelimiterSet
(d) JdbcWritableBridge

Answer» (d) JdbcWritableBridge. Explanation: The JdbcWritableBridge class contains a set of methods that read database columns from a ResultSet into Java types, and it supports null values for all of them.

211. Which of the following classes is used for general processing of errors?
(a) LargeObjectLoader
(b) ProcessingException
(c) DelimiterSet
(d) LobSerializer

Answer» (b) ProcessingException. Explanation: ProcessingException signals a general error that occurs during the processing of a SqoopRecord.

212. Point out the correct statement.
(a) ZooKeeper can achieve high throughput and high latency numbers
(b) The fault tolerant ordering means that sophisticated synchronization primitives can be implemented at the client
(c) The ZooKeeper implementation puts a premium on high performance, highly available, strictly ordered access
(d) All of the mentioned

Answer» (c) The ZooKeeper implementation puts a premium on high performance, highly available, strictly ordered access. Explanation: The performance aspects of ZooKeeper mean that it can be used in large, distributed systems.

213. Which of the following is a singleton instance class?
(a) LargeObjectLoader
(b) FieldMapProcessor
(c) DelimiterSet
(d) LobSerializer

Answer» (a) LargeObjectLoader. Explanation: Its lifetime is limited to that of the current TaskInputOutputContext.

214. Which of the following guarantees is provided by ZooKeeper?
(a) Interactivity
(b) Flexibility
(c) Scalability
(d) Reliability

Answer» (d) Reliability. Explanation: Once an update has been applied, it will persist from that time forward until a client overwrites it.

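A minimal sketch of that reliability guarantee using the ZooKeeper Java client; the connection string, znode path, and value are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkReliabilityDemo {
    public static void main(String[] args) throws Exception {
        // Point the connection string at your own ensemble.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});

        // Once this update is applied, it persists until a client overwrites it.
        zk.create("/demo-config", "v1".getBytes(StandardCharsets.UTF_8),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        byte[] data = zk.getData("/demo-config", false, null);
        System.out.println(new String(data, StandardCharsets.UTF_8)); // v1
        zk.close();
    }
}
```
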
215. __________ is used as a remote procedure call (RPC) framework at Facebook.
(a) Oozie
(b) Mahout
(c) Thrift
(d) Impala

Answer» (c) Thrift. Explanation: Thrift was developed at Facebook; data structures and services defined in its interface definition files are compiled into RPC client and server code for many languages.

216. The ________ class provides the getValue() method to read the values from its instance.
(a) Get
(b) Result
(c) Put
(d) Value

Answer» (b) Result. Explanation: Pass a Get instance to the get() method of the HTable class; it returns a Result object, which holds the requested row.

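A hedged sketch of that read path using the current HBase client API (Connection/Table rather than the older HTable named above); the table, row, family, and qualifier names are illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class GetValueDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {
            Get get = new Get(Bytes.toBytes("row1"));
            Result result = table.get(get); // Result holds the requested row
            byte[] value = result.getValue(Bytes.toBytes("info"),
                                           Bytes.toBytes("name"));
            System.out.println(Bytes.toString(value));
        }
    }
}
```
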
217. A number of constants used in the client ZooKeeper API were renamed in order to reduce ________ collision.
(a) value
(b) namespace
(c) counter
(d) none of the mentioned

Answer» (b) namespace. Explanation: For example, ZOOKEEPER-18 removed KeeperStateChanged; use KeeperStateDisconnected instead.

218. ZooKeeper is especially fast in ___________ workloads.
(a) write
(b) read-dominant
(c) read-write
(d) none of the mentioned

Answer» (b) read-dominant. Explanation: ZooKeeper applications run on thousands of machines, and it performs best where reads are more common than writes, at ratios of around 10:1.

219. Point out the correct statement.
(a) Thrift is developed for scalable cross-language services development
(b) Thrift includes a complete stack for creating clients and servers
(c) The top part of the Thrift stack is generated code from the Thrift definition
(d) All of the mentioned

Answer» (d) All of the mentioned. Explanation: The Thrift compiler generates client and processor code from the service definition file.

220. ________ communicate with the client and handle data-related operations.
(a) Master Server
(b) Region Server
(c) Htable
(d) All of the mentioned

Answer» (b) Region Server. Explanation: Region Servers handle read and write requests for all the regions under them.

221. The Email & Apps team of ___________ uses ZooKeeper to coordinate sharding and responsibility changes in a distributed email client.
(a) Katta
(b) Helprace
(c) Rackspace
(d) None of the mentioned

Answer» (c) Rackspace. Explanation: ZooKeeper also provides distributed locking for connections to prevent a cluster from overwhelming servers.

222. Point out the correct statement.
(a) With TextInputFormat and KeyValueTextInputFormat, each mapper receives a variable number of lines of input
(b) With StreamXmlRecordReader, the page elements can be interpreted as records for processing by a mapper
(c) The number depends on the size of the split and the length of the lines
(d) All of the mentioned

Answer» (d) All of the mentioned. Explanation: Large XML documents that are composed of a series of “records” can be broken into those records using simple string or regular-expression matching to find the start and end tags of records.

223. ZooKeeper is used for configuration and leader election in the Cloud edition of ______________
(a) Solr
(b) Solur
(c) Solar101
(d) All of the mentioned

Answer» (a) Solr. Explanation: SolrCloud uses ZooKeeper for configuration management and leader election; within the Hadoop ecosystem, HBase uses it similarly.

224. _________ is the main configuration file of HBase.
(a) hbase.xml
(b) hbase-site.xml
(c) hbase-site-conf.xml
(d) none of the mentioned

Answer» (b) hbase-site.xml. Explanation: Open the conf directory under the HBase home folder (e.g. /usr/local/HBase) and set the data directory to an appropriate location in this file.

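For illustration, a minimal hbase-site.xml might look like the sketch below; the property values are assumptions and should point at your own directories.

```xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- Illustrative path for standalone mode; adjust to your setup. -->
    <value>file:///usr/local/HBase/hbasedata</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/HBase/zookeeperdata</value>
  </property>
</configuration>
```
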
225. Which of the following projects is an interface definition language for Hadoop?
(a) Oozie
(b) Mahout
(c) Thrift
(d) Impala

Answer» (c) Thrift. Explanation: Thrift is an interface definition language and binary communication protocol that is used to define and create services for numerous languages.

226. The ______________ class defines a configuration parameter named LINES_PER_MAP that controls how the input file is split.
(a) NLineInputFormat
(b) InputLineFormat
(c) LineInputFormat
(d) None of the mentioned

Answer» (a) NLineInputFormat. Explanation: The parameter (mapreduce.input.lineinputformat.linespermap) can be set in the job configuration or through NLineInputFormat's setNumLinesPerSplit() method.

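A short sketch of configuring this in a MapReduce driver; the input path is an assumption.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class NLineDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setInputFormatClass(NLineInputFormat.class);
        // Sets mapreduce.input.lineinputformat.linespermap (LINES_PER_MAP):
        // each mapper now receives exactly 10 lines of input.
        NLineInputFormat.setNumLinesPerSplit(job, 10);
        FileInputFormat.addInputPath(job, new Path("/input"));
    }
}
```
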
227. ___________ takes node and rack locality into account when deciding which blocks to place in the same split.
(a) CombineFileOutputFormat
(b) CombineFileInputFormat
(c) TextFileInputFormat
(d) None of the mentioned

Answer» (b) CombineFileInputFormat. Explanation: CombineFileInputFormat does not compromise the speed at which it can process the input in a typical MapReduce job.

228. The split size is normally the size of a ________ block, which is appropriate for most applications.
(a) Generic
(b) Task
(c) Library
(d) HDFS

Answer» (d) HDFS. Explanation: FileInputFormat splits only large files (here, “large” means larger than an HDFS block).

229. Point out the correct statement.
(a) The minimum split size is usually 1 byte, although some formats have a lower bound on the split size
(b) Applications may impose a minimum split size
(c) The maximum split size defaults to the maximum value that can be represented by a Java long type
(d) All of the mentioned

Answer» (a) The minimum split size is usually 1 byte, although some formats have a lower bound on the split size. Explanation: The maximum split size has an effect only when it is less than the block size, forcing splits to be smaller than a block.

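The split-size knobs from questions 227 to 229 can be set in a driver as sketched below; the 128 MB figure and the input format choice are assumptions for illustration.

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        // Raise the minimum above the HDFS block size to force larger splits,
        // or lower the maximum below it to force smaller ones.
        FileInputFormat.setMinInputSplitSize(job, 1L);
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);

        // For many small files, CombineTextInputFormat packs several files
        // into one split while respecting node and rack locality.
        job.setInputFormatClass(CombineTextInputFormat.class);
        CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
    }
}
```
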
230. Point out the wrong statement.
(a) Oozie v2 is a server-based Coordinator Engine specialized in running workflows based on time and data triggers
(b) Oozie v1 is a server-based Workflow Engine specialized in running workflow jobs with actions that execute Hadoop Map/Reduce and Pig jobs
(c) A Workflow application is a DAG that coordinates the following types of actions
(d) None of the mentioned

Answer» (d) None of the mentioned. Explanation: Cycles in workflows are not supported.

231. Oozie can make _________ callback notifications on action start events and workflow end events.
(a) TCP
(b) HTTP
(c) IP
(d) All of the mentioned

Answer» (b) HTTP. Explanation: In the case of an action start failure in a workflow job, depending on the type of failure, Oozie will attempt automatic retries, request a manual retry, or fail the workflow job.

232. Which of the following is one of the possible states for a workflow job?
(a) PREP
(b) START
(c) RESUME
(d) END

Answer» (a) PREP. Explanation: The possible states for a workflow job are PREP, RUNNING, SUSPENDED, SUCCEEDED, KILLED and FAILED.

233. Oozie Workflow jobs are Directed ________ graphs of actions.
(a) Acyclical
(b) Cyclical
(c) Elliptical
(d) All of the mentioned

Answer» (a) Acyclical. Explanation: Oozie is a framework that combines multiple Map/Reduce jobs into a logical unit of work.

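A minimal sketch of such a DAG as a workflow.xml; the app name, action name, and properties are illustrative assumptions.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="demo-wf">
  <start to="mr-step"/>
  <action name="mr-step">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
    </map-reduce>
    <!-- Transitions only point forward; cycles are not allowed. -->
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```
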
234. Which of the following has the core Eclipse PDE tools for HDT development?
(a) RVP
(b) RAP
(c) RBP
(d) RVP

Answer» (b) RAP. Explanation: The RCP/RAP developers package has the core Eclipse PDE tools.

235. HDT is used for listing running jobs on a __________ cluster.
(a) MR
(b) Hive
(c) Pig
(d) None of the mentioned

Answer» (a) MR. Explanation: HDT can be used for launching MapReduce programs on a Hadoop cluster.

236. Which of the following tools is intended to be more compatible with HDT?
(a) Git
(b) Juno
(c) Indigo
(d) None of the mentioned

Answer» (c) Indigo. Explanation: HDT's source is kept in a Git repository, which anyone is free to check out.

237. Point out the wrong statement.
(a) There is support for creating Hadoop projects in HDT
(b) HDT aims at bringing plugins in Eclipse to simplify development on the Hadoop platform
(c) HDT is based on the Eclipse plugin architecture and can possibly support other versions like 0.23, CDH4 etc. in future releases
(d) None of the mentioned

Answer» (d) None of the mentioned. Explanation: HDT aims to simplify the Hadoop platform for developers.

238. HDT has been tested on __________ and Juno, and can work on Kepler as well.
(a) Rainbow
(b) Indigo
(c) Indiavo
(d) Hadovo

Answer» (b) Indigo. Explanation: HDT aims at bringing plugins in Eclipse to simplify development on the Hadoop platform.

239. Which of the following platforms does Hadoop run on?
(a) Bare metal
(b) Debian
(c) Cross-platform
(d) Unix-like

Answer» (c) Cross-platform. Explanation: Hadoop supports cross-platform operation.

240. What was Hadoop written in?
(a) Java (software platform)
(b) Perl
(c) Java (programming language)
(d) Lua (programming language)

Answer» (c) Java (programming language). Explanation: The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command-line utilities written as shell scripts.

241. Which of the following genres does Hadoop fall under?
(a) Distributed file system
(b) JAX-RS
(c) Java Message Service
(d) Relational Database Management System

Answer» (a) Distributed file system. Explanation: The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications.

242. Point out the wrong statement.
(a) Hadoop works better with a small number of large files than a large number of small files
(b) CombineFileInputFormat is designed to work well with small files
(c) CombineFileInputFormat does not compromise the speed at which it can process the input in a typical MapReduce job
(d) None of the mentioned

Answer» (d) None of the mentioned. Explanation: All three statements hold (question 227 relies on statement (c) itself). If files are very small (“small” means significantly smaller than an HDFS block) and there are a lot of them, each map task will process very little input, and there will be many tasks (one per file), each imposing extra bookkeeping overhead; CombineFileInputFormat addresses this by packing many files into each split.

243. Point out the correct statement.
(a) Avro provides functionality similar to systems such as Thrift
(b) When Avro is used in RPC, the client and server exchange data in the connection handshake
(c) Apache Avro, Avro, Apache, and the Avro and Apache logos are trademarks of The Java Foundation
(d) None of the mentioned

Answer» (a) Avro provides functionality similar to systems such as Thrift. Explanation: Avro differs from those systems in fundamental aspects such as untagged data.

244. Avro supports ______ kinds of complex types.
(a) 3
(b) 4
(c) 6
(d) 7

Answer» (c) 6. Explanation: Avro supports six kinds of complex types: records, enums, arrays, maps, unions and fixed.

245. Avro schemas are defined with _____
(a) JSON
(b) XML
(c) JAVA
(d) All of the mentioned

Answer» (a) JSON. Explanation: Defining schemas in JSON facilitates implementation in languages that already have JSON libraries.

246. Point out the correct statement.
(a) Avro Fixed type should be defined in Hive as lists of tiny ints
(b) Avro Bytes type should be defined in Hive as lists of tiny ints
(c) Avro Enum type should be defined in Hive as strings
(d) All of the mentioned

Answer» (b) Avro Bytes type should be defined in Hive as lists of tiny ints. Explanation: The AvroSerde will convert these to Bytes during the saving process.

247. Point out the wrong statement.
(a) Java code is used to deserialize the contents of the file into objects
(b) Avro allows you to use complex data structures within Hadoop MapReduce jobs
(c) The m2e plugin automatically downloads the newly added JAR files and their dependencies
(d) None of the mentioned

Answer» (d) None of the mentioned. Explanation: A unit test is useful because you can make assertions to verify that the values of the deserialized object are the same as the original values.

248. Which HDFS command is used to check for various inconsistencies?
(a) fsk
(b) fsck
(c) fetchdt
(d) none of the mentioned

Answer» (b) fsck. Explanation: fsck is designed for reporting problems with various files, for example, missing blocks for a file or under-replicated blocks.

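A typical invocation, sketched below; the path and flags shown are common options rather than an exhaustive list.

```sh
# Check the whole filesystem, listing files, their blocks, and block locations
hdfs fsck / -files -blocks -locations
```
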
249. Which of the following is a common Hadoop maintenance issue?
(a) Lack of tools
(b) Lack of configuration management
(c) Lack of web interface
(d) None of the mentioned

Answer» (b) Lack of configuration management. Explanation: Without a centralized configuration management framework, you end up with a number of issues that can cascade just as your usage picks up.

250. Point out the wrong statement.
(a) To create an Avro-backed table, specify the serde as org.apache.hadoop.hive.serde2.avro.AvroSerDe
(b) Avro-backed tables can be created in Hive using AvroSerDe
(c) The AvroSerde cannot serialize any Hive table to Avro files
(d) None of the mentioned

Answer» (c) The AvroSerde cannot serialize any Hive table to Avro files. Explanation: On the contrary, the AvroSerde can serialize any Hive table to Avro files.

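As a hedged illustration of statement (a), the HiveQL below creates an Avro-backed table with that serde; the table name and inline schema are assumptions.

```sql
CREATE TABLE users_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{"type":"record","name":"User","fields":[{"name":"id","type":"long"},{"name":"name","type":"string"}]}');
```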