InterviewSolution
This section offers curated multiple-choice questions on Hadoop-ecosystem topics (HDFS, MapReduce, Hive, Cassandra, Kafka, Thrift, Chukwa, Flume, and related projects) to sharpen your knowledge and support exam preparation.
151. Point out the correct statement.
(a) Cassandra delivers continuous availability, linear scalability, and operational simplicity across many commodity servers
(b) Cassandra has a “masterless” architecture, meaning all nodes are the same
(c) Cassandra also provides customizable replication, storing redundant copies of data across nodes that participate in a Cassandra ring
(d) All of the mentioned

Answer: (d) All of the mentioned
Explanation: Cassandra provides automatic data distribution across all nodes that participate in a “ring” or database cluster.

152. _________ can be configured per table for non-QUORUM consistency levels.
(a) Read repair
(b) Read damage
(c) Write repair
(d) None of the mentioned

Answer: (a) Read repair
Explanation: If the replicas are inconsistent, the coordinator issues writes to the out-of-date replicas to update the row to the most recent values. This process is known as read repair.

153. Cassandra uses a protocol called _______ to discover location and state information.
(a) gossip
(b) intergos
(c) goss
(d) all of the mentioned

Answer: (a) gossip
Explanation: Gossip is used for internode communication.

154. There are _________ types of read requests that a coordinator can send to a replica.
(a) two
(b) three
(c) four
(d) all of the mentioned

Answer: (b) three
Explanation: A coordinator can send a direct read request, a digest request, or a background read repair request; the direct read request goes to exactly one replica node.

155. Cassandra searches the __________ to determine the approximate location on disk of the index entry.
(a) partition record
(b) partition summary
(c) partition search
(d) all of the mentioned

Answer: (b) partition summary
Explanation: If the Bloom filter does not rule out the SSTable, Cassandra checks the partition key cache; on a cache miss, it searches the partition summary to determine the approximate location of the index entry on disk.

156. For each SSTable, Cassandra creates a _________ index.
(a) memory
(b) partition
(c) in memory
(d) all of the mentioned

Answer: (b) partition
Explanation: The partition index is a list of partition keys and the start position of rows in the data file (on disk).

157. The type of __________ strategy Cassandra performs on your data is configurable and can significantly affect read performance.
(a) compression
(b) collection
(c) compaction
(d) decompression

Answer: (c) compaction
Explanation: Using the SizeTieredCompactionStrategy or DateTieredCompactionStrategy tends to cause data fragmentation when rows are frequently updated.

158. Authorization capabilities for Cassandra use the familiar _________ security paradigm to manage object permissions.
(a) COMMIT
(b) GRANT
(c) ROLLBACK
(d) None of the mentioned

Answer: (b) GRANT
Explanation: Once authenticated into a database cluster using internal authentication, the next security issue to be tackled is permission management.

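As a concrete illustration, object permissions are managed with GRANT/REVOKE statements once authentication is in place. The sketch below assumes the DataStax Java driver 3.x API, internal authentication enabled with the default superuser, and a hypothetical keyspace `sales` and role `analyst`:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class GrantExample {
    public static void main(String[] args) {
        // Connect as a superuser; credentials assume internal authentication is enabled.
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withCredentials("cassandra", "cassandra")
                .build();
             Session session = cluster.connect()) {
            // Create a role, then manage its object permissions GRANT-style.
            session.execute("CREATE ROLE IF NOT EXISTS analyst WITH PASSWORD = 's3cret' AND LOGIN = true");
            session.execute("GRANT SELECT ON KEYSPACE sales TO analyst");
        }
    }
}
```
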
159. The _____________ allows external processes to watch the stream of chunks passing through the collector.
(a) LocalWriter
(b) SeqFileWriter
(c) SocketTeeWriter
(d) All of the mentioned

Answer: (c) SocketTeeWriter
Explanation: SocketTeeWriter listens on a port (specified by the conf option chukwaCollector.tee.port, defaulting to 9094).

160. Point out the correct statement.
(a) Chukwa supports two different reliability strategies
(b) chukwaCollector.asyncAcks.scantime affects how often collectors will check the filesystem for commits
(c) chukwaCollector.asyncAcks.scanperiod defaults to thrice the rotation interval
(d) all of the mentioned

Answer: (a) Chukwa supports two different reliability strategies
Explanation: The first, default strategy is as follows: collectors write data to HDFS, and as soon as the HDFS write call returns success, they report success to the agent, which advances its checkpoint state.

161. Point out the wrong statement.
(a) Filters use the same syntax as the Dump command
(b) “RAW” will send the internal data of the Chunk, without any metadata, prefixed by its length encoded as a 32-bit int
(c) Specifying “WRITABLE” will cause the chunks to be written using the Hadoop Writable serialization framework
(d) None of the mentioned

Answer: (d) None of the mentioned
Explanation: “HEADER” is similar to “RAW”, but with a one-line header in front of the content.

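To make the RAW format concrete, here is a minimal sketch of an external watcher. It assumes a collector with SocketTeeWriter enabled on the default port 9094 and a hypothetical "RAW all" filter handshake; only the 32-bit length prefix is taken from the RAW framing described above.

```java
import java.io.DataInputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class TeeWatcher {
    public static void main(String[] args) throws Exception {
        // Connect to the collector's tee port (chukwaCollector.tee.port, default 9094).
        try (Socket socket = new Socket("localhost", 9094);
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            // Hypothetical handshake: request RAW chunks matching a filter.
            socket.getOutputStream().write("RAW all\n".getBytes(StandardCharsets.UTF_8));
            while (true) {
                int length = in.readInt();   // RAW data is prefixed by a 32-bit length
                byte[] chunk = new byte[length];
                in.readFully(chunk);         // read exactly one chunk body
                System.out.println("got chunk of " + length + " bytes");
            }
        }
    }
}
```
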
162. Conceptually, each _________ emits a semi-infinite stream of bytes, numbered starting from zero.
(a) Collector
(b) Adaptor
(c) Compactor
(d) LocalWriter

Answer: (b) Adaptor
Explanation: A Chunk is a sequence of bytes, with some metadata; several of the metadata fields are set automatically by the Agent or Adaptors.

163. Point out the wrong statement.
(a) The framework calls the reduce method for each <key, (list of values)> pair in the grouped inputs
(b) The output of the Reducer is re-sorted
(c) The reduce method reduces values for a given key
(d) None of the mentioned

Answer: (b) The output of the Reducer is re-sorted
Explanation: The output of the Reducer is not re-sorted.

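For reference, a minimal word-count style reducer shows the framework calling reduce once per <key, (list of values)> pair; note that nothing here re-sorts the reducer's output.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Called once for each <key, (list of values)> pair in the grouped inputs.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        // The framework writes this output as-is; it is not re-sorted.
        context.write(key, new IntWritable(sum));
    }
}
```
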
164. _____________ is used to read data from byte buffers.
(a) write()
(b) read()
(c) readwrite()
(d) all of the mentioned

Answer: (b) read()
Explanation: The readFully method can also be used instead of the read method.

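A small sketch of the difference: read() may return fewer bytes than requested, while readFully blocks until the buffer is completely filled (or throws EOFException).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ReadExample {
    public static void main(String[] args) throws IOException {
        byte[] raw = {0, 0, 0, 42, 7, 7, 7, 7};
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            byte[] buffer = new byte[4];
            int n = in.read(buffer);  // may read fewer than buffer.length bytes
            in.readFully(buffer);     // fills the buffer completely or throws EOFException
            System.out.println("first read returned " + n + " bytes");
        }
    }
}
```
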
165. The _________ collocation identifier is integrated into the process that is used to create vectors from sequence files of text keys and values.
(a) lbr
(b) lcr
(c) llr
(d) lar

Answer: (c) llr
Explanation: llr stands for log-likelihood ratio; the --minLLR option can be used to control the cutoff that prevents collocations below the specified LLR score from being emitted.

166. ____________ generates NGrams and counts frequencies for ngrams, head and tail subgrams.
(a) CollocationDriver
(b) CollocDriver
(c) CarDriver
(d) All of the mentioned

Answer: (b) CollocDriver
Explanation: Each call to the mapper passes in the full set of tokens for the corresponding document using a StringTuple.

167. The _________ uses just the value field in append(value); the key is a LongWritable that contains the record number, count + 1.
(a) SetFile
(b) ArrayFile
(c) BloomMapFile
(d) None of the mentioned

Answer: (b) ArrayFile
Explanation: The SetFile, instead of append(key, value), uses just the key field in append(key); the value is always the NullWritable instance.

168. The ________ method adds the deprecated key to the global deprecation map.
(a) addDeprecits
(b) addDeprecation
(c) keyDeprecation
(d) none of the mentioned

Answer: (b) addDeprecation
Explanation: addDeprecation does not override any existing entries in the deprecation map.

169. The _________ method clears all keys from the configuration.
(a) clear
(b) addResource
(c) getClass
(d) none of the mentioned

Answer: (a) clear
Explanation: getClass is used to get the value of the name property as a Class.

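The two methods from questions 168 and 169 side by side, as a minimal sketch (the property names are the standard MapReduce deprecation pair):

```java
import org.apache.hadoop.conf.Configuration;

public class ConfExample {
    public static void main(String[] args) {
        // Map the deprecated key to its replacement; existing entries are not overridden.
        Configuration.addDeprecation("mapred.job.name", "mapreduce.job.name");

        Configuration conf = new Configuration();
        conf.set("mapred.job.name", "demo");                 // resolved through the deprecation map
        System.out.println(conf.get("mapreduce.job.name")); // prints "demo"

        conf.clear();                                        // removes all keys from the configuration
        System.out.println(conf.get("mapreduce.job.name")); // now null
    }
}
```
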
170. Point out the wrong statement.
(a) With Thrift, it is not possible to define a service and change the protocol and transport without recompiling the code
(b) Thrift includes server infrastructure to tie protocols and transports together, like blocking, non-blocking, and multi-threaded servers
(c) Thrift supports a number of protocols for service definition
(d) None of the mentioned

Answer: (d) None of the mentioned
Explanation: The underlying I/O part of the stack is implemented differently for different languages.

171. Which of the following is a more compact binary format?
(a) TCompactProtocol
(b) TDenseProtocol
(c) TBinaryProtocol
(d) TSimpleJSONProtocol

Answer: (a) TCompactProtocol
Explanation: TCompactProtocol is typically more efficient to process as well.

172. Which of the following is a straightforward binary format?
(a) TCompactProtocol
(b) TDenseProtocol
(c) TBinaryProtocol
(d) TSimpleJSONProtocol

Answer: (c) TBinaryProtocol
Explanation: TBinaryProtocol is not optimized for space efficiency.

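To see the size difference in practice, the same object can be serialized with both protocols. This is a sketch: `User` stands for any hypothetical class generated by the Thrift compiler, with the usual generated setters.

```java
import org.apache.thrift.TException;
import org.apache.thrift.TSerializer;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TCompactProtocol;

public class ProtocolSizes {
    public static void main(String[] args) throws TException {
        User user = new User();   // hypothetical Thrift-generated struct
        user.setName("ada");
        user.setId(42);

        TSerializer binary = new TSerializer(new TBinaryProtocol.Factory());
        TSerializer compact = new TSerializer(new TCompactProtocol.Factory());

        // TBinaryProtocol: straightforward, not optimized for space.
        byte[] b = binary.serialize(user);
        // TCompactProtocol: denser encoding of the same data.
        byte[] c = compact.serialize(user);
        System.out.println("binary=" + b.length + " bytes, compact=" + c.length + " bytes");
    }
}
```
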
173. Point out the wrong statement.
(a) The Kafka cluster does not retain all published messages
(b) A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients
(c) Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization
(d) Messages are persisted on disk and replicated within the cluster to prevent data loss

Answer: (a) The Kafka cluster does not retain all published messages
Explanation: The Kafka cluster retains all published messages, whether or not they have been consumed, for a configurable period of time.

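The retention period is an ordinary topic configuration. A sketch with the Kafka AdminClient, assuming a broker on localhost:9092 and a hypothetical topic name:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class RetentionExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Keep messages for 7 days, consumed or not.
            NewTopic topic = new NewTopic("events", 3, (short) 2)
                    .configs(Collections.singletonMap(
                            "retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```
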
174. __________ is one of many possible IAuthorizer implementations, and the one that stores permissions in the system_auth.permissions table to support all authorization-related CQL statements.
(a) CassandraAuth
(b) CassandraAuthorizer
(c) CassAuthorizer
(d) All of the mentioned

Answer: (b) CassandraAuthorizer
Explanation: Configuration consists mainly of changing the authorizer option in cassandra.yaml to use the CassandraAuthorizer.

175. Avro is said to be the future _______ layer of Hadoop.
(a) RMC
(b) RPC
(c) RDC
(d) All of the mentioned

Answer: (b) RPC
Explanation: When Avro is used in RPC, the client and server exchange schemas in the connection handshake.

176. Thrift resolves possible conflicts through _________ of the field.
(a) Name
(b) Static number
(c) UID
(d) None of the mentioned

Answer: (b) Static number
Explanation: Avro, by contrast, resolves possible conflicts through the name of the field.

177. Point out the wrong statement.
(a) HBase provides only sequential access to data
(b) HBase provides high latency batch processing
(c) HBase internally provides serialized access
(d) All of the mentioned

Answer: (c) HBase internally provides serialized access
Explanation: HBase internally uses hash tables and provides random access.

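Random access by row key is the core HBase read path. A minimal sketch with the standard client API, using a hypothetical users table and column family:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomRead {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("users"))) {
            // Random access: fetch a single row directly by key, no sequential scan needed.
            Get get = new Get(Bytes.toBytes("row-42"));
            Result result = table.get(get);
            byte[] email = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
            System.out.println(email == null ? "not found" : Bytes.toString(email));
        }
    }
}
```
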
178. The Apache Jenkins server runs the ______________ job whenever code is committed to the trunk branch.
(a) “Bigtop-trunk”
(b) “Bigtop”
(c) “Big-trunk”
(d) None of the mentioned

Answer: (a) “Bigtop-trunk”
Explanation: The Jenkins server in turn runs several test jobs.

179. Apache ________ is a lightweight server for ActivityStreams.
(a) Sirona
(b) Taverna
(c) Slider
(d) Streams

Answer: (d) Streams
Explanation: Taverna, by contrast, is a domain-independent suite of tools used to design and execute data-driven workflows.

180. Point out the wrong statement.
(a) HiveServer2 has a new JDBC driver
(b) CSV and TSV output formats are maintained for forward compatibility
(c) HiveServer2 supports both embedded and remote access to HiveServer2
(d) None of the mentioned

Answer: (b) CSV and TSV output formats are maintained for forward compatibility
Explanation: CSV and TSV output formats are maintained for backward compatibility, not forward compatibility.

181. Which of the following is used to set the transaction isolation level?
(a) --incremental=[true/false]
(b) --isolation=LEVEL
(c) --force=[true/false]
(d) --truncateTable=[true/false]

Answer: (b) --isolation=LEVEL
Explanation: This sets the transaction isolation level to TRANSACTION_READ_COMMITTED or TRANSACTION_SERIALIZABLE.

182. Point out the correct statement.
(a) --helpusage displays a usage message
(b) The JDBC connection URL format has the prefix jdbc:hive:
(c) Starting with Hive 0.14, there are improved SV output formats
(d) None of the mentioned

Answer: (c) Starting with Hive 0.14, there are improved SV output formats
Explanation: The available output formats are DSV, CSV2, and TSV2.

183. Hive-specific commands can be run from Beeline when the Hive _______ driver is used.
(a) ODBC
(b) JDBC
(c) ODBC-JDBC
(d) All of the mentioned

Answer: (b) JDBC
Explanation: Hive-specific commands are the same as Hive CLI commands.

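Because Beeline rides on the HiveServer2 JDBC driver, any JDBC client can issue the same commands. A sketch, assuming HiveServer2 on localhost:10000 (note the jdbc:hive2:// prefix, unlike the wrong option in question 182):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // The HiveServer2 JDBC URL uses the jdbc:hive2:// prefix.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute("set hive.exec.parallel=true");  // a Hive-specific command, as in the CLI
            try (ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```
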
184. To force Hive to be more verbose, it can be started with ___________
(a) hive --hiveconf hive.root.logger=INFO,console
(b) hive --hiveconf hive.subroot.logger=INFO,console
(c) hive --hiveconf hive.root.logger=INFOVALUE,console
(d) All of the mentioned

Answer: (a) hive --hiveconf hive.root.logger=INFO,console
Explanation: This will emit orders of magnitude more information to the console and will likely include any information the AvroSerde is trying to give you about what went wrong.

185. _______ supports a new command shell, Beeline, that works with HiveServer2.
(a) HiveServer2
(b) HiveServer3
(c) HiveServer4
(d) None of the mentioned

Answer: (a) HiveServer2
Explanation: The Beeline shell works in both embedded mode and remote mode.

186. The need for data replication can arise in various scenarios like ____________
(a) Replication Factor is changed
(b) DataNode goes down
(c) Data Blocks get corrupted
(d) All of the mentioned

Answer: (d) All of the mentioned
Explanation: Data is replicated across different DataNodes to ensure a high degree of fault tolerance.

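Changing the replication factor of an existing file is one such scenario. A sketch with the FileSystem API (path and factor are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Raise the replication factor for one file; the NameNode schedules
        // re-replication of its blocks across DataNodes.
        fs.setReplication(new Path("/data/important.log"), (short) 5);
    }
}
```
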
187. Which of the following scenarios may not be a good fit for HDFS?
(a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
(b) HDFS is suitable for storing data related to applications requiring low latency data access
(c) HDFS is suitable for storing data related to applications requiring low latency data access
(d) None of the mentioned

Answer: (a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
Explanation: HDFS is, however, well suited for storing archive data, since it allows storing the data on low-cost commodity hardware while ensuring a high degree of fault tolerance.

188. HDFS works in a __________ fashion.
(a) master-worker
(b) master-slave
(c) worker/slave
(d) all of the mentioned

Answer: (a) master-worker
Explanation: The NameNode serves as the master and each DataNode serves as a worker/slave.

189. Point out the wrong statement.
(a) Replication Factor can be configured at a cluster level (default is set to 3) and also at a file level
(b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode
(c) User data is stored on the local file system of DataNodes
(d) DataNode is aware of the files to which the blocks stored on it belong to

Answer: (d) DataNode is aware of the files to which the blocks stored on it belong to
Explanation: It is the NameNode, not the DataNode, that is aware of the files to which the blocks belong.

190. __________ mode is a NameNode state in which it does not accept changes to the name space.
(a) Recover
(b) Safe
(c) Rollback
(d) None of the mentioned

Answer: (b) Safe
Explanation: In safe mode, the NameNode accepts no changes to the namespace; safe mode can be entered, checked, or left with the hdfs dfsadmin -safemode command.

191. Point out the wrong statement.
(a) classNAME displays the class name needed to get the Hadoop jar
(b) balancer runs a cluster balancing utility
(c) An administrator can simply press Ctrl-C to stop the rebalancing process
(d) None of the mentioned

Answer: (a) classNAME displays the class name needed to get the Hadoop jar
Explanation: classpath prints the class path needed to get the Hadoop jar and the required libraries.

192. ________ NameNode is used when the Primary NameNode goes down.
(a) Rack
(b) Data
(c) Secondary
(d) None of the mentioned

Answer: (c) Secondary
Explanation: The Secondary NameNode is used to improve the availability and reliability of the cluster.

193. On the write side, it is expected that the user pass in valid _________ with correctly typed data.
(a) HRecords
(b) HCatRecos
(c) HCatRecords
(d) None of the mentioned

Answer: (c) HCatRecords
Explanation: In some cases where a user of HCatalog (such as some older versions of Pig) does not support all the data types supported by Hive, a few config parameters are provided to handle data promotions/conversions, allowing such clients to read data through HCatalog.

194. Point out the correct statement.
(a) The framework groups Reducer inputs by keys
(b) The shuffle and sort phases occur simultaneously, i.e. while outputs are being fetched they are merged
(c) Since JobConf.setOutputKeyComparatorClass(Class) can be used to control how intermediate keys are grouped, these can be used in conjunction to simulate secondary sort on values
(d) All of the mentioned

Answer: (d) All of the mentioned
Explanation: If the equivalence rules for keys while grouping the intermediates are different from those for grouping keys before reduction, then one may specify a Comparator via JobConf.setOutputValueGroupingComparator(Class), as sketched below.

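A sketch of the old-API knobs mentioned in (c) and the explanation; MyKeyComparator and MyGroupingComparator are hypothetical RawComparator implementations:

```java
import org.apache.hadoop.mapred.JobConf;

public class SecondarySortSetup {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Controls how intermediate keys are sorted before reduction.
        conf.setOutputKeyComparatorClass(MyKeyComparator.class);
        // Controls which keys are grouped into a single reduce call,
        // simulating a secondary sort on values.
        conf.setOutputValueGroupingComparator(MyGroupingComparator.class);
    }
}
```
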
195. In order to read any file in HDFS, an instance of __________ is required.
(a) filesystem
(b) datastream
(c) outstream
(d) inputstream

Answer: (a) filesystem
Explanation: An FSDataInputStream, obtained from the FileSystem instance, is then used to read data from the file.

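Putting it together: obtain a FileSystem instance, then read through the FSDataInputStream it returns (the path is illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());  // the required FileSystem instance
        try (FSDataInputStream in = fs.open(new Path("/data/sample.txt"))) {
            byte[] buffer = new byte[4096];
            int n = in.read(buffer);                          // FSDataInputStream does the reading
            System.out.println("read " + n + " bytes");
        }
    }
}
```
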
196. Point out the correct statement.
(a) All hadoop commands are invoked by the bin/hadoop script
(b) Hadoop has an option parsing framework that employs only parsing generic options
(c) Archive command creates a hadoop archive
(d) All of the mentioned

Answer: (a) All hadoop commands are invoked by the bin/hadoop script
Explanation: Running the hadoop script without any arguments prints the description for all commands.

197. In ___________ mode, the NameNode will interactively prompt you at the command line about possible courses of action you can take to recover your data.
(a) full
(b) partial
(c) recovery
(d) commit

Answer: (c) recovery
Explanation: Because recovery mode can cause you to lose data, you should always back up your edit log and fsimage before using it.

198. Reducer is input the grouped output of a ____________
(a) Mapper
(b) Reducer
(c) Writable
(d) Readable

Answer: (a) Mapper
Explanation: In the shuffle phase, the framework fetches the relevant partition of the output of all the Mappers for each Reducer, via HTTP.

199. Point out the wrong statement.
(a) Version 1.4.0 is the fourth Flume release as an Apache top-level project
(b) Apache Flume 1.5.2 is a security and maintenance release that disables SSLv3 on all components in Flume that support SSL/TLS
(c) Flume is backwards-compatible with previous versions of the Flume 1.x codeline
(d) None of the mentioned

Answer: (d) None of the mentioned
Explanation: Apache Flume 1.3.1 is a maintenance release for the 1.3.0 release, and includes several bug fixes and performance enhancements.

200. A number of ____________ source adapters give you the granular control to grab a specific file.
(a) multimedia file
(b) text file
(c) image file
(d) none of the mentioned

Answer: (b) text file
Explanation: A number of predefined source adapters are built into Flume.