InterviewSolution
This section includes InterviewSolutions, each offering curated multiple-choice questions to sharpen your knowledge and support exam preparation. Choose a topic below to get started.
| 101. |
Apache Hama provides a complete clone of _________(a) Pragmatic(b) Pregel(c) ServePreg(d) All of the mentioned |
|
Answer» The correct choice is (b) Pregel Easy explanation: Pregel is used for large-scale processing of graphs. |
|
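The Pregel/BSP model mentioned above computes over a graph in synchronized supersteps: each vertex reads messages from its neighbours, updates its value, and the next superstep begins only after a global barrier. A minimal single-machine sketch of this idea is below; the graph, values, and class names are invented for illustration and the code uses no Hama API, it only mimics the superstep/barrier pattern with the classic "maximum value" example.

```java
import java.util.*;

// Toy illustration of the Pregel/BSP superstep model (not the Hama API).
// Each vertex adopts the largest value any neighbour reports; one loop
// iteration is one superstep, and copying `next` back to `current` plays
// the role of the global synchronization barrier.
public class MaxValueBsp {
    public static int[] run(int[][] neighbours, int[] values) {
        int[] current = values.clone();
        boolean changed = true;
        while (changed) {                    // one iteration == one superstep
            changed = false;
            int[] next = current.clone();
            for (int v = 0; v < neighbours.length; v++) {
                for (int n : neighbours[v]) {
                    if (current[n] > next[v]) {   // "message" from neighbour n
                        next[v] = current[n];
                        changed = true;
                    }
                }
            }
            current = next;                  // barrier: updates become visible at once
        }
        return current;
    }

    public static void main(String[] args) {
        // A chain 0-1-2-3: every vertex converges to the global maximum.
        int[][] g = {{1}, {0, 2}, {1, 3}, {2}};
        System.out.println(Arrays.toString(run(g, new int[]{3, 6, 2, 1})));  // [6, 6, 6, 6]
    }
}
```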
| 102. |
_____________ is an IaaS (“Infrastructure as a Service”) cloud orchestration platform.(a) CloudStack(b) Cazerra(c) Click(d) All of the mentioned |
|
Answer» Right option is (a) CloudStack The best I can explain: CloudStack is an IaaS cloud orchestration platform; Click, by contrast, is a component-based Java web framework. |
|
| 103. |
________ is the most popular high-level Java API in the Hadoop Ecosystem.(a) Scalding(b) HCatalog(c) Cascalog(d) Cascading |
|
Answer» Right answer is (d) Cascading Easiest explanation: Cascading hides many of the complexities of MapReduce programming behind more intuitive pipes and data flow abstractions. |
|
| 104. |
_________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading.(a) Scalding(b) HCatalog(c) Cascalog(d) All of the mentioned |
|
Answer» Right answer is (c) Cascalog To elaborate: Cascalog also adds Logic Programming concepts inspired by Datalog. Hence the name “Cascalog” is a contraction of Cascading and Datalog. |
|
| 105. |
Point out the correct statement.(a) Scrunch’s Java API is centered around three interfaces that represent distributed datasets(b) All of the other data transformation operations supported by the Crunch APIs are implemented in terms of three primitives(c) A number of common Aggregator implementations are provided in the Aggregators class(d) All of the mentioned |
|
Answer» Correct choice is (c) A number of common Aggregator implementations are provided in the Aggregators class To explain I would say: PGroupedTable provides a combine values operation that allows a commutative and associative Aggregator to be applied to the values of the PGroupedTable instance on both the map and reduce sides of the shuffle. |
|
| 106. |
Hama consists mainly of ________ components for large-scale processing of graphs.(a) two(b) three(c) four(d) five |
|
Answer» The correct option is (b) three Explanation: Hama consists of three major components: BSPMaster, GroomServers and Zookeeper. |
|
| 107. |
_____________ is a distributed computing framework based on BSP.(a) HCataMan(b) HCatlaog(c) Hama(d) All of the mentioned |
|
Answer» Correct option is (c) Hama Explanation: BSP stands for Bulk Synchronous Parallel. |
|
| 108. |
Which of the following is a Java-based tool for tracking, resolving and managing project dependencies?(a) jclouds(b) JDO(c) ivy(d) All of the mentioned |
|
Answer» Correct choice is (c) ivy The best I can explain: jclouds is a cloud agnostic library that enables developers to access a variety of supported cloud providers using one API. |
|
| 109. |
Spark runs on top of ___________ a cluster manager system which provides efficient resource isolation across distributed applications.(a) Mesjs(b) Mesos(c) Mesus(d) All of the mentioned |
|
Answer» The correct option is (b) Mesos The best explanation: Mesos enables fine grained sharing which allows a Spark job to dynamically take advantage of the idle resources in the cluster during its execution. |
|
| 110. |
The __________ provides a proxy between the web applications exported by an application and an end user.(a) ProxyServer(b) WebAppProxy(c) WebProxy(d) None of the mentioned |
|
Answer» Correct option is (b) WebAppProxy Explanation: If security is enabled it will warn users before accessing a potentially unsafe web application. Authentication and authorization using the proxy is handled just like any other privileged web application. |
|
| 111. |
The web UI provides information about ________ job statistics of the Hama cluster.(a) MPP(b) BSP(c) USP(d) ISP |
|
Answer» Right choice is (b) BSP To elaborate: Running, completed, and failed BSP jobs are detailed in the web UI. |
|
| 112. |
A recurring workflow is used for purging expired data on __________ cluster.(a) Primary(b) Secondary(c) BCP(d) None of the mentioned |
|
Answer» The correct option is (a) Primary The explanation is: Falcon provides retention workflow for each cluster based on the defined policy. |
|
| 113. |
Storm is benchmarked as processing one million _______ byte messages per second per node.(a) 10(b) 50(c) 100(d) 200 |
|
Answer» Correct answer is (c) 100 Explanation: Storm is a distributed real-time computation system. |
|
| 114. |
Kafka is comparable to traditional messaging systems such as _____________(a) Impala(b) ActiveMQ(c) BigTop(d) Zookeeper |
|
Answer» The correct choice is (b) ActiveMQ Explanation: Kafka works well as a replacement for a more traditional message broker. |
|
| 115. |
Apache __________ is a platform for building native mobile applications using HTML, CSS and JavaScript (formerly Phonegap).(a) Cazerra(b) Cordova(c) CouchDB(d) All of the mentioned |
|
Answer» Correct option is (b) Cordova Explanation: The project entered incubation as Callback, but decided to change its name to Cordova on 2011-11-28. |
|
| 116. |
__________ is the default mode if you download Hama.(a) Local Mode(b) Pseudo Distributed Mode(c) Distributed Mode(d) All of the mentioned |
|
Answer» The correct option is (a) Local Mode The explanation is: This mode can be configured via the bsp.master.address property to local. |
|
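The answer above notes that local mode is selected via the `bsp.master.address` property. A minimal sketch of that setting follows, assuming the conventional `hama-site.xml` configuration file; replacing `local` with a host:port would point Hama at a real BSPMaster instead.

```xml
<!-- hama-site.xml (sketch): run Hama in Local Mode.
     Only the property name and the value "local" come from the answer above;
     everything else follows the standard Hadoop-style configuration layout. -->
<configuration>
  <property>
    <name>bsp.master.address</name>
    <value>local</value>
  </property>
</configuration>
```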
| 117. |
For Scala users, there is the __________ API, which is built on top of the Java APIs.(a) Prunch(b) Scrunch(c) Hivench(d) All of the mentioned |
|
Answer» Right option is (b) Scrunch To elaborate: It includes a REPL (read-eval-print loop) for creating MapReduce pipelines. |
|
| 118. |
Point out the wrong statement.(a) A Crunch pipeline written by the development team sessionizes a set of user logs, which are then processed by a diverse collection of Pig scripts and Hive queries(b) Crunch pipelines provide a thin veneer on top of MapReduce(c) Developers have access to low-level MapReduce APIs(d) None of the mentioned |
|
Answer» Right choice is (d) None of the mentioned The explanation: Crunch is extremely fast, only slightly slower than a hand-tuned pipeline developed with the MapReduce APIs. |
|
| 119. |
Apache __________ is a generic cluster management framework used to build distributed systems.(a) Helix(b) Gereition(c) FtpServer(d) None of the mentioned |
|
Answer» The correct answer is (a) Helix Best explanation: Helix provides automatic partition management, fault tolerance and elasticity. |
|
| 120. |
A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker.(a) MapReduce(b) Mapper(c) TaskTracker(d) JobTracker |
|
Answer» Right answer is (c) TaskTracker The explanation: TaskTracker receives the information necessary for the execution of a Task from JobTracker, Executes the Task, and Sends the Results back to JobTracker. |
|
| 121. |
Hama requires JRE _______ or higher and ssh to be set up between nodes in the cluster.(a) 1.6(b) 1.7(c) 1.8(d) 2.0 |
|
Answer» Correct answer is (a) 1.6 The best explanation: Hama requires JRE 1.6 or higher, and ssh must be set up between all nodes in the cluster. |
|
| 122. |
__________ is a cluster manager that provides resource sharing and isolation across cluster applications.(a) Merlin(b) Mesos(c) Max(d) Merge |
|
Answer» The correct choice is (b) Mesos Easiest explanation: Mesos abstracts CPU, memory, and storage resources away from individual machines, providing resource sharing and isolation across cluster applications. |
|
| 123. |
__________ node distributes code across the cluster.(a) Zookeeper(b) Nimbus(c) Supervisor(d) None of the mentioned |
|
Answer» The correct choice is (b) Nimbus To explain: The Nimbus node is the master node, similar to the Hadoop JobTracker; it distributes code across the cluster and assigns tasks to machines. |
|
| 124. |
The daemons associated with the MapReduce phase are ________ and task-trackers.(a) job-tracker(b) map-tracker(c) reduce-tracker(d) all of the mentioned |
|
Answer» Right choice is (a) job-tracker Explanation: Map-Reduce jobs are submitted on job-tracker. |
|
| 125. |
InputFormat class calls the ________ function and computes splits for each file and then sends them to the jobtracker.(a) puts(b) gets(c) getSplits(d) all of the mentioned |
|
Answer» Right answer is (c) getSplits Easy explanation: The client computes the splits by calling getSplits and sends them to the jobtracker, which uses their storage locations to schedule map tasks to process them on the tasktrackers. |
|
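The split computation described above can be sketched without any Hadoop dependency: divide a file of a given length into (offset, length) ranges no larger than the split size. Real InputFormats also record which hosts store each block so the jobtracker can schedule map tasks data-locally; the class and method names below are invented for illustration.

```java
import java.util.*;

// Toy sketch of what an InputFormat's getSplits produces: one
// (offset, length) pair per split, with the last split possibly shorter.
public class SplitSketch {
    public static List<long[]> getSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            long len = Math.min(splitSize, fileLength - offset);
            splits.add(new long[]{offset, len});
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 300 MB file with a 128 MB split size yields three splits:
        // 128 MB, 128 MB, and a final 44 MB remainder.
        for (long[] s : getSplits(300L << 20, 128L << 20)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```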
| 126. |
Apache Storm added open source stream data processing to the _________ Data Platform.(a) Cloudera(b) Hortonworks(c) Local Cloudera(d) MapR |
|
Answer» Right answer is (b) Hortonworks The best explanation: The Storm community is working to improve capabilities related to three important themes: business continuity, operations and developer productivity. |
|
| 127. |
The easiest way to have an HDP cluster is to download the _____________(a) Hadoop(b) Sandbox(c) Dashboard(d) None of the mentioned |
|
Answer» Correct option is (b) Sandbox The best explanation: The Hortonworks Sandbox is a single-node HDP cluster packaged as a virtual machine, so it can be run without any cluster hardware. |
|
| 128. |
Storm integrates with __________ via Apache Slider.(a) Scheduler(b) YARN(c) Compaction(d) All of the mentioned |
|
Answer» Right answer is (b) YARN Easy explanation: Apache Slider deploys and manages Storm as a long-running application on YARN clusters. |
|
| 129. |
Point out the wrong statement.(a) Knox eliminates the need for client software or client configuration and thus simplifies the access model(b) Knox simplifies access by extending Hadoop’s REST/HTTP services and encapsulating Kerberos within the cluster(c) Knox provides web vulnerability removal and other security services through a series of extensible interceptor pipelines(d) None of the mentioned |
|
Answer» Right option is (d) None of the mentioned The best explanation: Knox aggregates REST/HTTP calls to various components within the Hadoop ecosystem. |
|
| 130. |
__________ is OData implementation in Java.(a) Bigred(b) Nuvem(c) Olingo(d) Onami |
|
Answer» Right answer is (c) Olingo Easy explanation: Apache Olingo is a Java library that implements the Open Data Protocol (OData); Apache Onami, by contrast, develops and maintains a set of Google Guice extensions. |
|
| 131. |
Which of the following is a robust implementation of the OASIS WSDM?(a) Myfaces(b) Muse(c) modftp(d) None of the mentioned |
|
Answer» Correct answer is (b) Muse The best explanation: Muse is an implementation of the WSDM: Management Using Web Services (MUWS) specification. |
|
| 132. |
______________ is another implementation of the MapRunnable interface that runs mappers concurrently in a configurable number of threads.(a) MultithreadedRunner(b) MultithreadedMap(c) MultithreadedMapRunner(d) SinglethreadedMapRunner |
|
Answer» Correct choice is (c) MultithreadedMapRunner Easiest explanation: MultithreadedMapRunner is useful when map tasks are I/O-bound rather than CPU-bound, since running mappers in several threads lets the task overlap waiting with processing. |
|
| 133. |
The Crunch APIs are modeled after _________ which is the library that Google uses for building data pipelines on top of their own implementation of MapReduce.(a) FlagJava(b) FlumeJava(c) FlakeJava(d) All of the mentioned |
|
Answer» The correct answer is (b) FlumeJava For explanation: The Apache Crunch project develops and supports Java APIs that simplify the process of creating data pipelines on top of Apache Hadoop. |
|
| 134. |
Point out the correct statement.(a) Blur is a search platform capable of searching massive amounts of data in a cloud computing environment(b) Calcite is a not a very good customizable engine for parsing(c) Broklyn is a highly customizable engine for parsing(d) All of the mentioned |
|
Answer» Correct option is (a) Blur is a search platform capable of searching massive amounts of data in a cloud computing environment To elaborate: Blur is an Apache Incubator project built on top of Hadoop, Lucene, and ZooKeeper. |
|
| 135. |
The JobTracker pushes work out to available _______ nodes in the cluster, striving to keep the work as close to the data as possible.(a) DataNodes(b) TaskTracker(c) ActionNodes(d) All of the mentioned |
|
Answer» Right choice is (b) TaskTracker Best explanation: A heartbeat is sent from the TaskTracker to the JobTracker every few seconds so that the JobTracker knows whether the node is dead or alive. |
|
| 136. |
How many types of nodes are present in Storm cluster?(a) 1(b) 2(c) 3(d) 4 |
|
Answer» Right answer is (c) 3 To explain: A Storm cluster has three sets of nodes: Nimbus, ZooKeeper, and Supervisor nodes. |
|
| 137. |
All decision nodes must have a _____________ element to avoid bringing the workflow into an error state if none of the predicates evaluates to true.(a) name(b) default(c) server(d) client |
|
Answer» Correct option is (b) default The explanation: The default element indicates the transition to take if none of the predicates evaluates to true. |
|
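The decision-node rule above can be made concrete with a small workflow fragment, sketched here against the standard Oozie XML syntax; the node names, transition targets, and the size-threshold predicate are invented for illustration.

```xml
<!-- Sketch of an Oozie decision node. The <default> transition is taken
     when no <case> predicate evaluates to true, preventing an error state. -->
<decision name="size-check">
  <switch>
    <case to="big-job">${fs:fileSize(inputDir) gt 1073741824}</case>
    <default to="small-job"/>
  </switch>
</decision>
```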
| 138. |
Apache Knox Eliminates _______ edge node risks.(a) SSL(b) SSO(c) SSH(d) All of the mentioned |
|
Answer» Right answer is (c) SSH The explanation: Knox eliminates SSH edge node risks and hides the cluster’s network topology. |
|
| 139. |
__________ Manager’s Service feature monitors dozens of service health and performance metrics about the services and role instances running on your cluster.(a) Microsoft(b) Cloudera(c) Amazon(d) None of the mentioned |
|
Answer» Right choice is (b) Cloudera Explanation: Manager’s Service feature presents health and performance data in a variety of formats. |
|
| 140. |
Point out the wrong statement.(a) Cassandra supplies linear scalability, meaning that capacity may be easily added simply by adding new nodes online(b) Cassandra 2.0 included major enhancements to CQL, security, and performance(c) CQL for Cassandra 2.0.6 adds several important features including batching of conditional updates, static columns, and increased control over slicing of clustering columns(d) None of the Mentioned |
|
Answer» The correct answer is (d) None of the Mentioned The explanation is: Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. |
|
| 141. |
Knox provides perimeter _________ for Hadoop clusters.(a) reliability(b) security(c) flexibility(d) fault tolerant |
|
Answer» Right answer is (b) security Easy explanation: Knox provides perimeter security at the cluster edge, sparing clients the client-side library and complex configuration that Kerberos otherwise requires. |
|
| 142. |
Knox integrates with prevalent identity management and _______ systems.(a) SSL(b) SSO(c) SSH(d) Kerberos |
|
Answer» Correct choice is (b) SSO Best explanation: Knox allows identities from those enterprise systems to be used for seamless, secure access to Hadoop clusters. |
|
| 143. |
For enabling streaming data to _________, the Chukwa collector writer class can be configured in chukwa-collector-conf.xml.(a) HCatalog(b) HBase(c) Hive(d) All of the mentioned |
|
Answer» The correct choice is (b) HBase Explanation: In this mode, the filesystem to write to is determined by the option writer.hdfs.filesystem in chukwa-collector-conf.xml. |
|
| 144. |
Point out the correct statement.(a) Knox is a stateless reverse proxy framework(b) Knox also intercepts REST/HTTP calls and provides authentication(c) Knox scales linearly by adding more Knox nodes as the load increases(d) All of the mentioned |
|
Answer» Correct answer is (d) All of the mentioned Easiest explanation: Knox can be deployed as a cluster of Knox instances that route requests to Hadoop’s REST APIs. |
|
| 145. |
A _______________ action can be configured to perform file system cleanup and directory creation before starting the mapreduce job.(a) map(b) reduce(c) map-reduce(d) none of the mentioned |
|
Answer» Right choice is (c) map-reduce For explanation I would say: The map-reduce action starts a Hadoop map/reduce job from a workflow. |
|
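The cleanup-and-create behaviour described above is configured through the map-reduce action's prepare block. A sketch against the standard Oozie action syntax follows; the action name, paths, and mapper class are invented, and `${jobTracker}`/`${nameNode}` are the usual workflow parameters.

```xml
<!-- Sketch of an Oozie map-reduce action: the <prepare> element deletes and
     creates directories before the Hadoop job starts, making reruns safe. -->
<action name="my-mr-job">
  <map-reduce>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <prepare>
      <delete path="${nameNode}/user/out"/>
      <mkdir path="${nameNode}/user/tmp"/>
    </prepare>
    <configuration>
      <property>
        <name>mapred.mapper.class</name>
        <value>com.example.MyMapper</value>
      </property>
    </configuration>
  </map-reduce>
  <ok to="end"/>
  <error to="fail"/>
</action>
```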
| 146. |
The minimum number of row versions to keep is configured per column family via _____________(a) HBaseDecriptor(b) HTabDescriptor(c) HColumnDescriptor(d) All of the mentioned |
|
Answer» Correct choice is (c) HColumnDescriptor Explanation: The minimum number of row versions parameter is used together with the time-to-live parameter and can be combined with the number of row versions parameter. |
|
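The interaction between minimum row versions and time-to-live described above can be sketched in plain Java: expired versions are dropped, but at least the configured minimum survives regardless of age. This mimics the semantics HBase configures via HColumnDescriptor; the data model and names below are invented for illustration and use no HBase API.

```java
import java.util.*;

// Toy sketch of HBase-style version retention: drop cell versions older
// than the TTL, but always keep at least `minVersions` of them.
public class VersionRetention {
    /** timestamps must be sorted newest-first; returns the versions kept. */
    public static List<Long> retain(List<Long> timestamps, int minVersions,
                                    long ttlMillis, long now) {
        List<Long> kept = new ArrayList<>();
        for (long ts : timestamps) {
            boolean fresh = now - ts <= ttlMillis;
            if (fresh || kept.size() < minVersions) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long now = 100_000;
        // A 10-second TTL expires the two older versions, but minVersions=2
        // forces the second-newest to survive anyway.
        System.out.println(retain(Arrays.asList(99_000L, 80_000L, 50_000L),
                                  2, 10_000, now));  // [99000, 80000]
    }
}
```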
| 147. |
The BatchEE project aims to provide an _________ (aka JSR 352) implementation.(a) JBat(b) JBatch(c) JBash(d) None of the mentioned |
|
Answer» Right choice is (b) JBatch The explanation: BatchEE provides a set of useful extensions for this specification. |
|
| 148. |
A __________ determines which data centers and racks nodes belong to.(a) Client requests(b) Snitch(c) Partitioner(d) None of the mentioned |
|
Answer» Right option is (b) Snitch The best I can explain: Client read or write requests can be sent to any node in the cluster because all nodes in Cassandra are peers. |
|
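Since all Cassandra nodes are peers, what decides which node owns a row is the partitioner's hash ring: each node owns a token, and a key belongs to the first node whose token is at or past the key's hash, wrapping around. A toy ring in plain Java is below; the node names and the use of `String.hashCode` are invented for illustration (Cassandra's real partitioners, such as Murmur3Partitioner, use stronger hashes but the same ring principle).

```java
import java.util.*;

// Toy sketch of a partitioner's token ring (not the Cassandra API).
public class TokenRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String name, int token) { ring.put(token, name); }

    public String nodeFor(String key) {
        int h = key.hashCode() & Integer.MAX_VALUE;        // non-negative hash
        Map.Entry<Integer, String> e = ring.ceilingEntry(h);
        return (e != null ? e : ring.firstEntry()).getValue();  // wrap around
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        // Three nodes splitting the positive-int token space evenly.
        ring.addNode("node-a", Integer.MAX_VALUE / 3);
        ring.addNode("node-b", 2 * (Integer.MAX_VALUE / 3));
        ring.addNode("node-c", Integer.MAX_VALUE);
        System.out.println(ring.nodeFor("user:42"));
    }
}
```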
| 149. |
A __________ can route requests to multiple Knox instances.(a) collector(b) load balancer(c) comparator(d) all of the mentioned |
|
Answer» Correct choice is (b) load balancer The explanation is: Knox is a stateless reverse proxy framework. |
|
| 150. |
Apache Cassandra is a massively scalable open source _______ database.(a) SQL(b) NoSQL(c) NewSQL(d) All of the mentioned |
|
Answer» Correct answer is (b) NoSQL For explanation: Cassandra is perfect for managing large amounts of data across multiple data centers and the cloud. |
|