Explore topic-wise InterviewSolutions below.

This section includes InterviewSolutions, each offering curated multiple-choice questions to sharpen your knowledge and support exam preparation. Choose a topic below to get started.

1.

________ is a multi-threaded server using standard blocking I/O.
(a) TNonblockingServer
(b) TThreadPoolServer
(c) TSimpleServer
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (b) TThreadPoolServer

Explanation: TThreadPoolServer uses standard blocking I/O and services each accepted connection on a worker thread drawn from a pool.
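
To make the server types above concrete, here is a minimal sketch of standing up a TThreadPoolServer in Java. The Calculator service and CalculatorHandler class are hypothetical names standing in for code the Thrift compiler would generate from a .thrift file; only the Thrift library calls themselves are real.

```java
// Minimal sketch, assuming a generated service "Calculator" and a user-written
// handler "CalculatorHandler" (both hypothetical names from a .thrift file).
import org.apache.thrift.server.TServer;
import org.apache.thrift.server.TThreadPoolServer;
import org.apache.thrift.transport.TServerSocket;
import org.apache.thrift.transport.TTransportException;

public class ThreadPoolServerSketch {
  public static void main(String[] args) throws TTransportException {
    // Blocking server socket; each accepted connection is handled by a
    // worker thread from the server's internal pool.
    TServerSocket serverTransport = new TServerSocket(9090);

    // The generated Processor wraps the user-written handler (hypothetical names).
    Calculator.Processor<CalculatorHandler> processor =
        new Calculator.Processor<>(new CalculatorHandler());

    TServer server = new TThreadPoolServer(
        new TThreadPoolServer.Args(serverTransport).processor(processor));
    server.serve();  // blocks, servicing clients with standard blocking I/O
  }
}
```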

2.

Which of the following performs compression using zlib?
(a) TZlibTransport
(b) TFramedTransport
(c) TMemoryTransport
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (a) TZlibTransport

Explanation: TZlibTransport performs compression using zlib and is used in conjunction with another transport. It is not available in the Java implementation.
3.

__________ is a single-threaded server using standard blocking I/O.
(a) TNonblockingServer
(b) TSimpleServer
(c) TSocket
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (b) TSimpleServer

Explanation: TSimpleServer is useful for testing.

4.

Which of the following is a multi-threaded server using non-blocking I/O?
(a) TNonblockingServer
(b) TSimpleServer
(c) TSocket
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (a) TNonblockingServer

Explanation: The Java implementation uses NIO channels.

5.

________ uses blocking socket I/O for transport.
(a) TNonblockingServer
(b) TSimpleServer
(c) TSocket
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (c) TSocket

Explanation: TSocket is the blocking transport built on standard TCP sockets.
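
On the client side, the blocking transport shows up as TSocket. The following sketch assumes the same hypothetical generated Calculator service used above; the host, port, and protocol choice are illustrative.

```java
// Minimal client-side sketch; TSocket supplies blocking socket I/O as the transport.
// Calculator.Client is a hypothetical generated stub.
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class BlockingClientSketch {
  public static void main(String[] args) throws Exception {
    TTransport transport = new TSocket("localhost", 9090); // blocking TCP socket
    transport.open();
    Calculator.Client client =
        new Calculator.Client(new TBinaryProtocol(transport)); // hypothetical generated client
    // ... issue RPCs via the generated client stub ...
    transport.close();
  }
}
```
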
6.

Point out the wrong statement.
(a) There are no XML configuration files in Thrift
(b) Thrift gives cross-language serialization with lower overhead than alternatives such as SOAP due to its use of a binary format
(c) "No framework to code" is a feature of Thrift
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (d) None of the mentioned

Explanation: Thrift has no build dependencies or non-standard software, and no mix of incompatible software licenses.

7.

__________ uses memory for I/O in Thrift.
(a) TZlibTransport
(b) TFramedTransport
(c) TMemoryTransport
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (c) TMemoryTransport

Explanation: The Java implementation uses a simple ByteArrayOutputStream internally.

8.

Point out the correct statement.
(a) To create a Mahout service, one has to write Thrift files that describe it, generate the code in the destination language
(b) Thrift is written in Java
(c) Thrift is a lean and clean library
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (c) Thrift is a lean and clean library

Explanation: The predefined serialization styles include binary, HTTP-friendly, and compact binary.

9.

_______ transport is required when using a non-blocking server.
(a) TZlibTransport
(b) TFramedTransport
(c) TMemoryTransport
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (b) TFramedTransport

Explanation: TFramedTransport sends data in frames, where each frame is preceded by length information.
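
A hedged sketch of wiring TFramedTransport into a non-blocking server follows. It assumes a classic Thrift release where TFramedTransport lives in org.apache.thrift.transport, and reuses the hypothetical generated Calculator service from the earlier sketches.

```java
// Sketch of a non-blocking server; the framed transport supplies the length
// prefix that the NIO selector loop needs to delimit messages.
import org.apache.thrift.server.TNonblockingServer;
import org.apache.thrift.server.TServer;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TNonblockingServerSocket;

public class NonblockingServerSketch {
  public static void main(String[] args) throws Exception {
    TNonblockingServerSocket socket = new TNonblockingServerSocket(9090);
    TServer server = new TNonblockingServer(
        new TNonblockingServer.Args(socket)
            .processor(new Calculator.Processor<>(new CalculatorHandler())) // hypothetical generated names
            .transportFactory(new TFramedTransport.Factory()));
    server.serve();  // single selector thread driving non-blocking I/O
  }
}
```
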
10.

Which of the following uses JSON for encoding of data?
(a) TCompactProtocol
(b) TDenseProtocol
(c) TBinaryProtocol
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (d) None of the mentioned

Explanation: TJSONProtocol uses JSON for encoding of data.

11.

________ is a write-only protocol that cannot be parsed by Thrift.
(a) TCompactProtocol
(b) TDenseProtocol
(c) TBinaryProtocol
(d) TSimpleJSONProtocol

Topic: Thrift with Hadoop

Answer» (d) TSimpleJSONProtocol

Explanation: TSimpleJSONProtocol drops metadata and writes plain JSON, so its output is suitable for parsing by scripting languages but cannot be read back by Thrift.

12.

Which of the following formats is similar to TCompactProtocol?
(a) TCompactProtocol
(b) TDenseProtocol
(c) TBinaryProtocol
(d) TSimpleJSONProtocol

Topic: Thrift with Hadoop

Answer» (b) TDenseProtocol

Explanation: TDenseProtocol is similar to TCompactProtocol, but strips off the meta information from what is transmitted.

13.

Which of the following is a more compact binary format?
(a) TCompactProtocol
(b) TDenseProtocol
(c) TBinaryProtocol
(d) TSimpleJSONProtocol

Topic: Thrift with Hadoop

Answer» (a) TCompactProtocol

Explanation: TCompactProtocol is typically more efficient to process as well.
14.

Point out the wrong statement.
(a) With Thrift, it is not possible to define a service and change the protocol and transport without recompiling the code
(b) Thrift includes server infrastructure to tie protocols and transports together, like blocking, non-blocking, and multi-threaded servers
(c) Thrift supports a number of protocols for service definition
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (d) None of the mentioned

Explanation: The underlying I/O part of the stack is implemented differently for different languages.

15.

Which of the following is a straightforward binary format?
(a) TCompactProtocol
(b) TDenseProtocol
(c) TBinaryProtocol
(d) TSimpleJSONProtocol

Topic: Thrift with Hadoop

Answer» (c) TBinaryProtocol

Explanation: TBinaryProtocol is a straightforward binary format that is not optimized for space efficiency.
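
The protocol questions above all come down to which encoding is layered over a transport. The sketch below writes nothing useful; it simply shows that TBinaryProtocol, TCompactProtocol, and TJSONProtocol are interchangeable objects constructed over the same transport (TMemoryBuffer is used here only to keep the example self-contained).

```java
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TCompactProtocol;
import org.apache.thrift.protocol.TJSONProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TMemoryBuffer;

public class ProtocolChoiceSketch {
  public static void main(String[] args) throws Exception {
    // TMemoryBuffer keeps the bytes in memory, which makes it easy to compare encodings.
    TMemoryBuffer buffer = new TMemoryBuffer(1024);
    TProtocol binary  = new TBinaryProtocol(buffer);   // straightforward binary format
    TProtocol compact = new TCompactProtocol(buffer);  // more compact binary format
    TProtocol json    = new TJSONProtocol(buffer);     // JSON text encoding
    // Generated Thrift structs expose write(TProtocol)/read(TProtocol); whichever
    // protocol is handed in determines the bytes that land in the buffer.
  }
}
```
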
16.

__________ is used as a remote procedure call (RPC) framework at Facebook.
(a) Oozie
(b) Mahout
(c) Thrift
(d) Impala

Topic: Thrift with Hadoop

Answer» (c) Thrift

Explanation: In contrast to built-in types, user-defined data structures are sent via generated code.

17.

Point out the correct statement.
(a) Thrift is developed for scalable cross-language services development
(b) Thrift includes a complete stack for creating clients and servers
(c) The top part of the Thrift stack is generated code from the Thrift definition
(d) All of the mentioned

Topic: Thrift with Hadoop

Answer» (d) All of the mentioned

Explanation: Client and processor code for the services is generated from the Thrift definition file.

18.

Which of the following projects is an interface definition language for Hadoop?
(a) Oozie
(b) Mahout
(c) Thrift
(d) Impala

Topic: Thrift with Hadoop

Answer» (c) Thrift

Explanation: Thrift is an interface definition language and binary communication protocol that is used to define and create services for numerous languages.

19.

The ______________ class defines a configuration parameter named LINES_PER_MAP that controls how the input file is split.
(a) NLineInputFormat
(b) InputLineFormat
(c) LineInputFormat
(d) None of the mentioned

Topic: Crunch with Hadoop

Answer» (a) NLineInputFormat

Explanation: The value of this parameter can be set via the Source interface's inputConf() method.
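
A sketch of how this might look in a Crunch program is shown below. The From.formattedFile overload and the inputConf call reflect my reading of the Crunch Source API and should be treated as assumptions; the input path and the value "2" are placeholders.

```java
// Hedged sketch: configure NLineInputFormat's LINES_PER_MAP on a Crunch Source.
import org.apache.crunch.TableSource;
import org.apache.crunch.io.From;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class NLineSourceSketch {
  public static void main(String[] args) {
    TableSource<LongWritable, Text> source = From.formattedFile(
        "/path/to/input", NLineInputFormat.class, LongWritable.class, Text.class);
    // Ask for two input lines per map task (the value is illustrative).
    source.inputConf(NLineInputFormat.LINES_PER_MAP, "2");
  }
}
```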

20.

The Avros class also has a _____ method for creating PTypes for POJOs using Avro's reflection-based serialization mechanism.
(a) spot
(b) reflects
(c) gets
(d) all of the mentioned

Topic: Crunch with Hadoop

Answer» (b) reflects

Explanation: There are a couple of restrictions on the structure of the POJO.
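
A minimal sketch of Avros.reflects is shown below; WeatherRecord is a hypothetical POJO, and the no-arg constructor illustrates one of the restrictions mentioned in the explanation.

```java
import org.apache.crunch.types.PType;
import org.apache.crunch.types.avro.Avros;

public class ReflectPTypeSketch {
  // Hypothetical POJO: its fields must be handled by Avro reflection and a
  // no-arg constructor is required.
  public static class WeatherRecord {
    private int year;
    private int temperature;
    private String stationId;
    public WeatherRecord() { }  // required no-arg constructor
  }

  public static void main(String[] args) {
    PType<WeatherRecord> type = Avros.reflects(WeatherRecord.class);
    // The PType can now be passed to parallelDo() calls that emit WeatherRecord values.
    System.out.println(type.getTypeClass());
  }
}
```
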
21.

DoFns provide direct access to the __________ object that is used within a given Map or Reduce task via the getContext method.
(a) TaskInputContext
(b) TaskInputOutputContext
(c) TaskOutputContext
(d) All of the mentioned

Topic: Crunch with Hadoop

Answer» (b) TaskInputOutputContext

Explanation: There are also a number of helper methods for working with the objects associated with the TaskInputOutputContext.
22.

The top-level ___________ package contains three of the most important specializations in Crunch.
(a) org.apache.scrunch
(b) org.apache.crunch
(c) org.apache.kcrunch
(d) all of the mentioned

Topic: Crunch with Hadoop

Answer» (b) org.apache.crunch

Explanation: Each of these specialized DoFn implementations has associated methods on the PCollection, PTable, and PGroupedTable interfaces to support common data processing steps.

23.

Point out the wrong statement.
(a) DoFns also have a number of helper methods for working with Hadoop Counters, all named increment
(b) The Crunch APIs contain a number of useful subclasses of DoFn that handle common data processing scenarios and are easier to write and test
(c) FilterFn class defines a single abstract method
(d) None of the mentioned

Topic: Crunch with Hadoop

Answer» (d) None of the mentioned

Explanation: Counters are an incredibly useful way of keeping track of the state of long-running data pipelines and detecting any exceptional conditions that occur during processing.
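The sketch below ties together two of the statements above: FilterFn's single abstract accept() method, and the increment() helpers DoFns provide for Hadoop Counters. The counter group and name are illustrative.

```java
import org.apache.crunch.FilterFn;

// FilterFn is a DoFn specialization with a single abstract method, accept().
public class NonBlankFilter extends FilterFn<String> {
  @Override
  public boolean accept(String input) {
    boolean keep = input != null && !input.trim().isEmpty();
    if (!keep) {
      // increment() is one of the DoFn helper methods for Hadoop Counters.
      increment("quality", "blank-lines");
    }
    return keep;
  }
}
```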

24.

The inline DoFn that splits a line up into words is an inner class of ____________
(a) Pipeline
(b) MyPipeline
(c) ReadPipeline
(d) WritePipe

Topic: Crunch with Hadoop

Answer» (b) MyPipeline

Explanation: Inner classes contain references to their parent outer classes, so unless MyPipeline implements the Serializable interface, a NotSerializableException will be thrown when Crunch tries to serialize the inner DoFn.
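A common way to avoid the NotSerializableException described above is to make the DoFn a static nested (or top-level) class, so it holds no reference to the enclosing pipeline class. A sketch with illustrative names:

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;

public class MyPipeline {
  // Static nested class: carries no hidden reference to a MyPipeline instance,
  // so Java serialization of the pipeline definition does not drag MyPipeline along.
  public static class SplitWordsFn extends DoFn<String, String> {
    @Override
    public void process(String line, Emitter<String> emitter) {
      for (String word : line.split("\\s+")) {
        emitter.emit(word);
      }
    }
  }
}
```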

25.

Crunch uses Java serialization to serialize the contents of all of the ______ in a pipeline definition.
(a) Transient
(b) DoFns
(c) Configuration
(d) All of the mentioned

Topic: Crunch with Hadoop

Answer» (b) DoFns

Explanation: Each DoFn in a pipeline definition must be serializable with Java serialization so that it can be shipped from the client to the tasks that execute it.

26.

Point out the correct statement.
(a) StreamPipeline executes the pipeline in-memory on the client
(b) MemPipeline executes the pipeline by converting it to a series of Spark pipelines
(c) MapReduce framework approach makes it easy for the framework to serialize data from the client to the cluster
(d) All of the mentioned

Topic: Crunch with Hadoop

Answer» (c) MapReduce framework approach makes it easy for the framework to serialize data from the client to the cluster

Explanation: It is SparkPipeline that executes the pipeline by converting it to a series of Spark pipelines, and MemPipeline that executes the pipeline in-memory on the client.

27.

PCollection, PTable, and PGroupedTable all support a __________ operation.
(a) intersection
(b) union
(c) OR
(d) None of the mentioned

Topic: Crunch with Hadoop

Answer» (b) union

Explanation: The union operation takes a series of distinct PCollections that all have the same data type and treats them as a single virtual PCollection.
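
A minimal union() sketch follows; both inputs must share the same PType, and the paths are placeholders.

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;

public class UnionSketch {
  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(UnionSketch.class);
    PCollection<String> a = pipeline.readTextFile("/path/a");
    PCollection<String> b = pipeline.readTextFile("/path/b");
    PCollection<String> both = a.union(b);  // treated as a single virtual PCollection
    pipeline.writeTextFile(both, "/path/out");
    pipeline.done();
  }
}
```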

28.

___________ executes the pipeline as a series of MapReduce jobs.
(a) SparkPipeline
(b) MRPipeline
(c) MemPipeline
(d) None of the mentioned

Topic: Crunch with Hadoop

Answer» (b) MRPipeline

Explanation: Every Crunch data pipeline is coordinated by an instance of the Pipeline interface.
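
The sketch below shows an MRPipeline being used end to end for a simple word count; when done() is called, the Crunch planner compiles the pipeline into one or more MapReduce jobs. The SplitFn class and all paths are illustrative.

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.avro.Avros;

public class WordCountSketch {
  // Static nested DoFn, in line with the serialization advice earlier in this section.
  public static class SplitFn extends DoFn<String, String> {
    @Override
    public void process(String line, Emitter<String> emitter) {
      for (String word : line.split("\\s+")) {
        emitter.emit(word);
      }
    }
  }

  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(WordCountSketch.class);
    PCollection<String> lines = pipeline.readTextFile("/path/to/input");
    PCollection<String> words = lines.parallelDo(new SplitFn(), Avros.strings());
    PTable<String, Long> counts = words.count();  // groups and counts per word
    pipeline.writeTextFile(counts, "/path/to/output");
    pipeline.done();  // the planner turns the pipeline into MapReduce job(s) and runs them
  }
}
```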

29.

A __________ represents a distributed, immutable collection of elements of type T.
(a) PCollect
(b) PCollection
(c) PCol
(d) All of the mentioned

Topic: Crunch with Hadoop

Answer» (b) PCollection

Explanation: PCollection provides a method, parallelDo, that applies a DoFn to each element in the PCollection.
30.

Hive, Pig, and Cascading all use a _________ data model.
(a) value centric
(b) columnar
(c) tuple-centric
(d) none of the mentioned

Topic: Crunch with Hadoop

Answer» (c) tuple-centric

Explanation: Crunch allows developers considerable flexibility in how they represent their data, which makes Crunch the best pipeline platform for developers.

31.

Point out the wrong statement.
(a) A Crunch pipeline written by the development team sessionizes a set of user logs; the sessions it generates are then processed by a diverse collection of Pig scripts and Hive queries
(b) Crunch pipelines provide a thin veneer on top of MapReduce
(c) Developers have access to low-level MapReduce APIs
(d) None of the mentioned

Topic: Crunch with Hadoop

Answer» (d) None of the mentioned

Explanation: Crunch is extremely fast, only slightly slower than a hand-tuned pipeline developed with the MapReduce APIs.

32.

The Crunch APIs are modeled after _________, which is the library that Google uses for building data pipelines on top of their own implementation of MapReduce.
(a) FlagJava
(b) FlumeJava
(c) FlakeJava
(d) All of the mentioned

Topic: Crunch with Hadoop

Answer» (b) FlumeJava

Explanation: The Apache Crunch project develops and supports Java APIs that simplify the process of creating data pipelines on top of Apache Hadoop.

33.

Crunch was designed for developers who understand __________ and want to use MapReduce effectively.
(a) Java
(b) Python
(c) Scala
(d) Javascript

Topic: Crunch with Hadoop

Answer» (a) Java

Explanation: Crunch is often used in conjunction with Hive and Pig.
34.

For Scala users, there is the __________ API, which is built on top of the Java APIs.
(a) Prunch
(b) Scrunch
(c) Hivench
(d) All of the mentioned

Topic: Crunch with Hadoop

Answer» (b) Scrunch

Explanation: It includes a REPL (read-eval-print loop) for creating MapReduce pipelines.
35.

Point out the correct statement.
(a) Scrunch's Java API is centered around three interfaces that represent distributed datasets
(b) All of the other data transformation operations supported by the Crunch APIs are implemented in terms of three primitives
(c) A number of common Aggregator implementations are provided in the Aggregators class
(d) All of the mentioned

Topic: Crunch with Hadoop

Answer» (c) A number of common Aggregator implementations are provided in the Aggregators class

Explanation: PGroupedTable provides a combineValues operation that allows a commutative and associative Aggregator to be applied to the values of the PGroupedTable instance on both the map and reduce sides of the shuffle.
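
A short sketch of combineValues with one of the built-in Aggregators follows; the PTable<String, Long> argument is assumed to come from elsewhere in a pipeline.

```java
import org.apache.crunch.PGroupedTable;
import org.apache.crunch.PTable;
import org.apache.crunch.fn.Aggregators;

public class CombineSketch {
  public static PTable<String, Long> sumPerKey(PTable<String, Long> counts) {
    PGroupedTable<String, Long> grouped = counts.groupByKey();
    // SUM_LONGS is commutative and associative, so Crunch can apply it on both
    // the map side (as a combiner) and the reduce side of the shuffle.
    return grouped.combineValues(Aggregators.SUM_LONGS());
  }
}
```
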
36.

The Apache Crunch Java library provides a framework for writing, testing, and running ___________ pipelines.
(a) MapReduce
(b) Pig
(c) Hive
(d) None of the mentioned

Topic: Crunch with Hadoop

Answer» (a) MapReduce

Explanation: The goal of Crunch is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run.

37.

Drill provides a __________-like internal data model to represent and process data.
(a) XML
(b) JSON
(c) TIFF
(d) None of the mentioned

Topic: Drill with Hadoop

Answer» (b) JSON

Explanation: The flexibility of the JSON data model allows Drill to query, without flattening, both simple and complex/nested data types as well as the constantly changing application-driven schemas commonly seen with Hadoop/NoSQL applications.

38.

Apache _________ provides direct queries on self-describing and semi-structured data in files.
(a) Drill
(b) Mahout
(c) Oozie
(d) All of the mentioned

Topic: Drill with Hadoop

Answer» (a) Drill

Explanation: Users can explore live data on their own as it arrives, versus spending weeks or months on data preparation, modeling, ETL and subsequent schema management.

39.

Drill analyzes semi-structured/nested data coming from _________ applications.
(a) RDBMS
(b) NoSQL
(c) NewSQL
(d) None of the mentioned

Topic: Drill with Hadoop

Answer» (b) NoSQL

Explanation: Modern big data applications such as social, mobile, web and IoT deal with a larger number of users and a larger amount of data than traditional transactional applications.

40.

Drill integrates with BI tools using a standard __________ connector.
(a) JDBC
(b) ODBC
(c) ODBC-JDBC
(d) All of the mentioned

Topic: Drill with Hadoop

Answer» (b) ODBC

Explanation: Drill conforms to the stringent ANSI SQL standards, ensuring compatibility with existing BI environments as well as Hive deployments.
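
ODBC is the connector usually used from BI tools, but Drill also ships a JDBC driver, which is easier to show from Java. The sketch below assumes the Drill JDBC driver is on the classpath and a local/embedded drillbit is reachable via the zk=local URL; the JSON path and column name are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DrillJdbcSketch {
  public static void main(String[] args) throws Exception {
    // "zk=local" points the driver at an embedded/local Drill instance (assumption).
    try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
         Statement stmt = conn.createStatement();
         // Drill queries the JSON file directly; no schema definition or ETL step.
         ResultSet rs = stmt.executeQuery(
             "SELECT t.name FROM dfs.`/tmp/example.json` t LIMIT 5")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}
```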

41.

MapR's __________ solution earned the highest score in the Gigaom Research Data Warehouse Interoperability Report.
(a) SQL-on-Hadoop
(b) Hive-on-Hadoop
(c) Pig-on-Hadoop
(d) All of the mentioned

Topic: Drill with Hadoop

Answer» (a) SQL-on-Hadoop

Explanation: Drill is a pioneer in delivering self-service data exploration capabilities on data stored in multiple formats in files or NoSQL databases.

42.

Point out the wrong statement.
(a) Hadoop is a prerequisite for Drill
(b) Drill tackles rapidly evolving application-driven schemas and nested data structures
(c) Drill provides a single interface for structured and semi-structured data, allowing you to readily query JSON files and HBase tables as easily as a relational table
(d) All of the mentioned

Topic: Drill with Hadoop

Answer» (a) Hadoop is a prerequisite for Drill

Explanation: Hadoop is not a prerequisite for Drill; users can start ramping up with Drill by running SQL queries directly on the local file system.

43.

___________ includes Apache Drill as part of the Hadoop distribution.
(a) Impala
(b) MapR
(c) Oozie
(d) All of the mentioned

Topic: Drill with Hadoop

Answer» (b) MapR

Explanation: The MapR Sandbox with Apache Drill is a fully functional single-node cluster that can be used to get an overview of Apache Drill in a Hadoop environment.

44.

Drill is designed from the ground up to support high-performance analysis on ____________ data.
(a) semi-structured
(b) structured
(c) unstructured
(d) none of the mentioned

Topic: Drill with Hadoop

Answer» (a) semi-structured

Explanation: Drill is an Apache open-source SQL query engine for Big Data exploration.

45.

Point out the correct statement.
(a) Drill provides plug-and-play integration with existing Apache Hive
(b) Developers can use the sandbox environment to get a feel for the power and capabilities of Apache Drill by performing various types of queries
(c) Drill is inspired by Google Dremel
(d) None of the mentioned

Topic: Drill with Hadoop

Answer» (d) None of the mentioned

Explanation: Apache Drill is an open source, low latency SQL query engine for Hadoop and NoSQL.

46.

The _________ collocation identifier is integrated into the process that is used to create vectors from sequence files of text keys and values.
(a) lbr
(b) lcr
(c) llr
(d) lar

Topic: Mahout with Hadoop

Answer» (c) llr

Explanation: The --minLLR option can be used to control the cutoff that prevents collocations below the specified LLR score from being emitted.

47.

A key of type ___________ is generated which is used later to join ngrams with their heads and tails in the reducer phase.
(a) GramKey
(b) Primary
(c) Secondary
(d) None of the mentioned

Topic: Mahout with Hadoop

Answer» (a) GramKey

Explanation: The GramKey is a composite key made up of a string n-gram fragment as the primary key and a secondary key used for grouping and sorting in the reduce phase.

48.

____________ generates NGrams and counts frequencies for ngrams, head and tail subgrams.
(a) CollocationDriver
(b) CollocDriver
(c) CarDriver
(d) All of the mentioned

Topic: Mahout with Hadoop

Answer» (b) CollocDriver

Explanation: Each call to the mapper passes in the full set of tokens for the corresponding document using a StringTuple.

49.

The tokens are passed through a Lucene ____________ to produce NGrams of the desired length.
(a) ShngleFil
(b) ShingleFilter
(c) SingleFilter
(d) Collfilter

Topic: Mahout with Hadoop

Answer» (b) ShingleFilter

Explanation: The tools that the collocation identification algorithm is embedded within either consume tokenized text as input or allow an implementation of the Lucene Analyzer class to be specified to perform tokenization, in order to form n-grams.

50.

Point out the wrong statement.
(a) ‘Taste’ collaborative-filtering recommender component of Mahout was originally a separate project and can run standalone without Hadoop
(b) Integration of Mahout with initiatives such as the Pregel-like Giraph are actively under discussion
(c) Calculating the LLR is very straightforward
(d) None of the mentioned

Topic: Mahout with Hadoop

Answer» (d) None of the mentioned

Explanation: There are a couple of ways to run the LLR-based collocation algorithm in Mahout.