Explore topic-wise InterviewSolutions below.

This section includes InterviewSolutions, each offering curated multiple-choice questions to sharpen your knowledge and support exam preparation. Choose a topic below to get started.

1.

________ is a multi-threaded server using standard blocking I/O.
(a) TNonblockingServer
(b) TThreadPoolServer
(c) TSimpleServer
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (b) TThreadPoolServer

Explanation: TThreadPoolServer uses standard blocking I/O and services each accepted connection on a worker thread drawn from a pool.
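
To make the server types above concrete, here is a minimal sketch of standing up a TThreadPoolServer in Java. The Calculator service and CalculatorHandler class are hypothetical names standing in for code the Thrift compiler would generate from a .thrift file; only the Thrift library calls themselves are real.

```java
// Minimal sketch, assuming a generated service "Calculator" and a user-written
// handler "CalculatorHandler" (both hypothetical names from a .thrift file).
import org.apache.thrift.server.TServer;
import org.apache.thrift.server.TThreadPoolServer;
import org.apache.thrift.transport.TServerSocket;
import org.apache.thrift.transport.TTransportException;

public class ThreadPoolServerSketch {
  public static void main(String[] args) throws TTransportException {
    // Blocking server socket; each accepted connection is handled by a
    // worker thread from the server's internal pool.
    TServerSocket serverTransport = new TServerSocket(9090);

    // The generated Processor wraps the user-written handler (hypothetical names).
    Calculator.Processor<CalculatorHandler> processor =
        new Calculator.Processor<>(new CalculatorHandler());

    TServer server = new TThreadPoolServer(
        new TThreadPoolServer.Args(serverTransport).processor(processor));
    server.serve();  // blocks, servicing clients with standard blocking I/O
  }
}
```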

2.

Which of the following performs compression using zlib?
(a) TZlibTransport
(b) TFramedTransport
(c) TMemoryTransport
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (a) TZlibTransport

Explanation: TZlibTransport performs compression using zlib and is used in conjunction with another transport. It is not available in the Java implementation.
3.

__________ is a single-threaded server using standard blocking I/O.
(a) TNonblockingServer
(b) TSimpleServer
(c) TSocket
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (b) TSimpleServer

Explanation: TSimpleServer is useful for testing.

4.

Which of the following is a multi-threaded server using non-blocking I/O?
(a) TNonblockingServer
(b) TSimpleServer
(c) TSocket
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (a) TNonblockingServer

Explanation: The Java implementation uses NIO channels.

5.

________ uses blocking socket I/O for transport.
(a) TNonblockingServer
(b) TSimpleServer
(c) TSocket
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (c) TSocket

Explanation: TSocket is the blocking transport built on standard TCP sockets.
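
On the client side, the blocking transport shows up as TSocket. The following sketch assumes the same hypothetical generated Calculator service used above; the host, port, and protocol choice are illustrative.

```java
// Minimal client-side sketch; TSocket supplies blocking socket I/O as the transport.
// Calculator.Client is a hypothetical generated stub.
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class BlockingClientSketch {
  public static void main(String[] args) throws Exception {
    TTransport transport = new TSocket("localhost", 9090); // blocking TCP socket
    transport.open();
    Calculator.Client client =
        new Calculator.Client(new TBinaryProtocol(transport)); // hypothetical generated client
    // ... issue RPCs via the generated client stub ...
    transport.close();
  }
}
```
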
6.

Point out the wrong statement.
(a) There are no XML configuration files in Thrift
(b) Thrift gives cross-language serialization with lower overhead than alternatives such as SOAP due to its use of a binary format
(c) "No framework to code" is a feature of Thrift
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (d) None of the mentioned

Explanation: Thrift has no build dependencies or non-standard software, and no mix of incompatible software licenses.

7.

__________ uses memory for I/O in Thrift.
(a) TZlibTransport
(b) TFramedTransport
(c) TMemoryTransport
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (c) TMemoryTransport

Explanation: The Java implementation uses a simple ByteArrayOutputStream internally.

8.

Point out the correct statement.
(a) To create a Mahout service, one has to write Thrift files that describe it, generate the code in the destination language
(b) Thrift is written in Java
(c) Thrift is a lean and clean library
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (c) Thrift is a lean and clean library

Explanation: The predefined serialization styles include binary, HTTP-friendly, and compact binary.

9.

_______ transport is required when using a non-blocking server.
(a) TZlibTransport
(b) TFramedTransport
(c) TMemoryTransport
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (b) TFramedTransport

Explanation: TFramedTransport sends data in frames, where each frame is preceded by length information.
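
A hedged sketch of wiring TFramedTransport into a non-blocking server follows. It assumes a classic Thrift release where TFramedTransport lives in org.apache.thrift.transport, and reuses the hypothetical generated Calculator service from the earlier sketches.

```java
// Sketch of a non-blocking server; the framed transport supplies the length
// prefix that the NIO selector loop needs to delimit messages.
import org.apache.thrift.server.TNonblockingServer;
import org.apache.thrift.server.TServer;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TNonblockingServerSocket;

public class NonblockingServerSketch {
  public static void main(String[] args) throws Exception {
    TNonblockingServerSocket socket = new TNonblockingServerSocket(9090);
    TServer server = new TNonblockingServer(
        new TNonblockingServer.Args(socket)
            .processor(new Calculator.Processor<>(new CalculatorHandler())) // hypothetical generated names
            .transportFactory(new TFramedTransport.Factory()));
    server.serve();  // single selector thread driving non-blocking I/O
  }
}
```
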
10.

Which of the following uses JSON for encoding of data?
(a) TCompactProtocol
(b) TDenseProtocol
(c) TBinaryProtocol
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (d) None of the mentioned

Explanation: TJSONProtocol uses JSON for encoding of data.

11.

________ is a write-only protocol that cannot be parsed by Thrift.
(a) TCompactProtocol
(b) TDenseProtocol
(c) TBinaryProtocol
(d) TSimpleJSONProtocol

Topic: Thrift with Hadoop

Answer» (d) TSimpleJSONProtocol

Explanation: TSimpleJSONProtocol drops metadata and writes plain JSON, so its output is suitable for parsing by scripting languages but cannot be read back by Thrift.

12.

Which of the following formats is similar to TCompactProtocol?
(a) TCompactProtocol
(b) TDenseProtocol
(c) TBinaryProtocol
(d) TSimpleJSONProtocol

Topic: Thrift with Hadoop

Answer» (b) TDenseProtocol

Explanation: TDenseProtocol is similar to TCompactProtocol, but strips off the meta information from what is transmitted.

13.

Which of the following is a more compact binary format?
(a) TCompactProtocol
(b) TDenseProtocol
(c) TBinaryProtocol
(d) TSimpleJSONProtocol

Topic: Thrift with Hadoop

Answer» (a) TCompactProtocol

Explanation: TCompactProtocol is typically more efficient to process as well.
14.

Point out the wrong statement.
(a) With Thrift, it is not possible to define a service and change the protocol and transport without recompiling the code
(b) Thrift includes server infrastructure to tie protocols and transports together, like blocking, non-blocking, and multi-threaded servers
(c) Thrift supports a number of protocols for service definition
(d) None of the mentioned

Topic: Thrift with Hadoop

Answer» (d) None of the mentioned

Explanation: The underlying I/O part of the stack is implemented differently for different languages.

15.

Which of the following is a straightforward binary format?
(a) TCompactProtocol
(b) TDenseProtocol
(c) TBinaryProtocol
(d) TSimpleJSONProtocol

Topic: Thrift with Hadoop

Answer» (c) TBinaryProtocol

Explanation: TBinaryProtocol is a straightforward binary format that is not optimized for space efficiency.
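
The protocol questions above all come down to which encoding is layered over a transport. The sketch below writes nothing useful; it simply shows that TBinaryProtocol, TCompactProtocol, and TJSONProtocol are interchangeable objects constructed over the same transport (TMemoryBuffer is used here only to keep the example self-contained).

```java
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TCompactProtocol;
import org.apache.thrift.protocol.TJSONProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TMemoryBuffer;

public class ProtocolChoiceSketch {
  public static void main(String[] args) throws Exception {
    // TMemoryBuffer keeps the bytes in memory, which makes it easy to compare encodings.
    TMemoryBuffer buffer = new TMemoryBuffer(1024);
    TProtocol binary  = new TBinaryProtocol(buffer);   // straightforward binary format
    TProtocol compact = new TCompactProtocol(buffer);  // more compact binary format
    TProtocol json    = new TJSONProtocol(buffer);     // JSON text encoding
    // Generated Thrift structs expose write(TProtocol)/read(TProtocol); whichever
    // protocol is handed in determines the bytes that land in the buffer.
  }
}
```
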
16.

__________ is used as a remote procedure call (RPC) framework at Facebook.
(a) Oozie
(b) Mahout
(c) Thrift
(d) Impala

Topic: Thrift with Hadoop

Answer» (c) Thrift

Explanation: In contrast to built-in types, user-defined data structures are sent via generated code.

17.

Point out the correct statement.
(a) Thrift is developed for scalable cross-language services development
(b) Thrift includes a complete stack for creating clients and servers
(c) The top part of the Thrift stack is generated code from the Thrift definition
(d) All of the mentioned

Topic: Thrift with Hadoop

Answer» (d) All of the mentioned

Explanation: Client and processor code for the services is generated from the Thrift definition file.

18.

Which of the following projects is an interface definition language for Hadoop?
(a) Oozie
(b) Mahout
(c) Thrift
(d) Impala

Topic: Thrift with Hadoop

Answer» (c) Thrift

Explanation: Thrift is an interface definition language and binary communication protocol that is used to define and create services for numerous languages.

19.

The ______________ class defines a configuration parameter named LINES_PER_MAP that controls how the input file is split.
(a) NLineInputFormat
(b) InputLineFormat
(c) LineInputFormat
(d) None of the mentioned

Topic: Crunch with Hadoop

Answer» (a) NLineInputFormat

Explanation: The value of this parameter can be set via the Source interface's inputConf() method.
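
A sketch of how this might look in a Crunch program is shown below. The From.formattedFile overload and the inputConf call reflect my reading of the Crunch Source API and should be treated as assumptions; the input path and the value "2" are placeholders.

```java
// Hedged sketch: configure NLineInputFormat's LINES_PER_MAP on a Crunch Source.
import org.apache.crunch.TableSource;
import org.apache.crunch.io.From;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class NLineSourceSketch {
  public static void main(String[] args) {
    TableSource<LongWritable, Text> source = From.formattedFile(
        "/path/to/input", NLineInputFormat.class, LongWritable.class, Text.class);
    // Ask for two input lines per map task (the value is illustrative).
    source.inputConf(NLineInputFormat.LINES_PER_MAP, "2");
  }
}
```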

20.

The Avros class also has a _____ method for creating PTypes for POJOs using Avro's reflection-based serialization mechanism.
(a) spot
(b) reflects
(c) gets
(d) all of the mentioned

Topic: Crunch with Hadoop

Answer» (b) reflects

Explanation: There are a couple of restrictions on the structure of the POJO.
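
A minimal sketch of Avros.reflects is shown below; WeatherRecord is a hypothetical POJO, and the no-arg constructor illustrates one of the restrictions mentioned in the explanation.

```java
import org.apache.crunch.types.PType;
import org.apache.crunch.types.avro.Avros;

public class ReflectPTypeSketch {
  // Hypothetical POJO: its fields must be handled by Avro reflection and a
  // no-arg constructor is required.
  public static class WeatherRecord {
    private int year;
    private int temperature;
    private String stationId;
    public WeatherRecord() { }  // required no-arg constructor
  }

  public static void main(String[] args) {
    PType<WeatherRecord> type = Avros.reflects(WeatherRecord.class);
    // The PType can now be passed to parallelDo() calls that emit WeatherRecord values.
    System.out.println(type.getTypeClass());
  }
}
```
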
21.

DoFns provide direct access to the __________ object that is used within a given Map or Reduce task via the getContext method.
(a) TaskInputContext
(b) TaskInputOutputContext
(c) TaskOutputContext
(d) All of the mentioned

Topic: Crunch with Hadoop

Answer» (b) TaskInputOutputContext

Explanation: There are also a number of helper methods for working with the objects associated with the TaskInputOutputContext.
22.

The top-level ___________ package contains three of the most important specializations in Crunch.
(a) org.apache.scrunch
(b) org.apache.crunch
(c) org.apache.kcrunch
(d) all of the mentioned

Topic: Crunch with Hadoop

Answer» (b) org.apache.crunch

Explanation: Each of these specialized DoFn implementations has associated methods on the PCollection, PTable, and PGroupedTable interfaces to support common data processing steps.

23.

Point out the wrong statement.
(a) DoFns also have a number of helper methods for working with Hadoop Counters, all named increment
(b) The Crunch APIs contain a number of useful subclasses of DoFn that handle common data processing scenarios and are easier to write and test
(c) FilterFn class defines a single abstract method
(d) None of the mentioned

Topic: Crunch with Hadoop

Answer» (d) None of the mentioned

Explanation: Counters are an incredibly useful way of keeping track of the state of long-running data pipelines and detecting any exceptional conditions that occur during processing.
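The sketch below ties together two of the statements above: FilterFn's single abstract accept() method, and the increment() helpers DoFns provide for Hadoop Counters. The counter group and name are illustrative.

```java
import org.apache.crunch.FilterFn;

// FilterFn is a DoFn specialization with a single abstract method, accept().
public class NonBlankFilter extends FilterFn<String> {
  @Override
  public boolean accept(String input) {
    boolean keep = input != null && !input.trim().isEmpty();
    if (!keep) {
      // increment() is one of the DoFn helper methods for Hadoop Counters.
      increment("quality", "blank-lines");
    }
    return keep;
  }
}
```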

24.

The inline DoFn that splits a line up into words is an inner class of ____________
(a) Pipeline
(b) MyPipeline
(c) ReadPipeline
(d) WritePipe

Topic: Crunch with Hadoop

Answer» (b) MyPipeline

Explanation: Inner classes contain references to their parent outer classes, so unless MyPipeline implements the Serializable interface, a NotSerializableException will be thrown when Crunch tries to serialize the inner DoFn.
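A common way to avoid the NotSerializableException described above is to make the DoFn a static nested (or top-level) class, so it holds no reference to the enclosing pipeline class. A sketch with illustrative names:

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;

public class MyPipeline {
  // Static nested class: carries no hidden reference to a MyPipeline instance,
  // so Java serialization of the pipeline definition does not drag MyPipeline along.
  public static class SplitWordsFn extends DoFn<String, String> {
    @Override
    public void process(String line, Emitter<String> emitter) {
      for (String word : line.split("\\s+")) {
        emitter.emit(word);
      }
    }
  }
}
```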

25.

Crunch uses Java serialization to serialize the contents of all of the ______ in a pipeline definition.
(a) Transient
(b) DoFns
(c) Configuration
(d) All of the mentioned

Topic: Crunch with Hadoop

Answer» (b) DoFns

Explanation: Each DoFn in a pipeline definition must be serializable with Java serialization so that it can be shipped from the client to the tasks that execute it.

26.

Point out the correct statement.
(a) StreamPipeline executes the pipeline in-memory on the client
(b) MemPipeline executes the pipeline by converting it to a series of Spark pipelines
(c) MapReduce framework approach makes it easy for the framework to serialize data from the client to the cluster
(d) All of the mentioned

Topic: Crunch with Hadoop

Answer» (c) MapReduce framework approach makes it easy for the framework to serialize data from the client to the cluster

Explanation: It is SparkPipeline that executes the pipeline by converting it to a series of Spark pipelines, and MemPipeline that executes the pipeline in-memory on the client.

27.

PCollection, PTable, and PGroupedTable all support a __________ operation.
(a) intersection
(b) union
(c) OR
(d) None of the mentioned

Topic: Crunch with Hadoop

Answer» (b) union

Explanation: The union operation takes a series of distinct PCollections that all have the same data type and treats them as a single virtual PCollection.
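
A minimal union() sketch follows; both inputs must share the same PType, and the paths are placeholders.

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;

public class UnionSketch {
  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(UnionSketch.class);
    PCollection<String> a = pipeline.readTextFile("/path/a");
    PCollection<String> b = pipeline.readTextFile("/path/b");
    PCollection<String> both = a.union(b);  // treated as a single virtual PCollection
    pipeline.writeTextFile(both, "/path/out");
    pipeline.done();
  }
}
```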

28.

___________ executes the pipeline as a series of MapReduce jobs.
(a) SparkPipeline
(b) MRPipeline
(c) MemPipeline
(d) None of the mentioned

Topic: Crunch with Hadoop

Answer» (b) MRPipeline

Explanation: Every Crunch data pipeline is coordinated by an instance of the Pipeline interface.
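
The sketch below shows an MRPipeline being used end to end for a simple word count; when done() is called, the Crunch planner compiles the pipeline into one or more MapReduce jobs. The SplitFn class and all paths are illustrative.

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.avro.Avros;

public class WordCountSketch {
  // Static nested DoFn, in line with the serialization advice earlier in this section.
  public static class SplitFn extends DoFn<String, String> {
    @Override
    public void process(String line, Emitter<String> emitter) {
      for (String word : line.split("\\s+")) {
        emitter.emit(word);
      }
    }
  }

  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(WordCountSketch.class);
    PCollection<String> lines = pipeline.readTextFile("/path/to/input");
    PCollection<String> words = lines.parallelDo(new SplitFn(), Avros.strings());
    PTable<String, Long> counts = words.count();  // groups and counts per word
    pipeline.writeTextFile(counts, "/path/to/output");
    pipeline.done();  // the planner turns the pipeline into MapReduce job(s) and runs them
  }
}
```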

29.

A __________ represents a distributed, immutable collection of elements of type T.
(a) PCollect
(b) PCollection
(c) PCol
(d) All of the mentioned

Topic: Crunch with Hadoop

Answer» (b) PCollection

Explanation: PCollection provides a method, parallelDo, that applies a DoFn to each element in the PCollection.
30.

Hive, Pig, and Cascading all use a _________ data model.
(a) value centric
(b) columnar
(c) tuple-centric
(d) none of the mentioned

Topic: Crunch with Hadoop

Answer» (c) tuple-centric

Explanation: Crunch allows developers considerable flexibility in how they represent their data, which makes Crunch the best pipeline platform for developers.

31.

Point out the wrong statement.
(a) A Crunch pipeline written by the development team sessionizes a set of user logs; the sessions it generates are then processed by a diverse collection of Pig scripts and Hive queries
(b) Crunch pipelines provide a thin veneer on top of MapReduce
(c) Developers have access to low-level MapReduce APIs
(d) None of the mentioned

Topic: Crunch with Hadoop

Answer» (d) None of the mentioned

Explanation: Crunch is extremely fast, only slightly slower than a hand-tuned pipeline developed with the MapReduce APIs.

32.

The Crunch APIs are modeled after _________, which is the library that Google uses for building data pipelines on top of their own implementation of MapReduce.
(a) FlagJava
(b) FlumeJava
(c) FlakeJava
(d) All of the mentioned

Topic: Crunch with Hadoop

Answer» (b) FlumeJava

Explanation: The Apache Crunch project develops and supports Java APIs that simplify the process of creating data pipelines on top of Apache Hadoop.

33.

Crunch was designed for developers who understand __________ and want to use MapReduce effectively.
(a) Java
(b) Python
(c) Scala
(d) Javascript

Topic: Crunch with Hadoop

Answer» (a) Java

Explanation: Crunch is often used in conjunction with Hive and Pig.
34.

For Scala users, there is the __________ API, which is built on top of the Java APIs.
(a) Prunch
(b) Scrunch
(c) Hivench
(d) All of the mentioned

Topic: Crunch with Hadoop

Answer» (b) Scrunch

Explanation: It includes a REPL (read-eval-print loop) for creating MapReduce pipelines.
35.

Point out the correct statement.
(a) Scrunch's Java API is centered around three interfaces that represent distributed datasets
(b) All of the other data transformation operations supported by the Crunch APIs are implemented in terms of three primitives
(c) A number of common Aggregator implementations are provided in the Aggregators class
(d) All of the mentioned

Topic: Crunch with Hadoop

Answer» (c) A number of common Aggregator implementations are provided in the Aggregators class

Explanation: PGroupedTable provides a combineValues operation that allows a commutative and associative Aggregator to be applied to the values of the PGroupedTable instance on both the map and reduce sides of the shuffle.
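
A short sketch of combineValues with one of the built-in Aggregators follows; the PTable<String, Long> argument is assumed to come from elsewhere in a pipeline.

```java
import org.apache.crunch.PGroupedTable;
import org.apache.crunch.PTable;
import org.apache.crunch.fn.Aggregators;

public class CombineSketch {
  public static PTable<String, Long> sumPerKey(PTable<String, Long> counts) {
    PGroupedTable<String, Long> grouped = counts.groupByKey();
    // SUM_LONGS is commutative and associative, so Crunch can apply it on both
    // the map side (as a combiner) and the reduce side of the shuffle.
    return grouped.combineValues(Aggregators.SUM_LONGS());
  }
}
```
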
36.

The Apache Crunch Java library provides a framework for writing, testing, and running ___________ pipelines.
(a) MapReduce
(b) Pig
(c) Hive
(d) None of the mentioned

Topic: Crunch with Hadoop

Answer» (a) MapReduce

Explanation: The goal of Crunch is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run.

37.

Drill provides a __________-like internal data model to represent and process data.
(a) XML
(b) JSON
(c) TIFF
(d) None of the mentioned

Topic: Drill with Hadoop

Answer» (b) JSON

Explanation: The flexibility of the JSON data model allows Drill to query, without flattening, both simple and complex/nested data types as well as the constantly changing application-driven schemas commonly seen with Hadoop/NoSQL applications.

38.

Apache _________ provides direct queries on self-describing and semi-structured data in files.
(a) Drill
(b) Mahout
(c) Oozie
(d) All of the mentioned

Topic: Drill with Hadoop

Answer» (a) Drill

Explanation: Users can explore live data on their own as it arrives, versus spending weeks or months on data preparation, modeling, ETL and subsequent schema management.

39.

Drill analyzes semi-structured/nested data coming from _________ applications.
(a) RDBMS
(b) NoSQL
(c) NewSQL
(d) None of the mentioned

Topic: Drill with Hadoop

Answer» (b) NoSQL

Explanation: Modern big data applications such as social, mobile, web and IoT deal with a larger number of users and a larger amount of data than traditional transactional applications.

40.

Drill integrates with BI tools using a standard __________ connector.
(a) JDBC
(b) ODBC
(c) ODBC-JDBC
(d) All of the mentioned

Topic: Drill with Hadoop

Answer» (b) ODBC

Explanation: Drill conforms to the stringent ANSI SQL standards, ensuring compatibility with existing BI environments as well as Hive deployments.
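
ODBC is the connector usually used from BI tools, but Drill also ships a JDBC driver, which is easier to show from Java. The sketch below assumes the Drill JDBC driver is on the classpath and a local/embedded drillbit is reachable via the zk=local URL; the JSON path and column name are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DrillJdbcSketch {
  public static void main(String[] args) throws Exception {
    // "zk=local" points the driver at an embedded/local Drill instance (assumption).
    try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=local");
         Statement stmt = conn.createStatement();
         // Drill queries the JSON file directly; no schema definition or ETL step.
         ResultSet rs = stmt.executeQuery(
             "SELECT t.name FROM dfs.`/tmp/example.json` t LIMIT 5")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}
```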

41.

MapR's __________ solution earned the highest score in the Gigaom Research Data Warehouse Interoperability Report.
(a) SQL-on-Hadoop
(b) Hive-on-Hadoop
(c) Pig-on-Hadoop
(d) All of the mentioned

Topic: Drill with Hadoop

Answer» (a) SQL-on-Hadoop

Explanation: Drill is a pioneer in delivering self-service data exploration capabilities on data stored in multiple formats in files or NoSQL databases.

42.

Point out the wrong statement.
(a) Hadoop is a prerequisite for Drill
(b) Drill tackles rapidly evolving application-driven schemas and nested data structures
(c) Drill provides a single interface for structured and semi-structured data, allowing you to readily query JSON files and HBase tables as easily as a relational table
(d) All of the mentioned

Topic: Drill with Hadoop

Answer» (a) Hadoop is a prerequisite for Drill

Explanation: Hadoop is not a prerequisite for Drill; users can start ramping up with Drill by running SQL queries directly on the local file system.

43.

___________ includes Apache Drill as part of the Hadoop distribution.
(a) Impala
(b) MapR
(c) Oozie
(d) All of the mentioned

Topic: Drill with Hadoop

Answer» (b) MapR

Explanation: The MapR Sandbox with Apache Drill is a fully functional single-node cluster that can be used to get an overview of Apache Drill in a Hadoop environment.

44.

Drill is designed from the ground up to support high-performance analysis on ____________ data.
(a) semi-structured
(b) structured
(c) unstructured
(d) none of the mentioned

Topic: Drill with Hadoop

Answer» (a) semi-structured

Explanation: Drill is an Apache open-source SQL query engine for Big Data exploration.

45.

Point out the correct statement.
(a) Drill provides plug-and-play integration with existing Apache Hive
(b) Developers can use the sandbox environment to get a feel for the power and capabilities of Apache Drill by performing various types of queries
(c) Drill is inspired by Google Dremel
(d) None of the mentioned

Topic: Drill with Hadoop

Answer» (d) None of the mentioned

Explanation: Apache Drill is an open source, low latency SQL query engine for Hadoop and NoSQL.

46.

The _________ collocation identifier is integrated into the process that is used to create vectors from sequence files of text keys and values.
(a) lbr
(b) lcr
(c) llr
(d) lar

Topic: Mahout with Hadoop

Answer» (c) llr

Explanation: The --minLLR option can be used to control the cutoff that prevents collocations below the specified LLR score from being emitted.

47.

A key of type ___________ is generated which is used later to join ngrams with their heads and tails in the reducer phase.
(a) GramKey
(b) Primary
(c) Secondary
(d) None of the mentioned

Topic: Mahout with Hadoop

Answer» (a) GramKey

Explanation: The GramKey is a composite key made up of a string n-gram fragment as the primary key and a secondary key used for grouping and sorting in the reduce phase.

48.

____________ generates NGrams and counts frequencies for ngrams, head and tail subgrams.
(a) CollocationDriver
(b) CollocDriver
(c) CarDriver
(d) All of the mentioned

Topic: Mahout with Hadoop

Answer» (b) CollocDriver

Explanation: Each call to the mapper passes in the full set of tokens for the corresponding document using a StringTuple.

49.

The tokens are passed through a Lucene ____________ to produce NGrams of the desired length.
(a) ShngleFil
(b) ShingleFilter
(c) SingleFilter
(d) Collfilter

Topic: Mahout with Hadoop

Answer» (b) ShingleFilter

Explanation: The tools that the collocation identification algorithm is embedded within either consume tokenized text as input or allow an implementation of the Lucene Analyzer class to be specified to perform tokenization, in order to form n-grams.

50.

Point out the wrong statement.
(a) ‘Taste’ collaborative-filtering recommender component of Mahout was originally a separate project and can run standalone without Hadoop
(b) Integration of Mahout with initiatives such as the Pregel-like Giraph are actively under discussion
(c) Calculating the LLR is very straightforward
(d) None of the mentioned

Topic: Mahout with Hadoop

Answer» (d) None of the mentioned

Explanation: There are a couple of ways to run the LLR-based collocation algorithm in Mahout.