Explore topic-wise interview solutions for Pig in Hadoop.

This section includes interview solutions with curated multiple-choice questions on Pig in Hadoop to sharpen your knowledge and support exam preparation.

1.

In comparison to SQL, Pig uses ______________
(a) Lazy evaluation
(b) ETL
(c) Supports pipeline splits
(d) All of the mentioned
Topic: Pig in Practice

Answer»

The correct option is (d) All of the mentioned.

Explanation: Pig uses lazy evaluation, supports ETL-style workloads, and supports pipeline splits; its ability to include user code at any point in the pipeline is useful for pipeline development.

2.

You can specify parameter names and parameter values in which of the following ways?
(a) As part of a command line
(b) In a parameter file, as part of a command line
(c) With the declare statement, as part of a Pig script
(d) All of the mentioned
Topic: Data Processing Operators in Pig

Answer»

The right answer is (d) All of the mentioned.

Explanation: Parameters can be passed on the command line, listed in a parameter file, or set with the declare/default preprocessor statements inside a Pig script; parameter substitution may also be used inside macros.

3.

Which of the following files contains user-defined functions (UDFs)?
(a) script2-local.pig
(b) pig.jar
(c) tutorial.jar
(d) excite.log.bz2
Topic: Data Processing Operators in Pig

Answer»

The correct answer is (c) tutorial.jar.

Explanation: In the Pig tutorial, tutorial.jar contains the user-defined functions along with their supporting Java classes.

4.

Which of the following is the correct syntax for parameter substitution from the command line?
(a) pig {-param param_name = param_value | -param_file file_name} [-debug | -dryrun] script
(b) {%declare | %default} param_name param_value
(c) {%declare | %default} param_name param_value cmd
(d) All of the mentioned
Topic: Data Processing Operators in Pig

Answer»

The right option is (a) pig {-param param_name = param_value | -param_file file_name} [-debug | -dryrun] script

Explanation: Parameter substitution is used to substitute values for parameters at run time; on the command line, parameters are supplied with -param or collected in a file passed with -param_file.
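To make the syntax concrete, here is a minimal, hedged sketch (the script name daily.pig, the parameter names, and the paths are illustrative assumptions, not from the question):

-- daily.pig: filter a log by a date supplied at run time
%default input 'logs/input.txt'    -- %default gives a fallback value inside the script
A = LOAD '$input' USING PigStorage(',') AS (day:chararray, hits:int);
B = FILTER A BY day == '$DATE';
STORE B INTO 'out/$DATE';

-- invoked with a value substituted on the command line:
--   pig -param DATE=2015-06-17 daily.pig
-- or with parameters collected in a file:
--   pig -param_file params.txt daily.pig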

5.

Which of the following commands can be used for debugging?
(a) exec
(b) execute
(c) error
(d) throw
Topic: Data Processing Operators in Pig

Answer»

The correct choice is (a) exec.

Explanation: With the exec command, store statements do not trigger execution on their own; rather, the entire script is parsed before execution starts.

6.

Point out the wrong statement.
(a) You can run Pig scripts from the command line and from the Grunt shell
(b) DECLARE defines a Pig macro
(c) Use Pig scripts to place Pig Latin statements and Pig commands in a single file
(d) None of the mentioned
Topic: Data Processing Operators in Pig

Answer» The right choice is (b) DECLARE defines a Pig macro.

Explanation: DEFINE, not DECLARE, defines a Pig macro; %declare is used for parameter substitution.
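For contrast, a hedged sketch of DEFINE creating a macro (the macro and relation names are illustrative; the pattern follows the standard row-count macro):

DEFINE row_count(X) RETURNS total {
    grp    = GROUP $X ALL;
    $total = FOREACH grp GENERATE COUNT($X);
};

logs = LOAD 'logs.txt' AS (line:chararray);
n    = row_count(logs);   -- the macro expands inline here
DUMP n;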
7.

Use the __________ command to run a Pig script that can interact with the Grunt shell (interactive mode).
(a) fetch
(b) declare
(c) run
(d) All of the mentioned
Topic: Data Processing Operators in Pig

Answer» The right choice is (c) run.

Explanation: With the run command, every STORE statement triggers execution, and the script's aliases remain available in the Grunt shell; with exec, the whole script is parsed before execution starts.
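A small Grunt session contrasting run and exec (the script name myscript.pig and the alias are illustrative assumptions):

grunt> run myscript.pig     -- each STORE triggers execution; the script's aliases stay visible
grunt> DUMP some_alias;     -- works, because run imports the script into the shell session
grunt> exec myscript.pig    -- parsed as a whole and run in isolation; its aliases are not retained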
8.

Which of the following commands is used to show values to keys used in Pig?
(a) set
(b) declare
(c) display
(d) All of the mentioned
Topic: Data Processing Operators in Pig

Answer»

The right answer is (a) set.

Explanation: All Pig and Hadoop properties can be set with the set command, either in the Pig script or via the Grunt command line.
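For illustration, a few set statements as they might appear in a script or at the Grunt prompt (the values are arbitrary examples):

grunt> set job.name 'daily aggregation';   -- label the Hadoop job
grunt> set default_parallel 10;            -- default number of reduce tasks
grunt> set debug 'on';                     -- enable debug-level logging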

9.

Point out the correct statement.
(a) Invoke the Grunt shell using the “enter” command
(b) Pig does not support jar files
(c) Both the run and exec commands are useful for debugging because you can modify a Pig script in an editor
(d) All of the mentioned
Topic: Data Processing Operators in Pig

Answer»

The correct answer is (c) Both the run and exec commands are useful for debugging because you can modify a Pig script in an editor.

Explanation: Both commands promote Pig script modularity, as they allow you to rerun a script from the shell after editing it and to reuse existing components.

10.

__________ method tells LoadFunc which fields are required in the Pig script.
(a) pushProjection()
(b) relativeToAbsolutePath()
(c) prepareToRead()
(d) None of the mentioned
Topic: User-defined Functions in Pig

Answer» The right choice is (a) pushProjection().

Explanation: Pig uses the column index requiredField.index to communicate to the LoadFunc which fields are required by the Pig script.
11.

Which of the following is the shortcut for the DUMP operator?
(a) \de alias
(b) \d alias
(c) \q
(d) None of the mentioned
Topic: Data Processing Operators in Pig

Answer» The correct option is (b) \d alias.

Explanation: If the alias is omitted, the last defined alias is used.
12.

Through the ____________ method, the RecordReader associated with the InputFormat provided by the LoadFunc is passed to the LoadFunc.
(a) getNext()
(b) relativeToAbsolutePath()
(c) prepareToRead()
(d) All of the mentioned
Topic: User-defined Functions in Pig

Answer»

The correct choice is (c) prepareToRead().

Explanation: The RecordReader can then be used by the implementation in getNext() to return a tuple representing a record of data back to Pig.

13.

The loader should use the ______ method to communicate the load information to the underlying InputFormat.
(a) relativeToAbsolutePath()
(b) setUdfContextSignature()
(c) getCacheFiles()
(d) setLocation()
Topic: User-defined Functions in Pig

Answer»

The correct choice is (d) setLocation().

Explanation: The setLocation() method is called by Pig to communicate the load location to the loader, which should then pass that information on to the underlying InputFormat.
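From the script side, the string handed to setLocation() is simply the location given to LOAD; a hedged sketch with a hypothetical custom loader class (com.example.MyEventLoader is not a real Pig class):

-- Pig passes 'hdfs://nn/data/events' to the loader's setLocation(), which should
-- forward it to the underlying InputFormat (typically via FileInputFormat.setInputPaths)
events = LOAD 'hdfs://nn/data/events' USING com.example.MyEventLoader() AS (ts:long, user:chararray);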

14.

___________ returns a list of HDFS files to ship to the distributed cache.
(a) relativeToAbsolutePath()
(b) setUdfContextSignature()
(c) getCacheFiles()
(d) getShipFiles()
Topic: User-defined Functions in Pig

Answer»

The right answer is (c) getCacheFiles().

Explanation: getCacheFiles() lets a loader specify a list of HDFS files it would like placed in the distributed cache, whereas getShipFiles() is the companion method for local files to ship to the cluster.

15.

____________ method will be called by Pig both in the front end and back end to pass a unique signature to the Loader.
(a) relativeToAbsolutePath()
(b) setUdfContextSignature()
(c) getCacheFiles()
(d) getShipFiles()
Topic: User-defined Functions in Pig

Answer»

The correct choice is (b) setUdfContextSignature().

Explanation: The signature can be used to store into the UDFContext any information which the loader needs to keep between the various method invocations in the front end and back end.

16.

Point out the wrong statement.
(a) The load/store UDFs control how data goes into Pig and comes out of Pig
(b) LoadCaster has methods to convert byte arrays to specific types
(c) The meaning of getNext() has changed and is called by the Pig runtime to get the last tuple in the data
(d) None of the mentioned
Topic: User-defined Functions in Pig

Answer»

The right answer is (c) The meaning of getNext() has changed and is called by the Pig runtime to get the last tuple in the data.

Explanation: The meaning of getNext() has not changed; it is called by the Pig runtime to get the next tuple in the data.

17.

Which of the following has methods to deal with metadata?
(a) LoadPushDown
(b) LoadMetadata
(c) LoadCaster
(d) All of the mentioned
Topic: User-defined Functions in Pig

Answer»

The correct option is (b) LoadMetadata.

Explanation: Most loader implementations don't need to implement this interface unless they interact with some metadata system.

18.

Point out the correct statement.
(a) LoadMeta has methods to convert byte arrays to specific types
(b) The Pig load/store API is aligned with the Hadoop InputFormat class only
(c) LoadPush has methods to push operations from the Pig runtime into loader implementations
(d) All of the mentioned
Topic: User-defined Functions in Pig

Answer»

The correct option is (c) LoadPush has methods to push operations from the Pig runtime into loader implementations.

Explanation: Currently only the pushProjection() method is called by Pig to communicate to the loader the exact fields that are required in the Pig script.

19.

The __________ abstract class has three main methods for loading data, and for most use cases it would suffice to extend it.
(a) Load
(b) LoadFunc
(c) FuncLoad
(d) None of the mentioned
Topic: User-defined Functions in Pig

Answer»

The correct option is (b) LoadFunc.

Explanation: LoadFunc and StoreFunc implementations should use the new Hadoop 0.20 API-based classes (InputFormat/OutputFormat and related classes).

20.

Which of the following will compile PigUnit?
(a) $pig_trunk ant pigunit-jar
(b) $pig_tr ant pigunit-jar
(c) $pig_ ant pigunit-jar
(d) None of the mentioned
Topic: Pig Latin

Answer» The correct answer is (a) $pig_trunk ant pigunit-jar.

Explanation: The compile will create the pigunit.jar file.
21.

___________ is a simple xUnit framework that enables you to easily test your Pig scripts.
(a) PigUnit
(b) PigXUnit
(c) PigUnitX
(d) All of the mentioned
Topic: Pig Latin

Answer»

The correct answer is (a) PigUnit.

Explanation: With PigUnit you can perform unit testing, regression testing, and rapid prototyping. No cluster setup is required if you run Pig in local mode.

22.

The ________ class mimics the behavior of the Main class but gives users a statistics object back.
(a) PigRun
(b) PigRunner
(c) RunnerPig
(d) None of the mentioned
Topic: Pig Latin

Answer»

The right answer is (b) PigRunner.

Explanation: Optionally, you can call the API with an implementation of a progress listener, which will be invoked by the Pig runtime during execution.

23.

__________ is a framework for collecting and storing script-level statistics for Pig Latin.
(a) Pig Stats
(b) PStatistics
(c) Pig Statistics
(d) None of the mentioned
Topic: Pig Latin

Answer» The right option is (c) Pig Statistics.

Explanation: The new Pig statistics and the existing Hadoop statistics can also be accessed via the Hadoop job history file.
24.

Point out the wrong statement.
(a) ILLUSTRATE operator is used to review how data is transformed through a sequence of Pig Latin statements
(b) ILLUSTRATE is based on an example generator
(c) Several new private classes make it harder for external tools such as Oozie to integrate with Pig statistics
(d) None of the mentioned
Topic: Pig Latin

Answer» The right choice is (c) Several new private classes make it harder for external tools such as Oozie to integrate with Pig statistics.

Explanation: Several new public classes make it easier for external tools such as Oozie to integrate with Pig statistics.
25.

___________ operator is used to view the step-by-step execution of a series of statements.
(a) ILLUSTRATE
(b) DESCRIBE
(c) STORE
(d) EXPLAIN
Topic: Pig Latin

Answer» The right option is (a) ILLUSTRATE.

Explanation: ILLUSTRATE allows you to test your programs on small datasets and get faster turnaround times.
26.

Which of the following operators is used to view the MapReduce execution plans?
(a) DUMP
(b) DESCRIBE
(c) STORE
(d) EXPLAIN
Topic: Pig Latin

Answer» The right answer is (d) EXPLAIN.

Explanation: EXPLAIN displays the logical, physical, and MapReduce execution plans for a relation.
27.

_________ operator is used to review the schema of a relation.
(a) DUMP
(b) DESCRIBE
(c) STORE
(d) EXPLAIN
Topic: Pig Latin

Answer»

The correct answer is (b) DESCRIBE.

Explanation: DESCRIBE returns the schema of a relation.
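As a quick, hedged sketch of these diagnostic operators on a toy relation (the file name and fields are illustrative assumptions):

users  = LOAD 'users.txt' USING PigStorage(',') AS (name:chararray, age:int);
adults = FILTER users BY age >= 18;
DESCRIBE adults;     -- prints the schema of the relation
EXPLAIN adults;      -- prints the logical, physical, and MapReduce execution plans
ILLUSTRATE adults;   -- shows the step-by-step transformation on sampled rows
DUMP adults;         -- triggers execution and prints the results to the terminal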

28.

Point out the correct statement.
(a) During the testing phase of your implementation, you can use LOAD to display results to your terminal screen
(b) You can view outer relations as well as relations defined in a nested FOREACH statement
(c) Hadoop properties are interpreted by Pig
(d) None of the mentioned
Topic: Pig Latin

Answer»

The correct option is (b) You can view outer relations as well as relations defined in a nested FOREACH statement.

Explanation: Viewing outer relations, as well as relations defined in a nested FOREACH, is possible using the DESCRIBE operator.

29.

$ pig -x tez_local … will enable ________ mode in Pig.
(a) Mapreduce
(b) Tez
(c) Local
(d) None of the mentioned
Topic: Introduction to Pig

Answer»

The right option is (d) None of the mentioned.

Explanation: tez_local enables Tez local mode, which is similar to local mode except that internally Pig invokes the Tez runtime engine.

30.

Which of the following will run Pig in local mode?
(a) $ pig -x local …
(b) $ pig -x tez_local …
(c) $ pig …
(d) None of the mentioned
Topic: Introduction to Pig

Answer»

The correct choice is (a) $ pig -x local …

Explanation: Local mode is specified using the -x flag (pig -x local).
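For reference, a hedged sketch of how the execution mode is selected when invoking Pig (the script name is an illustrative assumption):

$ pig -x local myscript.pig        # local mode: single JVM, local file system
$ pig -x tez_local myscript.pig    # Tez local mode: local, but on the Tez runtime engine
$ pig myscript.pig                 # no -x flag: defaults to mapreduce mode on a Hadoop cluster
$ pig -x mapreduce myscript.pig    # mapreduce mode, stated explicitly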

31.

Which of the following is the default mode?
(a) Mapreduce
(b) Tez
(c) Local
(d) All of the mentioned
Topic: Introduction to Pig

Answer» The correct option is (a) Mapreduce.

Explanation: To run Pig in mapreduce mode, you need access to a Hadoop cluster and an HDFS installation; this is the mode Pig uses when no -x flag is given.
32.

You can run Pig in interactive mode using the ______ shell.
(a) Grunt
(b) FS
(c) HDFS
(d) None of the mentioned
Topic: Introduction to Pig

Answer»

The correct option is (a) Grunt.

Explanation: Invoke the Grunt shell using the “pig” command (see the sketch below) and then enter your Pig Latin statements and Pig commands interactively at the command line.
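A minimal interactive session, assuming local mode and an illustrative input file:

$ pig -x local
grunt> lines = LOAD 'passwd' USING PigStorage(':') AS (user:chararray);
grunt> users = FOREACH lines GENERATE user;
grunt> DUMP users;
grunt> quit;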

33.

Which of the following functions is used to read data in Pig?
(a) WRITE
(b) READ
(c) LOAD
(d) None of the mentioned
Topic: Introduction to Pig

Answer»

The correct choice is (c) LOAD.

Explanation: LOAD reads data into a relation; PigStorage is the default load function and is used when no USING clause is given.
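A hedged sketch of LOAD with and without an explicit load function (the paths and schema are illustrative assumptions):

-- PigStorage is used implicitly, splitting fields on the default tab delimiter
A = LOAD 'data/input.tsv' AS (id:int, name:chararray);

-- the same load function named explicitly, with a custom delimiter
B = LOAD 'data/input.csv' USING PigStorage(',') AS (id:int, name:chararray);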

34.

Point out the wrong statement.
(a) To run Pig in local mode, you need access to a single machine
(b) The DISPLAY operator will display the results to your terminal screen
(c) To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation
(d) All of the mentioned
Topic: Introduction to Pig

Answer»

The right choice is (b) The DISPLAY operator will display the results to your terminal screen.

Explanation: There is no DISPLAY operator; the DUMP operator displays the results to your terminal screen.

35.

Pig Latin statements are generally organized in which of the following ways?
(a) A LOAD statement to read data from the file system
(b) A series of “transformation” statements to process the data
(c) A DUMP statement to view results or a STORE statement to save the results
(d) All of the mentioned
Topic: Introduction to Pig

Answer»

The correct option is (d) All of the mentioned.

Explanation: A typical script loads data, transforms it, and then emits the result; a DUMP or STORE statement is required to generate output.
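A compact sketch of that load / transform / output structure (the file names and fields are illustrative assumptions):

-- 1. LOAD: read data from the file system
raw     = LOAD 'sales.txt' USING PigStorage(',') AS (store:chararray, amount:double);
-- 2. transformation statements: group and aggregate
bystore = GROUP raw BY store;
totals  = FOREACH bystore GENERATE group AS store, SUM(raw.amount) AS total;
-- 3. DUMP to inspect on screen, or STORE to save the results
STORE totals INTO 'output/sales_totals' USING PigStorage(',');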

36.

You can run Pig in batch mode using __________
(a) Pig shell command
(b) Pig scripts
(c) Pig options
(d) All of the mentioned
Topic: Introduction to Pig

Answer»

The correct option is (b) Pig scripts.

Explanation: A Pig script is a file containing Pig Latin statements and Pig commands; passing it to the pig command runs it in batch mode.

37.

Point out the correct statement.
(a) You can run Pig in either mode using the “pig” command
(b) You can run Pig in batch mode using the Grunt shell
(c) You can run Pig in interactive mode using the FS shell
(d) None of the mentioned
Topic: Introduction to Pig

Answer»

The correct choice is (a) You can run Pig in either mode using the “pig” command.

Explanation: You can run Pig in either mode using the “pig” command (the bin/pig script) or the “java” command (java -cp pig.jar …).
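A hedged sketch of the two equivalent invocations (the script name is an illustrative assumption):

$ pig -x mapreduce wordcount.pig                                     # via the bin/pig script
$ java -cp pig.jar org.apache.pig.Main -x mapreduce wordcount.pig    # via the java command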

38.

Pig mainly operates in how many modes?
(a) Two
(b) Three
(c) Four
(d) Five
Topic: Introduction to Pig

Answer»

The correct choice is (a) Two.

Explanation: You can run Pig (execute Pig Latin statements and Pig commands) in two ways: interactive mode and batch mode.