| 1. |
Why Oozie Security? |
Answer»
|
|
| 2. |
Why Do We Use Fork And Join Nodes In Oozie? |
Answer»
|
|
| 3. |
What Actions Can Be Performed In Oozie? |
| Answer» | |
| 4. |
How To Execute A Job? |
|
Answer» $ oozie job -run -config job.properties
job: 1-20090525161321-oozie-xyz-W |
|
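The submission command can also be assembled programmatically. The following is a minimal sketch, not an official client: the helper name `build_oozie_run_command` and the server URL are assumptions made for illustration, while `-oozie`, `-config` and `-run` are options of the `oozie job` command line tool. It relies on `shlex.join`, which needs Python 3.8 or later.

```python
import shlex


def build_oozie_run_command(config_path, oozie_url=None):
    """Return the argv list that submits and starts a workflow job."""
    argv = ["oozie", "job"]
    if oozie_url is not None:
        # -oozie selects the server; without it the CLI falls back to
        # the OOZIE_URL environment variable.
        argv += ["-oozie", oozie_url]
    argv += ["-config", config_path, "-run"]
    return argv


# The URL below is a placeholder for a local test server (assumption).
command = build_oozie_run_command("job.properties", "http://localhost:11000/oozie")
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run` on a machine where the Oozie client is installed.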
| 5. |
Mention Workflow Job Parameters? |
|
Answer» $ cat job.properties
oozie.wf.application.path=hdfs://bar.com:9000/usr/abc/wordcount
input=/usr/abc/input-data
output=/usr/abc/output-data |
|
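A quick sanity check of such a properties file before submission might look like the sketch below. The parser is deliberately simplified (plain `key=value` lines only, not the full Java properties syntax) and the function name `parse_properties` is hypothetical. Property names are case-sensitive, so the key is spelled in lowercase as `oozie.wf.application.path`.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


JOB_PROPERTIES = """\
oozie.wf.application.path=hdfs://bar.com:9000/usr/abc/wordcount
input=/usr/abc/input-data
output=/usr/abc/output-data
"""

props = parse_properties(JOB_PROPERTIES)
# The application path tells Oozie where the deployed workflow lives in HDFS.
if "oozie.wf.application.path" not in props:
    raise SystemExit("oozie.wf.application.path is required")
print(props["oozie.wf.application.path"])
```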
| 6. |
How To Deploy An Application? |
|
Answer» $ hadoop fs -put wordcount-wf hdfs://bar.com:9000/usr/abc/wordcount |
|
| 7. |
How Does Oozie Work? |
Answer»
At the end of a workflow's execution, Oozie uses an HTTP callback to update the client with the workflow status. Entry to or exit from an action node may also trigger a callback. |
|
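To make the callback idea concrete: the Oozie workflow specification describes notification URLs set through the job properties `oozie.wf.workflow.notification.url` and `oozie.wf.action.notification.url`, in which the tokens `$jobId`, `$status` and (for actions) `$nodeName` are replaced before the HTTP request is made. The sketch below only imitates that token replacement on the client side so the resulting URL shape is visible. The host name and the function `expand_notification_url` are assumptions, not part of Oozie.

```python
def expand_notification_url(template, job_id, status, node_name=None):
    """Imitate the token substitution applied to a notification URL."""
    url = template.replace("$jobId", job_id).replace("$status", status)
    if node_name is not None:
        # $nodeName only applies to action-level notifications.
        url = url.replace("$nodeName", node_name)
    return url


# Hypothetical callback endpoint owned by the client application.
WORKFLOW_TEMPLATE = "http://client.example.com/notify?job=$jobId&status=$status"

workflow_url = expand_notification_url(
    WORKFLOW_TEMPLATE, "1-20090525161321-oozie-xyz-W", "SUCCEEDED"
)
print(workflow_url)
```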
| 8. |
What Is Application Pipeline In Oozie? |
|
Answer» Workflow jobs that run regularly, but at different time intervals, often need to be connected. The outputs of multiple subsequent runs of one workflow become the input to the next workflow. Chaining these workflows together produces what is referred to as a data application pipeline. |
|
| 9. |
What Is An Oozie Bundle? |
|
Answer» Oozie Bundle is a higher-level Oozie abstraction that batches a set of coordinator applications. The user can start, stop, suspend, resume, or rerun jobs at the bundle level, which gives better and easier operational control. More specifically, the Oozie Bundle system allows the user to define and execute a group of coordinator applications, often called a data pipeline. There is no explicit dependency among the coordinator applications in a bundle. However, a user can use the data dependencies of coordinator applications to create an implicit data application pipeline. Oozie executes workflow based on: |
|
| 10. |
Explain Oozie Coordinator? |
|
Answer» Oozie Coordinator jobs are recurrent Oozie Workflow jobs that are triggered by time and data availability. Oozie Coordinator can also manage multiple workflows in which each depends on the output of earlier ones: the outputs of successive runs of one workflow become the input to the next workflow. This chain is called a 'data application pipeline'.
Oozie processes coordinator jobs in a fixed timezone with no DST (typically UTC); this timezone is referred to as the 'Oozie processing timezone'. The Oozie processing timezone is used to resolve coordinator job start/end times, job pause times and the initial-instance of datasets. Also, all coordinator dataset instance URI templates are resolved to a datetime in the Oozie processing timezone.
The usage of Oozie Coordinator can be categorized into 3 different segments:
Small: a single coordinator application with embedded dataset definitions
Medium: a single shared dataset definition and a few coordinator applications
Large: one or more shared dataset definitions and several coordinator applications |
|
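The time-triggered half of a coordinator can be pictured as a series of nominal times computed in a fixed, DST-free timezone. The sketch below shows only that arithmetic, in UTC. It does not model data-availability checks, and treating the end time as exclusive is an assumption of this example. The function name `nominal_times` and the dates are illustrative.

```python
from datetime import datetime, timedelta, timezone


def nominal_times(start, end, frequency_minutes):
    """Yield nominal run times from start (inclusive) to end (exclusive)."""
    step = timedelta(minutes=frequency_minutes)
    current = start
    while current < end:
        yield current
        current += step


# Fixed processing timezone with no DST: UTC.
start = datetime(2009, 1, 1, 0, 0, tzinfo=timezone.utc)
end = datetime(2009, 1, 2, 0, 0, tzinfo=timezone.utc)

# One run every 6 hours over a single day.
times = list(nominal_times(start, end, 6 * 60))
for t in times:
    print(t.strftime("%Y-%m-%dT%H:%MZ"))
```

Because every value is computed in UTC, the interval between runs stays constant across daylight saving changes in any local timezone.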
| 11. |
What Is Decision Node In Oozie? |
|
Answer» Decision nodes are switch statements that run different jobs based on the outcome of an expression. |
|
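The switch behaviour can be sketched as follows: predicates are checked in order, the first one that evaluates to true decides the next node, and a default transition is taken when none matches. This is a plain Python imitation of that semantics, not Oozie's own evaluation of its expression language. The node names and the `input_size_gb` parameter are hypothetical.

```python
def choose_transition(cases, default, context):
    """Return the target of the first case whose predicate is true."""
    for predicate, target in cases:
        if predicate(context):
            return target
    # No case matched: fall through to the default transition.
    return default


# Ordered cases, like <case> elements inside a <switch>.
cases = [
    (lambda ctx: ctx["input_size_gb"] > 100, "large-job"),
    (lambda ctx: ctx["input_size_gb"] > 10, "medium-job"),
]

next_node = choose_transition(cases, "small-job", {"input_size_gb": 12})
print(next_node)
```

Ordering matters: placing the broader `> 10` test first would route every large input to `medium-job`.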
| 12. |
What Are The Extra Files We Need When We Run A Hive Action In Oozie? |
Answer»
|
|
| 13. |
What Are The Properties That We Have To Mention In .properties? |
| Answer» | |
| 14. |
What Is Oozie Workflow Application? |
|
Answer» A workflow application is a ZIP file that includes the workflow definition and the necessary files to run all the actions. It contains the following files: |
|
| 15. |
Explain Oozie Workflow? |
|
Answer» An Oozie Workflow is a collection of actions arranged in a Directed Acyclic Graph (DAG). Control nodes define job chronology, setting rules for beginning and ending a workflow, and control the workflow execution path with decision, fork and join nodes. Action nodes trigger the execution of tasks.
Workflow nodes are classified into control flow nodes and action nodes:
Control flow nodes: nodes that control the start and end of the workflow and the workflow job execution path.
Action nodes: nodes that trigger the execution of a computation/processing task.
Workflow definitions can be parameterized. The parameterization is done using JSP Expression Language syntax, which supports not only variables as parameters but also functions and complex expressions. |
|
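The DAG property can be illustrated with a small model of a workflow's transitions. The sketch below is an assumption-laden illustration rather than anything Oozie runs: the node names are made up, the workflow is reduced to a dictionary of transitions, and Python's `graphlib` (3.9 or later) is used to derive one valid execution order and to reject cycles.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(transitions):
    """Return one valid node order, or raise CycleError if not a DAG."""
    # TopologicalSorter expects node -> predecessors, so invert the edges.
    predecessors = {node: set() for node in transitions}
    for node, targets in transitions.items():
        for target in targets:
            predecessors.setdefault(target, set()).add(node)
    return list(TopologicalSorter(predecessors).static_order())


# Hypothetical workflow: a fork runs two actions in parallel, a join waits for both.
transitions = {
    "start": ["fork"],
    "fork": ["pig-node", "hive-node"],
    "pig-node": ["join"],
    "hive-node": ["join"],
    "join": ["end"],
    "end": [],
}

order = execution_order(transitions)
print(order)

try:
    execution_order({"a": ["b"], "b": ["a"]})
except CycleError:
    print("cycle detected: not a valid workflow graph")
```

The two action nodes may appear in either order in the result, since nothing orders them relative to each other between the fork and the join.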
| 16. |
Explain Types Of Oozie Jobs? |
|
Answer» Oozie supports job scheduling for the full Hadoop stack, including Apache MapReduce, Apache Hive, Apache Sqoop and Apache Pig. It consists of two parts:
Workflow engine: stores and runs workflows composed of Hadoop jobs, e.g., MapReduce, Pig, Hive.
Coordinator engine: runs workflow jobs based on predefined schedules and availability of data. |
|
| 17. |
What Are The Alternatives To Oozie Workflow Scheduler? |
| Answer» | |
| 18. |
Explain The Need For Oozie? |
|
Answer» With Apache Hadoop becoming the de facto open source standard for storing and processing Big Data, languages such as Pig and Hive have followed, simplifying the process of writing big data applications on Hadoop. Although Pig, Hive and many others have simplified writing Hadoop jobs, a single Hadoop job is often not sufficient to get the desired output. Many Hadoop jobs have to be chained and data has to be shared between them, which makes the whole process very complicated. |
|
| 19. |
Mention Some Features Of Oozie? |
Answer»
|
|
| 20. |
What Is Apache Oozie? |
|
Answer» Apache Oozie is a Java web application used to schedule Apache Hadoop jobs. It is integrated with the Hadoop stack and supports Hadoop jobs for Apache MapReduce, Apache Pig, Apache Hive, and Apache Sqoop. Oozie is a scalable, reliable and extensible system. Oozie is used in production at Yahoo!, running more than 200,000 jobs every day. |
|