Why Does Gobblin On Hadoop Stall For A Long Time Between Adding Files

1.	Why Does Gobblin On Hadoop Stall For A Long Time Between Adding Files To The DistributedCache And Launching The Actual Job?
Answer» Gobblin takes all WorkUnits created by the Source class and serializes each one into its own file on HDFS. The map tasks read these files and deserialize them into Gobblin Tasks, which they then run. The job stalls because Gobblin is writing all of these files to HDFS before the job launches. This can take a while, especially when there are many tasks to run.
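
A minimal sketch of the staging step described above, assuming the local filesystem stands in for HDFS and JSON stands in for Gobblin's real serialization format. `WorkUnit`, `Task`, `stage_work_units` and `map_task` are hypothetical names for illustration, not Gobblin's Java API. The point is the shape of the cost: the launcher writes one file per WorkUnit before any map task can start, so the pause grows with the number of WorkUnits.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class WorkUnit:
    # Hypothetical stand-in for a Gobblin WorkUnit: an id plus a bag of properties.
    unit_id: int
    props: dict


@dataclass
class Task:
    # Hypothetical stand-in for the Gobblin Task a map task builds from a WorkUnit.
    work_unit: WorkUnit

    def run(self) -> str:
        return f"ran work unit {self.work_unit.unit_id}"


def stage_work_units(work_units, staging_dir: Path) -> list:
    """Launcher side: write one file per WorkUnit before the job is submitted.

    This loop models the stall: one file create/write/close per WorkUnit,
    so its total time grows roughly linearly with the number of WorkUnits.
    """
    paths = []
    for wu in work_units:
        path = staging_dir / f"workunit-{wu.unit_id}.json"
        path.write_text(json.dumps(asdict(wu)))
        paths.append(path)
    return paths


def map_task(path: Path) -> str:
    """Mapper side: read one serialized WorkUnit, deserialize it into a Task, run it."""
    data = json.loads(path.read_text())
    task = Task(WorkUnit(**data))
    return task.run()


if __name__ == "__main__":
    units = [WorkUnit(i, {"source.table": f"table_{i}"}) for i in range(5)]
    with tempfile.TemporaryDirectory() as tmp:
        staged = stage_work_units(units, Path(tmp))
        print(len(staged))  # one staged file per WorkUnit
        for p in staged:
            print(map_task(p))
```

With thousands of WorkUnits, the `stage_work_units` loop runs thousands of file operations against the filesystem before the job is even submitted, which is why the gap appears between the DistributedCache messages and the job launch rather than inside the map tasks.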

Related InterviewSolutions

How Do I Add A New Maven Repository To Pull Artifacts From?
How Do I Add A New External Dependency?
How Do I Compile Gobblin Against CDH?
How Do I Fix UnsupportedFileSystemException: No AbstractFileSystem For Scheme: null?
Why Does Gobblin On Hadoop Stall For A Long Time Between Adding Files To The DistributedCache And Launching The Actual Job?
When Running On Hadoop, Each Map Task Quickly Reaches 100 Percent Completion, But Then Stalls For A Long Time. Why Does This Happen?
How Is Gobblin Different From Sqoop?
How Do I Run And Schedule A Gobblin Job?
What Hadoop Version Can Gobblin Run On?
Does Gobblin Require Any External Software To Be Installed?