| 1. |
Why Does Gobblin On Hadoop Stall For A Long Time Between Adding Files To The DistributedCache And Launching The Actual Job? |
|
Answer» Gobblin takes all WorkUnits created by the Source class and serializes each one into its own file on HDFS. Each map task reads these files and deserializes them into Gobblin Tasks, which the map task then runs. The job appears to stall because Gobblin is writing all of these files to HDFS before the job launches, which can take a while, especially when there are a lot of tasks to run. |
|
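The one-file-per-WorkUnit pattern described in the answer can be sketched as follows. This is only an illustration, not Gobblin's actual code: the class name `WorkUnitStagingSketch`, the methods `stageWorkUnits` and `loadWorkUnit`, and the `.wu` file suffix are hypothetical, work units are modeled as plain strings, and a local temporary directory stands in for HDFS. It shows why the delay before job launch tends to grow with the number of work units: the driver performs one file write per unit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a local directory stands in for HDFS,
// and a String stands in for a serialized Gobblin WorkUnit.
public class WorkUnitStagingSketch {

    // Driver side: write one file per work unit before the job is launched.
    // The number of file writes equals the number of work units.
    public static List<Path> stageWorkUnits(List<String> workUnits, Path stagingDir)
            throws IOException {
        Files.createDirectories(stagingDir);
        List<Path> staged = new ArrayList<>();
        for (int i = 0; i < workUnits.size(); i++) {
            Path file = stagingDir.resolve("workunit_" + i + ".wu");
            Files.write(file, workUnits.get(i).getBytes(StandardCharsets.UTF_8));
            staged.add(file);
        }
        return staged;
    }

    // Map-task side: read a staged file back and "deserialize" it.
    public static String loadWorkUnit(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path stagingDir = Files.createTempDirectory("workunits");
        List<String> workUnits = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            workUnits.add("partition=" + i);
        }

        long start = System.nanoTime();
        List<Path> staged = stageWorkUnits(workUnits, stagingDir);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("Staged " + staged.size() + " files in " + elapsedMs + " ms");
        System.out.println("First work unit: " + loadWorkUnit(staged.get(0)));
    }
}
```

On a real cluster each write is a round trip to HDFS rather than a local disk write, so the per-file cost is likely to be considerably higher than what this local sketch measures.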