InterviewSolution
| 1. |
In a map-reduce job, under what scenario does a combiner get triggered? What are the various options to reduce the shuffling of data in a map-reduce job? |
|
Answer» The map-reduce framework does not guarantee that the combiner will be executed on every job run. When a combiner is configured, it is executed at each buffer spill on the map side. During a spill, the thread writing data to disk first divides the data into partitions, one per reducer. Within each partition, the thread performs an in-memory sort by key and applies the combiner function (if any) to the sorted output. Various ways to reduce data shuffling in a map-reduce job are:
|
|
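The spill steps described in the answer (partition, in-memory sort, optional combiner) can be illustrated with a small simulation in plain Python. This is only a sketch and not Hadoop code: the names `partition_for`, `sum_combiner`, and `spill` are hypothetical, and the CRC32-based partitioner is an assumption standing in for whatever partitioner a real job uses.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def partition_for(key, num_reducers):
    """Deterministic stand-in for a hash partitioner (assumption: CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def sum_combiner(key, values):
    """Example combiner: collapse all values for a key into a single sum."""
    return [(key, sum(values))]


def spill(buffer, num_reducers, combiner=None):
    """Simulate one buffer spill: partition, sort, then optionally combine."""
    # Step 1: divide the buffered records into one partition per reducer.
    partitions = defaultdict(list)
    for key, value in buffer:
        partitions[partition_for(key, num_reducers)].append((key, value))

    spilled = {}
    for pid, records in partitions.items():
        # Step 2: in-memory sort by key within the partition.
        records.sort(key=itemgetter(0))
        if combiner is None:
            spilled[pid] = records
            continue
        # Step 3: apply the combiner to each run of equal keys.
        combined = []
        for key, group in groupby(records, key=itemgetter(0)):
            combined.extend(combiner(key, [v for _, v in group]))
        spilled[pid] = combined
    return spilled


buffer = [("cat", 1), ("dog", 1), ("cat", 1), ("emu", 1), ("dog", 1), ("cat", 1)]
without = spill(buffer, num_reducers=2)
with_combiner = spill(buffer, num_reducers=2, combiner=sum_combiner)

print(sum(len(r) for r in without.values()))        # 6 records spilled
print(sum(len(r) for r in with_combiner.values()))  # 3 records spilled
```

In this toy run the combiner halves the number of records written in the spill, which is the same effect that reduces the volume of data shuffled to the reducers. Because the framework may run the combiner zero or more times, a combiner like the sum above should give the same final result no matter how often it is applied.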