1. What is the role of a combiner and partitioner in a Map-Reduce job? Is the combiner triggered first or the partitioner?
Answer» Map-reduce jobs are limited by the bandwidth available on the cluster, so it is beneficial to minimize the amount of data transferred between the map and reduce tasks. This can be achieved using a Hadoop combiner. A combiner runs on the output of a map task, and its output forms the input to the reducer. It decreases the amount of data that needs to be shuffled between the mapper and the reducer and thereby improves the performance of a map-reduce job. A combiner can, however, only be used for functions that are commutative and associative (summing counts, for example), because it may be applied to the map output any number of times without changing the result.

The partitioner controls which partition a given key-value pair goes to. Partitioning ensures that all the values for a key are grouped together and that records with the same key go to the same reducer. The total number of partitions in a Hadoop job is equal to the number of reducers. The partition phase takes place after the map phase and before the reduce phase.

A map-reduce job that uses both a partitioner and a combiner works as follows:

1. The output from each mapper is written to an in-memory buffer and spilled to a local directory when the buffer overflows.
2. The spilled data is partitioned according to the partitioner.
3. The data in each partition is sorted and then combined based on the logic in the combiner.
4. The combined data is sent to the reducers based on the partition key.

In other words, the partitioner is triggered first, and the combiner then runs on the records within each partition.
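As a minimal sketch of how these pieces wire together in the Hadoop MapReduce Java API, the word-count style job below reuses its reducer as the combiner (safe because summation is commutative and associative) and registers a custom partitioner that mimics Hadoop's default HashPartitioner. The class names (WordCountWithPartitioner, TokenMapper, SumReducer, KeyHashPartitioner) are illustrative, not part of the original answer.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountWithPartitioner {

    // Emits (word, 1) for every token in the input split.
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Sums counts per key. Used both as the combiner (on map output) and as the
    // reducer; this is only valid because summation is commutative and associative.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Illustrative partitioner: every record with the same key lands in the same
    // partition, and therefore on the same reducer. This is essentially what
    // Hadoop's default HashPartitioner does.
    public static class KeyHashPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numReduceTasks) {
            return (key.toString().hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountWithPartitioner.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);          // runs on map output, cuts shuffle traffic
        job.setPartitionerClass(KeyHashPartitioner.class);
        job.setReducerClass(SumReducer.class);
        job.setNumReduceTasks(2);                        // number of partitions == number of reducers
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With this setup, each mapper's spilled output is partitioned by KeyHashPartitioner, sorted within each partition, pre-aggregated by SumReducer acting as the combiner, and only then shuffled to the two reducers.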