InterviewSolution
| 1. |
MapReduce runs on top of YARN and uses YARN containers to schedule and execute its map and reduce tasks. When configuring MapReduce resource utilization on YARN, what aspects should be considered? |
|
Answer» When configuring MapReduce 2 resource utilization on YARN, there are three aspects to be considered:
Physical RAM limit for each map and reduce task
You can define the maximum amount of memory each map and reduce task may use. Because each map task and each reduce task runs in its own container, these limits should be at least as large as the YARN minimum container allocation (yarn.scheduler.minimum-allocation-mb). In mapred-site.xml:
<property> <name>mapreduce.map.memory.mb</name> <value>4096</value> </property>
<property> <name>mapreduce.reduce.memory.mb</name> <value>8192</value> </property>

JVM heap size limit for each task
The JVM heap size should be set lower than the map and reduce memory defined above, so that the heap plus the JVM's non-heap overhead stays within the container memory allocated by YARN. In mapred-site.xml:
<property> <name>mapreduce.map.java.opts</name> <value>-Xmx3072m</value> </property>
<property> <name>mapreduce.reduce.java.opts</name> <value>-Xmx6144m</value> </property>

Virtual memory limit for each task
The virtual memory limit is derived from the physical RAM limit of each map and reduce task: it is the physical limit multiplied by a ratio (yarn.nodemanager.vmem-pmem-ratio in yarn-site.xml), whose default value is 2.1. For example, if the physical RAM allocated to a task is 4 GB, then the virtual memory upper limit is 4 * 2.1 = 8.4 GB. |
|
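The relationship between the three settings in the answer can be sketched in a few lines of Python. This is only an illustration: the function name `derive_task_settings` and the 0.75 heap fraction are assumptions (the fraction is chosen because it reproduces the 3072 MB and 6144 MB heap values used in the answer), and 2.1 is the default virtual-to-physical memory ratio stated in the answer.

```python
def derive_task_settings(container_mb, heap_fraction=0.75, vmem_pmem_ratio=2.1):
    """Derive heap and virtual memory limits for one task container.

    container_mb     -- physical RAM limit (e.g. mapreduce.map.memory.mb)
    heap_fraction    -- assumed share of the container given to the JVM heap
    vmem_pmem_ratio  -- virtual-to-physical memory ratio (default 2.1)
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")

    # Heap must stay below the container size to leave room for non-heap memory.
    heap_mb = int(container_mb * heap_fraction)
    # Virtual memory ceiling is the physical limit scaled by the ratio.
    vmem_limit_mb = round(container_mb * vmem_pmem_ratio, 1)

    return {
        "memory_mb": container_mb,
        "java_opts": f"-Xmx{heap_mb}m",
        "vmem_limit_mb": vmem_limit_mb,
    }


# Values matching the answer's example: 4 GB map and 8 GB reduce containers.
map_settings = derive_task_settings(4096)
reduce_settings = derive_task_settings(8192)

for name, settings in (("map", map_settings), ("reduce", reduce_settings)):
    print(name, settings)
```

Keeping the heap a fixed fraction below the container size is one common rule of thumb, not a requirement. The right margin depends on how much non-heap memory the task's JVM actually needs.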