1.

Each node in your Hadoop cluster runs YARN and has 140GB of memory and 40 cores. Your yarn-site.xml has the configuration shown below, and you want YARN to launch a maximum of 100 containers per node. Enter the property value that would restrict YARN from launching more than 100 containers per node.

Answer

Usually, YARN takes all of the available resources on each machine in the cluster into consideration. Based on the available resources, YARN negotiates the resources requested by the applications or MapReduce jobs running in the cluster, and allocates containers according to how many resources each application requires. A container is the basic unit of processing capacity in YARN, and its resource elements include memory, CPU, and so on. In a Hadoop cluster, it is important to balance the usage of memory (RAM), processors (CPU cores), and disks so that processing is not constrained by any one of these cluster resources. As a best practice, allowing two containers per disk and per core gives the best balance for cluster utilization.
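For instance, the per-node container cap falls out of two yarn-site.xml properties: yarn.nodemanager.resource.memory-mb (the total memory YARN may use on the node) and yarn.scheduler.minimum-allocation-mb (the smallest container the scheduler will grant). The following minimal Python sketch is illustrative only; the values are assumed here and derived in the calculation further below.

    # Values a node's yarn-site.xml might carry for this question (assumed):
    node_memory_mb = 102400      # yarn.nodemanager.resource.memory-mb
    min_allocation_mb = 1024     # yarn.scheduler.minimum-allocation-mb

    # YARN can never launch more containers than total memory divided by
    # the minimum allocation, so 102400 / 1024 caps the node at 100.
    max_containers = node_memory_mb // min_allocation_mb
    print(max_containers)        # 100

In other words, setting the minimum container allocation to 1024 MB is what restricts the node to 100 containers.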

When determining the appropriate YARN and MapReduce memory configurations for a cluster node, it is ideal to consider the following values on each node.

  • RAM (amount of memory)
  • CORES (number of CPU cores)
  • DISKS (number of disks)

Before calculating how much RAM, how many cores, and how many disks are required, you should be aware of the following parameters.

  1. Approximately how much data needs to be stored in your cluster (for example, 200 TB).
  2. What the retention policy of the data is (for example, 1 year).
  3. What kind of workload you have: CPU-intensive (for example, a complex query or a query computing a billion records), I/O-intensive (for example, data ingestion), or memory-intensive (for example, Spark processing).
  4. What storage mechanism is used for the data (for example, whether the data format is plain text, Avro, Parquet, or ORC, and whether it is compressed with GZIP or Snappy).
  • Total memory available ==> 102400 MB
  • Number of containers ==> 100
  • Minimum memory required per container ==> 102400 MB total RAM / 100 = 1024 MB minimum per container.
  • The next calculation is to determine the maximum number of containers allowed per node, using the two formulas below (see the sketch after this list).
  • Number of containers = minimum of (2 * CORES, 1.8 * DISKS, (total available RAM) / MIN_CONTAINER_SIZE)
  • RAM per container = maximum of (MIN_CONTAINER_SIZE, (total available RAM) / number of containers)
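The two formulas above translate directly into a small sizing helper. The Python sketch below is illustrative only: the disk count is assumed (the question does not state one), and the function names are hypothetical.

    def containers_per_node(cores, disks, total_ram_mb, min_container_mb):
        # Number of containers = min(2*CORES, 1.8*DISKS, RAM / MIN_CONTAINER_SIZE)
        return int(min(2 * cores, 1.8 * disks, total_ram_mb / min_container_mb))

    def ram_per_container(total_ram_mb, containers, min_container_mb):
        # RAM per container = max(MIN_CONTAINER_SIZE, RAM / number of containers)
        return max(min_container_mb, total_ram_mb // containers)

    # Hypothetical node: 40 cores, 12 disks, 102400 MB of RAM available
    # to YARN, and a 1024 MB minimum container size.
    n = containers_per_node(cores=40, disks=12, total_ram_mb=102400, min_container_mb=1024)
    ram = ram_per_container(total_ram_mb=102400, containers=n, min_container_mb=1024)
    print(n, ram)    # 21 containers of 4876 MB each

Whichever term is smallest (here the disk term, 1.8 * 12) becomes the binding constraint, which is exactly the balance between CPU, disk, and memory that the best practice aims for.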

