Hadoop 2.x Memory Configuration



In YARN, resource management is handled jointly by the ResourceManager and the NodeManagers: the scheduler inside the ResourceManager allocates resources, while each NodeManager supplies and isolates them. After the ResourceManager assigns resources on a NodeManager to a task (so-called "resource scheduling"), the NodeManager must provide those resources as requested, and even guarantee their exclusivity, giving the task a basic guarantee for running. This is so-called resource isolation.

 

With this in mind, YARN lets you configure the physical memory available to it on each node. Note the word "available": a node's memory is shared by several services (part goes to YARN, part to HDFS, part to HBase, and so on), so this setting covers only what YARN itself may use. The parameters are listed below; a consolidated example follows the list.

(1) yarn.nodemanager.resource.memory-mb

The total physical memory on the node that YARN may use. The default is 8192 (MB). Note that if the node has less than 8 GB of memory you must lower this value yourself; YARN does not detect the node's physical memory automatically.

(2) yarn.nodemanager.vmem-pmem-ratio

The maximum virtual memory a task may use per 1 MB of physical memory. The default is 2.1.

(3) yarn.nodemanager.pmem-check-enabled

Whether to run a thread that checks the physical memory each task is using and kills any task that exceeds its allocation. The default is true.

(4) yarn.nodemanager.vmem-check-enabled

Whether to run a thread that checks the virtual memory each task is using and kills any task that exceeds its allocation. The default is true.

(5) yarn.scheduler.minimum-allocation-mb

The minimum physical memory a single task may request. The default is 1024 (MB); a task that requests less than this is given this amount instead.

(6) yarn.scheduler.maximum-allocation-mb

The maximum physical memory a single task may request. The default is 8192 (MB).
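Putting the sizing parameters together, here is a minimal yarn-site.xml sketch. The 12288 MB value is an assumption for a hypothetical 16 GB node that reserves roughly 4 GB for the OS, HDFS, and other services; the other values are the defaults described above.

<!-- yarn-site.xml: consolidated sketch of the sizing parameters above -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>12288</value>  <!-- assumption: 16 GB node minus ~4 GB reserved -->
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>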

By default, YARN uses thread-based monitoring to judge whether a task is over-using memory, and kills it as soon as it does. Cgroups memory control lacks flexibility (a task may never exceed its memory cap at any instant; if it does, it is killed or gets an OOM), whereas a Java process's memory doubles for an instant when it forks and then drops back to normal. Thread-based monitoring handles this case more gracefully: a momentary doubling of the process tree's memory past the limit can be treated as normal, and the task is not killed. For this reason YARN does not provide Cgroups-based memory isolation.
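Because enforcement comes from these monitoring threads, it can be relaxed in yarn-site.xml if healthy containers are being killed. Below is a minimal sketch of the two switches from (3) and (4) above; turning the virtual-memory check off is an assumption about your workload, not a general recommendation.

<!-- yarn-site.xml: the two monitoring switches from (3) and (4) above -->
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>true</value>   <!-- keep killing tasks that exceed physical memory -->
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>  <!-- assumption: relax the virtual-memory check -->
</property>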

 

These values can also be set dynamically at job submission time:

hadoop jar <jarName> -D mapreduce.reduce.memory.mb=5120

e.g.

[hadoop@cMaster hadoop-2.5.2]$ ./bin/hadoop jar /home/hadoop/jar-output/TestLoop-1024M.jar -D mapreduce.map.memory.mb=5120 AESEnTest 1024 1 1

The trailing 1024 and the two 1s are input arguments to the jar.
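For a job that should always run with this setting, the same override can be made a client-side default instead of being passed on every invocation. A sketch of the equivalent entry in the submitting host's mapred-site.xml, reusing the 5120 MB value from the command above:

<!-- mapred-site.xml on the client: job-wide default equivalent to the -D flag -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>5120</value>
</property>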

After Hadoop 2.5.2 was set up, running a MapReduce program hit the following problem:

Container [pid=24156,containerID=container_1427332071311_0019_01_000002] is running beyond physical memory limits. Current usage: 2.1 GB of 2 GB physical memory used; 2.7 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_1427332071311_0019_01_000002 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 24156 2787 24156 24156 (bash) 0 0 108646400 296 /bin/bash -c /usr/java/jdk1.7.0_45/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx2048m -Djava.io.tmpdir=/home/hadoop/hadoop-2.5.2/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1427332071311_0019/container_1427332071311_0019_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop/hadoop-2.5.2/logs/userlogs/application_1427332071311_0019/container_1427332071311_0019_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 192.168.199.93 33497 attempt_1427332071311_0019_m_000000_0 2 1>/home/hadoop/hadoop-2.5.2/logs/userlogs/application_1427332071311_0019/containe...

Analysis:

Based on the memory-configuration background above, we can summarize as follows:

(RM: ResourceManager; NM: NodeManager; AM: ApplicationMaster)

RM memory configuration: two parameters (yarn-site.xml)

<property>
  <description>The minimum allocation for every container request at the RM,
  in MBs. Memory requests lower than this won't take effect,
  and the specified value will get allocated at minimum.</description>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>

<property>
  <description>The maximum allocation for every container request at the RM,
  in MBs. Memory requests higher than this won't take effect,
  and will get capped to this value.</description>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>

They are the minimum and maximum memory a single container may request.

NM memory configuration (yarn-site.xml)

<property>
  <description>Amount of physical memory, in MB, that can be allocated
  for containers.</description>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>

<property>
  <description>Ratio between virtual memory to physical memory when
  setting memory limits for containers. Container allocations are
  expressed in terms of physical memory, and virtual memory usage
  is allowed to exceed this allocation by this ratio.</description>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>

The former is the maximum memory available on a single node; neither of the two RM values should exceed it.

The latter is the virtual-memory ratio: the multiple of a task's physical memory that it may use as virtual memory. The default is 2.1.

AM memory configuration (mapred-site.xml)

mapreduce.map.memory.mb

mapreduce.reduce.memory.mb

These set the memory size of map and reduce tasks; the values should fall between the RM's minimum and maximum container sizes. If unset, they are derived from the rule max{MIN_CONTAINER_SIZE, (Total Available RAM) / containers}.

As a rule of thumb, give reduce twice the memory of map (see the sketch after the java.opts parameters below).

Other AM memory-related parameters:

mapreduce.map.java.opts

mapreduce.reduce.java.opts

These two parameters exist for tasks that run JVM programs (Java, Scala, etc.); they pass options into the JVM. The memory-related options are -Xmx, -Xms, and so on, and the heap sizes set here must fit inside the corresponding mapreduce.map.memory.mb and mapreduce.reduce.memory.mb container sizes.
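A minimal mapred-site.xml sketch tying the four parameters together. The sizes are hypothetical; the heap options follow the 0.8-of-container heuristic used in the Hortonworks guidance quoted later in this post, and reduce gets twice the map memory per the rule of thumb above.

<!-- mapred-site.xml: hypothetical sizes illustrating the rules above -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>       <!-- rule of thumb: twice the map size -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value>  <!-- heap kept at ~0.8 * 2048 MB -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3276m</value>  <!-- ~0.8 * 4096 MB -->
</property>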

For the problem above, I chose the following fix (setting mapreduce.map.memory.mb dynamically for the submitted job):

[hadoop@cMaster hadoop-2.5.2]$ ./bin/hadoop jar /home/hadoop/jar-output/TestLoop-1024M.jar -D mapreduce.map.memory.mb=5120 AESEnTest 1024 1 1

References:

https://altiscale.zendesk.com/hc/en-us/articles/200801519-Configuring-Memory-for-Mappers-and-Reducers-in-Hadoop-2

http://stackoverflow.com/questions/21005643/container-is-running-beyond-memory-limits

http://dongxicheng.org/mapreduce-nextgen/hadoop-yarn-memory-cpu-scheduling/



While running a job, the following error appeared:


Container [pid=41355,containerID=container_1451456053773_0001_01_000002] is running beyond physical memory limits. 
Current usage: 2.0 GB of 2 GB physical memory used; 5.2 GB of 4.2 GB virtual memory used. Killing container. 
Dump of the process-tree for container_1451456053773_0001_01_000002 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 41538 41355 41355 41355 (java) 3092 243 5511757824 526519 
/usr/jdk64/jdk1.7.0_67/bin/java -server -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true 
-Dhadoop.metrics.log.level=WARN -Xmx4506m 
-Djava.io.tmpdir=/diskb/hadoop/yarn/local/usercache/hdfs/appcache/application_1451456053773_0001/container_1451456053773_0001_01_000002/tmp 
-Dlog4j.configuration=container-log4j.properties 
-Dyarn.app.container.log.dir=/diska/hadoop/yarn/log/application_1451456053773_0001/container_1451456053773_0001_01_000002 
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA 
org.apache.hadoop.mapred.YarnChild 10.111.32.92 61224 attempt_1451456053773_0001_m_000000_0 2 |- 41355 37725 41355 


The job exceeded the memory configured for its map and reduce tasks, which made it fail; increasing the map and reduce memory cleared the problem. The relevant parameters are described below:


RM memory configuration uses the following two parameters (these are YARN platform settings and belong in yarn-site.xml):
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
Explanation: the minimum and maximum memory a single container may request. A running application cannot request more than the maximum, and requests below the minimum are rounded up to it; in that sense the minimum behaves a bit like a page in an operating system. The minimum is also used to compute the maximum number of containers on a node. Note: once set, these two values cannot be changed dynamically (that is, while an application is running).

NM memory configuration uses the following two parameters (also YARN platform settings, in yarn-site.xml):
yarn.nodemanager.resource.memory-mb
yarn.nodemanager.vmem-pmem-ratio
Explanation: the first is the maximum memory available on each node, and neither of the two RM values should exceed it. It also determines the maximum number of containers: divide it by the RM's minimum container size. The second is the virtual-memory ratio, the multiple of a task's physical memory that it may use as virtual memory; it defaults to 2.1. Note: the first parameter cannot be modified once set for the duration of a run, and its default is 8 GB; YARN will assume 8 GB even on a machine with less memory.

AM memory configuration, using MapReduce as the example (these are AM settings and belong in mapred-site.xml):
mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
Explanation: these set the memory for the two MapReduce task types (map and reduce tasks); the values should fall between the RM's minimum and maximum container sizes. If not configured, they are derived from the simple formula:
max(MIN_CONTAINER_SIZE, (Total Available RAM) / containers)
As a rule of thumb, reduce should get twice the memory of map. Note: these two values can be changed per application at launch time.

Other memory-related AM parameters are the JVM options, configured as follows:
mapreduce.map.java.opts
mapreduce.reduce.java.opts
Explanation: these exist for tasks that run JVM programs (Java, Scala, etc.) and pass options into the JVM; the memory-related options are -Xmx, -Xms, and so on. The values must fit inside the AM's map.memory.mb and reduce.memory.mb settings.

To summarize, configuring YARN memory comes down to three things: the physical-memory limit for each map and reduce task, the JVM heap-size limit for each task, and the virtual-memory limit.

A concrete error makes the memory settings easier to follow. The error:
Container[pid=41884,containerID=container_1405950053048_0016_01_000284] is running beyond virtual memory limits. Current usage: 314.6 MB of 2.9 GB physical memory used; 8.7 GB of 6.2 GB virtual memory used. Killing container.
The configuration:



<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>100000</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>10000</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>3000</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2000</value>
</property>

From the configuration, the minimum and maximum container memory are 3000 MB and 10000 MB. The reduce task asks for 2000 MB, which is below the 3000 MB minimum, and map is not set at all, so both are raised to 3000 MB; that is the "2.9 GB physical memory used" in the log. With the default virtual-memory ratio of 2.1, each map and reduce task is allowed 3000 * 2.1 = 6300 MB in total, roughly the 6.2 GB of virtual memory in the log. The application's virtual memory exceeded that limit, hence the error. The fix: raise the virtual-memory ratio when starting YARN, or raise the task memory when running the application.
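A sketch of the first fix. The log shows 8.7 GB of virtual memory against 2.9 GB physical, a ratio of about 3.0, so any ratio above that would have let the container survive; the 3.5 here is an assumption chosen to leave headroom.

<!-- yarn-site.xml: raise the ratio so ~2.9 GB physical allows > 8.7 GB virtual -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>3.5</value>  <!-- assumption: 8.7 / 2.9 is roughly 3.0; 3.5 adds headroom -->
</property>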



MapReduce job configuration parameters

These can be set in the client's mapred-site.xml as job-wide defaults, or specified per job at submission time.

Parameter                                   Default  Description
mapreduce.job.name                                   Job name
mapreduce.job.priority                      NORMAL   Job priority
yarn.app.mapreduce.am.resource.mb           1536     Memory used by the MR ApplicationMaster
yarn.app.mapreduce.am.resource.cpu-vcores   1        Virtual CPU cores used by the MR ApplicationMaster
mapreduce.am.max-attempts                   2        Maximum ApplicationMaster attempts
mapreduce.map.memory.mb                     1024     Memory per map task
mapreduce.map.cpu.vcores                    1        Virtual CPU cores per map task
mapreduce.map.maxattempts                   4        Maximum map task attempts
mapreduce.reduce.memory.mb                  1024     Memory per reduce task
mapreduce.reduce.cpu.vcores                 1        Virtual CPU cores per reduce task
mapreduce.reduce.maxattempts                4        Maximum reduce task attempts
mapreduce.map.speculative                   false    Enable speculative execution for map tasks
mapreduce.reduce.speculative                false    Enable speculative execution for reduce tasks
mapreduce.job.queuename                     default  Queue the job is submitted to
mapreduce.task.io.sort.mb                   100      Size of the task-internal sort buffer
mapreduce.map.sort.spill.percent            0.8      Map-side spill threshold (fraction of the sort buffer)
mapreduce.reduce.shuffle.parallelcopies     5        Parallel copy threads started by each reduce task
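As a sketch, a client-side mapred-site.xml overriding a few of these defaults; the queue name and sizes below are hypothetical values for illustration.

<!-- mapred-site.xml on the client: hypothetical overrides of the defaults above -->
<property>
  <name>mapreduce.job.queuename</name>
  <value>etl</value>    <!-- hypothetical queue; the default queue is "default" -->
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2048</value>   <!-- assumption: a larger AM than the 1536 MB default -->
</property>
<property>
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>10</value>     <!-- assumption: more copy threads than the default 5 -->
</property>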

Note that MRv2 renamed all of the MRv1 configuration parameters but stays compatible with the old names; it simply logs a warning that a parameter is deprecated. The old-to-new mapping can be found in the Java class org.apache.hadoop.mapreduce.util.ConfigUtil. Some examples:

Deprecated name                             New name
mapred.job.name                             mapreduce.job.name
mapred.job.priority                         mapreduce.job.priority
mapred.job.queue.name                       mapreduce.job.queuename
mapred.map.tasks.speculative.execution      mapreduce.map.speculative
mapred.reduce.tasks.speculative.execution   mapreduce.reduce.speculative
io.sort.factor                              mapreduce.task.io.sort.factor
io.sort.mb                                  mapreduce.task.io.sort.mb

Manually Calculating YARN and MapReduce Memory

This section describes how to manually calculate YARN and MapReduce memory allocation settings based on the node hardware specifications.

YARN takes into account all of the available resources on each machine in the cluster. Based on the available resources, YARN negotiates resource requests from applications (such as MapReduce) running in the cluster. YARN then provides processing capacity to each application by allocating Containers. A Container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, CPU, etc.).

In a Hadoop cluster, it is vital to balance the usage of memory (RAM), processors (CPU cores) and disks so that processing is not constrained by any one of these cluster resources. As a general recommendation, allowing for two Containers per disk and per core gives the best balance for cluster utilization.

When determining the appropriate YARN and MapReduce memory configurations for a cluster node, start with the available hardware resources. Specifically, note the following values on each node:

  • RAM (Amount of memory)

  • CORES (Number of CPU cores)

  • DISKS (Number of disks)

The total available RAM for YARN and MapReduce should take into account the Reserved Memory. Reserved Memory is the RAM needed by system processes and other Hadoop processes, such as HBase.

Reserved Memory = Reserved for stack memory + Reserved for HBase memory (If HBase is on the same node)

Use the following table to determine the Reserved Memory per node.

Reserved Memory Recommendations

Total Memory per Node   Recommended Reserved System Memory   Recommended Reserved HBase Memory
4 GB                    1 GB                                 1 GB
8 GB                    2 GB                                 1 GB
16 GB                   2 GB                                 2 GB
24 GB                   4 GB                                 4 GB
48 GB                   6 GB                                 8 GB
64 GB                   8 GB                                 8 GB
72 GB                   8 GB                                 8 GB
96 GB                   12 GB                                16 GB
128 GB                  24 GB                                24 GB
256 GB                  32 GB                                32 GB
512 GB                  64 GB                                64 GB

The next calculation is to determine the maximum number of Containers allowed per node. The following formula can be used:

# of Containers = minimum of (2*CORES, 1.8*DISKS, (Total available RAM) / MIN_CONTAINER_SIZE)

Where MIN_CONTAINER_SIZE is the minimum Container size (in RAM). This value is dependent on the amount of RAM available -- in smaller memory nodes, the minimum Container size should also be smaller. The following table outlines the recommended values:

Total RAM per Node       Recommended Minimum Container Size
Less than 4 GB           256 MB
Between 4 GB and 8 GB    512 MB
Between 8 GB and 24 GB   1024 MB
Above 24 GB              2048 MB

The final calculation is to determine the amount of RAM per container:

RAM-per-Container = maximum of (MIN_CONTAINER_SIZE, (Total Available RAM) / Containers)

With these calculations, the YARN and MapReduce configurations can be set:

Configuration File   Configuration Setting                   Value Calculation
yarn-site.xml        yarn.nodemanager.resource.memory-mb    = Containers * RAM-per-Container
yarn-site.xml        yarn.scheduler.minimum-allocation-mb   = RAM-per-Container
yarn-site.xml        yarn.scheduler.maximum-allocation-mb   = Containers * RAM-per-Container
mapred-site.xml      mapreduce.map.memory.mb                = RAM-per-Container
mapred-site.xml      mapreduce.reduce.memory.mb             = 2 * RAM-per-Container
mapred-site.xml      mapreduce.map.java.opts                = 0.8 * RAM-per-Container
mapred-site.xml      mapreduce.reduce.java.opts             = 0.8 * 2 * RAM-per-Container
mapred-site.xml      yarn.app.mapreduce.am.resource.mb      = 2 * RAM-per-Container
mapred-site.xml      yarn.app.mapreduce.am.command-opts     = 0.8 * 2 * RAM-per-Container

Note: After installation, both yarn-site.xml and mapred-site.xml are located in the /etc/hadoop/conf folder.


Example

Cluster nodes have 12 CPU cores, 48 GB RAM, and 12 disks.

Reserved Memory = 6 GB reserved for system memory + (if HBase) 8 GB for HBase

Min Container size = 2 GB

If there is no HBase:

# of Containers = minimum of (2*12, 1.8* 12, (48-6)/2) = minimum of (24, 21.6, 21) = 21

RAM-per-Container = maximum of (2, (48-6)/21) = maximum of (2, 2) = 2

Configuration                          Value Calculation
yarn.nodemanager.resource.memory-mb    = 21 * 2 = 42 * 1024 MB
yarn.scheduler.minimum-allocation-mb   = 2 * 1024 MB
yarn.scheduler.maximum-allocation-mb   = 21 * 2 = 42 * 1024 MB
mapreduce.map.memory.mb                = 2 * 1024 MB
mapreduce.reduce.memory.mb             = 2 * 2 = 4 * 1024 MB
mapreduce.map.java.opts                = 0.8 * 2 = 1.6 * 1024 MB
mapreduce.reduce.java.opts             = 0.8 * 2 * 2 = 3.2 * 1024 MB
yarn.app.mapreduce.am.resource.mb      = 2 * 2 = 4 * 1024 MB
yarn.app.mapreduce.am.command-opts     = 0.8 * 2 * 2 = 3.2 * 1024 MB
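Written out as actual configuration, the no-HBase case above becomes the sketch below. The MB figures are the calculations converted (42 * 1024 = 43008; 1.6 * 1024 MB is approximately 1638 MB for the heap flag), and the split across the two files follows the table in the previous section.

<!-- yarn-site.xml (no HBase): 21 containers of 2 GB each -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>43008</value>       <!-- 42 * 1024 -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>43008</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value>   <!-- 0.8 * 2048 MB -->
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx3276m</value>   <!-- 0.8 * 4096 MB -->
</property>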

If HBase is included:

# of Containers = minimum of (2*12, 1.8* 12, (48-6-8)/2) = minimum of (24, 21.6, 17) = 17

RAM-per-Container = maximum of (2, (48-6-8)/17) = maximum of (2, 2) = 2

Configuration                          Value Calculation
yarn.nodemanager.resource.memory-mb    = 17 * 2 = 34 * 1024 MB
yarn.scheduler.minimum-allocation-mb   = 2 * 1024 MB
yarn.scheduler.maximum-allocation-mb   = 17 * 2 = 34 * 1024 MB
mapreduce.map.memory.mb                = 2 * 1024 MB
mapreduce.reduce.memory.mb             = 2 * 2 = 4 * 1024 MB
mapreduce.map.java.opts                = 0.8 * 2 = 1.6 * 1024 MB
mapreduce.reduce.java.opts             = 0.8 * 2 * 2 = 3.2 * 1024 MB
yarn.app.mapreduce.am.resource.mb      = 2 * 2 = 4 * 1024 MB
yarn.app.mapreduce.am.command-opts     = 0.8 * 2 * 2 = 3.2 * 1024 MB

Original source: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.1/bk_installing_manually_book/content/rpm-chap1-11.html



