R16B03 adds a super carrier to reduce mmap system calls

November 3rd, 2013

Original article; if you repost it, please credit the source: 系统技术非业余研究

The Erlang memory allocation framework can be summed up in one sentence, quoted from the erts_alloc documentation:

erts_alloc is an Erlang Run-Time System internal memory allocator library. erts_alloc provides the Erlang Run-Time System with a number of memory allocators.

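As you can see, Erlang's memory allocation system is quite complex and deeply layered. ERTS internal developers get their memory through erts_alloc. For example, the code that allocates a port-related data structure looks like this: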

pdhp = erts_alloc(ERTS_ALC_T_PORT_DATA_HEAP,
                  sizeof(ErtsPortDataHeap) + hsize*(sizeof(Eterm)-1));

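It is very simple to use. But Erlang is a message-passing language: every message sent needs memory to be allocated, and that memory has to be freed again during automatic GC. On a typical server such as a proxy, allocations and frees of the binary data type alone reach 100 million per day, so the efficiency of the memory allocator matters a great deal. To meet this demand, Erlang uses a rather elaborate memory allocation system; see the figure below: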

[Figure: erlang_memory_overview]

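Roughly speaking, the memory allocators buy memory wholesale from sys_alloc and mseg_alloc and then retail it to the end users. sys_alloc is libc's malloc, and mseg_alloc is mmap; through these two interfaces memory is requested from the operating system in bulk. Let's zoom in on the relevant part of the figure above: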

[Figure: erlang_memory_mmap]

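What we are discussing today is the part in the red box. The Erlang system prefers to get its memory from mmap, because the process is more controllable than going through libc or tcmalloc. So when an Erlang application uses memory intensively and its demand fluctuates a lot, it has to wholesale memory from the operating system and hand it back frequently. The wholesale usually happens through mmap, which is why we see so many mmap system calls when we strace beam.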

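We know that an mmap system call has to enter the kernel and come back out. The kernel maintains a tree (a red-black tree, for example) in kernel space to manage virtual memory. When the number of system calls becomes very large, the overhead starts to show. Since mmap manages memory with a tree in kernel space, why can't we maintain that tree ourselves inside the Erlang memory allocator? The algorithm stays the same, but the cost of entering and leaving the kernel goes away. Based on this idea, rickard-sverker recently added the super carrier to Erlang R16B03; for details, see here.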

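The principle of the super carrier is to request a large amount of memory from the kernel in a single call and manage it ourselves. This further reduces the number of mmap calls, even though the simple segment cache in mseg_alloc already helps somewhat.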
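To make the idea concrete, here is a minimal sketch in Python, not the ERTS implementation. It makes one anonymous mapping up front and then serves every carrier request from a free list kept in user space. The class name `SuperCarrier`, its methods and the first-fit policy are all assumptions made for illustration, and a sorted list stands in for the tree mentioned above.

```python
import mmap


class SuperCarrier:
    """Toy model: one big anonymous mapping carved up by a user-space free list."""

    def __init__(self, size):
        # The only request made to the operating system.
        self.region = mmap.mmap(-1, size)
        self.mmap_calls = 1
        # Free segments as (start, end) offsets into the region, sorted by start.
        self.free = [(0, size)]

    def alloc(self, size):
        """Return the offset of a carrier of `size` bytes (first fit)."""
        for i, (start, end) in enumerate(self.free):
            if end - start >= size:
                if end - start == size:
                    del self.free[i]
                else:
                    self.free[i] = (start + size, end)
                return start
        # Hypothetical stand-in for running out of super carrier space.
        raise MemoryError("super carrier exhausted")

    def dealloc(self, start, size):
        """Give a carrier back and merge it with adjacent free segments."""
        self.free.append((start, start + size))
        self.free.sort()
        merged = []
        for seg in self.free:
            if merged and merged[-1][1] == seg[0]:
                merged[-1] = (merged[-1][0], seg[1])
            else:
                merged.append(seg)
        self.free = merged


sc = SuperCarrier(1 << 20)
a = sc.alloc(4096)
b = sc.alloc(8192)
sc.dealloc(a, 4096)
sc.dealloc(b, 8192)
print(sc.free, sc.mmap_calls)
```

However many carriers are created and destroyed, the count of mapping requests stays at one; all the bookkeeping happens in the process itself.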

Let's look at the usage documentation for the super carrier:

+MMscmgc
Set super carrier max guaranteed no of carriers. This parameter defaults to 65536. This parameter determines an amount of pre-allocated structures that is needed in order to keep track of different areas in the super carrier. When the system runs out of such structures it may crash due to an out of memory condition.

+MMsco true|false
Set super carrier only flag. This flag defaults to true. When a super carrier is used and this flag is true, the system will crash when a carrier request cannot be satisfied by the super carrier. When the flag is false the system will try to create requested carrier by other means.

NOTE: Setting this flag to false may not be supported on all systems. This flag will in that case be ignored.

NOTE: The super carrier cannot be enabled nor disabled on halfword heap systems. This flag will be ignored on halfword heap systems.

+MMscrpm true|false
Set super carrier reserve physical memory flag. This flag defaults to true. When this flag is true, physical memory will be reserved for the whole super carrier at once when it is created. The reservation will after that be left unchanged. When this flag is set to false only virtual address space will be reserved for the super carrier upon creation. The system will attempt to reserve physical memory upon carrier creations in the super carrier, and attempt to unreserve physical memory upon carrier destructions in the super carrier.

NOTE: What reservation of physical memory actually means highly depends on the operating system, and how it is configured. For example, different memory overcommit settings on Linux drastically change the behaviour. Also note, setting this flag to false may not be supported on all systems. This flag will in that case be ignored.

NOTE: The super carrier cannot be enabled nor disabled on halfword heap systems. This flag will be ignored on halfword heap systems.

+MMscs
Set super carrier size (in MB). The super carrier size defaults to zero; i.e, the super carrier is by default disabled. The super carrier is a large continuous area in the virtual address space. The system will always try to create new carriers in the super carrier.

NOTE: The super carrier cannot be enabled nor disabled on halfword heap systems. This flag will be ignored on halfword heap systems.

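There are two key parameters: MMscs controls the total amount of memory requested from the kernel in one go, and MMscrpm controls whether that memory is backed right away (that is, whether physical memory is reserved immediately).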
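As a quick check on the units, the sketch below builds the emulator flags for a given super carrier size and converts that size from MB to bytes. The helper names `mb_to_bytes` and `super_carrier_flags` are made up for this illustration; only the `+MMscs` and `+MMscrpm` flags come from the documentation quoted above.

```python
def mb_to_bytes(size_mb):
    """+MMscs takes its size in MB; convert it to bytes."""
    return size_mb * 1024 * 1024


def super_carrier_flags(size_mb, reserve_physical=True):
    """Build the erl arguments that enable a super carrier of size_mb MB."""
    rpm = "true" if reserve_physical else "false"
    return ["+MMscs", str(size_mb), "+MMscrpm", rpm]


print(" ".join(super_carrier_flags(16384)))
print(mb_to_bytes(16384))
```

A 16 GB super carrier is `+MMscs 16384`, which is 17179869184 bytes.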

Let's demonstrate the super carrier. We give ERTS 16 GB of memory in one go; the beam used here is built from the erlang/otp master branch on GitHub as of 2013/11/02:

$ r16131102/bin/erl +MMscs 16384 +MMscrpm true
Erlang R17A (erts-5.11) [source-69a0e01] [64-bit] [smp:16:16] [async-threads:10] [hipe] [kernel-poll:false] [lock-counting] [systemtap]

Eshell V5.11  (abort with ^G)
1> erlang:system_info({allocator,mseg_alloc}).
[{erts_mmap,[{supercarrier,[{sizes,[{total,17179869184},
                                    {total_sa,34078720},
                                    {total_sua,0},
                                    {used,38199296},
                                    {used_sa,33816576},
                                    {used_sua,0}]},
                            {free_segs,[{used,1},
                                        {max,2},
                                        {allocated,64},
                                        {reserved,68480},
                                        {used_sa,1},
                                        {used_sua,0}]}]}]},
 {instance,0,
           [{version,"0.9"},
            {options,[{amcbf,4194304},
                      {rmcbf,20},
                      {mcs,10},
                      {scs,17175486464},
                      {sco,true},
                      {scrpm,true},
                      {scmgc,68480}]},
            {memkind,[{name,"all memory"},
                      {status,[{cached_segments,0},
                               {cache_hits,16},
                               {segments,9,9,10},
                               {segments_size,29097984,29097984,29360128},
                               {segments_watermark,9}]},
                      {calls,[{mseg_alloc,0,27},
                              {mseg_dealloc,0,18},
                              {mseg_realloc,0,0},
                              {mseg_create_resize,0,0},
                              {mseg_create,0,11},
                              {mseg_destroy,0,3},
                              {mseg_recreate,0,0},
                              {mseg_clear_cache,0,0},
                              {mseg_check_cache,0,2}]}]}]},
...
 {instance,16,
           [{version,"0.9"},
            {options,[{amcbf,4194304},
                      {rmcbf,20},
                      {mcs,10},
                      {scs,17175486464},
                      {sco,true},
                      {scrpm,true},
                      {scmgc,68480}]},
            {memkind,[{name,"all memory"},
                      {status,[{cached_segments,0},
                               {cache_hits,0},
                               {segments,0,0,0},
                               {segments_size,0,0,0},
                               {segments_watermark,0}]},
                      {calls,[{mseg_alloc,0,0},
                              {mseg_dealloc,0,0},
                              {mseg_realloc,0,0},
                              {mseg_create_resize,0,0},
                              {mseg_create,0,0},
                              {mseg_destroy,0,0},
                              {mseg_recreate,0,0},
                              {mseg_clear_cache,0,0},
                              {mseg_check_cache,0,0}]}]}]}]

2> erts_debug:set_internal_state(available_internal_state, true).
false

=ERROR REPORT==== 3-Nov-2013::19:56:21 ===
Process <0.32.0> enabled access to the emulator internal state.
NOTE: This is an erts internal test feature and should *only* be used by OTP test-suites.

3> erts_debug:get_internal_state(mmap).  
[{sa_free_segs,[{140124147679232,140124147941376}]},
 {sua_free_segs,[]},
 {sabot,140124119629824},
 {satop,140124153708544},
 {suabot,140141295116288},
 {suatop,140141295116288}]
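The two outputs above can be cross-checked against each other. The sketch below copies the numbers from the session and treats each `{Start, End}` pair and each `bot`/`top` pair as an address range. That reading is an assumption, but the arithmetic bears it out. The variable names are illustrative only.

```python
# Values copied from the shell session above.
sizes = {"total": 17179869184, "total_sa": 34078720, "total_sua": 0,
         "used": 38199296, "used_sa": 33816576, "used_sua": 0}
options = {"scs": 17175486464}
state = {
    "sa_free_segs": [(140124147679232, 140124147941376)],
    "sua_free_segs": [],
    "sabot": 140124119629824, "satop": 140124153708544,
    "suabot": 140141295116288, "suatop": 140141295116288,
}


def free_bytes(segs):
    """Total size of a list of (start, end) address ranges."""
    return sum(end - start for start, end in segs)


sa_area = state["satop"] - state["sabot"]      # size of the sa area
sua_area = state["suatop"] - state["suabot"]   # size of the sua area
sa_free = free_bytes(state["sa_free_segs"])    # free bytes inside the sa area
span = state["suabot"] - state["sabot"]        # distance between the two bottoms
overhead = sizes["used"] - sizes["used_sa"] - sizes["used_sua"]

print(sa_area, sua_area, sa_free, span, overhead)
```

The `sa` area size equals `total_sa`, its single free segment equals `total_sa - used_sa` (262144 bytes), and the distance from `sabot` to `suabot` equals the `scs` option. The 4382720 bytes of `used` that belong to neither area match `total - scs`; presumably this is bookkeeping overhead, though the output does not say so.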

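Through crash dumps, or through Erlang's ubiquitous introspection facilities, especially the information exposed by the memory allocators, we can easily find out how the super carrier is being used and how efficient it is, which makes further parameter tuning easier.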
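As an example of what can be read off that information, the sketch below computes two simple figures from the session output: how much of the super carrier is in use, and how often instance 0's segment cache satisfied an `mseg_alloc` call. It assumes that the two numbers in each `calls` tuple are a giga-calls counter followed by a plain counter; the function names are hypothetical.

```python
def utilization(used, total):
    """Fraction of the super carrier currently in use."""
    return used / total


def call_count(giga, count):
    """Combine a (giga-calls, calls) pair into one number (assumed layout)."""
    return giga * 10**9 + count


# Values copied from the shell session above (instance 0).
used, total = 38199296, 17179869184
cache_hits = 16
mseg_alloc = call_count(0, 27)
mseg_create = call_count(0, 11)

print(f"super carrier in use: {utilization(used, total):.2%}")
print(f"cache hit ratio: {cache_hits / mseg_alloc:.2f}")
print(f"allocations that needed a new segment: {mseg_alloc - cache_hits}")
```

In this idle shell only a tiny fraction of the 16 GB is used, and the 27 allocations minus 16 cache hits leave 11, the same number as the `mseg_create` count.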

Summary: when it comes to memory management, Erlang has in effect swallowed the kernel's job and then libc's as well.

Have fun!

Image source: Characterizing the Scalability of Erlang VM on Many-core Processors
