Erlang vheap: Analysis and Caveats

Original article. When reposting, please credit the source: reposted from 系统技术非业余研究

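Erlang introduced the concept of the vheap in R13B03; for details, see the earlier article "R13B03 binary vheap helps reduce binary memory pressure".
The official release notes explain it briefly: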

OTP-8202 A new garbage collecting strategy for binaries which is more
aggressive than the previous implementation. Binaries now has
a virtual binary heap tied to each process. When binaries are
created or received to a process it will check if the heap
limit has been reached and if a reclaim should be done. This
imitates the behavior of ordinary Erlang terms. The virtual
heaps are grown and shrunk like ordinary heaps. This will
lessen the memory footprint of binaries in a system.

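Beyond that, I could not find any more detailed documentation. A server program I have been working on recently uses a large number of binaries, and I needed to analyze binary behavior in detail, so I took the opportunity to write up the vheap properly.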

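Let's first look at how the vheap can be controlled.
We start with the global setting, which is documented among the erl emulator flags: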

+hmbs Size
Sets the default binary virtual heap size of processes to the size Size.

If the flag is not set, the default value is:

./erl_vm.h:62:#define VH_DEFAULT_SIZE 32768 /* default virtual (bin) heap min size (words) */

Let's check that default on a running node:

Erlang R16B03 (erts-5.10.4) [source-73d1b4a] [64-bit] [smp:16:16] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.10.4  (abort with ^G)
1>  erlang:system_info(min_bin_vheap_size).
{min_bin_vheap_size,46422}

Oops, why doesn't this match the 32768 in the header file?
A look at the code shows that erl_init adjusts the value once more:

    BIN_VH_MIN_SIZE = erts_next_heap_size(BIN_VH_MIN_SIZE, 0);

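The adjustment rounds the value up to the next entry in a Fibonacci-like series of heap sizes. Here is how that series is built and searched: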

static Sint heap_sizes[MAX_HEAP_SIZES]; /* Suitable heap sizes. */
static int num_heap_sizes;      /* Number of heap sizes. */

void
erts_init_gc(void)
{
...
    /* Growth stage 1 - Fibonacci + 1*/
    /* 12,38 will hit size 233, the old default */

    heap_sizes[0] = 12;
    heap_sizes[1] = 38;

    for(i = 2; i < 23; i++) {
        /* one extra word for block header */
        heap_sizes[i] = heap_sizes[i-1] + heap_sizes[i-2] + 1;
    }

...
}

Uint
erts_next_heap_size(Uint size, Uint offset)
{
    if (size < heap_sizes[0]) {
        return heap_sizes[0];
    } else {
        Sint* low = heap_sizes;
        Sint* high = heap_sizes + num_heap_sizes;
        Sint* mid;

        while (low < high) {
            mid = low + (high-low) / 2;
            if (size < mid[0]) {
                high = mid;
            } else if (size == mid[0]) {
                ASSERT(mid+offset-heap_sizes < num_heap_sizes);
                return mid[offset];
            } else if (size < mid[1]) {
                ASSERT(mid[0] < size && size <= mid[1]);
                ASSERT(mid+offset-heap_sizes < num_heap_sizes);
                return mid[offset+1];
            } else {
                low = mid + 1;
            }
        }
        erl_exit(1, "no next heap size found: %beu, offset %beu\n", size, offset);
    }
    return 0;
}
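
To make the rounding concrete, here is a minimal sketch that rebuilds only the first growth stage of the table from the excerpt above and looks up the next size for 32768. The names init_heap_sizes and next_heap_size are hypothetical, the linear scan stands in for the binary search of the real function, and the offset argument is left out (it behaves like offset 0).

```c
#include <assert.h>

#define N_SIZES 23  /* growth stage 1 only, as in the excerpt above */

static long heap_sizes[N_SIZES];

/* Fill the table: 12, 38, then each entry is the sum of the previous two plus 1. */
static void init_heap_sizes(void)
{
    heap_sizes[0] = 12;
    heap_sizes[1] = 38;
    for (int i = 2; i < N_SIZES; i++) {
        heap_sizes[i] = heap_sizes[i - 1] + heap_sizes[i - 2] + 1;
    }
    /* Sanity check against the source comment: the series hits 233. */
    assert(heap_sizes[5] == 233);
}

/* Smallest table entry >= size, or -1 if size exceeds this truncated table.
 * Hypothetical helper: a linear scan replaces the binary search. */
static long next_heap_size(long size)
{
    for (int i = 0; i < N_SIZES; i++) {
        if (heap_sizes[i] >= size) {
            return heap_sizes[i];
        }
    }
    return -1;
}
```

After calling init_heap_sizes(), next_heap_size(32768) lands on 46422, because 32768 falls between the table entries 28690 and 46422. That is the value reported by erlang:system_info(min_bin_vheap_size) above.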

As for how the vheap itself grows, the official notes say:

The vheap size series will now use the golden ratio instead of doubling and fibonacci sequences.

The code is as follows:

static Uint64
do_next_vheap_size(Uint64 vheap, Uint64 vheap_sz) {

    /*                grow                                                                                                
     *                                                                                                                    
     * vheap_sz ======================                                                                                    
     *                                                                                                                    
     * vheap 75% +    grow                                                                                                
     *          ----------------------                                                                                    
     *                                                                                                                    
     * vheap 25 - 75% same                                                                                                
     *          ----------------------                                                                                    
     *                                                                                                                    
     * vheap ~ - 25% shrink                                                                                               
     *                                                                                                                    
     *          ----------------------                                                                                    
     */

    if ((Uint64) vheap/3 > (Uint64) (vheap_sz/4)) {
        Uint64 new_vheap_sz = vheap_sz;

        while((Uint64) vheap/3 > (Uint64) (vheap_sz/4)) {
            /* the golden ratio = 1.618 */
            new_vheap_sz = (Uint64) vheap_sz * 1.618;
            if (new_vheap_sz < vheap_sz ) {
                return vheap_sz;
            }
            vheap_sz = new_vheap_sz;
        }

        return vheap_sz;
    }

    if (vheap < (Uint64) (vheap_sz/4)) {
        return (vheap_sz >> 1);
    }

    return vheap_sz;
}
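
As a worked example of the three zones, the following sketch ports the function above to plain stdint types so it compiles outside ERTS. The name next_vheap_size_sketch is hypothetical, and the assert on a non-zero size is an addition of this sketch (with a size of 0 the grow loop would never terminate).

```c
#include <assert.h>
#include <stdint.h>

/* vheap: words currently charged to the virtual binary heap.
 * vheap_sz: current limit in words. Returns the new limit. */
static uint64_t next_vheap_size_sketch(uint64_t vheap, uint64_t vheap_sz)
{
    assert(vheap_sz > 0);  /* added guard: 0 * 1.618 would loop forever */

    /* Usage above 75% of the limit: grow by the golden ratio until it fits. */
    if (vheap / 3 > vheap_sz / 4) {
        while (vheap / 3 > vheap_sz / 4) {
            uint64_t new_sz = (uint64_t)((double)vheap_sz * 1.618);
            if (new_sz < vheap_sz) {
                return vheap_sz;  /* overflow guard, as in the original */
            }
            vheap_sz = new_sz;
        }
        return vheap_sz;
    }

    /* Usage below 25% of the limit: halve it. */
    if (vheap < vheap_sz / 4) {
        return vheap_sz >> 1;
    }

    /* Between 25% and 75%: keep the current limit. */
    return vheap_sz;
}
```

Starting from the default limit of 46422 words, a usage of 40000 words grows the limit to 75110, a usage of 20000 words leaves it unchanged, and a usage of 10000 words shrinks it to 23211.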

Let's set a breakpoint in gdb on erts_next_heap_size to verify the adjustment of the default:

Breakpoint 1, erts_next_heap_size (size=32768, offset=0) at beam/erl_gc.c:244
(gdb) p BIN_VH_MIN_SIZE
$1 = 46422

This time the numbers match. The default vheap size is 46422 words, which on a 64-bit machine (8 bytes per word) is roughly 362 KB.

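As with the minimum heap size, the vheap limit can also be tuned per process.
There are two ways to do this:
1. When creating the process, use erlang:spawn_opt with the option {min_bin_vheap_size, VSize :: non_neg_integer()}.
2. After the process has been created, call erlang:process_flag(min_bin_vheap_size, MinBinVHeapSize) from within that process. Note that erlang:system_flag(min_bin_vheap_size, MinBinVHeapSize) changes the system-wide default for processes spawned afterwards; it does not affect processes that already exist.

Once it is set, confirm it with erlang:process_info(Pid, min_bin_vheap_size). See the erlang module documentation for details.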

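When the system runs short of memory and we suspect that binaries are to blame, how do we confirm it? We have covered dbg before; dbg is itself a wrapper around Erlang's trace mechanism, and we can use it to capture the state of a process at gc_start: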

{trace, Pid, gc_start, Info}
Sent when garbage collection is about to be started. Info is a list of two-element tuples, where the first element is a key, and the second is the value. You should not depend on the tuples have any defined order. Currently, the following keys are defined:

heap_size
The size of the used part of the heap.
heap_block_size
The size of the memory block used for storing the heap and the stack.
old_heap_size
The size of the used part of the old heap.
old_heap_block_size
The size of the memory block used for storing the old heap.
stack_size
The actual size of the stack.
recent_size
The size of the data that survived the previous garbage collection.
mbuf_size
The combined size of message buffers associated with the process.
bin_vheap_size
The total size of unique off-heap binaries referenced from the process heap.
bin_vheap_block_size
The total size of binaries, in words, allowed in the virtual heap in the process before doing a garbage collection.
bin_old_vheap_size
The total size of unique off-heap binaries referenced from the process old heap.
bin_old_vheap_block_size
The total size of binaries, in words, allowed in the virtual old heap in the process before doing a garbage collection.
All sizes are in words.

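From the process information reported at the start of a GC, we can see that the vheap and the ordinary heap are described separately; in other words, ordinary data structures and binaries are accounted for and collected separately.
At this point the role of the vheap should be clear.
So when does the system trigger a binary GC and resize the vheap? A simple grep tells us: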

./erl_process.c:7215: || (MSO(p).overhead > BIN_VHEAP_SZ(p)))) {
./beam_emu.c:364: if ((E - HTOP < need) || (MSO(c_p).overhead + (VNh) >= BIN_VHEAP_SZ(c_p))) {\
./beam_emu.c:1607: if (c_p->mbuf || MSO(c_p).overhead >= BIN_VHEAP_SZ(c_p)) {
./beam_emu.c:2585: if (c_p->mbuf || MSO(c_p).overhead >= BIN_VHEAP_SZ(c_p)) {

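In plain terms:
1. When the program needs to create a binary and the allocation finds that the vheap is too small.
2. When an Erlang process is switched by the scheduler.
3. When a message is sent (send) and the vheap turns out to be too small.
4. When a BIF call returns and the vheap turns out to be too small.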
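
The vheap part of these checks can be modeled in a few lines. This is a toy model, not ERTS code: struct toy_proc and both function names are hypothetical, the heap-space and mbuf parts of the real conditions are left out, and the sketch assumes that overhead is counted in words, as the vheap limit is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the per-process off-heap bookkeeping (not the ERTS struct). */
struct toy_proc {
    uint64_t overhead;  /* words of off-heap binaries charged so far, like MSO(p).overhead */
    uint64_t vheap_sz;  /* current virtual binary heap limit, like BIN_VHEAP_SZ(p) */
};

/* Allocation-time test, modeled on the beam_emu.c lines:
 * would charging `need` more words reach the limit? */
static bool gc_needed_for_alloc(const struct toy_proc *p, uint64_t need)
{
    assert(p != NULL);
    return p->overhead + need >= p->vheap_sz;
}

/* Scheduling-time test, modeled on the erl_process.c line:
 * the process is strictly over its limit. */
static bool gc_needed_on_schedule(const struct toy_proc *p)
{
    assert(p != NULL);
    return p->overhead > p->vheap_sz;
}
```

With a limit of 46422 words and 46000 words already charged, a new 100-word binary does not trigger a collection, while a 422-word binary does. The scheduling-time check only fires once the charged amount exceeds the limit.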

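With first-hand data on vheap usage, monitoring the system becomes straightforward. Erlang also helpfully exposes the key garbage-collection information. For example, erlang:process_info(Pid, total_heap_size) reports the total heap size used by a process, and erlang:system_info(garbage_collection) shows the system-wide defaults: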

$ erl
Erlang R15B03 (erts-5.9.3.1)  [64-bit] [smp:16:16] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9.3.1  (abort with ^G)
1> erlang:system_info(garbage_collection).
[{min_bin_vheap_size,46368},
 {min_heap_size,233},
 {fullsweep_after,65535}]

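Summary: Erlang's supporting infrastructure is thorough and makes things convenient for its users. When you use a lot of binaries, be sure to track the vheap usage of each process and look for anomalies.

For now the vheap is only related to binaries, but judging by the name, other kinds of data may be brought under it in the future.

Have fun!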
