Yu Feng | Non-Amateur Research on Systems Technology

Erlang vheap: Analysis and Caveats

October 19th, 2013 Yu Feng Comments off

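Erlang introduced the vheap (virtual binary heap) concept in R13B03; for details, see the article "R13B03 binary vheap helps reduce binary memory pressure".
The official release note explains it briefly: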

OTP-8202 A new garbage collecting strategy for binaries which is more
aggressive than the previous implementation. Binaries now has
a virtual binary heap tied to each process. When binaries are
created or received to a process it will check if the heap
limit has been reached and if a reclaim should be done. This
imitates the behavior of ordinary Erlang terms. The virtual
heaps are grown and shrunk like ordinary heaps. This will
lessen the memory footprint of binaries in a system.

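Beyond that, I could not find any more detailed documentation. The server program I have been working on recently uses a large number of binaries, so I needed to analyze binary behavior in detail, and I took the opportunity to work through how vheap behaves.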

Let's first look at how to control vheap, starting with the global setting (see here):

+hmbs Size
Sets the default binary virtual heap size of processes to the size Size.

If it is not set, the default value is:

./erl_vm.h:62:#define VH_DEFAULT_SIZE 32768 /* default virtual (bin) heap min size (words) */

Let's verify the setting:

Erlang R16B03 (erts-5.10.4) [source-73d1b4a] [64-bit] [smp:16:16] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.10.4  (abort with ^G)
1>  erlang:system_info(min_bin_vheap_size).
{min_bin_vheap_size,46422}

Oops, why doesn't that match?
Looking at the code, it turns out that erl_init adjusts this value again:

Categories: Erlang Exploration, Source Code Analysis, Tuning Tags: hmbs, min_bin_vheap_size, vheap

Appreciating the golden-ratio (1.61803398875) algorithm that sizes the vheap

October 19th, 2013 Yu Feng 1 comment

From the Erlang release notes:

The vheap size series will now use the golden ratio instead of doubling and fibonacci sequences.

The size of the binary heap is now determined by a golden-ratio algorithm. It is quite interesting, so here it is for reference:

/* grow
 *
 * vheap_sz ======================
 *
 * vheap 75% + grow
 * ----------------------
 *
 * vheap 25 - 75% same
 * ----------------------
 *
 * vheap ~ - 25% shrink
 *
 * ----------------------
 */

The code is as follows:

//erl_gc.c:2155
static Uint64
do_next_vheap_size(Uint64 vheap, Uint64 vheap_sz) {
    if ((Uint64) vheap/3 > (Uint64) (vheap_sz/4)) {
        Uint64 new_vheap_sz = vheap_sz;
        while((Uint64) vheap/3 > (Uint64) (vheap_sz/4)) {
            /* the golden ratio = 1.618 */
            new_vheap_sz = (Uint64) vheap_sz * 1.618;
            if (new_vheap_sz < vheap_sz ) {
                return vheap_sz;
            }
            vheap_sz = new_vheap_sz;
        }

        return vheap_sz;
    }

    if (vheap < (Uint64) (vheap_sz/4)) {
        return (vheap_sz >> 1);
    }

    return vheap_sz;

}

Have fun, everyone!

Categories: Erlang Exploration, Source Code Analysis Tags: golden ratio, vheap

Fixing an elementary bug in heart

October 16th, 2013 Yu Feng 3 comments

Last night @华侨E asked a question on Weibo:

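I'd like to discuss an issue with Erlang heart. With heart enabled, after calling heart:set_cmd/1 to set the restart command, if the command string is longer than 128 characters, a later call to heart:get_cmd/0 cannot retrieve the command that was set; the system then hangs, communication with beam times out after 60 seconds, and a restart follows. I read through heart.c but didn't spot any problem. Do you have any ideas?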

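heart is the last line of defense for an Erlang system's reliability. A bug there has serious consequences and would ruin an otherwise solid track record.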

Let's reproduce the problem right away:

$ erl -heart
heart_beat_kill_pid = 29045
Erlang R17A (erts-5.11) [source-18d4e3e] [64-bit] [smp:16:16] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.11  (abort with ^G)
1> Cmd=string:copies("a",128).
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
2> heart:set_cmd(Cmd).
ok
3> heart:get_cmd().   
heart: Wed Oct 16 10:18:20 2013: heart-beat time-out, no activity for 63 seconds
Killed
$ sh: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa: command not found
heart: Wed Oct 16 10:18:21 2013: Executed "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" -> 32512. Terminating.

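Sure enough, the process calling heart:get_cmd hangs; after 63 seconds heart kills the VM process and tries to start a new one. The experiment above confirms three things:
1. heart:get_cmd hangs when Cmd is 128 characters or longer.
2. heart:set_cmd works correctly: the Cmd we set is honored even when it is 128 characters or longer.
3. heart's restart mechanism works correctly.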

Next, we bring out our trusty tool, dbg, to find out why get_cmd hangs.
We use dbg to trace the function calls in the heart module:

Categories: Erlang Exploration, Source Code Analysis Tags: get_cmd, heart

Finding the top-N processes by binary memory usage

October 15th, 2013 Yu Feng Comments off

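Erlang programs are very robust. A typical VM runs many processes, and even when those processes have bugs, the Erlang philosophy is to let them die quickly; the system leaves an exception stack trace behind, so the problem is easy to find. In theory Erlang should rarely crash, but in practice that is not the case.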

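Looking at cases where the Erlang VM crashed, we find that for most network servers the cause is that binary memory runs out and can no longer be allocated, which makes memory shortage the most lethal threat to stability. A well-designed Erlang program that follows the OTP design philosophy usually does not use much memory, and whatever it does use is quickly reclaimed by the GC. Binaries, the heavy consumers of memory, are the exception.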

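Take a network server that stores user packets in binaries. We cannot predict how large a packet a user will send; suppose the limit is 50 MB. With 1000 such users, the extreme case needs 50 GB of memory. Usually we cannot provide that much physical memory, so a crash becomes quite likely.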

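So it is necessary to find out which processes use the most binary memory; in an extreme situation we can then selectively kill those processes to keep the system from destroying itself.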

erlang:process_info has an undocumented option, binary, which returns information about the binaries a process holds.

Let's look at its implementation:

static Eterm
bld_bin_list(Uint **hpp, Uint *szp, ErlOffHeap* oh)
{
    struct erl_off_heap_header* ohh;
    Eterm res = NIL;
    Eterm tuple;

    for (ohh = oh->first; ohh; ohh = ohh->next) {
        if (ohh->thing_word == HEADER_PROC_BIN) {
            ProcBin* pb = (ProcBin*) ohh;
            Eterm val = erts_bld_uword(hpp, szp, (UWord) pb->val);
            Eterm orig_size = erts_bld_uint(hpp, szp, pb->val->orig_size);

            if (szp)
                *szp += 4+2;
            if (hpp) {
                Uint refc = (Uint) erts_smp_atomic_read_nob(&pb->val->refc);
                tuple = TUPLE3(*hpp, val, orig_size, make_small(refc));
                res = CONS(*hpp + 4, tuple, res);
                *hpp += 4+2;
            }
        }
    }
    return res;
}


Eterm
process_info_aux(Process *BIF_P,
                 Process *rp,
                 Eterm rpid,
                 Eterm item,
                 int always_wrap)
{
 ...
    case am_binary: {
        Uint sz = 3;
        (void) bld_bin_list(NULL, &sz, &MSO(rp));
        hp = HAlloc(BIF_P, sz);
        res = bld_bin_list(&hp, NULL, &MSO(rp));
        break;
    }
...
}

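This option returns a list of tuples, one per binary: the first element is the binary's address, the second its (original) size, and the third its reference count.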

Let's demonstrate how to use it:

Categories: Erlang Exploration, Source Code Analysis, Tuning Tags: binary, process_info

Thoughts on server time correction

August 30th, 2013 Yu Feng 3 comments

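Most network servers make heavy use of time: state machines of all kinds, timeouts, timestamps. If the machine's wall clock jumps suddenly, most servers without special handling will break. Well-built server programs such as Erlang and nginx have a time correction mechanism. Rather than belabor the point, I will simply quote Erlang's time correction documentation, which is very well written: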

2 Time and time correction in Erlang

Time is vital to an Erlang program and, more importantly, correct time is vital to an Erlang program. As Erlang is a language with soft real time properties and we have the possibility to express time in our programs, the Virtual Machine and the language has to be very careful about what is considered a correct point in time and in how time functions behave.

In the beginning, Erlang was constructed assuming that the wall clock time in the system showed a monotonic time moving forward at exactly the same pace as the definition of time. That more or less meant that an atomic clock (or better) was expected to be attached to your hardware and that the hardware was then expected to be locked away from any human (or unearthly) tinkering for all eternity. While this might be a compelling thought, it’s simply never the case.

A “normal” modern computer can not keep time. Not on itself and not unless you actually have a chip level atomic clock wired to it. Time, as perceived by your computer, will normally need to be corrected. Hence the NTP protocol that together with the ntpd process will do its best to keep your computer's time in sync with the “real” time in the universe. Between NTP corrections, usually a less potent time-keeper than an atomic clock is used.

But NTP is not fail safe. The NTP server can be unavailable, the ntp.conf can be wrongly configured or your computer may from time to time be disconnected from the internet. Furthermore you can have a user (or even system administrator) on your system that thinks the right way to handle daylight saving time is to adjust the clock one hour two times a year (a tip, that is not the right way to do it…). To further complicate things, this user fetched your software from the internet and has never ever thought about what’s the correct time as perceived by a computer. The user simply does not care about keeping the wall clock in sync with the rest of the universe. The user expects your program to have omnipotent knowledge about the time.

Most programmers also expect time to be reliable, at least until they realize that the wall clock time on their workstation is off by a minute. Then they simply set it to the correct time, maybe or maybe not in a smooth way. Most probably not in a smooth way.

The amount of problems that arise when you expect the wall clock time on the system to always be correct may be immense. Therefore Erlang introduced the “corrected estimate of time”, or the “time correction” many years ago. The time correction relies on the fact that most operating systems have some kind of monotonic clock, either a real time extension or some built in “tick counter” that is independent of the wall clock settings. This counter may have microsecond resolution or much less, but generally it has a drift that is not to be ignored.

So we have this monotonic ticking and we have the wall clock time. Two unreliable times that together can give us an estimate of an actual wall clock time that does not jump around and that monotonically moves forward. If the tick counter has a high resolution, this is fairly easy to do, if the counter has a low resolution, it’s more expensive, but still doable down to frequencies of 50-60 Hz (of the tick counter).

So the corrected time is the nearest approximation of an atomic clock that is available on the computer. We want it to have the following properties:

Monotonic: The clock should not move backwards
Intervals should be near the truth: We want the actual time (as measured by an atomic clock or an astronomer) that passes between two time stamps, T1 and T2, to be as near to T2 - T1 as possible.
Tight coupling to the wall clock: We want a timer that is to be fired when the wall clock reaches a time in the future, to fire as near to that point in time as possible
To meet all the criteria, we have to utilize both times in such a way that Erlang's “corrected time” moves slightly slower or slightly faster than the wall clock to get in sync with it. The word “slightly” means a maximum of 1% difference to the wall clock time, meaning that a sudden change in the wall clock of one minute, takes 100 minutes to fix, by letting all “corrected time” move 1% slower or faster.

Needless to say, correcting for a faulty handling of daylight saving time may be disturbing to a user comparing wall clock time to for example calendar:now_to_local_time(erlang:now()). But calendar:now_to_local_time/1 is not supposed to be used for presenting wall clock time to the user.

Time correction is not perfect, but it saves you from the havoc of clocks jumping around, which would make timers in your program fire far too late or far too early and could bring your whole system to its knees (or worse) just because someone detected a small error in the wall clock time of the server where your program runs. So while it might be confusing, it is still a really good feature of Erlang and you should not throw it away using time functions which may give you higher benchmark results, not unless you really know what you’re doing.

2.1 What does time correction mean in my system?

Time correction means that Erlang estimates a time from current and previous settings of the wall clock, and it uses a fairly exact tick counter to detect when the wall clock time has jumped for some reason, slowly adjusting to the new value.

In practice, this means that the difference between two calls to time corrected functions, like erlang:now(), might differ up to one percent from the corresponding calls to non time corrected functions (like os:timestamp()). Furthermore, if comparing calendar:local_time/0 to calendar:now_to_local_time(erlang:now()), you might temporarily see a difference, depending on how well kept your system is.

It is important to understand that it is (to the program) always unknown if it is the wall clock time that moves in the wrong pace or the Erlang corrected time. The only way to determine that, is to have an external source of universally correct time. If some such source is available, the wall clock time can be kept nearly perfect at all times, and no significant difference will be detected between erlang:now/0’s pace and the wall clock’s.

Still, the time correction will mean that your system keeps its real time characteristics very well, even when the wall clock is unreliable.

2.2 Where does Erlang use corrected time?

For all functionality where real time characteristics are desirable, time correction is used. This basically means:

erlang:now/0: The infamous erlang:now/0 function uses time correction so that differences between two “now-timestamps” will correspond to other timeouts in the system. erlang:now/0 also holds other properties, discussed later.
receive … after: Timeouts on receive use time correction to determine a stable timeout interval.
The timer module: As the timer module uses other built in functions which deliver corrected time, the timer module itself works with corrected time.
erlang:start_timer/3 and erlang:send_after/3: The timer BIFs work with corrected time, so that they will not fire prematurely or too late due to changes in the wall clock time.
All other functionality in the system where erlang:now/0 or any other time corrected functionality is used, will of course automatically benefit from it, as long as it’s not “optimized” to use some other time stamp function (like os:timestamp/0).

Modules like calendar and functions like erlang:localtime/0 use the wall clock time as it is currently set on the system. They will not use corrected time. However, if you use a now-value and convert it to local time, you will get a corrected local time value, which may or may not be what you want. Typically older code tends to use erlang:now/0 as a wall clock time, which is usually correct (at least when testing), but might surprise you when compared to other times in the system.

2.3 What is erlang:now/0 really?

erlang:now/0 is a function designed to serve multiple purposes (or a multi-headed beast if you’re a VM designer). It is expected to hold the following properties:

Monotonic: erlang:now() never jumps backwards; it always moves forward
Interval correct: The interval between two erlang:now() calls is expected to correspond to the correct time in real life (as defined by an atomic clock, or better)
Absolute correctness: The erlang:now/0 value should be possible to convert to an absolute and correct date-time, corresponding to the real world date and time (the wall clock)
System correspondence: The erlang:now/0 value converted to a date-time is expected to correspond to times given by other programs on the system (or by functions like os:timestamp/0)
Unique: No two calls to erlang:now on one Erlang node should return the same value
All these requirements are possible to uphold at the same time if (and only if):

The wall clock time of the system is perfect: The system (Operating System) time needs to be perfectly in sync with the actual time as defined by an atomic clock or a better time source. A good installation using NTP, and that is up to date before Erlang starts, will have properties that for most users and programs will be near indistinguishable from the perfect time. Note that any larger corrections to the time done by hand, or after Erlang has started, will partly (or temporarily) invalidate some of the properties, as the time is no longer perfect.
Less than one call per microsecond to erlang:now/0 is done: This means that at any microsecond interval in time, there can be no more than one call to erlang:now/0 in the system. However, for the system not to lose its properties completely, it’s enough that it on average is no more than one call per microsecond (in one Erlang node).
The uniqueness property of erlang:now/0 is the most limiting property. It means that erlang:now() maintains a global state and that there is a hard-to-check property of the system that needs to be maintained. For most applications this is still not a problem, but a future system might very well manage to violate the frequency limit on the calls globally. The uniqueness property is also quite useless, as there are globally unique references that provide a much better unique value to programs. However the property will need to be maintained unless a really subtle backward compatibility issue is to be introduced.

2.4 Should I use erlang:now/0 or os:timestamp/0

The simple answer is to use erlang:now/0 for everything where you want to keep real time characteristics, but use os:timestamp for things like logs, user communication and debugging (typically timer:ts uses os:timestamp, as it is a test tool, not a real world application API). The benefit of using os:timestamp/0 is that it’s faster and does not involve any global state (unless the operating system has one). The downside is that it will be vulnerable to wall clock time changes.

2.5 Turning off time correction

If, for some reason, time correction causes trouble and you are absolutely confident that the wall clock on the system is nearly perfect, you can turn off time correction completely by giving the +c option to erl. The probability for this being a good idea, is very low.

Have fun!

Categories: Erlang Exploration Tags: time correction

Application configuration files and hot upgrades

August 29th, 2013 Yu Feng Comments off

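As we have said before, Erlang organizes code, data, configuration and other information in units of applications (apps), bundling them together into one whole, a design very much like a Unix system. So where does an app's configuration live?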

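Configuration can be supplied in three ways (four, actually):
1. The env field in the .app file, usually MyApplication.app; see here for details
2. A .config file, usually sys.config; see here for details
3. The command line: erl -ApplName Par1 Val1 … ParN ValN; see here for details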

The key points are excerpted below:
Method 1:

7.8 Configuring an Application

An application can be configured using configuration parameters. These are a list of {Par, Val} tuples specified by a key env in the .app file.

{application, ch_app,
 [{description, "Channel allocator"},
  {vsn, "1"},
  {modules, [ch_app, ch_sup, ch3]},
  {registered, [ch3]},
  {applications, [kernel, stdlib, sasl]},
  {mod, {ch_app,[]}},
  {env, [{file, "/usr/local/log"}]}
 ]}.
Par should be an atom, Val is any term. The application can retrieve the value of a configuration parameter by calling application:get_env(App, Par) or a number of similar functions, see application(3)

Method 2:

A configuration file contains values for configuration parameters for the applications in the system. The erl command line argument -config Name tells the system to use data in the system configuration file Name.config.

Configuration parameter values in the configuration file will override the values in the application resource files (see app(4)). The values in the configuration file can be overridden by command line flags (see erl(1)).

The value of a configuration parameter is retrieved by calling application:get_env/1,2.

Method 3:

The values in the .app file, as well as the values in a system configuration file, can be overridden directly from the command line:

% erl -ApplName Par1 Val1 … ParN ValN

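All three methods make it easy to set an application's configuration. Because an application usually depends on many other applications, there ends up being a lot of configuration; I recommend the sys.config approach, which is also the standard way rebar organizes configuration files.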

Categories: Erlang Exploration, Source Code Analysis Tags: application, config_change, env

File-reading performance showdown: Erlang vs. other languages

August 28th, 2013 Yu Feng 26 comments

Baisui (百岁) wrote:

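Today my company held a technical contest. The task was to take a 1.1 GB text file and find the ten most frequent words. I spent two days writing the code in Erlang, but when I ran it on the company's 16-core machine, the comparison gave me quite a shock. The Erlang code took about 55 seconds, while my colleagues used Java, C and C++; the fastest, written in C, took only 2.6 seconds, and the Java version took just 3.2 seconds. By comparison, my Erlang code was a giant snail, far too slow.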

For details, see this article: http://www.iteye.com/topic/1131748

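Reading and analyzing files is something scripting languages such as Perl, Python and Ruby are routinely used for. This person's problem is a very common one; more than one person has reported that Erlang is slow at it.
Today let's revisit that view and let the data speak.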

First, let's prepare the file: 1 GB of completely random data:

$ dd if=/dev/urandom  of=test.dat count=1024 bs=1024K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 188.474 s, 5.7 MB/s
$ time dd if=test.dat of=/dev/null 
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 1.16021 s, 925 MB/s

real    0m1.162s
user    0m0.219s
sys     0m0.941s
$ time dd if=test.dat of=/dev/null bs=1024k
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.264298 s, 4.1 GB/s

real    0m0.266s
user    0m0.000s
sys     0m0.267s

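We prepared a file of about 1 GB. Since we use buffered I/O, once the file has been written it is fully cached in the page cache, so as long as there is enough memory, the test's performance is independent of the I/O device. Reading the file with dd at its default 512-byte block size (hence the 2097152 records) took 1.16 seconds, while a 1 MB block size took 0.26 seconds, a bandwidth of 4.1 GB/s, far beyond what a real device can deliver.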

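Now let's read the file with Erlang for comparison. There are three ways to do it:
1. Read the whole 1 GB file in one go.
2. Read it in a single thread, one block (say 1 MB) at a time, until the end.
3. Read it with multiple threads, each handling one large segment, 1 MB at a time.
Then compare their performance.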

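First, some background:
1. Erlang's file I/O is provided by the efile driver, which has an internal thread pool whose size is controlled by the +A flag, so I/O is performed by multiple threads.
2. Erlang files can be opened in two modes, raw mode and io mode. In raw mode, the driver hands data directly to the calling process; in io mode, data first goes through file_server for formatting before it reaches the calling process.
3. Data can be returned as a binary or as a list. In list mode, each byte of the file becomes an integer list element, and each list element costs two words (16 bytes) on a 64-bit machine: one for the integer and one for the link to the rest of the list.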

Categories: Erlang Exploration, Linux, Tuning Tags: dd, file, read, thread_pool_size
