Tuning guide for the fastest Erlang HTTP hello world server (20K short-lived HTTP requests per second per desktop CPU)

November 4th, 2009

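The Erlang virtual machine comes in two flavors: plain and SMP. Because of locking overhead, the SMP flavor is much slower than the plain one. A 32-bit machine also touches less memory than a 64-bit one, so it is considerably faster as well. That is why I chose to tune this httpd server on a 32-bit Linux system. The server does one simple thing: it returns hello world to a browser. Below we first build our optimized VM, then benchmark the standard R13B02 release against our optimized build: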

root@nd-desktop:/build_opt_plain# uname -a
Linux nd-desktop 2.6.31-14-generic #3 SMP Sun Nov 1 23:03:10 CST 2009 i686 GNU/Linux

root@nd-desktop:/# apt-get build-dep erlang

# Download the OTP R13B02-1 source package
root@nd-desktop:/# wget

root@nd-desktop:/# tar xzvf build_opt_plain.tar.gz

root@nd-desktop:/# tar xzf otp_src_R13B02-1.tar.gz

root@nd-desktop:/# cd otp_src_R13B02-1
root@nd-desktop:/otp_src_R13B02-1# patch -p1 <../build_opt_plain/otp_src_R13B02-1_patch_by_yufeng
patching file erts/emulator/beam/erl_binary.h
patching file erts/emulator/beam/erl_process.c
patching file erts/emulator/beam/sys.h
patching file erts/emulator/drivers/common/inet_drv.c
patching file erts/preloaded/src/Makefile
patching file erts/preloaded/src/prim_inet.erl
patching file lib/asn1/src/Makefile
patching file lib/hipe/Makefile
patching file lib/parsetools/src/Makefile
root@nd-desktop:/otp_src_R13B02-1# ../build_opt_plain/build.plain

If the build finishes without any errors, we are done.

Now let's start the performance comparison:


root@nd-desktop:/otp_src_R13B02-1# cd ../build_opt_plain
root@nd-desktop:/build_opt_plain# ulimit -n 99999

root@nd-desktop:/build_opt_plain# erlc ehttpd.erl
root@nd-desktop:/build_opt_plain# taskset -c 1 erl +K true +h 99999  +P 99999 -smp enable +S 2:1 -s ehttpd
Erlang R13B03 (erts-5.7.4) [source] [smp:2:1] [rq:2] [async-threads:0] [hipe] [kernel-poll:true]

ehttpd ready with 2 schedulers on port 8888
Eshell V5.7.4  (abort with ^G)
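
The bracketed flags in the startup banner can also be read from inside a running node. The following is a minimal sketch, assuming a hypothetical module name `vm_info` that is not part of the original post. It only uses `erlang:system_info/1`; on recent OTP releases, which no longer ship a non-SMP emulator, `smp_support` is expected to always be true.

```erlang
-module(vm_info).
-export([summary/0]).

%% Collect the VM properties that matter for this benchmark.
%% vm_info is a hypothetical helper module, not part of ehttpd.
summary() ->
    [{smp_support, erlang:system_info(smp_support)},   % true on an SMP emulator
     {schedulers, erlang:system_info(schedulers)},     % number of scheduler threads
     {wordsize, erlang:system_info(wordsize)}].        % 4 on 32-bit, 8 on 64-bit
```

Calling `vm_info:summary().` in the shell of each build makes it easy to confirm which flavor is actually being benchmarked.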


[root@localhost src]# ab -c 60 -n 100000
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd,
Copyright 2006 The Apache Software Foundation,

Benchmarking (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Finished 100000 requests

Server Software:       
Server Hostname:
Server Port:            8888

Document Path:          /
Document Length:        12 bytes

Concurrency Level:      60
Time taken for tests:   8.925945 seconds
Complete requests:      100000
Failed requests:        0
Write errors:           0
Total transferred:      5100051 bytes
HTML transferred:       1200012 bytes
Requests per second:    11203.29 [#/sec] (mean)
Time per request:       5.356 [ms] (mean)
Time per request:       0.089 [ms] (mean, across all concurrent requests)
Transfer rate:          557.92 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1  65.7      0    3001
Processing:     0    3   1.5      4       7
Waiting:        0    2   1.8      4       6
Total:          0    4  65.8      4    3007
WARNING: The median and mean for the waiting time are not within a normal deviation
        These results are probably not that reliable.

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      4
  80%      4
  90%      5
  95%      5
  98%      5
  99%      5
100%   3007 (longest request)

Standard SMP build on one CPU: 11203.29 [#/sec] (mean)


root@nd-desktop:/build_opt_plain# erlc +native +"{hipe, [o3]}" ehttpd.erl
root@nd-desktop:/build_opt_plain# taskset -c 1 erl +K true +h 99999  +P 99999 -smp enable +S 2:1 -s ehttpd
Erlang R13B03 (erts-5.7.4) [source] [smp:2:1] [rq:2] [async-threads:0] [hipe] [kernel-poll:true]

ehttpd ready with 2 schedulers on port 8888
Eshell V5.7.4  (abort with ^G)

Standard SMP build with HiPE on one CPU: 12390.32 [#/sec] (mean)


root@nd-desktop:/build_opt_plain#  ../otp_src_R13B02-1/bin/erlc  ehttpd.erl
root@nd-desktop:/build_opt_plain# taskset -c 1   ../otp_src_R13B02-1/bin/erl +K true +h 99999  +P 99999   -s ehttpd
Erlang R13B02 (erts-5.7.3) [source] [rq:1] [hipe] [kernel-poll:true]

ehttpd ready with 1 schedulers on port 8888
Eshell V5.7.3  (abort with ^G)

Optimized build on one CPU: 19662.37 [#/sec] (mean)


root@nd-desktop:/build_opt_plain#  ../otp_src_R13B02-1/bin/erlc +native +"{hipe, [o3]}"  ehttpd.erl
root@nd-desktop:/build_opt_plain# taskset -c 1   ../otp_src_R13B02-1/bin/erl +K true +h 99999  +P 99999   -s ehttpd
Erlang R13B02 (erts-5.7.3) [source] [rq:1] [hipe] [kernel-poll:true]

ehttpd ready with 1 schedulers on port 8888
Eshell V5.7.3  (abort with ^G)

Optimized build with HiPE on one CPU: 20090.83 [#/sec] (mean)

Here is our minimal high-performance HTTP echo server:

root@nd-desktop:/build_opt_plain# cat ehttpd.erl

-module(ehttpd).
-export([start/0, start/1, accept/2, loop/1]).

start() ->
    start(8888).  %% port 8888, as shown in the startup message above
start(Port) ->
    N = erlang:system_info(schedulers),
    listen(Port, N),
    io:format("ehttpd ready with ~b schedulers on port ~b~n", [N, Port]),

    register(?MODULE, self()),
    receive Any -> io:format("~p~n", [Any]) end.  %% to stop: ehttpd!stop.

listen(Port, N) ->
    Opts = [{active, false},
            {backlog, 256},
            {packet, http_bin},
            {raw,6,9,<<1:32/native>>}, %% TCP_DEFER_ACCEPT on Linux (IPPROTO_TCP = 6, option 9)
            {reuseaddr, true}],

    {ok, S} = gen_tcp:listen(Port, Opts),
    %% One acceptor process per scheduler, all accepting on the same listen socket.
    Spawn = fun(I) ->
                    register(list_to_atom("acceptor_" ++ integer_to_list(I)),
                             spawn_opt(?MODULE, accept, [S, I], [link, {scheduler, I}]))
            end,
    lists:foreach(Spawn, lists:seq(1, N)).

accept(S, I) ->
    case gen_tcp:accept(S) of
        {ok, Socket} -> spawn_opt(?MODULE, loop, [Socket], [{scheduler, I}]);
        Error    -> erlang:error(Error)
    end,
    accept(S, I).

loop(S) ->
    case gen_tcp:recv(S, 0) of
        {ok, http_eoh} ->
            %% End of the request headers: send the fixed reply, then close (short-lived connection).
            Response = <<"HTTP/1.1 200 OK\r\nContent-Length: 12\r\n\r\nhello world!">>,
            gen_tcp:send(S, Response),
            gen_tcp:close(S);

        {ok, _Data} ->
            %% Request line or a single header: ignore it and keep reading.
            loop(S);

        Error ->
            Error
    end.


root@nd-desktop:/build_opt_plain# cat /proc/cpuinfo
model name      : Pentium(R) Dual-Core  CPU      E5200  @ 2.50GHz

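Note: this HTTP server spends almost all of its time in C code and executes very little Erlang code, so HiPE brings only a small gain here. For complex business logic it should help a lot.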


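Optimized build versus standard build: 20090 versus 11203 requests per second. That is an improvement of nearly 80%, which is quite significant.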
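
The percentages can be re-derived from the four ApacheBench figures. The following is a small sketch, assuming a hypothetical helper module `bench_ratio` that is not part of the original post.

```erlang
-module(bench_ratio).
-export([percent_gain/2]).

%% Relative throughput gain of New over Old, in percent.
%% Both arguments are requests per second as reported by ab.
percent_gain(New, Old) when Old > 0 ->
    (New - Old) / Old * 100.
```

`bench_ratio:percent_gain(20090.83, 11203.29)` comes out at roughly 79, in line with the "nearly 80%" figure. The gain from HiPE alone is much smaller: roughly 10.6 for the standard SMP build (12390.32 versus 11203.29) and roughly 2.2 for the optimized build (20090.83 versus 19662.37).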


Repost: CPU-intensive computation, Erlang versus C showdown

August 30th, 2009

Normally Erlang compiles to bytecode (is that what it is called in Erlang?). The nice thing about this is that the beam files can easily be used on other machines. But the speed of that code did not convince me, so I tried out how good the native code that Erlang generates is.

The setup is simple: I wrote a plain recursive function that computes Fibonacci numbers. Fibonacci of 40 is then computed 5 times and the time is measured. All of this runs on a single core. I run the test 3 times in total: once with native Erlang code, once with non-native Erlang code, and once with a program written in C. The benchmark consists of three files:



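%% cpu_intensive.erl
-module(cpu_intensive).
-export([fib_test/0]).

fib_test() ->
    fib(40), fib(40), fib(40), fib(40), fib(40).

%% Naive doubly recursive Fibonacci, with fib(0) = fib(1) = 1.
fib(0) -> 1;
fib(1) -> 1;
fib(N) -> fib(N-1) + fib(N-2).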


/* cpu_intensive.c */
unsigned int fib(unsigned int n) {
    if (n == 0 || n == 1) {
        return 1;
    }
    return fib(n-1) + fib(n-2);
}

int main() {
    fib(40); fib(40); fib(40); fib(40); fib(40);
    return 0;
}

# Makefile
all: native normal c

native:
	@erlc +native cpu_intensive.erl
	@echo ""
	@echo "Fibonacci Erlang native code"
	@time erl -noshell -s cpu_intensive fib_test -s erlang halt

normal:
	@erlc cpu_intensive.erl
	@echo ""
	@echo "Fibonacci Erlang non-native code"
	@time erl -noshell -s cpu_intensive fib_test -s erlang halt

c:
	@gcc -O0 -o cpu_intensive cpu_intensive.c
	@echo ""
	@echo "Fibonacci written in C without optimizations"
	@time ./cpu_intensive

I created the three files above and ran the Makefile. On my Core 2 Duo 8400 the result was:

Fibonacci Erlang native code
13,99 real        13,00 user         0,95 sys

Fibonacci Erlang non-native code
116,81 real       115,46 user         1,00 sys

Fibonacci written in C without optimizations
11,14 real        11,10 user         0,00 sys
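
The `time` figures above include starting and stopping the VM. One possible way to time only the computation, from inside the node, is `timer:tc/3`. This is a sketch with a hypothetical module name `fib_timing`, not the measurement method used in the post.

```erlang
-module(fib_timing).
-export([run/2, time_fib/2]).

%% Same definition as in the benchmark: fib(0) = fib(1) = 1.
fib(0) -> 1;
fib(1) -> 1;
fib(N) when N > 1 -> fib(N - 1) + fib(N - 2).

%% Compute fib(N) Runs times and return the list of results.
run(N, Runs) ->
    [fib(N) || _ <- lists:seq(1, Runs)].

%% Return {ElapsedMicroseconds, FibOfN}, timing all runs together.
time_fib(N, Runs) when Runs > 0 ->
    {Micros, [Result | _]} = timer:tc(?MODULE, run, [N, Runs]),
    {Micros, Result}.
```

`fib_timing:time_fib(40, 5)` mirrors the workload of `fib_test/0`. Compiling the module once with plain `erlc` and once with `erlc +native` (on a release that has HiPE) would give the two Erlang data points without the VM startup cost.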

August 23rd, 2009

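The previous article described how to enable Erlang HiPE support. However, the standard libraries that user programs depend on heavily, such as stdlib and kernel, are not compiled in native mode by default, so even with HiPE enabled our program is only partly native. Tools such as oprofile show that the program still spends its time spinning inside process_main (where the VM interprets bytecode). Let's go all the way and compile everything with HiPE.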

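1. When building otp_src, export ERL_COMPILE_FLAGS='+native +"{hipe, [o3]}"'. The problem with this approach is that native code is tied to the emulator flavor: beam and beam.smp need different native code, yet all emulators share a single set of libraries, so one of the two has to be given up. That makes this approach rather cumbersome.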

# erl
Erlang R13B01 (erts-5.7.2) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.7.2  (abort with ^G)
1>  %% no problem

#erl -smp disable
<HiPE (v 3.7.2)> Warning: not loading native code for module fib: it was compiled for an incompatible runtime system; please regenerate native code for this runtime system
Erlang R13B01 (erts-5.7.2) [source] [64-bit] [rq:1] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.2  (abort with ^G)

This can also be worked around with alias erl='erl -smp disable', which tricks the compiler into generating beam files for single-CPU mode.

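2. Dynamic compilation: once the system is up and running, recompile the relevant modules on the fly. This idea looks like the simplest one.

I built a prototype that shows this is feasible.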

# cat hi.erl

-module(hi).
-export([do/0]).

%% Recompile every loaded module (except the preloaded ones) to native code.
do() ->
    [ turn(M, P) || {M, P} <- code:all_loaded(), P =/= preloaded ].

turn(M, P) ->
    %% Map .../ebin/foo.beam to the source path .../src/foo.
    P1 = binary_to_list(iolist_to_binary(re:replace(filename:join(filename:dirname(P), filename:basename(P, ".beam")), "ebin", "src"))),
    L = M:module_info(),
    COpts = get_compile_options(L),

    %% Rewrite string-valued options such as {cwd, Dir} so they point at the local source tree.
    COpts1 = lists:foldr(fun({K, V}, Acc) when is_list(V) and is_integer(hd(V)) -> [{K, tr(V)}] ++ Acc ; (Skip, Acc) -> Acc ++ [Skip] end, [], COpts),
    c:c(P1, COpts1 ++ [native, "{hipe, [o3]}"]).

tr(P) ->
    binary_to_list(iolist_to_binary(re:replace(P, "/net/isildur/ldisk/daily_build/otp_prebuild_r13b01.2009-06-07_20/", "/home/yufeng/"))).  %% adjust this to your own setup; see the output of m(lists) for the details.

get_compile_options(L) ->
    case get_compile_info(L, options) of
        {ok,Val} -> Val;
        error -> []
    end.

get_compile_info(L, Tag) ->
    case lists:keysearch(compile, 1, L) of
        {value, {compile, I}} ->
            case lists:keysearch(Tag, 1, I) of
                {value, {Tag, Val}} -> {ok,Val};
                false -> error
            end;
        false -> error
    end.

#erl -nostick
Erlang R13B01 (erts-5.7.2) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.2  (abort with ^G)
1> mnesia:start().  % start our application
2> hi:do().

3> m(dict).
Module dict compiled: Date: August 23 2009, Time: 17.20
Compiler options:  [{cwd,"/home/yufeng/otp_src_R13B01/lib/stdlib/src"},
debug_info,native,"{hipe, [o3]}"]
Object file: /home/yufeng/otp_src_R13B01/lib/stdlib/src/../ebin/dict.beam



A few modules did run into problems during compilation, though, so this still needs some improvement.
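
The option lookup in the prototype can be written more compactly with `Module:module_info(compile)` and `proplists`. The following is a minimal sketch, assuming a hypothetical module name `compile_opts`. It only reads the options recorded at compile time and does not recompile anything.

```erlang
-module(compile_opts).
-export([options/1, is_native_requested/1]).

%% Compiler options recorded in a loaded module, or [] if none were stored.
options(Module) when is_atom(Module) ->
    proplists:get_value(options, Module:module_info(compile), []).

%% True if the module was compiled with the `native` option.
%% This checks the recorded option only, not whether native code was loaded.
is_native_requested(Module) ->
    lists:member(native, options(Module)).
```

After a run of `hi:do()`, calling `compile_opts:is_native_requested(dict)` would be one quick way to spot modules that were skipped, without reading the full `m(dict)` output.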
