port | 系统技术非业余研究

R16B Port Parallelism Explained

October 20th, 2013 Yu Feng

Original article. If you repost it, please credit the source: 系统技术非业余研究

One of the highlights of the R16B release was its port parallelism mechanism. The official release notes describe it as follows:

— Latency of signals sent from processes to ports — Signals
from processes to ports were previously always delivered
immediately. This kept latency for such communication to a
minimum, but it could cause lock contention which was very
expensive for the system as a whole. In order to keep this
latency low also in the future, most signals from processes
to ports are by default still delivered immediately as long
as no conflicts occur. Such conflicts include not being able
to acquire the port lock, but also include other conflicts.
When a conflict occurs, the signal will be scheduled for
delivery at a later time. A scheduled signal delivery may
cause a higher latency for this specific communication, but
improves the overall performance of the system since it
reduces lock contention between schedulers. The default
behavior of only scheduling delivery of these signals on
conflict can be changed by passing the +spp command line flag
to erl(1). The behavior can also be changed on port basis
using the parallelism option of the open_port/2 BIF.

Jeff Martin also published an article at QCon that specifically mentions this change; it is available in both English and Chinese.

So what exactly is R16B port parallelism? Put simply, it is this erl option:

+spp Bool
Set default scheduler hint for port parallelism. If set to true, the VM will schedule port tasks when it by this can improve the parallelism in the system. If set to false, the VM will try to perform port tasks immediately and by this improve latency at the expense of parallelism. If this flag has not been passed, the default scheduler hint for port parallelism is currently false. The default used can be inspected in runtime by calling erlang:system_info(port_parallelism). The default can be overridden on port creation by passing the parallelism option to open_port/2.

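What does it do? Every port has a lock that guarantees that messages sent to the port are handled in arrival order. When several processes send to the same port, later senders have to queue until the earlier messages have been processed. That is normal behavior, but Erlang's design philosophy is built on messages and asynchronous communication, and it is a waste for a process to spend its time standing in a queue. Hence the port parallelism mechanism: when a process finds that it would have to queue, it hands the message to the port scheduler and gets on with its own work. The message is asynchronous anyway, and the process trusts the port scheduler to deliver it. Having accepted the message, the port scheduler picks a suitable moment to schedule the port to carry out the actual task.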

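Here is a real-life analogy. Suppose I go to the post office to send a parcel by courier, say SF Express. After I hand it over I get a tracking number; SF Express will notify me about the parcel afterwards, and I can also use the number to check its status myself. When I arrive, I see that only one clerk is working the SF Express counter and the line of senders is already long. I have two choices: 1. Join the back of the line. 2. Ask another post office employee (a security guard, say, perhaps for a small tip) to take my parcel now and send it for me when the line is short, so that I can leave right away. It costs me a little more money, but I spend far less time on it, and I can easily earn the tip back.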

Port parallelism works on a similar principle. There are two ways to enable it:
1. Globally: erl +spp Bool
2. Per port: pass the {parallelism, true} option in PortSettings when calling open_port(PortName, PortSettings).
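
As a minimal sketch of the per-port switch, the module below opens two ports and reads back the hint in effect for each. The module name port_par_demo is hypothetical, and the sketch assumes a Unix-like system with cat on the PATH and OTP R16B or later.

```erlang
-module(port_par_demo).
-export([run/0]).

%% Opens two ports running the external program "cat" (assumed to be on
%% PATH): one inherits the VM-wide default, the other forces parallelism on.
run() ->
    %% VM-wide default hint, controlled by `erl +spp true|false`.
    Default = erlang:system_info(port_parallelism),
    P1 = open_port({spawn, "cat"}, [binary]),
    P2 = open_port({spawn, "cat"}, [binary, {parallelism, true}]),
    %% port_info/2 reports the hint actually in effect for each port.
    {parallelism, Par1} = erlang:port_info(P1, parallelism),
    {parallelism, Par2} = erlang:port_info(P2, parallelism),
    true = port_close(P1),
    true = port_close(P2),
    {Default, Par1, Par2}.
```

The first two elements of the result should agree with each other, since P1 simply inherits the default, while the third should be true regardless of how the VM was started.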

But everything has two sides. What do you need to watch out for once this option is turned on?

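Go back to the parcel example. If everyone did what I did and handed their parcel to the security guard, what would happen when it got crowded? Parcels would pile up around the guard, and his boss would certainly be unhappy when he saw it. So the guard has to limit the number of parcels he holds: beyond that limit he accepts no more. That limit is the scheduler's watermark. The SF Express clerk has a watermark too; could he cope if, say, everyone in Hangzhou came to send a parcel?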

So what are these two watermarks? My earlier article, "Analysis of the gen_tcp send buffer and watermark problem", explains them in detail. In brief:

1. The port's own watermark. For inet_tcp, for example:
#define INET_HIGH_WATERMARK (1024*8) /* 8k pending high => busy */
#define INET_LOW_WATERMARK (1024*4) /* 4k pending => allow more */

This watermark can be set with inet:setopts options:
{low_watermark, Size}
{high_watermark, Size} (TCP/IP sockets)

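2. The MSGQ high and low watermarks, which also default to 8K and 4K. The minimum value is 1 and there is no upper limit. There are options to set these as well: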
{high_msgq_watermark, Size}
{low_msgq_watermark, Size}
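
The four options above can be tried out on a throwaway loopback connection. This is only a sketch: the module name watermark_demo and the 64K/32K values are arbitrary choices for illustration, not recommendations.

```erlang
-module(watermark_demo).
-export([run/0]).

%% Sets both pairs of watermarks on a loopback TCP socket and reads them back.
run() ->
    Loopback = {127,0,0,1},
    {ok, L} = gen_tcp:listen(0, [binary, {active, false}, {ip, Loopback}]),
    {ok, PortNo} = inet:port(L),
    {ok, S} = gen_tcp:connect(Loopback, PortNo, [binary, {active, false}]),
    Wanted = [{high_watermark, 65536},       %% driver queue: mark port busy
              {low_watermark, 32768},        %% driver queue: accept more again
              {high_msgq_watermark, 65536},  %% port message queue: busy
              {low_msgq_watermark, 32768}],  %% port message queue: not busy
    ok = inet:setopts(S, Wanted),
    {ok, Got} = inet:getopts(S, [Key || {Key, _} <- Wanted]),
    ok = gen_tcp:close(S),
    ok = gen_tcp:close(L),
    Got.
```

Calling watermark_demo:run() should return the four values that were just set, which is a quick way to confirm that a given OTP release accepts the options.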

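That article also explains what "a signal delivery" is. A message sent to a port only matters once it has been passed on for processing, and that delivery is in fact call_driver_outputv, which invokes the port's own driver_outputv callback to do the real work. Put bluntly, port parallelism controls when call_driver_outputv is called: instead of always calling it directly, as before, the call is left to the port scheduler thread to make at a suitable time whenever conditions are not right for calling it immediately.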

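Summary: port parallelism can greatly increase the throughput of a VM that runs a large number of ports, which is a big help for port-intensive or network-intensive applications (gen_tcp is itself a port).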

Have fun!

Categories: Erlang Exploration, Source Code Analysis, Tuning Tags: +spp, parallelism, port, watermark

Index of Erlang gen_tcp Problems

May 14th, 2013 Yu Feng

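gen_tcp is the core module for building network applications in Erlang, and using it in practice raises many problems. I have compiled the problems my team and I have run into so that readers can find the right remedy for their case.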

The following blog posts relate to gen_tcp, TCP, and ports:

To be continued. Additions are welcome!

Have fun!

Categories: Erlang Exploration, Source Code Analysis Tags: gen_tcp, port, TCP

Factors That Severely Affect Erlang open_port Performance

November 22nd, 2011 Yu Feng

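An Erlang port is the system's I/O: it opens a channel from the Erlang world to the outside and makes it easy to run external programs. The performance of open_port matters a great deal to the system as a whole, so let me walk through the factors that affect it.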

First, the open_port documentation:

{spawn, Command}

Starts an external program. Command is the name of the external program which will be run. Command runs outside the Erlang work space unless an Erlang driver with the name Command is found. If found, that driver will be started. A driver runs in the Erlang workspace, which means that it is linked with the Erlang runtime system.

When starting external programs on Solaris, the system call vfork is used in preference to fork for performance reasons, although it has a history of being less robust. If there are problems with using vfork, setting the environment variable ERL_NO_VFORK to any value will cause fork to be used instead.

For external programs, the PATH is searched (or an equivalent method is used to find programs, depending on operating system). This is done by invoking the shell on certain platforms. The first space separated token of the command will be considered as the name of the executable (or driver). This (among other things) makes this option unsuitable for running programs having spaces in file or directory names. Use {spawn_executable, Command} instead if spaces in executable file names is desired.
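
To make the {spawn_executable, Command} alternative concrete, here is a small sketch that runs an external program with one argument that contains a space. The module name spawn_exec_demo is hypothetical, and an echo executable is assumed to exist on the PATH.

```erlang
-module(spawn_exec_demo).
-export([echo/1]).

%% Runs the external "echo" program with Arg as a single argument and
%% returns {ExitStatus, Lines}.
echo(Arg) ->
    Exe = os:find_executable("echo"),
    %% With spawn_executable the path is used as-is and each element of
    %% args is passed as one argument, so spaces are not split on.
    Port = open_port({spawn_executable, Exe},
                     [{args, [Arg]}, binary, exit_status, {line, 1024}]),
    collect(Port, []).

collect(Port, Acc) ->
    receive
        {Port, {data, {eol, Line}}}   -> collect(Port, [Line | Acc]);
        {Port, {data, {noeol, Line}}} -> collect(Port, [Line | Acc]);
        {Port, {exit_status, Status}} -> {Status, lists:reverse(Acc)}
    after 5000 ->
        catch port_close(Port),
        timeout
    end.
```

On a typical Unix system, spawn_exec_demo:echo("hello world") should come back with exit status 0 and a single line containing the argument unchanged.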

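Opening a port to an external program goes roughly like this: beam.smp first calls vfork; the child process runs the child_setup program, which does further cleanup; only when the cleanup is done is our external program actually exec'ed.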

Now let's look at the code that implements open_port:
Read more…

Categories: Erlang Exploration, Tuning Tags: ERL_NO_VFORK, open_port, port, vfork

New Erlang Option +zerts_de_busy_limit Controls the Volume of Inter-Node Traffic

September 21st, 2010 Yu Feng

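By default, Erlang nodes communicate over a TCP channel, and each pair of nodes shares a single TCP connection. All RPC traffic and built-in messages such as monitor signals travel over this channel. When the volume of data gets too large, the system raises a busy distribution port warning and throttles throughput. The default limit is 128 KB.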

This value can now be changed with erl +zerts_de_busy_limit size. From the documentation:
Set the value of erts_de_busy_limit. Larger values can help prevent busy distribution port system messages.
The default limit is 128 kilobytes.

If the system monitor reports busy_dist_port, try raising this value. Its lower bound is 4 KB.
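
One way to notice busy_dist_port events is to install a system monitor process, roughly as sketched below. The module name busy_dist_watch is hypothetical. Note that a node has only one system monitor, so this call replaces whatever settings were installed before.

```erlang
-module(busy_dist_watch).
-export([start/0]).

%% Starts a process that logs every busy_dist_port event on this node.
start() ->
    Pid = spawn(fun loop/0),
    %% Replaces any previously installed system monitor settings.
    _Old = erlang:system_monitor(Pid, [busy_dist_port]),
    Pid.

loop() ->
    receive
        {monitor, SusPid, busy_dist_port, Port} ->
            %% SusPid was suspended while sending through distribution port Port.
            error_logger:warning_msg("busy_dist_port: ~p suspended on ~p~n",
                                     [SusPid, Port]),
            loop();
        _Other ->
            loop()
    end.
```

If warnings show up frequently under normal load, that is a hint that the busy limit is too small for the traffic between the nodes.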

Categories: Erlang Exploration Tags: busy, dist, port, zerts_de_busy_limit

Fine-Tuning the Inter-Node Communication Channel

September 23rd, 2009 Yu Feng

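The carrier for communication between Erlang nodes is configurable; the default is inet_tcp. When two nodes need to talk, the net_kernel module sets up the necessary connection. inet_tcp calls the underlying gen_tcp to send and receive data, and RPC calls and inter-node messages all go out through this port.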

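Distributed nodes sometimes exchange a large volume of messages, and all of them pass in and out through this one port, so its performance has a huge effect on the efficiency of inter-node communication. We may therefore want to fine-tune the port's parameters to get the most out of it for a particular workload. But how do we get hold of the port?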

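%% Return the distribution port that carries traffic to Node. The owner entry
%% of net_kernel:node_info/1 is the process that controls the connection.
node_port(Node)->
    {_, Owner}=lists:keyfind(owner, 1, element(2, net_kernel:node_info(Node))),
    hd([P|| P<-erlang:ports(), erlang:port_info(P, connected) == {connected,Owner}]).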

With the port in hand, we can set the TCP port's watermarks, buffers, and so on:

inet:setopts(node_port('xx@nd-desktop'), [{high_watermark, 131072}]).

Also note that after a nodeup or nodedown the TCP connection may have been replaced, so remember to fetch the port again.
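
One possible way to automate that re-fetch is to subscribe to node events and reapply the options on every nodeup. The sketch below is an assumption about how this could be wired up, not something from the original post: the module name dist_port_tuner is hypothetical, and nodes that were already connected before start/1 is called are not handled.

```erlang
-module(dist_port_tuner).
-export([start/1, node_port/1]).

%% Starts a process that reapplies Opts to the distribution port of every
%% node that connects from now on, e.g. start([{high_watermark, 131072}]).
start(Opts) ->
    spawn(fun() ->
        ok = net_kernel:monitor_nodes(true),
        loop(Opts)
    end).

loop(Opts) ->
    receive
        {nodeup, Node} ->
            %% The connection is new, so the port must be looked up again.
            %% Failures (e.g. the node vanished already) are ignored.
            _ = (catch inet:setopts(node_port(Node), Opts)),
            loop(Opts);
        {nodedown, _Node} ->
            loop(Opts)
    end.

%% Same lookup as in the post: find the port connected to the owner process.
node_port(Node) ->
    {ok, Info} = net_kernel:node_info(Node),
    {owner, Owner} = lists:keyfind(owner, 1, Info),
    hd([P || P <- erlang:ports(),
             erlang:port_info(P, connected) =:= {connected, Owner}]).
```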

There is another way: change the behavior of all gen_tcp connections, for example:

erl -kernel inet_default_connect_options '[{sndbuf, 1048576}, {high_watermark, 131072}]'

But this has a very wide impact, because it also changes the parameters of ordinary TCP connections.

Categories: Erlang Exploration Tags: net_kernel, node, port

A High-Intensity Port (Pipe) Performance Test

September 13th, 2009 Yu Feng

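In my project, much of the computational logic is carried out by external programs. A message is first sent through a pipe to the external program; the external program reads it, processes it, and writes a reply; then the Erlang program reads the reply. This is a long chain that involves pipe reads and writes and context switches, so the overhead is significant. But exactly how large is it?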

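I designed a ring to find out. Each ring is made up of N links, and each link opens a port. When a link receives a number that is not 0, it sends the number to the external program, which echoes it back; on receiving the echo, the link decrements the number by 1 and passes it on. When the number reaches 0, the whole ring is torn down.
/* Note: the ulimit -n value below is very important. It affects three things in an Erlang program: 1. the size of the epoll handle set; 2. MAX_PORT and the size of the port table; 3. the number of file descriptors the child process closes in open_port. */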

root@nd-desktop:~/test# ulimit -n 1024
root@nd-desktop:~/test# cat pipe_ring.erl

-module(pipe_ring). 

-export([start/1]). 
-export([make_relay/1, run/3]). 

make_relay(Next)-> 
    Port = open_port({spawn, "/bin/cat"}, [in, out, {line, 128}]), 
    relay_loop(Next, Port). 

relay_loop(Next, Port) -> 
    receive 
        {Port, {data, {eol, Line}}} -> 
            Next ! (list_to_integer(Line) - 1), 
            relay_loop(Next, Port); 
        K when is_integer(K) andalso K > 0 -> 
            port_command(Port, integer_to_list(K) ++ "\n"), 
            relay_loop(Next, Port); 
        K when is_integer(K) andalso K =:=0 -> 
            port_close(Port), 
            Next ! K 
    end.

build_ring(K, Current, N, F) when N > 1 -> 
    build_ring(K, spawn(?MODULE, make_relay, [Current]), N - 1, F); 

build_ring(_, Current, _, F) -> 
    F(), 
    make_relay(Current). 

run(N, K, Par) -> 
    Parent = self(), 
    Cs = [spawn(fun ()-> Parent!run1(N, K, P) end) || P<-lists:seq(1, Par)], 
    [receive _-> ok end || _<-Cs]. 
    
run1(N, K, P)->
    T1 = now(),
    %% now_diff/2 is in microseconds, so the printed value is in milliseconds despite the "s" label.
    build_ring(K, self(), N,
               fun ()-> io:format("(ring~w setup time: ~ws)~n", [P, timer:now_diff(now(), T1) /1000]),
                        self() ! K end).

start(Args) -> 
    Args1 = [N, K, Par] = [list_to_integer(atom_to_list(X)) || X<-Args], 
    {Time, _} = timer:tc(?MODULE, run, Args1), 
    io:format("(total run (N:~w K:~w Par:~w) ~wms ~w/s)~n", [N, K, Par, round(Time/1000), round(K*Par*1000000/Time)]), 
    halt(0).

root@nd-desktop:~/test# erl +Bd -noshell +K true -smp disable -s pipe_ring start 10 100000 8 
(ring1 setup time: 0.021s) 
(ring2 setup time: 0.02s) 
(ring3 setup time: 0.019s) 
(ring4 setup time: 0.03s) 
(ring5 setup time: 0.018s) 
(ring6 setup time: 0.031s) 
(ring7 setup time: 0.027s) 
(ring8 setup time: 0.039s) 
(total run (N:10 K:100000 Par:8) 23158ms 34546/s)

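Meaning of the arguments:
N K Par
N: the number of links in a ring; each link opens one port
K: how many messages each ring passes around
Par: how many rings run at the same time

The total number of messages is K * Par.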

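We can see that the system handles about 34,000 messages per second on my two cores, which means each message costs roughly 30 us. Creating a port is not expensive either: about 1 ms each.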
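
The per-message figure follows directly from the totals printed by the run above. As a small worked check (the module name ring_cost is hypothetical, and the run time is taken from the rounded millisecond value in the output):

```erlang
-module(ring_cost).
-export([per_message_us/3, rate/3]).

%% Average cost of one message in microseconds, given the total run time
%% in milliseconds and K * Par messages overall.
per_message_us(K, Par, TotalMs) ->
    TotalMs * 1000 / (K * Par).

%% Messages handled per second for the same run.
rate(K, Par, TotalMs) ->
    K * Par * 1000 / TotalMs.
```

With K = 100000, Par = 8 and a run time of 23158 ms, per_message_us/3 gives roughly 28.9 us and rate/3 roughly 34.5K messages per second, in line with the figures quoted in the post.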

root@nd-desktop:~/test# dstat 
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- 
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
33  18  50   0   0   1|   0     0 | 438B 2172B|   0     0 |5329    33k 
42  11  48   0   0   0|   0     0 | 212B  404B|   0     0 |5729    58k 
41  11  49   0   0   0|   0     0 | 244B 1822B|   0     0 |5540    59k 
40  11  49   0   0   0|   0     0 | 304B  404B|   0     0 |4970    60k

Note that csw above reaches 60,000 per second.

root@nd-desktop:~/test# pstree 
├─sshd─┬─sshd─┬─bash───pstree 
     │      │      └─bash───man───pager 
     │      ├─sshd───bash─┬─beam─┬─80*[cat] 
     │      │             │      └─{beam} 
     │      │             └─emacs 
     │      ├─sshd───bash───emacs 
     │      └─sshd───bash───nmon

We are running 80 echo programs (/bin/cat).

Interested readers can use SystemTap to study the cost of pipe reads and writes and the context-switch behavior in detail; ask me if you would like the scripts.

root@nd-desktop:~# cat /proc/cpuinfo 
processor       : 1 
vendor_id       : GenuineIntel 
cpu family      : 6 
model           : 23 
model name      : Pentium(R) Dual-Core  CPU      E5200  @ 2.50GHz 
stepping        : 6 
cpu MHz         : 1200.000 
cache size      : 2048 KB 
physical id     : 0 
siblings        : 2 
core id         : 1 
cpu cores       : 2 
apicid          : 1 
initial apicid  : 1 
fdiv_bug        : no 
hlt_bug         : no 
f00f_bug        : no 
coma_bug        : no 
fpu             : yes 
fpu_exception   : yes 
cpuid level     : 10 
wp              : yes 
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl em 
bogomips        : 4987.44 
clflush size    : 64 
power management:

Conclusion: the overhead of this port-based architecture is acceptable.

Categories: Erlang Exploration Tags: pipe, port, ring, spawn