
New {active, N} socket option in the inet driver

Original article. When reposting, please credit the source: 系统技术非业余研究


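Network servers written in Erlang perform very well. A typical server such as a proxy can handle 400,000 packets in and out, with connections numbering in the tens of thousands. Much of this capability comes from the underlying epoll implementation. When gen_tcp receives a complete packet from the kernel protocol stack, there are three ways it can notify us; see the inet:setopts documentation: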

{active, true | false | once}
If the value is true, which is the default, everything received from the socket will be sent as messages to the receiving process. If the value is false (passive mode), the process must explicitly receive incoming data by calling gen_tcp:recv/2,3 or gen_udp:recv/2,3 (depending on the type of socket).

If the value is once ({active, once}), one data message from the socket will be sent to the process. To receive one more message, setopts/2 must be called again with the {active, once} option.

When using {active, once}, the socket changes behaviour automatically when data is received. This can sometimes be confusing in combination with connection oriented sockets (i.e. gen_tcp) as a socket with {active, false} behaviour reports closing differently than a socket with {active, true} behaviour. To make programming easier, a socket where the peer closed and this was detected while in {active, false} mode, will still generate the message {tcp_closed,Socket} when set to {active, once} or {active, true} mode. It is therefore safe to assume that the message {tcp_closed,Socket}, possibly followed by socket port termination (depending on the exit_on_close option) will eventually appear when a socket changes back and forth between {active, true} and {active, false} mode. However, when peer closing is detected is all up to the underlying TCP/IP stack and protocol.

Note that {active,true} mode provides no flow control; a fast sender could easily overflow the receiver with incoming messages. Use active mode only if your high-level protocol provides its own flow control (for instance, acknowledging received messages) or the amount of data exchanged is small. {active,false} mode or use of the {active, once} mode provides flow control; the other side will not be able send faster than the receiver can read.
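As a minimal sketch of the difference between passive mode and {active, once}, the two modes can be written side by side. The module name active_modes_sketch, the function names, and the 5-second timeout are assumptions made for illustration; only inet:setopts/2, gen_tcp:recv/2, and the message shapes come from the documentation quoted above.

```erlang
-module(active_modes_sketch).
-export([recv_passive/1, recv_once/1]).

%% Passive mode: the process pulls data explicitly with gen_tcp:recv/2.
recv_passive(Socket) ->
    ok = inet:setopts(Socket, [{active, false}]),
    %% Length 0 means "return whatever bytes are available".
    case gen_tcp:recv(Socket, 0) of
        {ok, Data}      -> {data, Data};
        {error, closed} -> closed;
        {error, Reason} -> {error, Reason}
    end.

%% {active, once}: one data message is delivered to the controlling
%% process, then the socket behaves as passive until setopts/2 is
%% called again.
recv_once(Socket) ->
    ok = inet:setopts(Socket, [{active, once}]),
    receive
        {tcp, Socket, Data}         -> {data, Data};
        {tcp_closed, Socket}        -> closed;
        {tcp_error, Socket, Reason} -> {error, Reason}
    after 5000 ->                       % hypothetical timeout for the sketch
        {error, timeout}
    end.
```

Both functions must be called from the socket's controlling process, since active-mode messages are only delivered there.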

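The most efficient mode is of course {active, true}, because each connection needs only a single epoll_ctl call to register the socket's read event. But this mode has a fatal flaw. Incoming packets are delivered as messages, entirely asynchronously. Under normal conditions this is fine, but a service exposed to the Internet is at serious risk: under attack, when the peer sends a flood of packets, our system receives a flood of messages asynchronously, possibly more than our process can handle. Worst of all, we have no way to stop the packets from coming, and the server eventually crashes for lack of memory.

So in practice we use {active, once} to control the rate at which packets are received. That avoids the safety problem but creates a performance problem: every {active, once} setting means one epoll_ctl call. If you strace the program you will see a large number of epoll_ctl calls, roughly as many per second as the QPS. Another issue makes the degradation worse: Erlang has only one thread reaping epoll_wait events, so if a large number of epoll_ctl calls hold up event reaping, network throughput drops sharply. Future releases are officially planned to support reaping with multiple threads, but that is not available yet.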

So the question is how to balance performance and safety. Erlang comes to the rescue; see this change:

inet driver add {active,N} socket option for TCP, UDP, and SCTP

This feature is available as of Erlang/OTP 17.0 (R17); it did not ship in R16B03.

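The idea behind the fix is simple:
{active, true} is unsafe and {active, once} is too slow, so with {active, N} we arm the socket once to receive N messages, amortizing the cost of epoll_ctl and greatly relieving the performance pressure.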
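To put a rough number on the amortization, here is a small sketch under the article's cost model, in which every re-arm via setopts costs one epoll_ctl call. The module and function names are hypothetical.

```erlang
-module(rearm_cost_sketch).
-export([rearm_calls/2]).

%% Number of setopts/2 re-arm calls needed to receive Packets messages
%% when each call admits N messages. N = 1 models {active, once}.
rearm_calls(Packets, N) when is_integer(Packets), Packets >= 0,
                             is_integer(N), N > 0 ->
    (Packets + N - 1) div N.   % integer ceiling of Packets / N
```

For 400,000 packets, rearm_calls(400000, 1) gives 400000 calls, while rearm_calls(400000, 100) gives 4000, a hundredfold reduction under this model.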

The official documentation says:

Note that {active, true} mode provides no flow control; a fast sender could easily overflow the receiver with incoming messages. The same is true of {active, N} mode while the message count is greater than zero. Use active mode only if your high-level protocol provides its own flow control (for instance, acknowledging received messages) or the amount of data exchanged is small. {active, false} mode, use of the {active, once} mode or {active, N} mode with values of N appropriate for the application provides flow control; the other side will not be able send faster than the receiver can read.

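Once N packets have been received, we get a {tcp_passive, Socket} message telling us that the socket has entered passive mode. We then need to set active again to re-register the epoll event: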

If the socket is in {active, N} mode (see inet:setopts/2 for details) and its message counter drops to 0, the following message is delivered to indicate that the socket has transitioned to passive ({active, false}) mode:
{tcp_passive, Socket}
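The re-arming described above can be sketched as a receive loop. This is an illustration under stated assumptions, not the code from the test suite: the module name active_n_sketch, the collect/3 function, its return shape, and the timeout parameter are all hypothetical. It requires an Erlang/OTP release that supports {active, N}.

```erlang
-module(active_n_sketch).
-export([collect/3]).

%% Receive from Socket in {active, N} mode until the peer closes,
%% re-arming the socket each time {tcp_passive, Socket} arrives.
%% Returns {ok, Data, Rearms}, where Rearms counts the re-arm calls.
collect(Socket, N, Timeout) when is_integer(N), N > 0, N =< 32767 ->
    ok = inet:setopts(Socket, [{active, N}]),
    loop(Socket, N, Timeout, [], 0).

loop(Socket, N, Timeout, Acc, Rearms) ->
    receive
        {tcp, Socket, Data} ->
            loop(Socket, N, Timeout, [Data | Acc], Rearms);
        {tcp_passive, Socket} ->
            %% The message counter dropped to 0: the socket is passive
            %% now, so arm it for another N messages.
            ok = inet:setopts(Socket, [{active, N}]),
            loop(Socket, N, Timeout, Acc, Rearms + 1);
        {tcp_closed, Socket} ->
            {ok, iolist_to_binary(lists:reverse(Acc)), Rearms};
        {tcp_error, Socket, Reason} ->
            {error, Reason}
    after Timeout ->
        {error, timeout}
    end.
```

A real server would process each packet between messages and could decide at the {tcp_passive, Socket} point whether to re-arm immediately or to wait until its backlog has drained, which is where the flow control comes from.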

For detailed usage, see lib/kernel/test/gen_tcp_misc_SUITE.erl, which makes it very clear.
Summary: the {active, N} option greatly improves network throughput.
Have fun!


  1. piboyeliu
    November 4th, 2013 at 14:52 | #1

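    Good stuff. We used to rely on {active, once} to defend against attacks, but it clogs the pipeline and makes it synchronous, which slows processing down. {active, N} keeps the pipeline asynchronous and is safe as well.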


  2. kevin
    November 9th, 2013 at 13:57 | #2

    In a real project, choosing the value of N takes some care; some guidance would be even better :-)


    Yu Feng Reply:

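    N depends on the actual workload. If safety is not a concern, it can be very large. Our approach is to watch memory and resource consumption: while things look normal, N grows linearly; when something abnormal shows up, N drops exponentially.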
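The adjustment policy described in this reply (linear growth, exponential back-off) might look like the following sketch. The reply gives no concrete numbers, so the step size, the halving factor, the health atoms, and the module name are all assumptions; the upper bound 32767 is the largest value {active, N} accepts.

```erlang
-module(adaptive_n_sketch).
-export([next_n/2]).

-define(MIN_N, 1).
-define(MAX_N, 32767).   % largest N accepted by {active, N}
-define(STEP, 10).       % hypothetical additive step

%% Grow N linearly while resource usage looks normal,
%% and halve it when something abnormal is observed.
next_n(N, healthy)    -> min(N + ?STEP, ?MAX_N);
next_n(N, overloaded) -> max(N div 2, ?MIN_N).
```

How "healthy" and "overloaded" are decided (for example from process memory or message queue length) is left open here, as it is in the reply.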


  3. sobuh
    December 12th, 2013 at 17:48 | #3

    R16B03 is out, but this option still doesn't seem to be usable.


    bighunter Reply:

    R17 has it.

