
Factors That Severely Affect Erlang open_port Performance

November 22nd, 2011

Original article. When reposting, please credit: reposted from 系统技术非业余研究 (System Technology Non-Amateur Research)

Permalink: Factors That Severely Affect Erlang open_port Performance

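An Erlang port is essentially the system's IO: it opens a channel from the Erlang world to the outside and makes it easy to run external programs. The performance of open_port, however, matters a great deal to the system as a whole, so let me walk you through the factors that affect it.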

First, let's look at the open_port documentation:

{spawn, Command}

Starts an external program. Command is the name of the external program which will be run. Command runs outside the Erlang work space unless an Erlang driver with the name Command is found. If found, that driver will be started. A driver runs in the Erlang workspace, which means that it is linked with the Erlang runtime system.

When starting external programs on Solaris, the system call vfork is used in preference to fork for performance reasons, although it has a history of being less robust. If there are problems with using vfork, setting the environment variable ERL_NO_VFORK to any value will cause fork to be used instead.

For external programs, the PATH is searched (or an equivalent method is used to find programs, depending on operating system). This is done by invoking the shell on certain platforms. The first space separated token of the command will be considered as the name of the executable (or driver). This (among other things) makes this option unsuitable for running programs having spaces in file or directory names. Use {spawn_executable, Command} instead if spaces in executable file names are desired.

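When open_port starts an external program, the flow is roughly as follows: beam.smp first calls vfork, and the child execs the child_setup program, which does further cleanup. Only after that cleanup does it actually exec our external program.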
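To make the flow concrete, here is a minimal C sketch of the vfork-then-execve pattern, assuming a POSIX system. It is a simplification, not OTP's code: `spawn_via_vfork` is a hypothetical name, and `/bin/sh -c` stands in for child_setup, so none of child_setup's cleanup happens here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Sketch only: vfork(), then the child immediately execve()s an
 * intermediate program. Here that program is /bin/sh -c CMD (a stand-in
 * for child_setup). Returns the child's exit status, or -1 on error. */
int spawn_via_vfork(const char *cmd)
{
    assert(cmd != NULL);
    /* Build argv before vfork(): the child may only execve() or _exit(). */
    char *argv[] = { "sh", "-c", (char *)cmd, NULL };
    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve("/bin/sh", argv, environ);
        _exit(127);              /* only reached if execve() failed */
    }
    /* The parent resumes here only after the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `spawn_via_vfork("exit 3")` returns 3 once the shell has run.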

Now let's look at the code that implements open_port:

// sys.c:L1352
static ErlDrvData spawn_start(ErlDrvPort port_num, char* name, SysDriverOpts* opts)
{
...
#if !DISABLE_VFORK
    int no_vfork;
    size_t no_vfork_sz = sizeof(no_vfork);

    no_vfork = (erts_sys_getenv("ERL_NO_VFORK",
                                (char *) &no_vfork,
                                &no_vfork_sz) >= 0);
#endif
...
else { /* Use vfork() */
        char **cs_argv= erts_alloc(ERTS_ALC_T_TMP,(CS_ARGV_NO_OF_ARGS + 1)*
                                   sizeof(char *));
        char fd_close_range[44];                  /* 44 bytes are enough to  */
        char dup2_op[CS_ARGV_NO_OF_DUP2_OPS][44]; /* hold any "%d:%d" string */
                                                  /* on a 64-bit machine.    */

        /* Setup argv[] for the child setup program (implemented in
           erl_child_setup.c) */
        i = 0;
        if (opts->use_stdio) {
            if (opts->read_write & DO_READ){
                /* stdout for process */
                sprintf(&dup2_op[i++][0], "%d:%d", ifd[1], 1);
                if(opts->redir_stderr)
                    /* stderr for process */
                    sprintf(&dup2_op[i++][0], "%d:%d", ifd[1], 2);
            }
            if (opts->read_write & DO_WRITE)
                /* stdin for process */
                sprintf(&dup2_op[i++][0], "%d:%d", ofd[0], 0);
        } else {        /* XXX will fail if ofd[0] == 4 (unlikely..) */
            if (opts->read_write & DO_READ)
                sprintf(&dup2_op[i++][0], "%d:%d", ifd[1], 4);
            if (opts->read_write & DO_WRITE)
                sprintf(&dup2_op[i++][0], "%d:%d", ofd[0], 3);
        }
        for (; i < CS_ARGV_NO_OF_DUP2_OPS; i++)
            strcpy(&dup2_op[i][0], "-");
        sprintf(fd_close_range, "%d:%d", opts->use_stdio ? 3 : 5, max_files-1);

        cs_argv[CS_ARGV_PROGNAME_IX] = child_setup_prog;
        cs_argv[CS_ARGV_WD_IX] = opts->wd ? opts->wd : ".";
        cs_argv[CS_ARGV_UNBIND_IX] = erts_sched_bind_atvfork_child(unbind);
        cs_argv[CS_ARGV_FD_CR_IX] = fd_close_range;
        for (i = 0; i < CS_ARGV_NO_OF_DUP2_OPS; i++)
            cs_argv[CS_ARGV_DUP2_OP_IX(i)] = &dup2_op[i][0];
        if (opts->spawn_type == ERTS_SPAWN_EXECUTABLE) {
            int num = 0;
            int j = 0;
            if (opts->argv != NULL) {
                for(; opts->argv[num] != NULL; ++num)
                    ;
            }
            cs_argv = erts_realloc(ERTS_ALC_T_TMP,cs_argv, (CS_ARGV_NO_OF_ARGS + 1 + num + 1) * sizeof(char *));
            cs_argv[CS_ARGV_CMD_IX] = "-";
            cs_argv[CS_ARGV_NO_OF_ARGS] = cmd_line;
            if (opts->argv != NULL) {
                for (;opts->argv[j] != NULL; ++j) {
                    if (opts->argv[j] == erts_default_arg0) {
                        cs_argv[CS_ARGV_NO_OF_ARGS + 1 + j] = cmd_line;
                    } else {
                        cs_argv[CS_ARGV_NO_OF_ARGS + 1 + j] = opts->argv[j];
                    }
                }
            }
            cs_argv[CS_ARGV_NO_OF_ARGS + 1 + j] = NULL;
        } else {
            cs_argv[CS_ARGV_CMD_IX] = cmd_line; /* Command */
            cs_argv[CS_ARGV_NO_OF_ARGS] = NULL;  
        }
        DEBUGF(("Using vfork\n"));
        pid = vfork();

        if (pid == 0) {
            /* The child! */

            /* Observe!
             * OTP-4389: The child setup program (implemented in
             * erl_child_setup.c) will perform the necessary setup of the
             * child before it execs to the user program. This because
             * vfork() only allow an *immediate* execve() or _exit() in the
             * child.
             */
            execve(child_setup_prog, cs_argv, new_environ);
            _exit(1);
        }
        erts_free(ERTS_ALC_T_TMP,cs_argv);
...
}
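The `fd_close_range` string built above (it shows up as `"3:327679"` in the trace later in this post) is just `first_fd:max_files-1`. The sketch below reproduces that formatting. `format_fd_close_range` is a hypothetical helper, and taking max_files from the soft `RLIMIT_NOFILE` is an assumption, since sys.c computes max_files elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Sketch: format "FIRST:LAST" the way the sprintf() above does.
 * With use_stdio the child keeps fds 0-2 (range starts at 3); with
 * nouse_stdio it also keeps fds 3 and 4, so the range starts at 5.
 * ASSUMPTION: max_files is the soft RLIMIT_NOFILE of this process.
 * Returns 0 on success, -1 on error or if buf is too small. */
int format_fd_close_range(char *buf, size_t len, int use_stdio)
{
    struct rlimit rl;
    assert(buf != NULL);
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return -1;
    long max_files = (long)rl.rlim_cur;
    int n = snprintf(buf, len, "%d:%ld", use_stdio ? 3 : 5, max_files - 1);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

The second number grows with the fd limit, not with the number of descriptors actually open.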

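On systems that support vfork, such as Linux, vfork is used by default (unless disabled) to run child_setup, which in turn launches the external program.
Here is the vfork documentation: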

vfork() differs from fork() in that the parent is suspended until the child makes a call to execve(2) or _exit(2). The child shares all memory
with its parent, including the stack, until execve() is issued by the child. The child must not return from the current function or call
exit(), but may call _exit().

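During vfork, beam.smp is blocked until the child calls execve(). On Linux, strictly speaking, only the calling thread (typically the Erlang scheduler that executed open_port) is suspended, but that scheduler can do no other work in the meantime. This makes vfork an important performance factor.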
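One way to see the difference between the two calls is to time how long the parent is held inside each of them. This is a rough Linux sketch, not the measurement used in the experiment below; `time_spawn_ns` is a hypothetical helper and `/bin/true` is assumed to exist.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Returns how long the calling thread was inside vfork()/fork() before
 * getting control back, in nanoseconds, or -1 on error. The child just
 * execs /bin/true. With vfork() this includes waiting for the child's
 * execve(); with fork() the parent returns once the child is created. */
int64_t time_spawn_ns(int use_vfork)
{
    char *argv[] = { "true", NULL };
    char *envp[] = { NULL };
    pid_t pid;
    int64_t t0 = now_ns();
    if (use_vfork)
        pid = vfork();
    else
        pid = fork();
    if (pid == 0) {
        execve("/bin/true", argv, envp);
        _exit(127);              /* only reached if execve() failed */
    }
    int64_t elapsed = now_ns() - t0;
    if (pid < 0)
        return -1;
    assert(elapsed >= 0);        /* CLOCK_MONOTONIC never goes backwards */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return elapsed;
}
```

Call it with 1 for the vfork case and 0 for the fork case; absolute numbers depend heavily on the size of the calling process.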

Now let's look at the code in erl_child_setup.c:

// erl_child_setup.c:111
// 1. Unbind from the CPU
if (strcmp("false", argv[CS_ARGV_UNBIND_IX]) != 0)
    if (erts_unbind_from_cpu_str(argv[CS_ARGV_UNBIND_IX]) != 0)
        return 1;
// 2. Duplicate file descriptors
for (i = 0; i < CS_ARGV_NO_OF_DUP2_OPS; i++) {
    if (argv[CS_ARGV_DUP2_OP_IX(i)][0] == '-'
        && argv[CS_ARGV_DUP2_OP_IX(i)][1] == '\0')
        break;
    if (sscanf(argv[CS_ARGV_DUP2_OP_IX(i)], "%d:%d", &from, &to) != 2)
        return 1;
    if (dup2(from, to) < 0)
        return 1;
}
// 3. Close file descriptors
if (sscanf(argv[CS_ARGV_FD_CR_IX], "%d:%d", &from, &to) != 2)
    return 1;
for (i = from; i <= to; i++)
    (void) close(i);

// 4. Exec the external program
if (erts_spawn_executable) {
    if (argv[CS_ARGV_NO_OF_ARGS + 1] == NULL) {
        execl(argv[CS_ARGV_NO_OF_ARGS], argv[CS_ARGV_NO_OF_ARGS],
              (char *) NULL);
    } else {
        execv(argv[CS_ARGV_NO_OF_ARGS], &(argv[CS_ARGV_NO_OF_ARGS + 1]));
    }
} else {
    execl("/bin/sh", "sh", "-c", argv[CS_ARGV_CMD_IX], (char *) NULL);
}
...
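Step 2 is a small parser over the `"from:to"` strings that sys.c produced with sprintf, terminated by a `"-"` entry. Below is a standalone sketch of that loop; `apply_dup2_ops` is a hypothetical name, and it returns -1 where child_setup would return 1 from main.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch of step 2: each op is either "FROM:TO", meaning dup2(FROM, TO),
 * or "-", which ends the list (later entries are ignored).
 * Returns 0 on success, -1 on a malformed op or a failed dup2(). */
int apply_dup2_ops(char *const ops[], int n_ops)
{
    for (int i = 0; i < n_ops; i++) {
        int from, to;
        assert(ops[i] != NULL);
        if (ops[i][0] == '-' && ops[i][1] == '\0')
            break;                          /* "-" terminates the list */
        if (sscanf(ops[i], "%d:%d", &from, &to) != 2)
            return -1;
        if (dup2(from, to) < 0)
            return -1;
    }
    return 0;
}
```

In the trace below, `"2312:1" "2313:0" "-"` means: make fd 2312 the child's stdout and fd 2313 its stdin.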

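This is a process with many steps, and steps 1, 2 and 3 all take time. Step 3 is the worst: it calls close() on every descriptor number from 3 (or 5) up to max_files-1, whether or not that descriptor is actually open. A busy IO server typically runs with a very high file-descriptor limit, often hundreds of thousands, and issuing that many close() calls is a disaster.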
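The cost of step 3 comes from one close() per descriptor number, not per open descriptor. A common alternative on Linux, shown here as a sketch and not as what OTP R14 does, is to read `/proc/self/fd` and close only descriptors that are actually open; Linux 5.9 and later also offer the close_range(2) system call for this. Because child_setup runs as its own program after execve(), it is free to allocate memory like this. `close_open_fds_in_range` is a hypothetical helper.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Linux-only sketch: close every *open* fd in [from, to] by listing
 * /proc/self/fd instead of calling close() on every number in the range.
 * Returns the number of fds closed, or -1 if the listing fails (a caller
 * would then fall back to the brute-force loop). */
int close_open_fds_in_range(int from, int to)
{
    assert(from >= 0 && from <= to);
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;
    int dir_fd = dirfd(dir);
    int *fds = NULL;
    size_t n = 0, cap = 0;
    struct dirent *ent;
    /* Collect first, so we never close the directory's own fd mid-scan. */
    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(ent->d_name, &end, 10);
        if (end == ent->d_name || *end != '\0')   /* skips "." and ".." */
            continue;
        if (fd < from || fd > to || fd == dir_fd)
            continue;
        if (n == cap) {
            size_t new_cap = cap ? cap * 2 : 64;
            int *p = realloc(fds, new_cap * sizeof *p);
            if (p == NULL) {
                free(fds);
                closedir(dir);
                return -1;
            }
            fds = p;
            cap = new_cap;
        }
        fds[n++] = (int)fd;
    }
    closedir(dir);
    int closed = 0;
    for (size_t i = 0; i < n; i++)
        if (close(fds[i]) == 0)
            closed++;
    free(fds);
    return closed;
}
```

With a range like `3:327679` but only a few hundred open descriptors, this issues a few hundred close() calls instead of 327677.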

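Let's walk through this flow and put concrete numbers on it.
First we set up an open_port scenario: the server opens 768 socket handles and then runs the external program cat.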

$ cat demo.erl
-module(demo).
-compile(export_all).

start()->
    _ = [gen_udp:open(0) || _ <- lists:seq(1,768)],
    Port = open_port({spawn, "/bin/cat"}, [in, out, {line, 128}]),
    port_close(Port),
    ok.

Next we prepare a stap (SystemTap) script to observe this behavior and measure the numbers:

$ cat demo.stp
global t0, t1, t2

probe process("beam.smp").function("spawn_start") {
        printf("spawn %s\n", user_string($name))
        t0 = gettimeofday_us()
}

probe process("beam.smp").statement("*@sys.c:1607") {
        t1 = gettimeofday_ns()
}

probe process("beam.smp").statement("*@sys.c:1627") {
        printf("vfork take %d ns\n", gettimeofday_ns() - t1);
}

probe process("child_setup").function("main") {
        t2 = gettimeofday_us()
}

probe process("child_setup").statement("*@erl_child_setup.c:111") {
        t3 = gettimeofday_us()
        printf("spawn take %d us, child_setup take %d us\n", t3 - t0, t3 - t2) 
}

probe syscall.execve {
        printf("%s, arg %s\n", name, argstr)
}

probe syscall.fork {
        printf("%s, arg %s\n", name, argstr)
}

probe begin {
        println(")");
}

Run the stap script in one terminal to observe the behavior:

$ erlc demo.erl
$ PATH=otp/bin/x86_64-unknown-linux-gnu/:$PATH sudo stap demo.stp
)
fork, arg 
execve, arg otp/bin/erl 
fork, arg 
fork, arg 
fork, arg 
execve, arg /bin/sed "s/.*\\///"
execve, arg /home/chuba/otp/bin/x86_64-unknown-linux-gnu/erlexec 
execve, arg /home/chuba/otp/bin/x86_64-unknown-linux-gnu/beam.smp "--" "-root" "/home/chuba/otp" "-progname" "erl" "--" "-home" "/home/chuba" "--"
clone, arg .
..
clone, arg CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID
spawn inet_gethost 4 
fork, arg 
execve, arg /home/chuba/otp/bin/x86_64-unknown-linux-gnu/child_setup "FFFF" "." "exec inet_gethost 4 " "3:327679" "8:1" "9:0" "-"
vfork take 8487 ns
spawn take 173707 us, child_setup take 94535 us
execve, arg /bin/sh "-c" "exec inet_gethost 4 "
execve, arg /home/chuba/otp/bin/x86_64-unknown-linux-gnu/inet_gethost "4"
fork, arg 
clone, arg CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID
clone, arg CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID
spawn /bin/cat
fork, arg 
execve, arg /home/chuba/otp/bin/x86_64-unknown-linux-gnu/child_setup "FFFF" "." "exec /bin/cat" "3:327679" "2312:1" "2313:0" "-"
vfork take 5298 ns
spawn take 180974 us, child_setup take 101646 us
execve, arg /bin/sh "-c" "exec /bin/cat"
execve, arg /bin/cat 
spawn /bin/cat
fork, arg 
execve, arg /home/chuba/otp/bin/x86_64-unknown-linux-gnu/child_setup "FFFF" "." "exec /bin/cat" "3:327679" "3080:1" "3081:0" "-"
vfork take 8929 ns
spawn take 169569 us, child_setup take 90163 us
execve, arg /bin/sh "-c" "exec /bin/cat"
execve, arg /bin/cat 
...

In another terminal, run our test case:

$ otp/bin/erl
Erlang R14B04 (erts-5.8.5) [/source] [64-bit] [smp:16:16] [rq:16] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.5  (abort with ^G)
1> demo:start().
ok
2> demo:start().
ok
3> 

We can see that both runs cost about the same:
vfork take 8929 ns
spawn take 169569 us, child_setup take 90163 us

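From the experimental numbers:
vfork blocks beam.smp (the calling scheduler thread) for about 9 µs (8929 ns), while the whole spawn, measured from entering spawn_start to child_setup reaching the probed line, takes about 170 ms (169569 µs). Of that, child_setup (closing handles and so on) accounts for about 90 ms (90163 µs). The numbers tell us plainly that these performance killers cannot be ignored.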
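As a rough back-of-the-envelope check, assume (as an upper bound) that essentially all of the 90163 µs child_setup time goes to the step-3 loop over the range `3:327679` seen in the trace. The per-call cost then works out to:

```latex
N_{\text{close}} = 327679 - 3 + 1 = 327677,
\qquad
\frac{90163\ \mu\text{s}}{327677} \approx 0.275\ \mu\text{s} \approx 275\ \text{ns per } \texttt{close()}
```

Each call is cheap on its own; it is the sheer count, set by the fd limit rather than by the 768 open sockets, that adds up.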

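Solutions:
1. Use fork instead, to avoid blocking beam.smp: erl -env ERL_NO_VFORK 1
2. Reduce the number of file handles (more precisely the fd limit, since child_setup's close loop runs up to max_files-1); if you really need to call open_port a lot, let a separate, dedicated node do it.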
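Both remedies can be sketched in C. `should_use_vfork` mirrors the ERL_NO_VFORK check in spawn_start() (any value disables vfork). `set_fd_soft_limit` shows how a dedicated spawning node could be given a smaller fd limit, which would shorten the step-3 loop if, as assumed here, max_files follows RLIMIT_NOFILE. Both helper names are hypothetical; from a shell, running `ulimit -n` before starting the node should have the same effect.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Remedy 1 (sketch): vfork is used unless ERL_NO_VFORK is set to any value,
 * matching the erts_sys_getenv("ERL_NO_VFORK", ...) check in spawn_start(). */
int should_use_vfork(void)
{
    return getenv("ERL_NO_VFORK") == NULL;
}

/* Remedy 2 (sketch): set the soft RLIMIT_NOFILE, clamped to the hard limit.
 * ASSUMPTION: the emulator derives max_files from this limit, so a lower
 * value means fewer close() calls in child_setup. Returns 0 on success. */
int set_fd_soft_limit(rlim_t new_soft)
{
    struct rlimit rl;
    assert(new_soft > 0);
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (new_soft > rl.rlim_max)
        new_soft = rl.rlim_max;
    rl.rlim_cur = new_soft;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```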

Have fun!


  1. November 23rd, 2011 at 16:37 | #1

    Boss, you've been prolific lately. Wouldn't fork copying the parent's data and stack create a new bottleneck?


    Yu Feng Reply:

    On Linux, fork uses COW (and vfork shares the parent's memory outright), so neither copies the parent's data and stack.


    Jovi Reply:

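    I agree with using fork. In this scenario I think fork will perform better than vfork (even though vfork beats fork in most other cases):
    1) With fork the parent does not have to be rescheduled, whereas with vfork the parent first sleeps and is then woken up;
    2) In sys.c, the code between fork and execve does hardly any writes to user-space memory, so the COW overhead here is small compared with vfork;
    3) Using vfork involves two execve calls.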


    Yu Feng Reply:

    Once you find the problem, it is easy to solve.
