Investigating a process death: was it killed?

July 25th, 2013

Original article. If you repost it, please credit 系统技术非业余研究.

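Recently our MySQL platform system has been updated entirely through hot upgrades, and the production logs contain crash reports like this one: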

2013-07-24 23:54:06 =ERROR REPORT====
** Generic server <0.31760.980> terminating
** Last message in was {'EXIT',<0.29814.980>,killed}
** When Server state == {state,"app873",false,172683,33,<<"app873">>,<0.29814.980>,ump_proxy_session,1,[59,32,204,78,86,208,242,122,240,207,269,79,80],[],2,true,<<>>,0,0,{conn_info,{10,246,161,112},10145,"app873","8813684fc05fb6cd"},<0.31762.980>,1,{conn_info,{172,18,134,8},10085,"app873","8813684fc05fb6cd"},ump_proxy_cherly_server,<0.31763.980>,1,undefined,<<>>,false,true,[],{dict,2,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[[{session,names},103,98,107]],[],[],[],[],[],[],[[{session,"character_set_results"},78,85,76,76]],[],[],[],[],[],[],[]}}},1,0,0,0,0,0,200,1374681206757732}
** Reason for termination ==
** killed

The Reason here is killed, which is a bit puzzling.

We know that a hot upgrade purges the processes still running old code, and that the purge calls exit(P, kill) on them when necessary. But how did kill turn into killed?

Let's look at the release_handler and code sources to verify this assumption:

%%release_handler_1.erl
eval({purge, Modules}, EvalState) ->
    % Now, if there are any processes still executing old code, OR
    % if some new processes started after suspend but before load,
    % these are killed.
    lists:foreach(fun(Mod) -> code:purge(Mod) end, Modules),
    EvalState;

%%code_server.erl
do_purge([P|Ps], Mod, Purged) ->
    case erlang:check_process_code(P, Mod) of
        true ->
            Ref = erlang:monitor(process, P),
            exit(P, kill),
            receive
                {'DOWN',Ref,process,_Pid,_} -> ok
            end,
            do_purge(Ps, Mod, true);
        false ->
            do_purge(Ps, Mod, Purged)
    end;

So release_handler does end up calling exit(P, kill) to kill the process. Why, then, does the other side see killed as the cause of death?
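Before purging, it can be useful to know which processes the loop above would kill. The following is a minimal sketch, not part of OTP: the module name purge_check and the function old_code_users/1 are made up for illustration, and it relies only on erlang:processes/0 and the same erlang:check_process_code/2 call that do_purge uses.

```erlang
-module(purge_check).
-export([old_code_users/1]).

%% Hypothetical helper: return the local processes that
%% erlang:check_process_code/2 reports as still executing old code of Mod.
%% These are the processes that do_purge would hit with exit(P, kill).
old_code_users(Mod) ->
    [P || P <- erlang:processes(),
          erlang:check_process_code(P, Mod) =:= true].
```

If the list is not empty, code:soft_purge/1 is the gentler alternative: it removes the old code only when no process is still using it, and kills nothing.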

Digging further: exit is implemented inside the virtual machine, and a simple grep for killed turns up send_exit_signal, the function that does the work:

/*erl_process.c:L8279*/
static ERTS_INLINE int
send_exit_signal(Process *c_p,          /* current process if and only                                                    
                                           if reason is stored on it */
                 Eterm from,            /* Id of sender of signal */
                 Process *rp,           /* receiving process */
                 ErtsProcLocks *rp_locks,/* current locks on receiver */
                 Eterm reason,          /* exit reason */
                 Eterm exit_tuple,      /* Prebuild exit tuple                                                            
                                           or THE_NON_VALUE */
                 Uint exit_tuple_sz,    /* Size of prebuilt exit tuple                                                    
                                           (if exit_tuple != THE_NON_VALUE) */
                 Eterm token,           /* token */
                 Process *token_update, /* token updater */
                 Uint32 flags           /* flags */
    )
{
 Eterm rsn = reason == am_kill ? am_killed : reason;
...
}

That settles it: when the exit reason is kill, the runtime helpfully rewrites it to killed.
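The rewrite is easy to observe from Erlang itself. Here is a small sketch (the module name kill_demo is made up for illustration) that kills a process with exit(Pid, kill) and returns the reason carried by the monitor's 'DOWN' message:

```erlang
-module(kill_demo).
-export([reason_seen_by_monitor/0]).

%% Spawn a process that traps exits and waits forever, kill it with
%% exit(Pid, kill), and return the exit reason from the 'DOWN' message.
reason_seen_by_monitor() ->
    Pid = spawn(fun() ->
                    process_flag(trap_exit, true),
                    receive stop -> ok end
                end),
    Ref = erlang:monitor(process, Pid),
    exit(Pid, kill),
    receive
        {'DOWN', Ref, process, Pid, Reason} -> Reason
    after 5000 ->
        timeout
    end.
```

Calling kill_demo:reason_seen_by_monitor() returns killed, not kill, even though the victim was trapping exits.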

Back to the documentation:

exit(Pid, Reason) -> true

Types:

Pid = pid() | port()
Reason = term()
Sends an exit signal with exit reason Reason to the process or port identified by Pid.

The following behavior apply if Reason is any term except normal or kill:

If Pid is not trapping exits, Pid itself will exit with exit reason Reason. If Pid is trapping exits, the exit signal is transformed into a message {'EXIT', From, Reason} and delivered to the message queue of Pid. From is the pid of the process which sent the exit signal. See also process_flag/2.

If Reason is the atom normal, Pid will not exit. If it is trapping exits, the exit signal is transformed into a message {'EXIT', From, normal} and delivered to its message queue.

If Reason is the atom kill, that is if exit(Pid, kill) is called, an untrappable exit signal is sent to Pid which will unconditionally exit with exit reason killed.
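The cases in this excerpt can be tried out with a short sketch. The module name exit_signal_demo and its send/1 function are made up for illustration. It spawns a target that traps exits, sends it exit(Pid, Reason), and reports whether the signal arrived as a message or the target died:

```erlang
-module(exit_signal_demo).
-export([send/1]).

%% Send exit(Pid, Reason) to a freshly spawned process that traps exits.
%% Returns {trapped, R} if the target received {'EXIT', From, R} as a message,
%% or {died, Why} if the target exited with reason Why before receiving one.
send(Reason) ->
    Parent = self(),
    Pid = spawn(fun() ->
                    process_flag(trap_exit, true),
                    Parent ! {ready, self()},
                    receive Msg -> Parent ! {trapped, self(), Msg} end
                end),
    %% Wait until trap_exit is set, so the signal cannot arrive too early.
    receive {ready, Pid} -> ok end,
    Ref = erlang:monitor(process, Pid),
    exit(Pid, Reason),
    receive
        {trapped, Pid, {'EXIT', _From, R}} ->
            erlang:demonitor(Ref, [flush]),
            {trapped, R};
        {'DOWN', Ref, process, Pid, Why} ->
            {died, Why}
    after 5000 ->
        timeout
    end.
```

With this sketch, send(shutdown) gives {trapped, shutdown}, send(normal) gives {trapped, normal}, and send(kill) gives {died, killed}: only kill gets past trap_exit, and its reason is reported as killed.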

The documentation says so quite clearly; we just did not pay attention when reading it.

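Summary: personally I find this rewrite somewhat unnecessary. Staying faithful to what the user asked for would be better and would avoid the confusion.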

Have fun!

  1. littlepeng
    July 20th, 2014 at 22:14 | #1

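    I used to be puzzled too by exit(Pid, kill) arriving as killed.

    Have you ever run into the following problem during a hot upgrade?
    emulator Discarding message {stop,,0} from to in an old incarnation (3) of this node (1)
    I looked through the beam source: both places that print this error handle the case where `to` is not on the local node, yet this log clearly involves a local Pid. I can't figure it out.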

    littlepeng Reply:

    OK, the pids in my comment got HTML-encoded, so here it is again: emulator Discarding message {stop,<0.24230.6>,0} from <0.24230.6> to <0.24726.7195> in an old incarnation (3) of this node (1)

    Yu Feng Reply:

    I have a blog post about old incarnation.