Analysis and Use of the Coloring Property of application


August 18th, 2013 Yu Feng

Original article. When reposting, please credit: reposted from 系统技术非业余研究

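As we know, a typical Erlang VM runs many applications. These apps depend on and cooperate with one another, forming an ecosystem. A typical scenario is shown in the figure below:

Each app has many processes that work on its behalf and share some common traits. So how can we tell which app a process belongs to? Think of our great country with its 56 ethnic groups. Each has its own culture, dress and even appearance, and you can tell at a glance that they differ from other groups. Something carried in their genes is handed down from generation to generation and preserved. In the same way, the processes in an app are like people: they are born, grow old, fall ill and die, and they have a life cycle. So what identifies them? A typical application has processes arranged in many layers, usually as a tree, much like a human organization. See the figure below: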

Let's first look at the application documentation and a few key functions:

which_applications() -> [{Application, Description, Vsn}]
Returns a list with information about the applications which are currently running. Application is the application name. Description and Vsn are the values of its description and vsn application specification keys, respectively.

For example:

1> application:which_applications().
[{os_mon,"CPO CXC 138 46","2.2.9"},
{sasl,"SASL CXC 138 11","2.2.1"},
{stdlib,"ERTS CXC 138 10","1.18.3"},
{kernel,"ERTS CXC 138 10","2.15.3"}]

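This shows the names, versions, descriptions and other basic information of the running apps, but nothing more detailed. So where does the information in the first and second figures come from?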

Well, there is also an undocumented info function. As its name suggests, it returns internal state information of the application module.
For example:

2> application:info().
[{loaded,[{os_mon,"CPO CXC 138 46","2.2.9"},
{kernel,"ERTS CXC 138 10","2.15.3"},
{sasl,"SASL CXC 138 11","2.2.1"},
{stdlib,"ERTS CXC 138 10","1.18.3"}]},
{loading,[]},
{started,[{os_mon,temporary},
{sasl,temporary},
{stdlib,permanent},
{kernel,permanent}]},
{start_p_false,[]},
{running,[{os_mon,<0.49.0>},
{sasl,<0.39.0>},
{stdlib,undefined},
{kernel,<0.8.0>}]},
{starting,[]}]
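The rest of this article depends on the `running` entry. The following is a small sketch that turns it into a map from app name to master pid. It skips library apps such as stdlib, which have no process. The module name `app_masters` is made up. The sketch also assumes that the undocumented `application:info/0` keeps the shape shown above, which OTP does not promise.

```erlang
-module(app_masters).
-export([running_masters/0]).

%% Returns #{AppName => MasterPid} for every running application that has
%% an application master process. Library applications such as stdlib
%% appear with 'undefined' instead of a pid and are skipped.
%% Assumption: the undocumented application:info/0 keeps returning a
%% proplist with a 'running' entry, as in the shell output above.
running_masters() ->
    Running = proplists:get_value(running, application:info(), []),
    maps:from_list([{App, Pid} || {App, Pid} <- Running, is_pid(Pid)]).
```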

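The output above shows each app and the pid of its process group. So how are these process groups related?
Let's take os_mon as an example. Its process-group pid is <0.49.0>. The os_mon services disksup, memsup and cpu_sup all have registered names, so regs() makes it easy to find their pids. Let's get them: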

3> regs().

** Registered procs on node nonode@nohost **
Name Pid Initial Call Reds Msgs
…
cpu_sup <0.55.0> cpu_sup:init/1 33 0
disksup <0.52.0> disksup:init/1 2608 0
memsup <0.53.0> memsup:init/1 22875 0
os_mon_sup <0.51.0> supervisor:os_mon/1 279 0
…

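The figure shows that os_mon has 6 processes in total, and we now know the pids of 5 of them. The one still missing is the parent of os_mon_sup.
Let's find it. These processes are all linked together so that they can supervise one another. So we can take the links of os_mon_sup and the links of the process-group pid: the pid that appears in both sets is the missing one.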

Let me demonstrate:

11> process_info(whereis(os_mon_sup), links).
{links,[<0.52.0>,<0.53.0>,<0.55.0>,<0.50.0>]}
12> process_info(pid(0, 49, 0), links).
{links,[<0.6.0>,<0.50.0>]}
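The same intersection can be computed programmatically. This sketch uses a hypothetical `link_probe` module and relies only on `process_info/2`, `whereis/1` and `ordsets`:

```erlang
-module(link_probe).
-export([common_links/2]).

%% Returns the sorted list of processes (or ports) that both A and B are
%% linked to. A and B may be pids or registered names; an unknown name or
%% a dead process contributes an empty link set.
common_links(A, B) ->
    ordsets:intersection(links_of(A), links_of(B)).

%% 'undefined' must be matched before the generic atom clause, otherwise
%% whereis(undefined) would loop back here forever.
links_of(undefined) ->
    [];
links_of(Name) when is_atom(Name) ->
    links_of(whereis(Name));
links_of(Pid) when is_pid(Pid) ->
    case process_info(Pid, links) of
        {links, Links} -> ordsets:from_list(Links);
        undefined -> []                      % process is not alive
    end.
```

Using the link sets from the session above, `link_probe:common_links(os_mon_sup, pid(0, 49, 0))` intersects `[<0.52.0>,<0.53.0>,<0.55.0>,<0.50.0>]` with `[<0.6.0>,<0.50.0>]`, leaving `[<0.50.0>]`.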

We get <0.50.0> without breaking a sweat.
So the six processes are: <0.49.0>, <0.50.0>, <0.51.0>, <0.52.0>, <0.53.0>, <0.55.0>.
Is there any connection among them? Not obvious yet.

No rush. Let's look at the documentation first:

get_application() -> undefined | {ok, Application}
get_application(PidOrModule) -> undefined | {ok, Application}
Returns the name of the application to which the process Pid or the module Module belongs. Providing no argument is the same as calling get_application(self()).

If the specified process does not belong to any application, or if the specified process or module does not exist, the function returns undefined.

Somehow, this function manages to find out which app the calling process belongs to. Puzzling!
Let's see how it is implemented. Here is the code:

%% application.erl (excerpt; the get_application(Module) clause is omitted)
get_application(Pid) when is_pid(Pid) ->
    case process_info(Pid, group_leader) of
        {group_leader, Gl} ->
            application_controller:get_application(Gl);
        undefined ->
            undefined
    end;

This function first fetches the process's group_leader, then asks application_controller for the application name. So let's first learn what group_leader is.
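The two-step lookup can be mimicked by hand. The sketch below uses a hypothetical `app_of` module and is not how application_controller works internally. Instead, it assumes that the `running` entry of the undocumented `application:info/0` lists each application's master pid, as in the output earlier. It then matches the process's group leader against that list.

```erlang
-module(app_of).
-export([app_of/1]).

%% Hypothetical re-implementation of the idea behind get_application/1:
%% read Pid's group leader, then look for a running application whose
%% master has that pid. Assumes the 'running' entry of the undocumented
%% application:info/0 maps each application to its master pid.
app_of(Pid) when is_pid(Pid) ->
    case process_info(Pid, group_leader) of
        {group_leader, Gl} ->
            Running = proplists:get_value(running, application:info(), []),
            case lists:keyfind(Gl, 2, Running) of
                {App, Gl} -> {ok, App};
                false -> undefined   % group leader is not an app master
            end;
        undefined ->
            undefined                % Pid is not alive
    end.
```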

group_leader() -> pid()
Returns the pid of the group leader for the process which evaluates the function.

Every process is a member of some process group and all groups have a group leader. All IO from the group is channeled to the group leader. When a new process is spawned, it gets the same group leader as the spawning process. Initially, at system start-up, init is both its own group leader and the group leader of all processes.

So what is special about group_leader?

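group_leader was originally designed to solve an IO problem. Whenever a process performs an operation like io:format, the IO request is forwarded to its gl (group leader) for handling. This neatly solves problems such as output not being visible during cross-node rpc, and it also helps with log collection.
group_leader has one very notable property here: it is inherited!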
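Inheritance is easy to observe directly. In this sketch (the module and function names are made up), a root process makes itself its own group leader with `group_leader/2`. It then spawns a child, which spawns a grandchild. Every process reports what `group_leader/0` returns:

```erlang
-module(gl_inherit).
-export([demo/0]).

%% Spawns a root process that makes itself its own group leader, lets it
%% spawn a child, and lets the child spawn a grandchild. Each one sends
%% group_leader() back to the caller. Returns {Root, [RootGl, ChildGl, GrandchildGl]}.
%% Nothing here performs IO: a process that is its own group leader but
%% does not answer IO requests would make io:format in the subtree hang.
demo() ->
    Caller = self(),
    Root = spawn(fun() ->
                     group_leader(self(), self()),
                     spawn(fun() ->
                               spawn(fun() -> report(Caller, grandchild) end),
                               report(Caller, child)
                           end),
                     report(Caller, root)
                 end),
    Gls = [collect(Tag) || Tag <- [root, child, grandchild]],
    {Root, Gls}.

report(Caller, Tag) ->
    Caller ! {Tag, group_leader()}.

%% Selective receive, so the arrival order of the reports does not matter.
collect(Tag) ->
    receive {Tag, Gl} -> Gl end.
```

All three reported group leaders are expected to equal `Root`, even though only `Root` ever called `group_leader/2`.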

So let's go back and look at the gl of the os_mon processes:

13> process_info(pid(0, 49, 0), group_leader).
{group_leader,<0.49.0>}
14> process_info(pid(0, 50, 0), group_leader).
{group_leader,<0.49.0>}
15> process_info(pid(0, 51, 0), group_leader).
{group_leader,<0.49.0>}
16> process_info(pid(0, 52, 0), group_leader).
{group_leader,<0.49.0>}
17> process_info(pid(0, 53, 0), group_leader).
{group_leader,<0.49.0>}
18> process_info(pid(0, 54, 0), group_leader).
{group_leader,<0.49.0>}
19> process_info(pid(0, 55, 0), group_leader).
{group_leader,<0.49.0>}

See? Every process's gl is <0.49.0>, including <0.49.0> itself.
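Checking pids one at a time does not scale, but the same test can be run over every process. The following sketch uses a hypothetical `gl_members` module. Given an application master pid (such as <0.49.0> for os_mon), it returns every live process that carries that color:

```erlang
-module(gl_members).
-export([members/1]).

%% Returns all live processes whose group leader is Gl, i.e. every process
%% "colored" by the application whose master is Gl, including Gl itself,
%% since the master is its own group leader. Processes that die during the
%% scan drop out, because process_info/2 returns undefined for them.
members(Gl) when is_pid(Gl) ->
    [P || P <- processes(),
          process_info(P, group_leader) =:= {group_leader, Gl}].
```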

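So the key to app coloring is this: when the first process representing the app is created, it sets its gl to itself. Every process created afterwards, and every descendant of those, inherits this gl. Together they form a unique mark.
Let's verify this in the code: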

%application_master.erl
%%%-----------------------------------------------------------------
%%% The logical and physical process structrure is as follows:
%%%
%%%         logical                physical
%%%
%%%         --------               --------
%%%         |AM(GL)|               |AM(GL)|
%%%         --------               --------
%%%            |                       |
%%%         --------               --------
%%%         |Appl P|               |   X  |
%%%         --------               --------
%%%                                    |
%%%                                --------
%%%                                |Appl P|
%%%                                --------
%%%
%%% Where AM(GL) == Application Master (Group Leader)
%%%       Appl P == The application specific root process (child to AM)
%%%       X      == A special 'invisible' process
%%% The reason for not using the logical structrure is that
%%% the application start function is synchronous, and
%%% that the AM is GL.  This means that if AM executed the start
%%% function, and this function uses spawn_request/1
%%% or io, deadlock would occur.  Therefore, this function is
%%% executed by the process X.  Also, AM needs three loops;
%%% init_loop (waiting for the start function to return)
%%% main_loop
%%% terminate_loop (waiting for the process to die)
%%% In each of these loops, io and other requests are handled.
%%%-----------------------------------------------------------------
%%% Internal functions
%%%-----------------------------------------------------------------
init(Parent, Starter, ApplData, Type) ->
    link(Parent),
    process_flag(trap_exit, true),
    OldGleader = group_leader(),
    group_leader(self(), self()),
    %% Insert ourselves as master for the process.  This ensures that
    %% the processes in the application can use get_env/1 at startup.
    Name = ApplData#appl_data.name,
    ets:insert(ac_tab, {{application_master, Name}, self()}),
    State = #state{appl_data = ApplData, gleader = OldGleader},
    case start_it(State, Type) of
        {ok, Pid} ->          % apply(M,F,A) returned ok
            set_timer(ApplData#appl_data.maxT),
            unlink(Starter),
            proc_lib:init_ack(Starter, {ok,self()}),
            main_loop(Parent, State#state{child = Pid});
        {error, Reason} ->    % apply(M,F,A) returned error
            exit(Reason);
        Else ->               % apply(M,F,A) returned erroneous
            exit(Else)
    end.

The code and its comments explain this very well! Yeah!
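Note that init/4 keeps the previous group leader in its state (`gleader = OldGleader`). Once the master becomes the group leader, every IO request from the application's processes lands in its mailbox. The sketch below is a heavily simplified, hypothetical master, not OTP code. It colors a subtree and stays usable for IO by forwarding `io_request` messages unchanged to the group leader it replaced. That forwarding strategy is an assumption about what a minimal master needs. It relies on the IO server replying directly to the process that made the request.

```erlang
-module(mini_master).
-export([start/1]).

%% Hypothetical, heavily simplified cousin of application_master: the
%% master makes itself the group leader, spawns Fun as the colored root
%% process, and relays IO requests from that subtree to the group leader
%% it replaced, so io:format keeps working below it.
start(Fun) ->
    Caller = self(),
    spawn(fun() ->
              OldGl = group_leader(),
              group_leader(self(), self()),   % color everything spawned from here
              RootPid = spawn(Fun),           % inherits self() as group leader
              Caller ! {mini_master, self(), RootPid},
              loop(OldGl)
          end),
    receive {mini_master, Master, Root} -> {Master, Root} end.

%% IO-protocol requests are forwarded unchanged. The IO server replies to
%% From directly, so the master never needs to see the answer.
loop(OldGl) ->
    receive
        {io_request, _From, _ReplyAs, _Request} = Msg ->
            OldGl ! Msg,
            loop(OldGl);
        stop ->
            ok;
        _Other ->
            loop(OldGl)
    end.
```

A root started with `mini_master:start(fun() -> io:format("hi~n") end)` prints through the original group leader, while its own `group_leader()` is the master pid.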

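But we said earlier that group_leader is inherited. Readers may ask: is there any evidence?
Of course there is, and the inheritance should happen at the moment a process is created. Let's verify it in the code as well: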

/* erl_process.c*/
Eterm
erl_create_process(Process* parent, /* Parent of process (default group leader). */
                   Eterm mod,   /* Tagged atom for module. */
                   Eterm func,  /* Tagged atom for function. */
                   Eterm args,  /* Arguments for function (must be well-formed list). */
                   ErlSpawnOpts* so) /* Options for spawn. */
{
...

    ASSERT(is_pid(parent->group_leader));

    if (parent->group_leader == ERTS_INVALID_PID)
        p->group_leader = p->id;
    else {
        /* Needs to be done after the heap has been set up */
        p->group_leader =
            IS_CONST(parent->group_leader)
            ? parent->group_leader
            : STORE_NC(&p->htop, &p->off_heap, parent->group_leader);
    }
...
}

Clearly, a newly created process takes its parent's group_leader. Besides group_leader, certain erlang:trace flags can also be inherited; readers can analyze that on their own.
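For example, the `set_on_spawn` trace flag makes the children of a traced process inherit its trace flags. Here is a small sketch with a hypothetical module name. It uses the classic `erlang:trace/3` and `erlang:trace_info/2` API, and the calling process acts as the tracer:

```erlang
-module(trace_inherit).
-export([demo/0]).

%% Traces Parent with the 'procs' and 'set_on_spawn' flags, lets Parent
%% spawn a child, and returns the trace flags found on that child. With
%% set_on_spawn the child should carry the same flags as Parent.
%% The calling process is the tracer, so some {trace, ...} messages land
%% in its mailbox; they are flushed on a best-effort basis.
demo() ->
    Caller = self(),
    Parent = spawn(fun() ->
                       receive go -> ok end,
                       Kid = spawn(fun() -> receive stop -> ok end end),
                       Caller ! {kid, Kid},
                       receive stop -> Kid ! stop end
                   end),
    1 = erlang:trace(Parent, true, [procs, set_on_spawn]),
    Parent ! go,                              % only now does Parent spawn
    Child = receive {kid, K} -> K end,
    {flags, Flags} = erlang:trace_info(Child, flags),
    Parent ! stop,
    flush_trace(),
    Flags.

flush_trace() ->
    receive
        {trace, _, _, _} -> flush_trace();
        {trace, _, _, _, _} -> flush_trace()
    after 100 ->
        ok
    end.
```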

That concludes the analysis of how coloring works. Now let's talk about what it is good for.

get_env(Par) -> undefined | {ok, Val}
get_env(Application, Par) -> undefined | {ok, Val}

Returns the value of the configuration parameter Par for Application. If the application argument is omitted, it defaults to the application of the calling process.

If the specified application is not loaded, or the configuration parameter does not exist, or if the process executing the call does not belong to any application, the function returns undefined.

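For example, to fetch an app's environment variable we call application:get_env(someapp, somekey). But if this code is part of a shared library, hard-coding someapp is a real pain. Thanks to coloring, we can write application:get_env(somekey) instead. The application that the calling process belongs to is then filled in automatically, which makes the code more generic.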
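Below is a sketch of such a library helper. It also includes a demo that borrows kernel's color by pointing a fresh process's group leader at the kernel application master. The names `lib_conf`, `get_conf/2` and the `lib_conf_demo_key` key are made up for illustration. `application:info/0` is the undocumented function used earlier.

```erlang
-module(lib_conf).
-export([get_conf/2, demo/0]).

%% Hypothetical library helper: read Key from the environment of whatever
%% application the calling process belongs to, falling back to Default.
%% application:get_env/1 returns undefined when the caller belongs to no
%% application (a plain shell process, for example) or the key is unset.
get_conf(Key, Default) ->
    case application:get_env(Key) of
        {ok, Value} -> Value;
        undefined -> Default
    end.

%% Demo: a fresh process is re-colored with kernel's application master as
%% its group leader, after which get_env/1 resolves keys against kernel.
%% lib_conf_demo_key is a made-up key and is removed again afterwards.
demo() ->
    ok = application:set_env(kernel, lib_conf_demo_key, 42),
    Running = proplists:get_value(running, application:info()),
    {kernel, KernelMaster} = lists:keyfind(kernel, 1, Running),
    Caller = self(),
    Pid = spawn(fun() ->
                    receive go ->
                        Caller ! {self(), get_conf(lib_conf_demo_key, none)}
                    end
                end),
    %% The group-leader change and the 'go' message both travel from this
    %% process to Pid, so signal ordering applies the change first.
    true = group_leader(KernelMaster, Pid),
    Pid ! go,
    Value = receive {Pid, V} -> V end,
    ok = application:unset_env(kernel, lib_conf_demo_key),
    Value.
```

Re-coloring a process by hand like this is only for demonstration. The `group_leader/2` documentation notes that OTP assumes processes in a supervision tree keep their application master as group leader.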

Here is another piece of code that uses the coloring property, for your reference:

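%% Returns {OldNum, TotalNum, Ratio} for the processes belonging to AppName.
get_old_code_process_num(AppName) ->
    %% All live processes colored by AppName (their group leader is its master).
    Processes = [Pid || Pid <- processes(),
                        application:get_application(Pid) =:= {ok, AppName}],
    %% Modules listed in AppName's .app file.
    {ok, Mods} = application:get_key(AppName, modules),
    OldNum = length([Pid || Pid <- Processes, is_using_old_code(Pid, Mods)]),
    TotalNum = length(Processes),
    case TotalNum of
        0 -> {0, 0, 0};
        _ ->
            %% Ratio is expressed in units of 1/10000.
            Ratio = round(OldNum / TotalNum * 10000),
            {OldNum, TotalNum, Ratio}
    end.

%% true if Pid is executing or referencing old code of any module in the list.
is_using_old_code(_, []) -> false;
is_using_old_code(Pid, [H|T]) ->
    erlang:check_process_code(Pid, H) orelse is_using_old_code(Pid, T).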
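In the same spirit, get_application/1 can drive a quick census of how many processes each application owns. This sketch uses a hypothetical `app_census` module. Processes outside any application are counted under the label `none`, which is chosen here.

```erlang
-module(app_census).
-export([count_by_app/0]).

%% Counts live processes per application, relying on the same
%% group-leader coloring that application:get_application/1 uses.
%% Processes that belong to no application are counted under 'none'.
count_by_app() ->
    lists:foldl(fun(Pid, Acc) ->
                        App = case application:get_application(Pid) of
                                  {ok, A} -> A;
                                  undefined -> none
                              end,
                        maps:update_with(App, fun(N) -> N + 1 end, 1, Acc)
                end, #{}, processes()).
```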

Summary: architecture often takes its cues from human society.

Have fun!
