Yu Feng | 系统技术非业余研究 (Non-amateur Research on Systems Technology)

Key Erlang environment variables

August 23rd, 2013 Yu Feng Comments off

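An earlier post introduced Erlang's new comprehensive system information collector, the system_information module. That module lists the environment variables below, all of which matter a great deal to how the system runs. Do you know them all?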

+%% get known useful erts environment
+
+os_getenv_erts_specific() ->
+ os_getenv_erts_specific([
+ "BINDIR",
+ "DIALYZER_EMULATOR",
+ "CERL_DETACHED_PROG",
+ "EMU",
+ "ERL_CONSOLE_MODE",
+ "ERL_CRASH_DUMP",
+ "ERL_CRASH_DUMP_NICE",
+ "ERL_CRASH_DUMP_SECONDS",
+ "ERL_EPMD_PORT",
+ "ERL_EMULATOR_DLL",
+ "ERL_FULLSWEEP_AFTER",
+ "ERL_LIBS",
+ "ERL_MALLOC_LIB",
+ "ERL_MAX_PORTS",
+ "ERL_MAX_ETS_TABLES",
+ "ERL_NO_VFORK",
+ "ERL_NO_KERNEL_POLL",
+ "ERL_THREAD_POOL_SIZE",
+ "ERLC_EMULATOR",
+ "ESCRIPT_EMULATOR",
+ "HOME",
+ "HOMEDRIVE",
+ "HOMEPATH",
+ "LANG",
+ "LC_ALL",
+ "LC_CTYPE",
+ "PATH",
+ "PROGNAME",
+ "RELDIR",
+ "ROOTDIR",
+ "TERM",
+ %"VALGRIND_LOG_XML",
+
+ %% heart
+ "COMSPEC",
+ "HEART_COMMAND",
+
+ %% run_erl
+ "RUN_ERL_LOG_ALIVE_MINUTES",
+ "RUN_ERL_LOG_ACTIVITY_MINUTES",
+ "RUN_ERL_LOG_ALIVE_FORMAT",
+ "RUN_ERL_LOG_ALIVE_IN_UTC",
+ "RUN_ERL_LOG_GENERATIONS",
+ "RUN_ERL_LOG_MAXSIZE",
+ "RUN_ERL_DISABLE_FLOWCNTRL",
+
+ %% driver getenv
+ "CALLER_DRV_USE_OUTPUTV",
+ "ERL_INET_GETHOST_DEBUG",
+ "ERL_EFILE_THREAD_SHORT_CIRCUIT",
+ "ERL_WINDOW_TITLE",
+ "ERL_ABORT_ON_FAILURE",
+ "TTYSL_DEBUG_LOG"
+ ]).

Look them up in the documentation; they are all interesting control parameters.
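To check which of these variables are actually set for a running node, a small helper along the following lines may be useful. This is only a sketch: the module name env_check and the function set_vars/1 are made up for illustration and are not part of system_information. It relies on os:getenv/1 returning false for a variable that is not set.

```erlang
-module(env_check).
-export([set_vars/1]).

%% Return {Name, Value} for every variable in Names that is set
%% in the environment of the running emulator.
%% os:getenv/1 returns false for an unset variable, so those are skipped.
set_vars(Names) ->
    [{Name, Value} || Name <- Names,
                      Value <- [os:getenv(Name)],
                      Value =/= false].
```

For example, env_check:set_vars(["ERL_CRASH_DUMP", "ERL_LIBS", "HEART_COMMAND"]) returns only the entries that have a value on the current node.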

Have fun!

Categories: Exploring Erlang (Erlang探索), Source code analysis (源码分析) Tags: getenv

More on crash dump generation: points to watch

August 23rd, 2013 Yu Feng Comments off

In earlier posts we covered what the crash dump is for and how the heart watchdog works: after the program crashes, heart can bring it back up for us.

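A few points deserve attention here:
1. How long the watchdog takes to detect a failure; the default is 65 seconds.
2. When it crashes, the Erlang system writes a crash dump and the operating system may write a core dump; how long does that take?

The code below confirms this: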

/* heart.c */
...
/*  Maybe interesting to change */
/* Times in seconds */
#define  HEART_BEAT_BOOT_DELAY       60  /* 1 minute */
#define  SELECT_TIMEOUT               5  /* Every 5 seconds we reset the
                                            watchdog timer */

/* heart_beat_timeout is the maximum gap in seconds between two
   consecutive heart beat messages from Erlang, and HEART_BEAT_BOOT_DELAY
   is the extra delay that wd_keeper allows for, to give heart a
   chance to reboot in the "normal" way before the hardware watchdog
   enters the scene. heart_beat_report_delay is the time allowed for reporting
   before rebooting under VxWorks. */

int heart_beat_timeout = 60;
int heart_beat_report_delay = 30;
int heart_beat_boot_delay = HEART_BEAT_BOOT_DELAY;
...

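Both of these times affect how long it takes for the system to be restarted.
The crash dump's file name, time limit, and priority are controlled by the following variables: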

ERL_CRASH_DUMP
If the emulator needs to write a crash dump, the value of this variable will be the file name of the crash dump file. If the variable is not set, the name of the crash dump file will be erl_crash.dump in the current directory.

ERL_CRASH_DUMP_NICE
Unix systems: If the emulator needs to write a crash dump, it will use the value of this variable to set the nice value for the process, thus lowering its priority. The allowable range is 1 through 39 (higher values will be replaced with 39). The highest value, 39, will give the process the lowest priority.

ERL_CRASH_DUMP_SECONDS
Unix systems: This variable gives the number of seconds that the emulator will be allowed to spend writing a crash dump. When the given number of seconds have elapsed, the emulator will be terminated by a SIGALRM signal.

If the environment variable is not set or it is set to zero seconds, ERL_CRASH_DUMP_SECONDS=0, the runtime system will not even attempt to write the crash dump file. It will just terminate.

If the environment variable is set to a negative value, e.g. ERL_CRASH_DUMP_SECONDS=-1, the runtime system will wait indefinitely for the crash dump file to be written.

This environment variable is used in conjunction with heart if heart is running:

ERL_CRASH_DUMP_SECONDS=0
Suppresses the writing of a crash dump file entirely, thus rebooting the runtime system immediately. This is the same as not setting the environment variable.

ERL_CRASH_DUMP_SECONDS=-1
Setting the environment variable to a negative value will cause the termination of the runtime system to wait until the crash dump file has been completely written.

ERL_CRASH_DUMP_SECONDS=S
Will wait for S seconds to complete the crash dump file and then terminate the runtime system.

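If we do not want a crash dump to be written, we can turn it off with -env ERL_CRASH_DUMP_SECONDS 0, which avoids the misery of a dump that takes too long. Also, every crash dump is written to the same file name; we can change that name at startup with -env ERL_CRASH_DUMP erl_crash_date_time.dump so that an earlier dump is not overwritten.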
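One way to obtain a distinct name of the erl_crash_date_time.dump kind is to format a timestamp into it. The sketch below is illustrative only: the module dump_name and the function dump_file/1 are hypothetical names, and the exact naming pattern is an assumption rather than anything the runtime prescribes.

```erlang
-module(dump_name).
-export([dump_file/1]).

%% Build a crash dump file name from a calendar:datetime() tuple,
%% zero-padding every field so that names sort chronologically.
dump_file({{Y, Mo, D}, {H, Mi, S}}) ->
    lists:flatten(
      io_lib:format("erl_crash_~4..0B~2..0B~2..0B_~2..0B~2..0B~2..0B.dump",
                    [Y, Mo, D, H, Mi, S])).
```

dump_name:dump_file({{2013,8,23},{20,36,25}}) yields "erl_crash_20130823_203625.dump". The result would normally be passed on the command line via -env ERL_CRASH_DUMP; whether setting it from inside a running node with os:putenv/2 is honored at crash time should be verified on the release in use.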

Have fun!

Categories: Exploring Erlang (Erlang探索), Source code analysis (源码分析) Tags: crashdump

Erlang heart: the last line of defense for high reliability

August 23rd, 2013 Yu Feng Comments off

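The programs we write are never free of bugs, and every one of them can crash. Very often we need a watchdog program that restarts the system when it notices that something is wrong. From the kernel to all kinds of high-availability software, such a watchdog is almost always present. The Erlang system is no exception: its watchdog is heart.

Let's look at the flow and the result: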

$ export HEART_COMMAND="erl -heart"
$ erl -heart
heart_beat_kill_pid = 12640
Erlang R15B03 (erts-5.9.3.1) [source] [64-bit] [smp:16:16] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9.3.1  (abort with ^G)
1> os:getpid().
"12640"
2> 
Press CTRL + Z to suspend Erlang

$ pstree -p
…
+-beam.smp(12640)-+-heart(12670)
|                 |-{beam.smp}(12647)
|                 |-{beam.smp}(12648)
|                 |-{beam.smp}(12650)
|                 |-{beam.smp}(12653)
|                 |-{beam.smp}(12654)
|                 |-{beam.smp}(12655)
|                 |-{beam.smp}(12656)
|                 |-{beam.smp}(12657)
|                 |-{beam.smp}(12658)
|                 |-{beam.smp}(12659)
|                 |-{beam.smp}(12660)
|                 |-{beam.smp}(12661)
|                 |-{beam.smp}(12662)
|                 |-{beam.smp}(12663)
|                 |-{beam.smp}(12664)
|                 |-{beam.smp}(12665)
|                 |-{beam.smp}(12666)
|                 |-{beam.smp}(12667)
|                 |-{beam.smp}(12668)
|                 `-{beam.smp}(12669)
`-pstree(13702)

$ heart: Fri Aug 23 20:36:25 2013: heart-beat time-out, no activity for 65 seconds
heart_beat_kill_pid = 27920

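We can see that erl has been started again. Now a brief look at how it works:
heart consists of two parts: (1) an external program, heart, and (2) an Erlang port module, heart.erl.

When heart is enabled (erl -heart ...), the external heart program is started as a separate OS process by the Erlang module heart.erl and monitors the emulator. heart.erl reports its status to the external heart program at regular intervals. When the external heart program sees no heartbeat, it takes action and re-runs the command given in $HEART_COMMAND.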
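The heartbeat idea itself can be sketched inside a single VM. The following is a toy model under clear assumptions, not the real implementation: real heart is an external C program connected through a port, whereas here a plain Erlang process plays the watchdog, and the names mini_heart, start/2, and beat/1 are invented for this illustration.

```erlang
-module(mini_heart).
-export([start/2, beat/1]).

%% Start a watchdog process. If no beat arrives within TimeoutMs
%% milliseconds, OnTimeout (a fun of arity 0) is called once and
%% the watchdog process then ends.
start(TimeoutMs, OnTimeout) when is_function(OnTimeout, 0) ->
    spawn(fun() -> loop(TimeoutMs, OnTimeout) end).

%% Report that the monitored side is still alive.
beat(Watchdog) ->
    Watchdog ! beat,
    ok.

%% Every beat restarts the wait; silence for TimeoutMs triggers the action.
loop(TimeoutMs, OnTimeout) ->
    receive
        beat -> loop(TimeoutMs, OnTimeout)
    after TimeoutMs ->
        OnTimeout()
    end.
```

In this model OnTimeout stands in for re-running $HEART_COMMAND, and the caller is expected to invoke beat/1 more often than TimeoutMs, much as heart.erl reports to the external program at regular intervals.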

Categories: Exploring Erlang (Erlang探索), Source code analysis (源码分析) Tags: heart

Analysis and use of the application "coloring" feature

August 18th, 2013 Yu Feng Comments off

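As we know, a typical Erlang VM runs many applications. These apps depend on and cooperate with one another, forming an ecosystem. A typical scenario:

[Figure: applications running in a typical Erlang VM]

Each app contains many processes. These processes work on behalf of that app and share some common traits. So how do we tell which app a process belongs to? Think of China's 56 ethnic groups: each has its own culture, dress, and even looks, so its members are easy to tell apart from other groups. Something carried in their genes is handed down from generation to generation. In the same way, the processes in an app are born, grow old, and die like people; they have a life cycle. What identifies them? A typical application has many layers of processes, usually arranged as a tree, much like a human organization:

[Figure: process tree of a typical application]

Let's first look at the application documentation and a few key functions: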

which_applications() -> [{Application, Description, Vsn}]
Returns a list with information about the applications which are currently running. Application is the application name. Description and Vsn are the values of its description and vsn application specification keys, respectively.

For example:

1> application:which_applications().
[{os_mon,"CPO CXC 138 46","2.2.9"},
{sasl,"SASL CXC 138 11","2.2.1"},
{stdlib,"ERTS CXC 138 10","1.18.3"},
{kernel,"ERTS CXC 138 10","2.15.3"}]

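We can see the names, versions, and descriptions of the apps we are running, but nothing more detailed. So where does the information in the first and second figures come from?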
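As a starting point for that question, the application module also offers application:get_application/1, which maps a pid or a module to its owning application. The sketch below wraps it together with which_applications/0; the module name app_of and its functions are hypothetical helpers, and this is not necessarily the mechanism the rest of the original post goes on to describe.

```erlang
-module(app_of).
-export([running_names/0, owner/1]).

%% Names of all currently running applications.
running_names() ->
    [Name || {Name, _Description, _Vsn} <- application:which_applications()].

%% Application that a pid or module belongs to, or undefined
%% when it does not belong to any known application.
owner(PidOrModule) ->
    case application:get_application(PidOrModule) of
        {ok, App} -> App;
        undefined -> undefined
    end.
```

For instance, app_of:owner(whereis(kernel_sup)) asks which application the kernel supervisor process belongs to.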

Erlang adds a comprehensive system information collector: the system_information module

July 29th, 2013 Yu Feng Comments off

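Erlang is really an operating system, as its architecture shows:

[Figure: architecture of the Erlang system]

The Erlang system was designed by carrying the Unix philosophy over wholesale: its boot process is called otp_ring0 and its first process is called init. Sound familiar?
Since it is an operating system, it is also a very complex system. A running Erlang VM appears as a single Unix process, yet that process is a small world of its own, running many applications. Each application contains configuration, data, module code, and so on, and the applications cooperate to achieve the business goal.

When our business system does not behave as expected, where is the problem, and how do we investigate?
Erlang does provide many functions for investigation, such as erlang:system_info, erlang:memory, and os:getenv, which supply all sorts of information. The trouble is that this information is scattered all over the place and hard to pull together into a full diagnostic picture.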
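To make the "scattered" point concrete, here is a minimal sketch that gathers a handful of such facts by hand into one list. The module mini_report and the function snapshot/0 are invented for illustration, and the choice of fields is arbitrary; it is in no way a substitute for a full report.

```erlang
-module(mini_report).
-export([snapshot/0]).

%% Collect a few scattered facts about the running node into one proplist.
snapshot() ->
    [{otp_release,  erlang:system_info(otp_release)},  %% release as a string
     {erts_version, erlang:system_info(version)},      %% erts version string
     {schedulers,   erlang:system_info(schedulers)},   %% scheduler thread count
     {memory_total, erlang:memory(total)},             %% bytes allocated
     {erl_libs,     os:getenv("ERL_LIBS")}].           %% false when unset
```

Every extra fact needs another call to a different module, which is exactly the chore an aggregating module removes.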

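The OTP team has recently recognized this problem as well and added the system_information module in the latest release, R16, to address it; see the corresponding patch for details.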

Overall, the patch does three major things:

1. Add system information aggregate
2. Add erts app-file
3. erts: Add cflags, ldflags and config.h into executable

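Besides aggregating information, the patch also embeds the build-time configuration of beam (cflags, ldflags, and so on). The beam VM relies on services provided by the operating system, such as locks, atomics, epoll, and networking, and these differ from system to system, so when something goes wrong it is well worth examining these integration points first.

The new system_information module lives in the runtime_tools application, and its comments state the purpose clearly: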

%% The main purpose of system_information is to aggregate all information
%% deemed useful for investigation, i.e. system_information:report/0.

A brief look at the code shows that the information it aggregates falls into several main groups.

Categories: Exploring Erlang (Erlang探索), Source code analysis (源码分析) Tags: system_information

A small webtool problem

July 29th, 2013 Yu Feng Comments off

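Erlang's observation tools, such as crashdump, appmon, and cover, come with two kinds of interface: gs (wx) and web. These tools all follow a defined interface, and users can extend them so that their own tools fit into the toolbar or webtool framework. webtool is the more convenient one in production because its web interface gets through firewalls easily.

The webtool interface looks like this:

[Figure: screenshot of the webtool interface]

However, by default webtool listens only on 127.0.0.1:8888 when it starts, so its status cannot be viewed from other machines, which is very inconvenient.

A demonstration: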

$ erl
Erlang R15B03 (erts-5.9.3.1) [source] [64-bit] [smp:16:16] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.9.3.1  (abort with ^G)
1> webtool:start().
WebTool is available at http://localhost:8888/
Or  http://127.0.0.1:8888/
{ok,<0.35.0>}
2>

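A quick look at webtool's startup code shows that the problem can be worked around as follows. Note that all three options are required; choose the actual values to suit your setup:

webtool:start(standard_path, [{bind_address, {0,0,0,0}},{port, 8888},{server_name, "foobar"}]).

Started this way, it listens on all interfaces: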

$ hostname -i
10.232.64.76

Look up the address of the network interface and the service becomes reachable; in this example it is http://10.232.64.76:8888
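The difference between the two bind addresses can be seen without webtool at all. The sketch below uses plain gen_tcp rather than webtool's own startup path, so it only illustrates the general effect of a bind address; the module bind_demo and the function bound_address/1 are hypothetical names.

```erlang
-module(bind_demo).
-export([bound_address/1]).

%% Open a listening socket on an ephemeral port (port 0) bound to Ip,
%% report the local address the socket ended up with, then close it.
bound_address(Ip) ->
    {ok, Sock} = gen_tcp:listen(0, [{ip, Ip}]),
    {ok, {Addr, _Port}} = inet:sockname(Sock),
    ok = gen_tcp:close(Sock),
    Addr.
```

A socket bound to {127,0,0,1} accepts connections from the local machine only, while one bound to {0,0,0,0} accepts them on every IPv4 interface, which is why the bind_address option above makes the service reachable from other hosts.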

Have fun!

Categories: Exploring Erlang (Erlang探索), Source code analysis (源码分析) Tags: webtool

A brief look at what ms() is for

July 27th, 2013 Yu Feng 1 comment

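Moderately complex Erlang applications often export an ms/0 function, and that function usually has no description in the documentation. Odd, isn't it?
mnesia, for example, has such an ms; let's take a look: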

$ erl
Erlang R17A (erts-5.11) [source-b7fbc28] [64-bit] [smp:16:16] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.11  (abort with ^G)
1> mnesia:ms().
[mnesia,mnesia_backup,mnesia_bup,mnesia_checkpoint,
 mnesia_checkpoint_sup,mnesia_controller,mnesia_dumper,
 mnesia_loader,mnesia_frag,mnesia_frag_hash,
 mnesia_frag_old_hash,mnesia_index,mnesia_kernel_sup,
 mnesia_late_loader,mnesia_lib,mnesia_log,mnesia_registry,
 mnesia_schema,mnesia_snmp_hook,mnesia_snmp_sup,
 mnesia_subscr,mnesia_sup,mnesia_text,mnesia_tm,
 mnesia_recover,mnesia_locker,mnesia_monitor,mnesia_event]
2>

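It appears to do nothing more than return the list of modules that make up mnesia.
So what is it for?

Moderately complex programs need to be observed or tuned while they run. For example, tracing a function or module with dbg requires the module's name: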

dbg:tp(Module,MatchSpec)

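To trace a whole application, then, we usually write code like this:

[do_some_thing(M) || M <- myapp:ms()].

So that is the point of ms. The problem is that mnesia hard-codes the module list in ms, which creates a maintenance burden: whenever a module is added, renamed, or removed, someone has to remember to update this list, which is a nuisance.

%% mnesia.erl
ms() ->
    [
     mnesia,
     mnesia_backup,
     mnesia_bup,
     ...
    ].

Programmers are a lazy bunch and would never settle for something this clumsy, so rebar puts in some effort to do this neatly. When rebar compiles an application, it takes the contents of src/myapp.app.src and adds the following: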

{modules,[mod_a, mod_b,…]}

producing the ebin/myapp.app file, which every app must have.

The core rebar code that handles this is:

%%rebar_otp_app.erl
AppVars = load_app_vars(Config1) ++ [{modules, ebin_modules()}],
ebin_modules() ->
    lists:sort([rebar_utils:beam_to_mod("ebin", N) ||
                   N <- rebar_utils:beams("ebin")]).
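Once the modules key is present in the .app file, a generic ms can be written without hard-coding anything, because application:get_key/2 reads that key back for a loaded application. The following is a sketch under that assumption; the module app_modules and the function ms/1 are illustrative names, not something rebar or OTP provides.

```erlang
-module(app_modules).
-export([ms/1]).

%% Modules listed under the `modules` key of App's .app file,
%% or [] when the application is not loaded.
ms(App) ->
    case application:get_key(App, modules) of
        {ok, Modules} -> Modules;
        undefined -> []
    end.
```

With a dbg tracer already started, something like [dbg:tp(M, []) || M <- app_modules:ms(myapp)] would then set a trace pattern on every module of the hypothetical application myapp.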
