Archive

Posts Tagged ‘crashdump’

heroku_crashdumps避免crashdump覆盖

December 22nd, 2013 Comments off

原创文章,转载请注明: 转载自系统技术非业余研究

本文链接地址: heroku_crashdumps避免crashdump覆盖

erlang虚拟机挂掉的时候会产生个crashdump, 尽可能多的保存当时的现场,相关的使用和配置之前我写了不少的 博文 来介绍。
但是实际使用中的时候,每次crash的时候这个现场文件都叫erl_crash.dump,会把前一次的覆盖掉,不利于生产环境定位问题。

当然我们可以透过环境变量来配置:

ERL_CRASH_DUMP
If the emulator needs to write a crash dump, the value of this variable will be the file name of the crash dump file. If the variable is not set, the name of the crash dump file will be erl_crash.dump in the current directory

或者我们还有heroku_crashdumps项目可以帮我们更省力,项目参见这里,核心代码就这几行:

  case dumpdir() of
        undefined -> ok;
        Dir ->
            File = string:join([os:getenv("INSTANCE_NAME"),
                                "boot",
                                datetime(now)], "_"),
            DumpFile = filename:join(Dir, File),
            error_logger:info_msg("at=setup_crashdumps dumpfile=~p", [DumpFile]),
            os:putenv("ERL_CRASH_DUMP", DumpFile)
    end.

它的价值在于以application方式来进行的,有二个好处:
1. 容易打包。因为包括rebar在内的打包工具都是以app为单位的。
2. 使用起来更简单。 crashdump自动会把日期时间加上去。

具体使用够简单,参见项目说明,我就不罗嗦了,我来演示下效果:

$ git clone https://github.com/heroku/heroku_crashdumps.git
Initialized empty Git repository in /home/chuba/heroku_crashdumps/heroku_crashdumps/.git/
remote: Counting objects: 11, done.
remote: Compressing objects: 100% (8/8), done.
cremote: Total 11 (delta 2), reused 11 (delta 2)
Unpacking objects: 100% (11/11), done.
$ cd heroku_crashdumps/
$ ./rebar compile
==> heroku_crashdumps (compile)
Compiled src/heroku_crashdumps_app.erl

$ ERL_CRASH_DUMP="." INSTANCE_NAME="erl_crash.dump"  erl -pa ebin
Erlang R16B02 (erts-5.10.3) [source] [64-bit] [smp:16:16] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.10.3  (abort with ^G)
1> heroku_crashdumps_app:start().

=INFO REPORT==== 22-Dec-2013::16:12:41 ===
at=setup_crashdumps dumpfile="./erl_crash.dump_boot_2013-12-22T08:12:41+00:00"true
2> 
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution
A

Crash dump was written to: ./erl_crash.dump_boot_2013-12-22T08:12:41+00:00
Crash dump requested by userAborted
$ ls ./erl_crash.dump_boot_2013-12-22T08:12:41+00:00
./erl_crash.dump_boot_2013-12-22T08:12:41+00:00

小结:偷懒是美德!

祝玩得开心!

Post Footer automatically generated by wp-posturl plugin for wordpress.

再谈crashdump产生注意事项

August 23rd, 2013 Comments off

原创文章,转载请注明: 转载自系统技术非业余研究

本文链接地址: 再谈crashdump产生注意事项

在前面的博文里面,我们提到了crashdump的作用, 以及看门狗heart的工作原理,我们可以在程序crash后,让heart看门狗重新帮我们拉起来。

这里有几个问题需要注意:
1. 看门狗检查失效的时间,默认是65秒。
2. erlang系统在crash的时候会记录crashdump, 操作系统会产生coredump, 这个时间到底是多长。

代码证明如下:

/* heart.c */
...
/*  Maybe interesting to change */
/* Times in seconds */
#define  HEART_BEAT_BOOT_DELAY       60  /* 1 minute */
#define  SELECT_TIMEOUT               5  /* Every 5 seconds we reset the                                                  
                                            watchdog timer */

/* heart_beat_timeout is the maximum gap in seconds between two                                                           
   consecutive heart beat messages from Erlang, and HEART_BEAT_BOOT_DELAY                                                 
   is the the extra delay that wd_keeper allows for, to give heart a                                                      
   chance to reboot in the "normal" way before the hardware watchdog                                                      
   enters the scene. heart_beat_report_delay is the time allowed for reporting                                            
   before rebooting under VxWorks. */

int heart_beat_timeout = 60;
int heart_beat_report_delay = 30;
int heart_beat_boot_delay = HEART_BEAT_BOOT_DELAY;
...

这二个时间都会影响系统重新启动的间隔时间。
而crashdump的dump文件名、dump时间和优先级由下面几个变量来控制:

ERL_CRASH_DUMP
If the emulator needs to write a crash dump, the value of this variable will be the file name of the crash dump file. If the variable is not set, the name of the crash dump file will be erl_crash.dump in the current directory.

ERL_CRASH_DUMP_NICE
Unix systems: If the emulator needs to write a crash dump, it will use the value of this variable to set the nice value for the process, thus lowering its priority. The allowable range is 1 through 39 (higher values will be replaced with 39). The highest value, 39, will give the process the lowest priority.

ERL_CRASH_DUMP_SECONDS
Unix systems: This variable gives the number of seconds that the emulator will be allowed to spend writing a crash dump. When the given number of seconds have elapsed, the emulator will be terminated by a SIGALRM signal.

If the environment variable is not set or it is set to zero seconds, ERL_CRASH_DUMP_SECONDS=0, the runtime system will not even attempt to write the crash dump file. It will just terminate.

If the environment variable is set to negative valie, e.g. ERL_CRASH_DUMP_SECONDS=-1, the runtime system will wait indefinitely for the crash dump file to be written.

This environment variable is used in conjuction with heart if heart is running:

ERL_CRASH_DUMP_SECONDS=0
Suppresses the writing a crash dump file entirely, thus rebooting the runtime system immediately. This is the same as not setting the environment variable.

ERL_CRASH_DUMP_SECONDS=-1
Setting the environment variable to a negative value will cause the termination of the runtime system to wait until the crash dump file has been completly written.

ERL_CRASH_DUMP_SECONDS=S
Will wait for S seconds to complete the crash dump file and then terminate the runtime system.

如果我们不想产生coredump 可以透过 -env ERL_CRASH_DUMP_SECONDS 0 来关掉,避免产生dump时间过长的悲剧。同时每次crashdump产生的文件名相同,可以在启动通过 -env ERL_CRASH_DUMP erl_crash_date_time.dump 来修改,避免覆盖掉。

祝玩得开心!

Post Footer automatically generated by wp-posturl plugin for wordpress.

Categories: Erlang探索, 源码分析 Tags: