Home > Erlang探索, 源码分析 > 再谈crashdump产生注意事项

再谈crashdump产生注意事项

August 23rd, 2013

原创文章,转载请注明: 转载自系统技术非业余研究

本文链接地址: 再谈crashdump产生注意事项

在前面的博文里面,我们提到了crashdump的作用, 以及看门狗heart的工作原理,我们可以在程序crash后,让heart看门狗重新帮我们拉起来。

这里有几个问题需要注意:
1. 看门狗检查失效的时间,默认是65秒。
2. erlang系统在crash的时候会记录crashdump, 操作系统会产生coredump, 这个时间到底是多长。

代码证明如下:

/* heart.c */
...
/*  Maybe interesting to change */
/* Times in seconds */
#define  HEART_BEAT_BOOT_DELAY       60  /* 1 minute */
#define  SELECT_TIMEOUT               5  /* Every 5 seconds we reset the                                                  
                                            watchdog timer */

/* heart_beat_timeout is the maximum gap in seconds between two                                                           
   consecutive heart beat messages from Erlang, and HEART_BEAT_BOOT_DELAY                                                 
   is the the extra delay that wd_keeper allows for, to give heart a                                                      
   chance to reboot in the "normal" way before the hardware watchdog                                                      
   enters the scene. heart_beat_report_delay is the time allowed for reporting                                            
   before rebooting under VxWorks. */

int heart_beat_timeout = 60;
int heart_beat_report_delay = 30;
int heart_beat_boot_delay = HEART_BEAT_BOOT_DELAY;
...

这二个时间都会影响系统重新启动的间隔时间。
而crashdump的dump文件名、dump时间和优先级由下面几个变量来控制:

ERL_CRASH_DUMP
If the emulator needs to write a crash dump, the value of this variable will be the file name of the crash dump file. If the variable is not set, the name of the crash dump file will be erl_crash.dump in the current directory.

ERL_CRASH_DUMP_NICE
Unix systems: If the emulator needs to write a crash dump, it will use the value of this variable to set the nice value for the process, thus lowering its priority. The allowable range is 1 through 39 (higher values will be replaced with 39). The highest value, 39, will give the process the lowest priority.

ERL_CRASH_DUMP_SECONDS
Unix systems: This variable gives the number of seconds that the emulator will be allowed to spend writing a crash dump. When the given number of seconds have elapsed, the emulator will be terminated by a SIGALRM signal.

If the environment variable is not set or it is set to zero seconds, ERL_CRASH_DUMP_SECONDS=0, the runtime system will not even attempt to write the crash dump file. It will just terminate.

If the environment variable is set to negative valie, e.g. ERL_CRASH_DUMP_SECONDS=-1, the runtime system will wait indefinitely for the crash dump file to be written.

This environment variable is used in conjuction with heart if heart is running:

ERL_CRASH_DUMP_SECONDS=0
Suppresses the writing a crash dump file entirely, thus rebooting the runtime system immediately. This is the same as not setting the environment variable.

ERL_CRASH_DUMP_SECONDS=-1
Setting the environment variable to a negative value will cause the termination of the runtime system to wait until the crash dump file has been completly written.

ERL_CRASH_DUMP_SECONDS=S
Will wait for S seconds to complete the crash dump file and then terminate the runtime system.

如果我们不想产生coredump 可以透过 -env ERL_CRASH_DUMP_SECONDS 0 来关掉,避免产生dump时间过长的悲剧。同时每次crashdump产生的文件名相同,可以在启动通过 -env ERL_CRASH_DUMP erl_crash_date_time.dump 来修改,避免覆盖掉。

祝玩得开心!

Post Footer automatically generated by wp-posturl plugin for wordpress.

Categories: Erlang探索, 源码分析 Tags:
Comments are closed.