system_monitor | 系统技术非业余研究

R16B03 adds long_schedule monitoring to detect code that blocks the scheduler

October 30th, 2013 Yu Feng

Original article. If you repost it, please credit the source: 系统技术非业余研究


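One of Erlang's key strengths is its soft real-time behavior, which can noticeably improve an application's QoS. Why is soft real-time achievable? See this translated article. Two things, if handled badly, break Erlang's fair scheduling:
1. BIFs, which rely on the trap mechanism to yield.
2. NIFs, which rely on consuming reductions to yield.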

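Both mechanisms allow user-written C code to extend the Erlang VM. That code runs inside the VM's scheduler threads, so if a single call does too much work, or deadlocks, it blocks the scheduler and can hang the VM, which is a fairly serious problem.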

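The latest Erlang release, R16B03, thoughtfully adds long_schedule monitoring so that users can spot this problem early and fix it. Here is the description of long_schedule from the documentation: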

erlang:system_monitor(Arg) -> MonSettings

{long_schedule, Time}
If a process or port in the system runs uninterrupted for at least Time wall clock milliseconds, a message {monitor, PidOrPort, long_schedule, Info} is sent to MonitorPid. PidOrPort is the process or port that was running and Info is a list of two-element tuples describing the event. In case of a pid(), the tuples {timeout, Millis}, {in, Location} and {out, Location} will be present, where Location is either an MFA ({Module, Function, Arity}) describing the function where the process was scheduled in/out, or the atom undefined. In case of a port(), the tuples {timeout, Millis} and {port_op,Op} will be present. Op will be one of proc_sig, timeout, input, output, event or dist_cmd, depending on which driver callback was executing. proc_sig is an internal operation and should never appear, while the others represent the corresponding driver callbacks timeout, ready_input, ready_output, event and finally outputv (when the port is used by distribution). The Millis value in the timeout tuple will tell you the actual uninterrupted execution time of the process or port, which will always be >= the Time value supplied when starting the trace. New tuples may be added to the Info list in the future, and the order of the tuples in the list may be changed at any time without prior notice.

This can be used to detect problems with NIF’s or drivers that take too long to execute. Generally, 1 ms is considered a good maximum time for a driver callback or a NIF. However, a time sharing system should usually consider everything below 100 ms as “possible” and fairly “normal”. Schedule times above that might however indicate swapping or a NIF/driver that is misbehaving. Misbehaving NIF’s and drivers could cause bad resource utilization and bad overall performance of the system.
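As a minimal sketch of how the long_schedule messages described above might be consumed, the module below spawns a process, registers it as the system monitor, and prints each event. The module name long_schedule_demo and the helper format_event/1 are made up for illustration; only erlang:system_monitor/2 and the message shape come from the documentation. Note that the VM has a single system monitor setting, so calling start/1 replaces any monitor that was installed before.

```erlang
-module(long_schedule_demo).
-export([start/1, format_event/1]).

%% Spawn a monitor process and register it as the system monitor.
%% ThresholdMs is the long_schedule limit in wall-clock milliseconds.
start(ThresholdMs) when is_integer(ThresholdMs), ThresholdMs > 0 ->
    Pid = spawn(fun loop/0),
    %% The return value is the previous setting; it is ignored here.
    _Old = erlang:system_monitor(Pid, [{long_schedule, ThresholdMs}]),
    Pid.

%% Print every long_schedule event until a stop message arrives.
loop() ->
    receive
        {monitor, PidOrPort, long_schedule, Info} ->
            io:format("~s~n", [format_event({PidOrPort, Info})]),
            loop();
        stop ->
            ok
    end.

%% Info is a list of two-element tuples whose order is not guaranteed,
%% so look the entries up by key instead of pattern matching on the list.
format_event({PidOrPort, Info}) ->
    Millis = proplists:get_value(timeout, Info),
    Where = case proplists:get_value(port_op, Info) of
                undefined ->
                    %% Process event: report where it was scheduled in and out.
                    {proplists:get_value(in, Info),
                     proplists:get_value(out, Info)};
                Op ->
                    %% Port event: report the driver callback.
                    {port_op, Op}
            end,
    lists:flatten(io_lib:format("long_schedule: ~p ran ~p ms, ~p",
                                [PidOrPort, Millis, Where])).
```

Calling long_schedule_demo:start(100) would report anything that holds a scheduler for 100 ms or longer, the level the documentation treats as a sign of trouble.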

See the commit on GitHub here; the test case in it demonstrates this nicely.

Summary: the system monitor can uncover many latent problems in the VM and deserves more exploration.

Have fun!


Categories: Exploring Erlang, Source Code Analysis Tags: long_schedule, system_monitor

Congestion on the Erlang cluster RPC channel and how to solve it

May 2nd, 2013 Yu Feng


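By default an Erlang cluster is fully connected: when a node joins the cluster, the node that introduced it tells every other node in the cluster to connect to the newcomer.
The result is a full mesh, which the original post illustrates with a figure.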

This time we will not discuss how to avoid the full mesh. Instead we look at a problem with the channel between nodes.

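Message sending in Erlang is location transparent: you just call Pid ! Msg, and the VM and the cluster infrastructure make sure the message reaches the message queue of the target process. That is a guarantee at the level of language semantics. If Pid lives on another node, the message travels over the inter-node RPC channel. The rpc module builds remote function calls on top of exactly this semantic.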
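The sketch below shows the two forms side by side: a plain send addressed to a process on a given node, and an rpc:call to the same node. To stay runnable on a single, non-distributed VM it uses node() as a stand-in for a remote node name, which is an assumption of this example; in a real cluster Node would be something like 'b@host' and both operations would cross the inter-node channel. The module name remote_send_demo and the registered name demo_sink are hypothetical.

```erlang
-module(remote_send_demo).
-export([run/0]).

run() ->
    %% Assumption: the local node stands in for a remote node here.
    Node = node(),
    %% Register the caller so it can be addressed as {Name, Node}.
    true = register(demo_sink, self()),
    %% Same syntax whether Node is local or remote.
    {demo_sink, Node} ! hello,
    Got = receive
              hello -> hello
          after 1000 ->
              timeout
          end,
    true = unregister(demo_sink),
    %% rpc:call/4 evaluates erlang:node() on the target node.
    Remote = rpc:call(Node, erlang, node, []),
    {Got, Remote =:= Node}.
```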

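The community currently tends to recommend layering Erlang services, so most interaction between layers goes through rpc. Layered architectures of this kind, shown in a figure in the original post, are becoming more and more common, and when large volumes of messages flow between nodes the channel is bound to become congested.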

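Congestion causes the sending process to be suspended. rpc is served by a single process (a gen_server), so once that process is suspended, rpc calls stall. Apart from rpc, plain Pid ! Msg sends can of course still proceed in parallel.
This blocking greatly increases the system's response time (RT) and hurts both performance and user experience.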

So how do we locate and solve this problem? Erlang thoughtfully provides a complete set of tools:

First, detecting the problem:

erlang:system_monitor(MonitorPid, Options) -> MonSettings

busy_dist_port
If a process in the system gets suspended because it sends to a process on a remote node whose inter-node communication was handled by a busy port, a message {monitor, SusPid, busy_dist_port, Port} is sent to MonitorPid. SusPid is the pid that got suspended when sending through the inter-node communication port Port.

For example, riak_sysmon uses the following code:

    BusyDistPortP = get_busy_dist_port(),
    Opts = lists:flatten(
             [[{long_gc, GcMsLimit} || lists:member(gc, MonitorProps)
                                           andalso GcMsLimit > 0],
              [{large_heap, HeapWordLimit} || lists:member(heap, MonitorProps)
                                                  andalso HeapWordLimit > 0],
              [busy_port || lists:member(port, MonitorProps)
                                andalso BusyPortP],
              [busy_dist_port || lists:member(dist_port, MonitorProps)
                                     andalso BusyDistPortP]]),
    _ = erlang:system_monitor(self(), Opts),

When we receive {monitor, SusPid, busy_dist_port, Port} messages, we can confirm that senders in the system are regularly being blocked on the channel.
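One possible way to act on these messages is sketched below: a monitor process logs each busy_dist_port event together with what the suspended sender was doing at that moment. The module name busy_dist_demo and the helper describe/1 are hypothetical and are not taken from riak_sysmon. The sketch assumes SusPid is a local pid, which holds because it is the local sender that gets suspended.

```erlang
-module(busy_dist_demo).
-export([start/0, describe/1]).

%% Spawn a monitor process and subscribe it to busy_dist_port events.
%% This replaces any system monitor that was installed before.
start() ->
    Pid = spawn(fun loop/0),
    _Old = erlang:system_monitor(Pid, [busy_dist_port]),
    Pid.

loop() ->
    receive
        {monitor, SusPid, busy_dist_port, Port} ->
            error_logger:warning_msg(
              "busy_dist_port: ~p suspended on ~p, state ~p~n",
              [SusPid, Port, describe(SusPid)]),
            loop();
        stop ->
            ok
    end.

%% Snapshot of the suspended process: the function it is executing
%% and how many messages are waiting in its own queue.
describe(Pid) when is_pid(Pid) ->
    case erlang:process_info(Pid, [current_function, message_queue_len]) of
        undefined -> dead;
        Info -> Info
    end.
```

If the process that keeps appearing in the log is the rpc server, that matches the stall described earlier.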

So how do we solve it?

The community recognized this problem long ago, which is why dist_buf_busy_limit was designed as a configurable value.
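As a small sketch of how the current value can be inspected, the helper below reads dist_buf_busy_limit from a running node. erlang:system_info(dist_buf_busy_limit) reports the limit in bytes, while the +zdbbl flag to erl (listed in this post's tags) takes it in kilobytes, so the helper converts to kilobytes for easier comparison. The module name dbbl_demo is hypothetical.

```erlang
-module(dbbl_demo).
-export([limit_kb/0]).

%% Current distribution buffer busy limit in kilobytes.
%% system_info/1 returns bytes; +zdbbl is specified in kilobytes.
limit_kb() ->
    erlang:system_info(dist_buf_busy_limit) div 1024.
```

For instance, on a node started with erl +zdbbl 8192, dbbl_demo:limit_kb() would be expected to return 8192. Which value suits a given system depends on its traffic and is not something this sketch can decide.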

Categories: Exploring Erlang, Source Code Analysis Tags: +zdbbl, busy_dist_port, dist_buf_busy_limit, system_monitor
