The incarnation problem caused by restarting an Erlang node
Original article. If you repost it, please credit the source: 系统技术非业余研究
This evening mingchaoyan asked the following question online:
=ERROR REPORT==== 2013-06-28 19:57:53 ===
Discarding message {send,<<19 bytes>>} from <0.86.1> to <0.6743.0> in an old incarnation (1) of this node (2)

=ERROR REPORT==== 2013-06-28 19:57:55 ===
Discarding message {send,<<22 bytes>>} from <0.1623.1> to <0.6743.0> in an old incarnation (1) of this node (2)

After we updated the server at noon today, the log filled up with these errors. Have you run into anything similar? Any pointers on how to locate or fix the problem would be appreciated. Thanks.
This is an interesting problem. Starting from the log message and searching the source code, we can quickly find the place that prints it:
    /* bif.c */
    Sint
    do_send(Process *p, Eterm to, Eterm msg, int suspend) {
        Eterm portid;
        ...
        } else if (is_external_pid(to)) {
            dep = external_pid_dist_entry(to);
            if (dep == erts_this_dist_entry) {
                erts_dsprintf_buf_t *dsbufp = erts_create_logger_dsbuf();
                erts_dsprintf(dsbufp,
                              "Discarding message %T from %T to %T in an old "
                              "incarnation (%d) of this node (%d)\n",
                              msg, p->id, to,
                              external_pid_creation(to),
                              erts_this_node->creation);
                erts_send_error_to_logger(p->group_leader, dsbufp);
                return 0;
            }
            ..
        }
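Two conditions must hold for this warning to be printed:
1. The target pid must be an external pid.
2. The dist_entry of the external node that the pid belongs to must be the same as the dist_entry of the current node.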
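To make the two conditions concrete, here is a toy model in C. It is only a sketch under assumptions, not ERTS code: `toy_node`, `toy_pid`, and `classify_target` are hypothetical names. The model assumes that a pid counts as local only when both the node name and the creation match the running node, while the dist entry is looked up by node name alone, so a pid from an earlier incarnation of the same node ends up "external" yet still points at this node's dist entry.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the ERTS structures. */
typedef struct {
    const char *name;   /* node name, e.g. "a@host" */
    int creation;       /* incarnation counter of that node */
} toy_node;

typedef struct {
    toy_node node;      /* node (and incarnation) that created the pid */
} toy_pid;

typedef enum { SEND_LOCAL, SEND_REMOTE, SEND_DISCARD } send_action;

/* Decide what a send to `to` would do when executed on node `self`. */
send_action classify_target(const toy_node *self, const toy_pid *to)
{
    assert(self != NULL && to != NULL);

    /* Assumption: "same dist entry" is modelled as "same node name". */
    int same_name = strcmp(self->name, to->node.name) == 0;

    if (same_name && self->creation == to->node.creation)
        return SEND_LOCAL;    /* internal pid: ordinary local delivery */
    if (same_name)
        return SEND_DISCARD;  /* external pid of this node's old incarnation */
    return SEND_REMOTE;       /* external pid on some other node */
}
```

With the values from the log above, a node whose creation is 2 sending to a pid whose creation is 1 falls into the `SEND_DISCARD` branch, which corresponds to the "old incarnation (1) of this node (2)" message.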
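Searching Google, I found a report that closely matches this description (see here). Its author describes and reproduces the behaviour well, but does not explain the actual cause.
So let's follow the same path and reproduce the problem ourselves.
Before the demonstration, though, let's review some basics. First we need to understand the format of a pid:
See this article:
The key points about pids are excerpted below: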
Printed process ids < A.B.C > are composed of [6]:
A, the node number (0 is the local node, an arbitrary number for a remote node)
B, the first 15 bits of the process number (an index into the process table) [7]
C, bits 16-18 of the process number (the same process number as B) [7]
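Reading the quoted description literally, B and C are two slices of one 18-bit process number. The following C sketch illustrates that reading. It is not taken from ERTS: `printed_pid_parts` and `split_process_number` are hypothetical names, and the bit layout is assumed purely from the excerpt above.

```c
#include <assert.h>
#include <stdint.h>

/* The B and C parts of a printed pid <A.B.C>. */
typedef struct {
    uint32_t b;   /* low 15 bits of the process number */
    uint32_t c;   /* bits 16-18 of the process number */
} printed_pid_parts;

/* Split an 18-bit process number the way the excerpt describes. */
printed_pid_parts split_process_number(uint32_t process_number)
{
    assert(process_number < (1u << 18));   /* only 18 bits are described */

    printed_pid_parts parts;
    parts.b = process_number & 0x7FFFu;        /* first 15 bits */
    parts.c = (process_number >> 15) & 0x7u;   /* next 3 bits */
    return parts;
}
```

Under this reading, the sender <0.86.1> in the log would have process number 32854 (1 * 32768 + 86), and the target <0.6743.0> would have process number 6743.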
See also section 9.10 of the Erlang External Term Format documentation,
which describes the layout of PID_EXT:
1     N      4    4        1
103   Node   ID   Serial   Creation
Table 9.16:
Encode a process identifier object (obtained from spawn/3 or friends). The ID and Creation fields works just like in REFERENCE_EXT, while the Serial field is used to improve safety. In ID, only 15 bits are significant; the rest should be 0.
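As an illustration of this layout, here is a minimal C sketch that pulls the fields out of a PID_EXT byte sequence. The names `pid_ext` and `decode_pid_ext` are hypothetical, the multi-byte integers are read as big-endian, and the sketch assumes the Node field is encoded as ATOM_EXT (tag 100 followed by a 2-byte length and the atom name). Other atom encodings are rejected, and the leading version byte (131) that term_to_binary/1 emits is assumed to have been skipped already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PID_EXT_TAG  103
#define ATOM_EXT_TAG 100

/* Decoded fields of a PID_EXT term. */
typedef struct {
    char     node[256];   /* node name, NUL-terminated */
    uint32_t id;
    uint32_t serial;
    uint8_t  creation;    /* incarnation of the node that made the pid */
} pid_ext;

/* Read a 4-byte big-endian unsigned integer. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the input is malformed or unsupported. */
int decode_pid_ext(const uint8_t *buf, size_t len, pid_ext *out)
{
    assert(buf != NULL && out != NULL);

    if (len < 4 || buf[0] != PID_EXT_TAG || buf[1] != ATOM_EXT_TAG)
        return -1;

    size_t name_len = ((size_t)buf[2] << 8) | buf[3];
    /* Need the name plus ID (4), Serial (4) and Creation (1). */
    if (name_len >= sizeof out->node || len < 4 + name_len + 9)
        return -1;

    memcpy(out->node, buf + 4, name_len);
    out->node[name_len] = '\0';

    const uint8_t *p = buf + 4 + name_len;
    out->id       = read_be32(p);
    out->serial   = read_be32(p + 4);
    out->creation = p[8];
    return 0;
}
```

The Creation byte is the field that matters for this problem: it is the value the log message reports as the "old incarnation" of the node.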