
An Introduction to the Erlang match_spec Engine and Its Uses

Original article. When reposting, please credit the source: 系统技术非业余研究


What is a match_spec?

A “match specification” (match_spec) is an Erlang term describing a small “program” that will try to match something (either the parameters to a function as used in the erlang:trace_pattern/2 BIF, or the objects in an ETS table.). The match_spec in many ways works like a small function in Erlang, but is interpreted/compiled by the Erlang runtime system to something much more efficient than calling an Erlang function. The match_spec is also very limited compared to the expressiveness of real Erlang functions.

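For details, see the match specification chapter of the ERTS User's Guide.
Put simply, a match_spec is a filter for Erlang terms: it lets the user choose what to match and which data to extract from a term. You might wonder: Erlang functions are already powerful and can do everything a match_spec can, so why go to the trouble of building something separate?
Erlang implements match_spec for two reasons: 1. runtime efficiency; 2. it is small and can be used at runtime.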

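The idea behind the implementation is this: match_spec is an engine with its own syntax. A specification is first compiled into dedicated opcodes, and those opcodes are then executed at match time to produce the result. You can think of it as a DSL inside Erlang.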
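To make the syntax concrete before looking at the opcodes, here is a minimal sketch of a hand-written match_spec and of running it against a plain list of tuples. The module name ms_intro and the sample data are made up for illustration; ets:match_spec_compile/1 and ets:match_spec_run/2 are the stdlib functions discussed later in this post.

```erlang
-module(ms_intro).
-export([run/0]).

%% A match_spec is a list of {MatchHead, Guards, Body} clauses.
%% This one is the hand-written form of:
%%   fun({A, tag}) when A > 1 -> A end
spec() ->
    [{{'$1', tag},        % head: a 2-tuple whose second element is the atom tag
      [{'>', '$1', 1}],   % guard: the first element must be greater than 1
      ['$1']}].           % body: return the bound first element

run() ->
    %% Compile the term into the engine's internal representation.
    Compiled = ets:match_spec_compile(spec()),
    %% {0, tag} fails the guard and {7, other} fails the head,
    %% so only {5, tag} produces a result.
    ets:match_spec_run([{0, tag}, {5, tag}, {7, other}], Compiled).
```

Calling ms_intro:run() should return a list holding only the value extracted from {5, tag}.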

Let's start by getting an intuitive feel for this DSL:

// erl_db_util.c:L5038
#ifdef DMC_DEBUG

/*                                                                                                                           
** Disassemble match program                                                                                                 
*/
void db_match_dis(Binary *bp)
{
...
}
#endif /* DMC_DEBUG */

From the code above we can see how to build a beam.smp with match_spec disassembly support. The steps are:

$ cd otp
$ export ERL_TOP=`pwd`
$ cd erts
$ make debug FLAVOR=smp DMC_DEBUG=1
$ sudo make install

With a runtime that supports disassembly, let's experiment and look at the opcodes of a match_spec:

$ erl
Erlang R14B04 (erts-5.8.5) [/source] [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.5  (abort with ^G)
1> Spec=ets:fun2ms(fun({A, tag}) when A>1 ->A end).
[{{'$1',tag},[{'>','$1',1}],['$1']}]
%% Turn on the disassembly flag
2> erlang:match_spec_test({1234,tag}, Spec, dis).  
true
3> erlang:match_spec_test({1234,tag}, Spec, table).
Tuple	2
Bind	1
Eq  	tag
PushC	1
PushV	1
Call2	'>'
True
Catch
PushVResult	1
Return
Halt


term_save: {}
num_bindings: 2
heap_size: 24
stack_offset: 12
text: 0x006e3ecc
stack_size: 12 (words)
{ok,1234,[],[]}
4> 
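As a side note, erlang:match_spec_test/3 with the table type is also handy for checking a specification against a single tuple without creating a table. The sketch below is only an illustration under that assumption; the module name ms_test_demo is hypothetical, and the non-matching input {0, tag} is my own addition to the example above.

```erlang
-module(ms_test_demo).
-export([run/0]).

run() ->
    %% Same specification as in the shell session above.
    Spec = [{{'$1', tag}, [{'>', '$1', 1}], ['$1']}],
    %% Matches: the second element of the result is the body's value.
    Hit = erlang:match_spec_test({1234, tag}, Spec, table),
    %% Guard fails (0 > 1 is false): the second element is false.
    Miss = erlang:match_spec_test({0, tag}, Spec, table),
    {Hit, Miss}.
```

The first result has the same {ok, Result, Flags, Warnings} shape as the {ok,1234,[],[]} shown in the session above.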

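We can see that it has its own opcodes and its own VM, and that the syntax is far from intuitive. To make it easier to use, Erlang provides a way to translate a function into a match_spec: the ms_transform module (see its documentation).
We usually do the translation through dbg:fun2ms or ets:fun2ms. Let's see how that is implemented.
The file ms_transform.hrl contains a single line: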

-compile({parse_transform,ms_transform}).

%% ms_transform.erl
copy({call,Line,{remote,_Line2,{atom,_Line3,ets},{atom,_Line4,fun2ms}},
      As0},Bound) ->
    {transform_call(ets,Line,As0,Bound),Bound};
copy({call,Line,{remote,_Line2,{record_field,_Line3,
                                {atom,_Line4,''},{atom,_Line5,ets}},
                 {atom,_Line6,fun2ms}}, As0},Bound) ->
    %% Packages...                                                                                                           
    {transform_call(ets,Line,As0,Bound),Bound};
copy({call,Line,{remote,_Line2,{atom,_Line3,dbg},{atom,_Line4,fun2ms}},
      As0},Bound) ->
    {transform_call(dbg,Line,As0,Bound),Bound};

The code shows that ms_transform performs the compile-time match_spec translation only for calls to dbg:fun2ms or ets:fun2ms.
Let's verify this understanding:

$ cat veri.erl
-module(veri).
-export([start/0]).
-include_lib("stdlib/include/ms_transform.hrl").

start()->
    ets:fun2ms(fun({M,N}) when N > 3 -> M end).

$ erlc +"'S'" veri.erl
$ cat veri.S
{module, veri}.  %% version = 0

{exports, [{module_info,0},{module_info,1},{start,0}]}.

{attributes, []}.

{labels, 7}.


{function, start, 0, 2}.
  {label,1}.
    {func_info,{atom,veri},{atom,start},0}.
  {label,2}.
    {move,{literal,[{{'$1','$2'},[{'>','$2',3}],['$1']}]},{x,0}}.
    return.


{function, module_info, 0, 4}.
  {label,3}.
    {func_info,{atom,veri},{atom,module_info},0}.
  {label,4}.
    {move,{atom,veri},{x,0}}.
    {call_ext_only,1,{extfunc,erlang,get_module_info,1}}.


{function, module_info, 1, 6}.
  {label,5}.
    {func_info,{atom,veri},{atom,module_info},1}.
  {label,6}.
    {move,{x,0},{x,1}}.
    {move,{atom,veri},{x,0}}.
    {call_ext_only,2,{extfunc,erlang,get_module_info,2}}.

$ erl 
Erlang R14B04 (erts-5.8.5)  [smp:2:2] [rq:2] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.5  (abort with ^G)
1> veri:start().
[{{'$1','$2'},[{'>','$2',3}],['$1']}]

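From the assembly we can see that the result of ets:fun2ms(fun({M,N}) when N > 3 -> M end) is already available at compile time: it appears as a literal in the move instruction. In other words, the translation takes no execution time at runtime, so there is no need to worry about its performance.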
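For completeness, here is a small sketch of how such a compile-time match_spec is typically passed to ets:select/2. The module name veri_select, the table name, and the sample rows are assumptions made for this illustration; an ordered_set table is used only so that the result order is predictable.

```erlang
-module(veri_select).
-export([run/0]).
-include_lib("stdlib/include/ms_transform.hrl").

run() ->
    Table = ets:new(veri_select_table, [ordered_set]),
    true = ets:insert(Table, [{a, 1}, {b, 4}, {c, 7}]),
    %% The parse transform rewrites this call at compile time into
    %% [{{'$1','$2'},[{'>','$2',3}],['$1']}], as seen in veri.S.
    Spec = ets:fun2ms(fun({M, N}) when N > 3 -> M end),
    %% Keys whose value is greater than 3, in key order.
    Result = ets:select(Table, Spec),
    true = ets:delete(Table),
    Result.
```

Because the file includes ms_transform.hrl, no fun is created for the fun2ms call at runtime; the select call receives the literal directly.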

So how do we use it? ETS gives us a good example:

match_spec_compile(MatchSpec) -> CompiledMatchSpec

Types:
MatchSpec = match_spec()
CompiledMatchSpec = comp_match_spec()

This function transforms a match_spec into an internal representation that can be used in subsequent calls to ets:match_spec_run/2. The internal representation is opaque and can not be converted to external term format and then back again without losing its properties (meaning it can not be sent to a process on another node and still remain a valid compiled match_spec, nor can it be stored on disk). The validity of a compiled match_spec can be checked using ets:is_compiled_ms/1.

If the term MatchSpec can not be compiled (does not represent a valid match_spec), a badarg fault is thrown.
Note

This function has limited use in normal code, it is used by Dets to perform the dets:select operations.

match_spec_run(List,CompiledMatchSpec) -> list()

Types:
List = [ tuple() ]
CompiledMatchSpec = comp_match_spec()

This function executes the matching specified in a compiled match_spec on a list of tuples. The CompiledMatchSpec term should be the result of a call to ets:match_spec_compile/1 and is hence the internal representation of the match_spec one wants to use.

The matching will be executed on each element in List and the function returns a list containing all results. If an element in List does not match, nothing is returned for that element. The length of the result list is therefore equal to or less than the length of the parameter List. The two calls in the following example will give the same result (but certainly not the same execution time…):

Table = ets:new…
MatchSpec = ….
% The following call…
ets:match_spec_run(ets:tab2list(Table),
ets:match_spec_compile(MatchSpec)),
% …will give the same result as the more common (and more efficient)
ets:select(Table,MatchSpec),

Note

This function has limited use in normal code, it is used by Dets to perform the dets:select operations and by Mnesia during transactions.
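The documentation example above leaves the table and the specification elided. A filled-in sketch might look like the following; the module name ms_run_demo, the table contents, and the specification are assumptions chosen for illustration, and an ordered_set table is used so that both calls traverse the objects in the same key order.

```erlang
-module(ms_run_demo).
-export([run/0]).

run() ->
    Table = ets:new(ms_run_demo_table, [ordered_set]),
    true = ets:insert(Table, [{a, 1}, {b, 5}, {c, 9}]),
    %% Return the key of every object whose value is greater than 3.
    MatchSpec = [{{'$1', '$2'}, [{'>', '$2', 3}], ['$1']}],
    %% Copy the whole table to a list, then filter the list.
    ViaRun = ets:match_spec_run(ets:tab2list(Table),
                                ets:match_spec_compile(MatchSpec)),
    %% Let ETS do the matching inside the table.
    ViaSelect = ets:select(Table, MatchSpec),
    true = ets:delete(Table),
    {ViaRun, ViaSelect}.
```

Both elements of the returned tuple should be the same list; the first route pays for copying every object out of the table before filtering.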

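Inside the emulator, both the hash and the tree table implementations of ETS call db_match_dbterm to run a match program against stored objects:

$ grep -rin db_match_dbterm .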
./erl_db_hash.c:1307:       (match_res = db_match_dbterm(&tb->common, p, mp, all_objects,
./erl_db_hash.c:1472:           match_res = db_match_dbterm(&tb->common, p, mpi.mp, 0,
./erl_db_hash.c:1638:           if (db_match_dbterm(&tb->common, p, mpi.mp, 0,
./erl_db_hash.c:1787:       if (db_match_dbterm(&tb->common, p, mpi.mp, 0,
./erl_db_hash.c:1898:       if (db_match_dbterm(&tb->common, p, mp, 0,
./erl_db_hash.c:1998:       if (db_match_dbterm(&tb->common, p, mp, 0, &current->dbterm,
./erl_db_tree.c:2971:    ret = db_match_dbterm(&tb->common,sc->p,sc->mp,sc->all_objects,
./erl_db_tree.c:3004:    ret = db_match_dbterm(&tb->common, sc->p, sc->mp, 0,
./erl_db_tree.c:3036:    ret = db_match_dbterm(&tb->common, sc->p, sc->mp, sc->all_objects,
./erl_db_tree.c:3073:    ret = db_match_dbterm(&tb->common, sc->p, sc->mp, 0,

For ETS itself, see my earlier post analyzing how ETS, the Erlang database, works: http://mryufeng.iteye.com/blog/113856

Summary: match_spec can be used in driver modules we implement ourselves, for filtering and matching.

Have fun!


  1. genesislive
    September 3rd, 2013 at 13:14 | #1

    Is there a match spec for a plain list of records?

    Yu Feng replied:

    Yes, along the lines of ets:match_spec_run(List, CompiledMatchSpec) -> list()