Erlang 新数据类型Map的定位和性能
原创文章,转载请注明: 转载自系统技术非业余研究
本文链接地址: Erlang 新数据类型Map的定位和性能
Erlang R17最大的语言层面的变化莫过是引入 Map数据结构,参见:Erlang R17新特性浅评 还有 这里。
Map相关的细节在EEP 43上,参见 这里。
定位:
A record replacement is just that, a replacement. It’s like asking the question, “What do we have?” instead of “What can we get?” The instant rebuttal would be “What do we need?” I say Maps.
满足的约束:
The new data-type shall have semantics, syntax and operations that:
> provides an association set from key terms to value terms which can be constructed, accessed and updated using language syntax
> can be uniquely distinguished from every other data-type in the language
> has no compile-time dependency for constructing, accessing or updating contents of maps nor for passing maps between modules, processes or over Erlang distribution
> can be used in matching expressions in the language
> has a one-to-one association between printing and parsing the data-type
> has a well defined order between terms of the type and other Erlang types
> has at most O(log N) time complexity in insert and lookup operations, where N is the number of key-value associations.
思遥同学很贴心的写了一篇maps的分析,参看 Erlang 的新数据结构 map 浅析
从源码来分析的话:
/* erl_map.h */ typedef struct map_s { Eterm thing_word; Uint size; Eterm keys; /* tuple */ } map_t; /* map node * * ----------- * Eterm THING * Uint size * Eterm Keys -> {K1, K2, K3, ..., Kn} where n = size * ---- * Eterm V1 * ... * Eterm Vn, where n = size * ----------- */ * maps:get/2 * return value if key *matches* a key in the map * exception bad_key if none matches */ /*erl_map.c */ int erts_maps_get(Eterm key, Eterm map, Eterm *value) { Eterm *ks,*vs; map_t *mp; Uint n,i; mp = (map_t*)map_val(map); n = map_get_size(mp); if (n == 0) return 0; ks = map_get_keys(mp); vs = map_get_values(mp); if (is_immed(key)) { for( i = 0; i < n; i++) { if (ks[i] == key) { *value = vs[i]; return 1; } } } for( i = 0; i < n; i++) { if (EQ(ks[i], key)) { *value = vs[i]; return 1; } } return 0; } int erts_validate_and_sort_map(map_t* mp) { ... }
我们知道map的数据结构非常简单:记录个数,keys, values. keys是以tuple有序存放的。所以插入和移除的代价都很高,查询也是遍历方式进行的。
map数据结构的复杂性和进程字典(put,get)相比都简单很多, 所以map第一天设计就不是为了大量存放kv数据的目的,它的目的是提供一种更好的record的替代品,在细节上改进:支持atom以外的key, 语法和语义更自然。
小结: 如果要性能,请使用ets, dict等数据结构;如果要替代record用的更顺手,用map。
祝玩的开心!
Post Footer automatically generated by wp-posturl plugin for wordpress.
现在的实现还有点糙,EEP里面提到的是insert和lookup最坏O(log n),目前的实现是线性搜索。其实key都是有序的,改成二分搜索就可以实现proposed的复杂度了。
Yu Feng Reply:
March 12th, 2014 at 8:59 pm
确实
map比record是灵活多了,原来record生成的代码是用element()和setelement模拟的,字段位置固定,一旦有不是故意的字段次序变动,所有使用这个record的模块都要重新编译。而且新字段不能动态添加也是个问题。
蛋疼的是lists里面的keyxxx方法全是针对record的,在考虑把record替换成maps的时候想到lists那些方法可能都要替换或者重写,有点犹豫