Home > Erlang探索, 源码分析 > Erlang 新数据类型Map的定位和性能

Erlang 新数据类型Map的定位和性能

March 12th, 2014

原创文章,转载请注明: 转载自系统技术非业余研究

本文链接地址: Erlang 新数据类型Map的定位和性能

Erlang R17最大的语言层面的变化莫过是引入 Map数据结构,参见:Erlang R17新特性浅评 还有 这里

Map相关的细节在EEP 43上,参见 这里

定位:

A record replacement is just that, a replacement. It’s like asking the question, “What do we have?” instead of “What can we get?” The instant rebuttal would be “What do we need?” I say Maps.

满足的约束:

The new data-type shall have semantics, syntax and operations that:

> provides an association set from key terms to value terms which can be constructed, accessed and updated using language syntax
> can be uniquely distinguished from every other data-type in the language
> has no compile-time dependency for constructing, accessing or updating contents of maps nor for passing maps between modules, processes or over Erlang distribution
> can be used in matching expressions in the language
> has a one-to-one association between printing and parsing the data-type
> has a well defined order between terms of the type and other Erlang types
> has at most O(log N) time complexity in insert and lookup operations, where N is the number of key-value associations.

思遥同学很贴心的写了一篇maps的分析,参看 Erlang 的新数据结构 map 浅析


从源码来分析的话:

/* erl_map.h */
typedef struct map_s {
    Eterm thing_word;
    Uint  size;
    Eterm keys;      /* tuple */
} map_t;
/* map node                                                                                                                                                                                    
 *                                                                                                                                                                                             
 * -----------                                                                                                                                                                                 
 * Eterm   THING                                                                                                                                                                               
 * Uint    size                                                                                                                                                                                
 * Eterm   Keys -> {K1, K2, K3, ..., Kn} where n = size                                                                                                                                        
 * ----                                                                                                                                                                                        
 * Eterm   V1                                                                                                                                                                                  
 * ...                                                                                                                                                                                         
 * Eterm   Vn, where n = size                                                                                                                                                                  
 * -----------                                                                                                                                                                                 
 */

* maps:get/2                                                                                                                                                                                  
 * return value if key *matches* a key in the map                                                                                                                                              
 * exception bad_key if none matches                                                                                                                                                           
 */

/*erl_map.c */
int erts_maps_get(Eterm key, Eterm map, Eterm *value) {
    Eterm *ks,*vs;
    map_t *mp;
    Uint n,i;

    mp  = (map_t*)map_val(map);
    n   = map_get_size(mp);

    if (n == 0)
        return 0;

    ks  = map_get_keys(mp);
    vs  = map_get_values(mp);

    if (is_immed(key)) {
        for( i = 0; i < n; i++) {
            if (ks[i] == key) {
                *value = vs[i];
		return 1;
            }
	}
    }
   for( i = 0; i < n; i++) {
        if (EQ(ks[i], key)) {
            *value = vs[i];
            return 1;
        }
    }
    return 0;
}

int erts_validate_and_sort_map(map_t* mp)
{
 ...
}

我们知道map的数据结构非常简单:记录个数,keys, values. keys是以tuple有序存放的。所以插入和移除的代价都很高,查询也是遍历方式进行的。

map数据结构的复杂性和进程字典(put,get)相比都简单很多, 所以map第一天设计就不是为了大量存放kv数据的目的,它的目的是提供一种更好的record的替代品,在细节上改进:支持atom以外的key, 语法和语义更自然。

小结: 如果要性能,请使用ets, dict等数据结构;如果要替代record用的更顺手,用map。

祝玩的开心!

Post Footer automatically generated by wp-posturl plugin for wordpress.

Categories: Erlang探索, 源码分析 Tags: ,
  1. March 12th, 2014 at 18:42 | #1

    现在的实现还有点糙,EEP里面提到的是insert和lookup最坏O(log n),目前的实现是线性搜索。其实key都是有序的,改成二分搜索就可以实现proposed的复杂度了。

    Yu Feng Reply:

    确实

  2. Xu Yifeng
    March 13th, 2014 at 11:07 | #2

    map比record是灵活多了,原来record生成的代码是用element()和setelement模拟的,字段位置固定,一旦有不是故意的字段次序变动,所有使用这个record的模块都要重新编译。而且新字段不能动态添加也是个问题。

  3. dcy
    November 4th, 2014 at 15:15 | #3

    蛋疼的是lists里面的keyxxx方法全是针对record的,在考虑把record替换成maps的时候想到lists那些方法可能都要替换或者重写,有点犹豫

Comments are closed.