The False Sharing Problem and How to Fix It

October 21st, 2010 Yu Feng

Original article. If you repost it, please credit: reposted from 系统技术非业余研究

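When writing multithreaded programs, we often try to avoid locks with a data structure like this: allocate an array sized to the number of threads and give each thread its own entry, so that the threads never conflict. Logically the design looks flawless, but in practice we find that it does not make the program any faster. The culprit is the CPU cache line. When we read from main memory, the data is loaded into L1 and L2 at the same time, and the cache works in units of one cache line (typically 64 bytes). Each core has its own L1 (and usually its own L2), so when a thread reads its own entry it also pulls the neighboring threads' entries into its cache. When a thread then updates its entry, the caches of the cores must be synchronized to keep the data coherent: the copies of that line held by other cores are invalidated and have to be fetched again. This causes a serious performance problem even though no thread ever touches another thread's entry. This is the so-called false sharing problem; if you are interested, look it up on Wikipedia.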
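To make the problematic layout concrete, here is a minimal sketch of the "one array entry per thread" design described above. It assumes POSIX threads and a typical LP64 Linux system where `long` is 8 bytes; the names `packed_counters`, `packed_worker`, `run_packed` and `MAX_THREADS` are made up for this illustration and do not come from any real code base.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 8   /* hypothetical upper bound for this sketch */

/* One slot per thread, packed back to back. With 8-byte longs, all eight
 * slots fit in 64 bytes, so they land on at most two 64-byte cache lines. */
static volatile long packed_counters[MAX_THREADS];

struct worker_arg {
    int  slot;        /* index of the slot this thread owns */
    long iterations;  /* number of increments to perform */
};

static void *packed_worker(void *p)
{
    const struct worker_arg *arg = p;
    assert(arg->slot >= 0 && arg->slot < MAX_THREADS);
    for (long i = 0; i < arg->iterations; i++)
        packed_counters[arg->slot]++;  /* private slot, shared cache line */
    return NULL;
}

/* Starts nthreads workers, waits for them, and returns the sum of all
 * slots; returns -1 if the arguments are invalid or a thread cannot start. */
long run_packed(int nthreads, long iterations)
{
    pthread_t tids[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    int started = 0;
    long sum = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        packed_counters[i] = 0;
        args[i].slot = i;
        args[i].iterations = iterations;
        if (pthread_create(&tids[i], NULL, packed_worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    if (started != nthreads)
        return -1;

    for (int i = 0; i < nthreads; i++)
        sum += packed_counters[i];
    return sum;
}
```

The result is always correct, because no two threads write the same slot. The cost is hidden in the hardware: every increment may force the cache line to move between cores. The `volatile` qualifier is only there so that the compiler does not keep the counter in a register for the whole loop, which would hide the effect.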

For details, see this article: http://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads/

The fix is simple:
pad each entry out to the length of a cache line, so that the entries are isolated from one another.

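/* The byte array pads the union out to a cache-line-aligned size,
   so adjacent locks in an array do not share a cache line. */
typedef union {
    erts_smp_rwmtx_t rwmtx;
    byte cache_line_align__[ERTS_ALC_CACHE_LINE_ALIGN_SIZE(
                                sizeof(erts_smp_rwmtx_t))];
} erts_meta_main_tab_lock_t;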
Or (MSVC syntax):
__declspec (align(64)) int thread1_global_variable;
__declspec (align(64)) int thread2_global_variable;
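The same padding idea can be sketched in portable C11. This is only an illustration under stated assumptions: `CACHE_LINE_SIZE` is hard-coded to 64, and `LINE_ALIGN_SIZE`, `padded_counter_t` and `run_padded` are hypothetical names chosen for this example (the macro is modeled on the idea behind the union above, not copied from it). It assumes POSIX threads and a C11 compiler.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64  /* assumption: typical x86-64 value */
#define MAX_THREADS 8       /* hypothetical upper bound for this sketch */

/* Hypothetical helper: round sz up to a whole number of cache lines. */
#define LINE_ALIGN_SIZE(sz) \
    ((((sz) + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE) * CACHE_LINE_SIZE)

/* Each slot is padded to, and aligned on, a full cache line. */
typedef union {
    alignas(CACHE_LINE_SIZE) volatile long value;
    unsigned char pad[LINE_ALIGN_SIZE(sizeof(long))];
} padded_counter_t;

static_assert(sizeof(padded_counter_t) == CACHE_LINE_SIZE,
              "each slot must fill exactly one cache line");
static_assert(alignof(padded_counter_t) == CACHE_LINE_SIZE,
              "each slot must start on a cache line boundary");

static padded_counter_t padded_counters[MAX_THREADS];

struct worker_arg {
    int  slot;        /* index of the slot this thread owns */
    long iterations;  /* number of increments to perform */
};

static void *padded_worker(void *p)
{
    const struct worker_arg *arg = p;
    for (long i = 0; i < arg->iterations; i++)
        padded_counters[arg->slot].value++;  /* private slot, private line */
    return NULL;
}

/* Starts nthreads workers, waits for them, and returns the sum of all
 * slots; returns -1 if the arguments are invalid or a thread cannot start. */
long run_padded(int nthreads, long iterations)
{
    pthread_t tids[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    int started = 0;
    long sum = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        padded_counters[i].value = 0;
        args[i].slot = i;
        args[i].iterations = iterations;
        if (pthread_create(&tids[i], NULL, padded_worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    if (started != nthreads)
        return -1;

    for (int i = 0; i < nthreads; i++)
        sum += padded_counters[i].value;
    return sum;
}
```

Calling `run_padded(4, 100000000L)` and timing it against an unpadded array of plain `long` counters is one way to see the difference on your own machine. The price is memory: every 8-byte counter now occupies 64 bytes.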

This is why you see cache_line_align all over high-performance server code: its stated purpose is to avoid cache thrashing.

Tools such as valgrind and Intel VTune can help with performance tuning at this level.

Categories: Linux Tags: align, cache line, false sharing

Comments (6)

caosuwei

October 21st, 2010 at 23:44 | #1

Won't this cause more cache misses?

Yu Feng Reply:
October 22nd, 2010 at 1:19 pm
No. The cache is also organized in 64-byte lines. It will waste some memory, though, if your array is large.
ahfu

October 22nd, 2010 at 14:09 | #2

The Intel TBB book says a cache line is 128 bytes!
Is it 64 or 128?

Yu Feng Reply:
October 22nd, 2010 at 2:28 pm
It is usually 64; I have not seen 128 yet.
You can check with: cat /proc/cpuinfo
ahfu

October 25th, 2010 at 13:36 | #3

@Yu Feng
cat /proc/cpuinfo doesn't show anything like a cache line size!

Yu Feng Reply:
October 26th, 2010 at 12:52 am
…
bogomips : 3999.45
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
…
zhanhc

October 4th, 2011 at 16:06 | #4

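I learned something new. I saw someone mention that this came up in Taobao's written test, so I searched for it specifically. I had assumed it was just about memory alignment; I didn't expect the root of the problem to be multi-core. I'm taking Taobao's written test this year too, and I hope to pass the interview!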

Yu Feng Reply:
October 5th, 2011 at 6:13 pm
Good luck!
Min Zhou

April 14th, 2012 at 14:17 | #5

Valgrind claims to have false sharing detection, but it doesn't seem to work. VTune depends on the CPU model; my laptop's i5 M640 doesn't support this event.
Min Zhou

April 16th, 2012 at 13:33 | #6

霸爷, this figure should be showing false sharing on SMP, which is different from CMP.

Yu Feng Reply:
April 16th, 2012 at 7:43 pm
Yes.
