Likwid: An Indispensable Toolbox for High-Performance Server Development

January 16th, 2013

Original article. When reposting, please credit: reposted from 系统技术非业余研究

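When building high-performance servers, knowing how to write high-performance code is one thing; whether the system you end up with actually performs well is quite another.

We usually need to know the system's CPU topology, memory usage, the values of the various CPU performance counters, how each CPU cache is being used, hit rates, and so on. Only when this information is combined can we accurately pinpoint where our program falls short and find better optimization targets. This information is typically scattered across different parts of the system, and ordinary developers find it hard to pull together into one coherent picture.

Well, the Germans, famous for their meticulousness, have come to the rescue again: meet Likwid.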

Likwid

The Likwid project has its own home page. According to the description there:

Likwid stands for Like I knew what I am doing. This project contributes easy to use command line tools for Linux to support programmers in developing high performance multi threaded programs.

It contains the following tools:

likwid-topology: Show the thread and cache topology
likwid-perfctr: Measure hardware performance counters on Intel and AMD processors
likwid-features: Show and Toggle hardware prefetch control bits on Intel Core 2 processors
likwid-pin: Pin your threaded application without touching your code (supports pthreads, Intel OpenMP and gcc OpenMP)
likwid-bench: Benchmarking framework allowing rapid prototyping of threaded assembly kernels
likwid-mpirun: Script enabling simple and flexible pinning of MPI and MPI/threaded hybrid applications
likwid-perfscope: Frontend for likwid-perfctr timeline mode. Allows live plotting of performance metrics.
likwid-powermeter: Tool for accessing RAPL counters and query Turbo mode steps on Intel processor.
likwid-memsweeper: Tool to cleanup ccNUMA memory domains.

Likwid stands out because:

No kernel patching, any vanilla linux 2.6 or newer kernel works
Transparent, always clear which events are chosen, event tags have the same naming as in documentation
Lightweight, LIKWID tries to add no overhead and keeps out of your way.
Easy to use, simple to build, no need to touch your code, configurable from outside. Clear CLI interface.
Multiplatform, likwid supports Intel and AMD processors
Up to date, likwid tries to fully support new processors as soon as possible
Extensible, you can add functionality by means of simple text files

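Its documentation is also very well done, and it includes an introduction to using the tools.

I won't dwell on the details of usage, since the documentation covers all of it. Instead, let me show off what it can do: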


[chuba@rds064075.sqa.cm4 likwid-3.0]$ sudo ./likwid-topology 
-------------------------------------------------------------
CPU type:       Intel Core Westmere processor 
*************************************************************
Hardware Thread Topology
*************************************************************
Sockets:        2 
Cores per socket:       4 
Threads per core:       2 
-------------------------------------------------------------
HWThread        Thread          Core            Socket
0               0               0               1
1               0               1               1
2               0               9               1
3               0               10              1
4               0               0               0
5               0               1               0
6               0               9               0
7               0               10              0
8               1               0               1
9               1               1               1
10              1               9               1
11              1               10              1
12              1               0               0
13              1               1               0
14              1               9               0
15              1               10              0
-------------------------------------------------------------
Socket 0: ( 4 12 5 13 6 14 7 15 )
Socket 1: ( 0 8 1 9 2 10 3 11 )
-------------------------------------------------------------

*************************************************************
Cache Topology
*************************************************************
Level:  1
Size:   32 kB
Cache groups:   ( 4 12 ) ( 5 13 ) ( 6 14 ) ( 7 15 ) ( 0 8 ) ( 1 9 ) ( 2 10 ) ( 3 11 )
-------------------------------------------------------------
Level:  2
Size:   256 kB
Cache groups:   ( 4 12 ) ( 5 13 ) ( 6 14 ) ( 7 15 ) ( 0 8 ) ( 1 9 ) ( 2 10 ) ( 3 11 )
-------------------------------------------------------------
Level:  3
Size:   12 MB
Cache groups:   ( 4 12 5 13 6 14 7 15 ) ( 0 8 1 9 2 10 3 11 )
-------------------------------------------------------------

*************************************************************
NUMA Topology
*************************************************************
NUMA domains: 2 
-------------------------------------------------------------
Domain 0:
Processors:  4 5 6 7 12 13 14 15
Relative distance to nodes:  10 20
Memory: 16222.4 MB free of total 24567.1 MB
-------------------------------------------------------------
Domain 1:
Processors:  0 1 2 3 8 9 10 11
Relative distance to nodes:  20 10
Memory: 5424.19 MB free of total 24576 MB
-------------------------------------------------------------
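The topology dump above is plain text, so it is easy to post-process. Here is a minimal sketch, assuming the "Socket N: ( ... )" line format shown in the output; the helper names parse_sockets and cpu_list are hypothetical and not part of Likwid:

```python
import re

# Excerpt copied from the likwid-topology output above.
SAMPLE = """\
Socket 0: ( 4 12 5 13 6 14 7 15 )
Socket 1: ( 0 8 1 9 2 10 3 11 )
"""


def parse_sockets(text):
    """Map socket id -> list of hardware thread ids, in the order printed."""
    sockets = {}
    # Matches lines such as "Socket 0: ( 4 12 5 13 )"; the "Sockets: 2"
    # summary line does not match because of the trailing "s".
    pattern = r"^Socket (\d+): \(([\d ]*)\)"
    for match in re.finditer(pattern, text, re.MULTILINE):
        socket_id = int(match.group(1))
        sockets[socket_id] = [int(t) for t in match.group(2).split()]
    return sockets


def cpu_list(threads):
    """Format hardware thread ids as a sorted, comma-separated list."""
    return ",".join(str(t) for t in sorted(threads))


if __name__ == "__main__":
    for socket_id, threads in sorted(parse_sockets(SAMPLE).items()):
        print(f"socket {socket_id}: {cpu_list(threads)}")
```

A list like this is handy when you want to keep all threads of a program on one socket. Check the Likwid documentation for the exact list syntax a given option accepts.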



$ sudo ./likwid-perfctr  -C 0-3 -g MEM sleep 10
-------------------------------------------------------------
-------------------------------------------------------------
CPU type:       Intel Core Westmere processor 
CPU clock:      2.13 GHz 
Measuring group MEM
-------------------------------------------------------------
sleep 10
Status: 0x400000000 
Status: 0x0 
Status: 0x0 
Status: 0x0 
+--------------------------------+-------------+-------------+-------------+-------------+
|             Event              |   core 0    |   core 1    |   core 2    |   core 3    |
+--------------------------------+-------------+-------------+-------------+-------------+
|       INSTR_RETIRED_ANY        | 1.15794e+08 | 3.30559e+08 | 9.21383e+07 | 6.13907e+07 |
|     CPU_CLK_UNHALTED_CORE      | 2.16557e+08 | 5.36794e+08 | 1.60588e+08 | 1.07672e+08 |
|      CPU_CLK_UNHALTED_REF      | 2.1624e+08  | 5.15724e+08 | 1.55415e+08 | 1.0452e+08  |
|    UNC_QMC_NORMAL_READS_ANY    | 1.42469e+07 |      0      |      0      |      0      |
|    UNC_QMC_WRITES_FULL_ANY     | 3.3378e+06  |      0      |      0      |      0      |
| UNC_QHL_REQUESTS_REMOTE_READS  | 5.95875e+06 |      0      |      0      |      0      |
|  UNC_QHL_REQUESTS_LOCAL_READS  | 9.16778e+06 |      0      |      0      |      0      |
| UNC_QHL_REQUESTS_REMOTE_WRITES |   163766    |      0      |      0      |      0      |
+--------------------------------+-------------+-------------+-------------+-------------+
+-------------------------------------+-------------+-------------+-------------+-------------+
|                Event                |     Sum     |     Max     |     Min     |     Avg     |
+-------------------------------------+-------------+-------------+-------------+-------------+
|       INSTR_RETIRED_ANY STAT        | 5.99881e+08 | 3.30559e+08 | 6.13907e+07 | 1.4997e+08  |
|     CPU_CLK_UNHALTED_CORE STAT      | 1.02161e+09 | 5.36794e+08 | 1.07672e+08 | 2.55403e+08 |
|      CPU_CLK_UNHALTED_REF STAT      | 9.91899e+08 | 5.15724e+08 | 1.0452e+08  | 2.47975e+08 |
|    UNC_QMC_NORMAL_READS_ANY STAT    | 1.42469e+07 | 1.42469e+07 |      0      | 3.56173e+06 |
|    UNC_QMC_WRITES_FULL_ANY STAT     | 3.3378e+06  | 3.3378e+06  |      0      |   834449    |
| UNC_QHL_REQUESTS_REMOTE_READS STAT  | 5.95875e+06 | 5.95875e+06 |      0      | 1.48969e+06 |
|  UNC_QHL_REQUESTS_LOCAL_READS STAT  | 9.16778e+06 | 9.16778e+06 |      0      | 2.29194e+06 |
| UNC_QHL_REQUESTS_REMOTE_WRITES STAT |   163766    |   163766    |      0      |   40941.5   |
+-------------------------------------+-------------+-------------+-------------+-------------+
+-----------------------------+----------+----------+-----------+-----------+
|           Metric            |  core 0  |  core 1  |  core 2   |  core 3   |
+-----------------------------+----------+----------+-----------+-----------+
|     Runtime (RDTSC) [s]     | 10.0024  | 10.0024  |  10.0024  |  10.0024  |
|    Runtime unhalted [s]     | 0.101511 | 0.251623 | 0.0752758 | 0.0504714 |
|         Clock [MHz]         | 2136.45  | 2220.49  |  2204.33  |  2197.66  |
|             CPI             |  1.8702  |  1.6239  |  1.7429   |  1.75388  |
| Memory bandwidth [MBytes/s] | 112.515  |    0     |     0     |     0     |
| Memory data volume [GBytes] | 1.12542  |    0     |     0     |     0     |
|  Remote Read BW [MBytes/s]  | 38.1267  |    0     |     0     |     0     |
| Remote Write BW [MBytes/s]  | 1.04785  |    0     |     0     |     0     |
|    Remote BW [MBytes/s]     | 39.1746  |    0     |     0     |     0     |
+-----------------------------+----------+----------+-----------+-----------+
+----------------------------------+----------+----------+-----------+----------+
|              Metric              |   Sum    |   Max    |    Min    |   Avg    |
+----------------------------------+----------+----------+-----------+----------+
|     Runtime (RDTSC) [s] STAT     | 40.0097  | 10.0024  |  10.0024  | 10.0024  |
|    Runtime unhalted [s] STAT     | 0.478882 | 0.251623 | 0.0504714 | 0.11972  |
|         Clock [MHz] STAT         | 8758.93  | 2220.49  |  2136.45  | 2189.73  |
|             CPI STAT             | 1.70302  |  1.8702  |  1.6239   | 0.425755 |
| Memory bandwidth [MBytes/s] STAT | 112.515  | 112.515  |     0     | 28.1287  |
| Memory data volume [GBytes] STAT | 1.12542  | 1.12542  |     0     | 0.281355 |
|  Remote Read BW [MBytes/s] STAT  | 38.1267  | 38.1267  |     0     | 9.53168  |
| Remote Write BW [MBytes/s] STAT  | 1.04785  | 1.04785  |     0     | 0.261962 |
|    Remote BW [MBytes/s] STAT     | 39.1746  | 39.1746  |     0     | 9.79365  |
+----------------------------------+----------+----------+-----------+----------+
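The derived metrics in the tables above can be cross-checked against the raw event counts. The sketch below assumes that each counted memory read or write moves one 64-byte cache line, and that CPI is simply unhalted core cycles divided by retired instructions. These formulas reproduce the numbers shown for core 0, but they are an assumption about how the MEM group is defined, not a quote from Likwid:

```python
CACHE_LINE_BYTES = 64  # assumption: one 64-byte line per counted transfer

# Core 0 values copied from the likwid-perfctr tables above.
core0 = {
    "INSTR_RETIRED_ANY": 1.15794e08,
    "CPU_CLK_UNHALTED_CORE": 2.16557e08,
    "UNC_QMC_NORMAL_READS_ANY": 1.42469e07,
    "UNC_QMC_WRITES_FULL_ANY": 3.3378e06,
    "UNC_QHL_REQUESTS_REMOTE_READS": 5.95875e06,
}
runtime_s = 10.0024  # "Runtime (RDTSC) [s]" from the metric table


def cpi(events):
    """Cycles per instruction: unhalted core cycles / retired instructions."""
    return events["CPU_CLK_UNHALTED_CORE"] / events["INSTR_RETIRED_ANY"]


def memory_bandwidth_mbytes_s(events, seconds):
    """Memory bandwidth in MBytes/s from memory controller reads plus writes."""
    lines = events["UNC_QMC_NORMAL_READS_ANY"] + events["UNC_QMC_WRITES_FULL_ANY"]
    return 1.0e-6 * lines * CACHE_LINE_BYTES / seconds


def remote_read_bandwidth_mbytes_s(events, seconds):
    """Remote read bandwidth in MBytes/s from remote read requests."""
    lines = events["UNC_QHL_REQUESTS_REMOTE_READS"]
    return 1.0e-6 * lines * CACHE_LINE_BYTES / seconds


if __name__ == "__main__":
    print(f"CPI:              {cpi(core0):.4f}")
    print(f"Memory bandwidth: {memory_bandwidth_mbytes_s(core0, runtime_s):.3f} MBytes/s")
    print(f"Remote read BW:   {remote_read_bandwidth_mbytes_s(core0, runtime_s):.3f} MBytes/s")
```

The results should land very close to the table values for core 0 (CPI 1.8702, memory bandwidth 112.515 MBytes/s, remote read bandwidth 38.1267 MBytes/s). Small differences in the last digits are expected because the event counts in the table are already rounded.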

All kinds of information, right at your fingertips.

Have fun!

  1. Chen
    January 17th, 2013 at 01:43 | #1

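    So this high-performance work actually came out of our university. That feels so familiar, and I'm a little excited. Right now I'm wrestling with fault tolerance, and I find this tool is really good too. Lots of fun!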

    Yu Feng Reply:

    You're studying in Germany? Impressive!

  2. tk
    February 26th, 2013 at 18:40 | #2

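    Expert, is it true that this program cannot be built successfully on 32-bit? When I compile on 32-bit, the generated GCC/*.s files contain both 32-bit and 64-bit instructions. (The 64-bit build works fine.)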

    Yu Feng Reply:

    Use a 64-bit machine to sidestep the problem, and get started with the tool first.

  3. kain
    April 6th, 2013 at 19:45 | #3

    So it's @淘宝褚霸. Our lab needs to estimate graph500 performance soon, so I'll give likwid a try.

    Yu Feng Reply:

    likwid is good!

    kain Reply:

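    I've been digging through the likwid-perfctr source these past few days, and some parts of the implementation still have problems. For example, with -C 1-3 the code still assigns to 0, and with a large share at that.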

    Yu Feng Reply:

    Great! Could you report the bugs to the author?
