Investigating CPU Topology
Original article. If you repost it, please credit the source: 系统技术非业余研究
Permalink: Investigating CPU Topology
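When writing multi-core programs (Erlang programs, for example), we need to understand the CPU topology: how logical CPUs map to physical CPUs, and what the CPU's internal hardware parameters are, such as the L1 and L2 cache sizes.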
On Linux, /proc/cpuinfo provides some of this information, but it is incomplete. /sys/devices/system/cpu/ also exposes the topology, but it is hard to interpret.
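For a quick look at what sysfs exposes, the per-CPU topology files can be grouped by package and core. The following is a minimal sketch, not part of the original toolchain: the function name read_topology is made up for illustration, and it assumes the kernel provides topology/physical_package_id and topology/core_id under each cpuN directory, as described in the kernel's sysfs ABI documentation.

```python
from collections import defaultdict
from pathlib import Path


def read_topology(root="/sys/devices/system/cpu"):
    """Group logical CPU numbers by (package id, core id) using sysfs.

    read_topology is a hypothetical helper name; `root` is a parameter
    only so the function can be pointed at a different directory tree.
    """
    cores = defaultdict(list)
    # cpu[0-9]* matches cpu0, cpu1, ... but not cpufreq or cpuidle.
    for cpu_dir in Path(root).glob("cpu[0-9]*"):
        pkg_file = cpu_dir / "topology" / "physical_package_id"
        core_file = cpu_dir / "topology" / "core_id"
        if not (pkg_file.is_file() and core_file.is_file()):
            continue  # e.g. an offline CPU without topology files
        cpu = int(cpu_dir.name[3:])        # "cpu10" -> 10
        pkg = int(pkg_file.read_text())    # int() ignores the trailing newline
        core = int(core_file.read_text())
        cores[(pkg, core)].append(cpu)
    return {key: sorted(cpus) for key, cpus in sorted(cores.items())}


if __name__ == "__main__":
    for (pkg, core), cpus in read_topology().items():
        print(f"package {pkg} core {core}: logical CPUs {cpus}")
```

Two logical CPUs listed under the same (package, core) pair are SMT siblings. Note that core_id values reported by the kernel are not guaranteed to be contiguous, so they should be treated as labels rather than indexes.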
Often we need a more specialized tool, and Intel provides one. See: http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/
Download it, compile it, and run it:
[admin@my174 cpu-topology]$ ./cpu_topology64.out
Advisory to Users on system topology enumeration
This utility is for demonstration purpose only. It assumes the hardware topology
configuration within a coherent domain does not change during the life of an OS
session. If an OS support advanced features that can change hardware topology
configurations, more sophisticated adaptation may be necessary to account for
the hardware configuration change that might have added and reduced the number
of logical processors being managed by the OS.
User should also be aware that the system topology enumeration algorithm is
based on the assumption that CPUID instruction will return raw data reflecting
the native hardware configuration. When an application runs inside a virtual
machine hosted by a Virtual Machine Monitor (VMM), any CPUID instructions
issued by an app (or a guest OS) are trapped by the VMM and it is the VMM’s
responsibility and decision to emulate/supply CPUID return data to the virtual
machines. When deploying topology enumeration code based on querying CPUID
inside a VM environment, the user must consult with the VMM vendor on how an VMM
will emulate CPUID instruction relating to topology enumeration.
Software visible enumeration in the system:
Number of logical processors visible to the OS: 16
Number of logical processors visible to this process: 16
Number of processor cores visible to this process: 8
Number of physical packages visible to this process: 2
Hierarchical counts by levels of processor topology:
# of cores in package 0 visible to this process: 4 .
# of logical processors in Core 0 visible to this process: 2 .
# of logical processors in Core 1 visible to this process: 2 .
# of logical processors in Core 2 visible to this process: 2 .
# of logical processors in Core 3 visible to this process: 2 .
# of cores in package 1 visible to this process: 4 .
# of logical processors in Core 0 visible to this process: 2 .
# of logical processors in Core 1 visible to this process: 2 .
# of logical processors in Core 2 visible to this process: 2 .
# of logical processors in Core 3 visible to this process: 2 .
Affinity masks per SMT thread, per core, per package:
Individual:
P:0, C:0, T:0 --> 1
P:0, C:0, T:1 --> 100
Core-aggregated:
P:0, C:0 --> 101
Individual:
P:0, C:1, T:0 --> 4
P:0, C:1, T:1 --> 400
Core-aggregated:
P:0, C:1 --> 404
Individual:
P:0, C:2, T:0 --> 10
P:0, C:2, T:1 --> 1z3
Core-aggregated:
P:0, C:2 --> 1010
Individual:
P:0, C:3, T:0 --> 40
P:0, C:3, T:1 --> 4z3
Core-aggregated:
P:0, C:3 --> 4040
Pkg-aggregated:
P:0 --> 5555
Individual:
P:1, C:0, T:0 --> 2
P:1, C:0, T:1 --> 200
Core-aggregated:
P:1, C:0 --> 202
Individual:
P:1, C:1, T:0 --> 8
P:1, C:1, T:1 --> 800
Core-aggregated:
P:1, C:1 --> 808
Individual:
P:1, C:2, T:0 --> 20
P:1, C:2, T:1 --> 2z3
Core-aggregated:
P:1, C:2 --> 2020
Individual:
P:1, C:3, T:0 --> 80
P:1, C:3, T:1 --> 8z3
Core-aggregated:
P:1, C:3 --> 8080
Pkg-aggregated:
P:1 --> aaaa
APIC ID listings from affinity masks
OS cpu 0, Affinity mask 000001 - apic id 10
OS cpu 1, Affinity mask 000002 - apic id 0
OS cpu 2, Affinity mask 000004 - apic id 12
OS cpu 3, Affinity mask 000008 - apic id 2
OS cpu 4, Affinity mask 000010 - apic id 14
OS cpu 5, Affinity mask 000020 - apic id 4
OS cpu 6, Affinity mask 000040 - apic id 16
OS cpu 7, Affinity mask 000080 - apic id 6
OS cpu 8, Affinity mask 000100 - apic id 11
OS cpu 9, Affinity mask 000200 - apic id 1
OS cpu 10, Affinity mask 000400 - apic id 13
OS cpu 11, Affinity mask 000800 - apic id 3
OS cpu 12, Affinity mask 001000 - apic id 15
OS cpu 13, Affinity mask 002000 - apic id 5
OS cpu 14, Affinity mask 004000 - apic id 17
OS cpu 15, Affinity mask 008000 - apic id 7
Package 0 Cache and Thread details
Box Description:
Cache is cache level designator
Size is cache size
OScpu# is cpu # as seen by OS
Core is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
CmbMsk is Combined AffinityMask(extended hex) for hw threads sharing cache
CmbMsk will differ from AffMsk if > 1 hw_thread/cache
Extended Hex replaces trailing zeroes with ‘z#’
where # is number of zeroes (so ‘8z5’ is ‘0x800000’)
L1D is Level 1 Data cache, size(KBytes)= 32, Cores/cache= 2, Caches/package= 4
L1I is Level 1 Instruction cache, size(KBytes)= 32, Cores/cache= 2, Caches/package= 4
L2 is Level 2 Unified cache, size(KBytes)= 256, Cores/cache= 2, Caches/package= 4
L3 is Level 3 Unified cache, size(KBytes)= 8192, Cores/cache= 8, Caches/package= 1
      +-----------+-----------+-----------+-----------+
Cache |    L1D    |    L1D    |    L1D    |    L1D    |
Size  |    32K    |    32K    |    32K    |    32K    |
OScpu#|    0     8|    2    10|    4    12|    6    14|
Core  |c0_t0 c0_t1|c1_t0 c1_t1|c2_t0 c2_t1|c3_t0 c3_t1|
AffMsk|    1   100|    4   400|   10   1z3|   40   4z3|
CmbMsk|    101    |    404    |   1010    |   4040    |
      +-----------+-----------+-----------+-----------+
Cache |    L1I    |    L1I    |    L1I    |    L1I    |
Size  |    32K    |    32K    |    32K    |    32K    |
      +-----------+-----------+-----------+-----------+
Cache |    L2     |    L2     |    L2     |    L2     |
Size  |   256K    |   256K    |   256K    |   256K    |
      +-----------+-----------+-----------+-----------+
Cache |                      L3                       |
Size  |                      8M                       |
CmbMsk|                     5555                      |
      +-----------------------------------------------+
Combined socket AffinityMask= 0x5555
Package 1 Cache and Thread details
Box Description:
Cache is cache level designator
Size is cache size
OScpu# is cpu # as seen by OS
Core is core#[_thread# if > 1 thread/core] inside socket
AffMsk is AffinityMask(extended hex) for core and thread
CmbMsk is Combined AffinityMask(extended hex) for hw threads sharing cache
CmbMsk will differ from AffMsk if > 1 hw_thread/cache
Extended Hex replaces trailing zeroes with ‘z#’
where # is number of zeroes (so ‘8z5’ is ‘0x800000’)
      +-----------+-----------+-----------+-----------+
Cache |    L1D    |    L1D    |    L1D    |    L1D    |
Size  |    32K    |    32K    |    32K    |    32K    |
OScpu#|    1     9|    3    11|    5    13|    7    15|
Core  |c0_t0 c0_t1|c1_t0 c1_t1|c2_t0 c2_t1|c3_t0 c3_t1|
AffMsk|    2   200|    8   800|   20   2z3|   80   8z3|
CmbMsk|    202    |    808    |   2020    |   8080    |
      +-----------+-----------+-----------+-----------+
Cache |    L1I    |    L1I    |    L1I    |    L1I    |
Size  |    32K    |    32K    |    32K    |    32K    |
      +-----------+-----------+-----------+-----------+
Cache |    L2     |    L2     |    L2     |    L2     |
Size  |   256K    |   256K    |   256K    |   256K    |
      +-----------+-----------+-----------+-----------+
Cache |                      L3                       |
Size  |                      8M                       |
CmbMsk|                     aaaa                      |
      +-----------------------------------------------+
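The output shows our CPU's details clearly: the L1, L2, and L3 cache sizes, which logical CPUs share each cache, and so on. This is information we often need when writing programs.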
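The affinity masks in the output use the tool's "extended hex" notation, where 'z#' stands for # trailing zeroes (so '8z5' is 0x800000). A small sketch for turning such a mask into a list of OS CPU numbers follows. The helper names parse_extended_hex and cpus_in_mask are invented for this example and are not part of Intel's utility.

```python
def parse_extended_hex(text):
    """Convert extended-hex notation (e.g. '8z5') to an integer mask.

    Hypothetical helper: 'z#' is expanded to # trailing hex zeroes,
    following the "Box Description" printed by the tool.
    """
    digits, sep, zeros = text.lower().partition("z")
    if sep:
        digits += "0" * int(zeros)
    return int(digits, 16)


def cpus_in_mask(mask):
    """Return the OS CPU numbers whose bits are set in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    # Masks taken from the Package 0 output above.
    for label, text in [("P:0, C:2, T:1", "1z3"),
                        ("P:0, C:0", "101"),
                        ("P:0", "5555")]:
        print(label, "->", cpus_in_mask(parse_extended_hex(text)))
```

On Linux, a list produced this way could be passed to Python's os.sched_setaffinity(0, cpus) to pin the current process to, say, one package. Whether that helps depends on the workload.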
Have fun!
References:
2. http://www.kernel.org/doc/Documentation/ABI/testing/sysfs-devices-system-cpu
3. http://chemnitzer.linux-tage.de/2010/vortraege/shortpaper/470-slides.pdf
4. http://software.intel.com/sites/oss/pdfs/mclinux.pdf
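Nice~~
Very powerful.
How can I use a tool to view the topology of how the CPUs and memory are connected on a NUMA architecture? I'd like to know which memory and which CPU belong to the same node.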
Yu Feng Reply:
April 18th, 2013 at 5:27 pm
By "which memory", do you mean addresses, or something else?