Tool Introductions | 系统技术非业余研究

An Unconventional Use of SystemTap

November 10th, 2010 Yu Feng 17 comments

Original article. When reposting, please credit: reposted from 系统技术非业余研究

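When doing kernel programming, we often use the kernel's own data structures and libraries; for example, textsearch provides several algorithms for string searching. Before using one in a real project, we want to study how it is used and try it out. The usual approach is to write a module and a Makefile, compile, load the module, and then read the printk output in dmesg. Nothing is wrong with this process except that it is tedious. Now we have a more convenient way: systemtap.

A systemtap script is first translated into C code for a kernel module, then compiled and inserted into the kernel to run. Systemtap also provides a basic communication channel between the kernel module and the user-space side, where the information is collected. In addition, it supports guru mode, which lets the user insert C code directly. We can use this feature of stap to run our experiments.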

Categories: Erlang Exploration, Linux, Tool Introductions Tags: compile, module, systemtap, header files, compilation

How to Compile and Run Erlang on the TILEPro64 Multicore Board

November 2nd, 2010 Yu Feng 21 comments


References:
1. https://groups.google.com/group/erlang-programming/msg/2d61b1083a10a7b6

2. http://erlang.2086793.n4.nabble.com/How-to-Cross-compile-Erlang-OTP-R13B04-for-TileraPro64-td2119304.html

Tilera, a US company, makes many-core processors; a single chip contains 64 cores. Hardware architecture diagram:

The card looks like this:

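Erlang already runs successfully on this CPU; see the performance chart for Erlang on Tilera in Ulf Wiger's document "Multicore ☺ Message-passing Concurrency".

The Erlang system began officially supporting Tilera two years ago and has used this CPU to tune its scheduler ever since, so both performance and basic build-and-run support are in good shape.

The Linux kernel has supported the Tilera CPU architecture since 2.6.36, so its prospects look good.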

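Recently 上海泛腾电子科技 began selling Tilera machines in China, and my company received a sample unit, which gave me the chance to play with this piece of high technology!

The test unit is a single-board computer in PCI-e form factor that installs directly into a PC or server. The advantage is that it can be debugged directly through a monitor connected to the host's VGA port. It can of course also be used as an intelligent network card, forming a heterogeneous system with the host and communicating over the PCI-e bus.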

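A matching SDK is also required to handle communication with the board. Two versions are currently available: TileraMDE-2.1.2.112814 and TileraMDE-3.0.alpha3.116173. The 2.x one is recommended, as it seems less likely to cause problems.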

Enough talk; let's start enjoying the 64-core journey!

Categories: Erlang Exploration, Linux, Tool Introductions Tags: 64, compile, Erlang Exploration, install, kernel, otp, Tilera, concurrency, parallelism

Who Is Consuming Our Cache on Linux

September 25th, 2010 Yu Feng 20 comments


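On Linux, file and device accesses are normally cached to speed up later access; this is the system's default behavior. The cache costs memory, although that memory can be released on demand with a command such as echo 3 > /proc/sys/vm/drop_caches. Still, sometimes we need to understand who is consuming our memory.

First, let's look at the current memory usage: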

[root@my031045 ~]# free
             total       used       free     shared    buffers     cached
Mem:      24676836     626568   24050268          0      30884     508312
-/+ buffers/cache:      87372   24589464
Swap:      8385760
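To make the "-/+ buffers/cache" row concrete, here is a minimal Python sketch of the arithmetic behind it. The SAMPLE text is a hypothetical /proc/meminfo excerpt built from the numbers in the free output above, and the function names are my own, not part of any tool.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def cache_adjusted(info):
    """Return (used, free) with buffers and page cache counted as free."""
    used = info["MemTotal"] - info["MemFree"]
    cache = info["Buffers"] + info["Cached"]
    return used - cache, info["MemFree"] + cache


# Hypothetical excerpt; the numbers are copied from the free output above.
SAMPLE = """MemTotal: 24676836 kB
MemFree: 24050268 kB
Buffers: 30884 kB
Cached: 508312 kB
"""

if __name__ == "__main__":
    # On a live system the text could come from open("/proc/meminfo").read().
    used, free = cache_adjusted(parse_meminfo(SAMPLE))
    print("-/+ buffers/cache: %d %d" % (used, free))
```

With the sample numbers this gives 87372 used and 24589464 free, which matches the second row of the free output.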

With the great systemtap, we can use a stap script to find out who is consuming our cache:

# This command line shows who is adding data to the page cache
[root@my031045 ~]# stap -e 'probe vfs.add_to_page_cache {printf("dev=%d, devname=%s, ino=%d, index=%d, nrpages=%d\n", dev, devname, ino, index, nrpages )}'
...
dev=2, devname=N/A, ino=0, index=2975, nrpages=1777
dev=2, devname=N/A, ino=0, index=3399, nrpages=2594
dev=2, devname=N/A, ino=0, index=3034, nrpages=1778
dev=2, devname=N/A, ino=0, index=3618, nrpages=2595
dev=2, devname=N/A, ino=0, index=1694, nrpages=106
dev=2, devname=N/A, ino=0, index=1703, nrpages=107
dev=2, devname=N/A, ino=0, index=1810, nrpages=210
dev=2, devname=N/A, ino=0, index=1812, nrpages=211
...

Now let's copy a large file:

[chuba@my031045 ~]$ cp huge_foo.file  bar

# Now we can see the file's contents being rapidly added to the cache:
...
dev=8388614, devname=sda6, ino=2399271, index=39393, nrpages=39393
dev=8388614, devname=sda6, ino=2399271, index=39394, nrpages=39394
dev=8388614, devname=sda6, ino=2399271, index=39395, nrpages=39395
dev=8388614, devname=sda6, ino=2399271, index=39396, nrpages=39396
dev=8388614, devname=sda6, ino=2399271, index=39397, nrpages=39397
dev=8388614, devname=sda6, ino=2399271, index=39398, nrpages=39398
dev=8388614, devname=sda6, ino=2399271, index=39399, nrpages=39399
dev=8388614, devname=sda6, ino=2399271, index=39400, nrpages=39400
dev=8388614, devname=sda6, ino=2399271, index=39401, nrpages=39401
dev=8388614, devname=sda6, ino=2399271, index=39402, nrpages=39402
dev=8388614, devname=sda6, ino=2399271, index=39403, nrpages=39403
dev=8388614, devname=sda6, ino=2399271, index=39404, nrpages=39404
dev=8388614, devname=sda6, ino=2399271, index=39405, nrpages=39405
dev=8388614, devname=sda6, ino=2399271, index=39406, nrpages=39406
dev=8388614, devname=sda6, ino=2399271, index=39407, nrpages=39407
dev=8388614, devname=sda6, ino=2399271, index=39408, nrpages=39408
dev=8388614, devname=sda6, ino=2399271, index=39409, nrpages=39409
dev=8388614, devname=sda6, ino=2399271, index=39410, nrpages=39410
dev=8388614, devname=sda6, ino=2399271, index=39411, nrpages=39411
...
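The raw probe output gets long quickly, so it helps to summarize it. Below is a rough Python sketch that counts probe hits per (devname, ino) pair. It assumes the exact printf format used in the stap one-liner above and treats each hit as one page being added; the function name and the sample lines are only for illustration.

```python
import re
from collections import Counter

# Matches the printf format used in the stap one-liner above.
LINE_RE = re.compile(
    r"dev=(\d+), devname=([^,\s]+), ino=(\d+), index=(\d+), nrpages=(\d+)"
)


def pages_added_per_inode(lines):
    """Count probe hits per (devname, ino); non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group(2), int(match.group(3)))] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the output above; real use would read the
    # stap output from a file or a pipe instead.
    sample = [
        "dev=2, devname=N/A, ino=0, index=2975, nrpages=1777",
        "dev=8388614, devname=sda6, ino=2399271, index=39393, nrpages=39393",
        "dev=8388614, devname=sda6, ino=2399271, index=39394, nrpages=39394",
    ]
    for (devname, ino), hits in pages_added_per_inode(sample).most_common():
        print("%s ino=%d: %d pages added" % (devname, ino, hits))
```

Sorting by hit count puts the busiest inode first, which is usually the one worth mapping back to a file name.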

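In addition, suppose we want to know who is using the system's cache and how many pages each file occupies.
We have a script that can do this; many thanks to 子团 for letting me use his code.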

[chuba@my031045 ~]# stap -g viewcache.stp

In another shell:
[chuba@my031045 ~]# dmesg
...
inode: 116397109, num: 5
inode: 116397111, num: 2
inode: 116397112, num: 1
inode: 116397149, num: 2
inode: 116397152, num: 1
inode: 116397336, num: 2
inode: 116397343, num: 1
inode: 116397371, num: 4
inode: 116397372, num: 2
...

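This shows very clearly how many pages each inode occupies; mapping the inode to a file name tells us which file is consuming how much memory.

Click to download viewcache.stp

A few small tips:

Converting an inode to a file name:
find / -inum your_inode

Converting a file name to an inode:
stat -c "%i" your_filename
or ls -i your_filename

Applying this, we immediately find out which file is taking up a lot of cache.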

[chuba@my031045 ~]$ sudo find / -inum 2399248
/home/chuba/kernel-debuginfo-2.6.18-164.el5.x86_64.rpm
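The inode-to-name lookup can also be scripted. The following is a minimal Python sketch of what find -inum does, assuming a plain directory walk is acceptable; find_by_inode is a name I made up. Note that inode numbers are only unique within one filesystem, so a walk that crosses mount points may return unrelated files.

```python
import os
import tempfile


def find_by_inode(root, inode):
    """Return paths under root whose inode number equals inode."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are not followed, as find does by default
                if os.lstat(path).st_ino == inode:
                    matches.append(path)
            except OSError:
                continue  # entry vanished or is not accessible
    return matches


if __name__ == "__main__":
    # Self-contained demo: create a file, read its inode, then find it again.
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "huge_foo.file")
        open(target, "w").close()
        inode = os.stat(target).st_ino
        print(find_by_inode(root, inode))
```

For a real lookup, call find_by_inode with the mount point of the device reported by stap and the inode number from the dmesg output.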

Have fun.

References:
The difference between the page cache and the buffer cache:
This article gives the most reliable summary: http://blog.chinaunix.net/u/1595/showart.php?id=2209511

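Postscript:
Linux has a system call that reports the state of pages: mincore - determine whether pages are resident in memory
Someone has also written a script, fincore, that makes it more convenient to use. Click to download fincore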
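To give a feel for how a fincore-style tool can use mincore, here is a rough Python sketch that calls the system call through ctypes. This is my own illustration, not the fincore script itself; it assumes Linux, and resident_pages is a hypothetical name. Some newer kernels limit what mincore reveals for files the caller does not own or cannot write, so treat the numbers as approximate.

```python
import ctypes
import mmap
import os
import tempfile


def resident_pages(path):
    """Return (resident, total) page counts for path, using mincore(2)."""
    size = os.path.getsize(path)
    if size == 0:
        return 0, 0
    total = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                             ctypes.POINTER(ctypes.c_ubyte)]
    libc.mincore.restype = ctypes.c_int
    vec = (ctypes.c_ubyte * total)()  # one status byte per page
    with open(path, "rb") as f:
        # ACCESS_COPY gives a private, writable mapping, which ctypes
        # needs in order to take the mapping's address.
        mm = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_COPY)
    try:
        buf = (ctypes.c_char * size).from_buffer(mm)
        try:
            if libc.mincore(ctypes.addressof(buf), size, vec) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
        finally:
            del buf  # release the exported buffer before closing the map
    finally:
        mm.close()
    # The lowest bit of each byte says whether that page is resident.
    return sum(b & 1 for b in vec), total


if __name__ == "__main__":
    # Demo on a throwaway three-page file; any readable path works.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * (3 * mmap.PAGESIZE))
    try:
        resident, total = resident_pages(tmp.name)
        print("%s: %d of %d pages resident" % (tmp.name, resident, total))
    finally:
        os.unlink(tmp.name)
```

Unlike the stap approach, this needs no kernel module, but it only answers the question for files you already suspect.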

Later 子团 told me about this tool as well: https://code.google.com/p/linux-ftools/


Categories: Tool Introductions Tags: buffer, cache, fincore, free, mincore, page, stap, systemtap, vfs

Introduction to Fio, an I/O Performance Testing Tool

September 25th, 2010 Yu Feng 26 comments


Official site: http://freshmeat.net/projects/fio/

fio is an I/O tool meant to be used both for benchmark and stress/hardware verification. It has support for 13 different types of I/O engines (sync, mmap, libaio, posixaio, SG v3, splice, null, network, syslet, guasi, solarisaio, and more), I/O priorities (for newer Linux kernels), rate I/O, forked or threaded jobs, and much more. It can work on block devices as well as files. fio accepts job descriptions in a simple-to-understand text format. Several example job files are included. fio displays all sorts of I/O performance information. It supports Linux, FreeBSD, NetBSD, OS X, and OpenSolaris.

On Ubuntu, it can be installed with apt-get install fio.

This tool's biggest strengths are that it is simple to use and supports a great many kinds of file I/O, covering the access patterns we are likely to meet:
sync: Basic read(2) or write(2) I/O. lseek(2) is used to position the I/O location.
psync: Basic pread(2) or pwrite(2) I/O.
vsync: Basic readv(2) or writev(2) I/O. Will emulate queuing by coalescing adjacent I/Os into a single submission.
libaio: Linux native asynchronous I/O.
posixaio: glibc POSIX asynchronous I/O using aio_read(3) and aio_write(3).
mmap: File is memory mapped with mmap(2) and data copied using memcpy(3).
splice: splice(2) is used to transfer the data and vmsplice(2) to transfer data from user-space to the kernel.
syslet-rw: Use the syslet system calls to make regular read/write asynchronous.
sg: SCSI generic sg v3 I/O.
net: Transfer over the network. filename must be set appropriately to `host/port' regardless of data direction. If receiving, only the port argument is used.
netsplice: Like net, but uses splice(2) and vmsplice(2) to map data and send/receive.
guasi: The GUASI I/O engine is the Generic Userspace Asynchronous Syscall Interface approach to asynchronous I/O.

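It can also control the I/O depth, which is very helpful when testing disk performance, and its reporting of results is clear and easy to interpret.

A typical invocation looks like this:
fio --filename=/dev/sdc1 --direct=1 --rw=randread --bs=4k --size=60G --numjobs=64 --runtime=10 --group_reporting --name=fileXXX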
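Since fio also accepts job descriptions in a text file, the same run can be written as a job file. Here is a small Python sketch that renders one; it assumes each command-line option maps to a key of the same name in an INI-style section named after the job. The helper name and the idea of generating the file from a dict are mine, not part of fio.

```python
def fio_job_file(name, options):
    """Render one fio job section; a value of None means a bare flag."""
    lines = ["[%s]" % name]
    for key, value in options.items():
        lines.append(key if value is None else "%s=%s" % (key, value))
    return "\n".join(lines) + "\n"


# The same options as the command line above.
OPTIONS = {
    "filename": "/dev/sdc1",
    "direct": 1,
    "rw": "randread",
    "bs": "4k",
    "size": "60G",
    "numjobs": 64,
    "runtime": 10,
    "group_reporting": None,
}

if __name__ == "__main__":
    print(fio_job_file("fileXXX", OPTIONS), end="")
```

Saving the printed text to a file and passing that file name to fio should describe the same job as the command line, and a job file is easier to keep under version control than a long one-liner.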

Have fun.


Categories: Tool Introductions Tags: fio, performance testing

Introduction to nmon (a Very Handy Performance Monitoring Tool for Linux)

September 25th, 2010 Yu Feng Comments off


The nmon tool is designed for AIX and Linux performance specialists to use for monitoring and analyzing performance data, including:
* CPU utilization
* Memory use
* Kernel statistics and run queue information
* Disks I/O rates, transfers, and read/write ratios
* Free space on file systems
* Disk adapters
* Network I/O rates, transfers, and read/write ratios
* Paging space and paging rates
* CPU and AIX specification
* Top processors
* IBM HTTP Web cache
* User-defined disk groups
* Machine details and resources
* Asynchronous I/O — AIX only
* Workload Manager (WLM) — AIX only
* IBM TotalStorage® Enterprise Storage Server® (ESS) disks — AIX only
* Network File System (NFS)
* Dynamic LPAR (DLPAR) changes — only pSeries p5 and OpenPower for either AIX or Linux

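On Ubuntu, it can be installed with apt-get -y install nmon. The best thing about this tool is that it covers all the performance data needed day to day and presents it graphically, so it is easy to read and rich in information.

For details, see this article, which includes clear screenshots: http://www.ibm.com/developerworks/aix/library/au-analyze_aix/

Have fun, everyone.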


Categories: Tool Introductions Tags: linux, nmon

Investigating CPU Topology

September 25th, 2010 Yu Feng 4 comments


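When writing multicore programs (Erlang programs, for example), we need to understand the CPU topology: how logical CPUs map to physical CPUs, as well as the CPU's internal hardware parameters, such as the sizes of the L1 and L2 caches.

On Linux, /proc/cpuinfo provides this information, but not very completely. /sys/devices/system/cpu/ also exposes the topology, but it is rather hard to read.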
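The sysfs tree becomes easier to read once it is grouped by core. Below is a minimal Python sketch that does so, assuming each cpuN directory contains topology/physical_package_id and topology/core_id, as on common x86 Linux systems. read_topology is a hypothetical helper, and CPUs whose files are missing are simply skipped.

```python
import glob
import os
import re
from collections import defaultdict


def read_topology(sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Map (package id, core id) to the sorted logical CPU numbers on it."""
    cores = defaultdict(list)
    for cpu_dir in glob.glob(os.path.join(sysfs_cpu_dir, "cpu[0-9]*")):
        match = re.fullmatch(r"cpu(\d+)", os.path.basename(cpu_dir))
        if not match:
            continue
        topo = os.path.join(cpu_dir, "topology")
        try:
            with open(os.path.join(topo, "physical_package_id")) as f:
                package = int(f.read())
            with open(os.path.join(topo, "core_id")) as f:
                core = int(f.read())
        except (OSError, ValueError):
            continue  # topology files missing or unreadable for this CPU
        cores[(package, core)].append(int(match.group(1)))
    return {key: sorted(cpus) for key, cpus in cores.items()}


if __name__ == "__main__":
    for (package, core), cpus in sorted(read_topology().items()):
        print("package %d core %d: logical CPUs %s" % (package, core, cpus))
```

Two logical CPUs listed under the same (package, core) pair are hyper-threading siblings, which is exactly the mapping that is tedious to work out by hand.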

Often we need a more specialized tool, and Intel provides one. See: http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/

Download it, compile it, and run it.

[admin@my174 cpu-topology]$ ./cpu_topology64.out

Categories: Tool Introductions Tags: cpu, topology

Using gcc mudflap to Detect Out-of-Bounds Memory Access

September 25th, 2010 Yu Feng 5 comments


References: http://www.redhat.com/magazine/015jan06/features/valgrind/
http://www.stlinux.com/devel/debug/mudflap

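When we write large server programs in C, memory errors are unavoidable. Typical problems are memory leaks, out-of-bounds accesses, and stray writes. On Linux, valgrind is a very good tool that catches most of them. But for subtler out-of-bounds problems, valgrind is sometimes powerless too, as in the example below.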

[admin@my174 ~]$ cat bug.c

int a[10];
int b[10];
int main(void) {
   return a[11];
}

[admin@my174 ~]$ gcc -g -o bug bug.c
[admin@my174 ~]$ valgrind ./bug
==5791== Memcheck, a memory error detector.
==5791== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==5791== Using LibVEX rev 1658, a library for dynamic binary translation.
==5791== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==5791== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==5791== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==5791== For more details, rerun with: -v
==5791== 
==5791== 
==5791== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 1)
==5791== malloc/free: in use at exit: 0 bytes in 0 blocks.
==5791== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==5791== For counts of detected errors, rerun with: -v
==5791== All heap blocks were freed -- no leaks are possible.
[admin@my174 ~]$

valgrind reports that everything is fine, because Memcheck does not bounds-check global or stack arrays.

[admin@my174 ~]$ gcc -o bug bug.c -g -fmudflap -lmudflap
[admin@my174 ~]$ ./bug
*******
mudflap violation 1 (check/read): time=1285386334.204054 ptr=0x700e00 size=48
pc=0x2b6c3013c4c1 location=`bug.c:5 (main)'
      /usr/lib64/libmudflap.so.0(__mf_check+0x41) [0x2b6c3013c4c1]
      ./bug(main+0x7a) [0x400952]
      /lib64/libc.so.6(__libc_start_main+0xf4) [0x39ea21d994]
Nearby object 1: checked region begins 0B into and ends 8B after
mudflap object 0x16599370: name=`bug.c:1 a'
bounds=[0x700e00,0x700e27] size=40 area=static check=3r/0w liveness=3
alloc time=1285386334.204025 pc=0x2b6c3013bfe1
number of nearby objects: 1

mudflap catches it without any difficulty.

[admin@my174 ~]$ gcc -v
…
gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)

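Of course, this example is very simple; a typical server is far more complex, and mudflap's runtime overhead is also very high. Still, it is worth trying when tracking down this kind of bug.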
Have fun!


Categories: Tool Introductions Tags: gcc, mudflap, valgrind
