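Who Is Switching Our Processes on Linux
Original article. If you repost it, please credit: reposted from 系统技术非业余研究
Permalink: Who Is Switching Our Processes on Linux
When running Linux servers we often need to know who is causing context switches and why they happen, because context switches are expensive. Here are some numbers measured with LMbench: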
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
my174.cm4 Linux 2.6.18- 6.1100 7.0200 6.1100 8.7400 7.7200 8.96000 9.62000
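Even on my very high-end server a context switch costs about 8us. For a high-performance server that is unacceptable, so we want to get as much work as possible done within each time slice rather than waste time on needless switches.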
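To put that figure in perspective, here is a rough back-of-the-envelope sketch. It assumes a flat 8us per switch (taken from the table above) and ignores indirect costs such as cache and TLB refills. The function name csw_overhead_pct and the two switch rates are made up for illustration.

```shell
#!/bin/sh
# Back-of-the-envelope sketch: what share of one CPU is spent purely on
# context switching? All numbers below are illustrative assumptions.

# csw_overhead_pct RATE COST_US
#   RATE    = context switches per second (hypothetical load)
#   COST_US = direct cost of one switch in microseconds
csw_overhead_pct() {
    awk -v n="$1" -v us="$2" 'BEGIN { printf "%.2f\n", n * us / 1e6 * 100 }'
}

csw_overhead_pct 1000 8     # 1,000 switches per second
csw_overhead_pct 50000 8    # 50,000 switches per second
```

With these made-up rates the script prints 0.80 and 40.00: under 1% of a CPU at 1,000 switches per second, but 40% at 50,000.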
Curiosity killed the cat, but let's find out who is switching our processes anyway:
[root@my174 admin]# dstat 1
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   0 100   0   0   0|   0     0 | 796B 1488B|   0     0 |1004   128
  0   0 100   0   0   0|   0     0 | 280B  728B|   0     0 |1005   114
  0   0 100   0   0   0|   0     0 | 280B  728B|   0     0 |1005   128
  0   0 100   0   0   0|   0     0 | 280B  728B|   0     0 |1005   114
  0   0 100   0   0   0|   0   320k| 280B  728B|   0     0 |1008   143
...
We can see that csw is around 120 per second, but tools like dstat and vmstat do not tell us who is responsible. Fine, we'll do it ourselves.
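For reference, the system-wide number itself is easy to reproduce by hand. The sketch below samples the cumulative ctxt counter in /proc/stat twice and prints the difference per second. The function names and the STAT_FILE and INTERVAL variables are my own illustration, not part of any tool, and like dstat this still says nothing about which process is responsible.

```shell
#!/bin/sh
# Sketch: derive context switches per second from the cumulative "ctxt"
# counter in /proc/stat (system-wide only, like the csw column above).

# read_ctxt FILE: print the cumulative context-switch count found in FILE.
read_ctxt() {
    awk '$1 == "ctxt" { print $2 }' "$1"
}

# csw_rate BEFORE AFTER SECONDS: print the average switches per second.
csw_rate() {
    echo $(( ($2 - $1) / $3 ))
}

STAT_FILE=${STAT_FILE:-/proc/stat}   # assumption: overridable, e.g. for testing
INTERVAL=${INTERVAL:-1}              # sampling interval in seconds

if [ -r "$STAT_FILE" ]; then
    before=$(read_ctxt "$STAT_FILE")
    sleep "$INTERVAL"
    after=$(read_ctxt "$STAT_FILE")
    echo "csw/s: $(csw_rate "$before" "$after" "$INTERVAL")"
fi
```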
Time to bring out our beloved systemtap!
[root@my174 admin]# cat >cswmon.stp
#! /usr/bin/env stap
#
#
# csw_count[prev, next]: number of switches from task prev to task next
global csw_count
# idle_count: running sum of the tapset's idle flag
global idle_count

# Fires every time a task is switched off a CPU
probe scheduler.cpu_off {
  csw_count[task_prev, task_next]++
  idle_count+=idle
}

# Format a switch as "name(pid)->name(pid)"
function fmt_task(task_prev, task_next)
{
  return sprintf("%s(%d)->%s(%d)",
    task_execname(task_prev),
    task_pid(task_prev),
    task_execname(task_next),
    task_pid(task_next))
}

# Print the 20 most frequent switches, then reset the counters
function print_cswtop () {
  printf ("%45s %10s\n", "Context switch", "COUNT")
  foreach ([task_prev, task_next] in csw_count- limit 20) {
    printf("%45s %10d\n", fmt_task(task_prev, task_next), csw_count[task_prev, task_next])
  }
  printf("%45s %10d\n", "idle", idle_count)

  delete csw_count
  delete idle_count
}

# $1 is the reporting interval in seconds, given on the command line
probe timer.s($1) {
  print_cswtop ()
  printf("--------------------------------------------------------------\n")
}
CTRL+D
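This script prints, at the interval you specify, the 20 most frequent context switches together with the process names and pids involved. Let's look at the result: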
[root@my174 admin]# stap cswmon.stp 5
                               Context switch      COUNT
                swapper(0)->systemtap/11(908)        500
                systemtap/11(908)->swapper(0)        498
                swapper(0)->fct1-worker(2492)         50
                fct1-worker(2492)->swapper(0)         50
                swapper(0)->fct0-worker(2191)         50
                fct0-worker(2191)->swapper(0)         50
                      swapper(0)->bond0(3432)         50
                      bond0(3432)->swapper(0)         50
                      stapio(879)->swapper(0)         26
                      swapper(0)->stapio(879)         25
                      stapio(879)->swapper(0)         19
                      swapper(0)->stapio(879)         17
                   swapper(0)->watchdog/9(31)          5
                   watchdog/9(31)->swapper(0)          5
                    swapper(0)->mysqld(18346)          5
                    mysqld(18346)->swapper(0)          5
                  swapper(0)->watchdog/13(43)          5
                  watchdog/13(43)->swapper(0)          5
                  swapper(0)->watchdog/14(46)          5
                  watchdog/14(46)->swapper(0)          5
                                         idle        859
--------------------------------------------------------------
...
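We can see which process was switched out for which, and how many times. On the last line I print the idle count, that is, how often the system had nothing to do and switched to the idle(0) process to rest.
With this kind of investigation we get a clear picture of where the system's overhead comes from, which makes problems much easier to pin down.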
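The systemtap script shows who switches to whom. For the "why" part, a complementary view is available without systemtap on kernels whose /proc/<pid>/status exposes the voluntary_ctxt_switches and nonvoluntary_ctxt_switches counters (newer kernels do; the 2.6.18 kernel used in this article may not). Voluntary switches mean the task blocked or yielded, nonvoluntary ones mean it was preempted. The following is only a sketch, the function name ctxt_of is made up, and the counters are cumulative since each process started rather than per interval.

```shell
#!/bin/sh
# Sketch: list the processes with the most context switches, split into
# voluntary (blocked/yielded) and nonvoluntary (preempted) switches.
# Assumption: the kernel exposes both counters in /proc/<pid>/status.

# ctxt_of FILE: print "total voluntary nonvoluntary name(pid)" for one
# /proc/<pid>/status-style file. Only the first word of Name: is kept.
ctxt_of() {
    awk '
        $1 == "Name:"                       { name = $2 }
        $1 == "Pid:"                        { pid = $2 }
        $1 == "voluntary_ctxt_switches:"    { v = $2 }
        $1 == "nonvoluntary_ctxt_switches:" { nv = $2 }
        END { if (pid != "") printf "%d %d %d %s(%d)\n", v + nv, v, nv, name, pid }
    ' "$1" 2>/dev/null
}

echo "total voluntary nonvoluntary process"
for f in /proc/[0-9]*/status; do
    if [ -r "$f" ]; then
        ctxt_of "$f"
    fi
done | sort -rn | head -n 20
```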
Have fun!
Post Footer automatically generated by wp-posturl plugin for wordpress.
Who Is Consuming Our Cache on Linux
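On Linux, access to files and devices is normally cached to speed things up; this is the system's default behavior. The cache costs memory, though. That memory can be released on demand with a command such as echo 3 > /proc/sys/vm/drop_caches, but sometimes we still need to understand who has consumed it.
Let's first look at the current memory usage: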
[root@my031045 ~]# free
             total       used       free     shared    buffers     cached
Mem:      24676836     626568   24050268          0      30884     508312
-/+ buffers/cache:      87372   24589464
Swap:      8385760
With the mighty systemtap, we can use a stap script to find out who is consuming our cache:
# This command line shows who is adding data to the page cache
[root@my031045 ~]# stap -e 'probe vfs.add_to_page_cache {printf("dev=%d, devname=%s, ino=%d, index=%d, nrpages=%d\n", dev, devname, ino, index, nrpages )}'
...
dev=2, devname=N/A, ino=0, index=2975, nrpages=1777
dev=2, devname=N/A, ino=0, index=3399, nrpages=2594
dev=2, devname=N/A, ino=0, index=3034, nrpages=1778
dev=2, devname=N/A, ino=0, index=3618, nrpages=2595
dev=2, devname=N/A, ino=0, index=1694, nrpages=106
dev=2, devname=N/A, ino=0, index=1703, nrpages=107
dev=2, devname=N/A, ino=0, index=1810, nrpages=210
dev=2, devname=N/A, ino=0, index=1812, nrpages=211
...
Now let's copy a large file:
[chuba@my031045 ~]$ cp huge_foo.file bar
# Now we can see the file's contents being rapidly added to the cache:
...
dev=8388614, devname=sda6, ino=2399271, index=39393, nrpages=39393
dev=8388614, devname=sda6, ino=2399271, index=39394, nrpages=39394
dev=8388614, devname=sda6, ino=2399271, index=39395, nrpages=39395
dev=8388614, devname=sda6, ino=2399271, index=39396, nrpages=39396
dev=8388614, devname=sda6, ino=2399271, index=39397, nrpages=39397
dev=8388614, devname=sda6, ino=2399271, index=39398, nrpages=39398
dev=8388614, devname=sda6, ino=2399271, index=39399, nrpages=39399
dev=8388614, devname=sda6, ino=2399271, index=39400, nrpages=39400
dev=8388614, devname=sda6, ino=2399271, index=39401, nrpages=39401
dev=8388614, devname=sda6, ino=2399271, index=39402, nrpages=39402
dev=8388614, devname=sda6, ino=2399271, index=39403, nrpages=39403
dev=8388614, devname=sda6, ino=2399271, index=39404, nrpages=39404
dev=8388614, devname=sda6, ino=2399271, index=39405, nrpages=39405
dev=8388614, devname=sda6, ino=2399271, index=39406, nrpages=39406
dev=8388614, devname=sda6, ino=2399271, index=39407, nrpages=39407
dev=8388614, devname=sda6, ino=2399271, index=39408, nrpages=39408
dev=8388614, devname=sda6, ino=2399271, index=39409, nrpages=39409
dev=8388614, devname=sda6, ino=2399271, index=39410, nrpages=39410
dev=8388614, devname=sda6, ino=2399271, index=39411, nrpages=39411
...
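Beyond that, suppose we want to know who is using the system's cache and how many pages each file has taken up.
There is a script that does this; many thanks to 子团 for letting me use his code.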
[chuba@my031045 ~]# stap -g viewcache.stp
In another shell:
[chuba@my031045 ~]# dmesg
...
inode: 116397109, num: 5
inode: 116397111, num: 2
inode: 116397112, num: 1
inode: 116397149, num: 2
inode: 116397152, num: 1
inode: 116397336, num: 2
inode: 116397343, num: 1
inode: 116397371, num: 4
inode: 116397372, num: 2
...
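This shows very clearly how many pages each inode occupies; convert the inodes with a tool and you know how much memory each file is using.
A few extra tips:
Converting an inode to a file name:
find / -inum your_inode
Converting a file name to an inode:
stat -c "%i" your_filename
or: ls -i your_filename
Applying this, we immediately find out which file is taking up a lot of cache: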
[chuba@my031045 ~]$ sudo find / -inum 2399248
/home/chuba/kernel-debuginfo-2.6.18-164.el5.x86_64.rpm
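The "convert with a tool" step can be sketched roughly as follows. It ranks lines of the form "inode: N, num: M" (as in the dmesg output above) by page count and converts pages to KiB. The function name rank_inodes and the PAGE_KB variable are my own, and the three sample lines are copied from the output above. Bear in mind that inode numbers are only unique within one file system and the dmesg lines carry no device, so find may report matches on more than one file system.

```shell
#!/bin/sh
# Sketch: rank "inode: N, num: M" lines by page count and show the size.

# Page size in KiB; getconf reports the system page size in bytes.
PAGE_KB=${PAGE_KB:-$(( $(getconf PAGESIZE) / 1024 ))}

# rank_inodes: read lines on stdin, print "<size> KiB inode <N>",
# largest first. Lines that do not match the pattern are ignored.
rank_inodes() {
    sed -n 's/.*inode: \([0-9][0-9]*\), num: \([0-9][0-9]*\).*/\2 \1/p' |
        sort -rn |
        awk -v kb="$PAGE_KB" '{ printf "%d KiB inode %s\n", $1 * kb, $2 }'
}

# Demo on three sample lines; on a live system one would run
#   dmesg | rank_inodes | head -n 10
printf '%s\n' \
    'inode: 116397109, num: 5' \
    'inode: 116397111, num: 2' \
    'inode: 116397371, num: 4' | rank_inodes
```

Each inode in the top of the list can then be fed to find / -inum as shown above.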
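Have fun.
References:
The difference between the page cache and the buffer cache:
This article gives the most reliable summary: http://blog.chinaunix.net/u/1595/showart.php?id=2209511
Postscript:
Linux has a system call that reports the state of pages: mincore - determine whether pages are resident in memory
Someone has also written a script, fincore, that makes this more convenient to use (the original post links to a fincore download).
Later 子团 told me about this tool as well: https://code.google.com/p/linux-ftools/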
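On current distributions there is one more option: util-linux (2.30 or later) ships its own fincore command, which is a different program from the fincore script mentioned above but answers the same question. A minimal sketch, assuming that command is installed; the wrapper name resident_report is made up for illustration.

```shell
#!/bin/sh
# Sketch: show how much of a file is resident in the page cache.
# Assumption: "fincore" here is the util-linux command (2.30 or later),
# not the fincore script referred to in the article.

# resident_report FILE: run fincore on FILE, or explain why we cannot.
resident_report() {
    if command -v fincore >/dev/null 2>&1; then
        fincore "$1"
    else
        echo "fincore is not installed; cannot inspect $1" >&2
        return 1
    fi
}

# Hypothetical usage: pass the file to inspect as the first argument.
if [ $# -gt 0 ]; then
    resident_report "$1"
fi
```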
Introduction to fio, an I/O Performance Testing Tool
Official site: http://freshmeat.net/projects/fio/
fio is an I/O tool meant to be used both for benchmark and stress/hardware verification. It has support for 13 different types of I/O engines (sync, mmap, libaio, posixaio, SG v3, splice, null, network, syslet, guasi, solarisaio, and more), I/O priorities (for newer Linux kernels), rate I/O, forked or threaded jobs, and much more. It can work on block devices as well as files. fio accepts job descriptions in a simple-to-understand text format. Several example job files are included. fio displays all sorts of I/O performance information. It supports Linux, FreeBSD, NetBSD, OS X, and OpenSolaris.
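On Ubuntu, apt-get install fio is all you need to install it.
The tool's biggest strengths are that it is simple to use and supports a great many kinds of file operations, covering just about every way of using files that we are likely to meet: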
sync: Basic read(2) or write(2) I/O. lseek(2) is used to position the I/O location.
psync: Basic pread(2) or pwrite(2) I/O.
vsync: Basic readv(2) or writev(2) I/O. Will emulate queuing by coalescing adjacent I/Os into a single submission.
libaio: Linux native asynchronous I/O.
posixaio: glibc POSIX asynchronous I/O using aio_read(3) and aio_write(3).
mmap: File is memory mapped with mmap(2) and data copied using memcpy(3).
splice: splice(2) is used to transfer the data and vmsplice(2) to transfer data from user-space to the kernel.
syslet-rw: Use the syslet system calls to make regular read/write asynchronous.
sg: SCSI generic sg v3 I/O.
net: Transfer over the network. filename must be set appropriately to `host/port' regardless of data direction. If receiving, only the port argument is used.
netsplice: Like net, but uses splice(2) and vmsplice(2) to map data and send/receive.
guasi: The GUASI I/O engine is the Generic Userspace Asynchronous Syscall Interface approach to asynchronous I/O.
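It also lets you control the I/O depth, which is very helpful when testing disk performance, and it explains its results very clearly.
Typical usage looks like this:
fio --filename=/dev/sdc1 --direct=1 --rw=randread --bs=4k --size=60G --numjobs=64 --runtime=10 --group_reporting --name=fileXXX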
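Since fio also accepts job descriptions in a text format, the same run can be kept in a job file instead of a long command line. The sketch below only writes the file and does not start fio. The file name randread.fio and the JOB_FILE variable are arbitrary choices of mine, and the option values are copied from the command above.

```shell
#!/bin/sh
# Sketch: the fio command line above written as a job file.
# The file name randread.fio is an arbitrary choice for illustration.
JOB_FILE=${JOB_FILE:-randread.fio}

cat > "$JOB_FILE" <<'EOF'
; one job section; its name plays the role of --name=fileXXX
[fileXXX]
filename=/dev/sdc1
direct=1
rw=randread
bs=4k
size=60G
numjobs=64
runtime=10
group_reporting
EOF

echo "wrote $JOB_FILE; run it with: fio $JOB_FILE"
```

A job file is easier to keep under version control and to extend, for example with ioengine and iodepth settings when experimenting with the I/O depth mentioned above.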
Have fun.
Introduction to nmon (a Very Handy Performance Monitoring Tool for Linux)
The nmon tool is designed for AIX and Linux performance specialists to use for monitoring and analyzing performance data, including:
* CPU utilization
* Memory use
* Kernel statistics and run queue information
* Disks I/O rates, transfers, and read/write ratios
* Free space on file systems
* Disk adapters
* Network I/O rates, transfers, and read/write ratios
* Paging space and paging rates
* CPU and AIX specification
* Top processors
* IBM HTTP Web cache
* User-defined disk groups
* Machine details and resources
* Asynchronous I/O — AIX only
* Workload Manager (WLM) — AIX only
* IBM TotalStorage® Enterprise Storage Server® (ESS) disks — AIX only
* Network File System (NFS)
* Dynamic LPAR (DLPAR) changes — only pSeries p5 and OpenPower for either AIX or Linux
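On Ubuntu, apt-get -y install nmon is all you need to install it. The best thing about this tool is that it has all the performance monitoring data you need day to day, presented graphically, so it is easy to read and rich in information.
For details see this article, which includes clear screenshots: http://www.ibm.com/developerworks/aix/library/au-analyze_aix/
Have fun, everyone.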
Investigating CPU Topology
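When writing multi-core programs (Erlang programs, for example), we need to understand the CPU topology: how logical CPUs map to physical CPUs, and the CPU's internal hardware parameters such as the L1 and L2 cache sizes.
On Linux, /proc/cpuinfo provides this information, but not very completely. /sys/devices/system/cpu/ also exposes the topology, but it is hard to read.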
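As a rough illustration of what the sysfs tree contains, the sketch below prints one line per logical CPU with its socket, core and sibling threads. It assumes the kernel provides the physical_package_id, core_id and thread_siblings_list files under each cpuN/topology directory (older kernels may only have the mask-style thread_siblings). The function name cpu_topology and the SYS_CPU variable are my own.

```shell
#!/bin/sh
# Sketch: map each logical CPU to its physical package (socket) and core
# by reading the sysfs topology files.
SYS_CPU=${SYS_CPU:-/sys/devices/system/cpu}

# cpu_topology DIR: print "cpuN socket=S core=C siblings=LIST" per CPU.
cpu_topology() {
    for d in "$1"/cpu[0-9]*; do
        # Skip entries without topology information (e.g. offline CPUs).
        [ -r "$d/topology/core_id" ] || continue
        printf '%s socket=%s core=%s siblings=%s\n' \
            "${d##*/}" \
            "$(cat "$d/topology/physical_package_id")" \
            "$(cat "$d/topology/core_id")" \
            "$(cat "$d/topology/thread_siblings_list")"
    done
}

cpu_topology "$SYS_CPU"
```

Two logical CPUs that show the same socket and core values are hardware threads of the same physical core.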
Often we need a more specialized tool, and Intel provides just that. See: http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/
Download it, compile it, and run it.
[admin@my174 cpu-topology]$ ./cpu_topology64.out
Using gcc mudflap to Detect Out-of-Bounds Memory Access
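References: http://www.redhat.com/magazine/015jan06/features/valgrind/
http://www.stlinux.com/devel/debug/mudflap
When we write large server programs in C, memory errors are unavoidable. Typical problems are memory leaks, out-of-bounds accesses, and random stray writes. On Linux, valgrind is an excellent tool that catches most of them, but for subtler out-of-bounds problems valgrind is sometimes helpless too. Take the following example.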
[admin@my174 ~]$ cat bug.c
int a[10];
int b[10];

int main(void) {
    return a[11];
}
[admin@my174 ~]$ gcc -g -o bug bug.c
[admin@my174 ~]$ valgrind ./bug
==5791== Memcheck, a memory error detector.
==5791== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==5791== Using LibVEX rev 1658, a library for dynamic binary translation.
==5791== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==5791== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==5791== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==5791== For more details, rerun with: -v
==5791==
==5791==
==5791== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 1)
==5791== malloc/free: in use at exit: 0 bytes in 0 blocks.
==5791== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==5791== For counts of detected errors, rerun with: -v
==5791== All heap blocks were freed -- no leaks are possible.
[admin@my174 ~]$
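valgrind reports that everything is fine. Memcheck does not bounds-check statically allocated arrays, so the read of a[11] looks like any other access to valid memory.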
[admin@my174 ~]$ gcc -o bug bug.c -g -fmudflap -lmudflap
[admin@my174 ~]$ ./bug
*******
mudflap violation 1 (check/read): time=1285386334.204054 ptr=0x700e00 size=48
pc=0x2b6c3013c4c1 location=`bug.c:5 (main)'
      /usr/lib64/libmudflap.so.0(__mf_check+0x41) [0x2b6c3013c4c1]
      ./bug(main+0x7a) [0x400952]
      /lib64/libc.so.6(__libc_start_main+0xf4) [0x39ea21d994]
Nearby object 1: checked region begins 0B into and ends 8B after
mudflap object 0x16599370: name=`bug.c:1 a'
bounds=[0x700e00,0x700e27] size=40 area=static check=3r/0w liveness=3
alloc time=1285386334.204025 pc=0x2b6c3013bfe1
number of nearby objects: 1
mudflap catches it without any trouble.
[admin@my174 ~]$ gcc -v
…
gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
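Of course, this example is very simple; a typical server is far more complex, and mudflap's runtime overhead is also very high. Still, it is worth trying when tracking down this kind of bug.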
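One note for readers on newer toolchains: mudflap was removed in GCC 4.9, and AddressSanitizer is the usual substitute there. The sketch below is not what the article used. It assumes a gcc or clang with -fsanitize=address support, and it adds -fno-common because uninitialized C globals such as a and b can otherwise be emitted as common symbols, which AddressSanitizer does not instrument. The temporary directory and the name bug_asan are illustrative.

```shell
#!/bin/sh
# Sketch: rebuild bug.c with AddressSanitizer instead of mudflap.
# Assumptions: $CC supports -fsanitize=address and its runtime is installed.
workdir=$(mktemp -d)

cat > "$workdir/bug.c" <<'EOF'
int a[10];
int b[10];

int main(void) {
    return a[11];
}
EOF

CC=${CC:-gcc}
if "$CC" -g -fsanitize=address -fno-common \
        -o "$workdir/bug_asan" "$workdir/bug.c" 2>/dev/null; then
    # An instrumented build is expected to stop with an error report
    # for the out-of-bounds read and exit with a non-zero status.
    status=0
    "$workdir/bug_asan" || status=$?
    echo "bug_asan exit status: $status"
else
    echo "$CC cannot build with -fsanitize=address on this machine" >&2
fi
```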
Have fun!
iozone, a File System Performance Testing Tool
IOzone official site: http://www.iozone.org/
IOzone is a filesystem benchmark tool. The benchmark generates and measures a variety of file operations. Iozone has been ported to many machines and runs under many operating systems.
Benchmark Features:
* ANSI C source
* POSIX async I/O
* Mmap() file I/O
* Normal file I/O
* Single stream measurement
* Multiple stream measurement
* Distributed fileserver measurements (Cluster)
* POSIX pthreads
* Multi-process measurement
* Excel importable output for graph generation
* Latency plots
* 64bit compatible source
* Large file compatible
* Stonewalling in throughput tests to eliminate straggler effects
* Processor cache size configurable
* Selectable measurements with fsync, O_SYNC
* Builds for: AIX, BSDI, HP-UX, IRIX, FreeBSD, Linux, OpenBSD, NetBSD, OSFV3, OSFV4, OSFV5, SCO OpenServer, Solaris, MAC OS X, Windows (95/98/Me/NT/2K/XP)
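Its focus is very clear: it is aimed squarely at file system performance testing. Unlike the commonly used I/O performance tools sysbench, fio, and iometer, it works mainly by simulating different file access patterns. Typical ones are listed below: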
(0=write/rewrite, 1=read/re-read, 2=random-read/write
3=Read-backwards, 4=Re-write-record, 5=stride-read, 6=fwrite/re-fwrite
7=fread/Re-fread, 8=random_mix, 9=pwrite/Re-pwrite, 10=pread/Re-pread
11=pwritev/Re-pwritev, 12=preadv/Re-preadv)
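In this way it isolates the differing costs of accessing the file system's metadata and its data, and so reflects the file system's performance.
On Ubuntu, apt-get -y install iozone is all you need to install it.
It has two modes: 1. throughput testing; 2. testing how the file system responds to different combinations of record size and file size.
Here are the parameters I have used for throughput mode: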
iozone -t -l 1 -u 16 -L 64 -S 8192 -b fio.xls -R -M -s 10G -r 32k -I -T -C -j 32 -+p 60
Parameter explanations:
-t -> Throughput test
-s 10G -> File size set to 10485760 KB
-M -> Machine = Linux my174.cm4 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009
-r 32k -> Record Size 32 KB
-I -> O_DIRECT feature enabled
-S 8192 -> Processor cache size set to 8192 Kbytes.
-L 64 -> Processor cache line size set to 64 bytes.
-j 32 -> File stride size set to 32 * record size.
-l 1 -> Min thread = 1
-u 16 -> Max thread = 16
-R -> Excel chart generation enabled
-b fio.xls -> Name of the binary-format Excel file to generate
-+p 60 -> Percent read in mix test is 60
Parameters for testing how the file system responds to different combinations of record size and file size:
TODO
Have fun.