How to trace libc.so with SystemTap
January 12th, 2012
Original article. If you repost it, please credit: reposted from 系统技术非业余研究
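This afternoon 周忱 and I were wrestling with a memory leak in a complex program. Tools such as valgrind and Google perftools did not work well and tended to kill the application, so we decided to use SystemTap to observe memory usage at the libc.so level. The main idea is to check whether the number of malloc/realloc calls balances the number of free calls.
First, set up the environment. The system is a stock RHEL 5u4: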
$ uname -r
2.6.18-164.el5
$ stap -V
SystemTap translator/driver (version 1.3/0.137 non-git sources)
Copyright (C) 2005-2010 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
enabled features: LIBRPM LIBSQLITE3 NSS BOOST_SHARED_PTR TR1_UNORDERED_MAP
$ stap -L 'kernel.function("printk")'
kernel.function("printk@kernel/printk.c:533") $fmt:char const* $args:va_list
$ stap -L 'process("/lib64/libc.so.6").function("malloc")'
Missing separate debuginfos, use: debuginfo-install glibc-2.5-42.x86_64
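The kernel symbols are fine, but no debug symbols are installed for glibc. The system suggests installing them with debuginfo-install glibc-2.5-42.x86_64, but on RHEL 5 that service is not available without a paid subscription, so we have to download the package and install it ourselves.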
$ wget -c ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/x86_64/Debuginfo/glibc-debuginfo-2.5-42.x86_64.rpm
$ sudo rpm -i glibc-debuginfo-2.5-42.x86_64.rpm
$ stap -L 'process("/lib64/libc.so.6").function("malloc")'
process("/lib64/libc-2.5.so").function("__libc_malloc@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3560") $bytes:size_t
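Now the glibc symbols are available, and we can conveniently trace how malloc in libc.so is used.
Next, we write a simple C program that calls malloc, along with a stap script that prints the call stack of the malloc call: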
$ cat t.c
#include <stdlib.h>
void fun() {
    malloc(1000);
}
int main(int argc, char *argv[]) {
    fun();
    return 0;
}
$ cat m.stp
probe process("/lib64/libc.so.6").function("malloc") {
    if (target() == pid()) {
        print_ubacktrace();
        exit();
    }
}
probe begin {
    println("~");
}
$ gcc -g t.c
$ stap -L 'process("./a.out").function("*")'
process("/home/chuba/a.out").function("fun@/home/chuba/t.c:3")
process("/home/chuba/a.out").function("main@/home/chuba/t.c:7") $argc:int $argv:char**
Now that the program is ready, let's run it and see where the leak is:
$ sudo stap m.stp -c ./a.out
~
0x33d5e74b96 : malloc+0x16/0x230 [libc-2.5.so]
0x4004a6 [a.out+0x4a6/0x1000]
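We can see that malloc was called from address 0x4004a6 in a.out, but which line of the program is that? addr2line finds it easily:
$ addr2line -e ./a.out 0x4004a6
/home/chuba/t.c:5
$ nl t.c
1 #include <stdlib.h>
2 void fun() {
3 malloc(1000);
4 }
5 int main(int argc, char *argv[]) {
6 fun();
7 return 0;
8 }
Note that the numbers printed by nl do not match the ones reported by stap -L and addr2line (fun at line 3, main at line 7). By default nl does not number blank lines, so t.c presumably contains blank lines that nl skipped; nl -ba numbers every line. Line 5 of the real file is then the closing brace of fun, directly after malloc(1000), which is expected because 0x4004a6 is the return address of the call.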
Haha,
have fun, everyone.
霸爷 rocks!
Using systemtap for this is a bit of overkill; ltrace -i achieves the same result.
On my Ubuntu machine, both
sudo stap -L 'process("/lib/x86_64-linux-gnu/libc.so.6").function("malloc")'
stap -L 'process("./a.out").function("*")'
print nothing.
On CentOS it fails right away with:
Checking "/lib/modules/2.6.32-220.el6.x86_64/build/.config" failed with error: No such file or directory
The symbols should all be installed. What is going on?
error.d Reply:
September 6th, 2012 at 11:20 am
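Check whether the kernel symbols you installed match the kernel version your system is actually running. I got an error message similar to yours for exactly that reason.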
Yu Feng Reply:
March 11th, 2012 at 3:25 pm
In principle this has nothing to do with the kernel symbols.
JiaLiang Reply:
March 11th, 2012 at 4:04 pm
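Do you use Ubuntu? Mine is fairly recent, 11.10.
sudo stap -L 'process("/lib/x86_64-linux-gnu/libc.so.6").function("malloc")' prints nothing.
By rights, the statement above has nothing at all to do with the kernel. Also, the hello world test works fine.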
Yu Feng Reply:
March 11th, 2012 at 4:06 pm
I always use RHEL or CentOS.
Why does demon exit immediately when I run ltrace -fc ./demon>/dev/null? It is a service!
Yu Feng Reply:
August 1st, 2012 at 3:38 pm
I suggest switching the operating system to CentOS 6.2; that saves a lot of trouble.
Why does it hang when I run it, without ever exiting? So frustrating...
[root@localhost stapscript]# stap -vvvvv m.stp -x 9236
Systemtap translator/driver (version 1.8/0.151 non-git sources)
Copyright (C) 2005-2012 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
enabled features: LIBSQLITE3 NSS BOOST_SHARED_PTR TR1_UNORDERED_MAP NLS
Created temporary directory "/tmp/staptjdpXn"
Session arch: i386 release: 2.6.18-164.el5
Parsed kernel "/lib/modules/2.6.18-164.el5/build/.config", containing 1943 tuples
Parsed kernel /lib/modules/2.6.18-164.el5/build/Module.symvers, which contained 3315 vmlinux exports
Searched: " /usr/local/share/systemtap/tapset/i386/*.stp ", found: 4, processed: 4
Searched: " /usr/local/share/systemtap/tapset/*.stp ", found: 81, processed: 81
Pass 1: parsed user script and 85 library script(s) using 21980virt/13832res/2268shr/12100data kb, in 40usr/390sys/438real ms.
Extracting build ID.
dwarf_builder::build for /lib/libc-2.5.so
parse 'malloc', func 'malloc'
pattern '/lib/libc-2.5.so' matches module '/lib/libc-2.5.so'
focused on module '/lib/libc-2.5.so' = [0xb00000-0xc455c4, bias 0 file /usr/lib/debug/lib/libc-2.5.so.debug ELF machine i?86|x86_64 (code 3)
focused on module '/lib/libc-2.5.so'
module function cache /lib/libc-2.5.so size 3247
module function cache /lib/libc-2.5.so hit malloc
selected function __libc_malloc
function cache /lib/libc-2.5.so:malloc.c size 68
function cache /lib/libc-2.5.so:malloc.c hit malloc
selected function __libc_malloc
searching for prologue of function '__libc_malloc' 0x6bd20-0x6bee3@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3560
checking line record 0x6bd20@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3560
checking line record 0x6bd3a@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3565
prologue found function '__libc_malloc' = 0x6bd3a
probe __libc_malloc@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3560 process=/lib/libc-2.5.so reloc=.dynamic pc=0x6bd3a
dwarf_builder::build for /lib/libc-2.5.so
parse 'free', func 'free'
pattern '/lib/libc-2.5.so' matches module '/lib/libc-2.5.so'
focused on module '/lib/libc-2.5.so' = [0xb00000-0xc455c4, bias 0xb00000 file /usr/lib/debug/lib/libc-2.5.so.debug ELF machine i?86|x86_64 (code 3)
focused on module '/lib/libc-2.5.so'
module function cache /lib/libc-2.5.so hit free
selected function __libc_free
function cache /lib/libc-2.5.so:malloc.c hit free
selected function __libc_free
searching for prologue of function '__libc_free' 0x69980-0x69b31@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3636
checking line record 0x69980@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3636
checking line record 0x6999d@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3641
prologue found function '__libc_free' = 0x6999d
probe __libc_free@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3636 process=/lib/libc-2.5.so reloc=.dynamic pc=0x6999d
deleting module_cache
Eliding side-effect-free singleton block operator '{' at m.stp:2:94
Eliding side-effect-free singleton block operator '{' at m.stp:2:94
Turning on symbol data collecting, pragma:symbols found in probefunc
Pass 2: analyzed script: 4 probe(s), 1 function(s), 0 embed(s), 2 global(s) using 32172virt/21500res/6764shr/15288data kb, in 30usr/110sys/175real ms.
Pass 3: using cached /root/.systemtap/cache/7a/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.c
Pass 4: using cached /root/.systemtap/cache/7a/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.ko
Pass 5: starting run.
Running /usr/local/bin/staprun -v -v -t 9236 -u/tmp/staptjdpXn/uprobes/uprobes.ko -R /tmp/staptjdpXn/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.ko
staprun:main:387 modpath="/tmp/staptjdpXn/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.ko", modname="stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981"
staprun:init_staprun:305 init_staprun
staprun:enable_uprobes:167 Inserting uprobes module from /tmp/staptjdpXn/uprobes/uprobes.ko.
staprun:insert_module:73 inserting module /tmp/staptjdpXn/uprobes/uprobes.ko
staprun:insert_module:99 module options:
staprun:insert_module:107 module path canonicalized to '/tmp/staptjdpXn/uprobes/uprobes.ko'
staprun:check_signature:441 checking signature for /tmp/staptjdpXn/uprobes/uprobes.ko
Signature file /tmp/staptjdpXn/uprobes/uprobes.ko.sgn not found
staprun:check_signature:454 verify_module returns 0
staprun:insert_module:73 inserting module /tmp/staptjdpXn/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.ko
staprun:insert_module:99 module options: _stp_bufsize=0
staprun:insert_module:107 module path canonicalized to '/tmp/staptjdpXn/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.ko'
staprun:check_signature:441 checking signature for /tmp/staptjdpXn/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.ko
Signature file /tmp/staptjdpXn/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.ko.sgn not found
Spawn waitpid result (0x100): 1
WARNING: /usr/local/bin/staprun exited with status: 1
Pass 5: run completed in 0usr/0sys/14002real ms.
Pass 5: run failed. Try again with another '--vp 00001' option.
Running rm -rf /tmp/staptjdpXn
Spawn waitpid result (0x0): 0
轩脉刃 Reply:
August 27th, 2013 at 3:16 pm
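I ran into this problem too.
yufeng, I get the impression that you are a big fan of systemtap, but have you considered that using it can have a large impact on system performance?
For example, today I was tracking down why bandwidth and CPU usage would not go up in a program that uses epoll. Without stap, the gigabit NIC on the internal network (CentOS 5.3) could push 60M of bandwidth; as soon as I ran stap (a script that counts system API calls), bandwidth dropped by 18M. How do you deal with this situation? Thanks!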
Yu Feng Reply:
December 20th, 2012 at 11:40 am
systemtap does have a performance overhead, around 10%. For most systems a 10% drop is acceptable, and systemtap is very flexible. I suggest not attaching stap probes on the hot path.
On CentOS 6.4, the output is:
stap -d /usr/libexec/systemtap/stapio m.stp -c ./a.out
~
0x7f7fcf1558a0 : malloc+0x0/0x210 [/lib64/libc-2.12.so]
0x7f7fcf1b4eec : wordexp+0x178c/0x1840 [/lib64/libc-2.12.so]
0x7f7fcf8b127e : main+0x1a2e/0x5fb0 [/usr/libexec/systemtap/stapio]
0x7f7fcf8b1652 : main+0x1e02/0x5fb0 [/usr/libexec/systemtap/stapio]
0x7f7fcf8af916 : main+0xc6/0x5fb0 [/usr/libexec/systemtap/stapio]
0x7f7fcf0f9cdd : __libc_start_main+0xfd/0x1d0 [/lib64/libc-2.12.so]
0x7f7fcf8af779 [/usr/libexec/systemtap/stapio+0x2779/0xc000]
How should I read this? Why is it different from the output above?
Yu Feng Reply:
August 27th, 2013 at 4:06 pm
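Different stap versions are implemented differently, so there are probably some minor differences.
If stap cannot show the libc version, you can use rpm -qa |grep libc to check the major and minor version.
I hear DTrace is also good and can be used in production. There seems to be a Linux version as well (on github.com), and Oracle Linux already supports it. I do not know whether the DTrace that ships with Linux is usable.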
gouihk Reply:
October 8th, 2014 at 10:50 pm
https://github.com/dtrace4linux/linux
Yu Feng Reply:
October 9th, 2014 at 9:08 am
DTrace on Linux is not usable yet.
gouihk Reply:
October 9th, 2014 at 4:06 pm
thanks.