How to Trace libc.so with SystemTap

Original article. If you repost it, please credit the source: 系统技术非业余研究


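This afternoon Zhou Chen (周忱) and I wrestled with a memory leak in a complex program. Tools such as valgrind and Google perftools did not work well for us and tended to kill the application, so we decided to use SystemTap to observe memory usage at the libc.so level. The main idea is to check whether the number of malloc/realloc calls balances the number of free calls.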
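
The counting idea can be sketched as a small SystemTap script, generated here by a shell snippet so that the libc path is easy to change. This is only a rough illustration, not the script we actually ran: the file name count_alloc.stp and the counter names are made up, and /lib64/libc.so.6 is assumed to be the libc the target program loads. The counts are a coarse signal at best, since free(NULL) is counted too, realloc can both allocate and release, and calloc and friends are not probed at all.

```shell
#!/bin/sh
# Sketch: generate a SystemTap script that counts allocator calls in one process.
# LIBC is an assumption; point it at the libc your program actually loads.
LIBC=${LIBC:-/lib64/libc.so.6}
# Hypothetical output file name, placed in a temporary directory by default.
SCRIPT=${SCRIPT:-${TMPDIR:-/tmp}/count_alloc.stp}

cat > "$SCRIPT" <<EOF
global n_malloc, n_realloc, n_free

# Count only calls made by the process started with -c (or given with -x)
probe process("$LIBC").function("malloc") {
  if (pid() == target()) n_malloc++
}
probe process("$LIBC").function("realloc") {
  if (pid() == target()) n_realloc++
}
probe process("$LIBC").function("free") {
  if (pid() == target()) n_free++
}
# Runs once when the session ends, e.g. when the -c command exits
probe end {
  printf("malloc=%d realloc=%d free=%d\n", n_malloc, n_realloc, n_free)
}
EOF

echo "wrote $SCRIPT; run it with: sudo stap $SCRIPT -c ./your_program"
```

If malloc keeps growing while free stays flat over a long run, that is a hint worth following up with a backtrace on the allocation path.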

First, set up the environment. The system is a stock RHEL 5u4:

$ uname -r
2.6.18-164.el5

$ stap -V
SystemTap translator/driver (version 1.3/0.137 non-git sources)
Copyright (C) 2005-2010 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
enabled features: LIBRPM LIBSQLITE3 NSS BOOST_SHARED_PTR TR1_UNORDERED_MAP
$ stap -L  'kernel.function("printk")'
kernel.function("printk@kernel/printk.c:533") $fmt:char const* $args:va_list

$ stap -L  'process("/lib64/libc.so.6").function("malloc")'
Missing separate debuginfos, use: debuginfo-install glibc-2.5-42.x86_64 

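The kernel symbols are fine, but the glibc debug symbols are not installed. The system suggests running debuginfo-install glibc-2.5-42.x86_64, but on RHEL 5 that service is not available without a paid subscription, so we have to download and install the package ourselves.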

$ wget -c ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/x86_64/Debuginfo/glibc-debuginfo-2.5-42.x86_64.rpm
$ sudo rpm -i  glibc-debuginfo-2.5-42.x86_64.rpm
$ stap -L  'process("/lib64/libc.so.6").function("malloc")'
process("/lib64/libc-2.5.so").function("__libc_malloc@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3560") $bytes:size_t
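
When downloading by hand, the file name of the debuginfo package can be derived from the name-version-release.arch string in stap's hint. The snippet below is a minimal sketch of that string manipulation; debuginfo_rpm_name is a hypothetical helper, and it assumes the debuginfo package is named after the same package as the binary one, which holds for glibc here but not for every sub-package.

```shell
#!/bin/sh
# Sketch: derive the debuginfo RPM file name from a name-version-release.arch
# string such as the one printed in stap's "Missing separate debuginfos" hint.
# debuginfo_rpm_name is a hypothetical helper, not part of rpm or systemtap.
debuginfo_rpm_name() {
    nvra=$1
    release_arch=${nvra##*-}     # text after the last hyphen, e.g. 42.x86_64
    name_version=${nvra%-*}      # everything before it, e.g. glibc-2.5
    version=${name_version##*-}  # e.g. 2.5
    name=${name_version%-*}      # e.g. glibc (may itself contain hyphens)
    printf '%s-debuginfo-%s-%s.rpm\n' "$name" "$version" "$release_arch"
}

debuginfo_rpm_name glibc-2.5-42.x86_64
```

For the hint shown above this prints glibc-debuginfo-2.5-42.x86_64.rpm, the file fetched with wget.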

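Now the glibc symbols are available, so we can conveniently trace how malloc in libc.so is used.
Next, let's write a simple C program that calls malloc, together with a stap script that prints the call stack of the malloc call: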

$ cat t.c
#include <stdlib.h>

void fun() {
  malloc(1000);
}

int main(int argc, char *argv[]) {
  fun();
  return 0;
}

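$ cat m.stp
# Fire on every malloc() call in libc
probe process("/lib64/libc.so.6").function("malloc") {
  # Only report the process started with -c (or given with -x)
  if (target() == pid()) {
    print_ubacktrace();  # user-space backtrace of the caller
    exit();              # stop after the first hit
  }
}
# Print a marker when the session starts
probe begin {
  println("~");
}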

$ gcc  -g t.c

$ stap -L 'process("./a.out").function("*")'
process("/home/chuba/a.out").function("fun@/home/chuba/t.c:3")
process("/home/chuba/a.out").function("main@/home/chuba/t.c:7") $argc:int $argv:char**

Now that the program is ready, let's run it and see where the memory leak is:

$ sudo stap m.stp -c ./a.out
~
 0x33d5e74b96 : malloc+0x16/0x230 [libc-2.5.so]
 0x4004a6 [a.out+0x4a6/0x1000]

The backtrace shows that malloc was called from address 0x4004a6 in a.out, but which line of the program is that? addr2line finds it easily:

$ addr2line -e ./a.out 0x4004a6      
/home/chuba/t.c:5
$ nl t.c
     1  #include <stdlib.h>
       
     2  void fun() {
     3    malloc(1000);
     4  }
       
     5  int main(int argc, char *argv[]) {
     6    fun();
     7    return 0;
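     8  }

Note that nl does not number blank lines by default, so its numbering differs from the line numbers in the debug information. As the stap -L output above shows, fun starts at t.c:3 and main at t.c:7, so t.c:5 is the closing brace of fun: 0x4004a6 is the return address immediately after the malloc(1000) call.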
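
The manual addr2line step can be wrapped in a small filter that reads print_ubacktrace() output and resolves the frames belonging to one binary. This is a sketch under assumptions: frame_addr and resolve_frames are hypothetical helper names, the backtrace lines are assumed to look like the ones shown above (address first, module name in brackets), and the binary is assumed to be a non-relocated executable such as a.out here, so that the runtime address can be passed to addr2line unchanged.

```shell
#!/bin/sh
# Sketch: pull the code address out of a print_ubacktrace() line.
# frame_addr and resolve_frames are hypothetical helper names.
frame_addr() {
    # The address is the first whitespace-separated field of the line.
    printf '%s\n' "$1" | awk '{ print $1 }'
}

# Read backtrace lines on stdin and resolve the frames that mention the
# given binary; frames in other modules (e.g. libc) are skipped.
resolve_frames() {
    exe=$1
    base=${exe##*/}   # module name as it appears in the backtrace
    while IFS= read -r line; do
        case $line in
            # -f also prints the enclosing function name
            *"$base"*) addr2line -f -e "$exe" "$(frame_addr "$line")" ;;
        esac
    done
}

frame_addr ' 0x4004a6 [a.out+0x4a6/0x1000]'
```

With these helpers, something like `sudo stap m.stp -c ./a.out | resolve_frames ./a.out` would print the function and file:line for each a.out frame in one go.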

Haha, have fun, everyone.


Categories: Tool Introductions, Tuning
  1. January 12th, 2012 at 16:49 | #1

    霸爷 is awesome!

  2. chaoslawful
    January 15th, 2012 at 14:29 | #2

    Using systemtap for this is a bit like using a sledgehammer to crack a nut; ltrace -i achieves the same effect.

  3. JiaLiang
    March 11th, 2012 at 15:21 | #3

    On my Ubuntu machine:
    sudo stap -L 'process("/lib/x86_64-linux-gnu/libc.so.6").function("malloc")'
    stap -L 'process("./a.out").function("*")'
    On CentOS it immediately reports:
    Checking "/lib/modules/2.6.32-220.el6.x86_64/build/.config" failed with error: No such file or directory
    The symbols should all be installed. What could be wrong?

    error.d Reply:

    Check whether the kernel symbols you installed match the kernel version your system is running. I ran into a similar error message for exactly that reason.

  4. JiaLiang
    March 11th, 2012 at 15:22 | #4

    On my Ubuntu machine:
    sudo stap -L 'process("/lib/x86_64-linux-gnu/libc.so.6").function("malloc")'
    stap -L 'process("./a.out").function("*")'
    produce no output.
    On CentOS it immediately reports:
    Checking "/lib/modules/2.6.32-220.el6.x86_64/build/.config" failed with error: No such file or directory
    The symbols should all be installed. What could be wrong?

    Yu Feng Reply:

    In theory this has nothing to do with the kernel symbols.

    JiaLiang Reply:

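    Do you use Ubuntu? Mine is fairly recent, 11.10.
    sudo stap -L 'process("/lib/x86_64-linux-gnu/libc.so.6").function("malloc")' produces no output.
    In theory the statement above has nothing to do with the kernel either. Also, a hello world test works fine.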

    Yu Feng Reply:

    I always use RHEL or CentOS.

  5. loki
    August 1st, 2012 at 15:28 | #5

    Why does demon exit immediately when I run ltrace -fc ./demon>/dev/null? It is a service!

    Yu Feng Reply:

    I suggest switching the operating system to CentOS 6.2; that saves a lot of trouble.

  6. loki
    August 1st, 2012 at 16:49 | #6

    Why does it hang when I run it, without ever exiting? How frustrating...

  7. loki
    August 1st, 2012 at 17:02 | #7

    [root@localhost stapscript]# stap -vvvvv m.stp -x 9236
    Systemtap translator/driver (version 1.8/0.151 non-git sources)
    Copyright (C) 2005-2012 Red Hat, Inc. and others
    This is free software; see the source for copying conditions.
    enabled features: LIBSQLITE3 NSS BOOST_SHARED_PTR TR1_UNORDERED_MAP NLS
    Created temporary directory “/tmp/staptjdpXn”
    Session arch: i386 release: 2.6.18-164.el5
    Parsed kernel “/lib/modules/2.6.18-164.el5/build/.config”, containing 1943 tuples
    Parsed kernel /lib/modules/2.6.18-164.el5/build/Module.symvers, which contained 3315 vmlinux exports
    Searched: ” /usr/local/share/systemtap/tapset/i386/*.stp “, found: 4, processed: 4
    Searched: ” /usr/local/share/systemtap/tapset/*.stp “, found: 81, processed: 81
    Pass 1: parsed user script and 85 library script(s) using 21980virt/13832res/2268shr/12100data kb, in 40usr/390sys/438real ms.
    Extracting build ID.
    dwarf_builder::build for /lib/libc-2.5.so
    parse ‘malloc’, func ‘malloc’
    pattern ‘/lib/libc-2.5.so’ matches module ‘/lib/libc-2.5.so’
    focused on module ‘/lib/libc-2.5.so’ = [0xb00000-0xc455c4, bias 0 file /usr/lib/debug/lib/libc-2.5.so.debug ELF machine i?86|x86_64 (code 3)
    focused on module ‘/lib/libc-2.5.so’
    module function cache /lib/libc-2.5.so size 3247
    module function cache /lib/libc-2.5.so hit malloc
    selected function __libc_malloc
    function cache /lib/libc-2.5.so:malloc.c size 68
    function cache /lib/libc-2.5.so:malloc.c hit malloc
    selected function __libc_malloc
    searching for prologue of function ‘__libc_malloc’ 0x6bd20-0x6bee3@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3560
    checking line record 0x6bd20@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3560
    checking line record 0x6bd3a@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3565
    prologue found function ‘__libc_malloc’ = 0x6bd3a
    probe __libc_malloc@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3560 process=/lib/libc-2.5.so reloc=.dynamic pc=0x6bd3a
    dwarf_builder::build for /lib/libc-2.5.so
    parse ‘free’, func ‘free’
    pattern ‘/lib/libc-2.5.so’ matches module ‘/lib/libc-2.5.so’
    focused on module ‘/lib/libc-2.5.so’ = [0xb00000-0xc455c4, bias 0xb00000 file /usr/lib/debug/lib/libc-2.5.so.debug ELF machine i?86|x86_64 (code 3)
    focused on module ‘/lib/libc-2.5.so’
    module function cache /lib/libc-2.5.so hit free
    selected function __libc_free
    function cache /lib/libc-2.5.so:malloc.c hit free
    selected function __libc_free
    searching for prologue of function ‘__libc_free’ 0x69980-0x69b31@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3636
    checking line record 0x69980@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3636
    checking line record 0x6999d@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3641
    prologue found function ‘__libc_free’ = 0x6999d
    probe __libc_free@/usr/src/debug/glibc-2.5-20061008T1257/malloc/malloc.c:3636 process=/lib/libc-2.5.so reloc=.dynamic pc=0x6999d
    deleting module_cache
    Eliding side-effect-free singleton block operator ‘{‘ at m.stp:2:94
    Eliding side-effect-free singleton block operator ‘{‘ at m.stp:2:94
    Turning on symbol data collecting, pragma:symbols found in probefunc
    Pass 2: analyzed script: 4 probe(s), 1 function(s), 0 embed(s), 2 global(s) using 32172virt/21500res/6764shr/15288data kb, in 30usr/110sys/175real ms.
    Pass 3: using cached /root/.systemtap/cache/7a/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.c
    Pass 4: using cached /root/.systemtap/cache/7a/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.ko
    Pass 5: starting run.
    Running /usr/local/bin/staprun -v -v -t 9236 -u/tmp/staptjdpXn/uprobes/uprobes.ko -R /tmp/staptjdpXn/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.ko
    staprun:main:387 modpath=”/tmp/staptjdpXn/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.ko”, modname=”stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981″
    staprun:init_staprun:305 init_staprun
    staprun:enable_uprobes:167 Inserting uprobes module from /tmp/staptjdpXn/uprobes/uprobes.ko.
    staprun:insert_module:73 inserting module /tmp/staptjdpXn/uprobes/uprobes.ko
    staprun:insert_module:99 module options:
    staprun:insert_module:107 module path canonicalized to ‘/tmp/staptjdpXn/uprobes/uprobes.ko’
    staprun:check_signature:441 checking signature for /tmp/staptjdpXn/uprobes/uprobes.ko
    Signature file /tmp/staptjdpXn/uprobes/uprobes.ko.sgn not found
    staprun:check_signature:454 verify_module returns 0
    staprun:insert_module:73 inserting module /tmp/staptjdpXn/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.ko
    staprun:insert_module:99 module options: _stp_bufsize=0
    staprun:insert_module:107 module path canonicalized to ‘/tmp/staptjdpXn/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.ko’
    staprun:check_signature:441 checking signature for /tmp/staptjdpXn/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.ko
    Signature file /tmp/staptjdpXn/stap_7a3366d2dd87f3d375b6cd334a2e6aec_1981.ko.sgn not found
    Spawn waitpid result (0x100): 1
    WARNING: /usr/local/bin/staprun exited with status: 1
    Pass 5: run completed in 0usr/0sys/14002real ms.
    Pass 5: run failed. Try again with another '--vp 00001' option.
    Running rm -rf /tmp/staptjdpXn
    Spawn waitpid result (0x0): 0


    轩脉刃 Reply:

    I ran into this problem too.

  8. buck
    December 20th, 2012 at 10:34 | #8

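    yufeng, I get the impression that you think very highly of systemtap, but have you considered that using it can hurt system performance significantly?
    For example, today I was investigating why bandwidth and CPU usage would not go up in a program that uses epoll. Without stap, the gigabit NIC on the internal network (centos5.3) could reach 60M of bandwidth; as soon as I ran stap (a script that counts system API calls), bandwidth dropped by 18M. How can this be dealt with? Thanks!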

    Yu Feng Reply:

    systemtap does have a performance overhead, around 10%. For most systems a 10% drop is acceptable, and systemtap is very flexible. I suggest not attaching stap probes on the hot path.

  9. Baul
    May 1st, 2013 at 22:59 | #9

    On CentOS 6.4, the output is:
    stap -d /usr/libexec/systemtap/stapio m.stp -c ./a.out
    ~
    0x7f7fcf1558a0 : malloc+0x0/0x210 [/lib64/libc-2.12.so]
    0x7f7fcf1b4eec : wordexp+0x178c/0x1840 [/lib64/libc-2.12.so]
    0x7f7fcf8b127e : main+0x1a2e/0x5fb0 [/usr/libexec/systemtap/stapio]
    0x7f7fcf8b1652 : main+0x1e02/0x5fb0 [/usr/libexec/systemtap/stapio]
    0x7f7fcf8af916 : main+0xc6/0x5fb0 [/usr/libexec/systemtap/stapio]
    0x7f7fcf0f9cdd : __libc_start_main+0xfd/0x1d0 [/lib64/libc-2.12.so]
    0x7f7fcf8af779 [/usr/libexec/systemtap/stapio+0x2779/0xc000]
    How should I read this? Why is it different from the output above?

    Yu Feng Reply:

    Different stap versions are implemented differently, so there are probably minor differences.

  10. raymond
    June 8th, 2014 at 23:23 | #10

    If stap cannot show the libc version, you can run rpm -qa |grep libc to check the major and minor version.

  11. gouihk
    October 8th, 2014 at 22:47 | #11

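    I hear DTrace is also good and can be used in production. There seems to be a Linux version as well (on github.com), and Oracle Linux already supports it. I wonder whether the DTrace that comes with Linux is usable.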

    gouihk Reply:

    https://github.com/dtrace4linux/linux


    Yu Feng Reply:

    DTrace on Linux is not usable yet.

    gouihk Reply:

    thanks.

