
Who Is Switching Our Processes on Linux

October 8th, 2010

Original article. If you repost it, please credit: reposted from 系统技术非业余研究

Link to this article: Who Is Switching Our Processes on Linux

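When running Linux servers we often need to know who is triggering context switches and why they happen, because a context switch is expensive. Here is a figure measured with LMbench: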
Context switching - times in microseconds - smaller is better
--------------------------------------------------------------------------
Host      OS            2p/0K  2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
my174.cm4 Linux 2.6.18- 6.1100 7.0200 6.1100 8.7400 7.7200 8.96000 9.62000

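On my fairly high-end server a context switch costs about 8 µs (roughly 6 to 10 µs depending on the number of processes and the working-set size). For a high-performance server that is unacceptable, so we want to get as much work as possible done within each time slice instead of wasting time on needless switches.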
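
To put the 8 µs figure in perspective, here is a rough back-of-the-envelope sketch in Python (not part of the original measurement). The switch rates are hypothetical, and the calculation only counts the direct cost reported by LMbench, ignoring cache and TLB effects.

```python
def switch_overhead_fraction(switches_per_sec, cost_us=8.0):
    """Fraction of one CPU-second spent on the direct cost of context switches.

    cost_us defaults to the ~8 us LMbench figure quoted above.
    """
    return switches_per_sec * cost_us / 1_000_000.0


if __name__ == "__main__":
    # Hypothetical switch rates, chosen only for illustration.
    for rate in (1_000, 10_000, 100_000):
        share = switch_overhead_fraction(rate)
        print(f"{rate:>7} switches/s -> {share:.1%} of one CPU")
```

At 100,000 switches per second this estimate is already 80% of one CPU, which is why the count matters.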

Curiosity killed the cat, so let's investigate who is switching our processes:

[root@my174 admin]# dstat 1
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  0   0 100   0   0   0|   0     0 | 796B 1488B|   0     0 |1004   128 
  0   0 100   0   0   0|   0     0 | 280B  728B|   0     0 |1005   114 
  0   0 100   0   0   0|   0     0 | 280B  728B|   0     0 |1005   128 
  0   0 100   0   0   0|   0     0 | 280B  728B|   0     0 |1005   114 
  0   0 100   0   0   0|   0   320k| 280B  728B|   0     0 |1008   143 
...
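
The csw column is a per-second rate. On Linux the kernel exposes a cumulative, system-wide count as the `ctxt` line of /proc/stat, and as far as I know that is where tools like vmstat get their number. A minimal Python sketch of the same sampling (the function names are mine, not from any tool):

```python
import time


def parse_ctxt(proc_stat_text):
    """Return the cumulative context-switch count from /proc/stat contents."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")


def read_ctxt(path="/proc/stat"):
    """Read the current cumulative count from the running kernel."""
    with open(path) as f:
        return parse_ctxt(f.read())


def csw_rate(before, after, seconds):
    """Context switches per second between two samples."""
    return (after - before) / seconds


if __name__ == "__main__":
    interval = 1.0
    prev = read_ctxt()
    for _ in range(5):
        time.sleep(interval)
        cur = read_ctxt()
        print(f"csw/s: {csw_rate(prev, cur, interval):.0f}")
        prev = cur
```

Like dstat, this only gives a system-wide total; it says nothing about which processes are involved.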

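We can see that csw is around 120 per second, but dstat, vmstat and similar tools do not tell us who is responsible. Fine, we will do it ourselves.
Time to bring out our beloved SystemTap!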

[root@my174 admin]# cat >cswmon.stp
#! /usr/bin/env stap
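# cswmon.stp: every N seconds, print the 20 most frequent
# (previous task -> next task) context switches.
# Usage: stap cswmon.stp <interval-in-seconds>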

global csw_count
global idle_count

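# Fires each time a task is switched off a CPU.
probe scheduler.cpu_off {
  csw_count[task_prev, task_next]++   # count this (prev, next) pair
  idle_count+=idle                    # idle is a flag supplied by the tapset
}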


function fmt_task(task_prev, task_next)
{
   return sprintf("%s(%d)->%s(%d)",
                                task_execname(task_prev), 
                                task_pid(task_prev), 
                                task_execname(task_next), 
                                task_pid(task_next))
}

function print_cswtop () {
  printf ("%45s %10s\n", "Context switch", "COUNT")
  # The trailing "-" iterates in descending order of count.
  foreach ([task_prev, task_next] in csw_count- limit 20) {
    printf("%45s %10d\n", fmt_task(task_prev, task_next), csw_count[task_prev, task_next])
  }
  printf("%45s %10d\n", "idle", idle_count)

  delete csw_count
  delete idle_count
}

probe timer.s($1) {
  print_cswtop ()
  printf("--------------------------------------------------------------\n")
}
CTRL+D

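At the interval you specify, this script prints the 20 most frequent switches along with the process names and pids involved. Let's look at the result: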

[root@my174 admin]# stap cswmon.stp 5
                               Context switch      COUNT
                swapper(0)->systemtap/11(908)        500
                systemtap/11(908)->swapper(0)        498
                swapper(0)->fct1-worker(2492)         50
                fct1-worker(2492)->swapper(0)         50
                swapper(0)->fct0-worker(2191)         50
                fct0-worker(2191)->swapper(0)         50
                      swapper(0)->bond0(3432)         50
                      bond0(3432)->swapper(0)         50
                      stapio(879)->swapper(0)         26
                      swapper(0)->stapio(879)         25
                      stapio(879)->swapper(0)         19
                      swapper(0)->stapio(879)         17
                   swapper(0)->watchdog/9(31)          5
                   watchdog/9(31)->swapper(0)          5
                    swapper(0)->mysqld(18346)          5
                    mysqld(18346)->swapper(0)          5
                  swapper(0)->watchdog/13(43)          5
                  watchdog/13(43)->swapper(0)          5
                  swapper(0)->watchdog/14(46)          5
                  watchdog/14(46)->swapper(0)          5
                                         idle        859
--------------------------------------------------------------
...

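We can see which process each switch went from and to, and how many times it happened. On the last line I print the idle count, that is, the number of times the system had nothing to do and switched to the idle(0) process to rest.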
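
The report lists pairs, so a busy process is spread over several rows (stapio(879) even appears four times, presumably because the script keys on tasks and several threads share one pid). A small Python sketch, assuming a report saved in exactly the format shown above, folds the rows into per-process totals:

```python
import re
import sys
from collections import Counter

# Matches rows such as "swapper(0)->systemtap/11(908)        500".
ROW = re.compile(r"^\s*(.+?)\((\d+)\)->(.+?)\((\d+)\)\s+(\d+)\s*$")


def per_process_totals(report_text):
    """Sum the counts of every row in which a process appears.

    Returns a Counter keyed by (name, pid). Header, idle and separator
    lines do not match the row pattern and are skipped.
    """
    totals = Counter()
    for line in report_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        prev_name, prev_pid, next_name, next_pid, count = m.groups()
        n = int(count)
        totals[(prev_name, int(prev_pid))] += n
        totals[(next_name, int(next_pid))] += n
    return totals


if __name__ == "__main__":
    # Reads a saved report from standard input.
    for (name, pid), n in per_process_totals(sys.stdin.read()).most_common(10):
        print(f"{name}({pid}) {n}")
```

Each switch is counted once for the outgoing task and once for the incoming one, so the totals measure involvement in switches rather than distinct events.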

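With this investigation we get a clear picture of where the switching overhead in our system comes from, which makes it much easier to track down problems.
Have fun!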
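
The pair counts show who is switching but not why. Reasonably recent kernels also report, per process, how many switches were voluntary (the task blocked or yielded) and how many were involuntary (it was preempted), in the `voluntary_ctxt_switches` and `nonvoluntary_ctxt_switches` lines of /proc/<pid>/status. I am not sure the 2.6.18 kernel used above has these fields, so treat this Python sketch as an assumption-laden complement to the script rather than a replacement:

```python
import os

FIELDS = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")


def parse_ctxt_switches(status_text):
    """Extract the two context-switch counters from /proc/<pid>/status text.

    Returns a dict containing whichever of the two fields are present.
    """
    result = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in FIELDS:
            result[key] = int(value.strip())
    return result


def read_ctxt_switches(pid):
    """Read the counters for one process from the running kernel."""
    with open(f"/proc/{pid}/status") as f:
        return parse_ctxt_switches(f.read())


if __name__ == "__main__":
    # Inspect this script's own process as a harmless example.
    print(read_ctxt_switches(os.getpid()))
```
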


  1. zituan
    October 8th, 2010 at 23:02 | #1

    Nice, copying this for later~~

  2. gfs
    March 25th, 2011 at 11:36 | #2

    I hit this error when running it on CentOS 4u4:
    [root@test ~]# stap cswmon.stp 5
    semantic error: libdwfl failure (dwfl_linux_kernel_report_offline): No such file or directory while resolving probe point kernel.inline("context_switch")
    semantic error: no match for probe point while resolving probe point scheduler.cpu_off
    Pass 2: analysis failed. Try again with more '-v' (verbose) options.

    Yu Feng Reply:

    Install the symbol information (debuginfo) for your system!

    realzyy Reply:

    root@yunyang-Vostro-460:~# ./my.stp 5
    Pass 1: parsed user script and 76 library script(s) using 92496virt/22624res/2624shr kb, in 70usr/0sys/76real ms.
    Warning: make exited with status: 2
    Warning: make exited with status: 2
    Warning: make exited with status: 2
    Warning: make exited with status: 2
    ^CWarning: make exited with status: 130
    semantic error: no match while resolving probe point scheduler.cpu_off
    Pass 2: analyzed script: 0 probe(s), 2 function(s), 0 embed(s), 2 global(s) using 238668virt/23956res/3124shr kb, in 2410usr/690sys/3572real ms.
    Pass 2: analysis failed. Try again with another '--vp 01' option.

    The debuginfo should be installed correctly, so I don't know what is causing this. Can it not find the tapset library?

    Yu Feng Reply:

    Either the kernel version is too old to have scheduler.cpu_off, or the function was inlined away when the kernel was compiled with -O.

    realzyy Reply:

    Ubuntu 12, 3.2.0-27-generic, so the version should not count as old. I guess it was optimized away.

  3. March 26th, 2011 at 23:01 | #3

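    This reminds me that powertop can do this too, although the results it shows are different and it focuses more on applications and events.

    Think of it this way: to save power, a laptop CPU has to enter its C states, so whoever wakes it from sleep to do work is the one to look at.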

    Yu Feng Reply:

    Haha, nice. I was actually thinking of writing a post about unconventional uses of powertop.

  4. asterisk622
    June 8th, 2011 at 10:27 | #4

    [asterisk@ systemtap]$sudo stap --vp 01 --vp 01 cswmon.stp 5
    probe context_switch@/build/buildd/linux-2.6.32/kernel/sched.c:2999 kernel reloc=.dynamic pc=0xffffffff8153ebd7
    semantic error: failed to retrieve location attribute for local 'next' (dieoffset: 0x6660f0): identifier '$next' at /usr/share/systemtap/tapset/scheduler.stp:39:17
    source: task_next = $next
    ^
    Pass 2: analyzed script: 2 probe(s), 9 function(s), 1 embed(s), 2 global(s) in 340usr/100sys/437real ms.
    Pass 2: analysis failed. Try again with another '--vp 01' option.
    Running rm -rf /tmp/stapR3QW4C

    The system is Ubuntu 10.04, 2.6.32-32-generic.

    Yu Feng Reply:

    The implementation of context_switch seems to vary a lot from system to system. It is safest to experiment on a RHEL release.

    asterisk622 Reply:

    Thanks for the reply. It really does need RHEL; after only one day of use I have already run into some thorny problems.

  5. October 4th, 2012 at 11:47 | #5

    SystemTap really is a powerful tool. I am embarrassed that I am only now starting to understand it.
    http://blog.ec-ae.com/?p=4960

  6. November 8th, 2012 at 12:23 | #6

    What is going on with the error messages below?
    semantic error: failed to retrieve location attribute for local 'prev' (dieoffset: 0x337d4e): identifier '$prev' at /usr/share/systemtap/tapset/scheduler.stp:39:17
    source: task_prev = $prev
    ^
    semantic error: failed to retrieve location attribute for local 'next' (dieoffset: 0x337d49): identifier '$next' at :40:17
    source: task_next = $next
    ^
    Pass 2: analysis failed. Try again with another '--vp 01' option.

    Yu Feng Reply:

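    On some Linux distributions these variables are optimized away by the compiler. SystemTap cannot find them, so the script cannot go any further.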
