
Linux TASK_IO_ACCOUNTING: What It Is and How to Use It

March 11th, 2012

Original article. When reposting, please credit the source: 系统技术非业余研究

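In the past we mostly relied on iostat to see what system I/O looked like, and its granularity stops at the device level. Usually we also want to know how much I/O each process or thread issues. Before Linux 2.6.20 there was no way to get that other than tools such as systemtap, because the kernel did not expose these statistics. For disktop, which reports per-device, per-application I/O read/write statistics, see my earlier post on it.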

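Checking the code through lxr confirms that the TASK_IO_ACCOUNTING feature was introduced in Linux 2.6.20. It exports the I/O activity of every thread and process through /proc/pid/io, which makes life much easier for users. Note that RHEL 5U4 is based on the 2.6.18 kernel, but they backported this feature. The feature in turn gave rise to tools for inspecting per-process I/O activity, such as pidstat and iotop. Screenshots of the two tools at work: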

pidstat shows per-thread I/O activity in a hierarchical view


iotop shows per-thread I/O activity in a flat list

strace shows that both tools take their I/O activity data from /proc/pid/io, so let's take a look at this file:

# cat /proc/self/io
rchar: 1956
wchar: 0
syscr: 7
syscw: 0
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0
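
Since the file is just a list of "name: value" lines, it is easy to read from a script. Below is a minimal sketch in Python. The helper names parse_proc_io and read_proc_io are my own for illustration and do not come from pidstat or iotop. It parses the sample output above, so it runs even on a machine without /proc/pid/io.

```python
from pathlib import Path


def parse_proc_io(text):
    """Turn the 'name: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip blank or malformed lines
            counters[name.strip()] = int(value)
    return counters


def read_proc_io(pid="self"):
    """Read the counters of one process (hypothetical helper).

    Needs a kernel that exports /proc/<pid>/io.
    """
    return parse_proc_io(Path(f"/proc/{pid}/io").read_text())


# The sample output shown above, parsed without touching /proc.
SAMPLE = """\
rchar: 1956
wchar: 0
syscr: 7
syscw: 0
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0
"""

sample = parse_proc_io(SAMPLE)
print(sample["rchar"], sample["syscr"])  # prints "1956 7"
```

On a live system, read_proc_io() with no argument reads /proc/self/io, and read_proc_io(1234) reads the counters of pid 1234, subject to the usual permission checks.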

The last three fields of /proc/pid/io were newly added by the I/O accounting feature. Let's see what they mean, quoting from man pidstat:

kB_rd/s
Number of kilobytes the task has caused to be read from disk per second.

kB_wr/s
Number of kilobytes the task has caused, or shall cause to be written to disk per second.

kB_ccwr/s
Number of kilobytes whose writing to disk has been cancelled by the task. This may occur when the task truncates some dirty pagecache. In this case, some IO which another task has been accounted for will not be happening.
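
The three pidstat columns are per-second rates, while /proc/pid/io holds cumulative byte counters, so a tool has to sample twice and divide the difference by the interval. A rough sketch of that arithmetic follows. The function name io_rates_kb and the sample numbers are made up for illustration, and I am assuming 1 kB = 1024 bytes. This is not pidstat's actual code.

```python
def io_rates_kb(before, after, interval_s):
    """Per-second rates in the spirit of pidstat's kB_rd/s, kB_wr/s, kB_ccwr/s.

    'before' and 'after' are dicts of cumulative byte counters taken
    interval_s seconds apart. Assumes 1 kB = 1024 bytes.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")

    def rate(field):
        return (after[field] - before[field]) / 1024.0 / interval_s

    return {
        "kB_rd/s": rate("read_bytes"),
        "kB_wr/s": rate("write_bytes"),
        "kB_ccwr/s": rate("cancelled_write_bytes"),
    }


# Two hypothetical samples taken 2 seconds apart.
first = {"read_bytes": 0, "write_bytes": 0, "cancelled_write_bytes": 0}
second = {"read_bytes": 2048000, "write_bytes": 4096, "cancelled_write_bytes": 1024}

rates = io_rates_kb(first, second, 2.0)
print(rates)  # {'kB_rd/s': 1000.0, 'kB_wr/s': 2.0, 'kB_ccwr/s': 0.5}
```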

Next, let's see how the kernel accounts for these three values. A simple grep in the RHEL 5U4 source tree:

[linux-2.6.18.x86_64]$ grep -rin task_io_account_ .
./block/ll_rw_blk.c:3286:               task_io_account_read(bio->bi_size);
./include/linux/task_io_accounting_ops.h:8:static inline void task_io_account_read(size_t bytes)
./include/linux/task_io_accounting_ops.h:13:static inline void task_io_account_write(size_t bytes)
./include/linux/task_io_accounting_ops.h:18:static inline void task_io_account_cancelled_write(size_t bytes)
./include/linux/task_io_accounting_ops.h:30:static inline void task_io_account_read(size_t bytes)
./include/linux/task_io_accounting_ops.h:34:static inline void task_io_account_write(size_t bytes)
./include/linux/task_io_accounting_ops.h:38:static inline void task_io_account_cancelled_write(size_t bytes)
./fs/direct-io.c:671:           task_io_account_write(len);
./fs/cifs/file.c:2221:                  task_io_account_read(bytes_read);
./fs/buffer.c:965:                              task_io_account_write(PAGE_CACHE_SIZE);
./fs/buffer.c:3400:                     task_io_account_cancelled_write(PAGE_CACHE_SIZE);
./mm/truncate.c:47:             task_io_account_cancelled_write(PAGE_CACHE_SIZE);
./mm/page-writeback.c:649:                                      task_io_account_write(PAGE_CACHE_SIZE);
./mm/readahead.c:180:           task_io_account_read(PAGE_CACHE_SIZE);

As you can see, the accounting granularity is fairly coarse.
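
To make the coarseness concrete, here is a toy model in Python. It is not kernel code. It is based on my reading of the grep hits above and assumes that a write is charged a whole PAGE_CACHE_SIZE when it dirties a page, and that truncating a still-dirty page records a whole page as cancelled. The class name ToyIoAccounting and the 4096-byte page size are assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumed page size, standing in for PAGE_CACHE_SIZE


class ToyIoAccounting:
    """Toy model of page-granular write accounting (illustration only)."""

    def __init__(self):
        self.write_bytes = 0
        self.cancelled_write_bytes = 0
        self.dirty_pages = set()

    def write(self, offset, length):
        """Dirty every page the write touches; charge each new dirty page once."""
        if length <= 0:
            return
        first = offset // PAGE_SIZE
        last = (offset + length - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            if page not in self.dirty_pages:
                self.dirty_pages.add(page)
                self.write_bytes += PAGE_SIZE

    def truncate(self, new_size):
        """Drop dirty pages lying wholly beyond new_size and count them as cancelled."""
        first_dropped = (new_size + PAGE_SIZE - 1) // PAGE_SIZE
        for page in [p for p in self.dirty_pages if p >= first_dropped]:
            self.dirty_pages.remove(page)
            self.cancelled_write_bytes += PAGE_SIZE


acct = ToyIoAccounting()
acct.write(0, 1)                    # a 1-byte write dirties one page
print(acct.write_bytes)             # prints 4096
acct.truncate(0)                    # truncate before the page is written back
print(acct.cancelled_write_bytes)   # prints 4096
```

In this model a 1-byte write costs a full page in write_bytes, and the bytes that really reach the disk are closer to write_bytes minus cancelled_write_bytes.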

The proc export for I/O accounting lives in fs/proc/base.c:

#ifdef CONFIG_TASK_IO_ACCOUNTING
static int do_io_accounting(struct task_struct *task, char *buffer, int whole)
{
 ...  
        return sprintf(buffer,
                        "rchar: %llu\n"
                        "wchar: %llu\n"
                        "syscr: %llu\n"
                        "syscw: %llu\n"
                        "read_bytes: %llu\n"
                        "write_bytes: %llu\n"
                        "cancelled_write_bytes: %llu\n",
                        rchar, wchar, syscr, syscw,
                        ioac.read_bytes, ioac.write_bytes,
                        ioac.cancelled_write_bytes);
}

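That was a brief look at how TASK_IO_ACCOUNTING works. It is very helpful for understanding the I/O activity of each process. One more reminder: this feature is available on RHEL 5U4.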

./configs/kernel-2.6.18-x86_64-xen.config:43:CONFIG_TASK_DELAY_ACCT=y
./configs/kernel-2.6.18-x86_64.config:45:CONFIG_TASK_DELAY_ACCT=y
./configs/kernel-2.6.18-x86_64-debug.config:45:CONFIG_TASK_DELAY_ACCT=y

This feature is enabled by default.
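
If you would rather check a config file from a script than grep it by hand, something like the sketch below works. The function name enabled_options and the sample config text are made up for illustration. CONFIG_TASKSTATS is not mentioned in this post; I added it to the list on the assumption that it is useful to check alongside the other two.

```python
# Options to look for. CONFIG_TASKSTATS is an addition of mine, not from the post.
OPTIONS = (
    "CONFIG_TASKSTATS",
    "CONFIG_TASK_DELAY_ACCT",
    "CONFIG_TASK_IO_ACCOUNTING",
)


def enabled_options(config_text, options=OPTIONS):
    """Map each option to True when the config text has an 'OPTION=y' line."""
    lines = {line.strip() for line in config_text.splitlines()}
    return {opt: (opt + "=y") in lines for opt in options}


# Hypothetical config fragment, not copied from a real kernel.
SAMPLE_CONFIG = """\
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
# CONFIG_TASK_IO_ACCOUNTING is not set
"""

status = enabled_options(SAMPLE_CONFIG)
for name, on in status.items():
    print(name, "enabled" if on else "not enabled")
```

On a real machine you would pass in the text of the kernel config file. Where that file lives depends on the distribution, so treat any path as an assumption.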

Have fun!

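Postscript: taskstats.c also supports a netlink interface for querying a task by pid or tgid, as well as registering and deregistering a cpumask. iotop uses this feature.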

sendto(3, "\34\0\0\0\26\0\1\0\216\1\0\0\30\357\377\377\1\0\0\0\10\0\1\0\324\5\0\0", 28, 0, NULL, 0) = 28
recvfrom(3, "l\1\0\0\26\0\0\0\216\1\0\0\30\357\377\377\2\1\0\0X\1\4\0\10\0\1\0\324\5\0\0"..., 16384, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 364

Thanks to kinwin for pointing this out!


  1. kinwin
    March 12th, 2012 at 10:08 | #1

    On my Debian unstable box, iotop gets its per-thread I/O accounting information through netlink.

  2. mickey
    July 12th, 2012 at 22:58 | #2

    How do I use iotop on Linux 2.6.18?
    I keep getting this message: "CONFIG_TASK_DELAY_ACCT not enabled in kernel, cannot determine SWAPIN and IO %"

    Yu Feng Reply:

    The kernel has to be built with the CONFIG_TASK_DELAY_ACCT option. As far as I remember, RHEL 5U4 (2.6.18) backported this feature.

    Yu Feng Reply:

    ./configs/kernel-2.6.18-x86_64-xen.config:43:CONFIG_TASK_DELAY_ACCT=y
    ./configs/kernel-2.6.18-x86_64.config:45:CONFIG_TASK_DELAY_ACCT=y
    ./configs/kernel-2.6.18-x86_64-debug.config:45:CONFIG_TASK_DELAY_ACCT=y
    They are all enabled by default.
