The Linux TASK_IO_ACCOUNTING feature and how to use it
Original article. When reposting, please credit: reposted from 系统技术非业余研究
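In the past we mostly relied on iostat to understand system I/O, and its granularity stops at the device level. Usually we also want to know how much I/O each process or thread issues. Before Linux 2.6.20 there was no way to do that other than with tools such as systemtap, because the kernel did not expose these statistics. For per-device, per-application I/O read/write statistics with disktop, see my earlier post.
Checking the code through lxr confirms that the TASK_IO_ACCOUNTING feature was introduced starting with Linux 2.6.20. It exports the I/O activity of every thread and process through /proc/pid/io, which is a great convenience for users. Note that RHEL 5U4 is based on the 2.6.18 kernel, but this feature was backported to it. The feature in turn gave rise to tools for inspecting per-process I/O activity, such as pidstat and iotop. Screenshots of the two tools at work follow: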
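pidstat can show thread-level I/O activity in a hierarchy
Running strace on them shows that both tools take their I/O activity data from /proc/pid/io. Let's take a look at this file: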
# cat /proc/self/io
rchar: 1956
wchar: 0
syscr: 7
syscw: 0
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0
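To make the file format concrete, here is a minimal sketch of how a userspace tool might parse it. The function names parse_proc_io and read_proc_io are my own for illustration; they are not taken from pidstat or iotop.

```python
def parse_proc_io(text):
    """Parse the 'name: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        # Skip anything that is not a 'name: <unsigned integer>' line.
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def read_proc_io(pid="self"):
    """Read and parse /proc/<pid>/io (Linux only, hypothetical helper)."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())
```

Calling read_proc_io() with no argument reads the counters of the calling process itself; reading another user's process will typically need extra privileges.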
The last three fields of this file were newly added by the I/O accounting feature. Here is what they mean, excerpted from man pidstat:
kB_rd/s
    Number of kilobytes the task has caused to be read from disk per second.
kB_wr/s
    Number of kilobytes the task has caused, or shall cause to be written to disk per second.
kB_ccwr/s
    Number of kilobytes whose writing to disk has been cancelled by the task. This may occur when the task truncates some dirty pagecache. In this case, some IO which another task has been accounted for will not be happening.
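The per-second figures above can be derived from two snapshots of the cumulative counters in /proc/pid/io. The following is only a sketch of that arithmetic, not pidstat's actual implementation. The function name io_rates_kb is my own, and I am assuming 1 kB = 1024 bytes and snapshots passed in as dicts keyed by the /proc field names.

```python
def io_rates_kb(prev, curr, interval_s):
    """Return kB/s rates from two /proc/<pid>/io snapshots interval_s apart."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Column label (as in man pidstat) -> cumulative counter in /proc/<pid>/io.
    fields = {
        "kB_rd/s": "read_bytes",
        "kB_wr/s": "write_bytes",
        "kB_ccwr/s": "cancelled_write_bytes",
    }
    return {
        label: (curr[key] - prev[key]) / 1024.0 / interval_s
        for label, key in fields.items()
    }
```

A monitoring loop would take a snapshot, sleep for the interval, take another, and feed both to this function.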
Next, let's see how the kernel accounts for these three values. A simple grep in the RHEL 5U4 source tree:
[linux-2.6.18.x86_64]$ grep -rin task_io_account_ .
./block/ll_rw_blk.c:3286: task_io_account_read(bio->bi_size);
./include/linux/task_io_accounting_ops.h:8:static inline void task_io_account_read(size_t bytes)
./include/linux/task_io_accounting_ops.h:13:static inline void task_io_account_write(size_t bytes)
./include/linux/task_io_accounting_ops.h:18:static inline void task_io_account_cancelled_write(size_t bytes)
./include/linux/task_io_accounting_ops.h:30:static inline void task_io_account_read(size_t bytes)
./include/linux/task_io_accounting_ops.h:34:static inline void task_io_account_write(size_t bytes)
./include/linux/task_io_accounting_ops.h:38:static inline void task_io_account_cancelled_write(size_t bytes)
./fs/direct-io.c:671: task_io_account_write(len);
./fs/cifs/file.c:2221: task_io_account_read(bytes_read);
./fs/buffer.c:965: task_io_account_write(PAGE_CACHE_SIZE);
./fs/buffer.c:3400: task_io_account_cancelled_write(PAGE_CACHE_SIZE);
./mm/truncate.c:47: task_io_account_cancelled_write(PAGE_CACHE_SIZE);
./mm/page-writeback.c:649: task_io_account_write(PAGE_CACHE_SIZE);
./mm/readahead.c:180: task_io_account_read(PAGE_CACHE_SIZE);
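As you can see, the accounting granularity is fairly coarse: most call sites charge a whole page (PAGE_CACHE_SIZE) at a time.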
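To illustrate what page-sized accounting means for the numbers a task sees, here is a toy model in Python. It is not kernel code and simplifies heavily: the class ToyIoAccounting and its methods are hypothetical, the 4096-byte page size is an assumption, and I assume a buffered write is charged once when a page first becomes dirty and cancelled when a dirty page is dropped by truncation, as the call sites in fs/buffer.c and mm/truncate.c suggest.

```python
PAGE_SIZE = 4096  # assumed page size; the kernel charges PAGE_CACHE_SIZE


class ToyIoAccounting:
    """Toy model of page-granular write accounting (not kernel code)."""

    def __init__(self):
        self.dirty_pages = set()
        self.write_bytes = 0
        self.cancelled_write_bytes = 0

    def buffered_write(self, offset, length):
        """Dirty every page touched by the byte range [offset, offset + length)."""
        if length <= 0:
            return
        first = offset // PAGE_SIZE
        last = (offset + length - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            if page not in self.dirty_pages:
                # Charged once, when the page first becomes dirty.
                self.dirty_pages.add(page)
                self.write_bytes += PAGE_SIZE

    def truncate(self, new_size):
        """Drop dirty pages that lie wholly beyond new_size."""
        first_dropped = (new_size + PAGE_SIZE - 1) // PAGE_SIZE
        for page in [p for p in self.dirty_pages if p >= first_dropped]:
            self.dirty_pages.remove(page)
            self.cancelled_write_bytes += PAGE_SIZE
```

In this model a 1-byte write is charged as a full page, a second write to the same dirty page is charged nothing, and truncating the file before writeback moves the charge to cancelled_write_bytes.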
The proc export for I/O accounting lives in fs/proc/base.c:
#ifdef CONFIG_TASK_IO_ACCOUNTING
static int do_io_accounting(struct task_struct *task, char *buffer, int whole)
{
        ...
        return sprintf(buffer,
                        "rchar: %llu\n"
                        "wchar: %llu\n"
                        "syscr: %llu\n"
                        "syscw: %llu\n"
                        "read_bytes: %llu\n"
                        "write_bytes: %llu\n"
                        "cancelled_write_bytes: %llu\n",
                        rchar, wchar, syscr, syscw,
                        ioac.read_bytes, ioac.write_bytes,
                        ioac.cancelled_write_bytes);
}
That was a brief look at how TASK_IO_ACCOUNTING works; it is very helpful for understanding the I/O activity of each process. To repeat, this feature is usable on RHEL 5U4:
./configs/kernel-2.6.18-x86_64-xen.config:43:CONFIG_TASK_DELAY_ACCT=y
./configs/kernel-2.6.18-x86_64.config:45:CONFIG_TASK_DELAY_ACCT=y
./configs/kernel-2.6.18-x86_64-debug.config:45:CONFIG_TASK_DELAY_ACCT=y
This feature is enabled by default.
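If you want to run the same check from a script instead of grep, a minimal sketch could look like this. The function name config_enabled is my own, and where the config text comes from is an assumption: many distributions ship it as /boot/config-<kernel release>, but that varies.

```python
def config_enabled(config_text, option):
    """Return True if `option` is built in (set to 'y') in kernel config text."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: '# CONFIG_FOO is not set'.
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name == option:
            return value == "y"
    return False
```

For example, read the config file into a string and call config_enabled(text, "CONFIG_TASK_DELAY_ACCT") or config_enabled(text, "CONFIG_TASK_IO_ACCOUNTING").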
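Have fun!
Postscript: taskstats.c also supports exporting task statistics over netlink, selected by pid or tgid, as well as registering and deregistering a cpumask. iotop uses this feature.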
sendto(3, "\34\0\0\0\26\0\1\0\216\1\0\0\30\357\377\377\1\0\0\0\10\0\1\0\324\5\0\0", 28, 0, NULL, 0) = 28
recvfrom(3, "l\1\0\0\26\0\0\0\216\1\0\0\30\357\377\377\2\1\0\0X\1\4\0\10\0\1\0\324\5\0\0"..., 16384, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 364
Thanks to kinwin for pointing this out!
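The 28 bytes in the sendto() call from the strace output can be decoded by hand to see what iotop is asking for. The sketch below assumes host byte order is little-endian (x86_64) and the usual layout of a generic netlink message: struct nlmsghdr, then struct genlmsghdr, then one struct nlattr. The function name decode_taskstats_request is my own.

```python
import struct

# Request bytes copied from the strace sendto() line (28 bytes).
REQUEST = (b"\34\0\0\0\26\0\1\0\216\1\0\0\30\357\377\377"
           b"\1\0\0\0\10\0\1\0\324\5\0\0")


def decode_taskstats_request(msg):
    """Split a generic netlink taskstats request into its header fields."""
    # struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid
    nl_len, nl_type, nl_flags, nl_seq, nl_pid = struct.unpack_from("<IHHII", msg, 0)
    # struct genlmsghdr: u8 cmd, u8 version, u16 reserved
    cmd, version, _reserved = struct.unpack_from("<BBH", msg, 16)
    # struct nlattr: u16 len, u16 type, followed by a u32 payload here
    attr_len, attr_type = struct.unpack_from("<HH", msg, 20)
    (attr_value,) = struct.unpack_from("<I", msg, 24)
    return {
        "nl_len": nl_len, "nl_type": nl_type, "nl_flags": nl_flags,
        "nl_seq": nl_seq, "nl_pid": nl_pid,
        "cmd": cmd, "version": version,
        "attr_len": attr_len, "attr_type": attr_type,
        "target_pid": attr_value,
    }


if __name__ == "__main__":
    for key, value in decode_taskstats_request(REQUEST).items():
        print(f"{key}: {value}")
```

Read this way, the request has cmd 1 (TASKSTATS_CMD_GET in linux/taskstats.h) with a single attribute of type 1 (TASKSTATS_CMD_ATTR_PID) carrying pid 1492. The message type 22 should be the family id the kernel assigned to TASKSTATS at runtime, so it may differ on another machine.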
On my debian unstable box, iotop gets its per-thread I/O accounting information over netlink.
How do I use iotop on linux 2.6.18?
I keep getting this message: "CONFIG_TASK_DELAY_ACCT not enabled in kernel, cannot determine SWAPIN and IO %"
Yu Feng Reply:
July 13th, 2012 at 3:18 pm
The kernel has to be built with the CONFIG_TASK_DELAY_ACCT option. As I recall, RHEL 5U4 (2.6.18) backported this feature.
Yu Feng Reply:
July 13th, 2012 at 3:20 pm
./configs/kernel-2.6.18-x86_64-xen.config:43:CONFIG_TASK_DELAY_ACCT=y
./configs/kernel-2.6.18-x86_64.config:45:CONFIG_TASK_DELAY_ACCT=y
./configs/kernel-2.6.18-x86_64-debug.config:45:CONFIG_TASK_DELAY_ACCT=y
They are all enabled by default.