
Recommending xz, a High-Compression-Ratio Algorithm

March 17th, 2011

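Original article. If you repost it, please credit the source: reposted from 系统技术非业余研究 (Systems Technology: Non-Amateur Research).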

Permalink: Recommending xz, a High-Compression-Ratio Algorithm

While reading the release notes for Linux kernel 2.6.38 over the past few days, I came across this:

The version .38 kernel comes with a library for decompressing XZ, a format developed from LZMA and known for its high levels of compression. This library is the basis not only for SquashFS, which now also offers XZ, but also for code that allows the kernel to unpack any parts of itself and of the initial ram disks (initrds) that were compressed with XZ.

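That made me curious. Linux already has plenty of compression algorithms, so why choose this one, and what makes it stand out? Today I dug into xz and ran a simple benchmark along the way.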

See the official XZ Utils website for details.


# My machine's configuration
$ summary 
...
# Aspersa System Summary Report ##############################
        Date | 2011-03-17 10:24:13 UTC (local TZ: CST +0800)
    Hostname | yufeng-laptop
      Uptime |  1:46,  5 users,  load average: 0.97, 1.03, 0.84
      System | Dell Inc.; Latitude E6400; vNot Specified (Portable)
 Service Tag | 9MKDW2X
      Release | Ubuntu 10.10
      Kernel | 2.6.38-yufeng
Architecture | CPU = 64-bit, OS = 64-bit
   Threading | NPTL 2.12.1
    Compiler | GNU CC version 4.4.5.
     SELinux | No SELinux detected
# Processor ##################################################
  Processors | physical = 1, cores = 2, virtual = 2, hyperthreading = no
      Speeds | 2x2535.000
      Models | 2xIntel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz
      Caches | 2x3072 KB
# Memory #####################################################
       Total | 1.95G
        Free | 16.70M
        Used | physical = 1.93G, swap = 4.42M, virtual = 1.94G
     Buffers | 51.30M
      Caches | 1001.72M
        Used | 1.09G
...
# Install the compression tools
$ apt-get install xz-utils bzip2 lzop

# Prepare the data: three identical copies of a tarball of the 2.6.38 source tree
$ tar cf 1.tar linux-2.6.38 && cp 1.tar 2.tar && cp 1.tar 3.tar
$ ll 1.tar 2.tar 3.tar
-rw-r--r-- 1 yufeng yufeng 440494080 2011-03-17 18:17 1.tar
-rw-r--r-- 1 yufeng yufeng 440494080 2011-03-17 18:17 2.tar
-rw-r--r-- 1 yufeng yufeng 440494080 2011-03-17 18:17 3.tar
# Compress
$ time xz 1.tar 

real	5m4.269s
user	5m1.670s
sys	0m1.340s

$ time bzip2 2.tar

real	1m3.357s
user	1m1.490s
sys	0m0.610s

$ time lzop 3.tar

real	0m5.526s
user	0m3.550s
sys	0m0.490s

# Check the compressed sizes
$ ll 1.tar.xz 2.tar.bz2 3.tar.lzo
-rw-r--r-- 1 yufeng yufeng  65092368 2011-03-17 18:18 1.tar.xz
-rw-r--r-- 1 yufeng yufeng  76103454 2011-03-17 18:19 2.tar.bz2
-rw-r--r-- 1 yufeng yufeng 151462273 2011-03-17 18:17 3.tar.lzo

# Decompress to restore the originals
$ time xz -d 1.tar.xz 

real	0m8.507s
user	0m6.390s
sys	0m0.670s

$ time bzip2 -d 2.tar.bz2 

real	0m19.019s
user	0m17.420s
sys	0m0.670s

$ time lzop -d 3.tar.lzo 

real	0m6.261s
user	0m1.610s
sys	0m0.610s

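The data above show that xz's compression ratio really is good: the 440 MB tarball shrinks to about 65 MB, compared with about 76 MB for bzip2 and 151 MB for lzop. Decompression is fast too, at 8.5 s versus 19 s for bzip2. The weak spot is compression speed. At about 5 minutes, xz takes nearly five times as long as bzip2. Most of our workloads are reads, though, so this hardly matters, because a file gets compressed only a few times in its whole life.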
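To put the transcript's figures side by side, here is a small sketch that turns them into a compression ratio and a throughput for each tool. It uses only the numbers printed above: the 440494080-byte input, each output size, and the `real` (wall-clock) times. MB here means 10^6 bytes, and `compression_summary` is just a hypothetical helper name.

```shell
#!/bin/sh
# Recompute ratio and throughput from the benchmark transcript above.
# Each row: tool, compressed size (bytes), compress real time (s), decompress real time (s).
# MB means 10^6 bytes; ratio is compressed size as a percentage of the input.
compression_summary() {
  orig=440494080   # size of 1.tar / 2.tar / 3.tar in bytes
  printf '%s\n' \
    "xz 65092368 304.269 8.507" \
    "bzip2 76103454 63.357 19.019" \
    "lzop 151462273 5.526 6.261" |
  awk -v orig="$orig" '{
    printf "%-6s ratio %5.1f%%  compress %6.1f MB/s  decompress %6.1f MB/s\n",
           $1, 100 * $2 / orig, orig / $3 / 1e6, orig / $4 / 1e6
  }'
}

compression_summary
```

With these numbers, xz ends up at about 14.8% of the original size, compared with about 17.3% for bzip2 and 34.4% for lzop. It compresses at only about 1.4 MB/s, while bzip2 manages about 7.0 MB/s and lzop about 79.7 MB/s.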

I recommend using it in your projects. Linux itself has adopted it, so what are we afraid of?
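If you do adopt xz, the following minimal sketch shows everyday use. It assumes xz-utils and GNU tar are installed, and `sample.txt` and `xz_roundtrip` are hypothetical names. `-k` keeps the input file, `-t` tests the integrity of a `.xz` file, `-dc` decompresses to stdout, and GNU tar's `-J` runs the archive through xz. Presets `-0` to `-9` (default `-6`, plus `-e` for extra effort) trade compression time and memory for ratio.

```shell
#!/bin/sh
# Round-trip sketch: compress with xz, verify, decompress, and compare with the input.
# Runs in a subshell in a temporary directory so it leaves nothing behind.
xz_roundtrip() (
  set -e
  workdir=$(mktemp -d)
  cd "$workdir"
  seq 1 100000 > sample.txt          # hypothetical test data
  xz -k sample.txt                   # writes sample.txt.xz, keeps sample.txt
  xz -t sample.txt.xz                # check the integrity of the .xz file
  xz -dc sample.txt.xz | cmp - sample.txt
  echo "xz round-trip OK"
  tar cJf archive.tar.xz sample.txt  # GNU tar: -J compresses the archive with xz
  tar xJOf archive.tar.xz sample.txt | cmp - sample.txt   # -O extracts to stdout
  echo "tar round-trip OK"
  cd / && rm -rf "$workdir"
)

xz_roundtrip
```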

Have fun!

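P.S.
Reader 刘晓东 (Liu Xiaodong) writes:

I just came across pxz (http://jnovy.fedorapeople.org/pxz/), a parallel version of XZ. Its goal, of course, is to speed up xz compression while changing xz's high compression ratio as little as possible.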
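For comparison, later xz releases (5.2 and newer) added built-in multithreaded compression through `-T`/`--threads`. The sketch below assumes such a version is installed, and `xz_parallel` is a hypothetical helper name. With `-T0`, xz picks a thread count matching the available CPU cores. It splits large inputs into independent blocks and compresses them in parallel, which usually costs a little compression ratio, much like the trade-off pxz aims for.

```shell
#!/bin/sh
# Sketch assuming xz >= 5.2 (built-in multithreaded compression).
# -T0: let xz choose the number of threads from the available CPU cores.
xz_parallel() (
  set -e
  in=$1
  xz -k -T0 "$in"                # writes "$in.xz" and keeps "$in"
  xz -dc "$in.xz" | cmp - "$in"  # confirm the output decompresses to the input
  echo "$in.xz OK"
)

# Example with the benchmark tarball (fails if 1.tar.xz already exists):
# xz_parallel 1.tar
```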


  1. abutter
    March 17th, 2011 at 18:52 | #1

    Its compression algorithm is LZMA2, heh. So really it's LZMA that deserves the credit.

    Yu Feng Reply:

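    The documentation is clear on this: it's a successor that adds many optimizations, including a number of filters tuned to the characteristics of particular kinds of data.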

  2. March 17th, 2011 at 19:45 | #2

    It really is powerful. I'll be using xz from now on.

  3. March 24th, 2011 at 09:25 | #3

    Judging from your numbers, bzip2 looks like the real all-round winner.

    Yu Feng Reply:

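    The right move is to pick whatever fits the characteristics of your own workload. An overall comparison probably doesn't mean much.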

  4. fighting-lxd
    March 24th, 2011 at 11:05 | #4

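    Great article. I just found pxz (http://jnovy.fedorapeople.org/pxz/), a parallel version of XZ. Its goal, of course, is to speed up compression while keeping xz's high compression ratio as intact as possible.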

    Yu Feng Reply:

    Thanks for the tip, I've added it to the post!

