fastsearch快速字符串查找算法

Home > Linux, 源码分析 > fastsearch快速字符串查找算法

March 15th, 2011 Yu Feng

原创文章，转载请注明： 转载自系统技术非业余研究

最近在做一个项目需要涉及到快速的字符串匹配，每秒几十万次的那种。之前我用过linux内核的的textsearch库的KMP,BM,FSM的算法觉得还不错，这几个算法用于Linux网络模块的关键词过滤系统，支持非线性的字符查找，但是对性能还是不够印象深刻。于是我想起了python的fastsearch. Python这样的脚本语言字符查找用的非常的密集，所以这个算法是非常的高效的，可以说这个算法很大程度影响着python的性能。我们来看下作者的网站怎么说的：

The Fast Search Algorithm (aka “BMHBNFS” 😉 #

The new find implementation is typically 2-10 times faster than the one used in Python 2.3 (using a reasonably representative set of test strings, that is). For the find portions of the stringbench test suite, the new algorithm is up to 26 times faster.

我们经过测试发现这个算法确实比KMP, KR算法快非常多，但是因为这个比较涉及到字符串的模式，结果会偏差非常多，评测结果比较没说服力，有兴趣的同学可以自己去做下比较，我这里就不做比较了。

最后看下Objects/stringlib/fastsearch.h文件里面的注释:

6 /* fast search/count implementation, based on a mix between boyer-
7 moore and horspool, with a few more bells and whistles on the top.
8 for some more background, see: http://effbot.org/stringlib.htm */
9
10 /* note: fastsearch may access s[n], which isn’t a problem when using
11 Python’s ordinary string types, but may cause problems if you’re
12 using this code in other contexts. also, the count mode returns -1
13 if there cannot possible be a match in the target string, and 0 if
14 it has actually checked for matches, but didn’t find any. callers
15 beware! */

源码在这里，记得翻墙。
Happy hacking!

Post Footer automatically generated by wp-posturl plugin for wordpress.

Categories: Linux, 源码分析 Tags: BM, fastsearch, fsm, horspool, kmp, python, 匹配, 字符串

Comments are closed.

Linux 2.6.38 User-space interface for Crypto API smartctl获取raid卡下intel ssd寿命

系统技术非业余研究