Home > Linux, 源码分析 > fastsearch快速字符串查找算法

fastsearch快速字符串查找算法

March 15th, 2011

原创文章,转载请注明: 转载自系统技术非业余研究

本文链接地址: fastsearch快速字符串查找算法

最近在做一个项目需要涉及到快速的字符串匹配,每秒几十万次的那种。之前我用过linux内核的的textsearch库的KMP,BM,FSM的算法觉得还不错,这几个算法用于Linux网络模块的关键词过滤系统,支持非线性的字符查找,但是对性能还是不够印象深刻。于是我想起了python的fastsearch. Python这样的脚本语言字符查找用的非常的密集,所以这个算法是非常的高效的,可以说这个算法很大程度影响着python的性能。我们来看下 作者的网站 怎么说的:

The Fast Search Algorithm (aka “BMHBNFS” 😉 #

The new find implementation is typically 2-10 times faster than the one used in Python 2.3 (using a reasonably representative set of test strings, that is). For the find portions of the stringbench test suite, the new algorithm is up to 26 times faster.

我们经过测试发现这个算法确实比KMP, KR算法快非常多, 但是因为这个比较涉及到字符串的模式,结果会偏差非常多,评测结果比较没说服力,有兴趣的同学可以自己去做下比较, 我这里就不做比较了。

最后看下Objects/stringlib/fastsearch.h文件里面的注释:

6 /* fast search/count implementation, based on a mix between boyer-
7 moore and horspool, with a few more bells and whistles on the top.
8 for some more background, see: http://effbot.org/stringlib.htm */
9
10 /* note: fastsearch may access s[n], which isn’t a problem when using
11 Python’s ordinary string types, but may cause problems if you’re
12 using this code in other contexts. also, the count mode returns -1
13 if there cannot possible be a match in the target string, and 0 if
14 it has actually checked for matches, but didn’t find any. callers
15 beware! */

源码在这里, 记得翻墙。
Happy hacking!

Post Footer automatically generated by wp-posturl plugin for wordpress.

Comments are closed.