Hacker News new | past | comments | ask | show | jobs | submit login
A new Regular Expression Matching paper ‎(Philip Bille)‎ (philipbille.org)
12 points by carterschonwald on July 20, 2009 | hide | past | favorite | 4 comments



To save time for those after the paper itself (actually an extended abstract, but that's good enough for me for now):

http://www.philipbille.org/files/c10.pdf?attredirects=0

Executive summary (from me, a hasty reader and a sloppy thinker): It matches multiple characters at a time by taking advantage the fact that, typically, more than one character its in a machine word. This quickly becomes hairy, leading to over 50% of the remaining text written in the LaTeX math environment.

Less jokingly, in order to understand what's going on, I think you need to have read+understood Myers' regex algorithm first, which speeds up the straightforward Thompson NFA O(nm) matching by using a table scheme, which they then extend to support performing transitions for multiple characters at once.

Finally ... I'm not sure how much practical use this improvement really is, given:

> "Today, regular expressions have also become a shining example of how ignoring good theory leads to bad programs. The regular expression implementations used by today's popular tools are significantly slower than the ones used in many of those thirty-year-old Unix tools."

From http://swtch.com/~rsc/regexp/regexp1.html


This is about as awesome hackery piece of theoretical computer science as you can get, the first asymptotic improvement to regular expression matching in at least 17 years!


to quote from the paper's abstract, the improvement is We show how to solve this ubiquitous task in linear space and O(nm(log log n)/(log n)3/2 +n+m) time where m is the length of the expression and n the length of the string. This is the first improvement for the dominant O(nm/ log n) term in Myers’ O(nm/ log n + (n + m) log n) bound [JACM 1992]. We also get improved bounds for external memory.


Mind boggling work!




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: