
A new Regular Expression Matching paper ‎(Philip Bille)‎ - carterschonwald
http://www.philipbille.org/news/regularexpressionmatching
======
crux_
To save time for those after the paper itself (actually an extended abstract,
but that's good enough for me for now):

<http://www.philipbille.org/files/c10.pdf?attredirects=0>

Executive summary (from me, a hasty reader and a sloppy thinker): It matches
multiple characters at a time by taking advantage of the fact that, typically,
more than one character fits in a machine word. This quickly becomes hairy,
leading to over 50% of the remaining text being written in the LaTeX math
environment.

Less jokingly, in order to understand what's going on, I think you need to
have read and understood Myers' regex algorithm first. That algorithm speeds
up the straightforward O(nm) Thompson NFA matching with a tabulation scheme,
which this paper then extends so that transitions for multiple characters can
be performed at once.
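For anyone who hasn't seen it, the O(nm) baseline being improved on is just
set-of-states NFA simulation: each of the n input characters updates a state
set of size at most m. A minimal sketch (the NFA here is hand-built for the
example pattern `a(b|c)*d`; a real matcher would compile it from the regex):

```python
# Minimal Thompson-style NFA simulation: the O(nm) baseline that Myers'
# tabulation scheme, and this paper, speed up.
# Hand-built NFA for the pattern a(b|c)*d; EPS marks epsilon edges.
EPS = None
nfa = {
    0: {'a': {1}},
    1: {EPS: {2, 5}},        # loop entry: try (b|c) or skip to d
    2: {'b': {4}, 'c': {4}},
    4: {EPS: {1}},           # back to the loop entry
    5: {'d': {6}},
}
START, ACCEPT = 0, 6

def eps_closure(states):
    """All states reachable from `states` via epsilon edges alone."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get(EPS, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def matches(text):
    """O(n*m): each of n characters updates a state set of size <= m."""
    current = eps_closure({START})
    for ch in text:
        nxt = set()
        for s in current:
            nxt |= nfa.get(s, {}).get(ch, set())
        current = eps_closure(nxt)
    return ACCEPT in current

print(matches("abcbd"))  # True
print(matches("ad"))     # True
print(matches("abx"))    # False
```

Myers' trick, roughly, is to pack these state sets into machine words and
precompute tables so the per-character update costs less than O(m).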

Finally ... I'm not sure how much practical use this improvement really is,
given:

> "Today, regular expressions have also become a shining example of how
> ignoring good theory leads to bad programs. The regular expression
> implementations used by today's popular tools are significantly slower than
> the ones used in many of those thirty-year-old Unix tools."

From <http://swtch.com/~rsc/regexp/regexp1.html>

------
carterschonwald
This is about as awesome a piece of theoretical-computer-science hackery as
you can get: the first asymptotic improvement to regular expression matching
in at least 17 years!

~~~
carterschonwald
to quote from the paper's abstract, the improvement is:

> "We show how to solve this ubiquitous task in linear space and
> O(nm(log log n)/(log n)^(3/2) + n + m) time, where m is the length of the
> expression and n the length of the string. This is the first improvement
> for the dominant O(nm/log n) term in Myers' O(nm/log n + (n + m) log n)
> bound [JACM 1992]. We also get improved bounds for external memory."
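Worth noting how the two dominant terms compare: their ratio works out to
sqrt(log n)/(log log n), independent of m. A quick back-of-the-envelope
(constants hidden by the O-notation are ignored, so this is only
illustrative):

```python
# Ratio of the dominant terms: O(nm / log n) versus
# O(nm (log log n) / (log n)^(3/2)). The nm factors cancel, leaving
# sqrt(log n) / log log n. Constants are ignored, so purely illustrative.
import math

def speedup(n):
    """Asymptotic ratio (nm/log n) / (nm (log log n)/(log n)^1.5)."""
    lg = math.log2(n)
    return math.sqrt(lg) / math.log2(lg)

for n in (10**6, 10**9, 10**12):
    print(f"n = 10^{round(math.log10(n))}: asymptotic speedup ~ {speedup(n):.2f}x")
```

The factor grows with n, but very slowly, which puts the "first asymptotic
improvement in 17 years" in perspective.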

------
yannis
Mind boggling work!

