

Rust's regex-dna benchmark results - brson
http://benchmarksgame.alioth.debian.org/u64/performance.php?test=regexdna

======
brson
This uses a pure-Rust regex engine[1] by Andrew Gallant.

[1]: https://github.com/rust-lang/regex

------
towelguy
Wait, the second fastest is the JavaScript one?

~~~
eddyb
I'm not surprised, v8 has a RegExp JIT (irregexp IIRC) which used to be 20x
faster than SpiderMonkey's RegExp for simple patterns.

Back when I was playing with parsers in JS, the fastest approach to anything
string-related was through RegExp, even trivial tasks like "first character is
X".

The caveat, of course, was that you would only get that performance under v8.

Nowadays it should be more balanced, but I have no idea how v8 compares to SM
anymore.

------
cmrx64
Impressive! What was done to pull ahead?

~~~
burntsushi
Main ingredients:

1\. Cache matching engine state. (This is kind of a no-brainer. Previously,
new matching state was heap allocated for _every_ execution of the automaton.
Fixing this didn't make it go fast, but it made it so it wasn't embarrassingly
slow.)

2\. Do more analysis on regexes to find literal prefixes. Note the plural: if
multiple literal prefixes are found, then they are compiled down to an Aho-
Corasick DFA.

3\. AC gets us most of the way there, but the extra squeeze is to pre-compute
all failure transitions and stuff the DFA into a transition matrix. Each byte
in the input now corresponds to a single state transition. (Because of the
memory overhead, this is only done for small numbers of small prefixes. But
this turns out to cover lots of real-world use cases; it's not just for the
benchmark.)
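
To make points 2 and 3 concrete, here is a minimal sketch of an Aho-Corasick automaton whose failure transitions are all pre-computed into a dense transition table, so each input byte costs exactly one table lookup. It is an illustration only, not the code in the regex crate: the type `PrefixDfa`, its methods, and the table layout are hypothetical names chosen for this example, and it assumes a small set of non-empty literals.

```rust
use std::collections::VecDeque;

/// One column per possible byte value.
const ALPHABET: usize = 256;
/// Marks a trie edge that has not been created yet (construction only).
const NONE: usize = usize::MAX;

/// Dense Aho-Corasick DFA over bytes (hypothetical type for illustration).
pub struct PrefixDfa {
    /// `trans[state * ALPHABET + byte]` is the next state.
    trans: Vec<usize>,
    /// `matched[state]` is the index of a literal ending at this state, if any.
    matched: Vec<Option<usize>>,
}

impl PrefixDfa {
    /// Builds the DFA from a small set of non-empty literals.
    pub fn new(literals: &[&str]) -> PrefixDfa {
        // Phase 1: insert every literal into a trie stored as a flat table.
        let mut trans = vec![NONE; ALPHABET];
        let mut matched: Vec<Option<usize>> = vec![None];
        for (i, lit) in literals.iter().enumerate() {
            let mut s = 0;
            for b in lit.bytes() {
                let idx = s * ALPHABET + b as usize;
                if trans[idx] == NONE {
                    trans[idx] = matched.len();
                    trans.extend(std::iter::repeat(NONE).take(ALPHABET));
                    matched.push(None);
                }
                s = trans[idx];
            }
            if matched[s].is_none() {
                matched[s] = Some(i);
            }
        }

        // Phase 2: breadth-first pass that resolves every failure transition
        // up front, so no failure links need to be followed while searching.
        let mut fail = vec![0usize; matched.len()];
        let mut queue = VecDeque::new();
        for b in 0..ALPHABET {
            if trans[b] == NONE {
                trans[b] = 0; // the root loops to itself on unknown bytes
            } else {
                queue.push_back(trans[b]);
            }
        }
        while let Some(s) = queue.pop_front() {
            // A state also matches if its longest proper suffix state matches.
            if matched[s].is_none() {
                matched[s] = matched[fail[s]];
            }
            for b in 0..ALPHABET {
                let idx = s * ALPHABET + b;
                // The failure state is shallower, so its row is already complete.
                let via_fail = trans[fail[s] * ALPHABET + b];
                if trans[idx] == NONE {
                    trans[idx] = via_fail;
                } else {
                    fail[trans[idx]] = via_fail;
                    queue.push_back(trans[idx]);
                }
            }
        }
        PrefixDfa { trans, matched }
    }

    /// Returns `(literal index, end offset)` for the first position in
    /// `haystack` at which some literal ends.
    pub fn find(&self, haystack: &[u8]) -> Option<(usize, usize)> {
        let mut s = 0;
        for (i, &b) in haystack.iter().enumerate() {
            // Exactly one table lookup per input byte.
            s = self.trans[s * ALPHABET + b as usize];
            if let Some(lit) = self.matched[s] {
                return Some((lit, i + 1));
            }
        }
        None
    }
}
```

With the literals `he`, `she`, and `hers`, `PrefixDfa::new(&["he", "she", "hers"]).find(b"ushers")` reports literal 1 (`she`) ending at offset 4. The table costs 256 entries per state, which is the memory overhead that limits this layout to small sets of short prefixes.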

There were some other optimizations that have less impact, such as a bounded
backtracking engine, use of `memchr` and an inline representation of
`Option<char>` (this last one is suspect).
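
As a rough illustration of where `memchr` helps, the sketch below shows one common pattern: skip straight to each occurrence of a literal's first byte and only then verify the rest of the literal. This is an assumption about the general technique, not a description of how the regex crate uses it. `find_byte` is a hypothetical standard-library stand-in for the real `memchr`, and `find_literal` is likewise a name invented for this example.

```rust
/// Stand-in for `memchr`: offset of the first occurrence of `needle`.
/// (The real `memchr` is a heavily optimized libc routine; this is only a sketch.)
fn find_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Finds the first occurrence of a non-empty `literal` in `haystack` by
/// jumping between candidate positions of the literal's first byte.
/// Returns `None` for an empty literal.
pub fn find_literal(literal: &[u8], haystack: &[u8]) -> Option<usize> {
    let (&first, _) = literal.split_first()?;
    let mut start = 0;
    while start < haystack.len() {
        // Skip everything that cannot begin a match.
        let candidate = start + find_byte(first, &haystack[start..])?;
        if haystack[candidate..].starts_with(literal) {
            return Some(candidate);
        }
        start = candidate + 1;
    }
    None
}
```

For example, `find_literal(b"gat", b"ccgagatta")` rejects the candidate at offset 2 (`gag`) and returns `Some(4)`.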

Still lots more to do though. :-)

