Hacker News new | comments | show | ask | jobs | submit login

> Notably, Rust also edges out a multi-threaded C implementation for that one. I'm sure they've done some very hairy optimizations to get to the top on that board, but it's cool to see that it's possible.

The most important optimization I did was to avoid regex machinery as much as possible. In particular, Rust's regex library has very good support for prefix literal optimizations. In particular, regexes like `abc[x-z]foo|bar` will compile down to an aho-corasick[1] automaton over the strings `abcxfoo`, `abcyfoo`, `abczfoo` and `bar` with the failure transitions completely evaluated. (The end result is an automaton represented by a matrix. This is memory intensive, so only prefix literals of a certain size can be accommodated. But you don't need a lot to realize huge gains!)

I wrote more about it here: https://www.reddit.com/r/rust/comments/39unje/ahocorasick_fa...

Hint: most of the benchmark game regexes are relatively simple and compile down to either simple `memchr` calls (most regex engines will do that, nothing special) or a aho-corasick DFA that completely avoids the traditional regex evaluation machinery.

In general, regex implementations differ dramatically in the optimizations they perform. It's hard to draw conclusions about them without some explicit knowledge of how it is implemented.

[1] - https://github.com/BurntSushi/aho-corasick




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: