
Rejit - prototype of a non-backtracking, just-in-time, SIMD-able regexp compiler - rames
http://coreperf.com/projects/rejit/
======
adobriyan
I'm writing _exactly_ the same thing except SSE 4.2 is not mandatory.

Called the project 'jitre'.

Damn it!

P.S. I suggest adding libpcre with and without JIT to the set of benchmarks.
RE2 is not exactly the speediest library.

~~~
rames
Anything working yet?

SSE4.2 is not mandatory. Rejit works without it as well (SSE support is detected
dynamically); SSE4.2 just gives excellent performance with a relatively simple
implementation. Support for earlier versions of SSE can easily be added.

I'll try to add libpcre to the benchmarks when I get some time.

~~~
adobriyan
Search for small enough fixed strings works (no runtime assembly though, the
script just generates C code and compiles it with gcc).
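
For illustration only, here is a minimal Python sketch of what such a generator could look like. It is not jitre's actual script; the names `generate_c_matcher` and `match_fixed` are made up, and the emitted C simply leans on `memchr` for the first byte and unrolls the comparison of the remaining bytes.

```python
def c_char(byte: int) -> str:
    """Render one byte as a C character literal using a hex escape."""
    return "'\\x%02x'" % byte


def generate_c_matcher(needle: bytes, func_name: str = "match_fixed") -> str:
    """Return C source for a function that finds `needle` in a buffer.

    The generated function returns the offset of the first match, or -1.
    Both names are hypothetical; nothing here comes from a real project.
    """
    if not needle:
        raise ValueError("needle must be non-empty")
    n = len(needle)
    # One unrolled comparison per byte after the first: the needle is
    # baked into the generated code instead of being looped over.
    checks = " && ".join(
        "p[%d] == %s" % (i, c_char(b)) for i, b in enumerate(needle) if i > 0
    ) or "1"
    lines = [
        "#include <stddef.h>",
        "#include <string.h>",
        "",
        "long %s(const char *s, size_t len)" % func_name,
        "{",
        "    const char *p = s;",
        "    const char *end = s + len;",
        "    while ((size_t)(end - p) >= %d) {" % n,
        "        /* memchr finds the next candidate first byte */",
        "        p = memchr(p, %s, (size_t)(end - p) - %d);"
        % (c_char(needle[0]), n - 1),
        "        if (p == NULL)",
        "            return -1;",
        "        if (%s)" % checks,
        "            return (long)(p - s);",
        "        p++;",
        "    }",
        "    return -1;",
        "}",
        "",
    ]
    return "\n".join(lines)
```

Writing `generate_c_matcher(b"yx")` to a file and building it with something like `gcc -O2 -c` would give an object file specialised for that one needle.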

------
nikic
Obligatory comment: Nowadays nearly nothing uses actually "regular" regexes,
which is also why regex engines typically use backtracking and not Thompson
NFAs.

For general-purpose applications (e.g. use in programming languages) people
usually want a regex flavor with support for at least backreferences and
typically also (recursive) subpattern references. Note: Adding support for
backreferences makes matching regexes an NP-complete problem (as opposed to a
simple linear-time algorithm without them).

(But of course, having fast NFA implementations for really-regular regexes is
still useful. After all a large part of the regexes you typically write will
not use backreferences etc.)

~~~
acdha
> Obligatory comment: Nowadays nearly nothing uses actually "regular" regexes

This is only partially true: while most people use implementations which
support back references, it's likely that a significant percentage of regular
expressions actually executed do not use that complexity. Given the widespread
use of regular expressions for input validation, log-file analysis, URL
dispatching in web frameworks, etc. there's a fair chance that a majority of
the regular expressions executed would benefit from this approach,
particularly since it would be trivial for an engine to transparently fall
back to the current implementation when it encounters a complex construct.
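
A rough sketch of that dispatch, assuming Python `re` syntax. The function names and the "linear"/"backtracking" labels are hypothetical, and only backreferences and lookaround are detected; a real engine would make this decision while parsing the pattern.

```python
def needs_backtracking(pattern: str) -> bool:
    """Return True if `pattern` uses a construct outside the regular subset.

    Only backreferences and lookaround are detected (Python `re` syntax).
    """
    i, n = 0, len(pattern)
    in_class = False
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            # \1..\9 outside a character class is a backreference.
            if not in_class and pattern[i + 1] in "123456789":
                return True
            i += 2  # skip the escaped character
            continue
        if in_class:
            if c == "]":
                in_class = False
        elif c == "[":
            in_class = True
            # A leading "]" (or "^]") is a literal member of the class.
            if pattern[i + 1:i + 2] == "^":
                i += 1
            if pattern[i + 1:i + 2] == "]":
                i += 1
        elif pattern.startswith(("(?P=", "(?=", "(?!", "(?<=", "(?<!"), i):
            # Named backreference or lookahead/lookbehind.
            return True
        i += 1
    return False


def choose_engine(pattern: str) -> str:
    """Pick the fast path unless the pattern needs the general engine."""
    return "backtracking" if needs_backtracking(pattern) else "linear"
```

A URL-dispatch pattern such as `^/user/(\d+)/$` stays on the fast path, while `(a+)\1` is routed to the general engine.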

~~~
nikic
Not disagreeing there. Fast NFA implementations are very nice for matching the
regular subset and falling back to a more general algorithm for the
non-regular cases :)

------
tambourine_man
Whenever I see something like this, I'm always reminded of this great post:

[http://ridiculousfish.com/blog/posts/old-age-and-treachery.h...](http://ridiculousfish.com/blog/posts/old-age-and-treachery.html)

~~~
adobriyan
Even the dumbest SSE2-based code will kick the living shit out of any
byte-at-a-time super-optimized grep.

> we shall make a one gigabyte file with one thousand x's per line, and time
> grep searching for "yy" (a two character best case) and "yx" (a two
> character worst case).

In this case SSE-based code will be a) several times faster because SSE loads
will be used, b) the time difference between the "yy" and "yx" cases will be
minimal (several ticks to accumulate the result from the second
PMOVMSKB+PCMPEQB combo), and c) the search time will be background-independent.
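
To make the data flow concrete, here is a plain-Python emulation of the two-compare scheme. It only models what PCMPEQB followed by PMOVMSKB computes on a 16-byte block and says nothing about speed; real code would use SSE2 intrinsics or assembly, and the helper names are made up for this sketch.

```python
def pcmpeqb_pmovmskb(block: bytes, byte: int) -> int:
    """Emulate PCMPEQB then PMOVMSKB: bit i is set iff block[i] == byte."""
    mask = 0
    for i, b in enumerate(block):
        if b == byte:
            mask |= 1 << i
    return mask


def find_pair(haystack: bytes, first: int, second: int) -> int:
    """Return the offset of the first `first, second` byte pair, or -1."""
    for base in range(0, len(haystack) - 1, 16):
        # Two overlapping 16-byte loads: one at base, one shifted by a byte.
        m1 = pcmpeqb_pmovmskb(haystack[base:base + 16], first)
        m2 = pcmpeqb_pmovmskb(haystack[base + 1:base + 17], second)
        # Bit i survives only if byte i matches `first` and byte i+1
        # matches `second`; this AND is the whole cost of the second byte.
        both = m1 & m2
        if both:
            # Lowest set bit is the first match (what BSF/TZCNT would give).
            return base + (both & -both).bit_length() - 1
    return -1
```

The same work is done per block whatever the block contains, which is the sense in which the search is background-independent.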

Paraphrasing Linux CodingStyle text, first off, I'd suggest printing out a copy
of the "why GNU grep is fast", and NOT read it. Burn them, it's a great
symbolic gesture.

A trivial exercise is to disassemble the x86-64 version of glibc memchr(3) and
try to outperform it with byte-at-a-time code.

Hint: you won't be able to do it.

------
Scaevolus
Very interesting! You should consider having a fast path for long string
literal prefixes.
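
Roughly what I mean, sketched in Python on top of the standard `re` module. This is not how Rejit does anything; `literal_prefix` and `search` are made-up names, the prefix extraction is deliberately conservative, and it assumes no regex flags are in play.

```python
import re

# Characters that end a run of plain literals in Python `re` syntax.
META = set(".^$*+?{}[]()|\\")


def literal_prefix(pattern: str) -> str:
    """Return a literal string every match must start with ("" if unknown)."""
    if "|" in pattern:
        return ""  # with alternation the prefix may not be mandatory
    i = 0
    while i < len(pattern) and pattern[i] not in META:
        i += 1
    # "*", "?" and "{" can make the preceding character optional,
    # so that character cannot be part of the mandatory prefix.
    if 0 < i < len(pattern) and pattern[i] in "*?{":
        i -= 1
    return pattern[:i]


def search(pattern: str, text: str):
    """Like re.search, but skips ahead with str.find on the literal prefix."""
    rx = re.compile(pattern)
    prefix = literal_prefix(pattern)
    if not prefix:
        return rx.search(text)
    pos = text.find(prefix)
    while pos != -1:
        # Only run the regex where the prefix actually occurs.
        m = rx.match(text, pos)
        if m:
            return m
        pos = text.find(prefix, pos + 1)
    return None
```

The longer the literal prefix, the fewer positions the regex engine proper has to look at, since the substring search rejects most of the input first.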

You might find DynASM[1] useful, as long as you're not doing dynamic register
allocation. Here's[2] a good tutorial.

    [1]: http://luajit.org/dynasm.html
    [2]: http://blog.reverberate.org/2012/12/hello-jit-world-joy-of-simple-jits.html

------
awda
Neat and unexpected result (for me): it seems like v8's re engine is very good
(fastest) for "practically long" strings (anything up to 256 B - 64 kB,
depending on the expression).

Possibly measurement error, though, considering JS time in milliseconds and
the different test methodology for v8 under a certain number of trials, as
compared with other engines?

------
chromaton
I found a clear explanation of the SIMD instructions here:
[http://www.strchr.com/strcmp_and_strlen_using_sse_4.2](http://www.strchr.com/strcmp_and_strlen_using_sse_4.2).
Apparently the instructions operate on strings up to 16 bytes in length.

~~~
ape4
I think it would be reasonable to have strlen, strchr and the other str*()
functions as processor instructions by now. With no 16-byte restrictions.

~~~
yoklov
Well it's 16 bytes at a time, and not 16 bytes maximum, but there are already
the (arguably misguided) repeat prefixes for string instructions, which are
designed for more or less that exact purpose.

------
CountHackulus
I wonder if the SIMD part of this could be integrated into D's compile-time
regex support. Would be neat to see.

