

PIRE – really fast single-pass regex matching - alexkon
https://github.com/dprokoptsev/pire

======
alexkon
Here are the results of some sample benchmarks. These are the same as those
used in the paper "Regular Expression Matching in the Wild" by Russ Cox [1].

    
    
      500 MB file, regex: .*$
      pcre     31.67 MB/s
      re2     242.28 MB/s
      pire    756.32 MB/s
      
      500 MB file, regex: ABCDEFGHIJKLMNOPQRSTUVWXYZ$
      pcre    153.67 MB/s
      re2     653.76 MB/s
      pire    755.98 MB/s
      
      2 MB file, regex: (\d{3}-|\(\d{3}\)\s+)(\d{3}-\d{4})$
      re2     423.76 MB/s
      pire    775.89 MB/s
    

Source: Yandex, the company where PIRE originated [2] (in Russian).

[1] <http://swtch.com/~rsc/regexp/regexp3.html>

[2] <http://clubs.ya.ru/company/replies.xml?item_no=30753>

~~~
xentronium
There are two more interesting projects mentioned in the article that
originated at Yandex:

<https://github.com/highpower/xiva> -- an asynchronous web server for the
server-side part of HTML5 WebSockets

<https://github.com/khanton/NwSMTP> -- they say it's an "nginx for SMTP", a
proxy server they use for providing SSL, RBL, antispam, and antivirus checks.

------
jdludlow
"Pire does not have any Perlish conditional regexps, lookaheads &
backtrackings, greedy/nongreedy matches; neither has it any capturing
facilities."

Well that rules out nearly every use case I have for needing a regex in the
first place.
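
For anyone wondering why a matcher like this ends up with no captures or backtracking, here is a rough sketch of table-driven, single-pass DFA matching. This is not Pire's code or API; the pattern, the state numbering, and the names `TRANSITIONS`, `char_class`, and `dfa_matches` are all made up for illustration. The point is only that each input character is read once and the sole output is a yes/no answer.

```python
# Hand-built DFA for the pattern [0-9]+-[0-9]+ (whole-string match).
# Illustrative only: states and names are assumptions, not taken from Pire.
DEAD = -1

TRANSITIONS = {
    (0, "digit"): 1,  # first digit of the left-hand number
    (1, "digit"): 1,  # more digits on the left
    (1, "dash"): 2,   # the separator
    (2, "digit"): 3,  # first digit of the right-hand number
    (3, "digit"): 3,  # more digits on the right
}
ACCEPTING = {3}


def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if "0" <= ch <= "9":
        return "digit"
    if ch == "-":
        return "dash"
    return "other"


def dfa_matches(text):
    """Scan the text once, left to right; never revisit a character."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)), DEAD)
        if state == DEAD:
            return False  # no transition: the input cannot match
    return state in ACCEPTING
```

The only thing carried between characters is one integer, so there is nowhere to remember where a group started. That is roughly the trade-off the README quote describes.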

~~~
Groxx
Greedy / non-greedy is one of the main things I use in regexes (well, second
to capturing). That sucks. Though if it's non-capturing, I guess it makes
sense.

How many people _do_ use regexes for boolean operations? I can only think of
an instance or two where I have, aside from regular input validation.

~~~
jemfinch
Most uses of non-greedy matching can be replaced with faster, clearer, and
more precise inverted character classes, in my experience.
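
A small sketch of what that replacement looks like, using Python's standard `re` module (the sample text and variable names are just for illustration). The lazy `.*?` and the negated class `[^"]*` find the same quoted strings here, but the second form does not depend on non-greedy semantics at all.

```python
import re

text = 'say "hello" and "goodbye"'

# Non-greedy version: depends on the lazy quantifier to stop at the first quote.
lazy = re.compile(r'"(.*?)"')

# Negated character class: the class itself cannot cross a quote,
# so a plain greedy quantifier is enough.
negated = re.compile(r'"([^"]*)"')

lazy_hits = lazy.findall(text)
negated_hits = negated.findall(text)
```

One difference to keep in mind: `.` does not match a newline by default, while `[^"]` does, so the two forms can disagree on multi-line input.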

------
talklittle
Cool. A very similar (concept-wise, not yet sure about implementation-wise)
DFA/NFA library in Java, developed by Anders Møller at Aarhus University:

<http://www.brics.dk/automaton/>

I use this on Android and it performs substantially faster than
java.util.regex, of course working within the constraints of DFA/NFA as
opposed to the "Perlish" regular expressions.

Edit: And dk.brics.automaton has its own C reimplementation. I wonder how the
speed and functionality compare to PIRE?

<http://augeas.net/libfa/>

------
thibaut_barrere
The joys of naming: "pire" in French means "worst".

~~~
robrenaud
Maybe it is an appropriate term here? Pire throws out the advanced/complex
features of Perl compatible regular expressions, so the authors are arguably
following the "Worse is Better"[1] paradigm.

* Simplicity: The design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface. Simplicity is the most important consideration in a design.

[1] <http://en.wikipedia.org/wiki/Worse_is_better>

