
Step 1) Install pyre2 2) import re2 as re 3) 10x faster regex - axiak
http://github.com/axiak/pyre2
======
logic
If you're not familiar with re2, you may be familiar with a couple of other
little projects that the author, Russ Cox, is involved with: Go
(<http://golang.org/>) and Plan 9 (<http://swtch.com/plan9port/>).

Also, here's a great bit of technical history behind Russ' re2 implementation,
and why pyre2 will never be completely compatible with Python's re (without
fallback):

<http://swtch.com/~rsc/regexp/regexp1.html>

<http://swtch.com/~rsc/regexp/regexp2.html>

<http://swtch.com/~rsc/regexp/regexp3.html>

~~~
pfedor
Google Code Search was his intern project:
[http://googlecode.blogspot.com/2006/10/search-worlds-
public-...](http://googlecode.blogspot.com/2006/10/search-worlds-public-
source-code.html)

------
natmaster
What kind of licensing does this have?

It would be interesting to see if this could make it in Python 3.2 (It's
supposed to include the Unladen Swallow stuff, so might as well take more
awesomeness from Google.)

~~~
kingkilr
It won't. It doesn't implement the entire regular expression language Python
supports.

~~~
dkarl
Could a regex library parse a regular expression and decide reasonably quickly
which engine to use?

~~~
kingkilr
Probably, but python-core has no interest in maintaining multiple regular
expression implementations: [http://mail.python.org/pipermail/python-
dev/2010-July/101606...](http://mail.python.org/pipermail/python-
dev/2010-July/101606.html) (this thread starts talking about regex, which is a
backwards compatible enhancement to re, but also covers re2)

------
democritus
I tried using pyre2 and experienced runaway memory usage. I was using
re.split. The behaviour went away when I switched to the standard re module.
(without changing my code.)

Did anyone else experience this?

The code in question was:

(a, b) = re.split("\n\s*\n", text, maxsplit=1)

Thank you for any insights you may have regarding this.

------
sandGorgon
Could this be uploaded to pypi, so that it is available via pip ?

pip install pyre2 -E /tmp/testing

~~~
maxiak
If you read the readme under installation, you'll find that it is.

pip install re2 -E /tmp/testing

------
ww520
I'm curious. How much of a speedup in regex when using DFA instead of NFA? I
believe most regex implementation use NFA but there are well known algorithms
to convert NFA to DFA. It should be worth the try.

~~~
ori_b
The problem is that in the worst case, converting an NFA to a DFA gives an
overhead of 2^n, while keeping it as an NFA only incurs an overhead of m
(where m is the regex length).

So, in the worst case (ie, under algorithmic complexity attacks), the NFA
implementation is O(m * n), while the complexity of the DFA implementation is
- I believe - O(2^m * n)

~~~
koenigdavidmj
And fortunately, if re2 can not handle one of the regexen that you throw at
it, then it will pass it down to Python's original re module (NFA_based).
Perfectly seamless.

~~~
ori_b
Both re2 and Python are NFA based. The difference, I believe, is that re2
doesn't support backtracking, which changes the worst-case time from linear
growth to exponential growth on the input length. (For the snobs, that means
that re2 regexes are regular expressions. Python's aren't.)

~~~
btilly
The difference is that re2 avoids backtracking in most cases where Python's
normal regular expression engine would use it. And by avoiding backtracking,
avoids the possibility of exponential slowdowns.

~~~
ori_b
Isn't that what I just said?

~~~
btilly
No. "Doesn't support backtracking" suggests that it doesn't support features
that re backtracks for. In fact it does.

Also it is technically possible for re2 to (although it doesn't currently do
this) add backtracking to its current strategy. If it did this, then it could
offer all features of re, but would frequently be much faster. However I would
not permanently rule that possibility out.

~~~
ori_b
I'm talking about the C++ re2 engine, not the Python wrapper.

~~~
btilly
So was I.

