

RE2/J: Linear time regular expression matching in Java - fmela
https://github.com/google/re2j

======
mumrah
Lucene has had this capability for a while now. A regexp query (as defined by
the RegExp class) is compiled into an automaton for quick comparison with the
inverted index, also represented as an automaton. Automatons and FSTs are used
extensively by Lucene these days. Out of curiosity, I did some playing around
with Lucene's RegExp class when it first came out (or rather, when I first
learned about it). It provides a really interesting way to build regular
expressions:
[https://gist.github.com/mumrah/6104234](https://gist.github.com/mumrah/6104234)

Javadoc:
[https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/...](https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/util/automaton/RegExp.html)

Source: [https://github.com/apache/lucene-
solr/blob/trunk/lucene/core...](https://github.com/apache/lucene-
solr/blob/trunk/lucene/core/src/java/org/apache/lucene/util/automaton/RegExp.java)

~~~
chrisconroy
Related: the dk.brics.automaton library
([http://www.brics.dk/automaton/](http://www.brics.dk/automaton/)) is
excellent.

It's got a much different API than java.util.RegEx, but it let's you work with
Automatons as first class things instead of operating just with regexes. Being
able to compute intersections, unions, shortest match examples, from multiple
automata etc.. can be really useful.

------
userbinator
In addition to writing RE2, Russ Cox has a set of very readable articles on
implementing efficient regular expressions here:
[http://swtch.com/~rsc/regexp/](http://swtch.com/~rsc/regexp/)

------
dsymonds
Obligatory comment about how if you can't match it in linear time then it's
not a regular expression.

~~~
peterashford
So Perl and Python don't have regex then?

~~~
sanxiyn
No, they are not regular.

See
[http://en.wikipedia.org/wiki/Regular_language](http://en.wikipedia.org/wiki/Regular_language)
for more.

------
_benedict
Something missing from all such libraries that I find a little disappointing,
is the ability to compile multiple regular expressions into a single
evaluation with a paired action on the result, so that the regexp becomes a
general purpose map with string domain. At its most basic, each regexp could
then be paired with a different replacement operation. Or an arbitrary more
complex action.

With a DFA representation (linear time evaluation), this is pretty powerful.
I've a regexp library that supports this, that I've been meaning to spruce up
for wider release, but it's been several years and I've yet to do it.

[https://code.google.com/p/jjoost/source/browse/#hg%2Fjjoost-...](https://code.google.com/p/jjoost/source/browse/#hg%2Fjjoost-
text%2Fsrc%2Forg%2Fjjoost%2Ftext%2Fpattern)

