

Implementing Regular Expressions - acqq
http://swtch.com/~rsc/regexp/

======
jules
Here is an excellent and readable paper on implementing regular expressions
<http://sebfisch.github.com/haskell-regexp/regexp-play.pdf>

They also extend it to weighted regular expressions and even lazy, infinite
regular expressions.

~~~
acqq
The "Implementing Regular Expressions" page gives the full (scientific)
background to

"re2 - An efficient, principled regular expression library"
<http://news.ycombinator.com/item?id=3204427>

~~~
luriel
Also, Russ Cox is one of the main hackers on the Google Go team and wrote the
new (pure) Go regexp engine based on his work on re2.

Of course, the Go team also includes Ken Thompson, who popularized regexps with
Unix, and Rob Pike of 'structural regular expressions' fame:
<http://doc.cat-v.org/bell_labs/structural_regexps/>

------
willvarfar
Excellent links. And this was just on my mind in fact.

In most web-servers there is a list of regexes - typically starting stems,
almost more glob-like - that are defined in precedence order to match requests
to handlers.

Are there any libraries that can treat this as a single effective operation,
rather than just testing against each regex as an island until there's a
match?
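
To make the question concrete, here is a minimal sketch of the "single operation" idea: join the routes into one alternation with a named group per route, so one call reports which route matched. The route names and patterns are made up for illustration, and the patterns are assumed to contain no capturing groups of their own. Python's `re` is a backtracking engine, so this is one call but not necessarily one pass over the input; an automaton-based engine would be needed for that guarantee.

```python
import re

# Hypothetical routes, listed in precedence order (first match wins).
ROUTES = [
    ("static", r"/static/.*"),
    ("user", r"/users/[0-9]+"),
    ("fallback", r"/.*"),
]

# One alternation with a named group per route. Python tries alternatives
# left to right, so the listed precedence order is preserved.
COMBINED = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in ROUTES))


def dispatch(path):
    """Return the name of the first route that matches the whole path, or None."""
    m = COMBINED.fullmatch(path)
    # lastgroup is the name of the route group that took part in the match.
    return m.lastgroup if m else None
```

Calling `dispatch("/users/42")` picks the `user` route even though `fallback` would also match, because `user` is listed earlier.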

~~~
acqq
Are you interested in the parsing of the HTTP requests? I think I've
recently seen a hand-coded, state-based C implementation (I guess it can
even be O(n)?) in some fast open-source HTTP server.

~~~
papaf
The last time I looked at nginx, it parsed the HTTP request with a hand-coded
state machine, but routes defined by regular expressions were matched one
expression at a time.

Routes that are not defined by regular expressions used a much faster
tree-based method.
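
As a rough illustration of why a tree helps with non-regex routes, here is a sketch of longest-prefix matching over a character trie. This is not nginx's actual code; the class name, method names, and handler strings are hypothetical. The lookup walks the request path once, so its cost depends on the length of the path rather than on the number of routes.

```python
class PrefixTrie:
    """Character trie mapping route prefixes to handlers (illustrative only)."""

    def __init__(self):
        self.children = {}
        self.handler = None

    def insert(self, prefix, handler):
        # Create one node per character of the prefix, then store the handler
        # on the node that ends the prefix.
        node = self
        for ch in prefix:
            node = node.children.setdefault(ch, PrefixTrie())
        node.handler = handler

    def longest_match(self, path):
        # Walk the path once, remembering the deepest handler seen so far.
        node, best = self, self.handler
        for ch in path:
            node = node.children.get(ch)
            if node is None:
                break
            if node.handler is not None:
                best = node.handler
        return best


routes = PrefixTrie()
routes.insert("/", "root")
routes.insert("/static/", "static")
routes.insert("/static/img/", "images")
```

With these routes, `routes.longest_match("/static/img/logo.png")` selects the `images` handler, since it is the longest registered prefix of that path.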

