
Parsing regular expressions with recursive descent - joeyespo
http://matt.might.net/articles/parsing-regex-with-recursive-descent/
======
alexis-d
If you liked this you may also enjoy <http://genius.cat-v.org/brian-
kernighan/articles/beautiful> (simple regex engine in C by Rob Pike, article
written by Brian Kernighan).

------
colanderman
His other post on matching using derivatives is very interesting:
[http://matt.might.net/articles/implementation-of-regular-
exp...](http://matt.might.net/articles/implementation-of-regular-expression-
matching-in-scheme-with-derivatives/)

~~~
anamax
See Might's talk at <http://www.stanford.edu/class/ee380/ay1011.html> .

Parsing with derivatives can take advantage of the Pratt hack to go even
faster.

------
revelation
If you want to build a regular expressions parser as a learning exercise, you
might want to look into automatons. Lots of easy to grasp (and best of all:
visual) concepts to learn there.

~~~
dxbydt
Second that. If you are going to go down this route, I wrote a simple tutorial
with lots of visuals to construct a regular expression from a finite state
automaton using JFLAP ( jflap.org ), for the "inverse fizzbuzz" problem -
[http://www.jasq.org/2/post/2012/06/fbfbfbfbfffbfbffbffbfbffb...](http://www.jasq.org/2/post/2012/06/fbfbfbfbfffbfbffbffbfbffbfbffbfbffbfgfbffbfbffbfgfbffbfgffbfbffbfffbffbfbffbf01.html)

~~~
sureshv
This one's great as well for understanding RegEx FSM's:
<http://rise4fun.com/rex>

~~~
dxbydt
Hey thanks! I fed the inverse fizzbuzz regex to Rex and it came up with an
automaton with 81 states!
<http://www.jasq.org/uploads/1/0/5/6/10564637/6329774.png?694>

My original automaton only has 8 states. (
<http://www.jasq.org/uploads/1/0/5/6/10564637/977128_orig.png> ) I guess
that's the price you pay for cycles. ( Mine is a cyclic graph, but Rex
produces a directed acyclic graph. Since they both must capture the same
regex, one of them would be much larger than the other )

------
bediger4000
Russ Cox put the plan 9 grep's parser source on-line:
<http://swtch.com/usr/local/plan9/src/cmd/grep/grep.y>

Hey, it's a yacc-compatible grammar.

------
eliben
I wrote a detailed set of articles for GameDev back at the time (2004) that
goes the whole way - full regex matcher (for simple regexes) including NFA/DFA
construction.

First two articles:

* <http://eli.thegreenplace.net/files/docs/forays/col1.html>

* <http://eli.thegreenplace.net/files/docs/forays/col2.html>

The rest listed here: <http://eli.thegreenplace.net/articles/>

------
Confusion
This series of articles [1] by Russ Cox is also interesting for anyone
interested in parsing regular expressions.

[1] <http://swtch.com/~rsc/regexp/regexp1.html>

See at the bottom of that articles for links to followups

------
UK-AL
Heh, came just at the right time... About to do a project on regexs. I've
implemented a regex engine before, but never a the parser bit.

------
crazygringo
> _Every programmer ought to have the experience of implementing a tool for
> matching regular expressions from scratch._

Umm... I'm sure it's fascinating, but I've really got better things to do with
my time, which is why I use RegExs in the first place :)

~~~
daeken
You also probably have better things to do with your time than writing your
own webserver, or creating your own language and compiler, or creating a
database. But every time you do one of those things, you learn a _lot_ about
the things you actually use, and you end up being a far, far better programmer
on the whole. If you only do the things you _need to do_ , you're never going
to be a great programmer.

~~~
crazygringo
Well of course, I couldn't agree more!

My point was that the original post seemed really judgmental in implying that
building a RegEx engine ought to be done by everybody. Instead of "every
programmer ought to have the experience", they could have said, "it can be
valuable to have the experience".

But a blanket statement that _every_ programmer _should_ do X... is really
annoying and shows a very narrow worldview. Programmers are diverse, and do
lots of different things.

~~~
sophacles
Wow, overreaction... there is a standard human communication idiom that is
being followed here. For example movies are labeled as "must see", food is
described as "you have to try...", and so on. Everyone[1] understands this to
be a hyperbolic phrasing that isn't to be taken literally, but to really
emphasize how important the speaker[2] finds the experience. Relax a little on
the pedantry, it invalidates any statements made about better uses of one's
time.

[1] The sentence that references this footnote uses the phenomenon it
describes for emphasis.

[2] This is a similar, but distinct phenomenon of using a term, in this case
speaker, with the implicit understanding that it can be generalized, in this
case to "communicator". It shouldn't be confused with the situation described,
but is also a useful pattern to understand for basic communication.

