
A Regular Expression Matcher in 30 lines of C - coderdude
http://www.cs.princeton.edu/courses/archive/spr09/cos333/beautiful.html
======
gjm11
It's very nice pattern-matching code, but it isn't a regular expression
matcher. The class of patterns it's able to match is much smaller than the
class of all regular expressions -- not because of missing "sugar" (character
classes, +, non-greedy wildcards, ...) but because it doesn't support groups
or the | operator.

As a result, the language supported by this matcher has, e.g., no way to match
the pattern (ab)* or the pattern a|b. It's much, much less powerful than an
actual regular expression matcher would be, and much of what makes it possible
to do it in 30 lines of C is that loss of power.

(I've written extremely similar code before: this level of functionality --
basically, glob or DOS wildcards -- is pretty useful. I'm not Rob Pike, and my
code for similar functionality would probably be longer than his. But my code,
or even Rob Pike's code, for even the simplest thing that could honestly be
called a regular expression matcher, would be longer than this by a bigger
factor.)

------
chanux
I see the OP has discovered
[[http://www.hnsearch.com/search#request/submissions&q=cod...](http://www.hnsearch.com/search#request/submissions&q=coderdude&start=0)]
r/tinycode which was posted on HN few days back
[<http://news.ycombinator.com/item?id=2717279>].

------
antirez
I wrote a similar glob-style string matching, but it's far more than 30 lines
(130 lines of C):

[https://github.com/antirez/redis/blob/unstable/src/util.c#L1...](https://github.com/antirez/redis/blob/unstable/src/util.c#L11)

I use it in Redis, but in general being "glob matching" more standard compared
to a regexp subset people are more likely to know the syntax.

------
ihodes
That was a superb read. Given the author(s) (who I noticed only after
reading), I suppose that's to be expected. I'd like to see a similarly elegant
version that translates the RE to a DFA–if anyone has any suggestions, I'd be
quite happy!

------
slug
This an excerpt of the excellent book
<http://oreilly.com/catalog/9780596510046>

------
perlgeek
Yes, matching regular expressions is easy, if you don't care about speed or
features. Like, being able to match a literal dot or asterisk. Or char
classes, or getting case insensitivity right (like the German ß that matches
SS, so case folding needs not to preserve string length).

Still nice to see that it can be done in so few lines.

------
shangaslammi
Here's my attempt at implementing the same functionality in Haskell:
<https://gist.github.com/1062472>

I tried to make it short, but still somewhat readable, so it's not completely
"golfed".

------
sheffield
"In 1967, Ken [Thompson] applied for a patent ..."

Wow, he pioneered that, too...

------
swah
That code is dense.

