
Regular expressions in lexing and parsing (2011) - astdb
https://commandcenter.blogspot.com/2011/08/regular-expressions-in-lexing-and.html
======
eesmith
> It's not too hard to write the regexp (something like
> "[a-zA-Z_][a-zA-Z_0-9]*"), but really not much harder to write as a simple
> loop. The
> performance of the loop, though, will be much higher and will involve much
> less code under the covers.
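
For concreteness, here is a rough sketch of the two approaches the quote compares, for the identifier case. Python is used purely for illustration, and the names scan_identifier and scan_identifier_re are made up for this example; nothing here is taken from the article.

```python
import re

# Regexp version of the identifier pattern from the quote.
IDENT_RE = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")


def _is_ident_start(ch: str) -> bool:
    # ASCII letters and underscore only, mirroring [a-zA-Z_].
    return ch == "_" or "a" <= ch <= "z" or "A" <= ch <= "Z"


def scan_identifier(text: str, pos: int = 0) -> int:
    """Hand-written loop: return the end index of the identifier
    starting at pos, or pos itself if no identifier starts there."""
    if pos >= len(text) or not _is_ident_start(text[pos]):
        return pos
    end = pos + 1
    while end < len(text) and (
        _is_ident_start(text[end]) or "0" <= text[end] <= "9"
    ):
        end += 1
    return end


def scan_identifier_re(text: str, pos: int = 0) -> int:
    """Regexp version with the same contract as scan_identifier."""
    m = IDENT_RE.match(text, pos)
    return m.end() if m else pos
```

Both functions return the same end index for the same input, so the regexp version can serve as a cross-check on the loop. This sketch says nothing about relative speed, which depends on the language and the regexp engine.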

My experience with using Ragel to generate lexers is that it emits
high-performance code from that sort of regexp.

A couple of weeks ago I looked at a hand-written lexer for integers. It was
supposed to match "-?[0-9]+". It ended up allowing values like 00-123 because
of a bug in the '-' detection code.

Finding that bug took some close reading.
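
To illustrate the kind of code involved (this is not the lexer I looked at, just a minimal Python sketch with made-up names), a hand-written matcher for "-?[0-9]+" has to get three things right: the sign is only allowed at the very start, at least one digit must follow, and nothing may be left over. Checking it against the regexp on a few inputs is a cheap way to catch the "00-123" class of bug.

```python
import re

# The specification the hand-written code is supposed to implement.
INT_RE = re.compile(r"-?[0-9]+")


def is_integer(text: str) -> bool:
    """Hand-written check that text matches -?[0-9]+ in full."""
    i = 0
    # An optional '-' is accepted only as the first character.
    if i < len(text) and text[i] == "-":
        i += 1
    start = i
    # Consume ASCII digits only, mirroring [0-9].
    while i < len(text) and "0" <= text[i] <= "9":
        i += 1
    # Require at least one digit and no trailing characters.
    return i > start and i == len(text)


def is_integer_re(text: str) -> bool:
    """Regexp version, used here as the reference implementation."""
    return INT_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["123", "-123", "00-123", "-", "", "--1", "12-3"]:
        assert is_integer(sample) == is_integer_re(sample), sample
```

The loop is short, but each of the three conditions is a separate place to slip, whereas the regexp states all of them in one line.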

Which means I'm not convinced about the conclusion:

> ... don't write lexers and parsers with regular expressions as the starting
> point. Your code will be faster, cleaner, and much easier to understand and
> to maintain.

