
Some Strategies for Fast Lexical Analysis When Parsing Programming Languages - userbinator
http://nothings.org/computer/lexing.html
======
choeger
I wonder who the audience for an article like this is. JavaScript engines
would probably profit here, given the vast amount of code we download every
day. Maybe even browsers in general. Maybe shells.

But your average compiler or interpreter? The thing is, new languages change
so often: do you really want to regenerate a lexer table for every new
keyword? Or complicate your build system by not using the industry-standard
tools? And if you have an old language, do you really want to touch a lexer
that some bearded hacker cobbled together decades ago so it works both on a
PDP-11 and some long-forgotten microcomputer?

~~~
archi42
Maybe people who will one day work on compilers?

I learned a lot writing a compiler for a subset of C over one semester (in
C++), and just recently built a "compiler" that is used in-house for a one-
shot conversion task. Without the prior knowledge that would have been a pain
(the input language is poorly documented); with it, I just wrote the whole
tokenizer/parser/emitter chain in perl5 (pattern matching is awesome, and
performance doesn't matter as long as I can parse in the range of 10k to 50k
lines per minute).
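
For a flavor of how small that tokenizer stage can be, here is a rough C
sketch of the same idea (the token kinds and names are invented for
illustration; my actual tool is perl regexes, not this):

    #include <ctype.h>
    #include <stdio.h>

    /* Hypothetical token kinds for a toy language. */
    typedef enum { TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_EOF } TokKind;

    typedef struct {
        TokKind kind;
        const char *start;  /* points into the source buffer */
        int len;
    } Token;

    /* Scan one token starting at *p and advance *p past it. */
    static Token next_token(const char **p)
    {
        const char *s = *p;
        while (isspace((unsigned char)*s)) s++;  /* skip whitespace */

        Token t = { TOK_EOF, s, 0 };
        if (*s == '\0') { *p = s; return t; }

        if (isalpha((unsigned char)*s) || *s == '_') {  /* ident/keyword */
            const char *e = s + 1;
            while (isalnum((unsigned char)*e) || *e == '_') e++;
            t.kind = TOK_IDENT;
            t.len = (int)(e - s);
        } else if (isdigit((unsigned char)*s)) {        /* integer literal */
            const char *e = s + 1;
            while (isdigit((unsigned char)*e)) e++;
            t.kind = TOK_NUMBER;
            t.len = (int)(e - s);
        } else {                                /* single-char punctuation */
            t.kind = TOK_PUNCT;
            t.len = 1;
        }
        *p = s + t.len;
        return t;
    }

    int main(void)
    {
        const char *src = "x1 = 42 + foo;";
        Token t;
        while ((t = next_token(&src)).kind != TOK_EOF)
            printf("kind %d, text '%.*s'\n", t.kind, t.len, t.start);
        return 0;
    }

The parser and emitter stages then consume these tokens; for a one-shot tool
that loop is all the lexer you need.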

------
dbcurtis
I have always been a fan of flex. The thing that impressed me in this table is
the speed-up from the -F and -f options -- I didn't realize it could make a 4X
difference.

Anyway, in this benchmark the author gets about a 15% speed-up going hand-
coded over -f. Given that, my take-away is that flex is the right answer for
almost all lexing applications, because most of the time maintainability is
going to win over getting the last 15%. If you absolutely _need_ that last
15%, you know it, and you're living a specialized life.
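
For the curious: one common trick hand-coded lexers lean on for that last
15% is a 256-entry character-class table, so the hot loop does one array
lookup per byte instead of a chain of ctype calls. A hedged C sketch (the
class names are invented, and this is just one such technique, not
necessarily the article's exact approach):

    #include <stdio.h>

    /* Hypothetical character classes; a real lexer has more. */
    enum { C_OTHER, C_SPACE, C_ALPHA, C_DIGIT };

    static unsigned char cclass[256];

    static void init_classes(void)
    {
        for (int c = 0; c < 256; c++) cclass[c] = C_OTHER;
        cclass[' '] = cclass['\t'] = C_SPACE;
        cclass['\n'] = cclass['\r'] = C_SPACE;
        for (int c = 'a'; c <= 'z'; c++) cclass[c] = C_ALPHA;
        for (int c = 'A'; c <= 'Z'; c++) cclass[c] = C_ALPHA;
        cclass['_'] = C_ALPHA;
        for (int c = '0'; c <= '9'; c++) cclass[c] = C_DIGIT;
    }

    /* Hot loop: one table lookup per byte.  '\0' stays C_OTHER,
       so the scan stops at the end of the buffer. */
    static const char *skip_identifier(const char *p)
    {
        while (cclass[(unsigned char)*p] == C_ALPHA ||
               cclass[(unsigned char)*p] == C_DIGIT)
            p++;
        return p;
    }

    int main(void)
    {
        init_classes();
        const char *src = "foo_bar42 + 1";
        printf("identifier is %d chars long\n",
               (int)(skip_identifier(src) - src));
        return 0;
    }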

~~~
auggierose
fun fact: the guy behind flex is also the guy behind seL4

