
Parser Generators Considered Useless? - dmoney
http://alumnit.ca/~apenwarr/log/?m=200902#20
======
scott_s
_And languages that are more complicated or have tight parser performance
requirements - like C++ compilers or Ruby interpreters - tend to have hand-
rolled parsers because the automatic parser generators can't do it._

It's actually common for people to use a parser generator when implementing a
compiler for a language. He's looking at the tip of the iceberg - quality
compilers for popular languages - and assuming that's all there is.

If you're by yourself or on a small team, it's a waste of your time to build a
parser for a language from scratch. You might be able to produce a better-
performing parser if you implement it yourself, but the performance of your
compiler isn't even a second-order concern; it's a third- or fourth-order
concern. I say this because you're going to care about the performance of the
code you _generate_ far before you care about the performance of the compiler
itself. On par with that, of course, are language features. If your language
features require a change in the grammar, it's much easier to change a grammar
file than to muck around in your own implementation.

Popular compilers do take time to optimize parsing, but very few compilers
have to handle the strain that the likes of gcc and MSVC do.

 _Unfortunately, that seems to be where my formal education ends, because I
just can't figure out why lexical analysis is supposed to be so difficult._

It's not difficult. It's just tedious. Which is why the formal theory exists:
given the description of all tokens in a language, we can generate a state
machine which will tokenize the input stream.
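As a rough sketch of that idea: given a table of token descriptions (the names and patterns below are illustrative, not from the article), a tokenizer can be built mechanically, which is exactly the tedium a lexer generator automates. Here the `re` module's compiled regex plays the role of the generated state machine:

```python
import re

# Token descriptions: (name, pattern). A lexer generator would compile
# these into a DFA; Python's re module does the equivalent work for us.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(text):
    """Yield (kind, lexeme) pairs from the input stream; whitespace is skipped."""
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("x = 40 + 2")))
```

The point is that nothing here is clever: once the token table exists, the rest follows mechanically, so generating it is the natural move.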

------
mtarnovan
Those _Big Fancy Professional XML Libraries_ are complicated (internally)
because they can parse DTD- and XSD-specified XML and do transforms with
XSLT, something the author's naïve implementation is far from.

------
russell
The author is talking about using parser generators for parsing XML. Well,
duh. That's huge overkill. There are so many decent libraries out there that
it makes no sense to roll your own. If a DOM-based parser is too cumbersome
(say, for giant files), use SAX and grab what you want on the fly. Even if I
were dealing with high volumes, like crawler output, I would still go with
SAX until I figured out where my bottlenecks were.
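For illustration, here is a minimal sketch of the streaming approach the commenter describes, using Python's stdlib `iterparse` (an event-driven cousin of SAX). The `<title>` tag, the sample document, and the `titles` helper are hypothetical, chosen just to show "grab what you want on the fly":

```python
import io
import xml.etree.ElementTree as ET

def titles(stream):
    """Stream-parse XML, keeping only <title> text and discarding the rest,
    so memory stays flat even for giant files."""
    found = []
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "title":
            found.append(elem.text)
        elem.clear()  # free the subtree we've already processed
    return found

doc = (b"<feed><entry><title>one</title></entry>"
       b"<entry><title>two</title></entry></feed>")
print(titles(io.BytesIO(doc)))  # ['one', 'two']
```

Unlike a DOM parse, nothing outside the elements you asked for is ever retained, which is why this style scales to crawler-sized inputs.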

