So I'm not sure what the point is of the title of the article.
Using a combinator or generator makes sense when you really care just about getting trees out as fast as possible; e.g. what a command line batch compiler or interpreter does.
However, in general, you can easily shoot yourself in the foot with a handwritten parser, because you can't see the conflicts by looking at the code locally.
Conflicts is generally one of the least interesting problems in writing a parser unless you're designing a new language, and especially when hand writing recursive descent parsers, potential conflicts/ambiguities tend to become obvious quickly.
For prototyping a new language, sure, use a parser generator to test the grammar.
I too fall in the category of loving the idea of parser generator (and having written a few) but always falling back on hand-written parsers because the generators I've seen have all been inadequate. I hope that chances some day.
That's not a theoretical issue, but more a practical issue.
> you'll have to hack around the generator's limitations
You're probably thinking of LALR(1) class generators. But we have GLR parser generators for a long time now, and they are very flexible and have little limitations. See e.g. .
The rest of the paper elaborates on the title.
That alone pretty much wipes out any remaining advantages of parser generators, in which the "cow" of your rules has been turned into a "hamburger" state machine that is very difficult to follow, usually having very poor debug support compared to the maturity of the rest of the tooling. ("How come if I add a variant to ths rule, I get a reduce/reduce conflict in these five other rules elsewhere? Waaaah ...")
Lest there be any doubt: GCC uses a hand-written recursive descent parser for C++. (Meg and a half of code and increasing.)
That's virtually a proof that just parsing with functional decomposition is good enough for anything.
Another thing is that with functional parsing, you can use the exception handling (or other non-local, dynamic control transfers) of the programming language for recovery and speculative parsing. This parse didn't work? Chuck the whole damn branch of the parse with a dynamic return, and try going down another rabbit hole.
I don't believe your comment is fair or correct, and even very naive. Considering GCC's case, what makes a parser complex is not the language grammar itself, but all the requirements set onto the compiler to enable it to churn out intelligible warnings and error messages, which means supporting typical errors as extensions of the language grammar.
Furthermore, GCC's C++ parser support half a dozen different versions of C++ and all the error and warning messages that come with targetting a language standard but using a feature not supported by it. Rust has no such requirement, nor it will have any time soon.
Your comment strikes me as the old and faded tendency to throw baseless complains about tried and true technology by pointing out how green and untested tech is somehow better because it's yet to satisfy real-world requirements.
Oh, I took that for granted as the basis of my remarks: far from a random choice on my part.
I know which language I'd rather be fluent in.
My point was that grammatical purity doesn't appear to correlate particularly in the real world with the success/popularity of a language. My analogy to natural languages was relevant in that context. If parser generators are not well suited to some very successful existing languages, it's not particular useful to blame that on the languages and point to currently niche (and relatively young) languages that don't have that 'problem'. In the real world, the most used languages have and will continue to have for the foreseeable future 'terrible' grammars so there will continue to be a need for parsing techniques that can handle them.
I'd actually speculate further that the analogy to natural languages is relevant in that it may not be a coincidence that the most used languages have some of the most complex grammars. Why that might be the case is an interesting question to think about.
and 5 points in favor of a language that's LL(0) rather than LL(1). The logical conclusion of your argument is to use Lisp everywhere.
Actually there can be an integer between the two; but that doesn't change what kind of syntax is being read so it arguably doesn't push things to LL(2).
Other examples: seeing (a . we don't know whether this is the consing dot notation, or the start of a floating-point token.
Speaking of tokens, the Lisp token conversion rules effectively add up to LL(k). 12345 could be an integer or symbol. If the next character is, say, "a" and the token ends, we get a symbol. Basically if we see a token constituent character then k more characters have to be scanned before we can decide what kind of token and consequently what object to reduce to.
One of those handled most of C99 standard.