Hacker News new | past | comments | ask | show | jobs | submit login

Parser combinators let you put a breakpoint on a rule and get a call trace. (Same with plain old recursive descent or anything just involving function calls.)

That alone pretty much wipes out any remaining advantages of parser generators, in which the "cow" of your rules has been turned into a "hamburger" state machine that is very difficult to follow, usually having very poor debug support compared to the maturity of the rest of the tooling. ("How come if I add a variant to ths rule, I get a reduce/reduce conflict in these five other rules elsewhere? Waaaah ...")

Lest there be any doubt: GCC uses a hand-written recursive descent parser for C++. (Meg and a half of code and increasing.)

That's virtually a proof that just parsing with functional decomposition is good enough for anything.

Another thing is that with functional parsing, you can use the exception handling (or other non-local, dynamic control transfers) of the programming language for recovery and speculative parsing. This parse didn't work? Chuck the whole damn branch of the parse with a dynamic return, and try going down another rabbit hole.

That's more a proof that the grammar of C++ is absolutely terrible. More modern languages such as rust have been carefully designed to be parsable by non-necronomicon-level code.

> That's more a proof that the grammar of C++ is absolutely terrible. More modern languages such as rust have been carefully designed to be parsable by non-necronomicon-level code.

I don't believe your comment is fair or correct, and even very naive. Considering GCC's case, what makes a parser complex is not the language grammar itself, but all the requirements set onto the compiler to enable it to churn out intelligible warnings and error messages, which means supporting typical errors as extensions of the language grammar.

Furthermore, GCC's C++ parser support half a dozen different versions of C++ and all the error and warning messages that come with targetting a language standard but using a feature not supported by it. Rust has no such requirement, nor it will have any time soon.

Your comment strikes me as the old and faded tendency to throw baseless complains about tried and true technology by pointing out how green and untested tech is somehow better because it's yet to satisfy real-world requirements.

You are incorrect: the grammar of both C and C++ is fundamentally hard to parse, let alone the extra complexity of giving nice error message and doing good recovery. E.g. they require context to parse https://stackoverflow.com/a/41331994/1256624 , https://en.m.wikipedia.org/wiki/Most_vexing_parse , and C++ is undecidable to parse correctly http://blog.reverberate.org/2013/08/parsing-c-is-literally-u... .

> That's more a proof that the grammar of C++ is absolutely terrible.

Oh, I took that for granted as the basis of my remarks: far from a random choice on my part.

File that next to the 'proof' that the grammar of English is absolutely terrible. More modern languages such as Esperanto have been carefully designed to be learnable by regular human beings.

I know which language I'd rather be fluent in.

What is this trying to say? If a crucial portion of your job depends on the ability to parse a language (and it does, if you're a programmer who uses an IDE), then that's a point in favor of a language that's LL(1) rather than context-dependent. Making an analogy to natural languages here isn't relevant.

In the context of the discussion, it has been pointed out that parser generators are not used in practice in multiple very successful compilers for very successful real world languages. The grandparent to my comment claimed that the fact that GCC uses a hand written recursive descent parser is proof that that approach to parsing is good enough for anything. The parent claimed that no, it's a proof that the grammar of C++ is 'terrible'.

My point was that grammatical purity doesn't appear to correlate particularly in the real world with the success/popularity of a language. My analogy to natural languages was relevant in that context. If parser generators are not well suited to some very successful existing languages, it's not particular useful to blame that on the languages and point to currently niche (and relatively young) languages that don't have that 'problem'. In the real world, the most used languages have and will continue to have for the foreseeable future 'terrible' grammars so there will continue to be a need for parsing techniques that can handle them.

I'd actually speculate further that the analogy to natural languages is relevant in that it may not be a coincidence that the most used languages have some of the most complex grammars. Why that might be the case is an interesting question to think about.

> that's a point in favor of a language that's LL(1) rather than context-dependent

and 5 points in favor of a language that's LL(0) rather than LL(1). The logical conclusion of your argument is to use Lisp everywhere.

Lisp is LL(1). If we have #, we don't know what we're looking at until we read the next character, like #s structure, #( vector, #= circle notation, etc.

Actually there can be an integer between the two; but that doesn't change what kind of syntax is being read so it arguably doesn't push things to LL(2).

Other examples: seeing (a . we don't know whether this is the consing dot notation, or the start of a floating-point token.

Speaking of tokens, the Lisp token conversion rules effectively add up to LL(k). 12345 could be an integer or symbol. If the next character is, say, "a" and the token ends, we get a symbol. Basically if we see a token constituent character then k more characters have to be scanned before we can decide what kind of token and consequently what object to reduce to.

That variety of Lisp with its particular brand of syntactic sugar is LL(1), and its lexing LL(k). Because of its circumfix notation, a Lisp language can be LL(0) when the prefix symbols and tokens are defined appropriately. That was the point of my original comment.

Applications are open for YC Summer 2021

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact