
> Hand-rolled recursive descent parsers as well as parser combinators can easily obscure implicit resolution of grammar ambiguities.

Could you give a concrete, real-life example of this? I have written many recursive-descent parsers and have never run into this problem (Apache Jackrabbit Oak SQL and XPath parser, H2 database engine, PointBase Micro database engine, HypersonicSQL, NewSQL, Regex parsers, GraphQL parsers, and currently the Bau programming language).

I have often heard that Bison / Yacc / ANTLR etc. are "superior", but mostly from people who didn't actually have to write and maintain production-quality parsers. I do have experience with the above parser generators, e.g. for university projects, and Apache Jackrabbit (2.x). I remember that in each case, the parser generators had some "limitations" that caused problems down the line, and I then had to spend more time working around those limitations than doing productive work.

This may sound harsh, but well, that's my experience... I would love to hear from people who had a different experience on non-trivial projects...




If you start with an unambiguous grammar then you aren't going to introduce ambiguities by implementing it with a recursive descent parser.

If you are developing a new grammar, it is quite easy to accidentally create ambiguities, and a recursive descent parser won't highlight them; it will just silently pick one interpretation. This becomes painful when you try to evolve the grammar.
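For example (a minimal, hypothetical sketch in Python, not taken from any project named above): the classic dangling-else ambiguity. A hand-rolled recursive descent parser resolves it implicitly and silently, whereas an LALR generator such as yacc/bison would at least report a shift/reduce conflict for the same grammar.

    # Hypothetical sketch: a tiny hand-rolled recursive descent parser for
    #     stmt := "if" NAME "then" stmt ["else" stmt] | "print" NAME
    # The grammar has the classic dangling-else ambiguity, but this code
    # never reports it -- it silently binds each "else" to the nearest "if".
    def parse_stmt(toks):
        if toks and toks[0] == "if":
            toks.pop(0)
            cond = toks.pop(0)                  # condition is a single name here
            assert toks.pop(0) == "then"
            then_branch = parse_stmt(toks)
            else_branch = None
            if toks and toks[0] == "else":      # implicit disambiguation:
                toks.pop(0)                     # the "else" attaches to the
                else_branch = parse_stmt(toks)  # innermost "if"
            return ("if", cond, then_branch, else_branch)
        assert toks.pop(0) == "print"
        return ("print", toks.pop(0))

    # "if a then if b then print x else print y" parses without any warning,
    # even though the "else" could plausibly belong to either "if".
    print(parse_stmt("if a then if b then print x else print y".split()))

Nothing in the code forces you to notice that a decision was made; the same grammar fed to an LALR generator would surface the conflict, and you would have to resolve it explicitly.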


The original comment says that using yacc/bison is "fundamentally misguided." But parser generators make it easy to add a correct parser to your project. It's obviously not the only way. Hand-rolling has a bunch of pitfalls, and easily leads to apparently correct behavior that does weird things on untested input. Your comment is a bit like saying: I've never had memory corruption in C, so Rust/Java/etc. are for toy projects only.


> Hand-rolling has a bunch of pitfalls

I'm arguing that this is not the case in practice, and I asked for concrete examples... So again, I ask for a concrete example... For memory corruption, there are plenty of examples.

For parsing, I know of one example that led to problems. Interestingly, it involved a state machine that was then modified manually, and the result was broken. Here I argue that using a handwritten parser, instead of a state machine that is then manually modified, would not have resulted in this problem. Also, there was no randomized testing / fuzz testing, which is a problem in itself. This issue is still open: https://issues.apache.org/jira/browse/OAK-5367


There's no need for concrete examples, because the point was about the claim that parser generators are fundamentally misguided, not about problems with individual parser generators or the nice things you can do in a hand-rolled one. But to accommodate you, ANTLR gives one on its home page: "... At Twitter, we use it exclusively for query parsing in Twitter search... Samuel Luckenbill, Senior Manager of Search Infrastructure, Twitter, inc."

Also, regexps are used very often in production, and a regex engine is definitely a parser generator of sorts.

The memory corruption example was an analogy, but to spell it out: it's easier and faster to write a correct parser using flex/bison than by hand, especially for more complex languages. Parser generators have their uses and are not fundamentally misguided. That you might want to write your own parser in some cases does not diminish that (nor vice versa).



