
> Hand-rolled recursive descent parsers as well as parser combinators can easily obscure implicit resolution of grammar ambiguities.

Could you give a concrete, real-life example of this? I have written many recursive-descent parsers and have never run into this problem (Apache Jackrabbit Oak SQL and XPath parser, H2 database engine, PointBase Micro database engine, HypersonicSQL, NewSQL, Regex parsers, GraphQL parsers, and currently the Bau programming language).

I have often heard that Bison / Yacc / ANTLR etc. are "superior", but mostly from people who didn't actually have to write and maintain production-quality parsers. I do have experience with the above parser generators, e.g. for university projects, and Apache Jackrabbit (2.x). I remember that in each case, the parser generators had some "limitations" that caused problems down the line, and I then had to spend more time working around those limitations than doing productive work.

This may sound harsh, but well, that's my experience... I would love to hear from people who had a different experience on non-trivial projects...




If you start with an unambiguous grammar then you aren't going to introduce ambiguities by implementing it with a recursive descent parser.

If you are developing a new grammar, it is quite easy to accidentally create ambiguities, and a recursive descent parser won't highlight them; it will just silently pick one interpretation. This becomes painful when you try to evolve the grammar.
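For example (a minimal, hypothetical sketch in Python, not taken from any project named above): the classic dangling-else ambiguity. A hand-rolled recursive descent parser resolves it implicitly and silently, whereas an LALR generator such as yacc/bison would at least report a shift/reduce conflict for the same grammar.

    # Hypothetical sketch: a tiny hand-rolled recursive descent parser for
    #     stmt := "if" NAME "then" stmt ["else" stmt] | "print" NAME
    # The grammar has the classic dangling-else ambiguity, but this code
    # never reports it -- it silently binds each "else" to the nearest "if".
    def parse_stmt(toks):
        if toks and toks[0] == "if":
            toks.pop(0)
            cond = toks.pop(0)                  # condition is a single name here
            assert toks.pop(0) == "then"
            then_branch = parse_stmt(toks)
            else_branch = None
            if toks and toks[0] == "else":      # implicit disambiguation:
                toks.pop(0)                     # the "else" attaches to the
                else_branch = parse_stmt(toks)  # innermost "if"
            return ("if", cond, then_branch, else_branch)
        assert toks.pop(0) == "print"
        return ("print", toks.pop(0))

    # "if a then if b then print x else print y" parses without any warning,
    # even though the "else" could plausibly belong to either "if".
    print(parse_stmt("if a then if b then print x else print y".split()))

Nothing in the code forces you to notice that a decision was made; the same grammar fed to an LALR generator would surface the conflict, and you would have to resolve it explicitly.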


The original comment says that using yacc/bison is "fundamentally misguided." But parser generators make it easy to add a correct parser to your project. It's obviously not the only way. Hand-rolling has a bunch of pitfalls, and easily leads to apparently correct behavior that does weird things on untested input. Your comment is a bit like saying: I've never had memory corruption in C, so Rust/Java/etc. are for toy projects only.


> Hand-rolling has a bunch of pitfalls

I'm arguing that this is not the case in practice, and I asked for concrete examples... So again, I ask for a concrete example... For memory corruption, there are plenty of examples.

For parsing, I know of one example that led to problems. Interestingly, it involved a state machine that was then modified manually, and the result was broken. Here I argue that using a handwritten parser, instead of a state machine that is then manually modified, would not have resulted in this problem. Also, there was no randomized testing / fuzz testing, which is a problem in itself. This issue is still open: https://issues.apache.org/jira/browse/OAK-5367


There's no need for concrete examples, because the point was about the claim that parser generators are fundamentally misguided, not about problems with individual parser generators or the nice things you can do in a hand-rolled one. But to accommodate you, ANTLR gives one on its home page: "... At Twitter, we use it exclusively for query parsing in Twitter search... Samuel Luckenbill, Senior Manager of Search Infrastructure, Twitter, inc."

Also, regexps are used very often in production, and a regex engine is definitely a parser generator of sorts.

The memory corruption example was an analogy, but to spell it out: it's easier and faster to write a correct parser using flex/bison than by hand, especially for more complex languages. Parser generators have their uses and are not fundamentally misguided. That you might want to write your own parser in some cases does not diminish that (nor vice versa).



