

Killing Yacc: 1, 2 & 3 - Garbage
http://blogs.perl.org/users/jeffrey_kegler/2010/12/killing-yacc-1-2-3.html

======
haberman
To me Yacc is a perfect example of how people's pain points can lead them to
the wrong conclusion about what needs to be improved.

Pain point: debugging shift/reduce conflicts in Yacc can make a person want to
pull out their own toenails.

Irrational conclusion: whatever replaces Yacc should accept any CFG. No
annoying errors, yippee!

But there are two major problems with accepting arbitrary CFGs, as Russ Cox
pointed out in his essay: you give up guaranteed linear-time parsing, and you
don't know if your grammar has ambiguities.

For ambiguities, Russ used the example of 1+1+1. I think a much better example
would have been 1-1-1, because (1-1)-1 and 1-(1-1) give you different answers
(-1 and 1)!
You don't want to be flying blind about whether your language has ambiguities
or not. It would be like having a regular programming language guess what you
meant instead of giving you a syntax error.
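To make the 1-1-1 point concrete, here is a small Python sketch (not taken
from the article or from Russ Cox's essay) that enumerates every parse tree of
a token list under the deliberately ambiguous toy grammar `E -> E '-' E | NUM`
and evaluates each tree. The grammar and the names `parses` and `evaluate` are
illustrative assumptions.

```python
# Deliberately ambiguous toy grammar (assumed for illustration):
#     E -> E '-' E | NUM
# Choosing a different '-' as the root operator yields a different parse tree.

def parses(tokens):
    """Return every parse tree for `tokens` (a list like ['1', '-', '1']).

    A tree is either an int (a NUM leaf) or a tuple (left, '-', right).
    """
    if len(tokens) == 1:
        return [int(tokens[0])]
    trees = []
    # Operators sit at odd indices; try each one as the root.
    for i in range(1, len(tokens), 2):
        for left in parses(tokens[:i]):
            for right in parses(tokens[i + 1:]):
                trees.append((left, "-", right))
    return trees


def evaluate(tree):
    """Compute the value of a parse tree produced by `parses`."""
    if isinstance(tree, int):
        return tree
    left, _, right = tree
    return evaluate(left) - evaluate(right)


if __name__ == "__main__":
    for tree in parses("1 - 1 - 1".split()):
        print(tree, "=", evaluate(tree))
```

It prints two trees for 1-1-1, one evaluating to 1 and the other to -1. With
four operands there are already five trees; for this grammar the count grows
with the Catalan numbers.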

What if instead, the tool was actually good at helping you understand why your
grammar isn't being accepted? Imagine if the error message was something like:

"The string 1+1+1 could be interpreted as either <foo> or <bar>"

...where "foo" and "bar" are convenient ASCII representations of the parse
trees (better than what I could demonstrate with HN formatting limitations).
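One hedged way to produce a message like that is a bounded brute-force
search: enumerate every terminal string up to some length, collect its parse
trees, and report the first string that has two. This is only a sketch and not
what yacc does (yacc reports LALR conflicts, not ambiguous strings). The
grammar encoding and the names `all_parses`, `show`, and `find_ambiguity` are
hypothetical. It assumes no empty productions and no cycles of unit
productions. Since ambiguity is undecidable in general, it is a
semi-decision procedure: a hit proves the grammar ambiguous, but finding
nothing up to `max_len` proves nothing, and the search is exponential in
`max_len`.

```python
from functools import lru_cache
from itertools import product

# Hypothetical grammar encoding: nonterminal -> list of right-hand sides.
# Any symbol that is not a key is a terminal. Assumes no empty (epsilon)
# productions and no cycles of unit productions, so the recursion terminates.


def all_parses(grammar, start, tokens):
    """Return every parse tree of `tokens` derived from `start`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        # Trees for `symbol` spanning tokens[i:j].
        if symbol not in grammar:  # terminal: must match exactly one token
            return (symbol,) if j == i + 1 and tokens[i] == symbol else ()
        return tuple((symbol, kids)
                     for rhs in grammar[symbol]
                     for kids in seq(rhs, i, j))

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        # Child tuples for the symbol sequence `rhs` spanning tokens[i:j].
        if not rhs:
            return ((),) if i == j else ()
        out = []
        # Every symbol consumes at least one token, so leave room for the rest.
        for k in range(i + 1, j - len(rhs) + 2):
            for first in trees(rhs[0], i, k):
                for rest in seq(rhs[1:], k, j):
                    out.append((first,) + rest)
        return tuple(out)

    return trees(start, 0, len(tokens))


def show(tree):
    """Render a tree with parentheses around each multi-symbol production."""
    if isinstance(tree, str):
        return tree
    _, kids = tree
    inner = " ".join(show(k) for k in kids)
    return inner if len(kids) == 1 else f"({inner})"


def find_ambiguity(grammar, start, max_len):
    """Shortest-first search for a string with two parse trees, or None."""
    terminals = sorted({s for rhss in grammar.values()
                        for rhs in rhss for s in rhs if s not in grammar})
    for n in range(1, max_len + 1):
        for tokens in product(terminals, repeat=n):
            found = all_parses(grammar, start, tokens)
            if len(found) >= 2:
                return tokens, found[0], found[1]
    return None


if __name__ == "__main__":
    grammar = {"E": [("E", "-", "E"), ("1",)]}
    hit = find_ambiguity(grammar, "E", max_len=5)
    if hit:
        tokens, a, b = hit
        print(f"The string {''.join(tokens)} could be interpreted as "
              f"either {show(a)} or {show(b)}")
```

On the toy grammar it reports 1-1-1 with both groupings. Rewriting the rule
left-recursively as `{"E": [("E", "-", "1"), ("1",)]}` makes the search come
back empty, though only up to the chosen bound.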

~~~
qntm
That's a great error message, but is it generally possible to take an
arbitrary CFG and say whether or not the language it generates contains one or
more strings with ambiguous interpretations? Even if it's possible, and such
strings could be demonstrated, is it computationally practical to find them?

I'm just asking, I seriously don't know.

EDIT: Ah, I guess yacc does this already.

EDIT2: Okay, so since we're talking about building faster parsers, not
building parsers faster, and shift/reduce conflicts are detected at parser
build time rather than parse time, is there any reason why you can't perform
that ambiguity check with yacc and then build your parser the new way?

~~~
haberman
> is it generally possible to take an arbitrary CFG and say whether or not the
> language it generates contains one or more strings with ambiguous
> interpretations?

No, unfortunately it is undecidable, as are many interesting questions you
wish you could ask about grammars.

However, it is often possible. Even when it is not possible to definitively
say that the grammar is _ambiguous_, it should be possible to say:

"When I see the string <foobar> I can't decide between <x> and <y>."

That's (sort of) what yacc is doing with its shift/reduce conflict errors. I
think it's generally easier to understand _why_ this happens with top-down
(LL) parsers than with bottom-up ones like yacc, though. Top-down parsers are
closer to the way people actually think. That's why I favor an ANTLR-like
approach.
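For comparison, here is a hedged sketch of the top-down style: a hand-written
recursive-descent parser for the ANTLR-like EBNF rule `expr : NUM ('-' NUM)* ;`
(an illustrative rule, not one taken from the thread). Writing the repetition
as a loop instead of a left-recursive rule removes the ambiguity by
construction, the left-associative grouping of 1-1-1 is visible in the code,
and a bad token produces an error that says what the parser expected. The
class and method names (`Parser`, `ParseError`, `expr`, `number`) are
assumptions.

```python
import re


class ParseError(Exception):
    """Raised when the input does not match the rule."""


class Parser:
    """Recursive-descent parser for the illustrative EBNF rule
        expr : NUM ('-' NUM)* ;
    One token of lookahead is enough to decide what to do next (LL(1)).
    """

    def __init__(self, text):
        # Numbers become one token; any other non-space character is its own token.
        self.tokens = re.findall(r"\d+|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def number(self):
        tok = self.peek()
        if tok is None or not tok.isdecimal():
            raise ParseError(f"expected a number at token {self.pos}, got {tok!r}")
        self.pos += 1
        return int(tok)

    def expr(self):
        tree = self.number()
        # The repetition is a loop, not left recursion; each pass folds the new
        # operand onto the left, so 1-1-1 always groups as ((1-1)-1).
        while self.peek() == "-":
            self.pos += 1
            tree = (tree, "-", self.number())
        return tree

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise ParseError(f"unexpected token {self.peek()!r} at {self.pos}")
        return tree


if __name__ == "__main__":
    print(Parser("1-1-1").parse())
    try:
        Parser("1--1").parse()
    except ParseError as err:
        print("error:", err)
```

Running it prints the left-associated tree for 1-1-1 and, for 1--1, an error
naming the token position where a number was expected.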

------
theoj
Even if it turns out that the new parsing algorithm doesn't shine in all
situations, I think it is nevertheless a welcome development. Compiler and
interpreter makers can use this algorithm for those grammars and languages
where it actually outperforms Yacc.

~~~
sharkbot
However, the article points out that we can only use the algorithm when we
know it outperforms Yacc; read "know" as "proven theoretically".

