
Yacc is not dead - shadytrees
http://research.swtch.com/2010/12/yacc-is-not-dead.html
======
cletus
This is one area that has always disappointed me. Compiler-compilers just
never seem to have lived up to the promise of making it easier to define your
own languages. If anything, languages like Ruby have done a far better job
(albeit in a more limited way).

I wonder at the reasons for this. Is this simply the wrong theoretical
approach? If you look at formal grammars and how Yacc, ANTLR, etc deal with
them, they're intended to be unambiguous. This is actually a problem in many
ways.

PEGs (Parsing Expression Grammars) are a relatively new (~10 years)
development that I find far more interesting for this reason as they solve the
ambiguity problem by simply ranking the alternatives, which can have hideous
backtracking possibilities (and tend to expensive in space terms) but in
practice seem to lead to more natural definitions (at least in my limited
experience).

I just have to wonder if we're going about this whole language thing the wrong
way.

~~~
kragen
I agree that PEGs are more interesting.

Formally speaking, you can parse PEGs in worst-case linear time with a packrat
parser. Practically speaking, the constant factor on that worst case is still
much worse than the usual case (the number of nonterminals in your grammar is
a factor of the constant factor.)

I've written a one-page PEG compiler-compiler (interspersed with many pages of
text explaining it) at <https://github.com/kragen/peg-
bootstrap/blob/master/peg.md>.

------
jedbrown
And a response from the authors of the paper:
<http://matt.might.net/articles/parsing-with-derivatives/>
(<http://news.ycombinator.com/item?id=2001736>)

------
russell
I have used yacc and its offspring like bison for a long time. Writing the
grammars is really simple, but debugging them is really arcane. The problem is
that the place where the parser generator discovers the error can be some
distance from where you introduced the ambiguity. Grammars have their own
debugging style, particularly with yacc/bison output. Eventually you have an
aha moment and you start looking for an improper empty state or trying a
reduction at the wrong place in the parse tree. It isnt hard if you grasp it
and probably an intractable bitch if you dont.

I will agree however, that the parser and the bison source look like someone
barfed C all over the floor, unless someone has given it serious TLC in the
past dozen years.

------
6ren
What do people use yacc and other parsing tools for these days?

Looking at the uses page for ANTLR yesterday, it seems to be entirely DSL
(query languages etc) and language tools (Java src code transformers, coverage
checkers etc).

It seems that if you just need a text format, you'll probably settle for JSON
(if you're happy to output data-structures directly) or XML (if you need to
design a schema or interop with one). They are both general, non-tailored
formats, making specific formats easy to extend. Less specific, more flexible.

Recently it was said here that most modern programming languages
(surprisingly) use hand-made recursive descent parsers. This gives maximum
power and performance, at the cost of complexity and being a lot of work.

BTW: javacc is a nice update of yacc IMHO, using EBNF, eg. you can write
repetition with an * instead of breaking it into _A:: aA|-_ . But it doesn't
seem to have an active community around it (cf ANTLR), which is sadly a
crucial factor for practical use.

However, I personally have always written manual recursive descent parsers
(including XML formats), instead of using tools. For me, yacc et. al. just get
in the way. But it's such a cool idea! BTW: There's a body of research called
"grammarware", that didn't seem to take off; and I've heard that everyone who
looks at grammars believes that can make it simple and easy to use, and set
out to do so, but no one ever succeeds (...yet, anyway).
<http://www.cs.vu.nl/grammarware/>

In my humble (and I admit, extremely unpopular) opinion, XML represents an
amazing leap forward towards this goal: an XML Schema (or a DTD) is a grammar,
and many parsing issues just disappear using this simpler, less powerful
approach. For example, it has to be deterministic (aka 1-unambiguous); you
can't have recursion within the one level of an element (aka tree grammar). I
just wish it weren't so ghastly and soul-destroying to use. You have to look
past the surface to see its disguised beauty (its beauty is "very effectively
disguised").

 _funfact_ : Eric Schmidt wrote lex
<http://en.wikipedia.org/wiki/Lex_(software)> Now, you might pooh-pooh lex for
various reasons, but you can't call Eric a non-technical CEO.

~~~
regularfry
Just off the top of my head, Ruby uses a bison parser. They're also handy any
time you've got a syntax specified in a standard, like CSS.

~~~
6ren
thanks - found it was also yacc at one time:
[http://stackoverflow.com/questions/2646312/language-
parsers/...](http://stackoverflow.com/questions/2646312/language-
parsers/2647234#2647234)

Here's the spec itself: <https://github.com/Ruby/Ruby/blob/trunk/parse.y>

------
drallison
Mathew Might video "Yacc is Dead" at Stanford from Feb 2011.
<http://www.stanford.edu/class/ee380/permlinks/Might.html>. Slides are also
available.

