

Comparison of parser generators - wslh
http://en.wikipedia.org/wiki/Comparison_of_parser_generators

======
tlarkworthy
I wrote a simple compiler during uni, and a syntax highlighter for a
commercial product in 2002. I consider myself competant with grammars. Yet I
hardly ever deploy my parsing skills in everyday programming, even though I
know most protocol interpreters, file readers etc. would befitting from the
readability that grammar and parsing separation achieves. I never use parsers
because the "compile the compiler" then "compile the generated code" is
horribly cumbersome. Also, the time taken to remember grammar syntax and get
back in the mindset of grammar writing takes a 10 min regular expression hack
stretch into 2 weeks of dev time (I am constantly switching languages so its
not like ever use the same parser generator twice). ^^ was true a month ago
until I met peg.js (<http://pegjs.majda.cz/online>). You can try out the
grammar live, without compilation on example data! This now means you can
rapidly prototype a grammar and even logic within it (in JS). Even if you are
not targeting a JS embedded parser, it still lets you work out your grammar
ready to translate into another language.

Since discovering peg.js I have applied proper parsing technology twice in a
month for relatively trivial tasks. I would have never bothered before. I
really recommend this tool for programmers who find the inertia of getting out
the parsing big guns too much to bother.

~~~
tikhonj
Parser combinators, like Haskell's Parsec, are another popular option like
this. They're also much more lightweight than a parser generator.

In Haskell, people almost never use regular expressions, and Parsec is part of
the reason. Anything trivial can be done just with library functions, and
anything that would require a non-trivial regexp are both far easier and far
more readable with Parsec.

------
AntiRush
Recently, I've been trying out a few of the options out there.

My requirements are:

* generates C

* generates a fast parser

Ease of use is also obviously a good thing. A few of the options I explored:

ANTLR[1]:

ANTLR is really great is you're using it for Java. Since Java is the first
officially supported target, everything works really well there. The C
bindings are pretty good too, though the documentation is pretty sparse and
somewhat out of date (the examples are good, though).

ANTLR4 looks really exciting. It uses a new algorithm called adaptive LL( _).
To quote the author,

    
    
        The benefit is that the adaptive algorithm is much  stronger than the static LL(*) 
        grammar analysis algorithm in v3. Honey Badger takes any grammar that you give it; it 
        just doesn't give a damn. (v4 accepts even left recursive grammars, except for indirectly 
        left recursive grammars where x calls y which calls x).  
    

When I was looking at it, ANTLR4 was not released and the C bindings hadn't
been updated. It looks like at least half of that changed in the last couple
of months.

Finally, ANTLRworks is awesome. You can graphically step through the parser
and it even draws great diagrams. It's far and away the best debugging
experience I've had.

Ometa[2]:

I looked briefly at Ometa - the work done on PEGs is really neat.
Unfortunately there's not a viable C backend and overall the project is fairly
undocumented.

Flex/Bison:

The old reliable. It's a bit harder to write your grammar for a LALR
generator, but it outputs good C code and is supported pretty much everywhere.
The resulting code is harder to debug than the code generated by ANTLR,
though.

Flex/Lemon[3]:

This is where I ended up. Lemon is the parser written for sqlite. It's very
similar to Bison but has some great new features and is a bit easier to use
and debug. It's also an extremely simple project (1 .c file and 1 template
file covers everything), and is extremely well tested (it's been used by
sqlite for a long time).

I'm not perfectly happy with my current solution - in an ideal world there
would be an LL(_) or PEG style parser generator that had first class support
for C as a target language.

[1]<http://www.antlr.org/> [2]<http://tinlizzie.org/ometa/>
[3]<http://www.hwaci.com/sw/lemon/lemon.html>

------
pjkundert
boost::spirit

Directly implement a parser for a language by writing its grammar in C++ code.
Spectacular, and very easy to use once you practice on a few small languages.
I implemented a fairly complete HTTP/1.1 parser using it, for an embedded
application's web interface, but it really shines for implementing tiny one-
line single-use parsers for strictly validating arbitrary program inputs;
instead of hacking it with a sub-par hand-written parser, you can implement a
completely specified grammar in a small inline block of C++, and it "reads"
like a YACC parser definition.

Wonderful invention.

And, might I add -- since it's C++, it is easy to use it for only your
parsers, and provide a 1st class C linkable API.

------
aurelianito
I always thought of parser generators as a DSL in disguise. There should be no
need for parser generators to generate code for high-level languages. It
should be quite easy to write a library that does the same thing and does not
complicate the build cycle.

Haskell's parser combinators (and its python cousins) are great examples of
it.

But, if you need C as the target language for portability reasons they might
be the right tool for the job.

~~~
mamcx
Which python cousins?

~~~
aurelianito
pyparsing (<http://pyparsing.wikispaces.com/>)

------
meric
Missing <http://www.gazelle-parser.org>

~~~
karavelov
Also missing Marpa: <http://jeffreykegler.github.com/Marpa-web-site/>

------
DanielBMarkham
Sigh. Not an F# in the bunch. Wish somebody would at least add FParsec to the
chart

~~~
askimto
It's Wikipedia. Why don't you do it?

