Comparison of parser generators

tlarkworthy · on Jan 27, 2013

I wrote a simple compiler during uni, and a syntax highlighter for a commercial product in 2002. I consider myself competant with grammars. Yet I hardly ever deploy my parsing skills in everyday programming, even though I know most protocol interpreters, file readers etc. would befitting from the readability that grammar and parsing separation achieves. I never use parsers because the "compile the compiler" then "compile the generated code" is horribly cumbersome. Also, the time taken to remember grammar syntax and get back in the mindset of grammar writing takes a 10 min regular expression hack stretch into 2 weeks of dev time (I am constantly switching languages so its not like ever use the same parser generator twice). ^^ was true a month ago until I met peg.js (http://pegjs.majda.cz/online). You can try out the grammar live, without compilation on example data! This now means you can rapidly prototype a grammar and even logic within it (in JS). Even if you are not targeting a JS embedded parser, it still lets you work out your grammar ready to translate into another language.

Since discovering peg.js I have applied proper parsing technology twice in a month for relatively trivial tasks. I would have never bothered before. I really recommend this tool for programmers who find the inertia of getting out the parsing big guns too much to bother.

tikhonj · on Jan 27, 2013

Parser combinators, like Haskell's Parsec, are another popular option like this. They're also much more lightweight than a parser generator.

In Haskell, people almost never use regular expressions, and Parsec is part of the reason. Anything trivial can be done just with library functions, and anything that would require a non-trivial regexp are both far easier and far more readable with Parsec.

_yid9 · on Jan 27, 2013

boost::spirit

Directly implement a parser for a language by writing its grammar in C++ code. Spectacular, and very easy to use once you practice on a few small languages. I implemented a fairly complete HTTP/1.1 parser using it, for an embedded application's web interface, but it really shines for implementing tiny one-line single-use parsers for strictly validating arbitrary program inputs; instead of hacking it with a sub-par hand-written parser, you can implement a completely specified grammar in a small inline block of C++, and it "reads" like a YACC parser definition.

Wonderful invention.

And, might I add -- since it's C++, it is easy to use it for only your parsers, and provide a 1st class C linkable API.

aurelianito · on Jan 27, 2013

I always thought of parser generators as a DSL in disguise. There should be no need for parser generators to generate code for high-level languages. It should be quite easy to write a library that does the same thing and does not complicate the build cycle.

Haskell's parser combinators (and its python cousins) are great examples of it.

But, if you need C as the target language for portability reasons they might be the right tool for the job.

mamcx · on Jan 29, 2013

Which python cousins?

aurelianito · on Feb 2, 2013

pyparsing (http://pyparsing.wikispaces.com/)

AntiRush · on Jan 27, 2013

Recently, I've been trying out a few of the options out there.

My requirements are:

* generates C

* generates a fast parser

Ease of use is also obviously a good thing. A few of the options I explored:

ANTLR[1]:

ANTLR is really great is you're using it for Java. Since Java is the first officially supported target, everything works really well there. The C bindings are pretty good too, though the documentation is pretty sparse and somewhat out of date (the examples are good, though).

ANTLR4 looks really exciting. It uses a new algorithm called adaptive LL(). To quote the author,

    The benefit is that the adaptive algorithm is much  stronger than the static LL(*) 
    grammar analysis algorithm in v3. Honey Badger takes any grammar that you give it; it 
    just doesn't give a damn. (v4 accepts even left recursive grammars, except for indirectly 
    left recursive grammars where x calls y which calls x).

When I was looking at it, ANTLR4 was not released and the C bindings hadn't been updated. It looks like at least half of that changed in the last couple of months.
Finally, ANTLRworks is awesome. You can graphically step through the parser and it even draws great diagrams. It's far and away the best debugging experience I've had.
Ometa[2]:
I looked briefly at Ometa - the work done on PEGs is really neat. Unfortunately there's not a viable C backend and overall the project is fairly undocumented.
Flex/Bison:
The old reliable. It's a bit harder to write your grammar for a LALR generator, but it outputs good C code and is supported pretty much everywhere. The resulting code is harder to debug than the code generated by ANTLR, though.
Flex/Lemon[3]:
This is where I ended up. Lemon is the parser written for sqlite. It's very similar to Bison but has some great new features and is a bit easier to use and debug. It's also an extremely simple project (1 .c file and 1 template file covers everything), and is extremely well tested (it's been used by sqlite for a long time).

I'm not perfectly happy with my current solution - in an ideal world there would be an LL() or PEG style parser generator that had first class support for C as a target language.

[1]http://www.antlr.org/ [2]http://tinlizzie.org/ometa/ [3]http://www.hwaci.com/sw/lemon/lemon.html

meric · on Jan 27, 2013

Missing http://www.gazelle-parser.org

karavelov · on Jan 27, 2013

Also missing Marpa: http://jeffreykegler.github.com/Marpa-web-site/

DanielBMarkham · on Jan 27, 2013

Sigh. Not an F# in the bunch. Wish somebody would at least add FParsec to the chart

askimto · on Jan 27, 2013

It's Wikipedia. Why don't you do it?