
The Lemon Parser Generator - znpy
http://www.sqlite.org/src/doc/trunk/doc/lemon.html
======
haxiomic
Lemon is great. I used it to develop a browser based GLSL parser[0]. I spent
ages working with existing javascript parser generators and kept running into
walls with syntax and performance until I settled on porting Lemon.

I ported the core parser to haxe rather than straight javascript, which means
it can be used to build dependency-free parsers for python, java, c++, php
(and any other haxe-supported[1] language).

I've only ported the LALR portion of the parser - the data tables are still
generated by the C version of lemon. Although I haven't wrapped it up in a
self-contained project, if you want to do something similar have a look at the
gh repo[2]. I've not documented anything, but if someone finds this and is
interested in building a cross-platform parser with a similar method feel free
to get in touch and I'll give you a guide on using the code. If there's enough
interest I'll build it out into something self-contained and easy to use.

[0] [http://haxiomic.github.io/haxe-glsl-parser/](http://haxiomic.github.io/haxe-glsl-parser/)

[1] [http://haxe.org/manual/introduction-what-is-haxe.html](http://haxe.org/manual/introduction-what-is-haxe.html)

[2] [https://github.com/haxiomic/haxe-glsl-parser/tree/master/too...](https://github.com/haxiomic/haxe-glsl-parser/tree/master/tools/parser-generator)

~~~
hobo_mark
What kind of limitations did you find in the JavaScript ones?

~~~
haxiomic
There were a few aims I wanted to meet:

\- Stick as closely as possible to the GLSL specification by using the
reference grammar with minimal modification

\- Produce a haxe version of the parser so I could do some haxe-compile-time
magic with it

Haxe is fairly similar to javascript so I tried altering PEG.js to spit out
haxe. The result was a 9000 line file which, it turns out, was too big to
compile. ANTLR looked promising but it appeared to be too much work to
create a haxe port of the runtime.

Lemon was minimalistic and had a simple syntax which was fairly close to the
syntax of the reference grammar. The core parser is only about 300 lines[0]
and everything else is in data tables[1] which are generated by the lemon
command line tool.

[0] [https://github.com/haxiomic/haxe-glsl-parser/blob/master/gls...](https://github.com/haxiomic/haxe-glsl-parser/blob/master/glsl/parse/Parser.hx)

[1] [https://github.com/haxiomic/haxe-glsl-parser/blob/master/gls...](https://github.com/haxiomic/haxe-glsl-parser/blob/master/glsl/parse/Tables.hx)
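
To give a feel for the "small driver plus data tables" split, here is a minimal sketch in C. It is not Lemon's table format and not the code from the repo above. The toy grammar (`expr ::= expr PLUS NUM | NUM`), the table layout, and every name in it are made up for illustration. The point is only that the loop stays the same when the grammar changes; only the tables differ.

```c
#include <stddef.h>

/* Hypothetical token codes for the toy grammar. */
enum { T_NUM, T_PLUS, T_END, N_TOKENS };
enum { ERR = 0, ACC = 99 };

/* action[state][token]: >0 means shift and go to that state,
 * <0 means reduce by rule number -n, ERR rejects, ACC accepts.
 * Rule 1: expr ::= expr PLUS NUM    Rule 2: expr ::= NUM */
static const int action[5][N_TOKENS] = {
    /* state 0 */ { 1,   ERR, ERR },
    /* state 1 */ { ERR, -2,  -2  },
    /* state 2 */ { ERR, 3,   ACC },
    /* state 3 */ { 4,   ERR, ERR },
    /* state 4 */ { ERR, -1,  -1  },
};
/* State to enter after reducing to expr, indexed by the exposed state. */
static const int goto_expr[5] = { 2, -1, -1, -1, -1 };
/* Number of right-hand-side symbols per rule (index 0 unused). */
static const int rule_len[3] = { 0, 3, 1 };

typedef struct { int type; int value; } Token;

/* Parses a sum of numbers; returns 0 and stores the total on success,
 * -1 on a syntax error. */
int parse_sum(const Token *toks, size_t ntoks, int *result)
{
    int states[64], values[64];
    int top = 0;
    size_t i = 0;
    const Token end = { T_END, 0 };

    states[0] = 0;
    values[0] = 0;
    for (;;) {
        Token t = (i < ntoks) ? toks[i] : end;
        if (t.type < 0 || t.type >= N_TOKENS) return -1;

        int act = action[states[top]][t.type];
        if (act == ACC) { *result = values[top]; return 0; }
        if (act == ERR) return -1;

        if (act > 0) {                       /* shift */
            if (top + 1 >= 64) return -1;
            top++;
            states[top] = act;
            values[top] = t.value;
            i++;
        } else {                             /* reduce */
            int rule = -act;
            int v = (rule == 1) ? values[top - 2] + values[top]
                                : values[top];
            top -= rule_len[rule];
            int next = goto_expr[states[top]];
            top++;
            states[top] = next;
            values[top] = v;
        }
    }
}
```

Feeding it the tokens for `1 + 2 + 3` stores 6 in `*result`; a trailing `+` or an empty input makes it return -1.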

------
fizixer
So where does this lie on the ANTLR / Allen Short's Parsley / PEG spectrum?
(very loosely speaking)

And how does it relate to the issues discussed in Haberman's articles [0],
[1], [2]? They can be considered a good survey of this field (although I'm
open to other surveys if anyone can point some out).

Also interesting: Terence Parr's talk (he is the creator of ANTLR) [3] and
Allen Short's talk [4].

[0] [http://blog.reverberate.org/2013/07/ll-and-lr-parsing-demyst...](http://blog.reverberate.org/2013/07/ll-and-lr-parsing-demystified.html)

[1] [http://blog.reverberate.org/2013/08/parsing-c-is-literally-u...](http://blog.reverberate.org/2013/08/parsing-c-is-literally-undecidable.html)

[2] [http://blog.reverberate.org/2013/09/ll-and-lr-in-context-why...](http://blog.reverberate.org/2013/09/ll-and-lr-in-context-why-parsing-tools.html)

[3] [https://www.youtube.com/watch?v=q8p1voEiu8Q](https://www.youtube.com/watch?v=q8p1voEiu8Q)

[4] [https://www.youtube.com/watch?v=t5X3ljCOFSY](https://www.youtube.com/watch?v=t5X3ljCOFSY)

~~~
SQLite
I wrote lemon in the late 1980's on a Sun4, while a graduate student. There
was also a program called "lime" that generated an LL(1) parser, but I've long
since lost that code.

Lemon was intended as a yacc-replacement. The advantages of lemon over yacc
are that lemon has a less error-prone syntax (it uses symbolic names rather
than $1, $2, etc), and that lemon generates a reentrant and thread-safe
parser. (At the time, yacc/bison parsers were neither reentrant nor thread-
safe. I don't know if that has been fixed in the intervening decades.)
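
To make "reentrant" concrete: the parser keeps all of its state in an object that the caller owns and passes in on every call, rather than in file-scope variables, which is the pattern behind Lemon's documented `ParseAlloc` / `Parse` / `ParseFree` interface. The sketch below is not Lemon output. `Balancer` and its functions are hypothetical names, and the "grammar" is just a toy bracket checker, but it has the same shape: one token pushed in per call, no statics.

```c
#include <stddef.h>

/* All parser state lives here; there are no static or global variables,
 * so any number of independent parsers can be active at once. */
typedef struct {
    char stack[32];   /* brackets that are currently open */
    int depth;
    int failed;
} Balancer;

void balancer_init(Balancer *b)
{
    b->depth = 0;
    b->failed = 0;
}

/* Push one input character into the parser. */
void balancer_push(Balancer *b, char c)
{
    if (b->failed) return;
    if (c == '(' || c == '[') {
        if (b->depth == (int)sizeof b->stack) { b->failed = 1; return; }
        b->stack[b->depth++] = c;
    } else if (c == ')' || c == ']') {
        char want = (c == ')') ? '(' : '[';
        if (b->depth == 0 || b->stack[--b->depth] != want) b->failed = 1;
    }
}

/* Signal end of input; returns 1 if the input was balanced. */
int balancer_finish(const Balancer *b)
{
    return !b->failed && b->depth == 0;
}
```

Because nothing is shared, two `Balancer` objects can be fed alternately (or from different threads, one object per thread) without interfering with each other.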

Lemon has always been open source. But it languished with little attention for
10 years until I used it to generate the parser for SQLite. Then suddenly
people started to notice and use it.

Lemon does not have a separate version control system. The source code to
Lemon (a single file of C plus a template file for the generated parser) is
part of the SQLite source tree.

~~~
code4life
Yes, bison is now both reentrant and thread-safe, and it supports push
parsing. I helped implement the push parser support many years ago with the
help of the Bison development team.

------
seiji
If you want to see the complete SQLite SQL grammar that gets passed to Lemon,
check out
[http://www.sqlite.org/cgi/src/artifact/f599aa5e871a4933](http://www.sqlite.org/cgi/src/artifact/f599aa5e871a4933)

It's pretty easy to read once you understand the file format (basically for
each "line": [parser stuff] { C code using results of parsing })
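
For readers who have not seen the format, a tiny grammar fragment in that style looks roughly like this. It is an illustrative toy, not taken from the SQLite grammar, and the symbol names (`NUM`, `PLUS`, `expr`, `program`) are made up. The letters in parentheses are the symbolic names that the C action uses, where yacc would write `expr : expr '+' expr { $$ = $1 + $3; }`.

```
%include { #include <stdio.h> }
%token_type { int }
%type expr { int }
%left PLUS.

program ::= expr(A).               { printf("%d\n", A); }
expr(A) ::= expr(B) PLUS expr(C).  { A = B + C; }
expr(A) ::= NUM(B).                { A = B; }
```

Terminals are written in upper case and nonterminals in lower case, and each rule ends with a period before its optional C action.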

------
urand48
I love lemon, and used it to implement a simple configuration file language
that my customers loved.

The advantages for me -- especially vs yacc -- were:

\- ease of use / great documentation

\- clear & customizable syntax error messages

\- thread safety (no statics)

\- no memory leaks / support for C++ destructors

\- generated code runs cleanly under valgrind

------
sarahprobono
I've used Lemon before for a compilers class at Uni. I was using yacc before,
and can say that the Lemon syntax is much more pleasant to work with.

