
PEG.js - parser generator for JavaScript - dmajda
http://pegjs.majda.cz/
======
grayrest
There are a couple other parser generators for JS:

* <http://zaa.ch/2k> - Jison

* <http://inimino.org/~inimino/blog/peg_first_release> - another packrat parser (has an ES5 parser as an example)

Plus a few more that work but produce generated code that's too slow to be
useful.

~~~
mnemonik
Worth noting that, last I heard, CoffeeScript is using Zaach's Jison:
<http://zaa.ch/2o>

------
scott_s
I had to do some digging for myself to verify that, yes, PEGs are a recent
concept distinct from the kind of grammars I'm already familiar with. See
Bryan Ford's paper, _Parsing Expression Grammars: A Recognition-Based
Syntactic Foundation_ from PoPL 2004:
<http://portal.acm.org/citation.cfm?id=964001.964011>

~~~
barrkel
Memoization to get linear performance out of backtracking, which is
exponential without it, is certainly a nice sleight of hand, but I wouldn't be
under any illusions that the constant factor is competitive with say an LALR
recognizer. Also, with a scheme that maps so closely to recursive descent,
it's easy to miss what amount to grammar ambiguities, where nested productions
preferentially eat tokens that may be in the follow set (i.e. the if/else
problem).

~~~
cynicalkane
PEGs don't have grammar ambiguities. Every PEG is deterministic.

Grammar ambiguities are resolved by ordered choice. If there are two ways to
parse something, the first way will be tried first. If it succeeds, the
ordered choice short-circuits and the second way is ignored. So you can
resolve, say, the classic dangling else problem by ordering your choices
correctly.
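The dangling-else case can be sketched in plain JavaScript (a hypothetical hand-written parser, not PEG.js output). Trying the longer "if C S else S" alternative first makes every "else" bind to the nearest "if", which is exactly how PEG ordered choice resolves it:

```javascript
// Minimal sketch over a token array. Alternative 1 (if C S else S) is tried
// before alternative 2 (if C S), so a matching "else" always wins.
function parseStmt(toks, pos) {
  if (toks[pos] === "if") {
    const cond = toks[pos + 1];               // pretend the condition is one token
    const thenPart = parseStmt(toks, pos + 2);
    if (thenPart && toks[thenPart.next] === "else") {  // alternative 1
      const elsePart = parseStmt(toks, thenPart.next + 1);
      if (elsePart) {
        return { tree: ["if", cond, thenPart.tree, elsePart.tree], next: elsePart.next };
      }
    }
    if (thenPart) {                           // alternative 2, only if 1 failed
      return { tree: ["if", cond, thenPart.tree], next: thenPart.next };
    }
    return null;
  }
  if (toks[pos] === "s") return { tree: "s", next: pos + 1 }; // a plain statement
  return null;
}
```

Parsing the tokens for `if c1 if c2 s else s` yields `["if", "c1", ["if", "c2", "s", "s"]]`: the else belongs to the inner if, deterministically.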

~~~
barrkel
I know all of that. My point is that the if/else problem really is an
ambiguity (i.e. it is a language bug, not just a problem you have to get
around in parsing), and problems similar to it crop up when your tool doesn't
alert you to first/follow conflicts, in LL(1) parlance. That PEGs encode the
implementation strategy inside the declarative grammar is a flaw, in my
opinion, not a benefit.

~~~
cynicalkane
In programming, ambiguity is _bad_! Explicitly encoding disambiguation rules
in the grammar is a good thing, just like it's a good thing in functional
pattern matching, in Prolog... I'd be interested to hear any reason why you
would want to not disambiguate a grammar in any language designed to be
computer-recognizable.

~~~
barrkel
Here, read this:

<http://compilers.iecc.com/comparch/article/05-09-114>

In particular:

"Yes, this is "the" problem with PEG's. / implies ordering of the search
(parsing) space. You need to order your / operators so that special cases
(e.g. longer matches) appear first. Unfortunately, if you don't do this,
nothing will tell you you have a problem with your grammar, it will
simply not parse some inputs. To me this implies that if one wants to use a
PEG to parse some input, then one must exhaustively test the parser."

Hopefully the thread therein will describe my issue better than I have thus
far.
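The failure mode is easy to reproduce in a few lines of JavaScript (a contrived sketch, not any real tool): put the shorter alternative first and the longer one silently becomes unreachable.

```javascript
// A <- "a" / "ab"  -- the "ab" alternative is dead code, because "a"
// always matches first and ordered choice never reconsiders it.
function matchA(input) {
  if (input.startsWith("a")) return "a";
  if (input.startsWith("ab")) return "ab";  // unreachable alternative
  return null;
}
// Require the whole input to be consumed, as a top-level parse would.
function parse(input) {
  const m = matchA(input);
  return m !== null && m.length === input.length;
}
// parse("a")  -> true
// parse("ab") -> false, with no warning that the grammar has a problem
```

Nothing flags the dead alternative at grammar-construction time; you only find out when an input that "looks" accepted fails to parse.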

~~~
scott_s
Indeed it does, and Chris Clark very nicely summarizes the advantages and
disadvantages of PEG vs other parsers. Nice find.

------
patrickg
I love LPEG for Lua. It usually takes me some time to write a decent grammar,
but once it works it is rather straightforward to read it.
(<http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html>)

Now seeing this for JavaScript makes me wonder whether I should actually do
more work with JS.

~~~
richard_lyman
Amen on the easy-to-read part. I'm the author of clj-peg
(<http://www.lithinos.com/clj-peg/>), and the main reason I wanted to write it
was to have grammars that were human-readable.

~~~
allyt
I'm really interested in human-readable grammars. Thanks for your work.

------
cousin_it
A parser generator? Why oh why? It sounds so 80s. JavaScript is strong enough
to express parser combinators, e.g. see Chris Double's implementation (also
based on the PEG formalism):

<http://www.bluishcoder.co.nz/2007/10/javascript-parser-combinators.html>

I did the same thing in ActionScript. It's easy and fun.
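For readers who haven't seen the style, here is a minimal flavor of parser combinators in JavaScript (my own toy functions, not Chris Double's actual API): parsers are plain functions, and combinators build bigger parsers out of smaller ones.

```javascript
// A parser is (input, pos) -> { value, next } or null.
const literal = (s) => (input, pos) =>
  input.startsWith(s, pos) ? { value: s, next: pos + s.length } : null;

// PEG-style ordered choice: return the first alternative that matches.
const choice = (...ps) => (input, pos) => {
  for (const p of ps) {
    const r = p(input, pos);
    if (r) return r;
  }
  return null;
};

// Sequencing: run each parser in turn, threading the position through.
const seq = (...ps) => (input, pos) => {
  const values = [];
  for (const p of ps) {
    const r = p(input, pos);
    if (!r) return null;
    values.push(r.value);
    pos = r.next;
  }
  return { value: values, next: pos };
};

const greeting = seq(choice(literal("hi"), literal("hello")), literal("!"));
// greeting("hello!", 0) -> { value: ["hello", "!"], next: 6 }
```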

~~~
dmajda
I believe that a classical generated parser generally has better performance
potential: there is less function calling, less string passing (though this
can easily be avoided in combinators too), and more opportunity for
optimization if you have the whole grammar AST in hand.

------
raganwald
Are there PEGs available for languages like Javascript? Given recent
discussion of Perl being undecidable, I assume there isn't one for Perl. Has
anyone tackled such a thing for Ruby?

Update: Thanks for the links! That being said, I didn't mean parsers written
_in_ Javascript or Ruby, I mean PEGs or CFGs that parse Javascript and/or Ruby
programs :-)

~~~
ErrantX
<http://treetop.rubyforge.org/>

I'm writing a programming language in Ruby at the moment, and after playing
with that for a few days I've gone back to "hand-coded lexer plus Racc for
grammar". It's probably personal preference, but I find that a little more
tunable.

~~~
dmajda
What exactly didn't you like?

~~~
ErrantX
Just personal preference: I prefer the grammar structure of parsers like Racc.

------
euroclydon
I'm confused about the grammar example.

How does: multiplicative : primary "* " multiplicative { return $1 * $3; }

return the product of two integers?

~~~
mbrubeck
HN formatting messed up your asterisks. The whole rule is:

    
    
    multiplicative : primary "*" multiplicative { return $1 * $3; }
                   / primary
    

"primary" refers to another rule in the grammar, which is defined as integer
or a parenthesized expression. This snippet defines "multiplicative", which is
either a primary, or a primary followed by an asterisk followed by another
multiplicative. (This recursively expands to allow any number of primaries,
separated by asterisks.)

The value of the multiplicative expression is the value of the first term
times the value of the third term. (The second term is just the literal string
"*".)
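A rough hand-translation of that rule into plain JavaScript (assumed semantics, not actual PEG.js output) makes the recursion and the `$1 * $3` action visible, over a token array of numbers and `"*"`:

```javascript
//   multiplicative : primary "*" multiplicative { return $1 * $3; }
//                  / primary
function multiplicative(toks, pos) {
  const left = primary(toks, pos);
  if (left && toks[left.next] === "*") {       // first alternative
    const right = multiplicative(toks, left.next + 1);
    if (right) return { value: left.value * right.value, next: right.next };
  }
  return left;                                  // second alternative: just a primary
}

// primary is simplified here to a bare integer token.
function primary(toks, pos) {
  return typeof toks[pos] === "number" ? { value: toks[pos], next: pos + 1 } : null;
}
// multiplicative([2, "*", 3, "*", 4], 0) -> { value: 24, next: 5 }
```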

~~~
barrkel
On the face of it, however, the grammar is problematic. There's a reason it
uses "additive", "multiplicative" and "primary" rather than the more
traditional expression, term and factor: by using only the commutative
operators + and * and leaving out - and /, it disguises the fact that the
grammar actually evaluates the expression from right to left, rather than the
expected order of left to right.
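The discrepancy only shows up with non-commutative operators. A toy JavaScript illustration of the two groupings for subtraction:

```javascript
// Right-recursive rule "additive : primary '-' additive" groups from the
// right: 8 - (4 - 2). Left-to-left folding gives the conventional (8 - 4) - 2.
const rightAssoc = (nums) =>
  nums.length === 1 ? nums[0] : nums[0] - rightAssoc(nums.slice(1));
const leftAssoc = (nums) => nums.reduce((acc, n) => acc - n);

// rightAssoc([8, 4, 2]) -> 6   i.e. 8 - (4 - 2)
// leftAssoc([8, 4, 2])  -> 2   i.e. (8 - 4) - 2, the expected answer
```

With * and + the two groupings agree, which is what hides the bug in the example grammar.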

~~~
dmajda
I knew somebody would raise this point :-) You are right that the evaluation
order would be wrong for "-" and "/".

I will probably implement support for left recursion in PEG.js - it is
possible (see e.g. <http://www.vpri.org/pdf/tr2007002_packrat.pdf>). After
that, the grammar could be rewritten to evaluate in the correct order.

(Another alternative - which works right now - is to change the parsing
expressions to something like "additive ([+-] additive)*" and deal with the
whole chain of operations with the same priority at once. I didn't use this in
the example as I wanted it to be as simple as possible.)
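The action code for that workaround can be sketched in JavaScript (hypothetical, assuming the repetition expression yields a list of [operator, operand] pairs): match the operands flat, then fold them left-to-right, restoring left associativity.

```javascript
// Fold "first op1 n1 op2 n2 ..." from the left, so that
// 8 - 4 - 2 is evaluated as (8 - 4) - 2.
function evalChain(first, rest) {
  return rest.reduce(
    (acc, [op, n]) => (op === "+" ? acc + n : acc - n),
    first
  );
}
// evalChain(8, [["-", 4], ["-", 2]]) -> 2
```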

~~~
cynicalkane
Don't use that paper! I tried to use the parsing technique in that paper while
working on a parser for use at the CME, and it caused me weeks of headaches.

First, you'll note the algorithm is extremely complicated--nothing like the
simple top-down algorithm that makes PEGs so attractive. Not only is it
complicated, it misses basic refactoring issues--some logic is duplicated
across functions, and the functions interact in ugly ways.

Second, it doesn't even handle left recursion correctly. Throw a ruleset like
this at their parser

A -> B "a"

B -> C "b"

C -> B / A / "c"

and it will explode into a million little pieces, because the authors did not
account for any recursive rule having multiple recursion points. Don't even
try something like

S -> A / B

A -> A "a" / B / a

B -> B "b" / A / "b"

~~~
dmajda
Interesting, thanks for the warning. I only skimmed the paper today and noted
that the algorithm seems complex, but I didn't attempt to understand it in
detail.

What was your final result? Did you implement left recursion the way the
paper describes, invent or find some other way, or abandon the whole idea?

~~~
cynicalkane
The approach I'm working on uses the same "growing the seed" idea, but in a
different way.

It involves the memo entries being able to remember which left-recursive
results they are dependent on. This way, when a left-recursive rule produces a
result that is dependent on itself, it knows that this match can possibly be
"grown" through repeated iterations. That's a basic sketch of the idea.
Performance properties remain the same in the case of left-recursive rules
that are not interdependent. I don't really know what they are like for large
numbers of interdependent left-recursive rules--but if you have a language
like that, better to use an Earley or GLR parser.
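For the simplest, directly left-recursive case, the "growing the seed" idea can be sketched like this in JavaScript (my own reconstruction from the description above, not the poster's code):

```javascript
// Directly left-recursive rule:  A <- A "a" / "b"
// Plant a failing memo entry so the recursive call fails on the first pass,
// producing a seed match; then re-evaluate repeatedly, letting each pass
// consume one more "a", until the match stops growing.
function parseA(input, pos) {
  const memo = new Map();           // position -> match length, or null

  function A(p) {
    if (memo.has(p)) return memo.get(p);
    memo.set(p, null);              // seed: the recursive reference fails
    let seed = evalBody(p);
    while (seed !== null) {
      memo.set(p, seed);            // grow: next pass sees the longer match
      const next = evalBody(p);
      if (next === null || next <= seed) break;
      seed = next;
    }
    return memo.get(p);
  }

  function evalBody(p) {
    const rec = memo.get(p);        // alternative 1: A "a"
    if (rec !== null && input[p + rec] === "a") return rec + 1;
    if (input[p] === "b") return 1; // alternative 2: "b"
    return null;
  }

  return A(pos);
}
// parseA("baaa", 0) -> 4   (matches all of "baaa")
```

Interdependent left-recursive rules, like the rulesets earlier in the thread, are exactly where a simple sketch like this stops being enough.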

I'm still working on it. It passes a battery of test cases, including the two
I posted above, but I'm not 100% confident in it just yet. I'm also trying to
get permission from the higher-ups to release the code into the wild.

