
Nearley – Parser toolkit for JavaScript - dkarapetyan
http://nearley.js.org/
======
matheusmoreira
I love the Earley algorithm! It's relatively small, online and works with any
context-free grammar. It seems to be the basis of many other algorithms I've
read about in the literature, suggesting it is easy to extend. The grammar is
pluggable, allowing the programmer to easily create an API around it. I wonder
why it's not widely used... Whenever I see a general algorithm being employed
in parser tools it's GLR. Perhaps a good Javascript implementation of Earley
could change that.

Nearley implements Joop Leo's improved handling of right-recursion¹ but
perhaps it can be optimized further by using the method of precomputation²
described in Aycock & Horspool's 2002 paper.

¹
[https://dx.doi.org/10.1016/0304-3975(91)90180-A](https://dx.doi.org/10.1016/0304-3975\(91\)90180-A)

²
[https://dx.doi.org/10.1093/comjnl/45.6.620](https://dx.doi.org/10.1093/comjnl/45.6.620)

~~~
ThePhysicist
While a GLR parser has the same worst-time complexity O(n^3), it usually
performs better on nearly deterministic or deterministic grammars (though
Earley can be modified to work better on deterministic grammars). Apart from
that the algorithms give quite similar results, and can both handle any
context-free grammar. GLR might have the advantage that it doesn't require any
"tricks" to handle right-recursion, and that the algorithm itself is very
compact.

I haven't implemented an Earley parser yet (should probably give that a try),
but I suspect it's not more difficult than implementing a GLR parser (for a
great GLR reference implementation, check out the Elkhound paper:
[http://scottmcpeak.com/elkhound/sources/elkhound/algorithm.h...](http://scottmcpeak.com/elkhound/sources/elkhound/algorithm.html)).

In any case time complexity is not always the best performance measure in the
real world, as it's usually the constant overhead that makes up for the
largest performance differences, at least when parsing programming languages
(which often are fully or almost fully mostly deterministic and can be handled
in linear time even by recursive-descent parsers given the right grammar).
Here, simple shift-reduce parsers really shine, as they do not do any
backtracking and work with a simple rule table and a heap/stack for the tokens
they emit. Also, the (optional but often useful) lexing phase of parsing
should not be underestimated, as it can be as tricky as the token-based
parsing that follows. Python, for example, cannot be lexed with a context-free
grammar as the indentation is stateful (and newlines are treated differently
depending whether they occur inside a bracket/parens expression or not, which
requires a grammar to keep track of the nesting)

The main advantage of "traditional" tools like yacc or bison is that they are
highly optimized, and produce parsers that can process > 100 kloc / second,
which is hard to achieve with most other frameworks (I couldn't find any
benchmarks on the JS parser).

~~~
ThePhysicist
Correction: If you want the parse tree, GLR and Earley can have O(n^4) time
complexity it seems:

[http://cstheory.stackexchange.com/questions/22621/complexity...](http://cstheory.stackexchange.com/questions/22621/complexity-
of-a-glr-parser-on-an-ambiguous-grammar)

------
mamp
Looks like a great toolkit. I'm going to convert some of my PEG.js grammars
over to test it out. It would be good to have an online page like
[https://pegjs.org/online](https://pegjs.org/online). It allows people to
start playing without all the setup.

------
leeoniya
feature set looks very similar to
[https://github.com/SAP/chevrotain](https://github.com/SAP/chevrotain)

would be good to have a comparison

EDIT: actually there is one [1], and it includes Nearley which does quite
poorly.

[1]
[http://sap.github.io/chevrotain/performance/](http://sap.github.io/chevrotain/performance/)

~~~
matheusmoreira
The difference in performance is explained by the capabilities of the
algorithms used by each parser library.

The Earley algorithm can parse any context-free grammar while Chevrotain
appears to be restricted to the LL(k) class of grammars. Earley is O(n³) in
the worst case but it performs better with more restricted classes of context-
free grammars. Even if given the same grammar, Earley will still incur the
cost of generality, a cost which LL(k) parsers don't have to pay.

~~~
leeoniya
Okay, do you have a non-contrived, practical example that can be handled by
Nearley but not LL(k)?

It also doesn't help that linear LL(k) performance/optimizations is the first
feature listed.

------
mamcx
How good is for error reporting? That is my #1 thing that drive me away for
parser generators...

------
eridius
Title should probably be "Nearley - parser toolkit for JavaScript".

