
Earley Parsing Explained - rwmj
http://loup-vaillant.fr/tutorials/earley-parsing/
======
kazinator
I'm looking at this, and the Earley items with the "fat dot" look a heck of a
lot like LR(0) kernel items in LALR(1) parser generation.

Then it hits me: it looks like Earley parsing is to LALR(1) (vaguely) what
NFA simulation is to a DFA table.

If we look at it at a very high level: Earley is making these items
dynamically while scanning the _input_, and grouping them into sets
representing states. Whereas an LALR parser generator will generate the items
and group them into subsets statically, while processing the _grammar_,
generating a push-down table driven by lookahead which is then applied to the
input.
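
To make the "items built dynamically from the input" part concrete, here is a minimal sketch of an Earley recognizer. The toy grammar, the `Item` tuple and the `recognize` function are all made up for illustration (they are not taken from the article), and the sketch assumes a grammar with no empty rules, which real implementations have to handle separately.

```python
from collections import namedtuple

# An Earley item: which rule, where the dot is, and where the item started.
Item = namedtuple("Item", "rule dot start")

# Hypothetical toy grammar: Sum -> Sum '+' Num | Num ; Num -> '1' | '2'
GRAMMAR = [
    ("Sum", ("Sum", "+", "Num")),
    ("Sum", ("Num",)),
    ("Num", ("1",)),
    ("Num", ("2",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def recognize(tokens, start="Sum"):
    # sets[i] holds the items that are alive after reading i tokens.
    sets = [[] for _ in range(len(tokens) + 1)]

    def add(i, item):
        if item not in sets[i]:
            sets[i].append(item)

    # Seed set 0 with every rule for the start symbol.
    for r, (lhs, _) in enumerate(GRAMMAR):
        if lhs == start:
            add(0, Item(r, 0, 0))

    for i in range(len(tokens) + 1):
        j = 0
        while j < len(sets[i]):  # the set may grow while we walk it
            item = sets[i][j]
            j += 1
            lhs, rhs = GRAMMAR[item.rule]
            if item.dot == len(rhs):
                # Completion: advance every item that was waiting for lhs.
                for parent in list(sets[item.start]):
                    p_rhs = GRAMMAR[parent.rule][1]
                    if parent.dot < len(p_rhs) and p_rhs[parent.dot] == lhs:
                        add(i, Item(parent.rule, parent.dot + 1, parent.start))
            elif rhs[item.dot] in NONTERMINALS:
                # Prediction: start every rule for the expected nonterminal.
                for r, (l, _) in enumerate(GRAMMAR):
                    if l == rhs[item.dot]:
                        add(i, Item(r, 0, i))
            elif i < len(tokens) and tokens[i] == rhs[item.dot]:
                # Scan: the next token matches, move the dot into set i + 1.
                add(i + 1, Item(item.rule, item.dot + 1, item.start))

    # Accept if a start rule that began at 0 is complete in the last set.
    return any(
        GRAMMAR[it.rule][0] == start
        and it.dot == len(GRAMMAR[it.rule][1])
        and it.start == 0
        for it in sets[-1]
    )
```

Calling `recognize(list("1+2"))` builds four item sets, one per input position. Nothing is precomputed from the grammar, which is the contrast with an LALR generator that does its item grouping before any input is seen.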

Analogously, NFA simulation computes sets of states dynamically according to
the input, by performing closures on the NFA graph, whereas DFA construction
does it statically in the absence of input.
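
A rough sketch of that NFA/DFA contrast, under stated assumptions: the NFA below (for `(a|b)*ab`) is a made-up example, it has no epsilon moves so the closure step is trivial, and the names `step`, `nfa_accepts`, `build_dfa` and `dfa_accepts` are hypothetical. The same `step` function is used both ways: once per input character at run time, or once per reachable state set ahead of time.

```python
# Hypothetical NFA for (a|b)*ab, with no epsilon moves to keep it short.
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
START, ACCEPT = 0, 2
ALPHABET = "ab"


def step(states, symbol):
    # All NFA states reachable from `states` on one symbol.
    return frozenset(t for s in states for t in NFA.get((s, symbol), ()))


def nfa_accepts(text):
    # Dynamic: the state sets are computed while reading the input.
    current = frozenset({START})
    for ch in text:
        current = step(current, ch)
    return ACCEPT in current


def build_dfa():
    # Static: subset construction, done once with no input in sight.
    start = frozenset({START})
    table, seen, todo = {}, {start}, [start]
    while todo:
        state = todo.pop()
        for ch in ALPHABET:
            nxt = step(state, ch)
            table[(state, ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return start, table


DFA_START, DFA_TABLE = build_dfa()


def dfa_accepts(text):
    # Run time is now just table lookups.
    state = DFA_START
    for ch in text:
        state = DFA_TABLE.get((state, ch), frozenset())
    return ACCEPT in state
```

Both functions accept the same strings; the difference is only in when the sets of states get built.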

------
pbiggar
Slight tangent, but interesting: Earley became a therapist and has been
practicing since 1973. From his bio:

> Jay also has a Ph.D. in computer science from Carnegie-Mellon University and
> was formerly on the U.C. Berkeley faculty, where he published 12 computer
> science papers, one of which was voted one of the best 25 papers of the
> quarter century by the Communications of the A.C.M.

[https://selftherapyjourney.com/Pattern/Beginning/Who_We_Are....](https://selftherapyjourney.com/Pattern/Beginning/Who_We_Are.aspx)

------
thristian
Traditional shift/reduce parsers are impractical to write without a
specialised tool and often difficult to debug; PEG parsers and parser
combinators are great to work with but are often inefficient and produce
unhelpful syntax errors. If there's some other parsing scheme that could be a
best-of-both-worlds, I would love to learn more.

~~~
kazinator
Unfortunately, there is a catch:

[http://loup-vaillant.fr/tutorials/earley-parsing/parser](http://loup-vaillant.fr/tutorials/earley-parsing/parser)

Once you have a successful parse, extracting the tree from it is a little bit
like pulling teeth. You have to perform searches on the Earley set data, and
deal with ambiguities at that point.

None of the mainstream methods has any major difficulty with popping out
abstract syntax trees in a straightforward syntax-directed manner.

~~~
yxhuvud
There are efficient ways to do it though (the article links to Scott), even if
they are not always very easy to understand.

