
Parsing: A Timeline (2014) - lelf
http://blogs.perl.org/users/jeffrey_kegler/2014/09/parsing-a-timeline.html
======
hardmath123
Re: Earley being forgotten, I've been working on the "nearley" Earley parsing
library for JS for many years, and it now has a very solid user base (200+
dependents on npm). It's probably not the fastest a JS parsing library could
be, but it's certainly no longer forgotten! Here's a small sample of the
amazingly varied projects using Earley parsing (via nearley) today…
[https://nearley.js.org/#projects-using-nearley](https://nearley.js.org/#projects-using-nearley)

------
dang
Discussed at the time:
[https://news.ycombinator.com/item?id=8290681](https://news.ycombinator.com/item?id=8290681)

------
rstuart4133
LALR being so popular is just a result of computers being so slow at the time.
Knuth's LR(1) parsers took prohibitive amounts of time and memory to
construct. LALR fixed that by merging states that differ only in their
lookahead sets. That made automatically building a parser from a grammar
tractable, but it restricted the usable grammars so much you had to really
work to get a working grammar for a real language.

That's all changed. LR parser generators (eg
[http://lrparsing.sourceforge.net/](http://lrparsing.sourceforge.net/)) can
now compile grammars in a reasonable amount of time, even when written in an
interpreted language. The Python parser lrparsing creates can process input at
about 25us per token, which isn't too shabby. It doesn't insist you write
productions directly; instead it has a more succinct language that compiles
down to productions. This language has the same "power" (it can parse exactly
the same class of languages as BNF), but handles most of the things people
struggle with when using raw LR productions, like lists, associativity and
precedence. And the old clumsy error messages that required a PhD in parsing
theory are gone, replaced by messages most people can understand.
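
For a flavour of that succinct language, here's roughly what the expression
grammar from lrparsing's documentation looks like (a trimmed sketch from
memory, so details may differ from the docs):

```python
import lrparsing
from lrparsing import List, Prio, Ref, THIS, Token, Tokens

class ExprParser(lrparsing.Grammar):
    # Tokens we don't want to re-type go in a TokenRegistry.
    class T(lrparsing.TokenRegistry):
        integer = Token(re="[0-9]+")
        ident = Token(re="[A-Za-z_][A-Za-z_0-9]*")
    expr = Ref("expr")                     # forward reference
    call = T.ident + '(' + List(expr, ',') + ')'   # comma-separated args
    atom = T.ident | T.integer | Token('(') + expr + ')' | call
    expr = Prio(                           # alternatives, highest priority first
        atom,
        THIS << Tokens("* /") << THIS,     # << means left associative
        THIS << Tokens("+ -") << THIS)
    START = expr                           # the grammar's start symbol

parse_tree = ExprParser.parse("1 + 2 * (a - 4)")
print(ExprParser.repr_parse_tree(parse_tree))
```

Note how lists, precedence and associativity are expressed directly with
`List`, `Prio` and `<<`/`>>`, instead of being hand-encoded as extra
productions.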

LL parsers aren't as powerful as LR (meaning they can recognise fewer real
world languages). But, and it's a big but, a human can hand-code an LL parser
for a language, whereas no one in their right mind would construct an LR
parser by hand. Hand-written LL parsers are usually called recursive descent
parsers. Because they are hand written, the humans writing them can cheat, as
in the sketch below. They can look ahead an arbitrary number of symbols, and
they can collate and refer to information an LL or LR grammar can't use, like
types. The result is they can more than make up the difference in power
between LL and LR. But it comes at a cost: mixing of concerns. Your parsing,
type, macro, and generation code gets intertwined.
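
To make that concrete, here's a minimal hand-rolled recursive descent parser
(illustrative code, not from any particular project). The `peek(ahead)` helper
is where the human gets to cheat: it can look arbitrarily far ahead, which a
table-driven LL(1) or LR(1) parser can't do:

```python
# Recursive descent parser for: expr -> term (('+'|'-') term)*
#                               term -> atom (('*'|'/') atom)*
#                               atom -> NUMBER | '(' expr ')'
import re

def tokenize(text):
    return re.findall(r"\d+|[-+*/()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self, ahead=0):
        # A hand-written parser can peek arbitrarily far ahead.
        i = self.pos + ahead
        return self.tokens[i] if i < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):                  # one function per production
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.next(), node, self.term())
        return node

    def term(self):
        node = self.atom()
        while self.peek() in ("*", "/"):
            node = (self.next(), node, self.atom())
        return node

    def atom(self):
        if self.peek() == "(":
            self.next()              # consume '('
            node = self.expr()
            assert self.next() == ")", "expected ')'"
            return node
        return int(self.next())      # a number

print(Parser(tokenize("1 + 2 * (3 - 4)")).expr())
# ('+', 1, ('*', 2, ('-', 3, 4)))
```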

Still, in my view LL and LR parsers win the day. They both take a fixed and
very small amount of time to process each token, and a fixed and very small
amount of memory. The language they recognise is always well defined, by which
I mean that if the generator can't determine what language your grammar
describes, its default action is to stop, whinge and refuse to produce a
parser. Whereas other parsing techniques plough on anyway. God knows what
language they are recognising if you get it wrong.

------
avmich
Tomita isn't mentioned :(. A good GLR approach...

