
LR Parsing: More Elegant Than You Think - jhpriestley
https://jasonhpriestley.com/lr
======
dbcurtis
1\. Web site is just a blank white page on mobile Safari.

2\. LR parsing is elegant in the mathematical sense of the word, but use in
production leads to regret. The problem with bottom-up parsers of any flavor,
LR included, is that when you encounter a syntax error there is no context
information, or context is difficult to infer, which makes creating a sensible
error message less fun than poking yourself in the eye repeatedly with a sharp
stick. I contend that giving good error messages is the most important factor
in user-friendliness of any parser.

Look up "precedence climbing": a recursive descent parser with expressions
handled by precedence climbing is going to be an easier path to a production
parser with good diagnostics, IMHO.
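
For readers who have not seen it, here is a minimal sketch of precedence
climbing in Python. It is not taken from the linked article; the operator
table, the `Parser` class, and `evaluate` are hypothetical names chosen for
illustration. The thing to notice is that every error site knows what the
parser was expecting and which operator it was working on.

```python
import operator
import re

# Hypothetical operator table: symbol -> (precedence, right_associative, function)
BINARY_OPS = {
    "+": (1, False, operator.add),
    "-": (1, False, operator.sub),
    "*": (2, False, operator.mul),
    "/": (2, False, operator.truediv),
    "^": (3, True, operator.pow),
}
TOKEN_RE = re.compile(r"\d+|[-+*/^()]")


def tokenize(text):
    """Split text into integer, operator, and parenthesis tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_primary(self, context):
        # `context` says where we are in the grammar, so errors can mention it.
        token = self.advance()
        if token is None:
            raise SyntaxError(f"expected an operand {context}, found end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            value = self.parse_expression(1, "after '('")
            if self.advance() != ")":
                raise SyntaxError("expected ')' to close '('")
            return value
        raise SyntaxError(f"expected an operand {context}, found {token!r}")

    def parse_expression(self, min_prec, context="at start of expression"):
        left = self.parse_primary(context)
        while True:
            op = self.peek()
            # Stop at anything that is not an operator, or binds too loosely.
            if op not in BINARY_OPS or BINARY_OPS[op][0] < min_prec:
                return left
            prec, right_assoc, func = BINARY_OPS[op]
            self.advance()
            # Left-associative operators require a strictly tighter right side.
            next_min = prec if right_assoc else prec + 1
            right = self.parse_expression(next_min, f"after {op!r}")
            left = func(left, right)


def evaluate(text):
    parser = Parser(text)
    value = parser.parse_expression(1)
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r} after complete expression")
    return value
```

With this sketch, `evaluate("1 + 2 * 3")` gives 7, and `evaluate("1 + * 2")`
raises a `SyntaxError` saying it expected an operand after '+' and found '*'.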

~~~
cwzwarich
> The problem with bottom-up parsers of any flavor, LR included, is that when
> you encounter a syntax error there is no context information, or context is
> difficult to infer, which makes creating a sensible error message less fun
> than poking yourself in the eye repeatedly with a sharp stick.

Both LL and LR parsers share a fundamental flaw for error reporting; they are
both prefix parsers. They will parse the input until it fails to be a valid
prefix of the language represented by the grammar. However, it is rare that
this coincides with the place where the actual error was made.
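
A toy illustration of the prefix property, assuming a language of balanced
brackets (the name `first_invalid_offset` is made up for this sketch). The
checker reports the first offset at which the input is no longer a prefix of
any balanced string, which need not be where the mistake was made.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def first_invalid_offset(text):
    """Return the offset where text stops being a prefix of some balanced
    string, or None if the whole text is still a valid prefix."""
    stack = []
    for offset, char in enumerate(text):
        if char in OPENERS:
            stack.append(char)
        elif char in PAIRS:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != PAIRS[char]:
                return offset
    return None
```

For `[(1, 2]` the function returns offset 6 (the `]`), although the likely
mistake is the stray `(` at offset 1.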

The difference in information available here at the end of the valid prefix is
less about the difference between LL and LR parsing and more about the
difference between LL and LR grammars. An LR parser parsing an LL grammar is
able to recover the same information as an LL parser at the point of error.
And for non-LL grammars, it is easy to extend an LR parser with additional
context information by splitting states (see
[http://gallium.inria.fr/~fpottier/publis/fpottier-
reachabili...](http://gallium.inria.fr/~fpottier/publis/fpottier-reachability-
cc2016.pdf)).

Parsing algorithms that do not produce prefix parsers (e.g. precedence
parsing) are often able to produce more logical error messages with less work,
because they continue making actions after the input is no longer a valid
prefix, and this can amount to gathering more information about the error. For
the same reason, LR-family parsers that perform default reductions (e.g. LALR
parsers, or any minimal state LR parser) often perform reductions (but not
shifts) after the point of error, and produce better error messages than
canonical LR parsers.

My favorite approach to error recovery is Richter's approach of error
intervals
([https://dl.acm.org/citation.cfm?id=4019](https://dl.acm.org/citation.cfm?id=4019)),
which uses alternating prefix/suffix parsers to find minimal error regions in
the input that cannot occur as a substring of any valid input. This has no
dependence on the grammar or parsing technique. It was not (widely?) known at
the time that Richter wrote his paper, but suffix parsing of LR grammars is
linear time.

~~~
mehrdadn
I'm unfamiliar with precedence parsing but what attracted me to (G)LR was its
running time. What is the running time guarantee of a precedence parser for
LR(k) and general (including potentially ambiguous) context-free grammars?

~~~
cwzwarich
Operator precedence parsing is linear time, but it only produces a parser for
a very limited set of grammars, the epsilon-free LR(1) grammars with no
consecutive nonterminals.

Precedence parsing is typically combined with another parsing method to parse
a non-operator part of a grammar, or it is used with preprocessing filters
that insert extra marker tokens to work around the limitations.

Most programming languages have grammars that are very close to being operator
precedence languages, so in the early days of parsing there were a lot of
attempts to extend precedence parsing to handle more general grammars. After
SLR/LALR parsing was developed, most of this work stopped.

~~~
mehrdadn
Thanks! That sounds awfully limiting if I'm being honest, even if a language
is 'mostly' well-behaved... I can't say I can see how you could use
preprocessing to get around the limitations since it would seem that doing so
properly would require already parsing the input. It sounds like a nice method
for when it works but it doesn't really sound like it would substitute for LR?

~~~
oso2k
Not really limiting. Crockford used it in jslint [0][1][2] to parse
JavaScript, and OIL Shell also uses it [3] to parse a shell language.

[0] [https://youtu.be/Nlqv6NtBXcA](https://youtu.be/Nlqv6NtBXcA)
[1] [https://crockford.com/javascript/tdop/tdop.html](https://crockford.com/javascript/tdop/tdop.html)
[2] [https://en.wikipedia.org/wiki/Operator-precedence_parser](https://en.wikipedia.org/wiki/Operator-precedence_parser)
[3] [https://www.oilshell.org/blog/2017/03/31.html](https://www.oilshell.org/blog/2017/03/31.html)

------
andreareina
The layout engine is making the lines too long, which combined with hard line
breaks means that this is what I get:

    
    
        ************************
        ***
        ************************
        *****
        ************************
        **
        ************************
        ****

~~~
klmr
Honestly, the custom layouting on this website is cute but I wish the author
would ship proper HTML code instead and accept the _slightly_ (if that!)
inferior layouting of the browser engine.

At the moment the layouting also breaks the expected behaviour when selecting
the text (double-click usually selects a word, but on this website it selects
a seemingly arbitrary sentence fragment). But this doesn’t matter because
copy&paste is broken anyway, since all spaces are removed (but random hyphens
are included instead. Uhm.)

Furthermore, the layout engine seems to attempt some microtypography
corrections implemented by pdfTeX but does so relatively badly (dashes
protrude too far, other punctuation doesn’t protrude at all; punctuation is
broken incorrectly across lines). In sum, I’m not convinced this layout is
actually even better than the browser’s, even ignoring the bugs.

------
yuchi
The code snippets use JS in a very non-idiomatic way, which makes this post
interesting IMHumbleO.

There are a few mutations that could be removed to have a clearer/purer
functional approach, BTW.

------
WallWextra
I recently was looking at the original LR paper by Knuth, and for the first
time noticed that it said "Case Institute of Technology" at the top.

i.e. he discovered it as an undergrad.

~~~
pmiller2
To be fair, there was a lot more not terribly complex yet still foundational
stuff to be discovered then.

------
isaachier
What language is being used in the snippets?

~~~
jhpriestley
JavaScript

~~~
isaachier
Lol I thought it might be. The dialect has changed a lot since I last dabbled
in JS.

~~~
ninjakeyboard
It's come a very long way.

~~~
rimliu
Not really.

