
Parsing Expressions by Recursive Descent (1999) - rspivak
https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm
======
userbinator
This is a great article that I've recommended many times for learning about
precedence-climbing, but I found it much more intuitive and concise to derive
precedence climbing as the refactoring of a recursive-descent parser in two
steps. This is succintly describable as:

1\. Notice how all the functions in an RD parser are almost the same: call the
next-higher-level, then loop with that call while the current token is at this
level. The topmost level is for consuming a terminal or a parenthesised
subexpression. That can easily be turned into one parameterised function and a
table lookup for the tokens.

2\. The first set of recursive calls always goes "to the top", so why not call
that directly. This is the 'P' function in the article. The second set of
recursive calls are nothing more than finding the right level to process the
current token, so jump directly to that level by calling the function directly
with that level number instead of repeatedly descending via recursive calls
until the current level matches the token. With the current token being of
level n, the parameterised RD parser calls expr(1) -> expr(2) -> expr(3) ->
... expr(n) when it could just call expr(n).

I think this is such an elegantly simple algorithm that anyone learning
computer science should have implemented at least once along with the usual
study of recursive data structures. It's also a very good step towards
understanding how a compiler works.

------
throwaway99182
'''Left recursive grammars, such as G, are unsuitable for recursive-descent
parsing because a left-recursive production leads to an infinite recursion.
While the parser may be partially correct, it may not terminate.'''

I don't see why a left-recursive solution doesn't terminate. Does anyone have
an insight to offer, or an example grammar and text on which a left-recursive
production doesn't terminate?

~~~
thaumasiotes
> an example grammar and text on which a left-recursive production doesn't
> terminate

All grammars with left recursion have this problem. The parser has no
lookahead (and even if it did, the required lookahead to avoid this is
unboundedly large) -- you can't tell, given your input, whether you should
recurse or not.

Here's a pretty simple grammar:

    
    
        S -> A
        A -> a
        A -> A B (this is left recursion, a rule for producing A that itself begins with A)
        B -> b
    

The language matches the same strings as the regular expression (a|ab+), that
is, a single a followed by zero or more b's.

So, suppose you're the parser. You begin with the start rule, which tells you
to expect an A. For the A, you have two choices: recur via the rule A -> A B,
or stop recurring and process a literal a. You see the input "a". How can you
know which to do?

Compare the analogous right-recursive grammar, which matches (a|b+a):

    
    
        S -> A
        A -> a
        A -> B A
        B -> b
    

Here, you start out expecting an A. When you see the input "a", you know that
the only way to make sense of it is to invoke the rule "A -> a", which is a
terminal rule. And when you see the input "b", you know the only thing you can
do is invoke the rule "A -> B A", since that is the only rule allowing an A to
begin with "b".

edit: replaced regexes involving * with uglier regexes using disjunction and
+, since HN doesn't allow asterisks in most normal uses.

~~~
meric
I've implemented parsing left recursions with a memoized recursive descent
parser generator. It requires hints to tell the generator it is left
recursion[1] - but even with this inconvenience, I think it's more convenient
than having to refactor a left-recursive grammar into a right recursive one
and then flipping the precedence back. I'm still finishing building the first
useful parser using it, there may still be bugs.

It works by dynamically refactoring the grammar at each recursion. [2] The
dynamic refactoring only happens if left recursion is hinted for that rule.

[1] See the `left(..)` call in `prefixexp` that tells the parser generator
about the left recursion. [https://github.com/meric/l2l/blob/parser-
generator/l2l/lua.l...](https://github.com/meric/l2l/blob/parser-
generator/l2l/lua.lua#L261)

[2] [https://github.com/meric/l2l/blob/parser-
generator/l2l/gramm...](https://github.com/meric/l2l/blob/parser-
generator/l2l/grammar.lua#L514)

