

Some problems of recursive descent parsers - dkarapetyan
http://eli.thegreenplace.net/2009/03/14/some-problems-of-recursive-descent-parsers

======
timtadh
The real solution to this problem is to properly remove the left recursion.
Students often make the same mistake the author did. Let's examine our favorite
grammar:

    
    
        E -> E + T
           | E - T
           | T
        T -> T * F
           | T / F
           | F
        F -> id
           | number
           | ( E )
    
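Transcribed literally into recursive descent, the first alternative of E makes the function for E call itself before consuming any token, so the parser recurses until the stack runs out. Here is a minimal Python sketch of that failure. The function names and the simplified T, which accepts only a single integer, are assumptions for illustration:

```python
def parse_T(tokens, pos):
    # Simplified stand-in for T: a single integer token.
    return int(tokens[pos]), pos + 1


def parse_E(tokens, pos):
    # Literal transcription of the first alternative, E -> E + T (or E - T).
    # parse_E calls itself at the same position before consuming any token,
    # so the recursion never reaches a base case and the lines after it
    # are never executed.
    left, pos = parse_E(tokens, pos)
    op = tokens[pos]
    right, pos = parse_T(tokens, pos + 1)
    return (op, left, right), pos


def left_recursion_overflows(text):
    """Return True if parsing text exhausts Python's recursion limit."""
    try:
        parse_E(text.split(), 0)
    except RecursionError:
        return True
    return False


print(left_recursion_overflows("1 - 2 - 3"))  # True
```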

The obvious way to remove left recursion from the grammar is to "flip" the
nonterminals, like so:

    
    
        E -> T + E
           | T - E
           | T
        T -> F * T
           | F / T
           | F
        F -> id
           | number
           | ( E )
    

This is wrong. Consider the string "28 / 7 * 2". The original grammar produces
the following parse:

    
    
              E
              |
              T
             /|\
            T * F
           /|\  |
          T / F 2
          |   |
          F   7
          |
         28
    

The new grammar produces:

    
    
              E
              |
              T
             /|\
            F / T
            |  /|\
           28 F * T
              |   |
              7   F
                  |
                  2
    

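When the expression is evaluated, these trees produce different answers. The
first gives "8", the correct answer; the second gives "2". Basically, the
flipped grammar parenthesizes it incorrectly, as 28 / (7 * 2) instead of
(28 / 7) * 2: the operators have become right-associative.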
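
To make the difference concrete, here is a minimal Python sketch of the flipped rule for T. The function names and the tuple-based trees are assumptions for illustration, and F handles only numbers:

```python
from fractions import Fraction


def parse_F(tokens, pos):
    # F -> number  (id and ( E ) are omitted to keep the sketch short)
    return int(tokens[pos]), pos + 1


def parse_T_flipped(tokens, pos):
    # T -> F * T | F / T | F: the recursive call sits to the RIGHT of the
    # operator, so "28 / 7 * 2" groups as 28 / (7 * 2).
    left, pos = parse_F(tokens, pos)
    if pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_T_flipped(tokens, pos + 1)
        return (op, left, right), pos
    return left, pos


def evaluate(tree):
    # Leaves are ints; inner nodes are (op, left, right) with op "*" or "/".
    # Fraction keeps division exact, so 28 / 14 is exactly 2.
    if isinstance(tree, int):
        return Fraction(tree)
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a * b if op == "*" else a / b


tree, _ = parse_T_flipped("28 / 7 * 2".split(), 0)
print(tree)            # ('/', 28, ('*', 7, 2))
print(evaluate(tree))  # 2
```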

To fix this problem, insert intermediate nonterminals into the grammar:

    
    
        E -> T E'
        E' -> + T E'
            | - T E'
            | e            <--- standing in for epsilon, the empty string
        T -> F T'
        T' -> * F T'
            | / F T'
            | e
        F -> id
           | number
           | ( E )
    
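Each nonterminal of the transformed grammar maps to one function. This minimal Python sketch uses a common alternative to the subtree swinging of the linked parser: E' and T' receive the tree built so far as an argument (what the Dragon Book calls an inherited attribute) and wrap it before recursing. The class and method names, the regex tokenizer and the tuple trees are assumptions, and `id` is left out of F:

```python
import re
from fractions import Fraction


class Parser:
    def __init__(self, text):
        # Integers, the four operators and parentheses; other characters
        # (including whitespace) are skipped.
        self.tokens = re.findall(r"\d+|[-+*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return self.tokens[self.pos - 1]

    def parse(self):
        tree = self.parse_E()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def parse_E(self):
        # E -> T E'
        return self.parse_E_prime(self.parse_T())

    def parse_E_prime(self, left):
        # E' -> + T E' | - T E' | e
        # 'left' is the tree built so far at this level. Wrapping it in a new
        # node BEFORE recursing makes + and - left-associative.
        if self.peek() in ("+", "-"):
            op = self.advance()
            return self.parse_E_prime((op, left, self.parse_T()))
        return left  # epsilon

    def parse_T(self):
        # T -> F T'
        return self.parse_T_prime(self.parse_F())

    def parse_T_prime(self, left):
        # T' -> * F T' | / F T' | e
        if self.peek() in ("*", "/"):
            op = self.advance()
            return self.parse_T_prime((op, left, self.parse_F()))
        return left  # epsilon

    def parse_F(self):
        # F -> number | ( E )
        tok = self.advance()
        if tok == "(":
            inner = self.parse_E()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)


def evaluate(tree):
    # Leaves are ints; inner nodes are (op, left, right) tuples.
    if isinstance(tree, int):
        return Fraction(tree)
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    return a / b


tree = Parser("28 / 7 * 2").parse()
print(tree)            # ('*', ('/', 28, 7), 2)
print(evaluate(tree))  # 8
```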

Consider the new parse tree:

    
    
                 E
                / \
               /   \
              T     E'
             / \    |
            F   T'  e
            |  /|\
           28 / | \
             /  |  \
            /   F   T'
                |  /|\
                7 * F T'
                    | |
                    2 e
    

If processed correctly, this tree yields the correct expression tree:

    
    
              *
             / \
            /   2
           / \
         28   7
    

The "trick" is to swing subtrees up from the prime nodes. The operator coming
up from the prime node becomes the root of the subtree (for that precedence
level), and the new operator is inserted as its leftmost descendant.
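
One way to read the swing step, as a sketch under assumptions rather than the tcel code: each prime function returns a partial tree whose leftmost leaf is a placeholder, and the caller's operand (or operator node) is dropped into that slot. `HOLE`, `fill_leftmost_hole`, `parse_prime`, `parse_level` and `SwingParser` are hypothetical names, and `id` is left out of F:

```python
import re

HOLE = object()  # placeholder for "left operand not known yet"


def fill_leftmost_hole(tree, subtree):
    # Walk down the left spine of tree and replace HOLE with subtree.
    if tree is HOLE:
        return subtree
    op, left, right = tree
    return (op, fill_leftmost_hole(left, subtree), right)


class SwingParser:
    def __init__(self, text):
        # Integers, the four operators and parentheses; other characters are skipped.
        self.tokens = re.findall(r"\d+|[-+*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return self.tokens[self.pos - 1]

    def parse(self):
        tree = self.parse_E()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def parse_prime(self, ops, operand):
        # X' -> op operand X' | e. Returns None for epsilon, otherwise a
        # partial tree whose leftmost leaf is HOLE.
        if self.peek() not in ops:
            return None
        op = self.advance()
        node = (op, HOLE, operand())
        rest = self.parse_prime(ops, operand)
        if rest is None:
            return node
        # Swing: the operator coming up from the deeper prime node stays the
        # root, and this node is inserted as its leftmost descendant.
        return fill_leftmost_hole(rest, node)

    def parse_level(self, ops, operand):
        # X -> operand X'. The first operand fills the one remaining hole.
        first = operand()
        rest = self.parse_prime(ops, operand)
        return first if rest is None else fill_leftmost_hole(rest, first)

    def parse_E(self):
        return self.parse_level(("+", "-"), self.parse_T)

    def parse_T(self):
        return self.parse_level(("*", "/"), self.parse_F)

    def parse_F(self):
        # F -> number | ( E )
        tok = self.advance()
        if tok == "(":
            inner = self.parse_E()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)


print(SwingParser("28 / 7 * 2").parse())  # ('*', ('/', 28, 7), 2)
```

Because each swing walks the left spine of the partial tree, a chain of n operators at one level costs O(n²) in this sketch, which is fine for illustration.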

To see how this works in code, check out this example recursive descent parser
for this grammar:

[https://github.com/timtadh/tcel/blob/master/frontend/parser....](https://github.com/timtadh/tcel/blob/master/frontend/parser.go#L361)

The swing and collapse functions are defined here:

[https://github.com/timtadh/tcel/blob/master/frontend/parser....](https://github.com/timtadh/tcel/blob/master/frontend/parser.go#L361)

You can read all about how to do this correctly in section 4.3.3 (page 212) of
the second edition of the "Dragon Book", "Compilers: Principles, Techniques, &
Tools" by Aho et al.
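
The immediate case of that transformation (A -> A a | b becomes A -> b A', A' -> a A' | e) fits in a few lines. This is a minimal Python sketch; the function name and the list-of-symbols grammar encoding are assumptions, and the book's general algorithm for indirect left recursion is not reproduced here:

```python
def eliminate_immediate_left_recursion(nt, rhss):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as
         A  -> b1 A' | ... | bn A'
         A' -> a1 A' | ... | am A' | e

    nt:   the nonterminal, e.g. "E"
    rhss: its right-hand sides as lists of symbols; [] stands for epsilon.
    Assumes there is no production of the form A -> A.
    """
    prime = nt + "'"
    alphas = [rhs[1:] for rhs in rhss if rhs[:1] == [nt]]  # A -> A alpha
    betas = [rhs for rhs in rhss if rhs[:1] != [nt]]       # all other rules
    if not alphas:
        return {nt: rhss}  # no immediate left recursion: leave it alone
    return {
        nt: [beta + [prime] for beta in betas],
        prime: [alpha + [prime] for alpha in alphas] + [[]],
    }


grammar = {
    "E": [["E", "+", "T"], ["E", "-", "T"], ["T"]],
    "T": [["T", "*", "F"], ["T", "/", "F"], ["F"]],
    "F": [["id"], ["number"], ["(", "E", ")"]],
}

new_grammar = {}
for nt, rhss in grammar.items():
    new_grammar.update(eliminate_immediate_left_recursion(nt, rhss))

for nt, rhss in new_grammar.items():
    print(nt, "->", " | ".join(" ".join(rhs) or "e" for rhs in rhss))
```

Run on the expression grammar, it prints the same E / E' / T / T' / F grammar given earlier in this comment.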

------
qznc
Precedence climbing solves these problems.

This article is from 2009. Here is the same blog in 2012:
[http://eli.thegreenplace.net/2012/08/02/parsing-expressions-...](http://eli.thegreenplace.net/2012/08/02/parsing-expressions-by-precedence-climbing)
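
For comparison, a minimal Python sketch of precedence climbing over the same four left-associative operators. The operator table, class name and tuple trees are assumptions; the linked post is the fuller write-up:

```python
import re

# Binary operators and their precedence (higher binds tighter).
# All four are left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


class PrecedenceClimber:
    def __init__(self, text):
        # Integers, the four operators and parentheses; other characters are skipped.
        self.tokens = re.findall(r"\d+|[-+*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return self.tokens[self.pos - 1]

    def parse(self):
        tree = self.parse_expr(1)
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def parse_atom(self):
        tok = self.advance()
        if tok == "(":
            inner = self.parse_expr(1)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)

    def parse_expr(self, min_prec):
        # Parse an atom, then keep absorbing operators whose precedence is at
        # least min_prec. One loop covers every precedence level.
        left = self.parse_atom()
        while self.peek() in PRECEDENCE and PRECEDENCE[self.peek()] >= min_prec:
            op = self.advance()
            # Left-associative: the right operand may only contain operators
            # that bind strictly tighter than op.
            right = self.parse_expr(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left


print(PrecedenceClimber("28 / 7 * 2").parse())  # ('*', ('/', 28, 7), 2)
```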

~~~
timtadh
Yeah, but no. That is no longer a recursive descent parser; it is a hand-written
shift-reduce parser. Nothing wrong with that, just a totally different parsing
technique. He also covers several other interesting techniques on his blog.
However, the "standard" solution is the one given in my comment above, as long
as you can transform your grammar into the correct form. Aho et al. give the
fully general algorithm for the conversion, as I mentioned.

~~~
qznc
OK, I agree that the "standard" approach is grammar transformation. I'd argue
that precedence climbing is "better", meaning it provides a bunch of benefits
and nearly no downsides in practice.

------
drallison
Rather than using recursive descent parsing, a better approach is to use
Vaughan Pratt's Top Down Operator Precedence algorithm
([http://en.wikipedia.org/wiki/Pratt_parser](http://en.wikipedia.org/wiki/Pratt_parser)).
It gives the simplicity and conceptual elegance of recursive descent while
avoiding the recursive plunge through one function per precedence level that a
complex expression grammar requires.
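
A minimal Python sketch of Pratt's scheme for the same operators. The binding-power numbers and class name are hypothetical, and the prefix minus is an addition to the thread's grammar, included only to show a token with both a prefix (nud) and an infix (led) meaning:

```python
import re

# Left binding power of each infix operator (higher binds tighter).
# The numbers are arbitrary; only their order matters.
LBP = {"+": 10, "-": 10, "*": 20, "/": 20}
UNARY_MINUS_BP = 30  # hypothetical: prefix minus binds tighter than any infix op


class PrattParser:
    def __init__(self, text):
        # Integers, the four operators and parentheses; other characters are skipped.
        self.tokens = re.findall(r"\d+|[-+*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return self.tokens[self.pos - 1]

    def parse(self):
        tree = self.expression(0)
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expression(self, rbp):
        # Core loop: the first token is handled by nud, then infix operators
        # are folded in while they bind tighter than rbp.
        left = self.nud(self.advance())
        while LBP.get(self.peek(), 0) > rbp:
            left = self.led(self.advance(), left)
        return left

    def nud(self, tok):
        # "Null denotation": tok starts an expression.
        if tok == "(":
            inner = self.expression(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok == "-":
            return ("neg", self.expression(UNARY_MINUS_BP))
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)

    def led(self, tok, left):
        # "Left denotation": tok is an infix operator following left.
        # Using LBP[tok] as the right binding power makes it left-associative.
        return (tok, left, self.expression(LBP[tok]))


print(PrattParser("28 / 7 * 2").parse())  # ('*', ('/', 28, 7), 2)
print(PrattParser("-2 * 3").parse())      # ('*', ('neg', 2), 3)
```

Parsing a lone number calls `expression` once; it recurses only when an operator actually appears, instead of passing through E, T and F for every operand.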

