
Top Down Operator Precedence - shawndumas
http://javascript.crockford.com/tdop/tdop.html
======
barrkel
This approach is highly compatible with recursive descent, and I'd recommend
its use in place of how recursive descent normally deals with arithmetic
expressions involving multiple levels of precedence because of how it
eliminates redundant recursion, but I wouldn't push it much past unary and
binary operators. I think the trivial translation from (non-left-recursive)
BNF to recursive descent code is valuable enough to outweigh any performance
gains from Pratt-style parsers in general coding. Perhaps the only exception
would be dynamic grammars, i.e. parsers which have to deal with a grammar that
changes dynamically during the parse - this is where a table-driven parser
comes in handy, as it's usually easier to generate or modify tables than code.

I don't think the performance advantage is very significant outside of
expression-style parsing either. To explain why, look at a simple grammar
(written in EBNF using some regex operators for succinctness and avoiding left
recursion):

    
    
        program ::= statement* ;
        statement ::= (expr | while-stmt | for-stmt | if-stmt | compound-stmt |) ';' ;
        while-stmt ::= 'while' '(' expr ')' statement ;
        for-stmt ::= 'for' '(' expr ';' expr ';' expr ')' statement ;
        if-stmt ::= 'if' '(' expr ')' statement [ 'else' statement ] ;
        compound-stmt ::= '{' statement* '}' ;
        expr ::= simple-expr (('<' | '>' | '==' | '!=') simple-expr)* ;
        simple-expr ::= term (('+' | '-') term)* ;
        term ::= signed-factor (('*' | '/') signed-factor)* ;
        signed-factor ::= ['+' | '-' | '!'] signed-factor | factor ;
        factor ::= <number> | <ident> [ '(' [expr (',' expr)*] ')' ] | '(' expr ')' ;
    

The grammar above is trivially convertible to recursive descent, with each
grammar rule becoming a single method or procedure.
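
To make that concrete, here is a rough Python sketch of the expression rules only, one method per rule. The tokenizer, the `Parser` class, the tuple-shaped tree and the `binary` helper are all my own scaffolding for illustration, not anything the grammar prescribes.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|==|!=|[-+*/<>!(),])")

def tokenize(text):
    """Split source text into a flat list of token strings."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.advance() != tok:
            raise SyntaxError(f"expected {tok!r}")

    def binary(self, operand, ops):
        # Shared shape of three rules: operand (op operand)*, left-associative.
        left = operand()
        while self.peek() in ops:
            op = self.advance()
            left = (op, left, operand())
        return left

    def expr(self):          # expr ::= simple-expr (relop simple-expr)*
        return self.binary(self.simple_expr, ("<", ">", "==", "!="))

    def simple_expr(self):   # simple-expr ::= term (('+' | '-') term)*
        return self.binary(self.term, ("+", "-"))

    def term(self):          # term ::= signed-factor (('*' | '/') signed-factor)*
        return self.binary(self.signed_factor, ("*", "/"))

    def signed_factor(self): # signed-factor ::= [sign] signed-factor | factor
        if self.peek() in ("+", "-", "!"):
            return (self.advance(), self.signed_factor())
        return self.factor()

    def factor(self):        # number, identifier, call, or parenthesised expr
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            inner = self.expr()
            self.expect(")")
            return inner
        if tok.isdigit():
            return int(tok)
        if tok[0].isalpha() or tok[0] == "_":
            if self.peek() != "(":
                return tok
            self.advance()   # consume '(' of a call
            args = []
            if self.peek() != ")":
                args.append(self.expr())
                while self.peek() == ",":
                    self.advance()
                    args.append(self.expr())
            self.expect(")")
            return ("call", tok, args)
        raise SyntaxError(f"unexpected token {tok!r}")
```

Calling `Parser("1 + 2 * 3").expr()` builds a nested tuple with the multiplication grouped under the addition. The sketch does not check for leftover tokens after the expression.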

This kind of grammar is typical for a programming language, particularly one
inheriting from Algol. You're usually in some kind of context: a declaration
context, a statement context, or an expression context. Within any given
context, there is usually a group of potential non-terminals available, but
each of those non-terminals begins with a single specific token - e.g. 'if',
'while', 'for' etc. in a statement context in C or Pascal. Visually speaking,
the graph for the root non-terminal for any given "context" (e.g. statement in
the above) looks broad and shallow.
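
As a small illustration of "broad and shallow", choosing the statement alternative takes a single token of lookahead. The table and function names below are hypothetical; only the rule names come from the grammar.

```python
# Leading token -> statement rule it selects (rule names follow the grammar).
STATEMENT_STARTS = {
    "while": "while-stmt",
    "for": "for-stmt",
    "if": "if-stmt",
    "{": "compound-stmt",
}

def select_statement_rule(token):
    """Pick the statement alternative from one token of lookahead."""
    if token == ";":
        return "empty"  # the empty alternative before ';'
    # Anything else has to start an expression in this grammar.
    return STATEMENT_STARTS.get(token, "expr")
```

One lookup, one call into the chosen rule: there is no chain of intermediate rules to descend through before the parser knows where it is.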

The counter-example is the equivalent graph for expression (expr in the
above). It is narrow and deep; this is necessary to encode operator precedence
in the natural translation to recursive descent. In order to parse 1 + 2, the
parser has to burrow through a whole raft of rules: first expr, then
simple-expr, term, signed-factor, factor; then all the way back out to
simple-expr, which parses the '+'; then all the way back down to factor again,
and back out. This is where Pratt parsers excel; they remove all these
redundant nested calls.
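
For comparison, a minimal Pratt-style sketch of the same expression levels (function calls left out for brevity). The binding-power numbers, the `PrattParser` class and its `calls` counter are my own choices for illustration; any numbers that preserve the ordering relational < additive < multiplicative < prefix would do.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|==|!=|[-+*/<>!()])")

def tokenize(text):
    """Split source text into a flat list of token strings."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

# Binding powers: a higher number binds tighter (values are arbitrary).
BINARY_POWER = {
    "<": 10, ">": 10, "==": 10, "!=": 10,
    "+": 20, "-": 20,
    "*": 30, "/": 30,
}
PREFIX_POWER = 40

class PrattParser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0
        self.calls = 0  # how many times parse_expression is entered

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expression(self, min_power=0):
        self.calls += 1
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        # Prefix position: operand, unary operator, or parenthesised group.
        if tok in ("+", "-", "!"):
            left = (tok, self.parse_expression(PREFIX_POWER))
        elif tok == "(":
            left = self.parse_expression(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok.isdigit():
            left = int(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            left = tok
        else:
            raise SyntaxError(f"unexpected token {tok!r}")
        # Infix position: keep consuming operators that bind tighter than
        # the caller's level; the strict '>' makes them left-associative.
        while BINARY_POWER.get(self.peek(), 0) > min_power:
            op = self.advance()
            right = self.parse_expression(BINARY_POWER[op])
            left = (op, left, right)
        return left
```

With this sketch, parsing `1 + 2` enters `parse_expression` twice (once per operand), where the rule-per-level version enters eight rules along the path described above. Precedence lives in the `BINARY_POWER` table, so adding an operator or a level is a table change, not a new method.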

