
Pratt Parsers: Expression Parsing Made Easy - jashkenas
http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/
======
eliben
Uhm, I wrote this a while ago: [http://eli.thegreenplace.net/2010/01/02/top-
down-operator-pr...](http://eli.thegreenplace.net/2010/01/02/top-down-
operator-precedence-parsing/)

[sorry for the plug, but I think my tutorial is more comprehensive in terms of
explaining how the thing actually works]

~~~
silentbicycle
I agree - I find your post is clearer than the original paper, Crockford's
explanation, and Lundh's Python post. It helps that you start with just
addition and multiplication, _then_ add more functionality after explaining
the basics.

------
scott_s
_I slapped together a crude lexer that works and we’ll just pretend that
tokens are raining down from heaven or something._

This perfectly describes all parsing discussions and papers I've read. I love
it.

~~~
ShirtlessRod
"I have a truly marvelous demonstration of this proposition which this margin
is too small to contain."

------
fholm
I love Pratt parsers (or rather, Top Down Operator Precedence parsers). I
recently re-wrote the IronJS lexer and parser by hand (replacing one
generated by ANTLR) using the techniques described in this blog post and saw
some pretty hefty performance improvements
[http://ironjs.wordpress.com/2011/03/19/new-lexer-and-
parser-...](http://ironjs.wordpress.com/2011/03/19/new-lexer-and-parser-in-
ironjs/)

Edit: I also have a generalized version that can take pretty much any input
and any output (F#) here: <https://github.com/fholm/Vaughan>

------
cletus
A post about... Programming? This isn't the HN I know!

Good post. I need to ruminate on this some more but it might well apply to
something I'm working on.

------
ms4720
how to do it in python: <http://effbot.org/zone/simple-top-down-parsing.htm>

~~~
beagle3
Unfortunately I can only upvote once. Way, way more complete than the linked
article.

~~~
ms4720
not a problem

------
ssp
I believe this algorithm is also known as "precedence climbing", and it's the
most common way to deal with operator precedence in a recursive-descent
parser.
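
A minimal sketch of precedence climbing in Python (my own illustration, not code from the article or the original paper); it evaluates as it parses:

```python
# Precedence-climbing over a token list: parse an atom, then greedily
# consume operators that bind at least as tightly as min_prec.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse_expr(tokens, pos=0, min_prec=1):
    lhs, pos = int(tokens[pos]), pos + 1
    while pos < len(tokens) and tokens[pos] in PREC and PREC[tokens[pos]] >= min_prec:
        op = tokens[pos]
        # the right operand is parsed recursively with a higher threshold,
        # so tighter-binding operators grab it first
        rhs, pos = parse_expr(tokens, pos + 1, PREC[op] + 1)
        lhs = {"+": lhs + rhs, "-": lhs - rhs,
               "*": lhs * rhs, "/": lhs // rhs}[op]
    return lhs, pos

print(parse_expr("1 + 2 * 3 - 4".split())[0])  # -> 3
```

Passing `PREC[op] + 1` as the recursive threshold makes same-precedence operators left-associative.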

~~~
barrkel
I'd say that having a single production for each precedence level is the more
common technique in recursive descent, but that can be slow because every
factor needs to be parsed by recursing through all the precedence levels.
This technique avoids that: it can parse factors straight away before going
into the precedence-checking loop.
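
For contrast, a sketch of the one-production-per-level style (grammar and names are my own illustration): note how parsing even a single number still descends through every level.

```python
# Classic layered recursive descent: one function per precedence level.
def parse_factor(tokens, pos):
    # factor := NUMBER
    return int(tokens[pos]), pos + 1

def parse_term(tokens, pos):
    # term := factor (('*' | '/') factor)*
    lhs, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "*/":
        op, (rhs, pos) = tokens[pos], parse_factor(tokens, pos + 1)
        lhs = lhs * rhs if op == "*" else lhs // rhs
    return lhs, pos

def parse_expr(tokens, pos=0):
    # expr := term (('+' | '-') term)*
    lhs, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "+-":
        op, (rhs, pos) = tokens[pos], parse_term(tokens, pos + 1)
        lhs = lhs + rhs if op == "+" else lhs - rhs
    return lhs, pos

print(parse_expr("1 + 2 * 3".split())[0])  # -> 7
```

With a dozen precedence levels instead of two, every factor pays a dozen function calls; that is the overhead the loop-based approach avoids.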

Delphi uses this parsing method for expressions, and it's one of the reasons
the compiler is so fast.

~~~
scottdw2
I don't actually know whether Delphi is fast or not; I'll take your word for it.

However, I'd be skeptical that the parser is one of the main reasons for its
fast compile times. In my experience, parsing is really a tiny amount of
compilation time. The VB.NET compiler, for example, uses the same parsing
technique, and it is ... "not fast" at compiling. It spends most of its time
binding symbols.

~~~
barrkel
Delphi's parser binds symbols and types the tree as it parses. The only thing
left to do after parsing is code generation, which is only a couple of passes
over the tree.

The parser (or more accurately, the lexer) is the bit of the compiler which
ultimately limits its performance, because it needs to see every character of
the source. The compiler can never be faster than linear in the length of its
input. Metrics on the Delphi compiler actually show that string hashing is the
hottest piece of code, and that's already optimized about as far as it can go.

~~~
scottdw2
Which means it's fast because of the way it binds symbols, not because of the
way it parses infix expressions. One thing to consider is that such a design
hurts the usability of the language: it's a lot easier not to have to worry
about forward declarations and "module" interfaces. I don't know Delphi, but I
imagine what you describe relies on something like the "unit" construct from
Turbo Pascal.

~~~
barrkel
There are many things that make it fast - or rather, it's the lack of slow
things that make it fast. Excessive recursion during parsing is one of those
things that is avoided (and that was what was on-topic here). Your focus on
binding is instead a focus on something that another compiler does
particularly slowly. That a different compiler is fast is not due to it being
fast at binding specifically; it's due to being fast, or at least not slow, at
almost everything.

Delphi does indeed inherit unit syntax from Turbo Pascal. This is both a
blessing and a curse; it makes writing well-structured programs easier by
enforcing a logical consistency in ordering (programs more or less have to be
written in a procedurally decomposed way), but it also makes writing programs
with complex interdependencies between parts more awkward. The unit format is
also one of the things that makes the compiler fast (or not slow); relinking a
binary unit based on a partial compile is very quick, and as it's not a dumb
C-style object format, the compiler can be intelligent about dependencies.
Every symbol has a version, a kind of hash, associated with it, computed from
its type, definition, etc. When units are relinked after a recompile of a
unit, only the symbols need be looked up and their versions compared; if there
is no version mismatch between imports and exports, then dependent units don't
need recompilation.
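
The symbol-version check described above can be sketched like this (a hypothetical illustration of the idea, not Delphi's actual implementation):

```python
# Each exported symbol carries a version hash computed from everything
# that affects its users; a dependent unit needs recompiling only if an
# imported symbol's version changed.
import hashlib

def symbol_version(name, type_sig, definition):
    data = f"{name}|{type_sig}|{definition}".encode()
    return hashlib.sha1(data).hexdigest()

def needs_recompile(imports, exports):
    """imports: {symbol: version recorded at the dependent's last compile}
       exports: {symbol: current version in the recompiled unit}"""
    return any(exports.get(sym) != ver for sym, ver in imports.items())

recorded = {"Max": symbol_version("Max", "(int,int)->int", "body v1")}
# A recompile that only adds an unrelated symbol: no version mismatch.
unchanged = dict(recorded, Min=symbol_version("Min", "(int,int)->int", "body v1"))
print(needs_recompile(recorded, unchanged))  # False
```

A recompile that changes `Max`'s body would produce a different hash, and the check would flag the dependent unit for recompilation.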

Random factoid: the guy who wrote the original source for the current Delphi
compiler was also one of the developers for the .NET GC, and CLR performance
architect (Peter Sollich).

------
statictype
I've also played around with combinator parsing using F#'s FParsec (a clone
of Haskell's Parsec library), and I found it very pleasant and powerful to
work with.
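
The combinator idea, sketched in Python rather than F# (a toy illustration, nothing like FParsec's actual API): a parser is just a function from input to a (value, remaining input) pair, and combinators build bigger parsers from smaller ones.

```python
# A parser is a function: str -> (value, rest) or None on failure.
def char(c):
    def parse(s):
        return (c, s[1:]) if s[:1] == c else None
    return parse

def number(s):
    i = 0
    while i < len(s) and s[i].isdigit():
        i += 1
    return (int(s[:i]), s[i:]) if i else None

def seq(p, q, combine):
    # run p, then q on the remainder, and combine their results
    def parse(s):
        r1 = p(s)
        if r1 is None:
            return None
        v1, rest = r1
        r2 = q(rest)
        if r2 is None:
            return None
        v2, rest = r2
        return combine(v1, v2), rest
    return parse

# "number '+' number" -> their sum
addition = seq(number, seq(char("+"), number, lambda _, n: n),
               lambda a, b: a + b)
print(addition("12+34"))  # -> (46, '')
```

Libraries like Parsec and FParsec wrap this pattern in nicer notation (custom operators, computation expressions) plus error reporting and backtracking control.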

------
amadiver
Excited to see how this is integrated with Magpie.

