Hacker News new | past | comments | ask | show | jobs | submit login
Top Down Operator Precedence (1973) (tdop.github.io)
41 points by carlsmith on May 25, 2015 | hide | past | favorite | 19 comments

The only copy I could find was a PDF that contained images of the original paper, literally printed on paper. It seemed worth redoing and putting online properly.

I've tried not to change anything beyond the typography, but it's tricky to get perfect. It's on GitHub, so feel free to chip in.

Can you link to the scanned PDF and your GitHub repo? I've noticed a few places with mismatched brackets, such as "(....]", and wanted to check to see if they were mismatched in the original or if that's just a transcription error.

edit: Found the original at http://hall.org.ua/halls/wizzard/pdf/vaughan.pratt.tdop.pdf and your repo at https://github.com/tdop/tdop.github.io. Am reading the paper now, it's not one that I'd read before so thanks for posting it, and will send a pull request with suggested fixes when I'm done.

There seem to be a number of places where you've changed hyphens to em-dashes. (E.g.: "This argument is independent of whether we specify program control explicitly, as in Algol—like languages, or implicitly, as in Planner—Conniver-like languages.") It's a bit confusing to read. Would you possibly mind fixing this? Thank you!

I collected up a number of nits like this, plus some more meaningful transcription errors, into a branch, and have sent a pull request: https://github.com/tdop/tdop.github.io/pull/2

I read through the whole thing, but there's something about the writing style which I think makes it rather difficult to follow; even the Dragon Book was more straightforward.

If I'm not mistaken, this article describes the same thing, and it was far easier to read:


It took me a number of readings before I figured it out, back in the day. I summarized my understanding in http://eli.thegreenplace.net/2010/01/02/top-down-operator-pr... -- hope it's helpful

The Pratt/TDOP scheme is a decentralized or OO way to organize the same idea. I coded them both, at https://github.com/darius/sketchbook/blob/master/parsing/pre... and https://github.com/darius/sketchbook/blob/master/parsing/pra...

(My Pratt is smaller than my precedence-climbing here because I ended up actually using the latter, so it got some extras added.)

Talking about the shunting yard algorithm, it's funny how often trains and railroads have inspired programming concepts. Parsing, concurrency and recently Scott Wlaschin describe monads through railways http://fsharpforfunandprofit.com/posts/recipe-part2/.

ps: while searching for Wlaschin talk I stumbled upon that https://www.google.com/search?q=monad+railway+equipment #rimshot

Worth noting is Doug Crockford's TDOP parser implementation in JavaScript, it's quite elegant: https://github.com/douglascrockford/TDOP/

As well as his accompanying talk, titled Syntaxation: https://www.youtube.com/watch?v=9e_oEE72d3U (sadly, the recording has gaps and is generally horrible; as far as I know he gave the talk only once).

Crockford's talk led me to the original paper and together they really expanded my horizon on how to make maintainable parsing engines. Thanks for putting in the effort and making it more accessible!

Surprisingly, when people can shoot movies with their iPhones, we have nothing to help here. I'd love to see something similar to super-resolution, merging multiples audio sources into a 'higher' res one. And temporal markov graph-cut, to fill gaps as much as possible.

Wrong post?

Just a strong digression. It's sadly common to have badly recorded talks which is surprising considering the era.

Yeah, it doesn't help that conference organizers tend to discourage the audience from doing their own recordings.

Why is that ? Copyrights or syndicated revenue ? Professional conference recording seems quite pricey[1]. Maybe there can be a middle ground solution.

[1] There's a crowdfunding to help pay for some lambda talks, it was around 15k. I don't know the details though, if it's for many days in many rooms

I love the idea of remastering old papers for the web. While SVG and MathJax allow to easily include diagrams and math, this work shows that you can get away with using only Unicode.

To make the most of the capabilities of HTML, I would have made citations hoverable, so a tooltip can show the reference next to it without the need to go to the bottom.

I have used TDOP and made variations of it before (e.g. combining TDOP and PEG, as well as introducing a form sub-grammars, like what you get with CFGs). I could also see the similarity between TDOP and left corner parsing, which gave me a better grounding on it.

Still never really felt like I deeply understood it. I think there was some intuitive leap I just wasn't making. A lot of it had to do with the choice of left/right binding powers, and loop that does the comparison and chooses whether to keep extending or to "dive down" into a new sub-tree.

Perhaps if there was a slightly different formulation, or something that was just a plain obvious description or methodology for getting the numbers right I would finally get it.

Why not hard-code the operator precedence into the BNF? Like the K&R book does it.

If you're using something that generates a parser for you—like bison or yacc—then sure, you can express your grammar like that, and there are advantages to doing so. On the other hand, if you don't have something like that available—say, you're working in a language where nobody has written a yacc clone yet, so there's no way for you to generate a parser from a BNF description—then implementing a Pratt parser is a lot easier than implementing a parser generator.

Even if you have alternatives, there are still situations in which you might want to use a Pratt parser. It's much easier to understand the internals of this kind of parser than understand the shift/reduce rules of an LL(k) grammar, and Pratt parsers have the nice property that you can easily modify them at runtime, making it easy to switch to alternate grammars in particular contexts—e.g., directly parsing SQL within particular delimiters is a pretty trivial task that is somewhat more difficult with a BNF-style grammar.

EDIT: Actually, a better final example would be user-defined operators with arbitrary precedence and branching. There is no way of expressing this directly in a BNF-style grammar, but it is trivial with a Pratt parser.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact