Technical Papers Every Programmer Should Read (At Least Twice) (fogus.me)
198 points by llambda on Dec 22, 2011 | 30 comments

Is there a list anywhere of which Computer Science (and related) textbooks are worth reading? (Perhaps based on which universities are actually using them?)

Personally, I find that the research literature is usually (intentionally or unintentionally) opaque and that most ideas worth learning are at least 5 years old, so textbooks are the best place to actually learn things from a source intended to teach you the material.

The list I could actually vouch for would be:

  Russell and Norvig
  Introduction to IR (Manning et al)
  Database Systems: The Complete Book
Others that I've heard good things about are:

  Dragon Book
  Convex Optimization (Boyd)
  Probabilistic Robotics
  Foundations of Statistical NLP
I'd really like a good distributed systems or distributed databases book (rather than reading old Lamport papers or something), as well as a good statistics/probability book, and I'm not particularly interested in books about coding/engineering style any more (e.g., Code Complete, Pragmatic Programmer, Matz's Ruby book, K&R).

There is SICP.

For distributed systems I'd read Lynch's book on Distributed Algorithms.

For stats, David Freedman's book, Statistics, is a good simple introduction. Someone mentioned Calculus and Statistics (http://www.amazon.com/Calculus-Statistics-Dover-Books-Mathem...) here on HN a couple of weeks ago. I had looked at it when it came out and re-reviewed it with that thread -- I like it a fair bit more than I remembered, and it's a better text for those looking for a more rigorous treatment than Freedman -- although it's probably still too simple if you're reading Foundations of Statistical NLP.

There's also Scott's book on Programming Languages, which is worth reading.

I've read quite a few comments on HN recommending Appel's Modern Compiler Implementation (ML, Java, and C versions exist) over the Dragon Book, although I haven't read either, so I can't vouch for it.

There's this list: http://www.reddit.com/r/compsci/comments/gprp0/is_there_a_li...

I would add Lisp In Small Pieces to the compilers section of that list.

Queinnec's Lisp in Small Pieces is a superb book; if you liked that, you might also like Henderson's book on the implementation of functional PLs (particularly SECD machines).

Regarding the list: I think it's a nice effort, but lacks several excellent and remarkable books. Just quickly browsing, I found the following essential books missing:

  * In algorithms: 
    - Aho, Hopcroft, Ullman: The Design and Analysis of Computer Algorithms.
    - Wirth: Algorithms and Data Structures.
  * In Compilers:
    - Wirth: Compiler Construction.
    - Grune, Bal, Jacobs, Langendoen: Modern Compiler Design.
    - Muchnick: Advanced Compiler Design and Implementation.
  * In Lambda calculus:
    - Barendregt: Introduction to Lambda Calculus.
  * In theoretical computer science:
    - Kozen: Automata and Computability.
    - Davis: Computability and Unsolvability.
  * In concepts of PLs:
    - Turbak, Gifford, Sheldon: Design Concepts in Programming Languages.
(In general, the list does not mention any of Wirth's books, which is a shame. The Project Oberon book should also be mentioned in OS stuff, I guess...)

Regarding compiler construction: As has been previously mentioned several times on HN, Cooper and Torczon's "Engineering a Compiler" is a more recent and (IMHO) much more readable and accessible choice.

I did not like the Java version one bit; I felt there's a significant amount of hand-waving in the book.

I am, however, a big fan of Matt Might's lectures: http://matt.might.net/teaching/compilers/spring-2011/ - very beautiful slides, very nice approach overall.

Lamport's papers are exceptionally readable, I should note. His discussion of, e.g., Paxos is still current and interesting, and it still has implications for distributed databases, even though that algorithm is too expensive for frequently changing data (on the order of hundreds or thousands of changes a second).
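To make the cost concrete: classic single-decree Paxos needs two phases per agreed value (prepare/promise, then accept/accepted), each a round trip involving the acceptors. A back-of-envelope sketch -- the counting model and function name here are illustrative, not Lamport's:

```python
# Rough sketch of why single-decree Paxos is costly for hot data:
# every agreed value needs two phases, and each phase sends one
# request to every acceptor and gets one reply back.
def paxos_messages_per_second(writes_per_sec, acceptors):
    # per phase: one request to each acceptor + one reply from each
    per_phase = acceptors + acceptors
    return writes_per_sec * 2 * per_phase
```

At 1000 writes/s over 5 acceptors that's 20,000 messages/s, before retries, leader churn, or duelling proposers -- hence the "too expensive" caveat above.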

The Dragon is mostly old and hoary by today's standards; we frequently use GLR instead of LALR(1), it has no discussion of interesting modern developments like packrat parsers, there is next to no discussion of any part of the compiler that isn't parsing, etc. A product of its time, but not an edifice for the ages in the way that SICP is.
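For the curious, a packrat parser is just recursive descent plus memoization on (rule, position), which gives linear time even with backtracking. A minimal sketch over a hypothetical toy grammar (expr -> term ('+' term)*; term -> digit+):

```python
from functools import lru_cache

# Minimal packrat parser: each rule is a memoized function of the
# input position, returning (value, next_pos) or None on failure.
def parse_expr(text):
    @lru_cache(maxsize=None)
    def term(pos):
        # match one or more digits
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        if end == pos:
            return None
        return int(text[pos:end]), end

    @lru_cache(maxsize=None)
    def expr(pos):
        result = term(pos)
        if result is None:
            return None
        total, pos = result
        while pos < len(text) and text[pos] == '+':
            nxt = term(pos + 1)
            if nxt is None:
                return None
            value, pos = nxt
            total += value
        return total, pos

    result = expr(0)
    return result[0] if result and result[1] == len(text) else None
```

The memoization is what turns plain recursive descent into a packrat parser; without it, backtracking grammars can go exponential.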

If you find a good stats/probability book, let me know; I've been looking for one for years and come up short. Russell and Norvig is ironically superior to most texts in the field.

You're wrong about _any_ edition of the Dragon.

The 0th edition (green dragon, Aho & Ullman) was mostly about parsing, but talked about everything. The 1st edition (red dragon, Aho, Sethi & Ullman) was about everything except JITs, vectorization, GCs and parallelization. The 2nd edition (purple dragon, Aho, Lam, Sethi & Ullman) is about everything, and they even dropped operator-precedence (Pratt) parsing to make room for more non-parsing stuff.
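For reference, the operator-precedence (Pratt) technique that got dropped fits in a few lines: binding powers drive the recursion instead of one grammar rule per precedence level. A toy evaluator for single-digit + and * (names are mine, not from any edition of the book):

```python
# Minimal Pratt (operator-precedence) parser/evaluator.
BP = {'+': 10, '*': 20}  # binding power: higher binds tighter

def evaluate(s):
    tokens = [int(c) if c.isdigit() else c for c in s]
    pos = 0

    def expr(min_bp):
        nonlocal pos
        left = tokens[pos]   # a number, in prefix ("nud") position
        pos += 1
        while pos < len(tokens) and BP[tokens[pos]] > min_bp:
            op = tokens[pos]
            pos += 1
            right = expr(BP[op])  # right operand binds at op's power
            left = left + right if op == '+' else left * right
        return left

    return expr(0)
```

Because the right operand is parsed with the operator's own binding power and the loop uses a strict comparison, operators of equal precedence group left-associatively, and * binds tighter than + without any extra grammar rules.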

there is next to no discussion of any part of the compiler that isn't parsing

Are you sure? There are 200 pages that cover lexing and parsing -- in a 950-page book.

If anything I'd argue that they've largely punted on parsing given that it's a lot less important than it used to be (which may explain why they didn't add GLR or packrat parsing) and focused on things that were considered more interesting issues in compilers.

I agree. I hate the Dragon book, and I particularly hate that it's the go-to book for learning compilers. It should not be. I recommend "Engineering a Compiler" by Cooper/Torczon, or "Modern Compiler Implementation in X" by Appel.

I'd recommend the following two (both free online):

The Elements of Statistical Learning: http://www-stat.stanford.edu/~tibs/ElemStatLearn/

The second, while focused on computer vision, has a great intro to probability and learning:

Computer Vision: Models, Learning, and Inference http://computervisionmodels.com/

Foundations of Statistical NLP is awesome.

Having some background in statistics, but none in either linguistics or NLP, that book was a revelation. If you read it and implemented all the exercises in that book, you'd find a way to make millions, as NLP is a big deal right now. I did find a little too much concentration on the low-level stuff (character parsing, bag of words, etc.), but in conjunction with Elements of Statistical Learning it's wonderful.
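For anyone unfamiliar, the "bag of words" step the book dwells on is just tokenize-and-count, discarding word order entirely. A minimal sketch:

```python
from collections import Counter

# Minimal bag-of-words featurizer: lowercase, split on whitespace,
# count occurrences. Real pipelines add stemming, stop words, etc.
def bag_of_words(text):
    return Counter(text.lower().split())

# bag_of_words("the cat sat on the mat")
# -> counts {'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1}
```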

Jurafsky and Martin's Speech and Language Processing complements ESL and Introduction to IR well by focusing on 'high-level' NLP, but its downside is that it focuses perhaps too much on linguistic terminology & theories instead of algorithms & implementations.

I second CLRS - I just finished an algorithms course taught from that book, and I can't imagine a single textbook being more comprehensive.

I'm surprised nobody's mentioned Knuth, though maybe that goes without saying?

As for probability and statistics, I haven't really found anything (at an advanced level) that I've been happy with. Maybe it's because my background is in statistics, so it's a perception bias (I see the flaws more easily than with other subjects), but I think that most statistics textbooks are pretty rotten.

There are really only two that I'd recommend, and only one at a high level. Gelman & Hill is a great introduction to computational statistics at a high level, while still very readable (and enjoyable!).

Other than that, the only truly stellar statistics textbook I've ever seen was the one I used in my intro class in high school. That's sad, but it's a true comment on the current state of most statistics textbooks (that I can find).

Grandparent comment is asking for textbooks. The Art of Computer Programming is not considered a textbook, it's a treatise.

> I'm surprised nobody's mentioned Knuth, though maybe that goes without saying?

Everyone's waiting for the complete book.

I'd add Types and Programming Languages to that list.

Sipser's text on the theory of computation is quite good.


I also like Papadimitriou's textbooks: Elements of the Theory of Computation and Computational Complexity.

From McCarthy's paper:

> When a free register is wanted, and there is none left on the free-storage list, a reclamation (7) cycle starts.

> (...)

> (7) We already called this process "garbage collection", but I guess I chickened out of using it in the paper -- or else the Research Laboratory of Electronics grammar ladies wouldn’t let me.
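McCarthy's "reclamation cycle" is what we now call mark-and-sweep: when the free list runs dry, mark every cell reachable from the roots, then sweep unmarked cells back onto the free list. A toy sketch (the Cell layout and names are hypothetical, not McCarthy's):

```python
# Toy mark-and-sweep collector in the spirit of McCarthy's
# "reclamation cycle" over cons cells.
class Cell:
    def __init__(self, car=None, cdr=None):
        self.car, self.cdr = car, cdr
        self.marked = False

def mark(cell):
    # recurse into car, iterate down cdr (classic list-walking trick)
    while isinstance(cell, Cell) and not cell.marked:
        cell.marked = True
        mark(cell.car)
        cell = cell.cdr

def collect(heap, roots):
    for root in roots:
        mark(root)
    free_list = [c for c in heap if not c.marked]
    for c in heap:            # reset marks for the next cycle
        c.marked = False
    return free_list
```

Any cell the roots can't reach ends up back on the free list, which is exactly the "reclamation" the quoted passage describes.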

Seconding a comment at the linked URL, but adding the URL for the lazy amongst us:

The Lambda Papers (Steele, Sussman): http://library.readscheme.org/page1.html

And btw, an interview of John Backus: http://archive.computerhistory.org/resources/access/text/Ora... discussing, amongst other things, the pros/cons of FP and his issues working out how to fit in IO; not mandatory but highly refreshing.

This was on HN about 3 months ago: http://news.ycombinator.com/item?id=2979458

Michael suggested it for over-break reading, so a retrospective seemed in order. :)

It needed to be on HN at least twice.

I was going to just suggest adding "End to End Arguments in System Design" (http://mit.edu/6.033/www/papers/endtoend.pdf) but then realized that the entire MIT 6.033 reading list is exceptionally worthwhile.


Twice? These days it takes reading them at least 5 times before I can claim to comprehend them.

Am I the only one who thinks it's really weird that we throw around these lists of "things programmers must read," but we don't ever throw around lists of "things programmers must program at least X times"?

I guess for programmers believing in code reuse X is either 0 or 1...

I've written essentially the same application several times in different languages, for different frameworks, or for substantially different platforms.

I'd also count a serious refactoring or structural improvement as having programmed something again.

I think this could be a really interesting discussion. My first reaction to georgieporgie's comment was the same as omaranto's. But in practice, like you say, we do end up doing a lot of code re-writing.

Suppose we are to explain omaranto's statement to someone who is not a programmer. My try is: given that (i) programs, whether in source or binary format, can be indefinitely replicated with virtually zero cost and (ii) we know how to write modular programs and (iii) there are plenty of useful modules available for re-use, we never have to substantially re-write anything from scratch.

But (iii) is arguably false due to a variety of reasons: intellectual property issues, incompatible platforms, "incompatible" programming languages, etc.

And (ii) is a big lie.
