Technical Papers Every Programmer Should Read at Least Twice (2011) (fogus.me)
366 points by altern8 on Dec 20, 2014 | 66 comments

There is very little all programmers should be required to have in common. The field is just that big now.

9 times out of 10 a list like this includes a treatise on floating point number representation, which, while useful, probably isn't of utmost importance in the 21st century, but hey, at one time folks thought that was required for 'all programmers' to read. At least this list does seem more up to date and relevant.

I just wish we'd stop with 'all X should' titles. It's demeaning and inaccurate.

The beauty behind fundamental programming language concepts—which make up the majority of this list—is that they apply to programming in general. They are all widely used abstractions, underpinnings for common technologies or powerful mental tools. Most importantly, they promote a sort of higher-order reasoning and mathematical thinking that's extremely powerful but rarely taught well.

Are these strictly required for anyone? I suppose not. But they would be useful for everyone, and we'd all be better off if everyone had a stronger understanding of these things.

Also, I think finding "should" demeaning is a bit much. "Should" means that the author thinks the benefits of knowing something outweigh the costs of learning it, nothing more. Pretty reasonable. Anything beyond that is reading too much into it.

> The beauty behind fundamental programming language concepts—which make up the majority of this list—is that they apply to programming in general.

I would disagree (with the extent of their applicability and general usefulness). Not that this isn't somewhat useful to many or most programmers -- you may certainly be right about that -- but there are other subjects (like data structures, for example) that are probably a lot more useful, and serve as a much more fundamental underpinning for common technologies[1]. So, yes, we'd all benefit if more programmers learned this stuff, but we'd benefit even more if they spent their time on even more important/general CS topics.

Because programmers spend much of their time in code they sometimes tend to place too much emphasis on code over working, well-behaving, performant running programs. Producing such programs requires a lot more than "the right" programming style: it requires a good understanding of hardware, of requirement analysis, of common failure modes, of algorithms and data structures. Programming style (or language) is just a part of what's required, and probably among the least important parts.

The widespread belief among PL people is that "good" programming languages (for your own definition/preference of "good") promote good working software, but not only has that never been proven, there isn't much evidence to suggest it's even true at all (now, I'm not saying it's not true; I'm just playing devil's advocate and pointing out that it isn't evidently true).

[1] Proof: binary trees (and other types of trees) are used in every programming language, while classes, objects, higher-order functions or monads are most certainly not; ergo: you can certainly program -- and produce good, working software -- without classes/objects/higher-order-functions/monads, but you can't produce good software (in general, that is) without knowledge of trees (and/or other data structures).

Actually I'd argue that there are quite a few things that programmers should have in common, if by programmers we're talking about serious engineers/computer scientists[1] and not "amateur-hour web designers"[2].

Knowledge of the fundamental concepts talked about on that list of papers, and the history therewith, is what allows us to go past the current level of engineering and actually reach greater heights. Most of the emphasis nowadays is spent on knowledge of specific toolchains, frameworks, etc, instead of actually learning WHY and HOW these things work.

From that basic foundation you can then go into any field and learn the semantics, details, and problem-specific techniques to deal with the problems presented. Without that foundation, we're all just floundering around, becoming proficient in using these tools without actually KNOWING how they work and therefore unable to take them to the next level.

[1] - A conversation with Alan Kay https://queue.acm.org/detail.cfm?id=1039523 [2] - JavaScript isn't Scheme http://journal.stuffwithstuff.com/2013/07/18/javascript-isnt...

"not "amateur-hour web designers"

Perhaps you're confused. Most people who refer to themselves as "web designers" aren't, and aren't trying to be, computer scientists or engineers. Many of them have a graphic design education, or have taken inspiration from that tradition. Some of the more technically-minded of them can do basic coding, but most of them stop at HTML and CSS. But they're better than I am at UI/UX and visual design, because that's what they do. They're not meant to be computer scientists.

If you mean web-application developers, however, I'd urge you to take a look at some of the stuff that is being done on the front-end these days. (not to mention the fact that this blog post was written by a Javascript expert)

Yes, there are some people who have drifted into their jobs and are little more than cargo cultists, although some of the more talented and curious of them do make the upgrade to serious professionals. But by far, most of the people I've worked with recently on front-end jobs have had a rigorous computer science education, with an excellent knowledge of data structures, algorithms, software engineering, and computer architecture. The fact that they're working on the front-end, in Javascript, is incidental. In their spare time, and if they're lucky, on some in-house stuff, they might prefer working in Haskell, or maybe Clojure. But much like C in past decades, the web is ubiquitous, and any serious developer of this era must know how to work in it. We don't all have the privilege of getting paid to write Scheme.

But I do share your opinion that there are certain foundational concepts and knowledge that all professional programmers should have in common, including some of the papers referenced.

Perhaps the comment came out wrong, let me try to be a little clearer.

I'm not trying to create separate categories between front-end, back-end, desktop, CLI, and systems engineers. The distinction I am trying to make though is that yes, while Computing has become a vast field, there simply are some basic fundamental skills that are absolutely required if we want to go beyond our current level of achievement.

These are things that you have mentioned: deep knowledge of algorithms, data structures, software engineering, computer architecture, etc. This is absolutely the MINIMUM requirement. Without understanding these things, we will stay at this present level of software engineering forever. Sure, we will have mastered the tools, and the current programming paradigms that these tools teach us, but we will not be able to advance.

Whether one programs in JavaScript, Forth, Common Lisp, or even BASIC isn't the issue. The point that Fogus's post is trying to get across, or at least what I have taken away from it, is that most "serious" programmers are incredibly lacking in what is considered basic foundational knowledge. What field one specializes in is irrelevant, there is just some stuff that everyone has to understand, not necessarily in the way that a specialist in the field does, but at least have more than passing, cursory knowledge of it.

Fair enough. We are on the same page, after all.

I started programming seventeen years ago as an "amateur-hour web designer". This sort of attitude is not productive or helpful to our rapidly-growing industry.

Would you still consider yourself an "amateur-hour web designer"? If not, why did you improve? Should we not aspire to be better?

I've learnt from various sources, but none of them were papers. There is no one-size-fits-all teaching or learning method. And rudeness and insults are going to make people less likely to learn, not more.

Is that not because, at the time, you were much more of a beginner than you are now?

I've been studying Computer Science for the last four years (BSc and now MSc) but was coding for maybe three years before that.

Back then I was learning JavaScript and PHP from w3schools, PHP.net and some blogs. I built a framework from reading the source code for CodeIgniter. I learned a lot about object-oriented design, or the lack thereof, in that project.

By the time I finished 2nd year of Comp Sci I had a pretty good understanding of algorithms, design patterns, some theory (Petri-nets, state machines, etc..), databases, etc. Going into my MSc, there is quite a lot that you can't learn without reading papers. For example, chances are you are going to have no idea how to solve the consensus problem in a distributed system without reading Paxos.

I tried to read the Paxos paper and gave up. The Raft paper I probably could've understood, but I found it easier to understand by looking at an implementation and a less formal description of it. I don't think it's about what you know (plenty of papers have few prerequisites and are conceptually quite simple).

I've been writing software for 20 years, and had I learned the concepts brought forth in these papers years ago, I would have been much better. Hacking is not writing software.

Floating point behavior still represents a significant portion of new questions asked on stack overflow. I'd say it's still very relevant.

Considering JavaScript makes it difficult to avoid, that's not surprising... And since js is probably one of the common denominators of much activity on the fringes of programming it seems like floating point should be widely discussed.
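Since JavaScript keeps coming up: the classic surprise is reproducible in any JS engine, and a rough epsilon comparison (sketched here with `Number.EPSILON` as the tolerance, which is only reasonable for values near 1) is the usual workaround:

```javascript
// IEEE 754 doubles cannot represent 0.1 or 0.2 exactly,
// so their sum picks up rounding error.
const sum = 0.1 + 0.2;
console.log(sum);         // 0.30000000000000004
console.log(sum === 0.3); // false

// A tolerance-based comparison instead of exact equality.
// Using Number.EPSILON as the bound is a simplification that
// only makes sense for values of roughly unit magnitude.
const nearlyEqual = (a, b, eps = Number.EPSILON) => Math.abs(a - b) < eps;
console.log(nearlyEqual(sum, 0.3)); // true
```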

Unrelated to numeric computing, my favourite thing would be Butler Lampson's Hints for Computer System Design.

I've worked with code in the past where the previous developers didn't understand that there was a difference between an integer in a string (i.e. var x = '10';) and an integer (i.e. var x = 10;). In some ways weak typing and automatic type conversion are quite dangerous, because they can encourage beginners to adopt bad practice and be none the wiser.

Edit: Changed 'In some ways duck typing' to 'In some ways weak typing' as rightly pointed out below.
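The trap is easy to demonstrate: the two values behave identically under some operators and diverge under others, so the bug can hide for a long time.

```javascript
// A numeric string and a number look interchangeable until '+' is involved:
var a = '10'; // string
var b = 10;   // number

console.log(a + b);   // '1010'  ('+' concatenates when either side is a string)
console.log(a - b);   // 0       ('-' coerces both sides to numbers)
console.log(a == b);  // true    (loose equality coerces before comparing)
console.log(a === b); // false   (strict equality does not coerce)

// Converting explicitly at the boundary removes the ambiguity:
var n = Number(a);
console.log(n + b);   // 20
```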

> duck typing and automatic type conversion

It's weak typing (or equivalently "automatic type conversion", as you say) that is to blame. "Duck typing" - also "structural polymorphism", "subtyping polymorphism", known in OCaml as "polymorphic variants" and adopted as the default by Go's interface semantics - would only enable "'10' + 10" if the String type had a method "+(x:Number)", which most likely is not there, or if it is, it's explicitly laid out.

It's weak typing - i.e. the language trying to coerce elements of expressions to the types which make the whole expressions make sense, with implicit rules that no one ever reads - that's dangerous.

Sorry for being this pedantic, but while I'd like to see less weak typing in languages (outside of some specific domains, like AWK), I'd also like to see more "duck typing", because it's more convenient than nominative polymorphism and exactly as safe.
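To make the distinction concrete, here's a minimal duck-typing sketch (the `quack` names are illustrative, not from any library): the operation only succeeds when the value actually provides it, and fails loudly otherwise — nothing is silently coerced.

```javascript
// Duck typing: any value with a quack() method is accepted;
// a value without one fails loudly instead of being coerced.
function makeItQuack(duck) {
  if (typeof duck.quack !== 'function') {
    throw new TypeError('value does not provide quack()');
  }
  return duck.quack();
}

const mallard = { quack: () => 'quack!' };
const robot   = { quack: () => 'beep beep' };

console.log(makeItQuack(mallard)); // 'quack!'
console.log(makeItQuack(robot));   // 'beep beep'
// makeItQuack({}) throws a TypeError, rather than producing a nonsense
// result the way weak coercion ('10' + 10 === '1010') silently does.
```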

Sorry, you're absolutely right! I had confused the two when writing my post.

> There is very little all programmers should be required to have in common.

We don't need to say "all must know X things" in order to convey a notion of a well-rounded programmer. Of course we don't have to agree. Still, generally speaking, I think the bare minimum is this: the ability to solve your problems with the languages and tools you know in a reasonable amount of time.

But that doesn't say much, so any list needs to get more specific:

* ability to use your tools
* ability to decompose a problem
* ability to debug a problem
* ability to communicate intent via code
* ability to communicate intent via language

Have programmers stopped representing monetary values using floating point yet?
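For anyone wondering why that question keeps getting asked: summing prices as floating-point dollars drifts, while integer cents stay exact within `Number`'s 53-bit safe-integer range. A quick sketch:

```javascript
// Summing prices as floating-point dollars accumulates rounding error:
const prices = [0.10, 0.20, 0.30];
const floatTotal = prices.reduce((s, p) => s + p, 0);
console.log(floatTotal);          // 0.6000000000000001
console.log(floatTotal === 0.60); // false

// Keeping amounts in integer cents is exact (up to Number.MAX_SAFE_INTEGER):
const cents = [10, 20, 30];
const centTotal = cents.reduce((s, p) => s + p, 0);
console.log(centTotal === 60);            // true
console.log((centTotal / 100).toFixed(2)); // '0.60' -- convert for display only
```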

Really the only thing "every programmer" should know is how to efficiently look up information relevant to whatever their current goals are.

"X all Y should" is a clickbait title prohibited by HN.

A few years ago when there was a wave of books titled "1000 ____s to ____ Before You Die," I really wished I'd been in a position to get a book published called "Never Mind The 1000 Things, Just Die."

"There is very little all programmers should be required to have in common"

Especially the closer one gets to domain specific things, yes. But I think it is beneficial to all who develop software to realize from time to time how deep the rabbit hole of implementation specific complexities goes, mainly so they understand how to design their systems without stepping in any "obvious" implementation specific landmines.

But in general I think your post validates the claim, since I really need to be aware of all the traps in floating point number representation to do my programming job properly :)

The guys at Sony might disagree with you. How to secure a network or a mainframe was known in the 1970s, but who needs to know all that legacy systems crap, right? OO look new JS framework!!

I think the Sony issues illustrate platz's point "There is very little all programmers should be required to have in common. The field is just that big now." In a company like Sony you need some guys that specialise in security and prevention of hacking, and some that specialise in other stuff.

Umm, no, not really, it has to be baked into everything you do, even the most junior programmer working on a website has to know e.g. about SQL injection.
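The point about SQL injection is worth spelling out: it's entirely about keeping code and data separate. The sketch below shows the vulnerable concatenation versus a placeholder; `db.query` here is a hypothetical stand-in for whatever parameterized API your driver provides (pg, mysql2, and sqlite3 all accept an (sql, params) form of some kind).

```javascript
// A lookup where the user controls `name`:
const name = "x'; DROP TABLE users; --";

// Vulnerable: concatenation splices attacker input into the SQL itself.
const unsafe = "SELECT * FROM users WHERE name = '" + name + "'";
console.log(unsafe); // the attacker's DROP TABLE is now part of the statement

// Safer pattern: a placeholder keeps the input as data, not SQL.
const sql = 'SELECT * FROM users WHERE name = ?';
const params = [name];
// db.query(sql, params); // hypothetical driver call: `name` is sent as a
//                        // bound value and is never parsed as SQL
```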

No, you can't just have "some guys that specialize in security". That's exactly why 99%+ of software is full of obvious security holes. 99%+ of developers don't know anything about, or care anything about, security. Security is a process, not a product, remember? Everyone has to be actively part of it all the time. You can't churn out buggy, exploitable software and then expect some "security experts" to somehow magically make it secure.

I see your point but if 99%+ of developers are rubbish at security what are you going to do about it?

a) Hope they suddenly improve, which is probably not going to happen

b) Accept they have limits and encourage them to use systems and frameworks that are hard to screw up on, written by people who are good at security?

I note in the Sony hack the only computers that survived basically intact were Macs, not I guess because their owners understood security but because they were well designed and idiot friendly.

They don't have to suddenly improve, they can learn and improve at a normal pace. Yes, of course they should not use PHP. But simply having them use languages, libraries and frameworks that are written by people who get security won't stop them from writing insecure applications to put into production. They still need to learn to write secure code too.

>Yes, of course they should not use PHP.

Wow, this is utterly worthless, if not harmful, advice. There is no correlation between choosing PHP for an application with the security of said application. PHP apps are so widely deployed that securing them is a pretty well-known process at this point.

Yeah, the constant stream of security holes in PHP because the developers are so bad they actually chased away the only person they had who cared about security is nothing to worry about. You can just dip your app in some magic security sauce and everything will be fine.

I think Communicating Sequential Processes [0] by Hoare is another landmark paper that should be on this list for its perspective on organizing concurrent processes. This was actually required reading for the concurrency section of my undergrad operating systems course.

[0]: https://www.cs.cmu.edu/~crary/819-f09/Hoare78.pdf

"The practice of supplying proofs for nontrivial programs will not become widespread until considerably more powerful proof techniques become available, and even then will not be easy. But the practical advantages of program proving will eventually outweigh the difficulties, in view of the increasing costs of programming error."

The Hoare paper actually linked is hopelessly outdated, despite being an interesting historical artifact.

I might call it "sadly" outdated... not that I'm a fan of provably-correct programming (I've never done it, I just fix systems and networking software), but the seemingly modern comfort with bugginess and sloppy implementation... funny or sad, who knows.

It's interesting to read the HN comments on this post now and what others have said previously.

https://news.ycombinator.com/item?id=3382962
https://news.ycombinator.com/item?id=2979458

Wow. That really is a quite revealing comparison.

Part of it might be that this was dropped on a Friday night, US time.

There is also the fact that the first condensation seeds of discussions on HN are largely determined by chance.

Eight out of ten are about programming languages, and strong on the functional side to boot. It's not that these topics aren't important, or that they're not great papers, but isn't that a bit too heavily skewed toward one area? Shouldn't at least one of those top ten be more directly about security, or performance, or some other kind of idea rather than the notation we use to express ideas?

Yeah, I know, make your own list. Maybe I will. Nonetheless, the author specifically mentions "cover a wide-range of topics" as a goal and this list fails to meet that goal.

Yes, there is an emphasis (and bias!) towards functional languages and distributed systems. Yes, Fogus' intent is to raise awareness of these concepts. Yes, other people would have different lists. Yes, the list could have more breadth.

Personally, I'd like to see a concept map of essential computer science topics combined with some of the top papers and books that cover each. It could be implemented as a curated, collapsible directed graph.

> Personally, I'd like to see a concept map of essential computer science topics combined with some of the top papers and books that cover each. It could be implemented as a curated collapsable directed graph.

I've tried building one of these before, both for CS and for Maths. Funnily enough formatting, displaying and making it accessible (whilst retaining enough data) was the time-consuming issue, despite all the tech around for handling graphs. :/

Most of us program, so programming languages is actually a fairly general supporting field for the "computer science" community when other active fields tend to be more specialized. Other general fields include systems and algorithms, which seem to be represented somewhat. In my opinion, none of these papers are must reads, but that is just an opinion :)

Disclaimer, I'm a PL researcher, but I work in a systems group.

Good catch.

These are definitely interesting papers, especially to someone like me who has a much stronger interest in programming languages than other (admittedly essential) topics like performance and similar, but this selection is a revealing demonstration that a lot of people who are heavy into functional programming tend to think that FP is the silver bullet of software engineering and that beyond it, there's nothing really interesting worth learning (especially not OOP or other non-FP methodologies).

Perhaps these 10 are among the greater set of papers that every programmer should read. I will shamelessly plug the IRLS paper on l0 optimization: http://onlinelibrary.wiley.com/doi/10.1002/cpa.20303/abstrac...

It's weird that "What every programmer should know about memory" isn't on here. Even for languages that manage memory for you, understanding the hard limitations and basic operations used to access and manipulate memory is certainly useful.

This is an excellent read, but I disagree with the title. It's a must-read for people interested in CPU architecture and for programmers who, after profiling, still consider doing micro-optimizations. E.g. if you seriously think about reordering your struct members for faster access. If you're building a web service, skip it.

I think it is a useful read for people doing any kind of programming at all, in that it might discourage them from attempting optimizations that seem obviously good, and might have _been_ good, if they were running on 20 year-old architectures, but these days just make the code more convoluted to no gain.

I know that this is the kind of effect the paper had on me (I spent a lot of time carefully reading it shortly after it was published), and now I almost never use what knowledge I gained from it to make performance optimizations, but it does frequently discourage me from attempting optimizations that I now know would probably not have the desired effect.

I would recommend this one for historical purposes: http://insecure.org/stf/smashstack.html

This is an excellent paper, but it does require a prior understanding of the stack, addressing, buffers, pointers, etc. It would be really nice if there were an article or blog post designed for people who want to read this paper but don't have the prior knowledge that's required to understand it. Unfortunately I don't know of any. HN?

Why this Hoare paper and this Lamport paper? A list of ten is a bit long considering how much background material is required reading for every single entry.

Any list without Shannon's 1948 "A Mathematical Theory of Communication" is just not a good list. Sorry.


The foundation of information theory. It is, by far, the most astonishing paper I have ever read. Far more astonishing than Lamport's famous conclusion about clocks. It is the kind of paper that causes a soul rift when read thoroughly.

I’m partial to his Master’s thesis, “A Symbolic Analysis of Relay and Switching Circuits.” It literally invented digital computing. As an electrical engineering student it blows me away that one guy’s thesis (he was 21!) can be fundamental to so much.

Most seem to be available here - https://github.com/papers-we-love/papers-we-love

I love that even HN has listicles, and I would rate this one as at least on par with http://www.lifehack.org/articles/lifestyle/21-things-you-are...

Can some please do a similar list for machine learning and also for maths relevant to ML?

Machine Learning is a fast-moving field, and many papers are incremental improvements, so an always up-to-date list is unlikely. Nonetheless, there are quite a few lists out there if you look. Here are some I've seen posted recently:





Probably the most famous paper is Breimen's "The Two Cultures". The rejoinders from Cox and Hoadley add two additional perspectives from long careers.

Breiman, btw. Link to paper: http://projecteuclid.org/euclid.ss/1009213726

The OP already did this for ML too. Read the title again, "every programmer", don't you think people doing ML are programmers? :-D

I had a dream about one of these papers tonight (seriously), so I come to HN and find this post. Pretty amazing coincidence. :)

Has anyone else explored the rest of his site? Good posts, workable, no fuss design. All good work here. Keep it up!

fogus is a well known HN contributor.


Agree with platz

#oly $#it! I have read these papers, all of'm! WHAT! Haha that is so accurate LOL!

Great stuff! But there are way more important whitepapers to be honest. I can't really think of the others right now, but if you go to the Digital library from ACM / IEEE, you can find really good stuff.
