Papers every developer should read at least twice (2009) (silvrback.com)
623 points by teleforce on July 20, 2021 | 95 comments

While this list looks solid, I think it's telling that none of the papers that immediately came to my mind were on this list.

I suspect that there are many more, and which papers are important to any one person is as varied as the disciplines we have within software engineering.

- Out of the Tar Pit 2005, a paper on Functional Reactive Programming that is an excellent read for anyone doing functional programming, UI programming, or a number of other things.

- Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web 1997, important read for anyone working in distributed systems or services with any sort of scale.

- Roy Fielding's dissertation 2000, to learn just how widely applicable REST principles are and how misunderstood it is as a design.

- The Part Time Parliament 1998, the original Paxos paper, important basis for anyone working with distributed systems.

- The Cathedral and the Bazaar 1997, an essay not a paper, but a good background to the open source world.
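For anyone who hasn't met the consistent hashing idea from the list above, here is a minimal sketch of a hash ring in Python. It is not the construction from the paper itself, just the commonly used "ring with virtual nodes" variant; the class name `HashRing`, the helper `ring_position`, the `replicas` count, and the `cache-*` node names are all made up for illustration.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    def _node_points(self, node):
        return [ring_position(f"{node}#{i}") for i in range(self.replicas)]

    def add(self, node):
        for point in self._node_points(node):
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        for point in self._node_points(node):
            del self._owner[point]
            self._points.remove(point)

    def lookup(self, key):
        """Return the node owning the first ring point after the key."""
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_right(self._points, ring_position(key))
        return self._owner[self._points[idx % len(self._points)]]


keys = [f"page-{i}" for i in range(1000)]
ring = HashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.lookup(k) for k in keys}

ring.add("cache-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The property worth noticing is that every key in `moved` now belongs to the new node; keys never shuffle between the existing nodes, which is what makes the scheme attractive for caches.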

Out of the tar pit is the one I came in here to see.

I think the implications of that paper double every time I read it. Last time I walked away from that paper I decided I wanted to write a pure SQL logic evaluation system and now we're actually using it in our product to expose customization opportunities.

I almost worry what another read through might reveal. The last one was a really painful realization about wasted human capital.

Edit: I just realized the parent posted this as:

> a paper on Functional Reactive Programming

It is incredibly important to note that this is incorrect. It should be:

> a paper on Functional Relational Programming

It actually doesn't have much to do with the "reactive" programming model as we know it today, and is more about managing complexity.

The paper explicitly states that it’s about Functional Relational Programming, so that’s verifiably correct. I’m not sure if I can agree, though.

> It actually doesn't have much to do with the "reactive" programming model as we know it today

What would you say is the ”reactive” model as we know it today?

If you asked me what FRP is, I’d tell you about Functional Reactive Programming. Maybe I’d point you to Conal’s writings[0] or something like Your Mouse is a Database[1]. Andre Staltz did a nice writeup on how it’s useful to think of FRP in broader terms than the original definition would require[2]:

> it would make sense to talk about "functional reactive programming" as a paradigm or an idea where you build applications using listenable event streams (or "signals"), creating and composing them using pure functions

For me, that’s the spirit of FRP. And when I read Out of the Tar Pit, that’s exactly what jumps out at me. The paper seems to describe data that’s changing over time, and other data being derived from it such that when the original data changes, so does the derived data. That’s pretty much precisely Functional Reactive Programming to me, even if the words are different.
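To make "derived data that changes when the original data changes" concrete, here is a toy sketch in Python. It is an assumption-laden illustration of the loose sense described above, not the paper's design and not any particular FRP library; `Signal` and `derive` are hypothetical names.

```python
class Signal:
    """A value that changes over time and notifies its dependents."""

    def __init__(self, value):
        self._value = value
        self._listeners = []

    @property
    def value(self):
        return self._value

    def set(self, value):
        self._value = value
        for listener in self._listeners:
            listener()

    def subscribe(self, listener):
        self._listeners.append(listener)


def derive(fn, *sources):
    """Build a signal whose value is a pure function of other signals."""
    out = Signal(fn(*(s.value for s in sources)))

    def recompute():
        out.set(fn(*(s.value for s in sources)))

    for source in sources:
        source.subscribe(recompute)
    return out


price = Signal(10.0)
quantity = Signal(3)
total = derive(lambda p, q: p * q, price, quantity)

quantity.set(5)  # total is recomputed without anyone assigning to it
```

Nothing ever assigns to `total` directly; it is only ever a function of `price` and `quantity`, which is the part that seems to overlap with the paper's treatment of derived state.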

You might have another sense of the words in mind; I’d like to understand what you think, and why.

  [0]: http://conal.net/papers/push-pull-frp/  
  [1]: https://queue.acm.org/detail.cfm?id=2169076  
  [2]: https://medium.com/@andrestaltz/why-i-cannot-say-frp-but-i-just-did-d5ffaa23973b

Fielding dissertation for those interested: https://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm

This is probably one of the most practical ones to read. REST is so much more than "/noun/id" routes.

I'll confess that I've never been able to make heads or tails of the Fielding paper. That is, it's not that I don't think I understand what it's saying, it's that I don't relate to it; it doesn't really seem to be describing solutions to problems that have bugged me during my career.

So I'll ask you and others here: what do you think are the most important and applicable insights from that paper?

> So I'll ask you and others here: what do you think are the most important and applicable insights from that paper?

* Statelessness for horizontal scaling

* Idempotency to handle network partitions

* Request state must be inline or addressable by URL

* Hypertext As The Engine Of Application State

* Media type negotiation

* How to exploit pervasive caching

I think having these formalized in a coherent framework was very valuable, and I don't see how this couldn't help but influence your career, unless all you write are desktop applications.

This is a great list and the conclusion is very fair. I think maybe what it is is that both these concepts and various limitations and tradeoffs had already been broadly socialized by the time I came into the industry. So it's not that I don't see the paper's influence, it's more that I was never able to find anything to latch onto from the primary source which I hadn't already internalized based on its synthesis in various tools and techniques that I learned in different ways.

Honestly, just HATEOAS. If you can change your client’s behavior without actually re-deploying it, you’re doing yourself and your team a huge favor. Even just backwards compatible server changes are worth it.
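A rough sketch of what that buys you, in Python with an in-memory stand-in for the server. Everything here is hypothetical (the `make_server` and `follow` helpers, the link relation names, the paths); the only point is that the client knows the entry URL and the relation names, never the concrete URLs.

```python
def make_server(checkout_path):
    """Hypothetical server: each URL maps to a representation with links."""
    return {
        "/": {"links": {"cart": "/cart"}},
        "/cart": {"items": ["book"], "links": {"checkout": checkout_path}},
        checkout_path: {"status": "order placed", "links": {}},
    }


def follow(server, rels, start="/"):
    """Client: start at the entry point and follow links by relation name."""
    doc = server[start]
    for rel in rels:
        doc = server[doc["links"][rel]]
    return doc


server_v1 = make_server("/checkout")
server_v2 = make_server("/v2/orders")  # server moved the checkout resource

result_v1 = follow(server_v1, ["cart", "checkout"])
result_v2 = follow(server_v2, ["cart", "checkout"])
```

The same client code works against both versions because the server, not the client, decides where "checkout" lives.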

Useful heuristics. Mooted by coworkers' incomprehension, poor craftsmanship, and belligerent disregard for received wisdom.

In other words, all HTTP APIs devolve into screen scraping, which is actively thwarted by terrible libraries. So just give up any expectations and do whatever scores the most agile scrum kanban karma on your team.

I'll confess that, like the Fielding dissertation, I also can't make heads or tails of this comment.

Too cynical?

Distrust anyone who claims they do understand. Because they don't.

Our study group chewed thru REST in Practice. It's very reasonable, approachable. Good advice for a lot of design choices that you'll likely run into.

Sadly, REST is like Agile. It's not possible for any two people to reach consensus on any aspect, big or small.

And just like Agile, REST doesn't matter. Whatever you do will be wrong.

So just smile and play along.

Ok this I understand, and agree with, to a decent degree.

Is the following at all reasonable?

REST is for when you are building an API for broad “public” consumption (where “public” might mean public within your company). But many teams are actually building APIs for a very narrow audience; often just their colleagues’ JavaScript! In that situation it’s often fine to build tailored endpoints. Furthermore, the fact that we see RPC being used by large, well-respected tech companies is additional backing for this view.

If you are into Out of the Tar Pit, check out M36, a relational DB written in Haskell that aims to live up to the paper:


You could probably get away with SQLite too. This seems to work really well for our use cases - We integrate SQLite w/ C# and do have many functional aspects involved. C#8 is great about opt-in functional paradigms.

If you pick SQLite you can also lean on their exotic-tier test coverage and the fact that you can author application defined functions for utilization from SQL (i.e. your DSL in this usage context).
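For the curious, here is roughly what an application-defined function looks like. The comment above is about C#, so treat this as a sketch of the same idea using Python's standard `sqlite3` module instead; the `discount_tier` rule, the `orders` table, and the sample rows are invented for the example.

```python
import sqlite3


def discount_tier(total):
    """Hypothetical business rule exposed to SQL as a scalar function."""
    if total >= 1000:
        return "gold"
    if total >= 100:
        return "silver"
    return "standard"


conn = sqlite3.connect(":memory:")
# Register the Python function under a SQL name, taking one argument.
conn.create_function("discount_tier", 1, discount_tier)

conn.execute("CREATE TABLE orders (customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ada", 1500.0), ("bob", 250.0), ("cy", 20.0)],
)

rows = conn.execute(
    "SELECT customer, discount_tier(total) FROM orders ORDER BY customer"
).fetchall()
```

Once registered, the function can be used anywhere a built-in scalar function could be, so the customization logic lives in plain SQL text while the vocabulary it uses is supplied by the host application.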

The Paxos paper...I mean it seems funny, especially so to the author's academic audience who get all the references? From the distributed systems class I took, the main takeaway was Paxos seems inaccessible mainly due to the byzantine writing style. We also read Paxos Made Simple and Paxos Made Live, and when it came time to actually implement anything, nobody used the original paper at all.

So yeah, the original paper might be fun or funny, but not the easiest for understanding the basis for consensus protocols.

I've seen that paper referenced multiple times in the vein of "Metaphors might seem nice, but never ever do what this paper did because it renders the subject matter practically inscrutable."

Came here to add Out of the Tar Pit. Here’s my clipboard: http://curtclifton.net/papers/MoseleyMarks06a.pdf

Yes! I’m (yet another) person expecting to have seen Out of the Tar Pit on the list.

Paxos paper was one of my favorite reads back in grad school. Would highly recommend

Like most articles of this type, it seems to me to be biased towards a certain genre of development. I don't know how often I encounter a lot of the stuff the author seems to consider "fundamentals."

I have spent a good part of my life doing things like device control, and direct, native, user interface. These can get mui hairy.

It's difficult to talk to a lot of folks about these, as everyone is focused on "the Big Picture," to quote Peter O'Toole.

Device control, in particular, has a lot of aspects that are unique, and not particularly applicable to many other disciplines. With the advent of some of the new communication techs, like Bluetooth, and advanced serial buses (like USB and Thunderbolt), we're starting to see a bit of cross-pollination with communications (another discipline that many software developers never need to worry about).

UI has always been best served (not exclusively, but best) as native. This means that each platform tends to have a specific framework.

Learning frameworks, SDKs, and APIs has always (for me) been the most time-consuming part of adapting to a platform or system.

But that's just me. YMMV.

I strongly agree. I’ll take any article describing what “every” developer should do with a pinch of salt. A lot of developers are out there day in, day out, doing CRUD-y or UI-y work and have no need to set aside time to read a lengthy paper about LISP. Not that the paper is bad or not worthy, but the range of “developer” is vast these days.

I guess I’m talking about myself here too. I have no Computer Science training and can’t say I’ve ever felt like I need it. I could take the time to read an academic paper about the next 700 programming languages or I could read an introduction to iOS development with Swift. I know which one is most likely to help my career.

I've also been a developer for almost 10 years without any Computer Science training (aside from my college diploma). I've also felt the same way about not needing any further formal education up until a certain point.

That point is now, and it's partially out of boredom. I've worked with many languages, frameworks, libraries, patterns, and they're all starting to look the same. I've become a master of tools, able to reach for the right tool given a specific scenario, but I'm starting to find I'm lacking a sense of curiosity and depth.

Maybe without a strong foundational knowledge, we'll only ever be users of the tools, and never creators. I feel like I need to start giving back at some point in my career. Maybe it's time to start working on foundations.

> I have no Computer Science training and can’t say I’ve ever felt like I need it.

Did a CS degree (92-95); Database design + SQL has been the only thing properly relevant to my career* (and then only the theory side because the practical was Oracle embedded Pascal...)

Not relevant: Prolog, SML, electronic design, 68000 assembly, Pascal, processor design, compiler design, etc.

* some of them have been relevant in personal fun projects though

> These can get mui hairy

For that, we have Mickens, The Night Watch: https://www.usenix.org/system/files/1311_05-08_mickens.pdf

I too have spent a lot of time in the direct access mines. Everything has been just a bit too specific to write papers about, it's mostly a question of digging registers out of reluctantly provided datasheets. Fortunately I get to do it in C# now.

I love that!


I've started a job recently where device control using C# seems to be a big part, but was not aware of that during the interview process and I'm now looking to learn more. Are there any keywords, topics, books, etc. that you would recommend I search for with regards to device control? Thank you.

A lot of the best device control stuff has been written in good old C.

I am not a Linux guy, but I’ll bet the Linux Kernel has a bunch of stuff.

Platforms tend to have foundation-level device support, and, nowadays, it’s unwise to go around it.

I’d definitely look at the device control foundation library (C#, maybe Windows?), as a starting point.

One of the lessons that I’ve needed to learn, was to get out of the weeds, and use the tools at hand.

Most communication and device control stuff tends to have two main characteristics:

1) They are composed of “layers,” with increasing granularity, as you get closer to the hardware.

2) They tend to be highly asynchronous, with a lot of “reactive” behavior.

> These can get mui hairy


Thanks! Note taken...

A few papers that changed my life were the ones about distributed NoSQL databases (Cassandra, consistent hashing, DynamoDB, MapReduce, etc.). But I would not recommend them to anybody else; they were just useful to me at the time. I was a CS student in Argentina, studying in a very old fashioned university. The coding exams were done ON PAPER in a Pascal-flavored pseudo code.

When I found and started reading those papers it was like finding life on another planet. It completely changed my way of thinking. I dropped out of school and focused on learning as much of the cool stuff as possible (on the internet). It paid off well :)

I found that the traditional way of doing computing education is actually the way to go. Like what you've described as working out things on paper.

Computing Science should not depend deeply on any particular tide of computing devices (be it hardware or software). Instead, computing conceptual models should be taught, so that students are equipped with those ideas and can express them and find transformations and implementations when needed.

Some hands-on coding is certainly also needed, but all too often we end up thinking like tool users.

I agree with "back to the foundations", and I've done a lot of those. But there should be a mix. My source is: I can show you where all my classmates ended up (spoiler, NOT great).

Where is the feedback loop with paper? Seems like it would be biased towards a certain type of coder, as opposed to one that likes to play around with stuff to problem solve.

I like working with pen and paper in programming a lot. And I agree that it helps for learning and really understanding.

But at some point, if you want to educate people to build real working products, in the real world - and this is what the vast majority of CS students are going to do - then you also have to teach exactly this. How to build things for real, and not just how to think about building things.

Yeah, I guess the tension between pen/paper and concrete computer programming is elastic; it depends on whether the person is willing to let both keep growing or leaves them static.

I appreciate theory more after I've programmed mechanically for a while. And I also find writing some code to be a relief when I've drunk too much theoretical kool-aid.

This is a good list. It’s interesting to me that it emphasizes historic papers that were quite novel when they were published. These papers were impactful, and are thus good on some absolute scale. But they were not written to be introductions to their topic, and they do not reflect more modern thinking on the topic.

Reflections on Trusting Trust is seminal. In the age of networked computing and open source (and decades of apparently no one mounting a meaningful exploit along these lines) one is asking the reader to do a lot of work to generalize from the concepts in the paper to ideas of security and trust that align with their experience. I could imagine (and did) reading this paper in a class with a moderated discussion. But without the discussion I suspect it would fall flat for many readers.

In the comments people suggest the original Paxos paper. It is cute, but incredibly difficult to understand IMO. For people already aware of network programming primitives, Lamport doesn’t use any of them, instead inventing his own metaphors. I tend to think Raft was so successful when it came out because it was described more clearly, not because it was a better algorithm.

It would be interesting to find better, more accessible ways of learning the same material. Or maybe the best would be an edited volume, like a textbook, interleaving these classic papers with modern analysis, reflection, and commentary.

Author of the post here. There's a lot that one can argue about with my choices. A thing that might help is that I selected the papers that led me to see things in a different way. One could argue that _Reflections on Trusting Trust_ isn't very important wrt modern security concerns but, to me, the most important thing is noticing that trouble can happen upstream in any process, and often it can be hidden. The abstraction of source code is so total for many people that they forget about all of the layers underneath. Things that don't look possible are often possible.

I was surprised to not see "What Every Computer Scientist Should Know About Floating-Point Arithmetic". [0]


A lot of developers will never run into most of the topics discussed in these papers.

However, at some point, they will all write some software that does arithmetic on arbitrary data. They will all be tempted to calculate an average by adding all the numbers and dividing by ten. Therefore, I propose that all developers should read and re-read Goldberg[1] regularly.

[1]: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h...
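A small Python illustration of why the averaging case is worth the re-read. This is only a sketch: `naive_mean` and `accurate_mean` are made-up helper names, and the inputs are chosen specifically to provoke the problem. The naive version accumulates rounding error left to right, while `math.fsum` from the standard library tracks the lost low-order bits.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation, as most hand-written loops do it."""
    total = 0.0
    for v in values:
        total += v
    return total


def naive_mean(values):
    return naive_sum(values) / len(values)


def accurate_mean(values):
    # math.fsum keeps exact partial sums and rounds only once at the end.
    return math.fsum(values) / len(values)


tenths = [0.1] * 10
cancelling = [1e16, 1.0, -1e16]  # the 1.0 is absorbed, then cancelled away

naive_result = naive_mean(cancelling)
accurate_result = accurate_mean(cancelling)
```

In the `cancelling` case the naive loop loses the `1.0` entirely, because it is smaller than the spacing between adjacent doubles near `1e16`, so the two means differ by far more than a rounding error in the last digit.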

Since others are using this as a forum to share great CS papers, here’s one of my favorites: Programming as Theory Building - Peter Naur


This blog post is from 2017 but it's a repost from ~2009.

Maybe it's just me but I don't think reading a paper about `A Laboratory For Teaching Object-Oriented Thinking` is relevant anymore except for historical purposes.

And honestly, it's not a very good paper.

Computing Machinery and Intelligence [1]

By A. M. Turing

October 1950

This paper introduces the concept of the Imitation Game, which would eventually be referred to as the Turing test. [2]

[1] https://academic.oup.com/mind/article-pdf/LIX/236/433/986611...

[2] https://en.wikipedia.org/wiki/Turing_test

And if you want a guided tour, the Annotated Turing by Charles Petzold is awesome.


As someone who's not a trained computer scientist, this was surprisingly approachable (and didn't seem to dumb anything down, which is the usual problem with pop science).

This one has been very influential to me:

Big Ball of Mud

Brian Foote and Joseph Yoder


Oh hey, another BBOM fan :-)

I would add "Traits: Composable Units of Behavior" (http://web.cecs.pdx.edu/~black/publications/TR_CSE_02-012.pd...), which is good if you're interested in OO at all but also if you want to know more about traits in Rust (or roles in Perl).

While the list is solid, that doesn't mean "every developer" really should read them. I personally don't like this kind of "click bait" title. I also believe many developers don't need to read them at all.

> Reflections on Trusting Trust – Ken Thompson

I wish Ken wrote more papers & documents.


"Kernighan has written ten times as much readable prose as has Ritchie, Ritchie ten times as much as Thompson. It's tempting to say that the reverse proportions hold for code, but in fact Kernighan and Ritchie are more nearly tied and Thompson wipes us both out." -- Dennis Ritchie

I think that he has now retired from Google. I can't find references online. I only vaguely remember some tweets that mentioned it indirectly.

That was the only one on the list I had read. Time to hunt down the balance.

I think this is a good list of interesting papers. But it also highlights how situational or context dependent such lists are, maybe necessarily so.

Note how this list has little on program correctness, performance (big O or real world), distributed systems, operating systems, networking, team organization.

I see there's no mention yet in this thread of A Mathematical Theory of Communication [1]. Claude Shannon launched the entire field of information theory with this paper. Admittedly, not useful to everyone, but information theory is surprisingly useful outside of the simple communications context.

[1] http://people.math.harvard.edu/~ctm/home/text/others/shannon...

I'm surprised that No Silver Bullet didn't make the list. I would have put it at #1.

Perhaps you don’t need to read it twice.


I'd add Theorems for Free!: https://ecee.colorado.edu/ecen5533/fall11/reading/free.pdf

There are more that I've found interesting over the years but that paper really helped me shift how I think about programming and the design of programs. It led me to many others and to appreciate type theory.

Back to the Future, Dan Ingalls and others [http://ftp.squeak.org/docs/OOPSLA.Squeak.html]

Homesteading the Noosphere, Eric Raymond [http://catb.org/~esr/writings/homesteading/homesteading/]

If we can just get to "every developer has a paper that they have read, ever" I'd be happy

I would add "What color is your function?" (not a paper, more of a blogpost)

I’ve read (a few times) just one paper from that list; Can Programming Be Liberated from the von Neumann Style? – John Backus

I was very confident that the list would contain the Logical Clock paper by Lamport, but alas it doesn’t.

And if you need more papers to read you can join my weekly cs paper newsletter :)


From Michael Feathers who wrote Working Effectively with Legacy Code

I like to recommend the Big Ball Of Mud paper.

By and large most software projects still seem to fall into the pits described in this paper.

There is nothing "every developer" should do.

Forgot that floating-point one.

(2009) even

01. [On the criteria to be used in decomposing systems into modules – David Parnas](https://prl.ccs.neu.edu/img/p-tr-1971.pdf)

02. [A Note On Distributed Computing – Jim Waldo, Geoff Wyant, Ann Wollrath, Sam Kendall](https://www.cc.gatech.edu/classes/AY2010/cs4210_fall/papers/...)

03. [The Next 700 Programming Languages – P. J. Landin](http://thecorememory.com/Next_700.pdf)

04. [Can Programming Be Liberated from the von Neumann Style? – John Backus](http://www.csc.villanova.edu/~beck/csc8310/BackusFP.pdf)

05. [Reflections on Trusting Trust – Ken Thompson](http://users.ece.cmu.edu/~ganger/712.fall02/papers/p761-thom...)

06. [Lisp: Good News, Bad News, How to Win Big – Richard Gabriel](https://www.dreamsongs.com/Files/LispGoodNewsBadNews.pdf)

07. [An experimental evaluation of the assumption of independence in multiversion programming – John Knight and Nancy Leveson](http://sunnyday.mit.edu/papers/nver-tse.pdf)

08. [Arguments and Results – James Noble](http://www.laputan.org/pub/patterns/noble/noble.pdf)

09. [A Laboratory For Teaching Object-Oriented Thinking – Kent Beck, Ward Cunningham](http://c2.com/doc/oopsla89/paper.html)

10. [Programming as an Experience: the inspiration for Self – David Ungar, Randall B. Smith](https://suif.stanford.edu/~lam/cs343/programming-as-experien...)

A link to a better-formatted PDF of [01] On the criteria to be used in decomposing systems into modules – David Parnas


The hero we need since the article didn't provide links.

For #10 here is a version that is not backwards https://bibliography.selflanguage.org/_static/programming-as...

Oh I think I remember the parnas one now, I was looking for it, so thanks

Thank you.

This reminds me that a lot of [other] papers are bullshit. Papers are basically long-form blog posts. One group of people did a thing and here are their results. From those results they often come up with generalized conclusions. A lot of the time, people just take those conclusions as truisms! But do the conclusions extrapolate to other groups, scenarios? Will bias color the readers' takeaways (like authority fallacy)? How well does this work when implemented elsewhere over 10 years? Does anyone who has implemented this paper take it seriously, or without a million caveats?

I cringe when I see a team blindly implement the design in a paper. Design to your needs, not somebody else's! Papers are great places to take ideas from, but you should never treat them like gospel, or copy them outright. It's the same trap as designing your solution around your tools rather than vice versa. When someone brags about implementing the design in a paper, I expect it to work poorly (at least until they figure out all the stuff that wasn't in the paper).

Papers are experiments, whereas Best Practices are the experiments that were reproduced by many people in many places over a long period of time. If what you're building matters (not R&D), implement the best practice first, not the experiment.

There are probably 100s of articles that are tangentially related to physics, fluid dynamics, materials science etc...

... but none of them will teach you how to show up to a customer's house on time to install a toilet quickly, efficiently, and accurately, with a smile on your face.
