Essays on programming I think about a lot (benkuhn.net)
438 points by jchook on July 21, 2020 | hide | past | favorite | 94 comments

Sadly it's a video/presentation, not an essay, but Simple Made Easy[1] is the single piece of software argumentation that has had the most impact on me.


1. https://www.infoq.com/presentations/Simple-Made-Easy/

I don't have a transcript link at hand, but as far as videos go, "Functional Core, Imperative Shell" / "Boundaries" by Gary Bernhardt is also a must-see (or must-read, hopefully).

Here's the video link:


Unfortunately there's no transcript on the official video

I’ve been programming for a long time, watched this presentation several times, done a bunch of other research, and still don’t know if I understand what this presentation is about. I fear that I’ve tried to apply these simple-vs-complex principles and only made my code harder to understand. My understanding now is that complexity for every application has to live somewhere, that all the simple problems are already solved in some library (or should be), and that customers invariably request solutions to problems that require complexity by joining simple systems.

> still don’t know if I understand what this presentation is about

1. The simplicity of a system or product is not the same as the ease with which it is built.

2. Most developers, most of the time, default to optimizing for ease when building a product, even when it conflicts with simplicity.

3. Simplicity is a good proxy for reliability, maintainability, and modifiability, so if you value those a lot then you should seek simplicity over programmer convenience (in the cases where they are at odds).

I find the graph at the top of Sandi Metz's article "Breaking up the Behemoth" (https://sandimetz.com/blog/2017/9/13/breaking-up-the-behemot...) to be poignant.

If you agree with her hypothesis, what it's basically saying is that a clean design tends to feel like much more work early on. And she goes on to suggest that early on, it's best to focus on ease, and extract a simpler design later, when you have a clearer grasp of the problem domain.

Personally, if I disagree, it's because I think her axes are wrong. It's not functionality vs. time, it's cumulative effort vs. functionality. Where that distinction matters is that her graph subtly implies that you'll keep working on the software at a more-or-less steady pace, indefinitely. This suggests that there will always be a point where it's time to stop and work out a simple design. If it's effort vs. functionality, on the other hand, that leaves open the possibility that the project will be abandoned or put into maintenance mode long before you hit that design payoff threshold.

(This would also imply that, as the maintainer of a programming language ecosystem and a database product that are meant to be used over and over again, Rich Hickey is looking at a different cost/benefit equation from those of us who are working on a bunch of smaller, limited-domain tools. My own hand-coded data structures are nowhere near as thoroughly engineered as Clojure's collections API, nor should they be.)

> I fear that I’ve tried to apply these simple-vs-complex principles and only made my code harder to understand. My understanding now is that complexity for every application has to live somewhere, that all the simple problems are already solved in some library (or should be), and that customers invariably request solutions to problems that require complexity by joining simple systems.

Simplicity exists at every level in your program. It is in every choice that you make. Here's a quick example (in rust):

    fn f(i: i32) -> i32 { i }          // function
    let f = |i: i32| -> i32 { i };     // closure
The closure is more complex than the function because it adds in the concept of environmental capture, even though it doesn't take advantage of it.

This isn't to say you should never pick the more complex option - sometimes there is a real benefit. But it should never be your default.
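To make that concrete, here's a minimal sketch (not from the original comment) of when the closure's extra complexity actually pays for itself: environmental capture is the feature, so reach for it only when you need it.

```rust
// Hypothetical example: a higher-order function that accepts either a
// plain function or a closure.
fn apply(f: impl Fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

// Plain function: no environment, the simpler default.
fn double(i: i32) -> i32 {
    i * 2
}

fn main() {
    assert_eq!(apply(double, 3), 6);

    // Closure: capturing `offset` from the environment is the point here,
    // so the added complexity buys something real.
    let offset = 10;
    let add_offset = move |i: i32| i + offset;
    assert_eq!(apply(add_offset, 3), 13);
}
```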

You are correct in your assessment that customers typically request solutions to complex problems. This is called "inherent complexity" - the world is a complex place and we need to find a way to live in it.

The ideal, however, is to avoid adding even more complexity - incidental complexity - on top of what is truly necessary to solve the problem.

I think the shift in programmers' perspective on where complexity should live is very much related to the idea of "the two styles in mathematics" described in this essay on the way Grothendieck preferred to deal with complexity in his work: http://www.landsburg.com/grothendieck/mclarty1.pdf.

Rich belongs to the small class of industry speakers who are both insightful and non-dull. If you haven't already, do yourself a favour and indulge in the full presentation.

I still can't believe that I was actually there during that exact presentation but at the time it didn't have the impact on me that it seems to have had on HN as a whole. Maybe I should review it again, or maybe I'm just not smart enough / don't have the right mindset, IDK.

Rich Hickey seems to be a bit of a Necker cube. Some people i know and respect think he is a deep and powerful thinker. But to me his talks always seem like 90% stating the obvious, 10% unsupported assertions.

That is the key: stating the obvious actually is hard, and I think Rich does a beautiful job of translating the thoughts and feelings most programmers have into words. It actually gives a way to discuss and think about things (especially design and architecture) with others. I learned that there is no such thing as "common ground" or common knowledge magically and intuitively shared by all programmers. So if this already reflects your thoughts - even better.

If you find 90% of his statements to be obvious, maybe all that means is that you're a deep and powerful thinker too?

Yeah, I think it depends on whether you're thinking about things from a SYSTEMS perspective or a CODE perspective.

Hickey clearly thinks about things from a systems perspective, which takes a number of years to play out.

You need to live with your own decisions, over large codebases, for many years to get what he's talking about. On the other hand, in many programming jobs, you're incentivized to ship it, and throw it over the wall, let the ops people paper over your bad decisions, etc. (whether you actually do that is a different story of course)

Junior programmers also work with smaller pieces of code, where the issues relating to code are more relevant than issues related to systems.

By systems, I mean:

- Code composed of heterogeneous parts, most of which you don't control, and which are written at different times.

- Code written in different languages, and code that uses a major component you can't change, like a database (there's a funny anecdote regarding researchers and databases in the paper below)

- Code that evolves over long periods of time

As an example of the difference between code and systems, a lot of people objected to his "Maybe Not" talk. That's because they're thinking of it from the CODE perspective (which is valid, but not the whole picture).

What he says is true from a SYSTEMS perspective, and it's something that Google learned over a long period of time, maintaining large and heterogeneous systems.


tl;dr Although protobufs are statically typed (as opposed to JSON), the presence of fields is checked AT RUNTIME, and this is the right choice. You can't atomically upgrade distributed systems. You can't extend your type system over the network, because the network is dynamic. Don't conflate shape and optional/required. Shape is global while optional/required is local.

If you don't get that then you probably haven't worked on nontrivial distributed systems. (I see a lot of toy distributed computing languages/frameworks which assume atomic upgrade).
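A rough sketch of "shape is global, optional/required is local": the record's shape is shared everywhere, but whether a missing field is an error is decided by each consumer, at runtime. The names here are hypothetical, standing in for generated protobuf accessors.

```rust
// Shared shape: every consumer sees the same struct definition.
struct UserRecord {
    id: u64,
    email: Option<String>, // presence is checked by the reader, at runtime
}

// One consumer treats a missing email as a hard error...
fn needs_email(u: &UserRecord) -> Result<&str, &'static str> {
    u.email.as_deref().ok_or("email required by this consumer")
}

// ...while another, perhaps deployed years earlier, tolerates it.
fn tolerates_missing(u: &UserRecord) -> &str {
    u.email.as_deref().unwrap_or("<none>")
}

fn main() {
    // A record written by an older producer that never set the field.
    let old = UserRecord { id: 1, email: None };
    assert_eq!(old.id, 1);
    assert!(needs_email(&old).is_err());
    assert_eq!(tolerates_missing(&old), "<none>");
}
```

Because "required" lives in each consumer rather than in the global shape, old producers and new consumers can coexist without an atomic upgrade.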


His recent History of Clojure paper is gold on programming language design: https://clojure.org/about/history

I read a bunch of the other ones. Bjarne's is very good as usual. But Hickey is probably the most lucid writer, and the ideas are important (even though I've never even used Clojure, because I don't use the JVM, which is central to the design).

I think that the thing about that talk that struck a chord is that he took a bunch of things that people had been talking about quite a bit - functional vs oop, mutability, data storage, various clean code-type debates, etc. - and extracted a clear mental framework for thinking about all of them.

I read Peter Naur's "Programming as Theory Building" and thought it quite good.


At the time I found it, I was working somewhere the key challenge was not so much technical as ensuring that simple technology, spread across a large breadth of functionality, adhered to a consistent vision of the business domain being implemented. The program became an implementation of the mental model I developed of the business domain itself: its purpose, its uses, and its allowances and prohibitions. The Naur paper hit exactly not only on what I had been implementing in code, but on what other developers would have to know in order to maintain that code over time... the kind of knowledge that had been lost over the life of the application by the time I came to be involved... and part of why my project existed.

The examples near the start reminded me of another piece shared here before, "How to Build Good Software". Most notably the part near the end, titled "Software Is about Developing Knowledge More than Writing Code".


It has a few great quotes scattered throughout.

"Building good software involves alternating cycles of expanding and reducing complexity."

"Software should be treated not as a static product, but as a living manifestation of the development team's collective understanding."

"Software projects rarely fail because they are too small; they fail because they get too big."

So glad to see the Law of Leaky Abstractions in there - that's had a very long-running impact on how I think about programming. It's still super-relevant today, nearly 18 years after it was published.


It's nonsense, written because Joel had never used a language with a decent type system. Any Haskell programmer uses half a dozen non-leaking abstractions before breakfast. Even the examples in the post itself don't hold up - using UDP instead of TCP doesn't actually mean your program will work any better when someone unplugs the network cable.

I think we Haskellers have to be realistic and say that although it feels that many of our abstractions are non-leaking, they're only non-leaking in the sense that a modern, triply-glazed, thoroughly insulated house is non-leaking of heat compared to a draughty, cold house built 200 years ago. There are indeed leaks, but they are small and generally ignorable.

I don't think that's true. A lot of these abstractions are provably correct and so simply cannot leak (and in slightly more advanced languages you might even enforce those proofs - consider Idris' VerifiedMonad and friends).

Of course if you put garbage in at the lower levels (e.g. define a monoid instance that doesn't actually satisfy the monoid laws) then you will get garbage out at the higher levels (e.g. the sum of the concatenation of two lists may no longer equal the two lists' sums added together), but that's not the abstraction leaking, that's just an error in your code.
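That property can be demonstrated in a few lines; this is an illustrative sketch, not code from the thread. With a lawful monoid (addition with identity 0), folding a concatenation equals combining the folds; with a law-breaking "combine" like subtraction (not associative), the property fails.

```rust
// Fold a slice with a given binary operation and identity element.
fn fold_with(op: impl Fn(i32, i32) -> i32, id: i32, xs: &[i32]) -> i32 {
    xs.iter().fold(id, |acc, &x| op(acc, x))
}

fn main() {
    let (xs, ys) = (vec![1, 2, 3], vec![4, 5]);
    let concat: Vec<i32> = xs.iter().chain(ys.iter()).cloned().collect();

    // Lawful monoid (+, 0): the abstraction's promise holds.
    let add = |a: i32, b: i32| a + b;
    assert_eq!(
        fold_with(add, 0, &concat),
        fold_with(add, 0, &xs) + fold_with(add, 0, &ys)
    );

    // Unlawful "monoid" (-, 0): subtraction isn't associative, so
    // "garbage in" at the instance level breaks the higher-level property.
    let sub = |a: i32, b: i32| a - b;
    assert_ne!(
        fold_with(sub, 0, &concat),
        fold_with(sub, 0, &xs) - fold_with(sub, 0, &ys)
    );
}
```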

You are abstracting over a CPU and memory. Your abstraction leaks in that memory layout actually matters for performance, for example. Or if you have a bad RAM chip.

> Your abstraction leaks in that memory layout actually matters for performance, for example.

There are cache-aware abstractions if your situation warrants them. Of course if you abstract over a detail then you lose control over that detail. But that's not the same as a leak, and it's the very essence of programming at all; if the program needs to behave differently every time it runs, then creating a useful program is impossible.

> Or if you have a bad RAM chip.

That's another example of what I said about garbage in, garbage out. The fault isn't in the abstraction, the fault is the bad RAM chip. If you were manually managing all your memory addresses then a bad RAM chip would still present the same problem.

Right. As they say: "In theory there is no difference between theory and practice; in practice, there is."

Get better theory.

Huh, I just realized that this was ambiguous, and people might be (validly) interpreting as "Get better theory [and there won't be a difference!]"

For the record, I meant "Get better theory, and your theory can also talk about the difference between practice and theory."

I would argue that the "correct" behavior of a RAM chip is an abstraction over the actual physical behavior.

That abstraction leaks when the actual physical behavior of a RAM chip differs from the abstract specification that it implements.

That's not exactly false, but at that point you might as well say that anything that breaks is an abstraction leak. If my car won't start in the morning, is that an "abstraction leak"? I don't think it is (or at least I don't think it's a useful perspective to see it as one), because the problem wasn't that I was thinking of the abstract notion of a car rather than the details of a bunch of different components connected together in particular ways; the problem is that one or more of those components is broken (or maybe that some of the components are put together wrong).

> You are abstracting over a CPU and memory. Your abstraction leaks in that memory layout actually matters for performance, for example.

I find the idea that an abstraction is leaky if different implementations of it perform differently to be fairly useless. I don't think it's a useful concept unless the abstraction captures the expected performance. If the abstraction doesn't give any performance guarantees, then the caller shouldn't have any performance expectations.

Similarly to abstractions around accessing a file on disk that might fail. The abstraction should account for potential failures. If it doesn't account for failures, but the implementation does fail, then it's meaningful to call it leaky.

Performance matters sometimes. If you're in a situation where performance matters, and the abstraction doesn't capture the expected performance, but you're subject to the abstraction's actual performance anyway, then the abstraction leaked in a way that matters to you.

You've found that the abstraction isn't useful for doing your task. That's not leaking. That's like complaining that your ice cream maker can't cook rice. That's not what it's for.

In fact, this is a common manner for abstractions to become leaky. You find you are in need of some guarantee not present in the abstraction. You choose to add whether or not that guarantee is satisfied to the shared interface. Congratulations! You've added a leak to the abstraction.

But that's not the only option available. If you need a guarantee not provided by an abstraction, you could ignore the abstraction and use something that actually provides the guarantees you need.

I have to care about what's inside it (to know whether the performance is up to what I need) rather than just the interface. To me, that's leaking.

For example, if I have to care whether the "collection" is implemented as a linked list or as a vector, then the "collection" abstraction has leaked.

Abstractions are equivalences, not equalities. You shouldn't expect an abstraction to make a linked list the same thing as a vector - they aren't, and they never will be - but they are equivalent for certain purposes, and a good abstraction can capture that equivalence. The performance of those two different collections is not the same, but that's not a leak unless the abstraction tried to claim that it somehow would be the same.

> You shouldn't expect an abstraction to make a linked list the same thing as a vector - they aren't, and they never will be

I would even argue that's the point of an abstraction. Hide the details that don't matter to the caller. If performance is a detail that matters, and the abstraction doesn't capture it, then you're using the wrong abstraction.
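As a sketch of that "equivalence, not equality" point (my example, not the commenters'): Rust's `IntoIterator` abstraction makes a `Vec` and a `LinkedList` interchangeable for the purposes it captures (visiting the elements), while making no promise at all about indexing cost or cache behaviour.

```rust
use std::collections::LinkedList;

// Generic over anything that can be iterated: the abstraction captures
// "a sequence of i32s", nothing more.
fn total<I: IntoIterator<Item = i32>>(xs: I) -> i32 {
    xs.into_iter().sum()
}

fn main() {
    let v: Vec<i32> = vec![1, 2, 3];
    let l: LinkedList<i32> = [1, 2, 3].into_iter().collect();

    // Equivalent for this purpose; their differing performance profiles
    // were never part of the promise, so nothing has "leaked".
    assert_eq!(total(v), total(l));
}
```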

Yeah, and that is still a completely pointless observation, because literally nothing I do will change because of it. Even if abstractions leak in these edge cases, we are still better off trying to come up with such abstractions than not.

Indeed proof formalisms are themselves leaky abstractions.

The theories, the implementations, or both?

The theories as conceived to relate to the actual programming environment. Any proof about a Haskell program’s correctness relies on a leaky abstraction (an axiomatization) of what will actually happen when you run GHC on the source file.

So implementation then..

The theory is supposed to be an abstraction of the implementation, not the other way around...

Supposed by whom?

Inside the computer abstractions are always leaky: the set of integers is finite; the reals are anything but.

WTF? The set of integers isn't finite. There are non-leaky ways to represent integers or computable reals in a computer (of course one cannot compute uncomputable reals, by definition). And plenty of finite subsets of either are well-behaved and non-leaky. If you treat a finite subset of the integers as being the set of all integers then of course you will make mistakes, but that's not a problem of abstraction.

Is it possible that you are misinterpreting what Spolsky meant? I think he means that in the real world we interact with implementations of abstractions, and that the implementation always shines through and can bite you in the ass. This is what makes side-channel attacks possible, and (in Spolsky's view) unavoidable.

> I think he means that in the real world we interact with implementations of abstractions, and that the implementation always shines through and can bite you in the ass.

I understood fine. He asserts that "always" on the basis of a handful of examples, only one of which even attempts to show anything more than a performance difference. It's nonsense.

I think you're giving Haskell too much credit... In my experience most abstractions need to be replaced because of performance requirements - achieving lower latency, higher throughput, etc. That's the reason to go with UDP instead of TCP. Not sure if this sort of leakiness falls under what Joel had in mind though.

The reason UDP is less leaky is not because it meets any guarantee better, but because it guarantees less

IMHO the switch to UDP is happening because the work TCP does to ensure reliability is now done at the network level, and thus having TCP do it is redundant. TCP assumed a very simple and dumb network, which is no longer the case.

You should have a look at the paper “end-to-end arguments in system design” - http://web.mit.edu/Saltzer/www/publications/endtoend/endtoen...

More or less reliability in the datagram layers affects performance - for example, WiFi does its own retransmissions whereas ethernet does not, because WiFi uses a less reliable physical layer, and because you don’t want your packets to have to go from London to New York and back before you discover one of them was lost.

But reliability at the WiFi layer cannot give your application the semantics of an ordered data stream, so it is not a substitute for TCP. You can replace TCP with a different transport protocol if you want different behaviour, eg SCTP or DTLS or QUIC, but in all cases they are providing a higher level abstraction than raw datagrams, not just (and not necessarily) more reliability.

Interesting that you mentioned Haskell. In my experience, I've found that every type having a bottom makes a lot of abstractions leaky.

1. Every type has a bottom in every mainstream language, most of them are just less explicit about it.

2. Bottoms do not make abstractions leaky in some generalised sense. The "fast and loose reasoning is morally correct" result applies: any abstraction that would be valid in a language without bottoms is still valid wherever it evaluates to a non-bottom value.

Interesting, can you give an example?

I agree. Dijkstra et al. always pushed (as early as in the 1960s) that resources used at abstraction level n should be effectively invisible at level n + 1. Anything else is an improperly designed abstraction.

Of course there's always the thermodynamic argument that "any subprogram has the permanent and externally-detectable side effect of increasing entropy in the universe by converting electricity to heat" but that is to me a bit of a Turing tar-pit of an argument.

Even that's just an effect that you can represent in your language. The evaluation of 2 + 2 is not exactly the same thing as the value 4, but you could track the overhead of evaluation (e.g. in the type) and have your language polymorphically propagate that information.

I’m going through the conversation with a colleague atm where he believes DRY applies to everything.

The Sandi Metz talk/post included in the list touches on that:


My takeaway from it is that we need to distinguish what you might call essential from incidental duplication. Essential duplication is when two bits of code are the same because they fundamentally have to be, and always will be, whereas incidental duplication is when they happen to be the same at the moment, but there's no reason for them to stay that way.

For example, calculating the total price for a shopping basket has to be the same whether it's done on the cart page or the order confirmation page [1], so don't duplicate that logic. Whereas applying a sales tax and applying a discount might work the same way now, but won't once you start selling zero-rated items, offering bulk discounts to which a coupon discount doesn't apply, etc.

[1] Although I once built a system where this was not the case! In theory, the difference would always be that some additional discounts might be applied on the confirmation page. In theory ...
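A minimal sketch of the distinction, with illustrative names (basket total, tax, discount) taken from the example above. The essential duplication collapses into one shared function; the incidental lookalikes stay separate so they can diverge later.

```rust
// Essential: the basket total must match everywhere, so it lives in
// exactly one function that both the cart and confirmation pages call.
fn basket_total(prices: &[u32]) -> u32 {
    prices.iter().sum()
}

// Incidental: tax and discount both happen to be "apply a percentage"
// today, but they are separate concepts that will diverge (zero-rated
// items, bulk discounts...), so they deliberately stay separate.
fn apply_tax(total: u32, percent: u32) -> u32 {
    total + total * percent / 100
}

fn apply_discount(total: u32, percent: u32) -> u32 {
    total - total * percent / 100
}

fn main() {
    let prices = [100, 250];
    assert_eq!(basket_total(&prices), 350);
    assert_eq!(apply_tax(350, 20), 420);
    assert_eq!(apply_discount(350, 10), 315);
}
```

Merging `apply_tax` and `apply_discount` into one `apply_percentage` would be DRY in the letter but not the spirit: they encode different pieces of knowledge that merely coincide right now.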

I don’t understand why people interpret that article as recommending you avoid prematurely removing duplication and comparing it to the rule of 3. The point of her essay is that you should resist the sunk cost fallacy and refactor your duplication-removing abstractions when requirements (or your understanding of them) change.

The problem isn't DRY. The problem is most programmers' inability to tear down abstractions that aren't correct for your new requirements when they evolve.

Yeah that sucks. Especially when you're designing service endpoints and your colleague insists upon reusing an existing endpoint instead of opening up a new one, because the two use cases looked the same when he squinted hard enough.

Now instead of /credit-card and /debit-card, which are independently testable, debuggable and changeable, you just have /card. Can't change the debit logic in /card because it will break credit. Can't change the credit logic in /card because it will break debit.

I feel your pain. Probably well past 'fixing' now, but a 'type' could be passed in with the payload, even if it's not exposed as the URL itself, no?

I definitely would have done that if I had the choice.

However, the endpoint owner insisted I just send nulls for the fields I didn't have.

Well... early in my career, a colleague and I developed a bit of a mantra: "Did you fix it everywhere?" Later in my career, I learned the value of there only being one place to have to fix.

At the same time, too much DRY can over-complicate (and even obfuscate) your code. That's not the answer either.

Taste. Taste, experience, and wisdom. But I don't know how to give them to someone who doesn't have them. Maybe by pointing out the problems of specific things they're trying to do, in a way that they (hopefully) can understand and see why it's going to be a problem. Maybe...

Some good resources about DRY:

> DRY is about knowledge. Code duplication is not the issue.


> Every piece of knowledge must have a single, unambiguous, authoritative representation within a system


When everything is DRY you get the Sahara Desert.

Some things should be DAMP every so often, and at regular intervals.

It usually helps to point out that DRY and loosely-coupled are often two sides of the same coin.

By "everything", do you mean outside of programming also? Or just in programming? I guess I find DRY pretty important for me, as a tool to force me to abstract and to help me understand the system I'm working on.

I hate it, because it’s either:

A) not a law, but a principle

B) a tautology (formally the essence is “All non-trivial abstractions are non-trivial”)

(Let alone the weaselly “to some degree“)

I think this is a little pedantic.

"Not a law" is strictly true, but the "law" idiom is totally in-line with "law of supply and demand," "law of diminishing returns," etc.

The tautology problem is not a problem. Tautologies are powerful. Douglas Adams has a great essay on this, but his novel version is more concise.

If you want strict laws and no "to some degree" hedges, read physics.

> but the "law" idiom is totally in-line with "law of supply and demand," "law of diminishing returns," etc.

Which I equally detest.

> The tautology problem is not a problem. Tautologies are powerful.

They are, of course, but how is the statement tautologically useful in this context?

> If you want strict laws and no "to some degree" hedges, read physics.

Sure, but physics is not the only field that does not consist of, mostly, overly general, extrapolation of empirical, but ephemeral phenomena.

Love the list, but I definitely think Programming Sucks [1] should be on any list of programming essays. :)

1: https://www.stilldrinking.org/programming-sucks

I guess I have read this already, but did not remember it very much. Halfway through, I thought it was written by James Mickens. So, if you liked this, you might also like Mickens' essays like The Night Watch <http://scholar.harvard.edu/files/mickens/files/thenightwatch.... All should be available at https://mickens.seas.harvard.edu/wisdom-james-mickens And they are not only very funny, but also insightful.

> All programmers are forcing their brains to do things brains were never meant to do in a situation they can never make better, ten to fifteen hours a day, five to seven days a week, and every one of them is slowly going mad.

Yeah, this resonates. I've gotten in trouble with my wife more than once for being in "code mode" when I'm working and something happens; it seems to turn off my basic empathy for some reason. Programming changes people.

Lately I've been into what I would call "the classics": Knuth, Peter Norvig, Minsky, Dijkstra... I realised most of what nowadays is called modern software/techniques basically consists of re-framing old essays from them.

Some of my references: the famous Norvig view on design patterns [1] and also his view of clean code [2]. Knuth on programming [3] is also really enlightening.

1. https://norvig.com/design-patterns/design-patterns.pdf

2. https://www.cs.umd.edu/~nau/cmsc421/norvig-lisp-style.pdf

3. http://www.paulgraham.com/knuth.html

Joe Armstrong's PhD thesis 'Making reliable distributed systems in the presence of software errors' is good reading.


Definitely recommend. I recently bought a book on Amazon on distributed systems[0] which talked about fault tolerance without even a cursory mention of Joe Armstrong's work. I've returned the book, but I wish I could have browsed the bibliography before buying.

[0]: https://www.amazon.com/dp/1543057381

I think about 'Why it is Important that Software Projects Fail' a lot.


It's been posted a few times over the years: https://news.ycombinator.com/from?site=berglas.org

Arthur C. Clarke's Hazards of Prophecy: The Failure of Imagination: https://fabiusmaximus.com/2017/12/27/arthur-c-clarke-hazards...

I read it from your link. It was an interesting piece about innovation and imagination versus lack of imagination.

The first time I read this, I had to read it a few times to really understand. Great essay/lecture, but what else would one expect from Ken Thompson?

"slumming with basic programmers" https://prog21.dadgum.com/21.html

I only think about "Out of the Tar Pit".

I've never understood why people like this paper. Over the years I've evolved a bite-sized rebuttal, if anybody wants to fight me:

The authors' "ideal world" is one where computation has no cost, but social structures remain unchanged, with "users" having "requirements". But the users are all mathematical enough to want formal requirements. The authors don’t seem to notice that the arrow in "Informal requirements -> Formal requirements" (pg 23) may indicate that formal requirements are themselves accidental complexity. All this seems to illuminate the biases of the authors more than the problem.

Link to the paper, for convenience: http://curtclifton.net/papers/MoseleyMarks06a.pdf. I think it's a siren and takes away attention that could be spent on better papers.

> But the users are all mathematical enough to want formal requirements.

This seems like a strange interpretation. I understand that the term "formal requirements" has a technical meaning in some disciplines, but I also think it's pretty clear that the author isn't using the term in that way.

It is much more likely that the author meant that users have requirements, but those requirements don't typically map cleanly to actions taken by the computer. This step of translation is necessary in the construction of a program, even if it is typically done piecemeal and iteratively.

It's a strange interpretation only if you focus on just the vocal minority that talks about formal requirements. Here are two quotes by Hillel Wayne (https://www.hillelwayne.com/post/why-dont-people-use-formal-...), whom I've found to have the most balanced take:

"The problem with finding the right spec is more fundamental: we often don’t know what we want the spec to be. We think of our requirements in human terms, not mathematical terms. If I say “this should distinguish parks from birds”, what am I saying? I could explain to a human by giving a bunch of pictures of parks and birds, but that’s just specific examples, not capturing the idea of distinguishing parks from birds. To actually translate that to a formal spec requires us to be able to formalize human concepts, and that is a serious challenge."

"It’s too expensive doing full verification in day-to-day programming. Instead of proving that my sort function always sorts, I can at least prove it doesn’t loop forever and never writes out of bounds. You can still get a lot of benefit out of this."

The post makes a strong case, IMO. Formal methods can be valuable, but they don't have the track record yet for anyone to believe they're the #1 essential thing about programming.

Fourteen years later, I still think about this. I have three half-written drafts of blog posts about things this essay has inspired me to do.

I think part of the mystique of it is that the authors kind of faded into the background after publishing it. I've never been able to find a follow-up paper from them.

It's super great! I loved it. Hope to re-read in a year or two as a reminder.


One of the last links, but probably most significant:

Inventing on Principle https://vimeo.com/36579366

I saw the Inventing on Principle video several years ago and I'm very happy to come across it again. Thanks for sharing!

As a less experienced web developer, I found "How I write backends"[1] very enlightening, and I recommend it to all my peers when we discuss useful resources.

1. https://github.com/fpereiro/backendlore

I just read it but I think it focuses too much on specific tools (Redis, Node,...) and their configurations which might not be the best for most use cases. Especially if it's for a beginner who maybe doesn't have to start out with a load balancer and Redis.

"If you choose to write your website in NodeJS, you just spent one of your innovation tokens. If you choose to use MongoDB, you just spent one of your innovation tokens. If you choose to use service discovery tech that’s existed for a year or less, you just spent one of your innovation tokens." I think the author is stuck in 2010.

The essay was written in early 2015, around the same time NodeJS foundation was set up.
