
The Law of Leaky Abstractions (2002) - beltsazar
http://joelonsoftware.com/articles/LeakyAbstractions.html
======
jbert
I continually find this particular point useful and enlightening. (Joel's
whole corpus of stuff is worth a read. At the risk of causing a hacker news
"joel storm" of submissions, check out the pricing/segmentation piece:
[http://www.joelonsoftware.com/articles/CamelsandRubberDuckie...](http://www.joelonsoftware.com/articles/CamelsandRubberDuckies.html))

For example, after discussing the subject of promises (in the sense of the
tool to manage and abstract asynchronous program flow) with a colleague, it
occurred to me that the _reason_ we care about async behaviour is that a
"function call" is a leaky abstraction, in that it takes a variable amount of
time.

We think of a function call as "doing X, Y and Z". The fact that it takes
_time_ to do it, blocking our 'thread of execution', causes the problem that
we can solve with the overhead of async programming (or threading, or...)

(By 'function call' I'm glossing over system call, library call, embedded cpu-
eating loop.)
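As a sketch of that idea (mine, not jbert's): in Haskell you can turn a blocking call into a promise-like handle with `forkIO` and an `MVar`, which makes the "a call takes time" leak explicit in the program. `slowSquare` is a made-up stand-in for any blocking call:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- A hypothetical slow "function call": the abstraction hides that it takes time.
slowSquare :: Int -> IO Int
slowSquare n = threadDelay 100000 >> return (n * n)  -- 0.1s of simulated work

-- A promise-like wrapper: run the call on another thread, hand back a handle.
async :: IO a -> IO (MVar a)
async action = do
  box <- newEmptyMVar
  _ <- forkIO (action >>= putMVar box)
  return box

main :: IO ()
main = do
  future <- async (slowSquare 7)   -- does not block here
  putStrLn "doing other work..."   -- free to continue meanwhile
  result <- takeMVar future        -- block only when the value is needed
  print result                     -- 49
```

The leak hasn't gone away; it has just been reified as a value you must explicitly wait on.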

From this perspective, you can see that an alternative to working around the
leak is plugging (some of) the leaks - which is the approach golang takes, by
shuffling around threads of execution to avoid them being stalled by blocking
system calls.

That perspective is useful to me and helps me understand why I think I prefer
the golang approach: a "more robust abstraction".

~~~
gavinpc
Agreed about Joel. When I first read his "corpus", it was already a few years
old. But whenever you get to it, get to it.

My only complaint is that once you find some _really good_ writing in this
field, you realize how hard it is to come by (in print, anyway). I happened to
find "Joel on Software" in the "Professional Computing" section at Barnes &
Noble. That and "Coders at Work" are the _only_ books worth reading I've ever
seen in decades of browsing that section. Still, I'm glad I came across them.

------
kstenerud
Leaky abstractions are precisely why it's important to understand what's going
on at least one level below the one you're working at. This has always been true
of computers, whether you're working at the script level, system language level,
assembly, or even hardware. Eventually something will break the abstraction,
and on that day you'll be there looking at that stack trace or register +
memory dump or burnt-out NAND gate or cold-soldered capacitor or faulty wall
circuit, not knowing what to do about it.

~~~
danielweber
The constant churn of web technologies makes this really frustrating, because
you have an entire ecosystem of people who don't know what their tools do.
"Just run re-war again, dude, that will fix it."

------
nathan_long
This is a concept that I've discussed frequently with team members. Related:
if you abstract away the differences between A and B, you likely lose some of
their unique advantages. A and B could be databases, hard drives, etc. It may
be worth it, but it's a tradeoff.

------
mfournier
Can every word be considered an abstraction?

An abstraction in mathematics is a simple, small idea that we use to describe
and understand a large variety of objects. For example the concept of an
algebraic group is an abstraction that can be applied to numbers, functions,
matrices, etc. We can understand all these objects using the same concept.

Are all words this way? Take the word "orange". Are all oranges the same? They
all have variations in taste, texture, color, etc. They are all physically
separate objects. But we use the same idea to understand and describe them.
Maybe this is more clearly seen with the word "fruit".

This applies to verbs too - think of the verb "to run". Usain Bolt runs in a
very different way from normal people, but we use the same concept to
understand what they do.

These abstractions "leak", i.e. cause confusion when the underlying reality
doesn't match up. Is a tomato a fruit? If I trained as an Olympic sprinter
would I have to abandon misconceptions about how to move my body,
misconceptions caused by a lifetime of using the word "run"?

Is every word an abstraction?

------
lmm
The examples don't support the argument. It's not like writing your
application using UDP and expecting some packet loss is going to mean that
your application continues to work when someone unplugs the network cable. And
the other examples aren't really _leaks_, just performance disadvantages. I
might accept an argument that any sufficiently powerful abstraction imposes a
performance penalty, but that's a lot less interesting.

I've seen a single-platform JVM bug. _once_. The only definite OS failure I've
ever seen was bad RAM, and you could write a bootable program that ran without
an OS and that wouldn't protect you from bad RAM. I've seen _one_ CPU flaw,
and linux abstracted over it seamlessly. I've never seen a failure that was
the result of analogue chip behaviour. There are plenty of non-leaky
abstractions around.

~~~
jhh
I think saying that some examples are just performance disadvantages rather
than leaks misses the point. "Leaky" is a metaphor here, so assuming that some
kind of actual data loss is necessary for an abstraction to count as leaky
seems incorrect to me.

For example if your SQL query is too slow from a pragmatic point of view and
then you go ahead and fix it by using optimizer hints then the abstraction of
the procedural querying process provided by SQL has just failed you as it
hasn't worked.

I agree that some abstractions are very good, but that does not make Joel's
point moot. The key thing is to understand that abstractions are always some
sort of tradeoff between the virtues of directness and the virtues of
productivity.

Btw: I am really surprised that this essay had never been posted to HN before.

~~~
lmm
> For example if your SQL query is too slow from a pragmatic point of view and
> then you go ahead and fix it by using optimizer hints then the abstraction
> of the procedural querying process provided by SQL has just failed you as it
> hasn't worked.

Well sure. But I've never had that happen; is the A = B and B = C and C = A
problem the article talks about one that causes problems in real-life code?

~~~
danielweber
I've had to rewrite _lots_ of SQL queries (for Postgres) that an intelligent
database should have just handled for me. I'd preemptively exclude certain
columns that, as could be mathematically proven from the query itself, I would
never need, and see amazing speed-ups.

(I'm not picking on Postgres, it's a wonderful DB, but you need to understand
how your tools are working.)

------
yetanotherphd
A true leaky abstraction is where the behavior of the interface is forced to
reflect some of the complexity of the implementation, but in a way that is
surprising given the rest of the interface.

I think most of his abstractions are not leaky, but they come at the cost of
poor performance in certain cases.

In any case, his point is correct: it's hard to use complex systems to
implement a simple interface, because it's hard to create the glue code so
that the simple interface drives the implementation in the "right" way in
every circumstance.

There is, of course, another approach that the author didn't mention, which is
providing a mechanism in the interface for "hints", e.g. "please store this
matrix in column-major format". That is not a perfect solution, but it is
another tool for better abstractions.

------
tikhonj
Here's the actual law:

> All non-trivial abstractions, to some degree, are leaky.

It itself is somewhat leaky: it never defines "non-trivial"! Perhaps "non-
trivial" just means "complex enough to be leaky", rendering the whole thing a
tautology.

I've been thinking about different philosophies behind abstraction for a
while. My ideas are a little unformed, but I think they're still useful.
Basically, as I see it, there are two different ways to define abstractions.

There's the usual way: you come up with a new notion that has additional
capabilities. The abstraction _adds_ to the underlying concept. Perhaps it
also hides some detail, but the main action is _adding_. Some of the leaky
abstractions from the post really fit this: TCP, for example, adds new
capabilities and behavior over the underlying protocol; C++ strings add a
bunch of string-specific capabilities over char arrays; NFS adds filesystem
capabilities to a network connection.

These abstractions are largely "leaky" because what they add does not always
map nicely to what's underneath. They're also interesting because each
abstraction only fits _one_ thing--TCP is just for network communication, C++
strings are just text in char arrays and so on. Ultimately, all these
abstractions are just more structure over whatever they're based on. Again,
the key idea is _adding_.

The other sort of abstraction is what I think of as "algebraic" or "Haskelly".
Instead of _adding_, these abstractions _take away_. They expose some feature
of the underlying concept and throw everything else away. A monoid, for
example, just exposes an associative operation from whatever set (or type)
it's defined over. It can't really be "leaky" (at least in the same way)
because it's showing what's already there. The normal sort of abstraction is
like a structure built on top of a foundation; this sort of abstraction is
like a lens or a view into a property of its foundation.

An important consequence is that these abstractions are largely independent of
any particular basis or use-case. Monoids come up _everywhere_. Strings form a monoid.
Numbers form a monoid (in many different ways, even!). Booleans form a monoid.
Certain classes of functions form monoids. Note how I said "form": even if we
ignored it, the monoidal structure would still be there. Talking about these
things as monoids doesn't add anything but just highlights something already
present.
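A quick sketch of that "already there" structure, using monoid instances that ship with Haskell's base library:

```haskell
import Data.Monoid (Sum(..), Any(..))

-- The same <>/mconcat vocabulary works across unrelated types, because the
-- monoidal structure was already there; the abstraction only names it.
strings :: String
strings = mconcat ["leaky", " ", "abstractions"]   -- concatenation

total :: Int
total = getSum (mconcat (map Sum [1, 2, 3, 4]))    -- addition: 10

anyTrue :: Bool
anyTrue = getAny (mconcat (map Any [False, True])) -- disjunction: True
```

`Sum` and `Any` exist precisely because numbers and booleans form monoids in more than one way; the newtype picks which structure you mean.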

Instead of adding capabilities to the underlying object, this sort of
abstraction instead exposes structure shared by otherwise distinct objects.
Monoids represent things you can combine; groups represent things that are
somehow symmetric; rings represent things that are sort of like numbers;
fields represent things that are really quite a bit like numbers; F-algebras
represent things that are like any of the other abstractions I've listed
(meta-abstraction!); functors represent things that can be transformed while
preserving structure and so on.

These abstractions also let us talk about different things uniformly and
generically. We can reason about and write code against these abstractions
without worrying about any particular underlying object. By doing this,
nothing _but_ the abstracted structure can do anything--details of any given
thing can't leak through because we have to be generically compatible with
_every_ instance of the abstraction.

Functors are a great example of this. If we wrote code using lists, we could
do all sorts of things: always return an empty list, add elements to the end,
drop elements from the front, change the order... If we wrote code just using
a _functor_, it would still work for lists, but all we'd be able to do is map
a function over the elements. Nothing else. Always returning an empty list
wouldn't even be a meaningful construct! Clearly, using this abstraction has
taken away capabilities, but it turns out this is often a good thing.
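A minimal sketch of that restriction in Haskell (the names are mine):

```haskell
-- Code written against Functor can only map; everything else is gone.
doubleAll :: Functor f => f Int -> f Int
doubleAll = fmap (* 2)

-- The same definition works for lists, Maybe, and every other instance;
-- "always return the empty list" is not even expressible at this type.
demoList :: [Int]
demoList = doubleAll [1, 2, 3]     -- [2,4,6]

demoMaybe :: Maybe Int
demoMaybe = doubleAll (Just 21)    -- Just 42
```

The loss of power is exactly what buys the guarantee: a caller of `doubleAll` knows the shape of the container comes back unchanged.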

But this is where "non-trivial" comes to bite us. Are any of the abstractions
I listed actually "non-trivial"? I'm not sure I could claim that. To be
generally applicable, all of them have as little structure as possible.
Ideally, you want to limit your capabilities as much as possible to express
whatever you care about. This usually means limiting them _a lot_. All of the
things I listed can be expressed as a handful of functions and a handful of
algebraic laws--a few lines of description is enough to understand them
_completely_. They are nowhere _close_ to something like TCP!

Ultimately, I am not even sure it really makes sense to compare the two sorts
of abstractions. But they are certainly both approaches to _abstracting_,
albeit in different ways. It's something that's worth thinking about the next
time you have to design or use an abstraction yourself.

~~~
perlgeek
... and in the end you still have to talk over TCP and deal with that leaky
abstraction, or you accidentally write O(N^2) code that could easily be
expressed in O(N) (compare
[https://github.com/nominolo/HTTP/commit/b9bd0a08fa09c6403f91...](https://github.com/nominolo/HTTP/commit/b9bd0a08fa09c6403f91422e3b23f08d339612eb)
for a recent HN submission), because the "taking away" approach to abstraction
also abstracts away some important performance considerations.
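That accidental O(N^2) shows up even in tiny Haskell snippets; a sketch (my example, not the one from the linked commit):

```haskell
-- O(N^2): foldl nests (++) to the left, so the ever-growing accumulator
-- is re-traversed on every step.
slowConcat :: [[a]] -> [a]
slowConcat = foldl (++) []

-- O(N): foldr nests (++) to the right, so each chunk is walked only once.
fastConcat :: [[a]] -> [a]
fastConcat = foldr (++) []

-- Both compute the same result; only the cost differs, and nothing in the
-- types or the "combine with (++)" abstraction warns you about it.
sameResult :: Bool
sameResult = slowConcat [[1, 2], [3], [4, 5]] == fastConcat [[1, 2], [3], [4, 5]]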

~~~
tel
TCP is done via the IO monad and this is an excellent example of what the IO
monad is used for---it forces you to realize that there's another layer of
(this time quite leaky) abstraction over your potentially simple inner
abstraction.
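A small illustration of what tel means; the type signatures are the point, and `fakeFetch` is a stand-in, not real networking:

```haskell
-- A pure inner abstraction: the type promises no effects and no leaks.
summarize :: [Int] -> Int
summarize = sum

-- Anything touching the network lives in IO, so the leaky layer is
-- visible right in the type. fakeFetch is a stand-in for a real recv.
fakeFetch :: String -> IO [Int]
fakeFetch _url = return [1, 2, 3]

main :: IO ()
main = do
  payload <- fakeFetch "http://example.com"  -- effects are confined to here
  print (summarize payload)                  -- the pure core stays pure: 6
```

The leak isn't eliminated, but the type system fences it off from the pure code.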

------
runT1ME
I used to think all abstractions were leaky until someone explained parametric
polymorphism and free theorems to me.

[http://ttic.uchicago.edu/~dreyer/course/papers/wadler.pdf](http://ttic.uchicago.edu/~dreyer/course/papers/wadler.pdf)

Every programmer or computer scientist should read that paper. Wadler is clear
enough that even without knowing typed lambda calculus, you can understand his
point.
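A concrete taste of those free theorems (my example, not Wadler's): any function whose type is fully parametric, like `[a] -> [a]`, cannot inspect the elements, so it automatically commutes with `map`:

```haskell
import Data.Char (toUpper)

-- Any function at this fully parametric type can only rearrange, drop,
-- or duplicate elements; it cannot inspect them.
mystery :: [a] -> [a]
mystery = reverse . take 2    -- one possible inhabitant

-- Wadler's free theorem for [a] -> [a]:
--   mystery (map f xs) == map f (mystery xs)   for every f and xs
check :: Bool
check = mystery (map toUpper "leaky") == map toUpper (mystery "leaky")
```

The theorem holds for _every_ inhabitant of the type, which is the sense in which parametric abstractions don't leak.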

------
jeroen94704
This is why I think Bret Victor's stuff, while incredibly cool and
interesting, will have a tough time living up to his vision.

~~~
pazimzadeh
On the other hand, think of how much power a program like PowerPoint gives the
average user without forcing them to learn the underlying technologies. Or the
iPhone. I think that user interface design solves most of the leakiness
problems, and the illusion of seamlessness is critical to good interface
design. Remove as many features as you have to so that the abstraction doesn't
leak.

PowerPoint came out in 1990 - what's the PowerPoint of 2020?

~~~
jeroen94704
All good points, however Bret Victor has specifically been showing off
alternatives to low level programming by building very high levels of
abstraction. He has even apparently said that he thinks the development
community is purposely keeping programming complicated to maintain the demand
for their specialized skills. I think the law of leaky abstractions is why it
is simply not possible to elevate programming to a level of abstraction that
allows people without specialized skills to implement arbitrary applications.

------
temujin
(2002)

