
The Cost of Abstraction (2016) - wheresvic1
http://250bpm.com/blog:86
======
hu3
I've seen excessive abstraction kill projects time and time again. The crime
scene is similar in every case:

- What if we ever need to change database server or driver? Proceeds to write
a layer to abstract data access

- We might have different login forms one day? LoginFormFactory it is

- This code hits our KV Redis/memcache/NoSQL by calling the driver lib
directly? Can't have that, I'll write a CacheStorage or DocumentStorage layer.

Over time, this defensive mentality produces codebases that are so bloated
that no one in the team can comfortably grasp all the moving parts despite the
project not being rocket science.

At that point, devs either quit or demand a substantial raise in order to
continue working, and ultimately end up leaving to start a new project. But this
time, they'll be sure to not make the same mistakes. This time, they'll
abstract even more so "when one part of the project becomes bloated, it can be
easily rewritten thanks to the abstractions". And the cycle continues.

~~~
jstimpfle
Which is not to say that we shouldn't wrap dependencies. We absolutely should.
But by default, the wrapper should not offer an abstraction that is so
flexible that the dependency can be exchanged. Instead, it should start with
the actual problem, by offering only abstractions that (a) make sense in the
context of the program and (b) translate well to concepts of the
dependency/library.

For example, hiding POSIX-style select/poll vs Win32 Completion ports behind a
common abstraction is something I tried recently, and seems to be HARD. Did
not complete. Why? I started with the abstraction without being familiar with
what the _problem_ was. All that I knew was that I wanted to experiment with
"game engine design".

~~~
Nursie
> Which is not to say that we shouldn't wrap dependencies. We absolutely
> should.

Why?

If I'm writing something for a known environment that's unlikely to change, it
seems like a waste of time.

~~~
jstimpfle
Even then, often there will be small bits that don't fit well. By wrapping
everything you make the model and assumptions of your application explicit.
Code that is written against your own model (which is almost always simpler
than what the library offers) will be more understandable.

I really like the Dijkstra quote, "The purpose of abstraction is not to be
vague, but to create a new semantic level in which one can be absolutely
precise."

Usually I even wrap libc and POSIX functionality in my C programs. Advantages
include (1) not having to mess with specific types, such as size_t, time_t,
DIR*, and so on -- which often don't fit my own model (e.g. dealing with
unsigned size_t vs signed integers is painful), and often need small fixes to
be portable. (2) compiles a lot faster since I don't need to include 10000s of
lines from system header files (3) not wrapping makes huge portions of the
project dependent on the library, instead of only the wrapper. (4) You tightly
control access to the library, which makes it easier to troubleshoot. You can
also make up your own strategy for managing library objects in a central
place.

~~~
Nursie
I think this is nonsense and leads to enterprise-y type patterns and hard to
understand code.

Wrapping things for the sake of it is a recipe for unnecessary verbosity and a
hard to learn codebase as things are obfuscated and take longer to trace
through.

In C, your point 1 makes me think your code is unsafe, if you're having
trouble keeping track of types and signedness. Point 2 is fair enough, but I
think a lot of C headers are granular enough that it's irrelevant, especially
given how fast everything is these days. Gone are the 8 hour builds of old.

Point 3 is precisely the type of premature interface-isation that's the
problem here. If you're never going to change it, then it doesn't matter
whether huge portions of the code rely on the library or your wrapper. It's
basically the same thing. Only you've put in extra work to wrap something in
another layer for no gain. 4 much the same.

~~~
jstimpfle
Hmm, I think I should put that in context. I'm not saying write a wrapper for
the sake of it. I don't wrap everything in three layers of OO giftwrap. I'm
saying one should follow basic rules of hygiene.

For example, I have a FreeType backend in my code, and I have a single file
which implements a font interface (open font file, choose font size, draw to
texture in the format of my app) by interfacing with Freetype. It returns my
own error values, prints warnings with my own logging framework, puts the data
in my own datastructures, etc. All things that Freetype could never do since
it does not know my project.

I have an OpenGL backend, and I make sure that my geometry and math stuff, UI
drawing code, and what not, does not depend on OpenGL, or OpenGL types. (Did
that once when I learned OpenGL, and it was terrible). So now I have basically
one file which is dependent on OpenGL, and it's there to simply put my
internal datastructures on the screen, in a very straightforward way.

Same goes for my windowing backend. I use GLFW currently but I make sure I
don't use e.g. GLFW_KEY_A in client code. I define my own enums and own
abstractions. It does not come with any cost to simply have my own KEY_A,
which means I can swap out the backend relatively easily, and can change my
model more easily (for example, make up my own controls abstraction instead of
relying on a standard PC 104 keys keyboard, etc).
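
A minimal sketch of the key-code idea above, in Python for brevity rather than C. The numeric backend codes are stand-in values assumed for illustration, and `Key`, `BACKEND_TO_KEY`, and `translate_key` are hypothetical names, not anything from GLFW. The point is only that one module knows the backend's codes while client code sees `Key` alone.

```python
from enum import Enum, auto
from typing import Optional


class Key(Enum):
    """The application's own key vocabulary; client code uses only this."""
    A = auto()
    ESCAPE = auto()
    SPACE = auto()


# Stand-in backend codes (assumed values, for illustration only).
# In a real backend module these would be the windowing library's constants.
_BACKEND_KEY_A = 65
_BACKEND_KEY_ESCAPE = 256
_BACKEND_KEY_SPACE = 32

BACKEND_TO_KEY = {
    _BACKEND_KEY_A: Key.A,
    _BACKEND_KEY_ESCAPE: Key.ESCAPE,
    _BACKEND_KEY_SPACE: Key.SPACE,
}


def translate_key(backend_code: int) -> Optional[Key]:
    """Map a backend key code to the app's Key, or None if the app ignores it."""
    return BACKEND_TO_KEY.get(backend_code)
```

Swapping the windowing backend would then mean rewriting only the table, and changing the app's control model would mean changing only `Key`.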

> In C, your point 1 makes me think your code is unsafe, if you're having
> trouble keeping track of types and signedness.

No, it's safer precisely because I have a well-defined boundary where values
get converted. I can check for overflows there, and not worry about the rest
of my code. There is no practical way to deal with size_t in every little
place when normal values come as int.

> Point 2 is fair enough, but I think a lot of C headers are granular enough
> that it's irrelevant,

You bet. Have you ever tried to access parts of the Win32 API? You're
immediately in for (I think it was) 30K lines of code even if you define
WIN32_LEAN_AND_MEAN. Otherwise it's maybe more like 60K. It's crazy. Or check
out GLEW (which I don't use anymore). It generates 27000 lines of header
boilerplate simply to access the OpenGL API.

Add to that that most headers have a standard #ifdef include guard, which
means that while files get ignored the second time, the lexer has to parse the
whole thing over and over again.

> If you're never going to change it

How would you know?

> then it doesn't matter whether huge portions of the code rely on the library
> or your wrapper.

It does matter a lot if there is a semantic mismatch.

~~~
Nursie
> How would you know?

You have a target system, and no current plans to change it. You can't predict
every change that might happen down the line, and neither should you try - you'll waste
a ton of engineering effort and if there is change, it's almost always change
in ways you didn't anticipate.

~~~
jstimpfle
I absolutely agree with your statement, but I don't think it applies in this
situation. I'm not trying to predict future changes. It's the opposite. I'm
codifying the current state. I make sure that there are no misconceptions
about the extent to which the library is used.

------
pdpi
I find that there's two core points that help me figure out where I'm missing
some form of abstraction: the body of a function should only operate on a
single level of abstraction, and it should only know technical details about
one thing.

If you're writing some database code, having high-level fetchMyEntity() calls
mixed with connection/resultset/cursor logic is bad news.

If you're writing something that reads from a message queue and stores the
message in a database, the place where the two meet shouldn't know much of
anything about either the message queue or the database.

Obviously, exceptions exist where these rules need to be broken, but I find
they're fairly few and far between.
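
A rough sketch of the message-queue example above, assuming in-memory stand-ins for both sides. `Message`, `store_all`, `FakeQueue`, and `FakeTable` are hypothetical names invented here. The glue function `store_all` works at one level of abstraction and knows the technical details of neither the queue nor the database.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Message:
    """The only shape the glue code knows about."""
    body: str


def store_all(receive: Callable[[], Iterable[Message]],
              save: Callable[[Message], None]) -> int:
    """Move messages from a source to a sink; knows nothing about either side."""
    count = 0
    for message in receive():
        save(message)
        count += 1
    return count


# Hypothetical in-memory stand-ins for a queue client and a database layer.
class FakeQueue:
    def __init__(self, bodies: List[str]) -> None:
        self._bodies = list(bodies)

    def drain(self) -> Iterable[Message]:
        # Queue-specific details (acks, offsets, retries) would live here.
        while self._bodies:
            yield Message(self._bodies.pop(0))


class FakeTable:
    def __init__(self) -> None:
        self.rows: List[str] = []

    def insert(self, message: Message) -> None:
        # Database-specific details (connections, cursors) would live here.
        self.rows.append(message.body)
```

Usage would look like `store_all(queue.drain, table.insert)`, with connection and cursor logic staying inside the two backend classes.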

~~~
base698
The "Simple Made Easy" talk describes this as the root of complexity in an
application. Mixing and tying various subcomponents that shouldn't have to
know each other exist.

~~~
taneq
> Mixing and tying various subcomponents that shouldn't have to know each
> other exist.

And this is the real trap when dealing with abstractions - maybe the
_implementations_ of two different operations are the same, but if the
_semantics_ are different then de-duplicating them creates an unnecessary
dependency between the two. The more cross-links in your application
structure, the harder it is to do anything without breaking everything.
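
To make that concrete, here is a small sketch with invented names and limits. The two functions have identical bodies today, yet they encode unrelated rules. Folding them into one shared helper would mean that a later change to the title rule also touches account validation.

```python
def is_valid_username(name: str) -> bool:
    """Account rule (hypothetical): usernames are 3 to 20 characters."""
    return 3 <= len(name) <= 20


def is_valid_project_title(title: str) -> bool:
    """Display rule (hypothetical): titles are 3 to 20 characters, for now."""
    return 3 <= len(title) <= 20
```

Keeping both costs one duplicated line; merging them adds a dependency between two parts of the program that have no semantic reason to know about each other.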

~~~
pdpi
> And this is the real trap when dealing with abstractions - maybe the
> implementations of two different operations are the same, but if the
> semantics are different then de-duplicating them creates an unnecessary
> dependency between the two.

One of the truly remarkable things about Haskell is that its approach to
structural abstractions allows you to break out of this problem (though you
end up paying a different cost for it)

------
chestervonwinch
Lately I've been thinking about the similarities between the abstraction that
results from post-hoc software refactoring and post-hoc mathematical proof
"refactoring". In both cases the refactoring is an attempt towards a more
ideal form, but that form is almost always more impenetrable to newcomers.

E.g., a quote about the mathematician Carl Gauss:

> Gauss' writing style was terse, polished, and devoid of motivation. Abel
> said, `He is like the fox, who effaces his tracks in the sand with his
> tail'. Gauss, in defense of his style, said, `no self-respecting architect
> leaves the scaffolding in place after completing the building'.

------
karmakaze
There's a difference between deduplication/extraction and abstraction which
often seems to get lost. Abstraction is not about the mechanical
reorganization of code. It is structuring the code to follow a
natural/logical abstraction that exists outside of the code. The first clue is
the name of the abstraction. If it describes how the 'abstraction' works or
what is going on inside it, then it's best to leave it. An abstraction should
be able to opaquely represent what it is. This is the _value_ of abstractions,
it lets you not think about what's inside while working at a higher level.

A similar issue I have is with people constantly 'refactoring'. I choose to
say I'm 'factoring' code instead. If you can't name the factors that you're
separating then it's likely you'll change your mind and end up 'refactoring'
it. Sometimes you take factored code and factor it further, which I don't have
a different name for, just more factoring.

------
sorokod
Introducing an abstraction is a way of extending the "base" language into the
domain of the problem being solved. Viewed in this way, creating an inc_pair()
function does not make sense beyond applications that deal with incrementing
stuff. On the other hand, if we are moving a player on a grid,
move_diagonaly_up() makes sense and is worth introducing into the extended
language vocabulary.
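
A tiny sketch of the grid example, under stated assumptions: `Position` is a hypothetical type, "up" is taken to mean increasing `y`, and the diagonal is taken to go to the right. The function name keeps the spelling used in the comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """A player's cell on the grid."""
    x: int
    y: int


def move_diagonaly_up(pos: Position) -> Position:
    """One step up and to the right (assumed: y grows upward)."""
    return Position(pos.x + 1, pos.y + 1)
```

The body is the same arithmetic an `inc_pair()` would do, but the name belongs to the problem domain, so callers read as game logic rather than as number shuffling.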

------
thunderbong
Rule of Three

https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)

------
danielovichdk
Abstractions are of course due when the time is right.

It's difficult to address, but imo introducing abstractions to a codebase
should not be the goal from the get-go.

There is also a cultural thing around abstractions where inexperienced
developers look up to or are fascinated with the complexity of something or
someone that brings that to the table.

It's also a common feature of certain so-called architects, because they
probably feel they need to use some advanced techniques.

The thing is though, that when you have worked with developers or architects
who advocate for simple abstractions, and over time it proves to be both
efficient and cheap, then you start to doubt complexity in total.

Also remember that complexity is often not complex per se, as long as you
spend time on breaking that complexity down.

And then you have a better fundamental platform for solving what you need to.

Simple code is fast code. And also easier to change.

------
denart2203
Very insightful. I like thinking about software complexity, and one concern
when dealing with complex software is that the design and intentions should be
communicated (either through documentation or as a common understanding of
purpose and function).

From your perspective, it means that there is also a need to establish
agreements on the levels, depth and ways abstractions in the code are formed.
Indeed, I worked with software where the functions and operations weren't
implemented in a messy way per se, but the many levels of indirection,
abstraction (and obscurement) made things just really difficult to read and a
real tail-chaser when it came to maintenance.

Those levels can also make it much more difficult to understand the flow and
the operations that are happening, because in many languages you pass
references to data objects, so data gets changed in many ways.

Nice article, puts me into thinking mode again! :)

------
martin_drapeau
I totally agree that the cost of abstraction should become more important. It
is often the number one cause of frustration when trying to add a new feature
to an existing code base. Building a mental model of any code base takes time.
Abstractions for the sake of abstraction make it harder to grasp.

------
hyperpallium
Abstraction should be based on whether the abstraction makes it easier or
harder to reason about: to see, to predict consequences of decisions, to
diagnose causes of behaviour.

Usually, the established abstractions of the domain are an excellent guide.
Even if you think they are sub-optimal objectively, they are easier to reason
about for experts in that field.

I like the article's language of containing the "damage" code can do. A
C-style modularization technique I haven't seen used is long functions, with
sections' variables' visibility limited by braces.

------
amelius
That's a really simple abstraction there, and I bet most people won't even
call that an abstraction.

It's a bit like saying bricks are useless, by limiting the entire argument to
a single brick.

~~~
coldtea
It's almost like he's making a small non-real-life example to illustrate his
general point, which is not confined to the particular example.

~~~
JoeSmithson
Your comment reads just as well without the condescending "It's almost like"
at the beginning

~~~
coldtea
Yeah, but the condescension was put in to counter-balance the parent's facile
dismissal of the author/TFA.

In other words, to add a cost (the possibility of being dismissed/sneered
back) to such dismissals.

~~~
afarrell
I think it's better to give feedback explicitly rather than sneering. The
latter just feels bad to read (even as a third party) but doesn't really
espouse some better active social norms to follow.

> to add a cost (the possibility of being dismissed/sneered back) to such
> dismissals.

Given the fact that this is a conversation among strangers, I would assert
that it isn't really that effective to just add costs by making discourse less
pleasant.

--

In general, I think a community is healthier when we treat people 25% better
than we expect to be treated, to account for the Fundamental Attribution
Error and other misinterpretations.

~~~
coldtea
> _I think it's better to give feedback explicitly rather than sneering. The
> latter just feels bad to read (even as a third party) but doesn't really
> espouse some better active social norms to follow._

I guess so. Sometimes I'm just pissed from the easy dismissal, as in "This 5
second basic retort is all you've come up with, and you think you've taken
down TFA?".

~~~
afarrell
It's not an unreasonable thing to get ticked-off by

------
nkingsy
These abstraction discussions seem to always result in commenters talking over
each other. I would love to have these conversations rooted in a code sample.
Otherwise no one is talking about the same thing.

If someone writes a blog series called “should this be abstracted, what’s the
abstraction?” I think we would see some great discussion.

------
theaeolist
Structuring 1M lines of code as one function or 1M functions is clearly equally
absurd. What is the sweet spot? There must be an answer from psychology and/or
information theory.

~~~
jstimpfle
The answer is: What is the problem? It depends on the problem.

We're unlikely to encounter a problem that is solved with a single function, or
as one function per line. But the important thing is to look at what the code
should achieve. Most importantly, local decisions should be made based on what
the code should achieve in a global context. Local syntactic optimizations are
less important than the global picture.

I strongly agree with the Golang authors and with experienced C programmers
that it's much better to write a few lines more and be more clear and explicit
in exchange. Note that there are some additional lines that increase
complexity, but I would argue that lines which contribute to clarity do not
contribute complexity. In fact, those investments in additional lines usually
decrease the number of moving parts.

Syntactic homogeneity is important so one can easily see what one piece of
code achieves in the global context. It does not help if we micro-manage and
constantly think about the type of for loop or lambda abstraction or error
handling mechanism to use, only to shave another line off.

Unfortunately looking at the actual problem is what most people forget.
Instead the discussions are about languages (filter, maaap. maaaaap),
frameworks, libraries, object orientation, without any concern for what these
features can do towards reaching a specific goal.

Now that you mention information theory, I want to mention the term "Semantic
compression", created by Casey Muratori. I think he has done at least one
stream about it, which you should be able to find on YouTube. In general I
recommend following him. He is one of the most experienced and no-nonsense
guys I've found on the internet.

