
The Wrong Abstraction (2016) - mkchoi212
https://www.sandimetz.com/blog/2016/1/20/the-wrong-abstraction
======
Tainnor
I feel some people here are misunderstanding the blog post.

Sandi Metz IMHO doesn't claim that the problem occurs at step 2 or 3. She
doesn't claim that it's wrong to introduce abstraction when there is
duplication.

What she is saying instead is that the problem occurs from step 6 onwards:
when you find yourself wanting to reuse an abstraction that, regardless of
whether it made sense in the first place or not, has outlived its usefulness.

I think this is in agreement with other points that she often makes, about
being bold, but methodical about refactorings.

The whole discussion about "you should never abstract away code before you see
the third duplication" has little to do with the article, and I'm also really
not sure it's good advice.

~~~
BoiledCabbage
> What she is saying instead is that the problem occurs from step 6 onwards:
> when you find yourself wanting to reuse an abstraction that, regardless of
> whether it made sense in the first place or not, has outlived its
> usefulness.

You're 100% correct in this. And what's even more amazing to me is that even
after you explicitly called this out, the majority of people replying to you
(who have presumably read the article) still think the problem is between
steps 2 and 3.

The argument she is making is not "don't make abstractions until you're 100%
certain they are correct". She is essentially saying: make abstractions where
appropriate. Some of those abstractions will be wrong. When you catch yourself
bolting on special-case behaviors, it's probably because it's the wrong
abstraction, so back it out and refactor.

Ultimately that abstraction seemed right based on the info known at the time
it was created, now that you know more don't try to cling to it because it was
already made. Be ok with backing it out and refactoring.

~~~
qznc
If you see an abstraction does not fit, you have the choice to consider it
incomplete or unsuitable. If incomplete, you can fix it (assuming write
access). If unsuitable, you should "back it out" as you say.

In my opinion, this distinction is actionable and thus useful, in contrast to
whining about leaky abstractions:
[http://beza1e1.tuxen.de/leaky_abstractions.html](http://beza1e1.tuxen.de/leaky_abstractions.html)

~~~
Firadeoclus
It seems to me that a straightforward fixing of an incomplete abstraction is
exactly what Sandi Metz warns against (i.e. steps 5+6). The abstraction is
"almost perfect", so it should not fall in the "unsuitable" category.

It just so happens that complecting several slightly different uses in one
abstraction comes at a significant cost. Backing out (inlining the
abstraction, eliminating unused code) is a simple recipe to let you see the
true amount of overlap, which may or may not itself be a suitable candidate
for a smaller abstraction.

~~~
qznc
I rather see Sandi's post as a criterion for when an abstraction should be
considered unsuitable: when you use only a small fraction of it because of
conditionals.

~~~
ryanbrunner
I think that's probably correct in describing where you end up, but not any
particular step along the way. It's one of those "the road to hell is paved
with good intentions" situations.

When you first modify the abstraction, it's nearly perfect. Just one tiny
conditional and it's a perfectly suitable abstraction again. The problem is,
when this process repeats itself, you slowly get to the point where any one
client of the abstraction is only using a small fraction of it, but there was
never a singular point where someone made a decision to use the abstraction
when it was anything less than "almost perfect".

------
jpswade
You can’t plan for what you don’t know.

This is why I like the "Rule of three"[1]. Only once you've done it three
times will you truly begin to understand what the abstraction might need to
look like.

1. [https://wade.be/2019/12/10/rule-of-three.html](https://wade.be/2019/12/10/rule-of-three.html)

~~~
ed312
Any advice on teaching this to junior engineers? Folks with 3-5 years of
experience seem to not only over-abstract but also keep reinventing the wheel
with abstractions (instead of looking for existing libraries).

~~~
ozim
My favorite example of a really bad abstraction is add/edit crammed into a
single popup/modal. You know edit is basically a copy-paste of add, so "ding
ding ding, here goes DRY!" in a junior mind. But quickly enough it turns out
that some properties can be set in add, whereas in edit they have to be
read-only. Quite often you also get other business rules that apply only on
edit, or make sense only when adding a new entity. But when you create the
first version, they look a lot like the same code that should be reused.

For me this is a really good example of how similar-looking code is not the
same, because it serves different use cases.
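
A minimal Python sketch of how that plays out (all names hypothetical): the shared handler starts as "one function, no duplication" and then accumulates mode conditionals as the add and edit rules diverge.

```python
# Hypothetical shared handler for both "add" and "edit" of a user record.
# Each new business rule forces another mode check into the shared code.
def save_user_form(form, mode):
    user = dict(form)
    if mode == "edit":
        # On edit, the username is read-only: ignore any submitted change.
        user.pop("username", None)
    if mode == "add":
        # Only newly added entities trigger a welcome email.
        user["send_welcome_email"] = True
    # ...every future rule adds yet another `if mode == ...` branch here...
    return user
```

Each branch is harmless on its own; the smell is the trend, not any single conditional.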

~~~
gridlockd
> But quickly enough it shows up that some properties can be set in add,
> whereas in edit they have to be read only.

So? Just put in some conditionals.

What is the alternative? Duplicate most of the code with minor, non-explicit
differences? What's the benefit? You just _moved_ complexity around, you
didn't get rid of it.

The drawback is that now anything you have to add, you have to add _and_
maintain it in two places. And since your "add" and "edit" are probably 90%
the same, it's going to happen 90% of the time. It's very annoying during
development and you're likely to fuck it up at some point.

~~~
bonestormii_
This is a good example of how this overall topic gets reduced to "How much
abstraction?" instead of "In what ways should something be abstracted?"

Obviously Add and Edit forms operate on the same record in a hypothetical
database, so it makes little sense to duplicate the model.

On the other hand, if the conditionals within the abstracted version become
too complex or keep referencing some notion of a mode of operation (`if
type(self) == EditType && last_name != null` lines of thinking), that is
sometimes another type of smell.

But say you make some kind of abstract base class that validates all fields
in memory before committing to the database, and then place all of your
checking logic in a validate() method. That sounds like a pretty clean
abstraction to me.
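
A sketch of that shape in Python, using a toy in-memory "database" (a plain list) instead of a real ORM; the class and method names are hypothetical:

```python
from abc import ABC, abstractmethod

class Record(ABC):
    """Toy base class: validate all fields in memory before committing."""

    def save(self, db):
        self.validate()               # raises if any field is invalid
        db.append(dict(vars(self)))   # stand-in for a real database commit

    @abstractmethod
    def validate(self):
        """Subclasses put all of their checking logic here."""

class User(Record):
    def __init__(self, name):
        self.name = name

    def validate(self):
        if not self.name:
            raise ValueError("name is required")
```

A `User("")` fails inside `validate()` before anything touches the database, which is the whole point of the abstraction.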

And moreover, this is probably provided by an ORM system and documented by
that system anyway--so it's a publicly documented and likely very common
abstraction that you see even across different ORMs. That, I think, is the
very best kind of abstraction, at least assuming you are already working in
such an environment as a high-level language and ORM. Making raw SQL queries
from C programs still involves its own levels of abstraction, of course,
without buying wholesale into the many-layered abstraction that is a web
framework or something.

This question becomes more important when you aren't just updating a database
though. If you're writing some novel method with a very detailed algorithm,
over abstraction through OOP can really obscure the algorithm. In such a case,
I try to identify logical tangents within the algorithm, and prune/abstract
them away into some property or function call, but retain a single function
for the main algorithm itself.

The main algorithm gets its definition moved to the base class, and the
logical tangents get some kind of stub/virtual method thingy in the base class
so that they have to be defined by subclasses. The more nested tangents are
frequently where detailed differences between use cases emerge, which makes
logical sense. It's not just that it's abstract, but the logic is
categorically separated.
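
What's described here is essentially the classic template method pattern. A minimal Python sketch (names hypothetical): the main algorithm stays in one readable function on the base class, while the "logical tangents" become abstract stubs that subclasses must define.

```python
from abc import ABC, abstractmethod

class Pipeline(ABC):
    """The main algorithm lives here, in a single readable function."""

    def run(self, items):
        kept = [x for x in items if self.keep(x)]   # logical tangent 1
        return [self.transform(x) for x in kept]    # logical tangent 2

    @abstractmethod
    def keep(self, x):
        """Stub: which items does this use case care about?"""

    @abstractmethod
    def transform(self, x):
        """Stub: what does this use case do to each item?"""

class Doubler(Pipeline):
    def keep(self, x):
        return x > 0

    def transform(self, x):
        return x * 2
```

The differences between use cases land in the stubs; the shared algorithm in `run` never grows conditionals.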

It's a very general pattern supported by many languages, so you see it all
over the place. That organization and consistency in itself helps you to
understand new code. In that way, it also becomes a kind of "idiom" which in a
sense is one more layer of abstraction, helping you to manage complexity.

As a counter of that, you see code where `a + x * y - b` becomes
`self.minus(self.xy_add(a), b)`. More abstract, but not more logical; not
categorically separating; not conforming to common idioms; obscuring the
algorithm; and so on...

And then there is performance! Let's not talk about the performance of runtime
abstractions.

------
Pxtl
Every Line Of Business codebase I've worked on has been the worst "there I
fixed it" copypasta spaghetti, and has never made it to the point where "maybe
we shouldn't add a parameter to this existing, cleanly abstracted method to
handle this new similar-but-distinct use-case" was anywhere near my radar for
abstraction.

I would _love_ to have developers where my problem was "maybe you piggybacked
on existing code _too much_ , in this case you should've split out your own
function".

~~~
mrfredward
The business codebase I'm working on now was written by OOP crazy people who
thought inheritance was the solution to every line of duplicated code. When
they hit roadblocks, they filled the base class with things like
if(this.GetType() == typeof(DerivedClass1)){...

I would do anything to have the duplication instead.
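
Translated to Python for illustration (the original is C#; these names are hypothetical), the smell looks like this: the base class branches on its own subclasses, so every new subclass means editing the base again.

```python
class Report:
    def render(self):
        header = "REPORT"
        # Anti-pattern: the "shared" base class special-cases its own
        # subclasses, defeating the point of inheritance entirely.
        if isinstance(self, PdfReport):
            header += " (pdf)"
        elif isinstance(self, CsvReport):
            header += " (csv)"
        return header

class PdfReport(Report): pass
class CsvReport(Report): pass
```

The inheritance-friendly fix would be an overridable method on each subclass; the point here is only to show the shape of the smell.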

~~~
isbvhodnvemrwvn
Then the very same people learn that inheritance bad, composition good, and
they'll create abstractions with no meaning of their own, which call 10 other
vague abstractions (but hey, no inheritance!). Figuring out what happens there
is even worse than with inheritance. Some people grow out of it, fortunately
(mostly after having to deal with shit like that once or twice).

~~~
mannykannot
> ...they'll create abstractions with no meaning on their own...

As if that doesn't happen with inheritance!

The dark pattern is using inheritance as an alternative way of implementing
composition. Anyone who thinks that "inheritance bad, composition good" is the
proper response to this is probably as confused about the issue as those
making the mistake in the first place.

To be clear, you are clearly not making that claim yourself, but you are
invoking it to make a straw man argument.

------
leto_ii
As I gain more and more experience (I would now call myself more or less a
mid-level developer), I find that the distinction that matters is not
abstraction vs duplication, but the one between developer mindsets.

I have many times met/worked with people who think the main task of the
developer is to 'get shit done'. Regardless of their level of experience,
these developers will churn out code and close tickets quite fast, with very
little regard for abstraction, design, code reuse etc.

Conversely, the approach that I feel more and more is the correct one is to
treat development as primarily a mental task. Something that you first think
about for a while and try to design a little. The actual typing will in this
case be a secondary activity. Of course, this doesn't mean you shouldn't
iterate on your design if during execution problems come up. Just that the
'thinking' part should come before the 'doing'.

My feeling is that with this second approach the abstraction/duplication
trade-off will not matter so much anymore. With enough experience you will
figure out what you can duplicate and what you can design. And when you design
you will develop an understanding of how far you should go.

Approaching development as a task of simple execution I think inevitably leads
to illegible spaghetti down the line.

~~~
Tainnor
I agree that many issues with bad code could really be avoided by first
thinking about the solution a bit, of which the code is just an expression.

I'm not advocating weeks of architecture astronauting without code feedback -
because practical considerations (e.g. the compiler can't deal with this kind
of code due to some limitations) matter - but some people seem overeager to
just start writing some code "and see what happens".

------
nfw2
When considering whether some abstraction is "right" or "wrong", another
important thing to consider is how cleanly the abstraction fits into a mental
model of how the program works. Good abstractions provide value outside of
removing duplication. They help us reason about a program by providing
compression of logical concepts.

Consider some helper function: "convertSnakeToCamelCase." This abstraction
would take a string, do some operations on it, and return another string. It
is easy to understand what the input and output is without having to think
about these operations. This abstraction provides a benefit for anyone having
to think about the program because it reduces the amount of concepts the
reader has to parse from N (where N is the number of operations) to 1. This is
helpful because people have limited mental bandwidth and can only reason with
a finite number of concepts at any given time.
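
A sketch of such a helper in Python (the name mirrors the one in the comment): the reader sees one concept in, one concept out, without parsing the individual string operations.

```python
def convert_snake_to_camel_case(s):
    # "user_first_name" -> "userFirstName": one concept, N hidden operations.
    head, *rest = s.split("_")
    return head + "".join(word.capitalize() for word in rest)
```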

Consider a different helper function: “processDataPayload.” This function
takes data in some arbitrary complex shape and returns data in some arbitrary
complex shape. The abstraction effectively communicates nothing to the reader,
and it is actively unhelpful because it forces that person to follow a
reference, remember all the details of what that function does, and substitute
those details into the original function.

Trying to find the conceptual boundaries that make the program easiest to
reason about IMO is more of an art than a science and difficult to govern with
hard and fast rules.

~~~
jasonhansel
Agreed. I also think it's important to create abstractions that provide
guarantees and/or maintain invariants. That way, your abstractions actually
help you be more confident that your code is correct.

The point of abstraction isn't per se to reduce duplication--it's to make your
code more straightforward and to make errors more obvious.

------
pierrebai
Counter: Refactoring is far, far, far cheaper than duplication or wrong
abstraction.

Duplication means you lose the wisdom that was gained when the abstraction
was written. It means that any bug or weird case will now be fixed in only one
place and stay incorrect in all the places you duplicated the code.

About the rule of three: I personally extract functions for single-use cases
all the time. The goal is to make the caller be as close to pseudo-code as
possible. Then if a slightly different case comes up, I will write the
slightly different case as another function right next to the original one.
Otherwise, the fact that you have multiple similar cases will be lost.
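
A sketch of that style in Python (all names and the domain are hypothetical): each helper is called exactly once, but the top-level function reads like pseudo-code, and a slightly different case would get its own sibling helper rather than a flag.

```python
# The caller reads like pseudo-code; each step is a single-use function.
def import_orders(lines):
    rows = parse_csv_lines(lines)
    rows = drop_cancelled(rows)
    return total_by_customer(rows)

def parse_csv_lines(lines):
    return [line.split(",") for line in lines]

def drop_cancelled(rows):
    return [r for r in rows if r[2] != "cancelled"]

def total_by_customer(rows):
    totals = {}
    for customer, amount, _status in rows:
        totals[customer] = totals.get(customer, 0) + int(amount)
    return totals
```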

~~~
jonahx
Counter-counter:

Refactoring is by far the most expensive and error prone activity in
programming. It can also be one of the most valuable. But unless it's trivial,
it's the most mentally arduous and time-consuming work you do as a programmer.

~~~
jasonhansel
Refactoring is only error prone if you don't have integration tests. The
advantage of extensive integration testing is that you can relentlessly
refactor without fear of breaking things.

~~~
jonahx
I'd much rather have them than not, but don't fool yourself into thinking you
can refactor without any fear because you have integration tests.

No matter how many you have, they'll only be testing a tiny fraction of your
possible code paths.

------
seanalltogether
This quote from John Carmack speaks very succinctly to the problems that many
abstractions in a code base can cause, and it's a constant reminder for me
when building out business logic.

> "A large fraction of the flaws in software development are due to
> programmers not fully understanding all the possible states their code may
> execute in."

[https://www.gamasutra.com/view/news/169296/Indepth_Functiona...](https://www.gamasutra.com/view/news/169296/Indepth_Functional_programming_in_C.php)

~~~
hackinthebochs
But abstractions reduce possible state and allow you to specify that state in
obvious ways, e.g. in function parameters. Do not underestimate the power of
functional boundaries.

~~~
ben509
They also tend to impose a degree of discipline. I've often found myself
wanting to shove a parameter in somewhere and realized I didn't _need_ the
damned thing.

------
ragnese
I can't help but wonder if we're sometimes using the wrong words for things.
In this discussion we keep talking about "code duplication" and "abstraction"
hand-in-hand, but I think they're almost orthogonal concepts, at least as I
think of them.

Seeing the same code almost copy-and-pasted in a few places might call for
some code-deduplication. But that's not necessarily a new "abstraction" in my
eyes. It may be, but it also may not be.

I'm struggling to think of a specific example because I fully intended to go
to bed before arriving here... But as a really stupid example, let's say you
have `val a = x + x + x` and `val b = y + y + y` and `val c = z + z + z` in
your code. If you write a new function like `fun addThreeTimes(i) = i + i +
i`, I don't see that as a new abstraction at all. If, however, you invent
multiplication, _now_ you're at a new abstraction! `val a = x * 3; val b = y *
3`, etc.

"Abstraction" to me is about thinking at a different semantic level, not about
avoiding copy and paste.

Does this resonate with anyone else? Am I missing the point?

~~~
MaulingMonkey
They're theoretically orthogonal but practically not. You can deduplicate code
without abstraction per se, but the result is generally unreadable and
unmaintainable. As such, all reasonable code deduplication relies upon
abstractions. However, not all abstractions involve code deduplication, and
may instead have other goals (such as making it easier to reason about local
state, invariants, etc.)

> If you write a new function like `fun addThreeTimes(i) = i + i + i`, I don't
> see that as a new abstraction at all.

If you only call it once, it's not code deduplication either.

What differentiates addThreeTimes(i) from sqrt(x) or average(x,y) or pow(x,y)
or multiply(x,y)? Not how many call sites it has, nor the presence of a
dedicated operator to the function in the language. Instead, I'd say: the
function's reusability, composability, commonness, ... or to put it another
way: addThreeTimes is an "abstraction" - it's just a poor garbage unreusable
unremarkable unrememberable abstraction with no expressive power.

However, poor abstractions aren't the only result of overeager code
deduplication. Sometimes you end up with "good" abstractions misapplied to the
wrong situations - e.g. they solve issues your current problem doesn't
actually have. As an example, turning your list of game entities into a list
of (id, aabb_f32) tuples might be exactly what you want for a renderer culling
or broad phase physics pass - but completely counterproductive for
implementing the gameplay logic of a turn based game! If you've already got a
list of tuples, you've a few choices:

1. Modify the tuple (add tile position information that's useless to the
renderer/physics, muddying the abstraction)

2. De-abstract (e.g. perhaps change several function signatures to pass in
the original entity list instead of the AABB list)

3. Re-abstract (perhaps your gameplay logic should take something else that
accounts for things like the fog of war instead of a raw list of entities?)

4. ???

~~~
ragnese
> What differentiates addThreeTimes(i) from sqrt(x) or average(x,y) or
> pow(x,y) or multiply(x,y)? Not how many call sites it has, nor the presence
> of a dedicated operator to the function in the language. Instead, I'd say:
> the function's reusability, composability, commonness, ... or to put it
> another way: addThreeTimes is an "abstraction" - it's just a poor garbage
> unreusable unremarkable unrememberable abstraction with no expressive power.

I agree that call sites or presence of language operators is not the defining
distinction here. But I disagree that reusability, composability, or
commonness (is that not "call sites"?) are somehow defining features of an
abstraction, either. Obviously, those are good qualities for code to have, but
that's not related to what I'm thinking about.

The difference in my example is specific to the ladder of abstraction from
addition to multiplication. When I was taught multiplication in early grade
school, I was taught it as basically just being another way to write addition.
When I first learned it, I would do exercises that involved taking an
expression like "3 * 5" and translating it to "3 + 3 + 3 + 3 + 3" and then
evaluating that. Over time, however, I've stopped thinking about
multiplication as addition. In my mind, I just think of multiplication as its
own thing. I've fully internalized the "abstraction" because I don't even
think about addition anymore when I see multiplication.

So, when we take a Year, Make, Model, and Color and group them together and
call it "Car", we're making an abstraction and it has little to do with code
duplication. It has much more to do with wanting to think about higher-order
constructs. You and I agree here, as per your first paragraph.

If I have some kind of rendering engine and I find myself often rotating, then
shifting a shape, I can write a `rotateThenShift(Shape, angle, distance) ->
Shape` function and not feel like I've abstracted anything. I'm still
"talking" about a shape and manually moving it around. Even if I just rename
that function to `foobinate(Shape, angle, distance)`, I feel like I'm closer
(but not quite) to a new level of abstraction because now I'm talking about
some higher-order concept in my domain (assuming "foobinate" would be some
kind of term from geometry that a domain expert might know).

All other points about good or bad abstractions apply. I just don't think
every single function we write is a new abstraction.

~~~
MaulingMonkey
> commonness (is that not "call sites"?)

I realize it's been 8 days, but I've mulled over the distinction and figured
out the point I'm trying to make - and it's a matter of concept reuse vs code
reuse. I might write a once-off, project specific, completely nonreusable
function, with exactly one call site, but it still might be named after and
based off of reusable _concepts_.

A concrete example that comes to mind: I often write a "main" function, even
in scripting languages that don't require it. This lets me place the core
logic at the start of the script for ease of reading/browsing _without_ having
all its dependency functions defined yet. I then invoke this main function
exactly once, at the bottom of the script.

This is clearly not code reuse nor code deduplication - but it _is_ concept
reuse, the concept being "the main entry point of an executable process."
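
A sketch of that habit in Python, where no `main` is required (the helper is hypothetical): the core logic sits at the top, helpers follow, and the single invocation comes last.

```python
# Concept reuse, not code reuse: "the main entry point of a process",
# written in a language that doesn't demand one.
def main():
    return greet("world")

def greet(name):
    return f"hello, {name}"

if __name__ == "__main__":
    print(main())
```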

I might write a mathematical function like "abs" or "distance" as a quick
local lambda function without intending to reuse it as well. I might later
refactor to reuse/deduplicate that code by moving it into a common shared
library of some sort. I might then later undo that refactoring to make a
script nice and self-contained / standalone / decoupled / to shield it from
upstream version churn / to improve build times / ???

> multiplication

If you'd only used multiplication exactly once, it wouldn't have had much
staying power as a useful abstraction. That it's a repeating, common, reusable
pattern that can be useful in your day to day life is part of what makes it a
useful abstraction worth internalizing.

------
adrianmonk
Two questions (genuine, not rhetorical):

(1) How much of this is because it's _actually hard_ to back out of the wrong
abstraction and pivot to the correct one, and how much of it is other causes?

The article hints at this with, "Programmer B feels honor-bound to retain the
existing abstraction." Why do they feel this way, and is the feeling
legitimate? Do they lack the deep understanding to make the change, or are
they not rewarded for it, or are they unwilling to take ownership, or is it
some other reason? I could see it going either way, but the point is to
understand whether you're really stuck with that abstraction or not.

(2) How much of the wrong abstraction is because people lack up front
information to be able to know what the right abstraction is, and how much of
it is because choosing good abstractions (in general and specifically ones
that are resilient in the face of changing requirements) is a skill that takes
work/time/experience/etc. to develop?

If it's due to being unable to predict the future, then it makes sense to
avoid abstractions. If it's due to not being as good as you could be at
creating abstractions, then maybe improving your ability to do so would allow
a third option: instead of choosing between duplication and a bad abstraction,
maybe you can choose a good abstraction.

~~~
zbentley
> Why do they feel this way, and is the feeling legitimate?

In my experience, it's because the amount of diff (red or green) in a change
request is--consciously or subconsciously--correlated with risk.

Even though we killed SLoC as a productivity metric years ago, the idea that
"change/risk is proportional to diff size" is still pervasive.

I'm totally into the YAGNI / "code volume is liability" school of thought.
But equating _change_ volume with liability is a subtly different and very
harmful pattern.

Adding a single conditional inside your typical 1200 line mixed-concern
business-critical horrorshow function may assume a much greater liability
(liability as in bug risk and liability as in risk/difficulty of future
changes) than e.g. deleting a bunch of unused branches, or doing a function-
extraction refactor pass. Standard "change one thing at a time" good
engineering practices still apply of course.

------
runald
For something that argues against bad abstractions, the article is sure
lacking in concrete examples and makes its point in the abstract. A lot of
people will likely misinterpret it, or get the idea that abstraction is only
done for duplicated code (DRY, as some people would call it). I think the
wrong/bad "abstractions" here mostly refer to abstractions made over common
code that is very specific to a context and very susceptible to domain
changes.

But there are a lot of other kinds of abstraction aside from DRY. There are
abstractions made to reduce clutter and hide implementation detail, which will
likely be used only once. There are also abstractions that are more general
and aren't coupled to the domain. These abstractions are more reusable and
composable, and are immune to domain changes such as step 6 in the article.
Some people would find these kinds of abstractions harder to digest, but I
personally consider them extensions to the standard library, or even additions
to the vocabulary of the programming language.

Note that I don't claim that general abstractions are necessarily better,
since the generality can be made to the extreme and we'd have monads for
breakfast.

All in all, I agree with the article, except that it is only referring to one
kind of abstraction, although I hesitate to call it that.

------
goto11
I'm skeptical because it is really easy to un-share code by copying it into
multiple places but it is very hard to unify duplicated code. So I prefer to
err on the side of sharing.

But yes, you should be ready to change sharing into duplication if you
realize the code is just "accidentally similar" and needs to evolve in
separate directions.

In practice I have seen a lot more pain due to duplicate code compared to the
issue of over-abstracting code, because the latter is much easier to fix.

~~~
joeframbach
On the other hand, it's really difficult to know who is using that shared
code. If you make an innocuous change in a shared method, it could affect
someone else you don't know.

~~~
bcrosby95
It's a million times easier than figuring out if those minor differences in
duplicate code are accidental or on purpose.

As bad as a flag-laden method might be, you know the intent of all callers.

------
ricksharp
The mistake is creating an abstraction because of seeing duplication.

DRY is not a good guiding principle. It is an anti-principle.

Abstractions should only be created when they have a clear purpose and create
a simpler architecture by encapsulating a single concern.

The reality is that all code is duplication. The reason we write code is that
it is the most concise language in which to specify the intended goal _in the
current context_.

What is unique is not the code that we are writing; the unique part is the
code in its current context. Each level of abstraction separates the context
from the implementation, so an abstraction must be beneficial in organizing
the overall solution into individual logical components of singular concern.

------
random3
This is so true, but so shallow too. I think the big mistake is to treat the
code as "the main thing" when in reality it's just a model (a golem) mimicking
some "other thing".

We're missing an entire set of code characterizations. Yes we have a "pattern
language" but there's not much to characterize it structurally wrt "code
distance" from one part of the code to the other (e.g. in call stack depth as
well as in breadth).

And again, all of this needs to happen wrt the "abstraction", not the code
itself. Having 10 methods 90% duplicated in a single file with a 10%
difference is many times better than trying to abstract them.

Having the same "unit conversion" function duplicated in 3 parts of the code
can be disastrous.
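
A hypothetical Python illustration of why: the "same" conversion gets duplicated, one copy is later corrected, and the other silently keeps the old behavior.

```python
# Module A's copy, later corrected to the exact factor:
def miles_to_km(miles):
    return miles * 1.609344

# Module B's stale duplicate, never updated -- now the program has two
# answers for the same question:
def miles_to_km_rough(miles):
    return miles * 1.6
```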

These two examples are very easy to see and understand, but in reality you're
always in a continuous state in between. And "code smells" like passing too
many parameters, or the "blast radius" of certain code changes, are only
watching for side effects of a missing "code theory". An interesting book on
the topic is "Your Code as a Crime Scene".

The bottom line is we're trying to fix these problems over and over again
without having a good understanding of what the real problem is and this leads
to too many rules too easy to misinterpret unless you are already a "senior
artist"

~~~
ijidak
> Having the same "unit conversion" function duplicated in 3 parts of the code
> can be disastrous.

This.

I feel like it's really about cognitive load to remember and recognize the
differences.

Duplication in 3 distant files places a heavy load on the developer to:

1. Discover the duplication
2. Grasp the reason for the differences in the 3 different locations
3. Remember these things

Whereas when the duplication is in the SAME file, #1, #2, and #3 can become
very manageable cognitively.

Now the question changes to:

Is the cognitive load of dealing with the different special cases in a single
de-duplicated method GREATER than simply leaving them in separate methods?

Often the answer is duplication WITHIN a file is less of a cognitive load.

Whereas duplication ACROSS files is a heavy cognitive load.

Minimizing cognitive load minimizes mistakes. And minimizes developer fatigue.
Thus boosting productivity.

At least, that's my development philosophy, even though I've never seen it in
a design pattern or a book.

It just seems to make sense.

------
bob1029
This whole thing exists on a normalized/de-normalized spectrum. The problem is
that both ends have pros/cons.

On the normalized side, you have the benefit of single-point-of-touch and
enforcement of a standard implementation. This can make code maintenance
easier if used in the correct places. It can make code maintenance a living
nightmare if you try to normalize too many contexts into one method. If you
find yourself 10 layers deep in a conditional statement trying to determine
specific context, you may be better off with some degree of de-normalization
(duplication).

On the de-normalized side, you have the benefit of specific, scoped
implementations. Models and logic pertain more specifically to a particular
domain or function. This can make reasoning with complex logic much easier as
you are able to deal with specific business processes in isolation. You will
likely see fewer conditionals in de-normalized code sites. Obvious downsides
are that if you need to fix a bug with some piece of logic and 100 different
features implement that separately, you can wind up with a nasty code
maintenance session.

I find that a careful combination of both of these ideas results in the most
ideal application. Stateless common code abstractions which cross-cut
stateful, feature-specific code abstractions seems to be the Goldilocks for
our most complicated software.
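
The trade-off above can be sketched in a few lines of Python (all function and field names are hypothetical, purely for illustration): the normalized version funnels every context through one flag-driven function, while the de-normalized version accepts a little duplication in exchange for independent evolution.

```python
# Normalized: one function serves every context, so callers must thread
# flags through and the body branches on them. Add a few more contexts
# and this turns into the nested-conditional nightmare described above.
def format_price(amount, currency="USD", is_invoice=False, is_refund=False):
    sign = "-" if is_refund else ""
    prefix = "INV " if is_invoice else ""
    symbol = {"USD": "$", "EUR": "€"}[currency]
    return f"{prefix}{sign}{symbol}{amount:.2f}"

# De-normalized: each context gets its own small, specific function.
# Some duplication, but each one can evolve independently.
def format_invoice_price(amount):
    return f"INV ${amount:.2f}"

def format_refund_price(amount):
    return f"-${amount:.2f}"
```
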

------
brandonmenc
Junior programmers duplicate everything.

Intermediate programmers try to abstract away absolutely every line that
occurs more than once.

Expert programmers know when to abstract and when to just let it be and
duplicate.

------
cjfd
If there is one single article about programming that I hate it is this one.
It is completely the wrong message. One should instead be very eager to
eliminate duplication. To avoid the pitfalls that the article notes one should
create abstractions that are the minimal ones required to remove the
duplication to avoid over-engineering. Also one should keep improving the
abstractions. That way one can turn the abstraction that turned out to be
wrong into the right one. It is the attitude of constant improvement that will
make one succeed as opposed to the attitude of fear of changing something that
this article seems to encourage. When one does things one learns. When one is
afraid to try things everything will just calcify until it is no longer
possible to add any new features. What one does need to make the refactoring
work is automated tests.

~~~
Ensorceled
In 30 years, I can count on the fingers of one hand the number of times I've
encountered projects that were in trouble because there was copy/pasted code
everywhere and the team was not abstracting out of fear of breaking the
existing code.

What I have encountered is dozens of projects that had essentially ground to a
halt because of numerous deeply, and incorrectly, abstracted systems, modules
and libraries.

Correcting projects in this state has almost always been refactoring into
fewer abstractions; less complex, more cohesive and less coupling.

~~~
cjfd
Actually, I have in fact seen this. I worked at a place where this copy and
paste programming actually led to functions that are many thousands of lines
long and are full of duplication and very deeply nested. At some point a file
was split because the compiler would not handle such a large file (!). Very
difficult to change anything.

And also, refactoring by removing abstraction is fine as well. The thing that
is not fine is having problems and doing nothing about them. To me it seems
that is what the article ultimately encourages.

------
scrozart
DRY gets abused regularly in my experience. It doesn't stop at method/class
abstractions either; I've seen entire microservices & plugins developed to
ensure each app doesn't have that one chunk of auth code, for instance, even
though they each may have subtly different requirements (those extra params
again). The logical end to this sort of thing is infinitely flexible/generic
multipurpose code, when the solution is really, probably increased
specificity. DRY is probably the lowest-hanging fruit for practices/patterns,
and I think this leads to a disproportionate focus on it.

~~~
hesdeadjim
It’s also easy compared to solving new problems, so it can be an emotionally
safe way of feeling productive. Failure is difficult to measure until the
abstraction falls flat on its face months later, at which point it can be
chalked up to the demons of “changing requirements”.

~~~
zbentley
That is a very, very important point; well put.

The "of course it sucks: changing requirements!" boogeyman means one of two
things: "the code was written to do the wrong thing because requirements
changed/weren't communicated" or "the code was _hard to change_ when it needed
to do a new thing".

Figuring out which of those two is in play is very important.

------
jack_h
I would say that if developers are hacking on an abstraction that is ill-
suited to the task until the code base is a nightmare, they will take this
advice and duplicate code until it's a nightmare.

The fact of the matter is every line of code that is written has an associated
cost. Developers all too often pay that cost by incurring technical debt.

------
haolez
That's mostly how I matured as a developer: I find myself abstracting less and
writing less code today than I did 10 years ago, but I'm more productive
today, my code is cheaper to maintain and has fewer bugs. Sometimes, I will
literally copy-paste a small amount of logic just to save a future reader of
this code from hunting around for where the business logic is actually
implemented. "It's right here, my dear future reader!"

Or maybe I was just a really bad programmer 10 years ago :)

------
gm
This advice just _feels_ very wrong. After thinking about it and seeing the
other comments, some remarks:

1) It's fine to go back and duplicate code after you correct the abstraction.
But it should be the _first_ phase in doing a larger pass to refactor code to
fit the current business requirements. If you forgo the _second_ step, which
should be to search for suitable abstractions again, you are absolutely
guaranteed to be left with shit code that breaks in this situation, but not
that other one, and no one knows why. I would absolutely only duplicate code
as the prequel to deduplicating it again with updated abstractions.

2) If you do any of this without thorough unit tests you're insane. Keep the
wrongly-abstracted code unless you have time to thoroughly fix the mess you
will have made when you duplicate code again and introduce bugs (you're human,
after all).

2a) If you are going to do this and there are no unit tests, create those unit
tests before you touch the code initially (before the duplication).

3) Some of the comments saying you should wait until you implement something
two or three times before creating an abstraction seem like comp sci 101 rules
of thumb. It's way too simplistic a rule, way too general. Prematurely
abstracted (haha!). The type of project and the type of company/industry will
tell you what the right tradeoff is.

That is all.

~~~
haolez
You are assuming that the code is a moving target. Not every software project
behaves that way. Sometimes, the software gets done as is.

~~~
gm
In that case, then the original problem (incorrect abstraction) does not
exist, or at least does not get worse over time, and thus does not need
fixing.

------
bcrosby95
I find it interesting that comments on these articles mainly discuss one
aspect of it, but rarely this part:

> Don't get trapped by the sunk cost fallacy.

In my experience, yes, programmers are hesitant to throw out an abstraction.
Why not work to change this, rather than telling people not to abstract?

~~~
ben509
I don't think it's a sunk cost fallacy. I think the hesitation is more for
social reasons, often not wanting to do a big pull request that's going to be
scrutinized.

~~~
Tainnor
"Big pull requests" that are unannounced are always problematic because who
wants to be the person saying "all of this work you've done is wrong"?

In such situations, it's good to get buy-in from other people before
attempting to do such a thing. Make a proposal for a big change and discuss
it. There's still a chance that, in the implementation it doesn't work as
nicely as believed initially, but at least now it's less likely that the idea
will be rejected wholesale during code review.

------
preommr
I strongly dislike this article because the title is much broader than most of
the substance of the article.

Advising not to overextend an abstraction is inarguable.

The actual title "Duplication is far cheaper than the wrong abstraction", and
the thing that people will really discuss, is a loaded statement that's going
to need a lot of caveats.

------
klyrs
I use DRY in two ways. The first is that I'm happy to make 2 or 3 copies of a
snippet before promoting that to a new function.

The second is when I find a bug in a duplicated snippet. I'll mend the snippet
and its duplicates, once or twice before promoting it to a function.

In the rarer (in my line of work) instance that a common snippet gets used
with several intrusive variations, I usually document the pattern. It's
tempting to use templates, lambda functions, closures, coroutines, etc but far
simpler to duplicate the code. But again, if a bug (or refactor) crops up and
I need to fix it in many places, then I'll spend some time thinking about
abstraction and weigh the options with the benefit of hindsight.

------
crazygringo
Another tip is: if you're duplicating, and they're not lines of code that are
visually obviously next to each other, then leave a comment next to both
instances mentioning the existence of the other.

There's nothing inherently wrong with duplication, except that if you change
or fix a bug in one, you need to not forget about the other. Creating a single
function solves this... but at the potential cost of creating the wrong
abstraction.

When you're at only 1 or 2 extra instances of the code, just maintaining a
"pointer" to the other case(s) with a comment serves the same purpose.

(Of course, this requires discipline to always include the comments, and to
always follow them when making a change.)
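
A minimal sketch of that convention, with hypothetical function names (the two copies would normally live in different files):

```python
# billing.py
def normalize_tax_id(raw):
    # NOTE: duplicated in reports.py (normalize_tax_id_report).
    # If you change this logic, update the copy there too.
    return raw.strip().upper().replace("-", "")

# reports.py
def normalize_tax_id_report(raw):
    # NOTE: duplicated in billing.py (normalize_tax_id).
    # If you change this logic, update the copy there too.
    return raw.strip().upper().replace("-", "")
```
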

~~~
stormdennis
Would the risk of forgetting to update the comments not be a reason to create
a wrapper method that handles calls to both and contains the relevant advice?

------
gorgoiler
Brilliant insight. Always remember: (1) make it work, (2) make it right, (3)
make it fast. 80% of projects get scrapped in between (1) and (2) because you
end up realizing you wanted something completely different anyway.

~~~
willcipriano
> (1) make it work, (2) make it right, (3) make it fast.

I've always disagreed with this. In my view you should make it a habit to
write optimized code. This isn't agonizing over minor implementation details
but keeping in mind the time complexity of whatever you are writing and
working towards an optimal solution from the start. You should know what
abstractions in your language are expensive and avoid them. You should know
roughly the purpose of a database table you create and add the indexes that
make sense even if you don't intend to use them right away. You should know
that thousands of method lookups in a tight loop will be slow. You should have
a feel for "this is a problem someone else probably solved, is there an
optimal implementation I can find somewhere?". You should know when a value
is used often, and cache it from the start. Over time the gap between writing
unoptimized and mostly optimized code gets smaller and smaller just like
practice improves any skill.
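
One of the cheap habits mentioned above, caching a frequently used value from the start, can be as simple as a memoized lookup. A hedged sketch in Python (the discount table and function name are invented for illustration):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def category_discount(category: str) -> float:
    # Stand-in for an expensive lookup (db call, config parse, ...).
    # Repeated calls with the same category hit the cache instead.
    return {"books": 0.10, "games": 0.05}.get(category, 0.0)
```
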

~~~
sagichmal
> In my view you should make it a habit to write optimized code.

It depends on your domain.

If you're writing for embedded, or games, or other things where performance is
table stakes, then sure.

If you're writing code to meet (always changing) business requirements in a
team with other people, writing optimized code first is actively harmful. It
inhibits understandability and maintainability, which are the most important
virtues of this type of programming. And this is true even if performance is
important: optimizations, i.e. any implementation other than the most obvious
and idiomatic, must always be justified with profiling.

~~~
Tainnor
You're mostly right, but even in typical LOB applications, there is some low-
hanging fruit you should really pay attention to. One common example is N+1
queries.

And if you _do_ find yourself writing an algorithm (something which happens
more rarely in LOB applications, but can still happen occasionally), it's
probably still good to create algorithms that are of a lower complexity class,
provided they are not that much harder to understand or don't have other
significant drawbacks. I remember that I once accidentally created an
algorithm with a complexity of O(n!).
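
To make the N+1 pattern concrete, here is a small self-contained sketch using sqlite3 (the schema and data are invented for illustration): the first version issues one query per author, the second gets the same result with a single join.

```python
import sqlite3

# Hypothetical schema: authors and their posts.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)")
db.execute("INSERT INTO authors VALUES (1, 'ada'), (2, 'alan')")
db.execute("INSERT INTO posts VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c')")

def titles_n_plus_one():
    # N+1: one query for the authors, then one more query per author.
    out = {}
    for author_id, name in db.execute("SELECT id, name FROM authors"):
        rows = db.execute("SELECT title FROM posts WHERE author_id = ?", (author_id,))
        out[name] = [t for (t,) in rows]
    return out

def titles_joined():
    # Single query with a join: same result, one round trip.
    out = {}
    rows = db.execute(
        "SELECT a.name, p.title FROM authors a JOIN posts p ON p.author_id = a.id"
    )
    for name, title in rows:
        out.setdefault(name, []).append(title)
    return out
```
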

------
thinkloop
A related problem: duplication is not equality. If two things happen to be the
same right now, it doesn't mean they are intrinsically the same thing. If you
have multiple products selling for $59.99 they shouldn't share a function to
generate the "duplicate" price. Abstractions need to be driven by conceptual
equivalence, not value equivalence; duplication is a good hint at a potential
candidate for abstraction, but not the complete answer on its own.
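
A tiny illustration of value equivalence versus conceptual equivalence (product names invented): deduplicating the two literals into one shared constant would couple prices that merely coincide today.

```python
# Both products happen to cost 59.99 right now, but the prices are
# conceptually independent quantities, so we duplicate the literal.
prices = {"book": 59.99, "mug": 59.99}

# The book's price changes; the mug is (correctly) unaffected. A shared
# STANDARD_PRICE constant would have silently changed both.
prices["book"] = 64.99
```
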

------
allenu
In a large organization, the other thing you notice with trying to fix
duplicated code is that if you take on refactoring it all, you are now
responsible for making sure everything still works AND that you do not
inhibit any future work. You are now responsible for more than you may have
bargained for.

Coming up with the right abstraction takes some predicting of future use-
cases. It's more than just refactoring work to put it all in one place.

------
SkyPuncher
I think there's a big cultural challenge with adopting duplication. It goes
against most people's career growth objectives.

Being able to effectively create clean, re-usable abstractions is a measure of
being a "senior" engineer at many places. In other words, to be viewed as
senior, you need to be able to write effective abstractions frequently. It's
hard to measure an abstraction in the moment, so a lot of people assume that
the senior simply knows better.

I find this extends to a lot of programming. Seniors will often use
unnecessary tricks or paradigms simply because they can. It can make it
extremely difficult for junior developers to grok code. Often this reinforces
seniority. "If only the seniors can work on a section of code, then they are
senior". Likewise, there are so many books on crazy architectures and
patterns. It's really neat to understand, but I've determined those books are
pretty much self-serving.

\----

I've found that my work is often far more limited by the domain/business logic
than any sort of programming logic. I'll happily write code that looks really
basic - because I know ANYBODY can come in and work with that code. If I write
code that a junior needs to ask me questions like "what is this pattern?" or
"what does this mean?", I've written bad code.

\-----

With all that being said, every single job interview I've ever had expects me
to write code at the level of complexity that my title will be at. They'd much
rather see me build some sort of abstract/brittle concept than using some
constants and switch statements. The prior looks cool, the latter looks
normal.

~~~
leto_ii
> I think there's a big cultural challenge with adopting duplication. It goes
> against most people's career growth objectives.

My experience is the complete opposite :D. What I've noticed is that the
people who 'deliver' quickly (without much regard for what might be called
code quality) and fulfill business requirements without much questioning are
perceived as more valuable.

> I've found that my work is often far more limited by the domain/business
> logic than any sort of programming logic.

I broadly agree with this statement. However, just like a good carpenter knows
how to properly build a bookcase, a table, a roof etc. a good developer should
understand the programming logic and know how to apply it. Business
requirements need to be fulfilled, but it's up to us to decide how to do that.
More so, I think it's up to us to push back when we feel business requirements
don't make sense from a technical point of view, or even from a business point
of view.

------
zarathustreal
I’ve seen this “hot take” a few times before and even see developers that I
would have considered very good agree with it. Consider that all code is
computation, this is the point of a computer: to compute. Consider that
abstraction doesn’t seem valuable -to you- for a multitude of reasons. Perhaps
you’re using a flawed paradigm that emphasizes objects over computation. This
would obviously mean abstraction -increases- the difficulty of reasoning about
your code. Perhaps you don’t have a mental map of appropriate abstractions due
to a lack of education or knowledge gap, this could lead you down the path of
creating abstractions which reduce duplicate characters or lines of text but
are not logically sound ("leaky abstractions"). All of these things come
together in a modern “enterprise” software environment in just the right way
such that abstraction starts to seem like a bad idea. Do not fall into this
line of thinking. Study functional programming. Study algebraic structures.
Eventually the computer science will start to make sense.

------
hota_mazi
> prefer duplication over the wrong abstraction

Such strange advice.

If you're able to recognize the wrong abstraction right away, surely you would
not use it, right?

~~~
allenu
I think the intent was to communicate that abstractions aren't always right.

Some people might think that because the abstracted code maps to the
duplicated code 1 to 1 and leads to fewer lines in total, it's a good
abstraction, not realizing that there are costs to doing this that they may
not be aware of.

------
layer8
The main takeaway from the article is that abstractions which have become
inadequate should be corrected (removed and/or replaced by adequate ones) as
soon as possible. A corollary is that abstractions should be designed such
that they can be replaced or removed without too much difficulty. A common
problem in legacy code bases is not just that they contain many inadequate
abstractions, but that the abstractions are entangled with each other such
that changing one requires changing a dozen others. You start pulling at one
end and eventually realize that it’s all one large Gordian knot. One thing
that I learned the hard way over the years is to design abstractions as
loosely coupled and as independent from each other as possible. Then it
becomes more practical to replace them when needed.

------
hackinthebochs
I couldn't disagree more. There is no such thing as abstracting too early
(this does not go for structural abstractions like factories, singletons,
etc). The best code is code you don't have to read because of strong, well-
named functional boundaries.

------
naringas
sometimes it's better to copy and paste some code only to make each copy
diverge more and more over time (somewhat like a starting template) as opposed
to introducing an abstraction to generalize some slightly different behaviors
only to use said abstraction twice.

this makes even more sense when the code will live on in different programs

there's a point when incurring the cognitive overhead cost of the abstraction
becomes worthwhile, probably after the 3rd time. but my point is that it's also
important to consider that the abstraction introduces some coupling between
the parts of the code.

~~~
rightbyte
I find it easier to read long functions of code than jumping around in helper
functions or abstractions. Especially if I am not familiar with the code base
and don't know common functions by heart.

------
memexy
> Re-introduce duplication by inlining the abstracted code back into every
> caller.

Ideally this type of workflow would be supported by the code editor. I've done
this manually a few times and it's not fun.

------
chiefalchemist
Why not simply duplicate the abstraction, refactor as needed, and adjust the
necessary caller(s)?

Having to know, find, and maintain the individual duplications feels dirty
and, in its own way, wrong.

Choose your wrongs wisely?

------
kevsim
Relevant post from earlier today
[https://news.ycombinator.com/item?id=23735991](https://news.ycombinator.com/item?id=23735991)

------
ridaj
Previously discussed here:
[https://news.ycombinator.com/item?id=17578714](https://news.ycombinator.com/item?id=17578714)

~~~
arendtio
I find that first comment particularly insightful.

However, I am not sure about the ordering of state and coupling. To me it
seems to depend on the language: in functional languages, avoiding state is
king, while in object-oriented environments, coupling could be a more
important factor.

------
jbmsf
One of the reasons duplication is used badly is that it is one of the easiest
abstractions to recognize.

One of the ways I've seen DRY go horribly wrong involves reusable code units
evolving into shared dependencies that often interdepend in complex ways.
Unfortunately, the problems of such a system are observed much later than the
original code duplication and fewer people have the experience to see it
coming.

------
adamkl
Sandi mentions this during a talk she gave on refactoring a few years ago. [0]

It’s a great little video for showing junior developers how a messy bit of
code can be cleaned up with a few well chosen OOP patterns (and a set of unit
tests to cover your ass).

[0] [https://youtu.be/8bZh5LMaSmE](https://youtu.be/8bZh5LMaSmE)

------
vxNsr
I want to thank everyone here, I’ve been stuck for about a week now on an
issue that is entirely germane to this topic and the whole conversation here
really helped me flesh out what was wrong and allowed me to understand a path
forward. I’m honestly holding myself back from popping onto my computer right
now to start working on it.

------
tarkin2
"With C you can shoot your own foot. With C++ you can blow your own leg off".
I feel the same is true here.

The abstraction may be right at the time of writing, yet further on it often
becomes not only wrong, but a massive hindrance.

With time and effort, hacky code can be worked into shape. An eventual wrong
abstraction normally means a rewrite.

------
kolinko
I wish this article was available two years ago when I tried to explain this
to a bunch of juniors working for me...

~~~
nnutter
“ Posted on January 20, 2016 by Sandi Metz.”

~~~
kolinko
Damn, I wish I saw it back then :)

------
nbardy
This has been one of the hardest-fought lessons I’ve learned in my programming
career, but also one of the most fruitful. I aim to make my abstractions too
late rather than too early. My rule of thumb tends to be: copy things six to
seven times before you try to build an abstraction for it.

------
worik
Really this is stating the obvious.

The problem at steps 6, 7, and 8 is a social and economic one. Having
the time, resources, and skill to do a job properly is very important. But
there are social and economic pressures to "just get it done".

This is a specific formulation of a general problem.

------
kureikain
I think one of the cool things about pattern matching, or a language that
supports multiple function clauses (in my case, it's Elixir), is that we can
have the same function with different argument signatures. So we don't have
to duplicate or inherit anything and can still share some common methods.
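
Python doesn't have Elixir-style multi-clause functions, but functools.singledispatch gives a rough analogue of the idea: one name, separate implementations selected by the argument (the function here is purely illustrative):

```python
from functools import singledispatch

# The fallback "clause": used when no more specific registration matches.
@singledispatch
def describe(value):
    return f"other: {value!r}"

# Extra "clauses", dispatched on the type of the first argument.
@describe.register
def _(value: int):
    return f"int: {value}"

@describe.register
def _(value: str):
    return f"str: {value}"
```
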

------
why-el
Rob Pike discusses similar points in this section of his talk on Go Proverbs
[https://www.youtube.com/watch?v=PAAkCSZUG1c&t=9m28s](https://www.youtube.com/watch?v=PAAkCSZUG1c&t=9m28s).

------
Xlurker
I'd rather ctrl-f and change code in multiple places than deal with
abstraction hell.

------
tomphoolery
This again?? ;)

I love this post. A lot of wasted hours were spent in the past trying to use
abstractions that no longer made sense, but Sandi encouraged me to go back and
rethink a lot of that and now my code is way easier to read. Thanks Sandi!

------
recroad
Programmer B in Step 6 should have used SOLID and refactored to extend the
module (or something similar).

This is a strawman argument which has little to do with the "wrong" abstraction
and everything to do with poor design choices.

------
ninetax
What are people's recommendations on books on how and when to create the right
abstractions?

Last year I read Zach Tellman's _Elements of Clojure_ and really loved the
parts that touched on the subject of abstraction.

------
dfischer
Reminds me of this discussion:
[https://news.ycombinator.com/item?id=12120752](https://news.ycombinator.com/item?id=12120752)
(John Carmack on inlined code).

------
pps43
Related to [http://yosefk.com/blog/redundancy-vs-dependencies-which-is-worse.html](http://yosefk.com/blog/redundancy-vs-dependencies-which-is-worse.html)

------
kuharich
Prior comments:
[https://news.ycombinator.com/item?id=17578714](https://news.ycombinator.com/item?id=17578714)

------
gumby
Early deduplication is the equivalent of early optimization: a bad idea that
boxes you in.

Duplicate code is a sign that there _could_ be a generalization missing.

------
neetrain
I think the term "wrong" causes all the misunderstandings.

It sounds like the abstraction was wrong _in the first place_.

Can it be called a "rotten" abstraction?

------
avodonosov
> they alter the code to take a parameter, and then add logic to conditionally
> do the right thing based on the value of that parameter

But that's a textbook example of bad code, competent coders don't do this.

Update: for example see Thinking Forth chapter "Factoring Techniques", around
the tip "Don’t pass control flags downward.". Page 174 in the onscreen PDF
downloadable from sourceforge.

And there is no need for duplication. The bigger function can be split into
several parts so that, instead of one call with a flag, everyone calls the
needed set of smaller functions.
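
A sketch of that factoring in Python (all names hypothetical): instead of passing a validate flag downward, the function is split so each caller composes exactly the steps it needs.

```python
written = []  # stand-in for a data store

def check(record):
    # hypothetical validation step
    if "id" not in record:
        raise ValueError("missing id")

def write(record):
    written.append(record)

# Before: one entry point with a control flag passed downward.
def save(record, validate=True):
    if validate:
        check(record)
    write(record)

# After: no flag. A caller that wants validation composes both steps;
# a caller that doesn't simply calls write() directly.
def save_validated(record):
    check(record)
    write(record)
```
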

~~~
zbentley
> that's a textbook example of bad code, competent coders don't do this.

That's reductive and dismissive.

There's a ton of subtlety in even defining the terms for that "best practice".
What counts as a control flag versus a necessary choice that must be made by
callers? Are you still passing control flags if you combine them into a
settings object? What if you use a builder pattern to configure flags before
invoking the business logic--is that better/worse/the same? What if you
capture settings inside a closure and pass that around as a callback? How far
"downward" is too far? How far is not far enough (e.g. all callers are
inlining every decision point)?

The answer to all of those is, of course, "it depends on a lot of things".

And that's before you even get into the reality (which a sibling comment
pointed out) that even if we grant that this is inherently bad code, that
doesn't imply anything about the competence of the coder--some folks aren't
put in positions where they can do a good job.

Unrelated aside: Thinking Forth is an excellent book! Easy to jump into/out of
in a "bite size" way, applicable to all sorts of programming, not just Forth
programming.

------
kristo
There should be a code tool to re-inline code from an abstraction

------
djhaskin987
Mods, this article is old; it should be labeled 2016.

------
ulisesrmzroche
“Premature optimization is the root of all evil”

------
amelius
A manager once asked me: please reuse as much code as you possibly can.

This reminded me of that.

------
sheeshkebab
I’m not sure why this is #1... but since it is, both of these - duplication
and wrong abstractions - are otherwise known as technical debt.

~~~
dasil003
Not necessarily. Technical debt is when you do something quick and dirty to
get a feature out in the short-term knowing that it won't be maintainable,
scalable, etc, but you do it anyway with the expectation that you'll fix it
later. Some duplication and wrong abstractions are caused by this, but
definitely not all.

~~~
hrhrhrd
No, technical debt is a very general category that includes deliberate hacks,
structural flaws, and small mistake bugs. It's anything that over time will
damage the code base, duplications and wrong abstractions being very much
included in that.

~~~
dasil003
You're welcome to your own definitions, but personally I keep bitrot, deferred
maintenance, and "structural flaws" (which can be subjective and dependent on
use cases and scale) out of the bucket of technical debt since it robs the
metaphor of a defining aspect: intentionality. Debt is not something that
happens passively as the world changes around you, it's something which you
sign up for.

~~~
quinnirill
If you unintentionally destroy property and have to pay for it, you’re in
debt.

We even have a concept of life debt.

Some debt is intentional, some incidental.

Most technical debt I’ve seen was not intentional, just a well-meaning design
that was created to serve a purpose that eventually outgrew it, and that’s
when the interest started to pile up.

And happening passively is exactly what it does: interest rates change, your
ability to make down payments changes. All part of the very well-functioning
metaphor in this context.

