
Prefer duplication over the wrong abstraction - hyperpallium
http://www.sandimetz.com/blog/2016/1/20/the-wrong-abstraction?duplication
======
sctb
Previous discussion:
[https://news.ycombinator.com/item?id=11032296](https://news.ycombinator.com/item?id=11032296)

~~~
Rexxar
I generally agree that it's good not to have many dupe but in this case there
was a lot of upvotes and a lot of comments and tagging the link as "dupe" just
destroy the discussions.

Why not just make successful "dupe" link stay but not bring karma to the
poster ? That will stop the perverse incentive to try to submit old successful
link for easy karma.

~~~
sitkack
The new title makes me so happy.

------
jerf
For a long time I've thought that Don't Repeat Yourself is the most important
software engineering rule. I still do, but in line with this article, I've
learned that some things that appear to be duplication if you just look at the
literal code actually aren't. Just because you've got a chunk of ten or twenty
lines that are identical right now doesn't mean that they are actually
identical.

It's hard to give concrete examples of such an abstract concept, but one I
encounter with some frequency is some chunk of code that grabs something from
a URL. Assuming you're using some half decent library that takes care of the
brute mechanics, it's easy to end up with repetitive code to deal with setting
up a form and putting on a couple headers and dealing with authentication or
whatever, and it's tempting to try abstract on that yet further. Yet I find
all the attempts to do so make the problem worse, because the superficially-
similar code is actually quite dissimilar. The pressures on the instances are
different. Here I may be deeply concerned about SSL, there I may be adding
header hacks for some broken auth scheme, over there I've got some sort of
serialization problem. Trying to put them all behind the same abstraction is
just pain. It turns out the level of abstraction offered by these often years-
old, mature HTTP libraries is more correct than may initially meet the eye.

For code to be "duplicated", the full context of the code need to be
duplicated, not just have superficial visual similarities. It's important that
all the code with the same context, the same evolutionary pressures, the same
likely change planes end up in one place, and it is equally important that
code that lacks that similarity _not_ end up bound together. In the end, it's
the same error.

~~~
TeMPOraL
I agree. I follow the DRY principle _very_ strongly, but not on the literal
code level - just one level above, on the "what is this doing" level. So in
your example, if I see a dozen places where the program is using that URL
library to e.g. send requests to get a different dataset from the same API,
I'll quickly DRY it into a function. That's because all those places do the
same thing - "fetch a dataset named $name from API XYZ". But I'll leave alone
all the other places that use the same URL library, even with pretty much
identical parameters, if they're doing different things. They may look the
same now, but they may start to differ later. And even if they don't, placing
them under a common abstraction when they represent _different_ things is
basically a lie. I don't like it when the code lies to people.

------
erichocean
My favorite example of code you should almost always[0] duplicate are view
classes (in MVC terminology).

Instead, I see people subclassing views to change their behavior and dealing
with absolutely mind-numbing behavioral bugs in their subclass. Furthermore,
the subclass's implementation is totally dependent on the _exact_
implementation of the superclass. Change it, and you get a whole crop of new,
hard to debug, bugs.

Solution? When you have a view class that almost, but not quite, does what you
want, copy the entire class, rename it, and make it do what you want. You'll
be done before your next break, and nothing that happens anywhere else will
cause it to break in the future.

[0] The one exception are situations where all you are doing is configuring an
existing view class in code and/or adding behavior in places that were
_specifically_ anticipated by the class in question. For example, setting up
the options on an existing table view class, and responding to row selections
using a method meant to be overridden by subclasses.

But if your table view is actually different than the class you've got now—for
example, it does something strange in responses to selections—you're much
better off finding an open source replacement for the table view (if it's
vendor-provided, like UITableView in iOS), copying it, renaming, and modifying
it to do exactly what you need to it do.

~~~
gnaritas
> Instead, I see people subclassing views to change their behavior and dealing
> with absolutely mind-numbing behavioral bugs in their subclass.

Because they're breaking a rule of good design, implementations should depend
on abstractions, not other implementations. Never subclass a concrete class,
only abstract classes.

> Solution? When you have a view class that almost, but not quite, does what
> you want, copy the entire class, rename it, and make it do what you want.

Better solution, move the duplicate code into a common abstract superclass.
Allow the views to differ where they're concrete and keep the common code
abstract in the superclass. Then you can't ever break one implementation by
changing another implementation, your core complaint.

~~~
TeMPOraL
> _Better solution, move the duplicate code into a common abstract superclass.
> Allow the views to differ where they 're concrete and keep the common code
> abstract in the superclass. Then you can't ever break one implementation by
> changing another implementation, your core complaint._

And then you suddenly find yourself having a "god object", a class that does
huge amount of things and most of them are not related to anything else in the
class - they just exist to facilitate code deduplication lower in the
inheritance tree.

And then, one page _really_ needs a different version of a behaviour that's
common to many of them, and you're back to either duplicating or polluting the
"god object" even more.

Inheritance is a bad tool for this problem. It's better to separate different
pieces of functionality and then _compose them_ in views. If your view needs a
component behaving in a slightly different way, then you can just create a new
component (or even subclass the old one, if you must).

~~~
gnaritas
> And then you suddenly find yourself having a "god object", a class that does
> huge amount of things and most of them are not related to anything else in
> the class - they just exist to facilitate code deduplication lower in the
> inheritance tree.

No, you don't. Point of fact if most of the subclasses are duplicating some
code, it does relate to what the superclass is and that code belongs there.
This is not a god object.

> And then, one page really needs a different version of a behaviour that's
> common to many of them, and you're back to either duplicating or polluting
> the "god object" even more.

That's what overriding is for.

> Inheritance is a bad tool for this problem. It's better to separate
> different pieces of functionality and then compose them in views. If your
> view needs a component behaving in a slightly different way, then you can
> just create a new component (or even subclass the old one, if you must).

If composition is possible, it's always a better choice than inheritance of
course; when it isn't, a shallow inheritance hierarchy is fine. Deep
hierarchies are never ok.

> (or even subclass the old one, if you must).

No, you don't subclass concrete implementations to change them, that leads
back to the whole problem you're complaining about. Implementations should
never descend from other implementations.

------
danjoc
I see the author's point, but it overlooks code maintenance.

At step 4, where "Time passes" it should read, Programmer A fixes bugs,
optimizes, adds features, etc. Repeat.

If the code is duplicated, the maintenance effort is also duplicated or the
duplicates diverge significantly making them more difficult to understand in
relation to each other. Why does methodA do this, but methodB does that?

Tools can help prevent ProgrammerB and ProgrammerX from coming in and stinking
up the place. Set up sonarqube and let it tell them they must pass the quality
gate before merging.

~~~
bluejekyll
Agreed. I think this is actually dangerous advice. While I agree that the
wrong abstraction can cause lots of pain, the repetition of a bug through
duplicate code is arguably a worse issue.

One example might be not using an abstraction for accessing filesystem
resources in a web based service. You might end up duplicating a ton of code
that improperly takes a string from a request without sanitizing it and then
have a massive security problem all over your code.

~~~
thinkloop
This would not be a candidate for duplication tho, because the underlying
intent is also logically the same. Author is talking about things that
coincidentally happen to have similar code at the moment, but are
fundamentally and logically unrelated, having a high likelihood of diverging
in the near future.

~~~
bluejekyll
I understand the intent. I still believe the advice is dangerous if improperly
applied.

~~~
therealjumbo
That's true for all advice, especially in software development.

------
dahart
I love the thinking and agree with this as much as can be agreed with
something general and lacking specific examples or specific situations. It's a
nice reminder that there are other options besides modifying a base class to
fit only one case's requirements.

I'd call it forking instead of duplication to emphasize that the paths are
different, not equal.

When this problem does come up, it's not usually trivial. Often the
abstraction doesn't break at the same time the second use is introduced, it
breaks much later after the second use is built, making it more difficult to
fix.

The much more common problem I've experienced in industry is people making an
abstraction with only one use case, anticipating before even having a second
use case. It's much easier to discuss and fix if everything you're facing is
concrete, and anticipating uses without having them at hand nearly _always_
results in a broken abstraction.

I'd love to see more about how to determine the abstraction is wrong.

> If you find yourself passing parameters and adding conditional paths through
> shared code, the abstraction is incorrect.

This seems too simplistic. The right answer for this description is often
inheritance. What are better ways to identify a broken abstraction?

------
dwb
It seems to me that it would be better to address step 6, where programmer B
adds cruft to the abstraction rather than re-working it. I appreciate that
pragmatics sometimes dictates that duplication is the right option, but
instead of mostly giving up and advocating an emphasis on duplication, I'd
rather encourage better design of abstractions in the first place, and more
willingness to genuinely improve abstractions later on if necessary (rather
than more complex logic or whatever). Especially with abstractions as small as
methods – I very much question preferring duplication in cases as small-scale
as those. This may depend on the expressiveness of the language you're using,
though.

~~~
ralusek
This is the correct answer. It sounds like in that case the initial
abstraction was either poorly designed, or the behavior being added doesn't
fit the abstraction.

------
vemv
And conversely, the right abstraction is preferrable over duplication.

There's almost always chance to stop a little, think/research hard and create
something novel, maybe leveraging specific features your language/framework
offers.

~~~
menzoic
Duplication is fine until it gets out of hand, at which point you'll be able
to identify the right abstraction.

~~~
adrianN
Then you're in the situation that your code is a mess of copy-paste and its
obvious what you should do, but management prefers that you work on features
instead of changing "proven, reliable" code. Thus the mess is there to stay.

~~~
acdha
That bad culture is a separate problem which will make everything worse.

In your case, it seems mixed - while you do have to update code in multiple
places the flip side is that you can more confidently change code in one
location without breaking something else. If you're on a death-match piling up
technical debt, that might be less painful than dense inter-connected code,
albeit still worse than working somewhere with more long-term thinking.

~~~
majormajor
Yeah, picking the wrong abstraction too early _also_ gives you "proven,
reliable" code that's sometimes even harder to maintain.

First example of this that occurs to me was an old Rails app I took over. We
needed to migrate it up to modern versions of Ruby and Rails since it was
getting harder and harder to support the old ones, and they were about to hit
their maintenance end of lifes, but someone had extended the Rails framework
stuff with their own extra layer.

Now we had code that changed subtly with a new version of Ruby that was used
in dozens of places for actually-somewhat-different things, so the fixes had
to account for all those different ways it was being used too. Not a lot of
fun, you can fix one endpoint but then have to make sure you don't re-break it
with how you fix the next one...

------
hexalisk
I think this is especially pertinent for the current SOA/microservices trend
that many companies are adopting nowadays. Build small, simple, highly
focussed services that specialize at one thing. Services do one thing, and do
it well, at the cost of flexibility and the ability to generalize over
different use cases. I think this something that is perfectly ok and shouldn't
make programmers squirm, even if it leads to code duplication.

In some of the more popular microservice frameworks for instance, I see some
attempts to build abstractions for things like data persistence -(what if we
want to swap out Postgres for Mongo, etc?) But I think that's a good example
of a needless abstraction. In fact by building an abstraction like that, you
lose many of the features that perhaps made that specific database so great in
the first place.

It seems to be easier to mentally justify logic duplication in the hardware
world. An FPGA is an abstraction of logic. In general, it trades off
performance for more flexibility. An ASIC is specifically designed from the
ground up to do one task very well, but deviate from that task and it's
borderline useless. I think that for microservices, code should be treated
more like ASICs. One off, application specific code that we won't be afraid to
throwaway and rewrite from scratch if we need to.

------
drblast
This made me throw up in my mouth a little.

Duplication is not inherently _wrong_ as there is a point where you should
stop abstracting and live with repetitive code, but that point tends to be way
further out than most inexperienced devs would assume.

If you're using duplication as a temporary tool to help find the right
abstraction, fine. But don't check that code into source control.

The problem with duplication is that it's difficult or impossible to track. It
sucks if you have to spend two days figuring out some complex abstraction, but
_at least you know it 's there_ and you can work your way out to see all the
code that's affected by your change.

If there's duplication in the code base you might never know that you changed
function A and it works just great now, but function B that you have no idea
even exists because it was duplicated from function A three years ago now has
subtle bugs.

If an abstraction is wrong, make it right. Don't use that as an excuse to
abandon the difficult work and duplicate code. You're just pushing the problem
off on to the dev two years from now who has to maintain it.

~~~
jbooth
So then you make all your nice pretty abstractions, and something comes from
the business where it needs to be different in some wacky pathological case.
Now your abstractions suck.

Duplication leaves you room for that messy business logic. Increased
abstraction often means increased coupling, and that can be much harder to
maintain. It's a judgment call on a case-by-case basis.

~~~
gnaritas
If new business rules completely invalidate your abstractions in ways that
aren't easily fixable, you're doing it wrong. Those wacky pathological cases
should just be new implementations that plug right into your existing stuff,
even if it requires slightly reworking the abstraction it's better than giving
up and duplicating everything. If you're finding duplication easier, it just
means you're abstracting things poorly and need to get better at doing it.
Mostly likely, you've abstracted too early before you had enough different
cases to actually understand how to split the concrete from the abstract
correctly.

~~~
TeMPOraL
> _If new business rules completely invalidate your abstractions in ways that
> aren 't easily fixable, you're doing it wrong._

Why, you might have been doing it right all along. The abstractions were good
for old business rules, but are wrong for new business rules. You need to
replace your abstractions with better ones. You can't expect to predict future
business requirements perfectly 100% of the time; there's no shame in having
to rewrite stuff to accomodate changing reality.

~~~
gnaritas
Good abstractions aren't about predicting, they're about being nimble so that
when change does come along, it's easy, because you don't have a ton of
duplication to fix in order to change the abstraction. The new business rules
don't belong in the abstraction in general anyway, they belong in the concrete
implementations, that's the point of having implementations.

------
syncsynchalt
I'm so glad to hear a pithy saying given to this.

So many times I've fought for ripping out some indirection-heavy pattern and
just doing {thing} procedurally, only to run up against the concern of the
duplication it'll introduce.

~~~
weberc2
This doesn't validate the procedural position in a procedural vs polymorphism
debate; if anything, the example can be summarized by "there was a good
polymorphic abstraction, programmer B came along and couldn't figure out how
to extend the abstraction to implement his new requirement, so he made the
abstraction less polymorphic and more procedural, and then this happened over
and over again until the accumulated procedural logic made the whole thing
unbearably bad".

I think the appropriate summary for this article would be: bastardized
abstraction < duplicated logic < proper abstraction.

------
hiphipjorge
I feel this is good advice. I think the harder problem is getting buy-in from
everyone you work with to do this. Often times I take this approach and have
to really try to sell it to people reviewing my code because they've bought
into this idea of "DRY ever, no matter what".

DRY yourself is useful when 1. You're repeating yourself more than 2 or 3
times and 2. The inputs/outputs and intention of the code are very, very
clear. If that's not case, I'm ok with a little copy/paste.

------
fragsworth
Disagree. You should never "prefer" one over the other without considering and
understanding the actual line where the decision becomes non-obvious.

The author is missing a major important factor, that it entirely depends on
the size/scope of the duplication, and the size/scope of the changes necessary
for functionality.

More lines/complexity that won't change means you should be more likely to
want to abstract it. More changes to the code means you should be more likely
to want to duplicate it.

------
forrestthewoods
Man I dunno. This is so tough.

On one hand, absolutely. Stacking params on params is a recipe for disaster.
But often times with duplication you do need to update the duplicates. Which
means you wind up with 6 pieces of code all similar but slightly different
such that it fails in increasingly nuanced and edge casey ways.

No silver bullet unfortunately.

~~~
weberc2
You don't have to choose. If you have a function* with excessive arguments, it
has too many responsibilities. Identify the responsibilities and make a
function for each. Then replace the call sites of the original function with
only the new functions that are appropriate for that use case. If you find
that the same N new functions are repeatedly called in the same places, you
can pull those out into a new function as well. It's all very simple, though
learning to decompose responsibilities correctly takes practice.

* For sanity's sake, I'm just going to say "function" in place of "abstraction".

------
lisper
Another useful rule: if you find yourself needing to change an existing
abstraction, especially if you need to make it more complicated by (say)
adding a parameter, this is often an indication that your _design_ is broken,
and that you need to change _that_ so that you don't need the extra parameter.
Most often in my experience the root cause of most problems of this sort is
different data representations. Something is big-endian over here, little
endian over there. This quantity is represented as a string over here, as an
integer over there. I've found that being absolutely ruthless about being
consistent in data representations goes a very long way towards keeping my
code from spinning wildly out of control.

------
hugozap
This also applies to CSS and is the reason why most CSS codebases are a mess.
Functional CSS mitigates this (at the cost of duplication, but personally I've
found it much more easy to maintain).

The original problem comes from trying to map logical components to CSS
classes and forcing inheritance. This is a good intention but doesn't work
well. With the project evolving, it gets complicated to alter something
without breaking something else, because the inheritance chain means that rule
conflicts are likely and not that easy to spot, so to have a mental model of
how the element will behave visually means a lot of cognitive load.

------
dleslie
"Don't succumb to early optimization" applies as much to design patterns as it
does to algorithms.

------
esaym
I've wasted a lot of time thinking how to de-duplicate stuff only to have to
re-duplicate it a few months down the road (or fill up the de-duplicated
version with if statements..)

------
js8
How do you know that the abstraction is wrong (and how) if you don't create
it?

It seems to me like the article is saying - don't ever make mistakes, they are
costly.

~~~
nerdponx
I think the point of the article is that, if you aren't totally sure that an
abstraction is correct or justified, then it's better to avoid building one in
the first place. Not saying to never make mistakes, it's highlighting a
particular kind of mistake that people don't always think about, and suggests
that you can avoid making this mistake by not being so obsessive with DRY

~~~
js8
I think such conservative approach can be justified when working on existing
code, which has been somewhat proven to work. But I don't think it's justified
when you're building the code.

I simply don't think you can always get the good abstraction right. But if you
don't attempt it, you will never get it right. What you're advocating is not
attempting it.

~~~
ktRolster
If it's new code, and you are unsure what abstraction is needed, build the
simplest abstraction possible. YAGNI

That way, as the code develops, and it becomes more clear what is needed, then
you can easily build it at that time.

If you have code that needs to be duplicated, throw it into a function, a
lightweight and powerful abstraction.

~~~
js8
That's what I am advocating in this thread. ;-)

------
smailq
This is another one of the factors that makes development time estimates
difficult. While adding/removing/improving the code, you realize that some
part of the code needs to be abstracted/duplicated, or vice versa. And this
could take unknown amount of time to get it done.

Also, this is where the technical dept incurs over time, if the
abstraction/duplication is held off.

------
unabst
Is it just me or did the author just provide the perfect argument for pattern
matching in function calls as seen in Elixir? One can easily split/fork the
functionality of the same abstraction depending on the arguments. This
preserves the names (which preserves the abstraction), but extends the
functionality to adapt to the new use case in parallel, without any
dependencies.

Also, by "wrong abstraction", the author is arguing for abstractions to be
backed by reasoning, not circumstances a coder might find themselves within
code. Duplicate code is not itself reasoning, but a circumstance. Though it
can be abstracted away, elimination cannot be the only reason.

This also highlights why more languages need features to help the coder who
finds themselves in these situation. Circumstance is a context, and is driven
by dependency. Again, dependency is the enemy. We need tools and code that
fight it, not cause or encourage it.

------
auggierose
To know when to go for an abstraction, and when just to live with the code as
is for the time being, that's arguably the difference between a good and a
great programmer. And even great ones get it wrong from time to time.

------
swsieber
This fits in very well with the concept write code that is easy to delete:
[http://programmingisterrible.com/post/139222674273/write-
cod...](http://programmingisterrible.com/post/139222674273/write-code-that-is-
easy-to-delete-not-easy-to)

And here's the link to that discussion earlier this year:
[https://news.ycombinator.com/item?id=11093733](https://news.ycombinator.com/item?id=11093733)

Edit: fixed the typo where I said wife instead of write.

~~~
gabesullice
Kids these days. Back in my day, wife code was forever /s ;)

~~~
swsieber
Ha ha yeah, I came back to look and saw that interesting typo. Whoops.

------
goodoldboys
Abstraction isn't the enemy, it's programmer X not recognizing that the
abstraction has now become a square peg into a round hole.

When requirements change, it's usually a ripe opportunity for refactoring.

------
z3t4
The problem is that we like to take the path of least resistance. So we add a
parameter to the function instead of splitting it into two different
functions. Saving one hour now, that is much better spent on something else.
But might waste thousands of man-hours in the future. From a business side
though, you might not afford to the extra man-hour right now, but maybe if the
product is successful you will afford the extra thousand man-hours in the
future.

------
Bahamut
You can always abstract away duplicated code after - what you cannot do as
easily is fix a bad abstraction. It leads to a world of pain.

------
nradov
This seems like more of a work around for limited languages which lack the
features for constructing certain abstractions. Work arounds are often a good
engineering practice, but let's recognize them for what they are.

------
vbezhenar
If the code is duplicated then there's some abstraction, otherwise the code
wouldn't be duplicated. If abstraction is wrong, just refactor the code,
inline old functions, extract new abstraction.

------
bebop
Sort of a tangent, but I always liked this quote: "It is better to have 100
functions operate on one data structure than to have 10 functions operate on
10 data structures." \- Alan J. Perlis

------
matchagaucho
Agile principles allow for "wrong abstractions" as technical debt.

The goal is to keep the team moving forward and constantly refactoring.

------
z3t4
You should only solve the same problem once. Avoid coupling and complexity.
Try to make the code easy to understand.

