
In Defense of Copy and Paste - zacharyvoase
http://zacharyvoase.com/2013/02/08/copypasta/
======
toomim
I love copy & paste! I also defended it in this scholarly article:
[http://harmonia.cs.berkeley.edu/papers/toomim-linked-
editing...](http://harmonia.cs.berkeley.edu/papers/toomim-linked-editing.pdf)
with a video: <http://youtu.be/1wo_7MTdWWI>

~~~
pjungwir
No idea why you were downvoted. That paper looks like it has far more
thoughtful things to say than most comments here. :-)

~~~
gus_massa
I didn't downvote it, but I was just going to skip that comment and continue
reading.

The comment is too short, apparently slightly off-topic, looks like self-
promotion, and gives no reason to look at the linked content. But as you said,
this is a good comment!

How I would rewrite this comment (with some parts stolen from the abstract and
some parts just invented):

 _I love copy & paste! I think that the use of programming abstractions like
functions and macros has inherent cognitive costs. A few years ago, as part of
my research we proposed "Linked Editing": A technique for managing duplicated
source code with the help of the text editor. We implemented it in a prototype
editor as an XEmacs extension. More details in this article:
[http://harmonia.cs.berkeley.edu/papers/toomim-linked-
editing...](http://harmonia.cs.berkeley.edu/papers/toomim-linked-editing.pdf)
and we made a video of the editor: <http://youtu.be/1wo_7MTdWWI> ._

P.S.1 I still prefer refactoring to this kind of multiple editing. I use
Racket, so I love functions and macros. But sometimes they are very complex to
edit, so perhaps this kind of multiple editing can sometimes be a good idea.

P.S.2 Another possibility is that some users are too eager to downvote. I have
seen many good comments downvoted, but they usually bounce back later.

------
stcredzero
_> This may come across as a straw man argument_

Big time. The refactoring in this case was ill-advised. When things started
getting hairy, it should've been backed out.

Piling too much flexibility in one function is a common mistake. A
justification for copy/paste it does not make.

I worked at a shop with this rule: don't try DRY until you've seen at least
three repetitions. I think this saves one from premature refactoring.

Another way to put it: Refactor when the code speaks to you, that is, when the
need is evident. Keep the result only if it's a significant improvement. Avoid
refactoring only because you are enamored of refactoring. (Or enamored of a
rule.) Goes for any programming technique/tool, really.

~~~
samspot
> don't try DRY until you've seen at least three repetitions

A significant number of bugs I've had to fix over the years were the result of
just two repetitions. Invariably someone updates one of them and not the other
one. Or copies the first one, then updates, then forgets.

This happens over and over and over and over. I'm so sick of it that I want to
punch my coworkers (I don't usually get angry quickly but this has been
wearing on me for a few years now). So I will continue to refactor at the
second copy.

~~~
oinksoft
As Dijkstra said, "two or more, use a for."

~~~
stcredzero
That's not entirely relevant in this context. We're talking about two similar
bits of code in entirely different parts of the codebase.

------
bunderbunder
That example under the _When Tools Make It Worse_ section - uggghhhhh. Why
would anyone actually _do_ that? That isn't DRY refactoring, that's cargo cult
refactoring.

DRY is not, was never, and should never be about unnecessarily replacing
clean, well-factored code with @$2!% shared mutable state. The goal is to
normalize your code, not to micro-optimize for keystroke count. No.
Nonononononono. Just no.

~~~
sophacles
Oh man, I've been fighting a similar battle this month. I have to deal with a
codebase which has to talk to a handful of proprietary protocols, which don't
have the best interoperability between implementations. However, all the
protocols look fairly similar at a macro level (general concepts, data
structure layout, etc). So somehow this mutated into a terrible inheritance
tree 3-9 classes deep, with absurd placement of special cases 4 classes
upstream, or special case handling for a specific device in each class all the
way up the inheritance stack, and worse.

So I am trying to convince the cargo cult inheritance people that there are
patterns like strategy and "library of useful functions" to handle a lot of
the code better. But they keep sticking to cargo cult DRY. No matter how many
times I explain to them that we need to organize around repeated code, because
as time goes on, new versions of "interoperable" code will diverge in their
special-case handling and whatnot, the answer is still that we can't repeat
ourselves.

/rant

So yeah... Cargo-cult DRY is just as bad for readability and maintainability
as spaghetti and big balls of mud.

~~~
bunderbunder
Sometimes I really wish there were a way to make violating the Liskov
Substitution Principle throw a compiler error.

~~~
codewright
Learn Haskell.

------
crazygringo
Knowing when to refactor is obviously an art, not a science. And with an
example as trivial as this, it's not really a very "real-world" example.

A lot of the time, you don't even _know_ if two swaths of code are
"coincidentally" identical (don't refactor) or identical in a "deep" way
(refactor), even when the program is yours -- you just don't know how the
program will evolve.

In the absence of additional information, I usually refactor only when I see
_three_ similar code paths, since by that point a project rarely goes back to
treating them separately.
Over the years, it's turned out to be a surprisingly good rule of thumb.

~~~
bobwaycott
Same here. Doing the same thing three times is always my rule of thumb for
when to take a look at refactoring, automating, etc.

------
mwcampbell
As I ponder this more, I think it's useful to consider the concepts of
simplicity and complecting as articulated by Rich Hickey in his talk "Simple
Made Easy". As he explains it, to complect is to braid multiple things
together, whereas in a simple system, multiple things are composed. He has
often pointed out that simplicity does not necessarily mean fewer things; as I
understand it, it's not about how many things there are, but how they interact.

In that light, the single flexible tweet list function presented in this post
is indeed problematic because it has a few things braided together: a tweet
list, a profanity filter, and pagination.

So we should be suspicious of repetition, but at the same time avoid
complecting.

------
pjungwir
One rule I try to follow is to avoid refactoring when the shared code is
"coincidental." Perhaps this is another way of expressing what the author says
about business logic.

I've definitely worked on projects where developers created large, unwieldy,
hard-to-grok, buggy abstractions in the name of DRYing code. I'm pretty
aggressive about making code DRY, but simplicity and readability are more
important.

The effort I'll tolerate in pursuit of DRY also varies by language. I've been
doing some Android work lately, and I'm finding that things I would have done
DRY in Ruby require too much added complexity to make DRY in Java.

~~~
Jare
I think that's exactly what the author tries to say. I've also joked
expressing it as "you don't refactor your twins."

I'm curious (and I think it can be a useful sub-topic) what you think made
Java worse for DRYing. The rigid type system? Added verbosity?

~~~
pjungwir
One easy example: no higher-order functions like `map`, `select`, and `inject`,
because there are no lambdas, so I wind up repeating the same looping code all over
the place.

I took a look at a few "functional programming in Java" libraries, but their
solutions were still pretty verbose, and it didn't seem worth adding a
dependency for a tiny smartphone app.

~~~
pjungwir
EDIT: After thinking about it some more, there are several places where two
methods are almost the same except for one line stuck in the middle. I don't
know how to turn them inside-out in Java without spawning a bunch of tiny
classes, and it's just not worth it. But with Ruby I could turn them into a
single method, throw a yield in the middle, and then call it with a block. So
again it comes down to no lambdas.

(Hmm, this was supposed to be an edit, but somehow I did a reply instead.)
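The Ruby technique described above (one shared method with a `yield` where the copies differed, called with a block) has a direct Python counterpart: pass the differing step in as a function. A minimal sketch; `format_report`, `upper_report`, and `quoted_report` are hypothetical names, not code from the thread.

```python
from typing import Callable, Iterable, List


def format_report(rows: Iterable[str], format_row: Callable[[str], str]) -> List[str]:
    """Shared skeleton of two formerly near-identical methods.

    Everything here was the same in both copies; only the middle line
    differed, so it is passed in as a function (the Python counterpart
    of Ruby's `yield` to a block).
    """
    lines = ["BEGIN"]                        # shared header
    for row in rows:
        stripped = row.strip()               # shared pre-processing
        lines.append(format_row(stripped))   # the one line that used to differ
    lines.append("END")                      # shared footer
    return lines


def upper_report(rows: Iterable[str]) -> List[str]:
    return format_report(rows, str.upper)


def quoted_report(rows: Iterable[str]) -> List[str]:
    # A lambda plays the role of the Ruby block here.
    return format_report(rows, lambda r: '"' + r + '"')
```

In Java without lambdas, each `format_row` argument would have to be an anonymous or small named class, which is the ceremony pjungwir is describing.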

------
BoredAstronaut
Straw man thinks DRY applies to two-line function. Straw man is a straw man.
Also, less code > DRY. In fact, less code -> DRY. If refactoring makes for
more code, not really DRY. More like taking a principle to its illogical
conclusion. Compression is a process of diminishing returns.

Although there are certainly times when factoring two lines into one line is
better. Like when it's self-documenting, or when those lines otherwise add
noise to part of another function.

Sometimes a new function is not the right approach to avoiding repetition. If
you can't write a function to adhere to DRY, use a macro or equivalent. In
C/etc, macros are wonderful if used well.

~~~
rubbingalcohol
I feel like people in this thread (not just you) are particularly caught up
with an oversimplified example that the author even acknowledged may not have
been the best example within the article itself.

This article sheds light on something I also encountered frequently when I was
doing contracting, and also have to put the brakes on myself when I see I'm
going down a bad road: creating more generalized code is not always better
than creating code that repeats trivial pieces of functionality but
accomplishes distinct tasks.

Part of the difference between "conscious competence" and "unconscious
competence" is innate awareness of places where refactoring or normalization
will actually create technical debt. I found myself thinking "no duh" when I
read the article, but that's only because it was explaining things I was
unconsciously very familiar with.

I think this article would be a great read for less experienced programmers. I
think the examples may have been lacking, but it would be hard to simplify any
application to a point that would make sense to illustrate this issue in a
blog post, so attacking it as a "straw man" is actually a "straw man" in and
of itself if it fails to account for the author's purpose in including the
examples. lol

~~~
lmm
I don't think the article serves any purpose. It uses an obvious strawman to
try and argue; this is not going to convince anyone of anything, novice
programmer or no.

------
michaelfeathers
One of the things that people don't get about refactoring is that it is not
just a matter of extracting things or removing duplication. Sometimes you
merge things or re-introduce duplication to get someplace better.

When you look at refactoring examples online, they often make that mistake.
There's a straight arrow toward a "better solution" but without any
backtracking. It's a hobbled view of refactoring.

To bring it home, in the blog example, I think it is perfectly fine to remove
duplication in the way listed as "bad", as long as you reintroduce the
duplication when you have a bit of trouble. Much of the time, you're lucky and
you don't.

------
taeric
I really really like this take. Refactoring is usually pitched as something
that is completely orthogonal to solving the actual problem you were given. I
think too many of us (clearly, I'm projecting) are wary of anyone else going
on a refactoring spree because we see it break down things that were just fine
separate. Often with only "warm fuzzies" being the actual gain. The
progression shown in this post is really really good.

~~~
fullreset
Agreed. This is -- or should be -- called "premature refactoring."

~~~
rubbingalcohol
I don't even think it's always premature refactoring. Sometimes it is just
stupid refactoring.

A corollary of stupid refactoring is the existence of over-generalized
functions that try to do so much that they need an absolute crap-ton of
parameters passed in and still end up locking you down to a limited set
of functionality. Adobe's ColdFusion scripting language (anyone remember that?)
used to have functions that would automatically generate huge and specific
pieces of client-side JavaScript functionality. Stuff to the effect of:

    
    
      cfCreateShoppingCartWithPopupSummaryWhenUserHoversOverLink({
        supportsPaypal: true,
        dontShowLinkOnCategoryPage: true,
        doShowLinkOnProducePage: false, ...},
        'myShoppingCartElement',
        ...)
    

Okay maybe I'm embellishing a little bit. But the end result was loading
hundreds of kilobytes of proprietary JavaScript libraries to support these
weird built-in functions that would create very specific bits of client-side
functionality that would then need a billion parameters passed in to allow
remotely useful customization. Maybe it would have been better to just learn
JavaScript instead of being locked in this way.

------
tterrace
I think the first step the author took on the refactoring path was one I
wouldn't take. It breaks the "do one thing" rule and the rest of the post is
the pain that naturally follows from having an over-generalized method that
tries to do too much.

------
Chaseph
These articles are a dime a dozen. This popular philosophy isn't right,
because look at my poorly coded example of it.

Your DRY code is only DRY in the laziest of senses, and represents a lousy
programmer cluttering the system. A really lousy implementation of any of
these programming paradigms would make one side look wrong.

In your example, the refactored code would look excellent if it implemented
OOP and the Strategy Pattern. The two different feeds can inherit their
similarities from the same place, and their differences implemented in
separate places. Which feed to produce can be chosen dynamically, rather than
one crappy grab-all function.
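The suggestion above can be sketched in plain Python. Strictly speaking this is closer to the Template Method pattern (shared flow in a base class, differences in subclasses) combined with picking the concrete feed at runtime. The `Tweet` stand-in, its toy `is_profane()` check, and every class and function name are hypothetical, not the article's Django code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tweet:
    # Hypothetical stand-in for the article's Django model.
    username: str
    text: str

    def is_profane(self) -> bool:
        # Toy check for illustration only.
        return "darn" in self.text.lower()


class Feed:
    """Shared flow lives here; subclasses override only what differs."""

    def select(self, tweets: List[Tweet]) -> List[Tweet]:
        return list(tweets)

    def postprocess(self, tweets: List[Tweet]) -> List[Tweet]:
        return tweets

    def build(self, tweets: List[Tweet]) -> List[Tweet]:
        return self.postprocess(self.select(tweets))


class GlobalFeed(Feed):
    def postprocess(self, tweets: List[Tweet]) -> List[Tweet]:
        # The global feed hides profane tweets.
        return [t for t in tweets if not t.is_profane()]


class UserTimeline(Feed):
    def __init__(self, username: str, page: int = 1, per_page: int = 20):
        self.username = username
        self.page = page
        self.per_page = per_page

    def select(self, tweets: List[Tweet]) -> List[Tweet]:
        return [t for t in tweets if t.username == self.username]

    def postprocess(self, tweets: List[Tweet]) -> List[Tweet]:
        # The user timeline is paginated (1-based pages).
        offset = (self.page - 1) * self.per_page
        return tweets[offset:offset + self.per_page]


def make_feed(kind: str, **options) -> Feed:
    # Choose the concrete feed at runtime instead of passing flags to one function.
    if kind == "global":
        return GlobalFeed()
    if kind == "user":
        return UserTimeline(**options)
    raise ValueError(f"unknown feed kind: {kind}")
```

Each feed's quirks now live in its own class, so changing the timeline's pagination cannot accidentally alter the global feed.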

------
adrianhoward
Not directly related - but it's something I come across so often when
mentoring newbie devs that I thought I'd mention it in passing just in case
anybody has this problem.

A pattern I sometimes see with newbies who understand the value of DRY is - as
soon as they get to the point when they're about to repeat something or about
to copy and paste - they stop themselves and start refactoring to remove the
duplication they haven't typed into existence yet. They see adding the code
that will produce the duplication as bad / waste.

Don't do that.

It's hard - because the code that they've not typed or copy/pasted doesn't
exist or work yet. It's still in their head.

Make the duplication explicit first.

Type it out. Copy and paste. Change those two branches so they have exactly
the same structure.

When you've done that - and everything is working and all tests pass - then
refactor the heck out of it.

Much simpler, faster and less error prone.

~~~
ufo
I don't know if I agree 100%. Sometimes the extra abstraction helps you solve
the problem in the first place.

In my experience the OP is right. The worse problem is when you eliminate
duplications in the wrong place or using the wrong abstraction, leading to
brittle abstractions that break in the future.

~~~
adrianhoward
Yeah - there are probably exceptions ;-)

However what I've seen happen on multiple occasions is somebody merrily
driving along churning out code then suddenly hitting the "ohhh - duplication
is bad" wall and halting as they feel their way around the duplication and the
abstraction they may need (or may not - since they've not written the code yet).

Duplication is not a mortal sin. Having it sit there for a few hours while you
work out the meat of the problem isn't going to kill any kittens.

And often the easiest way to fully grok the abstraction that you need is to
make the duplication really, really obvious.

------
Chris_Newton
I expect most of us would agree that a single function should ideally have
exactly one main job and do it well.

Two functions are really doing the same job, and should probably therefore be
combined into a single function, not when their behaviour _is_ the same but
when it _should always be_ the same. As the article suggests, that
determination is generally more about the software design or domain model than
the mechanics of the current implementations.

Having said that, there is also a middle ground: create some sort of
utility/helper function(s) to contain the code that _is_ the same,
coincidentally or otherwise, and then rewrite the two higher-level functions
in terms of common helpers for now. If those higher-level functions need to
diverge for good reasons later, at least it will be an active decision to
separate the behaviours.

IME that sort of breakdown is unlikely to be beneficial with very short
functions such as the examples here. There’s not enough commonality to justify
the overheads of breaking everything up. However, in more realistic code, if
you’ve got, say, 80% common operations between multiple cases, there are often
some underlying concepts that can be extracted into their own functions. Those
then become informatively named building blocks for the original functions.

Put another way, you might not want to consolidate the functions’ _interfaces_
if they serve logically distinct purposes, but you can still consolidate some
of their _implementation_ details.
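The middle ground described above, sketched in Python under the assumption that tweets are plain strings: two public functions keep separate interfaces while sharing small helpers. `BANNED_WORDS`, `is_profane`, `paginate`, and both feed functions are hypothetical names. The sketch is deliberately tiny, so by the comment's own measure the split would not pay off at this size; it only shows the shape.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

BANNED_WORDS = {"darn", "heck"}  # hypothetical word list for illustration


# --- shared building blocks: the parts that happen to be the same ---

def is_profane(text: str) -> bool:
    return any(word in BANNED_WORDS for word in text.lower().split())


def paginate(items: Sequence[T], page: int, per_page: int) -> List[T]:
    # Pages are 1-based, as in the thread's example view.
    offset = (page - 1) * per_page
    return list(items[offset:offset + per_page])


# --- distinct top-level functions: separate interfaces, free to diverge ---

def global_feed(texts: Sequence[str]) -> List[str]:
    return [t for t in texts if not is_profane(t)]


def user_timeline(texts: Sequence[str], page: int = 1, per_page: int = 20) -> List[str]:
    return paginate(texts, page, per_page)
```

If the two feeds later need different behaviour, only the top-level function changes, and stopping its use of a helper becomes an explicit decision.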

------
jbrains
I see a lot of comments here of the type "You have to know when to refactor".
I don't do it this way. Instead, I rely on a willingness to undo a refactoring
when I see that something else might work better -- and even to undo _that_
when I decide that I've got that wrong.

I have no problem extracting as in "WHEN REFACTORING GOES BAD" -- although I
might wait for a third copy before removing the duplication, because I want
to see whether a useful abstraction would emerge. On the other hand, as soon
as I recognise that one of those copies wants to change in a way that the
other does not, I'd simply inline the method and let them diverge. I don't
consider this a problem.

It seems as though some programmers believe that, once they extract something,
it needs to remain extracted. No. It's only "cargo cult refactoring" if you
stop thinking.

Most importantly, refactoring is experimentation. It's a kind of Mechanical
Turk-based genetic programming-oriented style of designing, except that you
have heuristics you can follow. That means that you'll go down the wrong path.
THAT'S OK, as long as you allow yourself to backtrack. Remember: refactorings
are small, _reversible_ design changes. That means not just that one _can_
undo them, but that one is _willing_ to undo them.

------
danso
OK, I'm obviously missing something, and part of the problem is that I'm not a
Python programmer so my brain is obviously in "skim-mode".

Couldn't the problematic DRY pattern be alleviated by refactoring the
following call:

    
    
        filter_profanity = kwargs.pop('filter_profanity')
        tweets = Tweet.objects.filter(**kwargs)
        if filter_profanity:
            tweets = itertools.ifilter(lambda t: not t.is_profane(), tweets)
        return render(request, template, {'tweets': tweets})
    
    

Into something like:

    
    
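        def tweet_list(request, template, **kwargs):
            # ... rendering-specific code ...
            tweets = get_filtered_tweets(**kwargs)
            return render(request, template, {'tweets': tweets})

        def get_filtered_tweets(**kwargs):
            # Default to False so callers that don't care can omit the flag.
            filter_profanity = kwargs.pop('filter_profanity', False)
            tweets = Tweet.objects.filter(**kwargs)
            if filter_profanity:
                tweets = itertools.ifilter(lambda t: not t.is_profane(), tweets)
            return tweets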
    
    

Why does the logic for the Tweet filtering have to be encapsulated in the
rendering function?

// edit:

What might help is if the OP showed how the non-refactored code would look
with the profanity_filter and pagination features. I agree that his refactored
proposal is confusing...I'm just having a hard time imagining how the non-
refactored version would be less so.

~~~
drostie
I believe they intended the 'proper' code to look like:

    
    
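        import itertools

        from django.shortcuts import render
        # Tweet is the article's model; assumed importable from the app's models.

        def global_feed(request):
            tweets = Tweet.objects.all()
            # itertools.ifilter is Python 2; Python 3 uses the built-in filter().
            tweets = itertools.ifilter(lambda t: not t.is_profane(), tweets)
            return render(request, 'global_feed.html', {'tweets': tweets})

        tweets_per_page = 20
        def user_timeline(request, username):
            tweets = Tweet.objects.filter(user__username=username)
            # Query-string values arrive as strings, so convert before arithmetic.
            page = int(request.GET.get('page', 1))
            offset = (page - 1) * tweets_per_page
            tweets = tweets[offset:offset + tweets_per_page]
            return render(request, 'user_timeline.html', {'tweets': tweets})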

~~~
danso
So the problem with that seems to be...what if there was another view that
required pagination?

So the two unpleasant scenarios seem to be this:

1) The OP's assertion that DRYing the code may unintentionally break
functionality in all the places that use it.

2) The DRY assertion: copy-pasting functionality, such as pagination, makes it
more likely that the pagination functionality won't be properly updated across
all the modules that use it.

I guess it's a case of YMMV...because in this hypothetical app, it doesn't
seem likely that the number of views will multiply, thus making it easier to
update the copy-pasted code. But that seems like a mindset as prone to future
problems as one that is more DRY-minded.

------
sha90
My only real concern with this essay is that the OP bothered to refactor out
the duplication, but didn't bother to refactor his internal refactoring when
it got too complicated, instead claiming: "look, now it got messy", threw up
his arms, and said there's nothing more that can be done, blaming DRY as the
culprit.

Except we CAN do something about it.

It would have been just as easy to continue refactoring the tweet_list()
method to pull filtering, pagination, and profanity checking out into sub
methods-- at which point you've built a strong reusable component that can
support many more combinations of those extra requirements. So by the time you
get more feedback saying, "we need a new page that only shows 5 tweets per
page and hides profanity, but does not filter", you can now easily take that
reusable component, pass in those options and be done rather than starting
from the top because you refused to clean up your internals. That's why we
strive for reusable components in the first place.

In other words, if the argument is that refactored code is messy, it really
means you aren't done refactoring.
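A rough sketch of the proposal above, assuming tweets are plain strings and that "hides profanity" means masking words while "filter" means dropping whole tweets (the comment does not define either). Each keyword option switches on one independent helper; all names are hypothetical.

```python
from typing import List, Optional

PROFANE = {"darn", "heck"}  # hypothetical word list for illustration


def _is_profane(text: str) -> bool:
    return any(w in PROFANE for w in text.lower().split())


def _drop_profane(texts: List[str]) -> List[str]:
    # "filter": remove profane tweets entirely
    return [t for t in texts if not _is_profane(t)]


def _mask_profanity(texts: List[str]) -> List[str]:
    # "hide": keep the tweet but blank out the offending words
    return [" ".join("****" if w.lower() in PROFANE else w for w in t.split())
            for t in texts]


def _paginate(texts: List[str], page: int, per_page: int) -> List[str]:
    offset = (page - 1) * per_page
    return texts[offset:offset + per_page]


def tweet_list(texts: List[str], *, drop_profane: bool = False,
               mask_profanity: bool = False, page: int = 1,
               per_page: Optional[int] = None) -> List[str]:
    """Each option switches on one independent sub-step."""
    if drop_profane:
        texts = _drop_profane(texts)
    if mask_profanity:
        texts = _mask_profanity(texts)
    if per_page is not None:
        texts = _paginate(texts, page, per_page)
    return texts
```

The new page from the comment then becomes `tweet_list(texts, mask_profanity=True, per_page=5)`, with no change to the helpers.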

------
njharman
Refactoring / DRY to me is not about creating monolithic, generic do-anything
functions. It's about decomposing code into layers of abstraction somewhat like
mini-"DSL"s. The top level functions are tying together next level "down"
helper functions. Which may themselves be higher level tools over something
like DB api. More than 3-4 layers is probably a smell.

------
mwcampbell
In defense of the single flexible function, I think the hypothetical business
requirements are pathological. Or perhaps the hypothetical developer is taking
a pathologically literal interpretation of them. Who would want pagination in
one view but not another? As for the profanity filter, that should probably be
a preference of the currently logged-in user which is applied to all feeds
which that user views. (It should probably be enabled when an anonymous user
is viewing any feed.)

I suppose some developers don't have the freedom of suggesting alternative
specified behavior that is nicer to implement. In some cases I have not had
that freedom. But in this hypothetical case, when pressed, the person setting
the requirements ought to value consistency.

My own experience has been that I tend to do copy-and-paste because it's
easier, but then regret it later. I don't think I've yet erred too far on the
side of trying to follow the DRY principle.
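The per-user preference idea above might look like this. The `User` class, its `filter_profanity` field, and the word list are hypothetical; anonymous viewers are represented as `None` and always get the filter, as the comment suggests.

```python
from dataclasses import dataclass
from typing import List, Optional

PROFANE_WORDS = {"darn", "heck"}  # hypothetical word list for illustration


@dataclass
class User:
    # Hypothetical per-user setting; the thread does not specify a model.
    filter_profanity: bool = True


def wants_profanity_filter(viewer: Optional[User]) -> bool:
    # Anonymous visitors (viewer is None) always get the filter.
    if viewer is None:
        return True
    return viewer.filter_profanity


def visible_tweets(texts: List[str], viewer: Optional[User]) -> List[str]:
    # Every feed calls this, so behaviour is consistent across views
    # instead of being a per-view flag.
    if not wants_profanity_filter(viewer):
        return list(texts)
    return [t for t in texts
            if not any(w in PROFANE_WORDS for w in t.lower().split())]
```

The filter decision moves from the view to the viewer, which removes one of the flags the article's single flexible function had to carry.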

------
daly
I did a self-study on a project that lasted several months. I wrote down
everything, including mis-typed characters, grammar errors, syntax errors, and
semantic errors. I did a root cause analysis of the result. One general result
is that I have a 3% error rate, regardless of activity. In fact I find that
the delete key is by far the most important key on the keyboard, representing
3% of all of the characters I type.

One observation is that fully 1/2 of ALL the programming errors I made were due
to copy/paste. Your mileage may vary but I doubt it.

Copy/paste is evil but it is so "low level", like the delete key, that you
probably don't even think about it.

You may find it worthwhile to do a deep analysis of your personal error rate
on some project. It is very enlightening. In fact, we ought to fund studies so
we can get industry wide statistics.

------
einhverfr
The thing is:

"If you are using copy and paste while coding you are probably committing a
design error" doesn't conflict at all with what he says. The fact is that copy
and paste is the point when one looks and says "is refactoring appropriate
here?"

One thing I would point out is that premature optimization is the root of all
evil. You can get a pretty good sense of this: if your refactor adds more lines
than it deletes and functionality remains the same, you have added complexity
in refactoring, which very likely means you are doing it wrong.
This is particularly true if you can't say it is reducing the number of lines
of code generally, or compartmentalizing state changes.

(This leaves aside the fact that the most pernicious use of copy and paste in
the world is "sample code.")

------
darkchasma
If your code is starting to look ugly, refactor it. If your refactoring is
looking ugly, stop, you're doing it wrong. If a test breaks because of your
refactoring, stop, you're doing it wrong. I call this the Don't be Stupid
principal.

~~~
lanstein
principle

~~~
darkchasma
Lol. Do I look less stupid if I say it's my principal principle?

------
jiggy2011
Isn't this what functional programming is for?

You have several pieces of code that follow a very similar structure and logic
but perform very different purposes for the program. So you try and generalise
the structure of the code?

------
readme
Refactor if doing so will give you an advantage.

Copy paste when you aren't sure if the requirements will change. Nothing is
worse than building an abstraction only to find out it's useless given this
new project requirement and that the two abstractions should really be
separate.

------
darec1
I don't remember where I read it, but it's good advice:

Copy the first time, only start refactoring if you need the code a third time.

