Why do so many developers get DRY wrong? (changelog.com)
137 points by jerodsanto 8 days ago | hide | past | web | favorite | 150 comments

Eh. Something I find pro devs do is just code the damn thing out quickly and wait for the right abstraction to emerge before stuffing it blindly into a function. If that means a bit of repetition, fine. If you push everything into tiny little methods or functions or abstract them into their own objects the first time you come across a couple of repeated lines of code, then the clearer and better solution may not emerge as the requirements start to change. On the other hand, self-documenting code is most easily done via method naming.

This type of topic is hard to talk about. It's so nuanced that any blanket statement about how to do it sounds like a gutless generality. It also depends on the programming language and lifetime of the project. I've banged out some real ugly code when servers were on fire, but it was all stuff that was destined for an early death.

Documenting code via single-caller functions is usually a mistake, because to everyone else who looks, the set of functions is an API.

Both internal and external APIs must be kept coherent.

Also when readers are trying to understand exactly how some function changes the system state, having to refer to numerous other functions it calls is tedious.

Spreading state mutation out across the system is almost always a bad idea. I would categorize that person as doing DRY badly.

Coalescing state transitions should trump decomposition. But it’s often the case that you can do both at once.

This is wise. Always try to keep your state change points concentrated and legible.

That second paragraph should be on the front page of the Redux website.

Can anyone ELI5 the first sentence in each of the two above paragraphs please?

I can ELI20 easily enough. 5 is a bit tougher.

Don’t make me look under every rock to figure out why you changed the stuff I gave you.

We aren’t built to deal with chaos. When you put stuff in one end and something random comes out the other end, you have no idea at all what’s going on (until you do, and then they try to put you in charge).

You want heaps of code that looks but doesn’t touch. Push the changes to the edges where they are easy to see. If I can trust that things in the middle aren’t mucking around with things all the time then lots of little methods don’t hurt my ability to think about what is happening.

(This is not quite the point of Hexagonal Architecture, but they have some common ground. See also early Angular’s philosophy of cooking all input data immediately and passing it around cooked.)

Which is indeed one of the points of Redux.

The more of your app you can write as pure functions, the more of it is predictable and easily testable, so you don't have to worry about how it's going to behave.

That's part of why we now specifically recommend writing as much of your logic as possible in reducers:


I think as long as the mutators are cohesive, the design is more coherent to the users.

In a purely pigeonhole principle sense, more pure code crowds out impure. But if you can put all of the impure code into one or two phases of an interaction, you’re improving things without really changing the ratios at all.

We often get sucked into analogs of the things we really need and I feel like this is one of them. You want the state changes to be comprehended. Fewer of them might help a lot or make things worse (legibility or performance). I believe it’s another quality over quantity situation.

> Also when readers are trying to understand exactly how some function changes the system state

Yep. That's why changing system state is better kept at a minimum, and at the highest layer possible.

Most languages have a very clear public/private marker for determining what is an API and what is just there for convenience (it doesn't have to be literally public/private; for some it's scope, others just tag the names, etc.). You don't need to change the code's style just because of that.

On the other hand, if the functions do not have logical, atomic meanings, they will make your code as hard to understand as they were to name.

> Also when readers are trying to understand exactly how some function changes the system state, having to refer to numerous other functions it calls is tedious.

This also applies for small inlined common helper functions. If it folds two or three operations into one, but doesn't have an obvious universally recognizable name, it would often be way clearer to a reader of the code to just put the few operations in directly.
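A toy sketch of this point in Python (names entirely hypothetical): a tiny helper that folds two operations into one under a vague name, versus the same operations written inline with a comment.

```python
# A two-operation helper whose name doesn't say what it actually does:
def adjust(price):
    return round(price * 1.2, 2)

# At the call site, the inlined version is arguably clearer to a reader:
def total_with_tax(price):
    return round(price * 1.2, 2)  # add 20% tax, round to cents
```

Both do the same thing; the difference is only what a reader has to chase down to find out.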

Because we write code for other PEOPLE, not the compiler.

If it is inlined it is usually the same thing to the compiler either way in my example.

> Documenting code via single-caller functions is usually a mistake, because to everyone else who looks, the set of functions is an API.

That's less the case if the functions are local (either not exported from the module they are included in, or actually function-local to their caller, in languages supporting nested functions).

> Both internal and external APIs must be kept coherent.

If it's not external, it's not an application programming interface. And even ignoring that, there is no definition of "coherent" that is even approximately true yet inconsistent with single-use functions for code clarity and organization.

Even languages that don't have mechanisms to make methods/functions truly local/private tend to have conventions for indicating functions/methods not part of the intended-as-public API of a class or module.

> Also when readers are trying to understand exactly how some function changes the system state, having to refer to numerous other functions it calls is tedious.

Conversely, I find the code having functions with descriptive names helps me not have to read through a bunch of irrelevant code when I'm doing that, which helps me get to the part I need to understand whatever I'm trying to understand faster and with less distraction. Yes, if it's done badly it's problematic, but that's true of literally everything.


> Also when readers are trying to understand exactly how some function changes the system state

Yes, if you are doing relatively unconstrained imperative code that's willy-nilly modifying state, breaking that up into functions passing mutable state around is likely to make it even more incomprehensible. Decomposing into units that are externally-pure functions (that is, while they may do local mutations, they don't modify any state received from the caller), though, is not problematic. In general, except for subsystems dedicated to managing mutable state (and where this function is kept as constrained as possible), I prefer coding in externally-pure functions in general. When you start with that and use it as a constraint on decomposition, decomposition can no longer serve to obscure state modifications. Indeed, it clarifies and more narrowly isolates them.
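A minimal Python sketch of an "externally pure" function in the sense described (example data hypothetical): it may mutate locally, but never modifies state received from the caller.

```python
def normalize(scores):
    result = list(scores)       # local copy; the caller's list is untouched
    total = sum(result)
    for i, s in enumerate(result):
        result[i] = s / total   # local mutation only
    return result

data = [1, 1, 2]
normalized = normalize(data)    # data itself is unchanged
```

Decomposing with this constraint means a callee can never surprise its caller by changing state behind its back.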

> If it's not external, it's not an application programming interface

Splitting hairs, I think. Even if not intended, someone else may assume your "documentation" function was meant to be generally accessible (i.e. an API), and call it that way.

> Even if not intended, someone else may assume your "documentation" function was meant to be generally accessible (i.e. an API), and call it that way.

And then, suddenly, instead of a single-use function that serves mainly for documentation and code organization, it is a locally reusable and locally reused piece of functionality. Which is a problem, ...why, exactly?

The only difference between a well-designed single-use function and a well-designed locally reused function is that the single, clear, and unit-tested purpose of the single-use function isn't needed in more than one place when it is written. If that changes, why shouldn't it be reused?

That's what I'd call accidental API design. It might work, or it might lead to perdition :-)

Refactor with a new internal and coherent API once it's clear that's needed.

This is such a good, succinct way of putting it. One of the worst code-bases I've ever had to work with was largely so because of so many single-caller functions and far too granular abstraction.

It seems that's how a few major libraries are organized, sometimes one function per file.

Since this comment finds a lot of agreement I'm going to play the devil's advocate. Private functions for the sake of documentation are fine. At least if they are all on the same level of abstraction. If you've ever worked in a team where everyone works this way, you stop thinking in APIs, which can be a good thing for internal code.

Ideally the function's name reveals what it's doing. Only when you are tracing down a problem are you required to look into the details.

This actually helps to speed up navigating code, because parts of it get meaningful names.

"Only when you are tracing down a problem are you required to look into the details."

This is the common justification, and it's misguided. It turns out that you need to be aware of the "details" every time you look at the code.

If, by looking at the code enough, you memorize the "details," it's tempting to move them to a new function with a clever name that tickles your memory. That won't help anyone else.

Use comments to introduce blocks of code that need explaining. Use functions in coherent APIs.

> This is the common justification, and it's misguided. It turns out that you need to be aware of the "details" every time you look at the code.

No, I don't. I need to be aware of the relevant details, but code in functions with meaningful names means (1) I can skim a high-level overview to see where the details relevant to my current interest are likely to be, and (2) I can zoom in on those more easily, without distraction.

> Use comments to introduce blocks of code that need explaining.

Comments make a wall of code that is already hard to get an overview of because of its size even less legible; decomposition does the opposite.

Disagree. Comments can add "section headers", while keeping the code flow linear, so I can skim it just like I skim an article. With methods, I have to jump backwards and forwards and have to remember my place as well as think about the code. This introduces unnecessary overhead.
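The "section headers" style could be sketched like this (a hypothetical linear function; all names and rules invented for illustration):

```python
def process_order(order):
    # --- validate ---
    if not order.get("items"):
        raise ValueError("empty order")

    # --- price ---
    subtotal = sum(i["price"] * i["qty"] for i in order["items"])

    # --- apply discount ---
    if subtotal > 100:
        subtotal *= 0.9

    return round(subtotal, 2)
```

The flow stays linear and skimmable top to bottom; the alternative would be three single-caller methods the reader has to jump between.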

> Comments can add "section headers", while keeping the code flow linear, so I can skim it just like I skim an article.

Yes, and decomposition provides an outline, which is faster to skim than an article. There's a reason tables of contents are a thing; they let you find what you care about much faster than linear text with section headers.

With code, they also have the advantage of allowing you to make use of step-over/step-into debugging.

I’m with you 100%, but I suspect that we’re a minority. For me, linearity of code is directly proportional to legibility of code. It’s not clear to me whether or not this is universally true, or whether this is an individual preference.

Minority, maybe not. There's 20+ upvotes on my original comment re single-caller functions -- much more than replies contradicting it.

> Documenting code via single-caller functions is usually a mistake, because to everyone else who looks, the set of functions is an API.

That's a really good way of putting it.

I try to avoid it by not writing new functions, but gladly using existing ones, in my implementation of whatever single new one.

For example if I'm writing a find_and_update_foobar function, I'll use find_foobar if it exists, and with the right signature, but I won't write it just to implement the one I actually care about; ditto update_foobar.

But, professionally I've mainly only used python; so I find it still deteriorates into a mess. (I type hint extensively, but still it only takes some missing hints, or something too loosely - or wrongly - typed.)

I haven't used Rust professionally/enough/on something large enough to be sure, but my feeling is that just having a type checker prevents so much mis-refactoring.

But then what happens to "Your function should be doing one thing and one thing only" (the thing indicated by its name, considering the current abstraction level)?

Perhaps it is not always a bad idea to decompose that function into 2 properly named "steps", which reside in their own functions.

Interestingly, Rust can be quite opinionated about code organisation (it makes you keep things in bigger chunks), because some of the compiler analysis doesn't work through function boundaries.

I think this one is bullshit, can anyone else comment on this one?

Are there things you may only do if you use a single function body?

A key one is partial borrows of structs (i.e. borrowing only 1 field, and then later borrowing a different one).

Pascal has the solution for that. Nested functions:

  procedure foobar;

    procedure helper1;
    begin end;

    procedure helper2;
      procedure subhelper2;
      begin end;
    begin end;

    procedure helper3;
    begin end;

  begin
  end;

Only the foobar function can access the helper functions.

Pascal is hardly unique in that; many languages have nested functions, and every language that has first class functions and local variables/constants has the equivalent of nested functions (potentially with even narrower scope than a whole function) even if it doesn't have a specialty nested function declaration syntax.

Since nested functions don't create new API, amen.

I found some simple rules for single caller functions:

1. Use them if they take <5 parameters - no point making a function if you need to pass all the local context into it

2. Use them iff they return something that can be clearly named and has an explainable and understandable TYPE, even in dynamic langs where you don't explicitly type things - if you'd end up returning a tuple of 3 arbitrary things that can't be "named as one single concept", it's a code smell

3. Keep those functions local: maybe define them inside the calling function, or at least make them private if they're methods, anything - if your single-caller function unwillingly becomes part of an API, which can easily happen in Python with a _funk() method whose leading-underscore convention users could ignore, you've lost
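Rule 3 could be sketched in Python like this (names hypothetical): the helper is defined inside its only caller, so it cannot leak into the module's API at all.

```python
def parse_report(lines):
    def clean(line):                 # visible only inside parse_report
        return line.strip().lower()
    return [clean(l) for l in lines if l.strip()]
```

Unlike an underscore-prefixed module function, nothing outside `parse_report` can even see `clean`, let alone start depending on it.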

> Documenting code via single-caller functions is usually a mistake, because to everyone else who looks, the set of functions is an API.

Not when you work on a team that does this as a practice.

Many teams don't manage to have a common code style, which is a big problem in itself.

> Also when readers are trying to understand exactly how some function changes the system state, having to refer to numerous other functions it calls is tedious.

The point of breaking out the small functions is to name them so understanding the code becomes easier. This takes some skill and thought, of course, and if done thoughtlessly, it will not be good. But that's true of every practice...

It especially doesn't work in a team, because under maintenance + employee turnover, these little single-purpose functions start growing hairs, and they end up doing more than they're advertised to by name. Next thing you know, you're fetching the same thing from the database in three different functions, or recalculating the same value repeatedly.

Overly granular functions constructed to name blocks of code are a firm anti-pattern in my book.

Functions outgrowing their stated purpose can happen in large functions too, arguably even more easily, because it's harder to express the precise purpose in the function's name. And the turnover argument applies to any code convention, which makes it a management issue.

Sounds like "team" to you means "dysfunction".

I'm sorry you've been in these teams. There are good teams out there!

Over a multi-year or multi-decade time horizon, it just deteriorates.

The oldest codebases I've worked on were 30 years old; the one at my current company is about 10 years old. People come and go, refactorings are started and stopped, bugs are fixed on a tight schedule, and the structure of code divided into nominal blocks slowly dissolves.

You can’t make this absolute statement. In functional languages it is actively encouraged to use small, pure and composable functions, as I showed in my previous comment.

The rule of three. It is VERY hard to nail an abstraction until you see slightly different implementations of it at least three times.

Also, I think we kind of said goodbye to DRY after we decided that microservices is The Way. How many microservices reinvent the same goddamn wheel?

Which is why we template, build DSLs and generic support libraries, and generally automate the hell out of microservice boilerplate. (Usually after a lot of slogging.)

Slogging. Exactly. The up front effort required to get microservices right is absolutely massive, but somehow it has become the "easy" way to solve all your problems. Distributed systems were and should be a problem to be avoided at all costs, and only then tackled by those few who know what they are doing.

I have been experimenting with code reuse via repository forking. Not much to show for it so far, unfortunately.

Some things overlap, but I think your comment is more about KISS. Early abstractions don't fit KISS. And I believe banging out ugly code can still fit KISS because it does the job.

DRY falls into a different category. For example if you use `.toFixed(2)` to format floats everywhere it is better to abstract it to a function. And I believe this is what the article is about.

Formatting a float is a kind of logic or knowledge about how you present values in the view. You should not repeat this knowledge because if it is later decided that the view must show 3 decimals you are in trouble.

Maybe DRY can be easily explained by: don't repeat (business) logic.
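The `.toFixed(2)` example, sketched in Python (names hypothetical): the presentation knowledge lives in one place, so switching to 3 decimals is a one-line change instead of a codebase-wide hunt.

```python
DECIMALS = 2  # the one piece of "knowledge" about how we show amounts

def format_amount(value):
    return f"{value:.{DECIMALS}f}"
```

Every view calls `format_amount` instead of repeating the rounding rule inline.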


One problem with code that is destined for an early death: It usually lives (much) longer than originally expected.

I think this is a human reaction to their environment.

1. Write code. 2. Clean up the mess. 3. Write more code. 4. Realise there's a better way. 5. Stick with it because the effort to change outweighs the benefit gained.

In Refactoring, Martin Fowler doesn’t just show extracting variables, functions and classes but also re-inlining them.

The problem is at scale, re-inlining is hard. If it were easier to do, it might be done more.

I find that waiting for N repetitions of some abstractable code is a heuristic to mediate between the cost and benefit of refactoring.

I don’t have an answer for improving it. Perhaps making it easier to automatically manipulate the code in this manner?

I’ve thought for a while that it would be interesting to treat a code base as a database. Applying documented migrations to it which describe the code transformations occurring (and how to reverse them). Then changing these aspects of code would be trivial.

But then your meta program, the set of transformations which generate your current code base, would probably suffer the same problems as your codebase.

I instead find that when using functional languages, the code begs to go into small, tiny functions. It makes the intent much clearer about what the code is doing, and 99% of the time you don’t need to know how it’s doing it. Obviously if that code is used only once then it doesn’t make sense to extract it, but I find that small pure functions are really composable, powerful and reusable. Maybe the best example is the pipe operator definition in F#:

  let (|>) x f = f x
It’s extremely simple, extremely powerful and it’s used pretty much everywhere.
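For readers more at home in Python, a rough analog of `|>` could look like this (entirely a sketch, not standard Python):

```python
from functools import reduce

def pipe(value, *funcs):
    # Thread value through each function, left to right,
    # like x |> f |> g in F#.
    return reduce(lambda acc, f: f(acc), funcs, value)

# pipe(3, double, inc) reads the same way 3 |> double |> inc does.
```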

What may help is pondering object encapsulation, composition and usage patterns. I.e. either be the domain expert, or develop into one by building the domain.

In order to do this, recognize iterations are key and that code must both work today and allow for future evolution.

With clearer goals, one might save time and effort refraining from premature optimization and overzealous refactoring. When direction is unclear, you can comfortably lean on established accomplishments while avoiding regressions.

DRY and YAGNI may thus take a backseat to incrementally deductive design.

That's a good strategy – but you have to be confident that when the right abstraction emerges, you will be able to squeeze the necessary refactoring in between the other tasks that need to be done. That's easy when you're a team lead or a solo developer, but when someone else has technical leadership, you won't be able to do this without their support.

So true. I'd almost consider it a kind of premature optimization.

Trying to nail down DRY abstractions not too early but not too late is often tricky but concern over it probably puts you in good company.

Functional programming via composition solves this problem by making your functions composable and easily decomposable. The right abstraction under this methodology is mostly correct every time and if not the abstraction is easily broken apart and recomposed with a modification due to the fact that functions can be decomposed.

I find DRY fascinating.

It's the source of a large portion of the accidental complexity I find in code. "If I just create this abstraction, all this duplicated code goes away" - we've all heard it and many of us have told it, but few of us realise that it's the prequel to the most popular story of all: "all this code is such a mess, there are all these extra layers that don't really make sense and unpicking it is such a pain, I can't believe someone wrote this".

The story in between is about a young, inexperienced developer who has 3 days to deliver the one-feature-to-rule-them-all, to appease the almighty project manager, necessitating an adventure into the labyrinth carefully crafted by the developer in the first story.

DRY increases "connections" in the code (often called "coupling"). Pay attention to what you are connecting.

The dominant example of this in my mind comes from some traumatic (and dramatic) work experiences involving web scrapers.

You scrape pages A and B. Both require logins. You notice that A and B use similar code to login, so you factor it out and now you have "connected" A and B. Or, more accurately, the common "login" method is connected to A and B.

The problem is that A and B are separate ships, and they are going to different places, and you tied a rope between them. That rope is going to stretch and fray and break. There's no reason logging in to A and B should be similar; it just happened to be that way at one point in time. One day, probably, those two sites change their login workflows to be completely different.

So "just repeat yourself" and let each scraper be self contained. Let each ship sail their own way, don't connect them.

One case of this isn't bad, but if you're not careful you end up creating dozens of connections between things going in all different directions. Break your project into "things" and graph their dependencies. If you can break X by changing Y, then X depends on Y. If you can break Y by changing X, then Y depends on X. Your dependency graph should look more like a tree than a complete graph.
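The "separate ships" approach could be sketched like this (sites and token formats entirely hypothetical): each scraper keeps its own login code, even though today the two happen to look alike.

```python
def login_site_a(session):
    # Site A's flow, free to drift without affecting B
    session["token"] = "a-" + session["user"]
    return session

def login_site_b(session):
    # Accidentally similar today; may change completely tomorrow
    session["token"] = "b-" + session["user"]
    return session
```

When site B changes its login workflow, only `login_site_b` is touched; there is no shared helper whose other callers can break.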

If things need to evolve in separate directions, you can duplicate them.

Unreasonable X is unreasonable; that's true no matter what X is, but it's not very insightful. Unreasonable factoring is unreasonable. So are unreasonable copy-pastes.

Now a good question can be: would I "prefer" one or the other gone too far? That's merely a personal preference question, and on my side I've got a clear answer: I prefer something a little bit too abstract, to something a little bit too copy pasted. Because while abstract I still can understand more of the system more quickly, whereas duplicated I basically have to start by reading a lot more and factorizing (possibly in my head) before getting to a balanced situation...

Your own preference may vary, but I've yet to see a system maintained correctly by people copy pasting too much and not caring about anything but the one example they have to handle (because e.g. of a workflow described in a ticket).

Things can be accidentally similar, but WAY more often in copy-pasted codebases they are accidentally divergent, and that multiplies the time I spend on that kind of mess by a factor of probably near 10.

Problem with abstractions is, once they exist, people assume they are meaningful.

That is because proving that it isn't is a major task that involves dissecting the codebase, its current specs and planned future features (a seemingly pointless abstraction now can be the groundwork for a new module that will appear soon[0]). Thus the asymmetry between adding and removing abstractions.

[0] where soon either means really soon because product management needs it yesterday or never because there is too much work to do in other places.

Surprised no one linked Simple Made Easy[0] yet.

Around 30 min, Rich Hickey describes how the opposite of simple is complex, and mentions the etymology, "to braid together".

"It's bad. Don't do it."

0. https://www.infoq.com/presentations/Simple-Made-Easy/

Of course, but it could also be said that we're just composing the scrapers with a common building block, which Hickey says is good.

As a counter, you have 10 scrapers with the same login code. You find one site has issues when they deploy occasionally and go and add retry logic to it, then next week find the same on another site. You go and copy paste it over all 10 or leave it. You find later the retry causes rate limiting problems so you again have to fix it in 10 places, or maybe someone fixes it in one and not the others because the implementations have drifted despite solving the same problem.

Or, you use the same code for all and if the sites change the login flow then you add a custom one for the site which is different.

Fundamentally, I don't agree with the argument that I should copy and paste code now to avoid the possibility of copying and pasting the same code later.
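The shared-code side of this argument could be sketched as (all names and the backoff policy hypothetical): retry logic written once, wrapped around any login function.

```python
import time

def with_retry(fn, attempts=3, delay=0.0):
    # Wrap fn so transient failures are retried a fixed number of times.
    def wrapped(*args, **kwargs):
        for i in range(attempts):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if i == attempts - 1:
                    raise          # out of attempts: surface the error
                time.sleep(delay)  # back off before the next try
    return wrapped
```

A rate-limiting fix then lands in `with_retry` once, instead of in ten drifting copies.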

It would seem like retry is working at a different level of abstraction than logging in and that code should not live together. As such, there should be the opportunity to inject a role player that understands what it means to retry.

The base case of retrying is `zero retries` (or `one try`).

> As such, there should be the opportunity to inject a role player that understands what it means to retry.

In each of the 10 scrapers?

If it's common code for the retry, now all my login code needs to raise the same type of exception or otherwise signal in the same way that the login has failed. Except instead of the abstracted retry code needing to work with one login function it needs to work with ten independently maintained functions that happen to be the same currently.

Retrying and stuff like that could be handled with inversion of control, which requires building a set of interfaces that provide stuff like 'login()' made available to some kind of orchestrator that handles retrying and other metalogic. This requires understanding some new domain-specific abstraction really well, or providing ample "escape hatches" in the event the abstraction doesn't work for all cases.

Or, in my opinion, handling it in the typical way functional programming does: you'd have your stateful computations like login represented as functions returning IO, which off-the-shelf functionality (like cats-effect or fs2 in Scala) can easily rate limit and retry. This kind of programming isn't as mainstream as it could be, but if you can build retry once and use it for pretty much any side-effecting computation, you wouldn't feel a need to DRY up things that should be separate in an attempt to share code.

That's right. DRY, at the limit, is compression; in some ways, it's maximally coupled, such that small changes have large effects. The quality of the abstraction is what is levered up.

A mutable variable causes even more "connections" as every function that touches that variable is tied to it.

If you construct your program via the point free style using function composition, the dependent function can easily be swapped out of the composition and replaced with the modified function that you need while maintaining DRY to the maximum possible effect.

You are talking about a fundamental question of program design that is largely solved by functional programming via the point free style.

It is talked about here:


> A mute-able variable

This would be a variable that can be muted. More commonly, when people refer to "mutable" variables, they mean variables that can be mutated. It's not clear what muting a variable would mean.

Please don't be unnecessarily pedantic on HN. This is covered by the site guideline:

"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize."


Sometimes a human makes a spelling error by accidentally placing a hyphen in the middle of a word.

From context, hopefully other humans will recognize the slight error and comment on the topic rather than walk away totally befuddled and comment only on the error.

Do you not understand what I am saying in the comment? It's too late for me to edit the comment, but I hope the meaning is clear despite the slight error.

I've fixed the typo ('mute-able' -> 'mutable') in your comment above.

To be fair, it is pretty easy to fork a shared component when necessary, but it is painful to maintain multiple components with duplicate logic. So it is reasonable to err on the side of DRY when in doubt.

Naively, I'd think you can simplify them by having a single interface, with or without sharing internals (as you see fit).

But in a way that just pushes the problem to coupled interface signatures...

Now I wonder: is there research on variability-based interface design? Looking at systems and estimating how much some parts can change and how much is probably easy to pin down forever?

> It's the source of a large portion of the accidental complexity I find in code. "If I just create this abstraction, all this duplicated code goes away"

For me it's usually, "Oh crap that thing I changed I had to change here and here too, whoops its good now... Wait no I also had to change it here... and here... now that we're done with that we should be fine... DAMMIT!"

The problem is that DRY is not the thing that solves this problem.

It looks like it is, you'll see plenty of people claiming it is, it may even lead people into solving the problem after they gather some experience. But it's not what solves the problem.

What does solve the problem?

My take on it would be the approach from SICP[0]: separate data (objects) from transformations (streams). On a personal note I would also like to add:

- keep data serializable

- cluster transformations accordingly (keep things that belong to each other as close as possible to each other, in a literal sense: same file/module/etc. A new coworker should have the feeling of entering a public library - she'll know where to look.)

- it will probably never be the case that you have to invent a new data structure or algorithm

Developers understand terms like 'function' in very different ways (see the top comment in this thread). Some have a more abstract approach and understand it as 'pure', while others have a more instruction-bundling approach and understand it as 'procedure'. Either is valid, but the two are completely orthogonal when it comes to usage. I understand Gary Bernhardt's approach of 'functional core, imperative shell'[1] as a means to talk about this - instructions demand composable building blocks, and those building blocks come either from:

- built-in functions (with an already universally understood API)

- or your own (clustered) utils (and demand an easily understandable API).

[0]: https://mitpress.mit.edu/sites/default/files/sicp/index.html

[1]: https://www.destroyallsoftware.com/screencasts/catalog/funct...

I don't see how this solves the problem. I can have pure functions for parsing the HTML of each site: they take the text and return a simple data structure with the information I need. Chances are, if you write one and then another, you'll find there's plenty of duplicated code. Should you try to decompose those parts into common (pure) functions or not?

I assume your API can be expressed as this function signature:

  parseDataFromHTMLResource :: Resource -> HTMLRaw -> Maybe Data
...and assuming we validate the HTML:

  validateHTML :: HTMLRaw -> Maybe HTML
I think your question relies on the separation of 'Resource', as a domain separation like the one right below will probably create a lot of duplication (since any resource is free to structure its HTML within the standard however it wants):

  parseDataFromGoogleHTML :: HTML -> Maybe Data
  parseDataFromGithubHTML :: HTML -> Maybe Data
Perhaps you can reduce the duplication by converging to different fingerprinted resources. So an HTML resource, fingerprinted by style, will then guarantee your data:

  data HTMLFingerprinted
    = HTMLStyleGithub
    | ...
  htmlFingerprintedFromHTML :: HTML -> Maybe HTMLFingerprinted -- So the Maybe will be here
  parseDataFromHTMLFingerprinted :: HTMLFingerprinted -> Data  -- ...not here
(Note that this may be what the parent means. It helps solving "For me it's usually, "Oh crap that thing I changed I had to change here and here too, whoops its good now... Wait no I also had to change it here... and here... now that we're done with that we should be fine... DAMMIT!"" with "I just create this abstraction".)

I would say that if you're able to implement the function 'htmlFingerprintedFromHTML', you deserve the abstraction and can thus DRY the codebase on the fly. And this "no pain no gain" mantra is what I personally really like about a 'functional core'. Code stays duplicated only where absolutely needed, until you find a solution for the abstraction.

...and I guess implementations of fingerprinting HTML may vary a lot.
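A hypothetical Python sketch of the fingerprinting idea above (the detection heuristics are invented here): classify the HTML first, in the only step that can fail, then parse with total functions.

```python
def fingerprint(html):
    # Returns a style tag, or None if the page is unrecognized.
    if "github" in html:
        return "github"
    if "google" in html:
        return "google"
    return None

# One parser per fingerprinted style; each is total for its style.
PARSERS = {
    "github": lambda html: {"source": "github", "title": html.split("|")[0].strip()},
    "google": lambda html: {"source": "google", "title": html.split("-")[0].strip()},
}

def parse(html):
    style = fingerprint(html)
    if style is None:
        return None              # the Maybe lives here...
    return PARSERS[style](html)  # ...not here: parsers can't fail
```

The failure mode is concentrated in `fingerprint`, which mirrors how the `Maybe` sits only on `htmlFingerprintedFromHTML` in the signatures above.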

Yeah, I have no idea. Code that avoids this problem is mostly DRY if you look after the fact, but going from the DRY principle to code does very often lead to the wrong abstractions (that won't solve the problem at all) and too little repetition (that will make your code break on the opposite way: you change one piece and unrelated stuff breaks).

DRY is a good learning tool, and it is an after the fact property of good code. But people shouldn't ever preach it.

A lot of principles like DRY (as described out of correct context in this article) have cult like followings people follow mindlessly leading to unnecessary introduced complexity.

I'm always amazed at how eager people are to over-engineer a solution that makes it a mess to deal with moving forward. Developers at large like to appear clever, tend to have (fragile) large egos, and don't seem to want to veer from established dogma--much of which is based on little evidence, or on evidence that doesn't apply to the case they're dealing with.

A friend at work (years ago) introduced me to WET: Write Everything Twice, a cheeky response to DRY enthusiasts. It falls just short of the Rule of Threes (as soon as you write it a third time, refactor it out).

I think this article does have something going for it. DRY should be about knowledge. Don't repeat yourself by handling tax rates all over, get that into a central place. This is not about turning similar looking blocks of code into a clean single block that handles everything as these tend to actually hurt maintainability (ever see the littering of conditionals in "DRY" code because multiple call sites use the similar code slightly different? Yeah, you did it wrong).
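The tax-rate point can be sketched in a few lines (the rate and function names are illustrative): the rate is one piece of knowledge, defined once, even though the call sites look nothing alike.

```python
VAT_RATE = 0.20  # single authoritative definition of the one fact

def price_with_tax(net):
    # One call site: forward calculation.
    return round(net * (1 + VAT_RATE), 2)

def tax_portion(gross):
    # Another call site: backing the tax out of a gross price.
    return round(gross - gross / (1 + VAT_RATE), 2)
```

If the rate changes, one edit updates every consumer; note that the two functions share no code shape at all, which is exactly why syntactic deduplication would have missed this.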

Another comment wrote it well by mentioning SPOT: Single Point of Truth. DRY and WET SPOTs. I feel like an analogy is forming that someone more quippy than myself can ferret out.

DRY tests are. the. worst. Way to lock us into last year's requirements, bud.

This is where DAMP comes in. Descriptive And Meaningful Prose.

I should be able to read the test failure message and know what I broke. Barring that, I should be able to read the test (not the whole fucking test file) and tell what I screwed up.

Anything else kills momentum, and I just want to get away from your code as fast as possible. Which means more tech debt.

On the other hand, articles like this or principles like WET don't suddenly make it ok to just go ahead and duplicate code all over the place. It's not like code cannot be knowledge. Depending on the type of code, SPOT and (what gets misunderstood as) DRY are totally about the same thing, and mean the proper solution is to take those bits of similar code and make them into a function. Anyway, you likely know that already, but I just felt information like this needs to be on this page as well, lest beginners reading it think all duplication is fine :] Because I've seen what happens when code containing the knowledge gets duplicated, and the result was never good and most of the time horrible on all fronts.

What they meant by DRY is otherwise known as SPOT — Single Point Of Truth — which is harder to misinterpret. The same “truth” — which can be data, values, behavior, policy, etc. — should not be defined multiple times in separate places, because a future change would have to be applied to all the places, or else cause different parts of a program or datastore to have inconsistent views on what the “truth” is.

If you google for it, you will find the synonymous “Single Source Of Truth”, which however makes for a worse acronym.

Incidentally, this may explain why caching is hard - cache invalidation in particular. By definition, it must violate this SPOT/SSOT principle.

While there is a thematic connection, SPOT is usually more of a design-time (or coding-time) principle. Caches still represent the same source, just time-delayed.

That's not quite true; it's closer to say that it's very easy to design a cache mechanism which accidentally violates SPOT.

I feel like this article ended before it should have. I'm still not exactly sure what the author means by Don't Repeat Knowledge. Should we not be refactoring or... just don't go overboard?

I'm still not exactly sure what the author means by Don't Repeat Knowledge

I think they're assuming that everyone has read The Pragmatic Programmer. To quote the original DRY principle from there:

Every piece of knowledge must have a single, unambiguous, authoritative representation within the system

Note that this has only an accidental relationship with code duplication, and in some cases could increase the latter.

More often than not duplicate code is duplicate knowledge.

It often is an implicit business rule that links several statements together or a system requirement like freeing db connections and locks in the right order after use.

DRY skips the part about dependency management: Will tying these parts together make sense for future readability, changability, or not? Often it is wise to be clear what and how to separate and encapsulate.

If there is only one right way, there is no reason for change or for changing together, so it is not shared knowledge.

Hah I'm glad I'm not the only one. I finished reading it and thought that it felt like an intro and they entirely left out the meat of the article. Thought maybe it was just a lack of related knowledge as I just dabble in coding but yeah I figured at some point they would define what it was actually supposed to mean in depth. When that didn't happen I was left wondering what the point of the article was.

They probably tried to add more info but couldn't do so without repeating some knowledge.

The actual problem here isn't to refactor or not refactor. The mistake a lot of developers make, including here on HN is in thinking of it mechanistically, that the code is just a bunch of operators and data and we've just gotta push it around the page some without really understanding it. That's how a lot of people interpret DRY, as an end in itself to be accomplished by pushing the symbols around and sweeping them up into something pretty.

The most important thing isn't to apply these heuristics, but to understand the problem space in which your code operates before you lay down your abstractions. That's a difficult thing to do without domain knowledge, and in a lot of enterprises you will never get access to the kind of domain knowledge you need to refactor effectively unless you're the lead or in management.

To the extent that your code is a series of statements about how the system behaves in response to a particular data input it's easy to read and documents itself. And to the extent that your data structures and statements resemble statements that a domain expert might make(move gantry 30 meters to the left, then drop the crane) they become easy to change in response to changing requirements. Domain knowledge tells you what the fixed elements of the problem space are(is it always a gantry? Does the gantry move in any directions other than left? What does it mean to drop the crane and do we do it different ways?). That informs how you structure your code and what the most clear factoring is.

It will not always be the smallest refactoring.

I just wish someone would teach comp sci kids that checking for error conditions is more important than making something look pretty and/or complex.

When you refactor you are taking shots at moving around where coupling occurs. If your code is maximally decoupled it is primitive copy-paste code that never calls functions and introduces unique variables for each section - and if it's maximally coupled it will look like swiss cheese, trying to reuse the same functionality for everything with clever parameterization, recursion, indirection and globals. Intentionally coupled code is most common in memory-starved environments since implicit dependency helps reduce data overheads.

And so "DRY", to the extent that it's useful, encourages you to find slack areas in the code where there's low potential for introducing coupling, and to factor those out so that you have code that is mostly-decoupled without also being redundant and hard to modify - the factoring reflects "knowledge" about the problem. And yet it's not always obvious when you have the knowledge or not. Sometimes redundant-looking code is a form of hardcoded data and a factoring would only push it towards being fully data-driven(which exacts a price in debugging). The Rule of Three is just a common way of making this decision about knowledge.

If your textual copy-paste includes some requirement of conforming to a contract, or is simply an actual copy-paste of "knowledge" (which can very well also take the form of a big textual pattern), the mere form of "primitive copy-paste code that never calls functions and introduces unique variables for each section" does not actually reduce coupling (except MAYBE if it is expanded down to the metal, including expansion of syscalls and/or low-level libraries, but you won't do that).

Also, what do globals have to do with that, and why do you put them in the supposedly factorized code? You are mistaking it for a bad mess you once saw, maybe? On the other hand, code bases obtained through copy-paste based programming cannot be considered anything else than a bad mess.

But yes, factorizing can be done badly, even to the point of being counterproductive. Like anything.

The issue is that "DRY" isn't actually about coupling at all, it's about creating unambiguous ways to do things. If you have a single http library that's DRY, but if you've got a function that encapsulates several tangentially related bits of code that's not DRY. The acronym just doesn't say what it means.

Confusing knowledge for syntax leads us to both false positives and false negatives. An example of each:

In designing a web page, if you find yourself saying "there is a button here" in HTML, and in CSS, and in JS, and on the back end, that is not DRY even though the syntax looks nothing alike.

Two different APIs, for services serving different purposes controlled by different external entities, happen to have the same structure and you find that a large chunk of code can be factored out of both. I would argue that "what we currently need to do to API 1" is a separate piece of knowledge from "what ... API 2", and unifying them is not DRY.

I hadn't heard of the Rule of Three, but it parallels my own heuristic. The first time, I write the code to do the thing I need. The second time I encounter a similar thing, if I can't find the right abstraction to unify them, I go ahead and repeat myself, writing a second, similar round of code that does what it needs.

If I encounter it a third time, then I've got enough data points to make a good guess about what the right abstraction will be. If I've done a good job so far, it shouldn't be too difficult to refactor it. (Strong, static typing helps.)

This is, of course, just a heuristic, and it's not all-or-nothing. I'll take my best guess about what the right abstraction is going to be, and I'll try to get it right the first time. The second round also presents opportunities to take two points and extrapolate a line.

It all comes down to experience: not just with the system, but with the domain that the system is about, and with the way systems change and grow. No one rule of thumb ever encapsulates all that.

I use the same approach to automate processes. The first time, I do it manually. The second time, I still do it manually, but I think "Hey, I did this once before. This is looking like something I maybe ought to automate."

The third time I automate it. By then, I understand it well enough to have good odds on being able to do the automation successfully.

How often did you automate something yet?

If it's more than three times, you ought to automate the automation!

You're kidding (I think), but I missed a piece.

Part of the point of doing it the second time is to make sure that I really understand what I'm doing and how I'm doing it. Without that, I can't write the automation on the third time.

Well, if the task is "automating things" (for very general values of "things"), I don't understand how I'm doing it well enough to automate that.

Why? It's already automated, the question about whether to automate a class of things could be answered by adding a dimension to this: https://xkcd.com/1205/


Having a couple lines that are similar or copied in several places shouldn't be considered such a bad thing. Repetition reveals similarity, and having clear signals of similarity is really important. It's often more expressive / easier to understand than a single method name.

Premature abstractions are way worse than repetition. A poor or insufficient abstraction leads to obfuscation which leads to misunderstanding which leads to novel constructs for the same responsibility. Because a poor abstraction can be really really difficult to back track, you end up with hacky work-arounds to get something done.

I think encountering novelty in a codebase is the biggest thing that damages comprehension; and repetition actually enhances comprehensibility.

I have so much to say on this topic that I feel like I can't say anything. I'll just leave it at: beginners tend to do too little, intermediates tend to do too much and experts try to do no more than enough.

> [...] the original “Don’t repeat yourself” was nothing to do with code, it was to do with knowledge

> The trouble with DRY is it has no reference to the knowledge bit, which is arguably the most important part!

Okay. Now what does this mean? Is this article effectively a tantalizing recommendation to read The Pragmatic Programmer?

Like DRY is wrong? I don't really get the point of this article.

The comment that it's about knowledge, not code resonated with me.

Like, if you saw Http.getClient(...).doGetRequest(...) a few times, it wouldn't be worth pulling them out into a myGetRequest(...) method. Your teammates already understand the existing, repeated statements, but they haven't seen myGetRequest(...) before, so you wouldn't be making the code any more readable to them.

But if you had Http.getClient("auth.myservice.com:8443").doGetRequest(...) in a few places, then I would pull out the host and port (or maybe the whole line), since it contains knowledge of where/how to authenticate.

Coming from the other direction: if I were reading the code, I can imagine myself looking for the one place where the auth happens, but I can't imagine myself needing to know the one place where Get requests happen (even if the 'Get' code is repeated much more than the 'auth' code)
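That auth example translates to a few lines of Python (endpoint and names are illustrative, mirroring the hypothetical `Http.getClient` calls above): the plain GET calls stay inline at each site, but the knowledge of where auth lives is extracted once.

```python
# The one fact worth centralizing: where/how we authenticate.
AUTH_ENDPOINT = ("auth.myservice.com", 8443)

def auth_url(path):
    # Every caller that needs auth derives its URL from the single fact.
    host, port = AUTH_ENDPOINT
    return f"https://{host}:{port}{path}"
```

When the auth service moves, there is exactly one place to look, which is the "one place where the auth happens" the commenter imagines searching for.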

I think, you are making a good point for what may be called mutable knowledge. I.e., when writing a shell script to iterate over a few files, nothing of this is of utter importance, since the script is meant to be run only once, thus we may consider any knowledge immutable. Also, maintenance and readability aren't much of a consideration here. In the case of a "real" application, when connecting to an endpoint, as in your example, we may consider address and credentials subject to change, so this is mutable knowledge, which ought to be represented once. However, the method of connecting is hard coded in several places, so it's pretty much immutable. If we expect the method of connecting (http) subject to change, we should consider to put this in a single spot, as well, even, if this may mean that the code may be harder to follow, since we introduced another layer of abstraction. And so on…

DRY was introduced in the Pragmatic Programmer, and Dave Thomas pointed out in a recent Changelog episode that DRY doesn't mean "Don't repeat code", it means "Don't repeat knowledge."

One concrete example: If your software has to create really complex objects, would you rather describe _how_ to create those objects in 10 places or one place? That's a scenario where you don't want to repeat yourself.

Dan Abramov [wrote about](https://overreacted.io/goodbye-clean-code/) this (linked in the OP), but in his example he's removing repetitive code. He's not removing multiple copies of the _knowledge_ about what the program is supposed to do.

It's a subtle difference that seems more difficult to describe than I'd like, but it's an important one.
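The complex-object example might look like this sketch (the report structure is invented for illustration): the _how_ of construction lives in one function, so ten call sites share one source of construction knowledge.

```python
def make_report(user, rows):
    # The one place that knows a report needs a header, a footer,
    # and rows normalized to tuples.
    return {
        "header": f"Report for {user}",
        "rows": [tuple(r) for r in rows],
        "footer": f"{len(rows)} rows",
    }

# Call sites never spell out the shape themselves:
report = make_report("ada", [[1, 2], [3, 4]])
```

If the report later needs a timestamp, the change happens in one place rather than ten.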

Sometimes "A little copying is better than a little dependency."

I've seen so many times people going all in on DRY not understanding that just as dangerous as duplication is _coupling_ - the inevitable result being some ungodly $COMPANY_NAME_common lib with a thousand dependencies, and usually only depped in a codebase for a config parser and a string helper. See also node_modules and left-pad.io.

I absolutely subscribe to that, but then again, I don't have a Rule of Three or similar...

It's a bit difficult to get across in text, but the minimum number of repetitions of a piece of code to make it "worth" putting it in a function is... 1. (According to me, and Tony van Eerd of Postmodern C++ fame. I had come to this conclusion on my own, but his talk really articulated it well.)

It's all about limiting the scope of side-effects, accidental reuse or variables, etc. etc. such that a human can do chunking to understand the whole.

I generally find that this is not an easy thing to capture in "metrics" or "rules". Guidelines with reasonable rationales, etc. etc. and when-not-to's, definitely, but that's a really hard thing to do and it doesn't get many clicks.

EDIT: ... and just to get back to DRY. The acronym is far too absolutist, but Try-Not-To-Repeat-Yourself-Too-Much-Unless-You-Have-Good-Reason-To isn't quite as catchy, is it?

No, they're not saying it's wrong, but rather it's commonly misunderstood. From the motto alone you might think the point is to abjure all code duplication. But Hunt & Thomas' intent was instead to warn against duplicating sources of truth/knowledge - that is, all knowledge embedded in your code needs to have a canonical source, and all other references should derive from that source or you risk divergence (or in the best case must always remember to make necessary changes in multiple places).

So for example, documentation (truths about the code) should derive from the code (eg by doc generation). Otherwise the docs & code will drift apart. Or if you're passing domain information across the wire between client & server, you should derive the data structures at both ends from a common source.

>all knowledge embedded in your code needs to have a canonical source

I don't get it. Code /IS/ knowledge and whenever I copy-paste code around, I duplicate not only code but also knowledge.

Well (and I'm elucidating, not necessarily defending) they mean knowledge somewhat specific to the project, not in an absolute philosophical sense.

So you have an API that belongs to this project. When you change it, do so in one place, and then run your doc generator rather than change it in both function/method signatures and docs.

Or you have domain knowledge embedded in classes, and a wire protocol between peers or client & servers using different languages. Derive the data structures in the two different languages from common source (either one of the languages, or both from metadata).
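A toy sketch of that derive-from-one-source idea (the schema and helper names are invented here): a single schema definition generates both the validator and the default record, so the two ends cannot drift apart.

```python
# The canonical description of the wire format.
SCHEMA = {"id": int, "name": str}

def validate(payload):
    # Derived check: right keys, right types - all driven by SCHEMA.
    return set(payload) == set(SCHEMA) and all(
        isinstance(payload[k], t) for k, t in SCHEMA.items()
    )

def empty_record():
    # Server-side default record, derived from the same schema.
    return {k: t() for k, t in SCHEMA.items()}
```

Adding a field means editing `SCHEMA` once; in a real system the same source might also emit client-side types in a second language.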

I think the distinction between this kind of project-specific 'knowledge' and more abstract 'everything is knowledge' issues is clear enough in practice. But it is just a rule of thumb rather than a deep philosophical principle, and like all such will break down in individual cases.

whenever I copy-paste code around

But that's just one source of code duplication. Another might be (for example) duplicated code deriving from code generation. DRY might advocate this (as there's a clear canonical source of knowledge), whereas a generic rule against all 'duplication' wouldn't.

It's true that code is knowledge, but I think it's talking more about not having slightly different versions of that knowledge floating around the same code base. For example, if there is one canonical way to do something, such as adding a table row and having to remember to update counts somewhere else, then that should be done in one place. That way, if that agreed-upon way changes you don't have to remember that it's done in 25 other places across your code base.
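The add-a-row-and-update-counts example can be sketched as one canonical operation (class and field names invented for illustration), so no call site can forget the second half.

```python
class Table:
    def __init__(self):
        self.rows = []
        self.count = 0  # the "counts somewhere else" from the comment

    def add_row(self, row):
        # The only sanctioned way to insert: keeps rows and count in sync.
        self.rows.append(row)
        self.count += 1
```

If the agreed-upon way changes (say, the count moves into a cache), only `add_row` changes, not 25 scattered call sites.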

Sometimes, things happen to be the same now, but they got there for different reasons and are likely to evolve differently in the future.

In that case, there is not a single piece of knowledge being duplicated, but rather two separate pieces of knowledge being possibly unified.

>Like DRY is wrong? I don't really get the point of this article.

More that it's a guideline, not a law. We should always use best judgement to decide when the tradeoff of readability and declarative code is worth a small amount of repetition, rather than religiously refactoring something for the sake of it.

Does anyone else see a certain irony in the raw number of Dry articles in existence? Or the fact that people keep writing them?

I think the deeper problem in the software industry is that we have no collective memory and need to DRY up as a whole.

"It feels good" - I think that is a really important point! Overzealous DRY'ing is like a game. Every few tokens you save by clever reuse is a small victory. So it is easy to lose sense of the big picture. Programmers often like logical games and challenges, but it is dangerous to treat development like that.

We should be wary of micro-optimizations for "elegance" which actually hurt the larger-scale maintainability of the system.

The comments here suggest two things: (1) most people misunderstand DRY (i.e. they think it's about code rather than knowledge duplication), and (2) the article didn't do a great job of clearing the issue up.

Though an alternative to (1) is that the meaning of DRY in common dev parlance has changed & has come to mean something different from Thomas & Hunt's intention.

I feel like this article is being critical about something without justly staking a clear claim about what the right approach is. In my experience, the benefit of DRY code is bug reduction and overall increased new development velocity. There is a whole class of bugs around similar behaviors that devs and product managers expect to move in sync which don't, because features develop over time and it was just easier to code separate small bits than refactor into a common code path. Yes, it can make readability harder to unify into abstractions and create the right configs or import steps. But the time hunting down and fixing the bugs, plus the drag on overall feature development due to having to write updates in multiple places and test for them is far worse to deal with for not taking that preventative measure.

The comments should make it painfully obvious there isn't a general rule that applies to all projects in all situation.


Software engineering is too nuanced to be summed up in an acronym and surely the inventor of each acronym only intended it to be a basic rule of thumb.

The opposite of the rule of three and pattern detection is instant detection of stupid code (no funny acronym for that). An example of stupid code:

  let setX = (e) => {this.x = +e.target.value}
  ... other setup ...
  input({onchange:setX, value:this.x})
This code is not a subject for DRY or the rule of three, as its ratio of meaning to character count is too low.

And yet some frameworks make a decision to abstract it out at a wrong point:

  let [x, setX] = ...
  ... other setup ...
    onchange:(e) => setX(+e.target.value),
instead of clean and readable

  input_num(this, 'x', {})
  // or
  this.input_num('x', {})

I don’t mind repeating something as long as I write it in a style that could become a function in a straightforward way. Seems like the first time you have to change something, you realize it applies to one case and not the other. The more important thing is that the repeated/common parts are obvious.

For example, if you reuse the same logic in a couple places, where the only difference is some specific variable, it should be written as a block with alias variables at the top. That way, two different cases look literally the same (except for a couple assignments at the top).
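That alias-variables idea might look like this sketch (functions and keys are illustrative): two sites share a literally identical block, and only the bindings at the top differ, so a future extraction into a function is mechanical.

```python
def summarize_orders(orders):
    # aliases
    records, key = orders, "total"
    # shared block (character-for-character identical below)
    values = [r[key] for r in records]
    return sum(values) / len(values)

def summarize_refunds(refunds):
    # aliases
    records, key = refunds, "amount"
    # shared block (character-for-character identical below)
    values = [r[key] for r in records]
    return sum(values) / len(values)
```

Because the repeated parts are obvious and identical, a diff tool or a reader can see the duplication at a glance, and extracting it later is a pure cut-and-paste.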

I don’t disagree with this but it is more complicated because often the order statements are executed in is knowledge. Much duplicated code is duplicated knowledge.

The question is whether it is a coincidence or the same concept.

It's memetics.

"DRY" is catchy, easy to talk about, easily verbed as a recommendation, seems like a good idea, and seems to be recommended by people who know what they're doing.

The problem is that it seems self explanatory, so no one discusses the definition. At the same time, the more obvious definition isn't the right one.

As a counter-meme, I have been proposing we refer to overaggressive syntactic deduplication as "Huffman coding".

You can have specious repetition, code that is identical only by coincidence, not in a deep semantic sense.

To determine what it is, you need to understand the domain. Part of that is knowing not just what it is, but how it generalizes and predicting how it is likely to change.

Of course, this is approaching perfection. One could also cut and paste monkey-like, e.g. instead of looping (without a performance need to unroll loops).

Imo, Don't Repeat Yourself makes the most sense when you forget about code and take it to mean that you shouldn't repeat your work.

I.e., if you wrote some code in one place that you need elsewhere, copy-pasting the code is fine under my interpretation, because it allows you to spend the least amount of time working on a problem you've already solved.

nowadays, as a golang dev, i don't even know what DRY looks like :D

what I've learnt throughout many years of coding is that the purest mantra one should follow is YAGNI. devs think like devs and strive for perfect code, but that goes directly against the business. the majority of the entire world is being run on really bad code. but that bad code works, and that is what is important.

in a way, you should treat things like blackboxes with strict interfaces. no one should care how the box works, as long as its interface works like it is supposed to.

PS: the dangers of DRY are introduction of deep dependencies that might, and probably will, bite you in the ass along the way. DRY should be used only for libraries, not for business logic - ever.

Unlike Abramov's original post, I actually agree with this one, being more nuanced. The author acknowledges that DRY (and ultimately, clean code et al) itself is not evil; it's just misunderstood.

This article would benefit from some code examples.

As it is, it left me with the same thought as those who claim to never need debuggers or object-oriented features: Fine let's say you're right - how do I implement your system?

DRYing the code is just an entry level form of simplifying the code. It's a good starting point, but its value quickly taps out once junior devs start writing incrementByOneFromZeroToNLoop(n) functions.

Code is closer to a craft like carpentry than a pure knowledge job. There aren't any rules, only heuristics, and it takes time to hone your skills (not "learn" them).

The biggest limiter in this is the excessive tendency for flat hierarchies in dev. It flies in the face of the apprentice - journeyman - master system that has always naturally structured the delivery and learning of craftsmanship.

I wish the author here or Dave Thomas would explain what they mean by DRY meaning “knowledge”. They both say that and leave it as obvious...

Happy to do as asked, though I am a different Dave Thomas.

Per my understanding, it's anything you might say about your system, especially as it relates to your domain.

"There is a button here", "this is what we store about users", "broken widgets are red", "this is how we calculate interest", ...

Any programming guideline regarded as Dicta Bölke is going to lead to garbage.

They always must be leavened with Good Judgement.

The article is weirdly void of a counter-proposal, or a definition of what it means to not repeat yourself in terms of "knowledge"

Why do developers not see the forest for the trees?

DRY like anything can be properly used or misused. For example, you can normalize a database so much that any basic query comes with a massive overhead (recursion). There is a middle ground between "religion" (pure DRY) and "chaos" (no DRY).

How does normalization lead to recursion? Normalization is one point where DRY really is critical, because redundant data in a database can lead to data corruption when it gets out of sync. And if the data is corrupt, your whole business is screwed.

I may be mis-communicating what I am trying to say.

I once had the (mis)pleasure of dealing with some novice DBAs. There were instances in the data model where things were so recursively linked that one query would fan out to 200+ queries to actually return the set of meaningful data. As you might have imagined, this caused scaling issues once you started dealing with a significant amount of data (hundreds of GB in the DB).

Eventually we brought in some pros and one of the first things they did was eliminate a number of overly recursive queries and duplicated some (not all) fields of data to speed up performance. We went from 1->200+ fanout in a typical query to 1->25-ish. The performance gains were insane.

Of course it violates the "don't repeat yourself" and to abstract linked (repetitive) data in a relational way. But sometimes this best practice can really be counter-productive in edge performance scenarios. But in general yeah, don't be repetitive, abstract your stuff, and keep it clean and easy to change globally if/when needed.
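A tiny sketch of that deliberate duplication (sqlite3 with an invented two-table schema): the post count is copied onto the parent row, trading strict normalization for cheap reads, exactly the trade described above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, post_count INTEGER DEFAULT 0)")
db.execute("CREATE TABLE posts (author_id INTEGER)")

def add_post(author_id):
    # The write side pays for the duplication by updating both tables.
    db.execute("INSERT INTO posts (author_id) VALUES (?)", (author_id,))
    db.execute("UPDATE authors SET post_count = post_count + 1 WHERE id = ?",
               (author_id,))

db.execute("INSERT INTO authors (id) VALUES (1)")
add_post(1)
add_post(1)

# The read side avoids the join/aggregate over posts entirely.
count = db.execute("SELECT post_count FROM authors WHERE id = 1").fetchone()[0]
```

The cost is that every write path must go through `add_post`, or the duplicated count silently drifts from the truth in `posts` - which is the data-corruption risk the parent comment warns about.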

So what's the problem with what people do? The article doesn't go into detail. Keeping things 100% DRY is never a bad thing, with the one exception that code can become too obscure and hard for new contributors to start on - which, granted, is pretty important. But it's not like people are getting DRY wrong.
