Hacker News new | past | comments | ask | show | jobs | submit login
[dupe] Prefer duplication over the wrong abstraction (sandimetz.com)
204 points by hyperpallium on July 9, 2016 | hide | past | favorite | 96 comments

I generally agree that it's good not to have many dupe but in this case there was a lot of upvotes and a lot of comments and tagging the link as "dupe" just destroy the discussions.

Why not just make successful "dupe" link stay but not bring karma to the poster ? That will stop the perverse incentive to try to submit old successful link for easy karma.

The new title makes me so happy.

For a long time I've thought that Don't Repeat Yourself is the most important software engineering rule. I still do, but in line with this article, I've learned that some things that appear to be duplication if you just look at the literal code actually aren't. Just because you've got a chunk of ten or twenty lines that are identical right now doesn't mean that they are actually identical.

It's hard to give concrete examples of such an abstract concept, but one I encounter with some frequency is some chunk of code that grabs something from a URL. Assuming you're using some half decent library that takes care of the brute mechanics, it's easy to end up with repetitive code to deal with setting up a form and putting on a couple headers and dealing with authentication or whatever, and it's tempting to try abstract on that yet further. Yet I find all the attempts to do so make the problem worse, because the superficially-similar code is actually quite dissimilar. The pressures on the instances are different. Here I may be deeply concerned about SSL, there I may be adding header hacks for some broken auth scheme, over there I've got some sort of serialization problem. Trying to put them all behind the same abstraction is just pain. It turns out the level of abstraction offered by these often years-old, mature HTTP libraries is more correct than may initially meet the eye.

For code to be "duplicated", the full context of the code need to be duplicated, not just have superficial visual similarities. It's important that all the code with the same context, the same evolutionary pressures, the same likely change planes end up in one place, and it is equally important that code that lacks that similarity not end up bound together. In the end, it's the same error.

I agree. I follow the DRY principle very strongly, but not on the literal code level - just one level above, on the "what is this doing" level. So in your example, if I see a dozen places where the program is using that URL library to e.g. send requests to get a different dataset from the same API, I'll quickly DRY it into a function. That's because all those places do the same thing - "fetch a dataset named $name from API XYZ". But I'll leave alone all the other places that use the same URL library, even with pretty much identical parameters, if they're doing different things. They may look the same now, but they may start to differ later. And even if they don't, placing them under a common abstraction when they represent different things is basically a lie. I don't like it when the code lies to people.

> For a long time I've thought that Don't Repeat Yourself is the most important software engineering rule.

It's not. "Don't introduce unnecessary dependencies" and "keep dependency graph without cycles", "keep things modular" and "all operations in a unit (function, module, ...) should be from the same abstraction level" are just as important, if not more, but they're a little more high-level, and thus less actionable. DRY is just simple to enforce, that's all to it.

I think the concept is formally identical to database normalisation but applied to code.

If the code is duplicated and you know the different part will have to evolve the same way: factorize now. If the different part are just formally similar by accident wait a little before factorize.

Similarly in a database you create a table for users but you don't create a table for "family names" and "first names" even if you could factorize them : when Cassius Clay become Mohammed Ali you don't want to rename all Cassius to Mohammed and all Clay to Ali.

I agree. I my mind, DRY really is not about similar looking lines of code. I think of it as an attempt to strive for a concept-to-implementation-ratio of 1.

There's a reason why Torvalds said, "Talk is cheap. Show me the code."[1] Quite often when people write about abstract concepts in software that violate well-known principles, they cannot come up with concrete examples that show a good time to break form. Because they are few and far between.

Take URLs for an example.

> Here I may be deeply concerned about SSL

On web pages, instead of codifying "http://" and "https://" everywhere (a form of duplication), it is possible to drop the protocol to use "//"; the browser will fetch subsequent files using the security of the original page. (Mixing secure and insecure content is an architectural mistake.)

An application shouldn't be concerned with HTTP vs HTTPS. In Java, that concern is made transparent by HttpURLConnection (and its subclass of HttpsURLConnection), for example.

> a chunk of ten or twenty lines that are identical

When seemingly identical chunks of code are repeated, they can often be abstracted and parameterized. Lambda expressions help reduce duplicated code snippets. Of course, when I say "parameterized" I don't mean "pass in a metric ton of parameters" as there are techniques for shortening long parameter lists (such as multiple methods and the builder pattern).

Lines 70 to 89 are an example of duplicated code[3] (here are four lines, but you get the idea):

    this.namespacesAware = xmlNode.getAttribute("namespacesaware");
    this.prunetags = xmlNode.getAttribute("prunetags");
    this.hyphenReplacement = xmlNode.getAttribute("hyphenreplacement");
    this.booleanAtts = xmlNode.getAttribute("booleanatts");
These should be a map of properties rather than a volley of String instance variables:

    private String namespacesAware;
    private String prunetags;
    private String hyphenReplacement;
    private String booleanAtts;
It might not seem like much duplication at first, until you look at the code that uses those parameters[4]:

        final String namespacesAware = BaseTemplater.evaluateToString(elementDef.getNamespacesAware(), null, context);
        if (namespacesAware != null) {
        } else {

        final String hyphenReplacement = BaseTemplater.evaluateToString(elementDef.getHyphenReplacement(), null, context);
        if (hyphenReplacement != null) {

        final String pruneTags = BaseTemplater.evaluateToString(elementDef.getPrunetags(), null, context);
        if (pruneTags != null) {

        final String booleanAtts = BaseTemplater.evaluateToString(elementDef.getBooleanAtts(), null, context);
        if (booleanAtts != null) {
The entire code block could be reduced to one line of code had properties been used instead of instance variables. (Maybe a loop and two lines of code, but either way the duplication could be easily mitigated.)

Without seeing source code examples, it is extraordinarily difficult to identify problems in the code and explain how the code could be improved. Hence the Torvalds quote.

> the same abstraction is just pain

Sometimes code is poorly designed, not designed for extensibility, or has hard-coded assumptions that should be exposed as properties. Take line 52 of the HtmlToPlainText class[2]:

    private static final int maxWidth = 80;
That one line forced duplicating the entire class, which was otherwise perfect for my needs. But the class did not expose behaviour to change the maximum width of a line. An incorrect assumption was made at the design level: 80 characters ought to be enough.

The correct solution would be to make it an attribute and send a pull request to the author. :-)

[1]: https://lkml.org/lkml/2000/8/25/132

[2]: http://grepcode.com/file/repo1.maven.org/maven2/org.jsoup/js...

[3]: https://sourceforge.net/p/web-harvest/code/HEAD/tree/trunk/w...

[4]: https://sourceforge.net/p/web-harvest/code/HEAD/tree/trunk/w...

My favorite example of code you should almost always[0] duplicate are view classes (in MVC terminology).

Instead, I see people subclassing views to change their behavior and dealing with absolutely mind-numbing behavioral bugs in their subclass. Furthermore, the subclass's implementation is totally dependent on the _exact_ implementation of the superclass. Change it, and you get a whole crop of new, hard to debug, bugs.

Solution? When you have a view class that almost, but not quite, does what you want, copy the entire class, rename it, and make it do what you want. You'll be done before your next break, and nothing that happens anywhere else will cause it to break in the future.

[0] The one exception are situations where all you are doing is configuring an existing view class in code and/or adding behavior in places that were _specifically_ anticipated by the class in question. For example, setting up the options on an existing table view class, and responding to row selections using a method meant to be overridden by subclasses.

But if your table view is actually different than the class you've got now—for example, it does something strange in responses to selections—you're much better off finding an open source replacement for the table view (if it's vendor-provided, like UITableView in iOS), copying it, renaming, and modifying it to do exactly what you need to it do.

Been bitten by this in React using ES6 where we would sub-class components and override render to provide another layout (or whatever; change behavior).

The right way to do it in React is to define a new Component and make the Component you'd like to depend on wrap around the returned JSX in render instead. Probably not a revolutionary paradigm, but it has made me better at reasoning about and structuring React components.

That might work if you have a very small app (example: HN, Reddit, Twitter web client etc)

Imagine, on the other hand, large-ish web app (example: Salesforce, Netsuite). 300 tables in the database. Perhaps 50 of those can be manipulated directly from the menu. Customers, Purchase orders, Quotes, Reports, plus all the supporting stuff like Countries, States, Currencies and so on and so forth.

Each of those need at least CRUD, plus printing, paging, searching -- are you really going to copy-paste that 50 times? And if there is a bug in any of those, you now need to fix it in 50 places. Who has time for that

Or, say, one day you want to add "Export" functionality to all of those. If you have a nice inheritance hierarchy (or composition or whatever), you add it in one place. You are done. If you were copy-pasting, then you do it in 50 places?

So no, that copy-pasting business does not work beyond smallish apps..

> Each of those need at least CRUD, plus printing, paging, searching -- are you really going to copy-paste that 50 times?

In MVC terms, that's either (a) the M, or (b) the C. Not the V.

I suggested not trying to fight the (hidden) behavior in a superclass when working on the V layer. I stand by that.

I've also advised some extremely large projects (over 5000 unique screens). My advice scales really well.

You can do whatever the hell you want on small (toy) projects.

>> In MVC terms, that's either (a) the M, or (b) the C. Not the V.

Wait, what? How is the user supposed to invoke CRUD ops or printing or searching without a V? I don't think we are talking about same things at all.

In fact all of these operation span M, V and C. It's not anything new either it has been around at least since Smalltak (in current terminology).

> Instead, I see people subclassing views to change their behavior and dealing with absolutely mind-numbing behavioral bugs in their subclass.

Because they're breaking a rule of good design, implementations should depend on abstractions, not other implementations. Never subclass a concrete class, only abstract classes.

> Solution? When you have a view class that almost, but not quite, does what you want, copy the entire class, rename it, and make it do what you want.

Better solution, move the duplicate code into a common abstract superclass. Allow the views to differ where they're concrete and keep the common code abstract in the superclass. Then you can't ever break one implementation by changing another implementation, your core complaint.

> Better solution, move the duplicate code into a common abstract superclass. Allow the views to differ where they're concrete and keep the common code abstract in the superclass. Then you can't ever break one implementation by changing another implementation, your core complaint.

And then you suddenly find yourself having a "god object", a class that does huge amount of things and most of them are not related to anything else in the class - they just exist to facilitate code deduplication lower in the inheritance tree.

And then, one page really needs a different version of a behaviour that's common to many of them, and you're back to either duplicating or polluting the "god object" even more.

Inheritance is a bad tool for this problem. It's better to separate different pieces of functionality and then compose them in views. If your view needs a component behaving in a slightly different way, then you can just create a new component (or even subclass the old one, if you must).

> And then you suddenly find yourself having a "god object", a class that does huge amount of things and most of them are not related to anything else in the class - they just exist to facilitate code deduplication lower in the inheritance tree.

No, you don't. Point of fact if most of the subclasses are duplicating some code, it does relate to what the superclass is and that code belongs there. This is not a god object.

> And then, one page really needs a different version of a behaviour that's common to many of them, and you're back to either duplicating or polluting the "god object" even more.

That's what overriding is for.

> Inheritance is a bad tool for this problem. It's better to separate different pieces of functionality and then compose them in views. If your view needs a component behaving in a slightly different way, then you can just create a new component (or even subclass the old one, if you must).

If composition is possible, it's always a better choice than inheritance of course; when it isn't, a shallow inheritance hierarchy is fine. Deep hierarchies are never ok.

> (or even subclass the old one, if you must).

No, you don't subclass concrete implementations to change them, that leads back to the whole problem you're complaining about. Implementations should never descend from other implementations.

Exactly this. At Abebooks[1], they had a Servlet called "AbeServlet" that was duplicated for every new page that needed behaviour. By the time I arrived there were close to 200 different servlets all copied from the same "template". Eventually a new requirement came: we need to support i18n because of a merger with European counter-parts. French, British English, Spanish, and German.

Hundreds of servlets had to change. Took about two weeks of coding by several developers.

When I arrived, I created "AbeMetaServlet" (bad name, but good idea). All my servlets extended from it, as it contained common behaviour. To make my pages i18nized, I changed something like a dozen lines of code and was done in less than an hour--about 16 servlets, if I recall. I didn't know about the upcoming merger. I simply despise duplicating code.

There was another example at that shop where they had processes that started up, ran to completion, then paused until time for the next run. All the batch processes had been copied from a template with no thought to the future.

I was tasked with updating a batch job for associating pictures of book covers with inventory that vendors uploaded. Rather than copy/paste, I rolled my own and fixed a bug. (There wasn't any code that could be re-used, architecturally.) The bug added several hours of run time to the picture maintenance process. The first task each batch job performed was to fetch a list of 10,000 vendor names over an abysmally slow network drive. Undoubtedly this bug was copy/pasted into the other processes.

I re-wrote the code to request the vendor names from the database--and only open up directories for valid vendors. The code, which originally ran in something like 12 hours, was reduced to 8 minutes.

All the batch jobs likely suffered from this error. Including the one that took almost 24 hours to run. Anyone know what happens when a batch job needs more than 24 hours to finish?

Anyway. Had a proper design been developed at the start, one simple change would have made all the batch processes faster.

[1]: http://www.abebooks.com/

I see the author's point, but it overlooks code maintenance.

At step 4, where "Time passes" it should read, Programmer A fixes bugs, optimizes, adds features, etc. Repeat.

If the code is duplicated, the maintenance effort is also duplicated or the duplicates diverge significantly making them more difficult to understand in relation to each other. Why does methodA do this, but methodB does that?

Tools can help prevent ProgrammerB and ProgrammerX from coming in and stinking up the place. Set up sonarqube and let it tell them they must pass the quality gate before merging.

Agreed. I think this is actually dangerous advice. While I agree that the wrong abstraction can cause lots of pain, the repetition of a bug through duplicate code is arguably a worse issue.

One example might be not using an abstraction for accessing filesystem resources in a web based service. You might end up duplicating a ton of code that improperly takes a string from a request without sanitizing it and then have a massive security problem all over your code.

This would not be a candidate for duplication tho, because the underlying intent is also logically the same. Author is talking about things that coincidentally happen to have similar code at the moment, but are fundamentally and logically unrelated, having a high likelihood of diverging in the near future.

I understand the intent. I still believe the advice is dangerous if improperly applied.

That's true for all advice, especially in software development.

"Brush your teeth" is dangerous advice if improperly applied.

Programmer A replaced the duplication with a new abstraction at step 3, so maintenance activities can be performed once on the abstraction definition instead of once per copy. It wasn't until the functionality diverged (thereby invalidating the abstraction) that the author advises replacing the abstraction.

>Programmer A replaced the duplication with a new abstraction at step 3

Why would deduplication stop maintenance? Unexpected NullPointerExceptions are found. Unexpected UTF-8 string inputs are discovered. An interger overflow is discovered. A memory leak is found in the method. The method is used in a performance critical loop and now must be optimized.

Maintaining code is the majority of programming work that I've witnessed. If multiple copy/paste methods have to be maintained, it always devolves into a mess of technical debt until someone comes along, pays the debt, and pulls them all up into the same abstraction.

> Why would deduplication stop maintenance?

I didn't say it would, I said it would stop repetitive maintenance--meaning you won't have to update many duplicates of the same logic because there aren't duplicates.

I tried to update my comment to be more clear, but I'm not sure that I succeeded. :p

I love the thinking and agree with this as much as can be agreed with something general and lacking specific examples or specific situations. It's a nice reminder that there are other options besides modifying a base class to fit only one case's requirements.

I'd call it forking instead of duplication to emphasize that the paths are different, not equal.

When this problem does come up, it's not usually trivial. Often the abstraction doesn't break at the same time the second use is introduced, it breaks much later after the second use is built, making it more difficult to fix.

The much more common problem I've experienced in industry is people making an abstraction with only one use case, anticipating before even having a second use case. It's much easier to discuss and fix if everything you're facing is concrete, and anticipating uses without having them at hand nearly always results in a broken abstraction.

I'd love to see more about how to determine the abstraction is wrong.

> If you find yourself passing parameters and adding conditional paths through shared code, the abstraction is incorrect.

This seems too simplistic. The right answer for this description is often inheritance. What are better ways to identify a broken abstraction?

It seems to me that it would be better to address step 6, where programmer B adds cruft to the abstraction rather than re-working it. I appreciate that pragmatics sometimes dictates that duplication is the right option, but instead of mostly giving up and advocating an emphasis on duplication, I'd rather encourage better design of abstractions in the first place, and more willingness to genuinely improve abstractions later on if necessary (rather than more complex logic or whatever). Especially with abstractions as small as methods – I very much question preferring duplication in cases as small-scale as those. This may depend on the expressiveness of the language you're using, though.

This is the correct answer. It sounds like in that case the initial abstraction was either poorly designed, or the behavior being added doesn't fit the abstraction.

And conversely, the right abstraction is preferrable over duplication.

There's almost always chance to stop a little, think/research hard and create something novel, maybe leveraging specific features your language/framework offers.

Duplication is fine until it gets out of hand, at which point you'll be able to identify the right abstraction.

Yeah. I'd agree with this. I've had to maintain some terrible codebases with duplication everywhere. The truth is, I have an easier time fixing bugs in simple but duplicated code than I do fixing bugs in super abstract but DRY code. Which has me thinking: clear, obvious code with duplication is better than magic code which is DRY.

Obviously, the best is DRY, obvious, and explicit...

Also, a rule of thumb is premature optimization is bad. So is premature abstraction.

This is an very important concept, "premature abstraction is bad"

Then you're in the situation that your code is a mess of copy-paste and its obvious what you should do, but management prefers that you work on features instead of changing "proven, reliable" code. Thus the mess is there to stay.

That bad culture is a separate problem which will make everything worse.

In your case, it seems mixed - while you do have to update code in multiple places the flip side is that you can more confidently change code in one location without breaking something else. If you're on a death-match piling up technical debt, that might be less painful than dense inter-connected code, albeit still worse than working somewhere with more long-term thinking.

Yeah, picking the wrong abstraction too early also gives you "proven, reliable" code that's sometimes even harder to maintain.

First example of this that occurs to me was an old Rails app I took over. We needed to migrate it up to modern versions of Ruby and Rails since it was getting harder and harder to support the old ones, and they were about to hit their maintenance end of lifes, but someone had extended the Rails framework stuff with their own extra layer.

Now we had code that changed subtly with a new version of Ruby that was used in dozens of places for actually-somewhat-different things, so the fixes had to account for all those different ways it was being used too. Not a lot of fun, you can fix one endpoint but then have to make sure you don't re-break it with how you fix the next one...

I think this is especially pertinent for the current SOA/microservices trend that many companies are adopting nowadays. Build small, simple, highly focussed services that specialize at one thing. Services do one thing, and do it well, at the cost of flexibility and the ability to generalize over different use cases. I think this something that is perfectly ok and shouldn't make programmers squirm, even if it leads to code duplication.

In some of the more popular microservice frameworks for instance, I see some attempts to build abstractions for things like data persistence -(what if we want to swap out Postgres for Mongo, etc?) But I think that's a good example of a needless abstraction. In fact by building an abstraction like that, you lose many of the features that perhaps made that specific database so great in the first place.

It seems to be easier to mentally justify logic duplication in the hardware world. An FPGA is an abstraction of logic. In general, it trades off performance for more flexibility. An ASIC is specifically designed from the ground up to do one task very well, but deviate from that task and it's borderline useless. I think that for microservices, code should be treated more like ASICs. One off, application specific code that we won't be afraid to throwaway and rewrite from scratch if we need to.

This made me throw up in my mouth a little.

Duplication is not inherently wrong as there is a point where you should stop abstracting and live with repetitive code, but that point tends to be way further out than most inexperienced devs would assume.

If you're using duplication as a temporary tool to help find the right abstraction, fine. But don't check that code into source control.

The problem with duplication is that it's difficult or impossible to track. It sucks if you have to spend two days figuring out some complex abstraction, but at least you know it's there and you can work your way out to see all the code that's affected by your change.

If there's duplication in the code base you might never know that you changed function A and it works just great now, but function B that you have no idea even exists because it was duplicated from function A three years ago now has subtle bugs.

If an abstraction is wrong, make it right. Don't use that as an excuse to abandon the difficult work and duplicate code. You're just pushing the problem off on to the dev two years from now who has to maintain it.

So then you make all your nice pretty abstractions, and something comes from the business where it needs to be different in some wacky pathological case. Now your abstractions suck.

Duplication leaves you room for that messy business logic. Increased abstraction often means increased coupling, and that can be much harder to maintain. It's a judgment call on a case-by-case basis.

What you should do, when messy business requirements break your pretty abstraction is a) find an appropriate new abstraction, and b) refactor the current code to use the new abstraction. Abstractions aren't forever, they're meant to change and be replaced as appropriate.

Now doing the Right Thing may sometimes require additional work even for a simple (but unfortunately abstraction-breaking) feature. That's why people like to duplicate and then leave things duplicated - because it's the lazy option. You can get some more technical debt in exchange for being done this afternoon. But we all know what happens when you accumulate too much of that debt.

It's sometimes actually the best option. We're talking about principles in a vacuum here but I have seen so many tall class hierarchies where now in order to change one isolated feature that shouldn't affect the rest of the system, I'm creating consequences for the whole system.

Decoupling is a more important principle for maintenance than DRY. Obviously you want to factor out common bits of utility code but you absolutely do not want to force all code into the same model.

I think the problem here is with inheritance, which I personally consider a code smell. There are few specific places when you need it, and by using inheritance there you're strongly signalling a particular abstraction happening - one of the "is-a" type. It definitely should not be used for just code deduplication.

If refactorings were described as bidirectional ST (syntax-tree) transformations we would automatically get the defactoring and the refactoring.

I could definitely use "inline method" or "name mangle method and special case".


If new business rules completely invalidate your abstractions in ways that aren't easily fixable, you're doing it wrong. Those wacky pathological cases should just be new implementations that plug right into your existing stuff, even if it requires slightly reworking the abstraction it's better than giving up and duplicating everything. If you're finding duplication easier, it just means you're abstracting things poorly and need to get better at doing it. Mostly likely, you've abstracted too early before you had enough different cases to actually understand how to split the concrete from the abstract correctly.

> If new business rules completely invalidate your abstractions in ways that aren't easily fixable, you're doing it wrong.

Why, you might have been doing it right all along. The abstractions were good for old business rules, but are wrong for new business rules. You need to replace your abstractions with better ones. You can't expect to predict future business requirements perfectly 100% of the time; there's no shame in having to rewrite stuff to accomodate changing reality.

Good abstractions aren't about predicting, they're about being nimble so that when change does come along, it's easy, because you don't have a ton of duplication to fix in order to change the abstraction. The new business rules don't belong in the abstraction in general anyway, they belong in the concrete implementations, that's the point of having implementations.

Like I said, it's a judgment call on a case by case basis. Taking 'DRY' to an extreme leads to an overly self-coupled code base, taking decoupling to an extreme leads to the need to apply the same fix in multiple places.

You can remove the judgement call by making it a design rule, as extreme programming did long ago. When you've duplicated something 3 times, it's time for some abstraction because you have enough examples now to see what needs abstracting and what doesn't. The problems you're describing are just symptoms of premature abstraction. Abstractions tend to fail when they come before the concrete, they tend to succeed when they were lifted out of several examples of the concrete.

Agree with all of that. My hatred of extreme DRY is due to having been burned by premature/poor abstraction so many times.

Also, there are even exceptions to that rule.. I've seen a number of cases, let's say you've got a bunch of command line utilities that launch different jobs, there's some code that's duplicated across like 10 of them. Be careful with deduplicating that, you can wind up forcing very different things to pretend they're the same.

To put it another way -- it doesn't make a whole lot of sense to abstract some "forNtimes(closure, N)" function instead of just writing a for loop. There's a limit.. that example is laughable but I've seen a lot of abstractions that actually increase the amount of code in my day.

We're probably in close agreement then. However I'll disagree with your last statement as that's actually often a standard abstraction and the clearer less error prone way to write an n times loop. I'd rather see

    20.timesRepeat(() => doStuff());

    for(var x = 0; x<20; x++){
for and foreach loops are virtually always in need of some cleaner functional abstractions that are already standard parts of virtually every collections library and are far more amenable to chaining. Abstracting loops is not premature abstraction, they're already abstracted in your collections lib.

That depends on the language, I love flatMap, but I never want to see functions attached to the integer type. Just stop with that :) Principle of least astonishment and all.

An integer is an object, objects have methods, such methods are standard fare; if it's astonishing, try more languages. This is the idiomatic way to do this for example in Smalltalk.

    10 timesRepeat: [ self doStuff ]
What's relevant isn't whether you're astonished or not, but whether it's idiomatic or not. If you can't see an integer as an object with methods, then you haven't fully grokked OO.

I've tried lots of languages, thanks. It's still astonishing and will always be astonishing.

WTF does "timesRepeat" have to do with what an integer represents? Nothing, that's what. Integer is a noun, conventionally a 32-bit value type, while "timesRepeat" is a control structure. It's absolutely crazy for "timesRepeat" to be a member of integer.


You're completely ignoring what I said, idomatic; this is how it's normally done in Smalltalk, and it's absolutely idiomatic to hang control structures off objects since that's how the entire language is implemented from top to bottom, it's not crazy at all, it's good design that keeps the language itself minimal. As for what #timesRepeat: has to do with Integer, I'd think that's pretty obvious, an object should be able to enumerate itself, to write a loop where you inc a counter is procedural, not object oriented and I'm very much talking about OO here, not procedural programming.

Same thing applies to ranges.

    1 to: 10 do: [:num | Transcript show: num  ]
#to:do: is a control structure on Number of which Integer is a subclass. Anonymous methods even have methods, consider the basic while loop.

    [ aList isEmpty ] whileFalse: [ aList removeFirst ]
#whileFalse: is a control structure on BlockContext, the class represented by [ ]. No, there's nothing weird at all about hanging control structures off of system objects, that's one of the best ways to design a minimal language and avoid the plague of a hundred reserved words that you encounter in C variants.

I disagree.

Abstraction isn't the end all of every implementation. Don't get me wrong, it can simplify your code in beautiful ways; but only if it makes sense. Otherwise it creates unnecessarily complex code that is very difficult to build on top of.

Imagine having to decompile a very complex thing just to add a small feature or extra parameter.

Simple and transparent is always better in my experience. It is a lot easier to refactor and combine than it is to deconstruct (more often than not, code that is too complex to get deconstructed just gets rewritten).

Side point: Write code that is easily ripped out and replaced (I.e modular). Personally, I think this is one of the most important principles in continually evolving software. Make shit easy to replace.

Also it's not true that duplication is untrackable. Comments, find/replace, or just construct your code in such a way that it is easy to find (helper methods, wtc.)

I'm surprised by the number of developers who think the "right abstraction" is always attainable now. In situations with complex, shifting requirements, particularly with a new product, it is very easy for a good dev team to lock in on a set of seemingly correct abstractions that make subsequent development very difficult.

I've recently been on both sides of this fight, unwinding premature abstractions in favor of some duplication, and imposing a set clean abstractions on some horror-show code.

The premature abstractions were imposed in a situation where we were trying to map several different application-layer protocols to a common set of functions. The handling of the different application layer protocols should almost certainly have been left "unabstracted" by the original developers at this stage (where understanding is murky, time is short, and complexity is high).

On the other hand, we also had a metrics reporting framework where everyone just kind of rolled their own thing depending on how they were feeling that day. It was a massive mess, it was unreliable, it was uneditable, it was inefficient, and the only cure was to rip the whole thing out and impose a set of clean abstractions around the different reporting scopes that we need. This was a case where requirements were not likely to change, the complexity of the task was relatively low, and our understanding of what we needed to achieve was not 'murky'.

I'm definitely on the side of defensively avoiding premature abstraction in certain situations. I think it leads to healthier, more transparent, easier to change code in some situations, especially in the nascent stages of a complex machine (that nevertheless must function in production).

I'll say also that unwinding premature abstraction just feels.. ugly, politically and aesthetically. But that doesn't mean it isn't the correct decision sometimes, and it certainly doesn't mean that we wouldn't have been better off if we had just embraced the particular during a time when we didn't (and couldn't) know any better.

These types of situations are very common in "view" and "view controller" classes.

Someone tries to DRY a complex view that handles takes in many options and parameters to address several use cases.

A redesign or requirements change happens and now you have an over complicated view class that needs to get deconstructed. Or you can try and build on top of /subclass/extend an over complicated thing. Neither is ideal.

Reading and understanding code is A LOT harder and time consuming than writing new code. So in reality, this view would just get rewritten (duplicated) anyway.

The problem with premature abstraction is that it becomes difficult or impossible to reabstract since you're not looking at the root issue anymore but rather attempting to unroll a failed abstraction.

Often by the time the abstraction is such a failure that it needs to be rewritten, it's evolved into its own system with a menagerie of hacks latched on to let it live another day. You're not going to just come up with a regeneralization because more often than not it's not generalizing anything anymore.

Seems a bit much to "throw up in your mouth a little". Premature abstraction and the high cost of indirection are responsible for some of the hardest problems in software and you don't offer a solution beyond "try anyways".

I'm so glad to hear a pithy saying given to this.

So many times I've fought for ripping out some indirection-heavy pattern and just doing {thing} procedurally, only to run up against the concern of the duplication it'll introduce.

This doesn't validate the procedural position in a procedural vs polymorphism debate; if anything, the example can be summarized by "there was a good polymorphic abstraction, programmer B came along and couldn't figure out how to extend the abstraction to implement his new requirement, so he made the abstraction less polymorphic and more procedural, and then this happened over and over again until the accumulated procedural logic made the whole thing unbearably bad".

I think the appropriate summary for this article would be: bastardized abstraction < duplicated logic < proper abstraction.

I feel this is good advice. I think the harder problem is getting buy-in from everyone you work with to do this. Often times I take this approach and have to really try to sell it to people reviewing my code because they've bought into this idea of "DRY ever, no matter what".

DRY yourself is useful when 1. You're repeating yourself more than 2 or 3 times and 2. The inputs/outputs and intention of the code are very, very clear. If that's not case, I'm ok with a little copy/paste.

Disagree. You should never "prefer" one over the other without considering and understanding the actual line where the decision becomes non-obvious.

The author is missing a major important factor, that it entirely depends on the size/scope of the duplication, and the size/scope of the changes necessary for functionality.

More lines/complexity that won't change means you should be more likely to want to abstract it. More changes to the code means you should be more likely to want to duplicate it.

Man I dunno. This is so tough.

On one hand, absolutely. Stacking params on params is a recipe for disaster. But often times with duplication you do need to update the duplicates. Which means you wind up with 6 pieces of code all similar but slightly different such that it fails in increasingly nuanced and edge casey ways.

No silver bullet unfortunately.

You don't have to choose. If you have a function* with excessive arguments, it has too many responsibilities. Identify the responsibilities and make a function for each. Then replace the call sites of the original function with only the new functions that are appropriate for that use case. If you find that the same N new functions are repeatedly called in the same places, you can pull those out into a new function as well. It's all very simple, though learning to decompose responsibilities correctly takes practice.

* For sanity's sake, I'm just going to say "function" in place of "abstraction".

Another useful rule: if you find yourself needing to change an existing abstraction, especially if you need to make it more complicated by (say) adding a parameter, this is often an indication that your design is broken, and that you need to change that so that you don't need the extra parameter. Most often in my experience the root cause of most problems of this sort is different data representations. Something is big-endian over here, little endian over there. This quantity is represented as a string over here, as an integer over there. I've found that being absolutely ruthless about being consistent in data representations goes a very long way towards keeping my code from spinning wildly out of control.

This also applies to CSS and is the reason why most CSS codebases are a mess. Functional CSS mitigates this (at the cost of duplication, but personally I've found it much more easy to maintain).

The original problem comes from trying to map logical components to CSS classes and forcing inheritance. This is a good intention but doesn't work well. With the project evolving, it gets complicated to alter something without breaking something else, because the inheritance chain means that rule conflicts are likely and not that easy to spot, so to have a mental model of how the element will behave visually means a lot of cognitive load.

"Don't succumb to early optimization" applies as much to design patterns as it does to algorithms.

I've wasted a lot of time thinking how to de-duplicate stuff only to have to re-duplicate it a few months down the road (or fill up the de-duplicated version with if statements..)

How do you know that the abstraction is wrong (and how) if you don't create it?

It seems to me like the article is saying - don't ever make mistakes, they are costly.

Perhaps a better tag line would be "duplication is far cheaper than a bad abstraction".

The take-away from the article for me is if subsequent activity (the changes requiring parameters and if-branches) show it was or has become a bad abstraction, then undo the abstraction and re-introduce the duplication.

Then, why not fix the abstraction instead?

Sometimes you don't know what the right abstraction is yet, you only know it's not the one you have.

I think the point of the article is that, if you aren't totally sure that an abstraction is correct or justified, then it's better to avoid building one in the first place. Not saying to never make mistakes, it's highlighting a particular kind of mistake that people don't always think about, and suggests that you can avoid making this mistake by not being so obsessive with DRY

I think such conservative approach can be justified when working on existing code, which has been somewhat proven to work. But I don't think it's justified when you're building the code.

I simply don't think you can always get the good abstraction right. But if you don't attempt it, you will never get it right. What you're advocating is not attempting it.

If it's new code, and you are unsure what abstraction is needed, build the simplest abstraction possible. YAGNI

That way, as the code develops, and it becomes more clear what is needed, then you can easily build it at that time.

If you have code that needs to be duplicated, throw it into a function, a lightweight and powerful abstraction.

That's what I am advocating in this thread. ;-)

I think the point of the article is to catch eyeballs and get shared because "I'm pleased to announce that I'll be teaching a public Practical Object-Oriented Design course in San Francisco on May 9-11, 2016" ...

In some sense, yes, it's what article is saying. But yet I agree with it absolutely: it's basically the same thing I try to convey to some my colleagues for years.

Because the problem is that many programmers — almost all novices and even some pretty experienced ones — don't admit to themselves the simple fact, that they might be wrong in the what they are thinking right now, at the moment. Sure, they won't deny that they make mistakes sometimes, but yet somehow they are pretty self-confident while actually working. Basically, they think they are much smarter than they are. What's making the whole thing even worse is the fact that people (again, especially the novices) don't actually think about "how to make it cheaper (less painful)", they think about "how to code properly". Basically, "what should I do to not be laughed at, how do I seem smart?" They just know they must follow some "golden rules", so called "best practices". You know, the patterns, no goto, no code duplication ever. All that programmer etiquette.

That's why often instead of simple, short and not-elegant-on-purpose solution, non comprehendible, thousands of lines long abstractions are born. Programmer tries to justify it by that it "solves the problem we don't have yet", but usually it turns out it doesn't — which isn't a real surprise, as it's hard to solve the problem you don't know about yet. And while the crude and simple solution might have been rewritten in an hour, that not so perfect abstraction of yours requires days of works, and usually just keeps collecting hacks until it is so complicated that it's impossible to change anything without breaking something else.

Of course, the point is, as always, that you have to keep balance. But there is a strong bias towards "abstraction over duplication" and such, so it's better to push programmers towards simpler, imperfect solutions.

I don't think it's necessarily just people trying to appear smart to their peers and creating abstractions to do so. I think it's an intermediate programmer problem that comes from seeing their own novice programmer code spiral out of control from sprawling redundancy. It comes from one too many meetings where someone says something like "can't you just change the website so that all the red links and controls are now blue", and kicking oneself in hindsight for not somehow tying it all together. And I think it culminates in programmers thinking that if only they'd made it with some kind of unified but flexible abstraction it wouldn't have been a problem.

So when it comes time to write the next app that "unified but flexible" becomes the mantra, and yes it is fun at the early stages of a project to fantasize about permutations and invent great abstraction towers to cover these cases. But I think the problem that people have is that they don't respect that flexibility comes in infinite dimensions, and that you can only pick and choose a few specific dimensions of flexibility to cover. Not only that, but each unnecessary dimension of flexibility added makes the code significantly longer and more complicated and just generally difficult to reason with. In hindsight from the previous project it all seemed obvious which dimensions should have been flexible, but making that call correctly up front is incredibly difficult.

I think we can agree that the abstraction that has more lines of code than the original duplication is probably wrong. But that's something very different. I don't think abstracting duplicated code precludes sticking to YAGNI.

Not that I am that good at following her advice, but: you have to look at your code in a more dynamic way; you try several designs, thus several abstractions, and you know which abstractions are bad because they don't work for you at solving the problem at hand. That and some generic rules of thumb, like: prefer composition over inheritance.

This is another one of the factors that makes development time estimates difficult. While adding/removing/improving the code, you realize that some part of the code needs to be abstracted/duplicated, or vice versa. And this could take unknown amount of time to get it done.

Also, this is where the technical dept incurs over time, if the abstraction/duplication is held off.

Is it just me or did the author just provide the perfect argument for pattern matching in function calls as seen in Elixir? One can easily split/fork the functionality of the same abstraction depending on the arguments. This preserves the names (which preserves the abstraction), but extends the functionality to adapt to the new use case in parallel, without any dependencies.

Also, by "wrong abstraction", the author is arguing for abstractions to be backed by reasoning, not circumstances a coder might find themselves within code. Duplicate code is not itself reasoning, but a circumstance. Though it can be abstracted away, elimination cannot be the only reason.

This also highlights why more languages need features to help the coder who finds themselves in these situation. Circumstance is a context, and is driven by dependency. Again, dependency is the enemy. We need tools and code that fight it, not cause or encourage it.

To know when to go for an abstraction, and when just to live with the code as is for the time being, that's arguably the difference between a good and a great programmer. And even great ones get it wrong from time to time.

This fits in very well with the concept write code that is easy to delete: http://programmingisterrible.com/post/139222674273/write-cod...

And here's the link to that discussion earlier this year: https://news.ycombinator.com/item?id=11093733

Edit: fixed the typo where I said wife instead of write.

Kids these days. Back in my day, wife code was forever /s ;)

Ha ha yeah, I came back to look and saw that interesting typo. Whoops.

Abstraction isn't the enemy, it's programmer X not recognizing that the abstraction has now become a square peg into a round hole.

When requirements change, it's usually a ripe opportunity for refactoring.

The problem is that we like to take the path of least resistance. So we add a parameter to the function instead of splitting it into two different functions. Saving one hour now, that is much better spent on something else. But might waste thousands of man-hours in the future. From a business side though, you might not afford to the extra man-hour right now, but maybe if the product is successful you will afford the extra thousand man-hours in the future.

You can always abstract away duplicated code after - what you cannot do as easily is fix a bad abstraction. It leads to a world of pain.

This seems like more of a work around for limited languages which lack the features for constructing certain abstractions. Work arounds are often a good engineering practice, but let's recognize them for what they are.

If the code is duplicated then there's some abstraction, otherwise the code wouldn't be duplicated. If abstraction is wrong, just refactor the code, inline old functions, extract new abstraction.

Sort of a tangent, but I always liked this quote: "It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures." - Alan J. Perlis

Agile principles allow for "wrong abstractions" as technical debt.

The goal is to keep the team moving forward and constantly refactoring.

You should only solve the same problem once. Avoid coupling and complexity. Try to make the code easy to understand.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact