Hacker News new | past | comments | ask | show | jobs | submit login
The Wrong Abstraction (sandimetz.com)
220 points by rumcajz on Feb 5, 2016 | hide | past | favorite | 119 comments

I can tell an amateur programmer from a professional by looking at their order of priorities when they grow a code base.

Amateur programmers tend to put code de-duplication at the top of their priority list and will burn the whole house down to that often-trivial end.

This writer is pointing out that there are other concerns that far, far trump duplicated code -- and she's right. However she's not elaborating enough on what it is a "wrong abstraction." We can be more precise.

The real offense when we factor duplicated code is the new dependency that is added to the system. And this is what amateurs don't understand. Every dependency you bring into your code architecture costs you and should be judiciously introduced. De-duplication of code ALONE is rarely a strong enough reason to add a dependency.

If you want to be a professional programmer one of the most important things to acquire is a distaste for dependencies. Every dependency you add should be carefully considered (and lamented as a reluctant necessity if you decide to introduce it). As a 20 year veteran in this industry having worked on myriad code bases, I will always prefer a code base with duplicated code and fewer dependencies than the other way around.

So back to the "wrong abstraction". When we compose systems, we are looking for fewest dependencies and stable dependencies. What I think the writer means by "the wrong abstraction" is a "volatile dependency".

I'm trying to be precise here because a common reaction to terms like "the wrong abstraction" is that wrong/right, its all subjective. The truth of the matter is that it's not subjective at all -- the higher-quality system is the one with optimally few dependencies and stable dependencies, these are measurable qualities.

Dependencies (coupling) is an important concern to address, but it's only 1 of 4 criteria that I consider and it's not the most important one. I try to optimize my code around reducing state, coupling, complexity and code, in that order. I'm willing to add increased coupling if it makes my code more stateless. I'm willing to make it more complex if it reduces coupling. And I'm willing to duplicate code if it makes the code less complex. Only if it doesn't increase state, coupling or complexity do I dedup code.

The reason I put stateless code as the highest priority is it's the easiest to reason about. Stateless logic functions the same whether run normally, in parallel or distributed. It's the easiest to test, since it requires very little setup code. And it's the easiest to scale up, since you just run another copy of it. Once you introduce state, your life gets significantly harder.

I think the reason that novice programmers optimize around code reduction is that it's the easiest of the 4 to spot. The other 3 are much more subtle and subjective and so will require greater experience to spot. But learning those priorities, in that order, has made me a significantly better developer.

> I'm willing to add increased coupling if it makes my code more stateless.

I like statelessness as a top priority. However I'm not sure how statelessness ever comes into tension w/ coupling. Aren't they mostly orthogonal concerns?

> I'm willing to make it more complex if it reduces coupling.

Complexity = f(Coupling), in my definition. So an increase in coupling results in an increase of complexity. Sounds like you have a different definition of complexity -- I'd love to hear it.

There's a few ways in which state vs coupling can play out. Often they're part of the architecture of a system rather than the low-level functions and types that a developer creates. As an example, should you keep an in-memory queue (state) of jobs coming into your system or maintain a separate queue component (coupling). By extracting the state from your component and isolating it in Rabbit or some other dedicated state management piece, you've made the job of managing that state easier and more explicit.

As for complexity, there are many different types. Coupling is a form of complexity, but it's not the only one. Cyclomatic complexity is another and one. Using regular expressions often increases the complexity of code. And one need only look at the spec for any reasonably popular hash function to see a completely different sort of complexity that's not the result of either coupling unique paths through code. The composite of all the different forms of complexity is how I'd define it since they all add to a developers cognitive load.

I think those four criteria beautifully capture many goals of software design. And for example, we could say that we reduce system state with a functional approach, make component coupling looser by indirection with object orientation, and decrease code complexity by structured programming.

Amusing arguments for statelessness from the ØMQ Guide by Pieter Hintjens [1]:

> "If there's one lesson we've learned from 30+ years of concurrent programming, it is: just don't share state. It's like two drunkards trying to share a beer. It doesn't matter if they're good buddies. Sooner or later, they're going to get into a fight. And the more drunkards you add to the table, the more they fight each other over the beer."


> "Code that wants to scale without limit does it like the Internet does, by sending messages and sharing nothing except a common contempt for broken programming metaphors."

[1] http://zguide.zeromq.org/page:all#toc45

Off the top of my head, there are two reasons people seem to de-duplicate code. One is because two or more things happen to share similar code. Another reason is because two or more things must share similar code.

It seems like you are speaking of the first reason. There is no dependency, and the programmer is creating one. IMHO you should have at least 3 instances before creating an abstraction to reduce your code.

The second reason is different though. By creating the abstraction you are not adding a dependency, you are making an implicit dependency explicit. There is a huge difference. In this instance any duplicate code is bad.

I have a third reason. Abstraction, or code de-duplication, if done right, ends up making code easier to reason about. The paper that made me rethink everything I knew is


and explains this wonderfully.

I think that principle I've heard before called "1, 2, 3, abstract"; as in, wait until you see it at least three times before considering extraction.

I'd also add - wait until the code is 'stable', i.e no longer under active architectural development, connected only to other stable parts (or with stable/authoritative interfaces) and having then existed in such a state for a continued period of varied usage. then refactor.

I'd say this is sound advice for core components of the system, but we may want also to consider the types of the dependencies. For example, I would not wait for three times if the dependency is some kind of external system (DB, UI, Network, MQ, OS, Library, Framework, or the like) which is volatile in one of the worst ways - not directly under your control, unlike your own source code.

Agree. Frequency matters. If it is used everywhere, no reason not to abstract.

The distaste is for complexity. Adding new functions to reduce duplication can add complexity. It can also reduce complexity. It can also decrease complexity for a while and then in the long run increase complexity. Recall the famous quote, "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures." Sometimes dependencies are really cheap - when they are the right abstraction. I think the article put it fine.

Rails itself is an example of an abstraction that reduces complexity for a while and then adds complexity when you reach a certain size. So it was the right abstraction at first, and then the requirements change and it slips over to the wrong abstraction. Here is an insightful comment about why that is both inevitable and doesn't matter: https://news.ycombinator.com/item?id=11028885

Here's a very objective and powerful way to measure complexity: dependencies and volatility.

Otherwise we're all saying "complex" but not being clear and likely meaning different things.

For example, a lot of people believe that "not easy" = "complex" but as Rich Hickey articulates that's a counterproductive way to think of complexity. (See http://www.infoq.com/presentations/Simple-Made-Easy)

"dependencies and volatility" But what does this even mean? I'm okay with using Rich Hickey's definitions. But I don't recall that in Rich's talk.

If your system's design results in your stable components depend on the non-stable (volatile) components, your system is complex. This is because volatile components change often and these changes ripple to your stable components effectively rendering them volatile. Now the whole system becomes volatile, and the changes to it become very hard to reason about - hence complex.

Avoiding this problem has been captured, among others, by the Stable Dependencies Principle (http://c2.com/cgi/wiki?StableDependenciesPrinciple), which states that the dependencies should be in the direction of the stability. A related one is the Stable Abstractions Principle (http://c2.com/cgi/wiki?StableAbstractionsPrinciple), which states that components should be as abstract as they are stable.

I can tell an amateur programmer because they aren't getting paid. Flippancy aside though:

While I don't think it's what you're arguing for, I'd fear some would use an argument like this to defend a workflow and culture where new features are started by copy and pasting a mass of code, and then going in and tweaking little things here and there. Then when something has to change across the system, there are endless traps because things you need to change and maintain, that look identical, aren't quite. These workflows and cultures exist.

There's a balance somewhere and finding that is the hard part, right?

Thats what I was thinking reading the article.

I understand and sympathize with the idea you suggest here, but I also wonder about the fine tuning. We accept that both duplication and dependency are bad (and certainly that the minimal, maximally stable dependency set is best) but when making that tradeoff what parameters make a duplications better or worse than a dependency? Are there languages or toolchains which cause this tradeoff to fall in the opposite direction?

In some sense this is academic, but in a very real sense disdain for dependency is something I worry can prevent a project from going through an important high-energy transitory period where semi-formed dependencies exist to solve concrete tasks but have not yet annealed into a final, low-energy form.

In small scopes a professional knows how to skip by this risk and get straight to the better abstraction. In large scopes whole projects must (often) pass through high-energy areas. Professionalism thus demands that these high-energy zones be identified and reduced.

But being too eager to avoid dependency might inhibit growth.

> In some sense this is academic, ...

Whether all of this is academic or not, I don't know. But what I do know is that these ideas and their practical implications have an ENORMOUS impact on the practitioner's and business's productivity.

> in a very real sense disdain for dependency is something I worry can prevent a project from going through an important high-energy transitory period where semi-formed dependencies exist to solve concrete tasks but have not yet annealed into a final, low-energy form.

We must always riff and hack and creatively explore our domains in code -- this is another practice of the software professional, and notions of "architectural soundness" and "dependency reduction" should never paralyze us from creative play, sculpting and exploration. In these modes of development its best we "turn off" all the rules and let ourselves fly.

But for a code base that has to survive longer than 6 months and that will have more than one collaborator -- this is where it becomes essential to maintain architectural soundness in the shared branch. (My development branches, on the other hand, are in wild violation of all sorts of "rules" -- so there is a difference between what gets promoted to production code and all the exploratory stuff we should also be doing.)

I think the idea of exploratory work is very similar to what I mention going on in small scopes, but I think this evolution occurs at large scales, too. All the development branch isolation in the world can't and shouldn't stop this sort of broad scale evolution.

I think the right approach is not to avoid dependencies, but to manage them.

Let's say you are unsure of the correct UI framework just. React, Knockout or Angular? React native? Maybe you don't yet know which database best suits your usage and scaling needs. Should you avoid committing to those dependencies? For how long? Doesn't this slow you down?

A good way to approach this is to isolate the dependencies so that you don't have to commit to the actual implementations (React, Mongo, PostgreSQL, Angular, ZeroMQ, whatever it is you need) early. Of course you start the work with some set of frameworks and libraries, but so that no other part (the parts that does all the important stuff that it unique to your application) of the system knows of the implementations. This way, if the need arises, changing the implementation details will not be expensive.

Isolating the implementation behind an abstraction sometimes introduces boilerplate and duplication, but as the article mentions, the dependency is usually more costly.

I think writing an abstraction that allows you to decouple your app in a way that you could use either Angular or React is going to take far longer than just rewriting your app if you ever get to the point of needing to switch.

In case of a UI, the trick is not so much to write an abstraction for this, but to write the application logic such a way that it does not depend on the UI framework.

Your own UI code should be limited to displaying the data with the help of the framework. This minimal UI code should depend on the application code and that should be the only dependency between the two.

There are simple techniques to keep view logic and application logic decoupled, for example by introducing a separate data structures prepared and optimized for view components to consume. An added bonus of this is testability of both application and UI logic without opening a browser.

>when making that tradeoff what parameters make a duplications better or worse than a dependency?

This depends entirely upon the context. Overduplication and tight coupling are universally bad but the trade off between them is often a matter of opinion.

In practice I've found that once you've hit the point where you're trying to trade duplication + tight coupling off against one another the code is already in a decent enough state that there's other more important work to be done.

> I can tell an amateur programmer from a professional by looking at their order of priorities when they grow a code base.

> Amateur programmers tend to put code de-duplication at the top of their priority list and will burn the whole house down to that often-trivial end.

To be fair, lots of how to program books spend a lot of time on teaching you how to abstract, and sing abstractions praises to the heavens, as it were. It takes some experience to learn that real world stuff is not like the toy programs in books.

To be fair, working only from book knowledge and no experience is precisely what makes someone an amateur.

Novice, not amateur.

Amateurs aren't paid, professionals are.

Novices are inexperienced, journeyman and masters are more skilled and experienced.

A novice can have a ton of knowledge (from books), but be too inexperienced to apply it.

A novice can be a professional, this is what internships and entry-level jobs are supposed to be for. Paired with mentorship and structured work assignments (structured in the sense of increasing complexity, scope, and responsibility) they're brought up to journeyman and, later, master level.

They can also be amateurs. Given forums, books, manuals, mentors (real-life or online), they can be brought up to journeyman and master level as well.

Besides I think everybody who e.g. makes some side-projects, or is active in open source is in fact amateur in the sense of "a person who does something (such as a sport or hobby) for pleasure and not as a job" :)

In general, if a professional also donated their time to something it doesn't make them an amateur. See lawyers doing pro bono work, or carpenters building a habitat house as examples.

I was following usage as it was in the thread, though I agree that amateur is being thrown around this thread where novice is the word implied.

Yeah, sorry. I realized after I posted that I should've put this upstream.

Professional means you teach (notice the word root in "profess" as in "professor"). It really means you know enough that you can teach others how to do it right, not about get paid for it per se.

You have the etymology of "professor" and "professional" completely wrong. You can't just notice the same root in two words and then completely reinvent the meaning of one to make it have something to do with the meaning of the other. The evolution of language is complex. Here: http://lmgtfy.com/?q=etymology+profession

Do professional football players teach playing football? Some, probably, but not all. We don't call those who don't teach amateurs. They're getting paid. The amateurs are the high school and (arguably) college players, along with rec club and pick-up game players.

You're noticing a common root, but not the meaning of the word in the modern day.

Decimate means to destroy 1 in 10 of something (like an opposing army). But today we use the word to mean destruction of a large percentage.

I suppose an argument can be made that modern use of amateur is more akin to what used to be novice. However, I'd have a hard time accepting that except when it's used as a slur. We talk about amateurs in many fields, but don't intend to dismiss them as unskilled or inexperienced, we're classifying them as non-professionals. In a forum like this, filled with amateur programmers, it seems, to me, that it's wrong to misuse the term in this manner when a large portion of the readers here are amateur programmers but of moderate to high skill level.

Of course. You can't learn that sort of judgement (when to use vs when not to use) from a book.

But our profession's training material, at least in my experience reading, drills it in your head to use all these abstracting devices.

It's certainly true that you can get some "book knowledge" that tells you that you can over abstract. I mean, this blog post is one example. But I only hear this sort of stuff from things like blog post from experienced devs, it seems to me. (Or maybe I just read the wrong kind of books?)

Sounds like the books actually teach poor practise, divorced from context. The harms of abstraction is nothing that can't be printed - this isn't qualia

There's even a slogan for that: Beware the Share


I broadly agree with this, but draw a slightly different conclusion: that the imperative when growing a large code base is proper tooling/planning for dependency management. If good dependency management is cheap, programmers will use it, and even if some dependencies are volatile, side-effects can be bounded by the dependency graph.

We have ~good (and certainly cheap) dependency management in Ruby (gems) and JavaScript (npm), and it leads projects straight to dependency fractal. No, it can't be bounded by making it easier.

> The real offense when we factor duplicated code is the new dependency that is added to the system.

A slight tangent on that note, but I think many of the problems with current web development result from the same root cause: adding yet another dependency to solve an almost trivial problem. Sometimes going to the extreme of for the sake of saving few keystrokes. Need one function to find an item in a collection? Reference Lodash. And then drop references all over the place.

Yes some dependencies are useful, but they all need to be handled with care. Wrap that search operation to an internal utility function and inject Lodash as its implementation if you don't want to reinvent the wheel. This is what DI is for (and nothing more).

Maybe the feedback of exploding the compile times as a result of a complex dependency graph would make people more sensitive to the issue. But then again, that did not prevent it from happening either. Oh well, it's not my codebase (yet).

I think javascript is intrinsically going to be one of the worst examples of that. Stemming back to the days when javascript programmers learned that using a library like (then jQuery) lodash is the _only_ way to correctly get your code to work in all cases. That culture then became ingrained, and in my opinion helped lead to the state most JS packages are in these days.

In my experience, this is one of the greatest things about the new prevalence of tools like babel; it pushes that abstraction layer down below in-code dependancies. There's still a dependency to manage, but it's not a library import or similar.

Babel doesn't solve the problem of having a minimal standard library. Unlike say Python or Ruby, where batteries are included, JS is roll it yourself, add a dependency, or go without in almost every case. Even for seemingly trivial things like "Array#contains"

It may not help that case in particular, but there's a _lot_ that ES6/ES2015 add to the table: http://babeljs.io/docs/learn-es2015/

A lot of things in there would need a library for cross-browser compatibility still. That's my point, but I take yours as well. It's still a very BYOL (bring your own library) kind of a language!

Additionally, what I also see that tends to separate out experienced vs inexperienced programmers is attempting to deduplicate things because they appear similar. By this, I mean that these two things just happen to do similar things or have similar fields, but the operation or structure that they are attempting to represent is fundamentally different.

When they eventually diverge, because they were different operations or structures, so they won't evolve the same... This is where you end up with functions that take 10 different option parameters, all of which activate a different code path. Or structures with completely different fields set depending on where it came from. And guess what? Now you're back to basically one code path or structure per use case, and an extra dependency that didn't exist before and is a nightmare to maintain without breaking one of those existing use cases.

> This is where you end up with functions that take 10 different option parameters, all of which activate a different code path.

Or you could abstract the commonalities out into a higher-order function.

What is the higher order function for two disparate code paths, controlled by a parameter switch? `if`?

There are also different kinds of dependencies. Some dependencies are so common, they might as well be a feature of the language itself (Google's OAuth Java client libraries or Apache HTTPClient, for example). It would be a waste of time to write a custom OAuth or HTTP library - much better to just add the dependency and go on your way.

IMO the danger is greater with other languages that don't have the library support that Java does. That Ruby Gem you use today may get abandoned in 6 months and become a piece of technical debt you have to deal with later down the line. It's always a balancing act.

There's also nothing inherently wrong with duplicated code. It can make a code base harder to maintain, but so can over-abstraction. But I agree with you and the author; most projects are better off when developers err on the side of verbosity.

Those dependencies you mentioned are all highly stable. That's the objective & precise way to say that they are "the right abstractions" [1].

[1] This should further articulate why "right/wrong abstraction" is not the useful nomenclature. I bet you I could find several programmers who could write a "better"/"more good" OAuth library, but that's not what is of priority here. What is most important is this objective quality: is the dependency (it's interface) stable? If it is, then my code architecture remains sturdy.

Of course there are different kinds of dependencies. I believe it was not the point of wellpast that dependencies are evil and bad, I think it was that dependencies have cost, and most the time programmers consider them as (almost) free.

Indeed, dependencies have nearly zero immediate cost, but in the long run they are expensive and need to be weighed against the gain from their use.

If this is mostly something amateurs do, then who are those people that bring us all these wonderful abstractions like AbstractClickableLittleButtonWithTinyImageOnTheLeftSideFactoryControllerStrategyImplementation?

Expert amateurs.

Sandi, the writer, is a she ;)


I think the first priority should always be simplicity (KISS). Simplicity is understood by perfectionist developers (Simplicity is the ultimate sophistication) and also by beginners (if code become complex, there will be bugs). De-duplication (DRY) is important but should always come after simplicity.

But all we have to do is teach your half-baked "fewest, most stable dependencies" theory to the amateurs and then they'll be professionals, performing like 20 year veterans. Eye roll.

"Programmer B feels honor-bound to retain the existing abstraction"

I think it's more often likely that the next person comes along and lazily forces the most brittle, minimal change possible to make their new requirement work without thinking about the larger context. Even if that means putting a complicated conditional into the existing function, while leaving the existing function name intact.

This isn't about honor. It's simply bad programmer behavior. It might be due to lack of skill or experience, or maybe laziness, or lack of discipline, or some other negative attribute.

Or maybe programmer B is trying to maintain a monolith of pasta for which there is no documentation, the original programmer has long since left the company, no one is entirely sure what the business rules of the program are actually supposed to be, there are no tests, and you don't have time to actually read and understand all 100k lines of code.

Management just needs this one little feature tweaked, so you dive in to where you think that needs to go, make the new thing happen while absolutely changing as little as possible because god help us if anything breaks.

Now imagine that you're not the second or third person to be responsible for this, but maybe the tenth or so, and it's been going on for decades. And the last person to work on it was really new to this language and didn't understand its idioms very well, or vice Versace, and you don't understand the language very well.

I greatly enjoy these conversations about code quality and maintainability, but I've encountered it more than once where business constraints don't allow you to even start to think about these issues. You have no choice but to respect the abstraction you inherit because your job is to get in, get out, and pray you didn't break some edge case that no one would ever think of.

Or maybe I'm just an amateur/novice.

You can always chose to change the surrounding code whenever you are adding something to an existing mess, to make a bit less mess. I have heard excuses about not being given time to do it from the management many times before, but the fact is that you are the one who deals with the code, you need to make the decision. If you are not comfortable doing some change, you are the one who needs to be writing the tests or making sure you understand the code. It's not some kind of separate task. It's essential for doing your job right, so you should not expect extra time allocated for it.

Just adding your small change without touching the rest of the application is usually the easiest way, though, so most people just do that instead.

The best rule I have heard is to always leave any code you touch in a better state than you found it.

I don't think this is a realistic attitude. I've gotten my fingers burned more than once by trying to take this approach.

The problem with web applications is that they are not self-contained. You don't necessarily know who or what is calling what or how for some legacy applications.

I can agree with the concept of improving code as you see it, but changing the abstraction--the topic of conversation here--cannot be done willy-nilly for any non-trivial app. You have to dig up all the people are, or were, or will be counting on that abstraction, and that's a challenging thing to do sometimes.

And then you have to provide a business reason for you to allocate the time necessary to go read those 100k lines of code and understand them or write tests for them. And you tell your boss this is a mess and needs serious work, and he says, "Dude, this isn't rocket science, and we're going to burn this whole thing down in a few years anyway, and all I need you to do is add this one button in this edge case and don't break our standalone desktop app that isn't going to get updated to understand this functionality."

You're idealism is admirable, and I don't disagree with it. But the reality of the world is that if you work for a company whose primary product is not software (hell, even sometimes when you do!) the priorities are about the business functionality, not about the state of the code or the right abstraction.

I just think it's a little disingenuous for people to have these conversations in a vacuum and attribute anything from malice to incompetence to a person who takes a different approach.

Maybe I am being idealistic, but I think the sentiment in the discussion concerns not only the ideal coding techniques themselves, but also the bigger picture of business needs. That is, if we took the approach of keeping code in somewhat decent shape, we would end up, in the long run, producing the whole system more efficiently, and reducing the TCO of the system for the user and the business.

I certainly hope that the discussion is not only about making the lives of the programmers easier.

>It might be due to lack of skill or experience, or maybe laziness, or lack of discipline, or some other negative attribute.

I think you might be committing the fundamental attribution error. )https://en.wikipedia.org/wiki/Fundamental_attribution_error

>the tendency for people to place an undue emphasis on internal characteristics (personality) to explain someone else's behavior in a given situation rather than considering the situation's external factors.

Rather than considering external issues like time crunch, management pressure, or too-high a workload, or anything else it's much easier to just assume everyone else is a lazy and shitty programmer.

> This isn't about honor. It's simply bad programmer behavior. It might be due to lack of skill or experience, or maybe laziness, or lack of discipline, or some other negative attribute.

Time pressure may also be an issue.

Or, what if the system is only for prototyping, and later turns out to be needed for production (a management decision)?

Then management should be made aware that they're incurring technical debt by taking that decision. It can be worth compromising code quality for short-term goals on occasion, but you should do it with your eyes open.

I agree. I think Sandi was just being generous and forgiving, for the sake of not getting sidetracked. The fact of the matter though is that those "small" refactors do take place quite commonly. Out of laziness or else, and start a slow spiral...

The problem was programmers between B and X didn't refactor to update the abstraction with their new understanding of the domain, and to better meet likely business requirements.

When programmer A introduces an abstraction, it's not as if they are saying "this abstraction is the one true way to represent the domain." The abstraction is a tool. It helps cover some duplication, and ideally express intent. But the tool needs to be updated when it's not useful.

You: "We can implement this new feature but doing it right would mean we have to design a new architecture for the entire system, and that would take several months. Or, we could just hack it, change two lines of code and be done with it, but it might bite us in the future."

Management: "Change those two lines of code."

If you found yourself agreeing with article, yet have ever mocked mathematically-derived abstraction patterns like monads, it's time to reconsider.

Math is, at its core, the study of abstraction. Things that have been found to be good abstractions in math probably are good abstractions in programming.

Learn from history. Mathematically-derived abstractions probably are used because they are the right abstractions.

Agreed, but I think the reason for the mocking is that they're notoriously hard to understand being several layers removed from the programming context. I understand monads in the context of programming, but I wouldn't pretend to understand how they could be applied in other fields. Haven't got the slightest idea.

Perhaps importing abstractions straight from mathematics isn't the only answer. We might do better if we simply added some type and law restrictions to our abstractions as well as clearly stated assumptions. That way everyone would know when they don't apply anymore.

Programming languages are some of the shittiest languages ever created. Mathematics is probably the best language ever created. The more programming can be like math, the better.

I don't think math is the study of abstraction. It seems to me it's proving truths about formal systems. Abstraction has the same purpose in math as it does in programming, a tool to more effectively communicate ideas.

Not sure I agree. Usually, mathematics is developed backwards - you have a concrete question that you want to answer, and you reason that it can be answered so long as such and such as true. The formal system is developed post-hoc to give yourself a language to reason in, but you're really trying to take the result you already "knew" to be true, and find the least restrictive system description to which it still applies. Then you look for parallels and generalizations. The abstractions are in a large sense the most important and difficult thing to create, because you're creating a schematic and saying "if you can rephrase your problem into these terms, then I already proved this intuitively correct thing for you, so you don't have to worry about edge cases". Algebra, geometry, calculus, etc. all follow this model. A lot of the conclusions in, e.g. analysis are obvious once you impose continuity.

The object of mathematics is to produce abstractions that make proofs possible or trivial. Sets, fields, groups, categories, functions, integers, reals, complex numbers, quaternions - these aren't notational conveniences that are introduced to make it easier to talk about. They're important because if you have something with the properties of a group, you know some powerful truths about it for free. If you change anything about the definition of group, the set of truths you know changes. People have been grinding on what the definition of a set should be for a century, trying to build the best possible abstraction, and they have all of the same problems coders have. You assume to much, it's not very general. Don't assume enough, there's nothing interesting to say that's unilaterally true.

So I would argue that characterizing math as the study of abstraction is largely fair. You "see" a result is probably true some of the time, and then you try to find out just how general you can make that statement.

Thanks for the response.

> People have been grinding on what the definition of a set should be for a century, trying to build the best possible abstraction, and they have all of the same problems coders have. You assume to much, it's not very general. Don't assume enough, there's nothing interesting to say that's unilaterally true.

This is a great point.

In addition to @sebatos response, I'd add that a formal system is nothing than an abstraction of some concrete problem.

I haven't really thought about it deeply, but I'm pretty sure this isn't true. It seems like mathematicians are often coming up with new proofs and formal systems and only later noticing that they're applicable to some real-world problem.

For many decisions in programming, there are tradeoffs, and relying on general rules of thumb can be much worse than judging the specific case. And heuristics are often interpreted differently by different people.

Such is this discussion, imo. It's very possible to over-abstract. If something isn't extended, maybe it shouldn't be designed to be extendable, and if it isn't configured, maybe it shouldn't be configurable.

On the other hand, abstractions are the heart of programming, and finding a good one is what can make you much more productive and maybe even have more fun. The feeling when I've abstracted something and got more out than I put in is maybe the best feeling there is in this craft.

The difficulty is in finding the balance. I also dislike the rule of 3 as a hard general rule. There are many times when it makes sense to abstract something out of 2 uses, if there is a lot of code and clear separation and commonality involved, or you anticipate more uses later. Maybe even abstracting sub-problems out of one case makes sense, if separating the problem cleanly into two parts makes it easier to reason about. There are many times when it's a bad idea too; it depends on the specifics and it's a judgement call based on experience.

(And, I would even say, there is room for personal taste in programming about this; where the sweet spot lies may vary from person to person.)

I wrote about the problem here: http://250bpm.com/blog:36

The fact that abstractions can often add complexity rather than remove it should be taught in schools.

The problem with distinguishing between the two cases is that it often requires business domain knowledge. And programmers, sadly, rarely care about the domain logic.

> And programmers, sadly, rarely care about the domain logic.

Oh, but we do. It's the one thing we want to know but we pretty much never get from our customers. No surprises there - most people are not equipped to express their "domain logic" in a way that is useful for automating it; quite often they don't even understand it. Humans can hand-wave their way through anything, relying on intuition, patterns and methodologies. They don't need to understand what they're doing deep enough to express it as code.

And so what usually happens is that your client/manager gives you a nonsense spec saying "this thing should do that when clicked, and that thing should do something else". And, as a programmer, you then have to reverse-engineer the domain abstractions from it; abstractions the people giving you the spec probably don't understand themselves.

That's why understanding the domain logic requires cooperation between programmers and those who understand the domain.

It might not be that they don't understand the domain "deeply enough." Maybe they understand it in the way that is relevant for their work, which typically doesn't require formalization.

Making sure this collaboration actually happens is one of the big topics of Domain-Driven Design. We can't just expect to be given a correct model; teasing that out is part of our job. If the spec is nonsense, we should say that and try to fix the process.

What I meant is that looking at an average programmer's resume feels relly weird: A web shop. High-performance trading. Game development. Scientific computing. Telecom. And so on. No way anyone can be a domain expert it that many areas.

Also, you often hear folks saying things like "I am a Java programmer." Stuff like "I am a telco programmer" is much more rare.

Being able to lead your customers to explain clearly what they really want (and, in some cases, to help them understand why they can't have/afford it) is one of the skills of a productive programmer. It involves a lot more than just listening, though that is a good starting point.

If only Programmer B had avoided that initial choice to preserve the broken abstraction rather than abandoning it then and there.

He must have been practicing Deadline Driven Development.

> He must have been practicing Deadline Driven Development.

or The Team Has A Hammer Development

TL;DR: If you have made a mistake, return back to the previous known working state, proceed from there.

It does not matter much which was the specific mistake, really: a bad choice of an abstraction or something else.

Good abstractions help immensely; programming is all about them. Let's not forget it, too.

Richard P. Gabriel wrote about this in connection with the idea of design patterns, e.g. in Patterns of Software. Here's a nice quote from the beginning of a chapter:

> "The room fills with cold, conditioned air; outside the heat hazes, filtered through greened glass windows: a new building hardly first populated. The speaker is wild-eyed, explaining new ideas like a Bible thumper. His hair is a flat-top; his mouth frowns in near grimace. He strides to my seat, looks down and says in a Texas drawl, 'and the key is simply this: Abstractions. New and better abstractions. With them we can solve all our programming problems.'"

I'm reminded off ESR's "curse of the gifted" email, schooling Linus Torvalds on the lack of modularization and code-sharing in Linux's driver code. :)


Oh my. I recall reading this years ago. I had totally forgotten that Eric basically tells Linus to grow up and learn to use version control. Now we're all using Linus' version control.

I'm reminded of the "I'm with those guys" comic.

I hate needless abstractions for the sake of removing duplication, DRY, etc. It's mad. Most of these languages don't have proper macros and so you end up going through these logical contortions to achieve a particular pattern of behavior... and it's just noise. Anyone who comes along can look at that code for hours and never know whether it performs any work (or even what that work is).

A little duplication is fine until you figure out what the real problem is. The problem is usually in your data. Maybe it's not structured properly or you need to simplify the steps to transform it at an earlier stage.

A good specification will go a long way to reducing the desire to introduce hapless "abstractions." An abstraction, in the mathematical sense, will hold over the domain and introducing a new one merely allows you to manipulate objects on the lower level using new algebras, predicates, etc.

Code abstractions are often the leakiest abstractions. Especially the kind the author is talking about. Avoid them. Even if you have a little bit of duplication. Only start worrying about that duplication when it starts spanning compilation units/modules/whatever and is actually causing problems. Then look at the data and figure out how to structure it so you don't need that code all over the place.

I think lack of things like macros is the bigger problem. There are a limited ways to abstract, or represent duplications, in a lot of languages. Sometime a comment in an issue tracker might help.

It's funny how often I've made this argument with even fairly experienced programmers and they seem to have a visceral reaction to the code being "less dry" than it could possibly be.

Similarly, once a codebase uses several very wrong abstractions, it becomes significantly more confusing to work on, exponentially increasing cost.

The temptation to use a mature library as a dependency is very strong since time is initially saved. When that library introduces the wrong abstraction, the consequences can be severe.

I typically argue for (at least) creating the correct abstraction and having it wrap the mature library to constrain or properly name its behavior, and to make it less strongly coupled to the rest of the system.

It doesn't just take experience to realize these sorts of things, it takes a willingness to question one's own code and imagine how it might have been done better, or how it might appear to someone who didn't write it.

It's all a bit no-true-Scotsman though.

abstraction != dry bad abstraction != anti-dry

Abstraction is primarily about separation of concerns, not about avoiding repetition. Drying out code that's repeated all over isn't the same as creating a formal abstraction for some element of the overall logic.

Which is why

>once a codebase uses several very wrong abstractions, it becomes significantly more confusing to work on, exponentially increasing cost.

And drying out code makes it easier to maintain, but it doesn't guarantee that the architecture isn't a mess.

The problem is perhaps that CS teaches algos, and sometimes it teaches design patterns. But there's almost no useful theory of abstraction design.

Design patterns are more or less as good as it gets, and all they do is give give you a cookbook of stock formulas to try.

Beyond that, there's no useful way to reason about abstractions, test them for domain fit, or rate them for elegance and efficiency.

Precisely. Very well put.

If anything, some developers use design patterns as a grab bag when solving a problem... a better approach is to model the solution and then be ready to "back in" to a design pattern upon noticing strong similarity or observing that the design pattern is a bit more abstract way of doing the same thing.

Because of the tendency to pick a pattern first and design for the domain later, many instances of design pattern use in the wild are subtly (or not so subtly) incorrect.

> Beyond that, there's no useful way to reason about abstractions, test them for domain fit, or rate them for elegance and efficiency.

Very true indeed. Looking at a design through the lens of coupling and testability and modularity is a good start and can reveal many problems, but I think the real gotcha has to do with naming: Once we name a concept/abstraction we are likely to reason within that abstraction, and we rarely consider whether we are stretching it too far, or if it is even that thing anymore.

A graphic I found very enlightening in the past year is here: http://blog.thecodewhisperer.com/images/age_old_battle/virtu...

from this blog post: http://blog.thecodewhisperer.com/2013/12/07/putting-an-age-o...

There's a lot of other good stuff there too, when you start digging through the archives.

Generally, I find that I've got to build a prototype first, which is quick and dirty and ugly, but works. Then iterate as the structure emerges. Some things stay in flux, and it's okay to leave them messier, but eventually the code and functionality settles out and commonality arises. Quite often, I don't necessarily know what I'm doing going in; I've got a general goal, but until I experiment a bit, the best way to get there is unclear.

One of the joys and pitfalls of a multi-paradigm language like C# is that there are so many different ways to skin a cat. You can pick and choose procedural, object-oriented, functional, or some bastard mishmash of all of the above and more.

I would argue that 'Refactoring' gives you a bunch of tools for discovering your good abstractions. Whereas GoF puts a bunch of ideas in your head without giving you any preparation for using them responsibly.

This may be obvious, but when working on Java I've found that this applies not just to classes but also to methods. When refactoring a poorly-factored class, often the most effective way forward is to inline all the private methods (introducing a lot of duplication) and then factor out the duplication.

It is even more effective when you are faced with a bunch of classes, most of which do nothing but delegate their responsibilities to others. I have a case in front of me right now.

Data-flow analysis was a method that was swept away (somewhat unfairly) on the grounds of being too procedural and not OO. One of its key tenets was that once you had discovered the essential flows and interactions, you should discard the problem-space partitioning that helped you find the graph, and use the graph to partition the solution space (which might well result in the same partitioning, but not always.)

This is the old truth - premature abstraction is a root of all evil (I know original quote was "optimization", but isn't an abstraction just some "architectural optimization"? ;)

On the other hand: I often see copy pasted code which have these same characteristics author wrote about: "Another additional parameter. Another new conditional. Loop until code becomes incomprehensible."

If somebody writes too many conditionals, parameters and creates incomprehensible loops, then he will do it always, no matter whether in abstracted code or in copy pasted code...

The problem is that most of programmers are not skilled enough to write good code.

So sometimes there is a need for refactor wrong abstractions in legacy code, sometimes there is need for cleaning tons of copy-pasted code and make good abstractions... Either way - maintaining is hard.

There is another koan I would recommend to the interested:

  Quantity is easier to control than complexity.
I propose that “the wrong abstraction” is “complexity without benefits”. In light of the sentence above, duplication is clearly the lesser evil.

While I have no objection to the notion that duplication can be the right choice in some circumstances, and I completely agree with the author in thinking that pressing on rather than backtracking and taking a different path is a source of much complexity, his pseudo-example misses a third alternative: instead of (re)introducing duplication wherever the current abstraction is used, derive a generic abstraction covering the common aspects of the previous and new requirements, and use that in implementing two new abstractions, one to replace the old, and the other for the new cases.

A 3rd option would be to simply apply good programming standards like a function doing one thing and only one thing.

At some point, you're gonna have to introduce some logic into your program, and that logic is gonna have to be in a function.

Meh, it's an easy situation to get out of though. Especially with a typed language and decent IDE: right-click, "inline function"; check your git status to see what changed, go to those files and remove unused branches (which a decent IDE should hightlight and provide 2-keystroke fixes for). 2 minutes, done.

Grotesquely replicated code is a much more difficult situation to resolve.

I'm guessing the OP primarily uses untyped languages. I'd offer that that is the core of the problem, not the abstractions.

Isn't a bit of this governed by which programming paradigm your language uses? I get the feeling the answer and approach are much different when comparing Forth to Java.

Yes. In theory there is a programming paradigm which fixes this, called "aspect oriented programming." I haven't seen a really accessible aspect-oriented system however.

Decorators in Python are pretty much the same idea. I see them used quite a lot in Flask and Django.

patents will do that

My point was that the way your language builds abstractions might increase or decrease the problem.

I read somewhere, but have forgotten where, that having doing things same way twice is ok, only when you're second time repeating yourself, think about using an abstraction.

Or to say it more succintly, duplication is not a problem, triplication is ;)

I have found that following this simple rule I allow the development to go forward, and can go back to it when I have better knowledge of the system at large. At later date I might have a better luck in choosing the correct abstraction.

Or, in terms of OOP, wrong abstraction is when you have subclasses with pretty much everything redefined in each.

I don't think this is very common though. You will more likely see a lot of duplication in poor legacy code, and not because the coders were so clever, but quite the opposite: it's when they barely even understand how to extract abstractions.

Or baroque structures of abstractions where you have a real problem finding where stuff actually gets done - sometimes combined with a need to base all application classes on a common shared abstract base class even though they really don't share anything.

One thing I've put a lot of thought into recently is what parts of a program should typically be abstracted, because when you come across them they might be your best bets for big wins.

Top of the list: constants should probably all go in special structures with the goal of guaranteeing consistency and making them easy to understand at a glance. Everyone might know that charge code "X" means a check payment, but what happens when a new developer looks at that code with "X"s everywhere? It's more verbose to use constants.chargeCodes.CHECK_PAYMENT but nobody will misunderstand what you mean, and your IDE will be able to verify that your codes are valid. That's worth an awful lot of extra characters.

Bonus win: when your legacy backend finally gets upgraded, you have the option to change to a new code for check payments, easy as pie.

Firstly, there is no such things as perfect abstractions, hence the The Law of Leaky Abstractions. Secondly, without abstraction we would be writing in machine code. So unless the authors enlightens us on what makes an abstraction wrong the whole discussion is a waste of time.

My own observation is that programmers have a marked tendency to assume that because two methods have similar or identical behaviour, that this implies they should share an implementation. In large code bases, its common to find methods that have identical implementations 'by chance'. By which I mean that the commonality is a side effect of requirements that could readily change.

In my opinion, the best defence against this is good documentation: if a two methods have clearly documented behaviours, then even if their implementations have been fused, a subsequent programmer will have more context (and more confidence) reduplicating the code in response to further changes.

On the other hand, moving to the right abstraction is one of the most powerful ways to improve a code base. Often the right abstraction cannot easily be seen from existing code - they come from the domain.

Someone once told me: "Just looking at Java code, you'd never find monads."

In my team we usually work with the principle of 3

"if it's non obvious wait for 3 cases before abstracting it"

The same applies to dependencies. I've grown very reluctant towards introducing a dependency for the sake of deduplication. I'd rather have isolated modules than a de-duplicated Big Ball of Mud.

It's taken me perhaps longer that t should have to get here, but boy oh boy do I think this is spot on.

False dichotomy. You can avoid code duplications while using the "right" abstractions.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact