Amateur programmers tend to put code de-duplication at the top of their priority list and will burn the whole house down to that often-trivial end.
This writer is pointing out that there are other concerns that far, far trump duplicated code -- and she's right. However, she doesn't elaborate enough on what a "wrong abstraction" actually is. We can be more precise.
The real offense when we factor out duplicated code is the new dependency that is added to the system. And this is what amateurs don't understand. Every dependency you bring into your code architecture costs you and should be judiciously introduced. De-duplication of code ALONE is rarely a strong enough reason to add a dependency.
If you want to be a professional programmer, one of the most important things to acquire is a distaste for dependencies. Every dependency you add should be carefully considered (and lamented as a reluctant necessity if you decide to introduce it). As a 20-year veteran of this industry who has worked on myriad code bases, I will always prefer a code base with duplicated code and fewer dependencies over the other way around.
So back to the "wrong abstraction". When we compose systems, we are looking for the fewest possible dependencies, and stable ones. What I think the writer means by "the wrong abstraction" is a "volatile dependency".
I'm trying to be precise here because a common reaction to terms like "the wrong abstraction" is that right/wrong is all subjective. The truth of the matter is that it's not subjective at all -- the higher-quality system is the one with optimally few, stable dependencies, and these are measurable qualities.
The reason I put stateless code as the highest priority is it's the easiest to reason about. Stateless logic functions the same whether run normally, in parallel or distributed. It's the easiest to test, since it requires very little setup code. And it's the easiest to scale up, since you just run another copy of it. Once you introduce state, your life gets significantly harder.
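As a minimal sketch of the difference (names hypothetical, in TypeScript):

    // Stateless: the output depends only on the inputs. It behaves identically
    // whether run once, in parallel, or on another machine, and needs no setup to test.
    function applyDiscount(price: number, rate: number): number {
      return price * (1 - rate);
    }

    // Stateful: the result depends on hidden mutable state, so call order,
    // concurrency, and test setup all start to matter.
    let runningTotal = 0;
    function addToTotal(price: number): number {
      runningTotal += price;
      return runningTotal;
    }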
I think the reason that novice programmers optimize around code reduction is that it's the easiest of the 4 to spot. The other 3 are much more subtle and subjective and so will require greater experience to spot. But learning those priorities, in that order, has made me a significantly better developer.
I like statelessness as a top priority. However I'm not sure how statelessness ever comes into tension w/ coupling. Aren't they mostly orthogonal concerns?
> I'm willing to make it more complex if it reduces coupling.
Complexity = f(Coupling), in my definition. So an increase in coupling results in an increase of complexity. Sounds like you have a different definition of complexity -- I'd love to hear it.
As for complexity, there are many different types. Coupling is a form of complexity, but it's not the only one. Cyclomatic complexity is another. Using regular expressions often increases the complexity of code. And one need only look at the spec for any reasonably popular hash function to see a completely different sort of complexity that's not the result of either coupling or unique paths through code. The composite of all the different forms of complexity is how I'd define it, since they all add to a developer's cognitive load.
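To make that concrete with a hypothetical sketch: the function below has zero coupling to anything else, yet its branches alone give it a cyclomatic complexity of 4 -- complexity without any coupling.

    // Zero coupling to anything external, yet four independent paths
    // to reason about and test (cyclomatic complexity = 4).
    function shippingCost(weightKg: number, express: boolean): number {
      if (weightKg <= 0) throw new Error("invalid weight");
      if (express) return weightKg * 2.5;
      if (weightKg > 20) return weightKg * 1.5;
      return weightKg;
    }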
> "If there's one lesson we've learned from 30+ years of concurrent programming, it is: just don't share state. It's like two drunkards trying to share a beer. It doesn't matter if they're good buddies. Sooner or later, they're going to get into a fight. And the more drunkards you add to the table, the more they fight each other over the beer."
> "Code that wants to scale without limit does it like the Internet does, by sending messages and sharing nothing except a common contempt for broken programming metaphors."
It seems like you are speaking of the first reason. There is no dependency, and the programmer is creating one. IMHO you should have at least 3 instances before creating an abstraction to reduce your code.
The second reason is different though. By creating the abstraction you are not adding a dependency, you are making an implicit dependency explicit. There is a huge difference. In this instance any duplicate code is bad.
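A hypothetical sketch of this second case: two pieces of code that must agree on a value already depend on each other, whether or not the code says so.

    // Implicit dependency: both functions must use the same tax rate,
    // but nothing in the code says so -- they can silently drift apart.
    function invoiceTotal(subtotal: number): number {
      return subtotal * 1.08;
    }
    function refundAmount(subtotal: number): number {
      return subtotal * 1.08;
    }

    // Explicit dependency: the shared fact now has one authoritative home.
    const SALES_TAX_MULTIPLIER = 1.08;
    function invoiceTotalExplicit(subtotal: number): number {
      return subtotal * SALES_TAX_MULTIPLIER;
    }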
and explains this wonderfully.
I'd also add - wait until the code is 'stable', i.e. no longer under active architectural development, connected only to other stable parts (or with stable/authoritative interfaces), and having then existed in such a state for a continued period of varied usage. Then refactor.
Rails itself is an example of an abstraction that reduces complexity for a while and then adds complexity when you reach a certain size. So it was the right abstraction at first, and then the requirements change and it slips over to the wrong abstraction. Here is an insightful comment about why that is both inevitable and doesn't matter: https://news.ycombinator.com/item?id=11028885
Otherwise we're all saying "complex" but not being clear and likely meaning different things.
For example, a lot of people believe that "not easy" = "complex" but as Rich Hickey articulates that's a counterproductive way to think of complexity. (See http://www.infoq.com/presentations/Simple-Made-Easy)
Avoiding this problem has been captured, among others, by the Stable Dependencies Principle (http://c2.com/cgi/wiki?StableDependenciesPrinciple), which states that dependencies should point in the direction of stability. A related one is the Stable Abstractions Principle (http://c2.com/cgi/wiki?StableAbstractionsPrinciple), which states that components should be as abstract as they are stable.
While I don't think it's what you're arguing for, I'd fear some would use an argument like this to defend a workflow and culture where new features are started by copy and pasting a mass of code, and then going in and tweaking little things here and there. Then when something has to change across the system, there are endless traps because things you need to change and maintain, that look identical, aren't quite. These workflows and cultures exist.
There's a balance somewhere and finding that is the hard part, right?
In some sense this is academic, but in a very real sense disdain for dependency is something I worry can prevent a project from going through an important high-energy transitory period where semi-formed dependencies exist to solve concrete tasks but have not yet annealed into a final, low-energy form.
In small scopes a professional knows how to skip by this risk and get straight to the better abstraction. In large scopes whole projects must (often) pass through high-energy areas. Professionalism thus demands that these high-energy zones be identified and reduced.
But being too eager to avoid dependency might inhibit growth.
Whether all of this is academic or not, I don't know. But what I do know is that these ideas and their practical implications have an ENORMOUS impact on the practitioner's and business's productivity.
> in a very real sense disdain for dependency is something I worry can prevent a project from going through an important high-energy transitory period where semi-formed dependencies exist to solve concrete tasks but have not yet annealed into a final, low-energy form.
We must always riff and hack and creatively explore our domains in code -- this is another practice of the software professional, and notions of "architectural soundness" and "dependency reduction" should never paralyze us from creative play, sculpting and exploration. In these modes of development it's best we "turn off" all the rules and let ourselves fly.
But for a code base that has to survive longer than 6 months and that will have more than one collaborator -- this is where it becomes essential to maintain architectural soundness in the shared branch. (My development branches, on the other hand, are in wild violation of all sorts of "rules" -- so there is a difference between what gets promoted to production code and all the exploratory stuff we should also be doing.)
Let's say you are unsure of the correct UI framework just yet. React, Knockout or Angular? React Native? Maybe you don't yet know which database best suits your usage and scaling needs. Should you avoid committing to those dependencies? For how long? Doesn't this slow you down?
A good way to approach this is to isolate the dependencies so that you don't have to commit to the actual implementations (React, Mongo, PostgreSQL, Angular, ZeroMQ, whatever it is you need) early. Of course you start the work with some set of frameworks and libraries, but in such a way that no other part of the system (the parts that do all the important stuff that is unique to your application) knows of the implementations. This way, if the need arises, changing the implementation details will not be expensive.
Isolating the implementation behind an abstraction sometimes introduces boilerplate and duplication, but as the article mentions, the dependency is usually more costly.
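A rough sketch of what that isolation can look like (all names hypothetical):

    // The application depends only on this interface, never on a
    // concrete database.
    interface User { id: string; name: string; }

    interface UserStore {
      findById(id: string): Promise<User | undefined>;
      save(user: User): Promise<void>;
    }

    // One small module knows the concrete implementation; it can be
    // replaced without rippling through the system.
    class InMemoryUserStore implements UserStore {
      private users = new Map<string, User>();
      async findById(id: string): Promise<User | undefined> {
        return this.users.get(id);
      }
      async save(user: User): Promise<void> {
        this.users.set(user.id, user);
      }
    }

    // Application logic receives the store; it never constructs one.
    async function renameUser(store: UserStore, id: string, name: string) {
      const user = await store.findById(id);
      if (user) await store.save({ ...user, name });
    }

Swapping the in-memory store for Mongo or PostgreSQL later means writing one new adapter class; the application logic never changes.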
Your own UI code should be limited to displaying the data with the help of the framework. This minimal UI code should depend on the application code and that should be the only dependency between the two.
There are simple techniques to keep view logic and application logic decoupled, for example by introducing separate data structures prepared and optimized for view components to consume. An added bonus of this is testability of both application and UI logic without opening a browser.
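For example, a hypothetical sketch of such a view-specific structure:

    // Application-side domain object.
    interface Order { id: string; totalCents: number; placedAt: Date; }

    // View-side structure, shaped exactly for display.
    interface OrderViewModel { id: string; totalLabel: string; dateLabel: string; }

    // A pure mapping function: testable without a browser or UI framework.
    function toOrderViewModel(order: Order): OrderViewModel {
      return {
        id: order.id,
        totalLabel: `$${(order.totalCents / 100).toFixed(2)}`,
        dateLabel: order.placedAt.toISOString().slice(0, 10),
      };
    }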
This depends entirely upon the context. Over-duplication and tight coupling are universally bad, but the trade-off between them is often a matter of opinion.
In practice I've found that once you've hit the point where you're trying to trade duplication + tight coupling off against one another the code is already in a decent enough state that there's other more important work to be done.
> Amateur programmers tend to put code de-duplication at the top of their priority list and will burn the whole house down to that often-trivial end.
To be fair, lots of how-to-program books spend a lot of time on teaching you how to abstract, and sing abstraction's praises to the heavens, as it were. It takes some experience to learn that real-world stuff is not like the toy programs in books.
Amateurs aren't paid, professionals are.
Novices are inexperienced; journeymen and masters are more skilled and experienced.
A novice can have a ton of knowledge (from books), but be too inexperienced to apply it.
A novice can be a professional, this is what internships and entry-level jobs are supposed to be for. Paired with mentorship and structured work assignments (structured in the sense of increasing complexity, scope, and responsibility) they're brought up to journeyman and, later, master level.
They can also be amateurs. Given forums, books, manuals, mentors (real-life or online), they can be brought up to journeyman and master level as well.
You're noticing a common root, but not the meaning of the word in the modern day.
Decimate means to destroy 1 in 10 of something (like an opposing army). But today we use the word to mean destruction of a large percentage.
I suppose an argument can be made that modern use of amateur is more akin to what used to be novice. However, I'd have a hard time accepting that except when it's used as a slur. We talk about amateurs in many fields, but don't intend to dismiss them as unskilled or inexperienced; we're classifying them as non-professionals. In a forum like this, filled with amateur programmers, it seems to me that it's wrong to misuse the term in this manner when a large portion of the readers here are amateur programmers of moderate to high skill level.
But our profession's training material, at least in my experience reading, drills it in your head to use all these abstracting devices.
It's certainly true that you can get some "book knowledge" that tells you that you can over-abstract. I mean, this blog post is one example. But I only hear this sort of stuff from things like blog posts from experienced devs, it seems to me. (Or maybe I just read the wrong kind of books?)
A slight tangent on that note, but I think many of the problems with current web development result from the same root cause: adding yet another dependency to solve an almost trivial problem. Sometimes going to the extreme for the sake of saving a few keystrokes. Need one function to find an item in a collection? Reference Lodash. And then drop references all over the place.
Yes, some dependencies are useful, but they all need to be handled with care. Wrap that search operation in an internal utility function and inject Lodash as its implementation if you don't want to reinvent the wheel. This is what DI is for (and nothing more).
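Something like this hypothetical seam, for instance:

    // utils/collections.ts -- the only file that knows Lodash exists.
    import { find } from "lodash";

    export function findItem<T>(
      items: T[],
      predicate: (item: T) => boolean
    ): T | undefined {
      return find(items, predicate);
    }

    // Everything else calls findItem(); replacing Lodash with the native
    // Array.prototype.find later is a one-file change.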
Maybe the feedback of exploding compile times as a result of a complex dependency graph would make people more sensitive to the issue. But then again, that did not prevent it from happening either. Oh well, it's not my codebase (yet).
In my experience, this is one of the greatest things about the new prevalence of tools like Babel; it pushes that abstraction layer down below in-code dependencies. There's still a dependency to manage, but it's not a library import or similar.
A lot of things in there would need a library for cross-browser compatibility still. That's my point, but I take yours as well. It's still a very BYOL (bring your own library) kind of a language!
When they eventually diverge, because they were different operations or structures, so they won't evolve the same... This is where you end up with functions that take 10 different option parameters, all of which activate a different code path. Or structures with completely different fields set depending on where it came from. And guess what? Now you're back to basically one code path or structure per use case, and an extra dependency that didn't exist before and is a nightmare to maintain without breaking one of those existing use cases.
Or you could abstract the commonalities out into a higher-order function.
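A hypothetical sketch of that: the shared skeleton takes the varying step as a function parameter instead of boolean flags.

    interface Order { id: string; totalCents: number; }

    // The shared skeleton: common validation and logging stay in one place,
    // and the step that varies is passed in as a function.
    function withOrderProcessing(
      order: Order,
      applyChange: (order: Order) => Order
    ): Order {
      if (order.totalCents < 0) throw new Error("invalid order");
      const updated = applyChange(order);
      console.log(`processed order ${updated.id}`);
      return updated;
    }

    // Each use case supplies only what actually differs -- no flag parameters.
    const order: Order = { id: "o-1", totalCents: 1000 };
    const charged = withOrderProcessing(order, (o) => ({ ...o, totalCents: o.totalCents + 500 }));
    const refunded = withOrderProcessing(order, (o) => ({ ...o, totalCents: 0 }));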
IMO the danger is greater with other languages that don't have the library support that Java does. That Ruby Gem you use today may get abandoned in 6 months and become a piece of technical debt you have to deal with later down the line. It's always a balancing act.
There's also nothing inherently wrong with duplicated code. It can make a code base harder to maintain, but so can over-abstraction. But I agree with you and the author; most projects are better off when developers err on the side of verbosity.
This should further articulate why "right/wrong abstraction" is not the useful nomenclature. I bet you I could find several programmers who could write a "better"/"more good" OAuth library, but that's not what is of priority here. What is most important is this objective quality: is the dependency (its interface) stable? If it is, then my code architecture remains sturdy.
Indeed, dependencies have nearly zero immediate cost, but in the long run they are expensive and need to be weighed against the gain from their use.
I think it's more often likely that the next person comes along and lazily forces the most brittle, minimal change possible to make their new requirement work without thinking about the larger context. Even if that means putting a complicated conditional into the existing function, while leaving the existing function name intact.
This isn't about honor. It's simply bad programmer behavior. It might be due to lack of skill or experience, or maybe laziness, or lack of discipline, or some other negative attribute.
Management just needs this one little feature tweaked, so you dive in to where you think that needs to go, make the new thing happen while absolutely changing as little as possible because god help us if anything breaks.
Now imagine that you're not the second or third person to be responsible for this, but maybe the tenth or so, and it's been going on for decades. And the last person to work on it was really new to this language and didn't understand its idioms very well, or vice versa, and you don't understand the language very well.
I greatly enjoy these conversations about code quality and maintainability, but I've encountered it more than once where business constraints don't allow you to even start to think about these issues. You have no choice but to respect the abstraction you inherit because your job is to get in, get out, and pray you didn't break some edge case that no one would ever think of.
Or maybe I'm just an amateur/novice.
Just adding your small change without touching the rest of the application is usually the easiest way, though, so most people just do that instead.
The best rule I have heard is to always leave any code you touch in a better state than you found it.
The problem with web applications is that they are not self-contained. You don't necessarily know who or what is calling what or how for some legacy applications.
I can agree with the concept of improving code as you see it, but changing the abstraction--the topic of conversation here--cannot be done willy-nilly for any non-trivial app. You have to dig up all the people who are, or were, or will be counting on that abstraction, and that's a challenging thing to do sometimes.
And then you have to provide a business reason for you to allocate the time necessary to go read those 100k lines of code and understand them or write tests for them. And you tell your boss this is a mess and needs serious work, and he says, "Dude, this isn't rocket science, and we're going to burn this whole thing down in a few years anyway, and all I need you to do is add this one button in this edge case and don't break our standalone desktop app that isn't going to get updated to understand this functionality."
Your idealism is admirable, and I don't disagree with it. But the reality of the world is that if you work for a company whose primary product is not software (hell, even sometimes when you do!) the priorities are about the business functionality, not about the state of the code or the right abstraction.
I just think it's a little disingenuous for people to have these conversations in a vacuum and attribute anything from malice to incompetence to a person who takes a different approach.
I certainly hope that the discussion is not only about making the lives of the programmers easier.
I think you might be committing the fundamental attribution error. (https://en.wikipedia.org/wiki/Fundamental_attribution_error)
>the tendency for people to place an undue emphasis on internal characteristics (personality) to explain someone else's behavior in a given situation rather than considering the situation's external factors.
Rather than considering external issues like time crunch, management pressure, too high a workload, or anything else, it's much easier to just assume everyone else is a lazy and shitty programmer.
Time pressure may also be an issue.
Or, what if the system is only for prototyping, and later turns out to be needed for production (a management decision)?
When programmer A introduces an abstraction, it's not as if they are saying "this abstraction is the one true way to represent the domain." The abstraction is a tool. It helps cover some duplication, and ideally express intent. But the tool needs to be updated when it's not useful.
Management: "Change those two lines of code."
Math is, at its core, the study of abstraction. Things that have been found to be good abstractions in math probably are good abstractions in programming.
Learn from history. Mathematically-derived abstractions probably are used because they are the right abstractions.
Perhaps importing abstractions straight from mathematics isn't the only answer. We might do better if we simply added some type and law restrictions to our abstractions as well as clearly stated assumptions. That way everyone would know when they don't apply anymore.
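A hypothetical sketch of what stating types and laws might look like:

    // A Monoid must satisfy, for all a, b, c:
    //   associativity: combine(a, combine(b, c)) equals combine(combine(a, b), c)
    //   identity:      combine(a, empty) equals a, and combine(empty, a) equals a
    interface Monoid<T> {
      readonly empty: T;
      combine(a: T, b: T): T;
    }

    // An instance anyone can check against the stated laws.
    const sum: Monoid<number> = { empty: 0, combine: (a, b) => a + b };

    // A spot-check of the identity law for a given value.
    function obeysIdentity<T>(m: Monoid<T>, a: T): boolean {
      return m.combine(a, m.empty) === a && m.combine(m.empty, a) === a;
    }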
The object of mathematics is to produce abstractions that make proofs possible or trivial. Sets, fields, groups, categories, functions, integers, reals, complex numbers, quaternions - these aren't notational conveniences introduced just to make things easier to talk about. They're important because if you have something with the properties of a group, you know some powerful truths about it for free. If you change anything about the definition of a group, the set of truths you know changes. People have been grinding on what the definition of a set should be for a century, trying to build the best possible abstraction, and they have all of the same problems coders have. You assume too much; it's not very general. Don't assume enough; there's nothing interesting to say that's universally true.
So I would argue that characterizing math as the study of abstraction is largely fair. You "see" a result is probably true some of the time, and then you try to find out just how general you can make that statement.
> People have been grinding on what the definition of a set should be for a century, trying to build the best possible abstraction, and they have all of the same problems coders have. You assume too much; it's not very general. Don't assume enough; there's nothing interesting to say that's universally true.
This is a great point.
Such is this discussion, imo. It's very possible to over-abstract. If something isn't extended, maybe it shouldn't be designed to be extendable, and if it isn't configured, maybe it shouldn't be configurable.
On the other hand, abstractions are the heart of programming, and finding a good one is what can make you much more productive and maybe even have more fun. The feeling when I've abstracted something and got more out than I put in is maybe the best feeling there is in this craft.
The difficulty is in finding the balance. I also dislike the rule of 3 as a hard general rule. There are many times when it makes sense to abstract something out of 2 uses, if there is a lot of code and clear separation and commonality involved, or you anticipate more uses later. Maybe even abstracting sub-problems out of one case makes sense, if separating the problem cleanly into two parts makes it easier to reason about. There are many times when it's a bad idea too; it depends on the specifics and it's a judgement call based on experience.
(And, I would even say, there is room for personal taste in programming about this; where the sweet spot lies may vary from person to person.)
The fact that abstractions can often add complexity rather than remove it should be taught in schools.
The problem with distinguishing between the two cases is that it often requires business domain knowledge. And programmers, sadly, rarely care about the domain logic.
Oh, but we do. It's the one thing we want to know but we pretty much never get from our customers. No surprises there - most people are not equipped to express their "domain logic" in a way that is useful for automating it; quite often they don't even understand it. Humans can hand-wave their way through anything, relying on intuition, patterns and methodologies. They don't need to understand what they're doing deep enough to express it as code.
And so what usually happens is that your client/manager gives you a nonsense spec saying "this thing should do that when clicked, and that thing should do something else". And, as a programmer, you then have to reverse-engineer the domain abstractions from it; abstractions the people giving you the spec probably don't understand themselves.
It might not be that they don't understand the domain "deeply enough." Maybe they understand it in the way that is relevant for their work, which typically doesn't require formalization.
Making sure this collaboration actually happens is one of the big topics of Domain-Driven Design. We can't just expect to be given a correct model; teasing that out is part of our job. If the spec is nonsense, we should say that and try to fix the process.
Also, you often hear folks saying things like "I am a Java programmer." Stuff like "I am a telco programmer" is much more rare.
He must have been practicing Deadline Driven Development.
or The Team Has A Hammer Development
It does not matter much which was the specific mistake, really: a bad choice of an abstraction or something else.
Good abstractions help immensely; programming is all about them. Let's not forget it, too.
> "The room fills with cold, conditioned air; outside the heat hazes, filtered through greened glass windows: a new building hardly first populated. The speaker is wild-eyed, explaining new ideas like a Bible thumper. His hair is a flat-top; his mouth frowns in near grimace. He strides to my seat, looks down and says in a Texas drawl, 'and the key is simply this: Abstractions. New and better abstractions. With them we can solve all our programming problems.'"
A little duplication is fine until you figure out what the real problem is. The problem is usually in your data. Maybe it's not structured properly or you need to simplify the steps to transform it at an earlier stage.
A good specification will go a long way to reducing the desire to introduce hapless "abstractions." An abstraction, in the mathematical sense, will hold over the domain and introducing a new one merely allows you to manipulate objects on the lower level using new algebras, predicates, etc.
Code abstractions are often the leakiest abstractions. Especially the kind the author is talking about. Avoid them. Even if you have a little bit of duplication. Only start worrying about that duplication when it starts spanning compilation units/modules/whatever and is actually causing problems. Then look at the data and figure out how to structure it so you don't need that code all over the place.
Similarly, once a codebase uses several very wrong abstractions, it becomes significantly more confusing to work on, exponentially increasing cost.
The temptation to use a mature library as a dependency is very strong since time is initially saved. When that library introduces the wrong abstraction, the consequences can be severe.
I typically argue for (at least) creating the correct abstraction and having it wrap the mature library to constrain or properly name its behavior, and to make it less strongly coupled to the rest of the system.
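For instance, a hypothetical sketch of such a wrapper (the library API here is a stand-in, not a real package):

    // Our own abstraction, named in our domain's terms.
    interface PaymentGateway {
      charge(accountId: string, amountCents: number): Promise<string>;
    }

    // Stand-in declaration for the mature third-party library (hypothetical API).
    declare const matureLib: {
      transactions: {
        create(opts: { accountId: string; amountCents: number }): Promise<{ id: string }>;
      };
    };

    // The library is confined to this one adapter; if its abstraction turns
    // out to be wrong for us, only this class changes.
    class MatureLibGateway implements PaymentGateway {
      async charge(accountId: string, amountCents: number): Promise<string> {
        const result = await matureLib.transactions.create({ accountId, amountCents });
        return result.id;
      }
    }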
It doesn't just take experience to realize these sorts of things, it takes a willingness to question one's own code and imagine how it might have been done better, or how it might appear to someone who didn't write it.
abstraction != dry
bad abstraction != anti-dry
Abstraction is primarily about separation of concerns, not about avoiding repetition. Drying out code that's repeated all over isn't the same as creating a formal abstraction for some element of the overall logic.
Which is why
>once a codebase uses several very wrong abstractions, it becomes significantly more confusing to work on, exponentially increasing cost.
And drying out code makes it easier to maintain, but it doesn't guarantee that the architecture isn't a mess.
The problem is perhaps that CS teaches algos, and sometimes it teaches design patterns. But there's almost no useful theory of abstraction design.
Design patterns are more or less as good as it gets, and all they do is give you a cookbook of stock formulas to try.
Beyond that, there's no useful way to reason about abstractions, test them for domain fit, or rate them for elegance and efficiency.
If anything, some developers use design patterns as a grab bag when solving a problem... a better approach is to model the solution and then be ready to "back in" to a design pattern upon noticing strong similarity or observing that the design pattern is a bit more abstract way of doing the same thing.
Because of the tendency to pick a pattern first and design for the domain later, many instances of design pattern use in the wild are subtly (or not so subtly) incorrect.
> Beyond that, there's no useful way to reason about abstractions, test them for domain fit, or rate them for elegance and efficiency.
Very true indeed. Looking at a design through the lens of coupling and testability and modularity is a good start and can reveal many problems, but I think the real gotcha has to do with naming: Once we name a concept/abstraction we are likely to reason within that abstraction, and we rarely consider whether we are stretching it too far, or if it is even that thing anymore.
from this blog post: http://blog.thecodewhisperer.com/2013/12/07/putting-an-age-o...
There's a lot of other good stuff there too, when you start digging through the archives.
Generally, I find that I've got to build a prototype first, which is quick and dirty and ugly, but works. Then iterate as the structure emerges. Some things stay in flux, and it's okay to leave them messier, but eventually the code and functionality settles out and commonality arises. Quite often, I don't necessarily know what I'm doing going in; I've got a general goal, but until I experiment a bit, the best way to get there is unclear.
One of the joys and pitfalls of a multi-paradigm language like C# is that there are so many different ways to skin a cat. You can pick and choose procedural, object-oriented, functional, or some bastard mishmash of all of the above and more.
Data-flow analysis was a method that was swept away (somewhat unfairly) on the grounds of being too procedural and not OO. One of its key tenets was that once you had discovered the essential flows and interactions, you should discard the problem-space partitioning that helped you find the graph, and use the graph to partition the solution space (which might well result in the same partitioning, but not always.)
On the other hand: I often see copy-pasted code which has these same characteristics the author wrote about:
"Another additional parameter.
Another new conditional.
Loop until code becomes incomprehensible."
If somebody writes too many conditionals and parameters and creates incomprehensible loops, then he will always do it, no matter whether in abstracted code or in copy-pasted code...
The problem is that most programmers are not skilled enough to write good code.
So sometimes there is a need to refactor wrong abstractions in legacy code, and sometimes there is a need to clean up tons of copy-pasted code and make good abstractions... Either way - maintaining is hard.
Quantity is easier to control than complexity.
Grotesquely replicated code is a much more difficult situation to resolve.
I'm guessing the OP primarily uses untyped languages. I'd offer that that is the core of the problem, not the abstractions.
My point was that the way your language builds abstractions might increase or decrease the problem.
Or to say it more succinctly, duplication is not a problem, triplication is ;)
I have found that by following this simple rule I allow the development to go forward, and can go back to it when I have better knowledge of the system at large. At a later date I might have better luck in choosing the correct abstraction.
I don't think this is very common though. You will more likely see a lot of duplication in poor legacy code, and not because the coders were so clever, but quite the opposite: it's when they barely even understand how to extract abstractions.
Top of the list: constants should probably all go in special structures with the goal of guaranteeing consistency and making them easy to understand at a glance. Everyone might know that charge code "X" means a check payment, but what happens when a new developer looks at that code with "X"s everywhere? It's more verbose to use constants.chargeCodes.CHECK_PAYMENT but nobody will misunderstand what you mean, and your IDE will be able to verify that your codes are valid. That's worth an awful lot of extra characters.
Bonus win: when your legacy backend finally gets upgraded, you have the option to change to a new code for check payments, easy as pie.
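Concretely, something like this sketch (codes hypothetical):

    // All charge codes in one place; call sites read as intent, not magic strings.
    export const constants = {
      chargeCodes: {
        CHECK_PAYMENT: "X",
        CARD_PAYMENT: "C",
        WIRE_TRANSFER: "W",
      },
    } as const;

    // Before: if (charge.code === "X") { ... }
    // After:  if (charge.code === constants.chargeCodes.CHECK_PAYMENT) { ... }
    // When the backend changes its code for check payments, one line changes.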
In my opinion, the best defence against this is good documentation: if two methods have clearly documented behaviours, then even if their implementations have been fused, a subsequent programmer will have more context (and more confidence) reduplicating the code in response to further changes.
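A hypothetical sketch: even with a fused implementation, the documented contracts tell the next programmer these are two behaviours, not one.

    /**
     * Fee for a domestic transfer.
     * Contract: a flat fee, never proportional to the amount.
     */
    function domesticFee(amountCents: number): number {
      return sharedFee(amountCents, false);
    }

    /**
     * Fee for an international transfer.
     * Contract: the flat fee plus a percentage surcharge.
     */
    function internationalFee(amountCents: number): number {
      return sharedFee(amountCents, true);
    }

    // The fused implementation both methods currently delegate to; the doc
    // comments above are what license a future re-split.
    function sharedFee(amountCents: number, international: boolean): number {
      return 100 + (international ? Math.round(amountCents * 0.01) : 0);
    }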
"if it's non obvious wait for 3 cases before abstracting it"