I reproduce the relevant part:
Dependencies (coupling) are an important concern to address, but coupling is only one of four criteria that I consider, and it's not the most important one. I try to optimize my code around reducing state, coupling, complexity and code, in that order. I'm willing to add increased coupling if it makes my code more stateless. I'm willing to make it more complex if it reduces coupling. And I'm willing to duplicate code if it makes the code less complex. Only if it doesn't increase state, coupling or complexity do I dedup code.
State > Coupling > Complexity > Duplication. I find that to be a very sensible ordering of concerns to keep in mind when addressing any of those.
>> Existing code exerts a powerful influence. Its very presence argues that it is both correct and necessary.
I read the article in 2016 and that phrase has stuck with me ever since. I had never thought about it before, but it's such a simple, self-evident fact, and yet so easy to miss. It's a powerful concept to know, both when writing and when refactoring code.
If something gets bad enough, I will refactor the whole damn thing, but only at jobs where there are unit tests. If there are no unit tests, this truly becomes an impossible task and it's best not to touch anything you don't need to.
You have to have good tests if you ever want to tackle technical debt.
That is, you find some code. Its presence says "I'm here for a reason", and your answer of "maybe it's just because..." comes as, precisely, an _answer_ to that argument.
Neither the argument itself nor your answer is necessarily and always correct. I'm not arguing that point, just that the presence of a piece of code makes such a statement.
> We know that code represents effort expended, and we are very motivated to preserve the value of this effort. And, unfortunately, the sad truth is that the more complicated and incomprehensible the code, i.e. the deeper the investment in creating it, the more we feel pressure to retain it (the "sunk cost fallacy"). It's as if our unconscious tells us "Goodness, that's so confusing, it must have taken ages to get right. Surely it's really, really important. It would be a sin to let all that effort go to waste."
Advice is nice and all, but at the end of the day even that new method of writing code gets to perform on your manager's profiling table.
Use it, run it against a procedural or functional approach, measure it, decide.
Everything else is politics, religion, and that one irresponsible guy who gets high on new things and somehow gets away with it while touring companies.
"Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
To clarify: when I wrote "procedural abstractions", that included functional.
The assumption that data is something to be maintained internally, at best hidden behind an interface (a procedural one) and at worst "exposed" is so ingrained that it's hard to think of it any other way.
However, why can't we have "data" as the interface? The WWW and REST show that we can, and that it works quite swimmingly. With a REST interface you expose a "data" interface (resources that you can GET, PUT, DELETE, POST), but these resources may be implemented procedurally internally, the client has no way of knowing.
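A minimal sketch of that idea in TypeScript (all names here are illustrative assumptions, not from the comment): the client only ever sees resources and generic verbs, while the server is free to compute representations procedurally behind the scenes.

// The client sees a pure "data" interface: resources addressed by URI.
type Resource = Record<string, unknown>;

interface ResourceStore {
  get(uri: string): Resource | undefined;
  put(uri: string, body: Resource): void;
  delete(uri: string): void;
}

// Hypothetical server: "/reports/today" is computed procedurally on every
// GET, yet a client cannot distinguish it from stored data.
class Server implements ResourceStore {
  private stored = new Map<string, Resource>();

  get(uri: string): Resource | undefined {
    if (uri === "/reports/today") {
      return { generatedAt: new Date().toISOString(), total: this.computeTotal() };
    }
    return this.stored.get(uri);
  }

  put(uri: string, body: Resource): void { this.stored.set(uri, body); }
  delete(uri: string): void { this.stored.delete(uri); }

  private computeTotal(): number {
    let total = 0;
    for (const r of this.stored.values()) total += Number(r["amount"] ?? 0);
    return total;
  }
}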
I've also done my bit, with In-Process REST and Polymorphic Identifiers for the "REST" half, and Constraint Connectors for the "reacting" half.
Not to mention the better expressive power for describing data structures with algebraic data types (just + and * for types really).
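For the unfamiliar, here's a quick sketch of what "+ and * for types" means, using TypeScript unions and records as stand-ins for sum and product types:

// Sum type (+): a value is exactly one of the alternatives.
type Shape =
  | { kind: "circle"; radius: number }               // product (*): kind AND radius
  | { kind: "rect"; width: number; height: number }; // product (*): three fields together

// The compiler can check that every alternative is handled.
function area(s: Shape): number {
  switch (s.kind) {
    case "circle": return Math.PI * s.radius ** 2;
    case "rect":   return s.width * s.height;
  }
}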
If you make global arrays instead you will always have a wonderful idea of what your program's state is, and you can easily use and transform it with simple table iteration.
And yet it says so in the first sentence in the Wikipedia page for functional programming https://en.wikipedia.org/wiki/Functional_programming
>a style of building the structure and elements of computer programs—that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data.
But I'll take it that you don't have much functional programming experience.
Of course one can still go with a big global array and keep updating it in-place. A good programmer can write Fortran (or C in that case) in any language.
And of course a lot of so-called "functional" programs just outsource their state management to some sort of relational database. And the people talking about their creation will praise the state-less-ness of their creation. Unironically.
What can you do? ¯\_(ツ)_/¯
Anyway, more practically, the vast majority of workloads do not have computation as their primary function. They store, load and move data around. Computers, generally, don't compute. Much. For those workloads, a paradigm that tries to eliminate the very thing that the problem domain is about, and can only get it back by jumping through hoops, is arguably not ideal.
This doesn't mean that functional programming eliminates state. Avoiding changing-state and mutable data is different and the Wikipedia article is referring to how functional programming doesn't mutate existing data, so you avoid the stale reference problems that can occur in OO languages.
Instead, the state is the current execution state of the program. Function calls are side-effect free (except when interacting with the external world, which is a special case I'm not covering here). Because of this, the only way data can be communicated from one function to another is by passing it in when calling the function, or by returning it. This means the state is just the data local to the currently executing function and any calling functions (though the data in that part of the state isn't available to the current function, it's still in memory until the calling function returns).
Contrast this with procedural programming languages, where state can also be maintained in global variables, or object oriented languages, where objects maintain their own state with the system state being spread around the whole system.
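A tiny sketch of the contrast being described (TypeScript, purely illustrative):

// Procedural style: state lives in a global and is mutated in place.
// Any function anywhere could have changed it.
let counter = 0;
function incrementGlobal(): void { counter += 1; }
incrementGlobal(); // counter is now 1, but only a whole-program search proves that

// Functional style: the only "state" is the data flowing through calls.
// Nothing outside the call chain can observe or change it.
function increment(n: number): number { return n + 1; }
const after = increment(increment(0)); // 2, and provably so from these two lines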
And please, no beaten-up buzzwords and sales pitches needed.
It was one of the first to emphasize data structures in addition to code.
There is a free PDF online, discussed on HN in 2010.
(from wikipedia) "The Turbo Pascal compiler written by Anders Hejlsberg was largely inspired by the "Tiny Pascal" compiler in Niklaus Wirth's book."
From linus rants: https://www.reddit.com/r/linusrants/comments/8ou2ah/in_fact_...
No, you can't do everything with just data structures. Everyone knows this. A first-year junior programmer knows this. It's obvious. The original question did not talk about this; you misunderstood the level of analysis it was aiming at.
The fact that SQL is not Turing complete is a meaningless truism here, because Linus obviously did not mean that we should all start using SQL instead of C. The point he is making is that data structures are of much bigger importance to get right in order for the program to be good. Not just fast, or just maintainable, or just easy to understand, but all of those things and many others.
Of course one can force anything into a relational database. The data analog of "Turing tarpit".
Ironically graph databases are way better for describing relations than relational databases.
Easily represented as a vertices array and an edges array. It's conventional to index the (directed) edges to optimize iteration over all edges from a given vertex. If you're being "sloppy", you can also represent edges as vector<vector<int>> (one array of destinations per source). This is more convenient but comes with the slight disadvantage that these arrays are physically separated (for example, you'll need two nested loops instead of only one to iterate over all edges).
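For concreteness, a sketch of both representations (TypeScript; the second mirrors the vector<vector<int>> style):

// Flat representation: all directed edges in one array, grouped by source
// vertex, with an index so edges leaving v are dest[start[v] .. start[v+1]-1].
interface Graph {
  start: number[]; // length = vertexCount + 1
  dest: number[];  // length = edgeCount
}

function edgesFrom(g: Graph, v: number): number[] {
  return g.dest.slice(g.start[v], g.start[v + 1]);
}

// "Sloppy" representation: one array of destinations per source vertex.
// More convenient, but iterating over all edges needs two nested loops.
type AdjList = number[][];

function forEachEdge(adj: AdjList, visit: (from: number, to: number) => void): void {
  for (let from = 0; from < adj.length; from++) {
    for (const to of adj[from]) visit(from, to);
  }
}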
Local state here is better than global state, especially if you consider the advice to write shorter functions - if your functions are small, so are the scopes in them, and the local state is easy to trace and memorize. Global state is not bounded, there could be hundreds of constants, enums and global objects to keep in mind.
Immutable data structures are easier to comprehend than mutable ones, because there are fewer points where state is modified. If you take Redux as an example, you still need to know what is in the "immutable store" at any point in order to understand how the code uses it; Redux tries to minimize the pain by limiting changes to the store to actions/reducers and by giving you access to only part of the store (local state vs. global). However, you still need to understand what changes a sequence of actions performs on the store, so that's still state you need to be concerned with.
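A minimal Redux-style reducer sketch to make that concrete (illustrative names, not from the comment): every change to the store must pass through one pure function, so the possible transitions are at least enumerable.

interface CartState { items: string[] }

type Action =
  | { type: "ADD_ITEM"; item: string }
  | { type: "CLEAR" };

// The reducer never mutates the previous state; it returns a new object.
// The old state stays valid, which is what avoids stale-reference bugs.
function cartReducer(state: CartState = { items: [] }, action: Action): CartState {
  switch (action.type) {
    case "ADD_ITEM": return { items: [...state.items, action.item] };
    case "CLEAR":    return { items: [] };
    default:         return state;
  }
}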
We all know the issues that arise out of mutable state: values getting changed for seemingly no reason, race conditions literally sapping every little bit of will to live you had. Mutable state doesn't scale well (at least from a complexity standpoint): now you've gotta worry about locks and all that fun stuff if you try to do any kind of non-trivial concurrent programming.
Now, I'm not saying immutable data structures are literally the silver bullet, but they do almost completely solve all the above-mentioned issues with mutable state. But they, too, have their own issues. Working with immutable structures can be significantly slower, especially as the amount of state grows: any modification means you have to create a new structure and copy data, so you're also going to have a lot more memory being used, and that's not to mention the conceptual differences you have to adjust to if you've never worked with strictly immutable structures before ("what do you mean I can't just update this array index with arr[i] = 2?"). But, in my experience, debugging can be orders of magnitude easier, concurrency is something that is actually enjoyable to work with, and not a chore of mutex checking and hoping some random thread isn't going to fuck up all the data, and given the power of modern computers, the memory bloat that comes along with immutability isn't really an issue anymore.
But, I’m also one of those people that thinks functional programming is the one true path, so I may be a bit biased/misinformed on some crazy mutability benefits that make the bullshit worth it.
But I digress; honestly, I think a nice balance between concepts, using what works best for the task at hand, is the way to go. However, I'm super excited for the future of functional languages. I love Elixir/Erlang; once you get the hang of OTP/actor model/functional programming, it's absolutely mind-blowing how easy it is to write highly concurrent programs that are super reliable and easy to restart from failures ("just let it die" is what they say, right?). Nothing like the headaches I experienced when I first learned concurrency with pthreads in C++. And what's exciting is the Erlang VM is legendary for being able to build highly reliable, super concurrent, complex software; however, one of its biggest dings was that it's far slower than something like C when it comes to raw computations. This is largely because of its functional nature: since its data structures are immutable, any modification results in making new copies of data, while C can just do it right in place. However, now that raw computing power is becoming cheaper and faster, this is becoming much less of an issue. And the Erlang VM has things like clustering servers, to run processes across several computers, built right in. I don't want to imagine what it'd be like to have to set that up with our old friend C out of the box (but C also doesn't have the overhead of things like a VM or a garbage collector, so it's not like it doesn't have a ton of advantages over Erlang; I just wouldn't want to use it to build concurrent software across multiple servers).
Also, my FactoryFactoryProviders are alive and well :)
Think of a function where everything is immutable, but which is instead full of if/switch statements and complicated branching behavior. Even if it is deterministic, it will become intractable for reasoning once it reaches a certain scale.
I don't think you should restrict yourself to thinking only about mutability and immutability in your program; consider the entire system too. If your program is completely self-contained, that's good, but often programs need to integrate with outside services, communicate over the network, write data to disk, etc.
Those dependencies also result in state that might affect the behaviour of your software and you need to consider it when designing and writing code.
a) You can "store" (even if it's in-memory) just the products and their quantities. Then each time you need to display the cart you go and compute the corresponding prices, taxes, discounts, whatever.
b) You can store each cart line, whether it has discount(s) or not, as well as its taxes and the cart's global taxes and discounts and whatever else you can imagine.
Option "b)" is probably more efficient (you are not constantly recomputing stuff) but you will be better off in the long term by going with option "a)":
- Your cart management and your discount/tax computation are less coupled now (the cart doesn't really need to know anything about them)
- You have fewer opportunities for miscalculation because everything is in one "logical flow" (computeDiscounts() / computeTaxes()) instead of being scattered (some stuff is computed when you add an item, or when you remove it, or when you change the quantity, or when you specify your location, etc.). The code will most probably just be simpler with option "a)".
The article argues that you should sacrifice the performance in cases like this. I agree.
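A sketch of option "a)" (hypothetical names; real pricing data would come from a catalog or service):

// The cart stores only what the user chose; prices, taxes and discounts
// are derived on demand and never persisted alongside the cart.
interface CartLine { productId: string; quantity: number }

const unitPrice: Record<string, number> = { apple: 2, book: 15 }; // stand-in catalog
const TAX_RATE = 0.2; // assumed flat rate for the sketch

function computeCart(lines: CartLine[]): { subtotal: number; tax: number; total: number } {
  const subtotal = lines.reduce((sum, l) => sum + unitPrice[l.productId] * l.quantity, 0);
  const tax = subtotal * TAX_RATE;
  return { subtotal, tax, total: subtotal + tax };
}

// Adding or removing items never touches pricing logic; all the money math
// lives in this one logical flow.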
Anyway, I myself wholeheartedly agree with the minimizing-state idea.
On the one hand, there can often be shared lines of code without a shared idea; that shouldn't be a candidate for being factored out into some new abstraction.
On the other hand, you may want to introduce an abstraction and factor out code into a common library / dependency / framework when there's a shared well-defined concern/responsibility.
That said, on the gripping hand, I say "may" because even if there's the opportunity to introduce a clean, future-proof abstraction, introducing it may come at the cost of coupling two things that were not previously coupled. If you've got very good automated test coverage and CI for the new common dependency and its impact on the places it is consumed, then perhaps this is fine. If the new common dependency is consumed by different projects with different pressures for change / different rates of change / different levels of software quality, then factoring out a common idea that introduces coupling may cause more harm than good.
To help weed out requirements, I tell people that on their third copy/paste they might begin to consider reducing the duplication. At that point, they have both had time to think about the code and gained experience with it to discover what the requirements are.
Another problem with bad code reuse is code locality. Just as instruction and memory locality help improve runtime performance via caching, code locality helps improve mental processing. The further you separate related pieces of code, the better an abstraction you need for it so you can correctly reason about the code. Without a good abstraction, you end up jumping between far-apart areas of code to figure out what your function does.
Programmers have had DRY drummed into them so hard that it is almost heretical to even consider the tradeoff of increased coupling that arises from it. Coupling is good if things should change together because they are linked in some fundamental way such that it would be wrong for them to be different. Coupling is bad when things should evolve independently and the shared link is incidental.
The problem is that it is surprisingly hard to tell the difference up front. In the moment of writing the code, the evidence for the shared abstraction seems overwhelming, but the evidence of the cost of coupling is completely absent. It exists only in a hypothetical future. Unless there is strong evidence for a shared underlying conceptual link, I often consider only the 3rd repetition of a shared piece of code evidence for the existence of a true abstraction. Two times could just be chance, three is unlikely to be so.
This is actually representative of a problem in the industry as a whole, I think. A lot of things have short-term benefits but long-term drawbacks. Because of the drastic recent growth, orgs are bottom-heavy (very few people have experienced the long-term drawbacks of X compared to how many have just learnt X). Additionally, because of the extremely quick turnover of people, it's rare that the people who implement X are still there when X blows up in people's faces. They went on to implement Y... and will be gone before Y blows up.
So most tools, libraries, frameworks and abstractions are HEAVILY optimized for the short term. Optimized for getting a project set up quickly. Optimized for the initial "Hello world". Optimized to get an API and a form up in seconds. Very few tools/patterns are optimized for ease of long-term (hell... these days long term means a year or two) maintenance. The ones that are generally get a bad rep.
And building stuff that's both good short AND long term is very, very hard.
A little copying is better than a little dependency.
It's usually the intermediate developers who like having rules to follow to know that they're doing well that tend to over-apply DRY and other principles. Only experience (aka pain over time) seems to show when to break (or just not apply) the rules.
Perhaps it's just the way things are taught/learned. Instead of just showing what's good and having it interpreted as rules, each principle should be shown as a rule of thumb with a concrete example of when it should not be applied. Even if they don't clearly understand the difference at the time, they'll always recall that there are exceptions and not feel so motivated to apply it in every instance.
The real problem is when engineers abhor duplication, and in order to reuse existing code, they simply call the same code from multiple places without thinking through the basis for this reuse.
Mindless deduplication is not an act of abstraction at all! This is a very important point, because a "wrong" abstraction that is conceptually sound is not that hard to evolve, and if the code is called from N places then you get to look at those places to understand how to evolve the abstraction. Improvements to one part of the code benefit N parts, and you save work.
The only other factor to keep in mind is the dependency graph and coupling, as my parent mentions.
Mindless deduplication is more common than you'd think, especially with bits of code like utility functions and React components. For example, you end up with a component called Button that has no defined scope of responsibility; it's just used in one way or another in 17 different places that render something that looks or acts sort of like a button. This is not the "wrong abstraction," it is code reuse without abstraction.
An abstraction can be seen as a compression process, mapping multiple different pieces of constituent data to a single piece of abstract data.
There are wrong abstractions.
Quoting the start of the article: Abstraction in its main sense is a conceptual process where general rules and concepts are derived from the usage and classification of specific examples, literal ("real" or "concrete") signifiers, first principles, or other methods.
For the Button case, you'd have to come up with some concept of what a Button is and does, beyond what code lives in its file (e.g. an onClick handler that calls handleClick, etc.) in order to have an abstraction.
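Something like this, perhaps (a sketch; every name here is made up): the abstraction is the narrow contract, not the file.

// A Button is: a label, a click action, one of a closed set of variants.
// Anything that can't fit this contract isn't a Button and should be its
// own component, rather than another flag bolted onto this one.
interface ButtonProps {
  label: string;
  onClick: () => void;
  variant?: "primary" | "secondary";
  disabled?: boolean;
}

function renderButton(props: ButtonProps): string {
  const { label, variant = "primary", disabled = false } = props;
  // onClick would be wired up by the UI framework; rendering is stubbed here.
  return `<button class="${variant}"${disabled ? " disabled" : ""}>${label}</button>`;
}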
There are "wrong" abstractions (in the sense of abstractions that turn out to need to be changed later, like any code), but if you lump all deduplication into abstraction then you will have a skewed sense of the cost of changing an abstraction.
The cost of changing an abstraction also depends on your programming language; if you spend a lot of time in a dynamically-typed language, you may internalize that refactoring is tedious and error-prone and often intractable.
Granted we were all n00bs and nobody will see that code again so it wasn't a big deal... but the intent, direction, and possible future of the code seems like something that should be considered once you start sharing.
Yes. Dare I say, intent is one of the most important things here. Two new pieces of code may be structurally identical, and yet represent completely different concepts. Such code should not be DRYed. Not just because there's a high chance you'll have to split the abstraction out later on - but also because any time two operations use a common piece of code, there's an implicit meaning conveyed here, that the common piece of code represents a concept that's shared by the two operations.
When systems need to use newer functionality, port them to the new components.
I've had a mixed experience with this, but at some point you get the API right, and then it works.
In the end, almost every conceptual way to slice up software can be viable if you are good at whatever you do, and terrible if not.
I apply this to DRY, coupling, encapsulation, APIs, etc. Also, I prefer to focus on consistency and readability over most other concerns. I mentally, or physically!, note areas of code I want to improve but where the time doesn't feel right just now. During future work that touches that code, I will refactor it if a solution has presented itself.
These days I prefer languages with bomber language services and tooling to make refactoring in the future as painless as possible (types). I prefer explicit over implicit (sorry, Ruby and Chef), and configuration over convention (looking at you, Gradle).
I think a good test for this is if you can write a reasonable unit test for the code in question. If your unit tests essentially become two separate sets of tests, testing the different branches of code, it's probably the wrong abstraction. If your tests work and you've built a reasonable standalone library (even if it's not useful to anything but your exact product), that's at least a signal the abstraction is sustainable.
Now we simply need a formal objective definition of "reasonable code" and the industry should never have this problem again.
A red flag is vague function names like "processThing" or "afterThingHappens". If a function can't be summarized concisely, it's probably doing too many things, and the abstraction is likely to break down later when the callers' needs diverge.
As a senior engineer who recently became an engineering manager I always caution my devs about abstracting too liberally. Junior engineers are particularly bad about this. They see a handful of functions that are duplicated across a few (unrelated) projects and they want to create a new repo/library and share it. Then I direct them to the Slack channel for our platform services, which has a sizable shared library across dozens of services. That shared library is a frequent source of problems.
It takes a while, but I usually beat that primal impulse out of them.
Here is what I think you do when you do that.
You just created yet another internal API. Designing, creating and _documenting_ good APIs is _hard_. The most likely result is an undocumented, dodgy, half-finished API that doesn't fully encapsulate the thing it's supposed to deal with. So you end up with code that both uses and bypasses the API you just wrote.
If you do that and later decide that you want to move some functionality from one side of the API to the other you've just set yourself up for a hella lot of work.
The other thing is, when you want to make changes to duplicated code, you can limit the risk to the code base you're actually working on and not a bunch of unrelated programs.
This reminds me of the recent post on HN about a company migrating from microservices back to a monolith, for this exact reason.
Just because something hasn't been shared yet isn't a good justification, in my opinion, if you know it will be, especially if it's a library/API.
I'm all for not repeating myself, but there is a difference between "usually avoid" and "never".
Copying and pasting in many situations would seem a breeze compared to the nest of abstractions required to avoid it.
If I'm writing an API to move a robot, my problem space is fairly bounded, and I know that someday I will want force control at some end effector. I know that there's a 6 axis robot I've been eyeing, etc.
Maybe I'm being downvoted by web devs?
$input_folder = "/some/annoyingly/long/path/my-cool-project/input/";
$output_folder = "/some/annoyingly/long/path/my-cool-project/output/";
$project_folder = "/some/annoyingly/long/path/my-cool-project/";
The problem was that when I developed the script, the actual consumer hadn't been finalized, so that output folder path was just a placeholder. When it came time to deploy, we got the actual path, which is now some NFS thing like
$project_folder = "/whatever";
$input_folder = $project_folder . "/input";
$output_folder = $project_folder . "/output";
Those paths are not the same data repeated twice just because they share common substrings. They are two paths that serve distinct purposes. The developer likely chose that syntax because it looks like a setting that can be changed. It could just as well have been read from a settings file.
$project_folder = "/whatever";
$in = "in";
$out = "out";
$suffix = "put";
$directory_separator = "/";
$input_folder = $project_folder . $directory_separator . $in . $suffix;
$output_folder = $project_folder . $directory_separator . $out . $suffix;
> $in = "in";
> $out = "out";
> $suffix = "put";
This isn't an over-abstraction, it's an over-extraction. Each abstraction should be non-trivial.
> $project_folder = "/whatever";
Most software on my machine has probably arrived from apt-get at some point or another. Even if it's all Perl scripts, I'm not going to overwrite them directly, only to have my changes either removed on update, or blowing up the package manager's consistency checksums, or [insert random reason I can't conceive, because I'm not familiar with internals of apt]. So it's either config files, command line, environment variables, or I'm going to build a wrapper that bends installed software to my will.
> The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise
It will become obvious which duplicated code to abstract when you find yourself changing all/many at the same time or fearing you’ll forget/break something if you don’t change all instances. Writing tests is also a good motivator as it means even more code per duplication (and reduces the fear of breaking something).
It really takes a lot of duplication for this to get out of hand. Wait til it happens; you're a software engineer, you'll figure out how to get rid of duplicated code just fine. Coming up with a great abstraction is extremely difficult before seeing at least a few examples.
Date: Tue, 22 Aug 2000 16:00:52 -0400
From: "Eric S. Raymond" <firstname.lastname@example.org>
To: Linus Torvalds <email@example.com>
Linus Torvalds <firstname.lastname@example.org>:
> On Tue, 22 Aug 2000, Eric S. Raymond wrote:
> > Linus Torvalds <email@example.com>:
> > > But the "common code helps" thing is WRONG. Face it. It can hurt. A lot.
> > > And people shouldn't think it is the God of CS.
> > I think you're mistaken about this.
> I'll give you a rule of thumb, and I can back it up with historical fact.
> You can back up yours with nada.
Yes, if twenty-seven years of engineering experience with complex
software in over fourteen languages and across a dozen operating
systems at every level from kernel out to applications is nada :-).
Now you listen to grandpa for a few minutes. He may be an old fart,
but he was programming when you were in diapers and he's learned a few
https://news.ycombinator.com/item?id=11077799 (2016 discussion)
I now try to find the middle ground by remembering to "do the simple thing" even if it appears less elegant. This makes it easier to refactor in the future (if required) at which point more information will be available to design a more appropriate abstraction than would have been possible before.
So now he’s indispensable, a situation I assiduously avoid (you can’t work on the cool stuff when you have to be there to maintain the old thing).
Many copy & paste scenarios can be avoided without creating any meaningful abstraction. Generally this is best done with a stateless (pure) function that has few if any conditionals and does not involve design patterns such as inheritance, overriding, or even creating new compositions. It should feel easy and boring when you are doing this.
Abstraction is a new representation that calls for deep thought and I agree is easy to mess up.
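A sketch of the boring kind of deduplication being described (names invented for illustration):

// Before: the same normalization copy-pasted into every handler.
// After: one pure, stateless function; no conditionals, no inheritance,
// no design pattern, nothing to get wrong.
function normalizeEmail(raw: string): string {
  return raw.trim().toLowerCase();
}

// Call sites stay easy and boring:
const login = (email: string) => ({ user: normalizeEmail(email) });
const signup = (email: string) => ({ newUser: normalizeEmail(email) });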
The need for DRY also depends on type safety. Type-safe boilerplate generally adds verbosity but few bugs. This article is coming from the Ruby world, where a bug lurks behind every untested line of code: this can create a lot of pressure to write boilerplate-free code, which can end up turning into abstractions. But every code change (without code coverage) in dynamic languages is also an enormous risk. In type-safe languages the compiler can help ensure that the process of removing duplication is correct.
That being said, good thoughts on solving duplication correctly.
In a functional paradigm, even the most trivial of abstractions pays off handsomely. This is exactly what "map" is, for example.
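A trivial sketch of the point:

// "map" abstracts the whole loop-and-collect pattern into one word...
const prices = [100, 250, 40];
const withTax = prices.map((p) => p * 1.2); // [120, 300, 48]

// ...versus the hand-rolled version it replaces:
const withTax2: number[] = [];
for (const p of prices) withTax2.push(p * 1.2);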
I'm pretty sure you're talking about composition in heavyweight, enterprisey terms, but I think that is a pretty fine line.
This can be tricky if the person who wrote the abstraction takes it the wrong way. At my previous company, I've been yelled at for doing this. Some developers get emotional about their code, in which case undoing their abstraction causes offense. How do you get around this?
Most of the time you will either be welcomed for your proper obsequiousness, or find out that there's a jealous guardian of that code, and no matter what you do it won't alleviate that tension anyway.
Being able to evolve 2 pieces of code separately is powerful, and realistically is a more common case than wanting a change to pop up in 2 different places.
Very regularly I'd hear "Code duplication is fine, do not use an abstraction here".
What he meant was "In this case the abstraction might be incorrect. Use abstractions only when they actually relate to business logic, not just because two pieces of code happen to look the same".
Unfortunately, while that was very obvious to him, to developers new to the team (like me) it sounded like "Do NOT use abstractions, they are evil". Over the years I developed a habit of never thinking about abstractions, because they are evil. I duplicated code that should have been abstracted, and today we pay the maintenance cost for that.
tl;dr: Experienced folks, be careful when you caution your peers against abstractions. Be very explicit and assertive that they _CAN_ be used correctly and one shouldn't avoid them.
It's hard to get across that the answer is usually one of the Greys and, even then, the shade will probably vary a little from time to time.
Rules are the children of Principles. They're important handrails as you're learning, but to progress from there you have to understand the Principles behind them and how they confirm or contradict each other.
- It is OK to have the code duplicated twice. Add a comment to track the conscious decision.
- But when I find myself about to do it a 3rd time, it is time to think about whether I can factor it out.
Three use cases tell me more than two about how things can be abstracted and if it makes sense.
If your only priority is removing duplication at all costs, you'll end up with worse code than if you just let code be duplicated.
Not including cross-cutting concerns that modify all usages (e.g., changing your logging or dependency injection library).
Duplication has its own set of dangers, leading down the road to a verbose mess of convoluted crap code in most cases.
Most established code bases make me want to puke before long.
With two copies of the code you can’t be sure if the similarities are factual or coincidental. At three the situation begins to crystallize quite rapidly.
What definitely does make things easier is simply having less code to fix. Although measuring the 'amount' of code is at least somewhat subjective.
I think there is always the problem that programmers want to apply all these principles, including DRY, without actually thinking about it. Once it's applied illogically you are in strange and awful territory.
The nature of software development is change. Extensive use of abstractions can make your code base rigid and averse to change.
I can't find the snippet but it's somewhere in the "DDD is not for perfectionists" vein
- Programmers may look at a bug in a simple shared function and conclude that it "obviously" should be fixed, and do so quickly without really understanding what else could go wrong. (As a completely contrived example: you "fix" something that previously couldn't return a negative value and move on; it turns out this "fix" allows a bug somewhere else to crop up, a catastrophic improper cast from signed to unsigned, blowing up your -1 into an iteration over billions of expensive operations. See the sketch after this list.)
- Bug priority levels vary between features, even if code is shared. Your abstraction may make it effectively impossible to fix just one high-priority feature, if your deployment is (say) set up to run hours or days of regression tests on all affected parts. Generally, the more segregated things actually are, the easier it is to set priorities well.
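The contrived example above, sketched (in TypeScript, ">>> 0" reinterprets a number as an unsigned 32-bit integer, the same trap as C's implicit signed-to-unsigned conversion; all names invented):

// The "obvious fix" lets the shared function return a negative value again.
function countRemaining(): number {
  return -1; // previously this could never be negative
}

// Somewhere else, a caller effectively treats the result as unsigned:
const n = countRemaining() >>> 0; // 4294967295, not -1
for (let i = 0; i < n; i++) {
  // ...an innocent loop now runs ~4.3 billion times
  if (i > 2) break; // (bail out early so the sketch actually terminates)
}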
Just because something is duplicated doesn’t mean that it’s that way forever, either. At a good branch point, such as a new project, you can aggressively prune out things that won’t apply to that branch even if they helped keep things stable on the previous project.
“Generic” functions make it difficult to find all uses or even understand what scenarios they belong to. With bland say-nothing nouns and meaningless verbs like execute() or process() that appear everywhere in the code, you’re just crossing your fingers and hoping for the best.
Definitely a lot harder to fix when you've done this though.
I think "the wrong abstraction" is too lofty a title this is just about oversize functions.
This is not to say that if you roll back the code and then commit to finding and implementing new abstractions you won't win - but the second part is necessary, or you are just digging a deeper hole.
Think of it as a wrapped transaction - you have to do the second part as well.
The case-filled code that she is describing seems to be a result of the programmers not fully grasping the purpose of the code, and being unable to tell whether the current abstraction fits. I understand that deadlines and the sunk cost fallacy play a factor but, at least for me, finding the right abstractions / architecture is the most satisfying part of coding! Shouldn't that be what these programmers are focusing on in the first place?
Ideally, we'd like our code to be:
- Mutable (i.e. easy to modify).
- Good at doing what it's supposed to do.
- Other stuff that I'm forgetting.
The general recommendation against duplicate code is intended to promote mutability (by avoiding multiple implementations that need to be changed). If you apply it blindly without keeping mutability in mind, you can get situations like the one the author describes.
I see some of the same myopia when people talk about testing. Testing is there to ensure that your code is correct, and that it's easy to make changes without affecting correctness. As soon as you find yourself writing tests that aren't for those two reasons, consider whether it's worth the effort.
Just off to prototype this now, should have it done by the end of the day... :)
And at that point, you don't really need to worry about compilers, you just have AGI looking at the code.
What you are proposing requires a technological miracle to implement. That's why it doesn't make sense. When we can do miracles, we will obviously use them in the mentioned and in many other areas. The problem is to create AGI.
Programmer B feels honor-bound to retain the existing abstraction, but since it isn't exactly the same for every case, they alter the code to take a parameter, and then add logic to conditionally do the right thing based on the value of that parameter.
Programmer B's poor decision doesn't mean you should reach for ctrl-v, in my humble opinion. But I'm willing to change my mind, if there's a compelling case.
I find that a particularly good way of learning (at least over plain reading/coding).
I've always found the easiest way to refactor is to get really good code coverage on the outermost layer of code that uses the abstraction, remove/ignore unit tests if there are any on the abstraction itself, then refactor the abstraction with vigor until it feels appropriate and hopefully elegant or is removed if necessary. As long as all your tests still pass, you should be good to go.
There's no point trying to think about abstractions before you know what the problem is.
You don’t yet know what abstraction you need or what extensibility or generalizability you need, and prematurely extending in these directions can either paint you into a corner where you have to do terrible things to avoid throwing things away and rework, or else you have to bite the bullet and do a bunch of rework.
There can be a lot of benefits to just duplicating things like config, occasional pieces of function bodies, classes with modified functionality, or even using whole projects as templates, like quickly getting a collection of inter-related web services going with copy/paste code and factoring out common code later.
Abstraction is bad and is the price you pay for being able to move lots of things around at once. WET (Write Everything Twice) > DRY (Don't Repeat Yourself) because you might be able to grok what the heck you meant at the time. It kills me that my colleague wants to be clever. NO! Clever is bad!
Every single lookup was a copy & paste while loop with business logic inside the loop, and then a break statement.
This is a textbook example of when not to copy and paste.
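To illustrate with invented names: each call site repeated the whole loop, when only the predicate actually varied.

interface Employee { id: number; name: string; active: boolean }

// The copy-pasted shape, repeated at every call site: a while loop with
// business logic buried inside it, ending in a break/return.
function firstActiveCopyPasted(list: Employee[]): Employee | undefined {
  let i = 0;
  while (i < list.length) {
    if (list[i].active) return list[i];
    i++;
  }
  return undefined;
}

// Deduplicated: the iteration is one generic helper (or just Array.find),
// and only the predicate - the real business logic - varies per call site.
const firstActive = (list: Employee[]) => list.find((e) => e.active);
const byId = (list: Employee[], id: number) => list.find((e) => e.id === id);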
In his words: we are solving problems, we are resolving paradoxes.
You only find yourself at step 8 after a suite of bad decisions, and possibly even bad decisions that you signed off on during code review.
It's much more likely that the code in question is responsible for one particular thing, and switches between several different ways to achieve some sub-goal. Those parts should be lifted into some kind of interface, with the different variations lifted into different implementations of that interface. A lambda function is the most general interface possible, so it's probably not the best choice; you'd eventually end up with callbacks calling each other without it being entirely clear which callback does what.
A typed lambda function (i.e., one with a defined arguments-and-return signature) is exactly as specific as any other typed interface. An arbitrary lambda function isn't, sure, but there are few languages where a static interface and an arbitrary lambda function are both available tools.
This is what lambdas are for. Going to interfaces just adds needless syntax sugar.
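Both routes, sketched in TypeScript (illustrative names; in a typed language the function type is exactly as constrained as a one-method interface, which is the crux of this sub-thread):

// Interface route: each variation is a named implementation, so call sites
// say *which* strategy they use.
interface LineEndingPolicy {
  normalize(text: string): string;
}

const unixPolicy: LineEndingPolicy = { normalize: (t) => t };
const windowsPolicy: LineEndingPolicy = { normalize: (t) => t.replace(/\r\n/g, "\n") };

// Lambda route: identical mechanics, but the variation is anonymous.
type Normalize = (text: string) => string;
const normalizeWindows: Normalize = (t) => t.replace(/\r\n/g, "\n");

function lineCount(text: string, policy: LineEndingPolicy): number {
  return policy.normalize(text).split("\n").length;
}

lineCount("a\r\nb", windowsPolicy); // 2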
int countlines(file afile, bool hasfuckeduplineendings, bool needtobusywaituntilreadswillsucceed)
My preferred solution, and I don't claim that this is correct, is just to put in a global variable:
bool nexthasfuckeduplineendings = false; //set to true before counting lines in a file that needs preprocessing for fucked up line endings
bool needtobusywaituntilreadswillsucceed = false; // this is a hack. Certain specific files will just fail to read for an unspecified period of time, they will fail and fail and fail and then succeed. For cases that we know this will happen, set this to true.
See how awful and fucked up this is?
It is "obvious" that this hack is just so wrong.
But is it really? It's clear, gets shit done, and is super transparent about how wrong it is.
Should every reader hang in a busy loop?
Should every reader preprocess line endings?
Maybe "no" and "no".
What do you all think?
This is what OP recommends too.
This is because, while the namespace is wide, in practice you work within a "working set" of your daily-use packages.