
Not a conclusive example.

In the industry code that isn't DRY is a much bigger problem than code that is too DRY.



Having specialized in project rescue and toured all over "the industry", I can tell you that you can't possibly make that generalization.

For every purported best practice, there are teams/orgs that painted themselves into a corner by getting carried away and others that really would have benefited from applying it more than they did.

In the case of DRY, it's an especially accessible best practice for inexperienced developers and the project leads many of them become. Many, many teams do get carried away, mistaking "these two blocks of code have the same characters in the same sequence" for "these two delicate blocks of code are doing the same thing and will likely continue to do so".

Having advice articles floating around on both sides of practices like this helps developers and teams find the guidance that will get them from where they are to where they need to be.

Context, nuance, etc, etc


In Zion National Park, there's a hike called Angel's Landing. For part of the hike, you go along this ridge, where on one side you have a cliff of 500 feet straight down, and on the other side, you have a cliff of 1000 feet straight down. And in places, the ridge is only a couple of feet wide.

Best practices can be like that. "Here's something to avoid!" "OK, I'll back far away from that." Yeah, but there's another cliff behind you, of the opposite error that is also waiting to wreck your code base.

Listen to best practices. Don't apply them dogmatically, or without good judgment.


If that's what they wanted to prove they should have shown a better example.


That's fair. I think the insight/concept behind the essay is sound, but I agree that the example (and writing) could be a lot better.


DRY should honestly just be restated as "avoid multiple sources of truth". Two pieces of code that look similar but have two separate goals are not something that falls under DRY, unless you can pull out some common logic that actually serves the same goal and can become the single source of truth for that goal.
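
A tiny, invented illustration of that distinction: two functions can be textually identical yet answer to different requirements, while a real single source of truth exists only when one rule drives both call sites. All names here are hypothetical.

```javascript
// Incidental duplication: identical today, but one follows a UI guideline
// and the other a storage limit. Merging them would couple two rules that
// merely happen to coincide right now.
function truncateForPreview(s) { return s.slice(0, 80); }
function truncateForStorage(s) { return s.slice(0, 80); }

// Genuine single source of truth: one business rule, one definition,
// shared by every call site that really means the same thing.
const MAX_NAME_LENGTH = 80;
function clampName(s) { return s.slice(0, MAX_NAME_LENGTH); }
```

If the storage limit later changes to 64 while the preview stays at 80, the first pair diverges harmlessly; a prematurely merged version would have to grow a parameter or split again.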


I have been in the industry for over 10 years now. Whenever I have to work with a project where someone used DRY consciously, I know I am in for a world of pain. Consolidating code is easy; pulling it apart is a lot harder.


> Whenever I have to work with a project where someone used DRY *consciously*, I know I am in for a world of pain.

Huh. When you put it that way, that's actually a good point. In my experience, competent programmers will try to consolidate repeated code, and then cite "because DRY" if asked why, but I can't think of any case where I or anyone else competent started with "needs more DRY" as the original motivation (as opposed to "this is an incomprehensibly verbose mess" or the like).

Conversely, starting with "don't repeat yourself [and don't let anything else repeat itself]" as a design goal does seem to correlate well with cases where someone temporarily (newbie) or permanently (moron/ideologue) incompetent followed that design principle off a cliff.


> Consolidating code is easy, pulling it apart is a lot harder.

I absolutely agree with this, and the only thing I would add is that the difference is even more pronounced in codebases using a dynamic language.

Sure it's not easy to navigate a bowl of duplicated spaghetti, but navigating opaque DRY service classes without explicit types is a nightmare.

Luckily as an industry we've realized the benefits of static typing, but your point still holds true there.


"Consolidating code is easy, pulling it apart is a lot harder."

My experience is the opposite. The less code, the better. I just spent a week on refactoring UI automation test code where they had copied the same 30 lines of code into almost 100 places. Every time with an ID changed and some slightly different formatting. It took me a few days to figure out that these sections do the same thing so I decided to introduce a function with ID as parameter. It was a lot of work to identify all sections and then to make sure they are really equivalent.

Saved us 3000 lines of code, and now we can be sure that timeouts and other stuff are handled correctly everywhere. And we can respond to changes quickly.

That's DRY to me. Don't copy/paste code. Introduce functions, ideally in the simplest way. When you have functions, you declare the same behavior everywhere.
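
A hedged sketch of the refactor described above, with all names invented: the same ~30-line block, copy-pasted with only the element ID changed, collapses into one parameterized function, and timeouts become a single definition.

```javascript
// One place to fix timeouts everywhere, instead of ~100 hard-coded copies.
const DEFAULT_TIMEOUT_MS = 5000;

// Stand-in for the duplicated wait-and-check logic: build the locator and
// the wait settings the same way every former copy did by hand.
function waitStep(elementId, timeoutMs = DEFAULT_TIMEOUT_MS) {
  return { selector: `#${elementId}`, timeoutMs };
}

// Each former copy is now a one-line call; the ID is the only thing that
// ever actually varied between them.
const login = waitStep("login-button");
const slow = waitStep("report-table", 10000);
```

A timeout bug fix is now one edit instead of a hundred, which is exactly the "respond to changes quickly" payoff.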


I have 20 years in the industry and one of the rules I learned is: Articles justifying laziness are ALWAYS warmly welcomed and praised.

To get internet points easily, write something like this:

“Clean code is overrated”

“SOLID is holding you back”

“Tests are less important than profits”

“KISS is the only important principle”

“Declarative programming is only suitable for pet projects”

“Borrow checker is the plague of Rust”

and so on.


How do you consolidate code?

A good way to go at it is to isolate the functionality that is used many times and pull it aside into its own function (or similar). That's just good code practice and also makes it easy to refactor and modify as needed.


It’s not about being used many times, but about the necessity to evolve in the same direction. When that happens, it usually manifests as toil for the team. Consolidating code means to change the structure of the code so that only one piece needs to be modified in the future. That can take many forms, but it usually involves creating a new shareable component.

Shareable components are more effort to maintain, so just creating them because they consolidate code is not always a good idea. You really want to have positive ROI here, and you only get that if you actually reduce maintenance burden. For raw code duplication that doesn't have a maintenance issue on its own, the bar is a lot higher than most people think.


Well, this morning I just fixed a case where somebody had used btoa to base64-encode something in JavaScript and used methods from Buffer somewhere else, because they'd been intimidated away from using btoa. (OK, btoa is dirty in that it treats code points as byte values: you can write btoa("Á"), but btoa("中") is a crash.)

It would have been OK if they'd used the right methods on Buffer but they didn't.

These encoding/decoding methods are a very good example of code that should be centralized, not least so you can write tests for them. (It is a favorable case for testing because the inputs and outputs are well defined, and there are none of the questions about how execution proceeds that you might encounter testing a React component.) It is so easy to screw this kind of thing up in a gross way or a subtle way (I'm pretty sure btoa's weirdness doesn't affect my application because codepoints > 255 never show up... I think).

There's the meme that you should wait until something is used 3 times before you factor it out, but here is a case where two repetitions were too many and it had a clear impact on customers.
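
A minimal sketch of the pitfall and the centralized fix (the helper name is invented). btoa treats each code point in the string as a single byte, so it only handles code points up to U+00FF; Buffer with an explicit "utf8" encoding handles any string.

```javascript
// Works: Á is code point 0xC1, which fits in one byte.
console.log(btoa("Á")); // "wQ=="

// Crashes: 中 is code point U+4E2D > 0xFF, so btoa throws per the spec.
let crashed = false;
try {
  btoa("中");
} catch (e) {
  crashed = true;
}
console.log("btoa(\"中\") crashed:", crashed);

// One centralized helper (hypothetical name), so every call site agrees on
// the encoding and a single test suite can cover it:
function toBase64Utf8(text) {
  return Buffer.from(text, "utf8").toString("base64");
}
console.log(toBase64Utf8("中")); // "5Lit"
```

Two call sites picking btoa and Buffer independently is exactly the "two repetitions were too many" failure: they silently disagree on what the bytes are.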


Raw code duplication is always a maintenance issue, given that centralising it when you notice the duplication (instead of continuing to copy-paste it) costs nothing.


I basically agree, but doesn't this just mean, if I'm consolidating non-DRY code, that I'm now the one using DRY consciously, and the next dev will be cursed with all of my newly introduced DRY abstractions?


If you don’t have another reason for consolidation than consolidation then yes :)


Can concur. Mostly it was I causing the pain, earlier.


Not from my experience. Unnecessarily duplicated code, even when there are small differences which are likely accidental, is usually much easier to fix than too DRY code. Pulling apart false sharing can be really hard.


example? duplicating (literally copy paste) is easier than even finding duplicated code with small differences.


In your part of the industry, perhaps. My experience has been the opposite.


Same. From what I've seen, most code is written with abstractions and DRY as a high priority rather than writing code that is performant and doesn't take jumping between 5 different files to make sense of it.


I started writing Go around 2012 or so because of the file jumping thing. Drove me nuts. I'm sure there were many folks doing the same thing.


> In the industry code that isn't DRY is a much bigger problem than code that is too DRY.

As with anything dogmatic, it truly depends. There are times when the abstraction cost isn't worth it for a few semi-duplicate implementations you want to combine into a single every-edge-case function/method.


There's a certain psychological attraction to messy and confused situations which people are just too comfortable with, and it explains why things like GraphQL (which for years didn't have a definition of how it worked, because "Facebook is going to return whatever it wants to return") inevitably win out over SPARQL (which has a well-defined algebra).

One of my biggest gripes (related to the post) is the data structure

   create table student (
      ...
      applied_date                         datetime,
      transcript_received                  datetime,
      recommendation_letter1_received      datetime,
      recommendation_letter2_received      datetime,
      rejected_date                        datetime,
      accepted_date                        datetime,
      started_classes_date                 datetime,
      suspended_date                       datetime,
      leave_of_absence_start_date          datetime,
      leave_of_absence_end_date            datetime,
      ...
      graduated_date                       datetime,
      ...
      gave_money_date                      datetime,
      died_date                            datetime
   )
which is of course an academic example, but one that I've seen in many kinds of e-business applications. Nobody ever seems to think of it until later, but two obvious requirements are: (1) query to see what state a user was in at a given time, (2) show the history of a given user. The code to do that in the above is highly complex and will change every time a new state gets added. The customer also has experiences like "we had a student who took two leaves of absence" or "some students apply, get rejected, apply again, then get accepted". When you find data designs like this you also tend to find some of the records are corrupted, and when you are recovering the history of users there will be some you'll never get right.

If you think before you code you might settle on this design

    create table history (
       student_id                         integer not null,
       status                             integer not null,
       begin_date                         datetime not null,
       end_date                           datetime,
       primary key (student_id, begin_date)
    )
which solves the above problems and many others in most situations. (For one thing, the obvious queries are trivial, and even complex queries about times and events can be written with the better schema.) I can't decide if the thing I hate the most about being a programmer is having to clean up messes like the above or having to argue with other developers about why the first example is wrong.

If "No code" is to really be revolutionary, it's going to have to have built-in ontologies so that programmers get correct data structures for situations like the above, which show up every day in everyday bizaps where there is a clear right answer but it is usually ignored.
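
To make requirement (1) concrete: with the history schema, "what state was a student in at time t" collapses to a single range lookup. A hedged sketch in JavaScript, with the data shape and names invented to mirror the table:

```javascript
// Each row is one status interval; end_date === null means "still current".
// Returns the status at time t, or null if no interval covers t.
function statusAt(history, studentId, t) {
  const row = history.find(
    (r) =>
      r.student_id === studentId &&
      r.begin_date <= t &&
      (r.end_date === null || t < r.end_date)
  );
  return row ? row.status : null;
}

const history = [
  { student_id: 1, status: "applied",  begin_date: 10, end_date: 20 },
  { student_id: 1, status: "accepted", begin_date: 20, end_date: null },
];
console.log(statusAt(history, 1, 15)); // "applied"
console.log(statusAt(history, 1, 25)); // "accepted"
```

The same question against the wide date-column table requires comparing every date column against every other, and the comparison changes each time a column is added.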


Two points here just for fine grain discussion:

1. The first table structure is a flat, non-normalized structure that trades normalization for easy-to-query, easy-to-select computed properties.

2. The second structure is a normalized table structure that gains normalization at the cost of joins.


Either one is normalized so far as I know.

It is easy to write a query for the first that gets a list of students' names and the dates they applied. That query is harder for the second one. On the other hand, figuring out what state a user was in at time t could be a very hard problem with the first table.

My experience with the first is that you find corrupted data records; one cause of that will be that people cut and paste the SQL queries, so maybe 10% of the time they wind up updating the wrong date. Systems like that also seem to have problems with data entry mistakes.

The biggest advantage of #2 is ontological and not operational, which is that in a business process an item is usually in exactly one state out of a certain set of possible states. Turns out that this invariant influences the set of reasonable requirements that people could write, the subconscious expectations of what users expect, needs to be implicitly followed by an application, etc.

Granted some of the dates I listed up there don't quite correspond to a state change, for instance the system needs to keep track of when a student started an application and when the last document (transcripts, letters, etc.) has been received. With 5 documents you would have 32 possible states of received or not and that's unreasonable, particularly considering that a student with just one letter and a very strong application in every other way might get accepted despite that. It's fair to say the student can have an "open application" and a "complete application". Similarly you could say the construction of an airplane or a nuclear power plant can be defined by several major phases but that these systems have many parts installed so if the left engine is installed but the right engine is not installed these are properties of the left and right engine as opposed to the plane.


The number of person-hours wasted on over-engineered products that never even made it to release could have: solved the halting problem, delivered AGI v2.0, made C memory-safe without compromising backward-compatibility, or made it easy to adjust mouse pointer speed on Linux.


Abstracting too early is usually a mistake; no one is smart enough to predict all the possible edge cases. Repeated code allows someone to go in there and add an edge case easily. It's a more foolproof way of programming.


"In the industry code that isn't DRY is a much bigger problem than code that is too DRY."

which industry is that?

in general programming, absolute nope

not-DRY code can be weaseled out with a good IDE

badly abstracted code, not so much

in fact in a way, DRY is the responsibility of the IDE not the programmer - an advanced IDE would be able to sync all the disparate code segments, and even DRY them if necessary

but when I read DRYed code, the abstraction better be a complete and meaningful summary, like 'make a sandwich', and without many parameters (and no special cases), or else I'd rather read the actual code

i understand the impulse to try to factorize everything but it just doesn't work beyond a certain point in the real world; it's too difficult to read, and there's always an 'oh, can you just' requirement that upends the entire abstract tower.


You didn’t provide any evidence for this, you just stated your coding preference. Which is usually the case in these discussions. Some anecdotes, and then people making grand claims based on personal preference. Obviously, some programmers have thought the opposite and have their own anecdotes.


the comment I replied to was merely a strong opinion

same same

i don't believe there is much evidence, certainly nothing conclusive, in this debate

but factorizing code concentrates the logic

that can be an advantage, to a certain degree, but it also reduces resilience, by specializing the code, and can reduce readability by forcing lookups of nested abstractions



