Hacker News
A Short Cautionary Tale About Refactoring (zigamiklic.com)
45 points by ziga-miklic on March 21, 2021 | hide | past | favorite | 41 comments

This is a cautionary tale about a) commenting non-obvious code and b) adding tests. Someone had to think hard about how to solve this problem. Everyone coming after shouldn’t have to. I must always be able to do drive-by refactoring with just seconds of thought. If tests are green and quick manual validation doesn’t show anything broken, then I’ll assume the refactoring was ok.

The job isn’t to make something work. It’s to make it work AND be resilient to a bad refactoring. If you didn’t properly protect your implementation with tests and comments, you didn’t implement it at all.

To the downvoters: why? Genuinely curious.

I have not downvoted it myself, but there is the assumption that unit tests will always catch bad refactorings, that human error doesn't apply to writing unit tests, that every developer is perfectly conscientious, competent, not under tight deadlines, etc., and that your legacy code base has comprehensive unit tests.

The author here: I agree with @Gupie, having unit tests ensures that you have correctly refactored the code and have not introduced new bugs. Checking the unit tests can also help you better understand what problems the code is solving and the edge cases you may be missing.

Normally, if unit tests were missing, I would add them before starting refactoring. In this case, to be honest, I'm not sure how I could have written a unit test to catch the issue. An E2E test is the only one that comes to mind. Please let me know if I am missing an easy way to test for a 404 image when offline. Thank you!

The danger is in the combination of all 3: non-obvious assumption (code should work offline), lack of tests, lack of documentation of the assumption.

Flip ANY of the three, and there is no problem. Had I been the original author I would have gone for the comment, it’s clearly the simplest one in this case.

Try to actually design the test that would have caught this. There is no way in this universe that one can blindly rely on tests when refactoring.

Code that is obvious is better. That’s also difficult here (you’d, for example, make a function call like image_that_should_work_offline(..) that simply returns its argument, but that’s perhaps a bit contrived). So the second best is simply a comment.

// Image inlined because error may be due to dropped connection

You don’t have to have tests to prevent bad refactoring. 100% test coverage isn’t realistic for most code bases (or isn’t even a net positive in many cases).

Self-documenting code, or tests, or comments would have saved the situation. You don’t need to design a test in this case. A one-line comment that takes five seconds to write is enough.

Also, if you see something that doesn't make sense, add comments later once you figure it out. Even if there isn't any other change, and certainly along with the replacement feature.

This applies not just to would-be bugs, but to anything where an interface confuses you to the point you have to thoroughly read the implementation. For example, the countless utility functions that accumulate in a legacy codebase.

I call this "drive-by" commenting, and I do this so that others don't have to waste their time figuring out the things that I did. I find it particularly important to surface assumptions a piece of code may make, that aren't obvious from its signature. Random recent examples:

- A C++ function that takes two containers and internally does std::transform on them: hidden assumption here is the second container is at least as big as the first one (otherwise std::transform will try to read past its end).

- A C++ function that internally calls std::unique and std::sort: the inputs now must be valid for comparison under == and <, and also the output is now sorted - which may, or may not, be something some users already rely on. Better to either pin this behavior down in documentation, or explicitly disclaim it ("relative order of input elements is not preserved") to leave some wiggle room for future changes.

A lot of such things can be expressed using type systems, compile-time checks and runtime checks, but it's not always obvious to the reader. My approach is: if a function makes an assumption or guarantee in a way you won't see it in its signature, and your IDE won't pick up on it automatically, then it should be described in the comment.

Some such things can also be pinned down in unit tests, to better protect against inadvertent breakage due to refactorings - but I say also, not instead, because day-to-day, nobody actually looks at unit tests related to random functions they call, while a properly configured editor or IDE will show documentation comments of function signatures as you type or browse completions.

The author here: I completely agree! When I wrote the post I did come to the same conclusion. I recommended adding a comment or changing the naming to make the inline image obvious, but I missed it when I committed the changes. So I definitely need to practice what I preach.

It's an interesting concept.

A better link: https://fs.blog/2020/03/chestertons-fence/

That was such a great read. Thank you both for sharing!

Imo the big takeaway from the story is that the code probably needed an explanation (aka a comment) to prevent others from making the same faulty assumption that the images were inlined for no reason, and wasting the same amount of time "refactoring".

In other words, it could still be, and needed to be, improved for better maintainability, not just in the obvious way (Chesterton's fence, as mentioned by another commenter).

Yea, there are many schools of thought on what/how to comment "properly". But I have found this simple rule to cover most of the cases, it's not perfect but better than most:

"The code is the HOW, the comments are the WHY"

That's a good rule of thumb. Personally, I also add comments for the HOWs that may be relevant to users of a function/class, that cannot be easily expressed in code in the function signature or class declaration.

Things like, "relative ordering of results is not preserved", or "this function is O(n^3)".

A good example of why the fundamental rule of refactoring is don't change behaviour (such as taking a dependency on an HTTP request where there wasn't one before).

The author didn’t consider that a changed behavior - the inlined image already had a dependency on http to be downloaded in the first place. No, this is all on the original author.

I argue that the inline image had no such dependency, as it was an internal component of the control and could just as easily be used in a non-HTTP context. By loading the image through an HTTP request, the image component is effectively being injected with the HTTP service in order to operate.

The correct refactoring is one the author fuelled with coffee, which is naturally the real lesson here.

Precisely that.

This isn't refactoring per se; it is focusing on the cosmetics of the code at the expense of functionality. There are many other reasons why an inlined image may have been appropriate, and the author overruled all possible concerns by going for replacing a large block of 'ugly red' with a smaller block of 'pretty green', which is a very bad reason to make a change in functionality, especially without understanding what is going on.

The problem is noticing that you are changing a behaviour.

Automated tests to the rescue

The problem is noticing that something is a behavior that needs to be tested.

> Luckily I spotted the issue before I pushed my branch ... I extracted the inlined image into a separate React component and used it where needed. This way we still get both benefits

I expected the author to mention adding a code comment so future maintainers don't make the same mistake.

The author here: Thank you for reading my post! You are correct - I should have. I missed adding a comment when creating the commits, but did include it in my recommendations in the paragraph just below the one you quoted:

> ...if there is a good reason for the current approach, document it. Add a more descriptive name or a comment to avoid confusion in the future.

Last week, I refactored an update query into smaller queries, looping and doing only 20 updates at a time, because too many updates was taking more time than allowed. It turns out that one of the queries removes a field if it isn't one of the query values. Well, if you split that list of query values into several disjoint sets, then the field gets removed in all records! Whoops.

Missing findings:

- someone should have pushed back hard on embedding such a huge png in the design

- the programmer in the article should have broadcast his finding by submitting a code comment

That post then links to another post about 'Chesterton's Fence', which references this story below, which reminded me a lot of how Brexit came about.

> Suppose that a great commotion arises in the street about something, let us say a lamp-post, which many influential persons desire to pull down. A grey-clad monk, who is the spirit of the Middle Ages, is approached upon the matter, and begins to say, in the arid manner of the Schoolmen, “Let us first of all consider, my brethren, the value of Light. If Light be in itself good—” At this point he is somewhat excusably knocked down. All the people make a rush for the lamp-post, the lamp-post is down in ten minutes, and they go about congratulating each other on their un-mediaeval practicality. But as things go on they do not work out so easily. Some people have pulled the lamp-post down because they wanted the electric light; some because they wanted old iron; some because they wanted darkness, because their deeds were evil. Some thought it not enough of a lamp-post, some too much; some acted because they wanted to smash municipal machinery; some because they wanted to smash something. And there is war in the night, no man knowing whom he strikes. So, gradually and inevitably, to-day, to-morrow, or the next day, there comes back the conviction that the monk was right after all, and that all depends on what is the philosophy of Light. Only what we might have discussed under the gas-lamp, we now must discuss in the dark.

Sounds like a good example of Chesterton’s Fence, a rule that basically says to never remove a fence until you can explain why it’s there.

The flip side is that refactoring once in a while can clean up undefined behaviors that no one is around anymore to understand or explain. There is also a problem with companies that try to make engineers hot swappable and don’t focus resources on retention, often the people who understand the system’s special complexity are long gone.

That’s no way to maintain a codebase, though. Refactoring can’t require investigation that approaches the effort of building the fence in the first place!

That code base will rot out of a fear of touching it.

The answer to the Chesterton rule is: put a note on your fence to explain why that fence is there, and under what conditions it can be removed in the future. (With code you also have the luxury of tests to verify the fence is there so long as the conditions for it being needed still remain)

To be honest, I've never understood Chesterton's Fence - maybe you can help me understand.

On many of the codebases I've worked on, it seems prohibitively expensive to research the specific context for every nonsense piece of code before refactoring it. Does it count to say "this code exists because the original author added a lot of nonsense - let's refactor"?

I agree. It's also the case that understanding often comes from changing. So in a sense the fence is just a blanket "never change anything". That's obviously bad.

I think a more reasonable approach is to change stuff in a conservative way where you limit the blast radius. Like running something like Ruby's scientist library on the new code.

> Does it count to say "this code exists because the original author added a lot of nonsense - let's refactor"?

No. Why is the code nonsense? What caused the author to land nonsense vs something different? It's very possible that the authors have very poor taste and/or decision making, or maybe had a different understanding of requirements or potential future direction. Understand how they were mistaken and why the refactored version is better.

Unless you're dealing with code which is literal nonsense (i.e. doesn't compile and/or never did what it was supposed to), assume you're mistaken until you see how the code is mistaken instead.

Take a look at the git log to see if you can at least find one or two reasons why the original author added such nonsense. Was it a feature tacked on at a later date? Did they try to fix some bugs?

If you can't find a reason (or aren't willing to look for one), you're practically calling the previous programmer either a bad one or someone who just doesn't care at all. In that case, refactoring might be the least of your concerns.

Checking the vcs history of the file to understand when, and perhaps why, the giant inlined png was added would be a good first step. Usually that will link you to bugs and work items in your issue tracker as well. At the very least you'll get the names of the people responsible, and if all else fails you can go ask them.

Stick around long enough in one place, and you'll end up doing git blame on some gnarly code and discover that the culprit responsible was... yourself.

> I thought I had learned the hard way earlier in my career not to presume I know better than the original author.

It happens in other fields, too. When I moved to a new city, and my new doctor disparaged a prescription from my previous one, without being able to offer any understanding for why the decision might originally have been made, I promptly found myself a new new doctor.

I think this is also a good tale about the value of testing.

If the original code hadn’t tested these error paths, the author wouldn’t have found out about this use case until after the first reports came in. Worst case, they’d never know and the customer would get a shoddy experience forever!

I'm still not sure about the inlining, though. The module of the error component could request the image before the error gets rendered, so the image lands in the browser cache and can be displayed when no network is available.

But it's also possible the app doesn't use bundling and the error module is cached in isolation. The inlining would be fine then :)

The author here: Thank you for reading my post. I think it is a valid use case as this is a React application and the inlined image is inside JSX. It is also a part of the main app file. So when the user goes offline, the function with the inlined image is run and the image is rendered. As the image source is inlined, the browser does not need to make a request for the image.

If something looks weird, and you know your company to usually be staffed by intelligent people, then you should first understand why the code is the way it is before you change it.

This also would have been a good case for a 1 line comment: // embed OOPS image since client may not be connected

Another way to write this: do not make changes to something that you do not understand. Assume the original authors are not entirely stupid and may have had a reason to do what they did.

