This is a thought-provoking and well-written blog post about programmer biases when it comes to reading, and judging, other people's code.
The author is making the point that code readability is ultimately in the eye of the beholder. I've come to share the author's views, and I have to say I don't hear this said much in programming culture. At most places I've worked, there's a culture of constant refactoring under the guise of "continuous improvement," when, if you look closely, it's really motivated by disdain for the last developer's programming style and, in my opinion, a general aversion to reading code.
Reading code is about 10x as hard as writing it. It takes more concentration, it's less fun, it's harder, and it doesn't impress anyone. You have to know the language better than the person who wrote it, because not only do you have to understand why the code does what they intended it to, but you also have to understand why the code does other things they didn't intend (a.k.a. bugs). But in my experience, you save your team a lot more time and energy in the long run by preferring to read and understand existing code.
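To make that concrete, here's a toy Python sketch (entirely invented) of the gap a reader has to bridge: understanding not just what the author intended, but what the code actually does:

```python
def append_tag(tag, tags=[]):
    """Intended: return a fresh list containing just this tag plus any prior ones."""
    tags.append(tag)
    return tags

print(append_tag("a"))  # ['a']      -- what the author expected
print(append_tag("b"))  # ['a', 'b'] -- the default list is shared
                        #              across calls: a latent bug
```

The writer only needed to know what they meant; the reader has to know Python's mutable-default-argument rule to spot the divergence.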
Alternatively, we refactor because the right abstraction for the previous phase of the project is not the right abstraction for the current or next phase of the project.
While I do respect Chesterton's Fence as a concept, sometimes the answer to "why is it this way" is "we were learning as we went, and if we did it again, we'd do it another way."
I look at it this way: when you look at an older city built before the age of the car, it started tiny, not much more than a set of shacks. As the town built wealth, the buildings became sturdier, multi-story, and more ornate. Similarly, our code should start simple and dirty, get cleaned up as it proves its worth, and then be refactored to more robust patterns once it has built the wealth (and demand) to justify it.
We should consider rewrites, then, as a sign of value, rather than as a sign of the previous programmer's failure.
> sometimes the answer to "why is it this way" is "we were learning as we went, and if we did it again, we'd do it another way."
This is true! In fact there's a lot of that, in my own experience. Rewrites are probably most useful on code you wrote, rather than on someone else's, and right when you realize what went wrong, while you're still intimately familiar with the old code.
I've watched two different companies follow the "things will be so much better if we rewrite" logic and rationalize large-scale rewrites that cost millions of dollars and years of time, and failed to achieve the aim of doing it better the second time.
Rewrites should be considered a sign of value only if we actually learned from our mistakes, only if both everything wrong with the old code and everything right with it are well understood. If you rewrite anything substantial before that, you're just guessing, and (in my experience) you're most likely going to take longer than you want and make the same mistakes again. I've seen that happen to many very smart people.
So there's a balance. Rewrites are sometimes valuable, but not automatically valuable. Sometimes rewrites are very harmful. The best chance you have of knowing which is to read lots and lots of code before you start, to make absolutely sure that the code you're replacing isn't being replaced only because the readers didn't understand it or didn't like its patterns.
OTOH, if you have complete test coverage in place before a rewrite, you can freely annihilate and redo large portions of code without having to study too hard.
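A sketch of what that looks like in practice (function and numbers invented, pytest assumed): characterization tests that pin down what the code currently does, quirks and all, before any rewrite begins:

```python
import pytest

def legacy_price(qty):
    # Stand-in for the real legacy function (invented for illustration).
    subtotal = qty * 9.99
    return round(subtotal * (0.9 if qty >= 100 else 1.0), 2)

# We assert what the system *currently* returns, captured by
# observation, not what a spec document says it should return.
@pytest.mark.parametrize("qty, expected", [
    (0, 0.0),
    (1, 9.99),
    (100, 899.10),  # bulk discount kicks in, apparently
])
def test_price_characterization(qty, expected):
    assert legacy_price(qty) == pytest.approx(expected)
```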
I've seen smart people fall into the following trap:
1. Previous developers did X and X is bad, therefore, their code needs to be rewritten without doing X.
2. Oh crap - turns out, the new code has all these requirements to match the old code's functionality in ways we didn't expect (letters A-W).
3. Okay, so we're doing Y and Z in the rewrite, knowing it's pretty bad, because we didn't know we'd have to do A-W and now we're short on time. Oh, and parts of the code still do X.
Now, wait a year for one third of the team to leave because they're rushed, overworked, burnt out, going in a different direction... and another third to get laid off because the project went way over time and budget...
4. Previous developers did X, Y, and Z, and X, Y, and Z are bad, therefore, their code must be rewritten...
That's the point of the Chesterton's Fence analogy: don't rip down the fence until you know why the fence was put up.
I think we're in agreement on this - if you don't have the tests, you don't know what the system does. The Michael Feathers approach is my favored path forward in these cases. Rewrites are more valuable in the small (class-level) than in the large (application-level) in the vast majority of cases. And if you absolutely need to replace an application (say, your company standardized on Oracle and Tcl and you can't hire any new developers because they laugh when you tell them your stack...), you do it piecemeal, building tests around the old system so that you can reliably replicate its functionality, with the tests serving as a living, reliable spec.
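One hedged sketch of that piecemeal approach (all names invented): a strangler-style shim that keeps the old path authoritative while logging any divergence from the new one:

```python
import logging

log = logging.getLogger(__name__)

def legacy_price(qty):      # stand-in for the old code path
    return round(qty * 9.99, 2)

def rewritten_price(qty):   # stand-in for the new code path
    return round(qty * 9.99, 2)

def price(qty):
    """Shim: the legacy implementation stays authoritative while
    the rewrite runs alongside it and is compared."""
    old = legacy_price(qty)
    try:
        new = rewritten_price(qty)
        if new != old:
            log.warning("divergence at qty=%s: old=%s new=%s", qty, old, new)
    except Exception:
        log.exception("rewrite failed where legacy succeeded")
    return old  # keep shipping the known-good answer
```

Once the divergence log stays quiet for a while, you flip the return to the new path and eventually delete the old one.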
> sometimes the answer to "why is it this way" is "we were learning as we went, and if we did it again, we'd do it another way."
True, and that's believable when it's people rewriting their own code, but sometimes you'll have people who didn't even try to understand the existing code express a desire to rewrite it.
I suspect this is as culturally determined as the differing American, European, and Japanese attitudes to rebuilding buildings. European cities typically retain medieval street layout and as many buildings from that time as possible, sometimes even rebuilding to the original style after destructive events. Whereas Japanese houses have a ~30 year lifespan and are routinely rebuilt. And I'm looking up the hill at a castle that accreted between the 12th and 17th centuries.
Just as the street layout will outlive the buildings, APIs tend to be extremely durable and intolerant of destructive rewrites. Consider the Python 2 to 3 transition.
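For the record, a couple of the breaking changes from that transition (runnable as Python 3; the Python 2 behavior is in the comments):

```python
# Integer division changed meaning between the two versions:
print(3 / 2)   # Python 3: 1.5   (Python 2: 1, floor division)
print(3 // 2)  # both: 1

# Text and bytes became distinct, non-mixing types:
s = "café"             # str: Unicode text
b = s.encode("utf-8")  # bytes: what goes on the wire
# Python 2 silently mixed the two; Python 3 raises TypeError
# if you try s + b.
```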
There is a lot of context you’ve built up in writing that code that could never fit in the comments (even if those thoughts could easily be expressed in words, they would dwarf the code and not directly correspond to it; in fact, comments can actually make code reading harder in this way). It really isn’t about the language either, at least not the programming language, but about how the problem was defined and understood in the first place, and how this understanding was encoded in the software.
Reading code is basically trying to reverse-engineer the thinking of the programmer by looking at second-order output. Of course it will be hard! It isn’t just style, nor would I say it is mainly style. Continuous improvement is often just a matter of rebuilding the context that was lost with the last programmer.
> There is a lot of context you've built up in writing that code that could never fit in the comments
Isn't that precisely what Knuth was trying to resolve when he came up with the idea of literate programming[1]? The fact that you might end up with more words than code isn't really a problem if the end result is better (for some value of 'better') than just the code.
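Python's doctest is arguably a small, pragmatic descendant of that idea: the prose carries the explanation and the embedded examples are executable, so the words and the code can't silently drift apart. A minimal sketch:

```python
def gcd(a, b):
    """Return the greatest common divisor of a and b.

    Uses Euclid's algorithm: gcd(a, b) is unchanged when the
    larger argument is replaced by its remainder mod the smaller.

    >>> gcd(12, 18)
    6
    >>> gcd(7, 13)  # coprime inputs
    1
    """
    while b:
        a, b = b, a % b
    return a

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # runs the examples embedded in the docstring
```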
Yes. But those words don’t come for free; they could be much more expensive than writing the code itself. It’s like trying to teach something rather than just doing it.
> it’s like trying to teach something rather than just doing it
If you work on a team a lot of your time is spent 'teaching' (explaining) your code to other developers. Or teaching yourself about it when you come back to something you wrote 6 months ago. Or 'teaching' a QA person your logic to understand where a bug is coming from. Or using your code to literally teach a concept to a junior developer.
Writing documentation can feel like teaching rather than just doing, but that's not necessarily a bad thing if you're working on something that other people need to understand.
This is not the right model for assessing the cost of documentation. If the program is in any sense designed (as opposed to being assembled and modified on the basis of hunches until it appears to work), then the ideas expressed by those words must have been known no later than the completion of the work. Therefore, the cost of documentation is that of writing down these ideas, and the cost of not documenting is the cost of repeatedly reverse-engineering them from the code.
It is my experience that you can markedly improve the speed and accuracy with which a newcomer can understand a code base with supplementary documentation of considerably fewer words than are in the code itself - so long as those words are well-chosen, and focus on the programmer's intent.
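A contrived before/after in that spirit (everything here invented): the second comment records intent the code cannot express, which is where a few well-chosen words earn their keep:

```python
import time

retries, backoff = 0, 0.5

# Unhelpful: merely restates the code.
# increment retries and sleep for backoff seconds
retries += 1
time.sleep(backoff)

# Helpful: records the programmer's intent.
# The upstream API rate-limits bursts; backing off here keeps us
# under its undocumented ceiling (details invented for this sketch).
retries += 1
time.sleep(backoff)
```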
You mean waterfall, right? Ya, then I guess. If the program is understood before it is written (waterfall), then you merely write down these ideas alongside the code. If programming plays any part in evolving the design (as you say, “hunches” that go into a feedback loop), then this will break down quickly.
Edit: Never mind, you mean afterwards. But is the assumption that design can occur independently of programming really true in practice?
It depends on the complexity of the problem you are working on. Something well understood before programming starts has a better chance of being well documented with short prose (because it is well understood, a lot of shared universal context can be relied on). There are lots of things out there that don’t meet this criteria, however.
My description of blind trial-and-error programming is not a veiled reference to agile development, if that is what you are thinking - it is, rather, an anti-pattern in development that is neither agile nor waterfall. There is nothing in agile that says you should just try things until something seems to work. No line of code gets written without the programmer having an expectation that it makes some contribution to the solution, and the issue is how well-founded that expectation was.
Update: Perhaps the canonical example of non-agile trial-and-error programming without well-founded expectations is the programmer who is putting delays in various parts of his program in an attempt to fix a concurrency error.
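A toy illustration of that anti-pattern and its well-founded counterpart (invented, and deliberately simplified):

```python
import threading
import time

result = {}

def worker():
    time.sleep(0.05)  # simulate real work
    result["value"] = 42

# Hunch-driven: guess a delay and hope the worker has finished.
t = threading.Thread(target=worker)
t.start()
time.sleep(0.1)       # "fixes" the race today; breaks under load
print(result["value"])

# Well-founded: synchronize on the thread actually finishing.
t2 = threading.Thread(target=worker)
t2.start()
t2.join()             # guaranteed ordering, no guessed delay
print(result["value"])
```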
One's understanding of the program does not necessarily break down under iterative development, as, if you are doing it right, each iteration improves your understanding.
The usefulness of documentation does depend on the complexity of the problem, but in the opposite sense: programs solving simple, well-defined problems do not benefit much from additional explanation (there's not much to say that is not obvious), but the more complex things get, the more it helps.
There are plenty of cases where programming helps you explore the design space, where you have little knowledge about the APIs you are using, so you poke them a bit here and there, gaining experience in how to use them the way you need (because, let’s be honest, even the best frameworks have deficiencies in their documentation, if you decide to read the docs at all). Likewise, it isn’t that weird to write some code that you know is broken so you can fix it in the debugger, where live values and feedback are available. Heck, many people code from interpreters these days, which is as exploratory as you can get!
Of course, we can argue about different kinds of programming have different needs. Prototyping doesn’t require documentation and so can move much faster than product development, for example. The cost of not documenting is a huge win for the prototyper, allowing them to try out and throw away designs while worrying less about sunk costs.
The design has to come from somewhere, after all. A design team with prototyping resources really values those resources.
Your example is a case where small amounts of additional documentation can be useful. If you are working with a poorly-documented API and you find something unintuitive and non-obvious (it might be as simple as an arbitrary choice between equally valid design options, where exactly one needed to be chosen) but which matters for what you are using it for, then a note of that fact could save a lot of time in the long run. How much depends on what you are coding: if it is just for yourself, then you only have to consider what is best for you, but if it is an actual product that others will likely have to understand, that note might pay for itself many times over.
I sometimes describe programming as a one-way hash operation on requirements. A lot of information and context gets lost when writing software, and I haven't seen a workable solution to that problem yet.
This is closer to the root of the problem than saying code is unreadable... the information isn't lost because of the code being produced, it's lost because the developer leaves. Documentation won't work because requirements change faster than they can be documented. I don't know a solution other than just trying to convince that developer not to leave.
IMO that's part of why readable code is so important. If I can look at a piece of code and understand its behavior then I can know something. I might not know what stated requirement it was trying to solve, but I can know for sure what requirements it implements.
Compare that to sloppy code bases with side effects everywhere, where you don't know what the code is supposed to do or what it actually does.
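A toy contrast (invented): the readable version declares everything it depends on in its signature, so reading it tells you exactly what requirement it implements; the sloppy version makes you hunt through the module for hidden state:

```python
# Sloppy: behavior depends on, and mutates, hidden module state.
_cart = []
_discount = 0.1

def add_and_total(price):
    _cart.append(price)
    return sum(_cart) * (1 - _discount)

# Readable: inputs and outputs are all in the signature.
def total(prices: list[float], discount: float) -> float:
    """Sum of prices with a fractional discount applied."""
    return sum(prices) * (1 - discount)
```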
The biases are real, but readability is perhaps less of an ultimate concern than maintainability and reusability. Both of these depend on you (and others) being able to understand and adapt the code. A bit of foresight in making life easier for future you (or a colleague) goes a long way. Technical debt is real too, and the interest can be high.
I would say that readability is a facet of maintainability. If a good developer looks at code for the first time and thinks, "WTF," it could use some better readability. That's about as nailed down as I can make it (because it's so subjective).
Throughout most of my career reading and maintaining code got fobbed off on the new programmers and the less skilled. Maintenance programming has a bad reputation, partly because it requires reading and figuring out someone else's code.
Now I make a living reading and maintaining code no one else will touch. Lots of companies can't afford to rewrite a mostly-working system, or they can't take the risk. I found a niche doing maintenance work and now I enjoy fixing what other programmers have said they can't maintain. Freelance maintenance work pays just as well as green-fields development and has fewer customer hassles, too.
Only if you write shitty code is it 10 times easier to write than to read.
When I write code I put much, much more effort into clearly expressing my intent and what the code is doing, and in that case it's much easier to read than it was to write.
I think that we can easily say that good code is very difficult to write and very easy to read, bad code is very easy to write and very difficult to read.
The code is written once and read n times by m different people, so there is a huge gain in spending more time making it simpler to understand, rather than spending the minimum time writing it and leaving all the effort to the readers.
Readers who, very often, after they become too frustrated, want to rewrite it (and for very good reasons, I'd say).
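A small before/after in that spirit (example invented): the extra minutes spent on the readable version are repaid on every one of those n reads:

```python
# Written fast, read slowly:
def f(xs):
    return [x for x in xs if x % 400 == 0 or (x % 4 == 0 and x % 100 != 0)]

# Written slowly, read fast:
def leap_years(years: list[int]) -> list[int]:
    """Filter to Gregorian leap years."""
    def is_leap(year: int) -> bool:
        return year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
    return [y for y in years if is_leap(y)]
```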
We all think that we write our code so other programmers can read it, and we believe that because, while we still have the code in our heads, we don't have any problem reading it.
I don't question your intentions or abilities, but the fact remains that a whole lot of programmers find almost every piece of code they didn't write hard to read. Or they say that -- I think they mean they just don't like the look of it, or they can imagine writing it differently.
I'm of the opinion that companies should rewrite as much of their codebase as often as possible. Given the difficulty of trying to understand code written by someone else, why shouldn't production code be considered immutable?