In practice it really does mean self-documenting code.
Like variables called "daysSinceDocumentLastUpdated" instead of "days". The why comes from reading a sequence of such well-described symbols, laid out in an easy-to-follow way.
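A tiny sketch of the contrast, with all names invented for illustration: the descriptive name carries the unit and the subject, so the call site reads as a sentence and needs no comment.

```python
# Hypothetical example: the same staleness check with terse vs. descriptive names.

STALE_THRESHOLD_DAYS = 90

def is_stale_terse(days):
    # Terse version: what is "days"? Days of what? Why 90?
    return days > 90

def is_document_stale(days_since_document_last_updated):
    # Descriptive version: the name states the unit and the subject,
    # and the threshold constant states the policy.
    return days_since_document_last_updated > STALE_THRESHOLD_DAYS

print(is_document_stale(120))  # a document untouched for 120 days is stale
```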
It doesn't do away with comments, but it reduces them to the genuinely strange situations, which in turn become refactoring targets.
Tbh, its major benefit is sidestepping stale comments: comments rot or don't get updated because, unlike names, they aren't held in line by test suites and compilers.
Most comments I come across in legacy code simply don't mean anything to me or my coworkers, and often cause more confusion. So they just get deleted anyway.
In most cases, even with verbose variable names, you still can't understand the why just by reading the code. And even if you could, why would you want to?
Most often I'm just skimming through, and actual descriptions are much better than having to read the code itself.
This whole notion of "documentation can get out of sync with the code, so it's better not to write it at all" is so nonsensical.
Why isn't the solution simply "let's update the docs when we update the code"? Is this so unfathomably hard to do?
> This whole notion of "documentation can get out of sync with the code, so it's better not to write it at all" is so nonsensical.
To me, this feels similar to finding the correct granularity of unit tests, or tests in general. Too many tests coupled too tightly to the implementation are a real pain. You end up making every change several times in such a situation: once to the actual code, and then two or three more times to tests that look at the code way too closely.
And comments start to feel similar. Comments can have a scope that's way too close to the code, rendering them very volatile and oftentimes neglected. You know, the kind of comment that eventually escalates into "player.level += 3 // refund 1 player level after error". These are bad comments.
But on the other hand, some comments cover more stable ground, or rather more stable truths. For example, even if we're splitting up our ansible task files a lot, you still easily end up with several pages of tasks because it's just verbose. By now, I very much enjoy having a couple of three-to-five-line boxes just stating "Service Installation", "Config Facts Generation", "Config Deployment", each marking the following 3-5 tasks as part of a section. And that's fairly stable; the config deployment isn't suddenly going to turn into something different.
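A sketch of what such section banners might look like in a task file; the service name, module arguments, and paths are all invented for illustration:

```yaml
# ---------------------------------------------------------------
# Service Installation
# ---------------------------------------------------------------
- name: Install service package
  ansible.builtin.package:
    name: myservice          # invented service name
    state: present

# ---------------------------------------------------------------
# Config Facts Generation
# ---------------------------------------------------------------
- name: Derive config facts
  ansible.builtin.set_fact:
    myservice_port: 8080

# ---------------------------------------------------------------
# Config Deployment
# ---------------------------------------------------------------
- name: Deploy config
  ansible.builtin.template:
    src: myservice.conf.j2
    dest: /etc/myservice/myservice.conf
```

The banners stay valid even as the individual tasks inside each section churn.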
Or, similarly, we tend to have headers to these task files explaining the idiosyncratic behaviors of a service ansible has to work around to get things to work. Again, these are pretty stable - the service has been weird for years, so without a major rework, it will most likely stay weird. These comments largely get extended over time as we learn more about the system, instead of growing out of date.
> Comments can have a scope that's way too close to the code, rendering them very volatile and oftentimes neglected.
I think this is a well put and nuanced insight. Thank you.
This is really what the dev community should be discussing: the type of comments and docs to add, and the shape thereof. Not an endless, poorly informed debate about whether they should be there in the first place.
> To me, this feels similar to finding the correct granularity of unit tests or tests in general.
I recently had an interview with what struck me as a pretty bizarre question about testing.
The setup was that you, the interviewee, are given a toy project where a recent commit has broken unrelated functionality. The database has a "videos" table which includes a column for an affiliated "user email"; there's also a "users" table with an "email" column. There's an API where you can ask for an enhanced video record that includes all the user data from the user with the email address noted in the "videos" entry, as opposed to just the email.
This API broke with the recent commit, because the new functionality fetches video data from somewhere external and adds it to the database without checking whether the email address in the external data belongs to any existing user. And as it happens, it doesn't.
With the problem established, the interviewer pointed out that there was a unit test associated with the bad commit, and it was passing, which seemed like a problem. How could we ensure that this problem didn't reoccur in some later commit?
I said "we should normalize the database so that the video record contains a user ID rather than directly containing the user's email address."
"OK, that's one way. But how could we write a test to make sure this doesn't happen?"
---
I still find this weird. The problem is that the database is in an inconsistent state. That could be caused by anything. If we attempt to restore from backup (for whatever reason), and our botched restore puts the database in an inconsistent state, why would we want that to show up as a failing unit test in the frontend test suite? In that scenario, what did the frontend do wrong? How many different database inconsistencies do we want the frontend test suite to check for?
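The normalization fix from the answer above can be sketched with sqlite (table and column names invented to match the story): videos reference users by id, the "enhanced video record" becomes a plain join, and the database itself rejects orphaned references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite enforces FKs only when enabled
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE, name TEXT)")
conn.execute(
    "CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT,"
    " user_id INTEGER NOT NULL REFERENCES users(id))"
)
conn.execute("INSERT INTO users VALUES (1, 'alice@example.com', 'Alice')")
conn.execute("INSERT INTO videos VALUES (10, 'cats', 1)")

# The enhanced record the API serves: video plus full user data, via a join.
row = conn.execute(
    "SELECT v.title, u.email, u.name FROM videos v JOIN users u ON u.id = v.user_id"
).fetchone()
print(row)  # ('cats', 'alice@example.com', 'Alice')

# The broken import path: external data whose user doesn't exist is
# rejected up front instead of silently poisoning the API later.
try:
    conn.execute("INSERT INTO videos VALUES (11, 'dogs', 999)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```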
That makes no sense to me either. In my book, tests in a software project are largely responsible to check that desired functionality exists, most often to stop later changes from breaking functionality. For example, if you're in the process of moving the "user_email" from the video entity to an embedded user entity, a couple of useful tests could ensure that the email appears in the UI regardless if it's in `video.user_email` or in `video.user.email`.
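A minimal sketch of that kind of stable test, with invented names: during the migration, the UI-facing accessor must surface the email whether it lives on the video entity or on an embedded user entity.

```python
def display_email(video: dict) -> str:
    # Accessor the UI uses; tolerant of both the old and the new schema.
    user = video.get("user")
    if user and user.get("email"):
        return user["email"]
    return video.get("user_email", "")

def test_email_shown_for_old_schema():
    assert display_email({"user_email": "a@example.com"}) == "a@example.com"

def test_email_shown_for_new_schema():
    assert display_email({"user": {"email": "a@example.com"}}) == "a@example.com"

test_email_shown_for_old_schema()
test_email_shown_for_new_schema()
print("ok")
```

These tests survive the whole migration unchanged, which is exactly what makes them useful.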
Though, interestingly enough, I have built a test that could have caught similar problems back when we switched databases from mysql to postgresql. It would fire up a mysql based database with an integration test dump, extract and transform the data with an internal tool similar to pgloader, push it into a postgres in a container. After all of that, it would run the integration tests of our app against both databases and flag if the tests failed differently on both databases. And we have similar tests for our automated backup restores.
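The core of that differential idea can be sketched like this; the backend setup (mysql dump, pgloader-style transform, containers) is elided, and plain dicts stand in for the two database connections:

```python
def run_checks(checks, backend):
    # Run every named check against one backend, recording pass/fail.
    results = {}
    for name, check in checks.items():
        try:
            check(backend)
            results[name] = "pass"
        except AssertionError:
            results[name] = "fail"
    return results

def diff_outcomes(checks, old_backend, new_backend):
    old = run_checks(checks, old_backend)
    new = run_checks(checks, new_backend)
    # Only outcome *differences* matter: a check failing on both backends
    # is a pre-existing bug, not a migration regression.
    return {name for name in checks if old[name] != new[name]}

def check_user_count(db):
    assert len(db["users"]) == 2

def check_emails_lowercase(db):
    assert all(u == u.lower() for u in db["users"])

checks = {"user_count": check_user_count, "emails_lowercase": check_emails_lowercase}

mysql_like = {"users": ["a@example.com", "b@example.com"]}
postgres_like = {"users": ["a@example.com", "B@EXAMPLE.COM"]}  # transform bug

print(diff_outcomes(checks, mysql_like, postgres_like))  # {'emails_lowercase'}
```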
But that's quite far away from a unit test of a frontend application. At least I think so.
> With the problem established, the interviewer pointed out that there was a unit test associated with the bad commit, and it was passing, which seemed like a problem. How could we ensure that this problem didn't reoccur in some later commit?
It would seem that the unit test itself should be replaced with something else, or removed altogether, in addition to whatever structural changes you put in place. If you changed db constraints, I could see, maybe, a test that verifies the constraints works to prevent the previous data flow from being accepted at all - failing with an expected exception or similar. But that may not be what they were wanting to hear?
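One way to phrase that, assuming the constraint-based fix is in place (schema and names invented): a regression test asserting that the bad data flow is rejected at the database boundary with the expected exception.

```python
import sqlite3
import unittest

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute(
        "CREATE TABLE videos (id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL REFERENCES users(id))"
    )
    return conn

class ImportRejectsOrphanVideos(unittest.TestCase):
    def test_orphan_video_is_rejected(self):
        conn = make_db()
        with self.assertRaises(sqlite3.IntegrityError):
            # External import pointing at a user that doesn't exist.
            conn.execute("INSERT INTO videos (user_id) VALUES (999)")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(ImportRejectsOrphanVideos)
unittest.TextTestRunner(verbosity=0).run(suite)
```

Note this tests the constraint, not the frontend: it pins down the invariant rather than enumerating inconsistencies the UI should tolerate.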
> This whole notion of "documentation can get out of sync with the code, so it's better not to write it at all" is so nonsensical.
I do believe that in a lot of cases, outdated, wrong, or plainly erroneous documentation does more harm than no documentation. And while the correct solution is obviously "update the docs when we update the code", that has been empirically proven not to work across a range of projects.
What 'has' been proven then? No comments or docs? Long variable and method names?
I just had a semi-interview the other day, and was talking with someone about the docs and testing stuff I've done in the past. One of the biggest 'lessons' I picked up, after having adopted doc/testing as "part of the process" was... test/doc hygiene. It wasn't always that stuff was 'out of date', but even just realizing that "hey, we don't use XYZ anymore - let's remove it and the tests", or "let's spend some time revisiting the docs and tests and cull or consolidate stuff now that we know about the problem". Test optimization, or doc optimization, perhaps. It was always something I had to fight for time for, or... 'sneak' it in to commits. Someone reviewing would inevitably question a PR with "why are you changing all this unrelated stuff - the ticket says FOO, not FOO and BAR and BAZ".
Getting 'permission' to keep tests and docs current/relevant was, itself, somewhat of a challenge. It was exacerbated by people who themselves weren't writing tests or code, meaning more 'drift' was introduced between existing code/tests and reality. Blocking someone's PR because it had no tests or docs was "being negative", but blocking my PR because I included 'unnecessary doc changes' was somehow valid.
But arguments along the lines of "is this so hard?", or nuance-stripping reductions like "so don't write documentation at all", are more about superiority signalling, aimed at individual benefit.
The fact is that, when you zoom out to org level, comments do quickly drift out of sync and value, and so engineering managers must encourage code writing that will maintain integrity over time, regardless of what people "should" be able to do.
The argument isn’t that it’s better to not write it at all, it’s that it’s not worth the effort when you could have done something else. Opportunity cost and all that.
Lazy people work the hardest. It's an up front investment for a big payoff later when you can grok your code in scannable blocks instead of having to read a dozen lines and pause to contemplate what they mean, then juggle them in your memory with other blocks until you find the block you're looking for.
Comments allow for a high-level view of your code, and people who don't value that probably on average have a slower overall output.
What you write in your first para is so self evidently true, at least to me.
I simply cannot comprehend the mindset that views comments as unnecessary. Or worse, removes existing useful comments in some quest for "self-documenting" purity.
I've worked in some truly huge codebases (40m LOC, 20k commits a week, 4k devs) so I think I have a pretty good idea of what's easy vs hard in understanding unfamiliar code.
As the late Chesterton said, "Don't ever take a fence down until you know the reason why it was put up."
A lot of people think comments are descriptive rather than prescriptive. They think a comment is the equivalent of writing "Fence" on a plaque and nailing it to the fence. "It's a fence," they say, "You don't need a sign to know that."
Later, when the next property owner discovers the fence, they are stumped. What the hell was this put here for? A prescriptive comment might have said, "This was erected to keep out chupacabras," answering not what it is, but why.
You might know about the chupacabras, but if you don't pass it on then you clearly don't care about who has to inherit your property.
What's amazingly funny is that many people think this is a positive, because they ascribe more value to working hard than to achieving results. I even thought your comment was going to go that way when I first read it.