The Myth of Self-Documenting Code (buttondown.email/hillelwayne)
44 points by azhenley 3 months ago | 44 comments



What works best for me is just write the code with no comments. Then, after I've forgotten how it works, go back and figure it out. Then I know just what the comments need to say!

A similar thing happens with error messages that compilers generate. The initial messages usually reflect what's going on in the compiler guts. It takes experience using the compiler to eventually figure out what it needs to say.


I do a similar thing in civil engineering. I build bridges with no safety factors, and when they collapse, I see exactly what safety factor was needed. That way, the effort of figuring it out in advance is never wasted.

Edit: The above was (clearly) sarcasm. Although in real life, I do a similar thing with log messages - I always tend to find out what log messages I needed only after I debug a problem. I wouldn't frame it as a virtue, though.


Mathematical models for bridge design rest on the rubble of extensive destructive testing and past collapsed bridges.


I mean... This isn't that far from how some civil engineering is done. The bridge you drive across is not the first pass at said bridge. And much of what makes safe structures today came from accidents years ago.

I get what you intend: that we can also learn from the experience of others.


I suspect many of the people who hate comments have mostly seen comments created by those who were forced to for some reason or another (I have heard of "commenting rate" being a "code quality" metric... which naturally leads to such situations of creating useless comments for the sake of appeasing stupid tools --- and even stupider managers), and thus aren't actually helpful.

Good comments should be more of a roadmap, and "why, not how". Also, an insanely long identifier (which then appears everywhere it's used) is far worse than a short one with a single explanatory comment at its definition.
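A minimal sketch of that advice, in Python with hypothetical names: the identifier stays short, and the single comment at its definition records why the value behaves as it does rather than restating how the code computes it. The long-identifier alternative would be something like `initial_retry_delay_seconds_doubled_per_attempt_due_to_rate_limit`, repeated at every use site.

```python
# backoff: delay in seconds before the first retry. It doubles on every
# attempt because the upstream service (hypothetical) throttles bursts of
# retries; a fixed delay kept tripping its rate limit.
backoff = 0.5


def retry_delays(attempts: int) -> list[float]:
    """Return the delay to wait before each of `attempts` retries."""
    return [backoff * 2**n for n in range(attempts)]
```

Only the definition carries the explanation; every call site just reads `backoff`.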


> Also, an insanely long identifier (which then appears everywhere it's used) is far worse than a short one with a single explanatory comment at its definition.

Also, when you think about it: long identifiers are properly understood as a kind of comment.


> We need something better than comments, or at the very least we need way better comment tech!

One trick that I find very useful is the "Note" commenting convention from the Glasgow Haskell Compiler. Basically, they add an easily-greppable title to important comments and then they can have one or more inline comments refer to that title. This low-tech hyperlinking keeps separate comments in sync. Also, programmers feel comfortable writing more complete comments because the main comment is not interleaved with the code, and therefore does not disrupt the reading flow of the surrounding code.

For an example, scroll down to section 5.6: https://aosabook.org/en/ghc.html
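A rough Python adaptation of the idea (GHC itself writes Notes as Haskell block comments; the `#`-comment syntax and the `dangling_note_refs` helper here are assumptions for illustration): a Note is defined once under a greppable `Note [Title]` header, code points at it with `See Note [Title]`, and a small check can flag references whose Note has been deleted or renamed.

```python
import re

# Definition: a full-line comment that starts with "Note [Title]".
NOTE_DEF = re.compile(r"^\s*#\s*Note \[([^\]]+)\]", re.MULTILINE)
# Reference: "See Note [Title]" anywhere in the source.
NOTE_REF = re.compile(r"See Note \[([^\]]+)\]")


def dangling_note_refs(source: str) -> list[str]:
    """Return titles referenced with 'See Note [...]' but never defined."""
    defined = set(NOTE_DEF.findall(source))
    referenced = NOTE_REF.findall(source)
    # dict.fromkeys drops duplicate titles while keeping first-seen order.
    return [t for t in dict.fromkeys(referenced) if t not in defined]


EXAMPLE = '''
# Note [Why we sort before deduplicating]
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Upstream feeds arrive in arbitrary order, and the dedup pass below
# only compares neighbours, so the input must be sorted first.

def clean(rows):
    rows = sorted(rows)  # See Note [Why we sort before deduplicating]
    return [r for i, r in enumerate(rows) if i == 0 or r != rows[i - 1]]

def merge(a, b):
    # See Note [Merge order]
    return a + b
'''
```

Here `dangling_note_refs(EXAMPLE)` reports `Merge order`, the one reference with no matching Note. Because the titles are plain text, a simple search for a title finds the definition and every reference to it.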


I've experimented with literate programming with org mode in Emacs a few times, but every time it's just been hard to make it stick. I don't really know why. I think it may have to do with the fact that it's weird to have your project code in one giant file. I mean, I would think you could structure it differently, but that's how I've always seen it done.

I actually just read the GHC chapter of AOSA. Since then, I've been considering adopting this Note style. It certainly seems like it could work out.

For me, it's clear that there is a lot of intrinsic information I would like to capture in some way, but it really doesn't work as self-documenting code. Some people say that stuff should go into commit messages, but I really think that has limited value.

I dunno. I wish more people were working on this problem. I understand that Unison has some kinda interesting takes on this, but I don't actually remember what they are off the top of my head.


This is a great idea.

I've already started writing long-ish-form block comments in files and functions with non-obvious design decisions, along with my initials and the date. I usually add a "NOTE" header of some kind; I'm going to start making sure this is easily grep-able in the future.


Great read by the author of Redis. Here's an excerpt:

"Many believe that comments are useless if the code is solid enough. The idea is that when everything is well designed, the code itself documents what the code is doing, hence code comments are superfluous. I disagree with that vision"

See http://antirez.com/news/124


I don't think that self-documenting code is a myth. That said, I think the benefit is that it can reduce the amount of commenting needed. That, in turn, has the benefit of being able to read (and understand) the code more quickly (in theory).


Can we just agree that more documentation is better? Well named variables, short functions, inline comments, annotations, examples, context, tutorials, types, friendly code owners. Any and all those things are helpful when dealing with unfamiliar code and apis.

Please please don’t discourage anyone from adding these.


Also as other people have noted (most notably Fred Brooks and Linus Torvalds), having comments (or documentation) about architecture and data structures is also very helpful.


And literate programming. Well done by Knuth, done by a few other people, but a joy to read when done well.


I think literate programming comes from a time when programming languages were not as high-level as they are nowadays. I don't think there is really any benefit left, and the drawback is the much higher cost of making changes and refactoring code, because now you have to move around and keep in sync both code and extensive documentation/explanation.


I think you have to have a more stable goal for the software for literate programming to work well. But I think most software can be more stable than folks think it is.

Don't get me wrong, some parts of code can easily go through a lot of churn. Hopefully, they would have been confined to a single chapter of a literate attempt. Same as they were hopefully confined to a module or package of a project.


More correct documentation is better. When reading code I'm unfamiliar with, the comments are the only thing I can be certain was never tested.


No, we can't agree. I find the noise of useless comments or documentation can make it much harder to understand a codebase. Sometimes less is more.


Definitely not. Documents grow stale the second you write them, and are often furthermore wrong or misleading. I allow API documentation when it's separated into a different place from the code, but I do not tolerate documentation intertwined with the codebase itself.

Code is hard enough to read without active sabotage taking place, I do not allow PRs to go through if there is documentation in-line with code. The developer must pull it out, and almost always, it can 100% be replicated through better function naming.


The people who argue against code comments seem to have a high chance of overlap with people who submit huge refactoring patches that don’t do much but add bugs. They will also ask you to write a method for two lines of code that appear twice in a thousand lines. I.e., their opinion isn’t worth too much.

I like comments, javadoc style comments of public methods are really nice in the long run. However I find snarky or defeatist comments in unmanageable code to be a red flag. If you don’t know how something works adding a “Here Be Dragons” comment doesn’t help.


I love comments that tell me about the history of code, along with access to the reasons things were done.

I love comments that tell me the preconditions, postconditions, and exceptional conditions for a method or function.

I hate comments that tell me what the code already does (int i = 5; // set i to 5). I hate them even more when the comment is a lie.
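To make the contrast concrete, here is a hypothetical Python function (the name and the cents-based representation are illustrative assumptions) whose docstring states the precondition, postcondition, and exceptional conditions, which is what a caller needs without reading the body, instead of narrating each line the way `// set i to 5` does.

```python
def withdraw(balance: int, amount: int) -> int:
    """Withdraw `amount` cents from `balance` cents.

    Precondition: amount > 0.
    Postcondition: returns balance - amount, which is never negative.
    Raises ValueError if amount <= 0, or if amount > balance
    (overdrafts are rejected rather than clamped to zero).
    """
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount
```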


I find the snark is just fine, as long as it is accompanied by actual information. Like, no shit this is hairy code, but why is it hairy?


Master Foo and the Programming Prodigy http://www.catb.org/~esr/writings/unix-koans/prodigy.html


A pithier version I heard a long time ago:

> Your most important collaborator is you, 6 months ago, and they aren't returning any emails.

I like the Master Foo version, because it gives some hints as to how to do things right. A "narrative description of its architecture or internal data structures" is a great summary of what I find missing in so many codebases (and is also not an easy thing to produce).


I use the rule that if something is non-obvious I write comments in the code with references to how I solved the problem.


I've recently appreciated the "human" nature of comments. The funny bits, the exclamation marks, the "hack" warnings. These comments truly make the code easier to grasp. Subconsciously, I think: "Hey, a human, just like me, wrote this. It can't be that difficult".


When you're writing code, you have your problem, and you're trying to figure out how to make the computer solve it for you. When you're reading code, you have the detailed computer steps and you need to figure out what the problem was in the first place. Those are two very different tasks. Self-documenting code helps with that by encoding natural-language explanations directly in the code through good naming. But to explain to other developers and your future self how you got there, you need comments: a sort of map of all the paths your mind went through to solve the problem, why the obvious solution was in fact a dead end, why you did or didn't take a shortcut, and so on.


There's much mental illness in the workplace. This is one aspect: abuse whoever needs to maintain the code and get to feel smug in the process.


I think it would be nice to somehow force comments to be attached to certain lines of code and if you change said lines of code you're asked to revise, keep or remove the accompanying comments.


One approach I've seen used is placing comments in the body of version-control commits (on Git, beyond the first line).

This clearly attaches them to a specific revision of the code, which is also automatically superseded when some other change overwrites that line.

It's not far off to have a code editor plugin that shows this information visually beside the file. Kind of like the visual "git blame" in IntelliJ IDEs, but for the commit message.


Wait, so to read code comments you have to use ‘blame’ and for a line of code sort through the commits to find any potentially relevant comments? And then do it for the next line, and the next...

I must be misunderstanding.


The better version is to have something like gitlens installed into your IDE and have it dynamically render those git comments.

https://github.com/Axosoft/vscode-gitlens#current-line-blame...


That looks like an awesome tool. Thanks for the link.


Yes, that's correct. With a sane commit history it works quite reasonably.

It's a bit of a hassle but any out-of-band information is by definition a nonzero hassle to find unless a tool does it for you.

Also, "potentially relevant commits" are just all commits that change that line or lines. Following this principle, there is no justification to make a change where you can't document why it is so.


> Following this principle, there is no justification to make a change where you can't document why it is so.

So the comments in the commit messages are an explanation / justification of changes, and they don’t replace code comments?

That just seems like “how you’re supposed to do it” and wouldn’t address code comments not being revised when necessary.


Next thing you know someone both edits and renames a file and you lose all your extensive documentation


Well, at least they will document why they edited-and-renamed the file!


That’s an interesting idea. If you change code after a comment (and before another comment or the end of the scope), committing the change could prompt for a confirmation that the comment is still correct, recording that in the commit message.
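A minimal sketch of the detection half of that idea, in Python. It assumes a full-line `#` comment governs the code after it until the next full-line comment or a blank line (a crude stand-in for "end of scope"), and that the changed line numbers are already known. A real pre-commit hook would have to derive them, for example from the hunk headers of `git diff -U0`; that wiring and the confirmation prompt are not shown. `comments_to_review` is a hypothetical helper.

```python
def comments_to_review(lines: list[str], changed: set[int]) -> list[int]:
    """Return 1-based line numbers of comments whose governed code changed.

    Assumption: a full-line comment governs the code lines after it, up to
    the next full-line comment or a blank line.
    """
    flagged: list[int] = []
    current = None  # line number of the comment governing this line, if any
    for lineno, text in enumerate(lines, start=1):
        stripped = text.strip()
        if stripped.startswith("#"):
            current = lineno  # a new comment takes over from here
        elif not stripped:
            current = None  # a blank line ends the comment's reach
        elif lineno in changed and current is not None and current not in flagged:
            flagged.append(current)
    return flagged
```

For a file where line 1 is a comment and lines 2-3 are the loop it explains, passing `{3}` as the changed set returns `[1]`, so the hook would ask whether the comment on line 1 still holds. Edits to the comment line itself are not flagged, on the theory that the author has already looked at it.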


Self-documenting code is great, but it's not the only documentation needed.


I found it difficult to follow on many levels, probably because the author has been so frustrated that it became a really passionate piece.

> When you write “self-documenting code”, it’s self-documenting to you.

When documenting code, it is also documenting to you. Just as there is no way to predict what will be obvious and what won't when writing code, the same applies to documentation. We have all had to wade through piles of documentation that meant nothing to us because the person writing it had a completely different framing. That's not an argument for either side.

> But even though comments are garbo, they’re still the only means we have of interleaving code with expressive human-to-human communication.

A lot of people engaging in these discussions completely ignore that most of the code we write now is versioned, and toggling the blame display on a file is enough to bring a ton of context to any single line. And if that's not enough, just go back to the merge request with its explanations and comments, and possibly the discussion of the whole feature.

To me this is currently the single best mechanism to switch from just following the code to getting the context: when it was written, by whom, for what ticket, for what release, etc.

It looks to me like the whole "better documentation" the author is looking for already exists in that meta system around our code, which does more than just version it, but he doesn't seem to care to jump back and forth, and everything absolutely needs to be in his editor. That's pretty alien to me.


> everything absolutely needs to be in his editor

Even that is a solved problem. Editor plug-ins for git make it easy to see the blame information for every line without leaving the editor.


The only benefits I've ever received from `git blame` are: 1) knowing how old a piece of code was, 2) knowing who to email or message to ask for help with it.


Doesn't it give you the context of the code that was changed at the same time as the line you are looking at?

I find it super useful to see that the line I'm trying to change came from a hotfix and they had to cut a bunch of corners, that they apparently never went back to fix, for instance.


Looking at the last code I wrote earlier today, the comments say things like “this object contains active member records” because the object itself has a cryptic name that’s basically a GUID. Now I’m wondering whether it would be better to just save some human-readable object name and store the object inside the friendly object.



