Hacker News new | past | comments | ask | show | jobs | submit login
The case for comments in code (eatonphil.com)
113 points by goranmoomin on Sept 22, 2020 | hide | past | favorite | 74 comments



My commenting guidelines learned after nearly 40 years of programming:

- One short sentence comment for every function consisting of more than 5 or so lines of code. Keep it short. Note anything deviating from someone's expectations of the function.

- Comment high level process flow. I don't need a blow-by-blow. For the love of all that's holy please, please, please don't tell me you're incrementing i! Keep it terse and high level.

- If something is done differently than an experienced person would expect it to be then document why it's being done that way. You can write a book and that's okay. I'm old enough to remember when both CPUs, OS's, and compilers had bugs and sometimes you had to do some strange-looking workarounds. Document them - thoroughly!

Other than that I agree with the "let the code speak for itself crowd." Based off my rules excessive code comments are a code smell needing to be checked out.


> Based off my rules excessive code comments are a code smell needing to be checked out

I don't know that I fall into the "excessive" category, but I do tend to comment method bodies fairly heavily.

Basically my strategy is: Add comments in the places someone is going to ask a question about in review.

If you answer all their questions ahead of time, then you save them time, and you also save yourself time next time there is a bug or refinement in that code.

I generally keep things fairly high level though.

Some people advocate for breaking anything longer than X lines out into its own method.

What I tend to do instead of wrap blocks of code like (Python):

   #{ <section descriptor>
   ...
   ...
   #} 
That way it's easy to follow the flow of things even if they get a little long.


When the function gets sufficiently long and with sufficiently numerous 'arcs' (different bundles of if-statements that may be triggered under different conditions), how do you test the code change you're making?

Would testing it be easier if it were broken out further?


> Would testing it be easier if it were broken out further?

In my experience, you get the most effective use of your time by testing at interface boundaries.

How the code is structured inside of those doesn't terribly matter in most cases.

That said, you are almost always better off structuring a method as a linear set of steps rather than trying to turn it into some sort of "choose your own adventure" journey.

Basically a typical method of the sort I'm thinking of would be something like:

  #{ Check early out conditions
  ...
  #}
  
  #{ Prepare the values
  ...
  #}

  #{ Perform I/O operation
  ...
  #}

  #{ Clean up / build return value
  ...
  #}
You could break that out into four or five methods, but most likely the contents are going to be a bit of filler and then calling into some other library.

Easier to keep everything consolidated and avoid the spaghetti.


If each section is only a few lines, you've doubled your function length without giving any context relevant to someone who has written code before.

If those are long/complex enough that you need to keep track of the sections individually, yeah break them out!

Like, what insight does "perform io operation" provide beyond what f.write() already gives?


> what insight does "perform io operation" provide beyond what f.write() already gives?

Often maybe it won't give any insight, in which case it gets left out.

Basically, the general idea is that I am denoting and grouping chunks of code together sort of like section headers to form an outline of what is happening.

A random example from a file I was working on recently:

  #{ Attempt to handle the case where Redis loses our access token
  #  for some reason
  ...
  #}


This method already looks like it’s been reasonably shrunk to a single unit, unless “prepare values” is a lot of business logic; and, you’re already past all the major conditional checking. Fair enough.


Neat idea, but I'd like to know how do you deal with nested block.


Usually I will just add the same sort of thing at nested levels.

Thinking on it more, the purpose of the `#{` `#}` is to denote "This comment is relevant for this section of code"


I comment for myself first.

In my experience, I may not remember what the heck I was thinking writing that code even 5 minutes later.

If I ever start wondering what my code is supposed to do, once I figure it out, I comment.

After doing this for a while, I've learned what I'll probably want to know later, and what will seem obvious.

Besides this addition, I think your outline hits the nail on the head.


I agree with all of that.

It sounds silly, but if you can work how you feel into the comment, it's really helpful later on.

I like to use tags and sometimes emojis.

Just remember that you're talking to future you. Usually specific versions of these:

* `// XXX this is a bad thing that we can't improve yet, sorry`

* `// TODO this thing works, but it should also do something else soon`

* `// FIXME this is definitely broken`

* `// HACK here comes some magic stuff, but it works...`

* `// NOTE this is non-traditional or non-obvious, so let me explain`

If you do that, it makes it easy to search for tech debt too.


Well, FWIW after decades of programming, I find it useful to include a lot of comments. I also find that they get in my way.

If you've ever read "Programming Pearls", you've seen how code that looks fine can have all sorts of subtle issues. In my experience, this is common, not rare. We just miss most of it.

When I go back over code I've written, I see new things every time, sometimes because I've used the code in practice (different in various ways from what I expected), sometimes just with "fresh eyes", sometimes with "newbie eyes" ("Why would I do B before A..?"). I notice things I missed before. I miss things I noticed before.

And very often bugs exist because I didn't think of something. As soon as I think of it, it's not enough to eliminate it in code and pretend I was right all along, I want to leave a note pointing out the thing I overlooked, even though it might sound obvious when stated.

Etc. So many little things I learn about the code over time. I can write them down or leave future readers to the whims of fate. If I'll be the future reader, I write them down right where they are most relevant.

I also want to be able to glance over my code and quickly know exactly how it works. I don't want to figure anything out if it can just remind me. So, yes, definitely my comments are higher level than the code, to make them quicker to comprehend at a glance, but they sometimes drop down from the "why" into the "how".

So then, I have lots of useful comments that for my own code, I'm definitely going to keep. But the comments accumulate as the knowledge accumulates, and it can become annoyingly cluttered. Sometimes I want to see them, and sometimes I don't. (I always try to find ways for the code to "absorb" some of my comments, breaking a composite expression into steps with explanatory variable names, for example, but there are still lots of comments.)

So, I want more sophisticated showing and hiding of comments from my code editor. I want to click a key and make all comments vanish. Nothing but the code, neat and tidy. Another click, and I see abbreviated comments, remember this, watch out for that, consider whatever.... Then maybe a mouse click on a link in one of the comments to bring up more details. Ah, yes. Key click, and all comments gone again.

VS Code, I'm hoping....


It’s sometimes more than one click, but “git blame” (built into the IDE) serves that purpose for me.


i agree in general. i've only got ~15 years of professional experience, but:

- code says how it's done

- comments say why it's done

- comments say why something isn't done

- comments say how to use the damn thing

- comments say who thought it was a good idea to implement the damn thing

- answers to above questions are usually provided in code review as replies and actually should be comments in code


I'd expand the first item to NAMES say how it's done. Need it to be clearer? Add more intermediate vars and functions with good informative names.


Agree except for the 'who', that sounds like it should be in the work tracking system (e.g. JIRA) rather than comments in code.


but then, see the point about work tracking systems being outside of source history and thus prone to migration failures, etc.

also, git blame sort of fills this role, but the commit author may not be the person who acked the implementation and requires digging in history if there were any refactorings done.


We have a lint rule that will fail if your TODO comment doesn't have a link to a ticket for that task in it. It's not perfect, but I think this has reduced TODOs in our codebase to tasks that will be more or less done in the immediate future, e.g. follow-up PRs.

Because of this, I think these TODOs are pretty helpful in code review. Reviewers often leave comments like, "while you're at it, why don't you do this too?". But with a TODO and a ticket, you can indicate that you're about to do that thing, but are breaking things up into smaller PRs to make things easier to review. Since there's a ticket associated with the TODO, it's a bit more likely to actually get done.


I had heard of that idea before, and then forgot about it. Thank you for reminding me; I will be implementing that at work when I can find the time to add it to the pipeline (ie. "todo: enforce ticketed todos").

To me there are three variants of todo:

1. Personal and immediate todos. These MUST be resolved before I can even commit to VCS. This isn't a standard "todo"; rather, it marks a section of code which I've left incomplete while I jump around the code base to work on other things. I don't use "@todo" as the tag here; I use "@first-real-name" (eg. "@joe"), as JetBrains' IDEs let you define custom TODO tag names and then filter based on them. Thus, "@joe" has never made it into a commit to the VCS, as I always ensure these are resolved before I commit.

2. A true todo that is actionable, and has an extremely realistic (not idealistic) expectation that it must, or really-really-REALLY should, be worked on in the VERY near future. This is the kind of todo that would require a ticket.

3. Todos that are hypothetical, imaginative, or dreamy; ie; "I wish priorities allowed me to spend another $X days/weeks/months on this". Obviously, this is the entire reason why requiring tickets for todos is beneficial; these kind of todos need to be cut down and eliminated. It can be so hard to "give up" on something you know could be improved, but that realistically you'll never come back to on the company's dime.

I'd also like to see alerts based on the datetime when a new todo was committed to the central repository. If a todo sits in code for more than 3 months, I want that alerted. Either the ticket needs to be prioritized, or the todo needs to be deleted.


We have a JIRA rule to close tickets if they are not likely to be worked on in the next 6 months.


I'm torn on that, mostly because I worked at a shop where sometimes bugs weren't handled immediately because they were edge cases (but we had more serious ones in play).

There was more than one time where a ticket from before I even started provided a critical clue, or at least a "We knew this was a problem and you didn't let us fix it" defense when the execs asked what happened. (Execs typically were the ones who forced us to add features instead of fix bugs)


I like that idea, I have always felt the TODO is something I add while working on a function/feature so I don't forget the edge case. I have never cared for a TODO representing a work item. Same with a code base littered with comments about fixing bug ###. With out the bug and source control diff how do I know what was part of the fix and when or how it has changed over time via other bug fixes and feature updates.


If you don't comment your code you're doing a disservice to yourself.

Code comments allow me to understand how I was thinking about a particular problem when I wrote that solution. Without that context is way harder to re-establish understanding of my own code if I need to change or refactor something.

Something that I respect deeply, it's when I see a comment like "Here we use a global variable for this xyz reason. Global variables are bad and are discouraged but we used it here to get around something".

That level of self-awareness and scrappiness can only be achieved by contextualizing your decisions with a comment.

Sometimes bad code becomes decent code just by the merits of a comment that explains it.


There’s seems to be some in-between of metadata you want to include about the code, including discussions, but is too lengthy for traditional code comments, or wants to reference other parts of the code, or just doesn’t mesh with traditional usages of code comments.

Often you make a decisions “I’m doing it this way because of x” that would be helpful for future maintainers to know, but would clutter the source too much.

This reminds me of this startup: https://www.codestream.com/use-cases/code-discussion

I wish there was an totally open source, open protocol, way to do this.


As an example, picking constants: you may have a chosen the constants based on some experimentation, but including the full details of the experiment using code comments is nonsensical.

It would be great if you could still somehow have these details easily accessible from the code, but not within the source code itself. Maybe this could be done federated? ActivityPub?


It's the need for things like that, that makes me thing programming has outgrown raw text as a storage medium for programs. Really what you want to be able to do is link to some documentation that is a first clase part of the code base. Not some bolt on or afterthought.


Commit message bodies -> git-blame.


I've seen many codebases outlive their version control/metadata. Source lives forever and while it may not be perfectly proper, I'd much rather have durable comments telling me how constants were derived than metadata.


However Git allows to salvage the history of SVN during the migration, probably because SVN was one of the most popular CVS before itself. One would expect that if something dethrones Git, it would (have to) do the same.

That said, that method is not really viable if the piece of code moves too much. I would then side with the other suggestion to put that in a separate file in a doc/ sub-folder and just refer to that file in a comment. I guess that for most programming editors, "open file under cursor" is just a key combo away.


comment with a reference to a different file where you explain in detail?


Isn't that just documentation? You can make docs from comments so you could just put those longer comments in, even if they're long, since you're making docs out of them.


I prefer comments in the code (or tests) to explain the "why" of the code to external systems. I want someone to get the "why" even if they get my code from a tarball, an e-mailed diff, or if some asshole overwrote commit messages for some fool reason.

For my part I don't want to have to switch contexts to go hunt through an issue tracker or commit messages to know why someone did something. I have other shit I need to get done. I don't want someone reading my code, with their own work to get done, wasting their time hunting for my reasoning why I put "pi = 3".

I want to be helpful to future victi...maintainers of my code[0].

[0] https://groups.google.com/d/msg/comp.lang.c++/rYCO5yn4lXw/oI...


Yea, this frustrates me endlessly working in a code base that has the philosophy of no comments is better than bad comments, and as such code should be self describing with no comments.

"Self documenting" code only describes what is happening, doesn't explain why the code exists, what problems are being tackled, bugs fixed, etc. Makes it very easy to lose behaviours, because it's easy to delete or change behaviour when we don't know why it exists.

I sort of think it's common to accidentally comment what is happening, those comments to get out of sync with the code and be otherwise unhelpful, which leads to this overreaction towards self documenting code and having policies limiting comments.


I use warnings instead of TODO/FIXME, and found it useful. We tend to gloss over TODO/FIXME, even though editors highlight them.

Warnings, however, find their way to someone every time the code is executed. They're nagging enough that you eventually fix them. You can run the code with strict rules to raise an exception on warnings.

On comments: we've acquired a habit to cross-reference everything. In the code, commit message, issue, Slack. You'll often find a line, and comments around that start with "Rationale" or "Why:" with references to issues, commits, conversations, etc.


What’s the warning equivalent for yaml/config?


For config files, we add comment lines with references to issues, commits, comments, conversations, etc.

Here's how a colleague put their comment in a .json file:

  "The_whys": [
      "https://repomanagement.com/GoodCorpHere/XXXXXXX/issues/XXX#note_XXXX",
      "https://repomanagement.com/GoodCorpHere/XXXXXXX/issues/XXX#likely-parts-and-draft-fix",
      "https://repomanagement.com/GoodCorpHere/XXXXXXX/issues/XX#likely-parts-and-draft-fix"
  ]

But you make a good point. It could be done by adding an agreed upon field in TOML/JSON/YAML config files that would contain warnings, but not for now.


Jiras come and go, code is forever.

Docs, even internal, or perhaps especially, should be in the same repo as code for this very reason. That wiki you’ve got? It’ll be replaced and data migration will fail. Git history will be there to cover your behind.


I really wish GitHub issues and wikis and things like that all synced into the git repository. Maybe into an issues branch or something. Then all that information would be available and accessible if we ever migrate away from GitHub, and it’ll last as long as the project does.

Git is already a distributed document store. We should be using it for issues and documentation too!


But then where would the business value be for Github? We could have a federated system of independent git instances, something vastly more innovative, dynamic, and anti-fragile than Github could ever be, and this is the world we could have if we valued decentralized tools enough to do the work, rather than taking whatever is most convenient. But we don't care, and Github is what we got.


Not what you asked for... but the Fossil SCM syncs issues, forums, and wiki content along with code: https://fossil-scm.org/

I wish it were a little less 1990's looking. A UI makeover might do a lot for this project.


I thought the wiki at least used to do this? Did they finally give up the pretense?


My number one use case for comments is documenting invariants/preconditions/postconditions that are not enforced by any automated mechanism (e.g., a type system). I'm surprised no one else has mentioned this. To me these are exactly the things that MUST be commented. Of course, it's useful to have some mental heuristics around what other kinds of comments are helpful to readers, but unenforced assumptions/guarantees need to be documented so programmers in the future can see what they are.

One might argue that all invariants should be covered by tests, but tests (by themselves) cannot even ensure some the most basic properties about functions, such as that a function always returns a sorted list no matter what input it's given. Tests also cannot ensure that a function's preconditions are always satisfied at every call site, because new call sites can be added in the future without causing the existing tests to fail.


If you're using phabricator [1] the "pull request" (diff) message becomes the commit message, including a link back to the diff with all of the comments.

This isn't perfect, if the file gets moved you can lose information, but for the most part, viewing the history of any file gives you all of the information you need. Tools like Xcode let you see the latest commit for every line of code, which is also very useful.

Our team switched to Github for a bit and we replicated the behavior with custom scripts to merge a pr by rebasing and using the pr body, the reviewers, and the pull request url as the commit message.

1. https://www.phacility.com/phabricator/


Part of the purpose of these tools is to raise visibility to other parts of the business and facilitate progress estimation. Only pushing for comments in code seems to be like saying F the rest of the company and lets only do things that benefit engineers. Documenting issues in code is a quick way for them to get forgotten about and lost.


I reached the same conclusion.

Unlike some supervisors I've had, my boss looks over my pull requests carefully when he merges so if there is something about my code that I want him to think about I put a comment there.

If he decides to keep the comments or delete them after a merge that's up to him.

Longer term there is the "literate programming" problem.

Once facet of that which I've struggled with is that the full commentary for the code (a lot of which could be automatically generated) is probably many times larger than the code itself.

This is the case w/ the Bible, Koran, Shakespeare. Your local Uni library is lucky if it has a copy of "Ender's Game" but it certainly has 5-10 books about "Ender's Game".

From official documentation I found about 20 pages of content to describe a 1 page financial message. It's an absolutely horrific problem in graphic design to relate 5% message to 95% documentation.


> From official documentation I found about 20 pages of content to describe a 1 page financial message. It's an absolutely horrific problem in graphic design to relate 5% message to 95% documentation.

This depends. If that documentation can be reused by other participants in the development process, it's very useful. Perhaps a bit excessive, but useful. With that you could hand it off to the recipient of the message and they can implement their own processing in their own language and applications. If you handed them your implementation could they use it directly or easily port it to their own target system?

And what was that documentation? Does it add real clarity or not? I used to work on a project which had an absolutely massive message spec document. Over 1k pages (I know 1k, I think it was closer to 2k but can't find it online for free to verify), and the implementation (of the message data structures and primary functions) was certainly much less than that.

But that documentation told us the exact memory layout expected. Messages which in C may have been described with a 20 line struct definition could mean 18+ fields if reasonably formatted. Each field has expected values/types (that the C int type doesn't sufficiently specify) that need to be documented. Like "This 4-bit field can take a value from 1-10, here are the meanings of each:". Now that one line in C is at least 11 lines in the spec. So about a 1:10 ratio of code:documentation was reasonable for this situation.


The ultimate answer is that different information is meaningful to different consumers.

In the case of the financial message, fully "understanding" it could involve understanding complex regulations in multiple jurisdictions. (e.g. "is this message valid" is a question for accountants, lawyers, other subject matter experts, etc.)

Those people couldn't care less about the ISO 8601 date format but if you don't understand the subset of that format used in the specification you may have trouble parsing messages and will likely have problems generating well-formed messages (though the SME's could care less)

The beyond state-of-the-art goal that a group I am in is looking at is how to trace from one end to another; the third bit of the fifth byte is a boolean flag, but it exists because of 'that requirement', here are all the definitions you need to understand 'that requirement' and do it at interactive speed with 10 people in the bullpen to create "task forces" that can function at a vastly superhuman level (think IQ 500) at maintaining and implementing specifications.


The literate programming solution to this problem should be transclusion.

Commentary in the 'code' proper should be in the service of understanding the code. Detailed dives should contain a transcluded block of the relevant code section, which updates automatically when that code changes, and with some way of finding all transclusions of a block, so that it's possible to navigate in both directions.

Or in other cases, we want a true hyperlink: one that travels in both directions, so that it's possible to navigate to anywhere in the literate codebase that references a section, from that section.


Transclusion destroys readability of the code. It's nice when you're a professor writing numeric heavy code and want it to flow like an academic paper. Not so nice when someone else has to understand the code and inner scopes are scattered on the wind for expository purposes.


I think you're confusing macroexpansion, where a marker in the source is expanded into a different piece of code in the 'tangle' and 'weave', with transclusion, where the source contains a full copy of the code in both places, and the compiler takes care of keeping them in sync.

What I want is the ability to link a particular piece of code into an issue discussing it, such that this particular copy stays in sync. Since I can already copy-paste such that they don't stay in sync, we now have both options, perhaps even simultaneously.

Edit to add: automatic sync should be one-way, with the compiler warning if there are changes in the copy but not the source. Being able to edit the source code inside an issue is a case where more power isn't better than less.


With current technology it does.

Transclusion will become a useful technology when somebody finds a way to do it where the cure is worth the disease. That's a real "millennium problem" in my book.


Isn't anyone going to mention the importance of tests for extra documentation? I am interested in hearing your thoughts on that. I have understood how/why code works multiple times by looking at some test cases. Tests may also reflect the end decissions made on other platforms like pr comments, jira, slack.


> This is the single biggest reason I've started to push for more comments in code. More so than all other tools (issue tracker, code management system, etc.) comments in code have the greatest chance of still being around and easily searchable if they haven't been deleted.

True. But the flip side is that comments can outlive their relevance, accuracy, or purpose.

This is a good reason to reject most comments: they're an additional maintenance cost that can't be automated.


>True. But the flip side is that comments can outlive their relevance, accuracy, or purpose.

>This is a good reason to reject most comments: they're an additional maintenance cost that can't be automated.

By the same line of reasoning one should reject most code.

A comment that helps the next programmer who's going to look at the code is good. If/when the comment is obsolete, deleting it is easy.

I'm saying this as someone who rarely includes comments in a first commit. Generally I add comments that answer questions that come up in code review, because it's likely that someone else looking at the code will have the same questions my reviewers have.


> By the same line of reasoning one should reject most code.

Yes, you should reject all unnecessary code. If that is "most", then your project might have some problems. ;)

> If/when the comment is obsolete, deleting it is easy.

The hard part is to know if and when the comment is obsolete. It's way too often I stumble upon comments that contradicts the code. Is the code wrong? Is the comment wrong? Often the history reveals that the comment was correct at some earlier time, but the person changing the code didn't think about changing the comment. I think that the mechanism behind this is that after the first read through of the code we tend to stop seeing the comments.


In my experience, the mistake of updating code without updating the accompanying comment is commonly made, but does not commonly survive code review.


> If/when the comment is obsolete, deleting it is easy.

And yet it unfailingly keeps not happening. Not because it's hard to do, but because it's easy to miss.

Implementing every change both in code and comments can maybe be done, but I've neverseen it done, and I don't understand how you'd go about it.


Comments should be sparse, used only where code is not self-explanatory. If you follow that rule, spotting obsolete comments in code review is easy. I've seen it.


I agree with being sparse.

Can't say I've seen a great comment maintenance track record anywhere so far...


> reject most comments: they're an additional maintenance cost that can't be automated.

I have a really hard time reconciling this with my experiences. Sure, often times comments that explain _what_ is happening can go stale, but I have frequently spent hours debugging, asking in Slack, and picking the collective brains of other developers for a codebase to understand why we do X, or what some test is meant to represent, when a thorough comment ("This tests feature X... for product Y...") could have made it easier to recognize that certain behavior can be removed ("Oh wait that's for this product/workflow which we don't even do anymore -- it's safe to just delete that entire thing"), or understand the nuances of why the Simple Seeming way of doing something is wrong.


Comments should either be like a newspaper headline, describing the general gist of a section, such as "hide low-ranking blog comments", or to point out oddities or surprises, such as "for unknown reasons this API doesn't like single quotes" to explain why converting characters is done.


Comments are the why, code is the how.


Comment the justification for code the law....


I once read that code is the what, tests are the why and comments are for jokes.

Of course, I don't believe it to that extreme; comments are sometimes handy. But that's the thing: sometimes, sparingly.

Over the past few years, if I find myself not understanding something about a piece of code, I first go to the tests to see if it was tested, and the tests help me determine what it's supposed to do more than well enough.


Hard to talk about code comments without mentioning 'antirez's article [0] about this where he goes in depth on the different kinds of comments he used throughout Redis.

[0] http://antirez.com/news/124


We are team of ~20 software engineers working with 9-years old codebase. About 6 years ago we came to practice of writing decisions and details into git commit messages. This way they are always there, accessible via git blame, and code remains clean of distracting comments.


I work with 20 year old codebase. Sometimes code can be too many times rewritten and git blame does not help and comments in these cases help a lot on why this feature exists


Seems like an introductory thesis, totally lacking in examples. It would be great to see a few PR comments that need to be documented. I'd imagine that when that's the case, the underlying problem could be better addressed with a code change in most situations.


I will sometimes do an initial commit where i just write out the steps i plan on coding as comments. Sometimes just this process makes it so much clearer what I actually want to do.


Mandatory "Squash and Merge" solves a lot of the author's reasons why adding code comments is necessary b/c it guarantees that each commit in the repository's main branch has a PR associated with it. For instance on GitHub, mandating "Squash and Merge" on the "main" branch means that every commit will have an associated PR Number in the commit message (e.g. "Added create user modal #70").


Does anyone use git commit messages to convey more details? And uses gitlens to superpower context while editing code.


the best way to track issues is to code them in a test case, issue trackers are for communicating resolution outward


I mean come on. Are we now saying that we must maintain comments as well as tests and actual documentation? An accurate comment in code is encountered about as often as zebra with pink stripes. The code has to work, the test have to work, the documentation probably has to work (and be readable), but comments just sit there degrading.


Log statements and full English variable names are far more useful




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: