Code Comment Style (github.com/pingcap)
45 points by jinqueeny on March 5, 2019 | 62 comments

The best comment I ever read was a huge, multi-paragraph monstrosity at the top of the file. It was written in a casual style and had misspellings. I think it gave an overview of the intent of the file (it was like some Perl script) and then rambled on about how the requirements had mopped him into a corner, and that the code was incomplete and likely buggy.

It was shockingly humble and transparent, and yet it seemed to flow from confidence, from someone who had lots of experience in systems of all shapes and sizes. It was not signed, but I'm pretty sure it was by one of three coworkers, each of whom had years and years of experience running servers and networks. Its tone was a refreshing contrast to the slick smokescreen recited by your average drone in meetings and emails in the surrounding corporation.

And then there were no more comments. Yet the code was more comfortable to read, after that long intro, than most code I run into. I think it was two things:

(1) The author stated the intent. What was the file's purpose in life? Every line is easier to take in after you know that.

(2) The code was written by a human being, just like you and me. Usually code feels a little stiff, no doubt because the main audience member is a computer. A mild injection of warmth and humanity helps me keep my chin up when I face that daunting first read.

Would it have been better if it was more concise, spelled perfectly, and had a few line-by-line comments? Maybe. But it was such a treat I still remember it a decade later.

(1) is super useful and something I've never thought of before. It seems obvious in retrospect. Thanks.

I love comments at the beginning of files that explain what this component is intended to do and how it fits in the big picture. Documentation seems to be so micro-focused that it's very hard to understand how a piece of code was intended to be used.

The Doxygen trend of just dumping APIs and comments at the start of functions and calling it documentation is horrendous.

For documentation meant to be consumed by others, sure. But for an internal module, these are exactly the types of comments that get out of date the quickest: since they are disconnected from the code, they don't stay up to date as the code evolves.

Generally speaking the purpose of an entire class rarely changes to be something completely different. In such cases new classes are made instead.

"It is well known that I prefer code that has few comments. I code by the principle that good code does not require many comments. Indeed, I have often suggested that every comment represents a failure to make the code self explanatory. I have advised programmers to consider comments as a last resort."


I tend to prefer no comments, but more descriptive methods. There may be occasional exceptions.

> I have often suggested that every comment represents a failure to make the code self explanatory.

Part of why Worse Is Better is that relying on self-discipline and a healthy environment is a risky bet.

When code comments are presented as a failure to self-document, they are simply not written, irrespective of the actual clarity of the code. Add some pressure or toxic environment and everyone who's not able to grok your codebase won't complain about it — it is self-documenting! If you don't get it, the problem is on your end.

I wish the arrogant meme plague of “self-documenting” code would just die, already.

TODO: get the link for the study showing it’s about 10% faster to read code with module level documentation

Some classes of comments are pretty hard to replace with self-explanatory code: historical notes, design decisions, and negative things.

// This code uses FOO because when it was originally written, we expected to process millions of items, so fast random access time was important. This is no longer the case -- feel free to refactor to use BAR instead.

// Do not refactor this code to use BAR! While this will make the function shorter, the result will have O(n^3) complexity, and this function may get millions of input items in some cases.

Also, the post encourages comments saying “TODO”. I’ve yet to see a codebase where that helps anything.

I've seen them be useful when used in combination with a bug tracker. That way, if someone is investigating the same problem, they are likely to find out that it has already been reported. And having a TODO mark a specific region of code can save time when we jump in to fix it.

I have also had situations where we decided "we are releasing in a month, cut corners and we'll deal with it later", and putting in TODOs greatly helped us deal with it later.

I use TODO often in my code. It's how I mark things as incomplete but not forgotten, which means that the code should not go into production until the TODOs are handled.

>There may be occasional exceptions.

I feel like comments belong in wrappers for unintuitive interfaces when a judgement call has been made to not replace those interfaces in-house due to project constraints. That's about it.

One big issue with code comments that hasn't been mentioned yet: they tend to run out of sync with the actual code.

That is, whenever changes to the program are made, you are now required to also update the comment but a lot of times that last step is skipped. One may ask what's worse: the absence of comments or _outdated_ comments that no longer match the code.

The fact that someone might not keep the comments up to date in the future is no excuse for not writing the comments now, otherwise we'd never write comments.

For hairy code, I'd rather have an out of date comment than none at all. I can recall a lot more times wishing code was commented than having encountered a stale comment. And especially today, when code is usually in a VCS where you can blame the lines, it's not that hard to double-check if a comment seems off.

So, I think the absence of comments is far worse.

If the code doesn’t function exactly as a comment suggests how do you go about determining if the comment is wrong (or out of date) or the code?

By the expected behavior of the program? By checking the VCS commit log/blame? How would you determine what to do for a misbehaving function with no comment? This seems like a hypothetical question to which the only answer I can give is: it depends.

“git blame” is what I usually reach for.

Its politically correct name is git annotate. ;)

I don't think PC had anything to do with it. Git blame and annotate were added at the same time, have slightly different output, and different provenances[1,2]. The differing names are because the commands were inspired from prior VCSs that had different names for the commands.

Also, from the git annotate man page[3]: The only difference between this command and git-blame is that they use slightly different output formats, and this command exists only for backward compatibility to support existing scripts, and provide a more familiar command name for people coming from other SCM systems.

[1] https://github.com/git/git/commit/cbfb73d73f272f194bafa70d5b...

[2] https://github.com/git/git/commit/c65e898754ef68a5520b279189...


This is less of a problem if you comment for the "why" rather than the "what" or "how". The "why" generally doesn't change much over time, and if it does, it's usually blatantly apparent that the comment makes no sense in the new reality.

It can be an incredible waste of time to code based on an outdated comment, especially since changes made after a method is created are likely both subtle and required by a new design version.

That said, I wonder why, e.g., Javadoc doesn't raise at least an informational message when a code block has changed but its comment has not.

Are there IDE features that support this?

It has been mentioned. See https://github.com/pingcap/community/blob/master/code-commen... :

> Make sure the comment is up-to-date

You aren't writing comments for yourself today. You may not even be writing them for your colleagues today. You're writing the comments for someone looking at the code in the future, probably without whatever context is in your head now.

Also, the most important thing you can document is the why, because given enough time, a good enough programmer will be able to figure out the what, regardless of how complex it is. But if you made an unusual decision based on something going on in your head at the time, maybe based on some data or testing you had done, no one will know that unless you write it down. This goes part and parcel with a good commit message, which is another place to write down the why.

If you haven't had the experience of looking at your own code a year down the line and scratching your head at some piece of it, well, you just haven't been coding long enough. Good comments and commit messages are like gold at those times.

I've seen so many awful standards in my time when it comes to comments. Most of the time it's NIH syndrome, or people who love arbitrary rules. People love to make/follow rules, but comments are too nuanced for absolutes.

I've seen the following... These are all sins in my eyes.

    // This code is the property of Blamo inc.

    // Returns the result
    return result;

    // The name of the entity
    public string Name;

    // author: @author
    // Date last changed: DD/MM/YYYY

    /// public property Name
    /// The name of the Entity
    public string Name { get; set; }

I always try to tell people: DRY also applies to comments.

Also, if you are going to supply me with a badly written, inconsistent style guide that was cooked up internally with no references or reasoning, I'm just going to download a popular one from GitHub and have all the tooling sorted out of the box. I don't negotiate with terrorists or religious nutcases.

You need a pretty damn good excuse to deviate from industry standards. Most of the time it is personal preferences and a need to be in control.

AKA. I care about standards, I just don't care about your standards.

Code commenting is an activity that's hard to get just right; it seems there's either not enough, or none at all or even worse - too much!

I've found the following works really well: when doing something non-obvious then comment it immediately. Otherwise wait to see what arises from the Amigo Review. If the reviewer asks any questions, answer them with comments. Once the reviewer stops asking questions then the code is sufficiently commented.

Not just when doing something non-obvious, but also when not doing something that might normally be expected.

“We don’t need to synchronize here because <some invariant>” and whatnot.

Basically, whenever a casual observer might think there’s a bug, either because you’re doing something unexpected or not doing something expected.
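A rough sketch of what such a "why we are not doing the expected thing" comment might look like. The `HitCounter` class and its method names are made up for illustration, and the invariant only holds on the assumption that the joined workers are the only threads calling `record()`:

```python
import threading


class HitCounter:
    """Hypothetical counter shared by several worker threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hits = 0

    def record(self):
        # Workers call this concurrently, so the update is guarded by the lock.
        with self._lock:
            self._hits += 1

    def total_after_join(self, workers):
        for worker in workers:
            worker.join()
        # No lock here on purpose: every worker has been joined, so no
        # other thread can still be writing to _hits.
        return self._hits


def run(n_workers=4, hits_per_worker=1000):
    counter = HitCounter()

    def work():
        for _ in range(hits_per_worker):
            counter.record()

    workers = [threading.Thread(target=work) for _ in range(n_workers)]
    for worker in workers:
        worker.start()
    return counter.total_after_join(workers)


print(run())
```

Without the comment in `total_after_join`, a reviewer could reasonably flag the unguarded read as a bug; with it, the invariant is stated right where the question would come up.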

I like this process, although I'm not as consistent as I'd like to be about answering in comments rather than in the review tool. As an addendum to this, I have a general rule for code review: the developer doesn't get to push back on any (correct) request for comment/renaming/clarification. I'm prepared to accept that the fact that somebody felt the need to ask something is proof enough that the code in question isn't as obvious as I thought.

I worked for a place that had drunk the kool-aid about comments being evil so hard that essentially none were allowed, ever.

The reason given was frequently "it distracts the brain when reading the code. It makes reading the code take longer."

I'm sure whatever effect the very occasional one line comment has on reading speed is negligible.

When someone cherry picks something someone said once at a conference and forces it on everyone with dogma, you're not going to get good results.

I generally agree with comments being evil. However, I tend to just automatically skip any comments when reading code, so it's not the end of the world.

What I don't understand is why, after all this time, we don't have a richer framework for code metadata. All of those design, commit, review and issue discussions are basically lost once they are closed.

They should be indexed together. There's no reason why we can't use tooling to make this problem not just go away, but be substantially better for everyone.

> a richer framework for code metadata

I believe SourceGraph (sourcegraph.org) is working on exactly this

Sourcegraph CEO here. Thanks for mentioning us! BTW, we are Sourcegraph.com (not .org...we will see who's squatting on that domain). And our plan for doing this (high-level) is https://about.sourcegraph.com/plan.

Very interesting work!

I'm a big proponent of intelligent tools with development.

I've encountered developers who look down their noses at people who use IDEs and their features, seeing it as somehow proving their lack of ability.

My argument is we're in the business of writing software. We should believe in the ability of software to make things better, including software development.

Also, I've often had a feeling that we could be doing more with our tools, different ways of viewing code to assist in understanding it and editing it.

I think developer experience is almost as important as user experience at the end of the day. And that UXers who understand coding should be hard at work making our tools incredible to use.

So best of luck with your mission!

What I have trouble with is high-level comments. I very rarely find myself wanting to add comments to an individual file but very often to an entire sub-system. Comments like "this package has classes related to Foo which is like Bar except Baz. Foos are populated from this location and are used with service Qux". I struggle to find appropriate places for these kinds of comments.

You can add a README.md file for a folder that you want "comments" for.

I'll second this approach. I started doing this recently and current me has already thanked past me for doing it.

Many languages have a place for these 'module level' comments. The ones that I know of:

- Java has package-info.java

- Scala has package.scala

- JavaScript/TypeScript has the @file annotation in the doc comment at the top of the file

- OCaml has the doc comment at the top of the file

When you want to document an entire sub-system, it makes sense to put the documentation in that sub-system's root module.

And Python has __init__.py in the module.

The most ignorant thing I've heard someone say about comments before is how it's the worst thing you can do to code. The idea was that code should always be self explanatory. Let's just say I left soon after.

  //  I use the syntax for single-line comments
  //  even for multi-line comments,
  //  so that it's easy to temporarily comment out
  //  a block of code

Found somewhere:

  Comments are called for!
   Why not in haiku format?
    That will keep them short.

Isn't it easier to put /* and */ around the block than to put // at the start of every line?

The problem with block comments (e.g., /* and */) is that when you nest them inside each other, it's not always obvious what should happen. Of course, you shouldn't do this anyway, but line-style comments remove this vector of issues.

Most editors have a simple shortcut that adds // to each line in a block. It's faster when you want to uncomment certain lines in between instead of having to update the comment blocks every time.

It's striking how some of the most legendary programmers we know of, all say this one thing: comment your code. And yes, I realize that there are plenty of others who are not crazy about comments. But I'm pretty comfortable siding with Don Knuth on this one.

- Donald Knuth invented an entire system of programming ( http://www.literateprogramming.com/ ) just to be able to write better documentation for his code

- Jamie Zawinski: 'I always wish people would comment more, ... You’ve got to say in the comment something that’s not there already. ... what is this for? Why would I use it?'

- Brendan Eich: 'It’s at the bigger level, the big monster function or the module boundary, that you need docs. So doc comments or things like them—doc strings. ... There is something to literate programming, especially these integrated tests and doc strings. I’d like to see more of that supported by languages.'

- Dan Ingalls: 'As soon as I have it working, I’ll write some comments. And if I like what I’ve done or it seems like it would be hard to figure out, I’ll write more comments.'

(Quotes from Coders at Work, Peter Seibel)

Hello, I'm commenting on an unrelated thread, but would you still go by your statement here (https://news.ycombinator.com/item?id=15113715) and claim that BuckleScript is better than Scala.js? How much Scala.js have you written?

I wrote a fairly large Scala.js app around the 2016–2017 time frame. I used David Barri's scalajs-react and the Scala.js Material UI bindings. Keeping in mind that compile times and optimizations may have improved since then–even so I would say BuckleScript's combination of language power, compile speed, size and quality of output JS, and JavaScript interop are quite simply the best in the business. In my mind there is simply no comparison. There is a quite well-paved path for writing React apps in BuckleScript/Reason by now, and even back in late 2016 I played around with the quite young BuckleScript project and immediately saw how much simpler it would have made things. If I ever have that kind of power, I would certainly push for it :-)

Thank you!

Remember to say WHY, not what!

> use American English

What is the reasoning behind this?

Consistency, probably. American English is the de facto standard language of programming. I write prose in Canadian English, erring toward British English when it's ambiguous. I write code in American English, because that's what most people expect. But I write code comments in Canadian English, because comments are prose, unless it refers to a variable name or something.

Someone being pissy that the British write (for example) centre and not center.

A bogus argument in my opinion.

I assume because the PingCAP projects are owned by PingCAP, an American company.

I assume the goal is consistency in spelling. Choosing American over British is probably just personal preference.

Most code uses American English anyway (e.g. "background-color" in CSS), so it's usually a good idea to standardize around it.

I cannot write Americanised versions of words without incredibly conscious thought about doing so, which breaks my train of thought. I find it easy to understand what is meant by color, so why shouldn't our cousins over the Pond be able to understand what a colour is? (stupid example, but you get the point)

I used to be afraid to comment things because I thought everyone was smarter than me and by commenting what is surely obvious to them, I'm revealing how incompetent I really am.

Idea: every comment should have an assert attached to it. If the assert becomes false after a code change, your IDE warns you and you must rewrite the attached comment.

You can't express every comment as an assert statement. And when you can, then what's the point of the comment?

I don't think

    // x should never be 0
    assert x != 0

is all that different from

    // set x to 1
    x = 1

The point is not to reduce the comment to the assert, but to detect when the comment is outdated. If a comment is referring to another function in the code, then assert would check that the function exists. If you refactor the code and remove the function, you'll have to refactor this comment as well.

A docstring describing arguments to a function would check that the list of arguments to a function is exactly as described in the comment. If you add a new argument, you have to edit the docstring.

And so on.

That actually sounds interesting, but I don't see why you'd need an assert for that. If you mark up the comment appropriately so the compiler or a linter sees when you mention another function and can identify where you describe the arguments in a docstring, it could do these checks statically.

In fact, I'm pretty sure Javadoc and other tools already have that markup.
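A minimal sketch of the docstring-versus-arguments check discussed above, written in Python rather than Java. It assumes parameters are documented with Sphinx-style `:param name:` fields; the helper `undocumented_and_stale_params` and the sample function `resize` are hypothetical names invented for this example, not part of any existing tool:

```python
import inspect
import re


def undocumented_and_stale_params(func):
    """Compare a function's real parameters with those named in its docstring.

    Assumption: parameters are documented with Sphinx-style fields.
    Returns (missing, stale) as two sorted lists of names.
    """
    actual = set(inspect.signature(func).parameters)
    doc = inspect.getdoc(func) or ""
    documented = set(re.findall(r":param\s+(\w+):", doc))
    missing = sorted(actual - documented)  # in the signature, not in the docstring
    stale = sorted(documented - actual)    # in the docstring, not in the signature
    return missing, stale


def resize(width, height, keep_ratio=True):
    """Resize an image.

    :param width: target width in pixels
    :param height: target height in pixels
    :param scale: leftover from an older signature
    """


print(undocumented_and_stale_params(resize))
# (['keep_ratio'], ['scale'])
```

A linter or test could fail the build whenever either list is non-empty, which would catch the "added an argument, forgot the docstring" case without any runtime assert.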

# Markdown Literary programming that doesn't break the syntax of any programming language [1]

## Comment Area Markup Method

In literary programming, programming comes first and literary comes second.

The main purpose of the code comment area markup method is to allow live preview directly in the code editor's preview panel, without exporting or any preprocessing.

Just add a line comment character of the programming language before each line of Markdown.

In the comments of code, you can draw flowcharts and task lists, display data visualizations, etc.

The method is to add extension instructions in any programming language comment area:

- markdown

- manual eval code, live eval code, print result, display data visualization and other directives

When previewing or converting a format, you only need to simply preprocess: delete line comment characters with regular expressions, example: `sed 's/^;//' x.clj`


- line comment character of Clojure(Lisp) is `;`

- line comment characters of the current file type can be obtained from the editor's API.
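As a rough illustration, the same preprocessing step as the `sed` one-liner above can be sketched in Python. The function name `strip_line_comments` and the sample source are made up for this example; like the `sed` command, it removes only a single leading comment character per line and leaves code lines untouched:

```python
import re


def strip_line_comments(source: str, comment_char: str = ";") -> str:
    """Drop one leading comment marker from every line that starts with it."""
    pattern = re.compile(r"^" + re.escape(comment_char), flags=re.MULTILINE)
    return pattern.sub("", source)


clojure_source = "\n".join([
    ";# Greeting",
    ";Prints a greeting.",
    '(println "hello")',
])

print(strip_line_comments(clojure_source))
```

Passing a different `comment_char` (for example `"//"`) covers other languages, which is the role the editor's API plays in the description above.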

When we edit the code, we can preview the effect in real time. Editing literary code comes with a live preview panel, as in most Markdown editors.

## Advantages

- fast, live, simple, no interference.

- It doesn't break the syntax of any programming language, so you can compile directly. The comment area markup method can be applied to any programming language and any markup (including Org, rst, asciidoc, etc.), which is its greatest advantage.

- You only need a single line of code to delete line comment characters using regular expressions; then you can use any Markdown parser or converter.

- Supports any code editor that supports Markdown live preview, allowing the source code of any programming language to become rich text in real time. In the code's comment area, you can use Markdown to draw flowcharts, tables, and task lists and to display images in the live preview panel, enhancing the readability of your code.

- If you extend the Markdown tags, you can implement eval code, print result, display data visualization, and other instruction tags to achieve live programming and live testing.

- When writing (or reading or refactoring) code files, you can modify and live preview directly in the editor without exporting or any preprocessing.

- Reliable. Maximum code accuracy is guaranteed, and markup language errors do not affect the code.

- It doesn't interfere with anyone reading the code. Markdown is simple, so even without syntax highlighting it has little effect on writing and reading. A gray comment area doesn't affect reading the code either, especially for people who don't understand the markup language. The strict distinction between Markdown and code, together with the gray comment area, reduces the amount of information a reader has to take in from the source file, which helps when reading code.

## Disadvantages of traditional literary programming

- Because the users of traditional literary programming are mainly technical writers, speakers, and maintainers of technical documents, its style puts the document first. This greatly increases the amount of information in the code and interferes with reading it. It is unfriendly, or even unreadable, for programmers who don't practice literary programming, so it has seen very little use in the field of programming.

- Not universal: tied to specific programming languages and markup languages.

- Requires a complex pre-compiler.

- Complex to use and high learning costs.

- Not intuitive.

Therefore, alongside the document-first genre of traditional literary programming, the method described in this paper introduces a new genre, the code-first genre, so that literary programming can be used as widely as possible in the field of programming.

