Remove Half of Your Documentation (orsinium.dev)
52 points by nalgeon 7 months ago | 83 comments



The post talks (mainly) about documenting code, but what I usually miss is documentation about the whole repo/project/service. For any piece of code, I can manage to figure out what's going on (unless it's written in a deliberately deceptive way), but to understand how multiple modules work together (or why), to understand the purpose of the whole thing, that's rather difficult or impossible to know just by looking at the code.

So a big README.md file explaining the repo (why it exists, who are the interested parties, etc.) is better than the "little" documentation here and there about types, functions, examples, etc. Obviously I would love to have both, but the former is more useful than the latter imho.


I usually add in:

* How to build it

* How to install it

* How to run it

* What other software or special hardware it needs

You know, all the stuff I'll forget when I come back to the code after six months and would rather not have to figure out anew.


Don't forget:

* What it does

* What it's for


Yes, that too. :) The point is to document the sorts of things you wish had been documented for you to orient yourself when you have to pick up a strange code base. Because that person is you in six months.



It only works in monorepos, what about distributed repositories/services then?


If you make a separate repo for every five lines of code, you can make another one for the docs.


You probably still have some entry point/example client that uses those repos/services


You've already lost, best recourse in this case is to `rm -rf ~/repositories/`. Then make yourself a cup of tea, take a deep breath, and start over.


and how do you manage the different lifecycle of services (versioning/build)?


"Remove half of the signs, they become outdated whenever we change the roads"

Software is written by humans for humans, and documentation is a fundamental part of it, and needs maintenance just like the rest. It is beyond me why one would want to maintain only half of the thing and say "discard the rest"


Which specific points in the post did you disagree with? "Just keep the documentation up-to-date" sounds about as practical as saying "just don't write bugs."

The impact and importance of documentation varies from project to project, but it is hard to see these hard-line takes on the importance of documentation as anything more than grandstanding. In the real world, where consumers and businesses exchange money for goods and services, and where organizations, individuals and teams must wrestle with market conditions, deadlines, KPIs and so on, documentation is a supporting act.


> "Just keep the documentation up-to-date" sounds about as practical as saying "just don't write bugs."

And that’s why nobody ever bothers with type checkers, code reviews, unit tests, CI and fuzzing in order to prevent, detect and fix bugs before shipping to production. You’re right, it’s totally impractical to apply any discipline to software development.


The idea isn't "put no effort into keeping documentation up to date." It's to acknowledge that, in the real world, documentation does go out of date despite the best efforts of project maintainers. Sticking our fingers in our ears and saying "just keep it up to date then" is simply not a real position.

Despite a huge volume of tooling and methodology around reducing bugs, bugs still exist. This is a reality. One with associated costs and mitigations that we acknowledge. Likewise, the article is suggesting we acknowledge the reality that documentation does go out of date, even when effort is being put into maintaining it. That's why the point immediately preceding "It gets outdated" is "It requires maintenance" ;)


And that's what the post is about. To avoid writing bugs, you add type checkers, unit tests, DRY and other principles that you sometimes ignore (once you're experienced enough).

Clearly, you're talking about the same thing. TFA basically says 'to keep the documentation up to date, you need discipline, here's what I do'. Unironically, the 10th point is one I never thought about, and will immediately apply


Honestly out-of-date documentation is most often better than no documentation at all


> Remove half of the signs

This is unironically a valid approach adopted in many places. That's the basic philosophy behind putting curves and road blocks to lower driving speed instead of imperative signs that people ignore.

Or making sure there's visibility in a turn instead of putting obnoxious warning signs etc.


I am all for improving the conditions so that signs can be removed (and this is valid with code too, e.g. a strong type system removes the need for a lot of documentation), but again we should not advocate for not documenting things, that's just asking for trouble.

And speaking of signs, we definitely don't have a good balance there either - in some places I experienced confusing situations where adding a few more signs could have helped.


Not to make it too much about road signs, but just plain removing signs to force drivers to think through confusing situations is a thing:

https://bigthink.com/the-present/want-less-car-accidents-get...

The same can of course apply to code: there are many instances where someone will assume they understand a tricky bit of code because they read a well written comment. That's a bad sign, as the code probably had a lot more nuance that was not covered by the commenter. And then did they properly understand what the comment is saying? We now have a two level readability problem.

In general, I sympathize with the approach that making it easier for people to understand shouldn't happen in comments. It's frustrating for the people reading the code, but we're paid professionals, and frustration is better to me than lack of understanding or glossing over the actual behavior (I'd trade fewer bugs for hurt feelings any day)

Comments should bring additional info that complements the code in a meta way (link to bug reports, design documents, discussion threads etc.)


This ^

Update the road and make it safe and clear to use before removing the signs. Don’t tell road builders that they can avoid placing signs because most road builders don’t build clear roads.


Here in New South Wales, Australia, we switched to European-style alphanumeric route numbers about 10 years ago. You can still see signs with the old-style route numbers (just numeric, and inside either a shield symbol or a hexagon symbol) in more than a handful of places around Sydney.

If you look hard enough (mainly confined to secondary rural roads these days), you can also still find signs with distances in miles. Australia switched to kilometres about 50 years ago.

So, yes, updating all the docs is hard!


I am sure that people who design roads worry about distracting drivers with too many signs. In any case, when one sees five road signs in twenty meters of road, one is lucky if one of them registers. It is very unfortunate when similar worry does not occur with too many comments. It is quite possible to completely clutter out of view anything meaningful by writing too many comments.


> I am sure that people who design roads worry about distracting drivers with too many signs

I wish this was the case where I live.

You want, as a driver, the roads to be boring, as you are probably much better at recognizing patterns than reading signs.

But no, in Germany, they are busy optimizing for something I'm yet to discover (neither flow nor security, that I'm sure of) and every !! city is a new adventure with 1000 signs at every corner.


When the signs say "this is a road" every 20 feet, there's not much point to them.


Sure, but then the advice we should give is "please please document the WHY", rather than "stop commenting"

In any case, I personally find it very useful when I come to a new piece of code to find comments that summarise what's happening in a few words without having to parse 10-100 lines of dense implementation - I only have so much cognitive ability I can apply in one day, and any effort to optimise its usage should be welcome


Those signs usually say the current kilometer you are at on the road, which is extremely useful for reporting accidents or damage to the road or surrounding structures with great accuracy. This was especially relevant before everyone had a GPS receiver in their pocket, but is still useful: it's still easier for police or firefighters to go to X km on some road than to a GPS waypoint, as it's easier to communicate verbally and more likely that people will immediately know how to get there. Not sure about every 20 feet though; different countries have different distances (every 100 meters in some).


When the road doesn’t have a sign to say where it’s going or how fast you can go you’re ducked


I've just learned a new English word: Delineator. Maybe you should too.



That's true, sorry, I didn't want to sound that aggressive.

Let me try again: have you ever experienced the importance of delineators when driving?


Actually no, I'm a city boy and use public transport. But delineators are not what I'm talking about, rather signs which say "this is a road" every 20 feet, those don't exist since they would be entirely pointless, like comments which mirror (hopefully) the line of code that follows.

There's a thing that happens when you start coding, you put comments next to the sharp corners to remind you what it does (because your code is obscure and non-obvious, you've just started doing this), then you want it to look tidy and professional, and occasional comments look unbalanced, asymmetric, so you add comments on the obvious bits to give a nice uniform look to it. You end up with

  // say hello
  printf("This is foo (version %s)\n", VERSION);

I'm as guilty of this as everyone else. The OP is saying try to avoid that, comments have a maintenance cost.


Oh, I'm not talking about comments, but literally road signs.

I've seen plenty of these comments myself in various states of correctness and usefulness (from right and helpful, to redundant, to wrong because of a change, to having always been wrong) in the last (almost) 20 years. Some of them were my own.


Until it snows a lot.


I fear these posts lead to the somewhat conscious "misunderstanding" that "documentation sucks and my code is self-documenting, I even have annotations..", similar to the "we are agile, no up-front planning is needed since it always turns out wrong".

The number of repos (private) I've stumbled upon where only some documentation exists on how to build it, but not even a word on what it is, what it does and where it fits, is quite high.


>The number of repos (private) I've stumbled upon where only some documentation exists on how to build it, but not even a word on what it is, what it does and where it fits, is quite high.

When you find something consistently sucks about real world projects, it is a signal that these businesses and/or the markets they operate in value things differently to you. It is possible they were all wrong in a very consistent and obvious way. Or it's possible that you are missing something.

As developers, we spend a lot of time with code and software internals. We are directly exposed to, say, the frustration of onboarding onto new projects with spotty documentation. This makes us slightly delusional about how important it is in the grand scheme of things.

Ceteris paribus, I'd bet on a business with lots of code and barely any documentation over a business with lots of documentation and barely any code.


Documentation for the product I'm working on recently hit 3k pages PDF (it's split in 6 volumes).

Most of this documentation was created because there was no product management. No, really... some customer-facing guy would come and talk to the guy who he knew from R&D, and they would cook up some stuff and add it to the product. And then QA or the customer would start pulling their hair because of how badly this stuff was designed. Then, because the bogus piece of code was already released, the documentation would "catch up" by listing all the gotchas.

Another way documentation was written was when there wasn't even time to roll out the piece of code as in the example above, and instead of writing the code to do something the documentation would be augmented with instructions for the customer on how to do that themselves. Most notably the whole upgrade thing is given to the customer as documentation, where in no fewer than 30-odd steps a customer may upgrade or totally ruin their multi-million production system.

Splitting documentation into multiple volumes (by subject) had an interesting effect of increasing the overall volume substantially as some subjects had to be repeated across multiple files.

As a cherry on top, we also have a handful of diagrams in this documentation which are rotated 90 degrees. Because they didn't fit on a vertical page of PDF designed for printing. Computer users be damned!

All in all, our documentation doesn't serve the purpose of educating users on what they can do with the system. It's a liability mitigation. Deliberately written in passive voice, it makes it hard to guess whether the system or the user is responsible for carrying out the action. And spreading the contents out across multiple chapters in multiple volumes makes it more likely that the user can be led to believe that instructions were always there, but they never found them. Or that some defects were documented, but they didn't know better than to read the section where they were documented.

On the other hand, the documentation that is supplied interactively with the system (e.g. for the command-line tools or GUI wizards) is very terse and usually simply repeats the name of the command or the title on the button.

BTW, we also have a customer support division that's bigger than any R&D department :)


Good to see the churners of modern software development have now convinced themselves that documenting their code is a problem because it requires maintenance. I guess it becomes easier to churn and break things when you leave no record of what your code is supposed to do. You don't need to worry about violating guarantees or expectations when your users have no guarantees!


It's not just that it "requires maintenance". The maintenance itself is duplicated effort, which is often not done well.

So you spend countless hours converting a perfectly acceptable language (code), into "plain english". The result is often so poor, and no better than the code, that nobody uses it (for good reason). Then what's the point of keeping the doc? So you just have these old relics, ancient books that provide no insight whatsoever, but nobody wants to delete.

IME a lot of older engineers are still stuck on this fantasy that we're going to eventually do all engineering "in plain english". Some of these ideas have thankfully died off, but 10-20 years ago, a lot of these people were running amok making crazy demands that everything read like a book. Do technical work without doing it, basically.

This is a pipedream that's existed as long as software. Maybe we'll get there someday, but it seems like some things are moving more technical and less "intuitive". For instance, many of the same older engineers seemed to assume we'd stop using the command line.

It seems that most of us have abandoned this, but I've encountered a few that assumed we'd just be past this by now. I can't completely fault them for this instinct.


I find that documenting the Why with code comments helps. Keep it concise.

Something like:

// We no longer allow deleting users data immediately because we are required by law X in jurisdiction Y. See ticket #3256149
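
A minimal sketch of how a "why" comment like that might sit next to the code it explains. Everything here is hypothetical and only for illustration: the `User` class, the `request_data_deletion` function, and the 30-day window are assumptions, and reading the law as a retention requirement is a guess. Only the wording of the comment comes from the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical retention window; the real value would come from the law in question.
RETENTION_PERIOD = timedelta(days=30)


@dataclass
class User:
    name: str
    deletion_due: Optional[datetime] = None


def request_data_deletion(user: User, now: datetime) -> None:
    # We no longer allow deleting user data immediately because we are
    # required by law X in jurisdiction Y. See ticket #3256149.
    user.deletion_due = now + RETENTION_PERIOD
```

The code already says what happens (deletion is scheduled, not immediate). The comment carries the part the code cannot: the reason, and where to find the discussion.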


I wonder why I don't hear more people talking about how Copilot et al have just totally changed the dynamic around writing comments in one's documentation. When a 1 line English language comment generates 5-12 lines of code better than most undergrads can write, of course you'd expect people to start writing more of these kinds of comments.


In the context of this article, I think Copilot is a big contributor to useless comments. If you have already written a block of code, and then go back and start writing a comment for it, the suggestion is almost always a naive and useless repetition of what’s happening directly before, much like the "# create a user" example given in the article.

I wish the comment suggestions could be disabled completely in the VSCode extension but it’s currently not possible: https://github.com/orgs/community/discussions/8062


They're not useless to me. I like seeing what the author was thinking in plain English.


For every function, it's great to have a plain language comment explaining What it does and Why it does it. You can also include How it works (if it's sufficiently complex), and When you expect it to get called if you want. If you care at all about your future self, or others who have to look at your code, the best way to help is to leave a verbose history.


Useless blanket advice. Of course you should write meaningful comments.


Not writing dumb/ephemeral comments is another thing altogether.

In particular it's easy to require comments everywhere. It's a rule people don't have to think much about, and most linters will provide out of the box rules to flag methods and class variables without comments.
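
To show how little thought such a rule needs, here is a rough sketch of a "everything must have a docstring" check using Python's standard `ast` module. It is not how any particular linter is implemented; the function name `undocumented_definitions` is made up for this example.

```python
import ast
from typing import List


def undocumented_definitions(source: str) -> List[str]:
    """Return the names of functions and classes in `source` without a docstring."""
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # ast.get_docstring returns None when the body has no leading string.
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing
```

Note that the check only asks whether a docstring exists. A docstring saying "gets the height" on `get_height` passes just as well as a useful one, which is the problem with enforcing the rule mechanically.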


OK:

   # remark on uselessness of blanket advice
   Useless blanket advice.


This is EXACTLY why it should be there. Because in the real world, after 2 years, 3 programmers and 2 managers, it will become:

   # remark on uselessness of blanket advice
   if (!translation_db_connected) {
       throw "Cannot connect to translation database";
   }
   check_permissions();
   if (Date.week() % 4 === 0 && !is_admin()) {
       echo_xf(i18n.monthly_maintenance_warning);
   }
   echo_bf(i18n.context_translate('remark458'));
   check_dangling_block();
   if (WM_API >= 7) {
       nxptr_free(i18n.handle);
   }

And suddenly the useless comment is the only thing that makes sense


Or the useless comment is the only thing that doesn't make sense. Which is far more common in my experience.


Quite, according to the Excel spreadsheet, "remark458" is "it takes many a mickle to make a muckle", that must be a typo for "remark485", oh, or could be the comment is in the wrong place. If only there were a single source of truth.


Your problem is that you wrote a bad comment, not that you wrote a comment


This is not so very 'of course'. There are lots of documentation nazis out there forcing you to document that get_height 'gets the height'.


In a language without units of measure, `get_height` should be documented!


Indeed. We've just been through this a few days ago here with a customer. We have fields for gross and net weight, GrossWeight and NetWeight, and the official systems which we talk to expect this to be in kg, so "of course" our fields would be in kg as well.

Well, nobody documented units in the file exchange documentation, so the customer sent us weights in grams because that's how they stored it.


Lockheed Martin didn't get the memo:

https://en.wikipedia.org/wiki/Mars_Climate_Orbiter


For languages with types, you can (and absolutely should) use types for your units (rather than raw double/float/int) and encode your conversions into your types. You can thank me later.
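
A small sketch of the idea, using the kg/grams mix-up from upthread. The `Grams` and `Kilograms` classes and `record_gross_weight` are hypothetical names for illustration. Python does not enforce annotations at runtime, so this assumes either a static checker such as mypy or the explicit `isinstance` check shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grams:
    value: float

    def to_kilograms(self) -> "Kilograms":
        # The conversion lives in the type, not in a comment.
        return Kilograms(self.value / 1000)


@dataclass(frozen=True)
class Kilograms:
    value: float

    def to_grams(self) -> Grams:
        return Grams(self.value * 1000)


def record_gross_weight(weight: Kilograms) -> float:
    """Accept a gross weight; the signature itself states the unit."""
    if not isinstance(weight, Kilograms):
        raise TypeError("gross weight must be given in Kilograms")
    return weight.value
```

A caller holding grams now has to write `record_gross_weight(Grams(2500).to_kilograms())`, so the unit mismatch surfaces at the call site instead of in production data.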


Specifically in languages with types and zero-cost abstractions. If it's not zero cost, then… there's a cost, which must be weighed.


get_height_in_inches


My point is that it's extremely common for people to go "this doesn't need to be documented", and then I come along as a user of the library and have to actually run the thing to determine its semantics.


get_acceleration_in_m_per_s_squared?


I get suspicious whenever I see some undocumented functions when others are documented. The question is whether it has been deemed too trivial to document or whether the original documentation had been wrong and removed (because for example `get_height` does not just return "the height").


In all the projects I worked on, the comments disappeared after a few years, or were removed thanks to OP's method. And the code became unreadable.

I have never seen code that was self-documented. It’s always a mess because the devs are too lazy to write comments.


I’d gladly enforce documentation on every piece of code, even if that means some dumb documentation in some places


get_height_in_cm ?


"Why don't we just filter out the noise?" - asked the signal processing student to the teacher.

"If we knew what was the noise and what was the signal we wouldn't call it noise in the first place!" - and the student reflected silently.

I predict most people will remove the wrong half of their docs if they follow this advice.


If you can identify the half that was written to have been written, rather than to be read, yes, delete it. Nobody is getting any value from it anyway.


Documentation, like a car's turn signals, is about conveying intent.

Just because many people don't use their turn signals (or use them poorly — like the interstate driver who's still had their turn signal on for the last 10 miles), that's not an argument for removing turn signals and letting people just figure out that you're turning at the moment they need to slam on the brakes.

If I'm remediating an incident, I don't want to guess what the code is _supposed_ to be doing. I want to _know_, so that I can spot the problem.


I agree on the content but disagree with the title.

If you write tons of doctested documentation, that's a good thing?
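
For anyone who hasn't used doctests, a minimal sketch with Python's standard `doctest` module. The function `kilograms_to_grams` is a made-up example, not something from the article.

```python
import doctest


def kilograms_to_grams(kilograms: int) -> int:
    """Convert a whole number of kilograms to grams.

    >>> kilograms_to_grams(3)
    3000
    >>> kilograms_to_grams(0)
    0
    """
    return kilograms * 1000


if __name__ == "__main__":
    # Runs every example found in this module's docstrings and reports mismatches.
    doctest.testmod()
```

If someone later changes the function so that the examples no longer hold, the doctest run fails, so this kind of documentation cannot silently drift out of date the way prose does.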

Imho, there is never enough documentation. Even if it's outdated, it's still better than reading directly the source code and guessing everything from scratch.


> Imho, there is never enough documentation. Even if it's outdated, it's still better than reading directly the source code and guessing everything from scratch.

Hard disagree here. If the docs differ from the code, I'm left to figure out which is right.

Too much documentation often leads to people writing pseudo code in English. We don't let you just copy and paste blocks of code into other blocks, why would we let you copy code into English? It's the exact same problem.

Documentation should be expressing something you need to know about the code, not narrating it line by line.


>"If the docs differ from the code, I'm left to figure out which is right."

That's a good thing! It makes you think about what is actually the truth. If there was only code, you could easily get the wrong assumption silently. You wouldn't be confused, but you would also be wrong... When the code and documentation don't match, you have to expend brainpower to figure out what's going on. This is a good thing, it also helps you learn :)


No, it means the person who made the change didn't do their job properly.


Please do not follow this advice. Once you become good at writing clear code, then seek ways to reduce documentation by using conventions, clearer organization and naming, etc. But you need to be a documentation master level 99 before even attempting to do that.


Thanks for mentioning https://github.com/orsinium-labs/arguard - didn't know that tool


The points in that post are all valid. Lots of docs are just a waste of letters.

Document the intent.

Why did you decide to spend the time to write this code? What shortcuts were taken (what the code doesn’t do!), and why!


I've seen a lot of teams ask developers to write duplicate documents. The supplier provides decent docs, and I guess they want you to read the document and tell them what matters for our project.

Perhaps it's not that crazy a request, but you will likely have to use the supplier docs anyway. I'm basically being asked to make a worse version of the document the vendor gave me.

These are not crazy long documents either.


I expected to mostly agree with general statements that would have added nothing to my thinking, but this post's quality is better than average and I managed to get some insight, ideas and links to tools I didn't know.

It's not the greatest or anything, but I want to point it out as most articles I read recently were good but 'unactionable' (if this word exists).


Keeping docs up to date could be a job for an AI.

You wrote your documentation for a specific version. The AI could detect a code change and whether it affects the documentation. Not 100%, but it could help to find the outdated docs.

In huge projects/docs it is not so easy to detect outdated documentation, especially when a team writes the docs. I am excited to see what tools we get soon for this.


Can the AI detect intent, and understand what the customer wanted? Or whether a function properly implements a requirement?

Because that’s what should be in the comments.


Single-handedly the most important documentation is HOW to use the API to do things. Having lots of examples helps.


Stupid and dangerous advice, too little documentation ALWAYS creates more problems and loss of time than too much

Figuring out completely missing information is harder than understanding even what a bad comment really meant, and especially what an outdated one did (git...)

Of course the right thing is indeed to use code as expressive as possible, but then also to document carefully every useful thing that can't be explained by the code, and to keep the documentation up to date.

Yes, the current tooling makes it harder than it could be to notice, find and track the documentation you need, but dealing with that is on average a lot better than omitting information


This is a hard proposal for most (in my experience), but:

Documentation first and living documentation.

As in:

1. Process is documented before it’s implemented. This will of course be at least slightly incorrect as it won’t know about edge cases yet.

2. Each person who implements has the ability to (quickly and easily) modify the documentation. This might be done in a “needs approval before changes are merged” way depending on the level of trust the org has in the person.

3. All people are trained to go to the documentation first and to go through it as they implement.

The end result is documentation that gets better over time and is the always up to date single source of truth that anyone can use.


Documentation is like lines of code. A useful tool but if you have too much of it, I start wondering why we need so much. Make your abstractions less leaky if you can, make your API higher level, and you don't have as much to document.


Done. I've also removed two-thirds of it and 99%


The pointless comment is definitely going to get worse with copilot, where you write a comment to prompt the generation.



