
Code only says what it does - mjb
http://brooker.co.za/blog/2020/06/23/code.html
======
dkarl
_This is a major problem with code: You don't know which quirks are load-
bearing. You may remember, or be able to guess, or be able to puzzle it out
from first principles, or not care, but all of those things are slow and
error-prone._

This is a problem from both the negative (not breaking things) and positive
(knowing how to add things) perspectives. The positive perspective was written
about by Peter Naur in one of my favorite software engineering papers,
"Programming as Theory Building," in which he describes how the original
authors of a codebase have a mental model for how it can be extended in simple
ways to meet predictable future changes, which he calls their "theory" of the
program, and how subsequent programmers inheriting the codebase can fail to
understand the theory and end up making extensive, invasive modifications to
the codebase to accomplish tasks that the original authors would have
accomplished much more simply.

I highly recommend finding Naur's paper (easily done via Google) and reading
it to understand why divining the "theory" of a codebase is a fundamentally
difficult intellectual problem which cannot be addressed merely by good
design, and not with 100% reliability by good documentation, either.

~~~
hinkley
I think I have to disagree with Naur on this, in that people using the
Scientific Method don't ship their theories, but we do.

As a scientist who has just succeeded in testing a hypothesis, I now need to
go back and document a simplified series of steps that should lead any
independent party to the same phenomenon. Once we are on the same page, they
can confirm or refute my theory based on their own perspectives on the problem
space.

During that process I may discover that I based half of my experiment on
another hypothesis that I never tested, or was plain wrong. Now I've
discovered my 'load bearing' assumptions. I may discover something even more
interesting there, or I may slink away having never told anybody about my
mistake.

Essentially, scientists still 'build one to throw away'; we haven't in ages.
And my read on Brooks's insistence that we build one to throw away is that it
was aspirational, not descriptive. And notably, he apparently recants in the
20th anniversary edition (which is itself 25 years old now):

> "This I now perceived to be wrong, not because it is too radical, but
> because it is too simplistic. The biggest mistake in the 'Build one to throw
> away' concept is that it implicitly assumes the classical sequential or
> waterfall model of software construction."

So we are very much at odds with the scientific method. And we have the
benefit of hindsight. We have seen the horrors that can occur when you take
the word Theory out of context and try to apply it to non-scientific theories.
We should learn from the mistakes of others and summarily reject any plan
where we do it too.

In other words: next metaphor, please, and with all due haste.

~~~
Jtsummers
I think I have to disagree with you on this one; I've used the scientific
method (though not in an explicit, checkbox-y way) plenty of times to ship and
debug code.

In particular, as I've said on this forum many times, I work primarily on the
maintenance end of software. I don't know what the creators or previous
developers were thinking, especially with more recent projects (documentation
quality has really gone downhill; people call autogenerated UML diagrams
"design docs", but without commentary those only reflect the state of the
system, not its design). I have to try different changes based on my
understanding of the system and see the consequences. That is, I form a
hypothesis about what will happen if I do X, I do it, I collect the results
and I've either confirmed my hypothesis, refuted it, or left it in an
indeterminate state. I form another and repeat. Over time I build up a model
(theory) of how the system behaves and should be updated/extended. Since I
can't keep tens of thousands of lines of code in my head, let alone hundreds
of thousands or millions, I always only have a model (theory), because I never
have the totality of it in my mind. Though good code, with good use of
modules, makes it easier to keep large chunks in mind, I still have to have a
model of how those modules work and work together.

Hell, this is half (or more) of testing for older software systems. You put in
some input and see if you get the output you expected. If you don't, you
evaluate why (is my model wrong or is the system wrong) and repeat.

~~~
hinkley
I don't mean 'use' as in a #7 torx wrench. I mean 'use' as in air.

I have shipped bug fixes using organized hypothesis checking as well.
Especially sanity checks (make sure the instruments are working). But it is
not the software developer's default behavior, and I'm sure you've lamented it
just as I have. You and I are tourists, and many around us aren't even that.
So when we speak of whether 'we' apply formal rigor to our work? Is it still
rigor when there is no discipline? I don't think rigor is something you do on
a random Thursday. It's something you do all the time.

So no, 'we' do not _use_ the scientific method. We dabble.

And so when someone like Naur tries to summarize software with a line about
theory building, he's not speaking about everybody. If he were honest he might
not even be speaking accurately about himself.

ETA: But he's talking about the long arc, not a single bug fix. That we are
circling in on what the actual problem is and feeling it out with code. But
since we stop at "if it ain't broke don't fix it", we never actually
crystallize the thing we built. We never test the hypothesis we suppose that
we have created. We have spot checked this organic thing that never gets
pinned down and might actually be DOA. We hope the evidence we are wrong is
just 'glitches' or problems with the user's machine. Until someone comes to us
with a counter-proof that shows unequivocally that we were wrong.

Which leads to problems like those mentioned in this comment tree.

------
ris
So many times this.

"Clear code shouldn't need comments" \- clear code can make it easy to see
_what_ but it can never say _why_. Let me know what corner cases you thought
about when you wrote this.

"The comments are in the commit messages" \- almost nobody _ever_ goes looking
for them there, they're effectively invisible from `git blame` when they
_remove_ lines, people rarely make fine grained enough commits to be able to
target specific lines or blocks sufficiently with context.

"Nobody ever updates comments, so they're always out of date" \- don't hire
such people. It is an crucial task _resolving_ the meaning of comments to make
sure everything still makes cohesive sense. Neglecting to do this will often
lead to commits that don't quite grok any subtleties of the original design.
Don't make the _reader_ of the code do the job of trying to piece together the
scattered history of 5 different people's intentions. Of course, it's also
useful to try to keep comments as close to the code in question as possible so
that references which need updating are obvious to see.

~~~
sanderjd
> _almost nobody ever goes looking for them there_

I've seen this claim a number of times and it's always so odd to me. One of my
most common activities each day - certainly more common than the activity of
writing new code - is reading the commit history for different files. It's
always surprising to me to hear that this is an uncommon thing to do.

Edit to add: But I also think comments and documentation of all kinds are
good. I don't advocate good commit messages instead of comments, but rather in
addition to comments. The more documentation the better.

~~~
aflag
Which tool do you use to view commit messages and revisions for each file? I
think one of the reasons this is uncommon is the lack of tooling (or
widespread knowledge of it). I'd really like to be able to easily see all the
previous commits that affected a specific line while I'm editing code. But I
usually have to resort to interacting with git, rather than having something
pop up on my screen (I use PyCharm and vim regularly).

~~~
kevinschumacher
Not the person you were replying to, but in PyCharm:

\- right click on the line number, click Annotate; this gives you the commit
date and author in the gutter

\- hover over the date/author name; this gives you the commit hash and message

\- click on the hash itself in the popover; this shows the git commit graph on
the Version Control tab

\- right click on the date/author name, click Annotate Revision; this opens up
the version committed then, with its git blame in the gutter.

~~~
aflag
That's nice, and setting the options to detect movements really improves it.
It'd be nice to see the commit message, rather than the author, without
needing to hover, though.

------
trixie_
If I had a nickel for every programmer who thought their code was so good it
didn't require comments... or thinks somehow that unit tests make up for
comments... only to come back years later and have no idea why the logic is
working how it is.

~~~
selykg
This was a thing at my last job. The lack of comments always made me cringe.
While not every line needs a comment, there's a reason why comments would be
useful, especially in security software.

I learned a lot of bad habits there and I'm glad I no longer work there.

Their excuses were literally "no one reads comments," "no one keeps comments
up to date," "my code is self-documenting," etc.

~~~
trixie_
The self documenting one always gets me. It's not like the person has never
read code that is difficult to understand.

Yet they think it is just other people who write 'bad' code. Their own code
can't possibly be bad. In fact it is so good that it 'documents itself'. It's
just a statement that drips with arrogance.

~~~
selykg
Ya, but with a security product there are important considerations and while
we commented on areas where we fixed a bug due to something fixable, it was a
pain in general.

Decisions maybe don’t belong in code but with a security product I feel there
has to be some quality level of comments to explain why and how. Someone
coming along later can’t be expected to be in the authors head and the author
won’t remember all this stuff years later when it might matter or need to be
rewritten.

Such a shit show

------
keithasaurus
I agree with most of the points made here, though I think some of the bias
toward up-front exhaustive documentation is probably not a good fit for most
of the projects I've been a part of. Prototyping often reveals necessary
changes due to resource constraints, or to unconsidered corner cases.
Documentation needs to be a living thing as much as the code, and I think that
pushes you toward documenting within the code more than externally.

One of the more important points the author brings up is that authorial intent
and the 'why's of comments are the most important. A corollary I'll bring up
to that is that the 'what's should be encoded in tests. Tests can be great
documentation, and they have the added benefit of informing developers when
the goals of the software are being violated (when tests fail).

What has worked for me is conceiving of documentation this way:

\- Design Documents: Historical use only, not to be updated.

\- Readme: intro to project; why it exists, overview of how it's meant to
function, how to edit, etc. Tends to be updated when big things change.

\- Code comments: why something exists, what considerations were made in that
code's creation

\- Test descriptions and comments: binding goals of previous development to
future development

This approach has done a pretty good job of keeping documentation from getting
too out-of-sync with code while enforcing basic business objectives, still
tilting the balance toward development rather than documentation.

~~~
jariel
This is quite good actually.

I would add that some elements of design are worth keeping up, like a general
architectural overview and the details of some things, like state-machines or
specific kinds of statefulness.

It can be done in the comments, at the package level, that way developers can
keep it up to date without much fuss.

------
ChrisSD
A big issue with documenting what the code does is that the code and
documentation can very quickly fall out of sync. As this post says, it's much
more useful to document the intent of the code, or _why_ there's this mess of
seemingly hacky code (see issues #80681, #82108, #66065).

Also be wary of unit tests that are overly tied to the specifics of an
implementation. These can be worse than useless when it comes to changing
code. I.e. asking "why are my tests failing?" and finding out it's only
because I breathed near the code.
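
A minimal Python sketch of the distinction (the `parse_price` function and its helper are invented for illustration): the first test is pinned to a private implementation detail and breaks on harmless refactors; the second is pinned only to observable behavior.

```python
def _strip_currency(text):
    # Internal helper; callers should not depend on it.
    return text.replace("$", "").replace(",", "")

def parse_price(text):
    """Parse a price string like '$1,234.50' into a float."""
    return float(_strip_currency(text))

# Brittle: pinned to the private helper's exact output format.
# Renaming or inlining the helper breaks this test with no change
# in observable behavior.
def test_strip_currency_internal():
    assert _strip_currency("$1,234.50") == "1234.50"

# Robust: pinned only to the public contract.
def test_parse_price_behavior():
    assert parse_price("$1,234.50") == 1234.50
    assert parse_price("$0.99") == 0.99
```

Renaming or inlining `_strip_currency` would break only the first test, with zero change in what the code does.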

~~~
mrkeen
System tests too!

The last code-base I worked on had do-everything system tests (with
unexpectedly good coverage, I'll admit). They were so slow and passed often
enough that I didn't immediately spot flakiness.

I got suspicious when the tests started failing regularly when I added new
unrelated code.

------
macintux
I’ve had similar arguments here once or twice. There’s so much context that
isn’t deducible from code.

You rarely need to document the “how” (that much should be evident if the code
is well-written) but you absolutely should document the “why” (or, often as
important, the “why not”: what code _could_ be here but isn’t).

~~~
mjw1007
I agree that you shouldn't document "how", but when I'm reading unfamiliar
code, I find I what I miss is "what", not "why".

To my mind, in well-written code each function should be documenting its
contract: what it assumes, what it guarantees if that assumption holds.

(And if it turns out that what you'd write is just the function's name and its
parameter and return types with a few grammatical particles added, maybe it's
OK to omit the documentation.)

Then if you find that in order to do that you have to write a little essay, or
you need terminology that doesn't correspond to a named thing in the codebase,
or you're repeating yourself in multiple comments, that tells you something
you need to put in higher-level documentation.
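
As a hedged sketch of such a contract in Python (the function, its assumption, and its guarantees are all invented for illustration):

```python
def allocate_buffers(total_bytes, n_buffers):
    """Split total_bytes across n_buffers as evenly as possible.

    Assumes: total_bytes >= 0 and n_buffers >= 1.
    Guarantees: returns a list of exactly n_buffers non-negative
    ints whose sum is total_bytes; any two sizes differ by at most 1.
    """
    base, extra = divmod(total_bytes, n_buffers)
    return [base + 1] * extra + [base] * (n_buffers - extra)
```

The name alone says roughly what it does; the docstring pins down the parts a reader could not safely guess.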

~~~
gimboland
Yes. Floyd-Hoare logic is worth learning about in this sense — a formal system
for imperative languages where the rules essentially say "given preconditions
X and code Y, if X is true before you run Y, we guarantee Z to be true
afterwards". I've never ever ever proven code correct using this formalism,
but that way of thinking permeates my every action as a programmer.
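
A minimal illustration of that way of thinking in Python, with the precondition X and postcondition Z written as plain assertions around the code Y (the function itself is invented for illustration):

```python
def integer_sqrt(n):
    # Precondition X: n is a non-negative integer.
    assert isinstance(n, int) and n >= 0
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    # Postcondition Z: r is the largest integer with r*r <= n.
    assert r * r <= n < (r + 1) * (r + 1)
    return r
```

This is not a proof in the Floyd-Hoare sense, but it documents the same triple {X} Y {Z} in a form the machine can check at runtime.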

------
kevsim
I think the most reliable way to see which quirks are load-bearing is tests,
particularly regression tests. You change something, you break expected
behavior, you fix it, and you write a test. Now the next person may wonder if
some quirk is load-bearing, but they'll know for sure when running the
regression tests.
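
A hedged sketch of how a regression test can pin down a load-bearing quirk (the function, the quirk, and the scenario are all invented):

```python
def normalize_id(raw):
    # Quirk: legacy records are zero-padded to 8 characters, and
    # downstream systems key on the padded form. The padding is
    # load-bearing.
    return raw.strip().zfill(8)

# Regression test: pins the quirk, so whoever "simplifies" the
# padding away finds out immediately why it was there.
def test_legacy_ids_stay_zero_padded():
    assert normalize_id("42") == "00000042"
    assert normalize_id("12345678") == "12345678"
```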

Additionally, I'd say naming things, though one of the hardest things we have
to do, can go a long way towards explaining the "why". Some programmers I've
worked with have a knack for knowing just when to break a giant line into
separate lines, giving local variables great expressive names, and all of a
sudden the code reads a million times better.

All that being said - I agree with most of the points of the article and do
push my teams to do a lot of upfront writing down of designs. These things
tend to go stale, but in the moment they're a great tool for fleshing out
ideas and sparking discussions.

------
aequitas
My pathway into software development was through electrical engineering and
embedded systems. So I don't know if this applies to other ways into software
development as well. But what really stood out to me in the beginning was how
useless code comments were. I would almost always see code like this:

      x = 1;      // assign 1 to x
      y = x * 2;  // multiply x by 2

I don't know if it was because they thought electrical engineers needed to
have everything about code explained to them, or because all the teaching
material used this style and people just copied it. Either way, I never
understood why you would add comments like this, but I had to do so anyway or
I would not pass my exams.

It took me a while to learn that comments are the tool with which you can
express your expectation of what the code should do.
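
For contrast with the snippet above, a hedged sketch of a comment that records expectation rather than restating the code (the hardware scenario and the settling time are invented):

```python
import time

def read_sensor(port):
    # Hypothetical hardware read, stubbed out for illustration.
    return 42

def sample(port):
    # Why, not what: the ADC needs ~2 ms after channel select
    # before the reading is stable; without this delay the first
    # sample is garbage. (Found the hard way.)
    time.sleep(0.002)
    return read_sensor(port)
```

The `time.sleep` line says what happens; only the comment explains why deleting it would be a mistake.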

~~~
blt
Those samples were probably written by someone who had learned how to code in
assembly language. In assembly, this kind of comment reminds the reader of the
semantic meaning of what is in each CPU register. It could be very useful.
Then the person learns C and never drops this habit.

------
phendrenad2
It gives me no end of pain that "Comments are lies because they aren't code"
is a fad that we're currently suffering through as an industry. For decades
the prevailing wisdom was that comments were a net benefit, and only in the
last few years has this trend become prevalent. How much perfectly good code
is going to have to be rewritten from scratch in 10 years because no one
remembers what it does?

~~~
wanderr
If no one understands what it does, it's not perfectly good code is it? Of
course there are rare cases where code cannot be simplified, made more
readable or self explanatory and in those cases comments are vital. But the
aim should be for the vast majority of code to be easily readable by humans.

~~~
Jtsummers
Essential versus accidental complexity.

Perfectly good code can be unclear because of the accidental complexity
included within it. Memory management, error handling (especially in languages
with less expressive type systems), configuring hardware/database/network
connections, etc. Those things are important, but they prevent the essential
portion of the program from being expressed on its own.

Type systems, a brief example: C versus Ada. Implement a network protocol
where the data packet has specific n-bit sized fields with ranges less than
the maximum for that size. You can easily do this in both languages. But in C,
you'd either need to add bounds checking to all of those fields or risk
letting errors propagate. That error handling obscures the essential portion
of the program. In Ada, you make a type that is n-bits and only accepts values
of the correct range. The errors can still exist in received packets, but the
error checking is partially elided from the code because the type system
itself can catch it.

There's nothing _wrong_ with the C code, and there's nothing _wrong_ (many
will disagree with that) with choosing C to implement the protocol. But it
will increase the complexity due to factors beyond the inherent, essential
complexity of the network protocol itself.
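
A rough Python approximation of the Ada idea, hedged (the field name and range are invented): a value type that rejects out-of-range values at construction, so the bounds check lives in one place instead of being scattered through the packet-handling code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HopCount:
    """A 6-bit protocol field restricted to the range 0..47."""
    value: int

    def __post_init__(self):
        # Centralized bounds check, roughly what an Ada ranged
        # type gives you for free at every assignment.
        if not 0 <= self.value <= 47:
            raise ValueError(f"HopCount out of range: {self.value}")
```

Code that accepts a `HopCount` never needs to repeat the check; invalid values fail loudly at the boundary where the packet is parsed.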

------
sosborn
The marketing corollary is that metrics only tell you what happened; they
cannot tell you why. Yet somehow, entire companies have been built on the
promise that they can answer the "why" by looking at the metrics.

------
voiper1
Relevant: "Writing system software: code comments" from
[http://antirez.com/news/124](http://antirez.com/news/124) \+
[https://news.ycombinator.com/item?id=18157047](https://news.ycombinator.com/item?id=18157047)

------
ChrisMarshallNY
I comment my code[0].

I don't particularly care what people think about it.

I will say that I have turned over a _lot_ of code, over the years, and
virtually _never_ get asked about what it does. When people ask me about my
code, I generally tell them where to look, and contact me if they need
explanations.

I don't get contacted, so I guess they could figure it out.

I also tend to write a lot of supporting documentation.

We do have to be careful, though. Documentation can easily become "concrete
galoshes"[1], so things like header/auto documents are pretty important.

[0] [https://medium.com/chrismarshallny/leaving-a-legacy-1c2ddb0c...](https://medium.com/chrismarshallny/leaving-a-legacy-1c2ddb0c8014)

[1] [https://medium.com/chrismarshallny/concrete-galoshes-a5798a5...](https://medium.com/chrismarshallny/concrete-galoshes-a5798a55af2a)

------
sktrdie
What I’ve noticed is that code is the only medium of communication that is
non-ambiguous. Designers, product people, stakeholders, etc, all use
_ambiguous_ mediums. It’s impossible to understand just how pedantic and
explicit you need to be when writing code versus, say, giving your human
colleagues instructions.

So it's 0% ambiguity for code, 100% ambiguity for the rest. In other words,
communicating with the computer is overly pedantic, and code bears all the
frustration. Can we make it more fair? 70-30, perhaps?

Rather than concentrating on mediums that help with communication (as this
post mentions: Design Docs, TLA+, comments…) I want a new medium that allows
me to share the burden of overscrupulousness with the rest of the people on my
team, not just developers.

~~~
webmaven
Pseudocode, perhaps?

------
deeg
When I teach coding I tell the students to document their code well, but not
to document _what_ the code is doing. That should be evident from the code
itself (if it's not, consider a refactor). Rather, explain _why_ the code is
doing it.

------
corbins
The counterexample here is the declarative style of programming. Ideally this
looks like an executable spec that is documentation in itself.

~~~
ativzzz
Sure, but then you offload the complexity to the functions used as the
declarative building blocks, so you do the documenting in a different place.
And you will probably end up documenting complex declarative business logic
anyway (like why process X, which is so similar to process Y, requires Z
different declarative blocks).

------
client4
This article fits nicely with the recent post discussing how Linus spends the
majority of his time writing emails. For projects with n+1 contributors,
inter-contributor communication is just as important as what code is being
written. Emails, commit messages, code comments, docs, are all just different
ways to communicate.

I always think of the Underhanded C Contest[1] as my favorite example of
readable code that doesn't act as expected after a quick read.

[1] [http://underhanded-c.org/](http://underhanded-c.org/)

------
bob1029
If you want your cake and also the ability to consume it, you might want to
consider what functional programming can do for your codebase's ability to
document itself. Having type systems that are closely aligned with the
abstract business model is the best way to avoid frustration when you are
trying to figure out why something is the way it is.

The trick is understanding that functional vs imperative is a spectrum, and
trying to force 100% on one side or the other is how you wind up killing any
project. We find that keeping our business-level abstractions functional with
the underlying infrastructure code imperative provides the best of both
worlds. The code that is changing and analyzed most frequently is in the
functional domain, whereas code that we touch maybe 1-2 times per month lives
in an imperative domain (but sometimes functional wherever it makes sense here
too).

~~~
ajuc
I find functional code much shorter and cleaner, but also when you want to
change something along a new axis you need to do a much bigger rewrite than
with imperative/procedural/object-oriented code.

OO code: "here's a detailed and long-winded description of what happens that's
hard to understand. Ignore 95% of it and change that 1 little detail and hope
for the best"

Functional code: "here's a concise and easy to understand description of what
happens, understand it fully, throw it away, and create a new, just as clear
and concise description of what should happen from now on"

------
cloogshicer
The huge value that I see in a formal specification language like TLA+ is that
we could have a precise way of communicating the problem in a way that is
agnostic to the implementation language.

Imagine something like StackOverflow, but instead of posting a question, you
post a formal spec. Thinking even further, you could then find a way to
combine/interface these specs and build something like a global database of
computational problems.

We're currently doing this already with StackOverflow, but we're focusing on
the implementations, not the problems themselves.

Please correct me if there's a mistake in this line of thought, I'd love to
know.

~~~
jackhiggs
What you're talking about there is a Model Repository. We're building one at
the bank I work at, except because our modelling language (or meta model) is
based on OMG's MOF we can generate artifacts (code) from our models. You can't
do that with TLA+ as far as I know. It's pretty powerful - you can compose
models together very easily, as well as generate loads of useful things for
data-in-motion.

~~~
cloogshicer
Hey, thanks a lot for your response! It's really hard to search for abstract
ideas like this if you don't know the terminology (like Model Repository), so
this is super helpful. This is a very interesting topic for me, may I ask you
a few questions? I sent you a request on LinkedIn.

------
nardi
False. If we used only meaningless symbols like “A”, “B”, etc. for names, then
it would be true. But if I have a method named “addItemToCart”, then I know
what it _ought_ to do. If it does NOT in fact add the item to the cart, then
I’ve found a bug. It’s true that a short method name might not capture all of
the subtleties, but usually the variable names within the method can give you
an idea of what the intent of the programmer is as well. Obviously there are
still things you should write comments for, but really well-thought-out names
can get you surprisingly far.

------
saagarjha
About the only times I use comments are to delineate a block of code that for
whatever reason can’t be a method, or because I am working around some bug and
think I might be tempted in the future to remove that code as “useless”.
(Think “the dispatch_main here ensures that the code runs on the next runloop
iteration, which is necessary for the animation to work”.)

------
dllthomas
Code is a mixture of what and how (and with IaC, sometimes who and where).

I agree that "why" is the role of documentation. I've been experimenting with
tying the two together with (machine checked, automatically surfaced) cross-
references, so we can better know what bits of documentation a test supports,
&c. I haven't yet gotten rigorous about it.

------
softwaredoug
“works as coded”

At my last job we used to say that when asked whether our code was correct or
bug-free ;). Often the devs get thrown under the bus when something doesn’t
work “correctly,” when in reality it might perfectly pass all unit tests based
on the best understanding of the problem.

Of course whether we could get any support to help define “correct” from
anyone was another matter...

~~~
xsmasher
This fits my theory of programming, and theory of bugs - We take a problem,
create a plan, and then write code that implements that plan.

Defects can come from:

\- having/being given the wrong problem

\- right problem, but the plan does not actually solve it

\- right plan, but your code did not correctly implement it

~~~
webmaven
Then of course there are the maddening cases of accidental correctness. ie. a
bug in your implementation of the wrong plan does the right thing.

------
smitty1e
I'm looking at some ETL code from the dark past, and it uses low-level Db code
to shovel in .csv files like a boss.

I'm all, "Well, I guess the code that obtains the .csv files must be rock
solid."

Famous last words.

A freshly done system later, I'm left to infer that there had been some other
cleansing to which I was never privy.

------
ThomasDeutsch
The "why" can be expressed in code.

Think of "Event Storming". A great way to talk about the "why" and to grow an
understanding of how a problem can be solved. The result can be multiple
"flows" that describe a series of events from the first command to the
expected outcome.

We have the option to directly translate a flow to code. And by keeping the
flow in one place, we also keep a direct mapping from our code to the
EventStorming-results. The "why" will not be lost. The code can contain
multiple flows and every flow can have a scenario-like description like: "the
user is able to select a product"

This is what my passion is all about. To keep the "why" in one place. This
also enables better collaboration between multiple disciplines (like UX <->
DEV)

This is the idea behind scenario based programming. I am working on an open
source project:
[https://github.com/ThomasDeutsch/flowcards](https://github.com/ThomasDeutsch/flowcards)

Write me a line if you would like to get involved.

Have you found other solutions for this problem?

------
bryanrasmussen
Should we be looking out for lying code?
[https://softwareengineering.stackexchange.com/questions/2023...](https://softwareengineering.stackexchange.com/questions/202352/should-we-be-looking-out-for-lying-code)

------
rcshubhadeep
Here is a little demo of codeBERT -
[https://youtu.be/oDqW1JHmaYY](https://youtu.be/oDqW1JHmaYY)

codeBERT tries to predict whether a given function and its docstring are
associated or not.

I thought I'd share it; I guess it is interesting in this context.

------
rawoke083600
Ja to me the "rule-of-thumb" is still: "Code Is The How, Comments Are The Why"

------
bobbane
Any thoughts on applying this article to...

programming languages defined by a single portable implementation?

------
AtlasBarfed
So.. code is what it is?

