
Guidelines for writing readable code - ristem
https://alemil.com/guidelines-for-writing-readable-code
======
aequitas
My one addition to this would be: comment what you intent the code to
accomplish not what it actually does.

I can see what code does, it written there, in code! I want to know what its
purpose is so I can decide if I have to remove/refactor/fix it.

Often code doesn't do what was intended (a bug) and by only commenting what it
does the bug transpires into the documentation as well.

Compare "// loop over array of numbers and add them together" with "// compute
sum of all product prices in cart".

The first is only helpful if you are explaining a language to someone or are
not confident about the language. The second tells the importance of this
piece of code in the greater whole.

~~~
v3gas
And the second should, imo, not be a comment, but rather a function name.

~~~
mannykannot
That only works in simple cases. Most cases are simple, but the ones that
matter most are the ones that are not.

There is sometimes a great deal that can usefully said, about what a function
is for, that is only apparent in the context of its use. If the purpose of a()
is accomplished by calling a1(), a2() and a3(), it would not be unusual for
the purpose of a2(), for example, to only be understandable in the way a1(),
a2() and a3() work together to satisfy the purpose of a(). If the only words
you are allowed to write are the names of the functions, there is nowhere to
document this interaction (especially if the name of a() is supposed to say
what it does, not how it works.)

The more you break down a solution into functions, the more the satisfaction
of the purpose of the program becomes a consequence of how the functions
interact.

For an example, look at the comments to the function checkListProperties in
christophilus' SQLite example [1], and consider how you would convey the same
information through function naming. Then look me in the eye and tell me
that's the way you write code.

[1]
[https://www.sqlite.org/src/artifact/9711a7575036f0d3](https://www.sqlite.org/src/artifact/9711a7575036f0d3)

------
falcolas
Number 6 (DRY), and number 9 (use libraries), both lead to problem 16 (leaking
abstractions). Personally, I follow a rule I read in a blog some time ago and
don't try to de-duplicate code until I've copy and pasted it three times. That
way I know what the _actual_ pattern that's being repeated is, not just what
_I think_ the pattern is.

As for number 9, I'll agree with this, so long as it's in the standard
libraries (which are slow to change and unlikely to break/change functionality
when they do change). But when you use 3rd party libraries, you are still
responsible for that code in the long run. You'll have to ensure it remains
up-to-date, ensure that you're testing that it's doing what you intend for it
to do, and when it's eventually compromised or removed, you'll have to deal
with that too.

See: Left Pad.

~~~
kevinconaway
> Personally, I follow a rule I read in a blog some time ago and don't try to
> de-duplicate code until I've copy and pasted it three times

Known as "Three Strikes and You Refactor"[0]:

\- The first time you do something, you just do it.

\- The second time you do something similar, you wince at the duplication, but
you do the duplicate thing anyway.

\- The third time you do something similar, you refactor.

[0]
[http://wiki.c2.com/?ThreeStrikesAndYouRefactor](http://wiki.c2.com/?ThreeStrikesAndYouRefactor)

~~~
cimmanom
The problem with this is the cases when you don’t remember that you (or
someone else) already used this same snippet twice. In large codebases with a
lot of people working on them, this can expand to dozens of copies.

~~~
pc86
If you have dozens of instances of nearly identical code and not a single
person knows of more than 2 instances of them, you've got much bigger
problems.

~~~
kitd
This would be true if everyone on the team was equally involved from the very
start on a greenfield project, and they aren't all already fighting other
forms of tech debt.

Sadly the real world isn't so accommodating.

------
bognition
Very good stuff here. I totally agree with many of these points. Especially,
these 5.

3\. Simplicity is king.

5\. Naming is hard, but it's important.

9\. Prefer internal functions over custom solutions.

11\. Avoid creating multiple blocks of code nested in one another.

14\. Split your classes to data holders and data manipulators.

To be honest all of them really feed into number 3. The lower the cognitive
burden to reading your code the easier it will be the maintain and work with.

~~~
yonixw
I disagree with no.11 ... I prefer nested code of if's because as a reader I
learn which conditions are relevant and assumed, rather than check-it-all at
the start.

~~~
wongarsu
If I put all my early exits at the start, for the remainder of the function I
only have to remember the list of preconditions established by the early
exits. With deeply nested code I instead have to keep a stack of currently
fulfilled conditions in mind.

Sure, at any given point the stack is shorter than the list so there's less
things to keep in mind, but for me dealing with mental lists is much easier
than dealing with mental stacks.

------
agentultra
I also suggest _consider your data_. There are two main problems we face as
programmers: _managing complexity_ and _transforming data_. The former gets
the majority of the treatment in articles like this but I believe the latter
is often more important. We must consider the shape of our data, the usage
patterns of our data, and consider structuring our programs to accommodate
such patterns and transformations.

An example of where rules of thumb about abstractions turn into poorly
designed code is when processing large arrays or vectors of data. It is common
to model your data in some kind of product type like a struct or union and
process each item using some method of iteration. However if the main usage
pattern of our data is to perform this transformation over the vector within
an outer loop we've shot ourselves in the foot from the get-go. Our caches are
constantly swapping data. This is called, _premature pessimization_.

So don't get trapped into thinking that our job is _only_ about learning SOLID
principles or various aphorisms and mnemonics. The other side of our job is
transforming and moving data.

------
elviejo
My personal documentation guideline is, just answer the five W questions:

* What it does: The code answers this in a clear manner

* Who: did it, the code repository answers this.

* Where: is used, your IDE should be able to show this

* When: was it made code repository, when is used stack traces

* Why: why does it exist? that is the only part that needs to be in a comment.

~~~
slededit
I like to know how well the author understood the problem. Unfortunately git
allows people to rewrite history and hide the guess+check process many go
through.

------
jowiar
It’s a bit more meta than what this guide covers, but I wish something was
said about treating writing code as fundamentally a writing process. Most
documents like this focus on “what readable code is”, and not “how do I
achieve that product”.

Brainstorm, outline, draft, revise, rearrange. Have a high-level structure
that makes sense, clean up “sentence structure” and “word choice”. Often the
best path to “readable code” is “brain dump a mess, than revise” rather than
“write a perfect first draft”.

------
yonixw
I really don't agree with no 11. My reasoning is that the reader should know
at each line what conditions are assumed.

Returning in the start is as anti-pattern for me as goto in the middle. both
of them leave you with lines of code that only a debug session (in mind or in
action) can tell you what lines will be called. I will choose nested code any-
day if that means i can simply follow what `if`s are relevant rather than
guessing.

~~~
aw1621107
> My reasoning is that the reader should know at each line what conditions are
> assumed.

I don’t get it. How do early returns not convey assumptions that can be made
in the rest of the function?

> Returning in the start is as anti-pattern for me as goto in the middle. both
> of them leave you with lines of code that only a debug session (in mind or
> in action) can tell you what lines will be called.

This is really confusing to me. What do you mean by “in mind or in action”?

In any case, if you have runtime conditions you can’t tell what code will be
executed in general just by looking at the code regardless of whether you’re
looking at nested code or code with early returns.

> I will choose nested code any-day if that means i can simply follow what
> `if`s are relevant rather than guessing.

Are the conditions for early returns not relevant or something? What about
them leaves you guessing?

~~~
yonixw
they are all gathered at once.the following lines could assume any
combinations of those conditions. in simple examples it is not an issue but as
the function get more complex I found it excruciating.

~~~
aw1621107
> the following lines could assume any combinations of those conditions.

Do you mind expanding a bit more on this? From what I understand, if you have
early returns the only way you can get into the rest of the function is if
_all_ of the conditions pass, not any arbitrary combination of them.

Also, do you have an example of a more complex function with early functions
that you find difficult to read?

------
d--b
It’s generally good advice. But some items do not acknowledge that there often
if a trade-off to be made and the trade-off depends on the program you make.

Sometimes it is more readable to duplicate code or to have functions that do
more than one thing. If I have a function and I need to follow four or five
levels of function calls and piece things together to understand what that
function does, then I can’t read the code...

~~~
kilburn
You can definitely go silly by splitting your functions too much. However,
most times I hear this complaint, the real reason is:

> 5\. Naming is hard, but it's important.

In other words, you should be able to understand what a function does just by
the naming of the function itself. When you have a function that "calls 5
other functions" you should be able to understand what it does by just looking
at the 5 function calls, without needing to go inside those child functions to
check it out.

The only case where this doesn't apply is when you are tracking a bug and the
input/output of the function doesn't make sense. That's the only case I can
think of to actually dig deeper...

------
MaxBarraclough
I'd make the following addition, which I was surprised not to see:

24: Avoid the 'inner-platform effect' [0]

Make proper use of the machinery of the standard library.

This makes your code more readable, more portable, shorter, faster, and less
error-prone, and it doesn't even take any real effort - indeed, it makes
writing things _easier_!

The standard-library was probably written by someone smarter than you, almost
certainly someone more familiar with the language than you, and it's
definitely better tested, better documented, and, importantly, better known,
than whatever you were about to write. This is a pretty basic point, perhaps,
but most of them are.

Special case: use standard-library data-structures. If you're implementing
your own hashset over a raw array, rather than using the standard HashSet
class/template, you'd better have a very good reason.

[0] [https://en.wikipedia.org/wiki/Inner-
platform_effect](https://en.wikipedia.org/wiki/Inner-platform_effect)

~~~
kaltherX
Isn't this #9 Prefer internal functions over custom solutions?

~~~
MaxBarraclough
Oops, I missed that! Yes, it's pretty close.

I don't like the look of #10. I'm a believer in single-point-of-return.

------
lampe3
> `3. Simplicity is king.`

Yes, this is true.

But for me, people get it the wrong all the time.

If you have an object which you have to check yeah don't check for everything
you could ever imagine. Like I have seen code where the actual check was
longer than the code after.

But if you are looping through a big array and I mean big not your little 1000
index array and you think any loop is good enough and suddenly you have a
reactive system which was the simplest way to code but the thing is over
reactive so your calling your function with a slow loop function all the time
just because was simpler to read or to implement your just doing more harm
with that rule than actual benefit.

This is also the reason why people use `.map()` in javascript to loop through
stuff without ever knowing what map is actually for.

~~~
lloeki
May I recommend watching "Simplicity is complicated" talk by Rob Pike:

[https://www.youtube.com/watch?v=rFejpH_tAHM](https://www.youtube.com/watch?v=rFejpH_tAHM)

[https://talks.golang.org/2015/simplicity-is-
complicated.slid...](https://talks.golang.org/2015/simplicity-is-
complicated.slide#1)

~~~
lampe3
Thanks, I will.

------
oldandtired
Number 3 has some merit. But. Number 3 is also the cause of much grief. I have
had to deal with the consequences of "gun/guru" developers doing 3 because it
is easy, without them considering the effects on those who use the systems in
question.

It is our responsibility to look at what is needed and what complexities, if
any, are required to achieve the results. Of course, the opportunities to
actually talk with the end-user of any software system is rarely available in
the normal course of development, especially for junior programmers. It would
be a useful tactic to have all programmers have to deal with the end-users and
actually face their complaints and have to learn what the code is doing to
those end-users.

------
kelnage
Urgh, I was nodding away until I reached guide 16. Although the idea it
expresses is fine, their example of parsing CSV files yourself is a terrible
one, especially since they link to a previous article where they anticipate
some issues that can be encountered when parsing CSVs, but ignore the legion
of others (different separators, quotation characters etc).

If you’re going to parse a CSV file, please, just use a standard library for
your language - please don’t just split on commas and call it done. The only
case where it might be acceptable to do it yourself is if you have complete
control of the input - but that would to me imply a static data that could be
added to the code base instead of a CSV file.

------
Sir_Cmpwn
I want to expand on #2 - the right tool for the job might not be a flashy new
library/framework/tool, but it might be one you don't already know. You
shouldn't pick new technology _because_ it's new, but you should have a
surface-level understanding of what lots of technology does so you might be
able to connect the dots with your use-case and explore something foreign in
depth. Equally risky to using flashy tools is over-reliance on the old tools
you know, which may not be best suited to the task.

------
esdkl22
I have a question about point #12. Are most people in agreement that the if
else loop is preferrable over a ternary operator?

One of my worst professional programming experiences was having a senior
developer who took apart usage of map, filter, reduce, and replaced them with
if else loops, and sometimes breaking things in the process.

The idea that you would one day have a junior who would fail to understand a
relatively simple concept seems like fantasy. Let them get stumped once, and
then teach!

~~~
PunchTornado
no, but it all comes to the complexity of the operation.

Like the article said, something like this is better than an if else loop:
$variable == $x ? $y : $z;

But if you write stuff like this:

$variable == $x ? ($x == $y ? array_merge($x, $y, $z) : $x) : $y;

I am going to be annoyed. In the second case, please use if else.

~~~
esdkl22
You're correct. I didn't look closely enough at the finalized piece of code. I
misinterpreted and thought the offer was unsatisfied with the original,
appropriate use for a ternary operator ($variable == $x ? $y : $z)

------
ummonk
“Do not duplicate your code” rarely improves readability, unless the part that
you dedupe is a simple semantic component of its own and not closely
intertwined with the nonduplicated code it is embedded in. More often, it can
lead to the hidden trap of abstractions mentioned in the OP’a post.

The primary reason to avoid code duplication is to improve maintainability and
refactorability, not to improve readability.

------
hikarudo
I highly recommend "The Art of Readable Code: Simple and Practical Techniques
for Writing Better Code", by Dustin Boswell and Trevor Foucher. It's an easy
read, and well explained with good examples.

------
dorfsmay
I'd add, beside obvious docstrings, comment to explain why the code does
something when it's not obvious (say a corner case).

------
Walkman
Point 11. is called " return early".

------
enriquto
Disagree strongly with the comment about naming : "Names of variables and
functions should be distinct and provide a general idea of what they do".

The clearest code that I have read used consistently single-letter variable
names. If a variable name holds a meaning, you are then repeating this
information each time you use the variable. It is better to name the variable
"a", and put a comment upon its declaration. This is the convention used in
mathematics and physics, and it works really well.

~~~
ab71e5
Maybe some of them, like in tight loops. But the most difficult code to
understand for me is Matlab scripts written by physicists, naming variables
a,b,c or i,ii,iii etc. Variables with longer lifespan should have more
explicit names in my opinion.

~~~
enriquto
Of course! I am talking about the names of local variables, not the names of
externally visible functions or global variables.

~~~
alecbenzer
There's a spectrum of how 'local' a variable is though. There's a big
difference between a local variable whose uses span ~5 lines vs. ~30 lines.
You're making a trade-off between repeating information (what the variable is)
and forcing the reader to keep in their own (human) memory what each variable
is for (or else forcing them to constantly go back and re-discover what the
variable is for).

Go sometimes gets a lot of flak for its style here (partially because it's
sometimes misapplied) but I think it works pretty well: it's fine to use a
short non-descript variable name like `a` if its uses only span a few (~5-10)
lines (or other certain special cases like `i`, `j` for loop indexes),
otherwise use longer descriptive names.

~~~
enriquto
This sounds more a problem about your total number of local variables than
their names. If you have too many variables, using longer names is not going
to help you as much as organizing your code differently.

