
Optimize for readability first. What slows down developers when reading code - valyard
http://va.lent.in/optimize-for-readability-first/
======
barrkel
The most common form of unreadability I see is overly abstracted code in Java,
littered with patterns that obscure the kernel of logic, forcing you to do
global code searches in order to see through all the indirections.

~~~
deong
I'll go even further and say this is essentially the only form of readability
I care about. The article talks about things like "unnecessary one-liners"
with the ternary operator as an example.

I don't care at all about one-liners, whether or not they're immediately
understandable. If I'm trying to find and fix a bug or add a
feature, and we assume, as the author did, that the obvious bad-code things
like meaningless variable names and spaghetti control flow aren't there, then
the only hard problem is "where in the code does this thing happen". Make that
problem easy to solve, and I can spare the 10 seconds it takes to understand a
ternary operator. Hell, I can spare the 10 minutes it takes to understand some
weird nested list comprehension. Stick a comment telling me what you're doing,
and I may not even need to bother.

The same is true with low-level optimizations. If you need to do something
clever, then comment it, and we're fine. Obviously one shouldn't go overboard,
but I don't see anything wrong with the kind of premature optimization in
which the programmer sits down and immediately knows of a fast way and a slow
way to do something and chooses the fast way (within reason, of course, and
that's a subjective judgment, but it's one I'm OK with).

Basically, make your code easy to approach. Make it as easy as possible for me
to go from, "I need to change what happens when the user does X" to finding
the code that governs that behavior. If all that code is in one tidy little
location, then aside from the obvious need for good variable names, sensible
whitespace, etc., write it however you like. I'll figure it out.
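
A hypothetical Python illustration of that point: the nested comprehension
below takes a moment to parse, but the comment above it says what it does,
and the explicit loops show the same logic for comparison. The `orders` data
is invented for the example.

```python
# Hypothetical data: each order is a list of (sku, quantity) pairs.
orders = [[("apple", 2), ("pear", 0)], [("plum", 5)], []]

# Flatten all orders and keep only SKUs that were actually ordered (quantity > 0).
ordered_skus = [sku for order in orders for sku, qty in order if qty > 0]

# The same logic spelled out as loops, for comparison.
ordered_skus_loop = []
for order in orders:
    for sku, qty in order:
        if qty > 0:
            ordered_skus_loop.append(sku)

print(ordered_skus)  # ['apple', 'plum']
```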

~~~
pkolaczk
Exactly! There is nothing wrong with complex code as long as it is isolated from
the rest of the code and well documented in terms of its API.

------
stinos
_When by Go to Declaration you get to an interface or an abstract class and
have to figure out what implementation there might be used. It gets worse with
Dependency Injection,_

This seems far-fetched to me and doesn't have much to do with readability
anymore. Moreover most of the time this is imo how code _should_ be. If you
start programming with readability in mind and as such start throwing away
standard design practices because they lead to one more click or less
readability, you are doing it wrong. That comes dangerously close to using
copy-paste instead of a function because it is clearer since you can see the
implementation directly. I'd argue that you should not have to figure out what
the implementation is. That's the whole point of using interfaces (and
dependency injection for that matter) - if you have to figure out what
implementation is used then you are either the implementer or the code is wrong
somehow, like it is violating substitution/dependency inversion principles. Or
you are studying the code in-depth, in which case you have to go through all
implementations anyway. Calling all that 'readability problems' is a bridge
too far.

~~~
jeremyjh
When you are searching for a bug, you do need to understand the
implementation. Of course by definition something is "wrong". Calling a
function is fine by the author since you can easily jump to its definition.
But if the actual function call is determined at run-time based on XML
configuration, then you first need to figure out how that fits together for
the particular case you need to investigate.
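
A minimal Python sketch of that situation, assuming a dict in place of the
XML configuration. `Notifier`, `EmailNotifier`, `SmsNotifier`, `REGISTRY` and
`build_notifier` are all hypothetical names. Reading `alert_user` tells you
only the interface; which `send` body actually runs depends on a string in the
config.

```python
from typing import Protocol


class Notifier(Protocol):
    def send(self, message: str) -> str: ...


class EmailNotifier:
    def send(self, message: str) -> str:
        return f"email: {message}"


class SmsNotifier:
    def send(self, message: str) -> str:
        return f"sms: {message}"


# Hypothetical registry standing in for the XML wiring: the class that runs
# is chosen by a string, so "Go to Declaration" on Notifier.send cannot tell
# you which body executes.
REGISTRY = {"email": EmailNotifier, "sms": SmsNotifier}


def build_notifier(config: dict) -> Notifier:
    return REGISTRY[config["notifier"]]()


def alert_user(notifier: Notifier, text: str) -> str:
    # From this call site alone you only know the interface.
    return notifier.send(text)


config = {"notifier": "sms"}  # in a real system this might come from a file
print(alert_user(build_notifier(config), "disk full"))  # sms: disk full
```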

~~~
valyard
OP here. You both are right. I'm not saying that DI and interfaces are bad. Of
course not. I'm just trying to find a balance between modularity, loose
coupling and code graph connectivity. Of course it's not just black or white.

------
gr3yh47
"Debugging is twice as hard as writing the code in the first place. Therefore,
if you write the code as cleverly as possible, you are, by definition, not
smart enough to debug it." by Brian Kernighan

this is one of my favorite quotes ever

------
givan
“Programs must be written for people to read, and only incidentally for
machines to execute.” - Abelson / Sussman

------
grymoire1
In the 1980s, someone wrote an essay promoting "throw-away" code. Essentially
he said every module (a) should be small in size, and (b) should have
documentation that describes what the module does.

Someone maintaining the code then has the option of keeping the interface,
deleting the code and rewriting it, rather than trying to understand the thought
process of an overly clever programmer. I've always found this useful when I
write code others have to maintain.
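
A small Python sketch of that kind of module, with invented names: the
docstring states the contract, so a maintainer could throw away the body of
`dedupe` and write something like `dedupe_rewritten` instead, checking it
against the same behavior rather than reverse-engineering the original.

```python
def dedupe(items: list) -> list:
    """Return items with duplicates removed, keeping first occurrences in order.

    Contract: the input list is not modified; elements must be hashable.
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def dedupe_rewritten(items: list) -> list:
    """Same contract as dedupe(); a hypothetical from-scratch rewrite."""
    # dict preserves insertion order (Python 3.7+), so its keys are the
    # first occurrences in their original order.
    return list(dict.fromkeys(items))


data = [3, 1, 3, 2, 1]
assert dedupe(data) == dedupe_rewritten(data) == [3, 1, 2]
```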

------
VeejayRampay
I've always thought that what makes code hard to read is the fact that several
people are usually working on it. Beauty is in the eye of the beholder, as we
all know. Some developers like snake case, others prefer camel case. Some
prefer 4-space indentation, while others can't stand it and much prefer 2
spaces. Some like braces on the same line, others prefer them on the next
line.

What really bothers me is that editors in general force people to share a
codebase formatted for a given style that may or may not satisfy everyone.

And the worst part is that I don't see why it has to be that way. Ideally "code" would be
some form of abstract syntax tree and the editor would apply a user-generated
"stylesheet" on top of it to display the code as the user wants to see it.

Upon save, all the style would go away and only the abstract representation of
the code would go to the shared repo.

Not an easy thing to make happen for sure, but that'd be awesome.
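
Python's standard `ast` module gives a small taste of this idea (a sketch, not
an editor feature): two snippets that differ only in formatting parse to the
same tree, and `ast.unparse` (Python 3.9+) renders a tree back in one fixed
style. The snippets are made up for the example.

```python
import ast

# Two hypothetical snippets that differ only in formatting choices.
compact = "def area(w,h): return w*h"
spacious = """
def area( w, h ):
    return w * h
"""

# Parsing discards whitespace and layout, keeping only the structure.
tree_a = ast.parse(compact)
tree_b = ast.parse(spacious)
print(ast.dump(tree_a) == ast.dump(tree_b))  # True

# ast.unparse renders the tree in one fixed style; a per-user
# "stylesheet" would be a configurable version of this step.
print(ast.unparse(tree_a))
```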

~~~
valyard
If you work in a team you absolutely MUST enforce a coding standard. It had better
be a community standard that most developers use. All team members must use
either tabs or spaces and have their IDE configured to apply styles automatically.

It will save A LOT of time later.

~~~
VeejayRampay
I (respectfully) disagree. The editor should take care of adapting the
codebase to the standards the programmer likes. Trying to force 20 developers
into a "consistent" style only makes room for a soft consensus that doesn't
satisfy anyone in the end.

If we can do it for web pages (with CSS), there's no reason we couldn't do it
for a source file.

~~~
valyard
In any IDE you can set up code formatting options. By pressing a button you
are essentially applying this CSS you are talking about. You can write however
you want but when saving a file you (either manually or automatically) should
execute reformatting in IDE.

I agree with you that having CSS-like option would be better, since code is
data, CSS is formatting options. But you'd have to teach all other tools you
use to work with this CSS approach. For example, to git, source files are just
text. Git doesn't understand code semantics. To its diff routine, tabs changed
to spaces == every line of code changed.

There's not much use for a VCS if every commit changes everything.
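
A quick Python sketch of the diff problem using the standard `difflib` module
(a line-based diff in the same spirit as git's, though not git itself). The
three-line snippet is hypothetical; converting its tabs to spaces makes every
indented line show up as changed even though the code is identical.

```python
import difflib

# A hypothetical three-line function indented with tabs...
with_tabs = ["def f(x):", "\tif x:", "\t\treturn 1"]
# ...and the same code after converting each tab to four spaces.
with_spaces = [line.replace("\t", "    ") for line in with_tabs]

# A line-based diff compares raw text, not code structure.
diff = list(difflib.unified_diff(with_tabs, with_spaces, lineterm=""))
changed = [line for line in diff if line.startswith("+") and not line.startswith("+++")]
print(len(changed), "of", len(with_tabs), "lines reported as changed")  # 2 of 3 ...
```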

------
FollowSteph3
I'm sure it's been said many times before, but the book Code Complete 2 is a
great follow up read to this article.

------
pkolaczk
"[Scala] is essentially a write-only language (please don't quote me on
this!); "

I'm sorry, I quote you on this.

If you've ever compared the Hadoop codebase (Java) with the Apache Spark codebase
(Scala), you'd know this is BS - Spark code is an order of magnitude more readable
and easier to understand than Hadoop's. Lack of side effects and lack of low-level
mutable shared state protected by locks everywhere are two of the many
reasons.

Of course, you can write terribly unreadable code in any language, and
probably those academic languages with rich functionality and expressiveness
make it a little easier to abuse features, or to choose the wrong features for the
job. But on the other hand I'd argue they make it easier to write readable
code, once you care about readability and learn how to do this. Writing
readable code is a skill you have to master. It doesn't come for free, and it
doesn't come overnight.

------
rectangletangle
I've recently begun digging into Haskell; it has a lot of interesting and
novel features. However, I can't get over its syntax. It often uses
non-alphanumeric characters in place of function names. Many functions are poorly
named, e.g., `elem` vs Python's `in`. I strongly prefer the more natural
English style syntax; even when you know a language well, it reads faster.
That being said, Haskell is awesome for numerous other reasons.

500 line functions are the devil, they're harder to read, test, modify, and
reason about. They tend to encourage code that isn't DRY, or reusable. The
only benefit they have is an _almost_ always insignificant performance boost,
at the cost of everything else.

Also magic isn't terrible if it's used in moderation, and it's well
documented. Though it usually results in things being substantially harder to
GREP.
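
A hypothetical Python example of the grep problem: the dispatcher builds the
method name at run time, so searching the codebase for `handle_click` finds
only its definition, never a caller. The class and method names are invented
for illustration.

```python
class EventHandler:
    def dispatch(self, event: str) -> str:
        # The "magic": the method name is assembled at run time, so the
        # literal text "handle_click" never appears at any call site.
        method = getattr(self, "handle_" + event)
        return method()

    def handle_click(self) -> str:
        return "clicked"

    def handle_scroll(self) -> str:
        return "scrolled"


handler = EventHandler()
print(handler.dispatch("click"))  # clicked
```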

~~~
arethuza
"I can't get over its syntax"

I think there is a syntactical pain barrier for a lot of languages. However,
once you are through that barrier you stop noticing the syntax.

e.g. When I first looked at Python I remember thinking "significant white-
space - that is _evil_ ". However, I forced myself to write some Python and
after a fairly short time I didn't notice that aspect anymore and now I think
significant white-space is actually pretty sensible.

I've had similar experiences when starting out with Common Lisp, PostScript
and a variety of other things that don't look like a variation on C.

I would agree that Haskell does look a bit weird - but _everything_ looks
weird the first time you see it. Indeed, I can remember seeing C for the first
time when I was 16 or so (early 80s) and thinking that looked weird compared
to the Basic I was used to...

~~~
cousin_it
I think line noise operators and single character names are bad in any
language. Haskell is among the worst offenders, but C++ isn't much better. At
some point this frustrated me so much that I started making my own language.
Semantically it was pretty much a dialect of ML, a statically typed, eagerly
evaluated language with mostly immutable data, but syntactically it aimed for
extreme readability:

1) All keywords and standard functions have full English names, no
abbreviations.

2) Keywords are UPPERCASE, types are Capitalized, values and functions are
camelCase. Easy to tell the difference at a glance.

3) Mandatory type declarations everywhere. Limited type inference within
expressions, but that's it.

4) Only one kind of syntactically significant parentheses, instead of () [] {}
<> as in C++ or Java.

5) No two-character operators. Instead of x != y you have to write NOT (x =
y). Sorry!

6) Significant whitespace.

The code looked something like this:

    
    
        GENERIC(Type) FUNCTION identity(value: Type): Type
          RETURN value
    

I have a lot of code written in that dialect, just for my own amusement. It's
pretty verbose and boring, but extremely easy to write, impossible to
misunderstand, and sticks to memory like nothing else I've ever tried. I still
want to finish an implementation someday.

~~~
mamcx
That is a bit close to Pascal. I have thought of something similar, but
extending the idea:

    # Naming conventions, ENFORCED BY THE COMPILER by default

    # For types/classes/interfaces
    Str             # Title case means public
    _Str            # private
    ___Str          # protected

    # For functions and properties
    isPublic        # camelCase means public
    _isPrivate      # private
    ___isProtected  # protected

    PI = 3.1416     # ALL CAPS means CONSTANT

I wonder how to handle mutable vs. immutable. I have thought of:

    thisIsInmutable
    thisIsMutable*

Because if mutability is only marked at the declaration point (using, for
example, let vs. var) and not everywhere the name is used, you lose the
ability to know whether the thing mutates or not. The other idea is doing
this only for functions:

    fun List.Add(self*, values:Array)

Or, like in other languages:

    fun List.Add!(values:Array)

Yet another idea (derived from the fact that the lexer/AST gives far more info
than the text) is to color-mark the vars and differentiate by color and/or font
style between mutable/immutable, functions, types, vars, etc.
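
Python doesn't mark mutation in names, but its built-ins follow a related
convention that gets at the same concern: operations that mutate in place,
like `list.sort()`, return `None`, while non-mutating counterparts, like
`sorted()`, return a new value. A short illustration:

```python
values = [3, 1, 2]

# sorted() leaves its argument alone and returns a new list.
fresh = sorted(values)
print(values, fresh)  # [3, 1, 2] [1, 2, 3]

# list.sort() mutates in place and returns None, so code that expects a
# returned list gets None instead, which tends to surface the mistake early.
result = values.sort()
print(values, result)  # [1, 2, 3] None
```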

------
annnnd
This is a great write-up and should be required reading for every novice (and
not so novice) programmer.

That said, I would compose the list "What makes code hard to understand?"
differently:

    
    
      1) improper naming of functions / variables (if you can't name it properly, you can't develop it)
      2) excessive documentation where not needed (document only non-obvious stuff)
      3) *complex* one-liners (because simple one-liners actually help readability)
      4) and 5) can stay. :)
    

The main point is valid though. I would phrase it like this: optimize for
correctness first, readability close second, and don't optimize for
performance until you can identify the significant gain (_always_ by
measuring it!).
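
A minimal Python sketch of "measure first" using the standard `timeit`
module. The two string-building functions are hypothetical stand-ins; the
point is to check that they agree (correctness) and then time them, instead
of assuming which one is faster. Timings depend on the machine and
interpreter, so no expected numbers are given.

```python
import timeit


def concat_loop(n: int) -> str:
    # Build the string by repeated concatenation.
    s = ""
    for i in range(n):
        s += str(i)
    return s


def concat_join(n: int) -> str:
    # Build the same string with a single join.
    return "".join(str(i) for i in range(n))


# Correctness first: both versions must produce the same result.
assert concat_loop(1000) == concat_join(1000)

# Then measure, rather than guess, before choosing the "fast" one.
for fn in (concat_loop, concat_join):
    seconds = timeit.timeit(lambda: fn(1000), number=200)
    print(f"{fn.__name__}: {seconds:.4f}s for 200 runs")
```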

~~~
valyard
I don't mind improper naming because all the functions are essentially just
nodes in the code graph. Of course it helps if you can figure out what they do
just from their names, but you have to read the code anyway.

Same for excessive docs. Sometimes you can find huge walls of text before
every function, or meaningless javadoc comments which say that function
GetData "returns data". But those are usually needed for 100% docs coverage
which might be enforced by corporate policy.

~~~
deong
> Of course it helps if you can figure out what they do just from their names,
> but you have to read code anyway.

This is, I think, the root of the (very minor) disagreement I have with your
article.

I really _don't_ want to have to read the code anyway. There may be 250,000
lines of code in a moderately sized system. If I have to read a significant
fraction of them to figure out what's going on, that's the biggest sin you as
the original developer can commit by a couple of orders of magnitude.

I want it to be obvious where to look for the code that controls whatever it
is I want to change. If you make it easy for me to find those 50 lines of
code, I don't care if you write them in assembler. 50 lines of _anything_ are
vastly easier to deal with than 250,000 (or god help you million) lines of the
absolute best thing you can write.

~~~
valyard
Proper naming helps but you can't 100% trust it.

    bool allwaysReturnsTrue() { return false; }

This is of course an extreme example (8

~~~
deong
Bad names are bad precisely _because_ they force you to read a bunch of
irrelevant code to have any hope of understanding what the code is doing. It's
the same problem that over-abstracted code presents. In both cases, there's no
forest I can look at -- just millions of individual trees.

In a weird way though, bad names are actually better than many real-world
problems. Once I find that code, I can change the name and the problem goes
away. If you have a 100 line function full of one-letter variables, that's
probably terrible. But I can spend a week poring over that function, figure
out what they all do, and change the names. From that point, the function is
fixed.

I have no hope of fixing the problem with a big mess of fetishized design
patterns. The confusion in many of these systems is baked into the core of the
architecture in the form of a dozen layers of abstraction and indirection
between me and figuring out the behavior of the system. Usually the only way
to fix that is "rm -rf" and starting over from scratch.

------
cessor
Making use of "chunks" makes code easier to read. A function is essentially a
proper chunk (whereas a comment is a bad way to do the same thing). Depending
on the paradigm, I usually enjoy code that has tiny units, but many of them -
i.e. a function with 100 lines is harder to read than 100 functions with one
line each. Ideally each name adds proper meaning to the code so that the
intent becomes clear. Code like this is usually easier to change, reuse and
test (TDD).

This can go the wrong way - many people mentioned how too many abstraction
layers in big java projects are a pain, and I agree, as they often make the
code more meaningless and abstract, rather than concrete. Oh yeah and pattern
names do the same thing.

~~~
ajuc
> i.e. a function with 100 lines is harder to read than 100 functions with one
> line each.

I understand where you come from, but I disagree with that particular example.
A 100-line function is harder to read than necessary, but it can be
understood without problems.

100 functions doing the same work will be a nightmare to keep in head at once,
at least for me. If you refactor one function into 100 you're not doing it
right.

~~~
klibertp
In this case (of 1 function split into 100) each of them is probably used
exactly once. The effect on readability, especially if they are defined in an
order of execution, is similar to placing a comment (function name and args
and return value) next to every single line in the 100 line function. No
matter if the comment was actually needed or not there. So I agree, it will
probably result in worse readability than keeping the function as one, but
commenting sections of it and extracting only repeated patterns into separate
functions.
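
A hypothetical Python sketch of that approach: the function stays in one
piece with short comments marking its sections, and only the step that
repeats (normalizing two fields) is extracted. The field names and validation
rules are invented.

```python
def _clean(text: str) -> str:
    # Extracted because the same normalization runs on two fields:
    # strip the ends and collapse internal runs of whitespace.
    return " ".join(text.split())


def register_user(raw: dict) -> dict:
    # --- normalize input ---
    name = _clean(raw["name"])
    email = _clean(raw["email"]).lower()

    # --- validate ---
    if not name:
        raise ValueError("name is required")
    if "@" not in email:
        raise ValueError("email looks invalid")

    # --- build record ---
    return {"name": name, "email": email, "active": True}


print(register_user({"name": "  Ada   Lovelace ", "email": "ADA@Example.com"}))
# {'name': 'Ada Lovelace', 'email': 'ada@example.com', 'active': True}
```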

~~~
ajuc
The biggest problem is the relations between the functions.

In one function control flow is so simple we don't think about it. It's
sequence. In 100 functions either each call the next one (and even if the 100
functions are placed in calling order I cannot assume that when reading the
code - I must do 100 ctrl+clicks to see the control flow), or there's another
function calling all of them (but what's the point - we still have 100-line
function that way), or there's some more complicated calling sequence and I
have to draw it somewhere to understand it (sorry, can't keep 100 things in my
head at once).

To understand the code I have to follow the control flow through these 100
functions. I don't think any 100 lines of code in one function can be harder to
read than that.

~~~
cessor
And this is where meaning is relevant. Good naming is important to convey the
level of abstraction. What you describe is as if you were to compare the
functioning of, say, the liver, by looking at its cells. Proper naming conveys
what you are looking at and you can chunk things together. If all the
functions are named at the same level of abstraction, of course the code will
be much worse to read.

If all the names are just a and b, well, of course it is harder to follow the
flow of a program because the use and meaning of the functions has to be
inferred from their definitions. Good names help identify paths of flow, but I
rarely see this separation in a clear way, resulting in the problem you
describe.

~~~
ajuc
Even with perfect naming you have to jump down to the bottom level to see the
details. And details are the only thing that matters in the end.

You can't put all the details in the name of the main function (otherwise, just
change the name of that 100-line function and keep it - if it can be
named perfectly (and shortly) without skipping any details - it's OK as it is,
no matter the size).

------
chton
I'm using "Optimize for readability first" from now on instead of the usual.
I'm not sure if I'd call that 'optimization', but it definitely gets the point
across a lot better, especially to newer developers.

------
vrybas1
I'm getting closer & closer to making this statement a rule:

"Code needs to be so good that no one on the team can write it any better"

If you, knowing the current requirements, wrote perfect code, it will still
become messy at some point. Feature by feature, new things will appear here and
there. Function sizes will grow, this CASE statement will grow, etc. But still
the "core" of it will be strong.

If you, knowing the current requirements, deliberately wrote "good enough" code,
what will happen with the next change? The code won't survive it and will be
deleted (in the best case). Therefore, bad code is completely useless from the
start.

------
joshka
An additional point that solves some of this - learn the difference between
declaration and implementation. Think: why do you care about the declaration?
Why would you generally want to navigate to the declaration instead of the
implementation?

The shortcut keys are similar; navigating to the implementation takes just a
little extra over navigating to the declaration: Ctrl+F12 instead of F12 in
Visual Studio, Ctrl+Alt+Click instead of Ctrl+Click with ReSharper. Even
easier, your IDE may support changing the default keybindings.

------
seanmcdirmid
Use a proportional font for code reading! I really can't stand reading code in
archaic typewriter fonts that are only optimized for ASCII art anymore.

~~~
robert_tweed
The problem with that is that it doesn't work when you have code (and
comments) formatted in neat columns. Since plain text doesn't have any
portable way to handle tabstops, everything has to be padded out (after the
initial indent) with extra spaces. Characters must have equal widths or
nothing lines up properly.

This is pretty common in long repetitious declaration blocks, at least when
the code is written by someone that cares about formatting. It vastly improves
readability. Example:

    
    
       int   some_variable     = 100;     // Foo
       float another_variable  = 1000.0;  // Bar
    

The gofmt tool actually does this kind of column formatting automatically.

~~~
morsch
Kind of strikes me as something the editor could do ad-hoc. Maybe the tab
character could return to let programmers tell the editor to align the
adjoining lines according to the tabs. Would work well at least in your
example.

~~~
seanmcdirmid
Elastic tab stops:
[http://nickgravgaard.com/elastictabstops/](http://nickgravgaard.com/elastictabstops/)

You could also pad your spaces in the editor so that they align along a global
axis. This does fix your code to a particular editor, however.

~~~
robert_tweed
Yeah this is nice. All we need now is for this to be supported in all major
text editors and we're set!

The only possible issue (not sure how it's handled, can't remember if/how
gofmt handles it either) is when sometimes you actually want something to be
misaligned because the width is so far out of the norm, e.g:

    
    
      int const1   = 10;  // There's lots of space here for a long comment
      int const2   = 5;   // Here too, unless the tabs get moved too far right
      float const3 = ( const1 * 123.45 ) / MAGIC_CONSTANT; // Not here.
    

Normally when this happens I try to regroup the code so that "plain" constants
are together and long formulas are in another group, but sometimes the order
is important for some reason or doesn't make logical sense to regroup items,
so the formatting has to suffer a bit.

~~~
morsch
As long as you rely on the tab character to signal the programmer's intent to
insert an elastic tabstop, that's not an issue. In this case, insert a \t
in front of the comments on the first two lines, but not the third. All three
lines could have \t's in front of the equals sign so they align on it.

The other nice thing is that it degrades quite gracefully in non-supporting
editors.
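
A simplified Python sketch of the rule described above, not Nick Gravgaard's
reference implementation: text before each tab is a cell, and a column's width
is set by the widest cell among adjacent lines that have that column, so a
line without a tab in that position simply doesn't take part. Padding with
spaces for a monospaced display is an assumption of this sketch; the proposal
itself targets editors that can place tab stops at any position. The function
name `elastic_align` and the `padding` parameter are hypothetical.

```python
def elastic_align(text: str, padding: int = 2) -> str:
    """Align tab-separated cells using a simplified elastic-tabstop rule."""
    rows = [line.split("\t") for line in text.split("\n")]
    # Only tab-terminated segments are cells; the final segment is free text.
    cells = [row[:-1] for row in rows]
    widths = [[0] * len(c) for c in cells]

    max_cols = max((len(c) for c in cells), default=0)
    for col in range(max_cols):
        start = 0
        while start < len(cells):
            if len(cells[start]) <= col:
                start += 1
                continue
            # A block is a run of adjacent lines that all have this column.
            end = start
            while end < len(cells) and len(cells[end]) > col:
                end += 1
            width = max(len(cells[i][col]) for i in range(start, end)) + padding
            for i in range(start, end):
                widths[i][col] = width
            start = end

    out = []
    for row, cell_row, width_row in zip(rows, cells, widths):
        padded = "".join(cell.ljust(w) for cell, w in zip(cell_row, width_row))
        out.append(padded + row[-1])
    return "\n".join(out)


source = "\n".join([
    "int const1\t= 10;\t// There's lots of space here",
    "int const2\t= 5;\t// Here too",
    "float const3\t= ( const1 * 123.45 ) / MAGIC_CONSTANT; // Not here.",
])
print(elastic_align(source))
# int const1    = 10;  // There's lots of space here
# int const2    = 5;   // Here too
# float const3  = ( const1 * 123.45 ) / MAGIC_CONSTANT; // Not here.
```

The third line has no tab before its comment, so its comment is left where it
falls, which is the behavior morsch describes.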

