The most common form of unreadability I see is overly abstracted code in Java, littered with patterns that obscure the kernel of logic, forcing you to do global code searches in order to see through all the indirections.
I'll go even further and say this is essentially the only form of unreadability I care about. The article talks about things like "unnecessary one-liners" with the ternary operator as an example.
I don't care at all about one-liners, whether or not they're immediately understandable. If I'm trying to figure out how to fix a bug or add a feature, and we assume, as the author did, that the obvious bad-code things like meaningless variable names and spaghetti control flow aren't there, then the only hard problem is "where in the code does this thing happen?" Make that problem easy to solve, and I can spare the 10 seconds it takes to understand a ternary operator. Hell, I can spare the 10 minutes it takes to understand some weird nested list comprehension. Stick in a comment telling me what you're doing, and I may not even need to bother.
The same is true with low-level optimizations. If you need to do something clever, then comment it, and we're fine. Obviously one shouldn't go overboard, but I don't see anything wrong with the kind of premature optimization in which the programmer sits down and immediately knows of a fast way and a slow way to do something and chooses the fast way (within reason, of course, and that's a subjective judgment, but it's one I'm OK with).
Basically, make your code easy to approach. Make it as easy as possible for me to go from, "I need to change what happens when the user does X" to finding the code that governs that behavior. If all that code is in one tidy little location, then aside from the obvious need for good variable names, sensible whitespace, etc., write it however you like. I'll figure it out.
I remember having to work on some J2EE-based systems 10 years ago or so that were nothing but vast collections (many, many thousands of classes) of abstractions; finding the code that actually did anything was quite an exciting adventure. I swear one system had about 25-30 layers of stuff between the Struts Actions and the underlying call that actually dropped a message on a queue. It wasn't helped by the need of whoever designed the system to use every single feature of J2EE, even where it really wasn't required... CV-driven development.
I couldn't agree more. I came into development as an intern and it took me months to loosely wrap my head around how the code was organized.
Here I am 4 years later on the same codebase (a Swing-based application), and it is very difficult to hunt down bugs. Fixing them is often easier than finding the place in the code where it's happening, even if you know where it theoretically is.
When Go to Declaration takes you to an interface or an abstract class, you have to figure out which implementation might actually be used. It gets worse with dependency injection.
This seems far-fetched to me and doesn't have much to do with readability anymore. Moreover, most of the time this is, IMO, how code should be. If you start programming with readability in mind and, as a result, start throwing away standard design practices because they lead to one more click or less readability, you are doing it wrong. That comes dangerously close to using copy-paste instead of a function because it is clearer, since you can see the implementation directly. I'd argue that you should not have to figure out what the implementation is. That's the whole point of using interfaces (and dependency injection, for that matter): if you have to figure out which implementation is used, then either you are the implementer or the code is wrong somehow, e.g. it violates the substitution or dependency inversion principles. Or you are studying the code in depth, in which case you have to go through all the implementations anyway. Calling all that 'readability problems' is a bridge too far.
When you are searching for a bug, you do need to understand the implementation. Of course by definition something is "wrong". Calling a function is fine by the author since you can easily jump to its definition. But if the actual function call is determined at run-time based on XML configuration, then you first need to figure out how that fits together for the particular case you need to investigate.
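A minimal sketch of that situation, in Python for brevity (the thread is mostly about Java, but the shape is the same). The XML format, the `REGISTRY` dict and all class names are hypothetical, only loosely modeled on Spring-style `<bean id=... class=...>` definitions. The point is that nothing in `notify` tells you statically which `send` actually runs.

```python
import xml.etree.ElementTree as ET
from typing import Protocol


class MessageSender(Protocol):
    """The interface that "Go to Declaration" lands on."""

    def send(self, text: str) -> str: ...


class QueueSender:
    def send(self, text: str) -> str:
        return f"queued: {text}"


class LogSender:
    def send(self, text: str) -> str:
        return f"logged: {text}"


# Hypothetical registry mapping config names to implementation classes.
REGISTRY = {"QueueSender": QueueSender, "LogSender": LogSender}

# Hypothetical wiring config: the implementation is chosen here, not in code.
CONFIG = '<beans><bean id="sender" class="LogSender"/></beans>'


def build(config_xml: str, bean_id: str) -> MessageSender:
    """Instantiate the class that the XML config names for bean_id."""
    root = ET.fromstring(config_xml)
    for bean in root.iter("bean"):
        if bean.get("id") == bean_id:
            return REGISTRY[bean.get("class")]()
    raise KeyError(bean_id)


def notify(sender: MessageSender, text: str) -> str:
    # Statically this resolves only to MessageSender.send; you have to read
    # CONFIG to learn that LogSender.send is what actually runs.
    return sender.send(text)


print(notify(build(CONFIG, "sender"), "hello"))  # logged: hello
```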
OP here.
You're both right. I'm not saying that DI and interfaces are bad. Of course not. I'm just trying to find a balance between modularity, loose coupling and code graph connectivity. Of course, it's not just black or white.
In a case like this, you really ought to have a standard set of unit tests that covers the contract of the interface. If you run that against all the implementations and none of the tests fail, the bug just isn't in the implementation itself and you can safely move on to looking for integration issues. Of course, if the unit tests for the interface are incomplete you can't rely on them, but in that case you've built yourself a leaky abstraction that you should go back and look at before doing any more systems testing. Effective dependency injection simply requires comprehensive unit testing.
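One way to set that up, sketched in Python: write the contract once as a function that takes a factory, then run it against every implementation. The `Stack` interface and both implementations are made up for illustration; in a real codebase this would more likely live in the test framework (for example a shared base test class) than at module level.

```python
from abc import ABC, abstractmethod
from collections import deque


class Stack(ABC):
    """Hypothetical interface whose contract every implementation must honor."""

    @abstractmethod
    def push(self, item): ...

    @abstractmethod
    def pop(self): ...

    @abstractmethod
    def __len__(self): ...


class ListStack(Stack):
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()  # raises IndexError when empty

    def __len__(self):
        return len(self._items)


class DequeStack(Stack):
    def __init__(self):
        self._items = deque()

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()  # also raises IndexError when empty

    def __len__(self):
        return len(self._items)


def check_stack_contract(factory) -> bool:
    """Contract tests written once against the interface, not an implementation."""
    s = factory()
    assert len(s) == 0, "a new stack must be empty"
    s.push(1)
    s.push(2)
    assert len(s) == 2, "push must grow the stack"
    assert s.pop() == 2, "pop must be LIFO"
    assert s.pop() == 1
    try:
        s.pop()
    except IndexError:
        pass  # part of the contract: popping an empty stack raises IndexError
    else:
        raise AssertionError("pop on an empty stack must raise IndexError")
    return True


# Run the same contract against every implementation.
for impl in (ListStack, DequeStack):
    check_stack_contract(impl)
```

If a new implementation passes this suite, a bug that still shows up is most likely in how the pieces are wired together rather than in the implementation itself.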
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." by Brian Kernighan
In the 1980s, someone wrote an essay promoting "throw-away" code. Essentially he said every module (a) should be small in size, and (b) should have documentation that describes what the module does.
Someone maintaining the code then has the option of keeping the interface, deleting the code and rewriting it, rather than trying to understand the thought process of an overly clever programmer. I've always found this useful when I write code others have to maintain.
I've always thought that what makes code hard to read is the fact that several persons are usually working on it.
Beauty is in the eye of the beholder, as we all know. Some developers like snake case, others prefer camel case. Some prefer 4-space indentation, while others can't stand it and much prefer 2 spaces. Some like braces on the same line, while others prefer them on the next line.
What really bothers me is that editors in general force people to share a codebase formatted for a given style that may or may not satisfy everyone.
And the worst is that I don't see why that has to be. Ideally "code" would be some form of abstract syntax tree and the editor would apply a user-generated "stylesheet" on top of it to display the code as the user wants to see it.
Upon save, all the style would go away and only the abstract representation of the code would go to the shared repo.
Not an easy thing to make happen for sure, but that'd be awesome.
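Python's standard `ast` module is enough for a toy version of this idea (it needs Python 3.9+ for `ast.unparse`): keep the parsed tree as the shared form and let each reader render it with their own "stylesheet". Here the stylesheet is a hypothetical `indent` setting and nothing else. The sketch also exposes one big caveat: `ast` drops comments, so a real tool would need a richer representation, and the naive re-indent below would mangle multi-line string literals.

```python
import ast


def canonical(source: str) -> str:
    """Parse and unparse: spacing, redundant parentheses, etc. disappear.

    Note: ast.parse discards comments, so they are lost here too.
    """
    return ast.unparse(ast.parse(source))


def render(source: str, indent: int = 4) -> str:
    """Hypothetical 'stylesheet' step: re-indent the canonical form.

    ast.unparse always indents with 4 spaces; each leading run is rescaled.
    (Naive: lines inside multi-line string literals would be re-indented too.)
    """
    out = []
    for line in canonical(source).splitlines():
        body = line.lstrip(" ")
        level = (len(line) - len(body)) // 4
        out.append(" " * (indent * level) + body)
    return "\n".join(out)


# Two differently formatted sources share one abstract representation.
a = "def f( a ):\n  return (a+1)\n"
b = "def f(a):\n        return a + 1\n"
assert ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))

print(render(a, indent=2))
```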
If you work in a team, you absolutely MUST enforce a coding standard, and it had better be a community standard that most developers use. All team members must use either tabs or spaces and have their IDEs configured to apply the style automatically.
I (respectfully) disagree. The editor should take care of adapting the codebase to the standards the programmer likes. Trying to force 20 developers into a "consistent" style only makes room for a soft consensus that doesn't satisfy anyone in the end.
If we can do it for web pages (with CSS), there's no reason we couldn't do it for source files.
In any IDE you can set up code formatting options. Pressing a button essentially applies the CSS you are talking about. You can write however you want, but when saving a file you should (either manually or automatically) run the IDE's reformatter.
I agree with you that a CSS-like option would be better, since code is data and the CSS is the formatting options. But you'd have to teach every other tool you use to work with this CSS approach. For example, to Git, source files are just text; Git doesn't understand code semantics. For its diff routine, tabs changed to spaces == every line of code changed.
There's not much use for a VCS if every commit changes everything.
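A quick way to see this with Python's `difflib`, used here as a stand-in for Git's own diff: converting tabs to spaces marks every indented line as changed, while a whitespace-insensitive comparison (similar in spirit to `git diff -w`, though not identical to it) sees no change at all. The four-line function is just sample input.

```python
import difflib

original = ["def f(x):\n", "\tif x:\n", "\t\treturn 1\n", "\treturn 0\n"]
# The "reformat": every tab becomes four spaces, nothing else changes.
retabbed = [line.replace("\t", "    ") for line in original]

diff = list(difflib.unified_diff(original, retabbed, "a/f.py", "b/f.py"))
removed = [l for l in diff if l.startswith("-") and not l.startswith("---")]
added = [l for l in diff if l.startswith("+") and not l.startswith("+++")]
print(f"{len(removed)} lines removed, {len(added)} lines added")


def normalize(line: str) -> str:
    """Collapse whitespace runs so indentation style no longer matters."""
    return " ".join(line.split())


same_ignoring_whitespace = [normalize(l) for l in original] == [
    normalize(l) for l in retabbed
]
print("identical ignoring whitespace:", same_ignoring_whitespace)
```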
You're absolutely right. It's about design more than anything else, but at least for the cosmetic details, I feel the current state of affairs is sub-optimal.
By cosmetic, I mean the pettier details like indentation, method names in bold or underlined, return types on the same line or before the function definition, position of braces, snake case versus camel case, etc.
That I think the editor should take care of. For more serious issues like your example of poorly named variables, there's nothing it could do (and is really the core of the problem, as you pointed out).
This is why it's important that you have a coding standard but not so important what choices it actually makes. Just like you have to pick whether to drive on the right or the left.
Golang apparently enforces a visual style. To a certain extent, Python does as well.
"[Scala] is essentially a write-only language (please don't quote me on this!); "
I'm sorry, I quote you on this.
If you ever compared the Hadoop codebase (Java) with the Apache Spark codebase (Scala), you'd know this is BS: Spark code is an order of magnitude more readable and easier to understand than Hadoop's. Lack of side effects and lack of low-level mutable shared state protected by locks everywhere are two of the many reasons.
Of course, you can write terribly unreadable code in any language, and those academic languages with rich functionality and expressiveness probably make it a little easier to abuse features, or to choose the wrong features for the job. But on the other hand, I'd argue they make it easier to write readable code, once you care about readability and learn how to do it. Writing readable code is a skill you have to master. It doesn't come for free, and it doesn't come overnight.
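To make the "mutable shared state protected by locks" point concrete, here is a toy word count written both ways in Python. It only illustrates the two styles and is not code from Hadoop or Spark: in the first version, correctness depends on every writer remembering the lock; in the second, each chunk is counted independently and the partial results are merged, so there is no shared state to reason about.

```python
import threading
from collections import Counter
from functools import reduce


def count_with_lock(chunks: list[str]) -> dict[str, int]:
    """Threads update one shared dict; every write must hold the lock."""
    counts: dict[str, int] = {}
    lock = threading.Lock()

    def worker(chunk: str) -> None:
        for word in chunk.split():
            with lock:  # forget this and counts can silently go wrong
                counts[word] = counts.get(word, 0) + 1

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


def count_pure(chunks: list[str]) -> dict[str, int]:
    """Map each chunk to its own Counter, then reduce by merging them."""
    partials = (Counter(chunk.split()) for chunk in chunks)
    return dict(reduce(lambda a, b: a + b, partials, Counter()))


chunks = ["to be or not", "to be", "that is the question"]
assert count_with_lock(chunks) == count_pure(chunks)
print(count_pure(chunks)["to"])  # 2
```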
I've recently begun digging into Haskell; it has a lot of interesting and novel features. However, I can't get over its syntax. It often uses non-alphanumeric characters in place of function names. Many functions are poorly named, e.g., `elem` vs. Python's `in`. I strongly prefer a more natural, English-style syntax; even when you know a language well, it reads faster. That being said, Haskell is awesome for numerous other reasons.
500 line functions are the devil, they're harder to read, test, modify, and reason about. They tend to encourage code that isn't DRY, or reusable. The only benefit they have is an almost always insignificant performance boost, at the cost of everything else.
Also, magic isn't terrible if it's used in moderation and well documented, though it usually makes things substantially harder to grep.
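A small Python illustration of the grep problem (all names are hypothetical): with `getattr`, the handler name is assembled at run time, so grepping for `handle_save` finds the definition but no caller. An explicit table costs a couple of lines and keeps every call site searchable.

```python
class Editor:
    def handle_save(self) -> str:
        return "saved"

    def handle_quit(self) -> str:
        return "quit"

    def dispatch_magic(self, action: str) -> str:
        # The method name is built from a string at run time, so no literal
        # "handle_save" appears at the call site for grep to find.
        handler = getattr(self, "handle_" + action, None)
        if handler is None:
            raise ValueError(f"unknown action: {action!r}")
        return handler()

    # Explicit alternative: every handler is named literally, so grep works.
    _HANDLERS = {"save": handle_save, "quit": handle_quit}

    def dispatch_explicit(self, action: str) -> str:
        try:
            handler = self._HANDLERS[action]
        except KeyError:
            raise ValueError(f"unknown action: {action!r}") from None
        return handler(self)  # plain functions from the class body need self


editor = Editor()
assert editor.dispatch_magic("save") == editor.dispatch_explicit("save") == "saved"
```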
I think there is a syntactical pain barrier for a lot of languages. However, once you are through that barrier you stop noticing the syntax.
e.g. When I first looked at Python I remember thinking "significant white-space - that is evil". However, I forced myself to write some Python and after a fairly short time I didn't notice that aspect anymore and now I think significant white-space is actually pretty sensible.
I've had similar experiences when starting out with Common Lisp, PostScript and a variety of other things that don't look like a variation on C.
I would agree that Haskell does look a bit weird - but everything looks weird the first time you see it. Indeed, I can remember seeing C for the first time when I was 16 or so (early 80s) and thinking that looked weird compared to the Basic I was used to...
I think line noise operators and single character names are bad in any language. Haskell is among the worst offenders, but C++ isn't much better. At some point this frustrated me so much that I started making my own language. Semantically it was pretty much a dialect of ML, a statically typed, eagerly evaluated language with mostly immutable data, but syntactically it aimed for extreme readability:
1) All keywords and standard functions have full English names, no abbreviations.
2) Keywords are UPPERCASE, types are Capitalized, values and functions are camelCase. Easy to tell the difference at a glance.
3) Mandatory type declarations everywhere. Limited type inference within expressions, but that's it.
4) Only one kind of syntactically significant parentheses, instead of () [] {} <> as in C++ or Java.
5) No two-character operators. Instead of x != y you have to write NOT (x = y). Sorry!
6) Significant whitespace.
The code looked something like this:
GENERIC(Type) FUNCTION identity(value: Type): Type
    RETURN value
I have a lot of code written in that dialect, just for my own amusement. It's pretty verbose and boring, but extremely easy to write, impossible to misunderstand, and sticks to memory like nothing else I've ever tried. I still want to finish an implementation someday.
That is a bit close to Pascal. I have thought of something similar, but extended the idea:
#Naming conventions. ENFORCED BY THE COMPILER by default
#For types/classes/interfaces
Str #Title-case means public
_Str #private
___Str #protected
#For functions and properties
isPublic #camelCase means public
_isPrivate #private
___isProtected #protected
PI = 3.1416 #All CAPS means CONSTANT
I wonder how to handle mutable vs. immutable. I have thought of:
thisIsImmutable
thisIsMutable*
Because if mutability is marked only at the declaration point (using, for example, let vs. var) and not at every use of the name, you lose the ability to tell whether the thing mutates or not. The other idea is to do this only for functions:
fun List.Add(self*, values:Array)
Or like in other languages
fun List.Add!(values:Array)
Another idea (derived from the fact that the lexer/AST gives far more info than the text) is to color-code names, using color and/or font style to distinguish mutable/immutable values, functions, types, variables, etc.
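As a rough sketch of how a compiler or linter could enforce a naming scheme like this one, here is a Python classifier. It follows one reading of the rules: 0, 1 or 3 leading underscores for public, private or protected; Title-case for types; camelCase for functions and properties; ALL CAPS for constants; a trailing `*` for mutable names. The function and its error messages are hypothetical, and one ambiguity shows up immediately: a one-letter uppercase name like `T` gets classified as a constant.

```python
import re

VISIBILITY = {0: "public", 1: "private", 3: "protected"}


def classify(name: str) -> dict:
    """Classify an identifier under the proposed naming conventions."""
    mutable = name.endswith("*")          # thisIsMutable* -> mutable
    core = name[:-1] if mutable else name
    bare = core.lstrip("_")
    underscores = len(core) - len(bare)
    if underscores not in VISIBILITY:
        raise ValueError(f"{name!r}: use 0, 1 or 3 leading underscores")

    if re.fullmatch(r"[A-Z][A-Z0-9_]*", bare):
        kind = "constant"                 # PI, MAX_SIZE
    elif re.fullmatch(r"[A-Z][A-Za-z0-9]*", bare):
        kind = "type"                     # Str, _Str, ___Str
    elif re.fullmatch(r"[a-z][A-Za-z0-9]*", bare):
        kind = "function/property/value"  # isPublic, _isPrivate
    else:
        raise ValueError(f"{name!r}: does not match any naming convention")

    return {"kind": kind, "visibility": VISIBILITY[underscores], "mutable": mutable}


for n in ("Str", "_Str", "___Str", "isPublic", "___isProtected", "PI", "thisIsMutable*"):
    print(n, classify(n))
```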
It's probably not very relevant, but I think you could get some inspiration from Avail http://www.availlang.org/ and maybe REBOL or Red www.red-lang.org
Either way, good luck and keep working on it, something like this is needed IMHO (also, Go is quite nice on that front, of course) :)
I agree for the most part; with persistence it can be overcome. But why make anything harder to learn than it has to be? `long_descriptive_variable_name` gives you a lot more useful information than `:++-`. What's the benefit of making a language look like runes? For this reason alone, I find LISP to be easier to read.
I did terrible things with reader macros back in my Common Lisp days so I probably can't comment on how readable my code was to other people.
Certainly the people I acquired the reader macro abuse habit from produced some files that when I first opened them I didn't think they were Lisp at all.... :-)
NB I'm glad to have got those tendencies out of my system in academia where it did relatively little harm to others.
> I think there is a syntactical pain barrier for a lot of languages. However, once you are through that barrier you stop noticing the syntax.
No, not always. Some languages are intrinsically worse with terseness than others; for example: APL, Perl, and Haskell. The association fallacy is so useful :)
Not sure I would agree about Perl - personally I found it pretty easy to pick up. That was Perl 4 though.
I don't think terseness is always a problem in the same way that verbosity isn't always a feature - look at COBOL or the various graphical "programming" systems (oddly popular in the world of integration products).
Haskell is a bit of an odd one out there. Its syntax is extremely simple and the major source of peculiar symbols is various functions, which are much easier to reason about than syntax.
I believe APL has a simpler syntax than Haskell (if we just look at grammar complexity): all of its complexity is in the various functions it supports.
APL's syntax is only simple if you consider parsing as a language acceptance check. Getting the actual syntax tree requires knowing what will be assigned to what names during run time.
This is a great write-up and should be required reading for every novice (and not-so-novice) programmer.
That said, I would compose the list "What makes code hard to understand?" differently:
1) improper naming of functions / variables (if you can't name it properly, you can't develop it)
2) excessive documentation where not needed (document only non-obvious stuff)
3) *complex* one-liners (because simple one-liners actually help readability)
4) and 5) can stay. :)
The main point is valid though. I would phrase it like this: optimize for correctness first, readability close second, and don't optimize for performance until you can identify the significant gain (always by measuring it!).
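For the "always by measuring it" part, Python's standard `timeit` module is the low-effort option. The two string-building functions below are placeholders for whatever fast/slow pair you happen to be weighing; the sketch deliberately doesn't claim which one wins, since that depends on the interpreter and the input size, which is exactly why you measure.

```python
import timeit


def build_by_concat(n: int) -> str:
    s = ""
    for i in range(n):
        s += str(i)
    return s


def build_by_join(n: int) -> str:
    return "".join(str(i) for i in range(n))


def best_time(fn, n: int, repeat: int = 5, number: int = 20) -> float:
    """Best of `repeat` runs, each timing `number` calls; min reduces noise."""
    return min(timeit.repeat(lambda: fn(n), repeat=repeat, number=number))


# Check both versions agree before comparing their speed.
assert build_by_concat(1000) == build_by_join(1000)
for fn in (build_by_concat, build_by_join):
    print(f"{fn.__name__}: {best_time(fn, 10_000):.4f}s")
```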
TFA defines premature optimization as "everything one tries to optimize before profiling and running tests on a working system."
Perhaps that same thinking needs to apply to readability. Premature readability: everything one does to improve readability before running tests.
Readability seems a lot like pornography[0]; people claim to know it when they see it (or, in this case, when they don't) but are hard-pressed to offer a well-defined or objective description.
What we get instead are well-meaning but subjective prescriptions.
Shouldn't the alleged gains of readability practices be held to the same measuring requirement as optimization?
Are there solid, objective studies on what makes code more readable? In a small group it might be best to have a discussion about coding practises to achieve some consensus, but if that's not feasible then some guidelines based on research would help.
I don't mind improper naming, because all the functions are essentially just nodes in the code graph. Of course it helps if you can figure out what they do just from their names, but you have to read code anyway.
Same for excessive docs. Sometimes you can find huge walls of text before every function, or meaningless javadoc comments which say that function GetData "returns data". But those are usually needed for 100% docs coverage which might be enforced by corporate policy.
> Of course it helps if you can figure out what they do just from their names, but you have to read code anyway.
This is, I think, the root of the (very minor) disagreement I have with your article.
I really don't want to have to read the code anyway. There may be 250,000 lines of code in a moderately sized system. If I have to read a significant fraction of them to figure out what's going on, that's the biggest sin you as the original developer can commit by a couple of orders of magnitude.
I want it to be obvious where to look for the code that controls whatever it is I want to change. If you make it easy for me to find those 50 lines of code, I don't care if you write them in assembler. 50 lines of anything are vastly easier to deal with than 250,000 (or god help you million) lines of the absolute best thing you can write.
Bad names are bad precisely because they force you to read a bunch of irrelevant code to have any hope of understanding what the code is doing. It's the same problem that over-abstracted code presents. In both cases, there's no forest I can look at -- just millions of individual trees.
In a weird way though, bad names are actually better than many real-world problems. Once I find that code, I can change the name and the problem goes away. If you have a 100 line function full of one-letter variables, that's probably terrible. But I can spend a week poring over that function, figure out what they all do, and change the names. From that point, the function is fixed.
I have no hope of fixing the problem with a big mess of fetishized design patterns. The confusion in many of these systems is baked into the core of the architecture in the form of a dozen layers of abstraction and indirection between me and figuring out the behavior of the system. Usually the only way to fix that is "rm -rf" and starting over from scratch.
In a study that I did, I found that bad naming makes people approximately 15% slower. The comparison was between single name identifiers, abbreviations and "normal" words.
Have a look, you might have to translate it though...
Making use of "chunks" makes code easier to read. A function is essentially a proper chunk (whereas a comment is a bad way to do the same thing). Depending on the paradigm, I usually enjoy code that has tiny units, but many of them - i.e. a function with 100 lines is harder to read than 100 functions with one line each. Ideally each name adds proper meaning to the code so that the intent becomes clear. Code like this is usually easier to change, reuse and test (TDD).
This can go the wrong way - many people mentioned how too many abstraction layers in big java projects are a pain, and I agree, as they often make the code more meaningless and abstract, rather than concrete. Oh yeah and pattern names do the same thing.
> i.e. a function with 100 lines is harder to read than 100 functions with one line each.
I understand where you're coming from, but I disagree with that particular example. A 100-line function is harder to read than necessary, but it can be understood without problems.
100 functions doing the same work will be a nightmare to keep in head at once, at least for me. If you refactor one function into 100 you're not doing it right.
In this case (1 function split into 100), each of them is probably used exactly once. The effect on readability, especially if they are defined in order of execution, is similar to placing a comment (function name, args and return value) next to every single line of the 100-line function, whether or not the comment was actually needed there. So I agree: it will probably result in worse readability than keeping the function as one, commenting sections of it, and extracting only repeated patterns into separate functions.
The biggest problem is the relations between the functions.
In one function control flow is so simple we don't think about it. It's sequence. In 100 functions either each call the next one (and even if the 100 functions are placed in calling order I cannot assume that when reading the code - I must do 100 ctrl+clicks to see the control flow), or there's another function calling all of them (but what's the point - we still have 100-line function that way), or there's some more complicated calling sequence and I have to draw it somewhere to understand it (sorry, can't keep 100 things in my head at once).
To understand the code I have to follow the control flow through these 100 function. I don't think any 100 lines of code in one function can be harder to read than that.
And this is where meaning is relevant. Good naming is important to convey the level of abstraction. What you describe is like trying to understand how, say, the liver works by looking at its cells. Proper naming conveys what you are looking at, and you can chunk things together. If all the functions are named at the same level of abstraction, of course the code will be much harder to read.
If all the names are just a and b, then of course it is harder to follow the flow of a program, because the use and meaning of the functions has to be inferred from their definitions. Good names help identify paths of flow, but I rarely see this separation done clearly, which results in the problem you describe.
Even with perfect naming you have to jump down to the bottom level to see the details. And details are the only thing that matters in the end.
You can't put all the details in the name of the main function. (Otherwise, just rename that 100-line function and keep it: if it can be named perfectly (and briefly) without skipping any details, it's OK as it is, no matter the size.)
I'm afraid you're talking about procedures returning values rather than functions. If so, then agreed.
But 100 (pure) functions might be much easier to understand than a 100-line function with many loops, branches, jumps and complex internal state mutations. Assuming no recursive or mutually-recursive calls, you can convert 100 one-line functions to a flat function (no nesting, no loops) with 100 expressions.
Another difference in readability is order: with 100 pure functions, the order of evaluation does not matter, while with a 100-LOC imperative procedure, the order of evaluation does matter and is an additional thing to keep in mind.
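A tiny Python version of that claim, with three pure one-liners standing in for the hundred (the pricing functions are made up for illustration). Because none of them touch shared state, inlining the composition gives a flat, loop-free function, and the only ordering left is the data dependency between consecutive lines.

```python
def parse_price(text: str) -> float:
    return float(text.strip().lstrip("$"))


def apply_discount(price: float, rate: float) -> float:
    return price * (1 - rate)


def add_tax(price: float, tax: float) -> float:
    return price * (1 + tax)


def final_price(text: str, rate: float, tax: float) -> float:
    """Composed from the small pure functions above."""
    return round(add_tax(apply_discount(parse_price(text), rate), tax), 2)


def final_price_flat(text: str, rate: float, tax: float) -> float:
    """The same computation inlined: no nesting, no loops, one expression per line."""
    price = float(text.strip().lstrip("$"))
    discounted = price * (1 - rate)
    taxed = discounted * (1 + tax)
    return round(taxed, 2)


assert final_price(" $100 ", 0.1, 0.2) == final_price_flat(" $100 ", 0.1, 0.2)
```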
That's a big assumption. I don't think I've ever seen a 100-line function with no dependencies between its lines of code... Maybe a constructor that calls 100 setters? Which is another great example of why turning long functions into a sea of one-liners isn't always a good thing.
I'm using "Optimize for readability first" from now on instead of the usual. I'm not sure if I'd call that 'optimization', but it definitely gets the point across a lot better, especially to newer developers.
I'm getting closer & closer to making this statement a rule:
"Code needs to be so good that no one on the team could write it any better."
If you, knowing the current requirements, wrote perfect code, it will still become messy at some point. Feature by feature, new things will appear here and there. Function sizes will grow, that CASE statement will grow, etc. But the "core" of it will still be strong.
If you, knowing the current requirements, deliberately wrote "good enough" code, what will happen with the next change? The code won't survive it and will be deleted (in the best case). Therefore, bad code is completely useless from the start.
An additional point that solves some of this: learn the difference between declaration and implementation. Think about why you care about the declaration. Why would you generally want to navigate to the declaration instead of the implementation?
The shortcut keys are similar, but just a little bit extra to navigate to the implementation over the declaration. Ctrl+F12 instead of F12 in Visual Studio, Ctrl+Alt+Click instead of Ctrl+Click with Resharper. Even easier, your IDE may support changing the default keybindings.
Of course, once you implement an editor that does a good job of keeping things aligned even when using proportional fonts, I'll switch immediately. It's not like I didn't try: the default font in Pharo is proportional, and I tried working with it for a few weeks. Smalltalk syntax is so clean and minimal that I thought it would be OK, but even there I had to change the font to monospace. In the few instances where I wanted the code to be aligned (populating a dict, writing to a stream, things like that), I couldn't do it, and that was too irritating for me.
Notice how the first "put" is not aligned with the second one here. It looks aligned in Pharo, though, when displayed with the default font. This alignment may fall apart if I so much as change the font. This is bad, and that's why I'm going to use monospaced fonts until the editor is able to preserve my alignment across different fonts.
Sounds like that formatting scheme would lead to changes on lines unrelated to the change, e.g. if I renamed symbol2 to sym2, the spacing on #symbol will be reduced. To me this sacrifices diff readability for code readability. It's generally not a tradeoff I like to make, but I can see how it is appealing.
This is not an issue for me, because - which is strange - the diff viewers I use seem to be more advanced than my editors, in that the diff viewers can ignore unimportant whitespace changes while editors can't handle alignment properly :-) But that is a trade-off, of course.
Why would you want your editor to preserve your alignment across different fonts? Or, to be more precise, why does an editor need to bother with multiple fonts? It's not Word or LaTeX; the code only needs to look good and read well in one font, and that font is most definitely not a monospace one.
You mean that all the programmers should use one and the same font across all their editors? I'm sorry, I don't understand. Is my code meant to be read only by me? What do you mean by that?
Just as I said, the code has to look good in one font, not N. When I read a paper written by someone else, I don't casually change its font. We invent so many unreasonable requirements to justify the continued use of archaic typewriter fonts; sad.
I'm sorry, I'm really trying to understand your point but I really see nothing unreasonable in wanting to view someone's code displayed with my preferred font without changing the structure of said code.
I'm a bit confused by some of the replies: you're talking about proportional fonts in a text editor for viewing code. Using a monospace font makes alignment easy; using a proportional font makes it very hard.
How would an editor know what to align? It's not part of syntax highlighting - unless you define your own rules.
The fact that this is not available in Vim or Emacs (I haven't checked; maybe it is) suggests that it really is a rare want.
a) use proportional font in my editor
b) be able to align things horizontally
c) ensure that this alignment is preserved when someone opens my code with another editor or another font
Presently it's impossible to get all three, so I'm going with b) and c). seanmcdirmid seems to want only a) and b), but without c).
As for solution, sean himself pasted something interesting:
I wasn't aware of that; it seems interesting, but I haven't looked into it much yet. Also, while it seems to cover half of c) (a different font), it will probably be editor-dependent (EDIT: but then again, every other solution would be too).
d) let me use any editor; e.g. emacs, vim, or notepad
None of these editors know alignment outside of primitive spacing and tabbing, so therefore archaic monospace fonts win by default; QED.
I'm claiming that not all of these requirements are necessary. We can ensure alignment if the font is fixed for code, or we could ensure alignment in the editor. Neither of these are crazy relaxations.
The problem with that is that it doesn't work when you have code (and comments) formatted in neat columns. Since plain text doesn't have any portable way to handle tabstops, everything has to be padded out (after the initial indent) with extra spaces. Characters must have equal widths or nothing lines up properly.
This is pretty common in long repetitious declaration blocks, at least when the code is written by someone that cares about formatting. It vastly improves readability. Example:
int   some_variable    = 100;     // Foo
float another_variable = 1000.0;  // Bar
The gofmt tool actually does this kind of column formatting automatically.
Kind of strikes me as something the editor could do ad-hoc. Maybe the tab character could return to let programmers tell the editor to align the adjoining lines according to the tabs. Would work well at least in your example.
Yeah this is nice. All we need now is for this to be supported in all major text editors and we're set!
The only possible issue (not sure how it's handled, can't remember if/how gofmt handles it either) is when sometimes you actually want something to be misaligned because the width is so far out of the norm, e.g:
int   const1 = 10;  // There's lots of space here for a long comment
int   const2 = 5;   // Here too, unless the tabs get moved too far right
float const3 = ( const1 * 123.45 ) / MAGIC_CONSTANT; // Not here.
Normally when this happens I try to regroup the code so that "plain" constants are together and long formulas are in another group, but sometimes the order is important for some reason or doesn't make logical sense to regroup items, so the formatting has to suffer a bit.
As long as you rely on the tab character to signal the programmer's intent to insert an elastic tabstop, that's not an issue. In this case, insert a \t in front of the comments on the first two lines, but not on the third. All three lines could have \t's in front of the equals sign so that they align on it.
The other nice thing is that it degrades quite gracefully in non-supporting editors.
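A simplified sketch of the elastic tabstops idea in Python, applied to the constants example above. The assumptions are: each \t ends a cell; a column block is a run of consecutive lines that all have a tab-terminated cell at that column index; every cell in a block is padded to the block's widest cell plus one space. This is my reading of the algorithm rather than a reference implementation; as far as I know, Go's `text/tabwriter` package (which gofmt's printer uses) implements the full version.

```python
def elastic_align(lines: list[str], padding: int = 1) -> list[str]:
    """Align tab-separated cells using a simplified elastic-tabstops rule."""
    rows = [line.split("\t") for line in lines]
    # widths[r][c]: padded width of tab-terminated cell c on row r
    # (the last piece of each row is not tab-terminated, so it isn't padded).
    widths = [[0] * (len(row) - 1) for row in rows]
    max_cols = max((len(row) - 1 for row in rows), default=0)

    for col in range(max_cols):
        r = 0
        while r < len(rows):
            if len(rows[r]) - 1 <= col:  # this row has no cell in this column
                r += 1
                continue
            # Extend the block while consecutive rows have a cell in `col`.
            end = r
            while end < len(rows) and len(rows[end]) - 1 > col:
                end += 1
            width = max(len(rows[k][col]) for k in range(r, end)) + padding
            for k in range(r, end):
                widths[k][col] = width
            r = end

    return [
        "".join(cell.ljust(w) for cell, w in zip(row, ws)) + row[-1]
        for row, ws in zip(rows, widths)
    ]


source = [
    "int const1\t= 10;\t// There's lots of space here",
    "int const2\t= 5;\t// Here too",
    "float const3\t= ( const1 * 123.45 ) / MAGIC_CONSTANT; // Not here.",
]
for line in elastic_align(source):
    print(line)
```

The third line has no tab before its comment, so it joins the "=" column block but not the comment block, which is exactly the deliberate misalignment described above. Lines without tabs pass through unchanged, which is the graceful degradation.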