Simple Ways of Reducing the Cognitive Load in Code (chrismm.com)
281 points by christianmm on June 28, 2016 | hide | past | favorite | 195 comments

"Use names to convey purpose. Don't take advantage of language features to look cool."

I can't say enough about this. Please write code that is easy to read and understand: not the most compact code, not the most "decorated" or "pretty" code, and not code that's neat because it uses that giant list expression or ridiculous map statement that's an entire paragraph long.

Similarly, what bugs me is when I receive a pull request where someone has rewritten a bunch of code to take advantage of new language features just for the hell of it, with no increase in clarity.

I guess it's in vogue now to add a lot of bloat and complexity and tooling to our code. "Use the simplest possible thing that works." Tell that to the Babel authors with their 40k files...

Using intermediate variables is one of the most underrated tools to make code more understandable. It's the definition of something completely unnecessary from a technical standpoint that is all about conveying meaning and clarity to other programmers. And it can be used to help group and "modularize" chunks of code within a routine without necessarily going to the extreme of pulling out a separate subroutine, which can be overkill in some circumstances.
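A hypothetical illustration of this (names and numbers invented, integer cents to keep the arithmetic exact): both versions compute the same total, but the named-steps version documents each step.

```java
public class IntermediateVariables {
    // Dense version: correct, but the reader must decode it in one pass.
    static int totalCentsDense(int priceCents, int qty, int taxPct, int discountPct) {
        return priceCents * qty + priceCents * qty * taxPct / 100
                - priceCents * qty * discountPct / 100;
    }

    // Named-steps version: each intermediate variable conveys its purpose.
    static int totalCentsClear(int priceCents, int qty, int taxPct, int discountPct) {
        final int subtotal = priceCents * qty;
        final int tax = subtotal * taxPct / 100;
        final int discount = subtotal * discountPct / 100;
        return subtotal + tax - discount;
    }

    public static void main(String[] args) {
        System.out.println(totalCentsClear(1000, 3, 10, 5)); // 3150
    }
}
```

The intermediates cost nothing at runtime after the compiler is done with them; they exist purely for the next reader.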

Agreed about using intermediate variables.

What's better than comments to describe what the code does? CODE that describes what the code does. (Let the code describe WHAT the code does, and if necessary, the comments describe WHY the code does it like that.)

In C++, if I use an intermediate variable to decompose a complicated expression into easier-to-understand sub-expressions, I like to make the intermediate variable `const` to emphasize that it's an intermediate component.

Beautiful has its value too. I prefer code that creates paragraphs using an extra line break; a single comment can explain what each paragraph means. Reading a file is simply "open it, use it, close it", so in my code it's three paragraphs of code.

I hear this a lot, often in the form of "good code doesn't need comments", but I'm more than a little skeptical of this view.

I need actual examples to get on board, and not just the usual "idiot" programmer strawmen - actual examples that aren't obviously unreasonable.

My main reason for skepticism: full, proper English sentences are capable of a lot more nuance and more precise semantics than whatever the programming-language syntax might support. I agree code can be written with more clarity, but it cannot substitute for actual text, i.e. code is not documentation.

I agree, intermediate variables are a highly underrated technique that can make code more understandable at a glance. What is annoying is that quite a few IDEs flag intermediate values that can be removed as a possible error, which probably discourages quite a few beginners from using them.

I think people underestimate the cost of vertical length. It's less obvious in small examples, but there's a huge difference in readability between a class or function that fits on one page and one that doesn't, so it's well worth making individual lines a bit less readable if it means you need fewer of them.

The question here is what counts as "a bit less readable"? Many people will run with that advice and start playing code golf on a production codebase.

> so it's well worth making individual lines a bit less readable if it means you need less of them

What does it mean?

Our programming teacher told us something like this: if your program has so many lines that it cannot fit on one screen, it contains at least one bug, so don't write them that way; keep them short. I try to make all projects out of small files, where each one can fit on one screen. Great for comprehensibility and productivity (less scrolling). Sometimes a procedure gets longer than one screen, so what one needs to do is put as much source as possible on each line, so that the entire thing fits into one screen.

Making individual lines a bit less readable is well worth it if making individual lines a bit less readable means you need less of them. ("Worth it" is a construct I don't really understand the grammar of - I'm a native English speaker)

Intermediate functions :)

I recently got a code review that in several places suggested I switch to the new Java 8 stream API [1]. I just flatly responded that it was far less readable, even if I could condense a half-dozen lines of code down to one. Where I can quickly scan over a foreach loop to get the gist of what it's doing, I have to closely examine each call in the new approach to have any idea what it's doing.

[1] http://www.oracle.com/technetwork/articles/java/ma14-java-se...

Stream-based programming is a paradigm that, with some training and proper code indentation, is much, much faster to read than a nested for loop. Once you get used to it, you can literally fast-scan code written in this style with the confidence that you are not missing anything.

Also, assuming you do not use mutable state, it has the advantage of being easily parallelizable without any code changes (as well as easily combinable, throttlable, samplable, etc.).

But it _definitely_ has a learning curve (like Calculus) and one needs to invest time in it to get used to this paradigm. Time must be spent in learning all the various operators and how they can be effectively leveraged. Stream-based code is certainly not something you can read off the cuff.

The good news is that this is valuable time investment since all stream/observable libraries across languages have a large set of identical operators and shared API surface - the cross platform applicability is fantastic.

Once you know stream based code in language A, you can automatically read stream-based code in language B, without even knowing the syntax of language B.

I love observable stream libraries like RxJS. It makes async programming so easy! I could never write async JS code effectively with callbacks or even promises. Too much state to keep in my poor head.

We now even have standards like ReactiveX coming out that are attempting to define a cross-language/platform spec for observable flows and standardized patterns on how to create new stream operators, etc.

At risk of a terrible analogy, I consider Stream based code to be something of a mini-language like Regex. (but easier to read). No one who hasn't put in time understanding regex elements will ever follow regex patterns. Ditto for stream based stuff - you need AOT learning, but the rewards are worth it.

"Stream-based code is certainly not something you can read off the cuff."

    List<Integer> transactionsIds =
        transactions.stream()
                    .filter(t -> t.getType() == Transaction.GROCERY)
                    .sorted(comparing(Transaction::getValue).reversed())
                    .map(Transaction::getId)
                    .collect(toList());
I honestly think most programmers fluent in Java 7 can guess this is finding "grocery" type transactions, sorting by the "value" transaction property in descending order, extracting the transaction ids, and returning the result as a list. ("map" could be confusing, as this usage is borrowed from functional programming.)

May take a while to learn the performance and memory usage characteristics, and how to correctly write code in this style, but I'm not buying that it is difficult to "read off the cuff."

If you're a .NET programmer, we've been doing that kind of stuff with LINQ for years; it's all over the place.

The only patterns that I still like to keep as old-style loops are constructs that ReSharper transforms into hairy-looking Aggregate() expressions.

Err... yes, Java 8's filter, sorted, map and collect are the most basic stream operators, and you are correct that even folks unused to this style can work out what such code does. But code that uses observable stream-based programming nearly always uses more complicated operators like flatMap, merge, skipWhile, takeUntil, combineLatest, etc. Without spending time learning these operators and how streams work, one is always going to scratch one's head (unless you possess an FP background).
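For instance, a toy sketch of flatMap, one of the operators that takes up-front learning (the data and method names here are made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapDemo {
    // Each order is a list of line items; flatMap flattens the nesting.
    static List<String> distinctItems(List<List<String>> orders) {
        return orders.stream()
                .flatMap(List::stream)   // Stream<List<String>> -> Stream<String>
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> orders = Arrays.asList(
                Arrays.asList("apple", "bread"),
                Arrays.asList("milk"),
                Arrays.asList("bread", "eggs"));
        System.out.println(distinctItems(orders)); // [apple, bread, eggs, milk]
    }
}
```

Unlike filter or map, flatMap changes the shape of the stream, which is exactly the kind of thing that is opaque until you have internalized the operator.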

Even with plain Java 8 streams, the Collector is a powerful paradigm for grouping operations, with a large number of variations, and it took (at least for me) some involved time to learn how to effectively leverage it in day-to-day code.
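For example, a groupingBy with a downstream collector, which is the kind of Collector variation that needs some learning (the Expense type is invented for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingDemo {
    static class Expense {
        final String category;
        final int cents;
        Expense(String category, int cents) { this.category = category; this.cents = cents; }
    }

    // groupingBy with a downstream collector: total cents per category.
    static Map<String, Integer> totalsByCategory(List<Expense> expenses) {
        return expenses.stream().collect(
                Collectors.groupingBy(e -> e.category,
                                      Collectors.summingInt(e -> e.cents)));
    }

    public static void main(String[] args) {
        List<Expense> expenses = Arrays.asList(
                new Expense("grocery", 500),
                new Expense("grocery", 250),
                new Expense("fuel", 4000));
        System.out.println(totalsByCategory(expenses).get("grocery")); // 750
    }
}
```

Swap the downstream collector (counting(), mapping(), averagingInt(), ...) and the same groupingBy skeleton answers a different question, which is where the learning investment pays off.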

> ("map" could be confusing, as this usage is borrowed from functional programming.)

On the other hand, both Python and Perl have 'map' as a built-in function. There has been spillage out of functional programming for a while.

Isn't the point of a stream API (a) to be able to iterate over an infinite set of values and (b) to fiddle with the rate of "async" object generation?

I'm not sure I understand the difference between `compose' and stream APIs.

Haven't touched Java in a decade and that is beautiful. I like Ruby's functional idioms and LINQ though.

I agree in general case, but this example

    List<Integer> transactionsIds =
        transactions.stream()
                    .filter(t -> t.getType() == Transaction.GROCERY)
                    .sorted(comparing(Transaction::getValue).reversed())
                    .map(Transaction::getId)
                    .collect(toList());
seems to be net improvement to me. It reads like SQL, and eliminates many causes of error (wrong indexing variables, off-by-one, copy-paste error in boilerplate).

Yes it requires learning several new concepts, but in the end it pays off.

BTW I wouldn't switch old code to this style, because there never seems to be time for that. But new code written like that is perfectly OK IMHO.

EDIT: I really should've checked the code more carefully - the initial version had no indexing variables. Still, it reads better and has less boilerplate.

While some people may like to read code that looks like SQL, I've found Java 8 features like this are poorly supported by the debugger, so debugging stuff like this tends to require "horse whispering" or rewriting the logic into something that can be stepped through.

Gremlin (part of Apache TinkerPop) is a stream API for traversing graphs.

That reads more like Haskell than like SQL. (Doesn't have any bearing on the rest of your point one way or another.)

I think it reads better with intermediary variables. Granted, sometimes you go to name an intermediary variable that does two things (just like male_siblings = 2 * (X + Y)) and hope you have a brilliant idea for the variable name.

As a functional programmer (Haskell, Erlang) I can honestly say that after years of FP I think differently about how to describe the intent of my code. It's more difficult for me to understand a foreach loop with mutable state than a foldLeft.
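A Java rendering of that contrast, since the thread is Java-centric (foldLeft becomes a sequential reduce; the example is invented):

```java
import java.util.Arrays;
import java.util.List;

public class FoldDemo {
    // Imperative: the accumulator mutates every iteration,
    // so the reader must track its state through the loop.
    static int sumOfSquaresLoop(List<Integer> xs) {
        int acc = 0;
        for (int x : xs) {
            acc += x * x;
        }
        return acc;
    }

    // Fold-style (Java's two-arg reduce, used sequentially): the combining
    // step is one explicit expression and there is no mutable state to follow.
    static int sumOfSquaresFold(List<Integer> xs) {
        return xs.stream().reduce(0, (acc, x) -> acc + x * x);
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3);
        System.out.println(sumOfSquaresFold(xs)); // 14
    }
}
```

Note the reduce version only behaves like foldLeft on a sequential stream; a parallel stream would need a proper identity/combiner pair.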

I think it's a familiarity problem. I felt the same at one time, but once I became more familiar with the operations, I found the stream API much easier to scan.

I was asked at work to explain why I prefer the stream library - other developers have lagged a bit on usage.

The first reason is, the method calls are designed to only do so much, and are named after what they're designed to do. Filter is for filtering. Map is for mapping. I can read the first word of each line to get an idea of what it's doing (filter sort map).

Another bigger reason for me is: you know exactly what code is generating those transaction ids. If you are interested in how the transaction ids are generated, it is damn obvious which code you need to look at. And if you aren't interested in how the transaction ids are generated, it is damn obvious which code you can safely ignore.

In comparison, what does a for loop do? It does all sorts of things, oftentimes several different things at once. As such, I leave for loops for more involved processing, and use the stream API for straightforward transformations.
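A made-up example of the same straightforward transformation written both ways:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PipelineVsLoop {
    // Loop version: filtering, transforming and collecting interleave
    // in one body; the reader has to untangle them.
    static List<String> longNamesUpperLoop(List<String> names) {
        List<String> out = new ArrayList<>();
        for (String n : names) {
            if (n.length() > 3) {
                out.add(n.toUpperCase());
            }
        }
        return out;
    }

    // Stream version: each stage does exactly the one thing it is named for.
    static List<String> longNamesUpperStream(List<String> names) {
        return names.stream()
                .filter(n -> n.length() > 3)  // filter only filters
                .map(String::toUpperCase)     // map only maps
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(longNamesUpperStream(Arrays.asList("bob", "anna", "christopher")));
    }
}
```

Reading the first word of each stream line (filter, map, collect) gives the whole story; the loop requires reading its body.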

As someone who could very easily be on the other side of that code review (and I'm preeeettty sure I'm not in this case?) I feel obliged to at least try to provide a counterpoint :).

So I agree that enormous blobs of unreadable crap are indeed unreadable, and that regardless of how neat and functional your code is, it can still be complete gibberish to most people. That being said, long chains of streams can be broken out quite nicely by using intermediate names:


  Comparator<Transaction> descendingTransactionsByValue = comparing(Transaction::getValue).reversed();
  Stream<Transaction> groceries = transactions.stream().filter(t -> t.getType() == Transaction.GROCERY);
  Stream<Transaction> sortedGroceries = groceries.sorted(descendingTransactionsByValue);
  List<Integer> transactionIds = sortedGroceries.map(Transaction::getId).collect(toList());
vs. the first code block under "Figure 1" in the linked article.

For me at least, it helps keep code from getting too unruly: you only have one 'thing' you can do in a filter, map or sorted call, unlike in a foreach loop where anything can go. So my thesis is this: using streams I can quickly scan over the function/stream names to get the gist of what it's doing, but using foreach loops I need to closely examine each line to have any idea of what's happening :3 (e.g. people love abusing labeled breaks [1] in our codebase, as well as excessively modifying input parameters w/o documentation, so I might be a bit biased against for loops)

[1]: https://stackoverflow.com/questions/14960419/is-using-a-labe...

I find the code in the linked article much more readable than that. You've introduced a huge amount of noise that makes it hard to see what the actual operations being performed are.

I agree to an extent ... this example is pretty contrived, but when you start getting around ten filter/map/groupby operations in, it gets a little difficult to follow what's supposed to be happening. So typically, my first step towards breaking it out into a method is separating out the individual streams like above. As is mentioned in a cousin comment, it also looks a lot nicer with type inferencing, but alas we are stuck with the verbosity of standard Java 8 for now.

This is a place where local variable type inference really comes in handy for cutting down the noise of the type declarations.

  var descendingTransactionsByValue = comparing(Transaction::getValue).reversed();
  var groceries = transactions.stream().filter(t -> t.getType() == Transaction.GROCERY);
  var sortedGroceries = groceries.sorted(descendingTransactionsByValue);
  var transactionids = sortedGroceries.map(Transaction::getId).collect(toList());
is much easier to understand.

Oh definitely, although I'll have to wait for it to be added to the JDK [1] in order to use that in anger :/ (outside of lombok)

[1]: http://openjdk.java.net/jeps/286

Look, you're basically just saying you don't want to learn anything new.

"I just flatly responded that it was far less readable, even if I could condense a half-dozen lines of code down to one."

Express the exact same thing, in 1/6 the code, and somehow the shorter code takes longer to read and understand?

You don't even attempt to express why you find the shorter code harder to understand. Or show code written both ways, to argue why one is better than the other. Or anything at all to support your claim.

Based solely on what you're telling us here, I'm siding with your code reviewer on this one.

> Express the exact same thing, in 1/6 the code, and somehow the shorter code takes longer to read and understand?

I think people need to differentiate between code review and code scanning. I look at code completely differently when someone is pointing to it than when I'm scanning through a file. A ternary operator isn't tough to read when you're already looking at it. However, scanning through a file I'm going to have a lot more difficulty spotting a `?` than I am the indentation of an if/else block. I'm more than content to take 5 lines instead of 1 to spot it 2 months later when I don't know what I'm looking at.

As long as the one line is less than six times harder to read than any one of the previous lines, changing to the new approach seems like a win?

I suppose to some extent this is a matter of taste, but I've usually found the streams api to be clearer and easier to understand, although perhaps there's a bit of a learning curve. I think the win is especially large in cases where you replace an anonymous inner class with a lambda, imho. At my previous job, we had all gotten quite familiar with Apache's CollectionUtils, but java8 almost obviates the need for that entirely.

hopefully they flatly responded that you're wrong.

more generally, if you can condense 6 lines to 1, that change will almost always improve readability. ofc, you can create golfed examples where this is not the case, but those are outliers. i'd say that if you are arguing in favor of code 6x as long as an equivalent implementation, the burden of proof is on you. and being used to a more verbose syntax because you've used it for years is not a valid defense.

I cannot even fathom why you think that's less readable than someone's mangled foreach loop.

Yay, "fluent APIs" (the Builder design pattern w/ method chaining). Been there, done that.

Actually, I'm a repeat offender. Created a jooq like wrapper for SQL. Ditto for HL7. Ditto for UIs. In fact, I was Builder crazy for a while.

I got over it.

Unreadable to you could mean readable to others.

Sometimes you have to learn different paradigms and ways to express a computer program.

Also, there's an unstated reality here that the person who sets the standards is a senior person (in the company, in his career, in the industry) and may just be letting personal judgment and past success - good in its time, but not necessarily timeless - override his judgment about a legitimately new and better way of doing something. This is tech after all: progress is valuable, and objectivity is hard.

Maybe I've been shaped by my history... but I've been writing core code for other devs to extend for 5-8 years now. And hell, the smart code I've written is necessary, sadly, but impossible to maintain. I've written non-blocking, lock-free holy-balls pieces of code, but there are very few people I know who could maintain that kind of code.

The best code I've written is when a junior can hit me up and go "Yo, I've tried to implement this feature here. I've picked up on those patterns over there to get started. I'm not sure about those two parts, can we talk about those a little? I think the rest of the code I've done is good, let's look at that though. That part over there just looks weird, I'm not certain it'll work right."

Overall, I think smart code is not a good thing to write. 90% of your code, or more, should just obviously work because it follows "the pattern".

This isn't new, it has always been thus.

Indeed, and every piece of advice about coding style is sometimes wrong.

"How can a new developer just memorize all that stuff? “Code Complete“, the greatest exponent in this matter, is 960 pages long!"

First... do not memorize, but internalize: understand why they work, and when to apply which one. Use them to solve the problem of your code being read in 6 months by a serial killer who knows your address.

Second... 960 pages. If you really want to advance the craft, if you really want to become a better developer, then you don't measure by the number of pages (<sarcasm>what a sacrifice, I have to read</sarcasm>); you measure by the amount of gold advice in the book. 960 pages is a lot of gold.

Third... if you read the whole blog post and understood the value of following the Cliff's Notes to the Cliff's Notes that this post is, then you should be looking forward to reading the 960 pages.

960 pages is light reading IMO :).

One can hardly find a better time investment than reading a good book on a useful subject. Especially today, when quite often you can easily get access to the best knowledge mankind has on a topic. Books are awesome!

Was about to come here to say this. This stuff was all written down decades ago.

As books go, Code Complete isn't intended to be memorized but to be understood. I would imagine a coder already has an internal model of how to write code and uses a book like Code Complete to tweak and improve that model.

Also, I read something like Code Complete for the same reason I've read the parent blog. Even though I feel like I know the basic points here, writing good code inevitably is a trade-off and so one more idea of how to make the trade-off is useful.

> Second... 960 pages. If you really want to advance the craft, if you really want to become a better developer, then you don't measure by the number of pages (<sarcasm>what a sacrifice, I have to read</sarcasm>), you measure by the amount of gold advice on the book. 960 pages is a lot of gold.

I think that comment on the page length was just on the volume of stuff that one would have to memorize if they were to memorize (rather than internalize) the knowledge.

His second example to "modularize" a branch condition is not functionally equivalent in _most_ in-use programming languages:

    valid_user = loggedIn() && hasRole(ROLE_ADMIN)
    valid_data = data != null && validate(data)

    if (valid_user && valid_data) …
Is not equivalent to:

    if (loggedIn() && hasRole(ROLE_ADMIN) &&
        data != null && validate(data)) …
His version will always execute `validate(…)` if `data` is not null regardless of whether the user is logged in or has the appropriate role. Not knowing the cost of `validate(…)`, this could be an expensive operation that could be avoided with short-circuiting. It also seems somewhat silly (and I know it's just a contrived example), that a validation function would not also perform the `null` check and leave that up to the caller.
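A tiny sketch makes the difference observable. The loggedIn/hasRole/validate bodies below are invented stand-ins; a counter records whether validate() actually runs.

```java
public class ShortCircuitDemo {
    static int validateCalls = 0;

    // Stand-ins for the article's functions; loggedIn() fails on purpose.
    static boolean loggedIn() { return false; }
    static boolean hasRole(String role) { return true; }
    static boolean validate(Object data) { validateCalls++; return true; }

    // Refactored style: both flags are computed up front,
    // so validate() runs even though the user check already failed.
    static int callsWithIntermediateFlags() {
        validateCalls = 0;
        Object data = new Object();
        boolean validUser = loggedIn() && hasRole("ADMIN");
        boolean validData = data != null && validate(data);
        if (validUser && validData) { /* do work */ }
        return validateCalls;
    }

    // Original style: && short-circuits, so validate() never runs.
    static int callsWithShortCircuit() {
        validateCalls = 0;
        Object data = new Object();
        if (loggedIn() && hasRole("ADMIN") && data != null && validate(data)) { /* do work */ }
        return validateCalls;
    }

    public static void main(String[] args) {
        System.out.println(callsWithIntermediateFlags()); // 1
        System.out.println(callsWithShortCircuit());      // 0
    }
}
```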

If validate() doesn't have a side effect, then the short-circuiting doesn't matter. The micro-optimization of skipping the validation for performance reasons is premature optimization. If the performance optimization is necessary, it should be stated more explicitly in the code than just being hidden behind a && short-circuit.

I've always thought of this kind of short-circuiting as an implementation detail of the runtime that should not be relied on, a hack from old-school C that refuses to go away. I cringe when I see code that relies on it. It is not semantically obvious when a developer intends to use the short-circuiting trick versus when it's just inadvertently there. Of course, part of the reason it lives on is that our awful popular languages require null checks everywhere but don't provide any syntactic help for them.

Allowing the extra constraint "doesn't have a side effect" is more dangerous than clarity in this case because it increases what a dev needs to know to modify the code, and allows for a bug to easily be introduced if the validate code is modified to have side-effects.

And though a test might be added to check for this, "Was this function run" is easier to determine than "Does this function have side-effects".

That's why all your functions should not have a side effect if at all possible. And if they do it should be stated in the function's name. Maybe something like validateAndLogIt().

And then what happens when stuff is rewritten such that, by the time you get there, there's always a user logged in and non-null data? Presumably you do away with the temporary variables and just write:

  has_role(ROLE_ADMIN) && validate(data)
If validate(data) needed to be called regardless of has_role, you now have a problem.

I guess the alternative would be:

    if (isValidUser() && isValidData(data)) …
Which would avoid the potentially expensive `validate()` without putting it all on the one line.

Not really, because the code comment above mentioned that one of the primary purposes of this technique is to reduce function overhead.

Perhaps an alternative would be

  valid_user = loggedIn() && hasRole(ROLE_ADMIN)
  if (valid_user) {
    valid_data = data != null && validate(data)
    if (valid_data) {

For whatever reason, I'd prefer comments to this version. Note: I actually agree with the OP about pulling the logic into named conditionals whenever possible, but in the case you do want the short-circuiting behavior I would not bother with the variables at that point.

  if (loggedIn() && hasRole(ROLE_ADMIN)) {
    // User has permission to do this
    if (data != null && validate(data)) {
      // Submitted data is valid

I prefer code over comments

  userHasPermission = (loggedIn() && hasRole(ROLE_ADMIN))
  if (userHasPermission) {
    dataIsValid = (data != null && validate(data))
    if (dataIsValid) {
That's shorter, introduces terms in reading order (readers do not have to wonder what if (loggedIn() && hasRole(ROLE_ADMIN)) means before encountering userHasPermission; yes, you can write the comment before the if statement, but then it tends to become longer: "check whether the user has permission to do this"), and is less likely to become inconsistent when the code evolves.

I see such comments as symptoms of time pressure or a somewhat sloppy, but caring, programmer.

That's an interesting trick, but it does create a whole bunch of throwaway booleans. Still, I can see that if the situation gets really complex, breaking it down like this can be a benefit. Not so much in the current example, though, where it actually increases cognitive load: you have to read more lines to figure out what is really going on rather than trusting what the boolean name indicates. For instance, userHasPermission does not fully cover the condition, and if validate returned false on being sent null for that data element, you could get rid of the flag altogether.

I think part of the reason for doing it the way they do in the article is to reduce nesting. I saw the problem with missing the short circuit as well though which is why I like the suggestion of using

  if (isValidUser() && isValidData(data))

And checking for data being null in the isValidData method. There is a little overhead in the call and return if data is null, but the clarity provided seems like a win in these kinds of cases.
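A sketch of that shape (the helper bodies are invented; the point is that the null check lives inside the helper and the caller's && still short-circuits):

```java
public class ValidationHelpers {
    // Invented stand-ins for the session state the thread assumes.
    static boolean loggedIn = true;
    static boolean hasAdminRole = true;

    static boolean isValidUser() {
        return loggedIn && hasAdminRole;
    }

    // The null check moves inside the helper, so callers can't forget it.
    static boolean isValidData(String data) {
        return data != null && !data.isEmpty();
    }

    static boolean canProceed(String data) {
        // && still short-circuits: isValidData is skipped for an invalid user.
        return isValidUser() && isValidData(data);
    }

    public static void main(String[] args) {
        System.out.println(canProceed("payload")); // true
        System.out.println(canProceed(null));      // false
    }
}
```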

And why not just make it clear that valid_data depends on valid_user?

    valid_user = loggedIn() && hasRole(ROLE_ADMIN)
    valid_data = valid_user && data != null && validate(data)

    if (valid_user && valid_data) …
I think this makes it clear that valid_data has a dependency on valid_user that was being hidden in the previous version, which we're now making clear. It will look a bit weird, but I think that's a positive because it draws attention to the fact that the short-circuiting is required and that double-look coupled with a short comment will make everything much more readable. The previous version does not convey as much meaning, in my opinion.

    if (loggedIn() && hasRole(ROLE_ADMIN) &&
        data != null && validate(data)) …

In C, valid_user and valid_data might just be macros and then the short circuiting would work just fine.

If you want to be really concise, you could also remove the precondition that loggedIn returns true before calling hasRole, and have hasRole simply return false if the user isn't logged in.

But it seemed to me that the important bit was two dependent conditionals, not this exact example.

I really like the advice from "Perl Best Practices" to code in paragraphs. Sometimes, a large function cannot be broken up usefully, because a lot of state needs to be shared between the different parts, or because the parts don't have a meaning outside of the very specific algorithm.

In that case, code in paragraphs: Split the function body into multiple steps, put a blank line between these and, most importantly, add a comment at the start of the paragraph that summarizes its purpose.

Now when someone else finds your function, they can just gloss over the paragraph headings to get an idea of the function's overall structure, then drill down into the parts that are relevant for them.
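A small invented example of the paragraph style (the algorithm itself doesn't matter; the headings do):

```java
import java.util.Arrays;
import java.util.List;

public class ParagraphStyle {
    // One longish function in three "paragraphs"; skim the headings first.
    static double medianOfPositives(List<Double> readings) {
        // Drop non-positive values; they are sentinel readings.
        double[] positives = readings.stream()
                .filter(v -> v > 0)
                .mapToDouble(Double::doubleValue)
                .toArray();

        // Sort so the middle elements become the median candidates.
        Arrays.sort(positives);

        // Pick the middle element, averaging the two middles for even counts.
        int n = positives.length;
        return n % 2 == 1
                ? positives[n / 2]
                : (positives[n / 2 - 1] + positives[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(medianOfPositives(Arrays.asList(3.0, -1.0, 1.0, 2.0))); // 2.0
    }
}
```

The three comment headings read as a table of contents; a maintainer only drills into the paragraph they care about.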

Situations like this are exactly where nested functions can be helpful.

I've always thought that it was a shame that C didn't have them.

Sometimes I almost wish that Algol-flavored languages like Pascal and Modula-2 had won out for systems programming, instead of C and the languages it inspired.

Actually, GNU C supports nested functions, and a new round of standardization is just starting up, so maybe there's a chance?

In languages where braces define a scope even if there's no if, for, etc. keyword around, you can get a lightweight version of that just by sticking braces around your "paragraphs", as needed. They aren't the same as nested functions, in particular because you can't invoke a naked block multiple times, but if you've got a long function that hasn't got any useful break points in it, but you just want to chunk things, curly braces may help you isolate things nicely.

I tend to consider this a "last resort" over using actual functions, but there are certain classes of functions where this is helpful. The two biggest I know of are functions that represent a state machine, and functions that are the big top-level "plumb all the libraries together" where breaking that up into separate functions significantly complicates code due to all the intricate routing of values you have to do.
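Java happens to allow this too, so a minimal sketch (contents invented) of chunking with naked blocks looks like:

```java
public class ScopeBlocks {
    static int process() {
        int result;

        // "Paragraph" 1: the braces confine a and b to this chunk.
        {
            int a = 2;
            int b = 3;
            result = a * b;
        }

        // "Paragraph" 2: a and b are out of scope here, so the reader
        // knows they cannot leak into this step.
        {
            int offset = 4;
            result += offset;
        }

        return result;
    }

    public static void main(String[] args) {
        System.out.println(process()); // 10
    }
}
```

The only values that survive a block are the ones deliberately declared outside it (here, result), which documents the data flow between "paragraphs".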

> They aren't the same as nested functions, in particular because you can't invoke a naked block multiple times,

Sure you can, just use `goto`! It's fantastic for code reuse. ;p

On a more serious note, gcc and clang both support block functions now. Pretty sure they aren't full closures but they can be handy in these situations.

The thing is, naked scope blocks are missing the main reason I'd use a nested function: the ability to be named.

I do occasionally use scope blocks when I want to constrain the scope of one or more local variables, and there isn't an otherwise appropriate scope already created by a flow control construct.

I also use naked scope blocks sometimes. To me, their main purpose is to make temporary variables go out of scope, to clarify which variables are still intentionally live beyond the block.

> Situations like this are exactly where nested functions can be helpful.

Indeed. When I'm writing Haskell, I'm using a lot of nested functions in `where` blocks, sometimes cascaded down.

Shared state between different parts of the code sounds an awful lot like something that could be a class.

Yes, a Method Object pattern http://c2.com/cgi/wiki?MethodObject

Neat! Now I have to refer to an entirely different file (or a different section of this one, which is almost as bad) in order to figure out what this one function is doing.

maybe both are related.

I never understood people who create a function only to set a flag or a set of flags in another function and pass everything else through.

Also, sometimes people write code that looks like this:

  if a:
      getting_started(d,e)
  if a and c:
      maybe_prepare(f,g)
  common(d,e,f,g)

Instead of

  getting_started(a,b,c,d,e)
  maybe_prepare(a,b,c,d,e)
  common(a,b,c,d,e)

Usually I leave branching to the leaf code. Maybe it's just me.

I'll second Sindisil's comment about nested functions.

I still often do the paragraph approach, but the thing about that is you rely on the comments to stay up to date to describe that paragraph.

I prefer to instead move that paragraph into a small nested function and name it to reflect what would be in that comment. I feel function names are easier to keep up to date than comments (ie, programmers don't just read past the function names like they do comments).

I completely disagree; every method can be split into private methods. That way you don't need awful and unhelpful comments in the middle, because you can understand what each piece does simply from the method name.
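A minimal sketch of that, with made-up names: the public method reads as a table of contents, and each private method name replaces what would otherwise be a mid-function comment.

```java
public class ReportFormatter {
    // The public method reads like a table of contents; each private
    // method's name replaces a mid-function comment.
    public static String formatLine(String rawName, int cents) {
        String name = normalizeName(rawName);
        String amount = formatCents(cents);
        return name + ": " + amount;
    }

    private static String normalizeName(String raw) {
        return raw.trim().toUpperCase();
    }

    private static String formatCents(int cents) {
        return (cents / 100) + "." + String.format("%02d", cents % 100);
    }

    public static void main(String[] args) {
        System.out.println(formatLine("  bob ", 1205)); // BOB: 12.05
    }
}
```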

I've gone back and forth on this one over the years. My current advice would be that if you can find something that is naturally a sub-function, factor it out as one. Keep it private initially, but do not do this if that private function makes absolutely no sense on its own and your public code isn't calling it from more than one site.

If you factor things out into sub-functions that have no semantic meaning on their own then all you are doing is making the code harder to understand. You're also making it harder to maintain because of all the extra state that will need to be passed around between the sub-functions, which may change later.

For large complex functions that cannot be broken up sensibly, the paragraph splitting method is exactly what I use. Such large functions are a code smell and you should think carefully about whether it really does need to be so large, but there are indeed cases where it is the best option. Nobody should get too attached to dogmatic rules like "no function may exceed 25 lines".

The style that I prefer is slightly different to the GP though. I usually put about a paragraph of comments at the top explaining why the function is so long, giving an overview of the algorithm and other information that is inappropriate for a JavaDoc-style comment (since those are for the function consumer rather than a maintainer). I then give each "paragraph" of code a section heading and sometimes number these (especially if I have written out the algorithm in line-numbered pseudocode in the explanatory comment).
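A minimal sketch of that style (Python, contents invented for illustration):

```python
def summarize(raw_values):
    """This function stays long on purpose: the three numbered sections
    below mirror the pseudocode in the maintainer notes, and splitting
    them out would mean threading all the shared state through helpers.
    (Illustrative example only.)
    """
    # -- 1. Normalize input ------------------------------------------
    values = [int(x) for x in raw_values]

    # -- 2. Accumulate running totals --------------------------------
    totals, running = [], 0
    for v in values:
        running += v
        totals.append(running)

    # -- 3. Produce the summary --------------------------------------
    return {"count": len(values), "sum": running, "totals": totals}
```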

Totally agree. While I respect and try to adhere to the organization and naming philosophies of the "Clean Code" approach, I've also been bitten later when going over something that I hadn't worked on in months. I concluded that my brain has a state as well as the program, and comments are for reloading my brain state. Only when my brain is in the correct state do my names make complete sense. And still, unless I had ample time to make sure everything is perfectly coherent (how many times does that happen?) I'm still sometimes left wondering.

A comment that clearly explains "why," and sometimes even "how," is extremely helpful no matter how Hemingway I think I am in the moment. And I also find chasing little factored-out bits of functions all over the place tedious, like reading a paragraph of prose where a lot of the true meaning is delegated to footnotes. You're either skipping it, or constantly stopping and looking for the damn footnote. In code, sometimes that means grepping the project to find the damn thing. If it's small and used once, consider leaving it in.

Every method can be split sensibly. Using the functional paradigm it becomes natural to understand it because you usually work in the opposite way using composition of functions rather than dictating what happens in an imperative way. And large, complex functions do increase the cognitive load. I personally abhor regions or sections because most of the time they can go in a separate method and they just break the code-flow with something completely unrelated.

Methods can be split into private methods. But what that sometimes means is, when you're trying to understand how a piece of information flows through a method to fix a bug, instead of it being all self-contained in the method, instead you have to jump around to 10 places in a file to see where the information goes. That severely increases the cognitive overhead.

There is also a cognitive overhead when trying to find local variables defined in huge functions, at least when they are split into smaller private (static) methods (with variables passed as arguments) it's easier to track.

Both in the parent method and in the encapsulated child.

> you can understand what it does simply from the method name

That can be tough sometimes. How do you handle the case where you've created a function just to package some block of code that would otherwise be repeated 40 times? You end up with function names like

or even worse

Or you have the situation where every time you do action A, it usually needs to be followed with action B. Because you want functions to do one thing only you have two functions action_A() and action_B(). But since you are always going to do them in pairs, you end up with a group action_A_and_B() that just calls the two functions sequentially.
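That pairing case might look like this (hypothetical Python, names invented):

```python
# Hypothetical actions that are almost always performed as a pair.
calls = []

def validate_order(order_id):
    calls.append(f"validated {order_id}")

def ship_order(order_id):
    calls.append(f"shipped {order_id}")

def validate_and_ship_order(order_id):
    # Grouping the pair here means no call site can forget the second step.
    validate_order(order_id)
    ship_order(order_id)
```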

I think I've settled on helper functions that are static or in anonymous namespaces (I work in C++) whenever possible.

The first case is a self-explanatory one-liner; it doesn't need an external function.

    var evenNumbers = Enumerable.Range(0, 40).Where(i => i % 2 == 0);
Private methods must group non-elementary steps. The second case is a perfect example that explains why it is so much better to use a private method because now you have a method that groups both sub-methods and you don't need to call always two different methods in a bunch of places. In this way you decreased code duplication increasing clarity.

    private void ProcessLogs()
    {
        var events = _logProvider.Read();
        events.Where(e => e.IsError).ToList().ForEach(SendErrorAlert);
        events.Where(e => e.IsWarning).ToList().ForEach(SendWarningAlert);
    }
And the method name seems quite self-explanatory to me.

If you need to separate one method into paragraphs, and add comments that explain what each paragraph does, then that code is SCREAMING for a refactoring: delete all the useless comments and extract that mess into properly structured classes/methods.

I see this all the time in our PRs:

   UtilHelper.processItems(List<object> items) : List<object>

Here's John Carmack's take on the issue:


TL;DR: He's in favor of inlining functions.

Actually, if you are inlining, he says: "you should be made constantly aware of the full horror of what you are doing."

I wouldn't say he's in favor of it, he actually appears to be advocating for a pure FP approach. But if you have to, inlining is OK, with the quoted caveat.

That's literally the first paragraph in the article, keep reading. Also, I think you misunderstood it.

> if you are going to make a lot of state changes, having them all happen inline does have advantages; you should be made constantly aware of the full horror of what you are doing.

The horror he refers to is not inlining, it's dealing with stateful logic. If you are doing state, it's better to be aware of the horror by making it explicit through inlining, rather than hiding the state changes behind functions that pass state in and out.

I'm pretty sure he's advocating for inlining one-off functions.

I agree and disagree with you :). Yes, I agree, he's advocating in-lining one-off functions, and I can see both points of view from that.

Having said that, how often do you have a large method where it isn't modifying a lot of state? I guess it depends on the context like a lot of things, but it seems like that's what is happening in the code I deal with (large methods changing state as well as having a lot of dependencies).

But the "article" isn't an article exactly, it's a commentary of an email that's 7 years old. The original email seems to completely advocate in-lining; the "article" part backtracks somewhat, and that's what my comment was around. I'm not always the clearest in my writing though.

sorry, but given this pre-condition (from gp):

> ... a large function cannot be broken up usefully, because a lot of state needs to be shared between the different parts ...

there is no way to break that up with multiple smaller functions without stowing away the state somewhere.

sometimes reading a large function is not half as bad as reading 10 different ones with each altering the shared state.

state can be passed through as arguments...

When coding in C, I often package the state that's shared between a high-level function and its local subroutines in a local "struct context" that's instantiated in the high-level function and then passed by address as the first argument to the subroutines. Makes it easy to see what the shared state is, and adding/changing the shared state doesn't require changing all the formal and actual argument lists.
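A rough Python analogue of that C pattern (names invented for illustration): one context object holds the shared state, so adding a field never touches the argument lists.

```python
from dataclasses import dataclass, field

@dataclass
class ParseContext:
    # All state shared between the high-level function and its helpers
    # lives here; adding a field doesn't change any signatures.
    source: str
    position: int = 0
    tokens: list = field(default_factory=list)

def skip_spaces(ctx: ParseContext):
    while ctx.position < len(ctx.source) and ctx.source[ctx.position] == " ":
        ctx.position += 1

def read_word(ctx: ParseContext):
    start = ctx.position
    while ctx.position < len(ctx.source) and ctx.source[ctx.position] != " ":
        ctx.position += 1
    ctx.tokens.append(ctx.source[start:ctx.position])

def tokenize(source: str) -> list:
    ctx = ParseContext(source)
    while ctx.position < len(ctx.source):
        skip_spaces(ctx)
        if ctx.position < len(ctx.source):
            read_word(ctx)
    return ctx.tokens
```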

Me too. OOP is actually about having several such states. My go-to pattern is to root states in an application tree.

Sure, but then there are more questions :) e.g. how many parameters? 3, 4...? What if they are of the same type? Would you change your numbers then? Users can get the order wrong, etc. Another thing: if you pass too many parameters, isn't that a hint that something is amiss?

edit-1 : fixed typo

Remember this is in the case of splitting out smaller functions to improve readability.

Variable naming will only add to that, as you can rename variables in the extracted methods for their local purpose.

If you have too many arguments you can package them up in a object/map/tuple of your choosing (depending on language).
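For example, in Python a NamedTuple works well for this (hypothetical names):

```python
from typing import NamedTuple

# Instead of connect(host, port, timeout, retries), four easy-to-swap
# arguments, bundle them into one named object.
class ConnectionConfig(NamedTuple):
    host: str
    port: int
    timeout: float = 5.0
    retries: int = 3

def describe_connection(cfg: ConnectionConfig) -> str:
    return f"{cfg.host}:{cfg.port} (timeout={cfg.timeout}s, retries={cfg.retries})"
```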

(Reverting to local state would in my opinion increase confusion and decrease readability.)

State can be stored in instance variables as well, and should be if many small functions share them, it's what objects are for.

> State can be stored in instance variables as well...

but then we are back to the beginning of the thread...

No, we aren't, since objects are the proper way in this case to reduce cognitive load. The notion that all the code has to be in one method to be understood is symptomatic of programmers who don't know how to factor correctly, which leads to the necessity of reading a method's implementation to understand what it does; i.e., they're bad programmers blaming their shortcomings on well-factored code rather than learning how to read and write well-factored code correctly. Breaking down large methods into numerous small ones makes the code easier to read and understand, not harder, unless you do it entirely wrong.

An easy way of making code into "paragraphs" with comments is to just move it into a function. So in the end, this large function you're talking about is just calling the other ones, creating a paragraph, while the functions are just "words". Makes it easy to test, and no need for comments :)

This creates the risk of making what you are doing explicit at the cost of obscuring why you are doing something.

I frequently catch tests in code reviews that don't make it clear why something is being tested and what the overall expected result is beyond the effects being tested.

If the methods you are creating are private and for the purpose of segmenting a block of code, you really shouldn't be testing it directly, as it almost certainly won't be part of your public API.

This is one of my favorite things.

Get a decent high level architecture, good, consistent database design and you don't write anywhere near as much application code. Start hacking about using one field for two purposes or having "special cases" and everything starts to get messy. These special one off cases will involve adding in more code at the application level increasing overall complexity. Repeat enough times and you will code a big ball of mud.

Instead people argue about number of characters per line or prefixing variable names with something and other such trivialities. (These things do help readability, but overall I think they are quite minor in comparison to the database design / overall architecture - assuming you are writing a database backed application).

>Start hacking about using one field for two purposes or having "special cases" and everything starts to get messy.

That advice is not as easy to put into practice as you make it sound. For instance, using one field for two purposes is often done to avoid special cases.

I think the eternal problem of software development is that both being more abstract and being more specific comes with a cost, and the middle ground is always shifting as requirements keep changing.

There's relevance in talking about both micro- and macroscopic guidelines. Both are important.

Very rarely does someone "read" an entire code base "with one look" and be able to deduce issues. You do, at some point, have to get into the weeds. Managing that experience is what articles like these are about.

Yes, there is value in both, but I only ever see people talking about the former.

More people talk about code without knowing actual code than know what they're talking about when they say "source code". Software is not only source code; it's many things: business rules, GUI, database, network, programming language, etc. So it seems logical that most of the talk is macroscopic relative to a coder's own microscopic view. Hopefully, Hacker News is here to help ;)

I always read both sides of the story: the part at the top (database design, overall architecture), basically the abstract principles, and the bottom part, the actual precise drawing that executes as a program. Actually reviewing unknown code in an unknown file in an unknown function is very rare. You always come from the design/architecture point of view, diving into more details. Translating the big picture into code has its own practice, just like the high-level language of your code.

"Actually reviewing only unknown code from in an unknown file in an unknown function is very rare."

I am not sure if I am understanding you correctly, but every programming job where you weren't the original author involves looking at unknown code and trying to work out what the hell it is doing.

Stopped reading at "Place models, views and controllers in their own folders". There's no worse way to organize your code than classifying by behavior type: "here are all the DAOs", "here is all the business logic", "here are all the controllers". You add a feature as small as a resource CRUD and scatter its pieces across the whole codebase.


Honest question from someone with little real-world experience outside .NET: this is MVC's convention, to "Place models, views and controllers in their own folders", and it's what I'm used to working with. Can you point me to resources outlining other methods (responding with google "XYZ" would be fine too)?

I believe it's called vertical packaging.

Package "employee" contains: EmployeeDto, EmployeeService, EmployeeController.
Package "order" contains: OrderDto, OrderService, OrderController.

And so on. This way, logic that is closely related is kept close by.
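Illustrative layout (hypothetical names), horizontal by type vs. vertical by feature:

```
by-type/                      by-feature/
    controllers/                  employee/
        EmployeeController            EmployeeController
        OrderController               EmployeeService
    services/                         EmployeeDto
        EmployeeService           order/
        OrderService                  OrderController
    dtos/                             OrderService
        EmployeeDto                   OrderDto
        OrderDto
```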

Thank you, thank you. THIS, exactly this.

Django's recommended project structure is one that immediately comes to mind. A project is broken down into "applications" each which have their own models, views, and controllers (among other things like forms, tests, etc.). Each "application" is an area of responsibility within the project, like payment or user management.

As it's python, the 'views' module could technically be a folder with multiple files inside it, but they would all be grouped under their respective app.

There are several reasons to create a folder. The question seems simplistic to me: if the folder describes a `model', I call it `model'. If it describes a feature like "cart", I call it... `cart'...

Agree. A few years ago I read this http://olivergierke.de/2013/01/whoops-where-did-my-architect... post which made a lot of sense to me. Also this: http://www.slideshare.net/olivergierke/whoops-where-did-my-a...

The guidelines for a framework I use is to put views and models in separate folders. Which means code for any model is spread out between at least two folders. Finding code is annoying.

To avoid that Django does split by "app" or feature first.

Also, be sure to repeat adjectives and other name parts as much as possible. Ideally, the same names could be used in the directory/package, file/class, function/method and variable names. Never let the reader forget that this is smurfView or batController or whatever.


He doesn't advocate MVC. He says you should use whatever paradigm your team decides on, and if it happens to be MVC, then you should follow that paradigm and not scatter the M's, the V's and the C's all around.

That's not what the OP is getting at. He's referring to the common file organization of projects where, for example, in MVC, the models are located under a models directory, the controllers under a controllers directory and the views under a views directory.

If I want to understand what the code does, looking over the code and seeing just a bunch of models or controllers is next to useless. Instead, the code should be organized semantically by the domains of the application. I should be able to look over the various directories and files and have a high level understanding of what the application does, not that it's just another MVC application.

This semantic organization also promotes encapsulation since only things related to each other are near each other, instead of scattered throughout many directories organized by arbitrary architectural concepts.

I've noticed recently that especially in online discussions, the term "cognitive load" is used as a catch-all excuse to rag on code that someone doesn't like. It appears to be a thought-terminating cliché.

There's definitely room to talk about objective metrics for code simplicity, which are ultimately what many of these "cognitive load" arguments are about. But cognitive load seems to misrepresent the problem; I think it's hard to prove/justify/qualify without some scientific evidence over a large population sample.

With that said, the article presented fine tips, but they seem to be stock software engineering tips for readable code.

In cognitive psychology, cognitive load refers to the total amount of mental effort being used in the working memory

seems like a correct usage to me.

Anytime an ATM displays weird confirmation buttons like "Sure" instead of "Yes" or bloated confirmation text instead of "transaction completed" this increases cognitive load. I agree that it is debatable what kind of code exactly causes the least strain, but at least the term doesn't seem to be especially scientific.

It's not incorrect necessarily, just unqualified. Cognitive load is hard to meaningfully substantiate; it is person and experience dependent.

Actually, not at all. It's directly measurable http://www.ncbi.nlm.nih.gov/pubmed/17833905

"The pupil response not only indicates mental activity in itself but shows that mental activity is closely correlated with problem difficulty, and that the size of the pupil increases with the difficulty of the problem"

My problem is precisely that these scientific methods are not used when "cognitive load" is being used as rationale. Wouldn't you agree that it would be a mistake for me to claim that cognitive load is an issue with something if e.g. I have not shown that pupils dilate (or some other reasonable experiment indicating correlation)? Unfortunately, doing these experiments is difficult, which justifies "hard to substantiate."

Yes I do think claims should be justified by experiments. Perhaps I'm reading the wrong fora but it seems to me that serious HCI-studies on language design should be happening a lot more than it does.

I agree with you. People use "cognitive load" to mean "people should be thinking the way I think, because to me that is very easy and straightforward".

The load gets bigger not because the code is more complex, but because we reason about problems differently, we build different mental models of the solution, and if these models are too different, then the cognitive load gets heavy.

The other side of the coin is that books like Code Complete go a long way into making those mental models more clearly stated.

Solving the underlying problem (the big variety of mental models) means standardizing how we think about solving programming problems. The GoF book heads in that direction... then again, patterns end up being used for the sake of implementing a pattern instead of solving a specific problem. I think this is one of the things that keeps software development in the craft area, as opposed to engineering. But that would be a different rant.


    How to reduce the cognitive load of your code (chrismm.com)
    304 points by ingve 90 days ago | 232 comments

People sometimes ask why it's necessary to point out reposts. I think it's helpful because you get an extended, often alternate commentary, which can make for interesting reading and comparisons.

I'd like to add one: let your tools do the work for you. It may seem like a pain to learn the tooling behind what you do, but once you internalize it, it becomes a superpower.

An example is that I use Clojure Refactor Mode (with CIDER) for emacs. A trick (and treat) that a lot of Clojure code uses is the arrow macros: -> and ->>. Clojure Refactor Mode has thread-first, thread-first-all, thread-last, thread-last-all and unwind. Since I've committed those to my long-term memory, I can just call thread-last-all on something like:

    (reduce * (repeat 4 (count (str (* 100 2)))))
and get:

    (->> 2
         (* 100)
         str
         count
         (repeat 4)
         (reduce *))
This is so huge, because many times changing the levels of threading makes reasoning about the code so much easier.

I agree with the tooling part.

I tend to prefer the functional version to threading because (1) honestly, like fluent interfaces, it seems overused, (2) function application is damn easy to read, and (3) as soon as you have that many nested operations, the code can and should be refactored into meaningful auxiliary functions. The first line reads as:

Multiply all ... 4 copies of ... the length of ... the string "200". So, basically, the length of that string multiplied by itself 4 times? (exponent).

The other form is more like step-by-step instructions, which is nice. However results are implicitly being passed at the first or last argument (I don't always remember which), and most everyday functions don't fit in the first/last category.
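For comparison, the same computation with intermediate variables, sketched here in Python, gives each step a name:

```python
from math import prod

# The nested one-liner: the product of four copies of the digit count of 200.
nested = prod([len(str(100 * 2))] * 4)

# The same computation, one named step at a time:
doubled = 100 * 2                # 200
digit_count = len(str(doubled))  # 3
copies = [digit_count] * 4       # [3, 3, 3, 3]
product = prod(copies)           # 3 ** 4 == 81
```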

One of the nice things about the Clojure standard library is that most everyday functions actually do fit the first/last category (by design). Functions operating on sequences (map, filter, reduce, etc) take the sequence as the last argument and are suited for use with the ->> macro, while functions operating on data structures in a non-sequence context typically take the data structure first (assoc, conj, update) and are good for ->. So you get either:

    (->> (range 10)
         (map inc)
         (filter even?)
         (take 2)) ;=> '(2 4)

    (-> {:body {:some {:json :data}}}
        (assoc-in [:body :some :more-json] :more-data)
        (assoc :status 200)
        (update :body json/generate-string))
    ;;=> {:body "{\"some\":{\"json\":\"data\",\"more-json\":\"more-data\"}}", :status 200}
It doesn't work all the time, obviously, and it can be easy to get carried away with 15 threaded map/filter/reduce calls that should be factored into separate functions, but most of the time I find it to be a nice idiom that substantially improves readability.

That's like the function composition operator [1] in Haskell, right? Very neat :D I wonder if there's an equivalent macro in Scala ...

[1]: http://lambda.jstolarek.com/2012/03/function-composition-and...

It's more like the pipe operator in ocaml (http://blog.shaynefletcher.org/2013/12/pipelining-with-opera...). The lisp version has the extra advantage that you don't have to repeat it between all the intermediate functions. ((->> 2 (* 100) str count) vs 2 |> (* 100) |> str |> count).

I understand how a lisp implementation would work here to require only the single operator (I'm assuming a fairly simple macro).

Would it not be possible to do something similar in another functional language to take a <pipe function> and apply it sequentially to a list of function calls?

There are no semantic problems with this, but typing will get in the way: you can express it fairly easily if all the functions have the same type (such as Int -> Int): in fact it's just 'foldr ($)'. But it is difficult to type a list of functions such that each member's return value has the same type as the next one's parameter (symbolically, [a_{n-1} -> a_n, ..., a_1 -> a_2, a_0 -> a_1]). It's easier to refer to the composition of such functions, which is why you would see it as 'h . g . f'.

|> from scalaz. The API is not that usable though. F# has List.map and List.filter functions, for example, which are not present in Scala.

It doesn't seem to have worked out for F# as they have recently adopted what Scala has been doing for years.

So many times this. I've seen university projects where every coder used Eclipse, but no one used its auto-format function.

As a junior dev I can confirm the advice about junior devs is very accurate. An anecdote: I recently started working with a team on their half completed web app. They had so many dependencies, and tools for managing dependencies, it took me far longer than it should have to become productive. It's obviously not my place to question which technologies they use, but it can be frustrating.

It is your place to question everything. Your contribution to the team, even if it comes in the form of perspective alone, is valuable anyway.

I'd go even further and say that newcomers to teams are often the most able to pinpoint essential flaws in the development process.

Except often the things pinpointed are not flaws, just different from what they are used to.

Then those differences can, or ought to, be documented. Having fresh eyes on something is good for discovering unwritten processes, rules, etc. Say you're doing X because of Y, but you never wrote down Y. A new person enters and sees X but doesn't see its utility (is it better, faster, does it give you more robust systems?), so they may attempt to remove it from the process or tool chain. They may be right, they may be wrong. Because they don't know Y, and perhaps no one else in the office does either, at this point.

Document the tools you use, why you use them (even if it's just: we were familiar with T1 so we chose it over T2, that's not a bad reason unless T1 has some major flaws or limitations). Document the processes and provide rationales as best you can.

This is my viewpoint, at least, as someone who's done a lot of work on the maintenance end of software development. I don't know why in 1984 something was done. I find a particularly gnarly bit of code or process and I want to fix it. Turns out they had a good reason (most of the time), but it's outdated because X. Or it's still relevant, but non-obvious until you get to a certain level of familiarity that only happens for the original developer or a maintainer on the project for 20 years.

And that applies irrespective of whether the newcomer is junior or senior. There is hardly a substitute for a fresh pair of eyes to reveal that what one thinks is cool design, code, et al. is not so cool after all.

> They had so many dependencies, and tools for managing dependencies

This is not just junior developer's thing. Everyone would be hindered in this case.

This article is a mix of good advice, terrible advice, and conflicting advice. Statements like "Avoid using language extensions and libraries that do not play well with your IDE." are foolish. Pick your language primarily on the best fit of the language for the problem space, second, pick a language that you're comfortable with and knowledgeable about. Picking the wrong tool simply because it works well with another tool, is horrible advice.

This is a nice post on the subject of readability, though I mostly like that the title does not use the often misused word "readability" at all. I now prefer to talk about understandability instead, which usually boils down to cognitive load.

This is one of the things that Go has got very right in its design, though it is often badly misunderstood. Advocates of languages like Ruby often refer to the "beauty" of the code while ignoring the fact that many of the techniques employed to achieve that obscure the meaning of the code.

The main problem I have with the term "readability" is that it encourages writing of code that reads like English, even if it obscures the details of what the code does. In the worst cases, the same set of statements can do different things in different contexts but that context may not be at all obvious to the reader.

One of the first books I read when I was learning C years ago talked about avoiding "cutesy code". That was particularly in reference to macro abuse, but it's always stuck with me as a good general principle. It applies equally to excessive overloading via inheritance and many other things that make it hard to tell what a given statement actually does, without digging around in sources outside of the fragment of code you are reading.

In many ways the art of good programming is, aside from choosing good names for things, maintaining the proper balance between KISS and DRY.

I've been dwelling on this idea of readability vs comprehensibility for quite a while now, working on front-end projects in Javascript.

Everyone seems to be using the airbnb style guide for their projects now, and whenever I look through these projects I can't help but feel that people are committing some cardinal sins that should be blatantly obvious to most developers.

There's such a push to make everything "simpler" and "neater", that people are willing to trade any amount of comprehensibility to make their code look nicer.

A key example: Since object destructuring became a thing, I often see logical objects being destructured for the sake of saving a few keystrokes. Ditto for framework constructs such as React's component props. Yes it's "ugly" to see "this.props" scattered around the place, but it makes it crystal clear what data is coming from where. If you destructure everything into its own variable then how do you distinguish between function arguments, closured variables, object properties, React props etc. And the worst thing about this practice is that it almost invariably happens in functions that are large and complex enough to "need" it, which is where it does the most damage.

I also think there's a case to be made for avoiding function declaration syntax inside objects, and ES6 class syntax in general. They seem to exist only to try and flatten a learning curve that's not that bad in the first place. Javascript doesn't have classes, it has objects and prototypes, and you're not declaring a function on a class, you're declaring a property on an object, which happens to be a function.

Why are we so quick to introduce ambiguity just to abstract away from minor complexities? Sure this code is "easier to read", but it's a lot harder to comprehend the specifics of what it's doing. And in any non-trivial project there is going to be a time when it's the specifics that matter.

On the other hand, maybe increasing the cognitive load is beneficial to everyone in the long term: http://www.linusakesson.net/programming/kernighans-lever/

...and another:

Use of whitespace (vertical and horizontal) to group and associate code with related parts.

It's a trick borrowed from graphic design, but negative space works really nicely.

I've come to appreciate this as well and it's what has driven me to prefer spaces over tabs.

Sounds like you need to split your methods/files into smaller single-purpose chunks.

no... nothing to do with that...

This is about visually grouping related things together, not decomposition of functions.

This article is a good start, but I found it much too light on detail. Each section ended just when I was ready for it to dive into details! For example, in the final section "Make it easy to digest":

> Using prefixes in names is a great way to add meaning to them. It’s a practice that used to be popular, and I think misuse is the reason it hasn’t kept up. Prefix systems like hungarian notation were initially meant to add meaning, but with time they ended up being used in less contextual ways, such as just to add type information.

OK, great, I agree -- but what are some suggestions/examples of good prefixes? What are some examples of bad prefixes that we should avoid?

To illustrate the sort of detail I'd like to read, here is an example of my own of good/bad method names that would be greatly improved by judicious use of prefixes.

My standard go-to example for ambiguous naming is the std::vector in the C++ STL. There is a member function `vec.empty()`: Does this function empty the vector [Y/N]? Answer: No, it doesn't. To do that, you instead use the member function `vec.clear()`. There is no logic a priori to know the difference between `empty` & `clear`, nor what operation either performs if you see it in isolation. You must simply memorize the meanings, or consult the docs every time.

In the C++ style guides I've written, I've always encouraged the prefixing of member function names with a verb. Boolean accessors should be prefixed with `is-`. The only exception should be non-boolean accessors such as `size` (which has its own problems as a name). Forcing non-boolean accessors to be preceded by a verb invariably results in names like `getSize()`, where `get-` adds no useful information, clashes with the standard C++ naming style for accessors, and really just clutters the code with visual noise.

Using these prefixes: (depending upon your project's preference for underscores or CamelCase)

  .empty -> .isEmpty() or .is_empty()
  .clear -> .makeEmpty() or .make_empty()
As an additional benefit, the use of disambiguating prefixes also enables the interface designer to standardize upon a single term "empty" to describe the state of containing no elements in the vector, rather than playing the synonym game ("empty", "clear", etc.). The programmer should not need to wonder whether "clear" empties a vector in a different way.

is_empty and make_empty are just going to irritate every C++ programmer in the business, since all the STL containers use empty and clear.

Some C++ programmers, perhaps. But I've just explained why `empty` & `clear` are ambiguous. Even if `empty` & `clear` can never be removed from the STL containers, there's no requirement that these ambiguous names must be propagated to new code.

But focusing exclusively on these two names is missing the forest for the trees. These two names are just a particularly striking example that illustrates the benefit of prefixes. A codebase that applies useful prefix naming will be an easier codebase to understand.

And applying prefix naming consistently will also make it easier for a new developer to contribute to a codebase, since there will be no ambiguity about what to name new functions, nor what to expect them to be named. `is_empty` & `make_empty` would simply be part of that consistency.

I agree with you, but honkhonkpants raises a good point - sometimes you must bow to existing convention, even if it doesn't meet current best practice.

"Sometimes you must bow to existing convention" is indeed a reasonable point, so I suppose I should clarify/refine my position.

If you're implementing an STL-like container in C++, then absolutely -- you should stick with the convention: `empty`, `clear`, `size`, etc. To deviate from that convention would be an exercise in confusing the users of your code. You should make a note in the class comment that it deviates from any other project-wide naming scheme because it conforms to the STL container interface, and move on.

But if you're creating a C++ class that is NOT intended to be an STL-like container (or if you're not working in C++!), then I'd argue that it would be better to go with `is_empty` & `make_empty` (if you're applying this prefix naming scheme across the rest of your codebase) for the benefits I've described above.

I think make_empty is just not good style. It conflicts in meaning with std::make_unique, which allocates a std::unique_ptr. When I see make_empty I think it allocates a new, empty object.

My current pet-peeve:

- If your code deals with values where the units of measure are especially important and where they may change for the same type of value in different contexts, PUT THE UNITS USED IN THE VARIABLE NAME!

I work primarily with systems that talk money values to other systems, some of which need values in decimal dollars (10.00 is $10.00) and some that need values in integer cents (1000 is $10.00).

Throughout our codebase this is often referred to helpfully as 'Amount', unfortunately :( So much easier when you can just look at the variable.... 'AmountCents' -- this naming convention alone would prevent some bugs I've had to fix.

Which points to something deeper that I've come to realize. Your code speaks to you, in the sense that when you come back to your own code 6 months later, there's a certain amount of "I don't know what this is doing" that you can chalk up to just not having looked at it for 6 months, but there is also an amount where you have to say "no, actually I didn't write this code clearly at the time". When evaluating my own progress that's a big metric I use - on average, how am I understanding my own code later?

What I try and watch out for in myself is when I find myself not making something explicit in the code because of domain knowledge that I have. The 'Amount' example is a good one of this. The domain knowledge is that I know this particular system wants values in decimal dollars -- I mean it's totally OBVIOUS isn't it? Why would I bother writing 'Cents' at the end for something so obvious?

Yet, even referencing domain knowledge is a higher cognitive load than just reading 'Cents' in the variable name. Not to mention the next engineer that comes along -- it's likely they won't have that bit of 'obvious' domain knowledge.

I would vote both 'Code Complete' and 'Clean Code' as two must-read books for any programmer.

I own Code Complete, but I felt I got better value out of the Clean Code book combined with the Pragmatic Programmer.

I did find some value in Code Complete, but it is a little too long for my tastes. The naming and abstract data structure sections were probably my favorite parts of that book.

If you're just starting out in your career, reading Code Complete is like gaining experience by osmosis. Then once you know what you're doing, Pragmatic Programmer is like a light refresher that you read once every few years.

Yes, I read the first edition many years ago - it was a huge benefit to my naive "bash the code out however I can" practices.

Now, when reading it, I'm kind of like "yes, that's good, except when..." So you learn to temper the rules with experience. But in the beginning of your software development journey, you need something to keep you in line.

I happen to find the principle of single level of abstraction does more to reduce cognitive load than all these tips put together.

My biggest pet peeve is when people use pattern names in class names. You don't need to call things strategies if you're composing in behavior. Just call it the behavior.

    var weapon = Sword()
    weapon.attack(up)
    weapon = Bow()
    weapon.attack(left)

Often the pattern's implementation drifts a bit from the by-the-book implementation and it ends up being something ALMOST like the pattern but it's not quite anymore. Or it's more. Then the pattern name is still stuck there and it causes more confusion than it helps to clarify.

I'm surprised no one has cited this:


Fluent interfaces make code a joy to write and a huge burden to later review and reason about, particularly when the object you are interacting with changes mid way through on certain calls. They are something I loved when I was younger, but now doing code reviews they are the bane of my existence.

Not all code needs to be easily understandable by a novice developer. Minimizing cognitive load is certainly a good thing, but using overly simple grammar for a complex task leads to unneeded verbosity.

When writing software, as with any form of writing, you should keep your audience in mind as you write.

This article does a good job of encapsulating the prevailing Java "ignorance is strength" (worse/longer is better; abstraction is bad) paradigm.

When are the right-tailers (in the bell curve) ever going to let you use new features to make your code shorter? Why learn complex concepts like "multiplication", when "tally marks" will do?

I'm afraid I'm with Steve Yegge on this one, in regards to dislike of the "tools to move mountains of dirt" aspect.


>Junior devs can’t handle overuse of new tech.

heh, I'm a senior dev and I have trouble with the overuse of new tech. It's hard for me to learn when there are too many variables in play; early on, it's hard to know which bit is doing what.

Good advice and worth reading especially for younger devs. With respect to...

>> Prefix systems like hungarian notation were initially meant to add meaning, but with time they ended up being used in less contextual ways, such as just to add type information.

Hungarian notation was pretty cumbersome to read, actually, and I think the main reason it fell out of use is that editors and IDEs began to make type and declaration information available for symbols in a consistent way, so it was no longer much of an advantage (and perhaps a disadvantage) to use a manual convention that was usually applied inconsistently.

One of the major functions of Hungarian notation was to communicate information which was not contained in the types of the actual variables, for example an int could be a count of bytes 'cb', or perhaps a handle 'h', etc. But it ended up being mostly misused to communicate redundant type information, such as a char* being 'sz' (zero-terminated string), which tells us nothing we didn't already know. As you say, better IDEs made the latter kind of naming no longer advantageous (if it ever was) but that was true for some time before Hungarian notation fell out of favour - the real reason being a rejection of its redundancy within MS during the transition to .NET. Joel Spolsky details the good and bad of Hungarian notation here:


'sz' is pretty useful if you have pascal strings floating around.

From the perspective of a younger dev, this seems like excellent reading for older devs, who tend to enforce their personal style without deferring to the de facto language standards (or PEP 8 for those working in Python). In fact, perhaps everyone could do with downing a half dozen humble pills and optimizing their code for harmonious human interoperability rather than for anything to do with machines :)

I started doing this a year ago and it really helped me to maintain code. My new goal is to be able to read others' code, make it more readable, and fix the problem just as fast as it would have been without slight, constant refactoring.

I want to run a team so I can teach the whole team to work this way. Then I'll handle all the complex refactorings, which I really enjoy doing, while they greenfield new features. If they can write code this way, then I'll be able to refactor it without having to study it to figure out what it's doing.

Maximize order. Order is the lubricant for information. And if you back your reasons with guiding principles (aka philosophy) the specifics will remain obvious as well as sort themselves.

These are the only ways to reduce cognitive load and they apply to any situation where one needs to understand something. After all, code is about understanding.

Anecdotally, the specific methods mentioned in the article that seem most valid stem from the guiding principle of maximizing order, which drastically reduces the cognitive load of the contents of the article.

For the most part, I agree with this. The biggest problems I've had at work have been due to constructs that were a neat idea but just add to the complexity of figuring out the application. Throw in a bunch of business-specific engineering terminology that is not defined anywhere for the development team, and it becomes a wicked PITA to learn.

However, the one-liner example and the chained English-sounding methods, I think might be taken the wrong way. Both can be done well.

Along the same lines, use the right language for the abstraction you are dealing with. In a server environment, I prefer to have modular services linked together with some message queue.

In web projects, this is what has kept me coming back to CoffeeScript: we found it less distracting visually, given the kind of code we were writing (heavy call-back oriented, lots of chained methods).

""" Storing the result of long conditionals into a variable or two is a great way to modularize without the overhead of a function call."""

Ridiculous! Not only is caring about overhead misleading, but introducing local variables goes against refactoring principles.

Or, rather more simply: do code reviews, decide which of these things you want to adopt, and teach everyone about:

a) the agreed way

b) other code they haven't worked on

in the process. Finally, if you know someone else will be reviewing your code, you'll produce better code in the first place.

"Using MVC? Place models, views and controllers in their own folders" This works on smaller projects on large projects it's often easier to group things by component

"clever code isn't." Is what I try to teach all who will listen.

Good code should never be illegible to newbies. And if they can read your code, they can learn way faster.

If you don't have to name things, you have zero chance of getting bugs caused by naming things.

That's why I love anonymous functions. They free me from name overload.

Have to disagree on not placing null first in comparisons, it's a good way to avoid bugs.

I believe maintainable style is less important than knowing how the thing behaves.

These seem like good rules to follow, but there's nothing to suggest that they reduce cognitive load. To make that claim, you need experiments testing brain function or at least people's behavior...

To follow up with a point of comparison, this is the kind of work that can make a claim about how cognition and coding are related:


    Keep your personal quirks out of it

    Don’t personalize your work in ways that would require explanations.

    I like taking advantage of variables to compartmentalize logic.

Not the same thing.

Coincidentally, I'm working on some fairly awful code today that has the one redeeming feature of dividing and conquering as described in the article.

It made it fairly trivial to find the exact spot the problem was happening.

At no point while reading the code did I think that assigning key values along the way to variables named to describe exactly what they represented, and then adding them all up at the end, was a personal quirk of the code.

That's a world of difference compared to using the fairly odd:

    if(null != thing)
