
Coding style and undoing indentation hell - ashishgandhi
http://www.virtualdub.org/blog/pivot/entry.php?id=238
======
bwh2
I've seen a lot of heavy nesting caused by the belief that functions should
only have one return statement. Ex:

    
    
        // Original
        function render() {
          $return = null;
          if($condition1) {
            if($condition2) {
              $return = 2;
            } else {
              $return = 1;
            }
          } elseif($condition3) {
            $return = 3;
          } else {
            $return = 4;
          }
          return $return;
        }
    

But I'd rather read:

    
    
        // Refactored
        function render() {
          if($condition1 && $condition2) return 2;
          if($condition1) return 1;
          if($condition3) return 3;
          return = 4;
        }

~~~
hxa7241
At least have an honest comparison:

    
    
       function render() {
          $return = null;
          if($condition1 && $condition2) $return = 2;
          elseif($condition1) $return = 1;
          elseif($condition3) $return = 3;
          else $return = 4;
          return $return
       }

~~~
VMG
Another example

    
    
        r = 0;
        outer: {                                                                     
        foreach (x in a) {                                                           
          foreach (y in b) {                                                         
            if (condition(a, b)) {                                                   
              r = 1;                                                                 
              break outer;                                                           
            }                                                                        
          }                                                                          
        }                                                                            
                                                                                     
        return r;                                                                    
                                                                                     

instead of

    
    
        foreach (x in a) {                                                           
          foreach (y in b) {                                                         
            if (condition(a, b)) {                                                   
              return 1;                                                              
            }                                                                        
          }                                                                          
        }                                                                            
                                                                                     
        return 0;

------
robomartin
It's probably a matter of style at some level.

I hate this style (from the article):

    
    
      if (...) {
            if (...) {
                ...
        } else {
            if (...) {
                ...
            } else if (...) {
                ...
            } else {
                ...
            }
        }
    

and prefer this:

    
    
      if (...) 
      {
          if (...) 
          {
            ...
          } 
          else 
          {
              if (...) 
              {
                ...
              } 
              else if (...) 
              {
                ..
              } 
              else 
              {
                ...
              }
          }
      }
    

The first example is confusing and hard to read. The second is clean and easy
to read.

As to the question of optimization, that's a different matter. Lots of tricks
can be used to compress code into fewer statements on a case by case basis:

    
    
      if(pushbutton == 0)
      {
        led = 12;
      }
      else
      {
        led = 24;
      }
    

can become

    
    
      led = (pushbutton == 0) ? 12 : 24;
    

or

    
    
      led = (pushbutton * 12) + 12;
    

or

    
    
      led = 12 * (pushbutton + 1);
    

etc.

~~~
epistasis
I find the first version easy to read, and the second confusing and hard to
read. I believe this is a matter of what a person is more accustomed to, and I
learned on the first form and lately have been using the first form more. For
me the extra brace below the keywords makes it hard to find the phrase of
interest, and the expanded vertical space also makes it more difficult to line
up indentation levels. But these visual cues are flexible, and I could learn
similar scanning tricks for the second form.

I have worked with people that were dogmatic that one form was objectively
better than another form, rather than it being a matter of local optimization
in the brain; I have difficult working with these people because they tend to
show similar rigidity and confusion of the subjective with the objective in
other areas, and such small mindedness limits what type of engineering
solutions one is able to come up with.

~~~
nightski
Agreed. If I was forced to pick it would probably be the first one, but other
than that I just don't care.

~~~
robomartin
Did either of you pick-up the fact that the first version is missing a curly
brace? :)

Hard to see. That's my point. And that's why I tend to favor the second style.

I'll use any convention if required, but if given a choice I go for styles
that protect me from my own dumb mistakes as much as possible.

------
rcfox
Don't forget that in many cases, you can replace a conditional with
polymorphism. <http://c2.com/cgi/wiki?ReplaceConditionalWithPolymorphism>

~~~
huhtenberg
You certainly can, but it doesn't mean that you should.

Case in point - Linux networking code, and more specifically sockets handling
part. Mr. Kuznetsov did an excellent job abstracting a lot of things behind
interfaces (= blocks of function pointers), but at the same time this made it
much _much_ harder to follow the code. It's basically impossible to trace the
code by hand without a pen and paper.

(edit) To elaborate on the context - reading and tracing the code straight off
the source files is a necessity of kernel development, routinely required for
both debugging and writing new code.

~~~
rcfox
Spend this afternoon figuring out GLOBAL and setting up to work with your
editor.

<http://www.gnu.org/software/global/>

~~~
huhtenberg
How would this help to understand what sk->ops[SK_READ]->foo[index](bar)
exactly calls?

~~~
rcfox
It lets you quickly jump to the definitions of the variables, their types, and
their uses. (And then back to where you were.)

It's not going to do all of the work for you, but it'll reduce some of the
cognitive load.

~~~
huhtenberg
This is the same as ctags then, and it's completely irrelevant to the issue
with over-abstracting things in the code.

------
huhtenberg
This -

    
    
      if (!test()) {
        handle_failure();
      } else {
        // do lots of work
      }
    

is just another incarnation of

    
    
      if (NULL == v)
    

i.e. it sacrifices a natural flow of code and makes it harder to follow for
something of a superficial value.

Also this -

    
    
      if (!test1()) {
          handle_failure_1();
          return false;
      }
    

is fine _unless_ the function needs to do a cleanup, which is pretty much
always the case, and which in turn means that the { } block with grow with
every _if_ and it will include a lot of redundant code.

A far cleaner and easier to follow approach is to use if-goto, whereby all
cleanup code sits at the bottom of the function (where it naturally belongs)
and the flow jumps to the appropriate part of the cleanup on failure. The only
exception is when there's no cleanup needed, in which case it can do if-
return.

~~~
dwc
I use both if-return and if-goto. In practice you need both, depending on
circumstance. In much of my recent code I've managed to use a _lot_ of if-
return checks at the top of functions. Another nice thing about a bunch of if-
returns at the beginning is it makes a nice checklist of things that will
cause the function to fail, all right up front.

~~~
huhtenberg
> _a nice checklist of things that will cause the function to fail_

Should these be assert()s? ;)

~~~
DeepDuh
Not sure if serious, but assert only makes sense in cases where you want to
switch off the tests at some point right?

~~~
huhtenberg
Asserts are for enforcing invariants in the code. If you replace an assert
with a conditional that allows the execution to continue, you are concealing
the problem and making it that much harder to track.

Practically speaking, _your_ function should not be failing because _your_
other code passed it invalid parameters. That's a bug in the former. That's
why parameter validity should be asserted.

~~~
DeepDuh
I see. I usually never let execution continue and log + exit instead, but
you're right, an assert is easier in that situation.

------
hxa7241
The broad suggestion that indentation is somehow wrong is ill-advised.

Sure, various of the refactorings given are sensible. But the basic problem is
not indentation. If something can be simplified, simplify it. If something is
too long, break it into pieces. But if something _is_ tree-structured, _that
is what it is_ , and it probably ought not be bent out of shape.

(Note: Fowler's refactoring 'Replace Nested Conditional with Guard Clauses' is
very specific -- it does not declare that all nesting is bad.)

A tree-like-indentation structure is a perfectly good element of programming
-- why try/want to change it into something else?

In fact, a good case could be made that software in its most fundamental
essential conception is tree-structured -- even that engineering generally is
tree-structured.

------
finnh
At this point in my coding life (15 years in), I'm pretty strongly against if
/ else clauses. The readability of such code just plummets in my view.

I try to limit my conditionals to inline statements:

    
    
      $x = $y if $b;
    

Often joined with early returns:

    
    
      return $b if $a;
    

If I find myself coding an if/else branch, that's often a clue that I should
refactor this chunk into a function that can use early returns to avoid nested
conditionals:

so

    
    
      if ($a == FOO) {
        $c = $d;
        }
      else if ($a == BAR) {
        $c = $e; 
        }
      else {
        $c = $f;
      }
    

becomes

    
    
      $c = get_val($a);
    
    	...
    
      sub get_val {
       my ($a) = @_;
       return $d if $a == FOO;
       return $e if $a == BAR;
       return $f;
       }
    
    

Thankfully much of my coding lately is in perl, which lets me use postfix
"if/unless" as well as "and/or" to avoid having to use braced conditionals.

I also love myself the ternary operator. Basically my take comes down to "use
conditionals for one-line assignments only, NOT for block execution".

------
jack-r-abbit
I'm glad I am not the only that is particular about these things. Computers
may not care which order your IF statement is in... but some people completely
disregard that their code is going to be read by humans too.

------
RyanMcGreal
Generally good advice, but w/r/t the first recommendation, I prefer to
structure if/else clauses so that common case is handled first and the
exceptional case is handled last.

~~~
dwc
As shown in the article, that first recommendation leads directly to the next
step(s). In the ideal case (which actually happens sometimes in the real
world!), you end up with something like this:

    
    
        if (bad1) fail();
        if (bad2) fail();
        if (bad3) fail();
        
        step1();
        step2();
        step3();
        step4();
        return SUCCESS;
    

To me, this is a very nice way to structure things when you can. You get a
checklist of things that might be wrong, and then comes the wonderfully naive
straightforward implementation.

~~~
davidhollander
Stating failures first can also be considered analogous to input validation,
preconditions, or declaring the domain for which the function is undefined.
Some related discussion on here: <http://c2.com/cgi/wiki?GuardClause>

------
dustingetz
meh. when i'm looking for a bug, i tend to start in the places with lots of
branching. in general, high quality code doesn't have much branching, and if
you find yourself neding complicated and nested branching, there is probably a
better way to factor your code.

one easy trick is to structure your code such that 'null' is impossible, by
initializing to an empty value, or re-ordering the way things happen, or using
exceptions or error-monad.

------
merlincorey
I have to run so this is really a substandard comment for hacker news, but...
Really? He advocates adding multiple exit points to a function as a way of
REDUCING COMPLEXITY? The solution when you have too deep of nesting is to pull
out inner parts into useful simpler stateless functions, or so I was led to
believe!

------
Roboprog
He's preaching to the choir in my case. I'm a big fan of busting out of
functions/methods early when you already know "it's over". (sometimes this
means putting the cleanup in something like a try/finally wrapper routine,
though)

But, kudos: I can send this to my team instead of having to write it myself.
Win!

------
Qerub
The _goto_ cleanup strategy can be improved upon by using multiple _goto_
labels, like here: <http://vilimpoc.org/research/raii-in-c/>

~~~
kstenerud
Or just test the resource handles you opened, and only close them if they're
valid. That way, you don't have to worry about doing a goto to the wrong
label, or messing up the order of cleanup.

    
    
        BOOL do_stuff_with_resources()
        {
            int success = FALSE;
            
            char *memBuf = (char *) malloc(256 * 1024);
            if (NULL == memBuf) goto cleanup;
            
            if (randBomb()) goto cleanup;
        
            FILE *fp = fopen("dummy-file.txt", "w");
            if (NULL == fp) goto cleanup;
            
            if (randBomb()) goto cleanup;
        
            int sock = socket(AF_INET, SOCK_DGRAM, 0);
            if (-1 == sock) goto cleanup;
            
            if (randBomb()) goto cleanup;
        
            int shmId = shmget(0x1234ABCD, 256*1024, IPC_CREAT);
            if (-1 == shmId) goto cleanup;
            
            shmRegion = (char *) shmat(shmId, NULL, 0);
            if ((char *) -1 == shmRegion) goto cleanup;
        
            if (randBomb()) goto cleanup;
        
            sem_t st = -1;
            if (-1 == sem_init(&st, 0, 1)) goto cleanup;
            
            if (randBomb()) goto cleanup;
        
            
            // TODO: Do stuff
        
            success = TRUE;
        
        cleanup:
            if(-1 != st)
            {
                sem_destroy(&st);
            }
            if(-1 != shmId)
            {
                shmctl(shmId, IPC_RMID, NULL);
            }
            if(-1 != sock)
            {
                close(sock);
            }
            if(NULL != fp)
            {
                fclose(fp);
            }
            if(NULL != memBuf)
            {
                free(memBuf);
            }
        
            return success;    
        }

------
T_S_
If you do your indentation "wrong" and nothing breaks, then you will do your
indentation "wrong". That's why I find whitespace-sensitive languages easier
to read, YMMV.

~~~
bingaling
The article talks about unwinding deeply nested control structures, which is
equally possible in whitespace sensitive languages.

------
mendicant
He's basically a step away from having several objects (let's call them
handlers) which can do your handling for you. Further, they can define the set
of circumstances in which they are valid.

I'm going to work with a couple of assumptions:

1) The test casts which he refactors to test*() methods are non-trivial 2) The
handlers _may_ be large sets of code

Here's a way to handle one of this refactored methods:

if (!test1()) { handle_failure_1(); return false; }

if (!test2()) { handle_failure_2(); return false; }

if (!test3()) { handle_failure_3(); return false; }

handle_success(); return true;

That's a lot of potential code we're dealing with in this class.

What if:

class HandlesFailure1

    
    
      def handles_this_case(params)
         #code from test1()
      end
    
      def handle(params)
        #code from handle_failure_1()
        return false;
      end
    

end

class HandlesFailure2

    
    
      def handles_this_case(params)
         #code from test2()
      end
    
      def handle(params)
        #code from handle_failure_2()
        return false;
      end
    

end

class HandlesFailure3

    
    
      def handles_this_case(params)
         #code from test3()
      end
    
      def handle(params)
        #code from handle_failure_3()
        return false;
      end
    

end

class HandlesSuccess

    
    
      def handles_this_case(params)
        return true
      end
    
      def handles(params)
        //code from handle_success
        return true
      end
    

end

#And now the original class

class DoesSomething

    
    
      #There are ways to set this up fairly easily. One simple way it to just new it up here, but you could auto-wire it too
      handlers = [HandlesFailure1, HandlesFailure2, HandlesFailure3, HandlesSuccess]
    
      def do_something(params)
    
        handlers.each do |handler|
          return handler.handle(params) if handler.handles_this_case(params)
        end
    
      end
    

end

There. Now there's a loop and an if. Check each case and either handle or toss
it out. This can also work for the case where you might want multiple handlers
to handle something, just don't return on the first one.

I'm not saying it's the right way, but it's another way.

Pros: It really helps to group together the handling methods with the test
methods. Very simple dispatch method

Cons: Separate classes, and harder to view all the handlers together.

------
alexchamberlain
I agree with most of what was said, especially the use of goto! I don't like
success boolean, but that's probably just me.

~~~
ajross
I used to be a big goto-error-cleanup proponent, but ultimately gave it up.
The resulting code is cleaner, but the bug it's trying to treat (returning
without correctly cleaning up) is still too easy. Complicated functions end up
with 4-5 cleanup labels and it's hard to look at them and understand the
resource acquisition order.

Generally these days I prefer functional decomposition for that (actually
honestly I'm preferring C++ RAII style where you get the cleanup for free in
almost all cases). That is, do something which requires cleanup (e.g. allocate
memory, take a mutex, etc...), call a function to "initialize" the thing you
just did, check results and free it on failure. So the allocation and cleanup
happen next to each other in the code. Nest as needed for complicated resource
strategies, and it scales well. No one is ever confused about what the state
is supposed to be.

~~~
alexchamberlain
The problem isn't with 1 resource. It is with 2 or more, right? You
successfully allocate the first, but the second fails and so you have to go
back and deallocate the first. The goto strategy avoids code repetition.

RAII is the best part about C++!

------
lfaraone
I stopped reading when he recommended if-goto.

~~~
dmaz
Sure, goto was abused, but it's no worse than (or even better than) using
continuations or using break and continue.

~~~
crusso
Any of those mysterious jump-to-somewhere-else code features resemble goto.
Not that I don't think they clean up some nesting issues, but they're pretty
much the same thing as goto.

------
drivebyacct2
Someone would sure enjoy golang!

~~~
wonderzombie
More specifically, the `defer` keyword lets you handle cleanup in the case of
failure a bit more neatly.

The gotos can be replaced with stuff like this:

    
    
        foo, err := bar()
        if err != nil {
          return err
        }
        defer foo.cleanup() // or what have you
    

Now, right before the function returns, Go will call `foo.cleanup()` and your
`cleanup()` call sits right next to the initialization code. It's a lot easier
to tell whether or not you are (or someone else is) doing cleanup properly.

