
Refactoring Large Functions - muth02446
http://robert.muth.org/refactoring-large-functions.html
======
dasmoth
Do people really prefer:

    
    
          function doFirstBit() {}
          function doSecondBit() {}
    
          function big() {
              let x,y,z;
              [x,y] = doFirstBit();
              [x,z] = doSecondBit(x, y);
          }
    

to:

    
    
          function big() {
              let x,y,z;
              // First bit...
              ...
              // Second bit...
              ...
          }
    

To me, it's no harder to find the portion of interest in the latter when
making a local change, and substantially _easier_ to understand the whole
thing (because you can sit down, maybe with a printout if that's your thing,
and read through it linearly without having to jump back and forth to the
subfunctions). And passing data between the portions can quickly end up
nastier than my contrived example above, at which point people start doing
things like creating a class just to hold what used to be the function's
stack-frame.

Not saying very long functions are something to celebrate. But if they're
organised sensibly they're not a huge problem, and to be honest probably
bother me less than functions which aren't abstracting anything terribly
interesting and only get called once.

~~~
joe_the_user
The readability/understandability points other make are good.

But I'd add that large function like this tempt the programmer to do something
like using one of the intermediate variables in the first "bit" to make a
decision in the second "bit" out of convenience. Add ten more "bits" and ten
more such "conveniences" and you get an incredibly fragile, bug-prone mess.

Which is to say, a purely lexical-order division of operations is bad because
you easily wind-up with undefined defined interface between the various
operations. Figuring out both how a simple for-loop really works as well as
how the original programmer _intended_ it to work, can be quite hard. Add
several for-loops and you get a true mess.

Some people get really good at cranking out these long functions and as you
say, it look superficially easier. I wonder if a language could support
"functional abstraction brackets" that would allow these long sections but
guarantee their logic isn't folded together

Something like

    
    
          function big() {
              let x,y,z;
              {{@First_bit //begin forced separation...
              ...
              {{@Second_bit //begin forced
              ...
          }
    

Here "First_bit" would be guaranteed to be equivalent to functions of just x,
y, z.

~~~
dasmoth
If I'm understanding you right, "C-like-syntax" languages generally allow
this:

    
    
           function big() {
               let x,y,z;
               {
                  // First bit
                  let localToThisSection = 42;
                  ...
               }
               {
                  // Second bit
                  ...
               }
           }

~~~
joe_the_user
Yes, If your x,y, and z are both inputs and outputs, that works.

If you wanted a syntax that exactly mirrored what happens in the gp's second
example, you'd need more work. Plus if you force a begin-bracket to have a
label, you avoid the editor having to use "fun3756" when automatically
abstracting out the subroutine.

------
joemanaco
John Carmack on this topic:

[http://number-
none.com/blow/john_carmack_on_inlined_code.htm...](http://number-
none.com/blow/john_carmack_on_inlined_code.html)

And I easily agree 100% with him having tried both approaches in bigger
projects.

~~~
shoo
The anecdote Carmack shares from aviation is interesting:

    
    
      >Indeed, if memory serves (it's been a while since I read about this)...
      >
      >The fly-by-wire flight software for the Saab Gripen (a lightweight
      >fighter) went a step further.  It disallowed both subroutine calls and
      >backward branches, except for the one at the bottom of the main loop.
      >Control flow went forward only.  Sometimes one piece of code had to leave
      >a note for a later piece telling it what to do, but this worked out well
      >for testing:  all data was allocated statically, and monitoring those
      >variables gave a clear picture of most everything the software was doing.
      >The software did only the bare essentials, and of course, they were
      >serious about thorough ground testing.
      >
      >No bug has ever been found in the "released for flight" versions of that
      >code.
      >
      >                                                          Henry Spencer

~~~
FlyMoreRockets
Henry Spencer is one of the internet's greatest treasures.

------
sifoobar
Some things are better said in one chunk, there are no rules. I often write
longer functions while figuring things out, when I need more context in front
of me to stay in flow.

Then I wait, and wait; until I can't stand it any more, at which point I
usually know enough to substantially improve the code.

Assuming it survived that long. Refactoring code that you're going to throw
away is not very constructive. And the more effort put into making code
pretty, the harder it will be to let go when that's the right thing to do.

None of this works in a corporate setting, where there's never enough time or
money to do anything properly. There are no awesome short term profits to be
made from prototyping and keeping code in good health. Corporations also
usually prefer rigid rules over competence and intuition, which means whatever
cure they come up with will be worse than the disease. And as a result,
corporate code is usually crap code.

------
piinbinary
It is important to remember that functions exist for the sake of humans, not
computers. The computer wouldn't care if all the code was in a giant main()
function with GOTOs and a manually-managed stack, the programmers are the ones
who have a problem with this.

------
vincnetas
Item number 0 MUST be :

Make a copy of current function as old_function and add unit test to check if
input on refactored function still produces same results as old_function.

~~~
thanatropism
That's why you have version control and batches and fuzzing etc.

~~~
vincnetas
Not sure how version control can help here, as unit tests are not run on old
versions.

About batches, could you be more specific, i might be missing something here.

And for fuzzing, it will test your new function, but will not provide any
guarantees that your new functions is working exactly like the old one. You
might be surprised, bet there could be bugs in old function which can be fixed
by refactoring. Sounds nice? But might be some legacy codes is depending on
the bug that you fixed, and for backwards compatibility reasons that bug must
stay be renamed from "bug" to "functionality".

~~~
thanatropism
I meant "branches". Mobile devices speak for us, it's an unbearable situation.

------
cpeterso
The article mentions Steve McConnel's advice on parameter lists (from his
great book "Code Complete: A Practical Handbook of Software Construction"),
but left out some interesting commentary on function length from the same
book:

Some studies of code quality from the 1980s reported that defects per KLOC was
_inversely_ correlated with function length, leveling off around 200–400 LOC
per function. Software composed of many tiny functions is more difficult to
understand because there is more context "off screen" to keep in your head.

A Cisco study of code reviews found that 200–400 LOC is the limit of how many
LOC can be effectively reviewed per hour. Applying the findings of these
studies suggests that neither functions nor patches/pull-request diffs should
not exceed 200 LOC. FWIW, I have worked on commercial software that had
functions many thousands of lines long! :)

------
kyberias
I've seen my share of code bases and many times I've seen a long function,
I've made a number of other observations:

The long function has a lot of comments that try to describe the different
steps. The comments are often confusing because whoever wrote the comments
couldn't quite understand themselves what is happening.

There is a significant frequency of bugs found in the functionality
implemented or coordinated by that function.

The long function has poor code coverage.

For me, a long function is usually a code smell.

~~~
kraftman
Yeah often the comments are the names of the inner functions that should be
extracted out too.

