

Are short methods actually worse? - pdubroy
http://dubroy.com/blog/2009/03/09/method-length-are-short-methods-actually-worse/

======
calambrac
Something that's important to remember about Code Complete (which is great,
I'm absolutely not knocking it at all) is that it's written with languages
like C, C++, and Java in mind. In C and C++ particularly, 'shorter function'
might be a euphemism for 'clever function', hence the higher defect rate and
increased difficulty to comprehend.

More expressive languages (by definition) shouldn't need the same lengths; if
you're writing a 200 line Python method, you are most definitely Doing It
Wrong.

Here are some of my rules of thumb for breaking things up (absolutely not
independently arrived at, by any means):

\- Loop bodies are good candidates for standalone functions.

\- Everything in a function body stays at one tier of abstraction. If I find
myself doing something at a lower level than the rest of the function, it's
time to break things out.

\- If I find myself using the word 'and' to describe what a function does, and
it isn't pretty high up the abstraction ladder and primarily just driving
calls out to other functions, it probably needs to get broken up.

\- I'm not formal about it, but my mental metric for my code is to minimize
'total loc + total indentation * some constant', meaning I'm willing to trade
loc to reduce nesting, to a point.

\- I like functions to be no longer than a screenful. That's a max; most of
mine are shorter than that.
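
The loop-body and single-tier-of-abstraction rules above can be sketched in Python (all names here are hypothetical, purely for illustration):

```python
# Hypothetical example: the loop body mixes low-level parsing with
# high-level flow, so it's a candidate for extraction.

def load_orders(lines):
    """Before: parsing details sit inline, below the function's tier."""
    orders = []
    for line in lines:
        fields = line.strip().split(",")
        orders.append({"id": int(fields[0]), "qty": int(fields[1])})
    return orders

def parse_order(line):
    """Extracted loop body: one low-level task, easy to name and test."""
    fields = line.strip().split(",")
    return {"id": int(fields[0]), "qty": int(fields[1])}

def load_orders_short(lines):
    """After: the whole function reads at a single tier of abstraction."""
    return [parse_order(line) for line in lines]
```

The refactored version also gives the parsing step a name, which is half the benefit.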

------
ams6110
I've seen many routines (functions, methods, procedures .. whatever) that are
needlessly long due to overzealous use of the copy/paste technique. But once
you eliminate that kind of stuff, I do agree that you can get to a point where
you've abstracted too much -- and the actual thing you are trying to
accomplish with your code becomes obscured.

I have not read these studies, but to me it seems that the defect rate could
rise because it's no longer apparent what's interacting with what, and in what
order. If I have to hop around to numerous different 5 line methods (often in
numerous different source code files) it becomes very hard to intuit whether
the code is "correct" or even have any idea what it's doing.

~~~
russell
I just had to fix the same bug in 4 places because of someone's use of cut and
paste. (Actually, refactored, but you get the picture.)

~~~
silentbicycle
Generally, _cut_ -and-paste isn't the issue, _copy_ -and-paste is. Granted,
you can cut and paste paste paste ..., but cut-and-paste usually means moving
things to a more sensible place.

The biggest problem is when a code blob is duplicated all over the place and
(worse still!) the copies slowly drift out of sync, making them hard to
reunite. I've been working on a program off and on that tries to detect this
in a language-agnostic manner, but I've hit a couple dead ends. (I half expect
somebody with a more orthodox CS education to take one look at it and go, "Oh,
that's NP-complete," but I haven't let it stop me - it's been a great way to
learn about graph algorithms.)

~~~
russell
I meant copy and paste.

As you have discovered the problem is complex because of the explosion of
compares if you try to do it in any general way. It would be useful to query
"is there any other code like the section that I selected?" The copied code
usually has changes in things like constants and argument. If you abstracted
away constants it might be easier to compare. Just a suggestion.
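
The suggestion to abstract away constants could look something like this crude, regex-based Python sketch (a real tool would use the language's tokenizer; everything here is hypothetical):

```python
import re

def normalize(code):
    """Replace identifiers, then numeric constants, with placeholders so
    that copies differing only in names and constants compare equal.
    (Regex-based sketch only; not robust against strings, comments, etc.)"""
    code = re.sub(r"\b[A-Za-z_]\w*\b", "ID", code)
    code = re.sub(r"\b\d+(?:\.\d+)?\b", "NUM", code)
    return code
```

With this, `normalize("total = price * 3")` and `normalize("subtotal = cost * 7")` both become `"ID = ID * NUM"`, so a line-level comparison would flag them as the same code.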

~~~
silentbicycle
Right. Allowing preprocessing with a syntax-specific extension or (ideally)
scanning on the AST would be more interesting - code can be surprisingly
similar except for variable names, types, and other semantic annotations.
Rather than calling them "patterns" and prizing them, sometimes abstracting
them away would be a better approach.

My main focus is on curing the heartbreak of copy-and-paste-programming, but
I'm sure it would have more general applications (if I ever get it out of
quadratic performance, etc.). Unsurprisingly, what is "similar enough" can be
a very slippery concept.

~~~
scott_s
Doing it on the AST should help performance. Even if you're still using an N^4
algorithm, the AST should be significantly smaller than the string that
represents the code.

And, as it sounds like you realize, this will catch cases where variable names
have been changed, but the structure is the same.

~~~
silentbicycle
Right, and there would also be more semantic boundaries, (possibly greatly)
reducing the size of the overall problem space. If (by sheer coincidence)
there was a token overlap between the last two lines of one function and the
first two of another, that's probably not relevant in the same way a copied-
and-pasted version of the same function in two places would be.

~~~
scott_s
The AST also makes it easier to use heuristics based on semantic information
to shorten the number of subtrees you're concerned with (previously
substrings).

For example, if in a C-like language, you could decide to only look at
subtrees that are at least complete statements, or go to a higher granularity
and only look at compound statements. You could also eliminate common idioms,
such as

    
    
      for (i = 0; i < N; i++)
    

which will pop up all over the place and give false positives.
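
As a toy illustration of AST-level matching (not the actual tool under discussion), Python's own `ast` module can normalize identifiers and then group statements by structure:

```python
import ast
import copy
from collections import defaultdict

class Normalizer(ast.NodeTransformer):
    """Rename every variable to a placeholder so that statements which
    differ only in identifier names compare equal."""
    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

def duplicate_statements(source):
    """Group statements by their normalized AST dump; return the groups
    of line numbers that occur more than once."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.stmt):
            # Deep-copy so normalizing one subtree doesn't mutate the tree.
            key = ast.dump(Normalizer().visit(copy.deepcopy(node)))
            groups[key].append(node.lineno)
    return [lines for lines in groups.values() if len(lines) > 1]
```

For example, `duplicate_statements("x = a + 1\ny = b + 1\nz = c * 2\n")` groups lines 1 and 2 together despite the renamed variables, while line 3 (a different structure) is left alone.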

------
huherto
It is not so much about the length of the method. But every method should do
just one thing and it should be clear what it is. That makes code more
understandable and reusable. Probably as a consequence methods are shorter.

~~~
mdemare
I read the same thing in "Clean Code" (amazon: <http://is.gd/mJ8g> )

I have two objections: First, what is "one thing"? Render a Doom 3 frame? Or
increment the "i" variable? The "Clean Code" author argued that a log
statement or a try/catch statement deserved its own method.

Second: says who? Where's the evidence that doing "one thing" is better than
doing more things?

In general I notice that there are many rules and best practices in
programming, without a lot of evidence, or even discussion of the drawbacks.

~~~
azanar
The first of your two objections is perhaps a confusion over the term "best
practice". It doesn't mean an ultimatum so much as something seriously worth
considering.

What constitutes "one thing" is a judgment call, and it varies tremendously as
should be evident from all the other replies here trying to define it
concretely. I imagine the definition changes a lot depending on how one's
catalog of discrete things change. The point is not to define this "one thing"
for you, it is to get you thinking about it.

A lot of the best practices that have been put forth are the empirical outcome
of repeated judgment calls by others. They notice a trend in their own
practice, and put it forth as being more of an emergent principle than a
coincidence. They might evangelize some, but it is a sign they are excited
about the idea. It doesn't mean they don't know there are drawbacks; they
might want to float the idea first and see if their concerns are legitimate.
If I floated all of the possible drawbacks of ideas I've had, I'd probably
scare people away, even though I overplayed a lot of them.

Oh, and for the second objection, there is data. I don't have it in front of
me, but I _believe_ that Steve McConnell cites data within Code Complete about
the length and scope of methods relative to error rate. Granted, one citation
does not constitute proof, but it's a sign Uncle Bob isn't alone on his
soapbox. :-)

But then, the point of all of this evangelizing, really, is to get you off
auto-pilot, and thinking about your work, not to give you the definitive
answers to everything. That you asked this question shows you're thinking
about it. Work hard to keep doing this, it's more important than any rule or
best practice.

~~~
tonystubblebine
You are right that the place to go for data is Steve McConnell's Code
Complete. He does a great job of summarizing whatever research is available.

------
neilk
I wonder if the same stats apply to stateful versus stateless languages.

I'm working a lot with PHP these days (not my favorite language) and for
historical reasons we even avoid the use of objects. I've noticed that even
experienced practitioners in the PHP world write super-long functions. Forget
200 lines, more like 1000 or even 2000. When I try to refactor these, I
usually find that I start causing worse problems, passing values around all
over the place, using globals and so on.

It seems to me that if your language is relatively underpowered, a long method
can help contain the messiness.

~~~
randallsquared
Sometimes there's a local minimum, where you really need to rethink the whole
chunk you're working with to come up with a better design. If you're in this
situation, which you can get to gradually, there might be no easy way to
refactor a piece at a time. But this isn't (ever in my experience, at least)
because a super-length function is really the best solution, but because the
design wasn't great in the first place.

~~~
eru
When I last tinkered with PHP they did not have lexical scoping. That makes
refactoring stuff out into separate functions a pain in the head.

~~~
neilk
They still don't.

And, as I pointed out above, we don't even use classes. This might seem crazy,
but this project was started before PHP had actually decent objects, and now
all the special workarounds have become part of the house style. So I'm living
in a world of pure procedural programming right now. It's the 70s again.

------
nickh
In my experience, there's a "sweet spot" for how much functionality a method
"should" have.

If a method is "too short", it:

\- may result in methodA calling methodB calling etc., increasing complexity;

\- may not make it readily apparent how it fits into the problem that's being
solved.

If a method is "too long", its:

\- complexity increases;

\- ease of testing decreases;

\- frequency of reuse decreases.

Obviously, some methods have little functionality, and thus require very few
lines. For the majority of the other methods, though, I tend to keep them
below 25 lines. If a method grows past 15 lines, I examine it to see if
there's any functionality that feels like it should be broken out into a
separate method.

------
curtis
I think focusing on method length is focusing on the wrong thing. Some of the
worst long methods I've seen weren't bad because they were long, they were bad
because they had about eight or nine levels of nesting. When a method ends
with

    
    
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    

you know you've got a problem. Breaking such a method up into smaller methods
would make each shorter, but the big win would come from eliminating the deep
nesting.
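
Deep nesting often responds better to guard clauses than to extraction alone; a minimal Python sketch (the business logic is hypothetical):

```python
def ship_order(order):
    """Deeply nested version, sketching the problem."""
    if order is not None:
        if order.get("paid"):
            if order.get("in_stock"):
                return "shipped"
            else:
                return "backordered"
        else:
            return "awaiting payment"
    else:
        return "no order"

def ship_order_flat(order):
    """Same logic with guard clauses: each failure case exits early,
    so the happy path stays at one indentation level."""
    if order is None:
        return "no order"
    if not order.get("paid"):
        return "awaiting payment"
    if not order.get("in_stock"):
        return "backordered"
    return "shipped"
```

Both behave identically; only the second can be read top to bottom without a mental stack.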

~~~
michael_dorfman
Fortunately, we've got McCabe's Cyclomatic Complexity to guide us here.

------
russell
I have two perspectives. Back in the 80's, I and two others developed a word
processor. We aggressively refactored the code so that the average function
length was less than 10 lines. The code was smaller, faster, and supported
more users than any of our competitors. The key was that there were only 3 of
us, all of whom knew the code thoroughly.

Fast forward 20 years and I am flying solo on a vast
JBoss/Spring/Struts/Hibernate project. I just added a feature with 6 lines of
functional code, along with another similar piece in a DAO to get the data out
of a database. Essentially, I was mapping a key to an inventory item to a key
to a document. The feature was about 80 lines because of Java baggage and
Spring injections. In addition I had to modify another 10 files of interfaces,
implementations, struts actions, DAO's, a jsp, struts xml, and spring xml.

Another place where there is a needless proliferation of small routines is
the Java method overloading model. A common case is a method with several
parameters, most of which have common defaults. You often end up with 4
signatures, but can easily end up with more if you support, say, the name of
an object as well as the object itself.
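
For contrast, Python's default arguments collapse what Java overloading spreads across several signatures; a hypothetical sketch:

```python
# Hypothetical example: in Java, giving most parameters common defaults
# typically means 4+ overloaded signatures chaining into one "real"
# method. One Python definition covers all of them.

def find_item(key, by_name=False, limit=10, include_archived=False):
    """One signature instead of a family of overloads."""
    return {"key": key, "by_name": by_name, "limit": limit,
            "include_archived": include_archived}

# Callers override only what they need:
#   find_item("widget")
#   find_item("widget", by_name=True)
#   find_item("widget", limit=50, include_archived=True)
```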

------
geebee
I haven't read the studies referenced here, but I'd be concerned about
mistaking correlation for causation here. Long methods may or may not be bad,
but the temptation to write them usually arises when the programming task is
already complex. The error rate may reflect the complexity of the task, rather
than the approach (long methods vs short ones).

~~~
rgoddard
If I am making a change to long method I can understand what this method is
doing. I am more likely to take care in making changes because I can see the
potential problems and I have a visual indication of the complexity of the
code. Also I might be less likely to make changes unless necessary because of
the complexity.

When working on a shorter method, changes are easier to make and the context
and complexity are hidden. The side effects of the changes are harder to see,
so I am more likely to make a change which can break things and not know that
it will cause problems.

~~~
geebee
Unit testing _should_ provide a measure of protection against this kind of
problem (side effects of changes). The actual experience of writing, updating,
and running the tests adds its own complexity, though, so I'd view them as a
big help, but not an easy to apply solution.

------
10ren
I was also struck by seeing actual studies cited, in the usually fashionable,
religious and cliquey schoolyard of software development.

I wonder why there aren't any studies more recent than 1991. That's 18 years
ago! Before Java, before C#, before Ruby, before JavaScript, before PHP (C++,
Perl and Python had already started).

Perhaps because everyone involved has a barrow to push (including academia),
no one has an interest in undertaking impartial and objective research - after
all, it might give the wrong answers.

------
chollida1
One idea I have about why 200 was chosen is that back then the languages that
were commonly used weren't as expressive.

You can often say the same thing in Python or Ruby in fewer lines than you can
in Cobol, Fortran or C.

Or put another way, of those 200 lines of Cobol maybe 1/2 to 3/4 are setup
code that the language makes you write in order to be able to do what you
want, vs say Python, where a smaller percentage of the code is overhead vs
doing what you set out to do.

~~~
pdubroy
I'm not positive but I think that there have been studies that show that a
programmer's output in LOC per day is fairly constant no matter what the
language is. I could imagine this being similar -- maybe about 150-200
"operations" is the right number, no matter what level of abstraction those
operations are at.

That's an interesting point about the setup code though. When you have to do
explicit memory management, there may be a greater cost to dividing a long
method into several smaller ones.

In general, splitting up a function into smaller ones might just have a
greater cost in those languages. It's also possible the development tools are
a factor -- it's a lot easier to jump between various methods in Eclipse than
it is in vi (for me, anyways).

~~~
chollida1
> I'm not positive but I think that there have been studies that show that a
> programmer's output in LOC per day is fairly constant no matter what the
> language is.

That may be true, I have no idea. But the point would be that you can do so
much more with some languages than you can with others.

To make an extreme example compare writing a factorial function with Haskell
and assembler.

------
tel
Attributed to Einstein:

"Everything should be made as simple as possible, but not simpler."

(Also, is it just me or does a result like "long methods had to be changed
least" seem both rather spun and probably two-edged?)

~~~
sketerpot
That's the best phrasing of Occam's Razor I've seen in a while.

------
njharman
I try hard to stick to these "rules" for determining method/function size.

1) Keep methods/functions shorter than one "screenfull".

2) If I code the same or substantially the same logic twice, it gets
extracted to its own method/function. [I also do the opposite: if a
method/function is used only once, I remove it and inline the logic.]

3) Logic extraneous to the problem at hand gets extracted into a
method/function/language feature such as macros or decorators if it takes
more than a couple lines (logging, safely opening a file, wrapping in a
try/catch, etc.).

Less often:

1) Logic that needs its own unit test gets extracted.

2) If a comment precedes a block of code saying what it does, extract the
code, name the method/function after the comment, and delete the comment.

    
    
        # load the exception records from exception file
        bla bla bla for 10 lines...
        
        exceptions = load_exception_records_from_file(fh)
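
Rule 3 above (extraneous logic becomes a language feature) might look like this in Python, using a decorator to factor call logging out of the function body; the names are hypothetical:

```python
import functools
import logging

def logged(func):
    """Decorator that keeps call logging out of the function body."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logging.info("calling %s", func.__name__)
        return func(*args, **kwargs)
    return wrapper

@logged
def compute_total(prices):
    # The body now contains only the problem-domain logic.
    return sum(prices)
```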

------
sofal
If the methods have a clear purpose with well-defined inputs and outputs and
no side effects, then I think the trade-off is generally in favor of smaller
methods. Once you introduce side effects, longer methods then have the
advantage of showing you all the side effects in one place.
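
A small Python sketch of that trade-off (both functions are hypothetical):

```python
def apply_discount(price, rate):
    """Pure: the whole effect is in the return value, so extracting it
    from a long method loses no context."""
    return round(price * (1 - rate), 2)

audit_log = []

def apply_discount_logged(price, rate):
    """Impure: mutates shared state. Scatter many of these across small
    methods and the side effects no longer appear in one place."""
    audit_log.append(("discount", price, rate))
    return round(price * (1 - rate), 2)
```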

~~~
stewiecat
Great point. I've seen that developers that really understand OO tend towards
smaller, more encapsulated classes/methods w/o side-effects whereas developers
with a procedural background tend towards longer 'void' methods that lump
side-effects into one place.

------
jderick
I think these results may be misleading -- most the methods I know of that are
>200 lines are implementing some kind of large case statement. Often, these
are auto-generated files, not hand-coded ones. So it may not be a fair
comparison.

~~~
waldrews
Agreed; selection bias - presumably the code base analyzed was made by
experienced programmers with instincts tuned to creating smaller functions.

When we do observe larger functions, there must have been a specific reason to
do it, like a big case statement or text generation. The programmer is making
an exception, and there's probably a structural reason for it, the logic is
probably simpler, and extra attention will be paid.

------
silentbicycle
This depends a great deal on the language used.

Breaking a large function apart into several smaller pieces makes testing them
easier, and (even more importantly) lets one _name_ them. A well-named
function is generally better than a block of code in a function with a one-
line comment before it, IMHO. (Of course, there's nothing stopping you from
giving all of your functions names like doThatStuff() besides the eternal
scorn of your peers.)

It can be annoying in languages that require a lot of type annotations just to
declare a function and have poor support for nested functions, though.

------
djahng
I wonder why shorter methods would correlate with more errors. Maybe
it's, to some extent, similar to why so many people get confused by pointers:
they lose track of where they are in the program.

------
stewiecat
One aspect this piece doesn't mention is unit tests. I've found that writing
short, to-the-point methods helps with testability. I just wrote some unit
tests the other day for a 100+ LOC Java method: 300+ lines of test code to
set up the massive number of dependencies and go through all of the if- and
for- statements. It would've been a lot easier if it had been broken apart
into smaller discrete units.

------
code_devil
Methods should do ONLY ONE thing.

Example: a program to withdraw money from a bank account.

    
    
        getBalance()  // Does one thing
        
        if (getBalance() > withdrawAmount) {
            subMoney()  // Does one thing
        } else {
            print("Insufficient Funds");
        }
    

Another thing I would like to point out: if you are making the code of your
function visible, then it should be properly documented and readable. The
compilers do a really great job of Optimizing the code, so there is no need to
make the code look smart and have no one understand it. The only place I would
put real smart code is when I am coding in assembly. In that situation, there
should be really good documentation explaining the reason of your smartness.

Example:

A = 4

B = A X 2

[Smart Code: left-shift A by 1, but it's not always obvious that you are
trying to multiply by 2 here, unless you have an explicit comment]

------
jimrandomh
The correct size for functions is as short as possible without adding unneeded
complexity, and no shorter. A piece of code deserves to be a function if you
can coherently explain what it does using only terminology from the problem
domain, or if having it significantly simplifies other code. Otherwise, leave
it inline.

------
maurycy
It depends on a programming language.

In more dynamic programming languages, it is easy to create a method that
deals with abstract objects, e.g. iterables, and does not have to hold too
much knowledge.

To say nothing of how easy it is to test a short method that does only one
thing.

------
awt
I find long methods hard to read, especially if they have a lot of conditional
logic. Recently, I encountered some long methods that had very generic names,
such as "processData." One of the advantages of smaller functions/methods is
that you can give them better, more accurate names since their scope is more
limited. If one is unfamiliar with the code, this is a tremendous help. There
are few things as daunting as 100 lines of code with 10-20 different variables
all in the same scope. In such cases, I have to load it all into my head and
break it into smaller pieces to understand it anyways.

------
keefe
Short answer? No. Each method should have a clearly defined concern,
separated from the concerns of other methods, with an obvious, documented,
and tested contract.

------
gord
The classic counterexample is not using 'i' in for loops, viz.:

for(int nInnerLoopCounter = 0; nInnerLoopCounter < nMaxIterations;
nInnerLoopCounter++) ...

Seriously, had this in a formal coding standard, you never want to work in
those places - so make sure you go straight from university to a
{lisp|ruby|python|linux|YC}startup.

------
psranga
Forcing yourself to write short methods forces you to design your solution
cleanly. I wasn't a fan of short methods, until I observed a master
programmer's (who was also a co-worker) work for a while. After I adopted this
style, I've found my own code improving. Ymmv.

------
time_management
I prefer small-function programming, but I think there are needed caveats.

Code reuse and easy testability are obvious upsides of small-function
programming but, most importantly, interactive development is easier and more
fun. You can test and play with the code while you write.

One downside on the human-readable end of things is that small-function code
can easily devolve into spaghetti code if it's not done right. You actually
have to think about _where_ your functions are in the physical file, and have
them logically organized, in order to have legible code.

Appropriate line length of a function varies wildly by programming language.
In Lisp or Haskell, a 25-line function is usually considered a "monster" and
should, if possible, be broken up, since a lot is happening in those 25 lines.
Whereas a 25-line Java function is nothing unusual.

