
An overview of the science on function length - tziki
https://softwarebyscience.com/very-short-functions-are-a-code-smell-an-overview-of-the-science-on-function-length/
======
freetime2
I find that a lot of junior developers tend to go overboard with creating
small functions, and the motivation is largely aesthetic. They have a mental
image of what “clean” code is supposed to look like, and it doesn’t involve
lots of indentation, curly braces, mathematical notation, etc. So they often
try to bury these messy details inside a function.

They don’t have a lot of experience reading and debugging other people’s code,
so they might not consider how much of a pain it can be to have to go jumping
through a whole bunch of deeply nested one or two-liner functions just to see
what the code actually does.

~~~
CuriouslyC
Small functions are a form of semantic annotation when used properly.

If I'm working with code that modifies an image, I'd much rather see "rotate"
than a matrix multiply. Heck, if I'm dividing a number by some special value,
say to normalize it, I'd prefer to see "normalize" than a mystery division. I
do agree that I'd rather see x + 1 than "addOne" though, since that doesn't
tell me anything the code itself didn't.
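
To make the point concrete, here is a minimal sketch (hypothetical names, assuming plain 2-D points as tuples):

```python
import math

def rotate(point, angle):
    """Rotate a 2-D point about the origin by `angle` radians.

    The trigonometry is an implementation detail the reader of the
    call site never has to parse; the name carries the meaning.
    """
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))

def normalize(value, scale):
    """Name the 'mystery division' so callers see intent, not arithmetic."""
    return value / scale

# At the call site, intent is visible without jumping anywhere:
corner = rotate((1.0, 0.0), math.pi / 2)  # quarter turn
level = normalize(200.0, 255.0)           # map a byte value into [0, 1]
```

By the same logic, an addOne(x) wrapper would subtract clarity rather than add it.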

~~~
freetime2
If a block of code is short (a few lines or less), and doesn't get reused in
multiple places, I often prefer to add a clarifying comment rather than
extracting the code out into a separate function. That way the reader doesn't
need to go jumping around to see what the code does. And you can fit more
information on a comment line than you can in
aSuperDuperDescriptiveFunctionName.

------
sgk284
There are so many compounding factors here that the conclusion seems
unreliable. It also reminds me a lot of the streetlight effect[1], in that
defects may simply be easier to detect in smaller functions so that's where we
find them.

Many of the conclusions center around density and lines of code. This is weird
because if smaller functions lead to less overall code, then if you have 2
bugs in a 1,000 sloc codebase it'll measure worse than 6 bugs in a 10,000 sloc
codebase that does the exact same thing.
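
The arithmetic behind that point, as a quick sanity check:

```python
# Same hypothetical program written two ways: the shorter codebase has
# fewer bugs in absolute terms, yet scores worse on defects per line.
short_density = 2 / 1_000    # 2 bugs in 1,000 sloc  -> 2.0 per ksloc
long_density = 6 / 10_000    # 6 bugs in 10,000 sloc -> 0.6 per ksloc

assert short_density > long_density
```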

It might be more valuable to compute the defect rate per cyclomatic or
Kolmogorov complexity, or some other measure that's independent of code length,
and then figure out how function size impacts those complexity measures.

[1]
[https://en.wikipedia.org/wiki/Streetlight_effect](https://en.wikipedia.org/wiki/Streetlight_effect)

~~~
leto_ii
I think you make a valuable point. If somebody writes concise, clear code with
little duplication, but still has a bug, their stats might look worse simply
because their code is shorter.

It's also not clear that bug rates should be the only/main thing to look at.
Complexity measures can also help to indicate if the code is easy to
read/extend. In my experience low cyclomatic complexity does correlate with
more readable code.

~~~
tsimionescu
> If somebody writes concise, clear code with little duplication, but still
> has a bug, their stats might look worse simply because their code is
> shorter.

Fortunately, the science here seems to indicate the opposite: the less code you
have, the fewer defects per LoC you'll have. This is also indicated in the
article, which mentions that both function size and function length were found
to be correlated with higher defect density, meaning that both may in fact
simply be estimators of overall code length - that is, overall code length
correlates with defect density per LoC.

> It's also not clear that bug rates should be the only/main thing to look at.
> Complexity measures can also help to indicate if the code is easy to
> read/extend. In my experience low cyclomatic complexity does correlate with
> more readable code.

The article does actually cite some (necessarily small-scale) studies on this
as well, which have also tended to find that larger functions are easier to
read for debugging purposes, and that it was easier to add new functionality
to the system with larger functions; but that it was easier to modify the
existing functionality of the system with smaller functions. This is all in
the section entitled 'Practical Effects'.

~~~
leto_ii
> Fortunately, the science here seems to indicate the opposite: the less code
> you have, the fewer defects per LoC you'll have

I judged things conditional on the fact that a bug exists. If you apply the
technique of counting bugs/LoC blindly, you may end up favouring longer, more
tangled code. Think about it this way: if your goal is a low bugs/LoC ratio,
then when a bug shows up you will be incentivized to fix it in a way that
increases LoC, instead of maybe simplifying things. This will lower the
measured bug density without actually producing fewer bugs.

------
tsimionescu
I think that this result is rather intuitive, even in the common paradigm of
'a function should do one thing and have a good name that describes that
thing'.

Essentially, the idea that code is easier to understand with small, well-
named, cohesive functions relies on the functions being bug-free themselves,
and on the relevant behavioral details being perfectly caught in the name and
types.

However, if small functions have bugs, it is pretty intuitive that it takes
more time to explore a call graph than a linear code listing to find that bug.

Furthermore, when chaining many small functions to achieve a more complex
functionality, bugs can more easily slip in the chain itself. For example, a
function may modify a list you pass in and also return it, when your chain
assumed it would return a copy. A sort function may be unstable when the
calling code assumed it was stable. A function may move to another thread when
the calling code assumes that locking is unnecessary.

All of these would be easier to catch if the code were inline, instead of
abstracted behind a function signature. Of course, this has to balance somehow
with not rewriting sorting procedures in every function of your code base.
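
The first pitfall in that list, a helper that mutates its argument while also returning it, can be sketched like this (hypothetical names):

```python
def sorted_tags(tags):
    """Looks like it returns a new list, but actually sorts in place
    and returns the same object -- exactly the kind of behavioral
    detail a short name and signature fail to communicate."""
    tags.sort()
    return tags

original = ["web", "api", "db"]
snapshot = sorted_tags(original)

# The caller assumed a copy, but both names point at one list:
assert snapshot is original
assert original == ["api", "db", "web"]   # the original order is gone
```

Inline, the `tags.sort()` would be visibly destructive; behind the signature, it is not.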

I think that the answer probably lies with some idea of separating library
code and application code. Library code should be composed of many small
functions, well documented and well unit-tested. Application programmers
should all be familiar with the library and its semantics. In contrast,
application code should probably favor larger functions and avoid ad-hoc
abstractions. If there is a small piece of functionality that seems re-usable,
it should usually not be moved to a separate function, but to a separate
library.

~~~
leto_ii
I'm not sure the first part of the article is based on a sound methodology.
For example, we don't have the number of classes for each bucket in the plots
[1] and [2]; perhaps there are only a few classes of one liners - probably not
a statistically representative sample.

It's also not clear what type of code is counted. I'm not convinced
boilerplate (setters, getters, simple constructors) should be taken into
account. It will artificially decrease the average method length in a class.
Imagine a class with 10 fields (hence 10 setters, 10 getters, maybe a few
constructors) and one method of 100 lines. The average method length will be
(100 + 20*1 from the boilerplate) / 21 ~= 6 lines of code. If there are
bugs in this class, they basically have to be in the large method; still, in
the statistics this would count as a class with pretty short methods overall.
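
The arithmetic, spelled out:

```python
# One 100-line method plus boilerplate drags the class average down
# to "short methods", even though all the risk lives in the big method.
boilerplate_methods = 20          # 10 getters + 10 setters, 1 line each
big_method_lines = 100

total_lines = big_method_lines + boilerplate_methods * 1
total_methods = boilerplate_methods + 1

avg_method_length = total_lines / total_methods   # 120 / 21
assert round(avg_method_length) == 6
```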

I'm also surprised by what counts as a long method - in my experience code
starts getting messy with lengths > 20 - 30. These methods don't even seem to
be represented in the data.

> application code should probably favor larger functions and avoid ad-hoc
> abstractions

Why favor large functions? Maybe allow them, but why encourage? Also, we
haven't established what large is (20 lines - perfectly fine; 200 lines - I
don't think this can ever be justified).

> If there is a small piece of functionality that seems re-usable, it should
> usually not be moved to a separate function, but to a separate library.

This does seem excessive. Don't you think there might be code that can be
locally reused? Does every little thing have to go to a library?

[1] [https://i2.wp.com/softwarebyscience.com/wp-
content/uploads/2...](https://i2.wp.com/softwarebyscience.com/wp-
content/uploads/2020/08/Screen-
Shot-2020-08-24-at-9.13.53-PM.png?resize=768%2C564&ssl=1)

[2] [https://i1.wp.com/softwarebyscience.com/wp-
content/uploads/2...](https://i1.wp.com/softwarebyscience.com/wp-
content/uploads/2020/08/Screen-
Shot-2020-08-24-at-9.15.45-PM.png?resize=768%2C563&ssl=1)

~~~
tsimionescu
Your points about getters and setters are valid, and may be skewing the
results for Java especially. However, I believe they should be expected to be
skewing them in the direction of improving the numbers for very small methods,
not harming them.

> I'm also surprised by what counts as a long method - in my experience code
> starts getting messy with lengths > 20 - 30. These methods don't even seem
> to be represented in the data.

This is a good point, and I believe it is the main reason the article talks
about very short functions being sub-optimal. I think almost everyone would
agree that a function in the range of 1-5 lines of code is very short, while
deciding whether a 20-LoC method is short or long is going to be more disputed.

> Why favor large functions? Maybe allow them, but why encourage? Also, we
> haven't established what large is (20 lines - perfectly fine; 200 lines - I
> don't think this can ever be justified).

My idea is to favor expressing serial logic serially. That is, if you have to
do A then B then C then D, prefer to write it that way, rather than A, then
foo() [which does B and C], then bar() [which does D]. This implicitly favors
large functions over short ones, assuming that the code is required at all, of
course. Short code is still preferable to long code, but at the module level,
not the function level.

> This does seem excessive. Don't you think there might be code that can be
> locally reused? Does every little thing have to go to a library?

There might be, of course; this is not about strict commandments. But I think
that, looking at many code bases, there are numerous functions called only
once, which essentially act only as comments. A lot of the time, it may be
worth it to refactor the code more deeply, and instead of simply extracting a
piece of a larger function into a non-reused smaller function, to try to
extract some common pattern into a library, and to keep the business logic in-
line in the original function.

Just as an example, if you have something like

    
    
        for (int i = 0; i < n1; i++) {
            for (int j = 0; j < n2; j++) {
                arr3[count++] = arr1[i] + arr2[j];
            }
        }
    

As part of a large function, instead of extracting this into an
'addArr1AndArr2' function, to extract the general pattern and leave something
like this in the original context:

    
    
        arr3 = arr1.flatMap(a1 => arr2.map(a2 => a1 + a2));
    

Of course, this will not always be possible, and sometimes a function can just
become too long even in serial logic. But a lot of the time, the effort of
reducing line count by extracting or applying library functions will be more
worth it than simply reducing function line count by more simplistic
refactoring.

------
asgard1024
I am glad that somebody is trying to challenge the conventional wisdom.

This might also be relevant: [http://number-
none.com/blow/john_carmack_on_inlined_code.htm...](http://number-
none.com/blow/john_carmack_on_inlined_code.html)

------
ajmurmann
I saw this discussed elsewhere where it was pointed out that Java has a
tendency to have objects with lots of small getters and setters. This might
effectively be a proxy metric for objects that do too much and expose too much
of their state.

------
cjfd
When there is a correlation between A and B it could be that A causes B or
that B causes A or that A and B have a common cause. It could also be that the
correlation is just a coincidence.

I tend to write longer functions when the thing being done is simple and
shorter ones when the thing being done is complex...

------
outsomnia
It's true that one huge function repeating things and excessive fragmentation
into subfunctions are both less than ideal to maintain, and there's a desirable
middle ground. But

>> Very short functions are a code smell

that's definitely not unconditionally correct. There are many cases where
dereference helpers of the form x_to_y(x), wrapping type conversions that
might go through three or four levels of struct members, are going to eliminate
mistakes in bulk code. They may be done as preprocessor defines or as inline
functions, but either way they are usually a green flag, not a red flag.

~~~
eesmith
You quoted the title, but the title isn't the complete conclusion, which is:

> As such, software developers should be wary of breaking their code into too
> small pieces, and actively avoid introducing very short (1-3 lines)
> functions when given the choice. At the very least unnecessary single-line
> functions (ie. excluding getters, setters etc.) should be all but banned.

I believe an x_to_y(x) fits into the "etc." of acceptable single-line
functions.

------
neilwilson
Does anybody do coupling and cohesion any more?

That was always the dividing line I was taught. A cohesive function you could
name to increase the abstraction level and stop your brain overloading.

~~~
brundolf
Something I didn't appreciate until several years into my career is that even
decoupling has a cost. By adding a layer of abstraction, you're making a bet
that the mental overhead of tracking that new idea is smaller than the mental
overhead of tracking its implementation directly. You're introducing a _new_
concept that the programmer didn't have to think about before, in hopes that
it allows them to _ignore_ a larger set of information some of the time. For
tiny functions, especially if their concept isn't already familiar to the
programmer, this may often not be a good tradeoff.

------
roughly
I'm unconvinced.

Two notes - first, the experimental evidence seems to involve taking people
unfamiliar with the codebase and asking them to debug or investigate code with
either long or short functions. I can certainly conceive of it being easier to
learn and debug a single long function than multiple smaller ones if one is
unfamiliar with the codebase. I'd expect that effect to go away once one is
asked to debug the same code again.

Second, the article asserts more bugs are found in shorter functions. However,
this is only taking lines of code into account, not functional complexity. As
code lives, bugs are found and corrected; often those bugs are to do with
unforeseen runtime circumstances. Joel Spolsky covered this memorably a couple
decades ago [1]. Again, taken as a static snapshot, I am not surprised that
short code correlates with bugs, but I don't think that gives us any
meaningful information about how to code. A codebase is not a static entity.

As pointed out by several people already, the point of short functions is
abstraction and comprehensibility - we want one and only one place to talk
about reading a file, one and only one place to talk about handling a
particular message type. Similarly, we want to know when we look at a function
that it does one and only one thing, and we want to know when we look at our
code that what it says it's doing is what it's actually doing.

I think this is one of those studies similar to the "judges sentence harsher
before lunch"[2] study - the type of effect discussed fails under any
reasonable comparison to commonly-experienced reality. Consider your own
personal experience working in codebases and how this article aligns with that
- phenomenological reporting can indeed be suspect, but when it contrasts
starkly with experimental results it often indicates poor study design.

[1] [https://www.joelonsoftware.com/2000/04/06/things-you-
should-...](https://www.joelonsoftware.com/2000/04/06/things-you-should-never-
do-part-i/)

[2] [http://nautil.us/blog/impossibly-hungry-
judges](http://nautil.us/blog/impossibly-hungry-judges)

(I hope I can use this level of motivated reasoning to critique the next study
I agree with, too.)

~~~
legulere
> the point of short functions is abstraction and comprehensibility

If you have clean-code-style short functions you get none of that. There you
split functions into more functions just for the sake of it. You end up
with function names that describe the code worse than the code describes
itself, similar to the comments beginner programmers write. Why is a comment
"Increments i by one" for i++ not OK, while a function "incrementIByOne" is
supposed to be good style?

Even if you don't make functions clean-code-style ridiculously short,
abstractions still come with a cost. You have to weigh costs and benefits
each time.

~~~
Someone
_“Why is a comment "Increments i by one" for i++ not ok, while a function
"incrementIByOne" is supposed to be good style?”_

It isn’t, but _nextOrderNumber_ could be, especially if it is used in multiple
places. That function could easily evolve into _doing_ “increment i by one,
atomically”, “generate a random UUID”, “get a new unique ID from the
database”, etc.
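
A sketch of that evolution, with hypothetical names; the call sites keep calling the same function while the body grows from a bare increment into locked shared state:

```python
import threading

# Version 1: trivially "increment i by one".
_order_number = 0

def next_order_number():
    global _order_number
    _order_number += 1
    return _order_number

# Version 2: same name, same call sites, but the body has evolved to
# take a lock so it is safe to call from multiple threads.
_lock = threading.Lock()
_order_number_v2 = 0

def next_order_number_v2():
    global _order_number_v2
    with _lock:
        _order_number_v2 += 1
        return _order_number_v2
```

The name describes a stable contract, which is what lets the implementation change underneath it.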

------
jake_morrison
This post describes a process of decomposing code into very simple functions:
[http://www.gar1t.com/blog/solving-embarrassingly-obvious-
pro...](http://www.gar1t.com/blog/solving-embarrassingly-obvious-problems-in-
erlang.html)

This results in functions which are, I feel, a bit too small for comfort. On
the other hand, Erlang's pattern matching on function parameters has an effect
similar to Eiffel's design by contract. It has the potential to reduce the
number of tests that need to be written (many of which are often very short
functions).

Idiomatic Erlang/Elixir code tends toward smaller functions and pattern
matching on multiple function heads instead of if/then/else logic.

------
aaron-santos
Is it difficult to believe that functions can be "too small"? I've found
myself in both code bases with enormous 5kloc functions as well as code bases
where I had to juggle 10 1loc functions in my head in order to figure out what
the hell was going on. Both are their own forms of torture.

A 5-LoC-per-function sweet spot sounds about right.

~~~
christophilus
Extremes are generally suboptimal, but the prevailing wisdom seems to be to
have many small functions, so I’m all for a study that pushes us back to the
middle.

------
awinter-py
Hmmm I wonder if test coverage is a factor here

One long function gets run every time you change any piece of it, whereas a
more complex call graph can be edited without testing every permutation

(yes there can be branches in the function too)

------
gumby
I like this author:

> Time to scare the wikipedia editors among you and do some original research.
> All the code and data can be found at...

------
tcldr
By this measure, Haskell is an abomination and Combinators are a 'code smell'.
Not sure I agree.

------
zelphirkalt
The heading is sort of misleading.

> To put it plainly, if we have a long function and split it into smaller
> ones, we’re not removing a source of defects, but we would simply be
> switching from one to another.

Of course ... But when you use short functions, you try to abstract things and
make these functions usable in general cases. Who thinks that simply
splitting up a long function with zero changes fixes any bugs? Hopefully very
few people. The only advantage in that case would be that one can think about
what the smaller function is doing in its own context, and that might
help to find a bug _and then fix it_ by changing the logic. By then we have
already done more than simply splitting up the long function, though.

Long functions usually do more than one thing to achieve a more complex goal,
which requires multiple steps. That makes them less reusable. They
become too specific in what they achieve. If one's naming skills (those are
very important) are not up to the task of naming long functions precisely,
then perhaps one will forget that this long function does something very
specific internally to reach its goal. It will silently have lost composability.
Other developers certainly will not know, unless they read all the code. I have
seen atrocious 300+ LoC functions which serve exactly one purpose and
interact with so many parts of the system that I can read them 10 times and
still not grasp what they do in their entirety.

By not using small functions, one gives away readability as well. Every
function is a chance to give a name to some short program. Short functions can
be looked at separately, if written well. Of course, if you modify a lot of
global state in your short function then no one can help you any longer.

> All of the studies measuring defect density found increased defect density
> for smaller functions. One possible explanation was suggested by [2], who
> proposed that the increase in errors was due to what they called “interface
> errors” – that is, “those that were associated with structures existing
> outside the module’s local environment but which the module used.” This
> would include errors such as calling the wrong function.

Well, that usually happens when the naming is off or difficult to remember or
follow. That is why naming things is an incredibly important skill. The name
of a function should, if possible, give a good picture (following a convention
if there is one) of what the function is doing. Do not tell me that the 100-LoC
function can be described in a single verb. It is most likely doing much more
than one thing.

So in general I am not convinced by how the title and intro of the article put
it. If we look at the article, the title is also actually different from the
one here. It says "Very short [...]", not "short".

They would have to limit the scope of their study a lot, to make a good point.
For example:

(1) procedural code / mainstream "every noun is a class" code / functional
code

(2) the kind of expertise of coding people allowed to take part (Was the code
written by capable people?)

(3) the programming languages they look at

It seems that this research might be biased by what code they looked at. Some
programming communities are not as well known for keeping code clean as
others. They are often associated with "only learned one programming language
ever, never widened their horizon to a different methodology or paradigm".
Those are mainly found in languages which are very widely used and sit at the
top of programming language usage lists like TIOBE. The reason is that
being at the top of the list is used by many as a justification for not needing
to learn a different tool. Many people stop learning. Here I quote the
article:

> However, nowadays the vast majority of functions are under 50 lines. A quick
> analysis of Eclipse, a popular open source IDE, reveals it averages about
> 8.6 lines per method in its source code.

This relates to an IDE which is mainly used for Java.

> This shift in function sizes is perhaps partially due to changes in
> programming languages. In the 80s a Fortran “module” was commonly considered
> a function and some variables (see eg.
> [https://www.tutorialspoint.com/fortran/fortran_modules.htm](https://www.tutorialspoint.com/fortran/fortran_modules.htm))
> and function was the basic building block of software, whereas nowadays most
> Java or C++ programmers would define “module” as a class consisting of
> multiple functions.

They do that because, for a long time, their languages were so limited that
they did not have any other means of expressing a module. This also
discouraged a style which does not use objects, but only functions. Add to
that that neither Java nor (I think) C++ has TCO, and you have yet
another reason why Java or C++ code is not a good argument when talking
about short functions. Java has modules now, as far as I know. Not sure
developers make proper use of them.

I would like to see the same research being done on a language like Haskell or
Scheme, one language at a time.

~~~
tziki
>So in general I am not convinced by how the title and intro of the article
put it. If we look at the article, the title is also actually different than
here. It says "Very short [...]" not "short".

The word "very" was automatically stripped by Hacker News. I'm guessing it's
some system to tone down clickbait titles?

>By not using small functions, one gives away readability as well. Every
function is a chance to give a name to some short program. Short functions can
be looked at separately, if written well. Of course, if you modify a lot of
global state in your short function then no one can help you any longer.

I disagree about readability - I think the empirical experiments show enough
evidence that short functions are not more readable than long functions that
we can't take it for granted.

For reusability, you might be right, and maybe that's why short functions did
seem to perform better when modifying existing functionality.

>Well, that usually happens when the naming is off or difficult to remember
or follow. That is why naming things is an incredibly important skill. The
name of a function should, if possible, give a good picture (following a
convention if there is one) of what the function is doing. Do not tell me that
the 100-LoC function can be described in a single verb. It is most likely
doing much more than one thing.

Sure, but naming things well is also a difficult skill, which means not
everyone can do it well. If we look at programming as a worldwide phenomenon,
the vast majority of programming is done in English. However, the majority of
programmers aren't native speakers, meaning they face an even steeper climb to
become good at naming things. There will always be cases where programmers
misname functions, or don't update the name when changing functionality, or
make typos, and I think the data here shows that things like that do happen.

>I would like to see the same research being done on a language like Haskell
or Scheme, one language at a time.

I do agree that all the research focuses on mainstream languages, and the
results might be different for functional languages. However, the point of
this post is to look at science and there's simply no science there. There
aren't any bug datasets focusing on functional languages either.

The point of this post is to gather what we know about function length and
what we don't. I feel you've pointed at many of the areas we don't know much
about - and maybe we'll get there one day. However, for now, we need to look
at what we know and what kind of generalizations we can make from what we
currently know.

~~~
jghn
> I think the empirical experiments show enough evidence about short functions
> not being more readable than long functions, that we can't take that as
> granted.

Pure anecdata on my part, but my observation over the years is that developers
tend to cluster on this, with high correlation to whether they're a top-down
or bottom-up thinker when it comes to coding: top-down devs tend towards
smaller functions and bottom-up devs tend towards larger functions.

If there is any validity to my anecdata, my hypothesis has been that the
top-down/smaller-function crowd is more likely to trust the underlying
implementation of sub-functions for one reason or another, while the bottom-
up/larger-function crowd wants to be able to visually verify how things work.

