
Ask HN: How do you deal with 2000 line functions? - pinkunicorn
I&#x27;m working with an archaic code base and I&#x27;m stuck at trying to add new business logic into a 2000 line function that is absolutely crappily written with nothing but if() conditions in it. Given such a scenario, how do I go about adding a new functionality to this mess? Should I tell my manager that the function needs to be refactored and take time to refactor it?<p>Edit: Removed offensive word
======
ndesaulniers
1\. Unit tests. Otherwise modifying old code is full of potentially unforeseen
landmines.

2\. Read the function a few times to get a sense of what it's doing.

3\. Break the function into blocks of statements that are performing a related
task.

4\. Factor those out into smaller functions.

5\. Recursively repeat 3-5 until you're satisfied with the new functions' line
count.

> Should I tell my manager that the function needs to be refactored and take
> time to refactor it?

It's absolutely worth mentioning. Time estimations are one of the harder tasks
of Software Engineering. Being up front with your manager can help better set
expectations.

~~~
rrauenza
I would add assertions in as well as you iterate. It will validate whether you
understand the flow of information through the function.

~~~
phpnode
Absolutely this. Not only do assertions make it easier to validate your
understanding, they capture and preserve that understanding for future
developers (including yourself, 6 weeks from now).

Assertions are comments which can always be trusted and which never get out of
date. It baffles me that there's so much reluctance to use them (especially in
dynamic languages).

------
tinco
Don't tell you manager anything, just do it. If someone tells you to add
functionality to a 2000 line function, it takes extra time. If you want the
company to benefit a little bit more from the time you spend on grokking it
(by saving the next poor soul some time) by splitting it up.

Go to the part that you suspect you need to modify, identify a block of code
that together forms something you can give a name, i.e.
"initialize_doodad_struct", "validate_widget_params", pull it out put it in a
function. Now look at all identifiers that are referenced or declared in your
block of code, do a ack or ctrl+f through the 2000-N line function to 100%
verify that they're not referenced anywhere. If they aren't you can safely
encapsulate them in your new function (if you're working in a static language
you probably could just right-mouse-click-extract to do all this).

Then _immediately_ verify that the code still works. Run it, ideally in unit
tests but most certainly also in real world(-like) scenarios. Do this for
every small step. If your codebase takes 2 hours to compile, instead make a
commit. Make a commit for every single 3-10 line extraction you do. (do it
anyway, it'll look good and everyone will love you)

And then repeat until either the part you wanted to modify is clear to you, or
there's nothing more to extract ;)

By the way, I'm a Ruby programmer mainly, and in Ruby any function over 10
lines is frowned upon. I know that in some programming language communities
(looking at you C and C++!) a 200 line function doesn't raise any eyebrows,
and they'll sometimes even scold you for extracting a function if it doesn't
'add' anything. I think this is because in C every function pollutes the
global namespace, so there's a little more cost associated with it.

~~~
radicalbyte
If you write C/C++ then you probably need fast code. Function calls - unless
inlined - have a small cost. The cost is really rather small (think memory
writes) and can be negligible depending on where the values are stored.

That is of course rarely the reason. From what I've experienced it's usually
that the developer is uncomfortable with abstraction (at least that was my
reason many moons ago).

~~~
joshka
If speed of algorithm / function is a concern on particular functions, write a
test that measures this. Refactor as much as you like until you hit the limit
of readability vs adequate speed.

------
mod
You test the shit out of it, then re-write it ensuring tests pass, then add
your new code.

It's probably easier to do that than to muck around in the 2000-line version
hoping you don't screw something else up.

~~~
NDizzle
Yep, this is the only way to handle these beasts. Don't touch anything. Make a
ton of tests. Spend some time validating those tests. Get someone to write off
that those tests are sufficient. Start refactoring!

~~~
falcolas
For a function with 2,000 lines of code, we have to be honest with ourselves
and accept that testing will never be sufficient; if there's 2,000 lines of
code, you can bet good money that there's global state manipulation as well.

Making the assumption that anybody is capable of sufficiently testing such
functions and subsequently re-writing them will only introduce new bugs and
old regressions.

Seems harsh, but I've had to work with a few such monstrosities. Global state
galore.

~~~
johnmaguire2013
Global state is one of the trickiest problems we are trying to solve at my
workplace. Our entire, 10 year old, PHP codebase relies on a single static
class (and now some supporting static classes) aptly called "Meta". What does
it do? Meta things.

It's basically an abstraction over the entire database layer, that has
probably over 30 toggles that denote where and how it should fetch data, given
a table name and a function call.

And every database call in the entire project is dependent on it.

------
falcolas
Slowly and carefully, like you're defusing a malfunctioning clockwork bomb.

1) Understand at a very fundamental level what it does (there are probably
multiple things).

2) Understand at a very fundamental level what it does (this is really the
hard part)

3) Document your hard-earned understanding (tests, ironically, do not work
well for this, since there are no fundamental units to test at, and there
likely is a metric ton of shared state)

4) Refactor it into something easier to understand and test.

I've had to work with a similar 5,000 line C function at the heart of a DSL
parser; literally the core of the entire business. Changes were made very
slowly, and with a lot of consideration. Refactoring is still underway 6 years
later.

------
kpil
Pft, 2000 lines :-)

What you do is that you:

A) Realize that there might be years of hidden bugs, but also fixes to complex
problems, speedups, kludges, weird changes to requirements, etc, hiding in the
code, so even really creative unit-test might not cover real-world edge-cases.

B) Take as small bit as you can, and just refactor out the smelly code into
smaller parts. Often a good name around smelly code helps a long way.

D) Iterate. Until good enough.But not more... Maybe you have more important
things to do than rewriting ugly code that works.

There is a big risk that since you probably won't understand everything that
goes on, and why - you will break something important.

------
mh8h
Michael Feathers' book, Working Effectively with Legacy Code, is highly
recommended. In that book he explains that you can choose between two
approaches: Change and pray, or Cover and modify. Then he provides dozens of
different strategies to "cover" the existing legacy codes before you make the
changes.

~~~
DanielBMarkham
+1 on the Feathers book recommendation.

The good news is that many, many, many people have been here before you. While
you may have a lot of work ahead, you can do it safely and in an orderly and
safe fashion. (And with something as butt-ugly as that sounds, that's the only
way I'd do it)

------
azeirah
Something I found to be really useful is to remove all statements inside the
code, so you'll be left with large skeleton structures. Look at it from a
distance, (small font), see some patterns. Can you split it up?

getting a feel for the structure of the function helps me understand it.

Ex:

function someFunction () {

    
    
        if () {
            for () {
                if () {
    
                } else {
    
                }
            }
    
            if () {
                for () {
                
                }
            }
        }
    
    }

~~~
pinkunicorn
Ah I do this with if() block folding in Vim.

In this case, the developer actually has return statements inside some if
blocks. A general point that has stood out from all comments is to first
understand how the function achieves what the function is trying to achieve.

~~~
odonnellryan
A return point inside an if statement is a great place to do a refactor.
Refactor anything inside that if into its own function, write a test, and see
what happens :)

------
yk
Depends, but very likely. There may be cases were there is practically no way
around a 2000 line function, one example would be a something like a very long
switch-case statement, were the logic is very simple, but there are a lot of
cases to cover. For example something equivalent to (python pseudo code)

    
    
         dict[ keyword](args)
    

were dict is a large dictionary containing function objects. In that case the
dictionary approach has the advantage of separating the logic from the
clutter, but it is possible to work with 2000 lines of

    
    
         case something:
             foo();
             break;
    

However if it is not such a case, then you are in trouble. Thing is,
management does not like to be told that the code is a steaming pile of shit,
and management is especially unhappy if you tell them that you are going to
spend a few month on accomplishing absolutely nothing. From their perspective
refactoring is accomplishing nothing: the functionality is there, it works and
they do not care about the gory details. So you need to convince your manager
that refactoring is a good idea, which just goes against every basic
assumption of his job. So think about the actual problems you have with the
code, and what the company gains by refactoring it. (Easier extensibility,
less risk of bugs and faster turn around if a bug occurs.) Then make sure that
you convince your manager that refactoring is the right thing to do. ( Ideally
you should enlist your coworkers for that, if they know the kinds of
problems.)

Speaking a bit more generally, this is part of why technical dept happens.
Selling refactoring is hard, so everybody tries to tip-toe around the 2000
line gorilla until some unhappy soul (read someone else) can no longer avoid
to wrestle with it.

------
lutorm
I like this book, it has a lot of tips for situations like these:

[http://www.amazon.com/Working-Effectively-Legacy-Michael-
Fea...](http://www.amazon.com/Working-Effectively-Legacy-Michael-
Feathers/dp/0131177052)

------
noonespecial
I've had worse. I was instructed to add to the > 2000 line function of nested
if/than/else but the function was written _by_ the manager who said "Don't
change any of my code, its been working perfectly".

This function was used as a container to hold _all_ the business logic and
flow control. It had dozens of parameters that all had to have some value or
another passed to keep from throwing undefined errors. There was an integer
called "wat_to_do" that would choose different blocks of if/than/else inside
the function. Not like case, mind you, scattered everywhere.

I've never really been the same since then. Can you get code PTSD?

------
joshka
A few people have mentioned Michael Feathers' book Working Effectively with
Legacy code already, but I'll add my piece on this. The book describes legacy
code as any code without tests.

It then goes on to describe the "legacy software change algorithm" (google
that for more articles that will help you. Get targeted legacy code into test
harness and cover with tests: 1\. Identify change points for the target change
or new code addition. a. Find test points where the behavior of the code can
be sensed. b. Break dependencies (very carefully and often without sufficient
tests) and get the targeted legacy code into a test harness. c. Cover targeted
legacy code with (characterization) unit tests 2\. Add new functionality with
Test Driven Development (TDD) 3\. Refactor to remove duplication, clean up,
etc.

It really is worth reading this book as it covers almost exactly your
question: "22\. I Need To Change a Monster Method and I Can’t Write Tests for
It." and various other relevant chapters.

Answering your manager question with a standard consultant answer "It
depends". Refactoring code is about setting yourself up for future
development. You should always be refactoring your code after you confirm it
works. Asking your manager whether you do it should take on the same amount of
importance as asking whether you should wash your hands after visiting the
restroom. That said, developers can often feel a sense of ownership of code
and not wish change for various reason (usually familiarity with the current
crap situation).

~~~
joshka
Additionally to this. In your refactoring process, stick a commit on each step
of the refactoring. It will help you go back to a point where you were happy.
Squash the commits when you're done if needs be for cleaner history, or leave
them if there aren't a bunch of people on the project.

------
davimack
I would actually approach this differently. I'd go back to whomever the
authority is with regards to what the function is supposed to do, or to the
spec, and then rewrite it from scratch. This won't ensure that it behaves as
it does now but with a little bit added - it will ensure that it behaves as
per the specification / user requirement. If the function is that huge, with
just if's, it's very likely that it's buggy and that those bugs haven't been
addressed. They won't be addressed if you duplicate the functionality in a
refactor unless you can get a handle on the purpose of the thing to begin
with. Scrapping it and rewriting it is not a bad thing - frequently, you'll
make something much tighter if you do that.

(note: would be happy for you to have left the "offensive" word in - 2,000
lines of code for a single function screams for a refactor, and says that the
original dev had no clue how to write good code. if the code makes you want to
cuss, well....)

~~~
joshka
-1 on this. 2000 lines shows that there's a good history of edge cases that are potentially relied on by various other parts of the application or system. Much of the time, the code _is_ the spec. Writing tests for this function will allow you to rewrite an equivalent better function safely, as well as discover functionality that is hidden / ambiguous from the spec, and bugs that may trip you up in the second iteration (whether generated via refactoring or rewriting).

------
alkonaut
I think making small opportunistic fixes is best. Do what you can do without
massively delaying your work _every_ time you need to work in that area of the
code. That is, leave the code better than you found it, each time.

If you add the new logic this time and also split it into four 500-line
methods and maybe a few extra tests, you have done quite a lot and it is
likely doable within the time frame of the work you are doing.

But

If there are zero tests and you aren't confident that you can create those
tests then I'd bring it up with management. Maybe they'll say that changes in
this functionality will be very rare after this small change and that it's
very well tested manually and by customers - then that's it. Not all code has
to be made concise and elegant, it's better to focus your effort on the code
that regularly needs _changing_. However, in this case I'd ask management to
consider not changing it at all, or make sure it will be very well tested
manually after your changes).

------
wnevets
I would add landmarks and do small refactoring within the function until I'm
confident in my ability to start splitting it up into smaller functions. This
should happen naturally as you make changes and add new features. However if
you're not in the function enough to make this happen then it's probably not
worth refactoring to the business.

------
ufmace
Start out by studying and understanding the function, the way that it's used,
and everything that it does. Does it mess with a lot of global state? Is it
called on one place for one thing, or a whole bunch of places, or even for
multiple different reasons? How hard would it be to set up some unit tests?

I would recommend against any kind of rewrites or radical refactoring right
away. You probably don't know what it does well enough yet or what kind of
workarounds and bug fixes are in there, so you're likely to bring back bugs.
And not many businesses have the time to spare to actually redo something like
that correctly all at once.

Aside: You could just give up and quit, but I consider it part of the job of a
good developer to be able to take a mess of spaghetti code and gradually turn
it into something easily understandable without causing huge delays in
business processes or show-stopper bugs. I don't think a developer who is only
willing to touch pristine code is very valuable in the real world.

I'd make the first goal to add the required changes with as few changes to the
function as possible. Focus your refactoring work at first on trying to get
some tests around this thing. Don't try to test every use case right away, but
do try to comprehensively test a few tasks, including capturing whatever it
does to any state in any other systems. Most likely, a lot of the initial
refactoring work will hardly touch the function at all, but instead focus on
putting the things that it touches into modular, testable parts. Getting good
tests around it will help you understand what it does better, what parts of it
are important, and how state is handled around the system.

Then as you go, start adding tests, and gradually refactor things that are
tested. Write tests for any bugs that are reported back to you. Anything that
looks weird, ask around and see if you can figure out what it's for and if you
really need to test for it.

------
rbrogan
You could start by writing comments. There is little chance of any bugs coming
from that. If there are any tests available then you could work on
understanding the tests and then step through the code in the debugger. Hope
that helps.

------
tienthanh8490
I know other people already suggested writing test, carefully documenting etc
so I won't repeat that. Just want to add one more thing: try to prune the code
first as such a large codebase is likely to have redundant code built up over
time. By doing this you will have some time to understand the code (to know
what you can kill and what you can't), and also save yourself some time as the
codebase you have to refactor now is now much shorter and probably cleaner.

------
agentultra
I've had to do this once. They don't teach you managing code like this! A
friend gave me a copy of _Working Effectively With Legacy Code_ [0] which
helped me.

The gist of it: a strong suite of integration and unit tests. Isolate small
code paths into logical units and test for equivalency.

[0] [http://www.amazon.com/Working-Effectively-Legacy-Michael-
Fea...](http://www.amazon.com/Working-Effectively-Legacy-Michael-
Feathers/dp/0131177052)

------
timonoko
I remember linear lists of 1000+ if-statements. These were "state machines",
popular form of real-time multitasking 50 years ago. Nothing wrong with that.
You might make an analyzer which recognizes longer series of events and
impossible states, but why bother. This is a programming paradigm, best you
can do is to go with the flow.

------
fallingfrog
We have a saying where I'm from - "That's the smell of money!" Relax, I'm sure
your boss knows that this is going to take a while. Make prudent changes,
don't go overboard, take your time and test thoroughly.

------
frozenport
Where a function ends and where a function starts is somewhat artificial. For
example, if this were BASIC it would be just be a label.

I would read the thing a few times and figure out exactly where to add the
functionality. Refactor second, if time permits. If the thing is 2k long
clearly, refactoring it isn't a priority.

------
there4
This is a great presentation about refactoring that may be useful: "RailsConf
2014 - All the Little Things by Sandi Metz"
[https://www.youtube.com/watch?v=8bZh5LMaSmE](https://www.youtube.com/watch?v=8bZh5LMaSmE)

------
lectrick
1) Cover code with tests 2) Refactor. Martin Fowler wrote a famous book on
refactoring patterns that are guaranteed to merely transform the code and not
break functionality. 3) Rerun tests, repeat till everything passes

------
tahssa
Do a re-write. If your manager says no to the re-write then add another if
condition to handle the current business logic and, at the same time, look for
a new job (which is probably what the last guy did).

------
scarface74
It depends on the tooling you have available. If I encountered a 2000 line C#
method, I would use Resharper and start with the extract method command and
then extract class.

------
kaizensoze
[https://www.youtube.com/watch?v=dWIHvuABaFU&t=01m05s](https://www.youtube.com/watch?v=dWIHvuABaFU&t=01m05s)

------
odonnellryan
You should read Clean Code, even if it is just the bits on refactoring, so you
can refactor properly! :)

------
quintes
unit test first so you know if you've broken it. Then lots of test input
variations best taken from production sources.

Refactor thereafter and test after each iteration of changes. It's not a
rewrite, so don't lose the plot just Refactor into smaller methods :)

------
panamafrank
quit, get a job working in a newer more modern language or just a company
working on 'green field' projects. I worked on legacy code bases for ~3 years
as a junior grunt and it contributed to some kind of PTSD-lite disorder.

or read the 'ol refactoring book...

------
zippy786
Send it to me with requirements, I'll do it for 500$

------
jarsin
Get a new job :)

~~~
thatswrong0
That's the /r/relationships solution

~~~
odonnellryan
I enjoyed this! ;)

------
sharemywin
you probably need a ton of unit tests for this thing.

------
zerr
Do not touch it.

------
davidw
Indeed.com, Craigslist and maybe Linkedin.

------
PaulHoule
Yes, just don't use the word "shit".

The obvious refactoring is to split the function up into smaller pieces, even
if you can split it into two 1500 line functions that is a win.

~~~
josephmx
Is swearing on HN banned now?

~~~
hamiltonkibbe
Apparently not, I personally find the phrase "2000 line function" offensive
but that part isn't censored.

