
How to Improve a Legacy Codebase - darwhy
http://jacquesmattheij.com/improving-a-legacy-codebase
======
apeace
> Do not fall into the trap of improving both the maintainability of the code
> or the platform it runs on at the same time as adding new features or fixing
> bugs.

I don't disagree at all, but I think the more valuable advice would be to
explain how this can be done at a typical company.

In my experience, "feature freeze" is unacceptable to the business
stakeholders, even if it only has to last for a few weeks. And for larger-
sized codebases, it will usually be months. So the problem becomes explaining
why you have to do the freeze, and you usually end up "compromising" and
allowing only really important, high-priority changes to be made (i.e. all of
them).

I have found that focusing on bugs and performance is a good way to sell a
"freeze". So you want feature X added to system Y? Well, system Y has had 20
bugs in the past 6 months, and logging in to that system takes 10+ seconds. So
if we implement feature X we can predict it will be slow and full of bugs.
What we should do is spend one month refactoring the parts of the system which
will surround feature X, and then we can build the feature.

In this way you avoid ever "freezing" anything. Instead you are explicitly
elongating project estimates in order to account for refactoring. Refactor the
parts around X, implement X. Refactor the parts around Z, implement Z. The
only thing the stakeholders notice is that development pace slows down, which
you told them would happen and explained the reason for.

And frankly, if you can't point to bugs or performance issues, it's likely you
don't need to be refactoring in the first place!

~~~
ethbro
From personal experience, a good way of approaching the sell to business
stakeholders is getting them involved in the bug triage and tracking process.

You need to make the invisible (refactoring and code quality) visible
(tracking) so they can see what the current state is and map the future.

The biggest reason business stakeholders push back against this is that
developers tend to communicate this in terms of "You don't need to know
anything about this. But we've decided it needs to be done." Which annoys
someone when they're paying the hours.

I've had decent success with bringing up underlying issues on roadmaps, even
at the level of generality of "this feature / component has issues." It's a much easier
conversation if you're adding "That thing that we've had on our to-do list for a
couple of months" vs. "This new thing that I never told you about."

And as far as pitching, if the code is at all modular, you can usually get
away with "new feature in code section A" \+ "fixes and performance
improvements in unrelated section B" in the same release.

PS: I love the simple counter-based bookkeeping perspective from the linked
post. (And I think someone else suggested something similar in a previous
performance / debugging front-page article.)

~~~
crdoconnor
I've tried this "getting them involved" approach and it failed miserably for
me. I've tried explaining why module A had to be decoupled from module B to
stakeholders. I've tried explaining why we need to set up a CI server. I've
tried explaining why technology B needs to be isolated and eliminated.

In almost all cases they nod and feign interest and understanding and their
eyes glaze over. And why should they be interested? The stories are almost
always abstract and the ROI is even more abstract. It's all implementation
details to them. These stories usually languish near the bottom of the task
list and you often need to sneak it in somehow to get it done at all.

I think the only real way of dealing with this problem is to allocate time for
developers to retrospect on what in the code needs dealing with (what problems
caused everybody the most pain in the last week?), then time to plan
refactoring and tooling stories, and time to do those stories alongside
features and bugs.

Stakeholders do need to assess what _level_ of quality they are happy with
(and if it's low, developers should accept that or leave), but that should be
limited to telling you how much time to devote to these kinds of stories,
_not_ what stories to work on and _not_ what order to do them in.

I don't see why they shouldn't have visibility into this process but there's
no way they should be allowed to micromanage it any more than they should be
dictating your code style guidelines.

This is, IMO, the single worst feature of SCRUM - one backlog, 100% determined
by the product owner, whom you have to plead with or lobby if you want to set up a
CI server.

~~~
DougWebb
"Currently, every time we want to build a release of the software in order to
test it before deployment, __ developers need to stop working on features and
maintenance while we go through the build process, which takes __ hours/days.
There are a lot of manual steps involved, and we found that we make an average
of __ errors in the process each time, which takes an additional __ hours/days
to resolve. We go through all of this __ times a year.

We've determined that we can automate the entire process by setting up a
Continuous Integration (CI) server. There's some work involved in setting it
up; we estimate it will take __ days/weeks to get it running. But once it's
running, (we'll always have a build running __ minutes after each code
change)|(we can click on a button in the CI's GUI and we'll have a build
running __ minutes later), and we'll be saving __ hours/days of effort per
build/year."

Plug in your numbers. If the time to deploy the CI server exceeds the savings,
the business would be justified in telling you not to do it. (You'd have to
make a case based on quality and reproducibility, which is tougher.) If the
cost is less than the savings, the business should see this as a no-brainer,
and the only restraint would be scheduling a time to get it done. (Not having
it might cost more, but it might not cost as much as failing to get other
necessary work done.)
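
To make the break-even arithmetic concrete, here is a tiny sketch in Python with entirely made-up placeholder numbers (none of these figures come from a real project; substitute your own):

    # Hypothetical figures only -- they stand in for the blanks in the quote above.
    devs_blocked = 3        # developers idle during each manual build
    hours_per_build = 4     # hours the manual build process takes
    avg_error_hours = 2     # extra hours per build spent fixing manual mistakes
    builds_per_year = 12    # how often you go through all of this
    ci_setup_hours = 40     # one-off estimate to get the CI server running

    cost_per_build = devs_blocked * hours_per_build + avg_error_hours      # 14 hours
    annual_cost = builds_per_year * cost_per_build                         # 168 hours
    print(f"Manual builds: ~{annual_cost} person-hours per year")
    print(f"CI setup: ~{ci_setup_hours} person-hours, once")
    print(f"Pays for itself after ~{ci_setup_hours / cost_per_build:.1f} builds")

With those made-up numbers the CI server pays for itself within the first few builds; with your real numbers it might not, which is exactly the conversation the business wants to have.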

~~~
crdoconnor
Who the fuck writes a fully costed business case on whether or not to spend a
day setting up a CI server?

I'm trying to get some fucking work done, not convince investors I need a
series A.

~~~
DougWebb
Ah, I see the problem. You have no interest in understanding why your business
makes the decisions it makes; you just expect them to give you permission to
do whatever you say you want to do.

You said: _I've tried explaining why we need to set up a CI server. ... In
almost all cases they nod and feign interest and understanding and their eyes
glaze over._

The reason you've failed to make a convincing case, I believe, is because
you're talking in your language instead of theirs. Perhaps they've tried to
explain to you, in their language, why they won't prioritize your CI server,
and _you_ nodded and feigned interest while _your_ eyes glazed over.

The quote I gave you translates your request and justification for a CI server
into terms the business needs: what problem does it solve, what does it cost,
how does it affect on-going costs, what are the risks of doing it and not
doing it, and what impact does it have on other activities if it is done and
if it is not done. This is _not_ a "fully costed business case" or "convincing
investors you need a series A". If you've given any thought at all to why you
want a CI server beyond "I want it" you should have no problem filling in the
blanks in my quote. And if you haven't bothered to think that much about it,
your business is doing the right thing by giving your requests a low priority,
because they shouldn't give your ideas any more attention than you're giving
them yourself.

~~~
sanderjd
You're making good points, but there is a _lot_ of truth to your parent's
sense that making a business case for every little thing is deeply
inefficient. The hard part is striking a good balance between one extreme of
arrogant engineers who never think about the business case for the things they
are working on and the other extreme of having technical decisions
micromanaged by non-technical managers.

~~~
DougWebb
Yes, it can be deeply inefficient, but so is not getting approval to do
necessary work. You have to start making progress somewhere, even if it's not
as fast as you'd like it to be. If you're successful with this, you gain
credibility and over time your recommendation will be sufficient to get
approval for smaller tasks, and the business case will only need to be made
for bigger tasks.

If you're not successful with this approach, and can't get approval despite
showing that it's in the business' best interests using the business' own
criteria, then your business is too dysfunctional and toxic to fix. Time to
move on.

~~~
crdoconnor
>Yes, it can be deeply inefficient, but so is not getting approval to do
necessary work.

No, actually not needing approval _to do necessary work_ is very efficient.

>You have to start making progress somewhere, even if it's not as fast as
you'd like it to be. If you're sucessful with this, you gain credibility and
over time your recommendation will be sufficient to get approval for smaller
tasks, and the business case will only need to be made for bigger tasks.

There's no point in working to gain enough credibility to be able to do your
own job effectively when you can simply leave and go and work somewhere else
that doesn't expect you to prove to it that you can do your job after they've
hired you.

Even if you manage to prevent the company from shooting itself in the foot as
far as you're concerned by "proving your worth", it'll probably only go and
shoot itself in the foot somewhere else and that will also ultimately become
your problem.

In any case, this process tends to feed upon itself. Failures in delivery lead
to a lack of trust, which leads to micromanagement, which leads to failures
in delivery. It's not that you _can't_ escape that vicious cycle, it's that
it typically has a terrible payoff matrix.

~~~
DougWebb
I meant not getting approval for necessary work, and therefore not being able
to do the necessary work, is inefficient. Not _needing_ approval for necessary
work is great; we agree on that.

You're right about having to make a choice between fixing the place you're at
or finding a new place to be. There are many factors to consider, and
sometimes trying to fix the place you're at can be worth the effort.

------
specialist
Sound advice.

re: Write Your Tests

I've never been successful with this. Sure, write (backfill) as many tests as
you can.

But the legacy stuff I've adopted / resurrected has been a complete unknown.

My go-to strategy has been blackbox (comparison) testing. Capture as much
input & output as I can. Then use automation to diff output.

I wouldn't bother to write unit tests etc for code that is likely to be
culled, replaced.

re: Proxy

I've recently started doing shadow testing, where the proxy is a T-split
router, sending mirror traffic to both old and new. This can take the place of
blackbox (comparison) testing.
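
For an HTTP system, a toy version of that T-split mirror might look like the sketch below (ports, paths, and the GET-only handling are placeholder simplifications; a real mirror would also forward headers, status codes, and other methods):

    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    OLD = "http://localhost:8001"  # authoritative legacy backend
    NEW = "http://localhost:8002"  # candidate replacement, responses only compared

    class MirrorHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            old_body = urllib.request.urlopen(OLD + self.path).read()
            try:
                new_body = urllib.request.urlopen(NEW + self.path).read()
                if new_body != old_body:
                    print(f"MISMATCH on {self.path}")
            except Exception as exc:   # the new system must never break real callers
                print(f"NEW failed on {self.path}: {exc}")
            self.send_response(200)
            self.end_headers()
            self.wfile.write(old_body)  # callers only ever see the old system's answer

    HTTPServer(("", 8000), MirrorHandler).serve_forever()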

re: Build numbers

First step to any project is to add build numbers. Semver is marketing, not
engineering. Just enumerate every build attempt, successful or not. Then
automate the builds, testing, deploys, etc.
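
The bookkeeping really can be this dumb; a sketch, where the file name and how the number gets stamped into artifacts are up to your build script:

    from pathlib import Path

    COUNTER = Path("BUILDNUMBER")  # stored wherever your build runs

    def next_build_number() -> int:
        """Every build attempt, successful or not, gets the next number."""
        n = int(COUNTER.read_text()) + 1 if COUNTER.exists() else 1
        COUNTER.write_text(str(n))
        return n

    print(f"build #{next_build_number()}")  # stamp into artifacts, logs, and tickets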

Build numbers really help with defect tracking and differential debugging. Every
ticket gets fields for "found", "fixed", and "verified". Caveat: I don't know if
my old school QA/test methods still apply in this new "agile" DevOps (aka
"winging it") world.

~~~
erikpukinskis
> re: Write Your Tests, I've never been successful with this ... I wouldn't
> bother to write unit tests etc for code that is likely to be culled,
> replaced.

I think you misread the author. He says "Before you make any changes at all
write as many _end-to-end_ and _integration tests_ as you can." (emphasis
mine)

> My go-to strategy has been blackbox (comparison) testing. Capture as much
> input & output as I can. Then use automation to diff output.

That's an interesting strategy! Similar to the event logs OP proposes?

~~~
rzzzt
Sounds like approval testing:
[http://approvaltests.com/](http://approvaltests.com/)

You capture the initial output from the original code, then treat this
canonical version as the expected result until something changes.
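
Hand-rolled, the core of that pattern is just a golden file per check. This sketch is not the ApprovalTests library's actual API, only the idea behind it; the directory name and the example call are made up:

    from pathlib import Path

    APPROVED = Path("approved")  # golden outputs, committed to version control

    def verify(name: str, received: str) -> None:
        """First run records the output as approved; later runs must match it."""
        golden = APPROVED / f"{name}.approved.txt"
        if not golden.exists():
            APPROVED.mkdir(exist_ok=True)
            golden.write_text(received)
            print(f"{name}: recorded new approved output")
            return
        assert received == golden.read_text(), f"{name}: output changed, review the diff"

    verify("invoice_rendering", "<p>total: 42</p>")  # stand-in for real captured output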

------
cessor
I'd add a prerequisite to the top of this list:

- Get a local build running first.

Often, a complete local build is not possible. There are tons of dependencies,
such as databases, websites, services, etc. and every developer has a part of
it on their machine. Releases are hard to do.

I once worked for a telco company in the UK where the deployment of the system
looked like this: (Context: Java Portal Development) One dev would open a zip
file and pack all the .class files he had generated into it, and email it to
his colleague, who would then do the same. The last person in the chain would
rename the file to .jar and then upload it to the server. Obviously, this
process was error prone and deployments happened rarely.

I would argue that getting everything to build on a central system (some sort
of CI) is useful as well, but before changing, testing, db freezing, or
anything else is possible, you should try to have everything you need on each
developer's machine.

This might be obvious to some, but I have seen this ignored every once in a
while. When you can't even build the system locally, freezing anything,
testing anything, or changing anything will be a tedious and error prone
process...

~~~
flukus
> I would argue that getting everything to build on a central system (some
> sort of CI) is useful as well, but before changing, testing, db freezing,
> or anything else is possible, you should try to have everything you need on
> each developer's machine.

I'd extend this and say that the CI server should be very naive as well. Its
only job is to pull in source code and execute the same script (makefile,
whatever) that the developers do. Maybe with different configuration options
or permissions, but the developers should be able to do everything the CI
server does in theory.

A big anti-pattern I see is build steps that can only be done by the CI server
and/or relying on features of the CI server software.
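
The antidote can be as small as a single entry-point script that developers and CI both run; the steps below are placeholders for whatever your project's build actually does:

    #!/usr/bin/env python3
    """The one build entry point. The CI job reduces to: check out, then run this."""
    import subprocess
    import sys

    STEPS = [
        ["make", "clean"],
        ["make", "all"],
        ["make", "test"],
    ]

    for step in STEPS:
        print("+", " ".join(step))
        if subprocess.run(step).returncode != 0:
            sys.exit(f"step failed: {' '.join(step)}")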

------
taude
This is a good high-level overview of the process. I highly recommend that
engineers working in the weeds read "Working Effectively with Legacy Code"
[1], as it has a ton of patterns in it that you can implement, and more
detailed strategies on how to do some of the code changes hinted at in this
article.

[1] [https://www.safaribooksonline.com/library/view/working-
effec...](https://www.safaribooksonline.com/library/view/working-effectively-
with/0131177052/)

~~~
zimablue
Second this; it's one of the best coding books I've read.

edit: it also gives a lot of advice similar to the article's, e.g. that big-bang rewrites
are often impossible, and drawing a line somewhere in the application to do input-
output diffing tests when you make a change.

------
bmh_ca
I mostly agree with this - working in bite-sized chunks is really the main ingredient for
success with complex code base reformations.

FWIW, if you want to have a look at a reasonably complex code base being
broken up into maintainable modules of modernized code, I rewrote Knockout.js
with a view to creating version 4.0 with modern tooling. It is now in alpha,
maintained as a monorepo of ES6 packages at
[https://github.com/knockout/tko](https://github.com/knockout/tko)

You can see the rough transition strategy here:
[https://github.com/knockout/tko/issues/1](https://github.com/knockout/tko/issues/1)

In retrospect it would've been much faster to just rewrite Knockout from
scratch. That said, we've kept almost all the unit tests, so there's a
reasonable expectation of backwards compatibility with KO 3.x.

~~~
humanrebar
> In retrospect it would've been much faster to just rewrite Knockout from
> scratch.

You're getting a bit of pushback on this sentiment, so I'll play devil's
advocate a bit here.

I've tried gradual refactors in the past, with poor results, because unfocused
technical teams and employee turnover can really kill velocity on long-term
goals that take gradual but detailed work.

That is, replacing all those v1 API calls with the v2 API calls over five
months seems fine, but there's risk that it actually takes several years after
unexpected bugs and/or "urgent" feature releases come into play. And by that
time, you might have employee turnover costs, retraining costs, etc.

I'm just saying the risk equation isn't as cut and dried as it seems. There is
survivorship bias in play in both the "rewrite it" and the "gradually migrate it"
camps.

~~~
jacquesm
The rewrite only works - in my experience, YMMV - if the team is already 100%
familiar with the codebase as it is _and_ the task is a relatively simple one
_and_ there is a nice set of tests and docs to go with the whole package.

Outside that boundary you're set up for failure.

~~~
ebiester
The one caveat is that there are times when the business realizes that their
old workflows and features aren't what they now need. The rewrite becomes a
new project competing with the old rather than a functional rewrite.

This is also fraught with peril. However, it is a different set of problems.
In an ideal world, you have engineers who can make reasoned decisions.

However, if the company culture allowed one application to devolve into chaos,
what will make the second application better?

~~~
typednothing
At some point they'll junk the in-house program and buy something off the
shelf.

~~~
jacquesm
Assuming something off the shelf is available, yes. In fact, if something off
the shelf is available we'll be happy to make that recommendation; too many
companies that aren't software houses suddenly feel that they need to write
everything from the ground up. And even companies that are software houses
suffer from NIH more often than not. (Though, I have to say that in my
experience in the last couple of years or so this is improving, it used to be
that every company had their own in-house developed framework but now we see
more and more standardization.)

------
_virtu
How does one get better if they only ever work in code bases that are steaming
piles of manure? So far I've worked at two places and the code bases have been
in this state to an extreme. I feel like I've been in this mode since the very
beginning of my career and am worried that my skill growth has been negatively
impacted by this.

I work on my own side projects, read lots of other people's code on github and
am always looking to improve myself in my craft outside of work, but I worry
it's not enough.

~~~
rb808
I think it's pretty common - and I think you're lucky.

I was surprised to see the article say "It happens at least once in the
lifetime of every programmer." I think if you work on greenfield projects
your whole career you're likely the one who's creating these 'steaming piles
of manure'.

By working on bad legacy projects you learn an awful lot of things about what
works and what is a problem to maintain - it will make you a better developer.

The only issue is if you always work on legacy stuff and never get to write
greenfield you might get typecast as such. Whether that is a problem or not is
up to you. Sounds like you care enough you can change when/if you want to.

~~~
couchand
I think you're setting up a false dichotomy. There are codebases other than
just legacy and greenfield projects: high-quality, well-structured and well-
maintained code.

I would agree that if all you work on is greenfield you're probably making the
messes others are cleaning up, but I don't think that means developers are
bound to either make messes or clean them up. There are plenty of good, long-
lived projects out there.

Not every old project is legacy.

~~~
_virtu
This is what I've been wondering about. I don't care if the stack isn't the
newest, or the tech is the shiniest. I'm just more interested in working on
code that was _engineered_. That is code that was designed and then built.
That's the problem I have with most of the code I'm working in.

At my current place of work, we're not even using XMLHttpRequest. We're using
an antiquated XML library that's been hand-rolled (xajax + major changes) to
emulate our Ajax requests. It's insanity to me that we're still in this mode.

------
kentt
> Do not ever even attempt a big-bang rewrite

I'd love to hear a more balanced view on this. I think this idea is preached
as the gospel when dealing with legacy systems. I absolutely understand that
the big rewrite has many disadvantages. Surely there is a code base that has
features such that a rewrite is better. I'm going to go against the common
wisdom, and the wisdom I've practiced until now, and rewrite a program I maintain
that is:

1. Reasonably small (10k loc, with large parts duplicated or with minor
variables changed).

2. Barely working. Most users cannot get the program working because of the
numerous bugs. I often can't reproduce their bugs, because I get bugs even
earlier in the process.

3. No test suite.

4. Plenty of very large security holes.

5. I can deprecate the old version.

I've spent time refactoring this (maybe 50 hours) but that seems crazy because
it's still a pile of crap and at 200 hours I don't think it would look that
different. I doubt it would take 150 hours for a full rewrite.

Kindly welcoming dissenting opinions.

~~~
Yhippa
Not a dissenting opinion but I'd love to see some case studies on rewrites. As
a consultant this is a frequent request and will probably be big business in
the future as people migrate off of expensive legacy mainframe or other
applications from the 80's, 90's, and possibly 2000's.

~~~
ef4
It's not "rewrite" that's bad, it's thinking you can cut over to a new system
in a "big bang".

Rewrites are definitely common and beneficial, but the successful ones always
run the new code and the old code side-by-side for an extended period of time.
Which means you're still tending and caring about the old code, even as you
strive to direct most of your effort into the new code.

------
maxxxxx
How do people handle this in dynamic languages like JavaScript? I have done a
lot of incremental refactoring in C++ and C# and there the compiler usually
helped to find problems.

I am now working on a node.js app and I find it really hard to make any
changes. Even typos when renaming a variable often go undetected unless you
have perfect test coverage.

This is not even a large code base and I find it already hard to manage. Maybe
I have been using typed languages for a long time so my instincts don't apply
to dynamic languages but I seriously wonder how one could maintain a large
JavaScript codebase.

~~~
stickfigure
I think you just captured the essence of why _micro_ services are so popular.
Dynamic languages just don't scale to large codebases, so there's enormous
pressure to decompose software into chunks that can be digested more easily.

Some amount of this is good, but it often forces the chunk boundaries to be
smaller than the "natural" clumping of data and behavior in a distributed
system. IMHO this is a much worse problem than a messy monolith; you can
refactor a monolithic codebase to be more modular, but refactoring hundreds of
microservices is a herculean endeavor.

My problem with microservices is the word _micro_.

~~~
gnaritas
> Dynamic languages just don't scale to large codebases

You mean "popular" dynamic languages due to their lack of tooling. Dynamic
languages like Smalltalk scale up just fine, but Smalltalk has automated
refactoring tools. In other words it's a tool support problem, not a dynamic
language problem.

~~~
Roboprog
> Dynamic languages just don't scale to large codebases

Static languages scale to large codebases. There's no app that a static
language (and those who insist on static types) can't turn into a much larger
codebase :-)

I love the imagery of "mountains of dirt": [http://steve-
yegge.blogspot.com/2007/12/codes-worst-enemy.ht...](http://steve-
yegge.blogspot.com/2007/12/codes-worst-enemy.html)

------
lbill
I used to work on a messy legacy codebase. I managed to clean it, little by
little, even though most of my colleagues and the management were a bit afraid
of refactoring. It wasn't perfect but things kinda worked, and I had hope for
this codebase.

Then the upper management appointed a random guy to do a "Big Bang" refactor:
it has been failing miserably (it is still going on, doing way more harm than
good). Then it all started to go really bad... and I quit and found a better
job!

------
OutsmartDan
Big bang rewrites are needed in order to move forward faster.

A huge issue with sticking to an old codebase for such a long time is that it
gets older and older. You get new talent that doesn't want to manage it and
leaves, so you're stuck with the same old people that implemented the codebase
in the first place. Sure, they were smart, knowledgeable people in the year 2000,
but think of how fast technology changes. Change, adapt, or die.

~~~
jacquesm
A big-bang rewrite will, nine times out of ten, slow you down rather than
accelerate things, and the most likely outcome is that not only will it be
slower, it may fail entirely.

It's a complete fallacy to think that you're going to do much better than the
previous crew if you are not prepared to absorb the lessons they left behind
in that old crusty code.

So you'll have to learn them all over again.

> Change, adapt, or die.

Die it is then.

~~~
alkonaut
It's not a given that legacy code means "no people still around, no docs and
no tests". I'm on a rewrite project and I'm 10 years in, and the whole crew
from the last project (also around 15 years) is still on this project too. That
helps.

The causes of the big bang rewrite are usually not just "this code smells
let's rewrite it" but rather that the old product reached some technical dead
end. Perhaps it can't scale. Perhaps it's a desktop product written in a UI
framework that doesn't support high DPI screens and suddenly all the customers
have high DPI screens. Obviously in that situation you'd aim to just replace a
layer of the application (a persistence layer, a UI layer) but as we all know
that's not how it works. The cost of a rewrite shouldn't be underestimated -
as you said, there is no reason to believe that if it took 50 man-years for the
last team, the new team won't take 50 too. But that is in itself not a
reason to not do it.

~~~
jacquesm
Fair enough. So the real lesson then is 'it depends', as with everything else.
But the kind of jobs where the cleanup crews get called in are on the verge of
hopeless and it is not rare that we do these on a 'no-cure, no pay' basis.

Great to see you be part of such a long lived team, that's a rarity these
days. That's got to be a fantastic company to work for. Usually even
relatively modest turnover (say 15% per year) is enough to effectively replace
all the original players within a couple of years; most software projects long
outlive their creators' presence at the companies they were founded in. Add in
some acquisitions or spin-outs and it gets to the point where nobody even
knows who wrote the software to begin with.

------
busterarm
All of this seems to focus on the code, after glossing over the career
management implications in the first paragraph.

I've done this sort of work quite a number of times and I've made mistakes and
learned what works there.

It's actually the most difficult part to navigate successfully. If you already
have management's trust (i.e., you have the political power in your
organization to push a deadline or halt work), you're golden and all of the
things mentioned in the OP are achievable. If not, you're going to have to
make huge compromises. Front-load high-visibility deliverables and make sure
they get done. Prove that it's possible.

Scenario 1) I came in as a sub-contractor to help spread the workload (from 2
to 3) building out a very early-stage application for dealing with medical
records. I came in and saw the codebase was an absolute wretched mess. DB
schema full of junk, wide tables, broken and leaking API routes. I spent the
first two weeks just bulletproofing the whole application backend and whipping
it into shape before adding new features for a little while and being fired
shortly afterwards.

Lesson: Someone else was paying the bills and there wasn't enough
visibility/show-off factor for the work I was doing so they couldn't justify
continuing to pay me. It doesn't really matter that they couldn't add new
features until I fixed things. It only matters that the client couldn't
visibly see the work I did.

Scenario 2) I was hired on as a web developer to a company and it immediately
came to my attention that a huge, business-critical ETL project was very
behind schedule. The development component had a due date three weeks
preceding my start date and they didn't have anyone working on it. I asked to
take that on, worked like a dog on it and knocked it out of the park. The
first three months of my work there immediately saved the company about a
half-million dollars. Overall we launched on time and I became point person in
the organization for anything related to its data.

Lesson: Come in and kick ass right away and you'll earn a ton of trust in your
organization to do the right things the right way.

------
sz4kerto
The OP has so much reasonable, smart-sounding advice that doesn't work in the
real world.

1) "Do not fall into the trap of improving both the maintainability of the
code or the platform it runs on at the same time as adding new features or
fixing bugs."

Thanks. However, in many situations this is simply not possible because the
business is not there yet so you need to keep adding new features and fix
bugs. And still, the code base has to be improved. Impossible? Almost, but
we're paid for solving hard problems.

2) "Before you make any changes at all write as many end-to-end and
integration tests as you can."

Sounds cool, except in many cases you have no idea how the code is supposed to
work. Writing tests for new features and bugfixes is good advice (but that
goes against other points the OP makes).

3) "A big-bang rewrite is the kind of project that is pretty much guaranteed
to fail.

No, it's not. Especially if you're rewriting parts of it at a time as separate
modules

My problem with the OP is really that it tells you how to improve a legacy
codebase given no business and time pressure.

~~~
jacquesm
On the contrary, we do this work under extreme business and time pressure,
sometimes existential pressure (as in: fail and the company fails).

That's exactly why this list is set up the way it is: you will get results
_fast_ and they will be good results.

If you want to play the 'I'm doing a sloppy job because I'm under pressure'
card then consider this: the more pressure _the less room there is for
mistakes_.

Here is a much more play-by-play account of one of these jobs where management
gave me permission to do a write-up as part of the deal:

[https://jacquesmattheij.com/saving-a-project-and-a-
company](https://jacquesmattheij.com/saving-a-project-and-a-company)

(For obvious reasons management usually does not give such permission, nobody
wants to admit they let it get that far on their watch, I did my best to
obscure which company this is about.)

~~~
sz4kerto
> That's exactly why this list is set up the way it is: you will get results
> fast and they will be good results.

What do you mean by 'fast'? If you can get meaningful improvements in a few
months' time, then you're just working with a smaller code base than what I
had in mind. If you're talking about stopping for a year, then... well, that's
the problem I'm talking about.

> If you want to play the 'I'm doing a sloppy job because I'm under pressure'
> card

No, I just wanted to share my opinion that I disagree with the overly
generalized suggestions you're making.

~~~
jacquesm
> What do you mean by 'fast'?

Much faster than by going the rewrite route (assuming that is even possible,
which I am convinced it isn't for anything but the most trivial problems).
Preferably a first deploy within a few days, with an incremental changeover to the
new situation starting within two weeks or so of the starting gun being fired.

> If you can get meaningful improvements in a few months' time, then you're
> just working with smaller code base than what I thought of.

No.

> If you're talking about stopping for a year, then .. well, that's the
> problem I'm talking about.

Who said so?

All I said is that you should only do one thing at a time. Do not attempt to
achieve _two_ results with _one_ release.

> No, I just wanted to share my opinion that I disagree with the overly
> generalized suggestions you're making.

You are very welcome to your own opinion about my 'overly generalized
suggestions'. It's just that they are a lot more than suggestions: they are
things that I (and others, see this thread for evidence) have used countless
times and that simply work.

All you do is a bunch of naysaying without offering up anything concrete as an
alternative that would work better or evidence that anything posted would not
work in practice. It does and it pays my bills.

~~~
alexwebb2
> > What do you mean by 'fast'?

> deploy within a few days and incremental changeover to the new situation
> starting within two weeks or so

I'm going to take this as confirmation that you're working on very, very small
projects. This would be an extraordinarily unrealistic timeframe for large
projects, which take vastly larger quantities of time to apply the steps
you've outlined - which, in turn, renders those steps useless in a competitive
business context as far as large applications are concerned.

~~~
jacquesm
No, it just means that I have crew for jobs like these that knows their stuff.

500K lines is 'small' by our standards and if we are not moving within two
weeks that translates into one very unhappy customer. That's something a
typical team of 5 to 10 people has produced in a few years.

Note that I wrote 'incremental' and 'starting'. That doesn't mean the job is
finished at that point in time. But we should have a very solid grasp of the
situation, which parts are bleeding the hardest and what needs to be done to
begin to plug those holes. That the whole thing in the end can become a multi-
year project is obvious, we're not miracle workers, merely hard workers.

In a way the size of the codebase is not even relevant. What is most important
is that you get the whole team and the management aligned behind a single
purpose and then to follow through on that. Those first couple of weeks are
crucial, they are tremendously hard work even for a seasoned team that has
worked together on jobs like these several times.

The one case I wrote about here was roughly that size (so small by my
standards), within 30 days the situation was under control. We're now two
years later and they are still working on the project but what was done in
that short period is the foundation they are still using today.

If a project is much larger than that then obviously it will take more time.
Just the discovery process can take a few weeks to months, but in that case I
would recommend to split the project up into several smaller ones that can be
operated on independently with 'frozen interfaces' wherever they can be
found.

That way you can parallelize a good part of the effort without stepping on
each other's toes all the time.

The problem is not that you can't tackle big IT projects well. The problem is
that big IT projects translate into big budgets and that in turn attracts all
kinds of overhead that does not contribute to the end result.

If you strip away that overhead you can do a lot with a (relatively) small
crew.

If you're going to tackle a code base in excess of something like 10M LOC in this
way you will again run into all kinds of roadblocks. For those situations it
would likely pay off to spend a few months on the plan of attack alone.

If a project that large came my way I would refuse, it would tie us down for
way too long.

But that's out of scope for the article afaic; we're talking about medium to
large projects, say 50 man-years worth of original work that has become
unmaintainable for some reason or other (mass walk-out, technical debt out of
control or something to that effect).

If those are 'very very small projects' by your standards then so be it.

~~~
alexwebb2
> 50 man-years worth of original work that has become unmaintainable for some
> reason or other (mass walk-out, technical debt out of control or something
> to that effect)

That's the scale I'm talking about, so at least we're on the same page there.

It sounds to me like your specialty routinely puts you in situations where the
client has reached the end of the line and is in Hail Mary Mode, where they're
amenable to having a consultant do Whatever It Takes to turn things around. To
me, that sounds like just about the best case scenario for addressing the
issues with legacy software, and pretty far removed from the Usual Case.

In my mind, the Usual Case is legacy software that's in obvious decline but
still has significant utility, and for which there is still a significant
portion of the market that can be attracted with added features. That's the
long tail for a huge swath of the industry. In those cases, it's unthinkable
to halt development for _any_ significant stretch of time. It's dog eat dog
out here, and when your competitors aren't pausing for breath, you can't
either - it's just a totally different world, and I think you're
inappropriately pushing the wisdom from your own corner of it out into spaces
where it's just not applicable.

In a similar vein, I think your opinions on rewrites are a bit skewed by the
fact that the _only_ ones you encounter in your specialty are ones that have
failed miserably (or at the very least, they're seriously overrepresented).

You clearly have a very solid and proven game plan for the constraints you're
used to, but I think many of the extrapolations aren't valid.

~~~
jacquesm
I'd be more than happy to believe you if the comments in this thread weren't
for the most part confirming my experience. On the other hand I'm more than
willing to believe that there are plenty of places where none of this applies
(though, I haven't seen them) and where with some slight variation you could
get a lot of mileage out of these methods.

Because if the only extra constraint would be 'you can't halt development'
then that's easy enough: simply iterate on smaller pieces and slip in the
occasional roadmap item to grease the wheels. But that _does_ assume that
development had not yet ground to a halt in the first place.

The biggest difference between your experience and my experience, I think, is
that our little band of friends is external, so we get to negotiate up front
about what the constraints are. If we put two scenarios on the table, one
of which is ~70% cheaper because we temporarily halt development completely,
then that is the most likely option for the customer to take.

------
hinkley
It's my turn to disagree with something in the article.

> Before you make any changes at all write as many end-to-end and integration
> tests as you can.

I'm beginning to see this as a failure mode in and of itself. Once you give
people E2E tests it's the only kind of tests they want to write. It takes
about 18 months for the wheels to fall off so it can look like a successful
strategy. What they need to do is learn to write unit tests, but for that you have to
break the code up into little chunks. It doesn't match their aesthetic sense
and so it feels juvenile and contrived. The ego kicks in and you think you're
smart enough that you don't have to eat your proverbial vegetables.

The other problem is e2e tests are slow, they're flaky, and nobody wants to
think about how much they cost in the long run because it's too painful to
look at. How often have you seen two people huddled over a broken E2E test?
Multiply the cost of rework by 2.

------
stephenwilcock
It is great to see more people sharing their strategies for managing legacy
codebases. However, I thought it might be worth commenting on the suggestion
about incrementing database counters:

> "add a single function to increment these counters based on the name of the
> event"

While the sentiment is a good one, I would warn against introducing counters
in the database like this and incrementing them on every execution of a
function. If transactions volumes are high, then depending on the locking
strategy in your database, this could lead to blocking and lock contention. Operations
that could previously execute in parallel independently now have to compete
for a write lock on this shared counter, which could slow down throughput. In
the worst case, if there are scenarios where two counters can be incremented
inside different transactions, but in different sequences (not inconceivable
in a legacy codebase), then you could introduce deadlocks.

Adding database writes to a legacy codebase is not without risk.

If volumes are low you might get away with it for a long time, but a better
strategy would probably be just to log the events to a file and aggregate them
when you need them.
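
A sketch of that lower-risk variant, assuming the code paths you care about can append to a local file (the path and event names are placeholders):

    import collections
    import datetime

    EVENT_LOG = "legacy_app_events.log"  # placeholder path

    def record_event(name: str) -> None:
        """Append-only: no shared row to contend on, just the OS's file append."""
        with open(EVENT_LOG, "a") as f:
            f.write(f"{datetime.datetime.utcnow().isoformat()} {name}\n")

    def aggregate() -> collections.Counter:
        """Run offline whenever the counts are wanted."""
        with open(EVENT_LOG) as f:
            return collections.Counter(line.split(" ", 1)[1].strip() for line in f)

    record_event("invoice_created")  # sprinkle these through the code paths of interest
    print(aggregate())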

------
artursapek
Are there businesses building automation and tooling for working with legacy
codebases? It seems like a really good "niche" for a startup. The target
market grows faster every year :)

~~~
tjalfi
Semantic Designs[0] is one of several companies that sells software for
working with legacy codebases and programming language translation. [1] is a
SO post by one of their founders that describes some of the difficulties in
programming language translation.

[0] [http://www.semdesigns.com/](http://www.semdesigns.com/)

[1]
[https://stackoverflow.com/a/3460977/3465526](https://stackoverflow.com/a/3460977/3465526)

~~~
artursapek
Interesting, thanks! Sounds like it's a really hard problem.

------
mfrisbie
Sometimes your inner desires to rewrite it from scratch can be overwhelming.

[https://alwaystrending.io/articles/software-engineer-
enterta...](https://alwaystrending.io/articles/software-engineer-entertains-
erotic-fantasy-about-rewriting-entire-codebase-from-scratch)

------
user5994461
Agreed about the prerequisites: adding some tests, reproducible builds,
logs, basic instrumentation.

Highly disagree about the order of coding. That guy wants to change the
platform, redo the architecture, refactor everything, before he starts to fix
bugs. That's a recipe for disaster.

It's not possible to refactor anything while you have no clue about the
system. You will change things you don't understand, only to break the
features and add new bugs.

You should start by fixing bugs. With a preference toward long-standing simple
issues, like "adding a validation on that form, so the app doesn't crash when
the user gives a name instead of a number". Check with users for a history of
simple issues.

That delivers immediate value. This will quickly earn you credibility with the
stakeholders and the users. You learn the internals by doing, before you
attempt any refactoring.

------
SideburnsOfDoom
> add instrumentation. Do this in a completely new database table, add a
> simple counter for every event that you can think of and add a single
> function to increment these counters based on the name of the event.

The idea is a good one, but the specific suggested implementation... hasn't he
heard of statsd or Kibana?

~~~
jacquesm
Not available on all platforms. Think: mainframes, platforms no longer with
the times, non-unix and so on.

If you have access to a tool like that by all means use it, the specific
implementation is not relevant; the article merely tries to show the simplest
way to implement this very useful functionality that will work without
limitation on just about anything that I can think of.
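
Purely as an illustration of how little is needed, the counter table plus single increment-by-name function the article describes might look like this (SQLite here is just a stand-in; the same two statements can be expressed on more or less any database):

    import sqlite3

    db = sqlite3.connect("instrumentation.db")  # stand-in for whatever DB the app already uses
    db.execute("CREATE TABLE IF NOT EXISTS event_counters"
               " (name TEXT PRIMARY KEY, count INTEGER NOT NULL)")

    def bump(event: str) -> None:
        """Increment the named counter; call it wherever something interesting happens."""
        db.execute("INSERT INTO event_counters (name, count) VALUES (?, 1)"
                   " ON CONFLICT(name) DO UPDATE SET count = count + 1", (event,))
        db.commit()

    bump("login_succeeded")
    bump("report_generated")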

~~~
SideburnsOfDoom
> Not available on all platforms. Think: mainframes, platforms no longer with
> the times, non-unix and so on.

YMMV, though I would steer people towards an off-the-shelf solution over
rolling your own.

Does "non-unix" mean Windows? My experience there has been that you can find a
statsd client for your language of choice, and a way to plug whatever logging
tool you have into Kibana.

~~~
jacquesm
Quite often there is an embedded component in the mix somewhere, or even a
machine that is not networked in any present day sense of the word.

The whole reason these jobs exist is because modern tooling and the luxury
that comes with them is unavailable. But I've yet to find a platform where
that counter trick did not work, even on embedded platforms you can usually
get away with a couple of incs and a way to read out the counters.

If the timing isn't too close to failure.

One interesting case involved a complex real life multi-player game with
wearable computers. In the end we got it to work but only by making all the
software run twice as fast as it did before so we could use the odd cycles for
the stats collection without the rest of the system noticing. That was a bit
of a hack. And the best bit: after making it work we then used all the freed
up time to send extra packets to give the system some redundancy and this
greatly improved reliability.

That system was running 8051 micro controllers and the guy that wrote the
original said that 'this couldn't be done'. Fun times :)

The server side portion of that particular project got completely re-written
as well roughly along the lines presented in the article, that wasn't a huge
project (500K lines or so) but I was very happy that it wasn't my first large
technical debt mitigation project or I would likely have gotten stuck.

------
yeukhon
Healthcare.gov is a good example, although not a legacy codebase. Anyway, I think
fixing small bugs and writing tests are the best way to learn how to work with a
legacy system. This allows me to see which components are easier to
rewrite/refactor/add more logging and instrumentation to. The business cannot wait
months before a bug is fixed just for the sake of making a better codebase.
But I agree that database changes should be kept as close to none as possible.
Also, overcommunicate with the downstream customers of your legacy system.
They may be using your interface in an unexpected manner.

I have done a number of serious refactorings myself, and good tests do me
a huge favor, even if I have to grit my teeth for a few days to a few weeks.

------
moonbug
This should be one of the first tasks that any aspiring career programmer has.
It's an essential experience in making a professional.

------
weef
Great advice. Writing integration tests or unit tests around existing
functionality is extremely important but unfortunately might not always be
feasible given the time, budget, or complexity of the code base. I just
completed a new feature for an existing and complex code base but was given
the time to write an extensive set of end-to-end integration tests covering
most scenarios before starting my coding. This proved invaluable once I
started adding my features to give me confidence I wasn't breaking anything
and helped find a few existing bugs no one had caught before!

~~~
humanrebar
> Writing integration tests or unit tests around existing functionality is
> extremely important but unfortunately might not always be feasible given the
> time, budget, or complexity of the code base.

Bottom line: If the project cannot afford to properly maintain the code, it's
a failure of the business model. Projects _can_ be maintained indefinitely,
but it costs money. And that means the project has to bring in enough money to
pay for those maintenance costs.

The options, as I see them:

1. Accept that this particular project, and those that intimately depend on
it, has a lifecycle and will eventually die, either slowly or quickly. Prepare
for that fact, staying ahead of the reaper by quitting, transferring to
another project, etc.

2. Build a case to leadership that the project is underfunded long-term. This
takes communication skills, persuasion skills, technical skills, and political
skills. You'll need to go to all the stakeholders in their frame of reference
and explain the risk involved in fundamentally depending on legacy code.

Anyway, engineers tend to see the "legacy code" problem as a technical one. It
is in the sense it takes technical work to fix it. But the root cause is a
misallocation of resources. If the needed resources aren't there in the first
place, the problem is a bad business model.

~~~
corpMaverick
Alternatively: teams should be organized around products, not around projects.
The idea that you can move developers around to new projects is wrong. A large
organization with this mindset will end up with a lot of unmaintainable and
unmaintained code.

------
deedubaya
Yeah, I've done this. It's frustrating and easy to burn out doing it because
progress seems so arbitrary. Legacy upgrades are usually driven by large
problems or the desire to add new features. Getting a grip on the code base
while deflecting those desires can be hard.

This type of situation is usually a red flag that the company's management
doesn't understand the value of maintaining software until they absolutely have
to. That, in itself, is an indicator of what they think of their employees.

~~~
jacquesm
> This type of situation is usually a red flag that the company's management
> doesn't understand the value of maintaining software until they absolutely
> have to.

Recent conversation with the manager of a company: "I've yet to see anybody
give me a good reason why we need to maintain the software we already built if
it works."

No kidding.

~~~
deedubaya
That's just a poor job of surfacing the consequences of not maintaining
software by whoever built it.... unless their software is bug free... and we
all know there is so much bug free software out there.

------
mannykannot
WRT architecture: In my experience, you would be lucky if you are free to
change the higher level structure of the code without having to dive deeply
into the low-level code. Usually, the low-level code is a tangle of
pathological dependencies, and you can't do any architectural refactoring
without diving in and rooting them out one at a time (I was pulling up ivy
this weekend, so I was primed to make this comment!)

~~~
humanrebar
> ...you would be lucky if you are free to change the higher level structure
> of the code without having to dive deeply into the low-level code.

The problem, in my mind, is that code can't be accurately modeled on one axis
from "low level" to "high level". You can slice a system in many ways:

- network traffic

- database interactions

- build time dependencies

- run time dependencies

- hardware dependencies

- application level abstractions

...and certainly more. On top of that, the dimensions are not orthogonal. You
might need to bump the major version of a library to support a new wire
format, for example. Anyway, since there are many ways to slice a project,
what is "high level" in on perspective can be "low level" from another. And
vice versa.

------
alexeiz
I was in this situation more than once.

My actions are usually these:

* Fix the build system, automate the build process and produce regular builds that get deployed to production. It's incredible that some people still don't understand the value of a repeatable, reliable build. In one project, in order to build the system you had to know which makefiles to patch and disable the parts of the project which were broken at that particular time. And then they deployed it and didn't touch it for months. The next time you needed to build/deploy, it was impossible to know what had changed or whether you had even built the same thing.

* Fix all warnings. Usually there are thousands of them, and they get ignored because "hey, the code builds, what else do you want." The warning-fixing step lets you see how fucked up some of the code is.

* Start writing unit tests for things you change, fix or document. Fix existing tests (as they are usually unmaintained and broken).

* Fix the VCS and enforce a sensible review process and history maintenance. Otherwise nobody has a way of knowing what changed, when, and why. Actually, not even all parts of the project may be in the VCS. Code, configs, and scripts can be lying around on individual dev machines, which is impossible to discover without a repeatable build process. Also, there are usually a bunch of branches of varying degrees of staleness which were used to deploy code to production. The codebase may have diverged significantly. It needs to be merged back into the mainline, and a development process needs to be enforced that prevents this from happening in the future.

Worst of all is that in the end very few people would appreciate this work.
But at least I get to keep my sanity.

~~~
mattmanser
I've always found it remarkably quick to fix warnings too; it tends to be the
same mistakes over and over.

------
logicallee
This says, near the end, "Do not ever even attempt a big-bang rewrite", but
aren't a LOT of legacy in-house projects completely blown out of the water by
well-maintained libraries in popular, modern languages that already exist?
(In some cases these might be commercial solutions, but for which a business
case could be made.)

I'm loath to give examples so as not to constrain your thinking, but, for
example, imagine a bunch of hairy Perl had been built to crawl web sites as
part of whatever they're doing, and it just so happens that these days curl or
wget do more, and better, and less buggy, than everything they had built.
(think of your own examples here, anything from machine vision to algabreic
computation, whatever you want.)

In fact isn't this the case for lots and lots of domains?

For this reason I'm kind of surprised that the "big bang rewrite" is written
off so easily.

------
iamNumber4
Sometimes you get an entire septic tank full of...

A code base that is non-existent, as the previous attempts were done with MS BI
(SSIS) tools (for all the things SSIS is not for) and/or SQL stored procedures,
with no consistency in coding style or documentation, over 200 databases
(sometimes 3 per process that only exist to house a handful of stored
procedures), and complete developer turnover about every 2 years,
with senior leadership in the organization clueless about any technology.

As you look at ~6000 lines in a single stored procedure, you fight the urge
to light the match and give it some TLC (Torch it, Level it, Cart it away)
to start over with something new.

Moral of the story: as you build and replace things, stress to everyone to
"concentrate on getting it right, instead of getting it done!" so you don't
add to the steaming pile.

~~~
HelloNurse
Can you convince management that development in this situation is horrible and
expensive and that there are better architectures?

------
matt_s
Regarding instrumentation and logging - this can also be used to identify
areas of the codebase that can possibly be retired. If it is a legacy
application, there are likely areas that aren't used any longer. Don't focus
on tests or anything else in these areas; instead, consider deprecating them.

------
quadcore
From what I've seen, the most common mistake when starting work on a new
codebase is not reading it all before making any change.

I really mean it: a whole lot of programmers simply don't read the codebase
before starting a task. Guess the result, especially in terms of frustration.

~~~
rocky1138
Sometimes the code is so horribly written we have nothing else to try but to
poke at it with a stick in different ways until it breaks.

------
ransom1538
> Before you make any changes at all write as many end-to-end and integration
> tests as you can.

^ Yes and no. That might take forever and the company might be struggling with
cash. I would _instead_ consider adding a metrics dashboard. Basically - find
the key points: payments sent, payments cleared, new user, returning user,
store opened, etc. THIS isn't as good as a nice integration suite - but if a
client is hard on cash and needs help - this can be setup in hours. With this
setup - after adding/editing code you can calm investors/CEOs. Alternatively,
if it's a larger corp it will be time-strapped - then push for the same thing
:)

~~~
dfabulich
I think instead of "as many as you can" it's "as many as you can afford."

------
lol768
Any advice on what steps to take when the legacy codebase is incredibly
difficult to test?

I completely agree with the sentiment that scoping the existing functionality
and writing a comprehensive test suite is important - but how should you
proceed when the codebase is structured in such a way that it's almost
impossible to test specific units in isolation, or when the system is
hardcoded throughout to e.g. connect to a remote database? As far as I can see
it'll take a lot of work to get the codebase into a state where you _can_
start doing these tests, and surely there's a risk of breaking stuff in the
process?

~~~
ef4
An after-the-fact test suite is a different beast than one written
concurrently with the app. It's not worth trying to force one to be the other.

Work from the outside in, keeping most of the system as a black box. Start
with testing the highest-level behaviors that the business/users care about.
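
For example (a sketch only; the URL and endpoints are invented), an
outside-in test drives the running app purely through its public HTTP
interface and asserts on behavior the business actually cares about:

    import json
    import urllib.request

    BASE_URL = "http://localhost:8080"  # hypothetical instance of the legacy app

    def test_existing_user_can_log_in_and_see_orders():
        # Log in through the same endpoint the UI uses; no internal APIs.
        login = urllib.request.urlopen(urllib.request.Request(
            BASE_URL + "/api/login",
            data=json.dumps({"user": "demo", "password": "demo"}).encode(),
            headers={"Content-Type": "application/json"}))
        assert login.status == 200

        # Assert only on externally visible behavior, not implementation.
        orders = urllib.request.urlopen(BASE_URL + "/api/orders?user=demo")
        assert json.loads(orders.read())  # at least one order comes back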

------
pc86
I've been a part of several successful big-bang rewrites, and several
unsuccessful ones, and saying that if you're smart they're not on the table is
just flat out wrong.

The key is an engaged business unit, clear requirements, and time on the
schedule. Obviously if one or more of these things sounds ridiculous then the
odds of success are greatly diminished. It is much easier if what you launch
on the new platform is a copy of the current system, not a copy +
enhancements, but I've been on successful projects where we launched with new
functionality.

~~~
jacquesm
I've yet to see a large system with lots of subsystems rewritten in one go,
but I'm more than open to being convinced that it can be done so if you could
please do a write-up of how such a project was managed.

The ones I have seen - and this is actually one of the major reasons the
clean-up crew gets called in to begin with - are big bang rewrite projects
gone astray.

One huge problem with rewrites of old code is that the requirements are no
longer known, or are outright misunderstood.

~~~
alkonaut
The biggest problem with "the new system" is that it's rarely just a rewrite
of the old system. Obviously someone valued the old system, otherwise it
wouldn't be worth rewriting. But the business case for the new system is
never just lower maintenance cost, higher performance, a modern look, etc.
It's _always_ going to include all those new features. That's what sinks the
new project.

------
Bahamut
Can't say I necessarily agree with the big-bang rewrite part - at my last
job, I found myself having to do significant refactors. The reason was that
each view had its own concept of a model for interacting with various
objects, which resulted in a lot of different bugs from one-off
implementations. My refactor brought some near-term pain from having to fix
the various regressions I created, but ultimately it led to much better
long-term maintenance.

------
d--b
I agree with most of this, though I think it doesn't dive into the main
problem:

Freezing a whole system is practically impossible. What you usually get is a
"piecewise" freeze: a small portion of the system that won't change for a
given period.

The real challenge is: how do you split your project into pieces of
functionality that are reasonably sized and independently replaceable?

There is definitely no silver bullet for how to do this.

~~~
jacquesm
I could probably do a better job of making that clear in the article. The
whole point is to iterate and to lock and release parts selectively so you are
never working on more than one thing at a time.

------
alexwebb2
> How to Improve a Legacy Codebase When You Have Full Control Over the
> Project, Infinite Time and Money, and Top-Tier Developers

edit: I'm being a little snarky here, but the assumptions here are just too
much. This is all best-case scenario stuff that doesn't translate very well to
the vast majority of situations it's ostensibly aimed at.

------
kevan
>Use proxies to your advantage

At my last gig we used this exact strategy to replace a large ecommerce site
piece by piece. Being able to slowly replace small pieces and AB test every
change was great. We were able to sort out all of the "started as a bug, is
now a feature" issues with low risk to overall sales.
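
For anyone who hasn't seen the technique, a minimal sketch of the idea
(hosts, ports, and paths are all invented): a thin proxy sends the
already-migrated paths to the new service, passes everything else through to
the legacy app, and can split a percentage of traffic for A/B tests:

    import random
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import Request, urlopen

    LEGACY = "http://localhost:8080"    # hypothetical legacy app
    REWRITE = "http://localhost:9090"   # hypothetical replacement service
    MIGRATED = ("/cart", "/checkout")   # path prefixes already rewritten
    AB_SPLIT = {"/search": 0.10}        # send 10% of /search to the rewrite

    class StranglerProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            backend = LEGACY
            if self.path.startswith(MIGRATED):
                backend = REWRITE
            elif any(self.path.startswith(p) and random.random() < share
                     for p, share in AB_SPLIT.items()):
                backend = REWRITE
            with urlopen(Request(backend + self.path)) as upstream:
                body = upstream.read()
                status = upstream.status
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8000), StranglerProxy).serve_forever()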

------
safek
> Do not ever even attempt a big-bang rewrite

Really? Are there no circumstances under which this would be appropriate? It
seems to me this makes assumptions about the baseline quality of the existing
codebase. Surely sometimes buying a new car makes more sense than trying to
fix up an old one?

~~~
kyberias
Your car buying analogy is flawed. When you buy a new car, someone has built
it for you. It's cost effective because the manufacturer builds a great number
of them. You can be fairly certain that it works and if it doesn't you'll have
a guarantee.

When you rewrite a software system, you do it yourself. You don't know
whether you'll succeed, and you might end up with a worse result. The
assumption here is that no off-the-shelf software can be used to replace it -
hence the rewrite.

~~~
tim333
Also see Spolsky's well-known essay
[https://www.joelonsoftware.com/2000/04/06/things-you-
should-...](https://www.joelonsoftware.com/2000/04/06/things-you-should-never-
do-part-i/)

------
macca321
Another thing you can do is start recording all requests that cause changes
to the system in an event store (a la event sourcing). Once you have this in
place, you can use the event stream to project a new read model (e.g. a new,
coherent database structure).
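
In miniature, that might look something like this (purely a sketch; the
event types, fields, and file-based store are invented):

    import json, time

    EVENT_LOG = "events.jsonl"  # hypothetical append-only event store

    def record_event(event_type, payload):
        """Append one change-causing request to the event store."""
        with open(EVENT_LOG, "a") as f:
            f.write(json.dumps({"ts": time.time(), "type": event_type,
                                "payload": payload}) + "\n")

    def project_accounts():
        """Replay the event stream into a new read model (here: a dict)."""
        accounts = {}
        with open(EVENT_LOG) as f:
            for line in f:
                event = json.loads(line)
                if event["type"] == "account_created":
                    accounts[event["payload"]["id"]] = {"balance": 0}
                elif event["type"] == "payment_received":
                    accounts[event["payload"]["id"]]["balance"] += \
                        event["payload"]["amount"]
        return accounts

    # record_event("account_created", {"id": "a1"})
    # record_event("payment_received", {"id": "a1", "amount": 42})
    # print(project_accounts())  # -> {'a1': {'balance': 42}}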

------
jhgjklj
The biggest problem in improving a legacy codebase is that the people who
have been involved with it for too long keep using old techniques, and as a
new developer you cannot change them - they will change you - which makes it
hard to improve anything.

------
rattray
> Yes, but all this will take too much time!

I'm actually quite curious; how long _does_ this process typically take you?

What are the most relevant factors on which it scales? Messiness of existing
code? Number of modules/LOC? Existing test coverage?

~~~
jacquesm
Good questions.

How long it takes depends on the mandate given by management. Sometimes it's
30 days to get from zero to something stable and incrementally improvable at
which point we hand back to the company with maybe a transition period where
we still manage the project. Sometimes it is just a feasibility study in which
case it can be even shorter. But if it is boots-in-the-mud (which is where the
real money is) then it can be up to a year.

It scales just fine provided you have the people, and that is more often
than not a huge problem. It has happened that we had to leave people in place
for months or even years after the project was in essence done, simply
because as soon as our backs were turned it was back to the usual methods.
That's actually really frustrating when it happens.

Existing test coverage can speed things up, but if the tests are brittle or
otherwise not helpful they can actually make things much worse.

As for number of modules or LOC: if you're doing a platform switch that can
really eat up time, if it is just to bring things under control then it does
not really matter much.

One you did not mention, but which can greatly impact the speed with which
you can move, is the quality of existing documentation. If there is anything
at all, especially up-to-date requirements documentation that can serve as a
tie-breaker between a suspected bug and a feature, it can make a huge
difference.

~~~
rattray
Very interesting, thanks!

------
jscn
Genuinely would like to know how anyone has managed to do both of:

> write as many end-to-end and integration tests as you can

and

> make sure your tests run fast enough to run the full set of tests after
> every commit

------
btbuildem
Thanks for posting, some excellent high-level advice.

------
jefurii
Stick around that startup long enough and this is a good set of things to do
with your own code.

------
jofer
I agree with everything said, but I think they assumed a well-maintained and
highly functional legacy codebase. In my experience, there are a few steps
that come before any of those.

\---

1\. Find out which functionality is still used and which functionality is
critical

Management will always say "all of it". The problem is that what they're aware
of is usually the tip of the iceberg in terms of what functionality is
supported. In most large legacy codebases, you'll have major sections of the
application that have sat unused or disabled for a couple of decades. Find out
what users and management actually think the application does and why they're
looking to resurrect it. The key is to make sure you know what is business
critical functionality vs "nice to have". That may happen to be the portions
of the application that are currently deliberately disabled.

Next, figure out who the users are. Are there any? Do you have any way to
tell? If not, if it's an internal application, find someone who used it in the
past. It's often illuminating to find out what people are actually using the
application for. It may not be the application's original/primary purpose.

\---

2\. Is the project under version control? If not, get something in place
before you change anything.

This one is obvious, but you'd be surprised how often it comes up.
Particularly at large, non-tech companies, it's common for developers to not
use version control. I've inherited multi-million line code bases that did not
use version control at all. I know of several others in the wild at big
corporations. Hopefully you'll never run into these, but if we're talking
about legacy systems, it's important to take a step back.

One other note: If it's under any version control at all, resist the urge to
change what it's under. CVS is rudimentary, but it's functional. SVN is a lot
nicer than people think it is. Hold off on moving things to git/whatever just
because you're more comfortable with it. Whatever history is there is
valuable, and you invariably lose more than you think you will when migrating
to a new version control system. (This isn't to say don't move, it's just to
say put that off until you know the history of the codebase in more detail.)

\---

3\. Is there a clear build and deployment process? If not, set one up.

Once again, hopefully this isn't an issue.

I've seen large projects that did not have a unified build system, just a
scattered mix of shell scripts and isolated makefiles. If there's no way to
build the entire project, it's an immediate pain point. If that's the case,
focus on the build system first, before touching the rest of the codebase.
Even for a project with excellent processes in place, reviewing the build
system in detail is not a bad way to start learning the overall architecture
of the system.

More commonly, deployment is a cumbersome process. Sometimes cumbersome
deployment may be an organizational issue, and not something that has a
technical solution. In that case, make sure you have a painless way to deploy
to an isolated development environment of some sort. Make sure you can run
things in a sandboxed environment. If there are organizational issues around
deploying to a development setup, those are battles you need to fight
immediately.

~~~
scott00
I don't completely understand your warning to stick with the existing version
control environment. Just because you switch development to git doesn't mean
you delete the old CVS archive. Isn't consulting the old archive sufficient
whenever you're doing a significant historical investigation?

~~~
jofer
There are a couple of reasons I'd argue it's best to avoid switching version
control environments early on.

1\. Integration with whatever build/issue tracking systems are present is
worth preserving until you have the time to recreate it properly.

Duplicating what's already there under the new environment is always more
problematic than it looks like at first glance. This is especially true when
you're dealing with any in-house components (which usually manage to show up
somewhere).

2\. A clean break where you leave the old VCS behind and archived is tempting,
but it's rarely ideal in the long-term.

The old archive is likely to wind up being deleted/lost/bitrotted/etc after a
year or two. Invariably, you wind up in a spot a few years down the line where
it would be useful to have the full commit history, and the old VCS winds up
being inaccessible. Ideally, you'd want to preserve as much history as
possible when migrating. However, trying to correctly preserve commit history
(and associated issue tracker info, etc) is always a time-sink, in my
experience. It's easy for simple projects, and a real pain for complex
projects with a weird, long history. Choose the time that you attempt it
wisely.

\---

Again, I'm not saying don't move, I'm just saying that it almost always winds
up taking a lot of time and effort. I'd argue you're better off spending that
time and effort on other portions of the project early on.

Also, things like git-svn can be real lifesavers in some of these cases,
though they do add an extra layer of complexity. If you do want to use a
different VCS, I'd take the git-svn/etc approach until you're sure there are
no extra integration problems.

All that said, yeah, if there's no history and no integration with other
systems/tools, go straight for something modern!

------
crankyadmin
Delete it...

(Speaking from experience from work)

~~~
UXCODE
Has anyone left the legacy code in place and found it beneficial?

In my experience, it was more beneficial to delete the legacy code and
reimplement only the necessary functions when renewing the system.

------
korzun
> Before you make any changes at all write as many end-to-end and integration
> tests as you can.

I don't agree with this. People can't write proper coverage even for a code
base that they "fully understand". You will most likely end up writing tests
for very obvious things or low-hanging fruit; the unknowns will still seep
through at one point or another.

Forget about refactoring code just to comply with your tests while breaking
the rest of the architecture in the process. It will pass your "test" but
will fail in production.

What you should be doing is:

1\. Perform architecture discovery and documentation (helps you with
remembering things).

2\. Look over last N commits/deliverables to understand how things are
integrating with each other. It's very helpful to know how code evolved over
time.

3\. Identify your roadmap and what sort of impact it will have on the legacy
code.

4\. Commit to the roadmap. Understand the scope of the impact of anything
you add/remove. Account for code, integrations, caching, database, and
documentation.

5\. Don't forget about things like jobs and anything that might be pulling
data from your systems.

Identifying what will be changing and adjusting your discovery to accommodate
those changes as you go is a better approach from my point of view.

By the time you reach the development phase that touches 5% of the
architecture, your knowledge of the other 95% of the design will be useless,
and in six months you will have forgotten it anyway.

You don't cut a tree with a knife to break a branch.

~~~
markatkinson
I gasped when I saw this article at the top of HN because of how relevant it
is to my life right now. I am currently working on a real monolithic
jambalaya that suffers from a lack of documentation and architecture, extreme
abstraction, rampant tight coupling, and no previous source control.

Your point on performing architecture discovery and documentation is spot on.
It has really helped me to strip away the mess and understand the flow of the
logic and maybe even shine some light on the parts of code that are valuable.

~~~
xivusr
This article is painfully relevant for me. I just reviewed a code base with
zero tests, no documentation, no inheritance, and rampant duplication.

It's a simple event tracking system, and yet there are 75 models and over 80
controllers. This was outsourced to a team which, coincidentally, appears to
have close to that many devs working there. The good news is that, according
to the client, "it pretty much works". I know better than to suggest a
big-bang rewrite - though it seems so appealing.

Documentation, a code freeze, and end-to-end testing are my next steps.

------
pinaceae
First and foremost, do not assume that everyone who ever worked on the code
before is a bumbling idiot. Assume the opposite.

If it's code that has been running successfully in production for years, be
humble.

Bugfixes, shortcuts, constraints - all are real life, and under pressure they
prevent perfect code and documentation.

The team at Salesforce.com is doing a massive re-platforming right now with
their switch to Lightning. Should provide a few good stories, switching over
millions of paying users, not fucking up billions in revenue.

------
jlebrech
Do the refactoring you should have done at the time, not in the brand new
fangled way of doing things; that way each new way fades into the other.

