
Avoid rewriting a legacy system from scratch by strangling it - ahuth
https://understandlegacycode.com/blog/avoid-rewriting-a-legacy-system-from-scratch-by-strangling-it/
======
d_watt
I had the experience of inheriting a codebase that was halfway through the
process of being “strangled”, and it was a nightmare. The biggest reason being
that it's not a "fail safe" way to plan a project. In this particular case, a
full replacement was probably a 12 month affair, but due to poor execution and
business needs, priorities shifted 6 months in. It was full of compromises. In
some places, instead of replacing an API completely, it would call into the old
system and then decorate the response with some extra info. Auth had to be
duplicated in both layers. Debugging was awful.

While some of the issues could be chalked up to "not doing it right," at the
core of it, the process of strangulation described in the article leaves the
overall architecture in a much more confusing state for the lifetime of the
project, and if you have to shift, you've created vastly more tech debt than
you had with the original service, as you now have a distributed systems
problem. Unless you can execute on it quickly, I think it's a very dangerous
way to fix tech debt, avoiding fixing the core issues, and instead planning
for a happy path where you can just replace everything.

If you absolutely think you need to quarantine the existing code, I'd
recommend putting a dedicated proxy in place that routes either to the old
service or the new service, and not mixing the proxy and the new code. That
separation of concerns makes it much easier to debug, and vastly reduces the
likelihood of creating a system of distributed spaghetti. What I’d really
recommend, though, is understanding the core codebase that powers the
business, and make iterative improvements there, rather than throwing it all
out.
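To make the dedicated-proxy idea concrete, here is a minimal sketch in Python using only the standard library. The hostnames, ports, and migrated path prefixes are illustrative assumptions, not details from anyone's actual system; the point is that the proxy holds routing decisions and nothing else.

```python
# A minimal strangler proxy: routes to old or new service, no business logic.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Hypothetical backends and migration state.
NEW_SERVICE = "http://new-service.internal:8080"
OLD_SERVICE = "http://legacy.internal:8080"
MIGRATED_PREFIXES = ("/billing", "/reports")  # paths already rewritten

def pick_backend(path: str) -> str:
    """Pure routing decision: migrated paths go to the new service,
    everything else falls through to the legacy system."""
    if path.startswith(MIGRATED_PREFIXES):
        return NEW_SERVICE
    return OLD_SERVICE

class StranglerProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = pick_backend(self.path)
        with urlopen(Request(backend + self.path)) as resp:
            body = resp.read()
        self.send_response(resp.status)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8000), StranglerProxy).serve_forever()
```

Because the routing table is the proxy's only state, debugging "which system handled this request" is a one-line lookup rather than a spelunking expedition through mixed code.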

~~~
james_s_tayler
So if strangling is out and big-bang rewrites are out...

What do we do?

~~~
d_watt
In my experience, most of the time the right decision for the business is
investing the time into the core product. If you have 6 months (as stated in
the article), what could you do with that time if you could dedicate it to
improve the core codebase, rather than a rewrite? The worst-case scenario is that after
6 months you don't have a perfect codebase, but you've made it better, and you
don't have yet another layer to deal with.

Certainly, there are times where that's not reasonable. Maybe the core
codebase is on a proprietary technology. Maybe it's built around an EoL
framework. Maybe it's built on a FOTM (flavor-of-the-month) language that you can't hire for at
all anymore.

In those cases, which are rarer than developers like to admit, I think a
piecemeal migration makes sense. In my experience, it's better to do it by
altering the consumers: whether it's a frontend that can point to another
endpoint, an API gateway that can switch out the service it points to, or
a shim layer (proxy) that serves that purpose.
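One way to sketch the "alter the consumers" approach in Python: the consumer resolves which backend serves each capability from a small config, so cutover becomes a config change rather than a code change. All capability names and URLs here are hypothetical.

```python
# Consumer-side backend switching: migration state lives in config.
ROUTES = {
    "orders":  "https://new-api.example.com/orders",   # already migrated
    "returns": "https://old-api.example.com/returns",  # still on legacy
}

def endpoint_for(capability: str) -> str:
    """Resolve the backend URL for a capability; fail loudly on unknowns
    so a typo doesn't silently route traffic to the wrong system."""
    try:
        return ROUTES[capability]
    except KeyError:
        raise ValueError(f"unknown capability: {capability}")
```

In practice this table might live in a config service or environment variables, but the shape is the same: flipping one entry moves a capability to the new system, and flipping it back is the rollback plan.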

Having a service that is both the proxy and the new application is a poor
separation of concerns, very difficult to reason about, and makes it too easy
to intermingle the logic between the old and the new. In my experience.

~~~
bdamm
Your guidance is true. However, our industry is unfortunately also littered
with well-intentioned refactorings that resulted in many more bugs, feature
changes, and customer impacts than the original developer intended, or even
conceptualized as possible. So it matters a lot.

For example, refactoring "to provide abstraction so future work is easier" is
90% of the time an error.

~~~
cjfd
At some point it starts to sound a bit like 'whatever you do, don't touch a
keyboard'. Rewriting everything from scratch? Very risky, please don't.
Strangling you old application? Some comments here tell us that we should not.
Perhaps they are right. Refactor your current application? 90% of the time it
is an error.

~~~
SirSavary
Had the pleasure of working at a company like this. Don't rewrite, don't
refactor, and a proxy? How do you even spell that?

The end result was me losing my "mojo" after a year of producing effectively
nothing. I'm back together now but my god it's a terrifying feeling when
you've been programming for over ten years and one day you can't make the code
flow out of your hands anymore.

------
ineedasername
My recent experience with an ERP, specifically some major bolt-on modules, was
that the vendor simply made the switch to a new platform that had maybe 60% of
the capabilities. A roadmap (which has actually been fairly accurate) showed
about 3 years to get to 90%.

New customers were pushed to the new product. Existing ones were encouraged to
do so and temporarily live without prior features (usually with temp workers
doing things manually) for a deep discount. Those who had to stay with the
legacy system were told to expect nothing but bug fixes and compliance-related
updates (for federal programs and reporting requirements) and that if they
needed something more than that, they'd either need to build their own bolt-on
(there was a robust, if clunky, SDK) or pay contractors to do so.

It sucked, yeah, but it seemed like a reasonable way to go about such a
transition that was always going to make people unhappy.

~~~
iamaelephant
This is more or less the model that Basecamp uses with their rewrites. New
product with new features and a strong encouragement to come along, but
guaranteed support if you can't.

------
wiradikusuma
I'm in the middle of a rewrite. It's very challenging, but the alternative is
worse (a sinking ship). My lessons learned:

    
    
      1. Do it sooner
      2. Get full commitment from stakeholders
      3. Agree on a feature freeze
      4. Get it done quickly
      5. Don't over-promise, especially about the timeline
      6. Focus on delivering big/important items first (MVP)
      7. Appoint a benevolent dictator rather than a committee, to avoid second-system syndrome
      8. Have test scenarios ready (black box)
    

Unfortunately they all depend on one another, e.g. the longer you wait to
rewrite, the harder it will be to finish (feature creep).

I will write a blog post when it's done successfully; otherwise I will hide
under a rock.

------
layer8
Of course, that approach is difficult to apply if the interface is a
significant part of, or deeply entangled with, the pain points that the
rewrite is intended to solve.

~~~
marczellm
It is also difficult if there's an ill-defined interface that exposes
implementation details, or no interface at all.

It is also difficult to apply if we are not talking about a server/client app
but a desktop app being rewritten in a different language or an incompatible
GUI toolkit.

~~~
fatnoah
>It is also difficult if there's an ill-defined interface that exposes
implementation details, or no interface at all.

I've successfully strangled a large codebase that had these issues, though we
did have the benefit of a client/server application so there was a place to
actually define interfaces.

We started in the middle by creating a logical service layer to group all the
bits of like functionality. We left the implementations alone, just moved them
to align with the new "service" layer. We slowly worked our way up the stack,
including defining a new client API, and then changed the existing API methods
to be a shim on top of the new methods.

We were then able to update client code to use the new interface, but the old
ones stuck around for about 24 months while we sunsetted older clients. The
actual strangulation took about 2-3x as long as a stop-and-rewrite effort, but
there were VERY few regressions because we were still in a constant test and
release cycle and managed the scope of strangulation changes in each release
AND all of our testing was still valid since we weren't changing input/outputs
or any expected behaviors.
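The shim step described above could look roughly like the following in Python. All class and method names are hypothetical stand-ins: the essential move is that the legacy entry points keep their exact signatures and response shapes while delegating to the new service layer.

```python
# New service layer: groups the "like functionality" in one place.
class AccountService:
    def fetch_account(self, account_id: int) -> dict:
        # Imagine the real (moved, not rewritten) implementation here.
        return {"id": account_id, "status": "active"}

_service = AccountService()

def get_account_legacy(account_id):
    """Old API method, now a shim: same signature and same response
    shape as before, but the real work happens in the new layer."""
    account = _service.fetch_account(int(account_id))
    # Preserve the legacy field names so existing clients keep working
    # and existing tests stay valid.
    return {"accountId": account["id"], "accountStatus": account["status"]}
```

Because inputs and outputs are unchanged, the old test suite keeps exercising the new code path for free, which is exactly the regression safety net described above.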

------
aargh_aargh
The comma, present in the article and missing in HN, completely changes the
meaning of the title.

~~~
kspacewalk2
What is the alternative meaning? To me, the comma changes nothing. It's
unambiguous either way.

~~~
ummonk
It can also be read as "avoid (rewriting a legacy system from scratch by
strangling it)"

------
rusticpenn
We did it very differently in our group:

1. The developers of the old tool continued to work on it.

2. A new team took requirements from the old team and filtered them to make them more meaningful.

3. Designed a system architecture that would work with the targeted workflow.

4. Designed a minimal version and ran it with new branding next to the old one.

5. Reached feature parity with the old one and dumped it.

The important thing to note is that the new tool does not do everything the
old tool does. The workflow is also different from the old one. However the
customers loved the new one as it was simpler, faster and more robust to use.

------
noobermin
Am I the only one in the IT adjacent world who thinks the inverse of this is
the larger problem (churn, NIH, reinventing the wheel) in software today?

~~~
dano
No, you are not the only one. The quest for the new shiny thing is stronger
than ever today. New frameworks, new languages, silver bullets everywhere.
Good decision making frameworks are in tremendous need in the technology world
to help everyone understand the ramifications of the choices they're trying to
make.

~~~
charlieflowers
> Good decision making frameworks are in tremendous need in the technology

Can you/anyone recommend any?

------
mannykannot
One thing that complicates matters somewhat (as if they were not already
complicated) is at the decision point marked _isRoundtrip?_ in the fourth
(penultimate) diagram, where the affirmative case is handled within the new
system.

Given, however, what is being posited -- a legacy system that is not modular
and which contains unrefactorable pathological dependencies -- the old system
must also handle this case in parallel, in order to be in the correct state to
handle future requests of a type that still need to be delegated to the old
system.

This parallel implementation may have to persist well into the replacement
process, and the requirement for it to do so may mean that you still have to
do double implementation of features and fixes for most of the transition.

~~~
SirSavary
Requiring the legacy system to handle the request in parallel is exactly what
this method is trying to avoid.

If your old system has dependencies that you don't understand, I don't see the
strangulation method working at all.

------
kazinator
Fantasy:

> Here’s the plan:

> Have the new code act as a proxy for the old code. Users use the new
> system, but it just redirects to the old one.

> Re-implement each behavior to the new codebase, with no change from the end-
> user perspective. Progressively fade away the old code by making users
> consume the new behavior. Delete the old, unused code.

Here is the reality:

1. People do the above incompletely; their deletion of the old system slows
down and then they move on to another project or organization, leaving a
situation in which 7% of the old system still remains.

2. People iterate on the entire process above, ending up with multiple
generations of systems, which still have bits of all their predecessors in
them.

------
khendron
I think an overlooked aspect of a legacy system that makes "strangling"
difficult is that nobody fully understands the behaviours of the system
anymore.

It is really hard to replace the functionality of a piece of code when you
don't know 100% what that functionality is.

~~~
ahuth
This is a good point.

I'm working on moving some functionality out of a system - not replacing the
system. And it's still extremely challenging to actually figure out everything
that's going on with just the thing I'm moving out.

------
myth_drannon
I see it working for backend code; legacy UI systems have way more coupling,
so it would be better to do a complete rewrite. If you have a legacy framework
A and you start replacing it with framework B, component by component, the new
code will have to follow the practices of framework A, and basically you are
going to be writing legacy-style code in the new framework B, which is much
worse than having legacy framework A, because framework B is now written in a
completely alien way and not how it was intended to be used.

------
pflanze
I have written a set of libraries and dev tools (like a better REPL) for Perl
(the FunctionalPerl project) with the idea of helping write better code in
that language, and of giving me, and whoever joins in such efforts, a way to
hopefully save a legacy code base. Maybe when a company reaches the point
where they feel their code base has become unmaintainable, it can still be
saved using the tools and programming approaches that I can provide. That
(other than, and more than just, "because I can") is the major motivation
for investing in the project.

But I wonder how much it will help. I haven't had the chance to try it out so
far. I've gotten to know companies that have begun to split up Perl
applications into microservices and then move the individual services to other
languages, and they don't necessarily have an interest in my approach.

I'm also very diffident about reaching out to more companies, due to worrying
about how much pain it would be to deal with (and how likely it would be to
fail); investing my time into newer tech (Haskell, Rust, etc.) instead looks
tempting in comparison. Should I continue to reach out to companies to find
the right case (presumably working as a contractor, with some big bonus if
successful)? Any insights?

------
Cthulhu_
I'm dealing with a rewrite at the moment (that is, I was hired to start
rewriting an existing web application). I want to apply this pattern but the
existing codebase was already dated by the time it was written. It's a huge
load of mixed responsibilities, globals (it's a PHP backend), RPC-like http
API (every request is a POST containing an entity name, action, parameter, and
additional parameters handled in a big switch), etc. Files of 13K lines of
code.

So far I'm stuck in the overthinking phase of the new application. And as the
article states, I'm asked to keep adding new features to the existing
application - nothing big (because individual things aren't big), but at the
same time, I've been adding a REST API on top of the existing codebase for the
past few weeks. It's satisfying in a way but it hurts every time I have to
interact with the existing codebase and figure out what it's doing.

Plus we're not going to get rid of the existing application at this rate. I
should probably set myself limits - that is, I'll postpone and refuse work on
the existing application if it's not super critical. And quit if they're not
committed to the rewrite before the summer.

------
jillesvangurp
Strangling is a good way to slowly replace a system by simply starting to work
around it until whatever value it adds is so diminished you can safely pull
the plug.

Big software rewrites are extremely risky because they inevitably take more
time than people are able to estimate, and the outcome is not always
guaranteed.

An evolutionary approach is better because it allows you to focus on more
realistic short term goals and it allows you to adapt based on priorities.
Strangling is essentially evolutionary and much less risky. It boils down to
basically deciding to work around rather than patch up software and minimize
further investment in the old software.

Also, there are some good software patterns out there for doing it responsibly
(e.g. introducing proxies and then gradually replacing the proxy with an
alternate solution).
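One such pattern, sketched in Python with illustrative numbers: route a deterministic slice of traffic through the proxy to the new implementation and grow the slice as confidence grows. The percentage knob and hashing scheme here are assumptions for the sketch, not a prescription.

```python
import hashlib

NEW_TRAFFIC_PERCENT = 10  # grow this as the new system proves itself

def use_new_system(user_id: str) -> bool:
    """Deterministic per-user bucketing: hash the user into one of 100
    buckets, so a given user always lands on the same side and sessions
    don't flap between old and new mid-flight."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < NEW_TRAFFIC_PERCENT
```

Raising `NEW_TRAFFIC_PERCENT` to 100 completes the migration; dropping it to 0 is the rollback. Either way, it's a one-line change in the proxy, not a deploy of the old system.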

------
ncmncm
I did a rewrite.

The old code worked, but was slow. Adding features would make it slower. Lock-
free queues and threads everywhere, packet buffers bouncing from input queues
to delivery queues to free queues to free lists, threads manfully shuttling
them around, with a bit of actual work done at one stage.

Replaced it all with one big-ass ring buffer and one writer process per NIC
interface. Readers in separate processes map and watch the ring buffer, and
can be killed and started anytime. Packets are all processed in place, not
copied, not freed, just overwritten in due time.

It took a few months. Now a single 2U server and a disk array captures all New
York and Chicago market activity (commodity futures excepted).

I kept the part that did the little work, scrapped the rest.

C++, mmap, hugepages FTW.
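For readers unfamiliar with the shape of this design, here is a toy in-memory sketch of the single-writer ring buffer idea in Python (the real system above used C++, mmap, and huge pages; the slot size and capacity here are illustrative). Records are written in place and simply overwritten when the ring wraps: never freed, never copied between queues.

```python
SLOT_SIZE = 64  # bytes per record slot (illustrative)
NUM_SLOTS = 8   # ring capacity (illustrative)

class RingBuffer:
    def __init__(self):
        self.buf = bytearray(SLOT_SIZE * NUM_SLOTS)
        self.seq = 0  # monotonically increasing write sequence

    def write(self, payload: bytes) -> int:
        """Single writer: claim the next slot, overwrite it in place,
        and return its sequence number for readers."""
        assert len(payload) <= SLOT_SIZE
        slot = self.seq % NUM_SLOTS
        start = slot * SLOT_SIZE
        self.buf[start:start + len(payload)] = payload
        self.seq += 1
        return self.seq - 1

    def read(self, seq: int) -> bytes:
        """Readers watch the buffer and read by sequence number; anything
        older than NUM_SLOTS writes has already been overwritten."""
        if seq < self.seq - NUM_SLOTS:
            raise LookupError("slot overwritten")
        start = (seq % NUM_SLOTS) * SLOT_SIZE
        return bytes(self.buf[start:start + SLOT_SIZE])
```

In the real version the buffer lives in a shared mmap so readers are separate processes that can be killed and restarted at any time; the writer never waits for them.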

------
corneliusphi2
Having successfully replaced a legacy system once, we got it to work by
turning the legacy system's business logic into a library that the new system
could use. The key is to replace just the underlying architecture without
reimplementing years of work.
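A minimal sketch of that library approach, with entirely hypothetical names and business rules: the legacy logic is extracted into an importable function and the new system delegates to it rather than re-deriving years of edge cases.

```python
# Stand-in for the extracted legacy library: imagine years of accreted
# business rules moved verbatim out of the old system.
def legacy_discount(order_total: float, customer_years: int) -> float:
    rate = 0.05 if customer_years >= 5 else 0.02
    return round(order_total * rate, 2)

# New system: new architecture, but the domain rules are delegated
# to the legacy library instead of being reimplemented.
def price_order(order_total: float, customer_years: int) -> float:
    return round(order_total - legacy_discount(order_total, customer_years), 2)
```

The new code gets a modern structure around the edges while the battle-tested rules keep producing exactly the answers customers already rely on.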

------
d--b
What the article describes is a rewrite! In the end there will be no more
legacy code left...

What the article is saying is: don’t rewrite your code in one go, but rather
cut the system in pieces that are independent and rewrite each in successive
phases.

It’s kind of obvious, though. And the difficult part of the rewrite is
actually slicing the original code into independent chunks. More often than
not, legacy systems are riddled with leaky abstractions and dependencies (the
infamous spaghetti code), and that’s hell to disentangle.

------
sivanmz
Often, the clients of legacy code are old too, and are hard coded to access
it.

I've done this, but on a private branch, with a single merge to trunk in the
end. Starting with complex integration tests, new interfaces were gradually
defined and made the code testable, giving me the needed confidence.

------
shujito
So, how can this be applied to mobile app development? I can think of adding
dependencies and new code alongside the old code in the app, but that will
cause considerable bloat in the app's size, which can be noticeable to
management, unlike with web services/sites/apps.

~~~
inanutshellus
Not to mention legacy thick apps! In my case, legacy thick-apps we don't have
the source code for! Arg!

------
rbosinger
All of this gets harder if it's your data model that is the problem. So, get
those data models right early if you can!

~~~
glacials
Easier said than done -- data models frequently evolve in hard-to-predict
ways. Instead, build your data model in a way that is easily pliable, and
won't need a complete refactor because of something simple like a hot key or a
new index.

------
smadge
Does the strangler have to be a separate server? Couldn't you wrap the
existing code within the same binary?

------
p0nce
Does this happen in practice or the old product is just replaced by a newer
_competitor_?

~~~
uncletaco
Ignoring consumer products for a moment: we've done this at my company for an
internal app.

Our legacy system was built as a desktop app for internal use that became
difficult to both scale and keep compliant with our regulatory obligations, so
we began building out an API around its core business functionality and built
various front ends throughout our company to speak with it.

It has been a middling success, mostly because change requires political
capital that might not be there six months after initiating it. However, I
think overall we've improved the product, and I don't think a massive rewrite
would've gone nearly as far, due to political winds shifting and the rewrite
getting deemed a waste of time by the new powers that be.

~~~
p0nce
It's also my experience that political "will" for change lasts about six
months...

------
utxaa
but how come the linux and BSD kernels, emacs (since RMS), the java language,
even python (python3 was not a rewrite), git, hg, django, etc ... have never
been rewritten from scratch?

what is the lesson here?

------
gregknicholson
> After 7 months, you start testing the new version.

Translation: after 7 months you stop mucking about and start trying to produce
something useful.

------
utxaa
i thought this was an article about not rewriting a legacy system.

------
benignslime
[https://martinfowler.com/bliki/StranglerFigApplication.html](https://martinfowler.com/bliki/StranglerFigApplication.html)

If you wanted to read the referenced article. This was the first thing I
thought of. I appreciate Fowler's writing style and his sourcing. He always
links some interesting stuff.

