Avoid rewriting a legacy system from scratch by strangling it (understandlegacycode.com)
397 points by ahuth 40 days ago | 121 comments

I had the experience of inheriting a codebase that was halfway through the process of being “strangled”, and it was a nightmare. The biggest reason is that it's not a "fail-safe" way to plan a project. In this particular case, a full replacement was probably a 12-month affair, but due to poor execution and business needs, priorities shifted 6 months in. It was full of compromises. In some places, instead of replacing an API completely, the new layer would call into the old system and then decorate the response with some extra info. Auth had to be duplicated in both layers. Debugging was awful.

While some of the issues could be chalked up to "not doing it right," at the core of it, the process of strangulation described in the article leaves the overall architecture in a much more confusing state for the lifetime of the project, and if you have to shift, you've created vastly more tech debt than you had with the original service, as you now have a distributed systems problem. Unless you can execute on it quickly, I think it's a very dangerous way to fix tech debt: it avoids fixing the core issues and instead plans for a happy path where you can just replace everything.

If you absolutely think you need to quarantine the existing code, I'd recommend putting a dedicated proxy in place that routes either to the old service or the new service, and not mixing the proxy and the new code. That separation of concerns makes it much easier to debug, and vastly reduces the likelihood of creating a system of distributed spaghetti. What I'd really recommend, though, is understanding the core codebase that powers the business, and making iterative improvements there, rather than throwing it all out.
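A minimal sketch of that dedicated-proxy idea in Python. The path prefixes and backend hosts are hypothetical, and in practice you'd more likely express this as config for an off-the-shelf reverse proxy or API gateway than as hand-rolled code; the point is only that the routing decision lives in one dumb place:

```python
# Hypothetical routing table for a dedicated strangler proxy. The proxy's
# only job is deciding which backend serves a request; it contains no
# business logic of its own.
MIGRATED_PREFIXES = ["/api/v2/orders", "/api/v2/users"]  # assumed paths

NEW_BACKEND = "http://new-service:8080"     # hypothetical hosts
OLD_BACKEND = "http://legacy-service:8080"

def route(path: str) -> str:
    """Return the backend origin that should handle this request path."""
    if any(path.startswith(prefix) for prefix in MIGRATED_PREFIXES):
        return NEW_BACKEND
    return OLD_BACKEND
```

The useful property is that if business priorities shift mid-migration, you can freeze or shrink the migrated-prefix list without touching either codebase, which is exactly the "fail safe" the mixed proxy-plus-new-code approach lacks.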

This is the case when you get involved in any mid-refactor system or codebase. It has to evolve to its destination, and that always happens over time, bit by bit. Quitting in the middle of a murder isn't likely to give you the trunk full of insurance money, nor the piña coladas on the beach living happily ever after with the victim. These projects, once started, are best seen through, else you end up in a worse position than you started.

I agree, but to me that's why it's dangerous to think about refactoring the whole codebase at once, whether strangler or rewrite. Project design is just like software design: you need to design it for failure scenarios, e.g. business priorities shifting, poor execution, etc., without leaving the overall situation in a worse state than when you found it.

If you design a long lived code cleanup project where the only option is "we can't stop this project for another 6 months or we'll have an even bigger mess on our hands," that's poor project design. In this way, rewrites are almost safer. They're usually "atomic" and you can "roll back" by throwing it out if you can't focus on them.

All this is based on dynamic business and product needs that shift every couple of quarters, and a technical team that's gaining new people and shedding more experienced folks every couple of years. That seems to be a pretty standard expectation to me, though I'll admit I'm mostly looking at this through the lens of early, mid, and late stage startups. It may be that big companies that move slowly and can guarantee resources for a larger project for a long time are a different story.

I am in a similar position. We started the strangling process 2 years ago, but due to management not wanting to disrupt old clients by shifting them to the new code (same stuff, updated UI), the strangling strategy has basically shifted into us maintaining two copies of many features. Success!

That's the hardest part of any strangulation. The process cannot involve any new features. If it does, you will never transition everything.

If you're maintaining two copies of the same features, then, by definition, you're not exactly following the process described? The process was developed to avoid exactly that.

So if strangling is out and big-bang rewrites are out...

What do we do?

In my experience, most of the time the right decision for the business is investing the time into the core product. If you have 6 months (as stated in the article), what could you do with that time if you could dedicate it to improving the core codebase, rather than a rewrite? The worst-case scenario is that after 6 months you don't have a perfect codebase, but you've made it better, and you don't have yet another layer to deal with.

Certainly, there are times when that's not reasonable. Maybe the core codebase is on a proprietary technology. Maybe it's built around an EoL framework. Maybe it's built on a FOTM (flavor-of-the-month) language that you can't hire for at all anymore.

In those cases, which are rarer than developers like to admit, I think a piecemeal migration makes sense. In my experience, it's better to do it by altering the consumers: a frontend that can point to another endpoint, an API gateway that can switch out the service it's pointed at, or a shim layer (proxy) that serves that purpose.

Having a service that is both the proxy and the new application is a poor separation of concerns, very difficult to reason about, and makes it too easy to intermingle the logic between the old and the new. In my experience.

Your guidance is true. However, our industry is unfortunately also littered with well-intended refactorings that resulted in many more bugs, feature changes, and customer impacts than the original developer intended, or even conceptualized as possible. So it matters a lot.

For example, refactoring "to provide abstraction so future work is easier" is 90% of the time an error.

At some point it starts to sound a bit like 'whatever you do, don't touch a keyboard'. Rewriting everything from scratch? Very risky, please don't. Strangling your old application? Some comments here tell us that we should not. Perhaps they are right. Refactor your current application? 90% of the time it is an error.

Had the pleasure of working at a company like this. Don't rewrite, don't refactor, and a proxy? How do you even spell that?

The end result was me losing my "mojo" after a year of producing effectively nothing. I'm back together now but my god it's a terrifying feeling when you've been programming for over ten years and one day you can't make the code flow out of your hands anymore.

> Having a service that is both the proxy and the new application is a poor separation of concerns

I must have missed that in the first read. Yes, do not do this.

If anything survives the hype cycle for microservices, I hope it’s this: if your company isn’t surviving more than a couple of technology cycles, especially when they seem to be getting faster, then what are you doing this for?

Stop trying to replace one system with another one. Yours will be old and busted someday too. And certainly don’t let one system be the gateway to the other. Put them both behind a thin layer that handles only a couple of concerns (say, auth, making sure all requests have a correlation ID, perf statistics, maybe pick 2).
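As a sketch of what one of that thin layer's "couple of concerns" might look like, here is a hypothetical correlation-ID step in Python. The header name is an assumption (any consistently used name works); the idea is just that the shared layer guarantees every request is traceable through both the old and new systems:

```python
import uuid

# Hypothetical header name; pick one and use it consistently.
CORRELATION_HEADER = "X-Correlation-ID"

def with_correlation_id(headers):
    """Return a copy of the request headers guaranteed to carry a
    correlation ID: keep the client's if present, otherwise mint one.
    Both the old and new systems behind the layer can then log it."""
    out = dict(headers)
    if CORRELATION_HEADER not in out:
        out[CORRELATION_HEADER] = str(uuid.uuid4())
    return out
```

Keeping the layer down to concerns like this one is the design choice being argued for: it stays useful no matter which of the two systems eventually wins, or whether a third one shows up.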

What we are both suggesting, I have still always called the strangler pattern. It just has three actors instead of two.

> What we are both suggesting, I have still always called the strangler pattern. It just has three actors instead of two.

That makes sense. In practice, I've seen multiple situations where it was one "monolith" directly in front of another, and rather than a strangler, it was a tumor that ended up with things intertwined. Treating it like a service-oriented architecture is a good way to think about it.

You still end up in situations where logic needs to be shared. Do you duplicate auth? What happens when a new user signs up, do you sync users over? Do they read from the same database(s)? I'll also agree with the over-hypedness of microservices, but I think breaking out services and "rewriting" them is the way to achieve the overall goal. Even in the strangler pattern, you'll end up needing to share functionality, and either you have monolithic services calling into each other, or a service-oriented architecture.

First big system I worked on, we did authentication at the border, setting a header (which we stripped off of any incoming requests, of course).
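That border-auth scheme can be sketched roughly like this (Python; the header name is hypothetical): strip the identity header from anything inbound, so a client can never spoof it, then set it only from the border's own authentication result:

```python
# Hypothetical trusted-identity header set only by the border layer.
TRUSTED_HEADER = "X-Authenticated-User"

def authenticate_at_border(headers, authenticated_user):
    """Drop any client-supplied identity header (it can't be trusted),
    then set it from the border's own authentication result, if any.
    Services behind the border simply read the header and never
    re-authenticate."""
    out = {k: v for k, v in headers.items() if k != TRUSTED_HEADER}
    if authenticated_user is not None:
        out[TRUSTED_HEADER] = authenticated_user
    return out
```

Stripping before setting is the crucial step; forgetting it turns the convenience header into an authentication bypass.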

Authorization gets duplicated, but that’s going to happen at some point anyway (and how many times does it get duplicated in a microservice architecture?)

I’ve seen the tumor thing too. A few of the most egregious services (memory hungry, least coherent to the whole, etc) get split off to be independent, but lots of things do not. And sometimes the old team sheds people due to it not being the cool thing anymore and then you’re really fucked.

[edit to add] I have developed some very serious resistance to the 'low hanging fruit' model of development. I think it is the direct cause of the Lava Flow Antipattern.

If the point is to completely remove a problem from the system, starting with the most popular one and working toward the more boring ones means that at some point you will lose the argument in a prioritization meeting, the lava will cool and you'll be stuck with both. I would expect a higher likelihood of success if you start at one 'end' of the system and guarantee that high priority examples are evenly distributed throughout the effort.

That’s not the worst case scenario. The worst case scenario is that the rebuilt system is worse and full of a bunch of new weird bugs. There is no guarantee that a rewrite ends up better.

Probably the same thing a doctor does when a patient ignores a pain and they have to tell them it's stage 4 cancer: Provide palliative care and your condolences.

At some point you have to ask how the company let things get this bad, and what they've done concretely to avoid it happening again. And whether you really want to participate in the heroic levels of effort it's going to take to keep the patient alive in 4 years.

Louis C.K has a joke about how his ankle was bothering him and the doctor told him, "Well, it does that now". That's it? "It does that now?!" Monologue monologue. The first time I laughed along with his joke. On subsequent viewings I said, "Hold on..."

Louis is a seriously sedentary guy. He's been letting things go even more than I have, which is saying something. But I do exercise. If I get to pick the feat of strength, I could make the leaderboard. When I bunged up my hip I got PT, and I went, and I could do the exercises. I didn't do all of them though, so it still bugs me from time to time (also on the last day the PT did something to my good knee, which is still bothering me a year later). If I were in proper athlete shape, they might offer more, because my prognosis for recovery from something more aggressive would be high.

Nobody is going to look at Louis, or listen to Louis, and offer that. Louis is not going to do any of that shit. He's just going to crack jokes about it, or about you... as is amply evidenced by that set. So yeah, your ankle does that now, sport. (BTW, my ankle is fucked up too, and I do the exercises he pantomimed. They do actually help quite a bit, which he would know if he had tried them even twice)

And yet we do it at work all the time. And I am seriously beginning to wonder what would happen if we just triaged these situations and walked away from some of them. If we just let the saner competitor win.

This is a nice fantasy, but in practice, you walk away from one dumpster fire and either 1. walk into a new one or 2. write another one. I've done both multiple times. It's inevitable, really, when "legacy" means "code I didn't write", and especially with the evergreen groups of developers we have now (5-year "seniors"). It's amazing to me how otherwise competent, well-intentioned teams can use best practices and still create balls of spaghetti.

I’ve seen enough people who are in love with their excuses to know that some places are just going to have legacy code. If I stay there, there is some other legacy code lover who is going to take a job bothering another team that would have been happy to have me.

Nothing is perfect anywhere, sure, but there are degrees.

You get the spaghetti when you take your eye off the ball. When you start making excuses for why we are going to do dumb things just “for now”. And by the time you notice the three year old comments that say “this is temporary”, well, your processes are set. This is the culture, and if you make a stink, people are thinking more about “why now” than “why not”.

The only way I know to get out of that is starting the conversation early about how we can do the things we always do faster and more accurately, and how we can do things we've never tried before. But that tree takes a while to bear fruit. And it requires coworkers who are willing to try something new, because sure as shit nobody has shown them how to do this before.

I'm beginning to think there are no best practices, only best executions.

"I'm beginning to think there are no best practices, only best excuses."

I read it like that at first. Think about it, it's also very true.

'Best practices' is to developers as belly rubs are to alligators. There are a few of these phrases that seem to trigger some sort of docile reaction out of us.

We should probably get that checked out.

Agreed. I've noticed the exact same thing with the phrases "increases productivity" and "increases maintainability"

Because what I've also noticed is true with respect to both of those phrases is...

It. Never. Actually. Does.

It's. Not. Inevitable. I'm sorry that that has been your experience, but it doesn't have to be this way.

As I implied above, I do think there is a point of no return, though. I didn't always feel that way, but you can only be accused of "tilting at windmills" so often before you re-evaluate. I don't think I have gotten properly intimidated by a piece of code since year 2 of my career. I had never encountered a knot I could not untangle (and harbored a feeling that Alexander the Great was a coward for using his sword). But now I have encountered knots so tangled that the most pessimistic estimates for a rewrite (but only with all new people) would be faster, and I can now sympathize with Alexander.

There are only so many people you get to teach in your lifetime, and only so many projects. If you only take the easy ones you never grow and you will reach fewer people. But if you take every hard luck case you also won't get far either.

A working relationship can be broken the way a married couple can be wrong for each other. Sometimes divorce is the correct course of action.

It's not enough to be right, you have to be productive too. And sometimes you only get both if you move toward people who are closer to agreeing with you.

Management: Guys... Production's on fire!?

Devs: Oh yeah... It does that now.


Literally my life right now.

I'm in the interesting position of having just walked into a situation where the patient was already dead, and organs had already been harvested from the old company. Rightfully angry customers hadn't had access to the legacy system (its costs killed the company) in several months.

In the words of Monty Python, it was a dead parrot, and I came in to say "It's just resting... It's pining for the fjords!"

I'm almost a month in on a complete re-architect from scratch. Most customers are overjoyed that the business will be back soon, but I'm afraid their good will is going to be short lived if things aren't running soon.

Is there no way to resurrect the old system using the pattern described in the article?

What's wrong with constant ongoing evolution? Software is the ultimate metaphysical playdough. It can always be changed. Piece by piece. Step by step. Until it becomes something different. It's just... not exciting to do it that way.

Evolution will not let you escape local minima: intrinsic, sprawling design decisions that cannot be evolved away from.

I am sitting on a codebase whose oldest line of code is 20+ years old; it has evolved successfully in that time such that the product it was 20 years ago and today's product are unrecognizable from one another. Its database schema is even older, encoding decisions made 30 years ago. Reasonable decisions at the time, but no longer reasonable.

I am working on a major rewrite which will take a year. The nature of the changes required means I cannot break it up into pieces and do it bit by bit, as I have been doing for the last couple of decades with major enhancements. A functioning product is all or nothing. As someone who is anti-rewrite, pro-evolve, and accustomed to working on old codebases, and with this being my own business so it's my own money on the line, the decision to embark on a 12-month rewrite is not taken lightly.

Can you please do a blog post at the end of the project to let us know how it pans out as a case study?

I think my original comment mischaracterized my project.

It is a refactor, but a refactor that is going to take about a year to complete, where there will be nothing to show for it until it is finished. Of the old codebase, probably about 20% to 30% will be ripped out and replaced.

It's not my first major project in terms of keeping this codebase capable. I previously wrote compilers/runtimes and IDE modules to keep it going: to gain control of a codebase which was written in a commercial/proprietary dev environment and move it into something I have full control over. That effort took a year.

The difference in this case is that I am deliberately, intentionally throwing code away, a lot of it, for the first time and contrary to my instincts. Still, a post-mortem might be interesting. My successful effort to code my way out of a proprietary dev environment was an interesting and risky project, discussed at length with the relevant dev communities at the time, and probably worth writing up one day.

Can you tell us what area this is? Is it a backend service, or something like software that runs on consumers' PCs?

Rich client / consumer PC. SQL backend (originally a proprietary DB), fairly tight coupling between UX and model, old-school rich-client event loop. An unhealthy amount of global state (what the original programming language encouraged).

I think evolution can escape a local minimum; however, I would imagine that most organizations aren't able to support the individuals who can lead an evolution toward a more long-range goal, one that would require transiting the natural boundaries of a minimum that an undirected evolution would naturally transit.

With regard to your own project, I find it somewhat hard to believe that you can rewrite a 20-year-old codebase in just one year's time. Assuming an output of 50 LOC a day, and 250 work days a year, that means the system you are replacing is under 15k LOC.

50 LOC a day seems pretty pessimistic when you have an existing system to work from, and 20 years experience with the domain!

As you’re anti-rewrite, do you have concerns that a year's development time won't be able to cover 20+ years of development work? I think it was Coding Horror that said this is the greatest fallacy with rewrites: decades of invisible features and bug fixes are suddenly forgotten. I've not been involved with a rewrite so I can't really speak from experience, but that seemed to make sense.

There are basically two options:

1. Strangler pattern - which allows you to continually advance functionality, but will take longer and requires delicate care.

2. Complete rewrite - the only successful way to do this is to code freeze your legacy app. It's risky because it means keeping the product's features stale for a while, but it's less cumbersome to advance once it's done.

Either way is risky. Choose your poison.

You can do incremental rewrites without the Strangler Pattern; the Strangler Pattern is based on the idea that you cannot modify the old code (though, in practice, it often does involve modifications, as the old code often doesn't have the right architecture to implement even Strangler without some modifications).

When you accept that the old code is expensive but not actually impossible to modify, you can do an incremental rewrite without Strangler-style proxying, and not even needing to prioritize replacement of the user-facing components, and only or preferentially rewriting as necessary to make user-visible feature improvements or bug fixes, unlike Strangler Pattern’s preference for no-visible-effect initial replacements. (I call this “standard Ship of Theseus replacement”.)

This typically is even better for continuing to deploy new features than Strangler, but it does leave you with a system without a single clean boundary layer between new-style components (the replacement/proxy system in Strangler) and legacy; instead you have a system with mixed islands of legacy and new components (possibly multiple styles of legacy, if a third set of standards is adopted before replacement completes). So, again, it has its own risks/costs, but it's a third option.

Incremental rewrites assume that your framework of choice is still supported. In my experience, I would say 75% of the time someone is considering a rewrite, it's because their framework of choice is out of date, which makes it impossible to modify old code.

> Incremental rewrites assume that your framework of choice is still supported

You mean the one for the legacy? (Which I wouldn't call the framework of choice, since it's inherited, the new one is chosen...)

Sure, it requires that the legacy code is, at least internally, supported or at least supportable, and sometimes that's not the case. Though on larger production systems, letting them get to a completely unsupported state is rare, and those are the systems where the choice between big-bang rewrite, strangler, and a more free incremental replacement is most consequential, IMO. If nothing other than a big-bang rewrite is possible, there's not a choice, much less a consequential one.

When I hear "legacy application", I basically assume the framework used is outdated/unsupported. That's my point, you can't incrementally rewrite that because you'd be rebuilding on a burning platform.

Example - you wrote your app in a custom framework on PHP 5.6 and you're doing a rewrite on PHP 7 in Laravel.

> When I hear "legacy application", I basically assume the framework used is outdated/unsupported.

That's often the case, but the app itself is usually actively maintained (if lightly, sometimes, because the cost of change is high and there is a lack of skilled maintenance personnel) even if the underlying platform is outdated and possibly even out of support (and in some cases long abandoned by the vendor).

> That's my point, you can't incrementally rewrite that because you'd be rebuilding on a burning platform.

You absolutely can maintain software on an outdated and, often, unsupported platform (the latter can sometimes be a licensing problem, as you may not continue to have the legal right to use it.)

> You absolutely can maintain software on an outdated and, often, unsupported platform

Huh? I mean yes you can technically continue to use an outdated framework, but what do you do when there is CVE identified and someone exploits it? Do you just sit on your hands and wait until you can upgrade the whole framework months later?

There are explicit risks associated with continued use of an outdated and unsupported framework, so why anyone would continue to build on a burning platform is beyond me...

> Huh? I mean yes you can technically continue to use an outdated framework, but what do you do when there is CVE identified and someone exploits

Technically, you can do a big bang rewrite, but they fail at a very high rate on significant systems, and you're still using the unsupported system while you do the big bang rewrite, and quite possibly after it fails, so even in the best case you haven't eliminated the problem you point to with doing an incremental rewrite. So, the basic problem exists either way. Incremental rewrites (strangler or otherwise) prioritizing the highest-risk components for earliest replacement are one plausible risk-mitigation strategy, but the right choice is going to depend on project details. When you eliminate real options because of fake "you can't do that" considerations, you increase the risk of choosing a suboptimal approach because you preemptively discarded the least-bad solution.

> Incremental rewrites

You keep using this term and I don't think you know what it means. The Strangler pattern IS an incremental rewrite. You've suggested that it's possible to introduce incremental rewrites to old code, and all I was clarifying is that refactoring code written in an old framework (e.g. PHP 5.6) does nothing to eliminate technical debt (it adds to it, in fact). The Strangler pattern is most often used when you want to switch languages (e.g. Java -> Rails) or when an older paradigm doesn't have a straight migration path (e.g. WebForms -> .NET MVC). The new code using the new framework essentially strangles the old code.

For the record, I've advised 100+ software companies; I would say about 60% of them are experiencing some type of major rewrite, and of those, 9/10 are because they simply can't upgrade an outdated/unsupported/poorly architected software framework. Trying to refactor on an unsupported framework is simply not an option. You (1) either migrate it to the latest version (if possible) and refactor over time, (2) strangle it with the new framework, or (3) rewrite it. That's it. Every other topic discussed is simply one of those, but semantically wrapped in some engineering jargon or nuance.

This is all consistent with the examples here: https://paulhammant.com/2013/07/14/legacy-application-strang...

- C++ -> Java Spring

- Powerbuilder/Sybase -> Swing

- VB6 -> .NET

- Java -> Rails

- Java/Swing -> Rails

The real world is FULL of these. I've seen many of them first hand. The rewrites you hear about in the "SV world" are either superfluous CTOs who are misguided into thinking they need to, for example, rewrite their Rails 4.2 app in Node because they think it'll get them more users, or represent real engineering feats that truly "blitzscale" startups entertain to maintain business continuity (Twitter's migration from Rails to Scala comes to mind).

> You keep using this term and I don't think you know what it means.

I'm pretty sure I do. But I'm also pretty sure you don't know what the term “Strangler Pattern” means (specifically, that you think it is equivalent to “incremental rewrite” rather than one specific approach to incremental rewrite.)

> The Strangler pattern IS an incremental rewrite.

Yes, but not all incremental rewrites are the Strangler Pattern; that's why my first post in this subthread points out that the choice isn't exclusively between Strangler and big bang rewrite, because incremental rewrites are possible without the Strangler Pattern. In fact, I also discussed the specific differences that can arise between Strangler and non-Strangler incremental rewrites.

> and all I was clarifying is that by refactoring code written in an old framework (e.g. PHP 5.6) does nothing to eliminate technical debt

That's not at all what you said, though it's possibly what you meant, if you were writing very imprecisely. If it is, though, it's odd to the point of non-sequitur as a response to anything I've written because I never suggested refactoring code while retaining an old framework, I suggested an incremental rewrite similar to what is done on Strangler but without (1) implementing a new-system proxy as a first (or, potentially any) step, or (2) adopting an “old code is deleted but never modified in the course of the transition" rule.

> The Strangler pattern is most often used when you want to switch languages (e.g. Java -> Rails) or when an older paradigm doesn't have a straight migration path (e.g. WebForms -> .NET MVC).

Yes, though there is no particular reason that either of those cases require Strangler for incremental replacement.

> For the record, I've advised over 100+ software companies

Good for you, but that's not at all relevant to the discussion.

> You (1) either migrate it to the latest (if possible) and refactor over time, (2) strangle it with the new framework, or (3) rewrite it. That's it.

No, it's not, unless you are using "strangle" much more broadly than the Strangler Pattern, which isn't just any incremental replacement but a particular strategy for incremental replacement, characterized most notably by placing a request-intercepting facade in front of the old system.

> that you think it is equivalent to “incremental rewrite” rather than one specific approach to incremental rewrite.

So why would I use the word "an" as in, "The Strangler Pattern is an incremental rewrite", as opposed to "the"?

> That's not at all what you said, though it's possibly what you meant,

Weird, the following was my first response to you. Shrug...

>Incremental rewrites assume that your framework of choice is still supported. In my experience, I would say 75% of the time someone is considering a rewrite, it's because their framework of choice is out of date, which makes it impossible to modify old code.

FYI - My choice of the word "impossible" was poorly chosen. It's not impossible, it's just stupid.

The original parent was basically asking "if we can't do strangler or big bang for a legacy app what else is there"? You suggested that incremental rewriting is a 3rd option and clarified that a Strangler Pattern is a subset of an incremental rewrite, which I agree. You implied that this meant continual use of an unsupported framework, which I sought to clarify and advise against.

> No, it's not, unless you are using “strangle” much more broadly than the Strangler Pattern, which isn't just an incremental replacement by a particular strategy for incremental replacement

I am indeed.

> characterized most notably by placing a request-intercepting facade in front of the old system.

This is incorrect. The Strangler pattern doesn't necessarily mean you strictly write a facade. Furthermore, like all design patterns, it's up for interpretation. What determines the difference between a router, an adapter, a proxy, and a facade? All can technically be used to intercept incoming requests.

Fowler, who popularized the Strangler concept, says[0]:

> An alternative route is to gradually create a new system around the edges of the old, letting it grow slowly over several years until the old system is strangled.

AND then says

> In particular I've noticed a couple of basic strategies that work well.

Which implies that there are various strategies (not just one) that are enacted under the term Strangler Pattern. Hence my previous comment "alternatives discussed here are simply subject to semantics and nuance". You could certainly use Adapters, Routers, Decorators, Proxies, Bridges, etc. for design patterns used in an overall Strangler strategy.

I think we're actually saying the same things; you just seem to prefer to be overly and unnecessarily pedantic about your use of the terms Strangler and Facade.

[0] - https://martinfowler.com/bliki/StranglerFigApplication.html

Things need to be more than "out of date" to be impossible to modify. What technologies would you label that way?

Anything that needs an "obstacle course" or a "downhill slalom" to modify? Examples:

- Compilers/IDEs that only exist on single machines (expensive mainframes, single user VMs, SSH hosts)

- Compilers/IDEs that must be air-gapped for any security reason (including CVEs and expired support, not just PII/PCI/confidential-quarantine reasons)

Things start to feel impossible when you have to use a Windows XP VM because XP was the last OS on which Microsoft officially supported the VB6 IDE, nobody knows how to install and license the ActiveX controls for development anymore (some of the vendors don't even exist to ask), and the VM is intentionally air-gapped/quarantined from internet traffic because it's an XP VM, so there are now some fun extra steps with virtual folder redirects and worktree-less clones as remotes to get git changes in and out. (If your VM stops supporting basic copy/paste from the host machine, you feel you might go mad. Trying to grep VB6 code in VSCode to make navigation feel somewhat saner, given the scroll delays in the VM, feels like its own growing madness, as VSCode believes you insane for using such an arcane language that looks, but does not smell, like VB.NET.)

Uh, not that that resembles my current situation or anything.

There are similar horror stories from, say, COBOL devs with air-gapped mainframes.

What do you mean?

Technically speaking, you can build software on an outdated and unsupported framework, sure. But why on earth would you, for example, continue building your software on PHP 5.6 (EOL Jan 2019) when, any time a CVE is identified, you're basically SOL on patching it?

Sure, you shouldn't just stick with the old and add features. You should focus an incremental rewrite on bringing you closer to a switch to something supported early on (e.g. dumping dependencies that won't migrate with you, preparing the code base for a quick conversion of incompatible language features, ...). Only a few technologies are such dead ends that this isn't possible.

But once it's too late, no migration strategy rids you of the old codebase and its environment right away. With the full rewrite you need to maintain and run it until that's done. With the strangler, the old code still runs until it is replaced.

I'm so confused. What are you suggesting that's any different from what I said?

Apologies, then I must have misunderstood your original comment.

All good!

https://medium.com/@herbcaudill/lessons-from-6-software-rewr... suggests to build a new product without throwing away the legacy one.

Improve the code gradually, module by module.

My recent experience with an ERP, specifically some major bolt-on modules, was that the vendor simply made the switch to a new platform that had maybe 60% of the capabilities. A roadmap (which has actually been fairly accurate) showed about 3 years to get to 90%.

New customers were pushed to the new product. Existing ones were encouraged to do so and to temporarily live without prior features (usually with temp workers doing things manually) for a deep discount. Those who had to stay with the legacy system were told to expect nothing but bug fixes and compliance-related updates (for federal programs and reporting requirements), and that if they needed something more than that, they'd either need to build their own bolt-on (there was a robust, if clunky, SDK) or pay contractors to do so.

It sucked, yeah, but it seemed like a reasonable way to go about such a transition that was always going to make people unhappy.

This is more or less the model that Basecamp uses with their rewrites. New product with new features and a strong encouragement to come along, but guaranteed support if you can't.

This is the way you do it. Nobody is happy but the product and business live on.

I'm in the middle of a rewrite. It's very challenging, but the alternative is worse (a sinking ship). My lessons learned:

  1. Do it sooner
  2. Get full commitment from stakeholders
  3. Agree on feature freeze
  4. Get it done quickly
  5. Don't over promise, esp about the timeline
  6. Focus on delivering big/important items first (MVP)
  7. Appoint a benevolent dictator, not a committee, to avoid second-system syndrome
  8. Have test scenarios ready (black box)
Unfortunately they all depend on one another, e.g. the longer you wait for the rewrite, the harder it will be to finish it (feature creep).

I will write a blog post when it's done successfully; otherwise I will hide under a rock.

Of course, that approach is difficult to apply if the interface is a significant part of, or deeply entangled with, the pain points that the rewrite is intended to solve.

It is also difficult if there's an ill-defined interface that exposes implementation details, or no interface at all.

It is also difficult to apply if we are not talking a server/client app but a desktop app, being rewritten in a different language or incompatible GUI toolkit.

>It is also difficult if there's an ill-defined interface that exposes implementation details, or no interface at all.

I've successfully strangled a large codebase that had these issues, though we did have the benefit of a client/server application so there was a place to actually define interfaces.

We started in the middle by creating a logical service layer to group all the bits of like functionality. We left the implementations alone, just moved them to align with the new "service" layer. We slowly worked our way up the stack, including defining a new client API, and then changed the existing API methods to be a shim on top of the new methods.

We were then able to update client code to use the new interface, but the old ones stuck around for about 24 months while we sunsetted older clients. The actual strangulation took about 2-3x as long as a stop-and-rewrite effort would have, but there were VERY few regressions, because we were still in a constant test and release cycle and managed the scope of strangulation changes in each release, AND all of our testing was still valid since we weren't changing inputs/outputs or any expected behaviors.
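The shim step described above can be sketched roughly like this (in Go for concreteness; the service and method names are made up, not the actual codebase):

```go
package main

import "fmt"

// OrderService is a hypothetical slice of the new logical service
// layer: like functionality grouped behind one defined interface,
// with the (initially unchanged) implementations moved underneath it.
type OrderService struct{}

func (s *OrderService) CancelOrder(id, reason string) error {
	// The consolidated implementation lives here from now on.
	fmt.Printf("cancelled %s (%s)\n", id, reason)
	return nil
}

// LegacyCancelOrder is the old API method, kept alive while older
// clients are sunsetted. It is now just a shim: same signature and
// observable behavior as before, delegating to the new layer.
func LegacyCancelOrder(id string) error {
	svc := &OrderService{}
	return svc.CancelOrder(id, "unspecified") // old API had no reason field
}

func main() {
	LegacyCancelOrder("order-42")
}
```

Because the shim preserves inputs and outputs, existing tests against the old API keep validating the new layer for free.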

The "Strangler" needs to copy the legacy interface at first, but after that you can add new interface feature to it.

You can temporarily implement both interfaces where needed, create adaptors, or other patterns during the intermediate steps.

Usually the problem is that you can’t realistically change the interface, because too much other software relies on it, and having all that software rewritten would be too costly, and also risky. In addition, as a sibling mentions, the reason the old interface is a pain point is often that it exposes implementation details in a way that prevents you from rearchitecting the implementation. In that situation, adapters typically won’t help.

Even if you don't change the interface, and do fun things like writing types in Go that accept strings or ints as ints (to strangle a Perl server binary) and proxying the unimplemented handlers, you can still end up with problems. We started having issues with our DB getting knocked over by callbacks from our server fleet on an endpoint we hadn't even rewritten yet. It turns out the slowness of Perl handling the TLS connections had shielded the DB and forced retries; switching to Go meant the DB was hammered with concurrent requests from the Perl workers, unhindered by the TLS handshake.

Like giant session state stored in the server using session affinity...

The comma, present in the article and missing in HN, completely changes the meaning of the title.

What is the alternative meaning? To me, the comma changes nothing. It's unambiguous either way.

It can also be read as "avoid (rewriting a legacy system from scratch by strangling it)"

Baby Hercules, avoid getting eaten by that serpent by strangling it.


The next time you catch one, Lennie, avoid killing a mouse by strangling it.


Do you strangle it to avoid it, or do you avoid strangling it? The age-old question.

Indeed, the title here made little sense - I was reading the article and it sort of contradicted what I got from HN...

Thanks for pointing out that the titles actually differ.

This is a nice example where adding a comma not only changes the meaning, but inverts it.

This blog by Mr It, Strangling.

I really don't get this. How does it change?

“Avoid having to rewrite a legacy system from scratch via strangling it” vs. “Rewriting a legacy system from scratch by strangling it is something you should avoid doing”

Avoid: rewriting (by strangling) a legacy system

Avoid rewriting a legacy system (by strangling it)

We did it very differently in our group:

1. The developers of the old tool continued to work on it.

2. A new team took requirements from the old team and filtered them to make them more meaningful.

3. Designed a system architecture that would work with the targeted workflow.

4. Designed a minimal version and ran it with new branding next to the old one.

5. Reached feature parity with the old one and dumped it.

The important thing to note is that the new tool does not do everything the old tool does. The workflow is also different from the old one. However the customers loved the new one as it was simpler, faster and more robust to use.

Am I the only one in the IT adjacent world who thinks the inverse of this is the larger problem (churn, NIH, reinventing the wheel) in software today?

No, you are not the only one. The quest for the new shiny thing is stronger than ever today. New frameworks, new languages, silver bullets everywhere. Good decision-making frameworks are tremendously needed in the technology world, to help everyone understand the ramifications of the choices they're trying to make.

> Good decision making frameworks are in tremendous need in the technology

Can you/anyone recommend any?

Oh, you are not the only one at all!

I mostly work in the Android world, and the chase for the new and shiny is real.

I see some new libraries get a lot of traction seemingly only because they are written in Kotlin/coroutines, not because they offer a better solution (for the one I have in mind, they did not even bother to do a benchmark comparing it with the existing solutions).

The thing is, the Android dev ecosystem got WAY better in some aspects.

Having moved to some of the new and shiny: well-implemented MVI/MVVM architectures backed by Rx or Flow are very robust and give a good framework to develop on.

You still have to fight back against the zealots yelling that solution x, which works just fine, should be replaced by solution y, even though it would take months of engineering work and doesn't really improve anything you care about (e.g. a 5% reduction in bytecode size is not something that's worth spending weeks on, or a network stack that hand-wavingly 'improves performance' with no benchmark made to actually ascertain where our hot paths are).

PS: for the parts that got worse, the build system and Android Studio failed to scale quickly enough to keep up with the enormous increase in build complexity. As a result they are slowly becoming less and less usable for large projects.

Part of this is that many software engineers are bad at advocating for what they care about. One way of being bad at advocacy is to try to point to whatever has numbers, even if those numbers don't ultimately matter.

Sadly, it takes a lot more skill and courage to say the following in a way that other stakeholders care about: "We should use this tool because when I started using it, data flows were clear in a way that finally silenced some of the anxiety that I don't know what the fuck I'm doing and I am going to ruin everything."

I've started a separate thread to talk about that communication skill here: https://news.ycombinator.com/item?id=22352477

I specialise in legacy code. Not that I'm opposed to doing a greenfield project now and again, but I genuinely enjoy working with legacy code. It's a fun challenge and the "stink of old" on it keeps the must-have-shiny people away for the most part.

However, it's always a challenge. Sometimes you have subsystems that are begging to be retired. For example, on one system we're maintaining about 20 KLOC of GWT code. For the last 10 years or so, it hasn't really been worth moving away from it, but there will be a day (rapidly approaching, I think) when the cost of supporting a mostly abandoned Java framework that compiles into JS outweighs the risk and cost of slowly replacing it.

There's a real difference between being pissed off with the choices your predecessors made, or lusting after the new, hot framework and saying: nope... this just isn't viable any more. Planning that transition isn't easy either. Again, it's one of the reasons I enjoy this kind of work.

And sometimes, you even just decide that you're going to work with what you've got. Ironically, though, this usually involves more churn, NIH and reinventing the wheel because code written 20 or 30 years ago did not have the facilities that we desire in modern development. You think, I'd love to enjoy the benefits of that new framework, but there ain't no way that we'll be able to use it. How do I get the benefits using the code I already have? Answer: you study what other people are doing and you build the same damn thing in your environment. Nobody builds the new-shiny for old stuff so if you want it, you have to build it yourself.

I enjoy bonsai trees. As trees grow, the branches become out of scale with the trunks. You can imagine that if your trunk is the size of a pencil, it doesn't take long for the branches to catch up. So if you want a tree that is in scale, you are constantly having to prune off the branches and grow new ones. There is a saying that a bonsai tree is never finished until it is done. Code is the same way. There is no such thing as avoiding churn -- unless you are truly trying to kill off your project. You always have to prune off branches and grow new ones, otherwise development will slowly grind to a halt, the challenge of adding functionality to an unchanging code base becoming more and more complex. But if you prune your branches before they grow, you will end up with a stick in a pot. Or if you decide to grow out every bud that pops up, you will have an impenetrable mass of confusion. Deciding which branches to grow and which to prune unfortunately requires good taste.

I’ve had to learn not only to reinvent a wheel created by someone else before I came to a company, but also to reinvent the wheel that I created when I didn’t know what I know now.

You don't rewrite modern software.

That depends on what you mean by “modern”. And the age of the system in question is completely irrelevant to whether it is or isn't an unmaintainable legacy mess.

wait a minute I thought legacy was used to refer to old pre-cloud pre-serverless pre-javascript software? Everything written nowadays is modern and hence you just replace it with more modern stuff. re-writing it is often difficult because of its modern layers.

I hope this is satire, but I’m not sure if it is.

r/programmingcirclejerk would have a field day with it, that's for certain.

Any code that’s not Serverless JS needs to be rewritten

One thing that complicates matters somewhat (as if they were not already complicated) arises at the decision point marked isRoundtrip? in the fourth (penultimate) diagram, where the affirmative case is handled within the new system.

Given, however, what is being posited -- a legacy system that is not modular and which contains unrefactorable pathological dependencies -- the old system must also handle this case in parallel, in order to be in the correct state to handle future requests of a type that still need to be delegated to the old system.

This parallel implementation may have to persist well into the replacement process, and the requirement for it to do so may mean that you still have to do double implementation of features and fixes for most of the transition.

Requiring the legacy system to handle the request in parallel is exactly what this method is trying to avoid.

If your old system has dependencies that you don't understand, I don't see the strangulation method working at all.


> Here’s the plan:

> Have the new code act as a proxy for the old code. Users use the new system, but it just redirects to the old one.

> Re-implement each behavior to the new codebase, with no change from the end-user perspective.

> Progressively fade away the old code by making users consume the new behavior.

> Delete the old, unused code.

Here is the reality:

1. People do the above incompletely; their deletion of the old system slows down and then they move on to another project or organization, leaving a situation in which 7% of the old system still remains.

2. People iterate on the entire above process, ending up with multiple generations of systems, which still have bits of all their predecessors in them.

I think an overlooked aspect of a legacy system that makes "strangling" difficult is that nobody fully understands the behaviours of the system anymore.

It is really hard to replace the functionality of a piece of code when you don't know 100% what that functionality is.

This is a good point.

I'm working on moving some functionality out of a system - not replacing the system. And it's still extremely challenging to actually figure out everything that's going on with just the thing I'm moving out.

I see it working for backend code; legacy UI systems have way more coupling, so it would be better to do a complete rewrite. If you have legacy framework A and you start replacing it with framework B, component by component, B will have to follow the practices of framework A, and basically you are going to be writing legacy-style code in the new framework B, which is much worse than having legacy framework A, because framework B is now written in a completely alien way and not how it was intended to be used.

I have written a set of libraries and dev tools (like a better REPL) for Perl (the FunctionalPerl project) with the idea of helping write better code in that language, and of giving me, and whoever joins in such efforts, a way to hopefully save a legacy code base. Maybe when a company reaches the point where they feel their code base has become unmaintainable, it can still be saved by using the tools and programming approaches that I can provide. That (other than, and more than just, "because I can") is the major motivation why I invested in that project.

But I wonder how much it will help. I haven't had the chance to try it out so far. I got to know companies that have begun to split up Perl applications into microservices and then move the individual services to other languages, and they don't necessarily have an interest in my approach. I'm also very diffident about reaching out to more companies, worrying about how much pain it would be to deal with (and how likely it would be to fail)--investing my time into newer tech (Haskell, Rust, etc.) looks tempting in comparison. Should I continue to reach out to companies to find the right case (presumably working as a contractor, with a big bonus if successful)? Any insights?

I'm dealing with a rewrite at the moment (that is, I was hired to start rewriting an existing web application). I want to apply this pattern, but the existing codebase was already dated by the time it was written. It's a huge load of mixed responsibilities, globals (it's a PHP backend), an RPC-like HTTP API (every request is a POST containing an entity name, action, parameter, and additional parameters, handled in a big switch), etc. Files of 13K lines of code.

So far I'm stuck in the overthinking phase of the new application. And as the article states, I'm asked to keep adding new features to the existing application - nothing big (because individual things aren't big) - but at the same time, I've been adding a REST API on top of the existing codebase for the past few weeks. It's satisfying in a way, but it hurts every time I have to interact with the existing codebase and figure out what it's doing.

Plus we're not going to get rid of the existing application at this rate. I should probably set myself limits - that is, I'll postpone and refuse work on the existing application if it's not super critical. And quit if they're not committed to the rewrite before the summer.

Strangling is a good way to slowly replace a system by simply starting to work around it until whatever value it adds is so diminished you can safely pull the plug.

Big software rewrites are extremely risky because they inevitably take more time than people are able to estimate, and the outcome is not always guaranteed.

An evolutionary approach is better because it allows you to focus on more realistic short term goals and it allows you to adapt based on priorities. Strangling is essentially evolutionary and much less risky. It boils down to basically deciding to work around rather than patch up software and minimize further investment in the old software.

Also, there are some good software patterns out there for doing it responsibly (e.g. introducing proxies and then gradually replacing the proxy with an alternate solution).

I did a rewrite.

The old code worked, but was slow. Adding features would make it slower. Lock-free queues and threads everywhere, packet buffers bouncing from input queues to delivery queues to free queues to free lists, threads manfully shuttling them around, with a bit of actual work done at one stage.

Replaced it all with one big-ass ring buffer and one writer process per NIC interface. Readers in separate processes map and watch the ring buffer, and can be killed and started anytime. Packets are all processed in place, not copied, not freed, just overwritten in due time.

It took a few months. Now a single 2U server and a disk array capture all New York and Chicago market activity (commodity futures excepted).

I kept the part that did the little work, scrapped the rest.

C++, mmap, hugepages FTW.
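The overwrite-in-place ring idea can be sketched like this (a toy illustration in Go rather than the original C++, without the mmap/hugepages/multi-process machinery; the single writer overwrites the oldest slot, and nothing is copied or freed):

```go
package main

import "fmt"

// Ring holds a fixed number of slots that the single writer
// overwrites in due time; readers just observe slots in place.
type Ring struct {
	slots []string
	next  uint64 // total writes so far; next % len(slots) is the write position
}

func NewRing(n int) *Ring { return &Ring{slots: make([]string, n)} }

// Write stores a packet, silently overwriting the oldest slot
// once the ring has wrapped.
func (r *Ring) Write(pkt string) {
	r.slots[r.next%uint64(len(r.slots))] = pkt
	r.next++
}

// Snapshot returns the live slots oldest-first (a reader's view).
func (r *Ring) Snapshot() []string {
	n := uint64(len(r.slots))
	start := uint64(0)
	if r.next > n {
		start = r.next - n
	}
	var out []string
	for i := start; i < r.next; i++ {
		out = append(out, r.slots[i%n])
	}
	return out
}

func main() {
	r := NewRing(3)
	for _, p := range []string{"p1", "p2", "p3", "p4"} {
		r.Write(p)
	}
	fmt.Println(r.Snapshot()) // oldest entry p1 has been overwritten
}
```

The real system additionally shares the buffer between processes via mmap, which is what lets readers be killed and restarted without disturbing the writer.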

Having successfully replaced a legacy system once, we got it to work by turning the legacy system's business logic into a library that the new system could use. The key is to replace just the underlying architecture without reimplementing years of work.

What the article describes is a rewrite! In the end there will be no more legacy code left...

What the article is saying is: don’t rewrite your code in one go, but rather cut the system into independent pieces and rewrite each in successive phases.

It’s kind of obvious, though. And the difficult part of the rewrite is actually slicing the original code into independent chunks. More often than not, legacy systems are riddled with leaky abstractions and dependencies (the infamous spaghetti code) that are hell to disentangle.

Often, the clients of legacy code are old too, and are hard coded to access it.

I've done this, but on a private branch, with a single merge to trunk at the end. Starting with complex integration tests, new interfaces were gradually defined and made the code testable, giving me the needed confidence.

So, how can this be applied to mobile app development? I can think of adding dependencies and new code to live alongside the old code in the app, but that will cause considerable bloat in the app's size, which can be noticed by management, unlike with web services/sites/apps.

Not to mention legacy thick apps! In my case, legacy thick-apps we don't have the source code for! Arg!

Does the strangler have to be a separate server? Couldn't you wrap the existing code within the same binary?

All of this gets harder if it's your data model that is the problem. So, get those data models right early if you can!

Easier said than done -- data models frequently evolve in hard-to-predict ways. Instead, build your data model in a way that is easily pliable, and won't need a complete refactor because of something simple like a hot key or a new index.

If you manage to pull that off you probably won’t have to do a rewrite in the first place ;-)

Luckily it's so easy that it only requires clairvoyance.

Does this happen in practice, or is the old product just replaced by a newer _competitor_?

Ignoring consumer products for a moment: we've done this at my company for an internal app.

Our legacy system was built as a desktop app for internal use that became difficult both to scale and to keep compliant with our regulatory obligations, so we began building out an API around its core business functionality and built various front ends throughout our company to speak with it.

It has been a middling success, mostly because change requires political capital that might not be there six months after initiating it. However, I think overall we've improved the product, and I don't think a massive rewrite would've gone nearly as far, due to political winds shifting and the rewrite getting deemed a waste of time by the new new powers that be.

It's also my experience that political "will" for change lasts about six months...

But how come the Linux and BSD kernels, Emacs (since RMS), the Java language, even Python (Python 3 was not a rewrite), git, hg, Django, etc. have never been rewritten from scratch?

What is the lesson here?

> After 7 months, you start testing the new version.

Translation: after 7 months you stop mucking about and start trying to produce something useful.

I thought this was an article about not rewriting a legacy system.


This was the first thing I thought of, if you wanted to read the referenced article. I appreciate Fowler's writing style and his sourcing. He always links some interesting stuff.
