Strangler pattern, a powerful simple concept to do refactoring (martinfowler.com)
211 points by neo2006 13 days ago | 69 comments

> There's another important idea here - when designing a new application you should design it in such a way as to make it easier for it to be strangled in the future. Let's face it, all we are doing is writing tomorrow's legacy software today. By making it easy to be strangled in the future, you are enabling the graceful fading away of today's work.

This is a very important point that I've been harping on for years. If the nature of your business requires quickly adapting to new business needs--then make your software and systems easy to delete! Chances are good that they'll be obsolete/outdated in a few years, and everyone will be much happier if the system is easy to remove and replace.

How to implement that is very dependent on the nature of your systems, how you deploy, your company's tech and operational culture, etc. But it's very much worth thinking about: "If this project were to fail in the long run, how hard would it be to get rid of?" In large companies, lingering legacy systems can make or break an entire organization.

OTOH: "Well-designed components are easy to replace. Eventually, they will be replaced by ones that are not so easy to replace."

Not sure what your point is? Should we give up on well-designed components?

A cynical quip like Sustrik’s Law is meaningless to me without a deeper analysis. What mechanisms or conditions make this “law” true? And what are its actionable takeaways?

I think it drives at an important point: Your job isn't done just because you've somehow managed to design and implement the perfect component that's absolutely reusable. There are plenty of internal and external forces that will turn that component into something imperfect and flawed, if not replace it outright with something that is far less usable, reusable, appropriate, etc. It requires constant vigilance and maintenance to maintain that state.

hahahah oh man. That's such a trap. Good point.

I can't agree with this more. One company I worked at did proof-of-concept sales to enterprise customers, and as engineers we supported the sales team by adding additional features the customer might demand before making the sale. Sometimes the sale does not go through, and the implemented feature is of no use to the other enterprise customers we have. It's still there in the codebase as technical debt, and it's difficult to delete since there are no clear boundaries between the useless code and the more frequently used paths. As lines of code increase development velocity decreases. Using a strangler pattern may make this process less painful.

Would this be mitigated by setting up custom demos with quickly mocked features on separate branches that can be easily spun up and deleted? I’ve helped clients with similar needs, and it’s so much easier to make a rough working feature branch for something that helps the sales process for a large customer, temporarily run it on a subdomain for demo purposes, and then get rid of it. It still hangs around for a time in a branch, but it only gets merged in when there’s a definitive need for it to be part of the live product.

Edit: This doesn’t solve the problem of wasted team effort, obviously. This kind of effort ought to only be taken when you can’t get a potential customer to agree to signing a contract before doing some work on a new feature. But that can be difficult to do sometimes. Marrying sales and development is its own challenge.

>and the implemented feature is of no use to the other enterprise customers we have

>As lines of code increase development velocity decreases.

How do you make this clear to the CEO......................

Legacy Software: any software that exists before I joined the organization, no matter how recent or up to date.

- Alternatively -

Legacy Software: any software that has been deployed.


Even if the workflows are not outdated, the software libraries will likely be outdated.

In manufacturing, don't be surprised to see Windows 2000 boxes on the plant floor; also expect new hires to complain about AngularJS apps.

In this context, I’m not using those cynical definitions of “legacy”. The definition is right there: software that doesn’t have any business value anymore.

The main problem I've seen with this strategy is that 'funding' for the new system gets pulled halfway, and there's no time remaining to refactor the old system. Fast-forward a year, and someone else may try to write a 3rd refactor using a different, 3rd improved system design. Now that project also stalls out due to more pressing concerns. Now the codebase is polluted with different systems that require more training to work with, more lines of code to cause bugs (mixed frameworks), and generally a headache. I still think the Strangler pattern is a good one; just make sure you get enough leeway to reach the 'strangler' moment where most of the old code becomes deprecated and removed.

This is known as the "lava flow" anti-pattern. I kind of like the visual imagery of the two. With the strangler pattern, you have new living pathways to deliver functionality. With the lava flow anti-pattern you have petrified remnants that lead different places, but which have died.

From my own experience, when you take over a project that has attempted the strangler pattern but got cancelled, the key is either to continue it (holding your nose if you don't like it), or remove it and go back to the old infrastructure. If the new pathway is better than the old, go with it until you can remove the old one. Even if you have a new super zappy way to do it, resist introducing it without "finishing" what was started earlier (one way or another).

At my last company, I headed the Engineering org and was told by my CEO that there was no time for refactors of our monolith (and not a good one at that...access to the HTTP request all the way down, for example) or anything silly like that, so we went a very similar route. We did what we called the zombie rewrite, which was an inside-out refactoring that left no changes on the surface.

Step one was to slowly rework things into logical services that still relied on the underlying legacy code. Eventually, we got to the point where cross functional domain logic was accessed via logical services rather than direct use of classes in other domains. Once we accomplished that, we slowly worked back down to understand, optimize, and refactor legacy code in digestible chunks. The final stage was to work up to the REST API level to create a more sane implementation of the web bits that preserved existing behavior.
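For anyone who wants to picture the "logical services over legacy code" step, here is a minimal sketch in Python. Every name in it (`LegacyBillingClass`, `BillingService`, `checkout`, the 10% discount rule) is made up for illustration and is not from the project described above. The idea it shows: other domains call a narrow service interface, the first implementation simply delegates to the legacy class, and a refactored implementation can later be swapped in without touching the callers.

```python
from typing import Protocol


class LegacyBillingClass:
    """Hypothetical stand-in for a legacy class other domains used to call directly."""

    def calc(self, cust_id: int, amt_cents: int, flag: int) -> int:
        # In this made-up legacy code, flag == 1 means "apply a 10% discount".
        return amt_cents - amt_cents // 10 if flag == 1 else amt_cents


class BillingService(Protocol):
    """The logical service boundary that other domains depend on."""

    def invoice_total(self, customer_id: int, amount_cents: int, discounted: bool) -> int:
        ...


class LegacyBackedBilling:
    """Step one: a service that still relies on the underlying legacy code."""

    def __init__(self) -> None:
        self._legacy = LegacyBillingClass()

    def invoice_total(self, customer_id: int, amount_cents: int, discounted: bool) -> int:
        return self._legacy.calc(customer_id, amount_cents, 1 if discounted else 0)


class RefactoredBilling:
    """Later step: same behaviour, reimplemented behind the same boundary."""

    def invoice_total(self, customer_id: int, amount_cents: int, discounted: bool) -> int:
        discount = amount_cents // 10 if discounted else 0
        return amount_cents - discount


def checkout(billing: BillingService, customer_id: int, amount_cents: int) -> int:
    """A caller in another domain; it only knows the service interface."""
    return billing.invoice_total(customer_id, amount_cents, discounted=True)
```

Running the same `checkout` call against both implementations and comparing results is one cheap way to check that a refactored chunk preserved existing behaviour.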

It took the better part of a year to do all the work, but by going piece by piece (which allowed for understanding some wonky code), we had very, very, very few regressions and minimally impacted our overall ability to still deliver on the roadmap.

Agreed. We are wrapping up the rewrite of a CRM and man-oh-man has it been long. Two years in the making due to being taken on and off, on and off, and yet again on and off the project. Looking back, I wish I had either just rewritten the existing system piece by piece, or at least started moving pieces one by one out via micro-services to a new backend. The front-end then could've been completed in a smaller rebuild.

Regardless, had we gone with a strangler approach, we probably would've been done months ago. As it stands, there is at least another week or two of development via QA break/fixing until it's ready to go live.

With that said, I still think there are times when a full rewrite makes sense. We redid our API two years ago when it was still small and I am happy we did the full rewrite. The old architecture just wouldn't have been able to scale with where we are now and I felt the code was just too far gone to be helped.

As always, life is full of learning experiences.

There's a similar thing I try to encourage. For any new project/component you should do it three times.

First time to get it wrong.

Second time to learn from your mistakes and get it less wrong.

Third time to try to ensure that all subsequent mistakes can be gracefully recovered from.

When given a new piece of work I find stage one tends to be my planning stage.

This is similar to the advice about building houses: the first house you ever build you should build for an enemy, the second for a friend and the third for yourself.

I love this saying.

When would you do a rewrite instead of a strangler?

I personally have worked on successful rewrites when the existing system was actually broken. Maybe it worked enough to get most work done but there are manual steps to keep it from falling apart completely. Development has gone off the rails. The rewrite can be a winner that constantly delivers useful functionality.

If there is high level of coupling within a system it will be difficult to contain the rewrite to a single subsystem. In this case, it may make sense to rewrite. Alternatively, you can gradually remove unnecessary coupling before applying the strangler pattern, but this can be a longer path.

The related principle here is to ensure you have good boundaries in your application with minimal dependencies between each system so it's easy to rewrite portions of your app. Here is a talk on this: https://vimeo.com/108441214

Same creators realized immediately after building that they can build better and are given the opportunity to rewrite.

I'd guess the risk of the strangler pattern is that the new code ends up being largely the same 'shape' as the original?

There's already been tons of discussion here on HN about rewrites.

But, for the love of programming, please seriously consider this gradual rewrite/strangler pattern/whatever you want to call it over the often disastrous complete rewrite.

Definitely agree. Unfortunately full rewrites will always have some level of praise. Typically it starts with a POC on a completely new tech stack; for one starting today it will probably be React/Node or something like that with GraphQL. Perhaps it is replacing a Java/JSP/JSF something-or-other.

The POC will have the advantage of taking a small feature and reproducing it, but working 10 times better, faster, etc. However, what is unfortunate is that oftentimes the POC team will say, yeah this is about 80% of the way there, we still don't have X, Y and Z implemented. But that's where the rub is - it turns out that X, Y and Z are almost always underestimated. After all, there is a reason they chose to exclude them from the POC.

Beyond that, the POC is just 1 feature usually. What about the other 80 screens? The POC team will then say, well if this took 3 weeks, then 80*3 weeks (as a worst case) will be a 4 year project - but the upshot is that it can be parallelized and done a lot quicker.

This is how I always hear it. So they put in a plan to do it, and things just start unravelling as you have 1) stakeholders slowing down work with getting existing functionality. 2) you still need to tackle the last 20% (which might really be 50%) 3) You will constantly drain resources for the next year or 2 from the ACTIVE product, and that battle will be ongoing.

If you use the strangler pattern, you will basically be making incremental releases that use the new technology AND the old at the same time. You don't have to replace the last 20% until it's absolutely a good idea. You're able to get the instant benefits to the screens/features you want out the door, and basically train up the existing team. There are no two teams competing, they are all working on the next short-term release.
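A rough sketch of what "new technology AND the old at the same time" can look like in code, in Python. The names (`StranglerFacade`, `legacy_app`, `new_orders_screen`) and the dict-shaped request are assumptions for illustration only, not anything from Fowler's article. A thin facade sits in front of both systems and sends each request to the new implementation if that screen has been migrated, and to the legacy system otherwise.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


def legacy_app(request: dict) -> str:
    """Hypothetical stand-in for the old system, which can serve every screen."""
    return f"legacy:{request['path']}"


def new_orders_screen(request: dict) -> str:
    """Hypothetical stand-in for one screen already rebuilt on the new stack."""
    return f"new:{request['path']}"


class StranglerFacade:
    """Routes each request to a migrated handler if one exists, else to legacy."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, path: str, handler: Handler) -> None:
        """Start serving `path` from the new implementation."""
        self._migrated[path] = handler

    def rollback(self, path: str) -> None:
        """Send `path` back to the legacy system if the new code misbehaves."""
        self._migrated.pop(path, None)

    def handle(self, request: dict) -> str:
        handler = self._migrated.get(request["path"], self._legacy)
        return handler(request)


facade = StranglerFacade(legacy_app)
facade.migrate("/orders", new_orders_screen)
```

Each release just adds another `migrate` call. The legacy handler only becomes deletable once nothing falls through to it, which is the "last 20%" you can defer until it is worth doing.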

Indeed. I think the key point here is that even if the rewrite 'only' takes 4 years, how many new features have been added to the original codebase (maintained for 'continuity') in the meantime?

There's usually some expectation that the rewrite will make it 'quicker' to develop these new features so of course it will catch up. Eventually. After ten years of splitting the dev team to maintain two codebases. One of which has not produced any value in that time.

Alternatively the application featureset is frozen for the duration of the rewrite, and the company folds halfway through.

I imagine it depends on the situation.

Complete re-writes can sometimes help you eliminate a lot of technical debt and move faster if you do it right the second time. You can use different architectures, better tooling etc. So sometimes rewrites can be useful.

In other cases, your system might be doing just fine, and it needs parts of it re-written to be more scalable/efficient/whatever.

It always depends on the specific circumstance you're in. When deciding between the options, it makes a lot of sense to think through these things for a while and make sure you're doing it for the right reasons.

I don’t doubt it has happened somewhere, but I’ve never seen the mythical complete rewrite that was awesome and saved time and did it better the second time. Unless you’re talking about rather small projects of one or two people rewriting something in one month...

In three different large teams (~100 people) during my career, I have witnessed the other kind of rewrite. People forgot how many things were working right, and had started to focus only on the warts. They overestimated their ability to redo what they’d done before, and underestimated how long it would take. Why does anyone assume a multi-year project will go any faster the second time? You need a lot of evidence for that, and I’ve never seen any. In all cases, the rush to rewrite quickly caused people to cut corners and introduce new design mistakes, ultimately ending up with something that was only marginally better after a heavy cost of several years’ development. In all cases I’ve seen, the people in charge admitted regretting the decision to rewrite code and told me they wished they’d done it more piece-meal.

Just a data point, but I have to wonder how often a clean rewrite actually happens. I’m looking for the link now, but I remember reading on Wikipedia that it’s estimated that 30% of software globally is late and over budget. I suspect that rewrites are more affected by Planning Fallacy than the first time through, it’s easy to assume you can do better. https://en.m.wikipedia.org/wiki/Planning_fallacy

A big piece of enterprise software I've worked on did a full rewrite successfully. This was a long time before I joined, but the original product was written in Magik and apparently the only requirement was "make it work like magik" which I thought was pretty funny. It definitely was a success.

Though on the same product I came across a mildly hideous half-baked attempt at a UI framework re-write. I got the story on that and it was definitely one of the time-wasting regret stories.

Software is mostly just hard and expensive.

Yup. I can relate, from a project in death march mode that was even already used by a first customer like a "beta" while development continued. A progressive refactor was the way to try to succeed. It did. But you need someone / a team willing to go the extra mile.

We used this pattern in a warehouse management system at a company I worked for years ago.

Old version had almost all the logic in PL/SQL, some Qt forms for high-level management, and C++ console apps (warehouse processes) running on portable terminals through telnet (so in reality these were running on the server, and portable clients were telneting to it to control it).

First we introduced an XML-based protocol between the C++ app and the portable terminals, and a .net client running on these terminals to replace telnet. This allowed for a simple graphical interface instead of text-only, so there was a good motivation for customers to upgrade. It also separated the parts clearly and allowed us to mix and match new and old parts in the system.

Then we introduced J2EE application server, exposed the database through hibernate, and new processes were written in java (jbpm to be specific). They still used the same XML protocol and .net client, so old C++ processes could call java processes and vice-versa.

New processes required the possibility to call PL/SQL logic so the features that were needed were exposed through J2EE services to the java processes.

Finally we added a way to write new management forms in Eclipse RCP.

We also planned on moving the logic from PL/SQL to J2EE completely, and becoming database independent, but we never got to that.

The rewrite was never completed, there was a merger in the meantime, and we switched tech again, at which point most of the team left :)

But what we finished was working reliably, no features were lost, and as far as I know some customers still use the old system, some use both, some use the new system.

The thing that made it easy to do, but also hard to finish - was the logic in PL/SQL. As long as we left that be it made moving everything else easy. But at the same time it was a constant temptation to just call the old PL/SQL function instead of writing a new J2EE service, and finish the task at hand faster.

I've seen huge things being built in/with PL/SQL, running stable, fast, nearly bug-free and containing all the features needed, for decades, sometimes upgraded to a new database major version.

I've also seen multiple attempts at replacing those PL/SQL systems, reimplementing the same functionality with Hibernate/Java (and I recall one instance of PHP being used) and they've all been slow (to use and develop), buggy and always lacking in functionality compared to the original. Basically all three of the previous problems or a disgusting combination of them.

In all of those replacement cases, there has been no objective reason to replace the entire system, maybe give the UI a facelift.

So here's my question: why rewrite it all? What's the reason these "rewrites" and "stranglings" are done to nicely working systems that just don't comply with the newest fad? Is PL/SQL or DBs such arcane knowledge that your devs did not understand it enough to give it a facelift? I'm genuinely just curious why these things happen; if I knew the reason, maybe I could stop another service I have to use being turned to excrement.

The ultimate goal was to become database-independent as far as I understand (I was just a junior dev, this was my first job, so I'm guessing, I wasn't making decisions). To avoid Oracle tax I assume. It was very funny when mid-rewrite we learnt Oracle bought Java :)

But there were lots of other improvements - using jbpm for designing warehousing processes was a natural fit, much better than making persistent long-running processes with C++ using nested ifs in a while loop and serializing the state of the process with manual inserts and updates on each state change.

With jbpm you could see the whole state machine as a graph, move nodes around, insert new ones easily, and the persistence was automatic, including all the variables you use, which saved a lot of time and hard-to-track bugs (some combination of steps breaks the persistence the next time you enter this process - good luck fixing that and tracking what really happened on the warehouse before the process broke the persistence).
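To make the contrast with "nested ifs in a while loop plus manual inserts" concrete, here is a small sketch in Python. It is not jbpm's API; the states, events, `ProcessInstance` class and the dict used as a store are all invented for illustration. The transition table is plain data, so the whole process can be inspected or drawn as a graph, and every transition saves the state and variables automatically.

```python
import json
from typing import Dict, Tuple

# (state, event) -> next state. The whole process is visible as data.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("awaiting_scan", "pallet_scanned"): "awaiting_location",
    ("awaiting_location", "location_scanned"): "done",
    ("awaiting_location", "cancel"): "awaiting_scan",
}


class ProcessInstance:
    """A long-running process whose state is persisted on every transition."""

    def __init__(self, instance_id: str, store: Dict[str, str]) -> None:
        self.instance_id = instance_id
        self.state = "awaiting_scan"
        self.variables: Dict[str, str] = {}
        self._store = store  # hypothetical stand-in for a database table
        self._save()

    def _save(self) -> None:
        self._store[self.instance_id] = json.dumps(
            {"state": self.state, "variables": self.variables}
        )

    def fire(self, event: str, **variables: str) -> None:
        """Apply an event; reject it if the graph has no such edge."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = TRANSITIONS[key]
        self.variables.update(variables)
        self._save()  # persistence is automatic, never hand-written per step

    @classmethod
    def load(cls, instance_id: str, store: Dict[str, str]) -> "ProcessInstance":
        """Resume a process from its last persisted state."""
        data = json.loads(store[instance_id])
        instance = cls.__new__(cls)
        instance.instance_id = instance_id
        instance._store = store
        instance.state = data["state"]
        instance.variables = data["variables"]
        return instance
```

Because no step writes its own persistence code, a new combination of steps cannot silently skip a save, which is the class of bug described above.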

Regarding the speed we were actually slightly faster with the jbpm (mostly thanks to the hibernate 2nd level cache and optimistic locking). We measured the time on the portable devices between pressing a key and seeing the next screen, and because the bottleneck was PL/SQL procedures running selects to decide what to show and waiting on locks - the whole overhead of application server and jbpm and .net client was hidden in the savings thanks to the cache (and optimistic locking).
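For readers who haven't met optimistic locking, a minimal sketch of the general version-check idea in Python. This is not Hibernate's API; `StockTable`, `Row` and `StaleWriteError` are hypothetical names and the "table" is an in-memory dict. Readers never wait on a lock. A writer only succeeds if the row still has the version it originally read, and otherwise has to re-read and retry.

```python
from dataclasses import dataclass
from typing import Dict


class StaleWriteError(Exception):
    """Raised when a row changed between the caller's read and its write."""


@dataclass(frozen=True)
class Row:
    qty: int
    version: int


class StockTable:
    """In-memory stand-in for a table with a version column."""

    def __init__(self) -> None:
        self._rows: Dict[str, Row] = {}

    def insert(self, sku: str, qty: int) -> None:
        self._rows[sku] = Row(qty, 1)

    def read(self, sku: str) -> Row:
        return self._rows[sku]  # no lock is taken on read

    def update(self, sku: str, new_qty: int, expected_version: int) -> None:
        current = self._rows[sku]
        if current.version != expected_version:
            raise StaleWriteError(f"{sku} changed since version {expected_version}")
        self._rows[sku] = Row(new_qty, current.version + 1)
```

In a real database the check and the write would be a single conditional update on the version column rather than two Python statements, but the trade is the same: no waiting on locks in the common case, a retry in the rare conflicting case.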

Also jbpm processes had versioning. Old versions were continuing to run and new instances were started with new version. Upgrading the processes with the C++ code was basically stop-the-world event.
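The versioning behaviour can be sketched in a few lines of Python. Again this is only an illustration with invented names (`DEFINITIONS`, `deploy`, `Instance`), not how jbpm implements it: each instance pins the newest definition available when it starts, so deploying a new version never touches instances already in flight.

```python
from typing import Callable, Dict

# version number -> process definition (here just a function, for brevity)
DEFINITIONS: Dict[int, Callable[[str], str]] = {}


def deploy(version: int, definition: Callable[[str], str]) -> None:
    """Register a new process version without stopping running instances."""
    DEFINITIONS[version] = definition


class Instance:
    """A running process pinned to the version that was newest at start."""

    def __init__(self) -> None:
        self.version = max(DEFINITIONS)

    def step(self, item: str) -> str:
        return DEFINITIONS[self.version](item)
```

Compare that with swapping a compiled binary, where every running process has to stop at once.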

We could have skipped the Eclipse RCP thing, though. The subteam that worked on that went a little over the top with architecture astronomy, there were like 4 sub-layers with 4 levels of configuration xmls :). And the framework built on qt we used previously for forms was quite nice already, arguably better.

And no - PL/SQL isn't arcane, and the whole team had lots of experience with it running the system for years.

Thanks for answering.

> The ultimate goal was to become database-independent as far as I understand (I was just a junior dev, this was my first job, so I'm guessing, I wasn't making decisions).

Did you actually achieve that?

Nope, we merged with another company with similar product, and we switched the tech again.

Why not move to another DBMS rather than move off from SQL altogether?

PL/SQL or stored procedures are specific to each database vendor, so it is hard to pull off. The better way is to replace PL/SQL altogether, but keep the SQL queries.

Sure, but still it would be more sane to set up another DB, say Postgres, write some migration code to always keep the two DBs in sync, then start migrating the SQLs one by one.
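A toy version of that idea in Python, using two in-memory SQLite databases as stand-ins for the old and new DBMS. The `stock` table, the `DualWriteStock` class and the `migrate_read` switch are all assumptions made up for this sketch. Writes go to both databases so they stay in sync, and each read query is flipped to the new database individually once it has been ported.

```python
import sqlite3
from typing import Optional, Set


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the (hypothetical) stock table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
    return conn


class DualWriteStock:
    """Writes to both databases; reads move to the new one query by query."""

    def __init__(self, old_db: sqlite3.Connection, new_db: sqlite3.Connection) -> None:
        self._old = old_db
        self._new = new_db
        self._reads_on_new: Set[str] = set()

    def set_qty(self, sku: str, qty: int) -> None:
        # Keep both databases in sync on every write.
        for conn in (self._old, self._new):
            conn.execute(
                "INSERT OR REPLACE INTO stock (sku, qty) VALUES (?, ?)", (sku, qty)
            )
            conn.commit()

    def migrate_read(self, query_name: str) -> None:
        """Mark one read query as ported to the new database."""
        self._reads_on_new.add(query_name)

    def get_qty(self, sku: str) -> Optional[int]:
        conn = self._new if "get_qty" in self._reads_on_new else self._old
        row = conn.execute("SELECT qty FROM stock WHERE sku = ?", (sku,)).fetchone()
        return None if row is None else row[0]
```

Note the sketch glosses over the hard part: two separate commits are not atomic, so a real setup needs some way to detect and repair drift between the two databases.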

My mom works at a startup that was acquired by a big bank. She's a PL/SQL dev, and a majority of their business logic is implemented in PL/SQL on Oracle. After the acquisition, the new bank execs decided to make a new initiative to rewrite the entire thing with "modern" technology. Lots of talk about MongoDB and microservices.

After one year, and lots of revelations about how much of the business actually is implemented in PL/SQL, they have yet to even port one small part of the system to the new stack. It's becoming clear how little they actually understand about the requirements of the software, and why a system like Oracle was chosen in the first place.

I'm also very confused about why people do this.

Not the parent, but I assume people do it so they don't have to use Oracle anymore (for either cost or ethical reasons).

But there are other DBs that could be migrated to, rewriting SQL to another dialect is usually much easier than a full rewrite.

In the codebase I maintain, this boils down to a 'gradual rewrite from the outside in' combined with refactoring bits of the 'in' where practical. You create new independent code paths on the way 'in' instead of trying to make the existing ones cope with yet another use case. These tend to highlight bits of existing core code which can be refactored as isolated and testable components, typically by copy-pasting them into a separate library and updating references piecemeal. Old code hangs around as long as it's referenced, but new code doesn't necessarily have to touch that copy.

Eventually the monolith's trunk gets hollowed out as useful pieces become dependencyless libraries, and the tangled knot of rotting branches, vines, and strange green things with purple lumps starts to die back as a multitude of independent and healthy trunks grow from the surrounding earth.

At the risk of breaking the metaphor...

We opted for something similar when facing a technology pivot point in our front-end application. Instead of opting to re-write our entire application that had been built over the course of 3 years, we chose to integrate the replacement technology side-by-side with the original. This enabled us to add new features using the new technology as needed, along with allowing us to move old features over as we were able to. So far, I'm pretty happy with it. But, at some point, we will have to actively prioritize re-writing old content using the new technology, or else we'll be stuck with this frankenstein forever.

Yeah. I'm really torn on this approach. I guess there aren't really any "good" options to redoing 3 years worth of stuff, so it's mostly an exercise in trying to pick the most acceptable bad trade-off.

The downside of this as I've experienced it is the tech-stack bifurcates and it becomes harder to ramp up new people. Or only some people get to work on the new shiny stuff and others just don't.

It's probably the best approach if you can guarantee the frankenstein won't persist forever. I've seen it get several frankensteins deep. At that point you're never getting rid of it, and the problem compounds. But there are no right/wrong answers, it's all completely context dependent and perhaps that's the only configuration in which the company can survive.

I also think there are likely certain inflection points in terms of project size/complexity. If the whole thing would only take 3 months to re-write then that's probably a good option. If it would take 3 years, it's probably not feasible. You know what I mean? The software itself is an input into the function of 'can we move this to a new techstack/architecture?' and I feel like there are certain parameters which have a safe operating envelope within which it's workable to do a re-write and not have the old solution hang around, but outside that the parameters may just simply not allow for it as a possibility.

I've seen a team of 5 junior devs on an existing product outpace 10 senior devs on a team to replace that product from scratch for 3+ years.

A third team, of like 3 devs, made a proof of concept using the Strangler pattern and just a simple API to interact with the existing database within a year and took over the "rewrite project".

5 devs will nearly always outpace 10 if they have to work on the exact same piece of code. You gotta break it up into teams of at most 4, with as little overlap as possible. Communication doesn't scale.

The other problem is that 10 senior devs sounds like too many cooks and egos in one kitchen.

Junior devs have less experience, but that also can mean less rigidity and a willingness to explore new options.

Sure, junior devs will usually outpace senior devs (especially looking at LOC). But in the end, did they produce robust, reliable, fault-tolerant code that is easy to understand, extend and most importantly delete?

I'd like to hear more specifics on this project. Especially why if junior team A and senior team B were already working for 3 years, why was team C introduced? Did they save the day?

I'm really curious about the final outcome and any learnings you might have had.

That's a pretty funny analogy considering the same tree appears to be the inspiration behind the design of the aptly-named "Bed of Chaos" from Dark Souls: https://66.media.tumblr.com/02253f537ab15bd946d55cdb9b0b7b41...

Ha, I failed to notice that even the description is similar: A failed experiment to duplicate the first flame.

“The experiment failed and had terrible consequences, releasing pure chaos and creating a distorted being ... that consumed the witch and her followers. The being became the source of all demons: The Bed of Chaos”

Sounds downright prophetic.

I think this is an insightful point of view: "Let's face it, all we are doing is writing tomorrow's legacy software today."

Maybe we spend too much time trying to design flexible programs that can be easily updated to support new requirements (which, in my experience, usually ends up just being extra complexity which must be maintained and never delivers the extra flexibility it was supposed to), and it might be wiser to sometimes think about software which is easy to replace.


Adopted a similar pattern recently on a project, results aren't fully in yet, but it looks like it might work.

However, while I understand the reluctance about rewrites, in my personal experience they have actually been very successful.

So beware the absolutes. Always beware the absolutes ;-)

> Always beware the absolutes ;-)

All categorical statements are bad — including this one ....

I see some people didn't get the (self-contradicting) joke. I probably should have put it in quotation marks, or appended a smiley-face.

Or they didn't think that simply repeating the self-contradicting joke already made added anything to the conversation?

"We describe how a small, successful, self-selected XP team approached a seemingly intractable problem with panache, flair and im-modesty." - Abstract of the linked paper


This pattern was taught to me by a former mentor, except he phrased it "always be ready to ship". I used it to convert my app from rendered HTML + jQuery into a full-fledged VueJS app + API. Much, much better than rewriting.

One more thing to add, is that a lot of software (e.g. SAAS) has users that depend on it. You want to make sure that your customers are happy through the transition.

The link from "Further reading" was a great read too: Legacy Application Strangulation: Case Studies (2013)


Maybe a little OT but it seems a little surprising that this Man, Martin Fowler just observes nature and creates design patterns out of it. And of course once he writes an article everyone starts talking about it.

I'm not sure how I feel about it. The quality of his articles is generally pretty good. But it seems a little reckless (or perhaps foolhardy) for just one person to dictate all new design patterns.

A pattern isn't good just because someone came up with it. Lots of patterns have been discovered/described in the past, and they turned out to be rubbish. This might be one of them, or it might be good; time will tell.

He is not dictating anything, he is offering you his advice. You too can describe a pattern you came up with, and submit it here as well

Can we be honest with ourselves and admit that an article with a link to martinfowler.com is much more likely to get to the top of HN and more eyes than a random blog post on design patterns?

I think your statement is accurate, but I don't think it requires any special honesty or insight. Fowler has written seven books on software development, spent five years as the editor of the design column for IEEE Software magazine, is the editor of an Addison-Wesley Signature Series of books, and is employed by a very famous software development company. His posts ought to be given more attention than random blog posts. That's not to say they're always correct. They're just more likely to be worth reading.

Yes. Because brand. I discovered awhile ago now that the text isn't what sends the message. Who it's coming from sends a lot of the message.

I think you’re mischaracterizing the genesis of this design pattern. Fowler didn’t come up with it based on what he saw in nature, but instead saw something in nature that allowed him to illustrate the pattern in an understandable way.

It's just naming. The patterns are there anyway.

Isn’t this common sense though? How else would you re-write a complex, legacy system? But then again, I feel the same way about all virtues Martin Fowler extols.

Most redesigns are done in the "big redesign in the sky" fashion, meaning we write a new system from scratch and, when it has features equivalent to the old system, redirect traffic to it in a "big bang" introduction. This usually goes wrong because it takes too much time to build an equivalent system, and redirecting the traffic in one shot adds to the risk of such a project. It also creates frustration between the people maintaining the old system and those creating the new one. That said, a lot of companies still do that!!

If it were common sense it would be more common.

god, that's a terrible name
