This is a very important point that I've been harping on for years. If the nature of your business requires quickly adapting to new business needs, then make your software and systems easy to delete! Chances are good that they'll be obsolete or outdated in a few years, and everyone will be much happier if the system is easy to remove and replace.
How to implement that is very dependent on the nature of your systems, how you deploy, your company's tech and operational culture, etc. But it's very much worth thinking about: "If this project were to fail in the long run, how hard would it be to get rid of?" In large companies, lingering legacy systems can make or break an entire organization.
A cynical quip like Sustrik’s Law is meaningless to me without a deeper analysis. What mechanisms or conditions make this “law” true? And what are its actionable takeaways?
Edit: This doesn’t solve the problem of wasted team effort, obviously. This kind of effort ought to only be taken when you can’t get a potential customer to agree to signing a contract before doing some work on a new feature. But that can be difficult to do sometimes. Marrying sales and development is its own challenge.
>As lines of code increase development velocity decreases.
How do you make this clear to the CEO...?
- Alternatively -
Legacy Software: any software that has been deployed.
Even if the workflows are not outdated, the software libraries will likely be outdated.
In manufacturing, don't be surprised to see Windows 2000 boxes on the plant floor; also expect new hires to complain about AngularJS apps.
From my own experience, when you take over a project that attempted the strangler pattern but got cancelled, the key is either to continue it (holding your nose if you don't like it) or to remove it and go back to the old infrastructure. If the new pathway is better than the old, go with it until you can remove the old one. Even if you have a new super zappy way to do it, resist introducing it without "finishing" what was started earlier (one way or another).
Step one was to slowly rework things into logical services that still relied on the underlying legacy code. Eventually, we got to the point where cross functional domain logic was accessed via logical services rather than direct use of classes in other domains. Once we accomplished that, we slowly worked back down to understand, optimize, and refactor legacy code in digestible chunks. The final stage was to work up to the REST API level to create a more sane implementation of the web bits that preserved existing behavior.
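A minimal Java sketch of that first step, with invented names (InventoryService, LegacyInventoryDao, the SKU data): cross-domain callers are moved behind a logical service interface that initially just delegates to the legacy code, so the boundary exists before any refactoring starts.

```java
import java.util.HashMap;
import java.util.Map;

// The logical service boundary other domains are migrated to.
interface InventoryService {
    int stockLevel(String sku);
}

// Stand-in for a legacy class that other domains used to call directly.
class LegacyInventoryDao {
    private final Map<String, Integer> stock = new HashMap<>();
    LegacyInventoryDao() { stock.put("SKU-1", 7); }
    int rawLookup(String sku) { return stock.getOrDefault(sku, 0); }
}

// First step: the service simply delegates to the legacy code.
// Once every caller goes through the interface, the implementation
// behind it can be understood, optimized, and refactored in chunks.
class LegacyBackedInventoryService implements InventoryService {
    private final LegacyInventoryDao dao = new LegacyInventoryDao();
    public int stockLevel(String sku) { return dao.rawLookup(sku); }
}

public class ServiceBoundaryDemo {
    public static void main(String[] args) {
        InventoryService svc = new LegacyBackedInventoryService();
        System.out.println(svc.stockLevel("SKU-1")); // 7
    }
}
```

The point of the indirection is that swapping LegacyBackedInventoryService for a rewritten implementation later touches no callers.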
It took the better part of a year to do all the work, but by going piece by piece (which allowed for understanding some wonky code), we had very, very, very few regressions and minimally impacted our overall ability to still deliver on the roadmap.
Regardless, had we gone with a strangler approach, we probably would've been done months ago. As it stands, there is at least another week or two of development via QA break/fixing until it's ready to go live.
With that said, I still think there are times when a full rewrite makes sense. We redid our API two years ago when it was still small and I am happy we did the full rewrite. The old architecture just wouldn't have been able to scale with where we are now and I felt the code was just too far gone to be helped.
As always, life is full of learning experiences.
First time to get it wrong.
Second time to learn from your mistakes and get it less wrong.
Third time to try to ensure that all subsequent mistakes can be gracefully recovered from.
When given a new piece of work I find stage one tends to be my planning stage.
The related principle here is to ensure you have good boundaries in your application with minimal dependencies between each system so it's easy to rewrite portions of your app. Here is a talk on this: https://vimeo.com/108441214
But, for the love of programming, please seriously consider this gradual rewrite/strangler pattern/whatever you want to call it over the often disastrous complete rewrite.
The POC will have the advantage of taking a small feature and reproducing it, but working 10 times better, faster, etc. However, what is unfortunate is that oftentimes the POC team will say: yeah, this is about 80% of the way there, we still don't have X, Y and Z implemented. But that's where the rub is: it turns out that X, Y and Z are almost always underestimated. After all, there is a reason they chose to exclude them from the POC.
Beyond that, the POC is usually just one feature. What about the other 80 screens? The POC team will then say: well, if this took 3 weeks, then 80*3 weeks (as a worst case) makes it a roughly 4-5 year project, but the upshot is that it can be parallelized and done a lot quicker.
This is how I always hear it. So they put in a plan to do it, and things just start unravelling because 1) stakeholders slow down the work with requests to preserve existing functionality, 2) you still need to tackle the last 20% (which might really be 50%), and 3) you will constantly drain resources from the ACTIVE product for the next year or two, and that battle will be ongoing.
If you use the strangler pattern, you will basically be making incremental releases that use the new technology AND the old at the same time. You don't have to replace the last 20% until it's absolutely a good idea. You're able to get the instant benefits to the screens/features you want out the door, and basically train up the existing team. There are no two teams competing, they are all working on the next short-term release.
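The routing described above can be sketched in a few lines of Java (illustrative only; the feature names and handlers are invented): a dispatcher sends migrated features to the new code path and everything else to the legacy one, and the migrated set grows release by release until the legacy branch can be deleted.

```java
import java.util.Set;

public class StranglerRouter {
    // Features already moved to the new stack; grows with each release.
    private static final Set<String> MIGRATED = Set.of("search", "login");

    // Stand-ins for the two coexisting implementations.
    static String handleLegacy(String feature) { return "legacy:" + feature; }
    static String handleNew(String feature)    { return "new:" + feature; }

    // Route a single request to whichever implementation owns the feature.
    public static String route(String feature) {
        return MIGRATED.contains(feature) ? handleNew(feature)
                                          : handleLegacy(feature);
    }

    public static void main(String[] args) {
        System.out.println(route("search"));  // new:search
        System.out.println(route("billing")); // legacy:billing
    }
}
```

In practice this dispatcher is often a reverse proxy or an API gateway rather than in-process code, but the shape is the same.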
There's usually some expectation that the rewrite will make it 'quicker' to develop these new features so of course it will catch up. Eventually. After ten years of splitting the dev team to maintain two codebases. One of which has not produced any value in that time.
Alternatively the application featureset is frozen for the duration of the rewrite, and the company folds halfway through.
Complete re-writes can sometimes help you eliminate a lot of technical debt and move faster if you do it right the second time. You can use different architectures, better tooling etc. So sometimes rewrites can be useful.
In other cases, your system might be doing just fine, and it needs parts of it re-written to be more scalable/efficient/whatever.
It always depends on the specific circumstance you're in. When deciding between the options, it makes a lot of sense to think through these things for a while and make sure you're doing it for the right reasons.
In three different large teams (~100 people) during my career, I have witnessed the other kind of rewrite. People forgot how many things were working right, and had started to focus only on the warts. They overestimated their ability to redo what they’d done before, and underestimated how long it would take. Why does anyone assume a multi-year project will go any faster the second time? You need a lot of evidence for that, and I’ve never seen any. In all cases, the rush to rewrite quickly caused people to cut corners and introduce new design mistakes, ultimately ending up with something that was only marginally better after a heavy cost of several years’ development. In all cases I’ve seen, the people in charge admitted regretting the decision to rewrite code and told me they wished they’d done it more piecemeal.
Just a data point, but I have to wonder how often a clean rewrite actually happens. I’m looking for the link now, but I remember reading on Wikipedia that it’s estimated that 30% of software globally is late and over budget. I suspect that rewrites are more affected by the Planning Fallacy than the first time through; it’s easy to assume you can do better. https://en.m.wikipedia.org/wiki/Planning_fallacy
Though on the same product I came across a mildly hideous half-baked attempt at a UI framework re-write. I got the story on that and it was definitely one of the time-wasting regret stories.
Software is mostly just hard and expensive.
Old version had almost all the logic in PL/SQL, some Qt forms for high-level management, and C++ console apps (warehouse processes) running on portable terminals through telnet (so in reality these were running on the server, and portable clients were telneting to it to control it).
First we introduced an XML-based protocol between the C++ app and the portable terminals, and a .net client running on these terminals to replace telnet. This allowed for a simple graphical interface instead of text-only, so there was a good motivation for customers to upgrade. It also separated the parts clearly and allowed us to mix and match new and old parts in the system.
Then we introduced J2EE application server, exposed the database through hibernate, and new processes were written in java (jbpm to be specific). They still used the same XML protocol and .net client, so old C++ processes could call java processes and vice-versa.
New processes needed to be able to call PL/SQL logic, so the required features were exposed to the java processes through J2EE services.
Finally we added a way to write new management forms in Eclipse RCP.
We also planned on moving the logic from PL/SQL to J2EE completely, and becoming database independent, but we never got to that.
The rewrite was never completed, there was a merger in the meantime, and we switched tech again, at which point most of the team left :)
But what we finished was working reliably, no features were lost, and as far as I know some customers still use the old system, some use both, some use the new system.
The thing that made it easy to do, but also hard to finish, was the logic in PL/SQL. As long as we left that be, moving everything else was easy. But at the same time it was a constant temptation to just call the old PL/SQL function instead of writing a new J2EE service, and finish the task at hand faster.
I've also seen multiple attempts at replacing those PL/SQL systems, reimplementing the same functionality with Hibernate/Java (and I recall one instance of PHP being used) and they've all been slow (to use and develop), buggy and always lacking in functionality compared to the original. Basically all three of the previous problems or a disgusting combination of them.
In all of those replacement cases, there has been no objective reason to replace the entire system, maybe give the UI a facelift.
So here's my question: why rewrite it all? What's the reason these "rewrites" and "stranglings" are done to nicely working systems that just aren't complying with the newest fad? Is PL/SQL or DBs such arcane knowledge that your devs did not understand it enough to give it a facelift? I'm genuinely just curious why these things happen; if I knew the reason maybe I could stop another service I have to use being turned to excrement.
But there were lots of other improvements - using jbpm for designing warehousing processes was a natural fit, much better than making persistent long-running processes with C++ using nested ifs in a while loop and serializing the state of the process with manual inserts and updates on each state change.
With jbpm you could see the whole state machine as a graph, move nodes around, insert new ones easily, and the persistence was automatic, including all the variables you use, which saved a lot of time and hard-to-track bugs (some combination of steps breaks the persistence the next time you enter this process - good luck fixing that and tracking what really happened on the warehouse before the process broke the persistence).
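For contrast, here is an illustrative Java rendition of the hand-rolled pattern being replaced (all names invented): state transitions in a switch, with persistence called manually after every step, which is exactly the spot where a forgotten write corrupts a resumed process.

```java
import java.util.HashMap;
import java.util.Map;

public class ManualProcess {
    enum State { PICK, PACK, SHIP, DONE }

    private State state = State.PICK;
    private final Map<String, String> vars = new HashMap<>();

    // Stand-in for the manual INSERT/UPDATE the old C++ code did by hand.
    private void persist() {
        // Forget to persist one variable or transition here and the
        // process breaks the next time it is resumed from the database.
    }

    // One step of the process; returns the state after the transition.
    public State step() {
        switch (state) {
            case PICK: vars.put("picked", "yes"); state = State.PACK; break;
            case PACK: vars.put("packed", "yes"); state = State.SHIP; break;
            case SHIP: state = State.DONE; break;
            case DONE: break;
        }
        persist(); // must be remembered after every single transition
        return state;
    }

    public static void main(String[] args) {
        ManualProcess p = new ManualProcess();
        while (p.step() != State.DONE) { }
        System.out.println("process finished");
    }
}
```

A workflow engine like jbpm makes the transitions a visible graph and handles the persistence of state and variables automatically, removing this whole class of bug.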
Regarding speed, we were actually slightly faster with jbpm (mostly thanks to the hibernate 2nd-level cache and optimistic locking). We measured the time on the portable devices between pressing a key and seeing the next screen; because the bottleneck was PL/SQL procedures running selects to decide what to show and waiting on locks, the whole overhead of the application server, jbpm, and the .net client was hidden by the savings from the cache (and optimistic locking).
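The optimistic-locking idea that Hibernate's @Version mapping automates can be shown in plain Java (a sketch, not Hibernate itself; the record and field names are invented): a write succeeds only if the version read earlier is still current, so readers never wait on locks and a stale writer simply retries.

```java
public class OptimisticRecord {
    private String value;
    private long version = 0;

    public OptimisticRecord(String value) { this.value = value; }

    public synchronized long currentVersion() { return version; }
    public synchronized String read() { return value; }

    // Returns false if another writer got in between read and update,
    // mirroring the stale-object check Hibernate does on UPDATE.
    public synchronized boolean update(long expectedVersion, String newValue) {
        if (version != expectedVersion) return false;
        value = newValue;
        version++;
        return true;
    }

    public static void main(String[] args) {
        OptimisticRecord rec = new OptimisticRecord("bin A");
        long v = rec.currentVersion();
        System.out.println(rec.update(v, "bin B")); // true
        System.out.println(rec.update(v, "bin C")); // false: stale version
        System.out.println(rec.read());             // bin B
    }
}
```

The win in the warehouse scenario is that a screen can be rendered from cached reads without ever blocking on a row lock.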
Also, jbpm processes had versioning. Old versions continued to run, and new instances were started with the new version. Upgrading the processes in the C++ code was basically a stop-the-world event.
We could have skipped the Eclipse RCP thing, though. The subteam that worked on that went a little over the top with architecture astronomy, there were like 4 sub-layers with 4 levels of configuration xmls :). And the framework built on qt we used previously for forms was quite nice already, arguably better.
And no - PL/SQL isn't arcane, and the whole team had lots of experience with it running the system for years.
> The ultimate goal was to become database-independent as far as I understand (I was just a junior dev, this was my first job, so I'm guessing, I wasn't making decisions).
Did you actually achieve that?
After one year, and lots of revelations about how much of the business actually is implemented in PL/SQL, they have yet to even port one small part of the system to the new stack. It's becoming clear how little they actually understand about the requirements of the software, and why a system like Oracle was chosen in the first place.
I'm also very confused about why people do this.
Eventually the monolith's trunk gets hollowed out as useful pieces become dependencyless libraries, and the tangled knot of rotting branches, vines, and strange green things with purple lumps starts to die back as a multitude of independent and healthy trunks grow from the surrounding earth.
At the risk of breaking the metaphor...
The downside of this as I've experienced it is the tech-stack bifurcates and it becomes harder to ramp up new people. Or only some people get to work on the new shiny stuff and others just don't.
It's probably the best approach if you can guarantee the frankenstein won't persist forever. I've seen it get several frankensteins deep. At that point you're never getting rid of it, and the problem compounds. But there are no right/wrong answers, it's all completely context dependent and perhaps that's the only configuration in which the company can survive.
I also think there are likely certain inflection points in terms of project size/complexity. If the whole thing would only take 3 months to re-write then that's probably a good option. If it would take 3 years, it's probably not feasible. You know what I mean? The software itself is an input into the function of 'can we move this to a new techstack/architecture?' and I feel like there are certain parameters which have a safe operating envelope within which it's workable to do a re-write and not have the old solution hang around, but outside that the parameters may just simply not allow for it as a possibility.
A third team, of like 3 devs, made a proof of concept using the Strangler pattern and just a simple API to interact with the existing database within a year and took over the "rewrite project".
Junior devs have less experience, but that also can mean less rigidity and a willingness to explore new options.
I'd like to hear more specifics on this project. In particular: if junior team A and senior team B had already been working for 3 years, why was team C introduced? Did they save the day?
I'm really curious about the final outcome and any learnings you might have had.
“The experiment failed and had terrible consequences, releasing pure chaos and creating a distorted being ... that consumed the witch and her followers. The being became the source of all demons: The Bed of Chaos”
Sounds downright prophetic.
Maybe we spend too much time trying to design flexible programs that can be easily updated to support new requirements (in my experience, that usually just ends up as extra complexity which must be maintained and never delivers the flexibility it promised), and it might be wiser to sometimes think about software which is easy to replace.
Adopted a similar pattern recently on a project, results aren't fully in yet, but it looks like it might work.
However, while I understand the reluctance about rewrites, in my personal experience they have actually been very successful.
So beware the absolutes. Always beware the absolutes ;-)
All categorical statements are bad — including this one ....
One more thing to add is that a lot of software (e.g. SaaS) has users that depend on it. You want to make sure that your customers are happy through the transition.
I'm not sure how I feel about it. The quality of his articles is generally pretty good. But it seems a little reckless (or perhaps foolhardy) for just one person to dictate all new design patterns.
He is not dictating anything, he is offering you his advice. You too can describe a pattern you came up with, and submit it here as well