
The True Cost of Rewrites - telotortium
https://8thlight.com/blog/doug-bradbury/2018/11/27/true-cost-rewrites.html
======
twic
I see a lot of fear of rewrites in the comments. Naturally, someone has posted
that daft Spolsky article about how Netscape staying with the disastrous
Navigator 4 codebase would somehow have been a good idea. Rewrites are one of
the classic bogeymen in software.

So I just want to tell you that I've done successful rewrites!

The biggest was rewriting the e-commerce site of a well-known toy company.
They had a homegrown site written in .NET, and we rewrote it on top of a
commercial e-commerce framework in Java. It took a couple of years, the
rewrite team was 2-5 times the size of the team that built the old site, and
the client ended up with all the features of the old site, plus
internationalisation, plus thorough automated tests and deployment. They
seemed to be really happy with it.

The smallest was a data-aggregation server my team uses. It was written in
Node, and kept crashing under load. I rewrote it in Java. It was just me, and
it took a week or two. It hasn't crashed since, and we've been able to add
loads more features.

Success factors in both: putting effort into thoroughly understanding the old
thing; resisting any temptation to add new features unless you are
exceptionally confident in how much more work it will take; thoroughly testing
what you build as you go.

~~~
scotch_drinker
Everyone has done "successful" rewrites. But a rewrite that takes 2 years and
requires a team much larger than the original team isn't successful in the
eyes of most stakeholders and is almost NEVER what the dev team proposes as
the timeline. Just because something got done eventually at great cost doesn't
make it successful.

~~~
coding123
Our team is doing a slow replacement of code written in Java with Node. At
almost every turn we find that the replaced code is 50 to 80 percent smaller
and much easier to read. We're doing it slowly because we are phasing in new
features at the same time, to make sure stakeholders get value while we do it.
It's been tremendously successful and cost-effective keeping the old and the
new running simultaneously.

~~~
cpeterso
Awesome results! Is the new code smaller due to the different programming
language (JS) or a better understanding of the program's requirements?

~~~
coding123
Sadly, it's mostly just converting poorly written code that doesn't use re-use.
Other than that, we're using multiple TypeScript frameworks for data wrangling,
transport, and validation without having to re-define objects as DTOs,
Models, etc... And taking advantage of GraphQL has been massively helpful.

------
arkh
There is a trick to rewrites: start by writing a full suite of end to end
tests. Once done you'll discover that you can easily stabilize your old code
and make changes with this security harness. You may discover that you don't
really need to rewrite anything.

If you have to maintain untested code I recommend you get yourself a copy of
"working effectively with legacy code" as it is mostly a list of recipes on
how to add tests to a codebase. I also recommend it to anyone starting to
write something new so they can learn what is useful to test.

Forget about unit testing and start working on end-to-end tests. Selenium,
Sikuli, Codeception, Wiremock, siege are the kind of tools you want more than
whateverUnit. Test your applications at the UI level. Test their performance.
Your client does not care about your design pattern usage: they want something
that respects their specifications.
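The "security harness" idea can be sketched roughly like this, in Python for brevity. Everything here is hypothetical: a toy in-process HTTP server stands in for the legacy app, and the endpoint and payload are invented. A real harness would point at a staging deployment and usually drive the UI with a tool like Selenium rather than raw HTTP calls.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class LegacyAppHandler(BaseHTTPRequestHandler):
    """Stand-in for the legacy application under test (hypothetical endpoint)."""
    def do_GET(self):
        if self.path == "/cart/total?items=3":
            body = json.dumps({"total": 29.97}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep test output quiet
        pass

def end_to_end_checks(base_url):
    """Assert only on externally observable behaviour, never on internals."""
    with urllib.request.urlopen(base_url + "/cart/total?items=3") as resp:
        assert resp.status == 200
        assert json.load(resp)["total"] == 29.97

# Spin up the "legacy app" on an ephemeral port and run the harness against it.
server = HTTPServer(("127.0.0.1", 0), LegacyAppHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
end_to_end_checks(f"http://127.0.0.1:{server.server_port}")
server.shutdown()
checks_passed = True
print("end-to-end checks passed")
```

Because the checks only touch the public interface, the same suite keeps passing whether the behaviour behind it is stabilized, refactored, or rewritten.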

~~~
wvenable
Tests fossilize a design. This is generally a good thing and allows you to
focus on refactoring and bug fixes with some confidence that you won't break
anything. It's also especially good for products that must adhere to a
specific interface or API.

But when you want to rewrite, the last thing you want is to fossilize your
design. You explicitly don't want the same system you started with; otherwise
it wouldn't be a rewrite, it would be a refactoring.

~~~
afpx
This isn’t necessarily the case. In my experience, the teams that want
rewrites have built big monoliths with no design, lots of coupling, and lots
of copy-and-paste code (basically a ‘big ball of mud’). So, what the rewrite
does is introduce an architecture that decouples (often valuable)
functionality into small, targeted functions and libraries. This enables
easier maintenance and modification. It also fixes the brittleness and
fragility.

So, the goal isn’t to keep the monolith as is. Instead, it’s to break up its
functionality into small parts. Then, the parts may be joined so that the old
interfaces work as they did before. But, they may also be used to build
entirely new interfaces.

------
jacquesm
A good rewrite is done bit-by-bit releasing the rewritten work immediately.
This forces you to focus on deliverable chunks rather than code-for-
code's-sake and to test your assumptions instantly. If you do not release your
work there is a very large chance that it will all be for nothing.

A good rewrite happens so subtly that end-users and operators will never
really realize that a rewrite is underway.

There are only very rare cases where a rewrite in big-bang fashion is
indicated and even then the bulk of those will be incompetence on the part of
the tech crew because they see no way to turn the job into an incremental one
(or do not want to see a way).

~~~
majikandy
Incremental rewrite isn’t a rewrite at all. It is the better choice though. It
is worth noting that refactoring and improving an area of the code without a
need to change it or build new features on it is also wasteful.

~~~
dahart
Your comment is great except for the first sentence. Maybe you could elaborate
on what you mean, but of course an incremental rewrite is a rewrite. All
rewrites are incremental; the proposal is to choose the order of increments of
a system rewrite such that it is fully working both before and after each
increment, rather than starting from scratch and waiting until all increments
are complete to find out whether the full system works and has feature parity.
I’d guess you already knew that, and just have a notion that the word
“rewrite” means it needs to be from scratch and all at once for some reason?

~~~
majikandy
I’m just trying to be absolute on the fact that incremental change of a single
running system is more efficient and shouldn’t be confused with ‘rewrite’
projects where a second system, that is not used by the business yet, is
developed while the business continue to use the ‘old’ one.

Calling incremental improvement a form of rewrite gives it a bad name :)

I’d argue that rewrites that are a second system that (hopefully) eventually
get deployed once the features exceed current system is not incremental at
all, but is instead Big Bang.

Even if the features of second system are built incrementally, if the business
aren’t using it until it is finished then it is Big Bang.

------
Maro
I've seen 2 non-trivial rewrites of the mainline software at 2 companies.
Non-trivial = 10-100M in yearly revenue.

In both cases:

- it was promised to take 6-9 months, and ended up taking 3-4 years

- it never really finished, the old software had to remain in production
(along with the new "rewrite")

- in some form or the other, while the rewrite was happening, the company
lost its customer focus and/or its ability to innovate

- good people left

Instead of rewrites, make incremental changes. The eng. team should never be
off doing its own thing.

Also, a good lesson from Facebook: instead of rewriting their PHP codebase,
they extended the language by creating the "PHP++" language Hack (along with
the HHVM runtime), and incrementally changed their codebase to take advantage
of Hack:

[http://bytepawn.com/hack-hhvm-second-system-effect.html](http://bytepawn.com/hack-hhvm-second-system-effect.html)

~~~
traek
> Also, a good lesson from Facebook: instead of rewriting their PHP codebase,
> they extended the language by creating the "PHP++" language Hack (along with
> the HHVM runtime), and incrementally changed their codebase to take
> advantage of Hack

I agree with most of your comment, but I don’t think this part generalizes.
This isn’t feasible for most companies unless you’re operating at Facebook
scale.

~~~
Maro
I agree it doesn't generalize, but I don't agree that it's because of "scale".
It doesn't generalize because not every company should write its own language
(and can't hire good enough engineers for this). And, fortunately, most
"legacy" codebases aren't in "bad" languages, so there's no need for this.

Afaik, the team that did Hack/HHVM at Facebook is ~5 people. I don't think you
need scale for this (the rewrite of the code itself is not a scale thing, the
codebase is usually linear-ish in the number of engineers).

My point is: instead of doing a rewrite, be inventive and avoid it, and this
is a great example.

~~~
TazeTSchnitzel
What Facebook did amounts to rewriting PHP itself though.

------
lmilcin
I specialize in fixing bad software projects, typically taking the role of
principal developer/architect. Over nearly 20 years of experience, every
rewrite I took part in failed: by not delivering the improvements, by
causing a lot of unplanned headaches for the business, or by wildly missing
all planned deadlines and cost estimates.

On the other hand, a couple of years back I started purposefully taking up
projects in bad shape and fixing them by cutting short discussions about
rewrites and instead focusing on putting work where it matters -- diagnosing
the problems and finding solutions to them.

I found that most of the teams I met stopped, or indeed never really had, any
practice of constantly diagnosing and improving their process and application.
This is usually why the application is in bad shape, but more often than not
the team will blame the organization, predecessors, or constraints like the
old technology they are working with. They channel their frustration into the
idea of the rewrite, which seems like a relief from the frustration of the
current codebase. Unfortunately, the rewrite is performed using the same
process that failed the previous version, with predictable results.

Think of a person who has a messy house. This person does not have the habit
of keeping things clean and in order. The person decides he/she will fix the
issue by building another house and burning down the old one with all their
belongings. The result is predictable.

The solution is to learn to keep things clean and in order instead of burning
the old house and re-building it at great cost and effort.

~~~
Chyzwar
I agree that in most cases it is the fault of the organization that problems
were swept under the rug for far too long. But there are cases where
requirements changed so much over the years that the architecture of the
system would no longer support the business without massive cost or overhead.

We have plenty of ideas on how to build new software: agile, TDD, DDD. It
seems that there is no proven process for doing rewrites.

~~~
lmilcin
I would not say in most cases organization is to blame, I would say in most
cases the blame is shared.

------
dkrikun
There is a lot of advice here to do incremental rewrites, refactoring,
unittesting, functional/system testing etc.

Undoubtedly it is all mature and correct advice. However, I feel it is all
one-sided and I ought to provide some counterpoints:

1. The codebase in question may be well beyond the line where any sane person
would touch it, seriously.

2. Individuals experienced with the codebase, its structure, implementation,
and technology stack might not be available (think COBOL).

3. Refactoring or incremental rewrite is a process that has to be planned and
managed subject to the current product/codebase structure. Often it is this
very structure which demands the full rewrite -- because it is too convoluted
to allow refactoring to take place.

4. You might want to do a rewrite to refresh the tech stack -- language,
design, frameworks -- and it is OK to do so.

5. You do not necessarily need to end up with the same feature set. Both you
and your customers might want to cut down unnecessary cruft. Technical debt
usually starts to show its signs by losing flexibility w.r.t. user requests to
change features.

6. Occasionally you might come up with a separate, different product, with a
different name and brand -- which might turn out to be even better!

7. You learn a lot in the process.

~~~
weliketocode
I think this is a pretty reasonable set of counter points, even if I don't
agree with them.

Being part of team incremental rewrite, myself, I'd say that you're
overestimating the cost of the incremental rewrite and underestimating the
cost of the rewrite.

For your first 2 points especially, I'd argue if you can't even begin to
incrementally rewrite a system, you're in no position to begin to plan a full
rewrite.

~~~
dkrikun
It is mainly a matter of decision making. In my case, no refactoring was ever
sanctioned by itself. Devs were only allowed to refactor on a small scale, and
always backed up by a feature request. So by the time the desire to somehow
"heal" the codebase is acted on, it is usually because the codebase is already
too "dead" to work with in the first place.

~~~
weliketocode
I'm not sure what you mean by your last sentence.

I'd say that if code is being used in production it's not yet "dead", and it
should be a much higher priority for team members to work with it.

Can you not add additional tests? Refactor out even a single feature at a time
into more modern tech?

------
flunhat
Relevant, from 2000: [https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/](https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/)

An interesting point: code is (_probably_) not as ugly as it seems -- it
just looks that way because you didn't write it. What looks like cruft is
actually the accumulation of years of bug fixes and edge case handling.

~~~
rightbyte
... or the accumulation of hacky extra features not in the original scope and
architecture? In the beginning it might have been fine code, but not enough
time has been allocated to do new features right.

"Rewrite" might be the wrong term; "refactoring" is a better one. If you
already have functioning code you don't need to rewrite it, but fix the mess.

If you have undocumented legacy code that's a mess, and it doesn't work with
quite different new requirements, it might be faster to just rewrite most
parts and cherry-pick the code that seems to do edge-case handling and opaque
interfacing with other black-box systems.

I aim at spending half my time refactoring and documenting, so in reality I'll
end up with at least some of that time. The average at companies I have been
at is probably around 0.1%.

~~~
ChrisSD
There's a very big difference between refactoring and rewriting, IMHO. As you
say, with rewriting you're throwing away functional code whereas with
refactoring you're shifting it around.

Both have their place but refactoring happens regularly whereas rewriting is a
more drastic option that should be used only when absolutely necessary. Having
a robust test suite is very important with rewrites to prevent regressions.
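One concrete form such a suite can take during a rewrite is a parity (characterization) test: feed identical inputs to the old and the new implementations and diff the results. A minimal Python sketch; the shipping-cost functions are hypothetical stand-ins, not anything from the thread:

```python
def old_shipping_cost(weight_kg):
    # Legacy behaviour, quirks and all: free shipping strictly above 20 kg.
    if weight_kg > 20:
        return 0.0
    return 4.99 + 1.5 * weight_kg

def new_shipping_cost(weight_kg):
    # Rewritten version that must reproduce the legacy behaviour exactly.
    return 0.0 if weight_kg > 20 else 4.99 + 1.5 * weight_kg

def parity_check(cases):
    """Return every input where the old and new implementations disagree."""
    return [w for w in cases if old_shipping_cost(w) != new_shipping_cost(w)]

# Probe ordinary values plus the boundaries where legacy quirks tend to hide.
mismatches = parity_check([0, 0.5, 1, 19.99, 20, 20.01, 100])
assert mismatches == [], mismatches
```

The value of the technique is in the input set: boundary values like 20 and 20.01 are exactly where a rewrite silently diverges from the "gnarly" original.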

------
harryf
To me the biggest issue with rewrites is that Engineering tends to work in a
vacuum without business input. What that tends to lead to is a “blind” 1-to-1
reimplementation of features without reflection on which features are _still_
bringing business value. E.g. if most of your user base transitioned to
mobile, is it really worth reimplementing all the things that were built with
users on IE on Windows in mind?

Or put another way, the place where you can save a ton of effort in a rewrite
and increase your chances of success is in throwing stuff away, but you
usually need more than just the engineering team to achieve that.

~~~
Pamar
Oddly enough, my experience is that when IT asks Business "can we finally drop
support for feature X? We are pretty sure that Roman numerals are not used
anymore for budget reports..." the answer is "I am not sure, maybe it will
become useful again in a year or two. Leave it alone".

~~~
slededit
You haven’t communicated the cost. To the business manager it’s essentially
free to leave it in - so why not keep the optionality.

If you could estimate “keeping feature X costs $200,000 per year in added
development time” it would be an easy decision. Through this process of cost
estimating you may find that while annoying it only costs $10,000. Any ill
will generated from removing it would cost more than that so it should be left
alone.

~~~
Pamar
This has never worked. The project has a maintenance budget that covers
everything (bug fixing, new features, migration to new versions of OS, DB
etc.). Business requires a certain number of enhancements, the development
team is already understaffed and therefore they are often missing the required
number of enhancements in a specific release cycle.

So the full discussion goes something like this: _"Can we remove support for
Roman Numerals? We are pretty sure nobody uses these anymore, and we estimate
that this would cost 20 man-days in this release but result in a saving of 800
man-days over the fiscal year..."_

_"Not sure, it could be useful again; use these 20 days to add support for the
Aztec calendar to the Insurance reports - you have already postponed this
twice!"_

(I.e. they get the impression that somehow there are 20 "extra" days available
and those have to be diverted to implement something that may already have
become obsolete.)

~~~
slededit
Never lead a sales pitch with the price tag, always with the benefit.

The conversation shouldn’t be about it costing 20 days to remove something. It
should be about the net savings from doing so.

“Removing rarely used feature XYZ will open up 780 man-days of schedule time
this year for new features by improving our efficiency”

~~~
Pamar
I am working on a legacy application which brings in 90% of the revenue of the
company (imagine something managing bookings for a hotel chain: of course we
also get revenue for what guests pay during the stay, and maybe for cups,
pens, and bathrobes with our logo, but the vast majority comes from the
bookings, of course).

As such, every different part of the business may request changes/enhancements
- our team provides these to lots of different "business departments" across
the whole company.

Again, imagine a hotel chain with hotels all over the world, each national
branch may require specific changes due to local laws and regulations, or
because they need to start a new incentive campaign or participate in a joint
venture with a flight company or whatever.

There is one application, and N different (competing) "customers" each one
considering only their own specific plans and priorities. (In case of
conflicts, the pecking order will be used to solve who gets more attention:
biggest hotels, or hotels in regions that bring more revenue have more
"clout").

Now, when I say "we have to postpone your request for X in order to recoup 780
more days later" the guy in front of me will immediately conclude that he will
not necessarily get a bigger share of these 780 days - he will have to fight
for his piece just as strenuously as before, and in any case this will happen
maybe in three months, and he needs his stuff _yesterday_ - so he better
insist to have his own specific request included in the release, no matter if
it costs 21 days, 20 days, 5 days: he wants this to be done because the rest
of his business needs it for a specific date, and everything else is just a
way for IT to postpone his request once again.

In other words: everybody wants their own specific request implemented as soon
as possible, and anything else has absolutely zero interest for them.
Especially if it is some promise of future "gains" from guys who are
constantly late.

In my experience, this is not so uncommon when you work on an app that has
been developed internally (and so there is no unified marketing department
that represents a single stakeholder). Note also that precisely because we
have to work for a myriad of different "internal customers", we tend to
accumulate "technical debt" at a faster rate: there is only one codebase, and
it has to accommodate all these pesky requirements from all over the world...

~~~
slededit
If you have multiple teams competing for features, it shouldn’t be too hard to
sell it that way.

“If we could open up 780 days on the schedule - what new features would you
like?” At this point you want the client dreaming of all the extras they never
thought they’d get to have.

Then at the end you mention the 20 day delay. At this point they will feel the
“loss” of the new features they just imagined. It’s an extremely powerful
sales technique that works almost everywhere.

Given your earlier comments about the extra 20 days they seem particularly
susceptible to this technique.

~~~
user5994461
If you announce that you have 4 man years free for new tasks, in a team that
may not even have 4 developers, after abandoning a feature that already
existed and worked, you're delusional and so is the stakeholder if he believes
it.

I agree on the principles nonetheless. You want clients to write a formal
request for new features, then development has a backlog of requests and can
prioritize them.

Clients might not like the prioritization but that's life, limited bandwidth,
it's simple to show there is too much to do and not enough resources.

~~~
slededit
I’m using the numbers I was given as I’m not in a place to judge if they are
realistic. Certainly if you make false promises no amount of sales techniques
will save you in the long run.

~~~
Pamar
The 200,000 vs 10,000 (and therefore the 20 vs. 780 man-days) come straight
from your example. Which is part of the problem: while I know for sure that
there are a lot of things that should be refactored/changed/improved/cut off,
it would be very difficult for me to estimate "how many resources we save over
the next year". As a simple example, I could revamp the GUI significantly (God
knows how much it needs it)... and then in the next six months 85% of the
requests for changes have to do with business logic, financial batches running
at the end of the month, and stuff like that, maybe because there are new
regulatory hurdles to clear like GDPR or the EU Travel Directive. Then the
promised extra resources that would have become "available" fail to
materialize, and the whole initiative is considered a failure. Good luck
trying this again the next year.

~~~
slededit
If you can’t quantify the benefits then how do you know if it’s worth doing?

Everyone else has to justify their programs with a cost/benefit analysis. I
don’t think software should be immune.

~~~
Pamar
The benefit depends on what part of the system I decide to optimize. If I
guess right (and enhance something impacted by many/big requirements _in the
future_) I will reap good results. If I enhance something which stays
basically untouched for the rest of the FY, I will be considered a fraud or
misguided.

Applications developed internally don't necessarily have a roadmap (or might
have it and abandon it 2 weeks into the new year because). Problem is, nobody
will consider you a hero for sticking to the (now obsolete) plan.

------
DanielBMarkham
I hate to criticize, but people need to know that this is a bad article. If I
had to summarize the errors into three sentences, they'd be "You don't know
how to write software so your software sucks. That means you won't know how to
do a rewrite either. Here's some math I made up."

The simplest way I can show the error? If making a new piece of software is so
expensive that it's far, far too expensive to do, how come folks enter the
market every day with new software which keeps replacing all of that stuff you
think is irreplaceable? And they not only replace your stuff with better
stuff, they do it at a fraction of the cost you take just keeping the lights
on in your shop.

Math is not going to save you now. Looking at how fast features can be
deployed is sticking your head in your ass. It's counting things for the
purpose of counting them. It doesn't work like that. You may enjoy counting,
graphing, and tracking points-per-sprint or feature-speed-per-team, but
nobody buys or uses software based on feature count or team speed. It's not
that these things aren't important. It's that you're confusing managing things
with value creation.

I wrote an essay earlier this week about code budgets which I think can help a
little by at least helping force conversations around the real issues
involved. [http://tiny-giant-books.com/1.html?EntryId=rec39SaDeZCZjauRo](http://tiny-giant-books.com/1.html?EntryId=rec39SaDeZCZjauRo)

But the larger issue here is that organizations don't know how to create and
harness value. They're really good at hiring, managing, and a few other
things. But those other things don't come directly into play here. It'd be
great if they did, but they don't.

You don't know how to create value. Step 1 is admitting that. Without that
admission, no amount of charting or graphing is going to help. And yes, you
can't rebuild your software. You're probably lucky you've got it working well
enough to provide value right now. I'd hang on to it.

~~~
ChicagoDave
I second this rebuttal.

Incremental changes by domain, decomposition, and careful management of the
work is not only effective, but also necessary.

~~~
DanielBMarkham
I was leading an effort a while back to look at replacing a group of 40-odd
systems, tied together by batches, to run a large worldwide retail operation.

After scoping out the work, my recommendation? Build a small app to handle
receiving. You do receiving at all of your locations, it's being done by
several different separate systems, and it's an opportunity to write a small,
cross-platform app that can be used by anybody with zero training right away.

It was shot down! Why? _Because large projects aren't done that way around
here._

That has nothing to do with anything, yet it prevented getting started
immediately.

As a hired-gun, I moved on to bigger and better things. The org dropped 100M+
on just the kind of rewrite this author is talking about before giving up in
failure. (Actually they changed the goalposts so that they won, then had a big
party. But there was very little done compared to the money they spent.)

It's the wrong mental model. It's painful to watch, like a kid with a big
hammer trying to make a large circular block fit inside a small square hole.
It's not going to be good even if somehow you make it happen. It's going to be
ugly as crap. You end up destroying the thing you're trying to help.

~~~
mannykannot
> The org dropped 100M+ on just the kind of rewrite this author is talking
> about before giving up in failure.

That is evidence that the author has a point. He also said, at the very end,
that this is a two-part article, and part 2 will deal with how to address
these problems. I would not be surprised if he advocates something like the
approach you suggested in this case.

------
ccleve
"Don't pave cowpaths" is relevant here.

If the old software is convoluted, it may be because the old business
processes are convoluted.

A software rewrite should be done in the context of a business process
redesign. Consider how information flows through the organization. Do you
really need to do all the things you're doing now? Should different people or
departments get different information? Should the product you're selling or
the service you're providing change? Should parts of the process be outsourced
or insourced?

If an IT department is considering a big rewrite, but doesn't have the
authority to look at the business as a whole, then the project is being
managed at too low a level. Rewrites make the most sense in the context of a
change in the underlying business.

------
MikeTaylor
"We'll do a rewrite, simplify everything, get rid of all the edge-cases."

Guess what? It turns out that all those edge-cases were there for a reason.
You needed them. And by the time you've added them back, or equivalents, two
things have happened. First, you've burned an order of magnitude more time
than you budgeted for the rewrite. And second, you have a codebase every bit
as gnarly as the one you started with.

First rule of rewrites: don't do it. Second rule of rewrites (for experts
only): don't do it yet.

~~~
FeepingCreature
Sometimes the reason has not been relevant for ten years, though, like when a
codebase is filled with horrible hacks to account for the fact that you only
have 64MB of RAM, while running in a VM with 4GB.

~~~
perlgeek
That is right, but you should be able to point at (preferably several) such
assumptions that went into the original system, and verify that they are no
longer necessary.

Maybe you used to have much less RAM in the past, but also much less data, or
not as many users etc.

------
hawski
A rewrite can also mean a gradual change. A great example of this is the
rewrite of 0install from Python in OCaml [0]. With such an approach you change
module by module, with old code and new code working side by side. It is of
course not so easy, but it makes it possible to do online. A bit like the
bits of Rust code that go into Firefox with each version.

I would like to know if anyone has participated in a similar gradual rewrite
and how it went.

It may be very hard for some architectures, and sometimes a wrong
architecture (a prototype became production, some features were dropped along
the way, a new feature is dog slow) may require a big rewrite, even breaking
APIs; then it would be impossible. But if one keeps the API, one can at least
do comparison tests as more features are implemented.

[0] [http://roscidus.com/blog/blog/2014/06/06/python-to-ocaml-retrospective/](http://roscidus.com/blog/blog/2014/06/06/python-to-ocaml-retrospective/)

------
sinuhe69
Rewriting from scratch is a very risky business, especially if it is conducted
under the technical view only. The only exception is when the rewrite team is
the same as the original development team, which is almost never the case,
except in the continuous delivery model. For me, the consequences are clear:

- The rewrite cost is so high that it never justifies doing architecture and
software engineering sloppily, even in an agile approach.

- Rewriting only critical pieces of the software is always the better
approach. In most cases, rewriting just a few pieces can satisfy the
requirement at least to 80%, be it better performance, higher stability, or
interfacing with other software.

- Therefore, for any big piece of software, modularisation is always helpful.

------
guitarbill
Sounds about right, but:

> As people move off of the team, features are forgotten and misunderstood;
> but as long as those features continue to work, there will be customers
> continuing to depend on them.

You have this in the non-rewrite scenario too; it's just part of the technical
debt, so it's less obvious to see. And it will bite you just as hard if you
need to update the feature.

So perhaps rewrites are just good at uncovering unknown unknowns, and it might
be better to say that the technical debt was underestimated.

------
vbezhenar
I'm approaching the rewrite of an old system next year (not really large; I'm
estimating the rewrite will take two years for a few developers, probably
100-300 KLoC in the end). It's really hard to extend that system. It runs on
Oracle 9 and is written in Delphi 7, with half of the code being autogenerated
PL/SQL procedures (from those visual tools for users, which the users don't
use anyway). It's distributed, because in 1998 there was no network access
everywhere, so there's a hierarchical export-import over CD all over the
country (of course everything is connected now, so those export-imports are
just mailed). The build process is another nightmare. The system consists of
several DLLs; every DLL is a separate project using its own set of components,
and you need a separate VM to develop and build each one. The sources for some
components are lost. The DB schema is insane.

The customer wants to migrate from Oracle (because it costs quite a fortune)
to free PostgreSQL, and he wants a browser-based UI. A lot has also changed in
those 20 years, and many functions are just not used anymore. I think a
rewrite is quite justified in this case (we'll rewrite in Java).

------
thebuzzsaw
My team recently completed a rewrite of a major system. We approached it with
the following goals:

\- Port from Java to C#. The Java teams no longer maintained it beyond
critical fixes. Many original experts were gone. My team had more vested
interest and expertise. We _wanted_ to be in charge of this system, and we
work in C#.

\- Switch from horizontal processing to vertical processing. The original
system would query one record, query one associated record, query yet another
associated record, etc. to form the complete picture of one entity. The new
system reads and writes batches of homogeneous data. (Kind of a leap from OOP
to DOD in terms of database accesses.) This optimization was largely necessary
as the old system could not keep up with load spikes.

\- Along with the previous change, we completely separated the data processing
into two giant phases: one read-only step and one write-only step. This allows
the read-only step to target replication databases. This reduces the stress on
the primary database, which is a huge win.

\- Detach the system so that it can be run, tested, and released in isolation.
The original system was part of a much larger whole, so any fixes or
enhancements had to wait for weekly or biweekly monolithic releases.

Everything you expect from a rewrite happened, but not to a devastating
degree. It went over budget, missed key bug fixes, etc., but we worked through
it and
came out the other side with all the wins we hoped for. Being able to deploy
multiple times within a single day is amazing. Our iteration time on new bugs
or features is very small. The overall architecture is simpler and easier to
navigate. I have confidence that a member of my team could enter the code base
cold and fix a bug within a day.

So I dunno. AMA?

~~~
paulddraper
> We wanted to be in charge of this system, and we work in C#.

What was the estimated cost of developers being literate in more than one
language? How significant was this in the decision?

~~~
thebuzzsaw
I don't think it had a visible cost in our case. For starters, my team
understood the system adequately to not need to reference the original source
code so closely. (Much of the work happened before even beginning to read the
original source code.) Secondly, we all had enough Java exposure to be
comfortable porting behavior over when it was needed.

------
foobiekr
Huge, ugly, broken legacy systems are a bear to work with, and I've mostly
spent my time working on them when not at startups. The most successful
project of my career was a facade-and-rewrite in which I played a significant
role, and which now has billions of dollars of annual revenue behind it, so I
think I understand this issue fairly well. I also played a significant role in
another similar project that, after I left, spiraled completely out of control
and became a disaster which, 8 years later, is still not finished.

The "this is dumb, we should rewrite it" reaction people tend to have when
exposed to very large, complex legacy software systems is totally
understandable, but really comes from a fundamental misunderstanding of the
forces that drive software development at organizations which have legacy code
bases. Yes, the re-write, if it gets "done", is basically guaranteed to be
"simpler and cleaner" when it arrives; of course it is! It is being compared
to a code base that's 20 years old, not just from an era where the
technological compromises were completely different, but also from essentially
a different world of what was acceptable at the time. In addition to that, all
software that lives long enough tends to become a mess. Let's see how the
rewrite is in 20 years.

Oh, but wait, this time will be different. This time we won't make quick fixes
and little hacks, we'll be diligent about requirements, we'll refactor and
clean up when change is needed ...

Everyone who has not yet done so should read "The Big Ball of Mud" (
[http://www.laputan.org/mud/](http://www.laputan.org/mud/) ), which is about
the clearest description I have seen of the reality of legacy systems.

------
kyberias
I think it's important to distinguish full rewrites and incremental rewrites.
The latter is possible if you have some tests in place.

------
mmsimanga
Coming from data warehousing: rewrites are normally instigated to get rid of
data silos, by putting everything in one place and modelling it the same way.
Very few of these projects are ever finished, and the irony is that each
project ends up being another data silo. It is not unusual to have three or so
data warehouses within an organisation, each just slightly different from the
next but with 80% of the same information. It turns out that last 20% of the
80/20 principle is very hard.

------
mannykannot
The author's formulae leave an interaction out: as the project is delayed, the
catch-up costs increase as a consequence. Furthermore, if the delay leads to
an unanticipated shifting of resources back to the old system to keep it
running, that can add to the delay. Catch-up cost has a dependency on
undiscovered scope.

It is also possible that some of the unknown scope becomes obsolete, but if
that is happening, then you are already in a long-drawn-out process.

------
revskill
If a rewrite introduces a stable, performant, secure, extensible architecture,
then it's worth it. Otherwise, just improve the system incrementally.

~~~
jacquesm
The two are not mutually exclusive. A rewrite can be done incrementally. In
fact, that's the best way to do it.
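
One common shape for an incremental rewrite is the facade approach foobiekr
mentions elsewhere in the thread (also known as the "strangler fig" pattern):
one front door routes each feature to the new implementation once it has been
migrated, and falls back to the legacy system otherwise. A minimal sketch,
with purely illustrative feature names and return values:

```java
import java.util.*;

// Sketch of an incremental rewrite behind a facade: route migrated features
// to the new implementation, everything else to the untouched legacy system.
class Facade {
    static final Set<String> migrated = new HashSet<>(Set.of("search"));

    static String handle(String feature, String request) {
        if (migrated.contains(feature)) {
            return "new:" + request; // rewritten module serves this feature
        }
        return "old:" + request;     // legacy system, left untouched
    }

    public static void main(String[] args) {
        System.out.println(handle("search", "q"));   // served by the rewrite
        System.out.println(handle("checkout", "q")); // still on the old system
        migrated.add("checkout");                    // migrate one more piece
        System.out.println(handle("checkout", "q")); // now on the rewrite
    }
}
```

The old system keeps running the whole time, and each migrated piece can be
rolled back independently by removing it from the routing set.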

~~~
revskill
Ah, what i mean is, for the latter part, do not rewrite from scratch, instead
change the old architecture incrementally.

------
rafiki6
As a general rule in my experience, a re-write will take at least as long as
writing the current software. There's a reason that much of our modern banking
infrastructure still lives in COBOL. But that's not a bad thing. It makes a
ton of sense. A cost/benefit analysis of a rewrite should be done beforehand
and taken into consideration; you'd be surprised how rarely that happens. It's
a false assumption that re-writes are cheaper or faster (you are basically
re-learning the entire specification of an application). But re-writes also
offer a ton of opportunities and benefits, and if those are important to your
business, then proceed with a re-write.

------
EugeneOZ
If code has reached the point where it can't evolve anymore, it should be
rewritten. Otherwise it becomes dead code, and there are not many fields where
dead code can survive without killing everything around it.

------
wolco
Rewrites mostly fail because business logic is assumed but never fully
understood at a detailed level. Multiply by the size of the rewrite and things
get out of hand quickly.

The best way to go through a rewrite or upgrade is to get the business
involved at the start and throughout the process. If you fail to do this, then
when something is missed, overlooked, or done differently, you will be at
fault. If you include them, a missed feature simply becomes one that wasn't
necessary.

------
pixelmonkey
This is a fascinating thread on HN because of how divisive the issue of
software rewrites is. I wonder, what do people think of rewrites that are
driven by emergent scale and business requirements (rather than technical
bitrot or code smells)?

I've been on a project where we had a working system, but it had some severe
technical platform & product value limitations, and we knew those limitations
were costing us real $$$, both in support burden and market share vs legacy
incumbents and competitors.

Plus, we had a "ticking time bomb" because it was a large-scale data system,
but the prototype (which became prod) was not designed up-front to handle
horizontal sharding, and we were at the limits of vertical scaling, and were
projected to hit the "max limit" of that vertical scaling within 12 months,
given current growth rate.

Thus, we began a rewrite, with full knowledge of how dangerous it was -- we
even circulated Brooks's essay on "the second system effect" and had several
team discussions about it during the specification stage of the rewrite.

In the end, the project was a success, and powered 3+ years of scaled growth
(the current live data storage of the system is 100x what it was when the
rewrite began, and our "hard limit" was around 2x). The rewrite also helped us
make the system more scalable, competitive, and mature, not by throwing away
edge cases, but by choosing an architecture that didn't cut corners on areas
that, we discovered during user feedback from the v1, were non-negotiable core
areas of value for our customer use cases. We had relaxed many requirements in
the "prototype stage" of the v1, merely in the interest of getting _something_
working out the door in front of customers.

The last piece of de-risking we did is to run _both systems in parallel with
our users for several months_. This allowed us to e.g. let 10% of our users
into the new system at first, ensure we weren't breaking any of their use
cases, then let 20% in, 50% in, and so on. We could also do user interviews
throughout. Since the new system involved not just a better data backend, but
also faster response times, a modernized UI, and many new features, lots of
people wanted in. We even had a waitlist, at one point.

Then, we cut the stragglers over, and cut the old system loose -- which felt
great, BTW! Running two production systems in parallel isn't easy, but was
absolutely the right thing to do.

With hindsight being 20/20, we feel firmly that our first system was a
"prototype that went to prod", and that we followed Brooks's advice to "plan
to throw one away, because you will anyway". And that we executed a
"successful rewrite". But it certainly wasn't easy.

I'm really proud of that project, but I also feel it was a bit of a harrowing
experience, especially near the end, when we were concerned some "showstopping
bugs" were going to keep the progress bar at 99% for a couple extra months.
But we made it through.

Perhaps the reason my outcome is better is because the _need_ to rewrite
wasn't driven by a framework or architecture du jour, but by real business
requirements and real scaling requirements. Even then, I think sometimes those
requirements can be overstated and the ability for an existing architecture to
cope can be understated. I feel confident we made the right call, but I think
it takes real expertise -- and healthy dose of skepticism -- to take on the
full rewrite risk with eyes wide open.

(p.s. now, 3 years later, the same team is being _forced_ to rewrite a
significant portion of the backend, not for any business requirement or scale
reason, but because of bitrot of a stable open source database engine version
which needs to be upgraded to avoid EOL, and wherein the new version
introduces backwards-incompatible breaking changes to the API and schemas. At
least in this case, it's "only" a backend migration, and not a total rewrite.
But, I'll tell you that it sucks to realize this is _just_ required
maintenance, thus a pure development cost with little customer benefit, rather
than a project to introduce a step-level change to the product and business.
C'est la vie!)

------
hnruss
My team has been rewriting a WinForms application in JS. We just reached
feature parity after ~4 years. In order to keep delivering new features
without having to catch up with the old application, we released the new and
old applications together. This strategy worked well for us, as it gave
customers the ability to fall back to the old application and file requests
for features missing from the new one.

------
networkimprov
Odd that increasing the Pace of the rebuild team from 1.5 to 2.0 has no impact
on the catch-up factor.

The faster they work, the more over-engineered the results? :-)

------
al2o3cr
IME, the chief roadblock to a successful rewrite is the organization: an
organization that produced poorly-factored, unmaintainable code that NEEDS a
rewrite is not well-equipped to do something different the second time.

------
timwaagh
i think the point to start rewriting is when nobody still on the team can
maintain the product. i recently got pretty close. i had been researching
something for months (speeding up a login), but to no avail. so to avoid
looking even more incompetent i said, you know what, an SSO app should be
relatively simple to create. just slap some strings together and you're done.
it would be quicker than banging my head against the wall for another three
months. but somebody on another team still had knowledge and this ultimately
gave us the idea to fix it.

------
mac01021
> What’s a better way? I’ll propose an alternative to you in my next blog
> post!

After a cliffhanger like that, it better be good. :)

~~~
olooney
It's pretty obvious what part II will be, considering Part I is basically
arguing that a full, ground-up, greenfield, throw-away-everything-and-start-
from-a-blinking-cursor rewrite is an expensive mistake.

Half the comments on this thread have already alluded to it: a careful, piece-
by-piece replacement, delivering the new system incrementally (running in
parallel with the old system if need be). Or even a gradual refactoring of the
original code base if the base technology wasn't the problem. Read the success
stories in this thread - you'll see they tend to follow this pattern.

------
est
The cost of a rewrite is 1x, unless you have to maintain the old one and
replace it in parallel.

------
BenoitEssiambre
I think the "undiscovered scope" part is where most people trip.

You can think of software and how it fits with its users purposes as having
multiple layers of features, interactions and details.

The top few features usually each have a few sub-features that you have to
tailor correctly to your users' workflows; each of these sub-features often
has sub-details, interactions with other features, corner cases, data formats,
or cross-compatibility considerations.

It's a fundamentally mathematical issue. Adding a layer of detail is an
exponential proposition, and the exponential function is explosive in nature.
If your software has only 3 major features and each has 3 sub-features, that
is 9 things to consider. If each sub-feature has 3 corner cases you need to
get right, that is 27 things. And if each of those 27 things has 3 details,
that gets you to 81 considerations. The next level is 243. Each of the 243
points tends to take a similar amount of time to plan out and build, whether
it's at the top or the bottom of the pyramid.
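
The arithmetic above is just repeated multiplication by the branching factor:

```java
// With a branching factor of 3, each added layer of detail multiplies the
// number of things to get right by 3: 3, 9, 27, 81, 243, ...
class FeatureLayers {
    static long itemsAtDepth(int branching, int depth) {
        long n = 1;
        for (int i = 0; i < depth; i++) n *= branching;
        return n;
    }

    public static void main(String[] args) {
        for (int depth = 1; depth <= 5; depth++) {
            System.out.println("layer " + depth + ": " + itemsAtDepth(3, depth));
        }
    }
}
```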

As a piece of software evolves over the years, the intricate details at the
bottom get sculpted out. The software's fit to its purposes can become very
fine.

The thing is, people tend to think of software in terms of its top 2-3 layers
of details, its 27 most important features. The finer complexity is often not
as visible and it's just difficult for humans to keep so many items in mind.

This is true of any complex technology. People tend to think of cars as
machines with properties concerning speed, direction, suspension, and braking,
but rarely think of the complex chemistry of the fuel aeration
and combustion process, the complex internal forces and velocities of parts in
the transmission, the carefully tweaked metallurgic alloy work that has gone
in each of the thousands of metal parts, the carefully chosen properties of
the plastics and the rubbers, the thermal properties of everything, the
hundreds of electronic systems, the analog circuits, the thousands of
specification lines of many dozens of communication protocols for controlling
all these parts, the entertainment system, acoustic properties etc, etc, etc.
All these things took decades to evolve and refine.

It's easy, when planning a rewrite, to plan out the few dozen items visible at
the top of the iceberg and underestimate the amount of important detail hiding
below.

Sometimes some of the details might not be important, but often they are the
essence of your business case and at the core of the favourable economics of
your product. It's the years of knowledge accumulated in the subtle details
that provide you with a moat and make it difficult for your product to be
copied.

If you rebuild with mostly the top few dozen features in mind, and only vague
ideas about everything else, you are likely to end up with a commodity
solution, one that your competitors will have a much easier time copying than
your older, finely tailored solution.

------
neocodesoftware
anyone know of any articles with references or real data?

------
Illniyar
There are very few axioms of software development that are applicable
everywhere. "Never rewrite" is one of the few that is valid in all situations.

~~~
petepete
"except when the original version used MongoDB as a relational store and a
home-grown web framework that was a poor man's Django/Rails (except slower,
undocumented and incredibly resource hungry) where moving bit by bit to
PostgreSQL wasn't an option, then totally rewrite it properly"

~~~
adrianhel
Better yet: Stick with Mongo, create regression tests, switch out or simplify
the framework and add integration testing.

It is often viable to take the meat of a system and place it in a new
framework with some alterations.

As for the performance issue, just identify what causes it by timing parts of
your code. I'm guessing it's the home-grown database wrapper.
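
That "time parts of your code" suggestion can be as simple as wrapping suspect
sections and accumulating elapsed time per section, instead of guessing where
the slowdown lives. A minimal sketch (the section name below is hypothetical):

```java
import java.util.*;
import java.util.function.Supplier;

// Accumulates wall-clock time per named section of code, so you can see
// which part (e.g. a database wrapper) is actually slow.
class SectionTimer {
    static final Map<String, Long> elapsedNanos = new LinkedHashMap<>();

    static <T> T timed(String section, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            elapsedNanos.merge(section, System.nanoTime() - start, Long::sum);
        }
    }

    public static void main(String[] args) {
        int sum = timed("db-wrapper", () -> {
            int s = 0;
            for (int i = 0; i < 1_000_000; i++) s += i;
            return s;
        });
        System.out.println("db-wrapper: " + elapsedNanos.get("db-wrapper") + " ns");
    }
}
```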

