
Red Flags Signaling That a Rebuild Will Fail - pkcsecurity
http://www.pkc.io/blog/five-red-flags-signaling-your-rebuild-will-fail/
======
nostrademons
#5 has a converse - oftentimes, the only way to get a rebuild to _succeed_ is
to drop features, and it's a major red flag if management insists on 100%
feature parity.

The way to distinguish this from the #5 situation in the article is to ask if
you're dropping features _because they're hard_ or _because nobody uses
them_. The former is a red flag; the latter is a green flag. Before you embark
on a rebuild, you should have solid data (ideally backed up by logs) about
which features your users are using, which ones they care about, which ones
are "nice to haves", which ones were very necessary to get to the stage you're
at now but have lost their importance in the current business environment, and
which ones were outright mistakes. And you should be able to identify at least
half a dozen features in the last 3 categories that you can commit to cutting.
Otherwise it's likely that the rewrite will contain all the complexity of the
original system, but without the institutional knowledge built up on how to
manage that complexity.
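
A rough sketch of what "solid data backed up by logs" could look like in practice. The log format (`<date> <user> <feature>`), the feature names, and the `KNOWN_FEATURES` list are all made up for illustration; a real system would read its own access or event logs.

```python
from collections import Counter

# Hypothetical log format: "<ISO date> <user id> <feature name>", one event per line.
SAMPLE_LOG = """\
2018-05-01 alice export_csv
2018-05-01 bob dashboard
2018-05-02 alice dashboard
2018-05-03 carol dashboard
2018-05-03 bob export_csv
"""

# Assumed list of features the product ships (illustrative names only).
KNOWN_FEATURES = ["dashboard", "export_csv", "legacy_report", "view_history"]


def usage_by_feature(log_text):
    """Return (event counts, distinct-user counts) per feature."""
    events = Counter()
    users = {}
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed lines rather than guess at their meaning
        _date, user, feature = parts
        events[feature] += 1
        users.setdefault(feature, set()).add(user)
    return events, {name: len(seen) for name, seen in users.items()}


def unused_features(known, events):
    """Features with zero logged events: candidates to review, not to cut blindly."""
    return [name for name in known if events[name] == 0]


events, distinct_users = usage_by_feature(SAMPLE_LOG)
candidates = unused_features(KNOWN_FEATURES, events)
print(events.most_common())
print(candidates)
```

Counts like these only answer "which features are used"; which ones users care about, and which were mistakes, still needs a human judgment on top of the numbers.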

~~~
lordofmoria
> Before you embark on a rebuild, you should have solid data (ideally backed
> up by logs) about which features your users are using, which ones they care
> about, which ones are "nice to haves", which ones were very necessary to get
> to the stage you're at now but have lost their importance in the current
> business environment, and which ones were outright mistakes.

This is so important. I've been on many a project where, 3 months in, we wish
we had historical tracking data on user activity to back up our instincts to
cut a particular feature that seems worthless. The worst part? Even if you add
it immediately, you'll have to wait 2-4 weeks to get a sufficient amount of
data.

~~~
manicdee
Also important to realise that a feature that is rarely used (view history,
remove user) might be more important than one used more often (dashboard
widget that nobody pays attention to)

~~~
Cthulhu_
Yup; statistics are only part of the picture and value of a story. Compliance
is another one, for example; sure, few people will use the 'download all my
data' and 'delete my account' options, but they're mandatory for GDPR
compliance and not offering them may cause a huge fine. There are a lot of
these compliance features.

------
maxxxxx
"Red Flag #4: You aren’t working with people who were experts in the old
system.”

I think this is most important. A lot of people want to rewrite because they
don't understand the current system and don't want to bother learning. Before
you rewrite you really should understand the current state deeply.

~~~
pbreit
This strikes me as dangerous. Didn't the experts build the first system? Don't
you want to deliver a fresher system? Won't the experts be attached to the old
way of doing things?

~~~
zbentley
> Didn't the experts build the first system? Don't you want to deliver a
> fresher system? Won't the experts be attached to the old way of doing
> things?

With all respect, that means you should not be in a position to rewrite legacy
code, or to commit others to such a rewrite.

If all the experts you have worked with have been, in your eyes, overly
attached to the old way of doing things, you have one of two issues:

\- You have not had enough experience in the field, and have not worked with
experts that _actually have perspective_ about when/how to rewrite, abandon,
or rework their code.

\- You have dogmatically condemned people who think that the latest-and-
greatest tech may not be a good solution to the problems at hand to the "old
fogey" bin.

Either issue means you're not ready to make decisions at this level. Learn
more. Research more. Watch more. Listen more.

Weirdly, gaining this perspective has less to do (in my experience) with years
on the job, and more with diversity of team/business environments worked in.

------
tomelders
I’ve carved a career out of rebuilds. I’m working on a rebuild right now.
There’s a ton of companies out there who’ve done very well with their home
grown antiquated systems from the late 90’s and early 00’s that are now facing
stiff competition from young upstarts who had feature parity from day one and
are knocking out new features at breakneck pace because they’re leveraging
the latest and greatest in tools, technology, and thinking.

I’ve always been a big believer in rebuilding your product from the ground up.
I think it’s something you should always have going on in the background. Just
a couple of devs whose job it is to try and rebuild your thing from scratch.
Maybe you’ll never use the new version. But I think it’s a great way to better
understand your product and make sure there’s no dark corners that no one dare
touch because they don’t understand what it does, how it does it, or why it
does it the way it does.

And I’ve always believed that if you don’t want to rebuild your app from
scratch, then don’t worry, a competitor will do it for you.

So I agree with every point raised in this article. And I think it does a
great job of articulating the issues that often go unspoken. But I’d like to
add one more. And for me, this is the biggest issue for any company wanting to
rebuild its product.

If your sales team has more clout than your designers and developers, then
you’re fucked. And in the enterprise software world, this is the norm. An
unchecked sales team that gets whatever it wants has already killed your
product and made it impossible to rebuild. Their demands are ad-hoc,
nonsensical, and always urgent. So urgent that proper testing and
documentation are not valid reasons to prevent a release. Their demands are
driven by their sales targets, and the promises they make to clients are born
out of ignorance of what your product does, and how it does it.

This is not true of all companies. Many companies find a reasonable balance
between the insatiable demands of a sales force and the weary cautiousness of
their engineers. But if your company submits to every wish and whim of your
sales team, and you attempt to rebuild your product, then you’re screwed.

~~~
flukus
> I’ve carved a career out of rebuilds

What's your learning process? If you don't do maintenance how do you know your
rebuilds aren't creating the same problems that lead to the systems needing
replacement?

I've got a very well founded distrust of people that only work on green field
projects, they're generally responsible for the systems that need rebuilding.

~~~
omeid2
I have also come to believe that people who jump to rebuilds also tend to have
very shallow technical skills and are not keen or capable of studying and
analyzing a system at depth.

~~~
tomelders
Complete nonsense.

------
Ensorceled
Red Flag #6: Key stake holders keep moving the goal posts.

If your goal moves from feature comparable but on a modern platform, to new
features, to a complete reinventing of the product all without actually
shipping ... you might be in trouble.

I had a rebuild go 6 months over. In the heated executive meeting at t+3
months I was called to defend my team and pointed out that the VP Product had
just delivered “final” specs literally the day before. How could we be on
track with development if PM is 3 months past “end of development” with design
specifications? The fact that the specs were changing weekly because “we’re
agile” is a whole other issue.

~~~
Cthulhu_
> The fact that the specs were changing weekly because “we’re agile” is a
> whole other issue.

The article touches on that too; simplified it's stating that if you're not
live within 6 months, you're doing waterfall.

~~~
Ensorceled
That’s not waterfall. Waterfall you don’t _start_ dev before specs are final.

Waterfall isn’t just a synonym for “the wrong way to do it” :-)

------
alkonaut
The truth I think is more often that the legacy system is too old and brittle
to improve, and customers are demanding ever more complicated features from
it.

So you rebuild as a new system as a gamble, because even though it shows all
the traits described, the new system is at least one that anyone is willing to
develop, and one where features can be added, and to which people can be
recruited.

We know big rebuilds have small chances of success. But that doesn’t mean you
shouldn’t do big rewrites. You are in a bad place if you even consider one.
Maybe the big rewrite means the company has an 80% risk of going under. It
could still be the safe bet.

------
bkovacev
This is yet another article where there's a clear managerial-only approach.
Sorry, but I don't dig this.

As a developer you're constantly fighting managers who want to rush things to
get them out and who will eventually blame you for a bug/non-defined behavior
once you hit a certain milestone.

To me it seems the author of the article doesn't understand the tech debt. If
you've ever worked in a startup you'd know that the requirements are ever-
changing, thus that if a certain payment system is put in place, it might
evolve to the point where you really need to refactor it and in order to
enable the refactor you have to refactor the whole business flow as well. If
there's more than 2-3 features affected by a new feature, a big refactor is
definitely needed.

Only one solution offered, which I don't think is adequate because why would I
leave something in that was only meant to provide value for short term and
then build on top of it till I kill the old system?

~~~
LolNoGenerics
His argument is against rewriting a whole codebase. Refactoring is surely an
alternative.

------
gaius
Missing the biggest red flag of all, engineers wanting to just play with new
toys and pad their CVs. Ask the engineers why they want to rebuild and listen
carefully to the answer and if it’s vague handwaving and buzzwords
(microservices! Containers! New JS framework!) and no hard numbers to justify
it, just say no.

For example “we spend X/year on AWS but if we spend Y to rewrite in C++ we
need fewer VMs and can cut that to Z/year” is simple calculations. If your
engineers can’t even do that, their motives are suspect.
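
The "simple calculation" can be sketched as a payback period. The figures below are invented; X, Y and Z follow the naming in the example above, and the model deliberately ignores risk, schedule slip, and ongoing maintenance cost.

```python
def payback_years(current_annual_cost, rewrite_cost, new_annual_cost):
    """Years until a one-off rewrite cost is recovered by lower running costs.

    Returns None when the rewrite does not reduce the annual cost at all.
    """
    annual_saving = current_annual_cost - new_annual_cost
    if annual_saving <= 0:
        return None
    return rewrite_cost / annual_saving


# Made-up numbers, purely for illustration.
x = 500_000  # X: current hosting spend per year
y = 600_000  # Y: one-off cost of the rewrite
z = 200_000  # Z: expected hosting spend per year afterwards

print(payback_years(x, y, z))
```

If the engineers' answer comes back as "never" or "ten years", that is the hard number to argue about, rather than the buzzwords.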

~~~
ebiester
On the other hand, “we cannot hire anyone to work in COBOL/Perl 5.8/Tcl/other
outdated language” is a very real problem. It turns out that in 2018,
developers are judged for working too long in old technologies even when we
know as an industry that a developer can learn a new language.

~~~
gaius
I wonder if that’s really true. I bet loads of people would be delighted for
the chance to go on using their old favourites.

~~~
wffurr
It's absolutely true. People with enough experience to have "old favorites"
tend to be very senior and expensive or retired.

New grads and junior engineers can end up trapped in a career dead end if
their first job is on seriously old legacy tech.

[https://medium.com/@csixty4/pick-was-post-relational-
before-...](https://medium.com/@csixty4/pick-was-post-relational-before-it-
was-cool-ed610c9d0f14)

I almost fell in the same trap, but quit a similar job to go back to grad
school and get my Master's in CS.

------
pspeter3
I think people also deeply underestimate the time it will take. We've
undergone an incremental rewrite for ~4 years at Asana.

~~~
jupake
Used your software once before. Loved it! You guys should do a blog post about
your rewrite experience. Would love to know what your tech stack was and what
your new one looks like.

~~~
toshaga
This could be relevant: [https://blog.asana.com/2017/08/performance-asana-app-
rewrite...](https://blog.asana.com/2017/08/performance-asana-app-rewrite/)

~~~
pspeter3
That is definitely the best out there. I'm hoping to write another about what
our current stack is.

------
solox3
With this good article I think I have a good question.

The reference to Martin Fowler’s strangler pattern
([https://www.martinfowler.com/bliki/StranglerApplication.html](https://www.martinfowler.com/bliki/StranglerApplication.html))
was mentioned in the article to grow the new system in the same codebase until
the old system is strangled. In my case (Ionic 1 to 2) however, both the
entire framework and the language are different. How should the strangler
pattern work in this case?

~~~
twunde
For webapps you would use a reverse proxy such as nginx or haproxy and replace
your application page by page. Then configure the reverse proxy to send all
requests for /home to the new stack and all other requests to the old
stack. Then flip the switch for every page you finish converting. For backend
work, it's similar. You can have an api built in a new stack and it can just
have a different endpoint or use a reverse proxy. Backend workers can pick up
work from a different queue or you can switch the old job worker off and turn
on the new one, and then monitor that everything is working as planned. The
really important thing about the strangler pattern is that you need some easy
way to turn on bits of functionality while turning off the corresponding old
parts. It can be feature flags, it can be routing middleware. You can rip out
the guts of the angular routing mechanism and use that to flip the switch.
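
A minimal sketch of that "flip the switch" routing decision, written as plain Python rather than real nginx or haproxy configuration. The upstream addresses and the `pick_upstream` / `flip` helpers are hypothetical names for illustration; in practice the same table would live in the proxy config or a feature-flag service.

```python
# Hypothetical upstream addresses for the two stacks.
OLD_STACK = "http://old-app.internal:8080"
NEW_STACK = "http://new-app.internal:9090"

# Path prefixes that have already been converted to the new stack.
migrated_prefixes = {"/home"}


def _matches(path, prefix):
    """True if path is the prefix itself or lies underneath it."""
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def pick_upstream(path):
    """Send migrated paths to the new stack and everything else to the old one."""
    if any(_matches(path, prefix) for prefix in migrated_prefixes):
        return NEW_STACK
    return OLD_STACK


def flip(prefix, to_new=True):
    """Turn a page on in the new stack, or roll it back to the old one."""
    if to_new:
        migrated_prefixes.add(prefix)
    else:
        migrated_prefixes.discard(prefix)


print(pick_upstream("/home"))     # already migrated
print(pick_upstream("/reports"))  # still served by the old stack
flip("/reports")                  # finished converting this page
print(pick_upstream("/reports/2018"))
```

The point of keeping the switch this small is that rolling back is the same one-line operation as rolling forward.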

~~~
wink
Seconded. Took part in a moderately big rewrite with this strategy and it
worked pretty well.

Identify key components and subsystems and rewrite them one by one. From the
outside you seem to be switching over one REST endpoint after the other, but
of course internally it's a bit more difficult, but applications often enough
have enough parts that are not SO intertwined that you can do stuff like this.
It's a bit related to how you break up a monolith. Find bigger, less coupled
parts and shave them off and just touch the glue code.

------
lyqwyd
This article really captures the risks of a rebuild. I’ve been through a
number of them, all but 1 abject failures. The one success was driven by the
executive understanding that the company would fail without a rebuild, and it
was still 6 months late, resulted in one of the cofounders being fired, an
extremely painful rollout, and the company still failed, due to other
problems.

My firm belief is that when you need a rebuild, you are already well into a
fail state as a company. Not to say there can be no recovery, but it is an
indication of some deep problems for the company, beyond anything the
engineering department alone can resolve... and if the rebuild is not coming
from the executive leadership, it is an even bigger issue as it will more
likely lead to bigger problems than it will solve.

------
ellimilial
This is gold.

I've become a member of a team the company scrambled to deal with a `legacy`
python/SQL - based ingestion/storage system in an effort to 'harden' it.
Despite my best efforts, we are going for a full rewrite into
java/spring/avro/mongo/es. We have internal users talking SQL and utilising
the system at the moment, a fair amount of data is relational.

I have run out of ideas how to convince the team and stakeholders, will have a
one-shot chance to talk to VP. Any ideas how to voice the concerns about the
full re-design (perhaps I'm just being difficult)?

~~~
sonnyblarney
1\. Given the risk, cost and limited upside, the onus is on the refactor team
to prove that it needs to be done. Where is the ROI, factor in the risk. Where
is this in the stack of things to do? Are there better ROI things?

2\. Consider 'what the point' is in the first place, because the entire world
could be run on python/SQL and it would be 'hard'. I don't think anyone would
consider 'Mongo' to be 'hard'; usually people use it because it's fast and
easy, not hard. Consider maybe only replacing one part at a time, i.e. Java-
SQL.

3\. Consider a simple clean up or refactor. No need to learn new languages and
tools when maybe you just need a house clean.

4\. People seem to be going back to SQL because of its inherent
standardization - so many reporting and analysis systems use SQL as an
interface, to the point where even NoSQLs are starting to use SQL.

~~~
pedalpete
I'm a big supporter of "replacing one part at a time", and wish I had done
that on a rebuild I'm just completing.

In fact, I thought I was. We split our app into 3 parts, rebuilt part 1, then
part 2, but part 1 couldn't be released to customers until part 2 was done,
and we kept our legacy system supporting the majority of our users until we
are done with part 3, which is nearing completion now.

I thought that was "replacing one piece at a time", but it isn't: most users
aren't touching it until part 3 is done, and at that point, they are
experiencing a new system from scratch.

------
rwmj
Is "rebuild" new jargon for "rewrite", or does it mean something different? I
thought the article was going to be about builds failing.

~~~
ConceptJunkie
Yeah, I did too until I started reading the article.

Using the normal sense of "rebuild" didn't make sense.

------
wellpast
Red Flag #1 should be that you’re doing a rebuild.

~~~
CydeWeys
I'm potentially looking at a situation like this right now at work. We're on a
NoSQL DB and it's just not working too well for us anymore, so we would like
to transition to something that provides more relational semantics (PostGres,
Spanner, something like that). Migrating the backend between one kind of DB
and another is non-trivial, especially because the whole ORM needs to be
ripped out as well. It's not a full rebuild of the application but it's
definitely substantial in effort level.

Sometimes a rebuild is just necessary, because you are on a tech stack that is
no longer working for you, for whatever reason. How would you solve that kind
of problem?

~~~
grey-area
I'd definitely vote for PostgreSQL, it can handle large loads effortlessly,
it's reliable, and yet they keep adding great features.

It could also function pretty much like a nosql db initially, to ease your
transition, then you could migrate gradually to using it as a relational db.
You need strong checks on data integrity before you start - you could consider
double writing (to old orm using nosql + new orm using psql), and comparing
data stored to be sure you don't miss anything at first, before you switch?
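
Roughly what the double-writing idea could look like. `DictStore` is an in-memory stand-in with a made-up `put`/`get` interface, not the API of any real NoSQL client, ORM, or PostgreSQL driver; the old store stays the source of truth while the new one is only compared against.

```python
class DictStore:
    """In-memory stand-in for a datastore client (hypothetical interface)."""

    def __init__(self):
        self._rows = {}

    def put(self, key, record):
        self._rows[key] = dict(record)

    def get(self, key):
        return self._rows.get(key)


class DualWriter:
    """Write to both stores, read from the old one, and record any differences."""

    def __init__(self, old_store, new_store):
        self.old = old_store
        self.new = new_store
        self.mismatches = []

    def put(self, key, record):
        self.old.put(key, record)  # the old store remains the source of truth
        try:
            self.new.put(key, record)
        except Exception as exc:
            # A failing new store must not break production writes; log it instead.
            self.mismatches.append((key, f"new write failed: {exc}"))

    def get(self, key):
        old_value = self.old.get(key)
        if self.new.get(key) != old_value:
            self.mismatches.append((key, "read mismatch"))
        return old_value


old, new = DictStore(), DictStore()
writer = DualWriter(old, new)
writer.put("user:1", {"name": "Ada"})
writer.get("user:1")
print(writer.mismatches)
```

An empty `mismatches` list over a long enough period is the evidence you would want before switching reads over to the new database.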

------
jpeeler
Firefox seems to be doing pretty well with their incremental rewrite into
rust. I do wonder how long it will take to complete the transition versus
doing a complete rewrite instead.

------
nerdponx
Another red flag not mentioned here: the old system doesn't have an end-to-end
suite of functional test cases you can rely on.

------
lgleason
I recently left a project that demonstrated most of these traits. Usually
these things are the top of the ice-burg.

~~~
teddyh
Know your burgs and bergs. A “burg” (or burgh) is a fortification, or more
usually refers to a city built around (or inside) that fortification. A “berg”
is a mountain, or a large hill. Therefore, an iceberg is an “ice mountain”,
and a “burgermeister” is a “city master”; i.e. a mayor.

~~~
de_watcher
You forgot to mention that you should use "tip" instead of "top" in this
idiom.

Here is a video with more detail:
[https://www.youtube.com/watch?v=dQw4w9WgXcQ](https://www.youtube.com/watch?v=dQw4w9WgXcQ)

------
kazishariar
¯\\_(ツ)_/¯:'Dual commits' to the rescue! -pun intended

------
Chyzwar
The rewrite usually comes when it is too late for the project. The need for a
rewrite means that project maintenance was ignored and technical debt reached
critical levels.

I would start by firing people that led to this situation.

~~~
maxxxxx
" I would start by firing people that led to this situation."

You are one of those blessed people who can architect a system and the
architecture holds up for decades. From my experience most systems will end up
in a big mess over time if features get added. There is almost no way around
it.

~~~
flukus
> You are one of those blessed people who can architect a system and the
> architecture holds up for decades.

This is exactly why maintenance is needed. Proper maintenance that includes
things like updating the architecture and gradually migrating the whole system
to that architecture, rebuilding small unwieldy components, updating and
migrating database schemas as the product evolves, removing unused features.

If a product is just getting bugs patched and nothing else then it isn't
really being maintained, it's being deprecated. Unfortunately as an industry
we still think that there are distinct build and maintenance phases and that
the latter can be done with fewer resources.

