
The Birth of Legacy Software – How Change Aversion Feeds on Itself - whack
https://software.rajivprab.com/2019/11/25/the-birth-of-legacy-software-how-change-aversion-feeds-on-itself/
======
Bootwizard
I quit my last job because my team was in this doom spiral. And management's
solution to the problem? Work harder.

As far as I know, everyone else still works there. I bet some of them sleep at
their desks trying to keep adding more knots to that awful legacy system.

And it didn't get that way WHILE I was there. That project had been around
since 1998. They were so afraid of change that they were still using a C++98
compiler. So I didn't even leave that job with relevant (read: modern) C++
knowledge.

Funny thing: The server architect at my current job was one of the founding
members on that project. He had the same kind of view towards it. I feel like
that project has been in a doom spiral from the beginning. So 21 years at this
point.

It blows my mind that there are these legacy-code sweatshops out there barely
holding software together like this. What a miserable existence.

~~~
romwell
>And it didn't get that way WHILE I was there. That project had been around
since 1998. They were so afraid of change that they were still using a C++98
compiler. So I didn't even leave that job with relevant (read: modern) C++
knowledge.

This happened to me too!

I understand the reasons for that: it was highly specialized (scientific)
software used in computer chip manufacturing. Most of the people on the team
had a PhD in Engineering or Science, but few had extensive Computer Science
training before joining the company.

The hardware industry moves slowly, because you don't just upgrade a fab.
Some of the clients were on RH5, so that was our build target.

Part of the codebase was in Fortran (and for good reasons too; the language
is common in the HPC/scientific scene).

All this resulted in a codebase that was part brilliant, part byzantine,
poorly documented, and not even compiling with a modern C++ compiler (hence
reluctance to start using C++11 or newer).

Thankfully, the automated test suite was holding it all together; but that
was about it. As people left, they took systems knowledge with them that was
written down nowhere, and nobody wanted to do a deep dive and document the
still-known parts.

Predictably, the priority has shifted to "quality" - that is, fixing bugs
instead of innovating on the core functionality.

That was not the only reason why I left for better opportunities, but it was
a big factor. It wasn't the legacy system that was scary - no scarier than
a hairy prototype that gets the job done, really - it's that nobody was going
to put in the effort and take the risk to start moving past the decades-old
"prototype" stage.

I believe that things finally started moving when the people there realized
that there is no way but forward; I hope they are using modern C++ now. But
that train was set in motion after I left.

~~~
BlueTemplar
Isn't "modern C++" pretty much an oxymoron, one that pretty much fits the
above definition of "legacy software"?

~~~
romwell
It's.. not?

The definition of "legacy software" in this discussion is: software that grew
so many "temporary" fixes and workarounds instead of necessary architectural
changes that it's in permanent maintenance mode, and entire parts of it are
untouchable because nobody understands what they do due to the exponential
increase in complexity stemming from the abundance of these hacks, exacerbated
by system knowledge evaporating with engineer turnover.

That doesn't apply to the C++ language and compilers (rapidly evolving over
the past decade), nor, generally, to projects written in it (that's not a
property of the language).

On that note, we've integrated some Fortran code written in the 80s that I
wouldn't call "legacy" under this definition: the algorithm was clear, the
implementation was documented well enough that modifying it, if necessary,
would not have been hard, and using it with our flow was very simple (it was
one of the flavors of gradient descent that converged much better than
several others).

~~~
BlueTemplar
> software that grew so many "temporary" fixes and workarounds instead of
> necessary architectural changes that it's in permanent maintenance mode, and
> entire parts of it are untouchable because nobody understands what they do
> due to the exponential increase in complexity stemming from the abundance of
> these hacks, exacerbated by system knowledge evaporating with engineer
> turnover

Pretty much this. What else am I supposed to think when I read this?

[https://stackoverflow.com/questions/17103925/how-well-is-unicode-supported-in-c11](https://stackoverflow.com/questions/17103925/how-well-is-unicode-supported-in-c11)

We aren't on the ARPANet anymore - I expect even low-level programming
languages to support Unicode natively. The way C equates "char" with both
character and byte is fundamentally broken.

(This is also/more(?) an issue with Unicode - IMHO we should have increased
the byte size to something like 32 bits (during the transition to 64-bit
words?) to be able to fit one character per byte - the increased hardware cost
for text storage would have been quickly compensated by the decrease in
developer costs. But here we are.)

------
vearwhershuh
I've seen plenty of projects blown up by massive, fearless refactors to do
things "the right way".

Careful with that axe, Eugene.

~~~
seren
In the open office next door, people have been refactoring a cpp98 monolith
into more independent components, to get a better test suite, better CI
integration, and a better deployment story. That sounds about right.

Well, the issue is that they did it a bit sneakily: they removed all the
legacy code they didn't understand. So the code is much more elegant, it has
been moved to cpp11 or 14, it ticks every good-practice box. There is only one
slight issue: it does not work. It somewhat works, but it is not reliable and
fails regularly in unexpected ways. And they started 5 years ago, and haven't
delivered any business value since.

At the beginning, it was okay because they had some leeway but now they are
blocking the release of new products, and our market share is in free fall.

Heads have started to roll.

To be fair, a few years down the line, their team will likely be more
productive and efficient, but I am still not sure that the cost of the rewrite
was justified. Still the article is very on point on the risk of not paying
your technical debt.

~~~
matwood
> they removed all the legacy code they didn't understand.

This is exactly why rewrites or huge refactors typically fail. The new
programmer sees code, doesn't understand why it is there, decides the last
programmer was an idiot, and deletes said code. Unknown to the new programmer
is that the code handles some weird edge case.

A rewrite should spend 80% of the time understanding the old code and 20%
writing the new. But that's no fun for most programmers, who just want to code
in the latest shiny, so the new code ends up broken.

~~~
atoav
Which is why it is sometimes crucial to add comments to your code. I know
certain code basically documents itself, but that depends on the person who
reads it. In my eyes it is precisely edge cases, which might or might not be
known, that profit from decent explanatory comments.

As an avid writer of comments, I am convinced they also help me to form
thoughts and remember them later, so at times I will write the comments
_before_ writing the actual function.

If that is too bold, commenting while the thing is still in your head makes
sense anyway — it saves you time later and helps everybody else who will look
at your code.
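
As an illustration of the kind of edge-case comment being described, a small
C++ example (the scenario, the firmware, and the function name are all
invented for the sake of the illustration):

```cpp
#include <string>

// Strip the trailing line terminator from a line read off the wire.
// NOTE (edge case): some clients, observed with an old firmware in this
// invented scenario, send "\r\n" instead of "\n". Dropping the '\r'
// here is deliberate, not an accident; do not "simplify" the loop away.
std::string chomp(std::string line) {
    while (!line.empty() && (line.back() == '\n' || line.back() == '\r'))
        line.pop_back();
    return line;
}
```

Without the comment, the `'\r'` branch looks like dead weight and is exactly
the kind of line a confident refactorer deletes.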

~~~
dirkf
And please write proper commit messages!

A short summary, preferably including a hint at which subsystem is impacted by
the change. Then explain in detail the context of the change: root cause of a
bug and the gist of the solution, use case(s) behind a new feature and how it
can be used, ...

Yes, often there's a bug/project tracking tool being used and the commit
message contains a reference to the relevant entry there. But from experience
I know these tools tend to change: old one gets decommissioned, data gets
migrated, what was once the primary identifier is now a mere field or comment
in the new system, access rights get messed up, ... Trying to understand the
history then turns into an archeological expedition through various eras long
gone... unless the commit messages are sufficiently self-contained.
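
A sketch of the shape being described (subsystem prefix, one-line summary,
then context; every detail here is invented for illustration):

```
parser: reject duplicate keys in config sections

Previously a duplicate key silently overwrote the earlier value, which
masked copy-paste errors in deployment configs. We now fail fast and
report the line numbers of both occurrences.

Ref: tracker issue PROJ-1234 (for convenience only; the summary above
is self-contained in case the tracker goes away).
```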

~~~
BlueTemplar
The issue seems to be that the tracking tool does not have its data backed up
in the version control system?

~~~
JoshuaDavid
This seems like it could be mitigated with an integration that watches the
issue tracker for updates and then commits those updates to a docs folder on
the appropriate branch of the appropriate repo.

This would have saved quite a bit of headache at my last job actually.

~~~
BlueTemplar
Or just use a version control system with issue tracker support, like Fossil ?

~~~
JoshuaDavid
Switching an existing project to a new VCS loses history or at minimum causes
a new layer of "identifiers in the old system don't match the new one", no?

In general, I've found value in figuring out how to improve existing systems
where feasible rather than trying to migrate to a new system, since the
existing system probably has advantages the new one won't. At minimum, people
are already familiar with the existing system.

~~~
BlueTemplar
Of course - but I'm not sure why you assumed that I would suggest that,
instead of using a different VCS for a _new_ project.

~~~
JoshuaDavid
Ah. I've spent a lot more time working on projects I inherited rather than
ones where I made the initial technical decisions - if you're starting a new
project, you should use the most capable tools available.

------
collyw
I have noticed that database changes on a live system are the hardest / most
risky to implement. They are also the most likely to get to the heart of the
problem. I learned what the article is talking about through experience:
avoiding the "harder" changes and piling fixes on top of other fixes until
things become a mess.

------
titzer
As always, there is more than one valid perspective on this problem.

Given the very first example from TFA, the problem is solved (i.e. the new
requirement is satisfied) by _only adding a few lines of code_. In fact, a
massive refactoring was _not_ required. I would suggest that this is actually
A Good Thing(tm) and may even be indicative of a Good Design(tm). If every new
requirement requires the system to be overhauled, you're definitely in a worse
situation.

The lack of tests/testing is a wholly separate issue.

If the new requirement comes with new tests, and better yet tests for both
the old behavior _and_ the new, all the better: you can refactor the system
later to make it cleaner and simpler while still meeting the requirements,
since you now have a fuller picture of what the actual requirements are.

Refactoring in the face of every new requirement smacks of poor initial
design.

~~~
hinkley
“Make the change easy, then make the easy change” is one of the tenets of
refactoring. You do, in fact, refactor the code with every requirement.

I really, really hated YAGNI about ten years ago but have come around a bit.
Our users are empirically insane. You cannot guess what an insane person will
do next, and trying will only make _you_ insane, too.

I’ve gotten a lot of good mileage and a lot less stress by following the Rule
of Three (architecting on the third example). I’ve learned to spot
bullshitters versus sincere YAGNI folks (there are many more of the former).
The critical factors are identifying which decisions are reversible and not
investing much energy in them, doing everything you can to delay irreversible
ones, and, failing that, making people pay attention to what they are
choosing.

------
dgellow
> Because no one in the team has a great understanding of the system

Isn't that the main issue? I understand not wanting to make invasive changes
when you're not familiar enough with a system, but the solution seems to be:

1. get familiar with the system (doing small non-intrusive changes is a good
way to start to become familiar with it)

2. do more invasive refactoring

For sure, don't start directly with the invasive changes, but at some point
you need a good understanding of the system you are working on, or are
responsible for maintaining.

~~~
ekvilibrist
The problem with "having a great understanding of a system" is that it's hard
to pin down, to define what that actually means. You can always use that as an
excuse, if you are gatekeeping a poorly constructed system and resisting
change.

If you require "full understanding of our system" for me to add some new
functionality in a module, then chances are pretty good that your system has a
bunch of problematic dependencies, no?

~~~
dgellow
The article seems to be about "invasive and risky changes" (that's from the
article). In this case I expect some level of gatekeeping by people familiar
with the system. If the people doing the gatekeeping aren't familiar with
whatever they keep, then that's the first thing to work on.

> If you require "full understanding of our system"

I wrote "familiar enough" and "good understanding", not "full understanding".
"Enough" and "good" will of course depend on the context.

Considering this:

> then chances are pretty good that your system has a bunch of problematic
> dependencies, no?

In practice you need to understand some of the context in which your module
exists - hopefully not all of it, though of course it would be better to be
able to just focus on the module itself. By "system" I don't necessarily mean
the whole, complete infrastructure; it can be the module.

My point was that if people are blocking changes because of lack of
understanding of something, the solution would be to actually get some level
of understanding.

Edit: also, I assume good faith from gate keepers.

------
scarejunba
One of my biggest failures was while managing a team like this. In that
organization I was a fairly well-liked leader, with both successful and failed
projects behind me, and I was brought in to stabilize a core team that was
suffering rampant attrition, had a two-year project that had practically
stalled, and felt somewhat under siege from the rest of the company.

Interestingly, the whole situation _was_ almost exactly what the author
describes here, and a full year later I had failed to correct it (and had
probably made it worse). Here's what I did, so that others may learn from it:

* Since the team had stalled on that project for two years, I pretty much sidelined it and attempted to get us in a habit of getting wins so that we'd feel comfortable with changes. Result: We just accelerated the muddying of the codebase. The team wasn't constantly upset anymore but the size of the improvements you could make was decreasing - a sign that you're in the doom spiral.

* Set aside time for us to perform the refactors, but allowed the team to identify the primary pain points and focus on them. Result: The muddied code-base was like an underwater cave, the silt was in established places. The attempted refactors were just more muddying. With two years of living with an on-call rotation, the team's primary focus was in trying to remove the things that caused them direct pain, not the underlying causes of them, because they felt that the time we had was insufficient to do anything meaningful.

* I championed automated testing, replicable builds, and atomic changes, but failed to really make the argument, since the net takeaway was "doing it that way would be great but it would be too slow". It didn't really take; perhaps I should have enforced that approach rather than trying to argue people into it. We got somewhere with the testing, but it was a big effort. That cost me a little bit on the culture side, since it had to come top-down from me.

* I tried 'leading from the trenches' so to speak, but that was unsustainable. Things worked while I was there, working as an engineer along with everyone else, but then I had to sacrifice other crucial things that were necessary to retain the state.

In the end, things got somewhat better there in terms of attrition and big
system failures, but the doom spiral was still there. So it was like adding x
years to failure rather than putting us on the path to success. I'd like to
think that now, with some distance, I have a much better picture of what to
do.

To be honest, I really don't think that adding a slew of manual testers is
really the solution. The iteration speed will plummet because now there is one
more hinge to the arm controlling output.

The annoying bit about the whole thing? None of that is useful to me now. When
you're trying to build a startup you wish you had problems like this because
it means you are already successful. Ugh. Maybe some day it will be of use.

~~~
jrochkind1
> I'd like to think that now, with some distance, I have a much better picture
> of what to do.

OK, what? Please share!! You told us what didn't work; in retrospect, with
some distance, what would you have tried differently?

~~~
scarejunba
I think I would have trouble expressing the nuance in a HN comment given how
much time I'm willing to spend on it.

------
mwilcox
I think the best thing to do is to use the existing system as a really,
extremely detailed interactive requirements doc for a complete rewrite with
modern practices and good documentation, then run both in production until you
can switch the old one off.

~~~
try_again
Complete rewrites seldom work out well. Using the current version's behaviour
as a "requirements doc", rather than the code or an actual up-to-date
document, probably misses a good deal of functionality that is rarely used but
critical in some conditions. The thing about legacy software is that it has
encoded in it years or even decades of changes to functional and technical
requirements, workarounds for edge cases that weren't foreseen, bug fixes, and
optimizations. Each of those can look like a mess to an observer, but it is
foolish to think a new system will not run into similar issues. The idea of
starting over to "do it right" is a fallacy, because perfection does not
exist. This is why the software industry is increasingly focused on processes
that reap the benefits of quick turnaround and make change easier to deal
with.

Believe me when I say the day you switch the old system off will never come in
the vast majority of scenarios like this. The result is either wasted
development time or, probably worse, you now have two systems to maintain and
keep running.

~~~
rusticpenn
That is basically an example of the sunk cost fallacy. Yes, those bug fixes
are important, but it's easier to find them and fix them again in a tool with
proper architecture. The idea is to run the original and the redesigned code
in parallel, find the differences in behaviour, and fix them.

We recently moved from a heap of Matlab code (started as a student project in
2001, and by now a huge tool used in the industry) to a new implementation in
C++ and Python. It has been a huge success with our customers.

~~~
mattmanser
It's not a sunk cost fallacy at all; sunk cost is throwing further money at
something that has already failed. If the code is being used, it obviously
succeeded. The product works. Otherwise it wouldn't even be worth rewriting.

The reason people warn against rewriting is that it's a risk, a gamble, and
often a conceit of the programmers. Programmers also often spectacularly
underestimate how hard a full rewrite will actually be.

You're taking something that works, and attempting to recreate it. You can
find lots of examples where rewrite projects went spectacularly wrong. A
commonly cited example was the Netscape rewrite (which killed a hugely
successful company).

Your gamble paid off, but it's almost always the worst decision you can make.
There are even examples in this thread of rewrites going wrong.

~~~
rusticpenn
Of course, it depends on several factors. In our case, we had a mess of
spaghetti code which crashed at important moments, and there was no easy way
to locate the issues. Sometimes, when your code requires prayers and
sacrifices to appease the gods and devils, maybe it's time for a rewrite.
However, there are programmers who just want to rewrite because they think
the old architecture is bad when it is mostly fine.

~~~
goto11
But the forces which caused the current code to end up in such a sorry state
will still be in force and cause the rewrite to end up in the same place.

If you have reason to think you can do better this time - e.g. the team has
learned how to avoid unexplainable crashes - then you could apply this
knowledge to fix the issues in the current code, which would carry much less
risk and take less time.

~~~
rusticpenn
In our scenario, the tool had no requirements. A student project did
something which a team found useful and which saved money. More students were
given similar projects. Then an engineer comfortable with coding glued
together the code from all these students. This went on for 17+ years
(converting several person-months of work into a few hours). The team lead for
this team got promoted to a very high position in the firm, and he hired a
software team to redo everything with coherent code.

For us, the first project acted as the requirements analysis. In most cases,
bad software is mainly due to a lack of proper requirements. In hindsight,
it's easier to make a complex tool coherent.

