
Ask HN: Have you ever inherited a codebase nobody on the team could understand? - ironmagma
How did you deal with it? (Reverse engineer requirements + rewrite, convincing higher-ups to cut ties with the code, something else?)
======
kokokokoko
I would be careful with some of the responses here. Over my career I've found
that a significant subset of developers struggle with unfamiliar codebases.
Sometimes this has to do with their experience being mostly with greenfield
projects and other times it is because they have not seen a wide array of
different work created by other people.

But sometimes it is good old fashioned workplace politics. It is risky to take
on an unfamiliar codebase as any problems are now your problems in the eyes of
management. And it is, practically speaking, impossible to account for all
edge cases and surprises that may exist in a legacy codebase. Therefore,
framing the codebase as terribly written and a total disaster achieves two
things politically: it sets up the previous developers and their "terrible
codebase" to take the blame for any future issues, and it opens the door for a
much more enjoyable and less risky greenfield rewrite.

My number 1 red flag of working with a developer, unless they are very early
in their career, is hearing them describe a codebase as awful. Most really are
not that bad and are usually just using unfamiliar and less than ideal design
patterns and coding practices.

~~~
ConceptJunkie
I don't know about you, but I've rarely worked on a codebase that _wasn't_
awful in some way, and I am definitely not early in my career. I've come to
the conclusion that most programmers are simply awful at their jobs, and the
developers that can write clear, concise code are a small minority. I've known
a few, but not many.

~~~
maxxxxx
Especially in big companies it's really hard to keep your code concise. I have
often started with a very clean design that totally fell apart when new
requirements came in. If you give managers the choice to hack it into the code
in 1x time or redesign the code in 2x time most of them will vote for the
first option. Go through that cycle a few times and your code will be a big
mess.

~~~
jeffwass
This 100%.

I’ve worked on several codebases over the years where a mgr said “here’s a
special case: inputs of (foo, bar) should give baz2, not baz1.” The coder whips
up a hacky workaround _to that very specific ask_ in the simplest place
possible, not the right place. A few dozen requests like this come in over the
months.
Overrides and exceptions are now scattered all over the code base, without any
semblance of order, functions no longer do what they advertise because data is
modified downstream, etc.
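The accumulation described above can be sketched in a few lines. This is a hypothetical illustration (the names `foo`, `bar`, `baz1`, `baz2` come from the comment; everything else is invented), contrasting scattered inline overrides with a single explicit exception table:

```python
# The hacky approach: each special-case request becomes an inline override
# wherever it was easiest to add, far from the logic it actually belongs to.
def compute_result_hacky(foo, bar):
    result = "baz1"
    if (foo, bar) == ("foo", "bar"):    # request from March
        result = "baz2"
    if foo == "special" and bar != 7:   # request from June, different spot
        result = "baz3"
    return result

# A more maintainable alternative: one explicit table of exceptions, so the
# function still "does what it advertises" and overrides live in one place.
OVERRIDES = {
    ("foo", "bar"): "baz2",
}

def compute_result(foo, bar, overrides=OVERRIDES):
    return overrides.get((foo, bar), "baz1")
```

The point is not the table itself but that each new "special case" lands in a single, discoverable location instead of a new branch somewhere downstream.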

Sometimes the time to redo it right is worth the investment in terms of future
maintenance. Other times it’s too foregone and you have to bite the bullet and
stick with the mess.

~~~
jehlakj
It's hard to blame them. I've been in similar situations multiple times where
a requirement seems simple enough on the surface that you only get a few hours
to work on it. But it's actually not that easy to write it cleanly.

That hacky workaround you're talking about? Well, my task depends on it. The
workaround too depends on another hack. With the amount of time I have to work
on it, it's a no brainer.

Earlier in my career, I tried to fix them, but they started to break other
"hacks" and down the rabbit hole I go. Sometimes it's just not worth it.

------
hackernews31242
I’m a consultant and make a living saving bad projects. That’s literally why I
get phone calls for work. Keep in mind that I work in high-level modern
languages; I’m sure there’s some crazy proprietary CPU running a robot in a
Detroit factory that would be another story.

In any case there’s never been something that I’ve run into that I’ve not
figured out. It takes time, and the hard part usually is not figuring out what
it does but the weird edge case of the moon aligning with Venus and then the
output suddenly changes. This is why understanding the requirements is more
important than the code. I don’t care if the code is bad if I can more or less
write a test case against it and make sure it does that.
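"Writing a test case against bad code you don't fully understand" is often called a characterization test: you record what the code does today, not what it should do. A minimal sketch, where `legacy_discount` is an invented stand-in for whatever inherited function you face:

```python
def legacy_discount(total, code):
    # Stand-in for the inherited code you don't fully understand.
    if code == "VIP" and total > 100:
        return total * 0.8
    if code == "VIP":
        return total * 0.9
    return total

def test_characterization():
    # Record current behavior as observed facts, weird edge cases included,
    # so any refactoring that changes the output fails loudly.
    cases = [((200, "VIP"), 160.0), ((50, "VIP"), 45.0), ((50, "NONE"), 50)]
    for args, expected in cases:
        assert legacy_discount(*args) == expected

test_characterization()
```

Once a net of such tests exists, "I don't care if the code is bad" becomes a defensible position: the requirements are pinned down even if the implementation isn't.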

That said, complete rewrites never happen. It usually is only rewriting
portions when that is cheaper than fixing. It’s the “if it isn’t broke, don’t
fix it” adage.

The only time I’ve ever been stuck was when I saw a proprietary software
implementation on top of a custom software package on top of Solr (technically
on top of the JVM) create a large object heap issue on top of a proprietary OS
(Windows). It wasn’t code related; I diagnosed that it was a GC issue, but it
wasn’t in any code I had source to. A Windows update ended up fixing it. And
this is why working with enterprise software is hard.

~~~
linuxlizard
Do you have any specialized tools (code navigation tools, for example) that
you use when first encountering these large piles of code? I'd love to hear
some recommendations; I have to deal with large (only sometimes bad, but
always large) piles of vendor code. I'm currently staring at a pile of 900kloc
of pretty nice code, but it's a /lot/ of code.

~~~
kjeetgill
Not OP, but I imagine this is fairly language specific. This is where Java
shines. Keep in mind I mostly work with services not applications.

My process looks like this:

Step one: Identify sources of reflection; this is the trickiest. Hopefully the
only dependencies are open source, so you generally know what they do, and
grep can usually find the rest.

Step two: Go code spelunking. Find your entry points. Find your main() or
framework equivalent. Find callsites for REST endpoints, RPC, JMX, etc.

Step three: Find other "external request processing" endpoints. Do you have
timer threads? Reading a Kafka stream and acting per record? Etc.

Once you understand those, you can interpret where almost any stacktrace is
from. Good old IntelliJ or Eclipse can give you all the callsites for a
function as you root around. You should slowly get a feel for which parts of
the code things get called from.

Now start asking questions like: what data is shared between these entry
points? What's mutable? Is it all done safely?

Hopefully this wasn't too narrow an example. I'd imagine it'll apply to any
services.
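Step one above is essentially a targeted grep. A rough sketch of that scan, here as a small Python helper run over in-memory strings standing in for source files (the marker patterns are common Java idioms, but which ones matter is framework-specific, so treat the list as an assumption to adjust):

```python
import re

# Categories of markers to hunt for when first opening an unfamiliar
# Java codebase: reflection (which defeats static call-graph tools)
# and entry points (where control flow actually starts).
MARKERS = {
    "reflection": re.compile(r"Class\.forName|getMethod|newInstance"),
    "entry_point": re.compile(r"public static void main|@Path|@KafkaListener"),
}

def scan_source(name, text):
    """Return which marker categories a source file matches."""
    return {cat for cat, pat in MARKERS.items() if pat.search(text)}

# Usage with an in-memory sample standing in for a real file:
sample = 'Object o = Class.forName("com.foo.Plugin").newInstance();'
```

In practice you would walk the source tree and feed each file through `scan_source`, collecting a map from file to categories as a first-pass index of where to start spelunking.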

~~~
linuxlizard
I'm working in Linux wifi drivers, all in C. Giant complicated protocol with
giant complicated code. I've been tinkering with Microsoft VSCode, CLion, and
Sourcetrail. Vim + Ctags seems to work well in the beginning but only gives
pinpoint answers (can find trees, but little view of the forest). Still
experimenting.

------
rdiddly
I did - please bear witness while I work through my trauma. Variable names
were actively obfuscatory, and their declarations were always many hundreds of
lines distant from their use. And many weren't even used at all. Or some were
assigned values that never got read or used for any purpose. In a language
that generously provides many useful data types, booleans were being converted
to strings so he could check whether it was equal to "true" (like t-r-u-e, the
literal string). No generics: instead, arrays everywhere, even in cases where
the number of items wasn't known ahead of time. (Just size it to 10,000 x 500
and hope you don't overrun. It eventually always did. So he would bump it up
to 20,000 x 600 or something.) No modularity... everything just one big
function. And if you need to do the same thing somewhere else, just copy the
code over there. Oh but make subtle changes to it so that the two copy-pastas
diverge from each other in subtle ways that could've easily been parametrized.
Copy-pasta marinara over here, copy-pasta pesto over there. I could go on (and
on), but in short, it sucked ass. I concur with others here that understanding
the code (since it sucked ass anyway) ended up being something of a lower
priority than understanding the business processes.
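For concreteness, the two anti-patterns above (booleans round-tripped through strings, and oversized fixed arrays in place of growable collections) look roughly like this. All names are invented, and the original was in another language entirely:

```python
# Anti-pattern: a boolean converted to a string, then compared
# to the literal string "True".
def is_active_bad(record):
    flag = str(record["active"])       # True -> "True"
    return flag == "True"

# Anti-pattern: "size it to 10,000 and hope you don't overrun".
def collect_bad(items):
    buf = [None] * 10_000
    for i, item in enumerate(items):   # IndexError past 10,000
        buf[i] = item
    return buf

# The straightforward versions the language already provides.
def is_active(record):
    return bool(record["active"])

def collect(items):
    return list(items)                 # grows as needed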

That was the key actually. At first I diligently tried to understand the code,
refactor, rename, move things around... but at some point I crossed a bridge
where I suddenly understood, this code is not an asset to be protected and
cared for; it's a liability that sucks my time into it and creates not just
low, but negative, productivity. And if I don't kill it now, it may suck other
victims in. By that time I understood the business processes better and had
better implementation ideas anyway, so I began "deleting with extreme
prejudice" and rewriting. Even so, I still had to read and understand the code
I was deleting or replacing, and was constantly thinking like this:
[https://www.youtube.com/watch?v=vbr9akNELdc](https://www.youtube.com/watch?v=vbr9akNELdc)
(Yep that's Airplane! and you probably won't recognize the actor who 28 years
later would play hitman/bodyguard Mike Ehrmantraut on Breaking Bad.)

Eventually I re-did the important parts and let the rest fall by the wayside.
I didn't attempt to duplicate all the functionality embodied in that mess. It
took 2 years but I remember the day I finally deleted the last piece of
shitcode. Management was behind me all the way, because they were aware of
some of the issues with my predecessor, plus over time they had become content
for some reason to kind of just take my advice about this and other things
almost implicitly. So I was lucky in that way; I was free to tackle it how I
saw fit. They didn't really have much choice anyway, though, for the kind of
money I was making at the time! (not great)

~~~
naikrovek
Code that can't be quickly replaced is a liability, in my mind.

Your example is an extreme case, and I think that even in non-extreme cases,
code that has inertia is bad. The inertia is just not AS bad as the inertia in
your story.

If you can't, as an example, wrap your mind around a codebase that implements
a bit of business logic in a day, that code is too complex. It should be
replaced with something that can be understood quickly, and changed quickly,
when the person who maintains it today dies, retires, or goes on vacation.

(In the above, substitute for "day" whatever period is suitable for your
situation, and for "person" whatever employee-unit is suitable for your
situation: "lead dev", "team", whatever.)

Software has more inertia than hardware, and that is INSANE. There is software
at my employer that is OLDER THAN THE FUCKING X86 ARCHITECTURE, and has gone
untouched for much of that time.

"If it works, why touch it?"

A business needs to understand what is running (for legal and other reasons)
and needs to be able to fix it, change it, or replace it very quickly, because
if it is implementing a business process, _it is important for business_ , and
if it is important for business, then it needs to be able to adapt as the
business adapts.

Some things change very little, true; that doesn't mean everyone who worked on
it should be allowed to retire, then rehired as contractors 20 years later
when the mainframe hardware finally goes out of support.

------
orf
I know of a very popular London startup whose entire database is in Spanish
due to the initial dev work being outsourced to a development company in
Spain. All table, column, and procedure names are in Spanish. A refactor is
too risky and they are growing too fast, so all the engineers have to pick up
basic... programmer Spanish?

That's a more literal example of not being able to understand the codebase I
guess.

~~~
j1elo
I'm Spanish and loathe Spanish-written code. It looks so unprofessional to the
eye. Luckily, all the Spanish companies I've worked at had an English-only
policy for code and comments. This was because the companies didn't rule out
that foreign devs could join at some point in the future, and friction should
not come from a lack of Spanish skills when reading/writing code. It made,
however, for some funny comments from people who didn't really know their way
around prose writing in English.

~~~
hadrien01
At my company we're finally trying to write code in English (much to the
despair of way more developers than I would have expected), but one serious
difficulty we've encountered (apart from people who don't know how to write
correct sentences...) is translating domain-specific terms. Either it seems
like a bad translation, or it's not understandable at all.

I don't know what the solution should be in that case? Keep domain-specific
terms in French?

~~~
Carpetsmoker
I've also run into the reverse problem: mixing English and Dutch in confusing
ways. For example, a common naming pattern is `GetFooByID()`,
`GetFooByName()`, etc. I think it makes sense to stick to that, but if `Foo`
is a non-English word it just gets confusing/inconsistent, especially because
in some cases you would translate Foo and in other cases you don't.

On the other hand, translating domain-specific terms can also be very
confusing. For example at my previous position we built a rental contract
system that was very specific to the Dutch rental system/laws. We translated
everything to English, but a lot of stuff is just funky because it's a
specific Dutch term without a real English translation (for example names of
specific laws/procedures).

My advice is to give up, destroy your code base, and become a sheep herder.
You're screwed no matter what.

------
elevation
Refactor, test, repeat.

I contracted for a company that had no software team, and had outsourced the
development of their embedded product to the lowest bidder. The original firm
had delivered code that met most requirements, but were not willing/able to
resolve issues with random crashes or implement additional features. The
product manager reached out to my employer at the time for help.

When I took over the code base, my initial attempts to modify the function of
the code rendered the device completely non-functional, so I focused on
restructuring the code without changing its behavior. I moved code into
functions, functions into libraries. I added parameters to existing functions
so that global variables would have to be injected, rather than accessed
directly (this helped make it clear what the inputs and outputs to the
function were.)

Eventually, I modeled state external to the system with state machines so that
it would be clear when code was trying to manipulate a resource that hadn't
been initialized yet. (This helped make some bugs stand out like a sore
thumb.)
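The two refactorings above translate to roughly the following, sketched here in Python rather than the original embedded C, with invented names. First, injecting a global as a parameter so inputs and outputs become explicit; second, an explicit state machine so "used before initialized" is a visible error rather than a mystery crash:

```python
from enum import Enum, auto

# Before: the function silently read a global (e.g. CURRENT_GAIN).
# After: the dependency is injected, so the signature tells the truth.
def apply_gain(sample, gain):
    return sample * gain

# Modeling an external resource with a state machine so that touching
# it before initialization stands out "like a sore thumb".
class RadioState(Enum):
    OFF = auto()
    INITIALIZED = auto()
    TRANSMITTING = auto()

class Radio:
    def __init__(self):
        self.state = RadioState.OFF

    def init(self):
        self.state = RadioState.INITIALIZED

    def transmit(self, data):
        if self.state is RadioState.OFF:
            raise RuntimeError("transmit() called before init()")
        self.state = RadioState.TRANSMITTING
        return len(data)
```

The value of the state machine is diagnostic: a bug that previously manifested as a random crash deep in the hardware layer now fails immediately, at the call site, with a message naming the violated precondition.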

Through incremental changes and testing at every step, this refactoring made
the structure and flow of the program much easier to understand. After only 2
weeks of refactoring, I was able to identify and fix the bugs that had been
causing the random crashes. I was also able to add new functionality to the
well structured program in a fraction of the time it had taken to do the
initial refactoring.

The best part about restructuring/refactoring code is that even after totally
reorganizing the entire codebase, I still only had a high-level understanding
of how it all worked; I didn't have to personally grok every requirement or
fine detail as I would have needed to do if I'd rewritten the code from the
ground up. Refactoring was slow going at first, but it really saved the day.

~~~
informatimago
Well, if the code base is awful, it won't have tests to begin with... what do
you test when you only have the bad code, no docs, no specs...

------
stcredzero
Yes. It was a vital subsystem in a Smalltalk program. While most of the
project had passable Object Oriented organization, this one subsystem had zero
instance methods and zero instance variables. Instead, it was copy-pasta after
slightly modified copy-pasta of these long methods that called each other
recursively. Each one of these methods used a "merging" style algorithm that
incremented 4 indexes into arrays, all the while executing deeply nested
conditional logic.

There were some smart cookies on the team, but we were all in fear of this
code. The person who wrote it spent her days sitting in the cafe downstairs,
reading novels. She'd check some logs, and occasionally come and yell at us
for doing something wrong, which she would never explain. It turns out that
the system had objects, but they were all embodied by consecutive spans of
entries in those arrays. We went on a trek through the bowels of this
corporation, asking around for documentation of the 3rd party software that
used those sequences in those arrays, but no one had it, and that company no
longer existed. If you tried to explain to her that Object Oriented code would
have instance methods, she'd always bring up her PhD in math.

A coworker of mine spent a week charting out one of those methods, and managed
to rewrite it in 1/6th the line count, with no errors. However, that didn't
really help, as it still implemented the same weird merging algorithm.

~~~
andjd
Cool. I don't think I've ever heard a war story about a SmallTalk codebase
before -- I have always gotten the impression that SmallTalk was a
research/toy language similar to Haskell that is held in high regard but
rarely used in production.

~~~
stcredzero
At one point, 80% of Fortune 500 companies were using it. No capital T. (This
is how we used to identify the pointy hairs.)

------
osrec
Yes, most of it. Most (all?) code in a pressurised business environment
eventually ends up in a bit of a bad state because technical perfection and
maintainability are rarely what the devs are going for. They're just trying to
"get it working, and now". How I've seen devs deal with it successfully:

1) Complain early and loudly about past mistakes from other devs, so that
management know delays are not your fault. Once you've made enough noise,
attempt step 2.

2) Reduce the scope of changes significantly. Management want X, but you
explain only 10% of X can be done in the available time given the current code
base. If they accept, great - you're touching less existing code, but may have
to dig around a bit. If you can convince management that the reduced scope
offers little business value, try step 3.

3) Push for a rewrite. Get budget, get resources and eventually deliver. If
you're a good dev/dev manager, you may even get to be the hero that delivered
something that works amazingly (aside: a lot of devs take on too much in this
phase and often burn out). If you do deliver, you'll be worshipped as the
authority on the system for months/years to come! Happy days!

Eventually, however, even your beautiful rewrite will decay into festering
spaghetti, as random requirements get incorporated. You may even,
deliberately, introduce complexity into your code to justify/protect your own
job. At some point, the very thought of diving into your own code may fill you
with dread, and you'll start searching job boards.

The cycle then repeats.

~~~
zem
"After me cometh a Builder. Tell him, I too have known."

[http://www.kiplingsociety.co.uk/poems_palace.htm](http://www.kiplingsociety.co.uk/poems_palace.htm)

~~~
osrec
In a nutshell: people who make stuff are critical of other people who make
similar stuff :P

~~~
rzzzt
People who repair stuff (or people) are even more critical of the repair work
done by their peers.

------
thrower123
That is essentially what I was hired out of school to do. I walked into a
massive heap of ASP.NET (with VB) and MSSQL stored procedures that didn't
really work at all, and had been through the wringer of a few cut-rate
outsourcing groups. I struggled along with it for a few months figuring out
how it was supposed to work and trying to duct-tape it together, doing a lot
of support with customers that were trying to use it, and talking to them
about what they were trying to do.

Then eventually I decided I wanted to learn some newer tech, so I started
playing with Linq-to-SQL and ASP.NET MVC and Razor and Bootstrap, and over the
course of three or four weekends and evenings I did a ground-up rewrite of the
whole thing for fun. After a bit more time flailing away with the old mess,
I showed my side-version of it off to my boss, and it wasn't that hard a sell,
being much prettier and less buggy.

It helped that there was nobody around who was invested in the old code base.

Generally, I've found it is a lot easier to effect this kind of change if you
just do it stealthily and present it as a fait-accompli, because otherwise
people get so bogged down in debate and fear that any impetus to actually take
a risk and do something evaporates.

~~~
ams6110
If you can rewrite it in 3 or 4 weekends, it's really not that complicated.

Either that or significant chunks of it were retained and your improvements
were mostly cosmetic.

~~~
thrower123
It really wasn't very complicated... basically just some queries to search a
database and a front-end to display results.

I still don't know how the preceding mess got to the point of being such a
baroque cobbled-together monstrosity, but it was enough copy-pasta to keep an
Olive Garden supplied for a year, and a raft of incomprehensible stored
procedures that did SELECT * FROM table WHERE xyz in bizarrely complicated
ways involving multiple casts and nested temp results.

------
Jach
The book _Working Effectively With Legacy Code_ has a chapter titled "I don't
understand the code well enough to change it" and another "My application has
no structure". They both provide some techniques to get the understanding.
Personally, I think if you don't understand a codebase, and you need to, you
should start with making an attempt to understand it before you do anything
else (reverse engineering, etc. -- how could you convince higher-ups to get
rid of it, or rewrite it, if no one understands what it's doing in the first
place? What exactly are you getting rid of?). Sometimes deleting things and
seeing who complains / what tests fail can help, sure. Anyway, the book
doesn't offer anything too mind-blowing, but I've found it helpful. Make
diagrams of the system (they don't need to be formal; start by just writing
down each important-looking thing you find and noting important-looking
relationships). Print out code and mark it up, deleting any code you think is
dead. Do some scratch refactoring (extracting methods, moving things around,
generally making tiny bits of code clearer in the hopes that eventually the
larger program will become clearer too) that you don't actually need to worry
about checking in. And there are a few methods of explaining the system you
can use to verify that you're actually beginning to understand it and where
you need to focus more effort (telling the "story" of the system, describing
things with a type of naked CRC technique)...

------
EpicEng
I was hired for this exact reason at my last company and tasked with rewriting
it while maintaining bit-for-bit identical output.

The company itself was in biotech (cancer diagnostics) and was relatively new,
spun off from a rather well known research lab. They quickly realized that
their system was incapable of scaling (or being maintained properly...) to the
needs of a business.

The code itself was written primarily by one man. He was quite bright, but not
really a software engineer by trade. Usual stuff; no source control, mish-mash
of technologies, spaghetti code, and home grown algorithms and hardware to
solve well understood problems with available solutions. New management wanted
me to rewrite everything in C# (against my protests. Not because I dislike C#,
only because the code dealt primarily with image analysis and hardware
control/robotics.)

I began by doing exactly as you proposed; I reverse engineered every bit of
code. I took extensive notes (it was hard to follow) and walked through each
step from sample prep to image acquisition to analysis to result. I started
writing each sub-system only after I understood how everything pieced
together.

The real bitch of it was that the original developer relied on automating
ImageJ for nearly all of the image analysis. As my original requirement was to
not alter the results in any way, I literally rewrote large swaths of ImageJ
in C#. Bugs and all.

Well, turns out ImageJ (Java) is compiled with /strictfp. C#/.NET does not
support this, so my floating point results were oh so close, but not
identical. This was initially a problem for management... until the CEO was
replaced, along with my boss, and the new team thought the entire project was
a dumb waste of money and had me build a new system from the ground up.

That system was released (successfully) early this year. I began work on it
nearly five years ago now, with many detours along the way. I now work
elsewhere.

~~~
koala_man
>maintaining bit-for-bit identical output

The worst projects are those when the company doesn't really want to upgrade,
so the only requirement is "make it exactly like the old system"

In my first job out of college, I upgraded an approval workflow engine from
VB3 to C#. It was written by someone who had never heard of state machines, so
it had a weird ad-hoc design that would e.g. get confused if two documents
were in the same state at the same time.

I demonstrated the bugs and suggested an alternative approach that would be
simple and robust, but management wouldn't have it. They reminded me that the
job was to make it exactly like the old system.

~~~
taurath
If those calculations are business rules, then you absolutely do need to make
it output the same as the old system in those ways because its the foundation
upon which a bunch of assumptions are made.

~~~
EpicEng
They were primarily intermediate values used in the overall process of finding
a certain type of cell. The algorithm needed improving anyway in order to be
production ready, so in this case identical output was not necessary.

------
odonnellryan
Yes... but the company did not use it for long.

Large insurance business paid a previous contractor to write up a simple web
app to consolidate public rates.

So this business could go and answer questions like "how much is my
competition charging for XYZ?"

This "simple web app" turned into the previous contractor writing his own
insane web framework from scratch in Python, because I guess Django or
something was not good enough...?

Anyway the result was something that was almost impossible to read, had who
knows how many security vulnerabilities, and was an awful experience for the
insurance company.

Lots of times implementing a new feature meant changing the web framework so
you could actually implement it.

Company already spent who knows how much on the previous contractor, so after
a small cost working with me to evaluate what else was needed to complete the
project they decided to go another direction.

I planned out how we could migrate the app slowly over to Django (it was a SPA
as well, but he did at least use React there not something crazy he wrote
himself) but they didn't have the budget.

Unfortunate. Could have been a really cool tool for not a lot of money, and I
would love for businesses to be more eager about developing such products. The
concept was the perfect example of a business-specific use of software to give
the corporation an edge.

~~~
Sileni
This sounds like a great opportunity to just go build the thing they needed
then offer it back to them as a service or for a licensing fee. Then you have
the option to offer it to other companies as well.

~~~
perlgeek
This is usually quite risky and difficult.

You might not have access to the data sources, you might not have the context
to interpret them in a meaningful way, and then the company you are targeting
is probably your only potential customer (or you have to cold-sell to every
competitor, so now you have a sales job, right?)

~~~
odonnellryan
Also, after a project fails the business is always close to 0% likely to work
with you on it, even if it isn't your fault (I was brought in as a clean-up
man and my billing was insignificant).

------
smilesnd
I beat it like it stole something from my mother. I write comments, read it
over and over, and make changes where possible. I stay glued to it like it is
my new-found bible and become a guru of the code base through blood, sweat,
and tears. It is the only way to handle new code: not to hate it, not to blame
others, and not to think you could have done better. You befriend it, accept
it as it is, and move forward with the best you've got.

~~~
jcul
This is the attitude man. I love huge legacy codebases, dive in and get dirty.

This is the job, just shut up and get on with it.

------
mcqueenjordan
Yes, on several occasions.

Don't do rewrites. Morph the code towards "North." Old code has a reason for
existing and being correct: It's there. From an evolutionary and survival of
the fittest model, things that are out there already have a lot going for
them. The ugly warts are battle scars of nasty edge cases and bugs.

This is one of the main reasons I strongly value the skill of reading code in
engineers. It's rare and so important.

~~~
IggleSniggle
Good code is easy to read and change, and thus it is changed until it becomes
bad code.

~~~
LeonB
The Peter Principle for Code.
[https://twitter.com/secretGeek/status/1067879315298123776](https://twitter.com/secretGeek/status/1067879315298123776)

------
VPC
I once inherited a compiled executable.

And source code that was demonstrably older than the binary.

It was written by a contractor, who was blackmailing us for the up-to-date
source code.

In an FDA-regulated industry.

How did we deal with it?

We pretended nothing was wrong, and prayed that no show-stoppers would
happen. We begged for permission to rewrite, but it was deemed to be too
expensive.

And I left that company as soon as I could.

------
Alex63
Maybe a little more clarification? When you say that nobody on the team can
understand it, are you primarily talking about code complexity, code style, or
language?

Many years ago, I became the designated maintainer for a legacy inventory
system that was used for internal audit purposes. The system ran on a
PDP-11/70, and was written in a combination of Basic-Plus (this was a _long_
time ago) and COBOL. I only had a passing familiarity with either of those
languages. I find the challenges of learning a code base usually boil down to
a few recurring problem areas:

* Problem domain knowledge: The system was used to track leased telecommunication facilities, and had a lot of obscure business logic built into it. At least half the challenge was reverse engineering business logic from the code.

* Coding style: I always find it challenging to get comfortable with another developer's style. Mismatches in assumptions/preferred approaches can make it really hard to get comfortable with someone else's code. This particular system had some truly weird programming choices, including a "screen driver" (similar to curses) written in COBOL.

* Code complexity: I was pretty lucky that the code was not very complicated. With all the other challenges, if the code had been complex it would have probably been an impossible challenge.

* Language knowledge: the original developers had used some features of Basic-Plus and COBOL that were a little obscure, which made understanding the code base that much harder.

------
striglia
Inherited a codebase that ran nearly all revenue-critical operations, and
operated at its core on some of the most metaclassy/dynamic tools Python has
at its disposal.

Luckily I work in a tech company where the fact that nobody knew this code and
it took years to effectively ramp up on was argument enough that it should go
away.

Actual removal was a much messier story. It had tangled deeply into adjacent
systems so you couldn't "just replace it". We are in the later phases of
something like the Strangler Pattern ([https://docs.microsoft.com/en-
us/azure/architecture/patterns...](https://docs.microsoft.com/en-
us/azure/architecture/patterns/strangler)) where we built higher-level
interfaces over the top and gradually re-implemented the underlying
functionality without using any of these custom frameworks.
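The core of the Strangler pattern described above fits in a few lines: a facade owns the routing, sending each call to the new implementation where one exists and falling back to the legacy code everywhere else. A minimal sketch with invented names (the real version would route HTTP requests or service calls, not strings):

```python
def legacy_handler(request):
    # Stand-in for the old, deeply tangled implementation.
    return f"legacy:{request}"

def new_handler(request):
    # Stand-in for the re-implemented, framework-free version.
    return f"new:{request}"

# Routes migrated so far; everything else still hits the old system.
MIGRATED = {"/reports"}

def facade(path, request):
    handler = new_handler if path in MIGRATED else legacy_handler
    return handler(request)
```

Migration then becomes a sequence of small, reversible steps: re-implement one route, add it to `MIGRATED`, verify, repeat, until the legacy branch is unreachable and can be deleted.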

That said, it's a long term project that is easy to lose steam on. It's been
very important to regularly revisit our goals and how we're attacking
them...AWS has released services that fundamentally changed our approach (for
the better) in the years since this effort started and we've probably cut off
at least a year from the overall effort by adopting those instead of
continuing on the original course.

I wrote up some of these ideas about accomplishing big projects that span
years at [https://medium.com/@scott_triglia/ask-a-tech-lead-i-have-
to-...](https://medium.com/@scott_triglia/ask-a-tech-lead-i-have-to-make-a-
technical-decision-but-i-cant-know-the-right-answer-4c674f9f4a74). The parts
about regularly re-evaluating the next steps in your course of action were
directly inspired by this project I just described.

------
pkaye
A coworker really wanted to be a technical lead on a firmware project so our
boss gave him one. Part way through our boss asked me to help the coworker out
but I had a tough time understanding his code. Soon enough he admitted that he
was leaving the company. He had created too much of a mess and wanted to bail
out. Suddenly it was all my responsibility. And this was with a major customer
we had multiple partnerships with. So somehow I had to salvage everything
without making them aware of the mess we were in. So over the next 3 months we
would give
them weekly engineering builds while totally rewriting the code piece by
piece. Once we were back on track it was much smoother. It helped that our
management didn't micromanage me and the customer engineers were brilliant and
a breeze to work with. It was all about the code, requirements and doing the
right things. Our progress meetings were literally 15 minutes a week. Everything
else was technical discussions and development.

------
nothanksmydude
I once joined a small company in which the CEO's son, a "7th year CS PHD" (He
had to retake some classes), wrote the entirety of an iOS app using obfuscated
C++ templates/macros (!)

During development a number of requirements changed and they had already
burned through multiple other devs before hiring me. Eventually Xcode updated,
and we were required to use the new version to deploy against the latest
version of iOS; this version of Xcode was not able to process his pile of
macros in the same way as the old version. The QA team had already updated all
of their test devices leaving us with no way to test the existing code.

This, combined with the CEO's son's unwillingness to sit down and walk anyone
through the code, led to me sitting in a face to face with the CEO alone in
which I explained all of this to the best of my abilities in layman's terms.
He asked me for a solution and I said to have his son fix it, since he was
unwilling/unable to walk anyone through it. Being the 5th dev they had hired
to try to figure it out, I put in my resignation. It was a fun 3 weeks.

I later found out through a friend who did media work for the company they
were selling this product to, that they were never able to deliver and ended
up getting sued for breach of contract.

------
jimjimjim
Often enough to consider Software Archaeologist as a role.

\- Big picture:

try to identify the integration points with other systems or entry/exit points
into the code.

See if the code is logically (and hopefully actually) divided into separate
smaller parts. If it is, try to work out the main purpose of each part, its
integration points, and if there are any obvious side-effects.

\- Detail:

Does it build, clean-build, pass tests, etc.? That will make it a lot safer to
explore and experiment.

For a single source file people have different approaches to "reading" it.
Some people add notes as comments as they go through the file, they don't have
to be permanent well formatted "Comments", rather just things to reduce the
memory strain. Other people remove blank lines, comments and extra whitespace
to try to compact as much actual code into a single screen to look at the code
paths.
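The compacting trick can be a throwaway script. A sketch in Python (it only understands `#`-style full-line comments; it's a reading aid, never fed back into the build):

```python
def compact(source: str) -> str:
    """Strip blank lines and full-line comments so more of the actual
    code paths fit on one screen. For reading only -- the output is
    not meant to be executed or committed."""
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        kept.append(line.rstrip())
    return "\n".join(kept)

src = """
# setup
x = 1

# main work
y = x + 1
"""
print(compact(src))  # two lines: 'x = 1' and 'y = x + 1'
```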

\- Repository:

Is the code checked into a source control system with a log history? If so,
look in there for clues as to WHY things were changed; this gives a good
indication of changes to requirements and can also explain why some parts of
the code may "feel" different from others (they may have had to shoe-horn a new
change into an existing codebase).

\- Pragmatic:

The previous people (just like you) probably never had a chance to refactor or
clean up any tech debt.

------
logfromblammo
No, but I have been the only one on the team willing and able to diagnose and
fix bugs in that legacy codebase that everyone else was afraid to touch.

It probably still mostly works for what we want to use it for, so a rewrite is
simply out of bounds. You just plant your face in the dirt, and start plowing
ahead. You learn enough about it to get done what is needed, and get out of it
as fast as you can.

Strange code isn't all that bad if you're the only one in it. And since you
didn't write it in the first place, you can always blame anything that goes
wrong on it being awful and brittle. And you can even get that module slander
done preemptively, so that when you finally get something working, you're the
conquering hero, returning home from battle with the monster. And if you break
it beyond repair, you finally get to rewrite it. You can't really lose, except
for the torture you undergo while you are actually wrestling with it.

Aside from terminal breakage, if it wasn't worth rewriting any year in the
last 20 years, this is probably not the year, either. But sometimes you do the
reverse engineering, and find that you can replace the whole crufty thing with
3 lines and a library function call somewhere in your regular code base, and
now the execution step that used to take 3 hours takes 100 ms. _That_ feels
pretty good, in the moment. Less so when management just gives you a little
pat on the head and says, "Well done. Run along, now."

------
rossdavidh
The first and most important question is, why does nobody on the team
understand it? One possibility, which should not be overlooked, is that it was
not quite important enough to spend money on maintaining a team of people who
understand it. Just as the Big Rewrite is often not as good an option as it
seems, the Big Refactor is often not a good option either, because the
software may in fact not be valuable enough to justify the many hours it would
take to do that.

So, first off, try to make a realistic estimate for your higher-ups of how
many hours it will take to refactor this, and phrase it as "at
least...[x]...and perhaps much more". It is quite possible that you will get
back the answer, "it's not worth that". Then, you are in the uncomfortable
position of being the Bearer of Bad News.

Depending on the ability of your upper management to accept bad news, you then
either: 1) gently and politely insist to them that the situation really is
this bad, or 2) start looking for an exit

But, before setting yourself up as the person who brings bad news, get a gut
check from some teammates as to whether your estimates of how much would be
required are more or less on target. It takes a while for organizations to
accept bad news, and you may need to let people who say "it won't be that bad"
win the argument for a while, and then circle back to it in a month or so.

~~~
ConceptJunkie
I worked for a place where if the decision was between spending X hours now,
or spending an unknown number of hours that would be likely to be 5-10x hours
at some indeterminate point in the future, they would _always_ kick the can
down the road. I can't figure that mentality out, when I would regularly
predict trainwrecks and then say, "I told you so." when I turned out to be
right, and yet, no one would listen to me. "Here's how you can avoid this
being a problem in the future." was another thing no one cared to hear. The
only way to fight technical debt was to do it in your spare time, in secret,
and then announce it when it was done... and I'd talked to others who had the
same attitude (including a manager).

I can't figure out how a company with that kind of culture could stay in
business, but it did.

I think the only thing to do in that kind of situation is your number 2
option.

~~~
rossdavidh
I have sometimes seen this, although thankfully not always. I think the best
explanation of this mindset is that everyone else is secretly thinking of the
number 2 option as well, even in upper management. It's more common than you
think.

------
notacoward
Many times. Usually it wasn't a whole system but a component or library
licensed from someone else, where "someone else" was unresponsive or out of
business. In one case, the guy who'd written it used to go on month-long
wilderness expeditions during which he was completely unreachable.

The solution almost always involved a certain amount of reverse engineering,
and damn sure I always tried to advocate getting away from software we
couldn't maintain effectively. The third leg of that tripod is _isolation_.
Reduce the number of things that depend on the code, and the number of ways
that they use it. If nothing else, that will reduce your exposure. It will
also tell you what code paths are important to understand and which are not.
Finally, it can help guide implementation of tests, or of a replacement.
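A concrete (hypothetical) shape for that isolation: one gateway module that is the only importer of the legacy code, exposing just the operations you rely on. All names below are invented:

```python
class OpaqueLegacyLib:
    """Stand-in for the unmaintainable licensed component."""
    def frobnicate(self, x, mode, flags, legacy_cruft=None):
        return x * 2  # imagine pages of logic we can't reason about

class LegacyGateway:
    """The only module allowed to touch the legacy code. It exposes
    exactly the code paths we rely on, nothing more -- so our exposure,
    and the surface we must test or later replace, shrinks to this."""
    def __init__(self, lib):
        self._lib = lib

    def double(self, x: int) -> int:
        # Pin down the one configuration of frobnicate() we actually use.
        return self._lib.frobnicate(x, mode="fast", flags=0)

gateway = LegacyGateway(OpaqueLegacyLib())
gateway.double(21)  # callers never see frobnicate() or its flags
```

Everything outside the gateway depends on `double()`, so replacing the legacy library later means re-implementing one small class.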

If you're really stuck with such a piece of code, some of the advice in a blog
post I wrote about a similar challenge might apply.

[http://obdurodon.silvrback.com/navigating-a-large-
codebase](http://obdurodon.silvrback.com/navigating-a-large-codebase)

That's about learning a codebase that's notable mainly for being large, even
if the original developers are as available and helpful as could be, but
looking at it now I see quite a bit that applies to this case as well.

------
debt
"but understand the business that the code was being used in/by."

From my experience (currently on a 10MM-line, 15-year-old codebase), in
software the code is the business, so it's important to understand it.

First trap: "Oh, I'll just document everything."

I could say just do it, but do it for yourself. I could say that it won't be
maintained over time and it'll rot and may do more harm than good when someone
goes to reference it and thinks it's still an up-to-date understanding of the
system.

But I'd rather just tell you to not waste your time doing it in the first
place and focus on not falling into the second trap.

Second trap: Rewrite.

I could say it's easier to write new code than it is to understand it. I could
tell you to be ambitious and stay up nights and weekends rewriting some view
layer logic bullshit.

But I'd rather tell you that you're not as smart as you think, your solution
may be more complex than the currently impenetrably complex behemoth before
you, and that you should instead focus on not falling for trap 3.

Third trap: Replacement.

I could tell you there's cheaper off-the-shelf solutions available that solve
the same problem; it's simply a matter of spending money and reading
documentation.

But I'd rather encourage you to really, truly embrace the final realization.

Final realization: You are dumb and will never, ever understand all the
complexities of this system as there are too many interdependent moving parts
and other similarly complex subsystems.

 _And that's okay!_

This is a system built over many years that consists of hundreds of thousands
of tiny decisions made by hundreds of different "very smart" individuals just
like you, solving complex business problems given the climate of the time it
was built in.

So approach it like you would approach a beast in the wild, with caution and
grace.

Heroes end up at the morgue; just make sure the thing doesn't go down.

------
moltar
Yes, it was a mega legacy codebase written by a single person over the span of
a decade and was extremely “job secured”. It had Perl scripts that would
system call to php scripts that would in turn do a curl request to another
http perl script that would system call another php script that would output
HTML, which then would get parsed by the calling scripts several ways. That
was just one place. There were lots and lots of these problems.

Our team was handed the project to rewrite it. It was a secret project and we
were very careful. We didn’t want to spook the original developer.

It was a lot of hair pulling and tracking of the code. Lots of gruelling work.

~~~
Jach
Would love to know what happened once you were done, especially was the
original developer still around and did they have a funny panic mode upon
realizing the job security rug had been pulled out from under them?

~~~
moltar
Eventually the dev got a whiff of it. He was let go and offered, I'm guessing,
a lofty consulting position for a year. This was the exact scenario they were
trying to avoid, but at least we pushed it back as far as we could.

The project eventually went from horrible to interesting once all of the
legacy was dealt with.

------
dang
There was a huge and really quite awesome discussion of bad codebases two
weeks ago:

[https://news.ycombinator.com/item?id=18442637](https://news.ycombinator.com/item?id=18442637)

(But the current question is different enough that it has seeded a different
kind of thread.)

~~~
sah2ed
I must have missed this in my weekly HN newsletter. Thanks for the link Dan!

------
DeathRay2K
This has happened many times to me. I work at an agency that very often gets
clients who already have a codebase but don't have anyone to maintain it -
the original developers have moved on or fired them as a client. Sometimes the
codebases aren't complete, and very often have major bugs and issues. Very
often they were written by an individual many years ago, they do not follow
any modern best practices, and more often than not have no documentation.
Sometimes even variable names are in languages that no one on the team speaks
(I've taken on quite a few codebases where variables were anglicised versions
of Russian or Chinese words).

Most often, these clients initially just want us to maintain the codebase,
making minor changes and updates. In this case, we will simply familiarise
ourselves with the code, working within its limitations to do the required
work. Over time we might refactor parts of the code as we do this maintenance
work because it makes maintenance easier for us.

Eventually it gets to the point where the client wants major changes (And
sometimes it starts here). If we are comfortable with making these changes
within the codebase, typically if we have been maintaining it for some time,
we will refactor what makes sense to achieve the changes, and continue to work
within it. If we don't have that familiarity with the codebase and the project
is at a scale where it makes sense, we will rewrite the code at this point.

In very few cases, the project is too large and we are too unfamiliar with the
code, and we have to tell the client we're not able to do the work within
their time/budget requirements. At this point the client will either leave us,
or have us continue maintenance. About half the clients that leave shop
around, try out another agency or two, and come back in the end with a greater
understanding of the scope of the project.

------
Ocerge
This is pretty much my current job; leading a team of engineers on projects
involving legacy codebases where the original authors are long gone. The first
thing I always do is treat working software with respect. It's easy to be a
HN-commenter-pedant and assume it's all garbage, but context is everything; 9
times out of 10 the code is responsible for a good chunk of our salaries. The
second thing I try to do is lower expectations of clients. Often times, non-
technical people will be overjoyed to hear that somebody is working on this
black box they've been fighting for years, and it's up to you to keep their
expectations in check. The third is to fix broken tests, write new ones, and
learn to use grep :)

------
edw519
Oh yes, many times! I used to joke, "Why do I always inherit stuff like this?"
and my mentor would respond, "Because companies with good code bases don't
hire very often; their people are all so happy."

How I have dealt with it:

1\. Never complain. Never bad mouth any of my predecessors. Whatever they did
wrong, I probably did somewhere else just as badly. We all have.

2\. Never be bashful about what is wrong. Be objective. Be specific. Keep
asking questions until there are no more answers.

3\. Become the new expert. Don't depend on "higher-ups" for too much
judgement. Convincing them should be about as hard as convincing a mother that
her bleeding child needs a band-aid. If they need convincing, they are part of
the problem.

4\. Long conference room tables, paper, scotch tape, and multiple colored
highlighters are your friend. Others thought I was crazy, but I have had to
paper table and walls with technical debt to a) understand better and b) have
everyone visualize what we're up against.

5\. Don't be afraid to rewrite anything with a couple of caveats: a) Do it in
less than a week. b) Do it _only_ to understand it. Plan to throw away what
you rewrite. c) If you can use what you wrote instead of what's already there,
that's gravy.

6\. Priority of what to learn: a) The database. b) The code. I have even
written utilities to scan and label data to understand what the hard disk
looks like before ever venturing into the code mess.

7\. Priority of things to refactor/rewrite: a) Rename variables EXACTLY what
they are. This can be very difficult but will probably give you the biggest
bang for the buck. This step often opens the flood gates for everything else.
b) Remove duplicate code and modularize. c) Reduce long conditionals. 800 line
case statements suck. d) Remove early exits. The idea is not to improve the
performance of the code, it's to understand what it's doing. Multiple exits
can be very confusing. e) Fix white space, maybe more, maybe less. The process
of doing these things almost always provides better learning than just reading
it. Sometimes I have had to rewrite something and then throw it away just to
understand what we have.
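To illustrate point 7d (my own toy example, not edw519's code): collapsing early exits into a single exit point, purely as a comprehension aid:

```python
# Before: multiple exits make the control flow hard to follow while
# you're still learning the code.
def discount_before(order):
    if order is None:
        return 0
    if order["total"] < 100:
        return 0
    if order["vip"]:
        return 20
    return 10

# After: a single exit point. The conditions now read top to bottom as
# one decision. Behavior is identical; only readability changes.
def discount_after(order):
    result = 10
    if order is None or order["total"] < 100:
        result = 0
    elif order["vip"]:
        result = 20
    return result

# Same answer either way -- the refactor is for the reader, not the machine.
assert discount_before({"total": 150, "vip": True}) == \
       discount_after({"total": 150, "vip": True}) == 20
```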

8\. By the time you reach this step, you probably know more about what we've
got, what our problems are, and what the speed bumps will be in the future.
You probably won't have to convince anyone to do anything except to start
seeing you as an excellent resource.

9\. <sarcasm> Bitch about all of the above at home at night. You may ruin your
marriage, but at least you'll still have a job. </sarcasm>

~~~
krylon
> Rename variables EXACTLY what they are.

I have this coworker who has this terrible habit of giving the most
meaningless names imaginable. Local variables are often just named "tmp". Or
maybe tmpNum if it is a number. We are writing a program to implement various
tasks as background threads, and the classes have names like "Process01",
"Process02", etc. At one place, he declared five or six constants with SQL
queries, named sql1, sql2, ...

The worst thing, in a way, is that I managed to convince him to let me rename
the SQL constants, but he insists that naming a local variable "tmp" is
actually a Good Thing so you can immediately tell it is a local variable. I
kid you not. (For Great Cthulhu's sake, I _wish_ I was kidding!)

~~~
ConceptJunkie
Naming things correctly and consistently is at least 50% of good coding, and
at least 90% of programmers fail at it.

~~~
krylon
Well, you know what they say: There are two hard problems in programming,
cache invalidation, naming things, and off-by-one errors[1]. ;-)

But there still is a difference between trying to come up with good names and
failing on the one hand, and naming a variable "tmp" to indicate that it is a
local variable, because I totally cannot see that when looking at the code...

[1] I do have to admit that I totally love foreach-loops, when I was still
coding in C, I ran into / caused my fair share of off-by-one errors, they were
_nightmares_ to figure out.

------
james_s_tayler
I once inherited a large enterprise system I had to build a full working
development environment for and then learn the code base and help bring a team
up to speed.

The product was essentially made up of 5 code bases in a single repo that
built and deployed 5 different executables that worked together. That itself
wasn't so bad, it was mostly modern C# and I found it a decent code base to
work on.

What was bad was that one of the modules that we were expected to support was
written entirely in VB6. The last stable release of VB6 was in 1998. I
couldn't just download something and install it and work with the VB6. It
turns out that the accepted solution according to stackoverflow, and this is
the conclusion I came to independently too, was to go on eBay and try to buy a
copy of the software/compiler/IDE/whatever, and then to even get it to install and
work you need to do all kinds of things like turning off certain keys in the
registry before installation and enabling them again afterwards then doing a
bunch of other modifications to make sure that it actually works when run in
compatibility mode with Windows XP.

Our official line was we went from supporting 5/5 modules to 4/5 modules.

Fun times.

~~~
iamleppert
Do they not make Virtual Machines where you come from? Or am I missing
something?

~~~
james_s_tayler
Well no, that particular office didn't. The office I was a part of did.
Suffice to say their DevOps-fu was slightly weak compared to the rest of the
organization.

I was the one who actually set out to build the VM image.

Not a trivial task at all to get the VB6 side of it running. Which is the
exact reason we are supposed to do things like making VMs for development
environments.

It would be nice to be at all places in all points of time with perfect
knowledge in order to stop bullshit from happening in the first place but I
can only be here and now.

------
deckarep
I’ve always found that if you stare at something long enough you can
eventually understand it, and break it down into its core components. If your
business depends on it, don't be so quick to rewrite and abandon it, because
it likely has a ton of hardened edge cases baked into it that the business
relies upon.

You must study the code and document what you can. Reverse engineering it is
sometimes the only option in your quest to understand it. Then you’ll need to
make decisions about how to move forward given the constraints of the business
whether it be time, resources, other priorities. There is no one size fits all
answer but don’t be shy about vocalizing the risks and making sure the
business knows the pain points. Sometimes you have to sell the problem you now
have and get buy-in to really do something to fix the mess.

One thing I’ve realized is that when people don’t take the time to document
and write clean code sometimes it boils down to their idea of job security.

Other times it turns out the problem is there is no single owner of this code
base and it’s been hacked to death.

Lastly, when something is ugly enough and messy enough... then you can just
call it proprietary technology... I kid, I kid.

------
rjkennedy98
Yes, we have a few APIs written that no one on our team fully understands. All
are about 3000 lines in a single file. It takes a day to make even the
smallest changes (which we frequently have to do because of production
incidents). All the functions take or modify maps! It's crazy. In particular we
have one function that takes in 14 separate maps! And of course there are no
useful comments at all.

------
beat
I wrote about this a while back, calling it "Conway's Aftermath". My essay
includes approximately zero solutions, but it does explain the nature of the
problem.

[https://hackernoon.com/conways-
aftermath-a014749135e3](https://hackernoon.com/conways-aftermath-a014749135e3)

------
zelon88
Not so much on a team, but when working on a difficult and unfamiliar codebase
I usually start by taking a chunk and reformatting it to suit my style as
though I wrote it. Spacing, indentation, bracketing, etc.

Once I've got it to a point where I can read it with minimal cognitive load (I
like condensed code with little to no white space and no orphan brackets) I'll
make sure it still works and pick a spot or feature in the finished
application and try to find its code. Work backwards until I've figured out
what makes it tick.

In the process of doing that I usually see how a lot of other things tick and
get a sense of how and where the rube goldberg machine starts.

The hardest part is understanding the rationale of the developer. Many don't
impart such details in the comments. For that I'll try rewriting sections with
less code than the original and see what the adverse effects are.

------
YouKnowBetter
My introduction was in BASIC (Sinclair ZX80), I started my professional life as
a COBOL coder, and in the meantime I reverse engineered ASM to "unlock games".

I can assure you that at every step along the way, the code(base) I created
horrified me 5 years later.

My point being: I appreciate the advice "throw it away and do it properly", but
I can assure you that, given enough time, the next person will not understand
your solution.

Be it old languages, old paradigms, or old-school tricks of the trade: code
goes stale. The best advice on this subject I read here (a couple of times) is
to try to understand the requirements and take it from there. If that is not to
your liking, you probably are in the wrong line of work and should try to do
only greenfield stuff.

~~~
speedplane
Can't agree more here. I can't tell you how many times I've heard, "this code
is spaghetti, let's start over". I love clean architectures as much as anyone,
but good developers are not afraid to jump into foreign spaghetti code, make
precision cuts with a scalpel, and sew it all back up validated by robust
automated tests. Drop the desire for simplicity; instead learn, embrace, and
manage the complexity.

------
lkschubert8
I know this isn't what most would consider a "codebase" but at a college job
doing mostly CNC programming I had to troubleshoot problems with startup and
maintenance G Code for a 2.5 axis CNC machine that the owners didn't want to
pay to have the manufacturer consult on (smart move). The kicker was it was
entirely documented in Italian. It took a lot of meticulous documentation and
patience. It was honestly a great learning experience in terms of both reading
code and thinking through all of the outcomes of a change you made
(considering a mistake could have damaged the machine).

------
vikingcaffiene
Got handed a code base that had been slapped together by an offshore dev shop.
It was a spaghetti combo of c, php, and Java. The configuration data was
located inside a compiled c binary provided to us by the dev shop. If we
wanted to make a config change we had to ask them to do it and send us a new
binary. Kind of awkward since they had been fired due to slipped deadlines and
horribly buggy code (insert shocked face here). We scrapped the whole thing
eventually. I'm still blown away by how bad it was. Never seen anything like
it before or since.

------
ElatedOwl
Our team inherited a tangled spaghetti mess of a client facing API. There were
some additional requirements that needed to be added and we quickly discovered
some serious security issues.

We considered a full rewrite, but this was too time-consuming and did nothing
to solve the immediate issues. It was also risky: we had a "working"
production app.

We ended up writing an extremely thorough integration test suite. Making
changes is still painful, but we know for sure we aren't breaking anything. If
we ever have the time/drive to rewrite, the test suite could be re-utilized.
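A test suite like that is essentially a set of characterization tests: record what the current code does (right or wrong, today) and pin it, so refactors can't silently drift. A minimal sketch with an invented function:

```python
def legacy_tax(amount):
    # Stand-in for the inherited code; its rules are undocumented.
    return round(amount * 0.0825, 2)

# Characterization data: captured by running the real system against
# real inputs, NOT derived from a spec. If the legacy code has a bug,
# these values encode the bug -- intentionally, until someone decides
# the behavior should change.
CAPTURED = {
    100: 8.25,
    19.99: 1.65,
    0: 0.0,
}

def test_legacy_tax_unchanged():
    for amount, expected in CAPTURED.items():
        assert legacy_tax(amount) == expected

test_legacy_tax_unchanged()  # passes today; fails if a refactor drifts
```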

------
jerrysievert
twice, same company, two very different projects.

the first one was a mumps project, with cache. it was written by a competent
developer, but he was using it as a way to learn mumps to forward his career
in the health care industry. the issues were more specific to mumps itself,
while trying to maintain and add features (there are thousands of articles
online about issues in mumps, if you want to fully understand the struggle).
it was eventually rewritten, with tests, and supported by a small team.

the second was for the same company, but a very different developer. this
developer despised version control, and considered foxpro the "one true
language", even after Microsoft itself had abandoned it. the codebase was
riddled with bugs, fixed in various versions deployed for various customers,
so there were a ton of misc bugs and "features" strewn throughout 20-30
"codebases", but no comments, short variable names, and poor practices. from
what I could tell, the developer had been drunk for most of the development
and any changes, and thus the original was used as a template for features and
discarded as quickly as a simple web app could be written and tested.

otherwise, the codebases that I have inherited have been at least
understandable. sometimes best practices weren't used, or overly "clever"
solutions were chosen instead of larger, needed codebase changes, which made
the code much harder to maintain, but nothing has come close to those two.

------
JoeAltmaier
Lots of good stories. How to decide, rewrite or maintain?

I liken maintaining large, old, arcane code to the care and feeding of a
dinosaur. Maybe a Brontosaurus. It's sitting there cropping the treetops
happily, farting occasionally.

You want it to do something else, you have to poke it, prod it, yell at it and
it slowly gets up and takes three steps. Then it sits down again.

You'll never get very far that way, and it'll never do much more than it does.
If that's cool, ok. If not, then its time to consider another approach.

------
stillbourne
We had a perl "guru" who wrote a number of management scripts for servers.
They were located in inconsistent locations and were completely inscrutable
because they were obfuscated in a manner that I think he thought was clever.
Seriously, I don't think his code was obfuscated because he was malicious,
incompetent, or "making himself too indispensable to fire." I think he thought
writing tight, compact perl code that was unreadable by anyone else was an
imperative to being a good perl coder, like it was his philosophy of coding or
something. I mean, the documentation that he wrote for how to do stuff around
the network was fantastic, insightful, and robust. But his fucking code was
just awful. I basically had to run every perl script we could find through the
debugger to understand what it was doing and rewrite it from scratch so it
could be maintained in the future. These days I'm an actual software developer,
and I think of this dude's perl code whenever I write code. I think it has made
me a better developer, because I strive for maintainability and readability
over almost everything else. Mainly because reading his code and rewriting it
was akin to psychological torture and I nearly quit my job several times in the
process.

------
CodeWriter23
This is kind of a cop out IMO. There's got to be some entry point to gain
visibility into the code: put a breakpoint there, run it in a debugger, and
single-step/step-over until you get the basic idea of how it flows.
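If the codebase happens to be Python, you can even get that flow map without manual stepping by installing a trace hook (a sketch; a real codebase would want to filter frames by filename):

```python
import sys

calls = []

def tracer(frame, event, arg):
    # Record every Python function entry so we can see the call flow.
    if event == "call":
        calls.append(frame.f_code.co_name)
    return tracer

def helper():
    return 1

def entry_point():
    return helper() + helper()

sys.settrace(tracer)   # start tracing at the suspected entry point
entry_point()
sys.settrace(None)     # stop tracing

print(calls)  # ['entry_point', 'helper', 'helper']
```

One run like this over a real request gives you the skeleton of "how it flows" that you'd otherwise build by stepping.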

I'm speaking from experience, I inherited a project based on TaxiAnytime, aka
"Uber App Clone Source Code". What a complete clusterfuck, obviously written
in as incomprehensible a style as possible to create attachment sales for
customization.

In the PHP/Laravel code, three widely used patterns stood out. First, the
"single return statement pattern", which obviously creates the pyramid from
hell. In every method. Add to that the 700+ line methods. Everywhere. And the
icing on the cake: I'm going to invent the term "WET" here to describe it,
meaning the opposite of DRY. Did I say three? WET has the knock-on effect of
anti-encapsulation.
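The "single return statement pattern" described above tends to produce one extra nesting level per condition. A hypothetical Python sketch (the names and logic are invented for illustration, not taken from the Laravel code) contrasting it with guard clauses:

```python
# Hypothetical sketch: the "single return" pyramid vs. early returns.
# `order` and its fields are invented for illustration.

def process_pyramid(order):
    result = None
    if order is not None:
        if order.get("items"):
            if order.get("paid"):
                result = sum(i["price"] for i in order["items"])
    return result  # single exit, but three levels of nesting

def process_flat(order):
    # Guard clauses: bail out early, keep the happy path unindented.
    if order is None:
        return None
    if not order.get("items"):
        return None
    if not order.get("paid"):
        return None
    return sum(i["price"] for i in order["items"])
```

Both functions behave identically; the second just scales better as conditions accumulate, which is why reversing the pattern is usually a mechanical, low-risk refactor.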

I set out to reverse those patterns where I needed to make changes. In about 2
months as a one-man team, I had a handle on it and was extending the app. A
team of 5 should be able to make short work of it if they can get over the NIH
and YUK factors. And yes, beware the rewrite. Coding always seems easy, until
you need a resilient, functioning system.

~~~
potta_coffee
Love the term "anti-encapsulation".

------
chriswoodford
I'll open by saying I've only ever had bad experiences with complete re-writes,
and those experiences have shaped my aversion to them.

"[Working Effectively with Legacy Code]" by Michael Feathers really helped me
get through a situation like this.

My recommendation is not to try to understand the code per se, but understand
the business that the code was being used in/by.

From there, over time, just start writing really high level end-to-end tests
to represent what the business expects the codebase to do (i.e. starting at
the top of the [test pyramid]). This ends up acting as your safety net (your
"test harness").

Then it's less a matter of trying to understand what the code does, and more a
question of what the code should do. You can iterate level by level
into the test pyramid, documenting the code with tests and
refactoring/improving the code as you go.
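The testing approach described above is often called characterization (or "golden master") testing: rather than asserting what the code should do, you record what it currently does and pin that down as the safety net. A minimal Python sketch, where `legacy_price` is a hypothetical stand-in for the inherited code:

```python
# Characterization-test sketch: record the legacy behavior, then require
# any rewrite to reproduce it exactly. `legacy_price` is an invented
# stand-in for some inscrutable inherited routine.

def legacy_price(qty, member):
    # Imagine this is the legacy code nobody understands.
    p = qty * 10
    if member:
        p = p - p // 10  # some discount nobody documented
    return p

# Step 1: run the legacy code on representative inputs, record outputs.
golden = {(qty, member): legacy_price(qty, member)
          for qty in (1, 7, 100) for member in (False, True)}

# Step 2: the rewrite must reproduce the recorded behavior.
def new_price(qty, member):
    price = qty * 10
    return price - price // 10 if member else price

for (qty, member), expected in golden.items():
    assert new_price(qty, member) == expected, (qty, member)
```

The recorded `golden` table is the "test harness" in Feathers' sense: it doesn't claim the old behavior is correct, only that changes to it must now be deliberate.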

It's a long process (I'm about 4.5 years in and still going strong), but it
has allowed us to move fast while developing new features, with the by-product
of continually improving the code base as we went.

[test pyramid]:
[https://martinfowler.com/bliki/TestPyramid.html](https://martinfowler.com/bliki/TestPyramid.html)
[Working Effectively with Legacy Code]: [https://www.amazon.com/FEATHERS-WORK-
EFFECT-LEG-CODE/dp/0131...](https://www.amazon.com/FEATHERS-WORK-EFFECT-LEG-
CODE/dp/0131177052)

~~~
lostgame
>> My recommendation is not to try to understand the code per se, but
understand the business that the code was being used in/by.

I strongly agree with this. I've done at least 4 or 5 successful complete
rewrites of old code bases, and I have found that, rather than 'business', the
word for this might be 'context'.

If you can contextualize a piece of software, its functionality and its
operations, you can have a much better understanding of an existing codebase.

~~~
potta_coffee
What would you do if the codebase was actually 5 codebases absorbed from 5
different smaller companies? Assume that zero institutional knowledge about
the code or the business has been passed on.

~~~
bap
You are now in the platform business.

I have to assume someone is using the software, and therefore there is some
tribal knowledge of what it does? Otherwise, maybe this is SaaS software with
user-facing functionality exposed, which would let you begin decomposing
backwards from expected inputs and outputs. You're almost black-boxing at that
point.

I will admit that I have, on very rare occasion, scream-tested a piece of
software running on a server that nobody, either on the eng. team or within
the org, would claim ownership of or knowledge about.

~~~
potta_coffee
There's a surface level understanding of what it does but nobody really
understands how many of the large features really work, or what the actual
rules are that govern them. Yes, much of this is black box. Example: yesterday
I had to try to figure out what branch of code was compiled and deployed to
our server. Everyone had assumed it was the Master branch, but no...deploying
that branch fubared everything. I finally found the "working" branch of code.

------
tmaly
I did. I wrote lots of tests and stepped through it with a debugger to get a
handle on it.

------
aryehof
Certainly. One could argue that any large, complex system whose functional
requirements are determined (and best understood) by _external_ domain
experts is likely not understandable by software developers. This is
exacerbated by our industry having no standard way to map those complex
requirements into executable code.

The result is likely some big ball of mud, partially understood by the (now
gone) _original_ developers. One that is hacked at and refactored at the edges
by those who inherit it.

Consider an actuarial risk calculation model, a payroll system, an air traffic
control system, or perhaps something simpler like a model of a double-entry
accounting general ledger. Could you holistically understand the code base of
a double-entry accounting solution?

Such large, complex systems cannot normally be rewritten economically.
Instead, where possible, small parts are typically carved off, rewritten, and
delegated to, until the economics and the inability to add new features
absolutely force some attempt at a rewrite.

------
db48x
Absolutely. The most fun was a program that embedded Gecko to render web
pages. There was nothing particularly wrong with the code, but it was quite
complex. I handled this simply by being the specialist who did understand the
code. Of course, the problem was fractal: the build system used Automake and
Autoconf, and nobody understood it either. That was fun.

------
lowercased
There's a big difference between 'not understanding' and 'understanding that
this really is bad'. And there's a difference between something being 'bad' as
in "not my way of doing something" and 'bad' as in "this is fundamentally
insecure, flawed in these massively problematic ways, etc".

------
sonofgod
> How did you deal with it?

I quit six months in.

~~~
onemoresoop
> How did you deal with it?

I lost most of my hair six months in.

------
ericbrow
I inherited a large database that "supported" an application, when in fact the
application layer was built into the database. I took over from a begrudging
dev who was involuntarily transferred from dev/dba to just dev. In reviewing
the hundreds of ETL jobs, I found one that started off as SQL that invoked
visual basic script in an external file share, which at some point invoked a
small obfuscated machine code script located on yet another external file
share. Googling didn't tell me what the machine code actually did. The
disgruntled dev said that if I wasn't smart enough to figure it out then I
should quit. He had put this mess together with the idea that it would be job
security. Finally, his boss forced him to admit that the machine code stripped
a text field of spaces. The job was re-written in straight SQL and ran much
faster.

------
temporallobe
To be fair, it’s really hard to write maintainable code without support from
your org. If your infrastructure does not support automated tests or builds,
then you’re in for a treat. If the codebase was written by junior engineers
just winging it and then patched up later by more senior people, again it’s
gonna be a wild ride. What if the org doesn’t or didn’t even have peer
reviews? Even better! What if the code was never even documented or there was
never an SDD? I could go on and on, because I've seen all of it. It all depends
on how much money, care, and resources were put into the project originally.
Customers and management typically want the best bang for the buck and in the
beginning don’t want to be bothered with things like security, high
availability, failover, maintainability, or operational costs. They just want
to get to the finish line ASAP.

------
hnruss
No. I have inherited code that is very difficult to understand, though.

What typically happens is that it is mostly left alone. As the years go by,
small changes are made, only making the situation worse. Developers
occasionally offer to take the time to refactor it, but management refuses to
prioritize that work.

Eventually, new requirements are drafted that the existing code simply cannot
meet without first refactoring it. At this point, the development team has the
unfortunate responsibility to inform management that making the existing code
meet the new requirements will actually take more time than rewriting the
whole thing from scratch. The technical debt is now due.

The development team then hopes that the situation provided a learning
experience for management regarding code maintenance.

~~~
ironmagma
Interesting that this is the conclusion you've come to, because I've heard
from some colleagues that this is actually the correct approach for management
to take. Their argument would be that either way, to accommodate the new
functionality, the codebase would have to be refactored. And if it is
refactored proactively, well that will probably be in a way not compatible
with the future design, because you can't anticipate something you know
nothing about.

So it's a matter of 1 refactor vs. 2, and management chose 1 by delaying as
long as possible.

------
code_duck
A couple situations have arisen for me.

One, adopting necessary libraries that are lacking features, have way too many
features, or questionable code quality. Examples have been an animated
slideshow carousel, Python and PHP OAuth, and a WordPress theme. I thoroughly
read the code for each of those, deleted the parts I didn’t need, and
literally rearranged everything.

Then there are the times I go back to my code after a few years and have no
idea what is going on. If possible, I just leave it alone if it’s working.
Sometimes I need to use a new language or have developed a better coding
style. In those cases I’ve rewritten it with a close eye to whatever edge
cases I seem to be handling in the old code.

------
hasbot
Numerous times. The worst was when the only employee maintaining a little-used
product was fired for sexual harassment. I transferred to the
position not knowing anything about the product or the lack of developer
knowledge. The support person gave me a walk-through of the functionality as
did the QA person. I spent about four months figuring out the build system and
the code. For the code, it was just a matter of reading, reading, reading
until I came to understand it. Making matters worse, a massive refactor of the
code had been started but never completed. It was very stressful.

------
cimmanom
We left the servers running unmolested and un-upgraded behind a firewall
(thankfully they were super stable) until we could replace the 3
unmaintainable microservices that did something simple in the most complex
manner imaginable with a 50-line module within our monolith.

(In this case, technically one person on the team - our most sophisticated
developer - was able to decipher it. But every time he needed to touch it it
would take him 2 full days just to understand the code again. Just wasn’t
worth keeping when it was so pointless to have and so easy to replace.)

For a more complex codebase, the equation might have been different.

------
howard941
I inherited a project based on more than 10,000 lines of poorly commented,
undocumented 8088 assembly code used on a shopping-cart attached radio+LCD
panel product. It wasn't that bad an experience _except_ the product took code
updates over the air and the guy decided to use his own block checking code
instead of a CRC or even a checksum on the 256-byte blocks.

The custom algorithm was a suboptimal choice as it was prone to passing badly
corrupted blocks as correct. Worse, the first stage bootloader was in masked
ROM so a true fix wasn't possible, only workarounds.
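One reason homemade block checks tend to be "prone to passing badly corrupted blocks" is that simple additive schemes are blind to whole classes of corruption that a CRC catches. A small Python sketch (the block contents are invented; `zlib.crc32` stands in for whatever CRC the bootloader could have used):

```python
# Why a byte-sum "checksum" is a weak block check: reordering bytes leaves
# the sum unchanged, while a CRC detects it. Block contents are invented.
import zlib

def naive_check(block: bytes) -> int:
    return sum(block) % 256  # simple additive checksum

good = bytes(range(32))                          # pretend 32-byte OTA block
corrupt = bytes([good[1], good[0]]) + good[2:]   # first two bytes swapped

assert naive_check(good) == naive_check(corrupt)  # corruption passes!
assert zlib.crc32(good) != zlib.crc32(corrupt)    # CRC catches it
```

A custom ad hoc scheme can easily be even weaker than this sum, which is presumably how badly corrupted 256-byte blocks were sailing through over the air.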

------
Const-me
Earlier in my career, yes. I've noticed that the more years I've been
developing software, the easier it is for me to understand other people's code.

Also, when I see a codebase I don't understand, if I'm paid to work with it, I
spend my time learning the stuff. Debugging tools help: traditional general-
purpose debuggers, graphics debuggers like pix/renderdoc, network tools like
wireshark/fiddler, OS-based tools like strace/procmon. Different ones are the
most useful for different projects; sometimes even custom-built tools are the
most useful.

------
piccolbo
I inherited a code base that other people thought they understood. People have
very different tolerance levels for complexity and lack of control. People
here complain about bad coders but the incentives out there are all aligned
against quality: short stints and glorified short-term-thinking (a.k.a. Agile)
and proprietary code. Imagine if you were bound to work on a codebase for most
of your career, were given all the time you need to do your best and had all
your code in the open, forever part of your reputation. Would you write better
code?

------
tracker1
Usually it's a big single file of spaghetti, which is the bigger issue... I
tend to separate logic branches (if/else) into separate functions. In
addition, when there's a big if with no else that just returns, I'll reverse
the logic.

In the end, it's just like eating an elephant... one bite at a time.
Eventually you'll have everything broken into enough separate functions that
you understand the whole better and can cleanly rewrite the whole thing.
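The two moves described above, reversing a big `if` that has no `else` and extracting branches into named functions, can be sketched like this in Python (all names are invented for illustration):

```python
# Before: one big function, one big `if` with no else, logic inline.
def handle_before(req):
    out = []
    if req.get("valid"):
        # ...imagine hundreds of tangled lines here...
        for item in req["items"]:
            out.append(item.upper())
    return out

# After: reverse the condition into an early return, and extract the
# branch body into its own named function. One bite of the elephant.
def normalize_items(items):
    return [item.upper() for item in items]

def handle_after(req):
    if not req.get("valid"):
        return []
    return normalize_items(req["items"])
```

Each extraction is behavior-preserving on its own, which is what makes the bite-at-a-time approach safe even without a full understanding of the file.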

------
jghn
Yes. What’s worse is that it didn’t work as it was supposed to. After a lot of
effort we realized that the last remaining developer, who had just left, was
manually fudging things in the database instead of patching the code.

It requires going back to first principles: digging into the docs on the
systems it interacted with, interviewing users to see what they did and what
they expected. In short, a nightmare.

------
potta_coffee
Yes. We're allegedly "re-writing" it but the organization is too dysfunctional
to make any progress. Another guy and I are keeping the company alive
basically. I've spent many, many hours reverse-engineering and documenting
things. Sometimes I'll spend weeks to find out what single line of code needs
to be changed to fix a bug.

------
ironmagma
Thanks for the responses everyone! Definitely some entertaining stories here.
Don't worry, I wasn't thinking of rewriting one, just wondering aloud how
often this happens, and it sounds like a lot :) Really cool to hear from those
of you who deal with this all the time. Thanks for the great reads.

------
rwmj
I do not have fond memories of the "alite" CPU simulator. It was widely used
as a basis for scientific papers on CPU design back in the early/mid 90s. The
source code of alite was absolutely incomprehensible, which did make me wonder
about the validity of the published results.

------
liquidise
tl;dr: you have probably not thought through all the ways this system is used.
Make damn sure you have both rollback plans and a phased release in the works.
Otherwise, all your bravery and effort will be at risk. My experience follows.

I took over a suite of flash video players/recorders for a company years ago.
The variable names were exclusively 2 and 3 letter acronyms. No one on the
team even knew Actionscript. Up until my hire, the various builds were all
considered immutable since no one had any idea of where to start. Given their
integral nature to the business, this effectively stagnated entire areas of
product and business development.

What you put in your question was precisely my approach. I did my best to
reverse engineer requirements through code review and interviews with
salespeople. Once I had a confident list, I got approval and killed off what I
can only assume were swaths of unused features no one knew about. Rewrote +
rearchitected the entire thing from the ground up.

The release was worse than you would have expected. Turned out there were lots
of these players linked outside of our website. These versions of the app were
ones we had considered deprecated and were unknown to everyone in the company
including the cofounding CTO. This required immediate rollbacks and hastened
development to re-add features that were being used by a subset of our longest
lasting clients.

While the releases themselves were rocky, the entire effort was an
unquestionable success. The effort consolidated all the features into a single
reusable player and opened up years of feature dev that enabled millions in
new sales.

------
scarface74
Never rewrite existing code.

[https://www.joelonsoftware.com/2000/04/06/things-you-
should-...](https://www.joelonsoftware.com/2000/04/06/things-you-should-never-
do-part-i/)

~~~
JustSomeNobody
It's very sad, but understandable that you're getting down voted. It's
understandable because most devs want to put their own stink on a project so
rewriting code the "right way" is a way to do it. Problem is, they get 6, 8,
10 months in and figure out the same thing their predecessor did and leave.
It's sad, because they don't realize that if a piece of code is out in the
field and it's working, then you really shouldn't do a wholesale rewrite.

~~~
scarface74
Exactly. Everyone thinks that the predecessor is an idiot and they can do it
better. Every time you throw away existing code you lose business knowledge.

~~~
rpeden
I agree with you in general, but I think a better rule would be to _almost_
never rewrite code.

I know that everyone thinks their predecessor was an idiot. And I agree that
full rewrites are usually a bad idea. But sometimes, the predecessor genuinely
_was_ an idiot, and the code really is that bad. If the use cases and inputs
and desired outputs are well documented enough, a full rewrite can be the
right choice.

It depends on the size of the system, too. A full rewrite of something as
complex as Netscape is asking for trouble. If it's something that can be
rewritten in a day, or even a few weeks, it might be okay to go ahead and do it.

Of course, by the time you've got enough experience to accurately estimate how
long a rewrite will take, you've probably got enough experience to just
refactor the old code without going insane.

~~~
scarface74
If the inputs and outputs and the use cases are understood and it’s already
working, create an anti-corruption layer and treat it like a 3rd party binary
blob.

------
andirk
Yes it's called Big Fish Games: Casino.

It is a very popular fake money gambling app. Most of the code is very old and
no one knows how it works. The only work actually done is fixing bugs that the
previous bug fix caused.

------
wglb
Yes. And the author did not understand it at all either.

The code had to be destroyed and rewritten from scratch. It was absolutely
unsalvageable.

------
segmondy
1) Get it to run in a test/local environment.

------
olooney
Yes, frequently. A couple of highlights:

PostgreSQL allows C extensions. A function which used to work was now
segfaulting occasionally, so it needed to be fixed. Original author was gone,
nobody really understood the PostgreSQL extension system (which involves a ton
of macros.) No version control, no useful comments, no spec. I read the code
(the core of the logic was straight-forward C, it was just the interface to
PostgreSQL that was hard to understand) and wrote a new version in Python
(another language for which PostgreSQL supports extension functions.) That
version was too slow, by two orders of magnitude. Went back to the C
implementation. Did a little light refactoring to separate the core logic from
the PostgreSQL interface. Then wrote a test harness program in C to drive only
the inner function. Wrote unit tests until I reproduced the segfault.
Fixed the C code and tested it with my test harness. The PostgreSQL wrapper
just called the (now correct) inner function so it now worked too. Checked the
fixed code and unit tests into version control.

A PhD (no longer with the company) had fit some simple neural net models in R.
He'd written his own code for this because, according to him, the standard
packages in R didn't support a few of the bells and whistles he'd wanted like
ReLU activation. Not only was the code in-house, but the models themselves
were saved as serialized R objects. No specifications for any of this stuff.
Apparently the company had been using this code to score medium-sized databases
for several years. When we wanted to scale up to a much larger database
(approximately 30 billion rows) the problems with the R implementation became
apparent. Fairly slow, high memory usage, worse yet a slow memory leak on
large jobs, and worst of all occasional silent crashes where it would simply
stop and exit with a successful status of 0 and no error message. This time I
took the approach of reading the code and de-serialized R objects. I re-wrote
the implementation in Python using numpy arrays and wrote a small R program to
read the serialized models in the .RData format and emit a cleaned up JSON
object that could be easily read from Python. Luckily I didn't have to port
any of the cross-validation/training/optimization stuff; just the prediction
part. That meant 80% of the code could be safely ignored. However, the devil
was in all the one-off special cases in the serialized R objects, many of
which had behavior different from default. I would test by comparing the
predictions by the two programs on small batches of a million. These
predictions were floating point numbers between 0 and 1 so when both programs
agreed to within 1e-5 for all million I knew it was correct. It took a week to
track down all the special cases, though, and the special cases made it
impossible to just use a standard neural-net library like Keras. (We already
had several Keras models in production and a set of tools to manage them; that
would have been easy for us.) Proprietary code begets proprietary code I
guess. At least the new implementation was much faster and didn't leak memory
so could deal with the whole database in a single long running process. I
pitched the idea of re-training all models from scratch in Keras using our
more modern tools but management wanted 100% backwards compatibility and to
preserve the value-add of the PhD.
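The validation step described above (declaring the port correct when both implementations agree to within 1e-5 across a batch of inputs) can be sketched as follows. The two "models" here are toy stand-ins for the R original and the Python rewrite, not the actual proprietary code:

```python
# Sketch of validating a port by comparing predictions within a tolerance.
# `old_model` / `new_model` are invented stand-ins; real batches were
# millions of rows, compared in the same way.
import math

def old_model(x):
    return 1.0 / (1.0 + math.exp(-x))   # logistic score in (0, 1)

def new_model(x):
    # The rewrite: an algebraically equivalent formulation.
    e = math.exp(x)
    return e / (1.0 + e)

batch = [i / 100.0 for i in range(-500, 500)]
worst = max(abs(old_model(x) - new_model(x)) for x in batch)
assert worst < 1e-5, f"port disagrees: max diff {worst}"
```

Because the scores are floats in (0, 1), agreement to 1e-5 across a large random batch is strong evidence the special cases have all been reproduced, which is exactly what made the one-off serialized R objects so time-consuming to chase down.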

------
clueless123
First, write tests for every function point. After that you will understand
what the system does. Then rewrite the whole thing.

------
Fradow
First, you need to choose whether you are going to improve it or ultimately
abandon it. That's not an easy decision. If you decide to abandon it, you need
some very convincing arguments, and lots of them. Before making that decision,
work with the codebase, at least until you understand it well enough. It will
allow you to uncover arguments either way.

Improving a codebase is a well-known subject, so I'm not going to comment on
it further.

If you decide to ultimately abandon it, you need to understand it won't happen
tomorrow, and perhaps not before a few years (for example, I'm 2 years in with
a codebase I decided to abandon, and it's probably going to be at least 1 more
year in production). Stakeholders hate when you spend time just rewriting it
for the sake of it (from their perspective).

What you want to do instead is use a strangler pattern: your new codebase
should "strangle" the old one, and deliver value VERY quickly, which will
convince the stakeholders it was the right choice.

First, all new features go in the new codebase. If possible, start with easy
features that have as few dependencies as possible on the old codebase. Any call
between the two should be in a special wrapper in your new codebase, so you
can start having a sense for what code will need to be rewritten, at some
point.

Then, start to "strangle" the old codebase: wrap ALL calls so they go through
the new codebase first. That will allow you, somewhere in the future, to cut
off the old codebase part by part, and avoid the full rewrite effect, as well
as quickly revert to old codebase if bugs are uncovered.
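That wrapping step can be sketched minimally in Python (all names are invented; in practice the two sides might be separate services rather than functions in one file):

```python
# Strangler-pattern facade sketch. `legacy_invoice` and `modern_invoice`
# stand in for the old and new codebases.

def legacy_invoice(order_id):
    return f"legacy-invoice-{order_id}"

def modern_invoice(order_id):
    return f"modern-invoice-{order_id}"

# Which features have been cut over to the new codebase so far.
MIGRATED = {"invoice": False}

def invoice(order_id):
    """All callers go through this facade, never the legacy code directly.
    Flipping a flag cuts one feature over; flipping it back reverts."""
    impl = modern_invoice if MIGRATED["invoice"] else legacy_invoice
    return impl(order_id)

assert invoice(7) == "legacy-invoice-7"
MIGRATED["invoice"] = True      # cut this feature over to the new code
assert invoice(7) == "modern-invoice-7"
```

Because every call site already goes through the facade, each part-by-part cutover (and any revert after a bug) is a one-line change rather than a deployment of the whole rewrite.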

Once you have that, you can more easily identify which parts should be
replaced first: performance issues, too many bugs, new features needed...

When you reach the end, it's a matter of convincing stakeholders you
absolutely need to go the last mile, with good arguments. If you can't find
good arguments, you probably don't need to go the last mile yet.

------
jl6
I did once work on such a thing, deep in the rotten heart of finance in a life
insurance company. The design was more than 20 years old and was the result of
a port of a previous codebase to a new language. The previous codebase was
also a port from an even older language. And legend had it that the thing
started out as a Paradox database in the 1980s. Along the way, the code had
been translated without significant refactoring, so idioms from the older
systems had been brought forward without consideration as to whether they were
sane in the new one. The overall flow of the process was still based on the
original design, even though you would do it entirely differently in the
modern toolset (for example, intermediate results were written out to files,
then read back in during the next step, because the original system didn’t have
the concept of functions that could pass data around).

Subtle bugs had crept in over the years, usually relating to the different
sort orders or rounding rules of the different platforms. Some minor bugs had
become features (the users didn’t want the bugs fixed because they had become
reliant on the incorrect version and didn’t want to have to restate their
results with the correct version - the idea being that having to explain and
account for the change was more trouble than accepting the minor defect).

The codebase had been altered regularly with minor changes over the years, by
a succession of contractors whose names appeared like biblical king lists in
program headers. The changes were usually minor, and the approach had always
been to graft on some new functionality or extra edge case, rather than
redesign anything. Many of the contractors had been actuaries rather than
professional programmers, so there were hair-raising sections of code that
achieved the required result in extremely obtuse ways. Real outsider art.

There were huge vestigial sections of code and redundant outputs that nobody
ever used, but because it was part of a bigger end-to-end process that was
also poorly understood and onerous to test, those sections and outputs were
always kept just in case they were significant.

In a way, this was a relief. It meant you didn’t _need_ to understand a lot of
the code, as long as it kept producing the outputs everyone expected it to.

I was part of a project to migrate all this code onto yet another new
platform. Did I take the opportunity to do a grand refactoring? Heck no. The
project was already overdue when I arrived, and I had two other projects to
work on at the same time. So I did what all those contractors had done before
me. I lifted-and-shifted with the least invasive changes possible, ran just
barely enough tests to convince the users it was good, and moved on.

I still work for that company. The codebase is now more than 30 years old.
It’s had another platform migration since, as well as the same old stream of
minor change requests. Bolt-ons on top of bolt-ons on top of tactical kludges.

The thing is ugly and horrifying. Ramshackle and arcane. Congealed, not
designed. And yet... it’s managed to carry on producing the outputs that this
business needs it to. So is it really all bad?

------
ykevinator
The older I get the more I appreciate the power of comments and readmes.

------
Jtsummers
Org-mode and literate programming.

Bring all the code into a set of org-mode files; each file has one code block,
which emits back the original code. You can verify this using diff or (better)
putting the initial repo into git if it's not already and seeing if your
emitted code causes any unstaged changes to appear. Start dividing the blocks
into more logical chunks (includes, declarations, definitions, one block per
function implementation, etc.). Use a hierarchy like:

    
    
      * foo.c
      #+BEGIN_SRC c :tangle foo.c :noweb tangle
        <<foo-includes>>
        <<foo-file-variables>>
        <<foo-functions>>
      #+END_SRC
      ** includes
      #+NAME: foo-includes
      #+BEGIN_SRC c
        #include <stdio.h>
        #include "bar.h"
        #include "foo.h"
      #+END_SRC
      ** file variable declarations
      #+NAME: foo-file-variables
      #+BEGIN_SRC c
        int x, y, z;
        some_struct a, b, c;
      #+END_SRC
      ** Function bodies
      #+NAME: foo-functions
      #+BEGIN_SRC c :noweb tangle
        <<foo-baz>>
        <<foo-quux>>
      #+END_SRC
      *** baz: int->int
      #+NAME: foo-baz
      #+BEGIN_SRC c
        int baz (int i) {
          // some logic
          return some_val;
        }
      #+END_SRC
    

Those variable names are useless: figure out what they actually mean and
document them in the org file. Consider renaming them. There are no test
cases, so create a tests section in the org file and start writing simple test
programs (or use a unit test framework, though you may need to do some
significant refactoring before you can).

A benefit of org-mode here is that sometimes you want to test some
functionality (and have no unit tests yet). So you try to test baz from above.
But once you build that source file and a second foo_test.c file you find out
that foo has a function, quux, which has dependencies outside of just the
header files and foo.c itself. To build this thing you have to build bar.c and
maybe even the whole Linux kernel. "Shit!", you say, "How do I handle this?"
Well, org-mode to the rescue:

    
    
      * Testing Foo
      ** Testing baz
      #+BEGIN_SRC c :tangle foo_baz_test.c :noweb tangle
        <<foo-includes>>
        <<foo-file-variables>>
        <<foo-baz>>
        // maybe some other things like the header references.
        // some test code that is able to focus in on just the baz function
      #+END_SRC
    

You've fully isolated that one chunk, and quux is no longer being included in
the build for this test. Whatever problems it has don't impact you (for this
test). Once you figure out how to isolate or mock quux's dependencies so that
you aren't including the full Linux kernel, you can add it to the set of
tests. Now the foo file is fully brought under testing and you can more
effectively refactor it. And the mocks and all you've made allow you to move
to a proper unit test framework if you want, versus the ad hoc initial
framework we've produced here.

Even if you don't care about testing your code, the above process (sans the
testing part) will allow you to perform a dissection of your codebase and get
it documented and specced out properly.

------
Diederich
The company's entire code base was split into two: Java for the 'enterprise'
stuff, and Perl for everything else. The Java side had something close to 20
million lines of code, and the Perl side had about one million lines of code.
As always, I worked on the operational side.

NB: I rather like Perl, though I haven't used it much at work since leaving
this company a few years ago. The ...things that follow are not an indictment
against Perl. The problem wasn't the language.

Even used properly, Perl is a pretty dense language, so a million lines of
Perl in one place is quite impressive.

The company's code started in the mid 90s and continued to grow. When I
arrived, there were a couple of guys on the operations tools team with me that
had been there 7-9 years I believe. They understood the code better than
anybody, and they were both very bright guys, but things came up, at least a
couple of times a week, that would surprise and sometimes mystify them. After
some digging, debugging, and sniffing around, they'd usually get somewhere
near a root understanding.

That's fairly amazing to me. I've helped create some enormous code bases over
my decades-long career, in one case even larger than a million lines. And I'd
certainly find myself surprised from time to time, but never for very long.

So these millions of lines of code had, over the prior 17 years, grown
organically, and had, essentially, never been refactored.

The code itself was, for the most part, quite tidy. And the underlying
concepts and structures were pretty simple and elegant. They represented
approaches I mostly agreed with.

But...almost nothing had ever been removed. No refactors, over millions of
lines of code, over most of 20 years.

There were side effects everywhere. Many of the deep, underlying methods had
had so many arguments added that perhaps one third of them would be useful or
used in any given invocation.

And, of course, there were virtually no tests.

After somewhat getting up to speed on this system, I declared it DOA, and
started to push for a complete rewrite. I've been around the block enough
times to understand the hazards of that approach, and I didn't mentally pull
that trigger lightly or quickly.

But....the organization would not have it. The two senior guys on my team were
fine with the idea, but many other long-timers were not.

That code is, to this day, with no doubt another 100k lines of Perl added to
it, (poorly) powering fundamental and important pieces of a huge company you
have all heard of.

So management basically threw bodies at it. Lots of bodies. And, even though
the operational quality of the products was objectively fairly poor, the
particular market niche didn't need or demand better, so the company made and
continues to make a ton of money.

------
mikekchar
As others have said, I specialize in legacy code. It's the one area you can be
an expert in and know 100% for sure that it's never going to change ;-)

Some quick advice: Rewrites are almost always a bad idea. The requirements are
almost always at least as difficult to discover as they were in the first
place. You will also miss things or incorrectly decide that something isn't
important now. These can often kill your project before you get a chance to
replace the old system.

But the most important reason for not rewriting is because there are almost
always business reasons for extending the existing application during the
rewrite period. This gives you a moving target for the rewrite. Additionally,
you will find that the "legacy team" who is adding code to the existing system
will be seen in a better light than the "rewrite team" because they are
actively solving business problems. The "rewrite team" will be seen to have no
value until they ship something. As more and more features are added to the
legacy application, more and more resources will be added to it until someday
someone will say, "Why are we rewriting this again?" and cancel the rewrite.
It doesn't happen every time, but in my career I think I've seen it in about
90% of the rewrites.

So you need to get comfortable with the legacy code. The first thing to do is
to make the build and deploy process as painless as possible. You probably
can't get time allocated to do it, so with every piece of work you do, steal
some time for that. If you are on a project where they have "build teams" and
it's actually impossible to build the application yourself, fix that as a
matter of priority.

Once you can reasonably work on the code, you need to start introducing tests.
The best advice I can give is to read Michael Feathers's book "Working
Effectively with Legacy Code". This is a must read. I think there may be a
newer version of it, but even though the old version is very dated
technology-wise, the techniques are still rock solid.
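One of the core techniques from that book is breaking a hard dependency at a "seam" by subclassing and overriding, so you can get tests around code that talks to the outside world. A rough Python sketch (the class and method names here are invented for illustration):

```python
# A legacy class with a hard-wired dependency that makes testing painful.
class InvoiceMailer:
    def send(self, address, body):
        # All the logic you want under test funnels through one seam.
        self._smtp_send(address, body)

    def _smtp_send(self, address, body):
        # In the real system this would talk to an actual mail server.
        raise RuntimeError("no SMTP server available in tests")

# Subclass-and-override: redirect the seam so tests never touch the network.
class FakeInvoiceMailer(InvoiceMailer):
    def __init__(self):
        self.sent = []

    def _smtp_send(self, address, body):
        self.sent.append((address, body))

mailer = FakeInvoiceMailer()
mailer.send("ops@example.com", "invoice #42")
assert mailer.sent == [("ops@example.com", "invoice #42")]
```

The production class is untouched except for isolating the dependency behind one overridable method, which is usually a small, safe change even without tests.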

Fight the urge to refactor/rewrite large portions of the application. Instead,
pay attention to the code that you touch the most. Ensure that this code has
good tests and once it does, fence it off from the rest of the code base and
start improving it. Code that you never touch can be the crappiest in the
world. Code that you touch once only has a one time cost, so don't fret over
it. Code that you touch every single day needs to be amazing. Concentrate your
efforts there.
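One concrete way to "fence off" an improved area is a thin facade at its boundary, so the rest of the code base can only reach it through a narrow, tested interface. A sketch (all names here are invented; the point is the shape, not the details):

```python
# The messy legacy function the rest of the system used to call directly.
def _legacy_price_lookup(sku, region, flags, extra=None, mode=0):
    table = {"A1": 10.0, "B2": 25.0}
    return table.get(sku, 0.0) * (1.1 if region == "EU" else 1.0)

# The fence: a small, well-tested entry point. New code calls only this,
# and the underscore-prefixed internals become private to the fenced area.
def price_for(sku: str, region: str = "US") -> float:
    """Narrow, documented facade over the improved pricing code."""
    if not sku:
        raise ValueError("sku is required")
    return _legacy_price_lookup(sku, region, flags=None)

assert price_for("A1") == 10.0
```

Once every caller goes through the facade, you can rewrite what's behind it without the rest of the code base noticing.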

The last piece of advice I have is to look at the kinds of requests you get.
If you get a lot of similar requests for functionality (for example lots of
reports), then make that part of the system easy to work with. What you want
to do is match the ability to work with the code with the expectations of the
customer. If they intuitively think, "This should be easy", then work hard to
make it easy. Say things to your stakeholders like, "You/Users asking for
feature X expect this task to be easy for me to do. It's not. I need time to
make it easier." Usually they will see the sense in that. If they expect
_everything_ to be easy, use that back on them. "I can't rewrite the whole
application without stalling our business plans. I _can_ make some parts of
this easier than others though. Which parts are the most important? Note if
you say X is important to be easy, then I have to spend time up front to make
it easy. We have to be careful about our budget." That's the kind of language
that business people can understand.

Finally, have fun with the legacy code. You aren't likely to make it (much)
worse. Use the opportunity to experiment with new ideas. However, I caution
you to avoid the temptation to transition to newer technologies (you'll never
get it finished -- just like a rewrite). Instead, think about the _techniques_
in the newer technologies and start introducing them in your old code base.
IMHO, this is _always_ more fun than simply using something off the shelf
anyway. Ironically, I find that working on legacy code is the most liberating
thing I can do on a professional team. You can always say, "Well, this is
crap. Anybody mind if I replace it?" and almost always people will welcome it.

------
wolfgang42
I inherited a suite of .NET/WinForms applications that managed warehouse
shipments to major purchasers. They had been written and modified by a
succession of programmers with wildly different opinions of how to write a
program (from copy-paste duplication to massively overarchitected inheritance
trees; fully denormalized tables to 6th normal form; and everything in
between). I was the only software developer at the company, so there was
nobody else to ask how any of this worked; I had to figure it all out from
scratch. The steps I took were:

\- Pick the program with the most egregious errors. This was part of the suite
that would upload tracking information to the purchaser, and cost us large
fines when something went wrong.

\- Find the user(s) of the software and pay them a visit. Solicit buy-in
(there was some concern that the new IT director might have started this
initiative in an effort to automate people out of their jobs) by explaining
that I'm planning on making the software easier to use (easy, since it was
awful and everyone hated it), and then have them walk me through exactly how
they used it. This turned out to be a terribly inefficient process involving
lots of paperwork shuffling, but I ignored that temporarily in favor of just
finding out how it was supposed to work now, and what sort of ways it went
wrong. Get a list of likely low-hanging-fruit bugs.

\- Track down the source code to the program, and put it under version
control. Fortunately I had a copy of the previous developer's computer, which
had an up-to-date version once I found it. However I did have to test that it
did everything it was supposed to by running it in production, which was a bit
nerve-wracking.

\- Set up an automated updater (I used Microsoft's ClickOnce installer, which
checks for updates on a shared SMB drive), and replace all the copies of the
program I could find with auto-updating ones. (This required asking people to
pass around word of a replacement by word-of-mouth as they heard other people
were using it, since nobody had a list of all the users.)

\- Buy ReSharper, and start doing mechanical refactorings on the codebase to
fix the obvious and easy code smells. What the changes are doesn't really
matter much; the point of this exercise is to start to get a feel of where
everything is in the code. Since you're just using the ReSharper commands,
there's no risk of breaking anything by doing this.

\- Fix a few easy bugs, and push an update out to users. I started with making
a list view sortable (literally a one-checkbox change that saved 30 minutes a
day) and a few similar small issues. This immediately demonstrated an
unprecedented level of interest in the users' problems and also got them used
to using the auto-updater before any more major changes came along.

\- Continue with more major refactorings and bug fixes, pushing out a release
every few weeks (faster if you can focus on just the one project). I usually
tried to include at least a few user-facing changes in with the internal
stuff, but occasionally the release notes were just "better performance" or
"major internal improvements, so I can do feature X next week".

The really important part of this process is understanding not only how the
software works (and how it's _supposed_ to work)--which can probably only be
done by refactoring instead of rewriting--but also getting to know how the
users use it and what their actual needs are, so you can suggest improvements
that wouldn't necessarily be obvious to someone who doesn't understand the
entire system.

~~~
thrower123
> \- Buy ReSharper, and start doing mechanical refactorings on the codebase to
> fix the obvious and easy code smells. What the changes are doesn't really
> matter much; the point of this exercise is to start to get a feel of where
> everything is in the code. Since you're just using the ReSharper commands,
> there's no risk of breaking anything by doing this.

I love ReSharper, and it has been worth every penny that I've ever paid for
updates, but you still have to be very careful. I've walked into a few very
heinous bugs where simple refactorings have broken things badly. Mostly this
was because of people doing evil things with reflection and dependency
injection, that should never have been done, or because of arcane config-file
based development, where even ReSharper's excellent "Find Usages" and code
analysis engine cannot fully understand what is going on.

~~~
wolfgang42
This is true. Fortunately none of the code I worked with did anything like
that, but it's definitely important to be aware of.

------
jiveturkey
no but i’ve written one

------
tbirrell
Pick a starting point then write a lot of comments.

~~~
thrower123
Corollary: pick a starting point, and add a shit-ton of logging to trace what
is actually going on. Hide it under #if DEBUG if necessary, and hope that you
don't get a Schrodinger's Code situation where observing the code changes how
it behaves.

I've had a few more-or-less realtime multithreaded projects that I've worked
on where you can't really put a debugger on a system and halt it, without
breaking things in interesting and misleading ways, and the only good option
is to fall back to ye olde printf debugging.
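The Python analogue of hiding trace output behind `#if DEBUG` is gating it on a cheap runtime switch, so production runs pay almost nothing (a sketch; the `APP_TRACE` variable name and `handle` function are made up for illustration):

```python
import logging
import os

# Runtime equivalent of #if DEBUG: flip on tracing via an environment variable.
TRACE = os.environ.get("APP_TRACE") == "1"

logging.basicConfig(level=logging.DEBUG if TRACE else logging.WARNING)
log = logging.getLogger("legacy")

def handle(msg):
    if TRACE:  # guard skips even the formatting cost when tracing is off
        log.debug("handle() got %r", msg)
    return msg.upper()

handle("order received")
```

It doesn't eliminate the Schrodinger's Code risk entirely (logging still perturbs timing), but it keeps the instrumentation out of the hot path when it's switched off.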

