Ask HN: Have you ever inherited a codebase nobody on the team could understand?
185 points by ironmagma on Nov 28, 2018 | 217 comments
How did you deal with it? (Reverse engineer requirements + rewrite, convincing higher-ups to cut ties with the code, something else?)

I would be careful with some of the responses here. Over my career I've found that a significant subset of developers struggle with unfamiliar codebases. Sometimes this has to do with their experience being mostly with greenfield projects and other times it is because they have not seen a wide array of different work created by other people.

But sometimes it is good old fashioned workplace politics. It is risky to take on an unfamiliar codebase as any problems are now your problems in the eyes of management. And it is, practically speaking, impossible to account for all edge cases and surprises that may exist in a legacy codebase. Therefore, framing the codebase as terribly written and a total disaster achieves two things politically. It helps set up blame for any issues in the future on the previous developers and their "terrible codebase" and opens the door for a much more enjoyable and less risky greenfield rewrite.

My number 1 red flag of working with a developer, unless they are very early in their career, is hearing them describe a codebase as awful. Most really are not that bad and are usually just using unfamiliar and less than ideal design patterns and coding practices.

No (or long-broken) tests and no easy (at least partially automated and otherwise documented) way to build and/or run the code locally are the norm for others' codebases I've inherited. Those two things qualify it as "awful" I'd say, all on their own. Especially in languages like Ruby or JS where you're practically crippled in an unfamiliar codebase without tests and/or being able to poke around in the running application. Both being absent is a near-perfect signal there are tons of other problems, of the screw-up sort and not the we-had-to-cut-corners-for-actual-reasons sort.

Comically bad security holes, actual or implicit (framework-created) SQL queries in a for loop for no good reason (well, because the developer had no idea how to use SQL, and hadn't developed an appropriate allergy to unnecessary network communication, are probably the reasons), hilariously misguided attempts to fix the wrong thing to improve performance ("we'll move it to jruby!" well sure but your actual problem is you chose an inappropriate database and are using it poorly, but at least you made your build pipeline worse for marginal benefits, so there's that). Ruby in your node project just to run a very basic task queue (!?). Et c., et c., c-beams off the shoulder of Orion, tears in the rain, et c.
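To make the "SQL queries in a for loop" complaint concrete, here is a minimal sketch using Python's built-in sqlite3 module. The authors/posts schema and the function names are made up for illustration and don't come from any project mentioned here. The first function issues one extra query per row; the second gets the same rows with a single JOIN.

```python
import sqlite3


def make_db():
    # Hypothetical schema and data, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'ann'), (2, 'bob');
        INSERT INTO posts VALUES (1, 1, 'first'), (2, 2, 'second'), (3, 1, 'third');
    """)
    return conn


def titles_with_authors_loop(conn):
    # Anti-pattern: one query for the posts, then one more query per post.
    rows = conn.execute(
        "SELECT id, author_id, title FROM posts ORDER BY id"
    ).fetchall()
    result = []
    for _post_id, author_id, title in rows:
        name = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()[0]
        result.append((title, name))
    return result


def titles_with_authors_join(conn):
    # Same result in a single round trip to the database.
    return conn.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()
```

With an in-memory database the difference is invisible; over a network, the loop version pays one round trip per row.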

Inheriting something even half-decent is really, really unusual. I wouldn't bad-mouth half-decent. I just rarely see it.

[EDIT] this probably varies a great deal by platform. I imagine it's a bit less common to have a total train-wreck of an iOS app, to pick another platform I've worked on, than something server-side.

> we'll move it to jruby!

Hahahah Oh man, that's great. BTW, we had people who did this. Our performance is 2x better with half the servers after we undid this "optimization" at our org. I suggest benchmarking first.

Awesome! Here are some practical problems that used to give me headaches; can you give us some best practices on how to deal with these situations?

- untyped languages that run through A LOT of layers of wrappers/indirections to get things done; How do you keep track? (with typed languages it is easier to figure where the next step is going to; though code in java likes to do this trick - it then obscures the details by using some container mechanism, like spring, etc) Do you manage to reduce said numbers of indirection without breaking things?

- microservices like to do that a lot (going through n levels of services just to get to the target server - with different languages and RPC protocols involved). I had once spent a week to add a frigging attribute. Any tips here?

- gui code often turns into a mess (like when it takes just short of a week to add a frigging detail); how do you deal with that?

- how do you deal with the problem of missing context? (when trying to read through the source and there is nobody there to explain any rationale of what is going on?) (for me knowledge of other systems often helps to bridge the gap, are there any more empirical approaches?)

For me these things are hard, though i manage (sort of). How do you do?

In order—and mind I'm not like amazing at this so take none of this as best practice:

- Stepping through with a debugger. Static analysis call graphs in stricter languages[0]. Runtime call traces and visualization in dynamic languages. Often you can find some tool for it[1]. If it's that bad you're probably not gonna be able to keep your head wrapped around it long term, so you'll be falling back on those tools often until/unless you refactor.

- Similar. Find a way to trace requests (you are really, really gonna want such a tool when things go wrong anyway if you're microservice-heavy) and record what happens, if it's so confusing you can't figure it out otherwise.

- Not much to do but isolate and replace piece by piece or screen by screen as the opportunity presents itself. Hardest part of this is that your new stuff may well look and work better than the old, which can lead to UI inconsistency, so it ends up being an exercise in inconsistency tolerance or management.

- Just gotta leverage tools to tell you more about the codebase than you might get from just looking at it (as in point 1) and do a lot of probably-boring and hard-to-measure difficult work. If it's still under development that means someone ought to be able to tell you what each largish part (feature, screen, section) should do, at least from the perspective of the end user, which may be useful. If no one can tell you that then why are you maintaining it, right?
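The request tracing from the second point can be sketched roughly like this. It is a toy, in-process stand-in for real services: the service names, the TRACE_LOG list and the helper functions are all hypothetical, and "X-Request-ID" is just a commonly used header name, not something any particular framework requires. The idea is only that every hop reuses the same ID and records what it did.

```python
import uuid

# In a real system these records would go to a log aggregator or tracing backend.
TRACE_LOG = []


def record(request_id, service, event):
    TRACE_LOG.append((request_id, service, event))


def call_service(name, handler, headers, payload):
    # Reuse the incoming request ID, or mint one at the edge if it is missing.
    headers = dict(headers)
    request_id = headers.setdefault("X-Request-ID", str(uuid.uuid4()))
    record(request_id, name, "start")
    try:
        return handler(headers, payload)
    finally:
        record(request_id, name, "end")


# Three hypothetical services, each forwarding the headers it received.
def storage(headers, payload):
    return {"stored": payload}


def billing(headers, payload):
    return call_service("storage", storage, headers, payload)


def gateway(headers, payload):
    return call_service("billing", billing, headers, payload)


def hops_for(request_id):
    # Reconstruct the path one request took through the services.
    return [(svc, ev) for rid, svc, ev in TRACE_LOG if rid == request_id]
```

Calling `call_service("gateway", gateway, {}, {"attr": 1})` and then filtering the log by the generated ID shows which services the request passed through and in what order.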

Jumping off that fourth point: a great preventative measure for ending up in this position (or putting someone else there) is to make sure your code is written such that tools can more easily tell the reader facts about the project. Typescript over raw JS, that kind of thing. Static types are communication. They're communication that can be verified by a machine to be correct, more or less. They're great. Tests fail or pass. They tell you what the test writer expected to happen, and whether that's happening. They are communication. If all your comments get stripped and no-one updates the docs for two years and you get hit by a bus your types and tests still communicate. Even outdated test suites aren't totally useless. And static types pretty much can't go stale like docs or tests can.
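A tiny illustration of "types and tests are communication", sketched in Python with type hints (which need an external checker such as mypy to be verified, unlike a compiled language). The Invoice class and the tax rule are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    subtotal_cents: int
    tax_rate_percent: int


def total_cents(invoice: Invoice) -> int:
    # The signature alone tells the reader: an Invoice in, whole cents out.
    tax = invoice.subtotal_cents * invoice.tax_rate_percent // 100
    return invoice.subtotal_cents + tax


def test_total_includes_tax() -> None:
    # The test records what the author expected, in a form that can be re-run.
    assert total_cents(Invoice(subtotal_cents=1000, tax_rate_percent=20)) == 1200
```

Even with every comment stripped, a newcomer can still read off the units, the shape of the data and one expected result.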

[0] https://github.com/TrueFurby/go-callvis

[1] https://github.com/jamesmoriarty/call-graph (no endorsement, haven't used it, just an example.)

Being able to solve those problems like "it ain't no thing" is part of being a senior developer.

Sure, and man are the easy wins satisfying. Fixing shitty data access methods/patterns is one of the easiest ways to get a "WOW!" out of a client or product owner if you're stuck in the low-visibility silo of backend dev[0]. I've just been repeatedly surprised over the years at how very, very many people are making real money, and consistently finding work, yet don't seem to know which way is up. And I don't mean greenhorns, though often they're given comical levels of responsibility (typically by the cash-strapped or cheapskates) resulting in some real messes, which has a similar effect to when an incompetent "lead architect" or an experienced team nonetheless without a clue between them is set loose, which is just as common.

I harbor no ill will toward these folks. Hell, selfishly, I'm glad there are so many. Having half an idea what you're doing is guilt-inducingly easy, pays amazingly well—especially after the confidence-boost and resulting swagger and negotiating attitude that comes with seeing this kind of thing over, and over, and over, for years on end—and it's disturbingly easy to be a or the "smart one" in the room when you're kind of a dummy, in fact.

[0] Dear backend devs: if you don't simply love backend work and/or if you aren't very well appreciated compensation-wise, and especially if you have long-term career aspirations that involve shifting more toward the biz/architect/management side for the extra social status and higher late-career pay (in most of the industry outside the huge West-coast tech companies, anyway), consider moving to more high-visibility pastures—though ideally not web frontend, as, incredibly, it's still a trash fire and comp is so-so, mostly. Unless you just like trash fires, which some people do.

"Unless you like trash fires, which some people do."

I'm dying.

What are higher visibility pastures that are not web frontend?

Mobile, probably a lot of data-analysis jobs provided you get to present results. Desktop work if you can get it. Non-traditional UI like voice. You can spend (what will seem like) a silly amount of time selling what you've done in backend work, even, it's just a harder path.

I don't know about you, but I've rarely worked on a codebase that _wasn't_ awful in some way, and I am definitely not early in my career. I've come to the conclusion that most programmers are simply awful at their jobs, and the developers that can write clear, concise code are a small minority. I've known a few, but not many.

Especially in big companies it's really hard to keep your code concise. I have often started with a very clean design that totally fell apart when new requirements came in. If you give managers the choice to hack it into the code in 1x time or redesign the code in 2x time most of them will vote for the first option. Go through that cycle a few times and your code will be a big mess.

This 100%.

I’ve worked on several codebases over the years where a mgr said “here’s a special case, inputs of (foo, bar) should give baz2 not baz1.” Coder whips up a hacky workaround to that very specific ask in the simplest place possible, not the right place. A few dozen requests come in like this over the months. Overrides and exceptions are now scattered all over the code base, without any semblance of order, functions no longer do what they advertise because data is modified downstream, etc.

Sometimes the time to redo it right is worth the investment in terms of future maintenance. Other times it’s too foregone and you have to bite the bullet and stick with the mess.

It's hard to blame them. I've been in similar situations multiple times where a requirement seems simple enough on the surface that you only get a few hours to work on it. But it's actually not that easy to write it cleanly.

That hacky workaround you're talking about? Well, my task depends on it. The workaround too depends on another hack. With the amount of time I have to work on it, it's a no brainer.

Earlier in my career, I tried to fix them, but they started to break other "hacks" and down the rabbit hole I go. Sometimes it's just not worth it.

If you don't act overly pessimistic with that "1x vs 2x" comparison, management only sees that you offer two valid options and the second one takes double the time for no reason. So, obviously, they take option 1.

Maybe you can propose to hack the code in an accumulative +1x time from the previous hacks (thus making it clear that next hacks will take more and more time), or design it into the system with a nice and constant 2x. That could be a strong indication for management that the hackish way does have a long term cost.
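One way to read that proposal in numbers, as a rough sketch: assume the k-th hack costs k units of time (1x, 2x, 3x, ...) while designing each request in properly costs a constant 2x. Those cost models are my interpretation of the comment, not measured data.

```python
def cumulative_hack_cost(n):
    # Assumption: each hack costs 1x more than the previous one (1x, 2x, 3x, ...).
    return sum(range(1, n + 1))


def cumulative_design_cost(n):
    # Assumption: designing each request into the system costs a constant 2x.
    return 2 * n


def first_request_where_hacks_cost_more(limit=100):
    # Smallest number of requests at which the hack route is strictly more expensive.
    for n in range(1, limit + 1):
        if cumulative_hack_cost(n) > cumulative_design_cost(n):
            return n
    return None
```

Under these assumptions the two routes break even at three requests (6x each) and hacking is the more expensive route from the fourth request on.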

90% of developers think they're better than 90% of developers, but it's just an ego trip. Chances are, people have just had to compromise on design due to pressures from within the organisation. Or, as is often the case, they completed the exploratory coding phase of a new feature and the manager shipped it before they could refactor.

This so much. You talk with product managers etc about a new feature that could revolutionize a certain portion of your product so you start hacking away to get an alpha. The alpha is shown to some of the sales people and higher ups who start frothing at the mouth to put it into customers hands. The PM then says we need to ship this ASAP and into production goes code that was just feeling out whether or not the feature was even possible to begin with. Add a couple of months of no major refactor but tons of bug fixes plus additional features and you end up in a really fun place :]

There are so many bad developers out there that I find this difficult to believe.

Is it useful to calibrate the software scale in a way that concludes the majority of codebases are "awful" ?

What are we usefully communicating with the word, if 90% of codebases qualify?

Maybe software development is hard, sometimes unpleasant enough, to justify the crazy salaries, and the endless complaining about every codebase more than 2 years old is not quite justified..

As a collective group, software devs are really prone to self-flagellating while they go about delivering billions of dollars in value.

I've also rarely worked on a codebase that wasn't awful in some way, and I've come to the conclusion that maintaining best practices across a sprawling codebase is really hard.

Obviously. For 70 years now, the number of programmers has been doubling every 5 years. In consequence, on average, half the programmers have less than 5 years of experience. Therefore almost all the code will be produced in an awful way, using awful new reinvented programming language-du-jour, and new reinvented configuration-and-build-system-du-jour.

Yeah nah. We don’t describe code as “awful” because we truly want to rewrite from scratch. Code can be quite awful simply because a company has a high developer turnover due to hiring practices (you can code? Good, write our payroll system) and management (the beatings will continue until morale improves).

As a result you end up with every function using different nomenclature, different approaches to encapsulation, differing levels of compliance to Demeter, bizarre side effects because they made sense in the one day that developer was allowed for that feature, and so on.

So “less than ideal patterns and practices” does indeed count as “awful” code where the maintainer has to learn a new coding style for every file in the codebase.

Agreed. It's impractical to incorporate "ideal patterns and practices", but competent developers can get pretty close.

Having said that, I much prefer working with code that is very simple and redundant with a verbose API, than with a codebase that is overly refactored and abstracted.

Finding the balance of the two extremes would be my definition of an ideal pattern.

The first step is admitting you have a problem. If you can’t acknowledge badness as a group then entropy takes over.

I complain/observe about codebases all the time, especially on projects with an SLA. It’s a bit like fire code. When the shit hits the fan you [don’t] want people wandering around trying to figure stuff out.

Hallmark of a bad team (project management, really) is if the same three people are required for every emergency.

The interesting thing about complaining is that nobody wants to talk to the optimist about problems. It makes them feel small and stupid.

I find myself in the role of troubleshooter often, because I’m good at it, I acknowledge that the struggle is real, and at the end I try to summarize the experience, and diagnose where they went off the rails.

When you see 20% of the team have the same stumble, the problem isn’t with them. If the stumble was expensive then it needs to be addressed rather than blaming the victim. Nobody wants to talk to someone who shames them, either.

But I don’t believe in the classic rewrite. I believe in prevention and station keeping (don’t let it get that bad). The human brain can only deal with so much. If you want to keep adding to the project you have to keep cleaning old code and improving your practices. Slowly you’ll have a very different codebase, and a team where everyone senior can fix most problems.

Edit: a word

> But sometimes it is good old fashioned workplace politics. It is risky to take on an unfamiliar codebase as any problems are now your problems in the eyes of management. And it is, practically speaking, impossible to account for all edge cases and surprises that may exist in a legacy codebase.

When I was a more junior engineer, I thought I would be appreciated as helpful when I demonstrated that I could figure out and become productive in a new and unfamiliar codebase quickly. This was indeed the case! For a while, at least.

Over time, exactly what you predict set in. I was expected to account for all possible surprises in legacy code. In addition to being handed more tasks in legacy code, I was also expected to account for all edge cases and surprises in there. Being able to blame the previous developers, believed by management to be excellent and highly competent, would have saved me a lot of grief.

Perhaps unsurprisingly, this did not exactly thrill me and lead to wonderful performance on the job.

I agree that unfamiliarity should not awful a code base make.

However, there are degrees, and probably thresholds. If your code to manage web site users relies on string theory, then however concise, elegant and refined it is assumed to be, it will be awful to every programmer except the two in the world who have mastered programming, web site user management and string theory.

Otherwise, bad design patterns and bad coding practices definitely characterize awful code bases.

But the worst is when it’s so bad that you can hardly refactor it.

> My number 1 red flag of working with a developer, unless they are very early in their career, is hearing them describe a codebase as awful. Most really are not that bad and are usually just using unfamiliar and less than ideal design patterns and coding practices.

So, what if in reality, the codebase IS awful, and the product itself suffers from delays and setbacks even on seemingly simple tasks? Still a red flag if a developer says the sky is blue? Maybe you're the red flag in that case?

Yeah my biggest red flag is people who fail to have a proper discussion about what color the sky actually is.

Unfortunately people often aren't incentivized to discuss the actual color of the sky, but rather the color as either they would like to see it or that would suit their agenda the best.

It’s always awful. That means it’s never worth it to discuss how awful it is. You’re probably working with people who helped produce the awfulness, and it pays to be empathetic towards your coworkers.

Code isn’t produced in a vacuum and there are always external pressures for why this was hacked in rather than done correctly. It just kind of makes you come off as tone deaf and inexperienced. “My code is perfect and everyone else’s is garbage.” (It isn’t).

There are always concrete things that can be done to mitigate some of the awfulness and improve a codebase. A good developer will identify the root causes and fix them. Discuss and identify actual problems and solutions rather than your considerable (and understandable) angst.

> Code isn’t produced in a vacuum and there are always external pressures for why this was hacked in rather than done correctly.

Everybody has reasons for everything they do. Even if those reasons were made up after the fact to match the evidence (everybody wants to be sane even - no, especially - when they subconsciously do crazy stuff).

Anyone attempting self improvement has to first contend with the fact that their reasons are somewhere on a spectrum from wrong to excuses.

From a business perspective if you can tell me why it’s helping the business that’s the only reason that matters.

Also a very important life lesson to realize that the reason we did something isn’t a reason to keep it around. Sort of a “what have you done for me lately”. Hacking something four years ago to land a contract that saved the company isn’t a good reason to keep that code now. We did it because we did it, but we don’t have to keep it that way.

Most programmers working in a business know which codebases are the good ones and which are the bad. It's very unusual for a programmer to be given a good codebase as they are the most sought after, and therefore much politicking goes into getting those. Your number 1 red flag just shows the developer isn't competent or willing to take part in certain types of destructive office politics.

> unfamiliar and less than ideal design patterns and coding practices.

is this not what awful is?

The thing there is, all design patterns and coding practices are less than ideal.

People pick the best way they can come up with at a time and place, and everything is hunky-dory for one fleeting moment, and then the moment is gone. Any code that's good enough to survive will see the world change around it, until you've got a system that was developed for one niche and then spent years evolving into a new one as business and technical requirements changed, and will have plenty of quirks to show for it.

A good programmer can deal with that.

IMHO any non-trivial codebase will be awful in some parts. Codebases tell only half the story, and a programmer needs to also inherit the mindset that resulted in the code. Until that happens, there will be a research phase that will result in all the code being triaged multiple times, so the new coders can decide if the piece of code under triage is good, bad, or there's no way to know yet.

With that attitude anything a few years old will be called "awful", especially the latest, best shiniest code you are writing today.

I think badmouthing the developer who came before you is not good. But, a long succession of good developers, can jointly produce a codebase which is awful. Part of this is simply the fact that it has changed hands many times. Part of it is decisions made that the developers were not able to control. There are many other possible reasons. But, although most developers are trying to do a good job, and I think it is bad form to badmouth the developer before you, it is often quite simply the case that a large and old codebase is, in fact, truly awful.

Generally I agree, but there certainly are exceptions. When it's a code base "nobody" can understand and that no other team or developer wants to maintain because of its complexity or infamously poor prospect for maintenance, that's a sign of a poor codebase.

most really are that bad. try this on for size: look at your own code from 4 years ago.

I’m a consultant and make a living saving bad projects. That’s literally why I get phone calls for work. Keep in mind that I work in high level modern languages, I’m sure there’s some crazy proprietary cpu running a robot in a Detroit factory.

In any case there’s never been something that I’ve run into that I’ve not figured out. It takes time, and the hard part usually is not figuring out what it does but the weird edge case of the moon aligning with Venus and then the output suddenly changes. This is why understanding the requirements is more important than the code. I don’t care if the code is bad if I can more or less write a test case against it and make sure it does that.
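A minimal sketch of that "write a test case against it" idea, sometimes called a characterization test: record what the existing code actually returns for known inputs, then pin that behaviour down before touching anything. The legacy function and the recorded values below are invented stand-ins, not code from any real project.

```python
def legacy_shipping_cost(weight_kg, express):
    # Stand-in for inherited code whose rules nobody fully remembers.
    cost = 5 + weight_kg * 2
    if express:
        cost = cost * 3 // 2
    if weight_kg > 20 and not express:
        cost -= 4  # undocumented quirk that customers may depend on
    return cost


# Hypothetical input/output pairs recorded from the running system.
OBSERVED = {
    (1, False): 7,
    (1, True): 10,
    (25, False): 51,
    (25, True): 82,
}


def check_behaviour(func, observed):
    # Return every (args, expected, actual) triple where func deviates
    # from the recorded behaviour; an empty list means nothing changed.
    mismatches = []
    for args, expected in observed.items():
        actual = func(*args)
        if actual != expected:
            mismatches.append((args, expected, actual))
    return mismatches
```

Running `check_behaviour` against both the old function and any rewrite shows right away when one of those moon-aligning-with-Venus edge cases changes.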

That said, complete rewrites never happen. It usually is only rewriting portions when that is cheaper than fixing. It is the “if it isn’t broke, don’t fix it” adage.

The only time ever I’ve been stuck was when I saw a proprietary software implementation on top of a custom software package on top of Solr (technically on top of the JVM) create a large object heap issue on top of a proprietary OS (Windows). It wasn’t code related; I diagnosed that it was a GC issue, but it wasn’t in any code I had source to. A Windows update ended up fixing it. And this is why working with enterprise software is hard.

That's what I've been doing for the past 7 or so years. Turning around struggling products, gradually improving and then sometimes help a re-launch as "greenfield based on lessons learned" when the time is right.

It's not the most sexy work in the early stages but very rewarding when you help turn a failing situation around.

Sometimes it's bugs and bad algorithms or data structures. Sometimes it's misunderstood requirements and the fix has been surgical rewrites. Most often a mix.

> It's not the most sexy work in the early stages but very rewarding when you help turn a failing situation around.

And most often that struggling but working product still brings money and real value to people or business, not like most of greenfield shiny start-up..[cough!].. throw-away projects.

That's really interesting, and definitely agree knowing the requirements is hugely important. I've heard from people at BigCorps about inheriting projects where no one at the company really knows how or why a piece of code behaves the way it does. That seems like one of the more terrifying scenarios, where you can't even ask someone what a particular piece of code is supposed to do, and you have no way of knowing how it's supposed to work (and then you are tasked with fixing bugs when they crop up). But I guess those three conditions are unlikely to overlap probably (hopefully).

Have you used input/output proxy logging shims in these situations? They've been so invaluable to me in finding those weird edge cases that come up in 'orphaned software' projects as well as 'ancient legacy platform' migrations ;)
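For anyone unfamiliar with the idea, a logging shim can be as small as a wrapper that sits in front of the legacy function and records every input and output (or exception) before passing it along. This is only a sketch: `logging_shim` and `legacy_parse_amount` are hypothetical names, and a real shim would write to a file or log service rather than a list.

```python
import functools


def logging_shim(func, log):
    # Wrap func so every call's inputs and outcome are appended to log.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        entry = {"call": func.__name__, "args": args, "kwargs": kwargs}
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            entry["raised"] = repr(exc)
            log.append(entry)
            raise  # behaviour seen by callers stays unchanged
        entry["returned"] = result
        log.append(entry)
        return result

    return wrapper


def legacy_parse_amount(text):
    # Stand-in for an orphaned function with unknown edge cases.
    return int(text.strip().replace(",", ""))
```

Swapping the shimmed version in for the original leaves behaviour unchanged while the log slowly accumulates the real inputs, including the weird ones.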

Do you have any specialized tools (code navigation tools, for example) that you use when first encountering these large piles of code? I'd love to hear some recommendations; I have to deal with large (only sometimes bad, but always large) piles of vendor code. I'm currently staring at a pile of 900kloc of pretty nice code but it's a /lot/ of code.

Not OP, but I imagine this is fairly language specific. This is where Java shines. Keep in mind I mostly work with services not applications.

My process looks like this:

Step one: Identify sources of reflection; this is the trickiest. Hopefully the only dependencies are open source, so you generally know what they do and grep can usually find the rest.

Step two: go code spelunking. Find your entry points. Find your main() or framework equivalent. Find callsites for rest endpoints, rpc, jmx, etc.

Step three: find other "external request processing" endpoints. Do you have timer threads? Reading a Kafka stream and acting per record? Etc.

Once you understand those, you can interpret where most any stacktrace is from. Good old Intellij or Eclipse can give you all the callsites for a function as you root around. You should slowly get a feel for which part of the code things get called from.

Now start asking questions like: what data is shared between these entry points? What's mutable? Is it all done safely?

Hopefully this wasn't too narrow an example. I'd imagine it'll apply to any services.

I'm working in Linux wifi drivers, all in C. Giant complicated protocol with giant complicated code. I've been tinkering with Microsoft VSCode, CLion, and Sourcetrail. Vim + Ctags seems to work well in the beginning but only gives pinpoint answers (can find trees, but little view of the forest). Still experimenting.

It depends upon the platform, the language and the toolsets. For instance, for C, a trick I've used is to (if I can) use different C compilers and crank the warnings/errors to 11 and fix every complaint (or try to---it can be daunting to attempt this all at once).

Other tricks---run the code through linters or other stylistic nit-pickers and fix those too. I haven't yet used a code reformatter, but that's a quick way to get code into a consistent style.

I'm all in C, too: kernel drivers from wifi chipset vendors. There's one C compiler we can use (gcc) and we're even restricted to working with a specific version (because cross compiling). We have to be very careful about changing any of the code because we have to integrate any changes into the next drop of the vendors' code. Static analysis is our best bet. It's a really interesting problem.

Huh, I love working on untangling and managing large old codebases. Mind sharing your company or contacts?

Same - send us some info so we can do this too.

I'm not OP, but I would be turned off by this comment because it sounds demanding.

Yeah bit of an odd request for me to send you my business contacts. It really is being in an industry long enough to build a reputation to get things done. And these are enterprise projects, there’s no sexiness here. This is webforms in .NET type of stuff. Learning to ignore that the codebase is imperfect and will never be reasonable is one of the reasons I get calls.

Sorry, to clarify: I meant a means to contact you, not your prospects. I was just curious about your business and didn't mean to be intrusive.

That sounds like something for me. I love optimizing and improving existing systems.

I did - please bear witness while I work through my trauma. Variable names were actively obfuscatory, and their declarations were always many hundreds of lines distant from their use. And many weren't even used at all. Or some were assigned values that never got read or used for any purpose. In a language that generously provides many useful data types, booleans were being converted to strings so he could check whether it was equal to "true" (like t-r-u-e, the literal string). No generics: instead, arrays everywhere, even in cases where the number of items wasn't known ahead of time. (Just size it to 10,000 x 500 and hope you don't overrun. It eventually always did. So he would bump it up to 20,000 x 600 or something.) No modularity... everything just one big function. And if you need to do the same thing somewhere else, just copy the code over there. Oh but make subtle changes to it so that the two copy-pastas diverge from each other in subtle ways that could've easily been parametrized. Copy-pasta marinara over here, copy-pasta pesto over there. I could go on (and on), but in short, it sucked ass. I concur with others here that understanding the code (since it sucked ass anyway) ended up being something of a lower priority than understanding the business processes.

That was the key actually. At first I diligently tried to understand the code, refactor, rename, move things around... but at some point I crossed a bridge where I suddenly understood, this code is not an asset to be protected and cared for; it's a liability that sucks my time into it and creates not just low, but negative, productivity. And if I don't kill it now, it may suck other victims in. By that time I understood the business processes better and had better implementation ideas anyway, so I began "deleting with extreme prejudice" and rewriting. Even so, I still had to read and understand the code I was deleting or replacing, and was constantly thinking like this: https://www.youtube.com/watch?v=vbr9akNELdc (Yep that's Airplane! and you probably won't recognize the actor who 28 years later would play hitman/bodyguard Mike Ehrmantraut on Breaking Bad.)

Eventually I re-did the important parts and let the rest fall by the wayside. I didn't attempt to duplicate all the functionality embodied in that mess. It took 2 years but I remember the day I finally deleted the last piece of shitcode. Management was behind me all the way, because they were aware of some of the issues with my predecessor, plus over time they've become content for some reason to kind of just take my advice about this and other things almost implicitly. So I was lucky in that way; I was free to tackle it how I saw fit. They didn't really have much choice anyway though, for the kind of money I was making at the time! (not great)

Code that can't be quickly replaced is a liability, in my mind.

Your example is an extreme case, and I think that even in non-extreme cases, code that has inertia is bad. The inertia is just not AS bad as the inertia in your story.

If you can't, (as an example,) wrap your mind around a codebase that implements a bit of business logic, in a day, that code is too complex. It should be replaced with something that can be understood quickly, and changed quickly, when the person who maintains it today, dies, retires, or goes on vacation.

(In the above, substitute "day" for whatever period is suitable for your situation, and "person" for whatever employee-unit is suitable for your situation; "lead dev", "team", whatever.)

Software has more inertia than hardware, and that is INSANE. There is software at my employer that is OLDER THAN THE FUCKING X86 ARCHITECTURE, and has gone untouched for much of that time.

"If it works, why touch it?"

A business needs to understand what is running (for legal and other reasons) and needs to be able to fix it, change it, or replace it very quickly, because if it is implementing a business process, it is important for business, and if it is important for business, then it needs to be able to adapt as the business adapts.

Some things change very little, true; that doesn't mean everyone who worked on it should be allowed to retire, then rehired as contractors 20 years later when the mainframe hardware finally goes out of support.

I know of a very popular London startup whose entire database is in Spanish due to the initial dev work being outsourced to a development company in Spain. All table, column, and procedure names are in Spanish. A refactor is too risky and they are growing too fast, so all the engineers have to pick up basic... programmer Spanish?

That's a more literal example of not being able to understand the codebase I guess.

I'm Spanish and loathe Spanish-written code. It looks so unprofessional to the eye. Luckily, all the Spanish companies I've worked at had an English-only policy for code and comments. This was because the companies didn't rule out that foreign devs could join at some point in the future, and friction should not come from a lack of Spanish skills when reading/writing code. It did, however, make for some funny comments from people who didn't really know their way around prose writing in English.

At my company we're finally trying to write code in English (much to the despair of way more developers than I would have expected), but one serious difficulty we've encountered (apart from people who don't know how to write correct sentences...) is translating domain-specific terms. Either it seems like a bad translation, or it's not understandable at all.

I don't know what the solution should be in that case? Keep domain-specific terms in French?

I've also run into the reverse problem: mixing English and Dutch in confusing ways. For example, a common naming pattern is `GetFooByID()`, `GetFooByName()`, etc. I think it makes sense to stick to that, but if `Foo` is a non-English word it just gets confusing/inconsistent, especially because in some cases you would translate Foo and in other cases you don't.

On the other hand, translating domain-specific terms can also be very confusing. For example at my previous position we built a rental contract system that was very specific to the Dutch rental system/laws. We translated everything to English, but a lot of stuff is just funky because it's a specific Dutch term without a real English translation (for example names of specific laws/procedures).

My advice is to give up, destroy your code base, and become a sheep herder. You're screwed no matter what.

I'd say if it's a country-specific thing with domain terms that one would need to learn and know anyway to do the work, even if the programmer was foreigner... then keep those terms without translation. It will be less confusing.

My brother worked in this situation; they kept the domain-specific words in the local language because translating would be difficult.

Using a national language in code bases is indeed bad if you want to have them sold to foreign companies or contributed to by the international community. On the other hand, it may be good if you want to avoid this. I guess not a lot of the source code developed in China is hacked on by non-Chinese, or easily spied on by foreign powers...

In the mid-2000s, I inherited a code base that was supposed to recommend nearby points of interest (coffee shop! pizza place!) based on your zip code. It had been written by a contract programmer in India who naturally didn't understand how US zip codes worked. That was interesting.

(He also didn't understand how functions worked, but that's another matter...)

I can't wait to repeat this story at parties! I once tried to dig into the source of a game in Java written by a French programmer while I was in high school. I was able to glean how to use Image APIs from it but the code as a whole left me mystified.

It's been 10 years, I wonder if I could crack it now.

One of my first dev jobs years ago was at a non-technical company that had outsourced some previous development work, and the resulting HTML templates had all classes and styles in Spanish. The first "encabezamiento" class name (Spanish for "header") threw me for a loop, but luckily the patterns were similar enough!

I had something similar with German. The company paid for us all to have German lessons which was nice.

Refactor, test, repeat.

I contracted for a company that had no software team, and had outsourced the development of their embedded product to the lowest bidder. The original firm had delivered code that met most requirements, but were not willing/able to resolve issues with random crashes or implement additional features. The product manager reached out to my employer at the time for help.

When I took over the code base, my initial attempts to modify the function of the code rendered the device completely non functional, so I focused on restructuring the code without changing its behavior. I moved code into functions, functions into libraries. I added parameters to existing functions so that global variables would have to be injected, rather than accessed directly (this helped make it clear what the inputs and outputs to the function were.)

Eventually, I modeled state external to the system with state machines so that it would be clear when code was trying to manipulate a resource that hadn't been initialized yet. (This helped make some bugs stand out like a sore thumb.)
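The state-machine idea can be sketched as follows, assuming a hypothetical peripheral with three states and three events (none of these names are from the original project). Any event that is not listed for the current state raises immediately, so code that touches an uninitialized resource fails loudly instead of crashing at random later.

```python
from enum import Enum, auto


class State(Enum):
    UNINITIALIZED = auto()
    READY = auto()
    BUSY = auto()


# Allowed transitions for a hypothetical peripheral; anything else is a bug.
TRANSITIONS = {
    (State.UNINITIALIZED, "init"): State.READY,
    (State.READY, "start"): State.BUSY,
    (State.BUSY, "done"): State.READY,
}


class Peripheral:
    def __init__(self):
        self.state = State.UNINITIALIZED

    def fire(self, event):
        """Apply an event, or raise if it is illegal in the current state."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise RuntimeError(
                f"illegal event {event!r} in state {self.state.name}"
            )
        self.state = TRANSITIONS[key]
        return self.state
```

Calling `Peripheral().fire("start")` before `fire("init")` raises a `RuntimeError`, which is the kind of bug that "stands out like a sore thumb" once the states are explicit.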

Through incremental changes and testing at every step, this refactoring made the structure and flow of the program much easier to understand. After only 2 weeks of refactoring, I was able to identify and fix the bugs that had been causing the random crashes. I was also able to add new functionality to the well structured program in a fraction of the time it had taken to do the initial refactoring.

The best part about restructuring/refactoring code is that even after totally reorganizing the entire codebase, I still only had a high-level understanding of how it all worked; I didn't have to personally grok every requirement or fine detail as I would have needed to do if I'd rewritten the code from the ground up. Refactoring was slow going at first, but it really saved the day.

Well, if the code base is awful, it won't have tests to begin with... what do you test when you only have the bad code, no docs, no specs...

Yes. It was a vital subsystem in a Smalltalk program. While most of the project had passable Object Oriented organization, this one subsystem had zero instance methods and zero instance variables. Instead, it was copy-pasta after slightly modified copy-pasta of these long methods that called each other recursively. Each one of these methods used a "merging" style algorithm that incremented 4 indexes into arrays, all the while executing deeply nested conditional logic.

There were some smart cookies on the team, but we were all in fear of this code. The person who wrote it spent her days sitting in the cafe downstairs, reading novels. She'd check some logs, and occasionally come and yell at us for doing something wrong, which she would never explain. It turns out that the system had objects, but they were all embodied by consecutive spans of entries in those arrays. We went on a trek through the bowels of this corporation, asking around for documentation of the 3rd party software that used those sequences in those arrays, but no one had it, and that company no longer existed. If you tried to explain to her that Object Oriented code would have instance methods, she'd always bring up her PhD in math.

A coworker of mine spent a week charting out one of those methods, and managed to rewrite it in 1/6th the line count, with no errors. However, that didn't really help, as it still implemented the same weird merging algorithm.

Cool. I don't think I've ever heard a war story about a SmallTalk codebase before -- I have always gotten the impression that SmallTalk was a research/toy language similar to Haskell that is held in high regard but rarely used in production.

At one point, 80% of Fortune 500 companies were using it. No capital T. (This is how we used to identify the pointy hairs.)

Yes, most of it. Most (all?) code in a pressurised business environment eventually ends up in a bit of a bad state because technical perfection and maintainability are rarely what the devs are going for. They're just trying to "get it working, and now". How I've seen devs deal with it successfully:

1) Complain early and loudly about past mistakes from other devs, so that management know delays are not your fault. Once you've made enough noise, attempt step 2.

2) Reduce the scope of changes significantly. Management want X, but you explain only 10% of X can be done in the available time given the current code base. If they accept, great - you're touching less existing code, but may have to dig around a bit. If you can convince management that the reduced scope offers little business value, try step 3.

3) Push for a rewrite. Get budget, get resources and eventually deliver. If you're a good dev/dev manager, you may even get to be the hero that delivered something that works amazingly (aside: a lot of devs take on too much in this phase and often burn out). If you do deliver, you'll be worshipped as the authority on the system for months/years to come! Happy days!

Eventually, however, even your beautiful rewrite will decay into festering spaghetti, as random requirements get incorporated. You may even, deliberately, introduce complexity into your code to justify/protect your own job. At some point, the very thought of diving into your own code may fill you with dread, and you'll start searching job boards.

The cycle then repeats.

"After me cometh a Builder. Tell him, I too have known."


In a nutshell: people who make stuff are critical of other people who make similar stuff :P

People who repair stuff (or people) are even more critical of the repair work done by their peers.

I've never particularly cared or been much knowledgeable about poetry, but thanks for introducing me to this beautiful one. :)

That was my mistake when I first started. The manager didn't think it would take too long to finish a task. Well, the naive me, without even looking closely at the code, didn't think it would either.

But I realized the more I understood, the more dependencies needed to be changed. While it's still doable, things started to break and I ended up learning way more about the project than what was required to finish the task. In the end, it took me longer to finish it.

The fact that most of my coworkers would write a hacky solution in a short time made me feel incompetent. Managers without a technical background don't look at code, they look at results. So how do you even do #1 when the "other devs" still work there and can get stuff done much quicker?

The sad truth is that in order to bear the cost of a long time horizon in the pursuit of elegance you need one of two things: extreme trust or total control. Even then the pursuit may lead you astray and ultimately result in failure. Until there are ways to transfer experiences from one mind to another in an efficient manner, few managers/higher-ups will sympathize with a desire to take on more long-range, highly technical code improvements (assuming we're not talking about a FAANG).

That is essentially what I was hired out of school to do. I walked into a massive heap of ASP.NET (with VB) and MSSQL stored procedures that didn't really work at all, and had been through the wringer of a few cut-rate outsourcing groups. I struggled along with it for a few months figuring out how it was supposed to work and trying to duck-tape it together, doing a lot of support with customers that were trying to use it, and talking to them about what they were trying to do.

Then eventually I decided I wanted to learn some newer tech, so I started playing with Linq-to-SQL and ASP.NET MVC and Razor and Bootstrap, and over the course of three or four weekends and evenings I did a ground-up rewrite of the whole thing for fun. After a bit more time flailing away with the old mess, I showed my side-version of it off to my boss, and it wasn't that hard a sell, being much prettier and less buggy.

It helped that there was nobody around who was invested in the old code base.

Generally, I've found it is a lot easier to effect this kind of change if you just do it stealthily and present it as a fait-accompli, because otherwise people get so bogged down in debate and fear that any impetus to actually take a risk and do something evaporates.

The problem with going this route is you have to sacrifice personal time for it. Now in your case you taught yourself something new, but that's not always going to work.

I do agree with your assessment that it is much easier to drop a working thing in people's laps and get buy-in than ask for permission. This is also what causes pork barrelling in projects.

I'm doing something similar for a project I inherited from an outsourced company. Did you end up just giving it to your employer? Did you ask for compensation for all the non-work time you spent building it?

I was 23 and had a lot of time on my hands, and I was learning stuff and getting away from having to be on endless support calls with that gawdawful mess that I replaced. It was a pretty good trade, all things considered - I think I got a 15k raise that year, and built a lot of trust that I could do good work independently without a lot of oversight. On balance, I think I've been more than compensated for that time since then.

If you can rewrite it in 3 or 4 weekends, it's really not that complicated.

Either that or significant chunks of it were retained and your improvements were mostly cosmetic.

It really wasn't very complicated... basically just some queries to search a database and a front-end to display results.

I still don't know how the preceding mess got to the point of being such a baroque cobbled-together monstrosity, but it was enough copy-pasta to keep an Olive Garden supplied for a year, and a raft of incomprehensible stored procedures that did SELECT * FROM table WHERE xyz in bizarrely complicated ways involving multiple casts and nested temp results.

Or, they are actually good at their craft and rewrote it. That's not an unreasonable timeline. Taking code that does something and writing another version in another design pattern is much, much faster than starting from scratch. Even with complicated code (especially with complicated code)

I've worked on similar projects that could have been re-written relatively quickly but were only complicated because the developers made it complicated. Some devs just want to solve hard problems in "elegant" ways and will throw in elaborate inheritance chains, meta-programming, code generation, service oriented architectures, layers upon layers, frameworks, rules engines and anything else that's not boring business logic.

IME these projects are usually more complicated than all but the most wild of business logic.

The book Working Effectively With Legacy Code has a chapter titled "I don't understand the code well enough to change it" and another "My application has no structure". They both provide some techniques to get the understanding. Personally, I think if you don't understand a codebase, and you need to, you should start with making an attempt to understand it before you do anything else (reverse engineering, etc. -- how could you convince higher-ups to get rid of it, or rewrite it, if no one understands what it's doing in the first place? What exactly are you getting rid of?). Sometimes deleting things and seeing who complains / what tests fail can help, sure. Anyway, the book doesn't offer anything too mindblowing, but I've found it helpful. Make diagrams of the system (they don't need to be formal, start with just writing down each important-looking thing you find, and noticing important-looking relationships), print out code and mark it up, deleting any code you think is dead code, do some scratch refactoring (extracting methods, moving things around, generally making tiny bits of code clearer in the hopes that eventually the larger program will become clearer too) that you don't actually need to worry about checking in, and there are a few methods of explaining the system you can use to verify that you're actually beginning to understand it and where you need to focus more efforts (telling the "story" of the system, describing things with a type of naked CRC technique)...

I was hired for this exact reason at my last company and tasked with rewriting it while maintaining bit-for-bit identical output.

The company itself was in biotech (cancer diagnostics) and was relatively new, spun off from a rather well known research lab. They quickly realized that their system was incapable of scaling (or being maintained properly...) to the needs of a business.

The code itself was written primarily by one man. He was quite bright, but not really a software engineer by trade. Usual stuff; no source control, mish-mash of technologies, spaghetti code, and home grown algorithms and hardware to solve well understood problems with available solutions. New management wanted me to rewrite everything in C# (against my protests. Not because I dislike C#, only because the code dealt primarily with image analysis and hardware control/robotics.)

I began by doing exactly as you proposed; I reverse engineered every bit of code. I took extensive notes (it was hard to follow) and walked through each step from sample prep to image acquisition to analysis to result. I started writing each sub-system only after I understood how everything pieced together.

The real bitch of it was that the original developer relied on automating ImageJ for nearly all of the image analysis. As my original requirement was to not alter the results in any way, I literally rewrote large swaths of ImageJ in C#. Bugs and all.

Well, turns out ImageJ (Java) is compiled with /strictfp. C#/.NET does not support this, so my floating point results were oh so close, but not identical. This was initially a problem for management... until the CEO was replaced, along with my boss, and the new team thought the entire project was a dumb waste of money and had me build a new system from the ground up.
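To show why "oh so close, but not identical" is a hard requirement to meet with floating point, here is a small Python sketch. It does not reproduce the strictfp issue described above; it uses a simpler, well-known effect (floating-point addition is not associative) as a stand-in, and the helper name `bits` is made up for illustration. It contrasts a bit-for-bit comparison with a tolerance-based one.

```python
import math
import struct


def bits(x):
    """Return the raw IEEE 754 double bit pattern of x as a hex string."""
    return struct.pack(">d", x).hex()


# The same three numbers summed in two different orders.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

bit_identical = bits(a) == bits(b)                 # strict comparison
close_enough = math.isclose(a, b, rel_tol=1e-12)   # tolerance comparison

print(bit_identical, close_enough)  # prints: False True
```

A requirement phrased as "identical within a stated tolerance" is usually achievable across runtimes; "bit-for-bit identical" generally requires matching evaluation order and intermediate precision everywhere.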

That system was released (successfully) early this year. I began work on it nearly five years ago now, with many detours along the way. I now work elsewhere.

>maintaining bit-for-bit identical output

The worst projects are those when the company doesn't really want to upgrade, so the only requirement is "make it exactly like the old system"

In my first job out of college, I upgraded an approval workflow engine from VB3 to C#. It was written by someone who had never heard of state machines, so it had a weird ad-hoc design that would e.g. get confused if two documents were in the same state at the same time.

I demonstrated the bugs and suggested an alternative approach that would be simple and robust, but management wouldn't have it. They reminded me that the job was to make it exactly like the old system.
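For readers wondering what a state-machine version of such a workflow engine might look like, here is a minimal Python sketch. The statuses, actions, and class names are assumptions made for illustration, not the design that was actually proposed. The key property is that state is stored per document, so two documents sitting in the same state at the same time cannot interfere with each other.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"


# Hypothetical approval rules: (current status, action) -> next status.
RULES = {
    (Status.DRAFT, "submit"): Status.SUBMITTED,
    (Status.SUBMITTED, "approve"): Status.APPROVED,
    (Status.SUBMITTED, "reject"): Status.REJECTED,
    (Status.REJECTED, "submit"): Status.SUBMITTED,
}


class Workflow:
    def __init__(self):
        self.status_by_doc = {}  # state is tracked per document id

    def add(self, doc_id):
        self.status_by_doc[doc_id] = Status.DRAFT

    def apply(self, doc_id, action):
        """Move one document to its next status, or raise if not allowed."""
        current = self.status_by_doc[doc_id]
        nxt = RULES.get((current, action))
        if nxt is None:
            raise ValueError(f"{action!r} not allowed while {current.value}")
        self.status_by_doc[doc_id] = nxt
        return nxt
```

With this shape, submitting documents "a" and "b" and then approving only "a" leaves "b" in `SUBMITTED`, and the whole rule set is readable in one table.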

Yeah, it stinks, and it's rarely the right approach. In this case they wanted to avoid a complex re-validation, but it was short-sighted. The assay was not market ready yet anyway, we had no difficult-to-obtain clearances, and the original design was lacking in many ways. It was essentially image analysis techniques taken straight out of the 70's. The new system is much better.

If those calculations are business rules, then you absolutely do need to make it output the same as the old system in those ways because it's the foundation upon which a bunch of assumptions are made.

They were primarily intermediate values used in the overall process of finding a certain type of cell. The algorithm needed improving anyway in order to be production ready, so in this case identical output was not necessary.

I think research world has a lot of this kind of code. These projects are written by 1 or 2 and it is extremely hard to understand what really is under the hood.

It does, can confirm. I've rewritten more than a few MATLAB algorithms which, while novel, were of... questionable... quality.

>>> That system was released (successfully) early this year. I began work on it nearly five years ago now, with many detours along the way. I now work elsewhere.

At what part of the project did you leave? I can't imagine anyone staying in the same company doing the same rewrite for 5 years.

I stayed until I finished it. It wasn't a rewrite anymore, only the application stayed the same, and even then we added a ton of functionality (e.g. morphological analysis as well as identification.) It's something I wanted to complete as I literally built huge parts of it by myself and wanted it to succeed. I took what I learned at the previous company and had a chance to build something similar from the ground up under my own (technical) direction. It was a bit of a pride thing.

Honestly, having worked in medical devices for more than a decade, a five year dev cycle isn't crazy. All said and done it was about three years of dev due to the aforementioned detours (shiny object chasing by management.)

In the beginning there were two other devs on the team, but they were web guys and primarily concerned with CRUD stuff. They were laid off and ~1 year later we hired three more, but again, primarily concerned with the web side and third party integrations. I was all hardware, image processing, image viewing, and image analysis, only helping them when needed. These images are multiple gigapixel (~20GB uncompressed), so just managing and viewing them is a lot of work.

Can you recommend any good biotech companies in image/optics? I just got my degree in BioEng. in bio-optics and have been pretty unsuccessful in getting any traction on the job market.

I was involved in the digital pathology sector. Leica (formerly Aperio) has an office in Vista and I hear it's still a good place to work after the acquisition. You may also try Epic Sciences, indica labs, or BioImagine. There are far more, but you could also try going to local conferences. I used to attend pathology Visions each year. Lots of industry representation and new tech.

You probably know this already, but look for assay dev labs with existing software groups. Best of both worlds. Otherwise you run the risk of developing software in a culture which understands nothing of software dev. Not fun.

> Otherwise you run the risk of developing software in a culture which understands nothing of software dev. Not fun.

Preaching to the choir. My thesis was a lot of that, haha.

Thanks a TON for the recommendations!

Yes... but the company did not use it for long.

Large insurance business paid a previous contractor to write up a simple web app to consolidate public rates.

So this business could go and answer questions like "how much is my competition charging for XYZ?"

This "simple web app" turned into the previous contractor writing his own insane web framework from scratch in Python, because I guess Django or something was not good enough...?

Anyway the result was something that was almost impossible to read, had who knows how many security vulnerabilities, and was an awful experience for the insurance company.

Lots of times implementing a new feature meant changing the web framework so you could actually implement it.

Company already spent who knows how much on the previous contractor, so after a small cost working with me to evaluate what else was needed to complete the project they decided to go another direction.

I planned out how we could migrate the app slowly over to Django (it was a SPA as well, but he did at least use React there not something crazy he wrote himself) but they didn't have the budget.

Unfortunate. Could have been a really cool tool for not a lot of money, and I would love for businesses to be more eager about developing such products. The concept was the perfect example of a business-specific use of software to give the corporation an edge.

This sounds like a great opportunity to just go build the thing they needed then offer it back to them as a service or for a licensing fee. Then you have the option to offer it to other companies as well.

This is usually quite risky and difficult.

You might not have access to the data sources, you might not have the context to interpret them in a meaningful way, and then the company you are targeting is probably your only potential customer (or you have to cold sell to every competitor, so now you have a sales job, right?)

Also, after a project fails the business is always close to 0% likely to work with you on it, even if it isn't your fault (I was brought in as a clean-up man and my billing was insignificant).

I beat it like it stole something from my mother. I write comments, read it over and over, and make changes where possible. I stay glued to it like it is my newfound bible and become a guru of the code base through blood, sweat, and tears. It is the only way to handle new code: not to hate it, not to blame others, and not to think you could have done better. You befriend it, accept it as it is, and move forward with the best you've got.

This is the attitude man. I love huge legacy codebases, dive in and get dirty.

This is the job, just shut up and get on with it.

Yes, on several occasions.

Don't do rewrites. Morph the code towards "North." Old code has a reason for existing and being correct: It's there. From an evolutionary and survival of the fittest model, things that are out there already have a lot going for them. The ugly warts are battle scars of nasty edge cases and bugs.

This is one of the main reasons I strongly value the skill of reading code in engineers. It's rare and so important.

Good code is easy to read and change, and thus it is changed until it becomes bad code.

This is like the second law of thermodynamics applied to code.

> Old code has a reason for existing

There's also an assumption that the reason for it existing is still valid and necessary. MANY TIMES that's not the case. Without reviewing if things are still necessary, you're needlessly supporting stuff that is a net negative (and often leaves a wider security attack surface).

I recently looked at some old code I wrote in 2005, and modified slightly until about 2012. There's all sorts of mess in there, to support IE5 (not even IE6 which was the bane of anyone working in the corporate environment for far too long), flash based video, etc.

None of that is relevant any more.

> Old code has a reason for existing and being correct

This is a big issue I run into. I've spent the last ... 10 years working as an independent consultant, and am often brought into projects which are... a mess. Almost by definition - if they were good and you could hire cheap people to make stuff 'work' you wouldn't be calling me in.

This assumption that "it's correct" is probably the biggest sticking point I hit, in various incarnations. Was just talking to a colleague this morning about rewrites vs refactoring. Whenever I go for a rewrite, there's issues, but the refactoring is almost always way way way way way underestimated.

Recently, I came into a project 3 months ago that is 5 years old, built with 2 different web frameworks merged together (as in 2 different sets of session handlers clobbering each other at seemingly random times), a mix of raw JS, jQuery, Angular and Vue, 0 unit tests, 0 docs, things breaking, and the client/owners generally insist that "things used to work like XYZ". No, they didn't. In fact, I can point out that XYZ never worked. You think it did, but it didn't.

There's this assumption in "never rebuild" that "don't do it... you're no smarter than the people who came before you, you'll make the same mistakes, or worse, those 'ugly parts' are there for a reason, etc". And yet... you ALWAYS hear people say "don't ever roll your own crypto!". OK... well... at some point you have a conflict here, and you can argue for just 'refactoring' away the crypto, but you may come to realize that every major tenet of secure/modern practices has been seriously violated.

I'm not only fighting code, I'm wrestling with a client who thinks things were working just fine, or "hey, we might need a little tweak on ABC here". No... digging in everywhere, I've found "oh, that report about 'this number changed' last month? - the numbers are calculated wrong, and have been for 4 years, but this is the first person to report it". Having one numeric field be treated as both 'immutable' by some sections of the code, and 'live/updateable' by others...

If you have a full team of dedicated people who live/breathe project X, and you're all on the same page, working full-time towards a singular resolution... yes. Refactor. Agree on refactoring, schedules, priorities, etc.

> I strongly value the skill of reading code in engineers

If what you're reading ends up being the equivalent of "see dick run" level of code that's in charge of PII, and you're seeing spelling and grammar mistakes in what you're reading, there comes a point when you say "fixing this is not worth the time involved".

If this was a physical building or structure, and you determined that the foundations were so fundamentally bad and were constantly failing, or found that the materials used were misrepresented, you'd probably face some legal issues if you covered that up and kept building anyway.

If you're coming into something where it is clear the original team was in way over their head, you're facing multiple security issues and logic/data issues, you have no tests, sample data, or repeatable builds, the project will take weeks to get to a rebuildable state, and you're getting conflicting/contradictory input from stakeholders... a fresh rebuild will likely make more sense. Take segments from previous code if you can extract them, but only when it makes sense.

What I'm wrestling with now is currently owning 4 years of technical debt on my own. "Person A before you used to get this done in just a few days... Person B always worked much faster". Well... yes, because they didn't test, didn't ever fix anything correctly, and I'm now plugging 15 holes at the same time, and every change uncovers 4 more critical data integrity issues. I could also work fast if I was ignorant about what I was doing, or just lied about what the impact would be. And hey - let's not have any tests to run at all, so there's never a quick way to validate if I'm telling the truth when I say "this won't impact anything".

> "Person A before you used to get this done in just a few days... Person B always worked much faster". Well... yes, because they didn't test, didn't ever fix anything correctly, and I'm now plugging 15 holes at the same time, and every change uncovers 4 more critical data integrity issues. I could also work fast if I was ignorant about what I was doing, or just lied about what the impact would be. And hey - let's not have any tests to run at all, so there's never a quick way to validate if I'm telling the truth when I say "this won't impact anything".

Sounds pretty typical to me.

It's not atypical, but it's always a necessary ramp-up time to develop some level of trust and understanding between the parties.

I once inherited a compiled executable.

And source code that was demonstrably older than the binary.

It was written by a contractor, who was blackmailing us for the up-to-date source code.

In an FDA-regulated industry.

How did we deal with it?

We pretended nothing was wrong, and prayed that no show-stoppers would happen. We begged for permission to rewrite, but it was deemed to be too expensive.

And I left that company as soon as I could.

Maybe a little more clarification? When you say that nobody on the team can understand it, are you primarily talking about code complexity, code style, or language?

Many years ago, I became the designated maintainer for a legacy inventory system that was used for internal audit purposes. The system ran on a PDP-11/70, and was written in a combination of Basic-Plus (this was a long time ago) and COBOL. I only had a passing familiarity with either of those languages. I find the challenges of learning a code base usually boil down to a few recurring problem areas:

* Problem domain knowledge: The system was used to track leased telecommunication facilities, and had a lot of obscure business logic built into it. At least half the challenge was reverse engineering business logic from the code.

* Coding style: I always find it challenging to get comfortable with another developer's style. Mismatches in assumptions/preferred approaches can make it really hard to get comfortable with someone else's code. This particular system had some truly weird programming choices, including a "screen driver" (similar to curses) written in COBOL.

* Code complexity: I was pretty lucky that the code was not very complicated. With all the other challenges, if the code had been complex it would have probably been an impossible challenge.

* Language knowledge: the original developers had used some features of Basic-Plus and COBOL that were a little obscure, which made understanding the code base that much harder.

Inherited a codebase that ran nearly all revenue-critical operations, and operated at its core on some of the most metaclassy/dynamic tools Python has at its disposal.

Luckily I work in a tech company where the fact that nobody knew this code and it took years to effectively ramp up on was argument enough that it should go away.

Actual removal was a much messier story. It had tangled deeply into adjacent systems so you couldn't "just replace it". We are in the later phases of something like the Strangler Pattern (https://docs.microsoft.com/en-us/azure/architecture/patterns...) where we built higher-level interfaces over the top and gradually re-implemented the underlying functionality without using any of these custom frameworks.
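
A minimal sketch of that kind of incremental replacement, in Python. Every name here (`LegacyBilling`, `NewBilling`, `BillingFacade`, `migrated`) is hypothetical and not taken from the system described above; the only point is that callers talk to one higher-level interface while individual operations are moved over one at a time.

```python
class LegacyBilling:
    """Stand-in for the old, hard-to-understand implementation (hypothetical)."""

    def charge(self, customer_id, cents):
        return f"legacy-charge:{customer_id}:{cents}"

    def refund(self, customer_id, cents):
        return f"legacy-refund:{customer_id}:{cents}"


class NewBilling:
    """Stand-in for the re-implementation; only `charge` has been ported so far."""

    def charge(self, customer_id, cents):
        return f"new-charge:{customer_id}:{cents}"


class BillingFacade:
    """The interface callers use. Routes each operation to whichever side owns it."""

    def __init__(self, legacy, new, migrated):
        self._legacy = legacy
        self._new = new
        self._migrated = set(migrated)  # names of operations already ported

    def _target(self, operation):
        return self._new if operation in self._migrated else self._legacy

    def charge(self, customer_id, cents):
        return self._target("charge").charge(customer_id, cents)

    def refund(self, customer_id, cents):
        return self._target("refund").refund(customer_id, cents)


billing = BillingFacade(LegacyBilling(), NewBilling(), migrated={"charge"})
print(billing.charge("c42", 500))   # handled by the new code
print(billing.refund("c42", 500))   # still handled by the legacy code
```

Moving another operation over is then a matter of implementing it on the new side and adding its name to `migrated`; callers never change.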

That said, it's a long term project that is easy to lose steam on. It's been very important to regularly revisit our goals and how we're attacking them...AWS has released services that fundamentally changed our approach (for the better) in the years since this effort started and we've probably cut off at least a year from the overall effort by adopting those instead of continuing on the original course.

I wrote up some of these ideas about accomplishing big projects that span years at https://medium.com/@scott_triglia/ask-a-tech-lead-i-have-to-.... The parts about regularly re-evaluating the next steps in your course of action were directly inspired by this project I just described.

A coworker really wanted to be a technical lead on a firmware project, so our boss gave him one. Part way through, our boss asked me to help the coworker out, but I had a tough time understanding his code. Soon enough he admitted that he was leaving the company. He had created too much of a mess and wanted to bail out. Suddenly it was all my responsibility. And this was with a major customer we had multiple partnerships with. So somehow I had to salvage everything without making them aware of the mess we were in. So over the next 3 months we would give them weekly engineering builds while totally rewriting the code piece by piece. Once we were back on track it was much smoother. It helped that our management didn't micromanage me and the customer engineers were brilliant and a breeze to work with. It was all about the code, requirements and doing the right things. Our progress meetings were literally 15 minutes a week. Everything else was technical discussions and development.

I once joined a small company in which the CEO's son, a "7th year CS PHD" (He had to retake some classes), wrote the entirety of an iOS app using obfuscated C++ templates/macros (!)

During development a number of requirements changed and they had already burned through multiple other devs before hiring me. Eventually Xcode updated and it was required to use the new version to deploy against the latest version of iOS, this version of Xcode was not able to process his pile of macros in the same way as the old version. The QA team had already updated all of their test devices leaving us with no way to test the existing code.

This, combined with the CEO's son's unwillingness to sit down and walk anyone through the code, led to me sitting in a face to face with the CEO alone in which I explained all of this to the best of my abilities in layman's terms. He asked me for a solution and I said have your son fix it as he is unwilling/unable to walk anyone through it and being the 5th dev they had hired trying to figure it out, I put in my resignation. It was a fun 3 weeks.

I later found out through a friend who did media work for the company they were selling this product to, that they were never able to deliver and ended up getting sued for breach of contract.

Often enough to consider Software Archaeologist as a role.

- Big picture:

Try to identify the integration points with other systems or entry/exit points into the code.

See if the code is logically (and hopefully actually) divided into separate smaller parts. If it is, try to work out the main purpose of each part, its integration points, and if there are any obvious side-effects.

- Detail:

Is it building, clean-building, testing etc? That will make it a lot safer to explore and experiment.

For a single source file people have different approaches to "reading" it. Some people add notes as comments as they go through the file, they don't have to be permanent well formatted "Comments", rather just things to reduce the memory strain. Other people remove blank lines, comments and extra whitespace to try to compact as much actual code into a single screen to look at the code paths.

- Repository:

Is the code checked into a source control system with a log history? If so look in there for clues as to WHY things were changed, this gives a good indicator of changes to requirements and also can explain why some parts of code may "feel" different to others (they may have had to shoe-horn in a new change to an existing codebase).

- Pragmatic:

The previous people (just like you) probably never had a chance to refactor or clean up any tech debt.
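
For the "Repository" point above, a rough Python sketch of pulling one file's history out of git so the commit subjects can be read for the WHY. It assumes git is installed and the code lives in a git repository; the function names and the sample line are made up for illustration.

```python
import subprocess

# Tab-separated fields: short hash, author, date, subject.
LOG_FORMAT = "%h%x09%an%x09%ad%x09%s"


def parse_log(text):
    """Turn the tab-separated log output into a list of dicts."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        commit, author, date, subject = line.split("\t", 3)
        entries.append(
            {"commit": commit, "author": author, "date": date, "subject": subject}
        )
    return entries


def file_history(path, repo="."):
    """Ask git for the history of one file (requires git and a repository)."""
    result = subprocess.run(
        ["git", "log", "--follow", "--date=short", f"--format={LOG_FORMAT}", "--", path],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)


# Hypothetical sample line, in the same shape git would emit.
sample = "a1b2c3d\tPat\t2017-03-02\tHandle leap-year invoices\n"
print(parse_log(sample))
```

Only `parse_log` is exercised here; `file_history("some/file.py")` would need to be run inside a real checkout.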

No, but I have been the only one on the team willing and able to diagnose and fix bugs in that legacy codebase that everyone else was afraid to touch.

It probably still mostly works for what we want to use it for, so a rewrite is simply out of bounds. You just plant your face in the dirt, and start plowing ahead. You learn enough about it to get done what is needed, and get out of it as fast as you can.

Strange code isn't all that bad if you're the only one in it. And since you didn't write it in the first place, you can always blame anything that goes wrong on it being awful and brittle. And you can even get that module slander done preemptively, so that when you finally get something working, you're the conquering hero, returning home from battle with the monster. And if you break it beyond repair, you finally get to rewrite it. You can't really lose, except for the torture you undergo while you are actually wrestling with it.

Aside from terminal breakage, if it wasn't worth rewriting any year in the last 20 years, this is probably not the year, either. But sometimes you do the reverse engineering, and find that you can replace the whole crufty thing with 3 lines and a library function call somewhere in your regular code base, and now the execution step that used to take 3 hours takes 100 ms. That feels pretty good, in the moment. Less so when management just gives you a little pat on the head and says, "Well done. Run along, now."

The first and most important question is, why does nobody on the team understand it? One possibility, which should not be overlooked, is that it was not quite important enough to spend money on maintaining a team of people who understand it. Just as the Big Rewrite is often not as good an option as it seems, the Big Refactor is often not a good option either, because the software may in fact not be valuable enough to justify the many hours it would take to do that.

So, first off, try to make a realistic estimate for your higher-ups of how many hours it will take to refactor this, and phrase it as "at least...[x]...and perhaps much more". It is quite possible that you will get back the answer, "it's not worth that". Then, you are in the uncomfortable position of being the Bearer of Bad News.

Depending on the ability of your upper management to accept bad news, you then either: 1) gently and politely insist to them that the situation really is this bad, or 2) start looking for an exit

But, before setting yourself up as the person who brings bad news, get a gut check from some teammates as to whether your estimates of how much would be required, are more or less on target. It takes a while for organizations to accept bad news, and you may need to let people who say "it won't be that bad" win the argument for a while, and then circle back to it in a month or so.

I worked for a place where if the decision was between spending X hours now, or spending an unknown number of hours that would be likely to be 5-10x hours at some indeterminate point in the future, they would _always_ kick the can down the road. I can't figure that mentality out, when I would regularly predict trainwrecks and then say, "I told you so." when I turned out to be right, and yet, no one would listen to me. "Here's how you can avoid this being a problem in the future." was another thing no one cared to hear. The only way to fight technical debt was to do it in your spare time, in secret, and then announce it when it was done... and I'd talked to others who had the same attitude (including a manager).

I can't figure out how a company with that kind of culture could stay in business, but it did.

I think the only thing to do in that kind of situation is your number 2 option.

I have sometimes seen this, although thankfully not always. I think the best explanation of this mindset is, that everyone else is secretly thinking of the number 2 option as well, even in upper management. It's more common than you think.

Many times. Usually it wasn't a whole system but a component or library licensed from someone else, where "someone else" was unresponsive or out of business. In one case, the guy who'd written it used to go on month-long wilderness expeditions during which he was completely unreachable.

The solution almost always involved a certain amount of reverse engineering, and damn sure I always tried to advocate getting away from software we couldn't maintain effectively. The third leg of that tripod is isolation. Reduce the number of things that depend on the code, and the number of ways that they use it. If nothing else, that will reduce your exposure. It will also tell you what code paths are important to understand and which are not. Finally, it can help guide implementation of tests, or of a replacement.
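
One way the isolation idea could look in Python, as a sketch only: the rest of the code base imports a thin wrapper module instead of the library itself, and the wrapper counts which entry points are actually exercised. `_legacy_parse` and `_legacy_render` are hypothetical stand-ins for the unmaintainable code.

```python
from collections import Counter
import functools

usage = Counter()  # entry point name -> number of calls seen


def tracked(func):
    """Wrap a legacy entry point so every call is counted."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        usage[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


# --- hypothetical stand-ins for the unmaintainable library ---
def _legacy_parse(raw):
    return raw.split(",")


def _legacy_render(fields):
    return "|".join(fields)


# --- the only surface the rest of the code base is allowed to import ---
parse = tracked(_legacy_parse)
render = tracked(_legacy_render)

fields = parse("a,b,c")
parse("d,e")
print(render(fields))
print(usage.most_common())
```

Entry points that never show up in `usage` after a representative period are candidates to ignore; the busy ones are where tests or a replacement should start.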

If you're really stuck with such a piece of code, some of the advice in a blog post I wrote about a similar challenge might apply.


That's about learning a codebase that's notable mainly for being large, even if the original developers are as available and helpful as could be, but looking at it now I see quite a bit that applies to this case as well.

"but understand the business that the code was being used in/by."

From my experience (currently on a 10MM-line, 15-year-old codebase), in software the code is the business, so it's important to understand it.

First trap: "Oh, I'll just document everything."

I could say just do it, but do it for yourself. I could say that it won't be maintained over time and it'll rot and may do more harm than good when someone goes to reference it and thinks it's still an up-to-date understanding of the system.

But I'd rather just tell you to not waste your time doing it in the first place and focus on not falling into the second trap.

Second trap: Rewrite.

I could say it's easier to write new code than it is to understand it. I could tell you to be ambitious and stay up nights and weekends rewriting some view layer logic bullshit.

But I'd rather tell you that you're not as smart as you think, your solution may be more complex than the currently impenetrably complex behemoth before you, and that you should instead focus on not falling for trap 3.

Third trap: Replacement.

I could tell you there are cheaper off-the-shelf solutions available that solve the same problem; it's simply a matter of spending money and reading documentation.

But I'd rather encourage you to really, truly embrace the final realization.

Final realization: You are dumb and will never, ever understand all the complexities of this system as there are too many interdependent moving parts and other similarly complex subsystems.

And that's okay!

This is a system built over many years that consists of hundreds of thousands of tiny decisions made by hundreds of different "very smart" individuals just like you, solving complex business problems given the climate of the time it was built in.

So approach it like you would approach a beast in the wild, with caution and grace.

Heroes end up at the morgue, so just make sure the thing doesn't go down.

Yes, it was a mega legacy codebase written by a single person over the span of a decade and was extremely “job secured”. It had Perl scripts that would system call to php scripts that would in turn do a curl request to another http perl script that would system call another php script that would output HTML, which then would get parsed by the calling scripts several ways. That was just one place. There were lots and lots of these problems.

Our team was handed the project to rewrite it. It was a secret project and we were very careful. We didn’t want to spook the original developer.

It was a lot of hair pulling and tracking of the code. Lots of gruelling work.

Oh I inherited a similar Perl/php/shell script mess a few years ago. A fun twist was that the codebase started off as a commercial groovy/grails application, and had this Frankenstein thing surgically attached, reading and writing to the same database tables.

Still have nightmares about it now.

Would love to know what happened once you were done, especially was the original developer still around and did they have a funny panic mode upon realizing the job security rug had been pulled out from under them?

Eventually the dev got a whiff of it. He was let go and offered, I’m guessing, a lofty consulting position for a year. This was the exact scenario they were trying to avoid, but at least we pushed it back as far as we could.

The project eventually went from horrible to interesting once all of the legacy was dealt with.

There was a huge and really quite awesome discussion of bad codebases two weeks ago:


(But the current question is different enough that it has seeded a different kind of thread.)

I must have missed this in my weekly HN newsletter. Thanks for the link Dan!

This has happened many times to me. I work at an agency that very often gets clients who already have a codebase but don't have anyone to maintain it - the original developers have moved on or fired them as a client. Sometimes the codebases aren't complete, and very often have major bugs and issues. Very often they were written by an individual many years ago, they do not follow any modern best practices, and more often than not have no documentation. Sometimes even variable names are in languages that no one on the team speaks (I've taken on quite a few codebases where variables were anglicised versions of Russian or Chinese words).

Most often, these clients initially just want us to maintain the codebase, making minor changes and updates. In this case, we will simply familiarise ourselves with the code, working within its limitations to do the required work. Over time we might refactor parts of the code as we do this maintenance work because it makes maintenance easier for us.

Eventually it gets to the point where the client wants major changes (And sometimes it starts here). If we are comfortable with making these changes within the codebase, typically if we have been maintaining it for some time, we will refactor what makes sense to achieve the changes, and continue to work within it. If we don't have that familiarity with the codebase and the project is at a scale where it makes sense, we will rewrite the code at this point.

In very few cases, the project is too large and we are too unfamiliar with the code, and we have to tell the client we're not able to do the work within their time/budget requirements. At this point the client will either leave us, or have us continue maintenance. About half the clients that leave shop around, try out another agency or two, and come back in the end with a greater understanding of the scope of the project.

This is pretty much my current job; leading a team of engineers on projects involving legacy codebases where the original authors are long gone. The first thing I always do is treat working software with respect. It's easy to be an HN-commenter-pedant and assume it's all garbage, but context is everything; 9 times out of 10 the code is responsible for a good chunk of our salaries. The second thing I try to do is lower expectations of clients. Oftentimes, non-technical people will be overjoyed to hear that somebody is working on this black box they've been fighting for years, and it's up to you to keep their expectations in check. The third is to fix broken tests, write new ones, and learn to use grep :)

Oh yes, many times! I used to joke, "Why do I always inherit stuff like this?" and my mentor would respond, "Because companies with good code bases don't hire very often; their people are all so happy."

How I have dealt with it:

1. Never complain. Never bad mouth any of my predecessors. Whatever they did wrong, I probably did somewhere else just as badly. We all have.

2. Never be bashful about what is wrong. Be objective. Be specific. Keep asking questions until there are no more answers.

3. Become the new expert. Don't depend on "higher-ups" for too much judgement. Convincing them should be about as hard as convincing a mother that her bleeding child needs a band-aid. If they need convinced, they are part of the problem.

4. Long conference room tables, paper, scotch tape, and multiple colored highlighters are your friend. Others thought I was crazy, but I have had to paper table and walls with technical debt to a) understand better and b) have everyone visualize what we're up against.

5. Don't be afraid to rewrite anything with a couple of caveats: a) Do it in less than a week. b) Do it only to understand it. Plan to throw away what you rewrite. c) If you can use what you wrote instead of what's already there, that's gravy.

6. Priority of what to learn: a) The data base. b) The code. I have even written utilities to scan and label data to understand what the hard disk looks like before ever venturing into the code mess.

7. Priority of things to refactor/rewrite: a) Rename variables EXACTLY what they are. This can be very difficult but will probably give you the biggest bang for the buck. This step often opens the flood gates for everything else. b) Remove duplicate code and modularize. c) Reduce long conditionals. 800 line case statements suck. d) Remove early exits. The idea is not to improve the performance of the code, it's to understand what it's doing. Multiple exits can be very confusing. e) Fix white space, maybe more, maybe less. The process of doing these things almost always provides better learning than just reading it. Sometimes I have had to rewrite something and then throw it away just to understand what we have.

8. By the time you reach this step, you probably know more about what we've got, what our problems are, and what the speed bumps will be in the future. You probably won't have to convince anyone to do anything except start seeing you as an excellent resource.

9. <sarcasm> Bitch about all of the above at home at night. You may ruin your marriage, but at least you'll still have a job. </sarcasm>
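
To make item 7c above concrete, here is a small Python sketch of collapsing a long conditional into a lookup table. The transaction kinds and function names are invented for illustration; a real 800-line case statement will be messier, but the shape of the refactor is the same.

```python
def handle_before(kind, amount):
    # Original shape: one branch per case, growing without bound.
    if kind == "deposit":
        return amount
    elif kind == "withdrawal":
        return -amount
    elif kind == "fee":
        return -abs(amount)
    else:
        raise ValueError(f"unknown kind: {kind}")


# Same behaviour: one small named function per case plus a lookup table.
def apply_deposit(amount):
    return amount


def apply_withdrawal(amount):
    return -amount


def apply_fee(amount):
    return -abs(amount)


HANDLERS = {
    "deposit": apply_deposit,
    "withdrawal": apply_withdrawal,
    "fee": apply_fee,
}


def handle_after(kind, amount):
    try:
        handler = HANDLERS[kind]
    except KeyError:
        raise ValueError(f"unknown kind: {kind}") from None
    return handler(amount)


# Cheap check that the refactor did not change behaviour.
for kind in HANDLERS:
    for amount in (0, 5, -3):
        assert handle_before(kind, amount) == handle_after(kind, amount)
```

Keeping the old function around until the comparison loop passes fits the "rewrite it only to understand it" caveat in item 5.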

> Rename variables EXACTLY what they are.

I have this coworker who has this terrible habit of giving the most meaningless names imaginable. Local variables are often just named "tmp". Or maybe tmpNum if it is a number. We are writing a program to implement various tasks as background threads, and the classes have names like "Process01", "Process02", etc. At one place, he declared five or six constants with SQL queries, named sql1, sql2, ...

The worst thing, in a way, is that I managed to convince him to let me rename the SQL constants, but he insists that naming a local variable "tmp" is actually a Good Thing so you can immediately tell it is a local variable. I kid you not. (For Great Cthulhu's sake, I wish I was kidding!)

Naming things correctly and consistently is at least 50% of good coding, and at least 90% of programmers fail at it.

Well, you know what they say: There are two hard problems in programming, cache invalidation, naming things, and off-by-one errors[1]. ;-)

But there still is a difference between trying to come up with good names and failing on the one hand, and naming a variable "tmp" to indicate that it is a local variable, because I totally cannot see that when looking at the code...

[1] I do have to admit that I totally love foreach-loops, when I was still coding in C, I ran into / caused my fair share of off-by-one errors, they were nightmares to figure out.

>9. <sarcasm> Bitch about all of the above at home at night. You may ruin your marriage, but at least you'll still have a job. </sarcasm>

It is good to keep in touch with a bunch of fellow nerds as a support group to vent out.

The problem is that "If they need [to be] convinced, they are part of the problem." is the biggest reason these kinds of projects happen in the first place.

I once inherited a large enterprise system I had to build a full working development environment for and then learn the code base and help bring a team up to speed.

The product was essentially made up of 5 code bases in a single repo that built and deployed 5 different executables that worked together. That itself wasn't so bad, it was mostly modern C# and I found it a decent code base to work on.

What was bad was that one of the modules that we were expected to support was written entirely in VB6. The last stable release of VB6 was in 1998. I couldn't just download something, install it, and work with the VB6. It turns out that the accepted solution according to Stack Overflow (and this is the conclusion I came to independently too) was to go on eBay and try to buy a copy of the software/compiler/IDE/whatever. Then, to even get it to install and work, you need to do all kinds of things like turning off certain keys in the registry before installation and enabling them again afterwards, then doing a bunch of other modifications to make sure that it actually works when run in compatibility mode with Windows XP.

Our official line was we went from supporting 5/5 modules to 4/5 modules.

Fun times.

Ha, I just left a job that had multiple VB6 apps and services that we were actively maintaining and extending. Getting the development environment set up wasn't too bad. What drove me bat shit crazy was the fact that the mouse scroll wheel isn't supported in the IDE.

I was part of the third team that left in as many years. Just another lesson of a single individual being the only decision maker across all departments.

It was kind of sad everyone came in trying their best to improve the software and environment, but they were quickly shot down by their boss. If you didn't copy paste his code you'd get a snarky email a few days later about not following the guidelines, that were never written down mind you.

My lesson for rewriting a legacy application: have the right management in place. If there is an old dev who does not want to change, the project is a waste of time and money; he'll have more sway over the executive team and often get his way.

On all accounts mentioned that sounds like the ultimate torture.

I legitimately can't work out if such an intolerable boss is more painful or the constant need to scroll the IDE and having to do it without the mousewheel.

I feel for you. I truly do.

Do they not make Virtual Machines where you come from? Or am I missing something?

Well no, that particular office didn't. The office I was a part of did. Suffice to say their DevOps-fu was slightly weak compared to the rest of the organization.

I was the one who actually set out to build the VM image.

Not a trivial task at all to get the VB6 side of it running. Which is the exact reason we are supposed to do things like making VMs for development environments.

It would be nice to be at all places in all points of time with perfect knowledge in order to stop bullshit from happening in the first place but I can only be here and now.

Even getting a VM running that can properly support VB6 is a non-trivial exercise, especially if you want to properly adhere to Microsoft's licenses.

I’ve always found that if you stare at something long enough you can eventually understand it, and break it down into its core components. If your business depends on it, don’t be so quick to just rewrite it and abandon because it likely will have a ton of hardened edge cases that are baked into it that the business relies upon.

You must study the code and document what you can. Reverse engineering it is sometimes the only option in your quest to understand it. Then you’ll need to make decisions about how to move forward given the constraints of the business whether it be time, resources, other priorities. There is no one size fits all answer but don’t be shy about vocalizing the risks and making sure the business knows the pain points. Sometimes you have to sell the problem you now have and get buy-in to really do something to fix the mess.

One thing I’ve realized is that when people don’t take the time to document and write clean code sometimes it boils down to their idea of job security.

Other times it turns out the problem is there is no single owner of this code base and it’s been hacked to death.

Lastly, when something is ugly enough and messy enough...then you can just call it proprietary technology...I kid I kid.

Yes, we have a few APIs written that no one on our team fully understands. All are about 3000 lines in a single file. It takes a day to make even the smallest changes (which we frequently have to do because of production incidents). All the functions take or modify maps! It's crazy. In particular we have one function that takes in 14 separate maps! And of course there are no useful comments at all.

I wrote about this a while back, calling it "Conway's Aftermath". My essay includes approximately zero solutions, but it does explain the nature of the problem.


Not so much on a team, but when working on a difficult and unfamiliar codebase I usually start by taking a chunk and reformatting it to suit my style as though I wrote it. Spacing, indentation, bracketing, etc.

Once I've got it to a point where I can read it with minimal cognitive load (I like condensed code with little to no white space and no orphan brackets) I'll make sure it still works and pick a spot or feature in the finished application and try to find its code. Work backwards until I've figured out what makes it tick.

In the process of doing that I usually see how a lot of other things tick and get a sense of how and where the rube goldberg machine starts.

The hardest part is understanding the rationale of the developer. Many don't impart such details in the comments. For that I'll try rewriting sections with less code than the original and see what the adverse effects are.

My introduction was in BASIC (Sinclair ZX80), I started my professional life as a COBOL coder, and in the meantime I reversed ASM to "unlock games".

I can assure you that at every step along the way, the code(base) I created horrified me 5 years later.

My point being: I appreciate the advice "throw it away and do it properly", but I can assure you, given enough time, the next person will not understand your solution.

Be it old languages, old paradigms, or old skool tricks of the trade: code goes stale. The best advice on this subject I read here (a couple of times) is to try to understand the requirements and take it from there. If that is not to your liking, you are probably in the wrong line of work and should try to do only greenfield stuff.

Can't agree more here. I can't tell you how many times I've heard, "this code is spaghetti, let's start over". I love clean architectures as much as anyone, but good developers are not afraid to jump into foreign spaghetti code, make precision cuts with a scalpel, and sew it all back up validated by robust automated tests. Drop the desire for simplicity; instead learn, embrace, and manage the complexity.

I know this isn't what most would consider a "codebase" but at a college job doing mostly CNC programming I had to troubleshoot problems with startup and maintenance G Code for a 2.5 axis CNC machine that the owners didn't want to pay to have the manufacturer consult on (smart move). The kicker was it was entirely documented in Italian. It took a lot of meticulous documentation and patience. It was honestly a great learning experience in terms of both reading code and thinking through all of the outcomes of a change you made (considering a mistake could have damaged the machine).

Got handed a code base that had been slapped together by an offshore dev shop. It was a spaghetti combo of C, PHP, and Java. The configuration data was located inside a compiled C binary provided to us by the dev shop. If we wanted to make a config change we had to ask them to do it and send us a new binary. Kind of awkward since they had been fired due to slipped deadlines and horribly buggy code (insert shocked face here). We scrapped the whole thing eventually. I'm still blown away by how bad it was. Never seen anything like it before or since.

Our team inherited a tangled spaghetti mess of a client facing API. There were some additional requirements that needed to be added and we quickly discovered some serious security issues.

We considered a full rewrite, but this was too time consuming and did nothing to solve the immediate issues. It was also risky, we had a "working" production app.

We ended up writing an extremely thorough integration test suite. Making changes is still painful, but we know for sure we aren't breaking anything. If we ever have the time/drive to rewrite, the test suite could be reused.
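
One common way to build that kind of safety net is a characterization (or "golden master") test: record what the code does today and fail when that changes. The Python sketch below shows the idea only; it is an assumption about approach, not a description of the suite mentioned above, and `legacy_price` plus its inputs are hypothetical.

```python
import json


def legacy_price(quantity, unit_cents, coupon=None):
    """Hypothetical stand-in for an endpoint nobody fully understands."""
    total = quantity * unit_cents
    if coupon == "HALF":
        total //= 2
    if quantity >= 10:
        total -= 100
    return {"total_cents": total, "currency": "USD"}


CASES = [
    {"quantity": 1, "unit_cents": 250},
    {"quantity": 10, "unit_cents": 250},
    {"quantity": 3, "unit_cents": 199, "coupon": "HALF"},
]


def record(func, cases):
    """Run every case once and keep the observed output as the baseline."""
    return json.dumps([func(**case) for case in cases], sort_keys=True)


def check(func, cases, baseline):
    """Return the cases whose output no longer matches the baseline."""
    expected = json.loads(baseline)
    return [case for case, want in zip(cases, expected) if func(**case) != want]


baseline = record(legacy_price, CASES)       # normally written to a file and committed
print(check(legacy_price, CASES, baseline))  # an empty list means nothing changed
```

Because the baseline describes behaviour rather than implementation, the same cases can later be pointed at a rewritten function.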

twice, same company, two very different projects.

the first one was a mumps project, with cache. it was written by a competent developer, but he was using it as a way to learn mumps to forward his career in the health care industry. the issues were more specific to mumps itself, while trying to maintain and add features (there are thousands of articles online about issues in mumps, if you want to fully understand the struggle). it was eventually rewritten, with tests, and supported by a small team.

the second was for the same company, but a very different developer. this developer despised version control, and considered foxpro the "one true language", even after Microsoft itself had abandoned it. the codebase was riddled with bugs, fixed in various versions deployed for various customers, so there were a ton of misc bugs and "features" strewn throughout 20-30 "codebases", but no comments, short variable names, and poor practices. from what I could tell, the developer had been drunk for most of the development and any changes, and thus the original was used as a template for features and discarded as quickly as a simple web app could be written and tested.

otherwise, the codebases that I have inherited have been at least understandable, but sometimes best practices weren't used, or too "clever" of solutions were chosen instead of making larger needed codebase changes, which meant much more difficult code to maintain. but nothing has come close to those two.

Lots of good stories. How to decide, rewrite or maintain?

I liken maintaining large, old, arcane code to the care and feeding of a dinosaur. Maybe a Brontosaurus. It's sitting there cropping the treetops happily, farting occasionally.

You want it to do something else, you have to poke it, prod it, yell at it and it slowly gets up and takes three steps. Then it sits down again.

You'll never get very far that way, and it'll never do much more than it does. If that's cool, ok. If not, then it's time to consider another approach.

We had a perl "guru" who wrote a number of management scripts for servers. They were located in inconsistent locations and were completely inscrutable because they were obfuscated in a manner that I think he thought was clever. Seriously, I don't think his code was obfuscated because he was malicious, incompetent, or "making himself too indispensable to fire." I think he thought writing tight compact perl code that was unreadable by anyone else was an imperative to being a good perl coder. I think it was like his philosophy of coding or something. I mean, the documentation that he wrote for how to do stuff around the network was fantastic, insightful, and robust. But his fucking code was just awful. I basically had to run every perl script we could find through the debugger to understand what it was doing and rewrite it from scratch so it could be maintained in the future. These days I'm an actual software developer, and I think of this dude's perl code whenever I write code. I think it has made me a better developer because I strive for maintainability and readability over almost everything else. Mainly because I feel like reading his code and rewriting it was akin to psychological torture and I nearly quit my job several times in the process.

This is kind of a cop out IMO. There's got to be some entry point to gain visibility into the code, put a breakpoint there, run in debugger, single step/step over until you get the basic idea of how it flows.

I'm speaking from experience, I inherited a project based on TaxiAnytime, aka "Uber App Clone Source Code". What a complete clusterfuck, obviously written in as incomprehensible a style as possible to create attachment sales for customization.

In the PHP/Laravel code three widely used patterns stand out, the "single return statement pattern", which obviously creates the pyramid from hell. In every method. Add to that, the 700+ line methods. Everywhere. And the icing on the cake, I'm going to invent the term "WET" here to describe it. That means the opposite of DRY. Did I say three? WET has the knock-on effect of anti-encapsulation.

I set out to reverse those patterns where I needed to make changes. In about 2 months as a single man team, I had a handle on it and was extending the app. A team of 5 should be able to make short work of it if they can get over the NIH and YUK factors. And yes, beware the rewrite. Coding always seems like it is easy. Until you need a resilient functioning system.

Love the term "anti-encapsulation".

I'll open by saying I've only ever had bad experiences with complete re-writes and these experiences have impacted my aversion to them.

"[Working Effectively with Legacy Code]" by Michael Feathers really helped me get through a situation like this.

My recommendation is not to try to understand the code per se, but understand the business that the code was being used in/by.

From there, over time, just start writing really high level end-to-end tests to represent what the business expects the codebase to do (i.e. starting at the top of the [test pyramid]). This ends up acting as your safety net (your "test harness").

Then it's less a matter of trying to understand what the code does, and becomes a question of what the code should do. You can iterate level by level into the test pyramid, documenting the code with tests and refactoring/improving the code as you go.

It's a long process (I'm about 4.5 years into it and still going strong), but it allowed us to move fast while developing new features with a by-product of continually improving the code base as we went.

[test pyramid]: https://martinfowler.com/bliki/TestPyramid.html
[Working Effectively with Legacy Code]: https://www.amazon.com/FEATHERS-WORK-EFFECT-LEG-CODE/dp/0131...

I love that book. I can't recommend it highly enough.

Approval Tests (http://approvaltests.com) can be a huge timesaver when you're getting that initial black box characterization put together.
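A minimal sketch of that kind of black-box characterization, in plain Python rather than the Approval Tests library itself. `legacy_price` is a made-up stand-in for whatever inherited function you are pinning down, and the snapshot file name is arbitrary. The first run records what the code currently does; later runs report anything that drifted.

```python
import json
import os
import tempfile

def legacy_price(quantity, customer_type):
    # Hypothetical stand-in for an inherited function nobody fully understands.
    total = quantity * 9.99
    if customer_type == "wholesale" and quantity > 10:
        total *= 0.9
    return round(total, 2)

def characterize(func, cases, snapshot_path):
    """Record func's outputs on first run; on later runs, return any differences."""
    actual = {json.dumps(case): func(*case) for case in cases}
    if not os.path.exists(snapshot_path):
        with open(snapshot_path, "w") as fh:
            json.dump(actual, fh, indent=2, sort_keys=True)
        return {}  # nothing to compare against yet
    with open(snapshot_path) as fh:
        approved = json.load(fh)
    # Map each changed case to (approved value, current value).
    return {
        key: (approved.get(key), value)
        for key, value in actual.items()
        if approved.get(key) != value
    }

cases = [(1, "retail"), (11, "retail"), (11, "wholesale"), (0, "wholesale")]
snapshot = os.path.join(tempfile.mkdtemp(), "legacy_price.approved.json")

first_run = characterize(legacy_price, cases, snapshot)   # writes the snapshot
second_run = characterize(legacy_price, cases, snapshot)  # compares against it
```

The snapshot says nothing about whether the recorded behavior is right, only that it hasn't changed, which is usually what you want before touching anything.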

Besides being an important part of getting your bearings, talking to everyone who relies on the software to get a better understanding of how they interact with it can be a great time saver, too. It's amazing how quickly you can clean up legacy code with the delete key, provided you can confirm nobody's using it anymore.

The wholesale rewrite is a will-o-the-wisp. Very, very attractive, yes. But usually when people chase after it, they end up drowning in a quagmire. That isn't to say that you shouldn't strive to get rid of all the bad code, but do it as a long-term, component-wise, in-place rewrite.

>> My recommendation is not to try to understand the code per se, but understand the business that the code was being used in/by.

I strongly agree with this. I've done at least 4 or 5 successful complete rewrites of old code bases, and I have found, rather than even 'business' the word for this might be 'context'.

If you can contextualize a piece of software, its functionality and operations, you can have a much better understanding of an existing codebase.

What would you do if the codebase was actually 5 codebases absorbed from 5 different smaller companies? Assume that zero institutional knowledge about the code / business have been passed on.

You are now in the platform business.

I have to assume someone is using the software therefore there is some tribal knowledge of what it does? Otherwise this is maybe SAAS software that users use and some functionality is exposed that would allow you to begin decomposing backwards toward expected input/output. You're almost black-boxing at that point.

I will admit that I have, on very rare occasion, scream tested a piece of software running on a server that nobody would claim ownership or knowledge of either on the eng. team or within the org.

There's a surface level understanding of what it does but nobody really understands how many of the large features really work, or what the actual rules are that govern them. Yes, much of this is black box. Example: yesterday I had to try to figure out what branch of code was compiled and deployed to our server. Everyone had assumed it was the Master branch, but no...deploying that branch fubared everything. I finally found the "working" branch of code.

Part of the problem is that the people who owned tribal knowledge were all fired / quit without documenting anything. Every member of the existing team has been there around a year or less.

>> Assume that zero institutional knowledge about the code / business have been passed on.

Who is, in that case, using the software? They obviously understand the context by which the software is at least going to work, otherwise, why is the software being rewritten?

Who is requesting the rewrite? Do they know what it is supposed to do? Is there an executable build of it that exists somewhere?

These are ecommerce systems. It's astonishing because no-one in the company truly has a complete understanding of the business, as far as I can tell. The code is running in production and serving customers.

Rewrite is being pushed by certain parties because we're unable to meet feature requests quickly with the existing system, and it's being assumed that a rewrite will fix that problem. The team is barely functional though (from the top down). I've seen a few failed projects now and I don't think the rewrite will ever be accomplished. If we manage to rewrite, it's far from certain that we'll do a better job than the last guys did.

Late reply, and I'm sure you're smart enough to know this already, and are hopefully already planning it - but get the hell out of there, fast.

I...yeah. The picture wasn't completely clear until very recently and now the anxiety has kicked in. I'm trying to stick around a little because I've been through too many jobs in too short a time and I think I need to show some "commitment" on my resume.

> My recommendation is not to try to understand the code per se, but understand the business that the code was being used in/by.

You're absolutely right, but the problem comes when the code itself is the only authoritative documentation of what the code does, and in a lot of cases, the only authoritative documentation (or even the only documentation, period) of what the code is supposed to do!

I did, I wrote lots of tests and stepped through it with a debugger to get a handle on it.

Certainly. One could consider that any large team complex system that has functional requirements determined (and best understood) by external domain experts, is likely not understandable by software developers. This is exacerbated by our industry having no standard way to map those complex requirements into executable code.

The result is likely some big ball of mud, partially understood by the (now gone) original developers. One that is hacked at and refactored at the edges by those who inherit it.

Consider an actuarial risk calculation model, a payroll system, an air traffic control system, or perhaps something simpler like a model of a double-entry accounting general ledger. Could you holistically understand the code base of a double-entry accounting solution?

Such large complex systems cannot normally be rewritten economically prima facie. Instead if possible, typically small parts are carved off, rewritten and delegated to, until the economics and inability to add new features absolutely force some attempt at a rewrite.

Absolutely. The most fun was a program that embedded Gecko to render web pages. There was nothing particularly wrong with the code, but it was quite complex. I handled this simply by being the specialist who did understand the code. Of course, the problem was fractal: the build system used Automake and Autoconf, and nobody understood it either. That was fun.

There's a big difference between 'not understanding' and 'understanding that this really is bad'. And there's a difference between something being 'bad' as in "not my way of doing something" and 'bad' as in "this is fundamentally insecure, flawed in these massively problematic ways, etc".

> How did you deal with it?

I quit six months in.

> How did you deal with it?

I lost most of my hair six months in.

I inherited a large database that "supported" an application, when in fact the application layer was built into the database. I took over from a begrudging dev who was involuntarily transferred from dev/dba to just dev. In reviewing the hundreds of ETL jobs, I found one that started off as SQL that invoked visual basic script in an external file share, which at some point invoked a small obfuscated machine code script located on yet a different external file share. Googling didn't tell me what the machine code actually did. The disgruntled dev said that if I wasn't smart enough to figure it out then I should quit. He had put this mess together with the idea that it would be job security. Finally, his boss forced him to admit that the machine code stripped a text field of spaces. The job was re-written in straight SQL and ran much faster.

To be fair, it’s really hard to write maintainable code without support from your org. If your infrastructure does not support automated tests or builds, then you’re in for a treat. If the codebase was written by junior engineers just winging it and then patched up later by more senior people, again it’s gonna be a wild ride. What if the org doesn’t or didn’t even have peer reviews? Even better! What if the code was never even documented or there was never an SDD? I could go on and on because I’ve seen all of it. It all depends on how much money, care, and resources were put into the project originally. Customers and management typically want the best bang for the buck and in the beginning don’t want to be bothered with things like security, high availability, failover, maintainability, or operational costs. They just want to get to the finish line ASAP.

No. I have inherited code that is very difficult to understand, though.

What typically happens is that it is mostly left alone. As the years go by, small changes are made, only making the situation worse. Developers occasionally offer to management to take the time to refactor it, but management refuses to prioritize that work.

Eventually, new requirements are drafted that the existing code simply cannot meet without first refactoring it. At this point, the development team has the unfortunate responsibility to inform management that making the existing code meet the new requirements will actually take more time than rewriting the whole thing from scratch. The technical debt is now due.

The development team then hopes that the situation provided a learning experience for management regarding code maintenance.

Interesting that this is the conclusion you've come to, because I've heard from some colleagues that this is actually the correct approach for management to take. Their argument would be that either way, to accommodate the new functionality, the codebase would have to be refactored. And if it is refactored proactively, well that will probably be in a way not compatible with the future design, because you can't anticipate something you know nothing about.

So it's a matter of 1 vs 2 refactors, and the management chose one by delaying as long as possible.

A couple situations have arisen for me.

One, adopting necessary libraries that are lacking features, have way too many features, or questionable code quality. Examples have been an animated slideshow carousel, Python and PHP OAuth, and a WordPress theme. I thoroughly read the code for each of those, deleted the parts I didn’t need, and literally rearranged everything.

Then there are the times I go back to my code after a few years and have no idea what is going on. If possible, I just leave it alone if it’s working. Sometimes I need to use a new language or have developed a better coding style. In those cases I’ve rewritten it with a close eye to whatever edge cases I seem to be handling in the old code.

Numerous times. The worst was the time when the only employee maintaining a little-used product was fired for sexual harassment. I transferred to the position not knowing anything about the product or the lack of developer knowledge. The support person gave me a walk-through of the functionality as did the QA person. I spent about four months figuring out the build system and the code. For the code, it was just a matter of reading, reading, reading until I came to understand it. Making matters worse was that a massive refactor of the code had been started but not completed. It was very stressful.

We left the servers running unmolested and un-upgraded behind a firewall (thankfully they were super stable) until we could replace the 3 unmaintainable microservices that did something simple in the most complex manner imaginable with a 50-line module within our monolith.

(In this case, technically one person on the team - our most sophisticated developer - was able to decipher it. But every time he needed to touch it it would take him 2 full days just to understand the code again. Just wasn’t worth keeping when it was so pointless to have and so easy to replace.)

For a more complex codebase, the equation might have been different.

I inherited a project based on more than 10,000 lines of poorly commented, undocumented 8088 assembly code used on a shopping-cart attached radio+LCD panel product. It wasn't that bad an experience except the product took code updates over the air and the guy decided to use his own block checking code instead of a CRC or even a checksum on the 256 byte blocks.

The custom algorithm was a suboptimal choice as it was prone to passing badly corrupted blocks as correct. Worse, the first stage bootloader was in masked ROM so a true fix wasn't possible, only workarounds.
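To illustrate why a weak block check is risky (the original algorithm isn't described, so this is not it): a plain additive checksum, used here purely as an assumed example of a weak check, cannot see reordered bytes, while a CRC does. `zlib.crc32` is the CRC-32 from Python's standard library.

```python
import zlib

def additive_checksum(block: bytes) -> int:
    # Assumed stand-in for a weak check: sum of all bytes, modulo 256.
    return sum(block) % 256

# A 256-byte block, the same size as the over-the-air update blocks.
block = bytes(range(256))

# Corrupt it by swapping the first two bytes.
damaged = bytearray(block)
damaged[0], damaged[1] = damaged[1], damaged[0]
corrupted = bytes(damaged)

# True means the check considers the corrupted block identical to the original.
weak_passes = additive_checksum(block) == additive_checksum(corrupted)
crc_passes = zlib.crc32(block) == zlib.crc32(corrupted)

print("additive checksum accepts corrupted block:", weak_passes)
print("CRC-32 accepts corrupted block:", crc_passes)
```

The sum doesn't depend on byte order, so any reordering slips through; the CRC is guaranteed to catch a short burst error like this one.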

Earlier in my career. I've noticed the more years I've been developing software, the easier it is for me to understand other people's code.

Also, when I see a codebase I don't understand, if I'm paid to work with it, I spend my time learning the stuff. Debugging tools help: traditional general-purpose debuggers, graphics debuggers like pix/renderdoc, network tools like wireshark/fiddler, OS-based tools like strace/procmon. For different projects different ones are the most useful, sometimes even custom built tools are most useful.

I inherited a code base that other people thought they understood. People have very different tolerance levels for complexity and lack of control. People here complain about bad coders but the incentives out there are all aligned against quality: short stints and glorified short-term-thinking (a.k.a. Agile) and proprietary code. Imagine if you were bound to work on a codebase for most of your career, were given all the time you need to do your best and had all your code in the open, forever part of your reputation. Would you write better code?

Usually it's a big single file of spaghetti, which is the bigger issue... I tend to try to separate logic branches (if/else) into separate functions. In addition to this, often a big if, with no else that just returns, I'll reverse the logic.

In the end, just like eating an elephant... one bite at a time. Eventually you'll have everything broken enough into separate functions that you then understand the whole better and can cleanly rewrite the whole thing.
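A small before/after sketch of the "separate the branches, reverse the big if" step. The names (`process_order_before`, `refund_total`, the `order` dict) are hypothetical; the point is only the shape of the change.

```python
# Before: one big if wrapping everything, branches written inline.
def process_order_before(order):
    result = None
    if order is not None:
        if order["type"] == "refund":
            result = -order["amount"]
        else:
            result = order["amount"] * (1 + order["tax_rate"])
    return result

# After: the outer condition is reversed into an early return,
# and each branch lives in its own small function.
def refund_total(order):
    return -order["amount"]

def sale_total(order):
    return order["amount"] * (1 + order["tax_rate"])

def process_order_after(order):
    if order is None:
        return None
    if order["type"] == "refund":
        return refund_total(order)
    return sale_total(order)

samples = [
    None,
    {"type": "refund", "amount": 20, "tax_rate": 0.25},
    {"type": "sale", "amount": 100, "tax_rate": 0.25},
]
# Both versions must behave identically on every sample.
same_behavior = all(
    process_order_before(o) == process_order_after(o) for o in samples
)
```

Keeping the old version around just long enough to compare against is a cheap safety net when there are no tests yet.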

Yes. What’s worse is that it didn’t work as it was supposed to. After a lot of effort, we realized that the last remaining developer, who had just left, was manually fudging things in the database instead of patching the code.

It requires going back to first principles: digging into the docs on the systems it interacted with, interviewing users to see what they did and what they expected. In short, a nightmare.

Yes. We're allegedly "re-writing" it but the organization is too dysfunctional to make any progress. Another guy and I are keeping the company alive basically. I've spent many, many hours reverse-engineering and documenting things. Sometimes I'll spend weeks to find out what single line of code needs to be changed to fix a bug.

Thanks for the responses everyone! Definitely some entertaining stories here. Don't worry, I wasn't thinking of rewriting one, just wondering aloud how often this happens, and it sounds like a lot :) Really cool to hear from those of you who deal with this all the time. Thanks for the great reads.

I have not fond memories of the "alite" CPU simulator. It was widely used as a basis for scientific papers on CPU design back in the early/mid 90s. The source code of alite was absolutely incomprehensible, which did make me wonder about the validity of the published results.

tl;dr: you have probably not thought through all the ways this system is used. Make damn sure you have both rollback plans and a phased release in the works. Otherwise, all your bravery and effort will be at risk. My experience follows.

I took over a suite of Flash video players/recorders for a company years ago. The variable names were exclusively 2 and 3 letter acronyms. No one on the team even knew ActionScript. Up until my hire, the various builds were all considered immutable since no one had any idea of where to start. Given their integral nature to the business, this effectively stagnated entire areas of product and business development.

What you put in your question was precisely my approach. I did my best to reverse engineer requirements by code review and salespeople interviews. Once I had a confident list, I got approval and killed off what I can only assume were swaths of unused features no one knew about. Rewrote + rearchitected the entire thing from the ground up.

The release was worse than you would have expected. Turned out there were lots of these players linked outside of our website. These versions of the app were ones we had considered deprecated and were unknown to everyone in the company including the cofounding CTO. This required immediate rollbacks and hastened development to re-add features that were being used by a subset of our longest lasting clients.

While the releases themselves were rocky, the entire effort was an unquestionable success. The effort consolidated all the features into a single reusable player and opened up years of feature dev that enabled millions in new sales.

It's very sad, but understandable that you're getting down voted. It's understandable because most devs want to put their own stink on a project so rewriting code the "right way" is a way to do it. Problem is, they get 6, 8, 10 months in and figure out the same thing their predecessor did and leave. It's sad, because they don't realize that if a piece of code is out in the field and it's working, then you really shouldn't do a wholesale rewrite.

Exactly. Everyone thinks that the predecessor is an idiot and they can do it better. Every time you throw away existing code you lose business knowledge.

I agree with you in general, but I think a better rule would be to almost never rewrite code.

I know that everyone thinks their predecessor was an idiot. And I agree that full rewrites are usually a bad idea. But sometimes, the predecessor genuinely was an idiot, and the code really is that bad. If the use cases and inputs and desired outputs are well documented enough, a full rewrite can be the right choice.

It depends on the size of the system, too. A full rewrite of something as complex as Netscape is asking for trouble. If it's something that can be rewritten in a day, or even a few weeks, it might be okay to go ahead and do it.

Of course, by the time you've got enough experience to accurately estimate how long a rewrite will take, you've probably got enough experience to just refactor the old code without going insane.

If the inputs and outputs and the use cases are understood and it’s already working, create an anti-corruption layer and treat it like a 3rd party binary blob.
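A rough sketch of what such a layer can look like. Everything named here (`legacy_lookup`, its pipe-delimited return format, `Customer`, `CustomerGateway`) is invented for illustration; the point is that the rest of the code only ever sees the clean interface.

```python
from dataclasses import dataclass
from typing import Optional

def legacy_lookup(cust_no: str) -> str:
    # Hypothetical legacy routine, treated as a black box.
    # Returns "NAME|BALANCE_IN_CENTS|STATUS", or "" when nothing is found.
    table = {"0042": "ACME CORP|125050|A", "0043": "OLD SHOP|0|X"}
    return table.get(cust_no, "")

@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    balance: float
    active: bool

class CustomerGateway:
    """Anti-corruption layer: translates legacy quirks into a clean model."""

    def find(self, customer_id: int) -> Optional[Customer]:
        # The legacy side wants zero-padded string keys.
        raw = legacy_lookup(f"{customer_id:04d}")
        if not raw:
            return None
        name, cents, status = raw.split("|")
        return Customer(
            customer_id=customer_id,
            name=name.title(),
            balance=int(cents) / 100,
            active=(status == "A"),
        )

gateway = CustomerGateway()
acme = gateway.find(42)
missing = gateway.find(7)
```

If the blob is ever replaced, only `CustomerGateway` has to change.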

Yes it's called Big Fish Games: Casino.

It is a very popular fake money gambling app. Most of the code is very old and no one knows how it works. The only work actually done is fixing bugs that the previous bug fix caused.

Yes. And the author did not understand it at all either.

The code had to be destroyed and started from scratch. It was absolutely unsalvageable.

1) Get it to run in a test/local environment.

Yes, frequently. A couple of highlights:

PostgreSQL allows C extensions. A function which used to work was now segfaulting occasionally so needed to be fixed. Original author was gone, nobody really understood the PostgreSQL extension system (which involves a ton of macros.) No version control, no useful comments, no spec. I read the code (the core of the logic was straightforward C, it was just the interface to PostgreSQL that was hard to understand) and wrote a new version in Python (another language for which PostgreSQL supports extension functions.) That version was too slow, by two orders of magnitude. Went back to the C implementation. Did a little light refactoring to separate the core logic from the PostgreSQL interface. Then wrote a test harness program in C to call only the inner function. Wrote unit tests until I reproduced the segfault. Fixed the C code and tested it with my test harness. The PostgreSQL wrapper just called the (now correct) inner function so it now worked too. Checked the fixed code and unit tests into version control.

A PhD (no longer with the company) had fit some simple neural net models in R. He'd written his own code for this because, according to him, the standard packages in R didn't support a few of the bells and whistles he'd wanted like ReLU activation. Not only was the code in-house, but the models themselves were saved as serialized R objects. No specifications for any of this stuff. Apparently the company had been using this code to score medium size databases for several years. When we wanted to scale up to a much larger database (approximately 30 billion rows) the problems with the R implementation became apparent. Fairly slow, high memory usage, worse yet a slow memory leak on large jobs, and worst of all occasional silent crashes where it would simply stop and exit with a successful status of 0 and no error message. This time I took the approach of reading the code and de-serialized R objects. I re-wrote the implementation in Python using numpy arrays and wrote a small R program to read the serialized models in the .RData format and emit a cleaned up JSON object that could be easily read from Python. Luckily I didn't have to port any of the cross-validation/training/optimization stuff; just the prediction part. That meant 80% of the code could be safely ignored. However, the devil was in all the one-off special cases in the serialized R objects, many of which had behavior different from default. I would test by comparing the predictions by the two programs on small batches of a million. These predictions were floating point numbers between 0 and 1 so when both programs agreed to within 1e-5 for all million I knew it was correct. It took a week to track down all the special cases, though, and the special cases made it impossible to just use a standard neuralnet library like Keras. (We already had several Keras models in production and a set of tools to manage them; that would have been easy for us.) Proprietary code begets proprietary code I guess. 
At least the new implementation was much faster and didn't leak memory so could deal with the whole database in a single long running process. I pitched the idea of re-training all models from scratch in Keras using our more modern tools but management wanted 100% backwards compatibility and to preserve the value-add of the PhD.
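The comparison step can be sketched roughly like this, in plain Python. The function name and the sample numbers are made up; only the 1e-5 tolerance comes from the description above.

```python
import math

def find_disagreements(old_preds, new_preds, tol=1e-5):
    """Return (index, old, new) for every pair that differs by more than tol."""
    if len(old_preds) != len(new_preds):
        raise ValueError("prediction batches have different lengths")
    return [
        (i, old, new)
        for i, (old, new) in enumerate(zip(old_preds, new_preds))
        # Predictions live in [0, 1], so an absolute tolerance is used.
        if not math.isclose(old, new, rel_tol=0.0, abs_tol=tol)
    ]

# Made-up scores standing in for the R and Python outputs on one batch.
r_scores = [0.12345, 0.98765, 0.50000, 0.00010]
py_scores = [0.123451, 0.98765, 0.50200, 0.00010]

bad = find_disagreements(r_scores, py_scores)
for index, old, new in bad:
    print(f"row {index}: R={old} Python={new} diff={abs(old - new):.2e}")
```

Returning the offending rows rather than a bare pass/fail is what makes it practical to chase down the special cases one by one.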

First write tests for every function point. After that you will understand what the system does. Then re-write the whole thing.

First, you need to choose if you are going to improve it, or ultimately abandon it. That's not an easy decision. If you decide to abandon it, you need some very convincing arguments, and lots of them. Before making that decision, work with the codebase, at least until you understand it well enough. It will allow you to uncover arguments either way.

Improving a codebase is a well-known subject, so I'm not going to comment on it further.

If you decide to ultimately abandon it, you need to understand it won't happen tomorrow, and perhaps not before a few years (for example, I'm 2 years in with a codebase I decided to abandon, and it's probably going to be at least 1 more year in production). Stakeholders hate when you spend time just rewriting it for the sake of it (from their perspective).

What you want to do instead is use a strangler pattern: your new codebase should "strangle" the old one, and deliver value VERY quickly, which will convince the stakeholders it was the right choice.

First, all new features are in the new codebase. If possible, start with easy features that have as few dependencies as possible with old codebase. Any call between the two should be in a special wrapper in your new codebase, so you can start having a sense for what code will need to be rewritten, at some point.

Then, start to "strangle" the old codebase: wrap ALL calls so they go through the new codebase first. That will allow you, somewhere in the future, to cut off the old codebase part by part, and avoid the full rewrite effect, as well as quickly revert to old codebase if bugs are uncovered.
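One way to picture the wrapper, as a sketch only: every call goes through a facade in the new codebase, which decides per feature whether the old or the new implementation answers. All names here (`LegacyBilling`, `NewBilling`, `BillingFacade`, the `migrated` set) are hypothetical.

```python
class LegacyBilling:
    # Stand-in for the old codebase.
    def invoice_total(self, amount):
        return round(amount * 1.2, 2)

    def late_fee(self, days):
        return 5 * days

class NewBilling:
    # Stand-in for the rewritten parts; only some features exist so far.
    def invoice_total(self, amount):
        return round(amount * 1.2, 2)

class BillingFacade:
    """All callers use this; it routes each feature to old or new code."""

    def __init__(self, legacy, new, migrated=()):
        self._legacy = legacy
        self._new = new
        self._migrated = set(migrated)  # features currently served by new code

    def _target(self, feature):
        return self._new if feature in self._migrated else self._legacy

    def revert(self, feature):
        # Quick rollback to the old codebase if the new path misbehaves.
        self._migrated.discard(feature)

    def invoice_total(self, amount):
        return self._target("invoice_total").invoice_total(amount)

    def late_fee(self, days):
        return self._target("late_fee").late_fee(days)

billing = BillingFacade(LegacyBilling(), NewBilling(), migrated={"invoice_total"})
served_by_new = type(billing._target("invoice_total")).__name__
billing.revert("invoice_total")
served_after_revert = type(billing._target("invoice_total")).__name__
```

Once a feature has lived in `migrated` long enough without a revert, the matching legacy code can be cut off.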

Once you have that, you can more easily identify which parts should be replaced first: performance issues, too many bugs, new features needed...

When you reach the end, it's a matter of convincing stakeholders you absolutely need to go to the last mile, with good arguments. If you can't find good arguments, you probably don't need to go to the last mile yet.

I did once work on such a thing, deep in the rotten heart of finance in a life insurance company. The design was more than 20 years old and was the result of a port of a previous codebase to a new language. The previous codebase was also a port from an even older language. And legend had it that the thing started out as a Paradox database in the 1980s. Along the way, the code had been translated without significant refactoring, so idioms from the older systems had been brought forward without consideration as to whether they were sane in the new one. The overall flow of the process was still based on the original design, even though you would do it entirely differently in the modern toolset (for example, intermediate results were written out to files, then read back in in the next step because the original system didn’t have the concept of functions that could pass data around).

Subtle bugs had crept in over the years, usually relating to the different sort orders or rounding rules of the different platforms. Some minor bugs had become features (the users didn’t want the bugs fixed because they had become reliant on the incorrect version and didn’t want to have to restate their results with the correct version - the idea being that having to explain and account for the change was more trouble than accepting the minor defect).

The codebase had been altered regularly with minor changes over the years, by a succession of contractors whose names appeared like biblical king lists in program headers. The changes were usually minor, and the approach had always been to graft on some new functionality or extra edge case, rather than redesign anything. Many of the contractors had been actuaries rather than professional programmers, so there were hair-raising sections of code that achieved the required result in extremely obtuse ways. Real outsider art.

There were huge vestigial sections of code and redundant outputs that nobody ever used, but because it was part of a bigger end-to-end process that was also poorly understood and onerous to test, those sections and outputs were always kept just in case they were significant.

In a way, this was a relief. It meant you didn’t need to understand a lot of the code, as long as it kept producing the outputs everyone expected it to.

I was part of a project to migrate all this code onto yet another new platform. Did I take the opportunity to do a grand refactoring? Heck no. The project was already overdue when I arrived, and I had two other projects to work on at the same time. So I did what all those contractors had done before me. I lifted-and-shifted with the least invasive changes possible, ran just barely enough tests to convince the users it was good, and moved on.

I still work for that company. The codebase is now more than 30 years old. It’s had another platform migration since, as well as the same old stream of minor change requests. Bolt-ons on top of bolt-ons on top of tactical kludges.

The thing is ugly and horrifying. Ramshackle and arcane. Congealed, not designed. And yet... it’s managed to carry on producing the outputs that this business needs it to. So is it really all bad?

The older I get the more I appreciate the power of comments and readmes.

Org-mode and literate programming.

Bring all the code into a set of org-mode files, where each file has one code block that emits the original code back out. You can verify this using diff or (better) by putting the initial repo into git, if it's not already, and seeing whether your emitted code causes any unstaged changes to appear. Start dividing the blocks into more logical chunks (includes, declarations, definitions, one block per function implementation, etc.). Use a hierarchy like:

  * foo.c
  #+BEGIN_SRC c :tangle foo.c :noweb tangle
    <<foo-includes>>
    <<foo-file-variables>>
    <<foo-functions>>
  #+END_SRC
  ** includes
  #+NAME: foo-includes
  #+BEGIN_SRC c
    #include "bar.h"
    #include "foo.h"
  #+END_SRC
  ** file variable declarations
  #+NAME: foo-file-variables
  #+BEGIN_SRC c
    int x, y, z;
    some_struct a, b, c;
  #+END_SRC
  ** Function bodies
  #+NAME: foo-functions
  #+BEGIN_SRC c :noweb tangle
    <<foo-baz>>
  #+END_SRC
  *** baz: int->int
  #+NAME: foo-baz
  #+BEGIN_SRC c
    int baz (int i) {
      // some logic
      return some_val;
    }
  #+END_SRC
Those variable names are useless; figure out what they actually mean and document them in the org file. Consider renaming them. There are no test cases, so create a tests section in the org file and start writing up simple test programs (or use a unit test framework, but you may need to do some significant refactoring before you can do that).

A benefit of org-mode here is that sometimes you want to test some functionality (and have no unit tests yet). So you try to test baz from above. But once you build that source file and a second foo_test.c file you find out that foo has a function, quux, which has dependencies outside of just the header files and foo.c itself. To build this thing you have to build bar.c and maybe even the whole Linux kernel. "Shit!", you say, "How do I handle this?" Well, org-mode to the rescue:

  * Testing Foo
  ** Testing baz
  #+BEGIN_SRC c :tangle foo_baz_test.c :noweb tangle
    // maybe some other things like the header references.
    <<foo-baz>>
    // some test code that is able to focus in on just the baz function
  #+END_SRC
You've fully isolated that one chunk, and quux is no longer being included in the build for this test. Whatever problems it has don't impact you (for this test). Once you figure out how to isolate or mock quux's dependencies so that you aren't including the full Linux kernel, you can add it to the set of tests. Now the foo file is fully brought under testing and you can more effectively refactor it. And the mocks and all you've made allow you to move to a proper unit test framework if you want, versus the ad hoc initial framework we've produced here.
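For a rough idea of what the tangled foo_baz_test.c could end up containing, here is a minimal sketch. The body of baz is a made-up stand-in (it just doubles its input), since the real logic is elided above, and test_baz is a hypothetical name for the ad hoc test routine:

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the baz chunk pulled in from the org file.
   ASSUMPTION: the real body is elided in the thread; doubling the
   input is invented here purely so there is something to check. */
int baz(int i) {
    return 2 * i;
}

/* Ad hoc test: nothing from quux or bar.c is compiled into this file,
   so only baz's own behaviour is exercised. */
void test_baz(void) {
    assert(baz(0) == 0);
    assert(baz(21) == 42);
    assert(baz(-3) == -6);
    puts("baz: all checks passed");
}
```

Adding a one-line main that calls test_baz() turns this into a standalone program. The same assertions can be moved into a proper unit test framework later without touching baz itself.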

Even if you don't care about testing your code, the above process (sans the testing part) will allow you to perform a dissection on your codebase and get it documented and specced out properly.

The company's entire code base was split into two: Java for the 'enterprise' stuff, and Perl for everything else. The Java side had something close to 20 million lines of code, and the Perl side had about one million lines of code. As always, I worked on the operational side.

NB: I rather like Perl, though I haven't used it much at work since leaving this company a few years ago. The ...things that follow are not an indictment against Perl. The problem wasn't the language.

Even used properly, Perl is a pretty dense language, so a million lines of Perl in one place is quite impressive.

The company's code started in the mid 90s and continued to grow. When I arrived, there were a couple of guys on the operations tools team with me that had been there 7-9 years I believe. They understood the code better than anybody, and they were both very bright guys, but things came up, at least a couple of times a week, that would surprise and sometimes mystify them. After some digging, debugging and sniffing around, they'd usually approximately get somewhat near a root understanding.

That's fairly amazing to me. I've helped create some enormous code bases over my decades-long career, in one case even larger than a million lines. And I'd certainly find myself surprised from time to time, but never for very long.

So these millions of lines of code had, over the prior 17 years, grown organically, and had, essentially, never been refactored.

The code itself was, for the most part, quite tidy. And the underlying concepts and structures were pretty simple and elegant. They represented approaches I mostly agreed with.

But...almost nothing had ever been removed. No refactors, over millions of lines of code, over most of 20 years.

There were side effects everywhere. Many of the deep, underlying methods had had so many arguments added that perhaps one third of them would be useful or used in any given invocation.

And, of course, there were virtually no tests.

After somewhat getting up to speed on this system, I declared it DOA, and started to push for a complete rewrite. I've been around the block enough times to understand and comprehend the hazards of that approach, and I didn't mentally pull that trigger lightly or quickly.

But....the organization would not have it. The two senior guys on my team were fine with the idea, but many other long-timers were not.

That code is, to this day, with no doubt another 100k lines of Perl added to it, (poorly) powering fundamental and important pieces of a huge company you have all heard of.

So management basically threw bodies at it. Lots of bodies. And, even though the operational quality of the products was objectively fairly poor, the particular market niche didn't need or demand better, so the company made and continues to make a ton of money.

As others have said, I specialize in legacy code. It's the one area you can be an expert in and know 100% for sure that it's never going to change ;-)

Some quick advice: Rewrites are almost always a bad idea. The requirements are almost always at least as difficult to discover as they were in the first place. You will also miss things or incorrectly decide that something isn't important now. These can often kill your project before you get a chance to replace the old system.

But the most important reason for not rewriting is because there are almost always business reasons for extending the existing application during the rewrite period. This gives you a moving target for the rewrite. Additionally, you will find that the "legacy team" who is adding code to the existing system will be seen in a better light than the "rewrite team" because they are actively solving business problems. The "rewrite team" will be seen to have no value until they ship something. As more and more features are added to the legacy application, more and more resources will be added to it until someday someone will say, "Why are we rewriting this again?" and cancel the rewrite. It doesn't happen every time, but in my career I think I've seen it in about 90% of the rewrites.

So you need to get comfortable with the legacy code. The first thing to do is to make the build and deploy process as painless as possible. You probably can't get time allocated to do it, so with every piece of work you do, steal some time for that. If you are on a project where they have "build teams" and it's actually impossible to build the application yourself, fix that as a matter of priority.

Once you can reasonably work on the code, you need to start introducing tests. The best advice I can give is to read Michael Feathers's book "Working Effectively with Legacy Code". This is a must read. I think there may be a newer version of it, but even though the old version is very dated technology wise, the techniques are still rock solid.

Fight the urge to refactor/rewrite large portions of the application. Instead, pay attention to the code that you touch the most. Ensure that this code has good tests and once it does, fence it off from the rest of the code base and start improving it. Code that you never touch can be the crappiest in the world. Code that you touch once only has a one time cost, so don't fret over it. Code that you touch every single day needs to be amazing. Concentrate your efforts there.
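One way to start on those tests, in the spirit of the characterization tests Feathers describes, is to pin down what the code does today before changing anything. A minimal sketch in C, where legacy_late_fee_cents and its truncation quirk are invented purely for illustration:

```c
#include <assert.h>

/* HYPOTHETICAL legacy routine, made up for this example. It divides
   before multiplying, so any odd cents in the balance are silently
   dropped. Users may already depend on that behaviour. */
int legacy_late_fee_cents(int balance_cents, int days_late) {
    if (days_late <= 0) {
        return 0;
    }
    int fee = balance_cents / 100 * days_late;
    if (fee > 5000) {
        fee = 5000; /* fee is capped at 50.00 */
    }
    return fee;
}

/* Characterization test: records what the code does now, quirks
   included, rather than what a spec says it should do. */
void characterize_late_fee(void) {
    assert(legacy_late_fee_cents(10000, 0) == 0);       /* not late */
    assert(legacy_late_fee_cents(10000, 3) == 300);     /* plain case */
    assert(legacy_late_fee_cents(10099, 3) == 300);     /* quirk: 99 cents dropped */
    assert(legacy_late_fee_cents(1000000, 30) == 5000); /* cap applies */
}
```

Once checks like these pass against the untouched code, any later cleanup that changes a result fails loudly, and you can decide deliberately whether the old behaviour was a bug or a feature.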

The last piece of advice I have is to look at the kinds of requests you get. If you get a lot of similar requests for functionality (for example lots of reports), then make that part of the system easy to work with. What you want to do is match the ability to work with the code with the expectations of the customer. If they intuitively think, "This should be easy", then work hard to make it easy. Say things to your stakeholders like, "You/Users asking for feature X expect this task to be easy for me to do. It's not. I need time to make it easier." Usually they will see the sense in that. If they expect everything to be easy, use that back on them. "I can't rewrite the whole application without stalling our business plans. I can make some parts of this easier than others though. Which parts are the most important? Note if you say X is important to be easy, then I have to spend time up front to make it easy. We have to be careful about our budget". That's the kind of language that business people can understand.

Finally, have fun with the legacy code. You aren't likely to make it (much) worse. Use the opportunity to experiment with new ideas. However, I caution you to avoid the temptation to transition to newer technologies (you'll never get it finished -- just like a rewrite). Instead, think about the techniques in the newer technologies and start introducing them in your old code base. IMHO, this is always more fun than simply using something off the shelf anyway. Ironically, I find that working on legacy code is the most liberating thing I can do on a professional team. You can always say, "Well, this is crap. Anybody mind if I replace it?" and almost always people will welcome it.

I inherited a suite of .NET/WinForms applications that managed warehouse shipments to major purchasers. They had been written and modified by a succession of programmers with wildly different opinions of how to write a program (from copy-paste duplication to massively overarchitected inheritance trees; fully denormalized tables to 6th normal form; and everything in between). I was the only software developer at the company, so there was nobody else to ask how any of this worked; I had to figure it all out from scratch. The steps I took were:

- Pick the program with the most egregious errors. This was part of the suite that would upload tracking information to the purchaser, and cost us large fines when something went wrong.

- Find the user(s) of the software and pay them a visit. Solicit buy-in (there was some concern that the new IT director might have started this initiative in an effort to automate people out of their jobs) by explaining that I'm planning on making the software easier to use (easy, since it was awful and everyone hated it), and then have them walk me through exactly how they used it. This turned out to be a terribly inefficient process involving lots of paperwork shuffling, but I ignored that temporarily in favor of just finding out how it was supposed to work now, and what sort of ways it went wrong. Get a list of likely low-hanging-fruit bugs.

- Track down the source code to the program, and put it under version control. Fortunately I had a copy of the previous developer's computer, which had an up-to-date version once I found it. However I did have to test that it did everything it was supposed to by running it in production, which was a bit nerve wracking.

- Set up an automated updater (I used Microsoft's ClickOnce installer, which checks for updates on a shared SMB drive), and replace all the copies of the program I could find with auto-updating ones. (This required asking people to spread news of the replacement by word of mouth as they heard of other people using the program, since nobody had a list of all the users.)

- Buy ReSharper, and start doing mechanical refactorings on the codebase to fix the obvious and easy code smells. What the changes are doesn't really matter much; the point of this exercise is to start to get a feel of where everything is in the code. Since you're just using the ReSharper commands, there's no risk of breaking anything by doing this.

- Fix a few easy bugs, and push an update out to users. I started with making a list view sortable (literally a one-checkbox change that saved 30 minutes a day) and a few similar small issues. This immediately showed an unprecedented level of interest in the users' problems and also got them used to the auto-updater before any more major changes came along.

- Continue with more major refactorings and bug fixes, pushing out a release every few weeks (faster if you can focus on just the one project). I usually tried to include at least a few user-facing changes in with the internal stuff, but occasionally the release notes were just "better performance" or "major internal improvements, so I can do feature X next week".

The really important part of this process is understanding not only how the software works (and how it's supposed to work)--which can probably only be done by refactoring instead of rewriting--but also getting to know how the users use it and what their actual needs are, so you can suggest improvements that wouldn't necessarily be obvious to someone who doesn't understand the entire system.

> - Buy ReSharper, and start doing mechanical refactorings on the codebase to fix the obvious and easy code smells. What the changes are doesn't really matter much; the point of this exercise is to start to get a feel of where everything is in the code. Since you're just using the ReSharper commands, there's no risk of breaking anything by doing this.

I love ReSharper, and it has been worth every penny that I've ever paid for updates, but you still have to be very careful. I've walked into a few very heinous bugs where simple refactorings have broken things badly. Mostly this was because of people doing evil things with reflection and dependency injection, that should never have been done, or because of arcane config-file based development, where even ReSharper's excellent "Find Usages" and code analysis engine cannot fully understand what is going on.

This is true. Fortunately none of the code I worked with did anything like that, but it's definitely important to be aware of.

no but i’ve written one

Pick a starting point then write a lot of comments.

Corollary: pick a starting point, and add a shit-ton of logging to trace what is actually going on. Hide it under #if DEBUG if necessary, and hope that you don't get a Schrodinger's Code situation where observing the code changes how it behaves.

I've had a few more-or-less realtime multithreaded projects that I've worked on where you can't really put a debugger on a system and halt it, without breaking things in interesting and misleading ways, and the only good option is to fall back to ye olde printf debugging.
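A minimal sketch of that kind of compile-time-gated tracing in C. The TRACE macro and the apply_discount routine are invented for illustration, and the DEBUG switch is assumed to be passed as -DDEBUG=1 at build time:

```c
#include <stdio.h>

/* ASSUMPTION: tracing is enabled by building with -DDEBUG=1.
   When nothing is passed, it defaults to off. */
#ifndef DEBUG
#define DEBUG 0
#endif

#if DEBUG
/* Prints file, line and function, then the caller's message, to stderr. */
#define TRACE(...)                                                      \
    do {                                                                \
        fprintf(stderr, "[%s:%d %s] ", __FILE__, __LINE__, __func__);   \
        fprintf(stderr, __VA_ARGS__);                                   \
        fputc('\n', stderr);                                            \
    } while (0)
#else
/* Compiles to nothing; the arguments are never evaluated. */
#define TRACE(...) do { } while (0)
#endif

/* Hypothetical legacy routine being traced. */
int apply_discount(int price_cents, int percent) {
    TRACE("enter price_cents=%d percent=%d", price_cents, percent);
    int result = price_cents - (price_cents * percent) / 100;
    TRACE("exit result=%d", result);
    return result;
}
```

Two caveats tie back to the Schrodinger's Code worry. Because the arguments vanish in non-debug builds, a TRACE call must never contain an expression with side effects. And since each trace is emitted by three separate stdio calls, output from different threads can interleave, while the extra I/O may shift timing enough to hide a race.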
