Ask HN: Inherited the worst code and tech team I have ever seen. How to fix it?
557 points by whattodochange on Sept 18, 2022 | 676 comments
I have to find a strategy to fix this development team without managing them directly. Here is an overview:

- this code generates more than 20 million dollars a year in revenue

- it runs on PHP

- it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )

- it doesn't use composer or any dependency management. It's all require_once.

- it doesn't use any framework

- the routing is managed exclusively as rewrites in NGInX ( the NGInX config is around 10,000 lines )

- no code has ever been deleted. Things are just added. I gather the reason is that it was developed directly on production and deleting things is too risky.

- the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.

- JS and CSS are the same. Multiple versions of jQuery fighting each other depending on which page you are on, or even on the same page.

- no MVC pattern of course, or any other pattern. No templating library. It's PHP 2003 style.

- In many places I see controller-like files making curl requests to the site's own REST API (via the domain name, not localhost), doing OAuth authorizations, etc... Just to get the menu items or the list of products...

- no caching (there is memcached, but it's only used for sessions...)

- team is 3 people, quite junior. One backend, one frontend, one iOS/Android. Resistance to change is huge.

- productivity is abysmal, which is understandable. The mess is just too huge to be able to build anything.

This business unit has a pretty aggressive roadmap, as management and HQ have no real understanding of these blockers. And post COVID, budget is really tight.

I know a full rewrite is necessary, but how to balance it?




First off, no, a full rewrite is not only not necessary, but probably the worst possible approach. Do a piece at a time. You will eventually have re-written all the code, but do not ever fall into the trap of a "full re-write". It doesn't work.

But before you re-write one line of code - get some testing in place. Or, a lot of testing. If you have end-to-end tests that run through every feature that is currently used by your customer base, then you have a baseline to safely make changes. You can delete code as long as the tests pass. You can change code as long as the tests pass.
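Even a tiny first test helps. A minimal sketch of what that baseline could look like in PHP, assuming PHPUnit and the curl extension are available (the host, route and expected string below are purely hypothetical, not from OP's system):

    <?php
    // Hypothetical characterization test: it pins down what the live code
    // does *today*, so later changes can be checked against that baseline.
    use PHPUnit\Framework\TestCase;

    final class ProductPageTest extends TestCase
    {
        private const BASE_URL = 'https://staging.example.com'; // hypothetical host

        private function get(string $path): array
        {
            $ch = curl_init(self::BASE_URL . $path);
            curl_setopt_array($ch, [
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_FOLLOWLOCATION => true,
            ]);
            $body = curl_exec($ch);
            $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
            curl_close($ch);
            return [$status, $body];
        }

        public function testProductListingStillRenders(): void
        {
            [$status, $body] = $this->get('/products.php'); // hypothetical route
            $this->assertSame(200, $status);
            // Assert on behaviour customers rely on, not on implementation details.
            $this->assertStringContainsString('Add to cart', $body);
        }
    }

One test like this per money-making flow is already a safety net; grow the suite from there.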

Once you are at that point, start picking off pieces to modernize and improve.

Also, respect the team. Maybe they aren't doing what you would, but they are keeping this beast alive, and probably have invaluable knowledge of how to do so. Don't come in pushing for change... come in embracing that this beast of a codebase makes 20 million a year. So talk about how the team can improve it, and modernize their skills at the same time.

Because if you walk in, saying, "This all sucks, and so do you, let's throw it out", do you really have to wonder why you are hitting resistance?


I fully agree with this, but I think it misses a key step:

As the team’s manager, it’s your job to get buy-in from the executives to gradually fix the mess. You don’t need to tell the team exactly how to fix it, but you gotta get buy-in for space to fix it.

One approach is just to say “every Friday goes to adding tests!” (And then when there’s some reasonable test coverage, make Fridays go to refactorings that are easy with the new tests, and so on.)

But this often fails because when Friday comes, something is on fire and management asks to please quickly squeeze this one thing in first.

The only other approach I know of is to get buy in for shipping every change slightly slower, and making the code touched by that change better. Eg they want to add feature X, ok add a test for adjacent existing functionality Y, then maybe make Y a little better, just so adding X will be easier, then build X, also with tests. Enthusiastically celebrate that not only X got shipped but Y also got made better.

If the team is change averse, it’s because they’re risk averse. Likely with good reason; ask for anecdotes to figure out where it comes from. They need to see that risk can be reduced and that execs can be reasonable.

You need the buy-in, both from the execs and the team. Things will go slightly slower in the beginning and it’s worth it. Only you can sell this. The metaphor of “paying off technical debt” is useful here since interest is sky high and you want to bring it under control.


Before anything else, getting buy-in for any kind of major change from the execs is key. Explain the situation and the effects. Have everything in writing, complete with date and signatures. Push back hard every time this commitment gets sabotaged because something is supposedly on fire. Get a guaranteed budget for external trainings and workshops, again in writing. Then talk to the team.

If you cannot get those commitments in writing, or later on get ignored multiple times: run. Your energy and sanity are better spent elsewhere. No need to fight an uphill battle alone – and for what? The company just revealed itself for what it is and you have no future there.

First I’d do that, then think about the engineering part.


To be fair if I was an exec at a company and the new IT lead wants me to commit, in writing, to XYZ, I’d not keep them around long. You can’t run a company on that kind of deep mistrust.

Nothing in the OP suggests abusive management. Incompetence, maybe, but I see no reason to assume that they’ll backtrack on agreements, and a new management hire who immediately starts sowing mistrust is not someone I’d trust to get things to a higher level.


Clearly you've been in a very positive bubble. I envy you but that's not an experience shared by many.

As a contract programmer and a guy who sometimes gets called in to save small businesses from stalled development (it has happened 6 times in my 20-year career), I'm absolutely not even opening my laptop anymore before I see a written commitment from execs (email is enough; I tag/label those and make sure I can easily find them in the future).

The reasons are extremely simple and self-defensive in nature: execs can and do backtrack from agreements all the time. By the time we arrive at an oral agreement they have made 20 other invisible assumptions they never told me about, and when one of them turns out not to be true (example: they thought you can get onboarded in 2 days into a system with 5000+ source files and be productive as a full-blown team member on day #3) they start backtracking faster than you can say "that's not professional".

I don't dispute your positive experience. But please be aware that it's not the norm. Most execs out there treat programmers as slaves with big salaries and nothing more, and we get exactly the treatment you might expect when they have that mindset.

Sorry not sorry but I have to save my own arse first; I've been bound to extremely awful contracts when I was much younger and stupider and I am not allowing that ever again.

I can single-handedly make a business succeed with technology, and I have done so. I am not staying anywhere where execs hand-wave everything away with "should be simple and quick, right? k thx bye".


Thanks, that’s what I was aiming for. It’s kind of a litmus test for what kind of professionalism you can expect in a place – if any. Especially when they have shown prior incompetence, as in OP’s example.

In all honesty, given that example, if I didn’t get immediate buy-in, I’d throw the towel right then. Over 15 years of experience show that train wrecks only ever get fixed when they are recognized as such from the start.


> You can’t run a company on that kind of deep mistrust.

Trust has to be earned in some ways (but you can expect some base level). But I want to argue another point: as an exec, you can use this kind of writing to also get commitment from the team, to balance things out. But ofc for that there needs to be a fair discussion of priorities, and once you have that, there is usually no reason to contractify the outcome.


>if I was an exec at a company and the new IT lead wants me to commit, in writing, to XYZ, I’d not keep them around long. You can’t run a company on that kind of deep mistrust.

Emails are writing, if you're imagining the IT lead walking in with a paper contract I see why you would say that.


That's essentially what the GP was implying, "Have everything in writing, complete with date and signatures."


Nowhere were contracts mentioned. A proper proposal, for example, always has a date, and signing it if agreed to is just professional conduct. I’d be wary of any exec not willing to do that. Instant red flag.


That’s what a proposal is too; it’s not necessarily a demand.


That's fair. I've worked at more established places with formal design doc/RFC and sign off processes and it can work well.

After reading the description of the SOP at this shop, the idea that the OP would be able to introduce an additional layer of process requiring multiple stakeholders and management seemed like a bridge too far in my mind :).


Do leads not write proposals or RFCs? I’m not sure why you wouldn’t keep them around long if they laid out their plans in a clear way, and then pitched it to others


"Have everything in writing, complete with date and signatures"

It is possible that the executives won't take well to all of the formality here (writing and signatures). How would you convince them that this is necessary?


"Have everything in writing" is a bad mindset and is not going to save you.

Executives are looking at you as the expert to deliver a good outcome. Which means making good decisions, managing expectations and keeping everyone in the loop.

Generally, if it gets to the point of having to dig up who signed off on what, you've already failed. Often you won't even get the chance to dig up those emails, because delivering a bad outcome is enough for execs to write you off without even needing to hear your excuses.


> because delivering a bad outcome is enough for execs to write you off without even needing to hear your excuses.

What makes you think they are excuses? Constantly chasing moving targets and not having even one of them agreed upon in writing is heaven for bad execs. I've seen it happen a good number of times, and so have my colleagues.

I don't view the "you changed requirements 20 times the last month and I can't keep up with your impossible imagined schedule" statement as an excuse.


If the goal is to remove bad execs, then a document trail can help, although I'd suggest starting with some statistics like "over the last 3 months, we moved the goalpost 8 times, which led to an effective throughput of 4 weeks of work being done rather than the expected 12 weeks. How do you think we could improve these conditions?" Collaboration first.

Keeping email threads for reference is probably plenty of data, btw; "signatures" sounds like the wrong approach. Maybe even just summarize the direction given in a wiki document with a change log with time stamps and requesting person, which you can review once in a while, and the sheer length of it might be enough to bring the point across.


Thank you -- good advice to put collaboration first. I sometimes have a problem that I assume the worst right away. But I've met some true villains in my life and career so maybe that's why. I'll do my best to implement your advice.

> and the sheer length of it might be enough to bring the point across.

This one sadly hasn't been true -- I tried it but I get blank stares and sometimes grumbling about making people read long stuff that I can just summarize to them. Maybe there's a way out of this conundrum as well.


Your job is to deliver what the execs consider to be a good outcome.

That includes helping the stakeholders come up with a stable set of requirements. Most of the time when teams are dealing with a lot of requirements change, it's because they never captured the true requirements which usually change at a much slower rate.

Secondly, your job is also to manage expectations, so that execs know what the impact of any changes will be when they request them.

Changes aren't an excuse to deliver late or over budget. These parameters are flexible and new targets should have been agreed when the requirements change was requested.

Execs will usually assess your performance without discussion. There is no venue to bring your cache of documents to prove your innocence after the fact.


We all know the ideal theory. I am talking about execs who constantly change requirements, refuse to sign off on any stable requirements, think everything is "quick and easy", and take offense when you try to manage their expectations.

Reasonable people I easily work with. It's the rest who are the problem.


Sounds like you haven't worked in an environment where this happens. You get regarded as 'the expert to deliver a good outcome', sure. But you're ALSO expected to deliver an aggressive roadmap of a whole load of other stuff that people already committed to. Something's got to give.


Dates and signatures are theatrical overkill.

I've yet to work at a place where meeting minutes, sent out to all attendees post-meeting, aren't sufficient for the same purpose (ass covering & continued adherence to The Plan as originally agreed).

I'm sure signature and date places do exist... but, I'd probably be looking for a new job if I worked at one.


The dates and signatures bit is nonsense, but it does help to have things in writing to ensure everyone's on the same page. That just means that when you're discussing things not in writing, you send a written follow up to everyone that's involved immediately afterwards. If it's a meeting, take detailed notes and send them around afterwards. If it's a one on one conversation, just send a follow up email that says something like, "Hi x, I just wanted to memorialize our conversation - here are the main notes that I took. Please let me know if any of this sounds off to you. Thank you."

That doesn't preclude them from not reading that email and later telling you they said something completely different, but at that point you should probably be heading for the door anyway.


Having stuff in writing is essential, for accountability on all sides. The exact format does not matter, neither does what passes as a signature in a company. My example was for broadest possible applicability. The point is the willingness to commit to something in writing and to take the time to reflect on the implications of doing so. If you cannot get that, you’ve already lost. There will be moving targets.

It’s interesting to see how all responses focus on the signature part as problematic due to its supposed formality. Is this an American work culture thing? I see signing off on an agreement as a signal of professional conduct and reliability.


> But this often fails because when Friday comes, something is on fire and management asks to please quickly squeeze this one thing in first.

There's a solution to this problem: nothing goes live on Fridays.

> and making the code touched by that change better.

Getting buy-in from management on this always appeared to me as weird. The alternative is a codebase that can only ever get worse over time. So you either gotta gold plate everything, which will take way longer than allowing for some after-the-fact improvement as needed, or your codebase turns into a pile of shit very quickly and your velocity grinds to a halt very quickly.


> Getting buy-in from management on this always appeared to me as weird. The alternative is a codebase that can only ever get worse over time.

Well that's just the thing: they have no notion of a "bad code base". To them that's an excuse and a negotiation leverage by the programmer to ask for more money. They judge others by themselves I guess.


It just feels like an amateur hour thing.

If my plumber came to me to ask if he can just dry assemble the pipes and leave them that way I'm gonna get a new plumber.


That’s assuming you know something about plumbing. If you don’t, you’ll just nod your head and say ok, that sounds good. The same thing is happening in these businesses. The business owners generally don’t know programming. Terms like “refactoring” mean nothing to them at best and sounds like “rewrite from scratch” at worst.


It's scary out there, man. A lot of people in HN judge by US companies and startups but I've only been in that bubble once for a few months and the rest of my 20 years of career has been everywhere else. And it's insanely bad in many places.


Do not waste time with a company that is going to collapse unless they are willing to do whatever it takes.


They are going to collapse making 20M a year, sure.


Revenue is not profit


Not the team’s manager. OP says so in the first line.


Yeah, there's a process. It's something that I've done a bunch of times for a bunch of clients.

There's so much low-hanging fruit there that's so easy to fix _right now_. No version control? Good news! `git init` is free! PHPCS/PHP-CS-fixer can normalise a lot, and is generally pretty safe (especially when you have git now). Yeah, it's overwhelming, but OP said that the software is already making millions - you don't wanna fuck with that.
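For the formatting side, a minimal PHP-CS-Fixer config might be a reasonable starting point. This is just an illustrative sketch (the file name and rule choices are generic defaults, not something tuned to OP's code), and you'd want a dry run first:

    <?php
    // .php-cs-fixer.php -- hypothetical baseline config.
    // Review with "php-cs-fixer fix --dry-run --diff" before letting it
    // touch anything, and only after the codebase is in git.
    $finder = PhpCsFixer\Finder::create()
        ->in(__DIR__)
        ->name('*.php')
        ->exclude('vendor');

    return (new PhpCsFixer\Config())
        ->setRiskyAllowed(false)              // skip rules that could change behaviour
        ->setRules([
            '@PSR12' => true,                 // whitespace/brace normalisation
            'array_syntax' => ['syntax' => 'short'],
        ])
        ->setFinder($finder);

Because every change is now a git diff, anything the fixer does can be reviewed and reverted.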

I've done it, I've written about it, I've given conference talks about it. The real bonus for OP is that the team is small, so there are only a few people to fight over it. It's pretty easy to show how things will be better, but remember that the team are going to resist deleting code not because they're unaware that it's bad, but because they are afraid to jeopardise whatever stability they've found.


Personally, I would never run a linter of any kind on a full codebase that doesn't have tests. After having been bitten by all kinds of bugs over the years, I wouldn't suggest auto-linting any file that you aren't actively working on.

It's rare that linting will actually make the code work better. Granted, it could catch some security bugs. But they can - and will - introduce new bugs. You just have to ask if it's worth the risk.


This. It's so tempting when a linter warns "This code is misleading; it would be clearer to do it this other way" to think "Easy fix: change it the way the linter suggests." But, make the change, and you may discover (hopefully before delivery) that the code functionality depends on the confusing behavior.


And also, starting by fixing the JS/CSS/HTML front end is likely the safest, as it won't corrupt any customer data & it will be visible when something breaks. That can probably be the next best candidate for a major overhaul. I'd also hope that a $20M/year project can afford to hire someone senior in addition to these 3 juniors?


> hope that a $20M/year project can afford to hire someone senior

Never underestimate the ability of management to look a gift horse in the mouth while shooting it in the foot.


why would someone senior even want to join this team? Especially someone senior enough to fix this. The productivity is horrible and there's no kudos for fixing something that's lived for 12 years like this.


I was added to a team because, to quote the VP, "they're good but they need some adult supervision"

Mixing skill levels in a team is healthy.


> why would someone senior even want to join this team?

Well, money. Why would someone even want to join any team?


Theoretically a company that's making $20m/year on this can afford to make it worth someone's while to come in and fix it. The problem isn't finding someone who will do it, it's that the company assumes they can continue to get by indefinitely on paying too little.


For the love of refactoring.


git init seems like job #1 because at least then you can delete every commented out line and start a little cleaner.


to lose all the comments? :D that would make it even harder to read


In a project without version control (or one that doesn't trust it enough) there are always whole sub-programs made up of dead code. It's usually some combination of commented-out blocks and functions that are only called from within those commented-out blocks. Removing commented code (not real, descriptive comments) is the first step to eliminating all this dead code, and eliminating dead code buys a ton more flexibility in what you can change safely.
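A crude first pass can even be scripted. Here's a throwaway sketch (an entirely hypothetical helper, not anything battle-tested) that lists functions defined but never referenced anywhere else; it can't see dynamic calls, so its output is only a list of candidates to review by hand:

    <?php
    // find_unused_functions.php -- crude sketch: inventory functions that are
    // defined but never referenced elsewhere. Dynamic calls (variable functions,
    // call_user_func, etc.) are invisible to it, so treat results as candidates.
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($argv[1] ?? '.')
    );

    $defined = [];   // function name => file where it is defined
    $source  = '';   // all code concatenated, for a cheap usage count

    foreach ($files as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            $code = file_get_contents($file->getPathname());
            $source .= $code;
            $tokens = token_get_all($code);
            for ($i = 0; $i < count($tokens) - 2; $i++) {
                if (is_array($tokens[$i]) && $tokens[$i][0] === T_FUNCTION
                    && is_array($tokens[$i + 2]) && $tokens[$i + 2][0] === T_STRING) {
                    $defined[$tokens[$i + 2][1]] = $file->getPathname();
                }
            }
        }
    }

    foreach ($defined as $name => $where) {
        // A function that is actually used appears more than once (definition + call).
        if (substr_count($source, $name) <= 1) {
            echo "possibly unused: $name (defined in $where)\n";
        }
    }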


Fully agreed. I was tasked with using an old library and my first order of business was to make an analysis of dead code branches. The git commit removed 17 out of 80 files and about 10-11% of the code in some other files (that were not deleted), and the library works 100% the same -- confirmed by tests that I painstakingly added over the last few weeks.

Less code, less confusion.


> It doesn't work.

That's simply not true. I've inherited something just as bad as this. We did a full rewrite and it was quite successful and the company went on to triple the revenue.

> get some testing in place

Writing tests for something that is already not functional, will be a waste of time. How do you fix the things that the tests prove are broken? It is better to spend the time figuring out what all the features are, document them and then rewrite, with tests.


The problem with people new to the company starting a rewrite from scratch is that they often are poorly informed on why things were the way they were before. If you start big, you can have bad outcomes where the new system might be objectively worse than the old one... but you are stuck trying to get the new thing out for the next 5 years because too many people sunk too much political capital into it.

As an example, I worked at an ad-tech startup that swapped its tech team out when it had ~100 million in revenue (via acqui-hire shenanigans). The new tech team immediately committed to rewriting the code base into ruby micro-services and were struck by strange old tech decisions like "why does our tracking pixel return a purple image?". The team went so far as to stop anyone from committing to the main service for several years in a vain attempt to speed up the rewrite/architecture migration.

These refactors inevitably failed to produce a meaningful impact to revenue; as a matter of fact, the company's revenue had begun to decline. The company eventually did another house cleaning on the tech team and had some minor future successes - but this whole adventure effectively cost their entire Series D round along with 3 years of product development.


You're making a silent assumption that the original team is well informed about why things are the way they are and that they know what they are doing. I think that is not always the case.

I was on a project once where the mess in the original system was the result of the original team not knowing what they were doing and just doing permutation-based programming - applying random changes until it kinda worked. The situation was very similar to the one described by the OP. They even chose J2EE just because the CTO heard other companies were using it, despite not having a single engineer who knew J2EE. Overall, after a year of development the original system barely even worked (it required manual intervention a few times per day to keep running!), and even an imperfect rewrite done by a student was already better after 2 weeks of coding.

So I believe the level you're starting the rewrite from is quite an important factor.

Then of course there is a whole world of difference between "They don't know what they are doing" vs "I don't like their tech stack and want to master <insert a shiny new toy here>". The former can be recognized objectively by:

- very high amount of broken functionality

- abysmal pace at which new features are added


The original team may not have been the best at the task, but they still managed to deliver 100 MM in revenue. Sometimes the things they leave behind/ignore simply don’t matter to the business/useful tech.

Particular to ad tech, the lifespan of any particular piece of software is lower than you’d expect (unless you’re Google/Facebook). Technology that pays out big one year will become pretty meh within 3 years. In the case above I’d argue that the new tech team didn’t really understand this dynamic, and so they focused on the wrong things, such as rewriting functionality that didn’t matter for the future, or making big bets on aspects of the product which were irrelevant.

To the OP, we don’t know that the lifespan of any of these php files is greater than an individual contract. If the business can be modeled as "solve a contract by putting a php file on prod", rewriting may be entirely worthless, as the code can be “write once, read never”.


Revenue is a crazy kpi for technical excellence. You should never let a high revenue rely on extremely bad code.


We (my good friend and I, who both have 20+ years of experience) were brought in specifically to do the rewrite. We were new to the company. We actually had to rebuild the entire IT department while we were at it as well.

> new tech team immediately committed to rewriting the code base into ruby micro-services

well... sigh.

> These refactors inevitably failed to produce a meaningful impact to revenue

It sounds like less about the refactor itself and more about the skills of the team doing the refactor. You certainly can't expect a refactor to go well if the team makes poor decisions to begin with.


> We were brought in specifically to do the rewrite.

That's the key difference. The stakeholders should always be in on the rewrite.


> It sounds like less about the refactor itself and more about the skills of the team doing the refactor. You certainly can't expect a refactor to go well if the team makes poor decisions to begin with.

This has been my biggest struggle with rewrites where I’m currently working. We have several large, messy old codebases that everyone agrees “needs a rewrite” to (1) correct for all the early assumptions in business needs that turned out wrong, (2) deal with old PHP code that is very prone to breakage with every major new PHP version released, and (3) add much needed architectural patterns specific to our needs.

I’ve seen rewrites of portions of the project work when they involve myself and one other mid-level dev who has a grasp on solid sw engineering practices, but when the rest of the (more senior) team get involved on the bigger “full rewrite”, they end up quickly making all the same mistakes that led to the previous project being the mess that it is.

Sure, it will be using fancy new PHP 8 features, and our Laravel framework will force some level of dependency injection, but then you start seeing giant God classes being injected over here and duplicated code copy-pasted over there, all done by “senior” devs you feel you can’t question too strongly.

To that end, an open and collaborative culture in which you start the rewrite with some agreed upon principles, group code reviews and egos kept in check, are all necessary for this to work.


You have great experience and did a great job indeed. My only question is how does one get 20 years of such experience without horrific flashbacks of “let’s just rewrite it” decisions. Do you do rewrites/redesigns often? What’s your success rate?


I've done what I would consider as four rewrites that I can remember as large events in my life (although not fully what you'd expect). But all are good stories in my opinion.

First one was the above example. It was for the largest hardcore porn company on the planet. Myself and my good friend Jeff rebuilt an already successful business IT department from the ground up and made it even more successful. Ever heard of 'the armory in sf'?

Second was that jeff and I were hired as contractors by Rob @ Pivotal Labs (ceo) to help the CloudFoundry team rewrite itself after he had bought the team and trimmed it down to only the good people. That one was a huge mess. We spent a lot of time deep in shitty ruby code using print statements trying to figure out what 'type' of an object something was and, of course, backfilling tests. It was a fun project and both Jeff and I learned the Pivotal way, which was probably the most enlightening thing I had ever learned about how to develop software correctly from a PM perspective. If you want to improve your skills beyond just slinging code, spend some time figuring their methodology out. Much of it is documented in Pivotal Tracker help documentation and blog posts.

Third one was not really a rewrite, but the original two founders, who were not technical, had tried to hire a guy and got burned because the guy couldn't finish the job. Sadly, they had already paid the person a ton of money and got really nothing functional out of it. We (jeff and I again!) just started over. We did a MVP in 3 months (to the exact date, because we both know how to write stories using pivotal tracker and do proper estimates) and ended up doing $80m in revenue, in our first year with an initial team of 5 people.

Fourth one was three guys (who were also not technical) I kind of randomly met after I moved to Vietnam. They were deploying litecoin asic miners into a giant Vietnamese military telco (technically, they are all military). They had hired another guy to do the IT work and he was messing it all up. They invited me out to help install machines, I came out, rebuilt their networking layout and then proceeded to eventually fix their machines because the software 'firmware' that was on them was horrible. I also added monitoring with prometheus so that I could 'see' issues with all these machines. That first day on the job, they fired the other guy and made me CTO. We ended up deploying in another datacenter as well. It was a really wild experience with a ton more stories.

Life has been, um, interesting. Thanks for reading this far.


Please tell me that you've retired now due to your incredible billing rates and track record of success.


Not everything has been a success. For example, unless you're stupid rich and can afford years of losses, never start/own a night club or you might end up working for the rest of your life to pay off your debts.


The problem is that most developers are crap and self-centered, focused on working with the tech they like.

You need to work with someone who doesn't care about filling up their CV with "ruby microservices" and get stuff done.

If I went into a business to do a rewrite and decided to use $shinyNewTech because I want to build up rust experience I'd probably end up wasting years with little results.


The existing app was a large rails monolith. This wasn’t a small 10 person team but a 50 person org. Groups can get funny ideas sometimes.


> why does our tracking pixel return a purple image?

Now I'm really curious, is there some exciting non-obvious reason for a tracking pixel to be purple? Was it #FF00FF or more like #6600DD?


This definitely needs an answer.

In fact, until OP can give us the right answer, we immediately need even wrong answers!

You reading this. Yes, you. Give your best wrong answer below.


My best wrong answer is that there were different colored pixels for different front-end versions, and the app had some radically different responses depending on the version. Maybe MENA would return white, SE Asia green, people who signed up during a sale would return blue, whatever. After a while, the other pixels were removed and only one shade of purple was used for everyone, but the code for processing them was not removed. So now, if the tracking pixel is not a precise shade of purple, some unexpected shenanigans ensue.


I worked on an app once that used two different, equally ancient libraries to a) generate thumbnails and b) create a png from a pdf. While modifying part of this process I started realizing that there were conditions where you'd get a PDF thumbnail at the end, but its output had a red tint to it.

Input looked fine and invoking each step manually worked fine as well.

Come to find out that certain PDFs contained color calibration information that, combined with how we were calling it, would treat ARGB as RGB. The input would have transparency info defined and the thumbnail generator would happily repurpose the alpha channel as the red channel instead.


The tracking pixel was made by scaling the company logo down to a 1x1 image.


That's brilliant! That way nobody could accuse you of spying. "It's just our logo. What's all the fuss about?"


Obviously it's because mauve has the most RAM.


Page background where pixel displayed was purple


Accessibility. Protanopia affects cones perceiving red color.


Obviously !! The anti-doppler shift trick /s :)


I did a rewrite of a 30 year old bit of perl/php2 over the last year. Not knowing why things were the way they were was really useful for the younger team members and me to get familiar with the codebase and the business context.


Anecdotal: I asked people why they keep incorrectly using jQuery methods and producing ambiguous, difficult-to-maintain code in the year 2022 (we still have jQuery as a dependency for legacy code). The response was that they were not aware that native counterparts like document.querySelectorAll exist in the browser. They just copied the old jQuery code, modified it, and it worked.

I am pretty sure this kind of thing exists in any large legacy codebase.


You don't need comprehensive tests for tests to start delivering value.

Figure out the single most important flow in the application - user registration and checkout in an e-commerce app, for example.

Write an automated end-to-end test for that. You could go with full browser automation using something like Playwright, or you could use code that exercises HTTP endpoints without browser automation. Either is fine.

Get those running in GitHub Actions (after setting up the git scraping trick I described here: https://news.ycombinator.com/item?id=32884305 )

The value provided here is immense. You now have an early warning system for when someone breaks the flow that makes the money!

You also now have the beginnings of a larger test suite. Adding tests to an existing test suite is massively easier than starting a new test suite from scratch.


You're assuming the existing flow is working perfectly and I agree with you that testing is a godsend. I constantly yell that testing is great. Heck, I even worked for Pivotal Labs that does TDD and pair development, and loved it.

Let's say you start to write tests and start to see issues crop up. Now what? How do you fix those things?

Github actions!? They don't even have source control to begin with. There are so many steps necessary to just get to that point, why bother?

If the existing code base already has extremely slow movement and people are unwilling to touch anything for fear of breaking it... you're never going to get past that. Let's say you do even fix that one thing... how do you know it isn't breaking something else?

It is a rat's nest of compounding issues and all you are doing is putting a bandaid on a gushing open wound. Time to bring in a couple talented developers and start over. Define the MVP that does what they've learned their customers actually need from their 'v1' and go from there. Focus on adding features (with tests) instead of trying to repair a car that doesn't pass the smog test.


> Let's say you start to write tests and start to see issues crop up. Now what? How do you fix those things?

I assumed the tests wouldn't be for correctness, but for compatibility. If issues crop up, you reproduce the issues exactly in the rewrite until you can prove no one depends on them (Chesterton's fence and all).

The backwards-compatibility-at-all-costs approach makes sense if the product has downstream integrations that depend on the current interface. If your product is self-contained, then you're free to take the clean slate approach.


> I assumed the tests wouldn't be for correctness, but for compatibility.

You're assuming that the people coming in to write these tests can even make that distinction. How do you even know what the compatibility should be without really diving deep into the code itself? Given how screwed up the codebase already is, it could be that multiple layers of things work against each other. OP mentioned multiple versions of jquery on the same page as an example.

Writing tests for something like that is really a waste of time. Better to just figure out what's correct and rewrite correct code. Then write tests for that correct code... that's what moves things forward.


> How do you even know what the compatibility should be without really diving deep into the code itself?

You can pretty much black-box the code and only deep dive when there are differences. Here's what I've done in the past for a rewrite of an over-the-network service:

1. Grab happy-path results from prod (wireshark pcap, HTTP Archive, etc), write end-to-end tests based on these to enable development-time tests that would catch the most blatant of regressions.

2. Add a lot of logging to the old system and in corresponding places in the new system. If you have sufficient volumes, you can compare statistical anomalies between the 2 systems

3. Get production traffic from a port mirror, and compare the response of your rewritten service against the old service one route at a time. Log any discrepancies and fix them before going live; this is how you catch hard-to-test compat issues (see the sketch after this list)

4. Optionally perform phased roll out, with option to roll back

5. Monitor roll out for an acceptable period, if successful, delete old code/route and move to the next one.

The above makes sense when backwards compatibility is absolutely necessary. However, the upside is that once you've set up the tooling and the processes, subsequent changes are faster.
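A stripped-down sketch of the step-3 comparison (the hosts, the routes file, and the straight body diff are hypothetical simplifications; replayed production traffic needs normalisation of timestamps, session tokens, etc. before comparing):

    <?php
    // compare_routes.php -- hypothetical sketch: replay a list of routes against
    // the legacy host and the rewritten host, and flag any mismatch.
    const OLD_HOST = 'https://legacy.example.com';  // hypothetical
    const NEW_HOST = 'https://rewrite.example.com'; // hypothetical

    function fetch(string $host, string $route): array
    {
        $ch = curl_init($host . $route);
        curl_setopt_array($ch, [CURLOPT_RETURNTRANSFER => true]);
        $body = curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        return [$status, $body];
    }

    $routes = file('routes.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

    foreach ($routes as $route) {
        [$oldStatus, $oldBody] = fetch(OLD_HOST, $route);
        [$newStatus, $newBody] = fetch(NEW_HOST, $route);

        if ($oldStatus !== $newStatus) {
            echo "STATUS MISMATCH $route: $oldStatus vs $newStatus\n";
        } elseif ($oldBody !== $newBody) {
            // In practice, normalise dynamic content before diffing.
            echo "BODY MISMATCH $route\n";
        }
    }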


All of that, while technically correct and possible, is vastly more complicated and time intensive than a rewrite of the codebase the OP describes.


Yes, it absolutely is - but the trade off is a far lower risk of introducing breaking changes. Depending on the industry/market/clients - it may be the right tradeoff


In my eyes, a rewrite won't be introducing breaking changes. It would be to figure out what functionality makes money, then replicate that functionality as best as possible so that the company can continue to make money as well as build upon the product to make even more money.

We're talking about a webapp here, not rocket science.


The biggest problem isn't even the codebase in this situation.

When you keep finding bugs like that while refactoring and making things better, it will demoralise you. The productivity will stop when that happens.

It also requires above-average engineers to fix the mess and own it, for which there is not much benefit.

Your refactoring broke things? Now it's your turn to fix it and also ship your deliverables which you were originally hired for. Get paged for things that weren't your problem.

If I was a manager and assigned this kind of refactoring work, I will attach a significant bonus otherwise I know my engineers will start thinking of switching to other places unless we pay big tech salaries.

People keep quoting Joel's post about why refactoring is better than rewrite but if your refactor is essentially a rewrite and your team is small or inexperienced - it's not clear which is better.

Parallel construction and slowly replacing things is a lot of unpaid work. Just the sheer complexity of doing it bit by bit for each piece is untenable for a 3 person team where most likely other two might not want to get into it.


> It also requires above-average engineers to fix the mess and own it, for which there is not much benefit.

That's not true, it doesn't require above-average engineers. It requires a tech lead that has the desire and backing to make a change, and engineers willing to listen and change. It doesn't require a 10x engineer to start using version control, or to tell their team to start using version control, for example.


Source control seems like a straightforward first step, regardless of what approach is going to be taken going forward


One would think, but how do you go from source control to deployment on the production server though? If they were editing files on the server directly, there could be a whole mess of symlinks and whatever else on there. Even worse, how do you even test things to see if you break anything?

It is a can of worms.


Just start somewhere. These guys are making changes, actual functional changes and bug fixes, in that environment, meaning they already have all the problems you imagine are going to get in the way of fixing this mess. So stop fretting and just start small with one tiny thing. It doesn't really matter with what. You don't even need automated tests necessarily. Is it a small, simple flow that takes 10 minutes of manual test steps? Write them down and do it manually, I don't care. Just do it.

Been there, done that. Slightly different in that they had a test server and a prod server. So already better, except one day I made a change and copied it to prod. Yes, it was manual. Just scp the files over to prod. And stuff broke. Turned out someone had fixed a bug directly in prod but never made the change on the test server.

First thing I did was to introduce version control and create a script to make deployment automatic, meaning it was just a version control update on prod (also scripting languages here). Magically we never had an issue with bugs reappearing after that.

Pretty simple change and you can go from there.

The above code base was over 20 years old and made use of various different scripting languages and technologies, including some of the business logic being in stored procedures. Zero test coverage anywhere. You just 'hide' small incremental changes to make things better in everything you do. Gotta touch this part because they want a change? Well, it could break anyhow, so make it better, and if it breaks, it breaks and you fix it. It needs judgment though. Don't rewrite an entire module when the ask was adding a field somewhere. Make it proportional to the change you need to make, and sometimes it's not going to be worth it to make something better. Just leave it.


Not sure the little hammer will fix much. And making folks use a method in new code pisses them off. "You say it's important I do this your way this time, even though there are 1000 examples of doing it the other way. I feel persecuted and your way is pointless, because it doesn't fix everything anyway. And it's slowing me down and making me look bad."

Not rational but folks don't have to explain their feelings. You will be hated.


The little hammer definitely fixes things. It does it the same way water cut the Grand Canyon. The beauty is that it works over time.

Now as for how to get the other devs on board, I agree with you that you can't just barge in and tell them everything they are doing is wrong etc. I never said to do that and I'm replying to a specific comment in the thread not the original Ask HN.

I.e. when I write about what I've done in the past, I got buy-in from my boss and my colleagues on what I was going to do. But I didn't just sit there and keep doing what they had done over the past years. I changed lots of other little things too in the same manner.

So if we do want to talk about the original Ask HN and how to get the existing employees not to hate you, you can start by letting them tell you about what they think the problems are. What are their pain points. They might just not know what to do about them but actually see them as problems too. Maybe they've tried things already but failed or got shot down by others in the company. Maybe they did try to introduce version control but their non tech boss shot them down.

Of course it may not work out. Some people really are just stupid and won't listen even if you try to help them and make them part of the solution.


Startups have runway and can die when big-company processes are forced upon them. It can sink them.


I'm not sure where you're pulling that from. There's no mention of startup here. Neither in the original (actually the opposite I'd say, 12 years and just a business unit).

None of what I said is a big-company process in any way. If in your book using source control is a big-company process that will sink a startup, then be my guest and I will just hope we never have to work together. Source control is a no-brainer that I even use just for myself; I have used it in teams of two and in teams of dozens to hundreds. The amount of process around it is what scales with the kind of company. Source control is useful by itself in every single size of company.


Source control is necessary and simple, yes.

Code review, coding standards, required tests for everything, multiple stages of deployment - are not simple and can stall development. Done wrong they can sink a company.

It's easy to read the worst possible construction on what other people write here. It's never a good idea.

Btw I worked at a startup for 8 years. It was still a startup, depending on new investment to meet the monthly. In any case the described dev group was behaving in a way that used to be typical of startups. And even business units in larger organizations have runway.


Yeah, a lot of worms... and if things break while refactoring, you are on the hook for scanning through that complex monster at 3 am, finding the issue and fixing it, for no additional pay in most cases.


They can literally copy the whole directory from their local machine to production as a first step for all I care.

How do they test things on production? If there’s a bug how do they revert to the previous version? There are way more issues without source control than with.


Doesn't Git support symlinks? Empty directories could be trouble though. One would have to put a .GITKEEP into every directory before checkin, and a step at deployment time to remove them again.


"Github actions!? They don't even have source control to begin with."

Right: no point in adding any tests until you've got source control in place. Hence my suggestion for a shortcut to doing that here: https://news.ycombinator.com/item?id=32884305


How do 2 junior devs manage to rewrite the entire product while also meeting the ongoing goals of the business?

You're trying to spec features on a moving target.

Even if they were able to do 50% time on the rewrite you'll never actually get to feature parity.

The only viable plan, unless the company has an appetite to triple the dev headcount, is to set an expectation that features will have an increased dev time, then as you spec new features you also spec out how far you will go into the codebase refactoring what the new features touch.


But it is functional. Grandparent post is suggesting that all the currently used functionality should have tests written for it. It makes sense, as that way they can gather the requirements of a rewrite at the same time.


We don't know that it is functional... maybe the company is only making $20m and should be making $60m. Like I said, we tripled the revenue with a rewrite.

What we did was make the case that we could increase revenue by being able to add valuable features more easily/quickly. We started with a super MVP rewrite that kept the basic valuable features, launched, then spent the rest of our time adding features (with tests). Hugely successful.

The key, of course, will be to get 1-2 top notch developers in place to set things up correctly from the beginning. You're never going to be effective with a few jr's who don't have that level of experience.


> We don't know that it is functional... maybe the company is only making $20m and should be making $60m. Like I said, we tripled the revenue with a rewrite.

It's $20m functional. It's possible it could be better but unless this is the kind of huge org where 20m is nothing (doesn't sound like it) you really need the behaviors documented before you start screwing with it. It's very likely this thing has some pretty complex business logic that is absolutely critical to maintain.


> you really need the behaviors documented before you start screwing with it. It's very likely this thing has some pretty complex business logic that is absolutely critical to maintain.

Nothing I said suggested otherwise. Absolutely critical for whomever is doing a rewrite to understand everything they can about the application and the business, before writing a single line of code.


You sound frustrated that you've joined a company with an absolute stinker of a codebase, because you're confident you could deliver much better results having refactored it first. You're managing a group of people probably enormously under-productive because of the weight of the technical debt they're under. Every change takes months. It's riddled with hard-to-fix bugs. It's insecure. There are serious bus factor problems.

Many of us have been in this exact position before, multiple times. Many of us have seen somebody say "our only choice is a full rewrite" - some of us were the one making that decision. Many of us have seen that decision go disastrously wrong.

For me, the problem was my inability to do what I'm good at: write tests, write implementations that pass that test, etc. Every time I suggested doing something, somebody would have a reason why that would fail because of some unclear piece of the code. So rather than continuously getting blocked, I tried to step into my comfort zone of writing greenfield code. I built a working application that was a much nicer codebase, but it didn't match the original "spec" from customer expectations, so I spent months trying to adjust to that. I basically gave up managing the team because I was so busy writing the code. In the end, I left and the company threw away the rewritten code. They're still in business using the shitty old codebase, with the same development team working on it.

If you really want to do the rewrite, accept how massively risky and stressful it will be. The existing team will spend the whole time trying to prove you were wrong and they were right, so you need to get them to buy into that decision. You need to upskill them in order to apply the patterns you want. And you need to tease apart those bits of the codebase which are genuinely awful from those that for you are merely unfamiliar.

Personally, I would suggest a course for you like https://www.jbrains.ca/training/course/surviving-legacy-code, which gives you a wider range of patterns to apply to this problem.


Maybe this was meant as a reply to the main post?


“I won the lottery, you can too. If you don’t buy a ticket, you’re never gonna win right…?”

There is a lot of evidence rewrites are hard to do well, and especially prone to failure.

…you might pull it off, it’s not impossible, sure. …but are you seriously saying it’s the approach everyone should take because it worked for you once?

Here's my $0.02 of meaningless anecdotal evidence: I’ve done a rewrite twice; it was a disaster once and went fine the second time. 50% strike rate, for me, personally, on a team of 8.

What’s your rate? How big was your team, how big was the project? What was the budget? Did you do it on time and on budget? It’s pretty easy to say, oh yeah, I rewrote some piece of crap that was a few hundred lines in my spare time.

…but the OP is dealing with a poorly documented system that’s very big, and very important and basically working fine. You’re dishing out bad advice here, because you happened to get lucky once.

Poor form.

Good advice: play it safe, use boring technology and migrate things piece by piece.

Big, high risk high reward plays are for things you do when a) things are on fire, or b) the cost of failure is very low, or c) you’re prepared to walk away and get a new job when they don’t work out.


> How do you fix the things that the tests prove are broken?

Uhm. The tests don’t do any such thing.

> It is better to spend the time figuring out what all the features are, document them

Yes. And the tests you should write are executable documentation showing how things are. It is like taking a plaster cast of a fossil. You don’t go “I think this is how a brachiosaurus fibula should look” and then try to force the bones into that shape. You mould the plaster cast (your tests) to the shape of the fossil (the code running in production). Then if during excavation (the rewrite) something changes or gets jostled you will know immediately that it happened, because the cast (the tests) no longer fits.
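In old-style PHP, one way to take that plaster cast is a golden-master test: record the script's output once, then fail if any later change alters it. A hypothetical sketch (script name, fixture input and the PHPUnit setup are made up for illustration):

    <?php
    // Hypothetical golden-master ("plaster cast") test: record what the legacy
    // script outputs today, then fail whenever a change alters that output.
    use PHPUnit\Framework\TestCase;

    final class LegacyInvoicePageTest extends TestCase
    {
        public function testOutputMatchesRecordedSnapshot(): void
        {
            $_GET = ['invoice_id' => '42'];              // hypothetical fixed input

            ob_start();
            require __DIR__ . '/../legacy/invoice.php';  // hypothetical legacy script
            $actual = ob_get_clean();

            $snapshot = __DIR__ . '/snapshots/invoice_42.html';
            if (!is_dir(dirname($snapshot))) {
                mkdir(dirname($snapshot), 0777, true);
            }
            if (!file_exists($snapshot)) {
                // First run: mould the cast to the fossil.
                file_put_contents($snapshot, $actual);
                $this->markTestIncomplete('Snapshot recorded; re-run to verify.');
            }

            $this->assertStringEqualsFile($snapshot, $actual);
        }
    }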


> We did a full rewrite and it was quite successful and the company went on to triple the revenue.

Which sure beats some other company coming along and "rewriting" the same or similar functionality in a competing product and killing your own revenue. But it does come down to how big the codebase is and how long it would take for an MVP to be a realistic replacement. If there are parts that are complex but unlikely to need changing soon you can usually find ways to hide them behind some extra layer. Is there any reason you couldn't just introduce proper processes (source control, PRs, CI/CD etc.) around the existing code though?


Kudos to you for successfully delivering in a similar situation. That said, I think your advice is a bit cavalier. The industry is littered with the carcasses of failed rewrites. The fact that you have done it in one context does not mean that this team can pull it off in another.

I'll also say there's a lot of semantics at play here. What is a "rewrite", what is a "test" vs a "document", what is "functional"? I read your main point being that one should avoid sunk-cost fallacy and find the right places to cut bait and write off unsalvageable pieces. The art of major tech debt cleanup is how big of pieces can you bite off without overwhelming the team or breaking the product.


> Writing tests for something that is already not functional, will be a waste of time.

This is not TDD; it's writing tests to confirm the features that work now. Then, when you make changes, you can get an early warning if something starts going south.


Of course a full rewrite can be successful. This is the problem when people base their entire critical thinking on blog posts. They then go on to preach it everywhere as well!


The blog posts are warnings about what not to do. People, naturally, when they don't fully understand something or can't grasp its complexity, want to rebuild, because writing the code ourselves also helps us understand what we are building. But it's a trap: what you've rewritten will never be the same as before, and therein lie the footguns.

The blogs are plainly stating, "even though you feel you should rewrite, you probably shouldn't."


Or some of us have experienced failed rewrites. It can be a potentially expensive mistake.


> get some testing in place

What is really needed (and almost definitely doesn’t exist) is some kind of spec for the software.


Exactly. If they write tests, they will be just doing TDD where the specification becomes a problem in itself.


It is a 12 year old legacy product. What specification exists other than, "Yesterday it did X when I clicked the button, but now it does not do that anymore."


This is the point: I don’t TDD, but I am a big fan of tests. In this case the incorrect spec can be flagged, but all the other incorrect specs will also be there. If your fix doesn’t break a spec, great, but if it does you can check if that spec was correct. It’s a back and forth between code and business requirements.


You must have missed the part where it makes 20M revenue per year.

I gotta love hacker news, people who think the fact a backend is written in horrid PHP means it is "already not functional" while they spend their days learning something like Haskell that makes them negative revenue per year.


Who knows if that 20m revenue should be 60m? They could be held back greatly by the fact that the developers are not motivated to change anything.

I also don't know Haskell and have no desire to learn it. I prefer to build products in static compiled languages where I can more easily hire developers.


Yep.

It's also a juggling job from hell so keep a cool head and seek support and resources for what needs to be done.

A big first step is to duplicate and isolate, as much as possible, a "working copy" of the production working code.

You now need to maintain the production version, as requests go on, while also carving out feasible chunks to "replace" with better modules.

Obviously you work against the copy, test, test again, and then slide a replacement into the live production monolith... with bated breath and an "in case of fubar" plan in the wings.

If it's any consolation, and no, no it isn't, this scenario is surprisingly common in thriving businesses.


This approach is a trap.

Management needs to know that this needs a rewrite and a more capable team, and that pursuing an aggressive roadmap while things are this bad is impossible.

If they say no, and you try to muddle your way through it anyway, you are setting yourself up to fail.

If they say yes, ask for the extra resources necessary to incrementally rewrite. I would bring in new resources to do this with modern approaches and leave the existing team to support the shrinking legacy codebase.


Why would the existing team stick around knowing their jobs would be slowly rewritten into oblivion by others?


Where else are they going to go if they prefer this mess?

Why would they need to be replaced if they’re ultimately convinced to enter the 21st century?


Your suggestion sounds like the strangler fig pattern. While a valuable strategy in some cases, it does present the risk of duplicating poor architecture choices into the new code.

I would normally opt for your suggested approach too. However, based on the description given, I’d most likely recommend a complete rewrite in this case. The architecture appears to be quite poor and the risk of infecting new code with previous bad decision-making may be too great.


Yeah, I agree, a full rewrite from scratch is almost never the right approach. It puts you in a tunnel where you cannot add anything useful to production for months, you have no idea when you can finally ship the whole thing, and when you do, it will be very risky.

Do things progressively. Read the code, figure out the dependencies, find the leaves, and start by refactoring those. Add tests before changing anything so that you know when you change existing behaviors.

Figuring out such a codebase as a whole might be overwhelming, but remember that it probably looks much more complicated than it actually is.


In a team with only two people working on the monster it seems reasonable that they’d be able to manage two development streams at the same time.


This is the correct answer.

For an additional perspective see this classic: https://dhemery.com/articles/resistance_as_a_resource/


All good points, but…

This is a clear case where he needs to look for another job IMMEDIATELY.

Here’s why…

1. The problems listed are too technical and almost impossible to communicate to a non-technical audience meaning business or c-suite.

2. The fixes will not result (any time soon) in a change that’s meaningful to business like increased revenue or speed to market. Business will not reward you if you are successful or provide resources to get the job done unless the value is apparent to them (See #1).

Employment is a game. Winning that game means knowing when to get out and when to stay.

It’s time to plan your exit both for your own sanity and the good of your family.


+1. Also, start by adding git and getting a test env set up.

A new person who complains about the existing code and proposes "rewrite everything" in week one will not be met with __respect__.


+1. came here to say this! it's in prod, making money; bring up the discussion of full rewrite with the management at your own peril. learn to tame the beast by pruning one dead/redundant function at a time, that's the best you can do, both for the project and for yourself!


My first instinct was "get some testing in place" too. That served me well in recent projects where I was in a similar situation. I was wondering if anyone has any advice on how to make sure your tests are... comprehensive? I was fortunate enough to have full flow tests in place from the beginning and a great team which knew the intricacies of the subject matter. We made lists of usecases and then tried to find orthogonal test cases. But that was my naive approach wondering if there are better methods out there. Especially if there is zero testing.


One more thing I'd add: for the love of all that is holy, make sure the tests run lightning quick.

What you want to do first is reduce the cost and risk of making changes to as close to zero as possible.

Then, come up with a broad system design that defines higher levels of abstraction. Your goal is not to redesign the system from scratch but to specify the existing hierarchies which are currently implicit in the code. Are there different modules that naturally emerge? Ok, what are they?

Once you have a sense of what the destination will look like, make tiny changes to get just one module done. Move in little bits at a time, to build up evidence that things can work.

The way to change a culture is to set such a strong positive example that people naturally want to follow. Telling other people their work sucks is not that example, but first pitching in to speed up development cycles can make everyone happy.

And lastly you have at least some responsibility to inform management of the risk they aren’t aware of. Things will go much better for you if you tell your manager that the codebase was built in a way that makes future changes expensive and risky, and this is fine for where the business was but at some point it makes sense to invest in shifting the development velocity/risk curve of the business.


> The way to change a culture is to set such a strong positive example that people naturally want to follow. Telling other people their work sucks is not that example, but first pitching in to speed up development cycles can make everyone happy.

This is the part I'm having the most trouble with. What if you are at a place which is not software minded? Any tips on making them understand?


“Never rewrite” is a popular cargo-cult that sprang from a well known blog article that made the rounds some years ago. The urge to rewrite can be a naive impulse for sure, but there are LOTS of cases where new and better technology can result in tremendous gains, or where a code base is simply too far gone to redeem. The biggest successes of my career have almost all been ground up rewrites of existing products using new technology or techniques that resulted in orders-of-magnitude improvements in performance and ROI. If you can make incremental improvements that’s great, but sometimes it’s just not possible to rewrite “a piece at a time” because there are no pieces, just one big ball of mud. To the original author: If you don’t rewrite this mess, your competitors will. I’d say: lay out the case for an overhaul, stand your ground, don’t implement any new features until you’ve got a clear path to reducing technical debt, and if you can’t get buy-in to an overhaul just leave. What you’re describing sounds like a textbook scenario for burnout and there are lots of other opportunities where you can work on things in ways that you’ll actually enjoy.


This. So much.

I'd argue that the first order of business is getting the code committed to SCM. Then you can coach the team on new branches (features/bugs), and build the culture of using the SCM. Do this before going to the execs and giving the 10,000 meter view.

Go to the execs and get buy-in on the scope of what you need. I'd recommend articulating it in terms of risk reduction. You have a $20M revenue stream, and little control/testing over the machinery that generates it. You'll work on implementing a plan to get this under control (have an outline of this ready, and note that you need to assess more to fill in the details). You need space/time/resources to get this done.

Then get the testing in place. Make this part of the culture of SCM use. Reward the team for developing sanity/functionality tests. Get CI/CD going (a simple setup that just works). From this you can articulate (with the team's input) coding/testing standards to be adhered to.

After all this, start identifying the problematic low hanging fruit. Work each problem (have a clear problem statement, a limited scope for the problem, and a desired solution). You are not there to boil the ocean (rewrite the entire thing). You are there to make their engineering processes better, and move them to a more productive environment. Any low hanging fruit will have a specific need/risk attached to it. Like "we drop tables/columns regularly using user input." Based upon the culture you created with SCM/testing, you can have the team develop the tests for expected and various corner cases. From that, you can replace the low hanging fruit.
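To make that kind of low-hanging fruit concrete, here is a hedged sketch of one such fix: replacing string-built SQL with a PDO prepared statement. The table, column, and connection details are invented for the example; the point is the shape of the change, not the schema.

    <?php
    // Before (typical legacy style): request input concatenated straight into SQL.
    //   $rows = mysqli_query($db, "SELECT * FROM orders WHERE customer = '" . $_GET['c'] . "'");

    // After: same observable behavior, but parameterized, so user input can no
    // longer alter the query itself.
    $pdo  = new PDO('mysql:host=localhost;dbname=shop', 'app_user', 'secret');
    $stmt = $pdo->prepare('SELECT * FROM orders WHERE customer = :customer');
    $stmt->execute([':customer' => $_GET['c'] ?? '']);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

Each change of this size is easy to review, easy to test against the behavior you pinned down earlier, and easy to roll back.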

Keep doing that until the fruit is no longer low hanging. Prove to the execs that you can manage/solve problems. Once you have that done, you can make a longer term roadmap appeal, that is, start looking at what version 2.0 (or whatever number) would look like, and what internal bits you need to change to get there.

Basically, evolution, not revolution. Easier for execs under pressure to deliver results to swallow. Explain in terms of risks/costs and benefits, though in the near term, most of the focus sounds like it should be on risk reduction.


Not good advice. I have been in OP's shoes: I inherited a project that was a clusterf, and did a full rewrite. It was a lot of work (more than anticipated), but eventually it was very successful.

The original code was just not salvageable. (It was quickly done as a fast hack, and it would break left and right, causing outages).

The OP just needs to make sure they understand what the OG system is trying to do, and what it will take to rewrite it into something sane. Don't start before understanding all the caveats of the system/project you are trying to rewrite.


Do it in small pieces and you'll be there forever - it'll never get done.

Map out the functionality related to the (hard) requirements and kick off replacing the product(s) with something modern and boring.


> Also, respect the team. Maybe they aren't doing what you would, but they are keeping this beast alive, and probably have invaluable knowledge of how to do so. Don't come in pushing for change...

Yes, 3 people creating a revenue of $20 million/year is impressive.

But what if 1, let alone 2 of them quit and/or fall ill? That's way too much risk for this type of revenue.

If a new team member needs a year to just understand how the code is organized, then a well structured and documented rewrite certainly is necessary.


Something this messy is highly likely to have many security vulnerabilities. Maybe start with a scan or pentest and use that as additional justification to get things in order. 20M a year also means that this company can't afford for this application to be compromised.


The strangler pattern of rewriting individual pieces is also what leads to 3-4 incompatible versions of jQuery. You could start with one key page and rewrite it in React or whatever your preference is, but if you never manage to kill one of the old dependencies you are just making even more of a tangled web.

I would try to identify how entangled some of the dependencies are and start my rewrite with the goal of getting rid of them. But yeah, I agree that version control and testing are going to be key here, as any backsliding will probably result in the idea of future refactoring being viewed negatively.


This sounds like solid advice. A rewrite would be a world of hurt, particularly if you don’t have buy-in from the existing team.

Regarding the team, junior they may be, as he says, but they're rolling with a multi-million-dollar product. If they're keeping the product going and continuing to add business value, then they're doing something right. Their engineering practices might be questionable, but they seem to have a solid product.

However, getting testing in place is going to be a challenge. I’ve encountered systems that sound similar to this one (perfectly functional, zero discernible architecture, not remotely designed with any kind of testing in mind.) It’ll be difficult to convince the suits that introducing testing has any real value when you’re starting from zero.

The first thing that comes to mind is the strangler fig pattern. Sounds like a useful idea in this instance.

> …an alternative [to a re-write] is to gradually create a new system around the edges of the old, letting it grow slowly over several years until the old system is strangled.[0]

[0] https://martinfowler.com/bliki/StranglerFigApplication.html
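At the code level, a strangler front controller can be as small as this rough sketch (the paths, file names, and fallback are invented; in reality the fallback would be whatever the existing NGInX rewrites currently serve):

    <?php
    // front.php -- hypothetical entry point placed in front of the legacy scripts.
    // Routes that have already been rewritten go to the new code; everything else
    // falls through to the old behavior untouched.
    $migrated = [
        '/cart'     => __DIR__ . '/new/cart.php',
        '/checkout' => __DIR__ . '/new/checkout.php',
    ];

    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

    if (isset($migrated[$path])) {
        require $migrated[$path];               // new, tested code
    } else {
        require __DIR__ . '/legacy/index.php';  // old code, unchanged
    }

Every route added to $migrated shrinks the old system a little; once the map covers everything, the fallback is dead code.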


This is exactly the right advice. A full rewrite might look good on the resume but will be a late, error-prone disaster.

Start with tests - can't emphasize this enough.


> You can delete code as long as the tests pass.

It's true that poorly maintained code contains a lot of pieces which should be deleted, but if tests were added post hoc it is hard to be sure that they cover all use cases.

After adding basic tests, I would suggest improving logging to get a good understanding of how the software is used. It's better to store the logs in a database that allows quick queries over all the data you have (I'd personally use ClickHouse, but there are other options). But even with good logs you need to wait and collect enough data, otherwise you can miss rare but important use cases, e.g. something which happens only during tax season.
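A hedged sketch of that kind of usage logging, assuming a small include added to the top of each legacy entry point (the table name and connection are placeholders):

    <?php
    // usage_log.php -- records which legacy scripts and query strings are actually
    // hit, so "is this file dead?" becomes a query instead of a guess.
    function log_hit(PDO $pdo): void
    {
        $stmt = $pdo->prepare(
            'INSERT INTO usage_log (script, query_string, hit_at) VALUES (?, ?, NOW())'
        );
        $stmt->execute([
            $_SERVER['SCRIPT_NAME'] ?? '',
            $_SERVER['QUERY_STRING'] ?? '',
        ]);
    }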


Basically every time I decided on a full rewrite I ended up thinking "thank god I made that decision, the new architecture is much simpler" (and no, it didn't just seem simpler to me).


The big rewrite works - but only if you have a team you can trust. You need a new team of seniors to pair with the current team, promise a promotion to the current team at the end of the task.

Committing to an iterative approach is what I do when I don't have enough authority/ political tokens and I can't afford a rewrite.

Over time it gets less and less priority from the business, and you end up with half the codebase being crap and half being OK, and maintaining stuff is even harder.


Agreed, full rewrite is a horrible idea. Source: worked on a rewrite of a project that was like this: PHP from 2003, 7 figures in revenue, written by someone who was not a developer, no version control or testing. And it failed horribly.

I have tactical suggestions, but the strategy is simple: move toward more modern software practices, one step at a time.

But first, the elephant in the room. You say you need to help the project

> without managing [the team] directly

Who does? How can you help them?

Because you don't have direct authority, all the tactics and suggestions mentioned here won't be as helpful as they would if you were the manager in charge. And it's hard to offer concrete advice without knowing exactly how you are connected. A principal in the same company and want to help? A peer of the manager? A peer of the team members? Each of these would have different approaches.

And how much time do you have to help? Is this something you are doing in the shadows? Part of your job? Your entire job?

With that said, here's my list of what to try to influence the team to implement. Don't worry about best of breed for the tools, just pick what the company uses. If the tool isn't in use at the company, pick something you and the team are familiar with. If there is nothing in that set, pick the industry standard (which I try to supply).

1. version control. Git if you don't have any existing solution. GitHub or GitLab are great places to store your git repos

2. bug tracker. You have to have a place to keep track of issues. GitHub issues is adequate, but there are a ton of options. This would be an awesome place to try to get buy-in from the team about whichever one they like, because the truth is it doesn't matter which particular bug tracker you use, just that you use one.

3. a build tool so you have one-click deploys. A SaaS tool like CircleCI or GitHub Actions is fine. If you require "on prem", Jenkins is a fine place to start. But you want to be able to deploy quickly.

4. a staging environment. This is a great place to manually test things and debug issues without affecting production. Building this will also give you confidence that you understand how the system is deployed, and can wrap that into the build tool config.

5. testing. As the parent comment mentions, end to end testing can give you so much confidence. It can be easy to get overwhelmed when adding testing to an existing large, crufty codebase. I'd focus on two things: unit testing some of the weird logic; this is a relatively quick win. And setting up at least 1-2 end to end tests through core flows (login, purchase path, etc). In my experience, setting up the first one of each of these is the toughest, then it gets progressively easier. I don't know what the industry standard for unit testing in php is any more, but have used phpunit in the past. Not sure about end to end testing either.

6. Documentation. This might be higher, depending on what your relationship with the team is, but few teams will say no to someone helping out with doc. You can document high level arch, deployment processes, key APIs, interfaces, data stores, and more. Capture this in google docs or a wiki.

7. data migrations. Having some way to automatically roll database changes forward and back is a huge help for moving faster. This looks like a viable PHP option: https://laravel.com/docs/9.x/migrations which might let you also introduce a framework "via the side door". This is last because it is least important and possibly more intrusive.
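To make item 7 concrete, a Laravel-style migration is roughly this shape (a sketch only; the table and column are invented, and it assumes the migration package linked above):

    <?php
    use Illuminate\Database\Migrations\Migration;
    use Illuminate\Database\Schema\Blueprint;
    use Illuminate\Support\Facades\Schema;

    return new class extends Migration
    {
        // Roll the change forward: add the column to the existing table,
        // instead of bolting on yet another join table.
        public function up(): void
        {
            Schema::table('products', function (Blueprint $table) {
                $table->string('sku', 64)->nullable();
            });
        }

        // ...and roll it back, so a bad deploy is cheap to undo.
        public function down(): void
        {
            Schema::table('products', function (Blueprint $table) {
                $table->dropColumn('sku');
            });
        }
    };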

None of these are about changing the code (except maybe the last one), but they all wrap the code in a blanket of safety. There's the added bonus that it might not trigger sensitivities of the team because you aren't touching "their code". After implementing, the team should be able to move faster and with more confidence.

Since you are not the direct manager, you want to help the team get better through your influence and through small steps. That will build trust and allow you to suggest bigger ones, such as bringing in a framework or building abstraction layers.


Agree with this approach 100%


Yes, same. Sometimes devs see Frankenstein code, get all emotional, and take a full-rewrite-or-die attitude. Maybe take a step back and migrate piece by piece.


> First off, no, a full rewrite is not only not necessary, but probably the worst possible approach. Do a piece at a time. You will eventually have re-written all the code, but do not ever fall into the trap of a "full re-write". It doesn't work.

I've seen systems where the entire codebase is such a mess, yet so tightly coupled with the business domain, that a rewrite feels impossible in the first place. Furthermore, because these systems are already working, as opposed to some hypothetical rewrite, new features keep getting added on top of the old system, meaning that even if you could rewrite it, by the time you were done the rewrite would already be out of date and wouldn't do everything the old system by then does (the alternative being to make every piece of development twice as large, since things have to be implemented in both the old and new versions, with the new one perhaps still missing some of its building blocks).

At the same time, these legacy systems are often a pain to maintain, have scalability and stability challenges and absolutely should not be viewed as a "live" codebase that can have new features added on top of it, because at that point you're essentially digging your own grave deeper and deeper, waiting for the complexity to come crumbling down. I say that as someone who has been pulled into such projects, to help and fix production environments after new functionality crippled the entire system, and nobody else knew what to do.

I'd say there is no winning here. A full rewrite is often impossible, a gradual migration oftentimes is too complex and not viable, whereas building on top of the legacy codebase is asking for trouble.

> But before you re-write once line of code - get some testing in place. Or, a lot of testing. If you have end-to-end tests that run through every feature that is currently used by your customer base, then you have a baseline to safely make changes. You can delete code as long as the tests pass. You can change code as long as the tests pass.

This is an excellent point, though! Testing is definitely what you should begin with when inheriting a legacy codebase, regardless of whether you want to rewrite it or not. It should help you catch new changes breaking old functionality and be more confident in your own code's impact on the project as a whole.

But once again, oftentimes you cannot really test a system.

What if you have a service that calls 10 other services, which interact with the database or other external integrations, with tight coupling between all of the different parts? You might try mocking everything, but at that point you're spending more time making sure that the mocking framework works as expected, rather than testing your live code. Furthermore, eventually your mocked data structures will drift out of sync to what the application actually does.

Well, you might try going the full integration test approach, where you'd have an environment that would get tests run against it. But what if you cannot easily create such an environment? If there are no database migrations in place, your only option for a new environment will be cloning an existing one. Provided that there is a test environment to do it from (that is close enough to prod) or that you can sufficiently anonymize production data if you absolutely need to use it as the initial dump source, you might just run into issues with reproducibility regardless. What if you have multiple features that you need to work on and test simultaneously, some of which might alter the schema?

If you go for the integration testing approach, you might run into a situation where you'll need multiple environments, each of which will need their own tests, which might cause significant issues in regards to infrastructure expenses and/or software licensing costs/management, especially if it's not built on FOSS. Integration tests are still good, they are also reasonably easy to do in many of the modern projects (just launch a few containers for CI, migrate and seed the database, do your tests, tear everything down afterwards), but that's hard to do in legacy projects.

Not only that, but you might not even be fully aware how to write the tests for all of your old functionality - either you need to study the whole system in depth (which might not be conceivable), or you might miss out on certain bits that need to be tested and therefore have spotty test coverage, letting bugs slip through.

> Once you are at that point, start picking off pieces to modernize and improve.

It helps to be optimistic, but for a plethora of reasons, many won't get that far. Ideally this is what people should strive for and it should be doable, but in these older projects typically the companies maintaining them have other issues in regards to development practices and reluctance to introduce tools/approaches that might help them improve things, simply because they view that currently things are working "good enough", given that the system is still generating profits.

Essentially, be aware of the fact that attempts to improve the system might make things worse in the short term, before they'll get better in the long term, which might reflect negatively upon you, unless you have sufficient buy-in to do this. Furthermore, expect turnover to be a problem, unless there's a few developers who are comfortable maintaining the system as is (which might present a different set of challenges).

Ideally, start with documentation about how things should work, typical use cases, edge cases etc.

Then move on to tests, possibly focusing on unit tests at first and only working with integration tests when you have the proper CI/environment setup for this (vs having tests that randomly fail or are useless).

After that, consider breaking the system up into modules and routing certain requests to the new system. Many won't get this far and I wouldn't fault you for exploring work in environments that set you up for success, instead of ones where failure is a looming possibility.


I'd do it that way too:

- tests to cement interfaces

- gradually write module supporting this interface

- replace module on test clone and bench / retest it

when this module is ok, do another
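In PHP terms, "cementing an interface" might look roughly like this (all names are made up; the legacy include and function stand in for whatever the old code actually does):

    <?php
    // 1. Describe what the legacy code does today, as an interface.
    interface PriceCalculator
    {
        public function priceFor(int $productId, int $quantity): float;
    }

    // 2. Wrap the existing code behind it without changing behavior;
    //    the tests that cement the interface run against this first.
    class LegacyPriceCalculator implements PriceCalculator
    {
        public function priceFor(int $productId, int $quantity): float
        {
            require_once __DIR__ . '/pricing_2017_final_v3.php';
            return legacy_calculate_price($productId, $quantity);
        }
    }

    // 3. Build the replacement against the same interface; swap it in only
    //    once the same tests pass for both implementations.
    class NewPriceCalculator implements PriceCalculator
    {
        public function priceFor(int $productId, int $quantity): float
        {
            return 0.0; // placeholder for the new, clean implementation
        }
    }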


Huh. You are literally saying do a full rewrite. But it's also the worst idea?

Edit: A full rewrite always meant replacing every part of a system. Whether you do it gradually doesn't really matter.


"Whether you do it gradually doesn't really matter."

It absolutely DOES matter. A gradual rewrite is much more likely to work than a stop-the-press rewrite.


It's still a rewrite. The crux of the statement I made.


The problem with a classic full rewrite is that the existing system is thrown away immediately. None of the existing features are available in production until the rewrite adds them back in - often incomplete, buggy, changed beyond all recognition, or a combination of all of these. That obviously sucks and is the reason the classic rewrite is rarely done. However, it is clear that something must happen.


"Full rewrite" is a description of the end state, not the process.

The best way to do a full rewrite is incrementally, with test support and consideration for natural separation of internal subsystems.


The best way to do a rewrite may be incremental, but the terminology of "full rewrite" doesn't usually refer to an incremental rewrite, it refers to starting from scratch.


I don't think that's true -- a "full" rewrite is used in contrast to a "partial" rewrite, where only part of the system is replaced. It's called a "full" rewrite because the goal from the start is to fully replace the system with new code.

Consider that if this were not true, then there would be no way to describe an incremental full rewrite, nor any way to describe a from-scratch replacement of a subsystem.

I've written on this topic before, for example https://increment.com/software-architecture/exit-the-haunted...


He’s saying to Ship of Theseus the codebase. Don’t build a new ship and then burn down the old ship. Replace the old ship piece by piece in place.


That only works if the new pieces correspond to old pieces. If there's no good structure to build on, the units to be replaced will constrain the architecture of the new ship.

At some point you end up trying to change a pumpkin boat into an aircraft carrier, and there's no obvious way you can do that one piece at a time.


> If there's no good structure to build on, the units to be replaced will constrain the architecture of the new ship.

Which is why you do it in stages: add scaffolding until local rewrites are possible, then rewrite the business logic, then tear the scaffolding down.
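One very small, concrete form of that scaffolding is a switch that lets the new code path be turned on (and instantly off again) without a redeploy. A sketch, with the flag and function names invented:

    <?php
    // Both implementations stay deployed. An environment flag decides which one
    // runs, so rolling back a bad rewrite is a config change, not a release.
    function send_invoice(array $order): void
    {
        if (getenv('USE_NEW_INVOICING') === '1') {
            (new NewInvoiceService())->send($order);  // rewritten module
        } else {
            legacy_send_invoice($order);              // untouched old code
        }
    }

Once the new path has run clean in production for a while, the flag and the legacy branch are exactly the scaffolding you tear down.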


That's a good analogy actually. Scaffolding is a kind of temporary test structure that you can use to maintain function while you figure out something better.


Maybe there are some underlying architectural problems that need to be addressed, but it would be impossible to make those changes from the current situation. It sounds like it is impossible to even know what code is live vs sitting on the server. How do you even know you have a firm grasp on the current architecture when it is unclear what code is even running the product?

There is a lot of low-hanging fruit to be addressed that will likely lead to meaningful improvements. Once the code is in better shape and some unfortunate legacy pattern is identified, then it can be considered time to re-tool the architecture.


Agreed. The first thing to do is figure out WTF is going on. This is perhaps the hardest kind of thing to do as a developer.


Full rewrite generally means: stop the presses, we are gonna migrate this whole thing from here to there, and no new features until it's done (hint: it never gets done).


I’ve only ever witnessed ship-of-Theseus style migrations and those also never get done.


Does not compute... Ship of Theseus is just regular old development: of course it never gets done, but new features aren't put on hold.


I mean like “we want to replace X with Y”. Y incrementally starts replacing X, but 100% migration is never achieved, meaning double the API surface area exists indefinitely.

Because the migration doesn’t block new features, that means the org gets tired and reallocates the effort elsewhere before it’s ever done, with no immediate consequences. Rinse and repeat.


I think you've not witnessed Ship of Theseus, but "build Ship2 next to Ship1 and start using Ship2 while Ship1 is still being used and keep saying you're going to migrate to Ship2 eventually but meanwhile Ship1 and Ship2 diverge and now you have 2 ships".

I recently witnessed this mess and it is an enormous mess. Don't build Ship2 in the first place. Instead, replace Ship1's mast and sails, and rudder etc until you've replaced all the parts in Ship1. That's the SoT approach.


Right but how do you replace the masts? Don’t you have to build mast2 and then tear down mast1 if you want to have continuous propulsion?


Yes, can you see how that's quite different from building a second ship?


In my comment, X and Y are different masts, not different ships.


I understand now.


A "full rewrite" means that after the completion of the rewrite, the old code has been fully replaced by new code.

What you're describing is a "stop-the-world" rewrite.


I think you are being needlessly pedantic. Everyone understands that "full rewrite" means "restarting from scratch" in this context, especially since the poster was very clear that eventually everything will be touched.


They’re saying to do it, eventually, incrementally and not all at once.


And critically… never completing an incremental rewrite doesn’t matter. Everything remained working the whole time and continues to make the company money. And as a bonus, you were also able to make feature changes that the business wanted at the same time. It’s classic XP, when the money runs out, the system still works!


> this code generates more than 20 million dollars a year of revenue

From a business perspective, nothing is broken. In fact, they laid a golden goose.

> team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.

My mistake, they didn't lay a golden goose--they built a money printer. The ROI here is insane.

> productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.

But you just told me they built a $20M revenue product with 3 bozos. That sounds unbelievably productive.

> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers

You should consider quitting your job.

As far as the business is concerned, there are no problems... because well... they have a money printer, and your team seems not to care enough to advocate for change. Business people don't give a damn about code quality. They give a damn about value. If 2003 style PHP code does that, so be it. Forget a rewrite, why waste time and effort doing simple refactoring? To them, even that has negative financial value.

From their perspective, you're not being paid to make code easy to work with, you're being paid to ship product in a rat's nest. Maybe you could make a business case for why it's valuable to use source control, dependency management, a framework, routing outside of nginx, and so on... but it doesn't sound like any of that mattered on the road to $20M a year, so it will be very difficult to convince them otherwise, especially if your teammates resist.

This, again, is why you should consider leaving.

Some developers don't mind spaghetti, cowboy coding. You do. Don't subject yourself to a work environment and work style that's incompatible with you, especially when your teammates don't care either. I guarantee you will hate your job.


In my opinion OP should seriously consider this advice.

I really mean nothing patronizing here, but I suspect OP does not have the corporate experience to handle this situation. This is a corporate equivalent of a double-black diamond downhill route. OP was hired by people who have little understanding of tech and already came in with guns blazing. I might almost wonder if OP's a sacrificial lamb.

But, the tech advice of not doing a rewrite, making tests, soothing any hurt feelings, creating local instances will help. Make the everyday coding experiences of the tech team nicer. Source control, local instances, and unit/integration/E2E tests are a gimme.

The old rule of thumb applies: pick only 2 of speed, cost or quality. You cannot have 3.


> I suspect OP does not have the corporate experience to handle this situation.

I agree with this. OP doesn't say, but reading between the lines, corporate at best doesn't understand the ramifications, and at worst doesn't care about them.

They're getting 20 million in revenue from 3 cheap devs. Things are going great, according to corporate. They're not going to learn, and OP is going to get blamed when things can't get done.

I just quit because I was placed in a similar situation. The CEO, who does have a CS background albeit ancient, insisted there was nothing wrong with the tech stack that couldn't be solved by vertically scaling and then horizontally scaling. We were at the limits of the former and the architecture made many important parts impossible for the latter, but that's another discussion.

The problem wasn't tech scaling, it was process scaling. We really couldn't divide work easily because there were often conflicts. People would join, see the horrible code, then leave. We specifically had to hire off-shore junior devs who didn't know any better and snowball them. I felt the last part was unethical and didn't want to be part of it any longer.

OP is not doing any favors for themselves, and especially not for the junior devs on the team. This job is going to set back the careers of the junior devs. They're wasting their time on ancient methods and technologies.


>snowball them

Could you define this phrase and what English dialect it's from?


> what English dialect it's from?

I assume American English.

Prior to the sexual slang made popular by the movie Clerks, snowballing in the context I used basically means to blindside or con someone.

One definition on Urban Dictionary:

"A situation where a criminal has found themselves in possession of an easy target and proceeds to rob them and leave them mortally wounded for fun, a synonym for getting iced."

There's also snowballing meaning a problem getting bigger and bigger when unaddressed. I'm probably using it in an older, not-exactly-mainstream way.


In my dialect, which is some sort of American English, trying to snow someone means trying to BS or con them.

Blindsiding someone means taking them unawares - hitting them when they aren't looking, physically or metaphorically.

A speedball has cocaine in it, which is sometimes known as snow.

"Iced" can mean killed, but the Urban Dictionary definition is oddly specific

I agree about "snowballing" meaning increasing in size or momentum, but it doesn't have to refer to a problem.


Quite literally. I assume you never got a snowball in your eyes?


> OP does not have the corporate experience to handle this situation

Even before that, s/he doesn't even seem to understand any measure of business. $20m/year with 3 people is BIG. Any disturbance to whatever makes that happen will hurt the business greatly. When a full rewrite rocks the boat and causes a few million in lost revenue, or the loss of potential market share or opportunities, they will rightly fire him.

The best route would be to improve things without hurting anything. It doesn't matter whether things are 'properly done'; it matters whether the organization survives and prospers. What's 'proper' is redefined every 3-5 years, most of the time without objective basis anyway. So there is no need to hurt the business to force the current software paradigm.


Is the entirety of the whole company just 3 people? It doesn't sound like it. That 3 person team is just 'tech' it seems - there may be 5-10 managers/sales/support/etc people. And... $20m is revenue, not profit. If the cost of their sales is, say, $15m... and there's 15 people working at the company, that's quite healthy, but it's not some money-printing goldmine at this stage.


> Is the entirety of the whole company just 3 people?

From what is told in the summary, it seems like the software stack and these 3 people constitute the core of the business that is going on. The business may be something totally different. But it seems to be running on that software.

> And... $20m is revenue

Even if they have lower margins than what you imagined, $20m revenue/year is still a gigantic amount. You can improve upon whatever margin or inefficiency is in the process and increase the net profit - optimize vendors, costs, the sales pipeline.

The difficulty in modern Internet business is getting to some place at which you can command $20 m revenue from any market. Finding the customers and the users. Not the margins.


I’ve worked at a $100 million revenue online company that didn’t make any profit. OP says money is tight after covid, so it seems like the business is connected to a (physical) market that was affected by the pandemic. He says there is an extensive roadmap from a management team that is not familiar with the technical details, so it seems like the technology is not the core business.

It’s not difficult to get to high revenue fueled by aggressive and expensive acquisition using money from investors who get dazzled by growth numbers, but if your customer lifetime value is low and you’re not a pure SaaS business (which means margins won’t automatically improve with scale), turning that company profitable can prove very difficult.


Additional reason to switch jobs: it is working perfectly for now; as far as the company is concerned, the three-man team is doing a tremendous job, and the team has convinced itself of the same. Now they have to ship additional stuff, and think they are perfectly able to do so. You are there to make sure they ship; most likely they won't, because things work until they don't anymore. When they don't, and the team isn't shipping, guess who gets the blame? The one new team member, the one thing that changed. And after that, the mess will only grow.

Edit: Typos. I suffer severe typing legasteny more often than not...


Best answer. If the money is ok, and the environment not too toxic/stressful, you might just see it as a challenge to secretly improve a codebase without anybody noticing, while still delivering what the higher-ups want to see. Or maybe just scratch the first part and try to see how much further you can push that turd with every coding crime imaginable. One-up the juniors in ugly hack Olympics. Ship a feature and put it on your CV before leaving.

Otherwise, walk away immediately.


> a challenge to secretly improve a codebase without anybody noticing

That is how it should be done in any case anyway. Improvements should get slowly rolled out without disturbing the users and the business.


> That is how it should be done in any case anyway.

Not exactly. IT management should always be telling people stuff like "did you notice that the integration with XYZ that never worked well stopped failing?" or "did you notice that we delivered those last few features in record time?" and explaining why.


That is assuming that things are failing. $20m/year with 3 people does not look like anything is failing to me.


That is assuming things are improving. If you aren't improving anything, then yeah, you don't have anything to say.


This is very insightful. I learned it kind of the hard way. The business world is a mess. Requirements are a mess and always changing. This leads to messy code that requires a lot of time to clean up. You don't have time for that as long as there are always more customer wishes and projects coming in. As long as the business keeps working there's always something of top priority coming in. The pain starts growing, but the steaming pile of code just doesn't collapse. It just kind of keeps on working while you are adding more and more code. Sure, the pain is big and progress is quite slow, but what's the alternative?

My advice would be to listen to the developers, to understand them and the business. To understand what they really need, and what a viable path forward would be: a complete rewrite, a second system for new developments, many more developers, or something else. Or maybe the status quo is the optimum solution right now, because the whole company is so messy and your job is not to change the company structure. Then maybe you could support them by slowly enhancing their skill set and accept what you can't change. Doesn't sound like fun? Then leave soon, staying won't do you any good.


This 100%. This line is an immediate nail in the coffin:

> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight.

I've been at a company not unlike this... several MLOC of VB WinForms that was total spaghetti, but a highly successful app that brought in a lot of revenue. In our case the majority of the dev team was in agreement that the situation wasn't sustainable, and at first (meaning when I joined the company) we had engineering leadership who mostly agreed. They brought in several rounds of consultants to evaluate the code base and announced plans for a major modernization effort. But the consultants largely agreed with the dev team that the code was in such bad shape that it effectively needed a rewrite. At one point we did a prioritization and t-shirt sizing exercise and came up with _30 man years_ worth of items in the "critical tech debt" bucket. Apparently engineering leadership was not aligned with the C level suite about how much money they were willing to spend on this thing, because within the next year there was 100% turnover of management in the engineering org. A couple of (talented!) people who had been hired for the modernization effort left first, then the other engineers who knew the code base well followed. Last I heard the company was limping along relying on Eastern European and Indian contractors to keep the lights on.

In short OP, you can probably get some wins; maybe some major process improvements like using source control, maybe more minor things like introducing patterns or frameworks so that net new code isn't a total mess. But there is zero chance that you'll be able to do anything like a rewrite or major refactoring without leadership understanding that they're in an unsustainable situation and being willing to invest the time and money to fix it.


Seems so many of us have been hired at one of these companies. Same here. No source control. No bug tracker. No tests. There was no formal system for producing builds--releases to customers were simply copied from the workstation of whichever engineer could build it successfully today. There were so many bugs and crashes, it was hard to even get through basic customer use cases. There was no spec. There was no product roadmap or plan. Sales would sell something, then run downstairs and say "We just sold XYZ, you need to implement XYZ in the software blob somewhere!"

The CEO/Founder wouldn't even consider a refactoring or cleanup session, let alone a full or partial rewrite. Only features drive sales, and we've sold so many things we don't have, so all he wanted was feature cram. Every so often, a VIP customer complained about some major use case that simply didn't work, so only in those cases was bug fixing permitted. And in those cases, the VIP customer would get a custom bespoke build made with the bug fixed. I was hired because the last person of my seniority could not cram features in fast enough and gave up in disgust.

They only got source control because I came in on a weekend, unpaid, to do it. I lasted a little over a year. Bootstrapped founder (and sole shareholder) eventually sold the company for ~$150M. Sometimes it seems there is no justice in the world :)


This is why we as a software world need some minimum standards for stuff that deals with sensitive information of users.

Don't get me wrong, if it works it works, but the question is for how long and who will suffer when it doesn't?

Also from a business perspective: if I were the CEO of that company I'd probably like to know that there is something built on sand and a huge technological debt. It is a cash cow now, but I'd like to ensure it still can be one in the future. And for this some level of maintenance would be expected.

Same thing for reliability. If as a CEO I knew the entire functioning of the thing that brings in the cash hinges on one developer remembering whether index_john-final.php or index_peter-final-final.php is the latest code, I would probably have a hard time sleeping.

That means the minimum OP should do is explain the situation neutrally and your point of view is certainly something he should weave into this. In the end the higher ups need to know this, and what they decide to do is their thing, but OP needs to make them aware of why this could potentially endanger the service in the future. If they then decide to take that risk — so be it.


It wasn't necessarily written by those 3 devs. They're just the current team. Granted, they probably have been that for a long time because of the resistance to change, but the brightest minds are probably long gone.


I'd bet it's B2B and has an expensive sales division that top management believes (tbh maybe even rightly) is the real revenue driver.


Good answer.

Some developers seem to think that their job is to engineer nice and beautiful systems. It's not.

As a developer, you're getting paid (fyi the minimum so you don't leave the company) in order to maximise total shareholders' returns. That's it.

The business doesn't care if the codebase is garbage, with massive technical debt, nor if you struggle working with it. That's literally not even a problem as far as it is concerned.


Nice and beautiful systems aren't the job, but a system that will continue working in the future while still allowing people to keep adding features and that won't suffer massive security breaches really is. It sounds like the current system is one bad feature implementation away from hosing the database, one hardware fault away from no longer existing, and one interested hacker away from a complete compromise.


You are right and obviously good engineering practices are not incompatible with the company’s interests.

I was more reacting to OPs attitude, where he asks how to fix everything without even being clear if he is managing the team or officially in charge of the product.

I may be mistaken but the impression I have is that the management is perfectly happy with a bad system and expects the engineers to simply go along. This is then a management decision. It may blow up later and sink the business (or not, it can just hold for the lifetime of the product).

As far as OP is concerned, this situation would mean that he should probably leave the ship now, as staying would probably result in: 1. no marketable skill development (being stuck with a garbage codebase); 2. burnout as he takes on the Herculean task of cleaning it up; 3. being blamed if something bad happens.


Usually nothing changes for me as a developer if shareholders get bigger returns on their investments. So I don't really care.

What I care about is the code quality because good code makes my job easier. I inherited the code, not the business.


Well it is the management’s job to make your work aligned with the company’s best interests.

It is possible that you’re already doing what you’re getting paid for. In that case you shouldn’t go out of your way to increase the company’s profits (and nobody expects you to).

(In my original post I didn’t say that as a developer you should code thinking of quarterly results, I just stated the obvious that you are employed for the shareholders to get money back)


So there was a physicist, an engineer, and a business guy, and they were discussing God.

The physicist said that God must be a physicist, because He had to know about matter and energy and so on.

The engineer said that God must be an engineer, because He had to do something useful with the matter and energy - turning chaos into order.

And the business guy asked, "Where do you think all that chaos was coming from?"


Only 3 developers maintaining a horrible codebase is a massive business risk. In this market, they could easily leave for better jobs within months of each other. Especially if they're junior and not company lifers. The money printer will print money until one day it suddenly doesn't.


Exactly. I often think of scenarios such as this in evolutionary terms. Imagine a doppelgänger competitor in this space, exactly the same, except they're pretty content with their crappy tech ecosystem and focus on delivering features (albeit, more slowly than they might with a more modern system.) And so...

1. You focus on the advice here, adding tests (which IMHO is actually pretty difficult on a legacy poorly architected/documented system), source control, refactoring, etc.... with any time remaining in the schedule devoted to adding features.

2. Your doppelgänger chugs along and releases several more features than you do, giving them a market edge.

What happens next I suppose depends upon what difference those extra features make. If the delta is small, you may be able to pull out ahead, in the long run. But if it's large, then your company may start to lose revenue / customers. Then, the screws will likely tighten even harder, and you'll be forced to sideline refactoring efforts and double down on delivering features. And then those refactoring/cleanup efforts will bit rot and you'll find yourself back roughly where you started, except now you're behind your competitor.

There's a quote I can't seem to find atm that summarizes this. It was something along the lines of "With php our product is already up and gaining market share meanwhile our competitor is still futzing with their rails configurations." (If you know the actual quote I'd love to see it.))


> But you just told me they built a $20M revenue product with 3 bozos. That sounds unbelievably productive.

This doesn't indicate profitability, and it ignores that management/owners might have had something to do with it. A well-connected industry player is better poised to start/found/build/grow a company to that level than someone with 0 experience.

And... yes... quitting for something which matches the OP's expectations will likely be better all around than trying to 'fix' something people aren't asking to have fixed (it seems).


$20M revenue is not the same as $20M profit.


It’s not the same, but if that 20M is primarily generated by the software, then it’s those 3 ppl who contribute to the top line. The rest, like sales and marketing, are irrelevant: fire them and the product will keep generating revenue off the existing customer base. It will stop doing so, however, if the product breaks. So the post above is right to an extent, this is the golden goose. ))


Unless the revenue is for products ordered on the site and shipped to paying customers. Believe it or don't, this is still done at some sites that are not Amazon.


Not if they're spending $19m in google ads to make $20m in revenue.


> team is 3 people, quite junior.

Even at FAANG salaries this wouldn't be that much compared with $20M


You don't know what the costs are though. The site could have huge costs of content acquisition or any number of reasons to not be making anywhere near $20 million profit.


Revenue isn't the same as Earnings Before Salaries either.

E.g. maybe it's an e-wholesaler or widget reseller, bought $19M goods and sold $20M. Or maybe it was much slower than expected, they actually bought $25M goods and are burning 500k/month on warehousing. Or whatever.


Yes, but the parent is saying this could be an e-commerce website, or construction company, etc. But having an iOS app feels unnecessary for most of these businesses


a lot of field operations these days are done on phone or tablet apps, so I can totally see an ios developer being vital for internal processes


The engineering team might be 3 people but not the whole company.


Assuming profitability is even a problem, if $20M in revenue is coming from just 3 devs, the driving cost of the company isn't the tech. It's other parts of the company. That would be another red flag against the leadership.


>> team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.

> My mistake, they didn't lay a golden goose--they built a money printer. The ROI here is insane.

We can not make that conclusion. Presumably the business still needs salespeople, support, etc.


P.S. I am not saying that ROI is bad, either. Just that we can't say very much either way.


I think the thing is that the codebase after 5 years of development was already capable of making 19M/year, and the past 7 extra years have just added 1M/year. The next year of development will not add any more because the thing is collapsing under its own weight.


so it will just be 20 mil a year with 3 jr devs

sounds ok?


Only in absolute terms. Kinda feels like a missed chance if you actually have a 100M market.


That's what I was thinking as well. Run for the hills.

It's likely a losing uphill battle.


Great answer. Developers are gunna hate it


THIS. A million times this. The world is about solving real user problems. And obviously this code solves a huge 20M USD problem. And the code is obviously so good that even 3 "bozos" (to use your language) can manage and maintain it for decades. This is the holy grail. I wouldn't change a thing and instead ask: how can you make this generate 40M USD a year? This is how you will add real value to the company, and management will love you.


If the mess generates $20m a year, that's great and I agree with you!

If the mess generated $20m last year and it's projected to generate $20m next year, that's a problem.

If the second case is true, I believe it's somewhat the responsibility of the OP to sell solving this long-term problem to the _rest of_ business. If they hired him as an expert in that area, they should listen to him.

If that fails, leave.


Sheesh, I had to look this far to find this comment. Half of the top comments are discussing how much testing OP should do and in what manner he should approach refactoring, while ignoring that the "monster" generates a boatload of money for the company. I love Hacker News a lot, but people seriously need to get out of their developer box and look at the big picture.


Smells like executive stupidity to me.

There is some consensus that you can't fix stupid.

Even if you are very talented at repairing airplane engines in flight.


The profit at all costs, just make the next quarter's numbers (so we get our bonus), attitude is what leads to disasters like Twitter.

Their pursuit of profits above all else have likely gotten people killed. They represent a clear and present danger to US National Security.


This is the sanest answer. No amount of leadership is going to help an incompetent team. A codebase with massive technical debt, tight coupling, and accidental complexity will be hard to improve incrementally. Impossible without competent engineers.


I agree it's the sane answer. But I don't think these engineers are incompetent. They lacked direction, accidentally followed worst practices, and _still_ came out on top. I would say they are good engineers but perhaps bad project managers / architects.


You don't not use source control because nobody directed you to and you 'accidentally' .. what, forgot about it?

You don't use it because you haven't heard of it; = not competent.


If someone incompetent can make me $20mil a year in revenue, bring on incompetence!


I find it hard to imagine you’d never heard of source control by now. You’d have to have been living under a rock for the past 15 years.


Or be a bona fide 'script kiddy', learnt some WordPress PHP or whatever and got a job as 'webmaster' or something straight out of school (UK-sense, I specifically mean no university), no formal CS/software eng. training, never properly an intern/junior trained by people who know what they're doing.

I'm sure it happens. And then you get the next job with 5y PHP experience or whatever, employer doesn't mind no formal training (not that I'm saying they should in general - but if they're non-technical hiring someone to 'do it', or first hire to build the team or whatever, then they probably should as a reasonable proxy!), rinse and repeat.


If such a team of 3 people made up of script kiddies and 5-year PHP coders is going to create a $20m/year product, you can be sure that they will take precedence over anyone who was 'properly' educated in CS when it comes to hiring.

> I'm sure it happens

Yeah it does happen. While using the Internet, quite frequently, you are looking at such products developed by such teams, making millions of dollars a year. Even as the good engineering that is being done at FAANG is now being questioned over profitability, with even Google talking about 'inefficiency'.


The shitty software probably isn't the product. It could be some sales/inventory management tool or whatever, that before they got some 'script kiddies' in was just some forms in Microsoft Access (is that what it's called.. the forms-on-top-of-a-database tool we had to learn in ICT at school) or whatever.

I think many people here are reacting to $20M forgetting not everything's a SaaS/in the business of selling software (but mostly still has some (in-house) software somewhere).


> The shitty software probably isn't the product

The shitty software is what sells the product, from the description. Even if the shitty software is a sales/inventory management tool or 'whatever', from the description it is obvious that it is vital to whatever business they are doing.

It doesn't matter whether it was built with Microsoft Access and Excel files. If it's contributing a major part of that $20m/year, it's not shitty, it's golden.

Anyone who understands the trials of modern business, including any tech lead who has had to deal with even merely stakeholders and low-level business decisions, would prefer a $20m/year sh*t over a well-crafted, 'properly built' architecture. The difficult thing is getting to that $20m/year. The difficulty of rearchitecting or maintaining things pales in comparison to that.

> I think many people here are reacting to $20M forgetting not everything's a SaaS/in the business of selling software (but mostly still has some (in-house) software somewhere).

Everyone is aware of that. Many are also aware that getting to $20m/year in WHATEVER form is more difficult than architecting a 'great' stack & infra.


Well, I don't agree. You'd struggle to do it without any software at all these days, but you can certainly do it without anything written in-house.

My point about Access (or Excel or whatever as you say) was that that would be the very early days of something starting to happen in-house, that wouldn't even be the hypothetical 'script kiddies'.


> but you can certainly do it without anything written in-house

Nope. Not really. Your average SV startup idea in which the end users will do some simple, but catchy things with your app - yeah, go all no-code if you want to get it started.

But, in real business, in which there are inventories, sales, vendors, shipping companies, deliveries, contracts, quotas, FIFO and LIFO queues and all kinds of weird stuff, things don't work that way. You may end up having to code something specific in order to be able to work with just one vendor or a big customer even. They may even be using Excel. You do it without blinking because millions of dollars of ongoing revenue depend on such stuff.


Or been drinking too much of the "move fast and break things" Kool-Aid for all of the 5 days of your career.


Not a developer, but I was involved, and am again involved, in some crucial dev projects on which the future success of my employer depends. Any developer who deploys to production without testing, or worse, develops directly in production, is by every definition at least incompetent. If not an incompetent wannabe rockstar ninja cowboy without even realizing it. And those devs are dangerous.


Therefore...the people mucking about with production should not be called developers! Problem solved, next?


A lovely knot to unravel!

First, get everything in source control!

Next, make it possible to spin service up locally, pointing at production DB.

Then, get the db running locally.

Then get another server and set up CD (continuous deployment) to that server, including creating the db, schema, and sample data.

Then add tests, run on pr, then code review, then auto deploy to new server.

This should stop the bleeding… no more index-new_2021-test-john_v2.php

Add tests and start deleting code.

Spin up a production server, load balance to it. When confident it works, blow away the old one and redeploy to it. Use the new server for blue/green deployments.

Write more tests for pages, clean up more code.

Pick a framework and use it for new pages, rewrite old pages only when major functionality changes. Don’t worry about multiple jquery versions on a page, lack of mvc, lack of framework, unless overhauling that page.


I largely agree with this approach, but with 2 important changes:

1) "Next, make it possible to spin service up locally, pointing at production DB."

Do this, but NOT pointing at production DB. Why? You don't know if just spinning up the service causes updates to the database. And if it does, you don't want to risk corruption of production DB. This is too risky. Instead, make a COPY of the production DB and spin up locally against the COPY.

2) OP mentions team of 3 with all of them being junior. Given the huge mess of things, get at least 1 more experienced engineer on the team (even if it's from another team on loan). If not, hire an experienced consultant with a proven track record on your technology stack. What? No budget? How would things look when your house of cards comes crashing down and production goes offline? OP needs to communicate how dire the risk is to upper management and get their backing to start fixing it immediately.


Yeah, having the experimental code base pointing at the production database sounds like fun. I did that. We had a backup. I'm still alive.


This is the right way to think about it. My only disagreement is that I'd do the local DB before the local service. A bunch of local versions of the service pointing at the production DB sounds like a time bomb.

And it's definitely worth emphasizing that having no framework, MVC, or templating library is not a real problem. Those things are nice if you're familiar with them, but if the team is familiar with 2003 vintage PHP, you should meet them there. That's still a thing you can write a website in.


> if the team is familiar with 2003 vintage PHP, you should meet them there. That's still a thing you can write a website in.

You can write a website in it, but you cannot test it for shit.


If this is true, OP can consider writing tests of the website using a frontend test suite like Cypress, especially with access to local instances connected to local databases.
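For example (purely illustrative: the routes, selectors and expected text below are invented, and you'd record the real flows from the live site), a Cypress smoke test can be as small as:

    // cypress/e2e/smoke.cy.js
    describe('legacy site smoke test', () => {
      it('lists products and adds one to the cart', () => {
        cy.visit('/products.php');                 // hit the legacy page directly
        cy.get('.product-row').should('have.length.greaterThan', 0);
        cy.get('.product-row').first().find('a.add-to-cart').click();
        cy.visit('/cart.php');
        cy.contains('1 item');                     // assert observable behaviour, not implementation
      });
    });

A handful of these, run against a local instance with a copied database, already gives a safety net for deleting code.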


There's no value to retroactive unit testing. Retroactive tests should be end-to-end or integration level, which you certainly can do without a framework.


Frameworks are not needed to test. I've been testing and validating my code since way back, in C. Not because I was an early adopter (I'm still not), but because I needed to debug it, and tests made that faster.


Good strategy. I would suggest not hooking it up to the prod DB at the start. Rather, script out something to restore prod DB backups nightly to a staging env. That way you can hook up non-prod instances to it and keep testing as the other engineers continue with what they do, until you can do a flip over as suggested. The key here is always having a somewhat up-to-date DB that matches prod but isn't prod, so you don't step on toes and have time to figure this out.

Note that going from no source control to the first CD instance in prod is going to take time... so assume you need a rollout strategy that won't block the other engineers.

Considering what sounds like reluctance to change, the switch to source control is also going to be hard. You might want to consider scripting something that takes the prod code and dumps it into source control automatically, until you have prod CD going... after that the engineers switch over to your garden-variety commit-based reviews and manually triggered prod deploys.

Good luck! It sounds like an interesting problem.


> Next, make it possible to spin service up locally, pointing at production DB.

I think this is bad advice, just skip it.

I would make a fresh copy of the production DB, remove PII if/where necessary and then work from a local DB. Make sure your DB server version is the same as on prod, same env etc.

You never know what type of routines you trigger when testing out things - and you do not want to hit the prod DB with this.
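A rough sketch of that copy-and-scrub step, assuming MySQL (hostnames, database name and PII columns are placeholders):

    #!/bin/sh
    # db_refresh.sh - copy prod to a staging server and scrub PII
    set -eu

    # 1. dump production with a read-only account (--single-transaction avoids locking InnoDB tables)
    mysqldump --single-transaction -h prod-db.internal -u readonly -p"$PROD_PW" shop > /tmp/shop.sql

    # 2. recreate the database on the staging server and load the dump
    mysql -h staging-db.internal -u root -p"$STAGING_PW" \
          -e 'DROP DATABASE IF EXISTS shop; CREATE DATABASE shop;'
    mysql -h staging-db.internal -u root -p"$STAGING_PW" shop < /tmp/shop.sql

    # 3. scrub PII so developers can point local instances at it safely
    mysql -h staging-db.internal -u root -p"$STAGING_PW" \
          -e "UPDATE shop.customers SET email = CONCAT('user', id, '@example.com'), phone = NULL;"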


I am inclined to agree. The other advice was excellent, but pointing local instances to production databases is a footgun.


I've kind of reconsidered this a bit. Right now, the only way to test that the database and frontend interact properly is to visit the website and enter data and see it reflected either in the database or in the frontend.

It's less terrible to have a local instance that does the same thing. As long as the immediate next step is setting up and running a local database.


But the thing is, you have no idea whether even a single GET request fires off an internal batch job to do X on the DB.

I mean, there are plenty of systems in place that somehow do this (WordPress cron, I think) so that's not unheard of.

For me, still a nope: do not run against the prod DB, especially if the live system accounts for 20M yearly revenue.


Agree with this approach. You have nginx in front of it already so you can replace one page at a time without replacing everything.
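As a rough sketch of what that looks like at the nginx layer (the upstream ports and the /account/ path are invented for illustration):

    # strangler-fig routing: everything still hits the legacy app by default,
    # only the sections that have been rewritten go to the new service
    upstream legacy_app { server 127.0.0.1:8081; }   # wherever the old PHP app currently listens
    upstream new_app    { server 127.0.0.1:8082; }   # the rewritten piece

    server {
        listen 80;
        server_name example.com;

        location /account/ {             # a migrated section
            proxy_pass http://new_app;
        }

        location / {                     # everything else: unchanged legacy behaviour
            proxy_pass http://legacy_app;
        }
    }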

One thing I haven’t seen mentioned here is introducing SSO on top of the existing stack, if it’s not there. SSO gives you heaps of flexibility in terms of where and how new pages can be developed. If you can get the old system to speak the new SSO, that can make it much easier to start writing new pages.

Ultimately, a complete rewrite is a huge risk; you can spend a year or 2 or more on it, and have it fail on launch, or just never finish. Smaller changes are less exciting, but (a) you find out quickly if it isn’t going to work out, and (b) once it’s started, the whole team knows how to do it; success doesn’t require you to stick around for 5 years. An evolutionary change is harder to kick off, but much more likely to succeed, since all the risk is up front.

Good luck.


I think "SSO" here maybe doesn't mean "Single-sign on"? Something else?


No, I meant single sign on.

In my experience, if you can get SSO working for (or at least in parallel with) the old codebase, it makes it much easier to introduce a new codebase because you can bounce the user outside of the legacy nginx context for new functionality, which lets the new code become a lot more independent of the old infra.

I mean there are obviously ways to continue using the old auth infra/session, but if the point is to replace the old system from the outside (strangler fig pattern) then the auth layer is pretty fundamental.

That's what I did when I faced a similar situation: I needed to come up with ways to ensure the new code was legacy free, and SSO turned out to be a big one. But of course YMMV.


I'd add putting a static code analysis tool in there, because that will give you a number for how bad it is (the total number of issues at level 1 will do). That number can be given to upper management, and then, whilst doing all the above, you can show that the number is going down.


There is significant danger that management will use these metrics to micromanage your efforts. They will refuse changes that temporarily drive that number up, and force you to drive it down just to satisfy the tool.

For example, it is easy to see that low code coverage is a problem. The correct takeaway from that is to identify spots where coverage is weakest, rank them by business impact and actual risk (judged by code quality and expected or past changes) and add tests there. Iterate until satisfied.

The wrong approach would be to set something above 80% coverage as a strict goal, and force inconsequential and laborious test suites on to old code.


Many tools allow you to set the existing output as a baseline. That's your 0 or 100 or whatever. You can track new changes from that, and only look for changes that bring your number over some threshold. You can't necessarily fix all the existing issues, but you can track whether you introduce new ones.
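PHPStan, for example, supports exactly this (assuming it gets adopted here; the codebase currently has no composer, so that comes first):

    # one-time setup
    composer require --dev phpstan/phpstan

    # record every existing issue as the baseline; only new issues are reported from now on
    vendor/bin/phpstan analyse . --level=1 --generate-baseline
    # -> writes phpstan-baseline.neon; reference it from phpstan.neon via "includes:"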


The results might also be overwhelming in the beginning.


Solid advice. I did 2 full rewrites with great success. To add to this list, I would also make sure you are communicating with executives (possibly gently at first, depending on the situation), really learning about the domain and the requirements (it takes time to understand the application), and investing in your team (or changing resources; caution, not right away and not all at once, since there is a knowledge gap here). The rewrite will basically have massive benefits to the business. In our case: stability (fewer bugs), the ability to add new features faster and cheaper, scalability, better user experience, etc. This can get exciting to executives depending on the lifecycle of the company. Getting them excited and behind it is one of the core tasks. Don't embark on this right away as you need more information, but this will matter.


Among the things I'd prioritize is to make a map of all services/APIs/site structure and how everything falls into place. This would help you make informed decisions when adding new features or assessing which part of the monolith is most prone to failure.


Best advice so far.


This is the way.


This hacker agiles.


First of all: PHP is fine. It really is.

Second: Doing a full rewrite with a junior team is not going to end well. They’ll just make other mistakes in the rewritten app, and then you’ll be back where you started.

You need to gradually introduce better engineering practices, while at the same time keeping the project up and running (i.e. meeting business needs). I’d start with introducing revision control (git), then some static testing (phpstan, eslint), then some CI to run the tests automatically, then unit/integration tests (phpunit), etc. These things should be introduced one at a time, probably over a timespan of months.
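On the phpunit step, even a characterization test this small starts paying off (the file and function names below are invented; the point is just to pin down current behaviour before touching it):

    <?php
    // tests/PriceCalculationTest.php - hypothetical characterization test
    use PHPUnit\Framework\TestCase;

    require_once __DIR__ . '/../lib/pricing.php';   // the legacy file, included as-is

    final class PriceCalculationTest extends TestCase
    {
        public function testDiscountMatchesCurrentBehaviour(): void
        {
            // Whatever the legacy function returns today is, by definition, "correct".
            // Lock it in so later refactoring can't silently change it.
            $this->assertSame(90.0, calculate_discounted_price(100.0, 10));
        }
    }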

I’d also have a sort of long-term technical vision to strive toward, like “we are going to move away from our home-written framework towards Laravel”, or “we are moving towards building the client with React Native”, or whatever you think is a good end outcome.

You also need to shield the team from upper management and let them just focus on the engineering stuff. This means you need to understand the business side, and advocate for your team and product in the rest of the organization.

You have a lot of work ahead of you. Be communicative and strive towards letting people and business grow. I can see you focus a lot on the technical aspects. Try to not let that consume too much of your attention, but try to shift towards business and people instead.


> home-written framework towards Laravel

I would think about that long and hard. Over the years, experience has shown me that regardless of which framework or library you use, you introduce a lot of dependencies into your app when you build on it. Common sense says that they should make things easier, and at the start they definitely do. But over the years, you start encountering backwards-incompatible changes, major moves, etc. in those frameworks and libraries, which start taking your time. And sometimes a considerable chunk of it.

I would only use a framework or library that has a strong backwards-compatibility policy or viewpoint. JSON's 'only add, never deprecate' is a very good ideal to strive for. Even if this isn't entirely feasible in software, it should at least be aimed at.

So I'd say if something that is built in-house works, easy to use and keep maintained, there is absolutely no reason to move to an external framework.


I don’t disagree with this :) It absolutely depends on the situation though. Having a foundation which is maintained and (perhaps more importantly) documented properly by a community can be quite powerful.

My original point was more about the need for a long term technical vision, because I’ve found this to be an invaluable guiding star as I make smaller decisions along the way. Sometimes I need to change course before realizing this vision, but having a vision still helps. I’d feel like I was stumbling in the dark without this.


It's worth re-emphasizing this point: "Doing a full rewrite with a junior team is not going to end well."


Or a senior team.


the only reason people dunk on php is because of experiences like this

from a user perspective, seeing a php file extension is an accurate predictor of a disorganized mess of everything and a “LAMP stack” stuck in 2003, just as described here

from a developer perspective it’s correlated with everything described by OP

you’re correct it isn’t inherently php’s problem, it can do RESTful APIs and a coherent code design pattern no problem


Lots of people are giving advice on how to fix the code piecemeal. First put it on Git, then add tests, then, carefully and gradually, start fixing the issues. Depending on the project, this could take a year or several years, which isn't bad.

The problem with this plan is corporate politics. Say that OP takes on this challenge. He makes a plan and carefully and patiently executes it. Say that in six months he's already fixed 30% of the problem, and by doing so he meaningfully improved the team's productivity.

The executives are happy. The disaster was averted, and now they can ask for more features and get them more quickly, which they do.

Congratulations, OP. You are now the team lead of a mediocre software project. You want to continue fixing the code beyond the 30%? Management will be happy for you to take it as a personal project. After all, you probably don't have anything to do on the weekend anyway.

You could stand strong and refuse to improve the infrastructure until the company explicitly prioritizes it. But then why would that job be better than just taking a random position in a FAANG company? The code quality will be better and so will the pay.


A lot of people see this as a technical challenge, but it is a political one. The road the business took to get to this point is crucial, and understanding whether the business want to fix it is key.

In my experience, once businesses get into this sort of mess they never work their way out of it. To use an analogy, there is a point with people who make terrible lifestyle choices (smoking, obesity, etc.) where the damage is too far gone.

A company I used to work for had a horrendous codebase that was making them a ton of revenue. It wasn't as bad as the OP's codebase, but it was pretty terrible. It was wedded to a framework that was abandoned 8 years ago. Everything was strongly coupled to everything else, meaning it was brittle. Every release they'd have to have an army of QA testers go through it and they'd find 100's of new bugs. Every bug that got fixed introduced another.

The lesson I learned? Find these things out during an interview. Ask about their CI, ask about QA and automated testing and really push them on details. If they give vague answers or something doesn't smell right, walk away.


> But then why would that job be better than just taking a random position in a FAANG company? The code quality will be better and so will the pay.

Exactly the thing people are missing here. It's a lot of work at a very high skill level, with a lot of political ground to burn if refactoring breaks things or suddenly slows delivery down, at a mediocre shop.


This is more reason for the gradual approach. Bringing source control, testing, and separation of dev and prod environments, makes things safer and will speed up work pretty soon by making the people much more comfortable to actually try things.


For what benefit in the end for the individual?


I've done that progressively over two years as a junior and I'm basically unofficial tech lead now. Managers listen to me and I can plan and influence several projects. I have other goals so I won't become official team lead by choice, but that's probably valuable to OP if he can pull that off.


Big fish in a small pond.


Speed of development.

Once you have these things in place, you can make changes much more quickly, and much more confidently.


Yes, because it's famously easy to just go grab a FAANG job whenever you feel like it, wherever you are in the world.


Not easy, but easier in my opinion than the multi-year project that OP is planning to undertake.


This architecture is not a MAANG architecture and the fix is unlikely to yield experience that will land you a job at MAANG.

If MAANG is your goal, finding yourself a C-suite sold on a tech modernization initiative who's hired ex-MAANG throughout their org chart will accelerate your MAANG transition.

If your goal is to learn how to solve business problems for medium-to-large sized non-MAANG companies, this company is probably going to yield good experience.

Two different career tracks.


at which point you leave, having spent a satisfying six months improving things for the other devs


Feels like you should start with introducing version control, dependency management, and creating a deploy process after that.

Those seem like low hanging fruit that are unlikely to affect prod.

You should also probably spend a decent amount of time convincing management of the situation. If they're oblivious that's never going to go well.

I agree a full rewrite is a mistake and you have to instead fixed bite sized chunks. It also will help to do that if you start to invest in tooling, a deploy story and eventually tests (I'm assuming there are none). If I was making 20 million off some code I'd sure as heck prioritize testing stuff (at least laying the groundwork).

It's probably also worth determining how risk-tolerant the product is; you could probably move faster cleaning up if it is something that can accept risk. If it's super critical, I'd seriously prioritize setting up regression testing in some form first.


> Feels like you should start with introducing version control, dependency management, and creating a deploy process after that.

Agree with this 100%. This sounds like a team that may not even know how to develop locally. Showing them that they can run the system on their own computer and make experimental changes without risking destroying the company would be a huge game changer. If they’re not open to that idea it may really be hopeless.


Here's a way to introduce version control without having to stop everyone and teach them how to use it first:

1. Commit the entire production codebase to git and push it to a host (GitHub would be easiest here)

2. Set up a cron that runs once every ten minutes and commits ALL changes (with a dummy commit message) and pushes the result

Now you have a repo that's capturing changes. If someone messes up you have a chance to recover. You can also keep track of what changes are being applied using the commit log.

You can put this in place without anyone having to change their current processes.

Obviously you should aim to get them to use git properly, with proper commit messages - and eventually with production deploys happening from your git repository rather than people editing files in production!

But you can get a lot of value straight away from using this trick.

It's basically a form of git scraping: https://simonwillison.net/2020/Oct/9/git-scraping/
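A minimal version of that cron job might look something like this (paths, branch and remote names are placeholders):

    #!/bin/sh
    # /usr/local/bin/autocommit.sh - snapshot whatever is live right now
    cd /var/www/site || exit 1
    git add -A
    # only commit when something actually changed
    git diff --cached --quiet || git commit -m "Auto-snapshot $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    git push origin main

    # crontab entry (crontab -e):
    # */10 * * * * /usr/local/bin/autocommit.sh >> /var/log/autocommit.log 2>&1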


That git scraping technique is amazing! This is what a proper hack looks like. Super neat.

I'm immediately searching for public maps etc to apply this on.


How do you make sure that code being committed is ready to be run? Files could be saved before they're ready. I'm assuming this won't happen on the production server, but you can't be sure it isn't just someone's working copy.


The system is an enormous black box and this at least tells you what N things were being manipulated at point in time X. Easy to set up and gives just a bit of peace of mind if the thing keels over one day.


You don't, but the current system doesn't do that either. That comes later down the line. Baby steps and all that!

A path to doing this might look like:
- Cron job scraping and committing
- Add a post-commit check that runs a linter/some checks/tests
- Assign out fixes to these issues
- Ask for those fixes to be done using git
- Pick an area to focus on, and enforce coverage in that area. You can still blindly deploy here, but at least you know when it's broken.

As noble a goal as testing before deployment is, sometimes it's an enormous amount of effort to change development practices, workflows, team mindset and company mindset. You can only handle some of these at a time, so choose a combination of the low hanging fruit (maybe there's a separate component) and the highest impact (every time this breaks, the site goes down).


This doesn't.

It's not perfect, it's a step in the right direction.


> Commit the entire production codebase to git and push it to a host (GitHub would be easiest here)

Git on his own machine would be easiest. There is no need to consider a git host that the others can access until things are at the point where the others are ready to use git.

Even when they reach that point I question whether GitHub is the way to go. He didn't say much about their IT department but they have web servers and databases so evidently have people who can manage such things. It would be close to trivial for those people to set up git hosting on one of their own servers.


Love your Git scraping technique, very clever. Thanks for sharing.


DB schema and prod/devops config too!


Plenty of good suggestions in here about e.g. tests, source control etc. You will need them all.

But I would start by choosing how and whether to fix up the crown jewels, the database.

You say that instead of adding columns, team has been adding new tables instead. With such behaviours, it's possible your database is such a steaming pile of crap that you'll be unable to move at any pace at all until you fix the database. Certainly if management want e.g. reporting tools added, you'd be much better to fix the database first. On the other hand, if the new functionality doesn't require significant database interaction (maybe you're just tarting up the front end and adding some eye candy) then maybe you can leave it be. Unlikely I would imagine.

Do not, however, just leave the database as a steaming pile of crap and at the same time start writing a whole lot of new code against it. Every shitty database design decision made over the previous years will echo down and make its ugly way into your nice new code. You will be better off in the long run normalising and rationalising the DB first.


Good point. Using stored procedures / views etc. will help crystallise the API for the DB and allow work to happen behind that wall without breaking anything else in the meantime. Once the work is done, bits of the wall can be replaced with better bits of wall, i.e. improved SPs and views pointing to an improved schema.


Yep. Views in an RDBMS are much underrated, IMO just because they are old tech. It should be possible to use views to (a SQL sketch follows the list):

1) normalise the database (fold these ugly add-on tables as columns in the parent table, with suitable null constraints, then drop the add-on table).

2) Use views to add the add-on tables (now views) back in again.

3) Continue running the old application code against the views, which present the old, ugly schema.

4) Simultaneously write the new code against the normalised core tables.


This would let SELECT operations keep working, but UPDATE / DELETE would still have to be done on the old underlying schema, right?

No way to trigger a DELETE from a view right ? How would you approach this ?


Views can be updatable, though there are caveats, but deletes and updates can be done via a function or stored procedure, meaning there's no direct access by the app to the underlying schema. If it's done well, it means that the calling code in the app won't have to change (or the changes are at least minimised) even as the schema changes. The speed and security benefits are a nice by-product.

You can even do things like prevent the app from deleting things unless the functions are used and prevent poor development practices in the process.
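In PostgreSQL, for instance, a DELETE against a view can be redirected with an INSTEAD OF trigger (view, table and column names are invented for illustration):

    -- the old code can keep issuing DELETE FROM product_weights ...
    CREATE FUNCTION product_weights_delete() RETURNS trigger AS $$
    BEGIN
        -- "deleting a weight" now just nulls the column on the real table
        UPDATE products SET weight_kg = NULL WHERE id = OLD.product_id;
        RETURN OLD;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER product_weights_delete_trg
        INSTEAD OF DELETE ON product_weights
        FOR EACH ROW EXECUTE FUNCTION product_weights_delete();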


Respectfully, I don't think you are viewing this rationally and you need to take a step back.

Some of these things are terrible choices, but some of them are just weird choices that aren't necessarily terrible, or a minor inconvenience at most.

E.g. no source control - obviously that is terrible. But it's also trivial to rectify. You could have fixed that in less time than it took to write this post.

Otoh "it runs on php" - i know php aint cool anymore, but sheesh not being cool has no bearing on how maintainable something is.

> "it doesn't use composer or any dependency management. It's all require_once."

A weird choice, and one that is certainly a bit messy, but hardly the end of the world in and of itself.

>it doesn't use any framework

So?

What really matters is whether it's a mess of spaghetti code. You can do that with or without a framework.

> no caching ( but there is memcached but only used for sessions ...)

Is performance unacceptable? If not, then that sounds like the right choice (avoiding premature optimization)...

> the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.

Not ideal... but also pretty minor.

Anyways, my point is that what you're describing is definitely not ideal, but on the scale of legacy nightmares it seems not that bad.


So I think the first option you should strongly consider is just running. It's a valid tactic and if the management team is difficult, is probably the best one for you.

If you stay, you need to manage your relationship with the management team. This involves the usual reporting, lunches, etc. You need to set up some sort of metrics immediately. Just quarterly might be sufficient. Nobody is going to care about bug fix counts; your metrics should be around features.

Testing and version control are a good place to start. But you are going to need to get them started there and you will pretty much need to instill good discipline. You will be herding cats for quite a while. If you can't get these two items going well in 3 months then abort and leave. You don't want to stick around for when the money printer stops working and nobody can figure out why.


We also need OP to do some self reflection and decide if this is really spaghetti code or they are just lazy learners. It is easy to look at complicated projects and only see the complexity.


I’m inclined to believe the op if it’s a team of juniors with no version control or tests.


Solid advice here. If OP can't get management buy-in on a solid plan to shore things up, he/she is looking at a losing situation.


Got hired into a project like this. The most critical part is not to emotionally lose your devs. So, they explained their whole application to us and we set goals together. I tried to let them see their application through the eyes of a new customer. We identified performance & UX as the most pressing problems. A web service layer was added to the old application, and we rewrote the whole frontend utilizing a modern framework. That way we got management excited ("wow, it's so beautiful now") and kept the original developers happy and engaged ("we're doing all the important logic & persistence here!").

We also introduced git as well as dev and staging tiers and some agile methodologies. Definitely do some of that first!

Now, as management and customers are happy, the backend can be refactored step by step. Here, more test coverage might come in handy.

So, I'd recommend to be a bit picky about where to create value. You can restructure the whole database and that'll be good for maintenance (and most likely performance) but management & customers won't literally "see" much. Ask the people with the money for their preferences, excite them to get more runway. Regarding "backend stuff": Think like a Microservice architect and identify components that are least strongly coupled and have a big (performance) impact. Work on those when management is happy and you've got plenty of budget.

Your job is to create value and reduce risk. Not to create something that's technically awesome ;)


3 junior engineers are holding together legacy code that generates 20 mil a year having had no leadership that has taught them any sort of best practices? Give them all raises and get over yourself.


OP says it has been developed over the course of 12 years. The team is currently 3 junior devs. There's probably a huge turnover.

It's likely there's a lot of history and political shenanigans that OP isn't aware of yet. This could be a sinking ship. If it's a profitable business why is the team made of juniors?

A small company with legacy code that is a huge mess but that is maintained by the same person for the last 10 years is one thing. The same mess in the hands of 3 juniors who don't even use version control means no one with experience has lasted long enough at this company. That's a red flag.


Save yourself a lot of trauma and get out of this mess. Been in a similar situation, spent five years trying to fix things, gave up. Could have saved myself some of the therapy I now need.


This alternative is not to be neglected. You don't have to save the whole world. Save what you can that is worth saving. If you are competent to build good new things, do that.


Why is 'saving the world' even on the table here or in similar cases? Guys making $20mil a year don't look to me like they need saving from above. If you have that kind of money I trust you know what you're doing and can pay for help when help is needed. Otherwise, you don't deserve that cash. Other people might spend it better.

I, for one, have never been saved out of a good heart or pity. Every doctor visit somehow results in money being transferred from my bank account to theirs.


Those people didn't have to be doctors, or nurses or med techs. They all need to make a living, but not necessarily by probing you. A good many of them entered medicine because it seemed like something they could do to help people. Even if that has since all burned away, you still benefited.

I spent a career not becoming a millionaire at Microsoft (definitely on the table, at the time) because Microsoft was and remains too evil. Likewise Oracle. Or making weapons. I do not answer Google recruiters' e-mail, and not just because Google interview process is far too annoying for anybody with any self-respect to tolerate. (Who does work there? Many worked for companies Google bought.)

Doing work that benefits humanity, or natural ecosystems or whatever, is the reason to do things. The money it pays is how you afford to be able to spend your life doing that.

I feel sorry for people who work knowing the work they do makes the world worse. But not very sorry, because most have a choice. Some find ways to add value from within, e.g. two I know at Oracle do Free Software full time. I mostly cannot tell which do or don't, so I do not condemn all Microsoft, Google, Facebook, Oracle, BAE employees. But I choose not to be there.


Yes, life is too short, and there are too many much better jobs out there to waste time on such a project.


I can imagine how some super senior engineer may like this kind of very challenging experience.


A super senior consultant might: they are paid by the hour/day and they are free to fire the client and leave if the working situation becomes too hairy.

As an employee, the best advice is GTFO.


True, perhaps some are into maintaining shitty legacy systems with not enough budget.


OP is clearly not senior. If they were they would know how to get from A to B.


Do you work at Google? They never repair things. Hence 5 (10?) unfinished chat programs.


A Google employee would have told you that they work for Google.


1. Build a functional test system, and write a big suite of tests which ensure preservation of behavior. Make it easy to run and see the results.

2. Slowly start extracting code and making small functions. Document like crazy in the code as you learn. Keep the single file or close to it, and don't worry about frameworks yet.

3. Introduce unit tests with each new function if you can.

After all that is done, make a plan for next steps (framework, practices, replace tech etc).

Along the way, take the jr backend engineer under your wing, explain everything, and ensure they are a strong ally.

Call me crazy, but that project sounds like fun.


> Build a functional test system, and write a big suite of tests which ensure preservation of behavior. Make it easy to run and see the results.

Keep it to yourself and don't let anyone know why you are so effective.

Demand a raise early once you are sure of your value.

Edit: why not? Clearly this is a huge value that would be wholly unappreciated without leveraging it yourself.


OP is leading a team, not hiding away churning out code. The "secret superpower" strategy doesn't work here.


Where does OP say they are team lead? They sound like a junior.


> I have to find a strategy to fix this development team

Clearly responsible for team outcomes

> without managing them directly

Not a people manager. Most likely a tech lead or a technical project manager


Sounds like something they'd explicitly mention. I took it to mean they were a young gun go-getter.


I've been in almost this exact same situation with a slightly smaller team and "only" about $5 million running through the tangled web of php.

We did a complete rewrite into a Django application, it took 2 years and untold political pain but was absolutely the correct choice. The legacy code was beyond saving and everyone on the team agreed with this assessment - meaning our political battles were only outward facing.

In order to get support, we started very small with it as a "20% project" for some of our engineers. After level-setting auth, CI/CD, and infrastructure stuff, we began with one commonly used piece of functionality and redirected the legacy PHP page to the new Python-based page. Every sprint, in addition to all the firefighting we were doing, we'd make another stealth replacement of a legacy feature with its updated alternative.

Eventually we had enough evidence that the replacements were good (users impressed with responsiveness, upgraded UI stuff like replacing default buttons with bootstrap, etc.) that we got a blessing to make this a larger project. As the project succeeded piecemeal, we built more momentum and more wins until we had decent senior leadership backing.

Advocating for this change was basically the full time job of our non-technical team members for 2 straight years. We had good engineers quit, got into deeply frustrating fights with basically every department in the company, and had a rough go of it. In the end though, it did work out very well. Huge reduction in cost and complexity, ability to support really impactful stuff for the business with agility, and a ton of fulfilling dev experience for our engineers too.

All this is to say, I understand where everyone warning you not to do a rewrite is coming from. It's a deeply painful experience and not one to be embraced lightly. Your immediate leadership needs to genuinely believe in the effort and be willing to expend significant political capital on it. Your team also needs to be 100% on board.

If you can't make this happen and you're not working on a business which does immense social good and needs your support as a matter of charity, you should quit and go somewhere more comfortable.


It sounds to me like you did a full rewrite by replacing the app piece by piece, sprint by sprint, releasing changes quite often and bringing that value all the way to the user. I think that is really clever.

My impression from others in this thread is that they mean "start from scratch and build until features are on-par with current product" when they say full rewrite.

Your version of full rewrite seems like it is generally applicable, but I have very little faith in the latter approach.


This sounds like the story of a proper, piecemeal, rewrite where the whole team was on board.


I will add myself to the others suggesting you to look for another job.

1) A rewrite from scratch is almost always a bad idea, especially if the business side is doing just fine. By the way, when you want to sell a rewrite, you don't sell a rewrite, you sell an investment in a new product (with a new team) and a migration path; it's a different mindset, and you have to show business value in the new product (still ends up failing most of the time, but it has a better chance of getting approved).

2) You never ever try to change people (or yourself) directly. It's doomed to failure. You change the environment, then the environment changes the people (if the changes are slow and inertia is working for you, otherwise people just leave).

Since it would probably be too hard to change the environment by yourself, and given that your team seems fine with the status quo, my advice is to just manage things as they are while you look for another job. Otherwise my bet is that your life will be miserable.


I'm not a career dev, but I have inherited teams and projects before that were a huge mess...

This isn't going to come off nicely, but your assumption that it needs a full rewrite is, in my eyes, a bigger problem than the current mess itself.

The "very junior" devs who are "resistant" to change are potentially like that in your view for a reason. Because of the cluster they deal with I suspect the resistance is more they spend most of their time doing it XYZ way because that's the way they know how to get it done without it taking even more time.

What it sounds like to me is that this business could utilize someone at the table who can understand the past, current, and future business - and can tie those requirements in with the current environment, with perhaps some "modernizing" mixed in there.


A friend of mine works for a small video game studio and proposed they use pull requests instead of "trust me bro" merges to the main branch. It was met with fierce, vitriolic resistance. But now it's fine, people seem to get it, it just took time.

So uh, good luck. You're going to be the one everyone hates.

I'd just quit in your shoes, to be completely honest. Your desire for a solid foundation will never be seen as anything but a roadblock to an organization that just wants more floors added to the house with reckless abandon for safety.

Any safety gained by improvements you champion will go unnoticed. You will be blamed when the inevitable downtime from molding a mountain of shit into less of a mountain of shit happens.

You are going to lose this fight. Please just quit and go work for a software engineering organization, you seem to have taken a job at a sausage factory for some reason. I'd also try to learn from that...

Good luck.


Why is it necessary? It's ugly, but it's earning millions. Find out why resistance to change is huge. Maybe the devs are stuck in their ways, or maybe it's too damn scary for them to fail because the business comes down hard on them? Get to know your devs, find out from them what they think the problems are and how they would like to solve them. Maybe they just need more help. And as the team lead it sounds like you need to bring the "business unit" up to speed on the current reality?


This is the correct answer.


> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight

In my view, as long as management believes this, a fix is not possible at all.

You should forget about improving the code and instead see your job as a kind of consultancy thing where you teach management about what they have and what the consequences of that are.

And probably look for a new job. If you are completely successful at teaching management, it may be worth working on this, but it'd probably need to be renegotiated as if it were a new job.


I would wrap this in some kind of thin candy shell so that you can inspect/record every single thing the application is doing, from a behavioral standpoint. For instance, put API gateways in front of it so that all requests are logged, the routes, parameters, etc. eBPF comes to mind as a tool that might help here. Same would go for instrumenting the database. The goal is to reverse engineer 80% of how it works by just looking at external IO. This is one area where containers can be very powerful, since they naturally encapsulate the system and allow for these sorts of bulk measurements 'for free'
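Since nginx already fronts everything, one cheap way to get that behavioral recording is a dedicated access log format capturing routes, parameters and timings (a sketch; pick whichever fields you care about):

    # in the http {} block: a log format detailed enough to reverse-engineer usage
    log_format audit '$time_iso8601 $request_method "$uri" args="$args" '
                     'status=$status rt=$request_time upstream_rt=$upstream_response_time '
                     'referer="$http_referer"';

    # in the existing server {} block(s):
    access_log /var/log/nginx/audit.log audit;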

My thinking is that you can essentially plug a thousand probes into this frankenstein monster and start to learn the true shape and surface area of it without needing to step through the mess of the code. Then the code might make more sense, or at least a clearer path forward as to what a new architecture needs to look like could appear.

Static analyzers might also be helpful, since the full featured ones tend to provide gui tools or outputs of things like call graphs or dependency chains. That can be useful in learning the 'true' surface area of the app, too.

20 mil a year is no joke, so use that to your advantage. It sounds like this has been stretched so thin that at this point it is a huge disaster/liability waiting to happen, so I would try and leverage some of that cash to use whatever paid/advanced tooling might be necessary to help here. Old PHP apps are a security nightmare waiting to happen, particularly as the world has moved on to higher and higher TLS levels.


When approaching seemingly insurmountable technical issues, I've found it important to find the root issue of what's causing all the chaos. Yes, the code is a mess. OK, the database design is non-sensical. Sure, things never get deleted. But why?

From what you've mentioned, it sounds like every change that isn't additive is viewed as too risky. So at this point, before trying to make big shifts, some work should be done to de-risk the situation as much as possible. Granted, you probably can't stop work and introduce a bunch of new practices and patterns, but you need to start reducing the risk to unleash the team to make necessary changes.

For example, introducing version control should be a slam dunk. Start using a database migration facility for all database changes. Create a release schedule that requires features to be stabilized by a certain window for deployment. Create some really, really simple Selenium tests by just browser recording yourself using the app.

Once you can start making changes more confidently, then you can start unwinding some of the bad choices moving forward. Resist the urge to start "making a good foundation for the future" by trying to rewrite core parts of the system immediately and instead start thinking in terms of forward progress oriented changes. Need to add a feature? Make sure to write that feature properly with good practices and make only the necessary changes to other parts of the system. I realize that's probably going to be painful, but eventually you will accrete enough of these small changes that you can string them together with a little more work into larger scale changes under the hood.

These things are rarely easy, especially in established legacy systems. But if this is the revenue engine for your company, you'll need to move conservatively but decisively or risk making the situation worse. Good luck!


This is really good advice. Thanks for sharing this. I think you've probably nailed it on the head about trying to find the reason and it likely being because people are too afraid of the risk of messing stuff up, so they keep adding to the ball of mud.

However, in my experience, it can be very touch and go dealing with people who become that risk averse. I had a job one time where a previous employee refused to give up a computer that had to be at least 10-15 years old at the time (it was running Windows 2000 or something like that) and took about 30 minutes to boot up. Somehow it was the only computer that could run the 3D CAD program he was familiar with, or some other "reasons" that it was essential to the project. The only way of moving forward was that he luckily completely washed his hands of the project before I even joined the company, and then I did a full rewrite and redesign of the whole system from scratch (which was absolutely required in that case). Even then, when I asked him about the computer, just to be careful and do due diligence to try and figure out why or whether it was actually important, he was very resistant to me sending the computer to the salvage department after getting the files off it.


Wow hard one to untangle and answer in a HN reply!

I sort of think... if you have to ask this here you might be in the wrong job? Was this a job that seemed like something else then became this? This sounds like a job for an experienced VP Engineering. It is a tough order. Wouldn't know how to do it myself. Lots of technical challenges, people challenges, growth challenges, and managing up and down.

The resistance to change is something you need to get to the bottom of. People are naturally resistant to change if they are comfortable, and we've all been through 'crappy' changes before at companies and been burned.

The solution might be to get them to state the problems and get them to suggest solutions. You are acting more like a facilitator than an architect or a boss. If one of them suggests using SVN or Git because they are pissed off their changes got lost last week, then it was their idea. No need to sell it.

This assumes the team feels like a unit. If the 3 are individualistic, then that should be sorted first. E.g. if Frank thinks it is a problem but no one else does, and they can't agree amongst themselves, then the idea is not sold yet.

Once you know more about what your team think the problems are, and add in a pinch of your own intuitions, you might be able to confidently formulate the problems, so you can manage their expectations.


rewriting would probably be the biggest mistake you could make. you need to refactor things little by little, and avoid making changes that would be made redundant by later changes.

figure out what you want to fix first, and then fix that. then go to the next thing. but keep in mind - "management and HQ has no real understanding", and as far as they are concerned, what they have works.

if this doesn't sound like something you want to do, then find a new job. you are effectively the property manager for a run-down rental property. you aren't going to convince the owners to tear it down and build a new set of condos.


> you are effectively the property manager for a run-down rental property.

This is an incredibly powerful analogy. Thank you!


> I know a full rewrite is necessary, but how to balance it?

A full rewrite of a functional 12-year-old application? Yeah, you're going to waste years and deliver something that is functionally worse than what you have. It took 12 years to build; it would realistically take years to rebuild. Fixing this will take years and honestly some serious skill.

What you want to do is build something in front of your mudball application. For the most part your application will be working. It's just a mudball.

Step 0. Make management and HQ understand the state of the application. To do this I would make a presentation explaining and showing best practices from various project docs and then show what you have. Without this step, everything else is pointless.

If they don't understand how bad it is. You will fail. Failure is the only option.

If the team is not willing to change and you're not able to force change then you're going to fail.

So once you have the ability to implement changes.

Step 1. Add version control.

Step 2. Add a deployment process to stop developing in production.

Step 3. Standardise the development env.

If you have views and not intermingled php & html:

Step 4. Start a new frontend and create endpoints that reuse the original code to return json for all the variables.

If not:

Step 4. Add views. Copy all the HTML into another file and then make a note of the variables.

Step 5. Start a new frontend and create endpoints that reuse the original code to return JSON for all the variables (a rough sketch of such an endpoint follows at the end of this comment).

... Carry on moving things over to the new frontend until everything is in the frontend.

Probably a year later.

Step 6. When adding new functionality you can either rewrite that section, do a decorator approach, or edit the original functionality.

That's without fixing the database mess or infra mess.
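The endpoint in step 4/5 can be as blunt as wrapping the existing include and exposing its variables as JSON (the file and variable names here are made up):

    <?php
    // api/product_list.php - hypothetical wrapper around the legacy page logic
    header('Content-Type: application/json');

    // the old page builds $products and $categories as side effects of being included
    require_once __DIR__ . '/../legacy/products.php';

    echo json_encode([
        'products'   => $products,
        'categories' => $categories,
    ]);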


The approach outlined above is the approach I would take (and have previously taken). Some details that I personally think are important:

Step 0. Set up TypeScript with version control for your current JS mess.

Step 1. FWIW, with ReactDOM.renderToString you can use JSX as your template engine.

Step 2. Move all of your PHP views to JSX views, PHP keeps returning JSON.

Step 3. Abuse component.unmount(); component = ReactDOM.createRoot(…); to get components with view state that can also be updated by regular JS events requiring a rerender, just like you would with the legacy jQuery code.

Step 4. Keep pushing more of the logic from the legacy jQuery code and PHP code into React/Typescript. You’ll find that the React components will keep getting grouped and you can start to compose more of the widgets together.

Step 5. You end up with one big React app, minimal PHP (that is easy to test), and a decision about how to manage your state.

At this point it’ll be in shape to not need a lot of attention. Truly, the DB mess will figure itself out as your PHP continually becomes minimal and well tested.
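Step 3 in code might look roughly like this (component and element names are placeholders; React 18 style):

    // widgets/cart-badge.tsx - hypothetical bridge between legacy jQuery events and a React island
    import { createRoot, type Root } from 'react-dom/client';
    import { CartBadge } from './CartBadge';

    let root: Root | null = null;

    export function renderCartBadge(count: number): void {
      const el = document.getElementById('cart-badge')!;
      // throw away the old root and remount, so legacy code can force a fresh render at any time
      root?.unmount();
      root = createRoot(el);
      root.render(<CartBadge count={count} />);
    }

    // legacy jQuery code just calls this whenever the cart changes:
    // $(document).on('cart:updated', (_e, count) => renderCartBadge(count));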


I'm no expert, but the advice in Working Effectively with Legacy Code has been helpful on occasion:

https://www.amazon.com/Working-Effectively-Legacy-Michael-Fe....

Fully apprising management of the situation in a way they can understand may also reap long-term dividends.


Thank you for so many suggestions. The main issue is productivity within a context where the company is trying to reinvent itself in terms of marketing and business model. This has for consequence that many new big features are being requested and promised by management to headquarters. But in the last years, all bug evolutions have been failures. That's why I've been asked to intervene. I love the idea of the strangler pattern associated with big unit testing coverage.


> This has for consequence that many new big features are being requested and promised by management to headquarters. But in the last years, all bug evolutions have been failures. That's why I've been asked to intervene. I love the idea of the strangler pattern associated with big unit testing coverage.

The first thing that needs to be strangled is unachievable management promises. Figure out how to get local management to not write checks the tiny team can't cash. A team of 3 juniors will likely overestimate their ability to deliver, so you've probably got to teach them to say no to things they can't do also.


You may be interested in this article:

https://medium.com/@kris-nova/organic-and-mechanistic-system...

What you are dealing with is organically grown software. No one really understands it, and it's likely to be fragile.

I agree with all those here that say the first thing to do is to introduce version control. You don't want to break that money machine without a way to revert.

Second, introduce some lightweight form of code review, but don't tick people off.

Third, if you want to do something like revise the user interface, consider adding an API to the existing organic code base, and build the new UI with that API. Generally, take that approach and avoid jamming something into the middle of the existing code that no one fully understands.


I agree with this. I built an open source package that should work for you in this situation where there's no version control. https://github.com/n0nag0n/commie2 Get this installed internally and then start doing some "lightweight" code reviews. Your other team members can get emails about any comments you make, and it'll be lightweight collaboration.


I had the chance to work with Strangler Pattern in multiple projects, and in general this is a good idea.

The thing I'd suggest taking a look at is the database. You really need to make sure the new part is not based on the old data model.

You can take a look at "Getting Started with DDD When Surrounded by Legacy Systems" by Eric Evans - available for free and only 21 pages long. Our implementation of one of the suggested approaches failed in one project, but in retrospect, I think it was caused by the business not putting enough effort into the rewrite.

In general, at a high level it's very important to make the product vision, data model, and processes clear and well thought out. Without this, you will always end up with a bad codebase within a couple of years.


It's pretty risky to delete code you don't understand. Is it harming you if you don't refactor?


What is a bug evolution? Do you mean a bug fix?


"big evolution"

"u" is next to "i" on many keyboards.


Create a 'risk register', or 'service schedule' - we use this for building service contracts, where you list out all the things that need to be done or might happen (from backups to support requests), and put a rough cost on them all. We put a min and max cost on each item, and the number of times per year it might happen.

That gives you an annual maintenance cost which will include, say "every 2 years something goes badly wrong with the flargle blargle, and costs $10,000 to fix", or "every 3 days we have to clear out the wurble gurble to stop it all crashing".

Finally, you put together the same thing but for a re-written version, or even with some basic improvements as others have suggested, and hopefully you see a lower total cost of maintenance.

At that point, you can weigh up the cost of either a rewrite or incremental improvements in actual dollars.


20 Million dollars per year.

This should be the thing that starts every conversation. Because IT WORKS for the intended purpose.

Someone else said it. Put everything in source control first.

And just fix things that directly impact that 20 Million dollars a year.

Example, fix speed issues. Fix any javascript issues. Fix anything that will get you to 21 million dollars a year.

Then if you want, you can put together a small sub-team that would be responsible for transitioning certain pages into a framework. But don't rewrite the whole thing.


Right. An attempt to do a full rewrite just doesn't pass the most basic cost/benefit analysis.

Redoing it in [sexy language / framework / paradigm / design pattern] will feel aesthetic, but even if it goes perfectly, it won't get you to $40M a year.

But if it goes poorly, it might get you to $0M a year (and fired).

The bigger question for OP should be "do I wish to be employed by an organization that is OK with these engineering practices?" Nobody can change culture by themselves, and certainly not just by introducing a sexier technology.


Doing it in a new language will subtract some value. Adding new features may add value. Doing it as an SPA could kill the entire thing.


Plenty of people are giving you tech advice and they are great but here is the reality:

Unless you have power at the executive level, or are brought in as an expensive consultant to make big changes, you are wasting your time.

I would tell you to stick around and shovel shit just to take cash home but from your post it doesn’t sound like you are happy there to begin with.

What you are seeing here is a symptom of leadership not valuing engineering, so trying to improve this requires a culture change from the top, which is highly unlikely from where you stand.

If the pay is really good, you might consider sticking with it for a bit and then move on. However if you feel like it will push you towards burnout, abandon ship ASAP.

My younger self would have stayed and tried to be the unsung hero but now that I’m older I chuckle at that foolishness.

Don’t be the silent hero.


People suggest rewriting little-by-little. Does that really work in practice? And why would one do it? Why not let that business rot-in-place so to speak, while building a new business on a new platform to "compete" with it?

I worked for a company early in my career that sold a $1500 piece of software and had revenue of $15 million. When I was there, the head count was 70. Ten years later the head count is two - one engineer and one person to take the orders. And revenue was still a couple million. A classic "rot-in-place" situation.


Rewriting little by little just means: each time you make a change, leave the source base at least a little better than you found it. Leave a few comments about the thing you reverse engineered. Delete a little dead code. Eventually you get the confidence to move from the lowest hanging fruit to deeper refactoring. You do it because that approach may be the best you can do with a rotten source base within your time and resource constraints.


Honestly, that sounds like making bad code even worse.


Apart from the fact they said making the code better each time?


LOL. Did you read the OP? Code is in PHP.


Yes it works. You eventually get to a place that is “not too bad”, whilst still allowing new features to be developed and the business moving forward.


I have to admit that I would have little patience for such a situation, so good on you giving it a try and godspeed. I personally feel that there's a lot of interesting work out there, so I'm not sure I would even take on the project. The fact that you have a team of three people that you don't manage or have any authority over who are also majorly resistant to change their ways of doing their job in literally the worst way possible does not seem promising, no matter how you look at it.

In these types of situations, the problems are social and possibly political and rarely technical, even though the technical problems are the symptoms that present themselves so readily.


Hi! I specialise in fixing failing projects. Yours is probably not the worst I have seen.

First of all, don't do a rewrite! Your team most likely do not know what they need to know to be able to perform a clean rewrite. You are new and still probably don't know the whole picture and all the knowledge that is in the application in one way or another. If you start a rewrite, the productivity will plummet and you will have to keep choosing whether to put resources on the new or on the old and the old will always win. I have seen this play out many times, the rewrite keeps getting starved of resources until it gets abandoned.

Refactoring is better because you can balance allocating resources to refactoring as you go, and also keep bringing in improvements that make BAU development more efficient.

Do not make the mistake of forgetting about "the business". They are probably already irritated by the project and will be on the lookout for any further missteps from you. You might think you have good credit with them because they just hired you, but that might simply not be the case. Their fuse is probably short. You need to keep them happy.

At first, prioritise changes that improve developer productivity. This is how you will create bandwidth necessary for further improvements. This means improving development process, improving ability to debug problems, improving parts of the applications that are modified most frequently for new features.

Second, make sure to prove the team is able to deliver the features the business wants. The business probably doesn't care about the state of the application but they do care that you deliver features. This is how you will create the credit of trust with them that will allow you to make any larger changes.

Do make sure to hire at least one other person who knows what they are doing (and knows what they are getting into).


The worst tech team you've ever seen and yet they are generating 20 million a year? I think you should give them the respect they deserve and understand the limitations they have been under.

My thoughts:

* Get the code in source control straight away

* Get the infrastructure stable and up to date if it's not

* Get CI pipelines set up. As part of this, make sure the code is running through a static analyser. This will give you a backlog of things to work on.

* Organize an external penetration test to be carried out

* Investigate updating and/or consolidating the software libraries used (Jquery etc)

* Choose a page/feature to update on its own. Bring it up to date.

At this point, you should be in a much better state and you will have learned a lot.


> The worst tech team you've ever seen and yet they are generating 20 million a year? I think you should give them the respect they deserve and understand the limitations they have been under.

If the right opportunity is there, you can make a lot of money on an awfully built product that barely keeps it together. That doesn’t mean that there aren’t a lot of risks involved with that and that things can’t go south in a hurry.

I’m obviously not familiar with this project, but it sounds like a lot of things many would consider table stakes are missing, especially for such a large source of revenue. Perhaps I’m not imaginative enough, but I can’t come up with any limitations they might’ve been under that would justify that. I think it’s fair to call that out, even if they happen to make money despite this.


There is definitely loads to fix, and it does sound like the team was inexperienced, but it seems they were successful.


“Before you heal someone, ask him if he's willing to give up the things that make him sick.” ― Hippocrates

You have to have a conversation with the people responsible for this shit, including (and specially) stakeholders, make them aware of the problem, and get them on board with respect to the possible solution. This step is essential before even bothering to do fucking anything.

Most importantly, make it clear that while you are there to help, this is their responsibility, and they have to become a part of the solution by making amends. If they're not willing to own their responsibility and collaborate, get the fuck out of that tech debt mill or it will ruin your life.

If you want to try to redo everything alone in silence, you will have to work infinitely hard, and in the end, three things can happen:

a) you fail, and then the organization gets rid of you. the most likely outcome.

b) you succeed, but now "you know too much"; you have dirt on a lot of people that fucked up, and you become the Comrade Legasov from Chernobyl who becomes the target of important people from the Soviet communist party. They will get rid of you once the problem is gone because now you have no value to them.

c) in the best case scenario, you succeed, but no one will congratulate you because that would mean a problem existed in the first place, and since no one is willing to assume any responsibility for their contributions to the problem, no one will say fucking anything. All your contributions will be for nothing. And if you insist that a problem existed, you'll go to outcome b). Otherwise, they will go back to their old ways and create the next fucking mess for you to solve.

Personally, I would get the fuck out. It is clear that nobody there was committed to do the right thing, starting from the hiring process. It is either highly unprepared people, extreme normalization of deviance, or some highly idiotic leadership obsessed with the short-term. Whatever it is, that team is rotten and needs an amputation. If I stayed, I would start by laying off the entire team and then rehiring everyone on a 3-month test period where they will have to completely change their attitude towards development.


There are some strong arguments here


From experience.


First off, source control. I would say this was a day 1 job.

Get some type of CI/devops thing going so you can deploy to a temporary test environment whenever you want. This applies to the data too so that means getting backups working. Don't forget email notifications and stuff like that.

Next comes some manner of automated testing. Nothing too flash, just try to cover as much of the codebase as possible so you can know if something has broken.

Go over the codebase looking for dramatic security problems. I bet there's some "stringified" SQL in there. Any hard coded passwords? Plaintext API calls?
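
As a sketch of the kind of fix that pass usually produces, assuming PDO is available (the table, column, and DSN below are hypothetical):

    <?php
    // Before (string-built SQL, open to injection):
    // $rows = $db->query("SELECT * FROM users WHERE email = '" . $_GET['email'] . "'");

    // After: a prepared statement with a bound parameter.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'app_user', 'secret'); // hypothetical DSN/credentials

    $stmt = $pdo->prepare('SELECT * FROM users WHERE email = :email');
    $stmt->execute([':email' => $_GET['email'] ?? '']);
    $user = $stmt->fetch(PDO::FETCH_ASSOC);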

And now everything else. You're going to be busy.


Security could actually be a way to sell the need for cleanup. Hire a team of independent auditors. If the code is in such a bad state as you claim, I guarantee they will find at least a dozen XSS and CSRF issues, very likely some SQL injections, and possibly even a few RCEs as root.

Maybe not the best way to increase direct revenue if the product is working, but it highlights the risk they are taking with such a shaky foundation, and puts the decision on management's table rather than yours.


There is a way out of this mess. It is not straightforward though. There are two ways:

1. Convince the business team that these team members might leave and put the $20mn revenue at risk, and that there is no way you can make them learn and do things properly. Therefore, take a separate budget and hire a new, separate team. Do a full rewrite of the backend and plug the app and new website into it. It would be a 1-2 year project with a high chance of failing outright (a big-bang release is stressful, with a large chance of you getting fired, but once done you can let the old team go and give the business team a completely new setup and team) or failing partially (a large part of the traffic moves to the new system but some parts remain, making the whole transition slow, painful, complex, and never-ending).

2. Add strong, senior PHP developers to the existing team. Ask the new senior members not to fight with the juniors but to train them; they will listen, because these people know more. Slowly add version control, staging/dev environments, a PHP framework for new code, caching, a CI/CD pipeline, an automated test suite built by an external agency, etc. This is low risk, as the business team would see immediate benefits/speedups. Rewrite portions of code that are too rusty and remove code that is no longer required. This would possibly take 5-6 years to complete, giving you ample job security while achieving results in a stable manner.


This is a lot. I have done a rewrite approach before. It is only one option, but if you're committed, it's probably the one that has the best chance to preserve your sanity. It can work, if you're clever about it.

The goal is to slowly build up a parallel application which will seamlessly inherit an increasing number of tasks from the legacy system.

What I would start with, is building a compatibility layer. For example: the new code base should be able to make use of the old application's sessions. This way, you could rewrite a single page on the new system and add a reverse proxy one page at a time. Eventually, every page will be served by the new application and you can retire the old.
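
A minimal sketch of that compatibility layer, assuming the legacy app uses PHP's default file-based sessions; the cookie name, save path, and session key are assumptions to check against the real code:

    <?php
    // Entry point of a rewritten page in the new code base. It adopts the legacy
    // app's session cookie so a user logged in on a legacy page stays logged in here.

    session_name('PHPSESSID');                  // must match the legacy cookie name (assumption)
    session_save_path('/var/lib/php/sessions'); // must match the legacy save path (assumption)
    session_start();

    $userId = $_SESSION['user_id'] ?? null;     // hypothetical key written by the legacy login

    if ($userId === null) {
        header('Location: /login.php');         // fall back to the legacy login page
        exit;
    }

    // ...render the rewritten page for $userId...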

I would stick with the language but pick up something more capable, ie. Laravel. This makes it easy to copy over legacy code as needed.

Godspeed.


"team is 3 people" and "this code generates more than 20 million dollars a year of revenue"

Whatever else you do, I hope you and the organization figure out how to celebrate that those three people are generating 20 million dollars of revenue (or at least keeping part of the machinery that does that running).

"I know a full rewrite is necessary, but how to balance it?"

Well, maybe...

How much code is it? How much traffic does it receive?

----

I would be looking at https://martinfowler.com/bliki/BranchByAbstraction.html or https://martinfowler.com/bliki/StranglerFigApplication.html


> generates more than 20 million dollars a year of revenue

and

> team is 3 people

and

> post COVID, budget is really tight

Why? All technical details aside if this can't be addressed I wouldn't even bother trying unless I owned stock.


For all we know it's a car auction website and it's selling 500 $40,000 cars.

Like the company actually buys the cars and sells them. $20 million revenue, cost of goods $18.5 million.

For all we know this website could be replaced with eBay or a cheap car-dealer SaaS website.


There's a lot of good advice here around how to approach technical improvements. Just as important is how you approach managing the situation in the organization.

One tip — don't complain to management about how "awful the codebase is" or "how you need to start over" (100% agree this is usually a terrible idea). Managers have been hearing this over their entire career as a technical manager (over time they can lose empathy being out of the weeds). It becomes an overused trope and management will start to see you as being problem-oriented.

I'm not saying don't surface the issues — management absolutely should have an accurate understanding. Instead, try and balance the good with the bad (and there will always be some good). Don't catastrophize — approach it as a manageable problem with quantified risk e.g. responding to estimation with "typically this is a straightforward problem to solve, but I've explored this area of the codebase and there are some challenges we'll need to overcome — the estimate will be larger and less precise than we want to see, and we'll benefit from prototyping/research/spikes to reduce risk of introducing serious bugs and come to a more accurate estimate".

You'll build trust by consistently delivering on the expectations you set around concrete features/tasks (including the negative expectations); then management will reach the conclusion themselves and will trust your assessment on any new project. Plus, management will ultimately see you as an incredible asset to help bridge the gap between the technical black box and their purview.


I think a full rewrite is the last thing you should consider. Think first about the least risky ways to deal with this mess. Remember your team is small and the revenue is huge (20M/year, lol). I will list some things to consider (in order of importance), and others can add more points, of course:

- git: more productivity and more control over the code base and each team member's responsibilities. Don't change the structure of the code. If it's a monorepo, leave it as it is. Just create simple branches like prod and dev. Consider putting the nginx configuration into the repository as well (since it's part of the application).

- Documentation via comments: here you should improve the team's culture a little; new code should be documented, at least using comments.

- Test environment: now that you have a dev branch, you can push all the code to this new test environment and test things without worries. If possible, start writing environment configuration where it's needed.

- CI/CD: now that everything is traceable in git, you can write a routine to deploy every branch to its place. Some self-hosted tools to consider: Jenkins or Drone.io are great and require almost no maintenance (no need to hire a devops engineer to work on this).

- Database: you have a test environment and CI/CD, so now you can TEST (what great news) your database migrations. In PHP, I remember phinx as a way to start writing migrations for this application (see the sketch after this list).

- Auto tests: I think unit testing could be considered when adding new code. Old code can just be left as it is.

If you apply at least three things from this list, I think you will see that a rewrite may not be that necessary.
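
For the database point above, a minimal Phinx migration sketch (table and column names are hypothetical). The value is that the same change runs against the test environment first and is then replayed on production by the CI/CD pipeline:

    <?php
    // db/migrations/20220918000000_add_status_to_orders.php (hypothetical path)
    use Phinx\Migration\AbstractMigration;

    class AddStatusToOrders extends AbstractMigration
    {
        public function change(): void
        {
            // Reversible change: Phinx can roll it back automatically.
            $this->table('orders')
                 ->addColumn('status', 'string', ['limit' => 32, 'default' => 'new'])
                 ->addIndex(['status'])
                 ->update();
        }
    }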


Unpopular opinion: this goes to show that you don't need no fancy microservices, distributed, asynchronous, highly available architecture to build a product that "generates more than 20 million dollars a year of revenue". No unikernels. No Kubernetes. None of that cloud native mumbo-jumbo.


Most software developers are driven by having marketable skills, and that requires having strong opinions so they can grift their way past other recruiters and developers who do the same thing.

Playing around with an out-of-vogue programming language in a company monorepo is a waste of time, in comparison.


I was in the same boat once upon a time. I actually overhauled it all but spent the next six months trying to convince 80 yr olds who taught themselves Visual Basic (version 3, I think) that the code I made was faster, more sustainable, and ready to hire more staff to build into it. That was pretty much a failure; they agreed it was faster, but it just wasn't written by the guys who originally wrote it. Shucks. So here's what I did. I left. Sometimes you lose and can't win even if it's to save a company from itself. Businesses might choose to hire retirees as consultants for decades, well past the point where they can control their bowels in public.

Not saying this is your only option, but I am saying, if the tech work is hopeless, the culture is unreasonable, and it's not gonna change until two-three people go through it and are honest at exit interviews, you have to make an honest assessment of your goals. Last I checked the company decided to hire someone for 2x what I worked for, and that person put "open to new roles" on their LinkedIn a few weeks ago...


What I want to know is, is this actual software sold for $20 million a year?

Or does this software "facilitate" $20 million of revenue, instead of generating it single-handedly?

What if we're talking about a car sales website that 'generates' $20 million in revenue by selling 500 $40k cars?


It works, it generates a large amount of revenue

leave it the fuck alone


A product/service in this state is a ticking timebomb. The fact that it's responsible for that amount of revenue makes it more dangerous. There are probably tens (maybe hundreds) of vulnerabilities that either compromise the whole platform, or at least give access to all customers' data.

IMO there are three realistic approaches:

- Keep it in its current state with the intent of making as much money as possible until the timebomb goes off, and then run away. Just to be clear, I don't think this is ethical, but a lot of people would choose it anyway.

- Ship-of-Theseus it into a supportable state.

- Leave ASAP so it becomes someone else's problem.

IMO the first one is only an option for the people that run the company. For the manager of the Dev team, they only have the second and third options, because when the timebomb goes off, they are going to be the scapegoat, not the person running off to the Bahamas with a sack of cash.

I've seen multiple ticking timebombs like this go off in years past, and I was usually part of the heroic efforts to stop the money hemorrhages that ensued afterward. I strongly recommend avoiding it altogether.


There’s that joke graph about “happiness in the life of a thanksgiving turkey”, where things are going amazingly right until a straight drop to zero. It works “right now”, up until it doesn’t, or there’s an outage that you can’t recover from, or some bad code wipes prod and your backups are useless (in this case likely nonexistent).

It also sounds like it isn’t really working even right now - from what op claims the productivity is not at all able to meet the deadlines being imposed by upper management. Death by competitors moving faster with a better product is a real thing, and if the tech stops them from doing so, that’s a problem.

The best strategy probably isn’t a rewrite, as others have suggested, but “don’t touch it if it works” is frankly an irresponsible strategy.

I’ve worked in a team where poor core tech (along with a sort of emperor-has-no-clothes situation where upper management found it politically impossible to acknowledge the issue) directly killed the profit, although this was in the market-making space, which has a much more direct reliance on technology. They got into their situation with exactly the attitude of “if it works, don’t touch it!” and basically stood still while the competition flew ahead of them. Their product “worked”, in that it did what it was supposed to, but iteration on quality of trading and strategies was next to impossible.


“productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.

This business unit has a pretty aggressive roadmap…”

Sounds to me like it doesn’t work.


Honestly, I know this answer isn't going to be that popular in some circles, but... yes, leave it. If it offends your sensibilities so much that you just can't be 'caretaker' for this mess, walk away.

But if it's "working fine and generating heaps of cash" as far as upstairs is concerned, there is no way you play the 'refactor/redesign/replace' game and come out ahead.


Also go work somewhere close to your standards. From experience I can tell you that this is a battle you can’t win in a reasonable time table. There’s a reason it is the way it is and you can’t change those people.


This is the correct answer. I’ve heard enough stories on HN about nightmare-level codebases that churn out massive profits. One dude had his entire application in a single PHP file and was generating like 20k/month.


SOC 2, ISO 27001, HIPAA, PCI


Code is always part of a larger business strategy. You are working for a small business that has found a way to leverage large revenue off of cheap talent. A full rewrite will destroy this business. Instead look for low hanging fruit like teaching the devs source control in a respectful way that is actually useful to their existing work process.


What do you mean by "without managing them directly"? Are you a manager or aren't you? If you're a scrum product owner, you are (or at least you set priorities).

You speak of "resistance to change", from juniors? You are the change. You get to set the agenda, not them. Unless you don't, in which case you can't fix anything. But legitimacy comes not just from authority, but also from rigor. Anything you truly dictate needs to be 100% based in evidence and fact. This means letting go of implied a prioris such as "PHP is bad" and "we must use a framework". The only real constraint is to keep the gravy train rolling.

So what exactly is your role, the thing you were hired for? If it's to manage, manage. If it's anything else, the best you can do is lead by example. But one way or another, you'll have to let go of some things.


It makes $20M a year. That sounds like great code.

I would: 1. Get it in source control without "fixing anything". 2. Get a clone of the prod server up and running, plus a clone of the db. 3. Put in something to log all of the request/response pairs. 4. Take snapshots of the database at several time points and note where they occur in the log history from number 3.

You now have the raw material to make test cases that verify the system works as it did before, bug for bug, when you refactor. If the same set of requests creates the same overall db changes and response messages, you "pass tests".
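
A minimal sketch of step 3, assuming a shared bootstrap file (or php.ini's auto_prepend_file setting) can load it on every request; the log path is hypothetical:

    <?php
    // Capture each request and the response it produced, one JSON line per request.
    $logFile = '/var/log/app/http-pairs.ndjson';

    ob_start(); // buffer the response so it can be recorded at shutdown

    register_shutdown_function(function () use ($logFile) {
        $body = ob_get_contents(); // whatever the legacy code produced
        ob_end_flush();            // still send it to the client unchanged

        $record = [
            'time'     => date('c'),
            'method'   => $_SERVER['REQUEST_METHOD'] ?? '',
            'uri'      => $_SERVER['REQUEST_URI'] ?? '',
            'get'      => $_GET,
            'post'     => $_POST,
            'status'   => http_response_code(),
            'response' => $body,
        ];

        // Replay these later against a refactored build and diff the output.
        file_put_contents($logFile, json_encode($record) . PHP_EOL, FILE_APPEND | LOCK_EX);
    });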

First thing to refactor is stochastic code. Make it consistent even if it’s a little slower so you can test.

Once you can refactor, you can do anything. Including a full rewrite but in steps that don’t break it.

If you try to rewrite it from scratch it will probably just never be deployable. But you can rewrite it safely in chunks with the above.


Why is the junior, non-productive team resistant to change? Answer this and you might get an answer on the path forward. It sounds like this team wasn't responsible for this mess - but then they should be excited to try something better. On the other hand, if they think this is all great, then why is productivity poor? If this is all PHP code I'm not sure what the difference between front-end/backend would be - what is the mobile person doing on prod PHP code?



doesn't the strangler pattern assume it's already modularized within the monorepo?


Not if you treat the application itself as the module to replace. Essentially your "new app" starts out as an http proxy in front of the old one, and then you implement routes/modules in the new one one by one and replace the proxy w/ the new logic as it comes online.
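
Kept in PHP to match the existing stack, a sketch of what that proxy shell can look like (the internal host name and the rewritten route are hypothetical; a real version would also forward headers and status codes):

    <?php
    // front.php: strangler front controller. Serves the routes it knows about,
    // proxies everything else to the untouched legacy application.

    $rewritten = [
        '/account/profile' => __DIR__ . '/new/profile.php', // hypothetical rewritten route
    ];

    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

    if (isset($rewritten[$path])) {
        require $rewritten[$path]; // handled by the new code
        exit;
    }

    // Fall through: pass the request to the legacy app unchanged.
    $ch = curl_init('http://legacy.internal' . $_SERVER['REQUEST_URI']); // hypothetical internal host
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CUSTOMREQUEST  => $_SERVER['REQUEST_METHOD'],
        CURLOPT_POSTFIELDS     => file_get_contents('php://input'),
    ]);
    echo curl_exec($ch);
    curl_close($ch);

As routes move over, entries get added to the rewritten map until the proxy fallback is never hit.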


Hire a senior engineer. They would probably 1) put the whole thing into version control, 2) streamline the deployment process so that specific versions can be pushed easily into a sandbox or production environment, 3) begin writing tests, starting with end-to-end and system tests in this case, and hook them into a continuous test harness, 4) do an architectural review to identify major components that can be split off from the monolith in order to reduce the surface area to work on; start with the part where you will get the most bang for your buck, a business-critical area for improvement, 5) add unit tests and integration tests for this and the main component, 6) repeat as necessary.

I don’t believe this is a problem that can be solved with people skills alone. It requires senior technical expertise.


>> HQ has no real understanding of these blockers. And post COVID, budget is really tight

You are probably better off leaving. You'll have to solve a culture problem and a technology problem at the same time. Each step of the process will be an uphill battle and even if you do succeed no one will notice since the app will look the same. The appetite for change will only materialize once market share and revenues start to drop, at which point it will likely be too late.

You'll need to be the CEO or trusted executive to effect this kind of change. Trying to do this from a middle management or dev position won't work and will come at enormous personal cost.


Sounds like you're working with a previous client of mine.

The best solution - for me - ended up dropping them as a client. There was zero interest in change from both developers and management (no matter how senior).

We parted ways and I wished them good luck.

Occasionally I wonder what happened to the application containing 50,000 procedural PHP files. Yes, 50k. And no source control or off-server backup.


Yes. It's a pointless uphill battle to try to change people who don't want to change. The employees have a lot of leverage by not documenting the mess. If they leave you will take the blame.

Get another job ASAP. Let natural selection do its magic.


The "employee" lock-in played a heavy part too. No documentation meant they had significant leverage - it was an 8-person team who worked together for nearly 15yrs.


Hey boss, do you know that Jim is leaving? The backend dev, yeah him. I know, it'll be a devil to backfill his role. If only we had some sort resilience built in to allow us to bring in more team members to cover situations like this. We can afford it after all. And we don't want to stop printing all that money just because someone leaves. Tell you what, let me work on a plan, yeah, let me think about how we make the code easier to learn, maybe easier to test too so the new guy doesn't break anything while he's learning the ropes. Sure, no problem boss.


Unless you have a good reason not to, I’d quit. It’s highly unlikely you’ll be able to change the culture without a ton of frustration. It’s just not worth it unless you enjoy that kind of challenge or are being well compensated.


Haven’t seen anyone here mention instrumentation. Once you get source control set up, I would lean hard into metrics and observability, so you can easily identify and eliminate dead code, and also figure out what’s the most important.

Same for the DB - instrument your queries, figure out what your most important queries are.
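
A minimal sketch of that, assuming database access can be funneled through one helper (a big assumption in a codebase like this); aggregate the log offline to rank queries by call count and total time:

    <?php
    // Wrap queries so every call is timed and logged.
    function timed_query(PDO $pdo, string $sql, array $params = []): PDOStatement
    {
        $start = microtime(true);
        $stmt  = $pdo->prepare($sql);
        $stmt->execute($params);
        $elapsedMs = (microtime(true) - $start) * 1000;

        // One line per query; grep/aggregate later to find the hot and slow ones.
        error_log(sprintf('[SQL] %.1fms %s', $elapsedMs, $sql));

        return $stmt;
    }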


It sounds like doing a full rewrite with the same team will just spend a lot of time to develop something just as crappy. This is unlikely to be a successful route.

There are some things it should be easy to sell to both the team and to management. First, adding git into the mix. Tell them it's like backing up your work forever. You can roll most changes back to the beginning of the repo, easily. I say most changes because rolling back the code won't roll back changes to the database.

Likewise, creating a preprod environment means you can make sure new stuff doesn't bring down the system before you roll it out. Yes, it will cost a bit more but having that extra assurance and the ability to do a little experiment is considered worth it by almost every other team on Earth.

If you can get those two things in place, you can make it policy that nothing is done directly on production because the risk is too high.

Then you can tackle refactoring code, a little at a time.

Focus hard on training the team. If they are as junior as you say, they need to learn good habits before their ability to ever work as professionals is destroyed. Don't explain it to them that way. Smile and tell them you just want to help them develop their careers, which should be pretty close to the truth.

Above all, keep your resume up to date and your ear to the ground. It sounds like you may burn out before all the work is complete. Have an exit strategy, just in case.

Good luck!


1. Grab a copy of Working Effectively With Legacy Code

2. You say you don’t manage the team. I guess you have some kind of ‘tech lead’ role. I think to get things to change, you’re going to need buy in from management and the team. If the budget is tight it will be harder to say ‘we need to invest in fixing all this stuff instead of whatever it is that actually makes money’. Whatever you do must have a good business case. It sounds like there needs to be better communication about the state of things with whoever in the business unit came up with the aggressive roadmap.

Perhaps a roadmap like this would work:

- First, set up source control and separate prod from however people are developing things. Hopefully this will reduce trivial outages from people eg making a syntax error when editing prod. I think this will be a difficult fight with the team and management may not understand what you’re doing. You’ll likely need to be ready to be the person who answers everyone’s git questions and un-fucks their local repos. You’ll probably also want some metrics or something to show that you are reducing trivial errors.

- I think some intermediate stages might involve people still developing in prod but having source control there and committing changes; then developing locally with a short feedback loop from pushing to running on prod (you won’t get buy-in if you make the development process slower/more inconvenient for the team); then you can hopefully add some trivial tests like PHP syntax checks, and then slowly build up a local dev environment that is separate from prod, plus more tests. At some point you could e.g. use branches and perhaps some kind of code-review process (you can’t be the only person responsible for code review, to be clear)

- You’re going to want a way to delete old code. I think partly you will be able to find unreachable code and delete it, but you’ll also likely want a way to easily instrument a function to see if it is ever used in prod over e.g. a week or two (a sketch of this follows the list).

- Eventually, improving the dev environment enough may have already led to some necessary refactors and you’ll have enough tests that the defect rate will have decreased. At some point you’ll hopefully be confident enough to make bigger refactors or deletions and wean people further off messing with prod. For example, moving some routing bit-by-bit out of nginx, or perhaps using some lightweight framework.

- you should also get the team involved in making some smaller refactors too and they should definitely be involved in adding tests.
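
On the dead-code point above, a minimal "tombstone" sketch (file path and function name are hypothetical): drop a call into any function suspected to be dead, and if the log stays empty after a week or two of production traffic, the function is a deletion candidate.

    <?php
    // Log the first evidence that a suspected-dead function is actually used.
    function tombstone(string $label): void
    {
        $line = date('c') . ' ' . $label . ' ' . ($_SERVER['REQUEST_URI'] ?? 'cli') . PHP_EOL;
        @file_put_contents('/var/log/app/tombstones.log', $line, FILE_APPEND | LOCK_EX);
    }

    function legacy_export_csv() // hypothetical function you suspect is unused
    {
        tombstone('legacy_export_csv');
        // ...existing code...
    }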


Agreed, especially starting with Working Effectively with Legacy Code.

One of the hard things about what we're assuming is OP's tech lead role is that they're having to influence changes up (with management) and down/laterally (with the team). Things that might convince the team are going to be generally different than the things that might convince management. Management will probably be more convinced by things like improved lead time for changes, eliminating risk of failure, improving confidence in correctness of feature roll-out (though some of this depends a lot on the industry / domain and the incentives of management). Meanwhile the team will be more convinced by things like making their job easier or setting themselves up with more and better skills to take a "better" job down the road.

The rub with all this is that if the team doesn't like OP's changes (e.g., using source control), they'll have management cover right now. At each step it's important to show why it's better.

A way to do that—hard to tell if it's the right way without knowing more about the team's dynamics—is to make a lot of the changes for your own work. For example, set up your own test environment, develop there, then using this mystical "source control" magic apply the safely-tested changes to prod. Eventually someone will notice that you're not breaking prod as much as everyone else. ("You" here being either OP, or someone else in the same situation.)

All of this is just nuance, politics, and team dynamics layered on top of the excellent recommendation I'm replying to.


1. The existing team clearly has done a great job achieving this level of revenue with a small team. Be sure to compliment them on that and realize that they probably have a very profound understanding of how the tech achieves the business outcomes.

2. Start with source control.

3. Build up test coverage; document and understand all the endpoints and what business outcome they achieve.

4. Build some small things on a new tech stack to train the team.

5. Move over everything when the team is ready.


I think the main thing, no matter if you're the team lead or just an individual contributor, is: were you asked to fix the things you list? If you weren't, it's not your job. And even if you try, you will not succeed, because some people like things as they are (otherwise they wouldn't be that way), and if you don't have support from above then you won't be able to overrule those people.

One thing you could do if you haven't been asked to fix these things is to "provoke" management into asking you to fix them. You could talk to your boss and ask them what they don't like about the current setup. They might answer that the velocity is too slow, that the software is too unreliable, has too many bugs, or they might answer that everything's fine and they just want you to implement their new features. Be careful not to lead management here: you want to find out what they actually want, not persuade them to want something (that won't work, it won't be a real desire). If they do want you to change something, you can argue for some of the suggestions in this thread (e.g. introduce VCS) where you can clearly draw an argument from one of the desires, e.g. problem "releases are too risky", solution "if we use VCS we have old versions and can roll back".

Basically you've been hired to do a job. If your job is to fix all this stuff, fair enough. But if you haven't been asked to do this (and you can't provoke them to ask you) then it's simply not your job, and you have to accept the situation or find a new job.


As others have said, you're in a precarious position: working with pure tech debt for a business that clearly has no interest in addressing tech debt.

First thing would be to use source control and get some sort of code review/release process in place.

Contrary to other suggestions of "then write tests for everything", I think that's bad advice. It's far more likely that you'll pigeonhole yourself and your team on complicated and unhelpful tests, particularly with dead code and trying to enumerate all the features that aren't documented. 3 things you could do in a short amount of time to radically increase the code quality:

- Lint all the code (php-cs-fixer is a good tool, rector can also help)

- At least start dependency management (with composer), even if it's empty.

- Introduce static analysis into the code review process (phpstan/psalm, in CI preferably). A baseline suppressing existing errors is easy to generate.

Then personally I would try and aggressively purge dead code, which is easier said than done. Tombs (https://github.com/krakjoe/tombs) is a little awkward but can be helpful, especially if all there is is production. It requires PHP 7.1, I'm assuming you're below that, but the good news is that every route is in nginx; you can upgrade PHP piecemeal.

Again, handling tech debt sounds like it will be nigh impossible at this company, but modern PHP is really enjoyable and I hope you're able to experience it.


Not sure if this helps, but if I were you, I’d:

* Create a git repo from the code as it exists

* If the other team is still doing things live, create a workflow that copies the code from the prod server to git as-is nightly so you have visibility into changes. Here’s an opportunity for you to see maybe what the team gets stuck on or frustrated with, and you can build some lines of communication and most importantly some trust. You can suggest fixes and maybe even develop the leadership role you need.

* Get a staging instance up and running. If I had to guess why the team does things live, maybe the project is a huge pain to get sample data for. If that’s the case, figure out the schemas and build a sample data creation tool. Share with the team and demonstrate how they can make changes without having to risk breaking production (and for goodwill - it helps prevent them from having to work evenings, weekends, and vacations because prod goes down!)

* PHP isn’t so bad! Wordpress runs a huge chunk of the web with PHP!

* tailwind might be a cool way to slowly improve CSS - it can drop into a project better than other css frameworks IMO

* Pitch your way of fixing this to management while quoting the cost of a rebuild from different agencies. Throw in the cost of Accenture to rebuild or whatever to scare management a little. You are the most cost effective fix for now and they need to know that.


Can you just... Walk away? Not because of the technical challenges, but because:

- team is 3 junior people

- productivity is abysmal

- budget is tight

- resistance to change is huge

- aggressive roadmap

- management and HQ have no real understanding

I have never walked away from a technical challenge, but I've exited from management clusterfucks and have never regretted it. These people will block you, blame you for anything you break during the refactor but give you no thanks if you fix it (because they don't even understand the scale of what you're trying to fix)


The most important thing is that you communicate right now.

From the HQ perspective they make a lot of money with very few developers and all seems to be going well, with no problems at all. Judging by the spreadsheets this looks great!

Your task is now to explain to them the risks involved with proceeding forward. You can also present them a plan to mitigate that risk without interrupting ongoing operations too much and slap some money figure on it — ideally you present them three options where one of those is doing nothing. Be aware that the decision on this is not yours, it is theirs. Your task is to tell them everything relevant for that decision. You can also tell them that your professional opinion is that this is something that should have been done years ago, and that the fact it didn't explode in their faces yet was pure luck. But again, it is their decision.

How you lay it out depends on you, but there have been many tips already. Version control might be the first thing. Maybe you can present it as: one day a week goes towards maintenance or something.

As an aside this helps to cover your own behind if nothing is done and everything goes south in a year. Then you can point to that extensive risk analysis you presented them with and tell them you told them so.


Ask yourself two questions. Why is it that the things are the way they are? What can I realistically change? Then determine the overlap in these. If there is none, walk away. It makes no sense to go for a rewrite without the underlying causes being addressed, you’ll be in the very same mess very rapidly. It makes no sense to replace the team without understanding why these folks have endured, who has hired them etc. Understand first and then make changes.


Agree on lots of the points folks have shared. Testing, fix small chunks at a time, more testing.

I'll share some less technical thoughts that I think hold true regardless of the approach taken (rewrite or not).

My experience with changes like this is that you need to be as transparent as possible to both parties (the devs, and your execs). This means consistent comms around goals, achievements, and crucially the challenges preventing the first two.

With any team, you are not going to win much by implying that their work sucks or the thing they have built is broken. While they might know it, a third party is just not going to get a good reception with that mentality. It will be important for the devs to understand why the change is needed from a business perspective (e.g. time to market is too slow to remain competitive, changing regs, etc.). The intent here is to focus the devs on what the hope is for the future as opposed to shitting over the thing they have poured their blood, sweat, and tears into.

With the execs, they need to understand just how bad of a shape things are in so they give you and the team the space they need to make a significant enough change that isn't just going to revert to the same mess as before. If you're dealing with tech background execs it might be a bit of a simpler set of convos. But if not, then you are going to have to illustrate for them how bad things are. One way I've done this is to first get an idea of what the execs want as the final state of the team / codebase / product (e.g. time to market is < 4 wks) and then draw them a picture/flowchart of what it takes to get that thing done in the current state. Could use some form of value stream map to do this as it combines players, states, activities, and also timelines.


I have experience in a similar situation; not sure we are talking about the same project.

I suggested a full rewrite and got fired in 3 weeks (actually it was a subcontractor role). I had considered myself really good at presentations and at persuading executives to understand what I am doing and what I will be doing, but the situation was too much for me to take on. They didn't like the unrealistic 3-month roadmap to rewrite the whole thing, which from their point of view delivered nothing while still requiring the whole team to be paid (even though I was the only one). So I told them we were gradually improving it, and did the full rewrite underground on my own. It consumed ~13 hours every day, but I was happy and was enjoying the birth of the product. Finally, after 10 weeks, I gave up in the face of my own and their frustration.

Regarding your problem, I totally suggest dumping your codebase into a git repo first of all, adding some Cypress/Playwright tests to carefully probe the major functionality, building CI for these, and gradually starting to remove old version files. After that, just forget how messy it was and what you thought in the first place, consider this beast a perfect engineering gift (like the Linux kernel), then start making small changes and adapt yourself to it. Guide the team to follow your methodologies for treating the code, and tell the executive team that the legacy codebase looks great but is too complex to move as quickly as a brand-new startup project.


In 20 years of coding PHP, I have never seen it go well when a new tech lead comes in and wants to rewrite everything to get from the code style you described to what you want (I've always worked in similar settings, though, where human resources are limited).

This code is making $20mio, so something must be going well. Don't forget that a codebase like this covers all the history and knowledge.

So first make sure that you appreciate the work of the current team. As you write "resistance to change is huge" I would bet that the team doesn't feel like you're trying to understand them.

It actually reminds me of a client for whom I wrote an order system in PHP that made $15mio annually. As the client and I didn't get along anymore, he was looking for someone to replace me, and found this new CTO who came in with "everything's shitty, nothing works, we need to redo everything". Obviously the client finally saw the chance of getting rid of me, only to ask me one month later to come back, as they had fired the new CTO. Seems like something was working all along :)


I won't repeat what others said, but you have no idea how common it is for a newbie to come onboard a new team and think "everything is wrong, it needs to be fixed and I will fix it". No doubt the things you mentioned can be improved, but they were done that way for a reason, and you already mentioned budgetary constraints, so it sounds like the existing team made the best of what they were given and you will have to become like them and adapt. Improve what you can as opportunities arise. Migrating away from a 10k-line nginx config to PHP might sound like a good idea, for example, but you will spend resources on something that won't make a noticeable difference. Instead, you can implement the PHP router for new endpoints, for example.

The worst thing you can do is come in with that attitude and expect the team to be onboard. You will only alienate yourself. Try to understand why things were done the way they were (never architected, but put together piece by piece over time). Make them feel heard and pace yourself with any changes.


> The worst thing you can do is come in with that attitude and expect the team to be onboard. You will only alienate yourself. Try to understand why things were done the way they were (never architected, but put together piece by piece over time). Make them feel heard and pace yourself with any changes.

Yep, I agree 100%, the last thing you want to do in this situation is piss off the three people who know how this thing actually works.

IMO the real thing OP needs to decide is whether he's willing to fix the whole thing himself or if he wants (needs) the existing devs to help. If he wants to go it alone then he can take any of the advice given here and do whatever he wants. But, if he wants the team to help then his main priority is to understand their current processes and how they get things done, and then look at where more modern practices can be introduced to improve things for the team and get them to buy in.

Sure he said their "resistance to change is huge", but mine would be too if someone joined my team and determined literally _everything_ needs to be changed immediately (even if it's completely true). I would bet they would be much more receptive to realistic suggestions after you get an understanding of their process, gradually building towards a better one. And if they're not, then OP should probably just go look for another team/job. It seems pretty clear that 'actual' management doesn't care about this (which is to be expected, I mean they apparently have a functioning product bringing in 20m) so as much as it sucks the situation is what it is.


Run.

What you need to do is a full rewrite. You need business owners backing you up on this intention. From your description, they don't understand the scale of the problem. So that's a dead end.

Once they understand that they have to halt all new development for a few years and drastically increase the development budget in the meantime, you can start thinking about how to proceed. But they will not.


Check if you're breaking the law. Those junior programmers may well have not thought to worry about privacy laws or industry regulations.

Then check for the most basic security issues like the database being accessible from the outside, SQL injection, etc.

Then set up monitoring. It's quite possible the thing is falling over from time to time without people knowing.


It sounds like the person asking this question is about as junior as the team is (in choice of words, ideas about rewriting, and best practices). It's hard to give specific advice without knowing more about the situation, for example what role the questioner has in the team. Is it a tech lead, a boss, or a project lead?

So let's stick to advice that is universal to all roles and I think most people who have been in similar situations would agree with. First, let's be clear about one thing: This situation isn't the least bit unusual. From the facts above it doesn't look very bad. The team is small, and you can all gather in the same room and communicate. The fact that there is no framework and no patterns in place is good given the circumstances, awful codebases based on ancient frameworks and legacy patterns are generally an order of magnitude more work to understand.

Second, be humble towards the team and the problem. After such a long time, there are bound to be details that you don't know, and you have to find out about them sooner rather than later. People may seem resistant to change, but understand their angle and work with them. They likely want their codebase to improve too, even if they see other problems as more pressing. It all depends on what your role is, and whether you intend to help out with the actual work or not. But again, this is a small team with a shared goal.

Third, start with the lowest hanging fruit. Personal opinions come into play here, but I probably would look at operational issues early. Get monitoring in place. Test backups (yes, really). Some key metrics, both application wise (on some key processes such as login or payments) and operational (memory, open files, sockets). Learn about version control and start using it. Get proper test environments in place (including databases and mocked external integrations).

Good luck! Things are probably not as bad as you think. This type of work is really quite rewarding, because results are quickly very visible to everyone.


What percentage of the $20M are you getting per year? If it's less than double digits then you should run as fast as you can and look for more fun and rewarding work.


As others have said, a full rewrite isn't needed. Go bit by bit.

Secondly, the worst code you've ever seen is capable of pulling in 20 mil per year. How does the best do? There is something to be said for success, and it really makes me wonder what 'good code' is really supposed to look like.

Granted a lot of what you're describing sounds terrifying.

If you want to deal with it, the first thing you should do is stand up a testing server and back up the code. Get some E2E tests in place to keep track of expected behavior. All of this can be done without removing a line of code, and you can do it yourself while the team goes about their merry business. This is where I would start.


I have deep sympathy for OP. And reading the comments, I agree with the majority opinion to tackle a piece at a time rather than attempt a full scratch rewrite. Experience and Joel’s famous post bring me there.

But if I could hijack: I’m curious what the HN opinion is on a variant. What if the product never launched? It’s 10,000 files of spaghetti and dreams that just can’t work well enough to put in front of customers.

I was brought in on such a project and the very kind business owner was under the impression they were close to launch because of all the features he’d seen demoed in isolation. But it was like a bridge built of gum and two-by-fours, spanning a massive gully but with a 10 foot gap in the middle, and nowhere near the strength to fill that last span.


Here is what I would recommend as a path, not towards changing "how we do things around here" (because that always gets negotiated against a certain business case), but as a way of gaining personal developer comfort (the actual nuts and bolts of the work):

1. Start adding logging all throughout, wherever changes are being made. That can quickly build up insight into what's happening where and gain confidence into what can be deleted safely. You want the meeting where you can show that an entire file is completely unused and has never once been called for months. It surely exists. Find it. Then say you won't delete it, you'll just comment it out.

2. As you make changes, start doing things twice: one in the way that patches the code as directly as you can manage, the other a stub into a possible design pattern. You don't want to force the pattern into production as soon as you think it works, instead you wait until the code hits a certain evolutionary state where you can "harvest" it easily. Think "architecture as a feature-flag". If it turns out your design can't work, nothing bad happens, you just delete those stubs and give it another go.

3. I would not actually worry about the state of the tooling otherwise. Backups for recovering from the catastrophic, yes. Getting the team on git, not as important. Adding standardized tooling is comforting to you because you're parachuting in. It adds more moving parts for the other devs. That's true even when they benefit from it: if the expected cost of wielding a tool wrongly is high enough to cause immediate danger, you can't proceed down that road - in woodworking that means lost fingers, in software it means lost data. You have to expect to wind down the mess in a low-impact, possibly home-grown way. There are always alternatives in software. And there are likewise always ways of causing fires to fight.
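Regarding point 1, a sketch of what the "is this file even used?" logging could look like, assuming nothing fancier than a shared include and a writable log directory (every name and path here is invented, and it's deliberately kept old-PHP-safe):

    <?php
    // trace.php - hypothetical helper; require_once it at the top of any file you suspect is dead
    function legacy_trace($file)
    {
        // One line per hit: timestamp, file, requested URI
        $uri  = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : 'cli';
        $line = date('c') . "\t" . $file . "\t" . $uri . "\n";

        // Append-only log; grep it after a few weeks to see which files never show up
        file_put_contents('/var/log/app/legacy_trace.log', $line, FILE_APPEND | LOCK_EX);
    }

    // At the top of a suspect page, e.g. some_old_page.php:
    //   require_once __DIR__ . '/trace.php';
    //   legacy_trace(__FILE__);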

This job most likely isn't going to lead towards using anything new and hot. But if you go in with an attitude of seeing what you can make of the climate as it is, it will teach you things you never knew about maintenance.


This makes 20 million/year with 3 junior developers. So first, avoid fixing what you believe is broken just because it seems like bad practice, as opposed to the stuff that actually needs fixing.

Ex: if you tell your team “drop everything you’re doing and follow my best practices”, it won’t be accepted, and the business will ask why you’re wasting time. Instead, if you tell your team “we need to improve these calls making a cURL request to its own domain, because this is a performance/security issue that might make us lose those 20 million”, then you might have a chance of changing the culture over time after accumulating smaller wins. Keep doing this for every specific point of possible improvement, backing it with a business justification.
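As a concrete illustration of that kind of business-justified fix: if a page builds its menu by curl-ing the site's own public REST endpoint, the least invasive improvement is usually to pull the underlying query into a plain function both callers share. A rough sketch, with the table, column and function names all invented:

    <?php
    // menu.php - hypothetical shared helper, extracted from whatever the REST endpoint already does
    function get_menu_items(PDO $db)
    {
        $stmt = $db->query('SELECT id, label, url FROM menu_items ORDER BY position');
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }

    // Before (in the page): an HTTP round trip to our own public API, OAuth handshake and all
    //   $items = json_decode(do_curl('https://www.example.com/api/v1/menu'), true);

    // After: same data, no network hop, no OAuth, and one obvious place to add caching later
    //   $items = get_menu_items($db);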


"I have to find a strategy to fix this development team without managing them directly"

Sorry, but you can forget about this. If you are not in a managerial position and do not have support from management, anything you try to do to make things better could backfire on you and could even lead to reprimands or, in the absolute worst case, getting you fired.

That's why the realistic old geezers around here would recommend people who are in a situation like this to please look around and try to find something better.

If you are in the situation that you actually can make decisions and have management support (or can acquire it), then it's a whole different story, of course.


Even with the worst possible codebase, the act of it running stably for a long time is accumulated value. And that value is actually a pretty big deal. When you make major changes you reset that value back to zero, even if your new codebase is beautiful and sensible.

Very gradual, well-tested evolution is the way to go. If it were me I would add a LOT of unit and integration tests before I changed anything. I would also formalize the expected behaviour, schemas, APIs, etc.

You’ve inherited the Ship of Theseus. Believe it or not, this is actually a huge boon for you. 18 months from now your managers will look back and say, “wow this is the same ship?! I want you on my team wherever I end up.”


A rewrite is rarely the best approach. Refactor by order of importance instead. Break it up and go from there. If it's not broken, don't fix it.

The best approach is to:

- Assess the situation

- Create a task list

- Decide what needs immediate attention

- Create a timeline for it all

- Get feedback from the team

- Add the business roadmap to your list

- With upper management, work on a timeline

- Define your project with realistic times

- Execute and manage the project.

It took 12 years to get to this point so don't expect to change it overnight.

BTW, this type of team and codebase is not out of the ordinary. Companies start to program with the idea that eventually the problems will be fixed, yet it never happens. Upper management does not care, because all they care about is reducing cost and getting the results they need. You're dealing with the results.


The problem here is normalization of deviance across the entire organization. That has nothing to do with tasks.

Task #0 is to sit down with the team and ask them to say in their own words what they think about the project, about their engineering practices, etc. See how aware they are of the problem they have created.

Try to understand how the status quo became normal and acceptable, before the same thing happens to you.

If this shit happened in the first place, it was likely because everyone was too busy living in their Jira alternate reality, where you benefit from the perverse incentives made possible by the lack of visibility into code quality.


Good point. It's hard (impossible?) to fix the problem if you don't fix what caused it.


As others have said, approaching this with the mindset of "this all needs to be rewritten!" is counterproductive, and will set you up on a collision course with leadership.

I inherited something similar 12 years ago, also cobbled together PHP, also no separation of code and rendering - making any sort of progress was painful.

As others have said there are a myriad of ways to extend code like this, encapsulating the old with a better facade. Splitting some pieces off - but it needs to be approached as a piecemeal project that takes a decent amount of time, but can be done in parallel with shipping new features.


> I know a full rewrite is necessary, but how to balance it?

No, re-write over time. There's an extremely high chance there is complexity you do not understand yet.

> - it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )

First immediate win: start using source control. Initially people can operate the same way they have been, just through git. Slowly but surely clean up the old files and show people how they are not lost, and how it cleans up the code. Then switch to more advanced code management practices, like master branch vs working branches, code reviews, etc.

> - the routing is managed exclusively as rewrites in NGInX ( the NGInX config is around 10,000 lines )

Make sure this is definitely checked into git. Ideally you look to simplify this somewhat; you don't really want to be so heavily tied to the server.

> - the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.

A migration to a better database setup takes time. As long as there are no fires, treat it as a black box until you have time to fix it up. Just double check their backup strategy.

> - team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.

It sounds like you are new to their team. You need to win hearts and minds. One small thing at a time.

> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight.

Explain to them that the code is like an old house. It has had lots of investment over the years and you have generated a lot of profit from it. The problem is, over the years, the foundations have crumbled, and despite the walls looking nice, they are painted over serious cracks. Whilst you could continue to use it how it is, one day it will simply fall down, unless time is invested today to maintain it.

They will then say "well, what needs to be done?". And you need quite a concise and well thought out way to respond to that question.


First, leave the company: three junior devs supporting a $20m system isn't realistic.

Hey, I've done this. Everyone saying "just rewrite each part" isn't really being helpful.

You first need to fix up obvious brokenness: turn on error logging and warnings within FPM, then fix absolute path issues, then fix any containerization issues (deps, etc.) and containerize it, then roll out some sort of linter and formatter.

At this point you have a CI system with standardized formatting and linting. Now slowly part things out or do a full rewrite, since you can now read the code and make changes locally.


How big is this code base and how advanced are the features? With only 3 juniors behind the wheel is it really that big? Was it always this small or is this the leftover maintenance team?

Is there documentation, requirements or user stories available for the existing features? Is it B2B or B2C? If it's B2B it becomes a lot easier to do a customer survey of what is actually used, which could help you remove half of the 12 years of legacy.

Apart from the lack of source control, the rest of the issues, while far from best practices, honestly don't sound extremely bad. Lack of a framework or DI is not an antipattern in itself, even if it of course can be. Productivity of 3 juniors, split across one stack each, doing both operations and feature development on such a big application, is going to be low even with better practices. If revenue really is 20M and this code is critical, it sounds like you are understaffed.

Skipping the SCM, deployment and process improvements, as others already gave good suggestions, and assuming you need to keep the existing code: one thing that has not been mentioned is static analysis. If the majority of the rat's nest is in PHP, one thing you should do is add static type checking. This has zero effect on production and makes the code infinitely easier to navigate. It will expose how much of the code is dead, how much is shared, what depends on what, etc. From here, refactoring will be a lot easier and safer. As others suggested, you obviously need tests around it as well.
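To make the static-analysis point concrete: a tool like PHPStan or Psalm can be pointed at the existing code with zero production impact, and even sparse type declarations or docblocks give it a lot to work with. A hedged sketch (the function and array shape are invented, and the native scalar types assume the runtime is, or becomes, PHP 7+; docblocks alone work on older versions):

    <?php
    // Before: the analyser can't tell what this takes or returns
    //   function calc_price($item, $qty) { return $item['unit_price'] * $qty; }

    // After: same behaviour, but bad callers, impossible nulls and dead branches
    // start showing up in the analyser's report

    /** @param array{id: int, unit_price: float} $item */
    function calc_price(array $item, int $qty): float
    {
        return $item['unit_price'] * $qty;
    }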


I wrote Modernizing Legacy Applications in PHP for exactly this situation.

https://leanpub.com/mlaphp

And it's (still) free!


Big rewrites are almost always a bad idea. I took part in flushing 2 man-years early in my career and more recently 4 man-years. Both were rewrites that couldn't get to shore. Painful both times, more so the 2nd time since I was one of the implementors.

It's almost always better to do small replacements. Peel the onion so-to-speak. Refactor from within first to make a migration plan away from the crufty tech possible.

First and foremost: make a plan and sell it to the devs. If you don't get buy-in from them, nothing will change.

Good luck.


Unless they’re properly compensating you, and I mean something like a revenue share on that mess if you fix it, I’d walk if I were you. The job is high risk, isn’t properly staffed, doesn’t have management support, and sounds intrinsically miserable. They’ll balk I’m sure, but point out to them that their only real alternative is contracting high end consultants who will demand even more.

If you do manage to get them to give you some points of the revenue from the project, then start by introducing source control and follow by building up a testing process and integration environment. I’d probably use tcpdump to capture a week’s worth of live traffic and replay it with a suitable speed-up, to replicate testing-in-production in your integration environment. That should give you serviceable integration tests. Starting with unit tests would be pointless, because it sounds like there are no discrete units to test.

From there you’ll want to apply some kind of strangler pattern where you incrementally replace pieces of the system. Doing that will require some refactoring to start separating concerns. Again don’t try to do it all at once and don’t try to make it perfect. Then you can start introducing unit tests.

Then there’s the database, which is a full size job in its own right.

And who knows what other unpleasant surprises await, but bank on them being there.


I think you've received lots of helpful feedback here. I've read quite a bit of it and can follow some of the thinking. What I wonder is:

- Is anything broken after all? Yes, there are annoyances and risks, but in the greater scheme of things, everything seems to work. Is fixing really necessary, or would you just feel better after it?

- What does the "aggressive roadmap" look like? Build another product? Double or triple the revenue from this product? I think this helps/defines how to handle the situation.

- Your job as middle level management (at least that's how I understand it, being in that position myself) is to shield your teams from direct hits with piles of shit, while getting them running to evade the stuff by themselves at some point. Seems like your team already did great things in building the product, now help them get better, one small step at a time. I think they can see the benefits in things like using Git but probably you need to help them make some room to learn it without fearing that upper level management thinks they are lazy and not doing anything...

- Leaving the company: Maybe that's a viable option, too. You can't save them all. And if you feel overwhelmed by the task and see no way forward, you should leave. That's not about being weak, it's about protecting yourself from an unmanageable task.


> Resistance to change is huge

This is the key point. Why is there resistance to change if everything is as bad as you say? How do things look from the perspective of the developers?

There is also a certain disconnect in what you are describing. On one hand you describe the developers as “junior”, productivity as abysmal, and say it is impossible to get anything done. On the other hand the code seems to be highly successful from a business perspective, generating millions in revenue. Something is missing in your analysis.


I would start with asking the backend and front-end developers to learn Laravel and begin using it. All code review should be quite strict. It's going to suck for a while, but that's just the way of it.

Set up two nginx servers: one that's your usual, to load Laravel, and the other the legacy nginx server that acts as routing to the legacy application. I would even recommend using OpenResty to help delegate if you need something intelligent.

I would strongly discourage a JS framework, which would add complexity when you need to keep things focused. The front-end would need to be recreated in Laravel and brought back over in a clean fashion.

Set up CI and ensure all the code that goes over to Laravel is near 100% tested. It might also be useful to set up a visual regression test tool such as Percy to ensure everything moves over nicely. Push for SMACSS and BEM to keep things consistent. Or just make new styling for the new pages to surprise the users.

Rewrites are a trap, though, and can be painful. Keep a balance of features entering Laravel and the big fixes entering the legacy app. I would recommend RabbitMQ to communicate between them.


Top of mind:

- Teach the team VCS 101, put the code under VCS; do trunk-based development

- Ask about critical places in the code and add logging

- Implement dead simple feature/experiment toggles

- Set an example: develop all new changes as simple functions using TDD

- Put yourself in "harm's way": do a live coding stream where you show your team how you do it

- If they like it, offer to do pair programming sessions

- Add composer and start moving dependencies there, one by one

- Repeat 100x

That’ll be 200$ lol


I'm sorry to hear that. Must be a terrible situation. I've seen similar projects, at least in some dimensions. Here's what worked for me & observations:

- A large fraction of features are unused. Have internal analytics that will tell you which features/code paths are used and which are safe to delete/ignore. It's much easier to migrate xx% of features than to aim for 1:1 parity.

- Lack of tests is a huge pain. Makes incremental migration near impossible. Find a workaround for it before jumping to migration (forcing huge code coverage increase for already submitted code never worked for me in the past)

- See if some parts can be proxied. Put proxies in place and migrate features behind it (in one past project, the logic was split between stored procedures in Oracle DB, backend code and js code -- which made it possible to proxy stored procedures and break the migration in milestones)

- Hackathons are a great tool for exploring options, uncovering blockers and dedicating a large chunk of focused time. Make it clear that the result is experimental, not that it must be merged to main. A nice way of introducing frameworks, VCS etc. without high friction.

The rest depends on the management support, the team's aptitude, the intake of feature requests & bugs, the difficulty of maintenance, etc. You are the best judge of how to approach those.


First thing: talk with management, or whoever non-technical you need to report to, explain the situation and negotiate a timeframe. Then talk to the techies too and kindly tell them that things need to change immediately.

After you've got a working time window for getting things right, prepare a workflow that should take half the time you've discussed, as it will probably take twice as long as anticipated. (If you've negotiated 3 months for fixing the mess, assume you have only 1.5 months or even 1 month and prepare 1 month's worth of work.)

Then I think the very first thing should be moving to Git (or another VCS), setting up a development/staging environment and using CI/CD.

After making 100% sure the environments are separated, start writing tests. Perhaps not hundreds or thousands at this stage, but at least ones that catch the critical/big failures.

After that, start moving to a dependency manager and resolving multiple-version conflicts in the process.

Then find the most repeated parts of the code and start refactoring them.

As you have more time you can start organizing code more and more.

It sucks but it's not something that can't be fixed.

Also, finally, given the work environment before you came, it might be a good idea to block pushes to the master/production branch and only accept changes through PRs with all tests required to pass, to prevent breaking anything in production.


You have encountered a problem too big to solve alone, so your solution involves gaining the right allies and persuading them to do the right things. So, however frustrating this is, remember first always to be kind, to your team and to your management.

To make allies of senior management, you need metrics. You need to show, concretely, how current operations put revenue at risk and make the incremental investment necessary for their roadmap items prohibitive. If you can swing a penetration test, they'll probably find plenty on a stack like this. Then you have a security justification. If not, get the best monitoring stack you can. Demonstrate reliability and performance issues. (As well as reliability and performance improvements.)

From there... I'll say the #1 tool I've used in situations like this is Fastly. VCL is way more flexible than your 10k line nginx rewrite file (I've been there, too). And the edge caching will paper over minor outages. Rollbacks are easy. Rebuild your site piece by piece and stitch it all together with a reverse proxy at the edge.

Advice: propose a "canary" portion of the site to rebuild, and make it the lowest revenue / highest complexity thing you can. Once you stabilize the cash cow, getting the buy-in to finish the job and deprecate the old code base will be tough.

I'd also advocate for adding 1 incremental engineer to your team. Make it a senior dev and interview specifically for people who have done this sort of thing before. Your team needs a hands-on mentor in the trenches with them.

Best of luck. It isn't easy, but it's rewarding.


Rewrites aside (they may or may not improve things or work), the thing I have noticed/learned about HUGE codebases which are 'bad' is to implement "feature-options" and have an "unbreakable rule" against coding to a specific client/actor/instance.

Nice thing is you can start with the current codebase and add these in; it will make a rewrite a lot easier since your capabilities/feature-configs are already extracted.

Example:

If your product/code serves multiple customers, you should never have:

    if ($customerId == 123 || $customerId == 999) {
        // do things this way
    } else {
        // do things the other way
    }


Instead, always aim for feature-options (config?):

    if ($config['feature_a'] === true) {
        // do things this way
    } else {
        // do things the other way
    }

If this seems not related to your codebase or product, you just need to dig deeper, it's usually there in some form or another.

PS. If you think the above is 'obvious', you have probably not seen an old enough (or bad enough?) codebase. Few coders start out with the bad case; the bad case (coding to an instance/customer) is those 'quick fixes' that accumulate over the years.


A good start would be the easy-to-fix things that can be done in a day:

- Start using git.

- Start using migrations (build a migration file from the current DB; a sketch of a minimal runner follows this list).

- Start using CI/CD. (Run migrations, pull/push PHP files, add new nginx routes and reload nginx)

- Start using docker for dev env.
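For the migrations item above, you don't need a framework to get started; a dead-simple runner that applies numbered .sql files exactly once is enough to stop schema drift. A minimal sketch, with the DSN, credentials and paths obviously made up:

    <?php
    // migrate.php - hypothetical minimal migration runner
    $pdo = new PDO('mysql:host=127.0.0.1;dbname=app', 'app_user', 'secret', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);

    // Remember which migration files have already been applied
    $pdo->exec('CREATE TABLE IF NOT EXISTS schema_migrations (filename VARCHAR(255) PRIMARY KEY)');
    $applied = $pdo->query('SELECT filename FROM schema_migrations')->fetchAll(PDO::FETCH_COLUMN);

    foreach (glob(__DIR__ . '/migrations/*.sql') as $file) {
        $name = basename($file);
        if (in_array($name, $applied, true)) {
            continue; // already ran
        }
        $pdo->exec(file_get_contents($file));
        $pdo->prepare('INSERT INTO schema_migrations (filename) VALUES (?)')->execute([$name]);
        echo "applied $name\n";
    }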

Then I'd focus on the application itself, and that will probably take some days/weeks of work if the routes are complicated or the PHP version is very old (12 years means PHP 5.2 or 5.3, which should not be too much work):

- Upgrade the codebase to PHP 8.1. (Rector might be useful here; a sketch follows this list. PHP code is generally not hard to update.)

- Consider doing routing using a PHP entry file instead of nginx.
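For the PHP 8.1 upgrade, Rector automates most of the mechanical changes. Its config format shifts between releases, so treat this as a rough starting point rather than gospel; the paths are assumptions:

    <?php
    // rector.php - hypothetical starting point for an automated PHP 8.1 upgrade
    declare(strict_types=1);

    use Rector\Config\RectorConfig;
    use Rector\Set\ValueObject\LevelSetList;

    return static function (RectorConfig $rectorConfig): void {
        // Point it at the legacy tree; run on a branch, review the diff, keep the tests green
        $rectorConfig->paths([__DIR__ . '/public', __DIR__ . '/src']);
        $rectorConfig->sets([LevelSetList::UP_TO_PHP_81]);
    };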

Then new features can follow whatever pattern you want, and old code/DB-schema can be upgraded as you go.

Many of your points are non-issues: PHP is fine. You don't have to use a framework. You only need caching if you need it. PHP itself does not necessarily need a templating language.

The fact that there are a bunch of route entries in nginx suggests that there at least is some form of pattern, for something.

A full all-at-once rewrite would probably break a lot of things at once, so I would just do the low hanging fruit first and modernize the codebase as it's being worked on.


If you don't manage that team and "the thing" is working, you need to define what problem you are trying to solve and why. You are describing software that is written in a naive or obsolete way, but other than doing development in production there is no critical problem that you can fix to bring immediate value.

I saw this in the past a few times. There is no universal recipe, if that is what you are looking for. Get a development and staging environment and make them use Git; that's a start. See what the plan is for that software: maybe the company does not want (you) to waste time and money on it. If they want to do something, discuss and align on that.

In the end, if it works it brings value. If you want to rewrite it, it will bring some value and some cost: which is bigger and what is the priority, a rewrite or new features?

One more thing you can do is show the developers how to do some things in a better way, like composer or cleaning up versions and dependencies, but take it easy and present it to them in a way they will buy into and do themselves, not because you told them so. Make them better and they will make the product better.


You didn't state your position within this mess. Why is it your problem to fix? Hopefully you are being paid well, if not I might just move on immediately.


I wouldn't change the structure. Purify the codebase into pristine 2003 php, with a sane toolset. You'll learn all the quirks of the problem code as you do this.

When you've got a clean base, the team will be moving quicker, be more skilled with what they already are learning and listen to you. Then you can consider the structural changes.

Pure, clean 2003 php into a new format is way easier than spaghetti nightmare into total re-write.


Kill It with Fire, by Marianne Bellotti, is an excellent resource on this question. She addresses the team dynamics, corporate politics and technical side of modernizing legacy systems.

https://www.penguinrandomhouse.com/books/667571/kill-it-with...


12 years of development, no source control and 20 million in revenue? What is this? Maybe I can make a competitor.

I'm guessing this is a medical billing system of some sort, lol


Quit. Not because I think it's too big of a mess to fix, but because you don't fit. You won't change them and they won't change you.


1. Find out the "real version" of the code.

2. Find out the "real version" of the sql schema.

3. Make some method of running this code + nginx config locally.

4. Add a test framework which simulates real traffic you see on the app and make sure the DB contains the right thing.

5. Make a staging environment which mirrors traffic from prod and run changes there for ~1 week and manually audit things to make sure it looks right. (You'll only do those until you feel safe)

Now you can feel safe changing things! You can tackle problems as they come in. Focus 10% of dev time on new features and 90% on reducing tech debt.

Lots of dead code? Tackle that.

Package management hard? Migrate to composer.

Don't do everything up front. Just make sure you have a way to change the code, test those changes, then push them to prod/staging.


> Resistance to change is huge

I wonder what you proposed, how you proposed it, and to whom?

If it's to the business unit I'd go with stuff like "If we make a mistake and it brings the site down it hurts income, so we should have source control and dependency management, automate deployment…" etc. They think their ideas will make more money than yours and they won't be reasonable about things they don't understand. Everyone understands big screw ups and websites that are down.

Once you have that, you can kill two birds with one stone by documenting all the APIs using integration tests. Use the same fear of destroying income as your argument.

Once you know the APIs you can chop things into pieces and improve code and put boundaries around tasks. You can start to cache things because you know what the API behind it expects. Then you can build new APIs with adapters behind the cache and slowly introduce them.
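A sketch of what "start to cache things" might look like once an API's contract is pinned down: a read-through cache around one known-stable lookup. It assumes the memcached instance mentioned in the thread is reachable from PHP, and the key, TTL and query are all invented:

    <?php
    // cache.php - hypothetical read-through cache around one expensive, stable lookup
    function cached_product_list(Memcached $mc, PDO $db)
    {
        $key = 'product_list_v1';

        $cached = $mc->get($key);
        if ($cached !== false) {
            return $cached;
        }

        $products = $db->query('SELECT id, name, price FROM products WHERE active = 1')
                       ->fetchAll(PDO::FETCH_ASSOC);

        // 5 minutes is arbitrary; pick a TTL you can defend to the business unit
        $mc->set($key, $products, 300);

        return $products;
    }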

You can build the stuff the business unit wants.

If you can't excite your developers with the possibility to design and build new APIs like that, then:

a) you need to brush up on your "soft" skills

b) you need to move on or ask for more money/perks


>>>>Stop the bleeding<<<<

1) Source Code Management (branch, merging into main on approval)

2) Paired programming (start learning)

>>>>Learn a better approach<<<<

3) Read "Clean Coder" as a team

4) Pick the best next practice to fix (like removing commented-out code)

>>>>Do better<<<<

5) Refactor, refactor, refactor

I agree with the sentiment that a full rewrite is a waste of time. The team needs to learn better practices, together, or any rewrite will fall into the same pattern. We've had great success doing side-by-side upgrades (from AngularJS to React as an example):

> All new features (screens) build on React (newest)

> Run them in the same path, so it looks like a single app

> Each sprint has additional upgrade work to start porting over to React

> Use the customer and usage analytics to refactor screen, flow, function, while rewriting


In similar situations, I've done full rewrites and I've done refactor over time.

Refactoring over time is by far the least risky, and is where you should start. And the start of that is understanding the scenarios and getting tests in place. At some point, you'll know the refactoring is working, or you'll know a rewrite is needed.

But that is just the technical side. Most of your risk is not there.

As others have mentioned, you need to get your new bosses on board and aware of what the situation really is in terms they understand (specific business risks, specific business opportunities) and make sure they have your back. You will be the first to go if they are taken by surprise. They need to understand the jeopardy to business that already exists, and that while the team has reached a point of relative stability, it is perilous, and some risks will need to be taken to get to a point of actual stability.

The other main risk is the team itself. What do they value? Is it in line with where you know things need to go? If they walk, who will maintain the beast?


Quite a few replies with various good suggestions in a short span of time. However, I could not understand what exactly the problem is that you are trying to solve.

A $20m/year is pretty impressive with that kind of spaghetti code/tech amalgamation. It would be certainly a fun project for your more junior developers to dig into it and understand the actively used features. That raises my next question: what exactly is wrong with your 3-people development team? Are you expecting only 3 of them to make major changes, let alone a full rewrite for such a project?

The way I see it, you only have enough development resources to make minor changes or features that fit the project's current spaghetti framework. Is that what management wants? If they want some big new features, your only option is to find the path of least resistance to implement them, especially if your budget is tight. Basically, add more hack-fixes and continue feeding the monstrous legacy. Unless you get more people and more budget, you don't really have a choice of doing things "the proper way".


Take a look at the Strangler Pattern, coined by Martin Fowler. That approach lays out a technique for migrating complexity into a modern codebase by "strangling" each unit of functionality individually.

It takes time, but the outcome is a fully tested version of the already production-tested software, and there's no need to maintain two versions.
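One low-tech way to apply the strangler idea to a codebase like this is a thin front controller that owns a whitelist of migrated routes and hands everything else to the legacy code untouched; the nginx config then only has to send traffic to one file. A sketch, where every path and class name is invented:

    <?php
    // public/index.php - hypothetical strangler entry point
    require __DIR__ . '/../vendor/autoload.php';

    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

    // Routes that have been rewritten so far; everything else stays exactly as it was
    $migrated = [
        '/account' => App\Controller\AccountController::class,
        '/orders'  => App\Controller\OrdersController::class,
    ];

    if (isset($migrated[$path])) {
        // New world: tested, composer-autoloaded code
        $controllerClass = $migrated[$path];
        (new $controllerClass())->handle();
        exit;
    }

    // Old world: hand the request off to the existing legacy dispatch, untouched
    require __DIR__ . '/../legacy/router.php';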


First of all, start by preserving the application state in version control.

Then start thinking about replicating deployment of the application. At the start it can be a script that compresses and extracts the files to the production environment. This will let you build similar environments, or other more experimental development environments.

Once you have a flow of the application state management and deployment under control, you can start building on top of it.

The most valuable work would be to build a separate test suite that documents the most mission critical parts of code or application.

Only after this would I try to reason about changes to the application. The great part is that you have the Nginx configuration as an abstraction layer. From there you dissect the application into smaller pieces and replace it one redirect at a time.

If the application has an expected lifetime of over 2 years, then these changes will pay for themselves in faster development cycles and in the maintainability of the codebase for the people working on it. This can be a selling point to management for the roadmap or for recruitment.

Good luck.


Get out. I’ve been there and tried in earnest to fix things. Without management’s understanding, and with an “aggressive roadmap”, you’re doomed to fail and will burn yourself out in the process.

It got this way exactly because management doesn’t see the point or the problem. The fix isn’t technical (not yet), it’s cultural and strategic first which isn’t something you have control over.


I'd be curious about the premise under which they hired you. Did they hire you to re-do the application, knowing it was not in great shape? Or did they just hire you to up the team from 3 to 4, imagining a relative boost in productivity?

Also, the "without managing them directly" is interesting. Are you a peer of the existing three team members?


Just quit. It’s not worth saving these people if you’re not getting paid to.


What is your true objective? What will you be evaluated on?

Focus and think of any other improvement you could do.

It sounds like management doesn’t think there is an actual problem to solve, so I wouldn’t necessarily pick refactoring or rewrite as the hill to die on.

If you go the refactoring route, I have a little advice:

0. Clean up the database, it will immediately impact performance and make management happy

1. Find vertical (feature-wise) or horizontal (layer-wise) architectural boundaries and split the code base into module, separated libraries. This will be an ongoing process for a long while. Do it by touching as little code as possible - this is pure scaffolding, actual refactoring comes later.

2. Stick with PHP, at least until results from #1 aren’t good enough.

3. Use testing as a tool to pressure management; it works surprisingly often

4. Rewrite one feature/page at a time, once results from #1 indicate a good candidate. It might be a good idea to introduce a new language at this point, or even some form of micro services (if it makes sense).


To be honest. I would run off. This is the kind of hell where nobody wants to work. Where only "experts" know how things work and how to expand or fix things. It will secure their job but will make yours hell.

You personally will gain no knowledge there, just that your codebase is hell.

You can try to convince management to create a next-gen implementation. Not a rewrite: new software that can fulfill customer needs better, compete better, is safer, and is easier to extend in the future.

One thing you can do, though, is immediately set up modern practices: SCM, code review, CI, tests (most of the code might not be unit-testable in this state, but at least some tests). This way you can see what others do when they add or fix something and learn from it (SCM, reviews), make changes knowing you did not break the whole thing (tests), and have CI to at least ensure the tests run and glue it all together.

Good luck


Likely just adding to the general consensus here, but realistically it would just add extra confidence around the suggested strategy.

It would be incredibly unlikely to convince management to stop the roadmap for a full rewrite unless you can really give some solid evidence and numbers to show the rewrite costs less than the effort needed to get new functionality added reliably into those parts of the system with issues. For a large system that would be basically impossible. If not able to pause the roadmap, trying to continue development on new features and making sure the new code base is kept synchronized will just be a nightmare.

Like many others have said, the most likely strategy that will get a successful outcome would be to:

- Get some automated testing for key business flows in place. These act as documentation and contracts for the basic business functionality that guarantees that revenue. These then act as safety net for when refactoring is taking place.

- Do targeted refactoring, either as part of a 20% tech-debt-reduction budget you work into your roadmap planning, and/or factored into new feature estimates (fix as you are in there changing something)

- Get the basic structure and processes in place early as those will likely be possible to set up without a big, or at least minimal, impact to production (source control, branch management, PR process, coding standards, CI, deployment process)

It will take time to get through the whole source code, but you would be seeing incremental improvements over time at least. Plus, you can at least still manage to continue with the roadmap with adjusted expectations a bit more easily.

I have gone through a few different projects where it was either a full rewrite with new features only going into the new code base, a full rewrite kept synchronized with an existing codebase still receiving updates, or an incremental rewrite; the incremental approach generally makes the most sense.


What do you mean you have to figure out a strategy without managing them directly? Why are you involved then?

How could the budget possibly be tight if this thing makes $20M a year?

Even at a 5% R&D budget you should be able to hire at least a couple more devs.

Do you mean to say the whole company makes $20M? If not, what other costs are associated with producing this revenue?


Read The Phoenix Project, a book which fictionalizes this issue. As a novel, it covers not only technical and process steps, but most importantly how to talk through the negotiations and emotions that come up along the way.

https://g.co/kgs/Mq634e


> this code generates more than 20 million dollars a year of revenue

> aggressive roadmap

> budget is really tight

Leave. If you care about the space, start a competitor.


As I see it, nobody on their team, except for the OP, sees this as a problem. IMHO, any software exists to solve a real-world problem. It does not exist solely for its software architecture, for its tests, for its UI or for its maintainability. If the stakeholders of the organization don't mind the time taken to roll out new features, or think that constant bug-fixing is simply the nature of software, then they don't place any value on that software itself.

This is pretty apparent since they seem to be earning 20 million dollars with a software managed by three junior engineers.

My advice to the OP - if you value good software engineering, this is not the organization you should be working for. Because no matter what you do, your effort will not be appreciated and you'll be replaced with a junior developer as soon as the management deems it necessary.


> ... fix this development team without managing them directly ...

That is your core problem. If you are not directly managing then how can you bring about any changes?

If HQ management can't see the problems you see, then you are unlikely to receive any support for the changes you are contemplating.

Your number one problem is politics not technology.


This is startlingly similar to the situation I'm in, where a solo PHP developer of 15+ years didn't take one moment in all that time to manage the complexity of his code or factor it sensibly. I expect the next year to be pretty awful as I try to get the system to a stable and maintainable state.


PHP is great in the sense that you can easily combine legacy and modern code in the same codebase. Just do a new_index.php for all new stuff. I'd start building new features, and features that are in active development, the 'modern way', and just keep the legacy code as is. When the new way has been established and the team has grown accustomed to it, it becomes easier to gradually rewrite old parts when necessary. You might find that lots of the old code doesn't need to be rewritten, but can be managed as a legacy part of the app, which means mostly frozen but working code.

You should also understand the audience. Who are the users of the app? It sounds like the app does not need high reliability or availability, or any of the stuff that's required for typical mass-market web apps. Understanding this might give you some room to improvise.


> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight.

Sounds like someone needs to push back against management first and foremost. Without this understanding the only thing you'll succeed in doing is denting that $20m revenue stream with very little appreciable benefit and the higher-ups will understand even less what you're up against.

Get that message right first and small doors may open to better budget. Then approach as others have said, piece by piece, or as Martin Fowler describes as a StranglerFigApp (https://martinfowler.com/bliki/StranglerFigApplication.html)


I know the typical piece of advice is to never rewrite but I am going through a similar situation and a rewrite would probably have been better and simpler.

The key is, though, that you don’t rewrite the code, you rewrite the app. Figure out what the functional pieces of the app are and what it’s supposed to do. Don’t use any ActiveRecord-style ORMs, so Laravel is out. If the app is that bad then the SQL database is probably a huge mess. If it has to be PHP, use Symfony and Doctrine.

Build an MVC version of the application.

If there was any sort of structure to the application then the refactor not rewrite approach would be correct but if it’s anything like what I think it is, it’s a fucking mess. Refactoring will just make a bigger mess.

If you can get away with refactoring pieces at a time into symfony components until you can eventually have an MVC framework then do it but likely that would be a much bigger task.


You should absolutely quit and work somewhere else. You're not going to learn many useful things; at best you'll have a horrible time and not improve the company's bottom line, so they won't care and you won't be rewarded.

It could be much worse. You could break something and cost the company money.


I've been in quite a similar situation, in which we tried to redo everything without proper knowledge. That was a catastrophe. Please don't promise a panacea to the managers: better code is not necessarily more profitable. New mistakes will be made (inevitably) and old stuff may become unstable. Also, it's not uncommon for developers to bail out due to pressure. Have your team prepared for baby steps.

* Before doing any actual work, I'd suggest everybody read Clean Code and Clean Architecture. You'll have a better understanding of SOLID principles by then.

* Start by adding version control and a separate environment for development / testing.

* Try refactoring the least important things first. If they crash, it won't be so critical. The most complex modules will end up with more quality.


Start with writing integration tests. Worry about touching the code only after you have a full test harness. Using an external tool like Playwright, Cypress, or Selenium you can write the tests in a language of your choice without touching the code.

Deploy the code into a staging environment (make a copy of prod). Kubernetes might be useful to try to package the application in a replicable manner. Then get the tests running on CI.

When the tests cover literally everything the app can do, and everything (tests/deployment) is running on CI, changing the app becomes very easy.

Your junior coders no doubt have been yelled at many times for attempting changes and failing. When they begin to understand that change without breakage is possible, their confidence will increase, and they will become better coders.

Resist the urge to change the application at all until you have tests.


I think you already gave the answer to yourself, but didn't realize it. You have two options:

1.) Leave this mess behind you and quit - and miss an opportunity to learn a lot about code, yourself, teamwork and solving real world problems

2.) Work together with your team and solve problems, that probably will improve your skills more than anything in your future

I recommend you give 2.) at least 6 months before you quit.

What I would recommend:

- Create a git repository (I would not init it on the production server, but copy the code over to your machine, init, experiment a bit, and if you found a reliable way, repeat this process on the server)

- For the first weeks, continue developing on the server with one main branch, but at least push it to a central repository, so that you have a kind of VCS

- Setup a dev system, that points to a cloned (maybe stripped down) prod database, where you can test things

- Add composer in dev and see if you can manage to migrate this to production

- As you said, you already have an API that is called via curl. That might be the way out of your mess. Create a new API namespace / directory in the old code base that is fully under version control, uses composer, and uses as little of the OLD mess of code as possible (you won't get out of this with a full rewrite). Write unit tests wherever possible. (A sketch of this is at the end of this comment.)

- I recommend to use jsonrpc in your situation, because it is more flexible than CRUD / REST, but this is up to you

- Get SonarQube up and running for the new API and manage your code quality improvement

- New features go to the new API, if possible

- Start to move old features to the new API, create branches and deploy only ONE folder from dev to prod: the api directory

- The database mess is a problem, that you should not solve too early...

This should take roughly a year. Have fun ;)
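To make the "new API namespace inside the old codebase" idea above concrete: composer can autoload just the new directory while the legacy require_once jungle stays untouched, so both can live side by side. A sketch of a jsonrpc-ish entry point, with every class, method name and path invented:

    <?php
    // api/v2/index.php - hypothetical entry point for the new, composer-managed namespace
    require __DIR__ . '/../../vendor/autoload.php';   // only the new code uses the autoloader

    use NewApi\Handler\MenuHandler;

    header('Content-Type: application/json');

    // One POST body with method + params, dispatched to small, unit-testable handlers
    $request  = json_decode(file_get_contents('php://input'), true);
    $handlers = [
        'menu.list' => MenuHandler::class,
    ];

    if (!is_array($request) || !isset($request['method'], $handlers[$request['method']])) {
        http_response_code(400);
        echo json_encode(['error' => 'unknown method']);
        exit;
    }

    $handlerClass = $handlers[$request['method']];
    $params       = isset($request['params']) ? $request['params'] : [];

    echo json_encode(['result' => (new $handlerClass())->handle($params)]);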


I would leave the company and search another workplace where the codebase is not dogshit.

There is simply no solution to this problem, so you better leave and go to work where things are actually handled by professional engineers and not some non-dev shitty manager. Simple as.


Three engineers have built and supported a codebase generating 20 MM in revenue a year. Maybe get off your high horse, it’s really easy to mistake a complex order for chaos from the sidelines.

Rewrites are almost never the answer unless you wrote the previous version. Sure, to most of us here the code you’re describing might look like garbage, but it works and certainly a ton of wisdom has been embedded into it that will be difficult to replicate and understand unless you dive into what exists now and try to work with it on its terms for a little while.

I did a major rewrite early in my career based on something someone else built, and it was a total disaster for a while. I thought I knew better as an outsider looking in, and sure, eventually we did improve things, but a lot of my choices were not best practices, but some form of fashion.


No source control? There's "it works" and then there's "they got lucky enough to not implode yet".


The NGINX config description actually had me laughing out loud. That one seems particularly heinous.


You need observability in order to make good decisions. Get that mess into source control, and then start instrumenting it. Spend what meager budget you can get from upper mgmt on instrumentation.

Writing tests is great, but how do you even write good tests for spaghetti code like this and have faith in them? Answer: you can’t. But you can instrument your spaghetti code so that you have a fighting chance of seeing what’s wrong when stuff breaks.

After a year or so of instrumentation, small bug fixes, and fixing the absurdly stupid stuff, you’ll grok that spaghetti mess well enough and have enough political capital to be able to start refactoring great whacks of it. The strangler fig pattern mentioned earlier smells like the right approach, but you won’t really know until you’ve really grilled the codebase.


Better look at it from a career perspective. Is there any visibility for upper management of refactoring/rewriting this mess? For them it might look like a bunch of months without any new features. Think short-term. Your goal shouldn't be to maintain that code for a long time. Add some fancy features and present it to management. Get a promotion and switch companies.


Nothing wrong with what you describe; all successful long-running projects stink after a while. I would look into the “strangler pattern” and start from there. Once the team understands the benefits of clean code they will give their full backing and you will be able to gradually evolve the codebase and practices. Resistance to change comes when people feel threatened. These “junior” developers have been maintaining a cash cow for quite some time. I would change the wording and intended actions in such a way that they understand they will also benefit from upcoming changes in tech and practice. Those that are genuinely dicks should be let go, but only after they’ve been given a genuine chance.


You're making 20M, you spend probably less than 500k between you and 3 juniors.

I've led rewrites in worse circumstances (a larger codebase split into 30 microservices, 15 people across 3 teams, making just 2M per year!) and I don't think you can do it with your current team. In the above example we downsized to 1 team of 4 people and then rewrote to 2 right-sized services.

The new team was all new people (introduced gradually), while we shifted out the previous employees to other areas of the business.

The bottom line you have to use with management is you need a more senior team. Hiring seniors is pretty hard nowadays and it doesn't sound like you can offer much of an environment.

Get a good agency for 1M / year and let them work with your team to understand the ins and out and then replace them.


Upper management need to understand the problem and the options and need to buy-in on whatever you want to do.

Practically: stop the bleeding; get the current team at least using version control and working with a CI environment. That will be a lot of effort (been there before with a similar .Net product but a much better team).

Then you're going to need significant resources to re-build on a modern architecture. I would simply go with releasing another product if that's at all possible. You clearly have some market and channel to sell into.

Just beware: this sounds like a problem which will take 3-5 years to solve and whose chance of success is dependent on organisational buy-in. So you need to ask yourself if you're willing to commit to that. If not, quit early.


I used to work on a 15 year old PHP codebase. Everything was mixed in the same file - PHP, SQL, CSS, JS… These are some things we did

Start using Git

Start doing code reviews, for newer code

Only refactor as needed. Don’t rewrite, it will likely end in disaster (we tried and failed)

Start deleting dead code. If you’re paranoid, comment it out for a few releases, before deleting

It is all about ROI - for example, removing inline CSS might be good practice, but does it really matter that much in your codebase? Maybe there are better things to do.

Even when refactoring, try to do it in stages. For example, simply splitting a large file into two or more files, without changing the code too much might be a good start.

For any new code that is being written have strict code reviews and rules in place, so past problems aren’t repeated


Relative to this situation, are you a junior in this as well? I don’t mean to downplay your experience, but as you judge the devs you must also judge yourself.

Your comment about the productivity of the dev team is a red flag for me. They’re charged with containing this 20m revenue engine; it probably stresses them out big time. This is not the time to count feature development. When you’re treading water, you don’t punish the survivors of the Titanic for not also doing laps while they wait to be rescued.

Given you’ve made no comment as to expanding the team, I can only assume the business owner wants to make more money without investing in this product. There’s no magical advice that will unfuck the executive level if that’s the case.


Your premise is wrong. You have not inherited the worst code and tech team you have ever seen.

Two reasons for this: (1) You haven't inherited anything; the business owns the code, not you as an individual. You and the tech team need to work together to make sure the code keeps generating revenue, and possibly more. No one owns other people or teams. (2) The code is generating $20m annual revenue. That's pretty cool and not bad at all!

I'd follow the following steps:

1. Start by defining responsibility areas: input, code, output (business value). Any codebase can be modelled in this way. Once you have explicitly defined input and output of your code base, you know what your degrees of freedom are as long as you don't mess with input or output of your application. Also a good way to get to know the stakeholder landscape.

2. Introduce version control, move everything to Git. Git enables a nice way-of-working that is recognized industry-wide. Team work is everything.

3. Start writing tests. Preferably E2E tests that will be stable for a long time to come. In all cases, don't disturb the revenue flow with your changes. This will help you to make changes without having angry coworkers in your mailbox when your change caused existing functionality to break.

4. Fix the low-hanging fruit first. Define a list of maximum 5 issues that can easily be fixed in isolation and will improve the code base. Be sure everyone understands why and how things are done. This will boost team ownership.

5. Improve the codebase step-by-step. Be sure for every improvement to explain why it is important in terms of business value. If you can't explain it to yourself, maybe you are just fixing this for esthetics and it's not really important at all.

And finally, don't go for a full rewrite. Rewrites always seem easy, until you remember that you forgot to take into account all the edge cases the original code base did take into account, and it's not as simple as you thought after all. Instead, move parts of the code to a new codebase and migrate slowly from v1 to v2.


First things first: treating code as an absolute asset is a mistake. The same outcome may be produced by a different codebase. Second: the present team goes away after a rewrite. It is what it is; code is a derivative of a vision and a team. A freeze on legacy development is strongly recommended during the rewrite. Now, the rewrite: you slash the app into segments on the principle of cause and effect: input data produces output data, which serves as input for the next segment. Then, if you are a risk minimizer, start with the tail of the cause-effect chain and rewrite this shit. Continue segment by segment. After each segment, reflect on whether the new codebase needs refactoring.


> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight

Listen to the more experienced people in the thread. They have good advice. Probably ignore the people who were on one lucky project that worked out with a risky full rewrite.

But the business' ambitious but naive plan is not viable, and it's your job to communicate why, and to figure out how a less ambitious series of slower goals could be achieved. If I were in this position as an IC, I'd literally just refuse to shoulder the stress of naively agreed-upon deadlines etc., because it wouldn't be feasible unless I risked burnout for what is probably not enough salary.


No full rewrites, please.

I would start by properly cleaning up the NGINX config.

I feel solving that will provide a basis for rewriting the other parts of the codebase.

Find new servers, back everything up onto them, make the changes there (including tests), and move successful ones to production.


First read the book Modernizing Legacy Applications In PHP ("Get your code under control in a series of small, specific steps") by Paul M. Jones: https://leanpub.com/mlaphp


GTFO. Can't be saved and rewrite is too expensive.


A little over 2 years ago I came into a very similar situation, however in my situation we had none of the original developers still working on it.

At least from a technical perspective, the key to making this manageable is not a re-write, that’s probably the worst approach especially if you have little to no buy-in from above. From a business perspective, a re-write provides little to no benefit and will only be a large cost and time sink, so you will never get buy-in on that anyways.

The key here is slow, progressive improvement. For example, get it in source control, that’s a relatively simple task and provides an endless amount of benefit. The next step which is a bit more complicated, is get a way to run this in a local development environment.

Getting a local environment for this type of situation can certainly be tough, and you have to be prepared to accept what may be considered a “non-optimal” solution for it. Does your code have a bunch of hard-coded credentials and URLs, such that from a local environment you would accidentally hit 3rd-party services or databases and cause problems? The answer to that is NOT to try and extract all those things, because that will take a ton of time and you have no test environment. Instead, cut the container off from all internet access and add a proxy container it can route outbound traffic through; then you can explicitly control what it can reach and what it can’t, and you can progressively fix the hard-coded problems.

Basically the key is to accept that shooting for “ideal” here is a mistake, and you have to sneak the small progressive improvements alongside meeting the business goals that have been set for the team.

In my experience, if you can sneak some simpler but very impactful changes in, then demonstrate how those help deliver on things, it will be easier to get buy-in. If you can point to being able to deliver a feature weeks ahead of previous estimates and attribute it to, say, having a sane deployment strategy or a local dev environment, the advantages become clearer from a business perspective. If you say "we need time to fix this" but have no data or concrete examples of how this helps the business, you won't get buy-in.


Talk with the team. Hear their pain points and propose source control and a test and dev environment.

If you are not managing them directly and they don't want to do those kind of things because it sounds hard or foreign, then you can't really do anything about it.


And get out fast.


You are looking at this from the wrong angle.

You have inherited working code that generates revenue, but in a state that makes it hard to develop new features and manage productively.

As you say, the roadmap is aggressive and management has no understanding of the situation, so you have already established what you have to do: explain to management what makes development difficult (avoid statements like "this is the worst" and focus on what needs to be done to establish best practices and on where the quick wins for development velocity are - more expertise and less judgement is always a good idea). Then you propose a realistic roadmap and start making the changes that need to be made.


First thing you gotta do is set up VCS of some kind, ideally git. Next, like others have said, get some sort of proxy/logging layer in front of EVERYTHING. Work from there to document all the different ways customers interact with the service and how that maps to various parts of the codebase. Once you have a shit ton of documentation/logs to reference, you/team should start rewriting things piece by piece and using the aforementioned proxy layer to duplicate requests as a test and then divert traffic over. This is one of those things where every move has to be precisely calculated, but there's still a way out. Good luck.


Quit and get a better job. I'm serious.

Reading this my first thought was, "I hope you're getting well paid. I would triple my fees going in to this scenario."

Then you come to "HQ has no real understanding ... budget is really tight."

Life's too short. If you can do this job at all you can do it for someone who doesn't have their head up their fundament. Failure seems inevitable, but you don't have to be the captain of that sinking ship. Let it fail without you. I mean, this isn't the sole company keeping alive the small home town you grew up in? This isn't your family business that's been handed down for generations?


It's not clear from your post what your role really is (is it something like lead dev? or just a more opinionated member of the team?) but if you're not managing the team directly, then don't manage them. It's not your job and no-one likes that. If they wanted to make you the manager, they would have. And they didn't.

There's really only one way to help improve a codebase / development process in a situation like this: one small incremental step after another, for a very very very long time. If you don't think you can enjoy that and have the patience to stay with the problem for a few years, consider looking for another job.


First, you absolutely have to ignore everybody here who tells you that this is a fine situation, that it's all about the business, etc. etc. You have to tell senior management that they have created a golden goose, and that while it is admirable that it's laying golden eggs, at the same time they are also playing Russian roulette.

You have to convince them that not only is this situation a drag reducing their future revenue, as they cannot develop it further with any speed, but it can also come crashing down catastrophically at any point in time.

It also sounds like the current team is not up for it; you need more people and a dedicated project.


Make the cultural changes needed, and the technical changes will flow through afterwards.

If you can't change the culture and get your boss(es) on board, then you will fail.

Right now, the business is likely "mostly happy" with things the way they are. They're getting their changes made (but not as quickly as they'd like). Their costs are low (3 junior devs, with just their laptops and a production server). Convince them that unless the changes you want are made, their business will become stagnant. Use phrases like "invest for future growth" and "protect the business' current investment in the product"


You already know…

Each question you have above should be solved in an order that makes sense.

A full rewrite does not make sense if it means putting the project on hold. You have to make a greenfield space within the mud.

I recently did this. I inherited an Angular 1/Java project, and someone had already hired my team: 6 React/Node devs. They were JS devs, but not Angular. We just started embedding React in the Angular routes; the product team also wanted a new design, so we had two themes, old and new. At a certain point we were 80% there and made a push for the final 20%. It took 1.5 years to rewrite a front-end e-commerce app.


Also don’t be an a*h**…

Be careful you don’t demotivate the team by complaining…

Use sentences like “We can do better” and remember that the past is the past, focus on the future, lead them forward.

They obviously do not know better and need your help. Teach them, be the leader they need and you will be so proud of the work you do together and the people you helped to do better.


Others have said that, but I'd like to put emphasis on getting everything in source control. If the other developers don't know about source control (!) they will love it. I'd spin up my own local source control for my own changes and, after a few of them, show them the advantages.

Second: making a change without tests is like walking in the dark without a flashlight. Having tests is a very important thing.

Read "Working Effectively With Legacy Code" by Michael Feathers, one of the best books I've read that can really help in situations like this. In summary, it boils down to having tests to aid the changes you need to make.
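To make that concrete, here's a minimal sketch of a "characterization test" in the book's spirit, assuming PHPUnit is installed via Composer and that the legacy code exposes some hypothetical function like calculate_shipping() whose current behaviour you want to pin down before touching anything:

    <?php
    // tests/ShippingCharacterizationTest.php -- a sketch, not the project's real code.
    // Assumes PHPUnit is available and that legacy/shipping.php defines the
    // hypothetical calculate_shipping() function.
    use PHPUnit\Framework\TestCase;

    require_once __DIR__ . '/../legacy/shipping.php';

    final class ShippingCharacterizationTest extends TestCase
    {
        public function testPinsDownCurrentBehaviour(): void
        {
            // Characterization tests assert what the code does *today*,
            // not what it "should" do, so refactors can be checked against it.
            $this->assertSame(12.5, calculate_shipping('FR', 2.0));
            $this->assertSame(0.0, calculate_shipping('FR', 0.0));
        }
    }

Once a handful of these exist around a file, deleting or rewriting that file stops being a leap of faith.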


Some of the things you listed are bad practices, some are bad outcomes. Change the bad practices (no version control, no beta environment, etc) and use them to prioritize and change the bad outcomes over time.

At least half of the stuff you listed will probably never change. Congrats! Being the senior person means becoming comfortable with people making objectively worse decisions than you would, and putting the structure and architecture in place so that it still works anyway. As a bonus, most of those “objectively worse” decisions can be really good and better suited for the team than your decisions would have been ;).



You need to develop a small app that will handle authentication/authorization. The next time a feature comes in, you implement that page in the new stack; the rest of the production pages will still run in the old code base.

That's it.

A concurrent small migration to the new system without changing all the system at once.

Why does it work? New systems often fail to encapsulate all the complexity. Also, two systems duplicate your workload, until you decide to drop the new system because of the first point.

Finally, get stats from nginx and figure out which routes aren't used in a month; try disabling some and see how many dead routes you can find and clean up.
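As a rough sketch of that last point (the log path and the default combined log format are assumptions), a small PHP script can count hits per path so rarely-used routes can be compared against the 10,000-line NGINX config:

    <?php
    // route_usage.php -- hypothetical helper, run from the CLI.
    $counts = [];
    $log = fopen('/var/log/nginx/access.log', 'r'); // assumed log location
    if ($log === false) {
        fwrite(STDERR, "cannot open access log\n");
        exit(1);
    }
    while (($line = fgets($log)) !== false) {
        // '"GET /some/path?x=1 HTTP/1.1"' -> '/some/path'
        if (preg_match('#"(?:GET|POST|PUT|DELETE|HEAD) ([^ ?"]+)#', $line, $m)) {
            $counts[$m[1]] = ($counts[$m[1]] ?? 0) + 1;
        }
    }
    fclose($log);
    asort($counts); // least-used paths first
    foreach ($counts as $path => $hits) {
        printf("%8d  %s\n", $hits, $path);
    }

Routes that never show up over a month of logs are good candidates to disable behind a redirect first, then remove.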


This post is effectively creative writing looking to probe the community for ideas and best practices, or potentially senior devs.

There's no 20m/yr 3 jr dev team, it's just to get the scene set for asking questions about "what if there was a bad code base that was making money, how would you bring it up to spec"

This community is great at offering advice and telling people how to do things the "better" way.

Posted on the weekend too so people that are having downtime on Sunday have enough time to reply. Sorry guys, if you don't get contacted, you haven't passed the first tech interview.


Don't do anything without buy in from mgmt. I'd come up with some sort of plan, give it 6 months and be prepared to bail if there is no action on your plan.

I was brought on board to 'modernize' a similar application. Almost a year later we haven't modernized anything... Despite a lot of promises from mgmt up front they have now gone into the 'if it's not broke don't fix it' mode.


Sounds like there's no source control... So you should just pick things off bit by bit. Don't rewrite, especially when you have no process to roll-back.

- Source Control

- CI/CD process

- Lock down production so there's no access

- Kill off dead code

- Start organizing and refactoring

etc...

Edit: A lot of people have already said the above. But I want to add:

Just because code sucks and is messy, obscure, has no structure or breaks everything we learn as developers that defines 'good code' or 'good coding practices'... does not really mean it's bad if it's generating the business money.

It can often be quite fun to work on, because everything is a win: performance, cost reduction, easier maintenance, etc.


Been there. Done that. Entire IT department had quit when I walked in. Saved the company.

Took a couple of years to recover mentally from it. First off: make sure whoever is in charge understands how screwed they are. Hiring and retaining staff is a complete nightmare.

Get someone between your team and management.

Learn to say no to everything. Better yet, this is your go-between job. Do not allow management access to any IT staff; they will destroy morale.

Support and slow cleanup is the only work that gets done for a year or more. No new work.

Make sure the people who had a part in the decisions are gone. Otherwise you're wasting your time.

End of the day, decide if you're up for this.


I have been in similar situations.

Code can be fixed, but people sometimes can't be. You need to break down the "resistance to change" somehow. Trying to convince people can burn a lot of time and effort on its own. If you can't easily convince them, and you can't overrule them to dictate the direction, don't even bother.

You need people and you need budget. The business doesn't understand bad code, but you should find a way to make them feel fear. They have been drinking poison for years without feeling ill. Make them understand how easily the house of cards could come crashing down.


In my experience working for other people it normally only gets worse, not better. I don’t mean to say that you shouldn’t put in effort or try hard at work—what I mean is that broken or toxic environments tend to only get worse. People with the ability and the self-respect tend to leave if they see an interpersonal or a structural dumpster fire. I would find another job and I wouldn’t worry about your résumé having a short blip on it. You’d be surprised how many people have a similar story about starting at a company and immediately realizing it was a mistake.


I'm working at a similar company; they don't generate as much revenue, but the codebase is that really messy mix of data and implementation details, imperatively brute-forced into a "stable" lucky build.

I took the e2e approach, since making any change has a huge domino effect of breaking everything else. I think it's really important to set up a proper build/e2e CI pipeline with instant Slack reports and run it on every commit; from that point you can just add specs to fully cover the system, and then it can be released nightly without a fuss.


Bad code doesn’t matter. It’s clearly making money in spite of the bad code. But engineer efficiency matters, especially if it’s blocking the future. So your goal should not be to rewrite anything, but to find ways to increase productivity. You’ll find that when viewed through that lens, some parts of the code and processes will have to change but other parts simply don’t need to be changed (even if it makes you cringe). The better bit though is that you’ll be more aligned with the business, whereas rewriting for the sake of improving bad code is not aligned with the business.


$20M/yr, 3 devs, "And post COVID, budget is really tight"

WHERE IS THE MONEY, LEBOWSKI?!!?!

Seriously, WHERE IS THE MONEY GOING? I'm all about keeping a tight small team, but where is the money going? Even paying for a manager to help them move in a direction and address business risk would be worth the investment.

If you really have that level of revenue, and only 3 devs, then you need to be looking at the risk of losing one of them.

All the tech debt is irrelevant, your focus should be on mitigating risk due to attrition/burnout/mistakes.

That being said, just getting a sane deployment process would be helpful.


I'd strongly advocate for what I call "The Boy Scout Campsite Approach to Code Improvement". Boy Scouts have a motto "Leave the Campsite Better than You Found It". What I'd do is:

* Pair program to teach the people you work with that there is another way; they may simply not know any better.

* Make any code you touch better; new / old, it doesn't make any difference. Do it right.

* Important: NEVER, EVER COMPROMISE ON THIS!!! Seriously, you skip one time and it can all be downhill after that (sayeth the voice of regretted experience).


The thing makes over $20m a year, has only 3 junior folks for support / maintenance / development (with junior salaries presumably), and budget is tight?

Run. The problem here is emphatically not on the technical side.


Similar experience, so my advice is a mix of tech & non-tech:

1. Stop cribbing.

2. Start using version control (git); build test/UAT environments.

3. Upskill your team - as you mentioned, your current team MUST have also inherited the code from someone else.

4. Try tools like dead/junk code finders, lint, etc.

5. Try other refactoring tools and techniques.

6. Most important: try to gain trust, and re-read 1.

A) Is the current system stable [understood that it is messy!]? If it is stable, there are ways and means to build/design/architect a parallel future roadmap without adding more mess.


> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers.

> Resistance to change is huge.

These 2 quotes tell me they haven't yet recognized the grave danger and pain of their complexity. They will eventually, but for now neither management nor the team seem open to the radical change which they desperately need. Eventually collapse will come, but for now it's a no-win situation for you. Unless the money is insanely good and worth the stress, best path is to get the heck out.

What industry and type of business is this?


About this line: > This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight.

You should quit, there is no solution to that.


I've been there in the last 2 years, and we went from a massive spaghetti ball to about 50% rewritten and 60% under tests.

I second most comments against the "full rewrite" here:

- source control it

- get a local environment if you can

- write tests before changing/deleting something

Adding tests can be hard at first. The book "Working Effectively With Legacy Code" by Michael Feathers contains useful techniques and examples.

Be wary of Chesterton's fence: "reforms should not be made until the reasoning behind the existing state of affairs is understood". Don't fix or remove something you don't understand.


I stayed 2 months on a 4 million LOC Qt thing; it was quite awful. As others have said, rewrite things slowly, or maybe just rewrite the most essential parts.

Or just quit that job, it might not be worth staying there.

Resistance means it is a situation of hostage-taking: https://neilonsoftware.com/difficult-people-on-software-proj...

It's very serious and coders do this for job security. Don't accept this BS.


Here is a very good talk on how to refactor safely: https://m.youtube.com/watch?v=-wYLmsizBc0

Rewriting is the wrong approach.


1. Obviously add all code to version control then try to clone the site in a different environment to see if it fails

2. Start planting seeds with upper management explaining that kicking the can down the road wrt code quality is like skipping oil changes in your car. They may not like change but they’ll have a broken car if they don’t start taking small steps now.

3. Study domain driven design and software architecture, primarily loose coupling and reducing live dependencies. You’re about to become phenomenal at software architecture. Codescene.io may help.


0. Use GIT.

1. Add a staging environment.

2. Add a CICD.

3. Add tests. Start by easy ones (new features) to get team used to writing them. Then important flows. If your team doesn’t have the bandwidth hire a contractor. Ex: https://avantsoft.com.br

4. Choose a part of the code that warrants being the first to refactor. Balance ease with importance.

5. Define a better structure/architecture for it.

6. Refactor.

7. Repeat from 4 as many times as needed.

Also, consider micro-services on new features… may be an alternative to full rewrite.


Go step by step (a sketch of the first few steps follows below):

1. Make buckets: identify standalone code (i.e. code with no external interfaces, including database reads/writes), e.g. a piece of code doing validations or some calculations.

2. Create an API for this, with, say, AWS Lambda functions or the like.

3. Replace the old code/function with a call to the API.

4. Take the next function/piece of code and do it again.

5. In the next phase, take on functions that have external interfaces, e.g. ones that read the database.
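As promised, a sketch of what steps 1-3 can look like in plain PHP (Lambda would work the same way); the function name and rules here are made up:

    <?php
    // validate_order.php -- hypothetical extracted "standalone" logic: a pure
    // function with no database or external calls, so it is trivial to test.
    function validate_order(array $order): array
    {
        $errors = [];
        if (($order['qty'] ?? 0) <= 0) {
            $errors[] = 'qty must be positive';
        }
        if (empty($order['sku'])) {
            $errors[] = 'sku is required';
        }
        return $errors;
    }

    // Thin HTTP wrapper so the legacy pages can call it as an API (step 3).
    if (php_sapi_name() !== 'cli') {
        header('Content-Type: application/json');
        $payload = json_decode(file_get_contents('php://input'), true) ?: [];
        echo json_encode(['errors' => validate_order($payload)]);
    }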


If the code works... at least for now... don't mess with it. $20M p/yr is significant. Simply put, you don't know what lurks in the bowels (i.e., one refactoring could create more problems than it solves). The code can wait.

The team is another issue. That's where you need to make an immediate impact. But give them each the benefit of the doubt. Start by speaking with them individually. Then as a team. Establish a relationship(s). And then nudge by nudge make changes, changes to culture, workflow, coding standards, etc.


Epic fail.

The way to fix things involving people is through something called leadership. That means you need to double down on your soft skills and you need the explicit support of management. If you hope a framework will do this for you, then you are just as broken as the things you wish were fixed.

Train your team, set high standards, and focus on automation (not tools, not frameworks). This is a tremendous amount of work outside of product. If you aren’t willing to invest the necessary extra effort you don’t seem to care that it’s fixed.


> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight.

Quit.

Find a job where management has half a clue and is reasonable.


Let me rearrange some of your points:

> - it runs on PHP

> - it doesn't use composer or any dependency management. It's all require_once.

Great --- explicit dependencies are better than magic. Personally, I'm a fan of require rather than require_once, because of some history, but require_once is mostly fine.

> - it doesn't use any framework

> - no MVC pattern of course, or whatever pattern. No templating library. It's PHP 2003 style.

This is the proper way to run PHP. Can you imagine if they used frameworks? It'd be a slow mess, with about 70 different frameworks. At least this is likely a bare metal, fast mess.

> - this code generates more than 20 million dollars a year of revenue

> - team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.

So you've got 3 junior people managing 20M of revenue

> - productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.

> I have to find a strategy to fix this development team without managing them directly.

> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight.

HQ doesn't understand the process and can't even budget a manager, because apparently it's not your job to manage them. I'd bet their requirements are unclear and poorly communicated too.

> - the routing is managed exclusively as rewrites in NGInX ( the NGInX config is around 10,000 lines )

Great, the routing is one place!

> - no caching ( but there is memcached but only used for sessions ...)

Do you actually need caching? You didn't say anything about the performance, so I'm guessing not.

> - In many places I see controllers like files making curl requests to its own rest API (via domain name, not localhost) doing oauth authorizations, etc... Just to get the menu items or list of products...

Curl to the same server port is a bad pattern, yeah. Localhost or domain name doesn't make it better or worse. Figure out how to make those a call to a backend service, maybe? Are you also saying this is running on a single machine? (I think you are, but you didn't mention it.)

> - it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )

Ok, check in what you have, and make a deployment procedure that doesn't suck, and set things up so you have to use the deployment procedure.

> - no code has ever been deleted. Things are just added . I gather the reason for that is because it was developed on production directly and deleting things is too risky.

If you can, run profiling on the production site to see what code appears to be dead code, and run down the list.
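One low-effort way to do that in PHP is a sketch like the one below: log which files each production request actually loads, then diff that against the file tree after a few weeks of traffic. The log path is made up, and you'd hook it in via the real auto_prepend_file setting or an existing common include.

    <?php
    // usage_probe.php -- prepend to every request (e.g. via auto_prepend_file).
    register_shutdown_function(function () {
        $lines = '';
        foreach (get_included_files() as $file) {
            $lines .= $file . "\n";
        }
        // Append-only log; aggregate and de-duplicate offline.
        @file_put_contents('/var/log/php_included_files.log', $lines, FILE_APPEND | LOCK_EX);
    });

Files that never appear in the log become the candidate list for deletion (behind source control, of course).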

> - the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.

Depending on the size and volume of the database and the operational requirements, this is kind of what you need to do. Do you have anyone with operational database experience who could help them consolidate tables, if that's what's really required? Is the database a bottleneck? You didn't say that, you just said you didn't like it. There's ways to add columns and migrate data, but it requires either downtime or a flexible replication system and some know-how. Consolidating the tables without at least write downtime is going to be a lot more challenging than if they had the opportunity to add columns at the right time... of course, sometimes having tables with a join is the right thing to do anyway.

Is there budget for a staging system, complete with enough database instances to test a data migration and time to do it? Maybe focus on developing a plan for future column additions rather than trying to clean up the current mess.

> - JS and CSS is the same. Multiple versions of jQuery fighting each other depending on which page you are or even on the same page.

jQuery is pretty compatible right? You can make a list of all the pages and all the versions and maybe make time to test updating the pages with the oldest versions to newer versions, etc. Again, a staging system would help with testing. Developing a testing plan and running the tests is something that doesn't require much from the three overworked developers, but could be offloaded to a manager.


I do like it when someone else saves me a lot of typing. Very much agree with all this.

(Obviously the real problems are political, but ignoring that...)

Seems to me that after it's in source control and a dev/staging system exists, the next step is to add in a data access layer - move all the raw SQL etc out of the main codebase into either new PHP code or a web service. Then add a bunch of logging so it's possible to discover what parts of the system actually get used. The data layer can then get useful test coverage, allowing the DB to be safely rearranged. The next step is to treat the rest of the PHP app as a black box and write tests around it with something like Selenium, and the work of replacing it with some other boring but more modern technology bit by bit can begin.
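A minimal sketch of what that data access layer step could look like, assuming PDO and a hypothetical products table; the point is only that raw SQL moves behind one class that can get test coverage and later be re-pointed at a reorganised schema:

    <?php
    // src/ProductRepository.php -- illustrative only; table and columns are made up.
    final class ProductRepository
    {
        /** @var PDO */
        private $db;

        public function __construct(PDO $db)
        {
            $this->db = $db;
        }

        /** @return array<int, array<string, mixed>> */
        public function findByCategory($category)
        {
            $stmt = $this->db->prepare(
                'SELECT id, name, price FROM products WHERE category = :category AND active = 1'
            );
            $stmt->execute(['category' => $category]);
            return $stmt->fetchAll(PDO::FETCH_ASSOC);
        }
    }

    // Legacy pages then replace their inline SQL with something like:
    //   $products = (new ProductRepository($pdo))->findByCategory('books');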


As long as you don't plan for a full rewrite all at once the technical part is not the hardest aspect, it's even the best part!

> fix this development team without managing them directly

This is the worrying part. If you're not their manager, or at least the technical lead dev, it's a lost cause, because you need to lay out a plan and have complete buy-in from management.

There's almost no realistic salary that can make up for working on (I presume) PHP 5 and this codebase forever, and for the effect on your future career prospects.


Nobody mentions Monitoring.

Ensure the beast is monitored, starting with the basics: CPU, disk space, and so on.

Then everything goes into version control. Then changes can no longer be done in production; you need CI/CD. Just build one step at a time.

Do not aim for perfection, just concentrate on having a framework(mentality) of continuous improvement.

You've been given the opportunity to test all your skills on a thing that "works" (makes money); you just need to find the metrics of where the money comes from and how to maximise it.

Pareto principle can be of help when making decisions.


You won't be able to fix it unless you get business to see it as a blocker. So that's the first task.

Second task is to come up with a plan for your refactor. Break it down with time estimates, etc.


I love this! Others have pointed out already how to untangle the mess, so I'd like to point out one important fact. You can generate 20M in revenue with absolute shit Code. Yet a lot of devs over engineer software that never generates a single dollar in revenue. Sure, there's not just black and white. It's definitely a nightmare. But I think a lot of code is "too good". Always make sure that the code paths you drive to perfection are actually generating value for the user.


Start using version control (git) and put a (probably very simple) CI/CD pipeline in control of deployments. The pipeline might just sftp the files from git to the server; that's okay. The point is production deployments cannot happen outside of git commits. (Change server creds if necessary so this cannot be bypassed.)

Doing this will put a safety net in place enabling rollbacks, introduce the team to version control, and give you a beachhead for automated testing in the pipeline.

Definitely DO NOT do a rewrite.


1) Add source control and put a deploy system in place (start with manual steps, then automate what makes sense).

2) depending on the size of your db, you may want to just go with a shared dev db.

So now you can fix and enhance things in dev

3) Add in a modern web framework. It depends on your app, but I would go with something like Symfony: same language, and it can integrate old stuff you don't want to rewrite yet.

4) Slowly and steadily migrate your routes to the new framework based on the new requirements

The last point is key: it is very easy to miss crucial logic hidden in the existing code.


As somebody that inherited a similar mess, spent 5 years of their life on it, made a lot of progress, but still didn’t fully “fix it” in the end, let me caution you a bit before you embark on this quest.

Consider the opportunity cost of cleaning up this mess. Consider the years of your life spent. The impact to your career. The stress.

In my opinion, unless the compensation is legendary OR this is something you feel very strongly about taking on, you might consider taking a different and more fulfilling role.


You have to fire them. Hire replacements first, then fire them. No matter what you do, they will think they are smarter than you because they built a successful product. They haven't seen any other way of doing anything. If you don't get rid of these idiots you will never be able to fix anything: they are going to go around you to management and blame every single issue that comes up on you and your changes, and eventually you are the one that gets fired.


Such a great opportunity to get in as an engineer and make a difference. The good news is the code is running in production and generates revenue! From here you can start making small incremental changes. I would start in this order.

1. Get the scripts under version control.

2. Modularise the code. This will help in understanding the structure.

3. Add dependency management.

4. Improve the code deployment process: CI/CD, etc.


Some of the things the poster lists are just jaw-dropping, but this is the most painful one.

> - it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )

First step would be to get that into source control.

> - the routing is managed exclusively as rewrites in NGInX ( the NGInX config is around 10,000 lines )

This might be a benefit actually. I'd just start a new application and route to the new code one-by-one using the Strangler approach.
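For what it's worth, a sketch of the Strangler idea in plain PHP (route names and paths here are invented): NGINX points a few migrated routes at a new front controller, while everything else keeps hitting the legacy scripts untouched.

    <?php
    // public/front.php -- illustrative strangler entry point.
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

    // Routes that have been rewritten and tested so far.
    $migrated = [
        '/products' => __DIR__ . '/../new/products.php',
        '/cart'     => __DIR__ . '/../new/cart.php',
    ];

    if (isset($migrated[$path])) {
        require $migrated[$path];                 // new, tested code path
    } else {
        require __DIR__ . '/../legacy/index.php'; // untouched legacy behaviour
    }

Each time a route is migrated, its NGINX rewrite (or this map) is updated, so the 10,000-line config shrinks instead of growing.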


Didn't you hear? "Twitter" develops on production too, with no staging env.


Two things stand out.

That’s a very profitable business off 3 junior devs so there is money for more, senior people.

The junior devs can't possibly like working like this - it will be through necessity and fear that they push back on you. Ask them what they think could be done to improve things and start there. Remove the fear of change.

If they are just being protective and won’t accommodate change then replace the most influential one with someone more senior once the team can cope with the loss.


That does look like technical bankruptcy, however rewrites of large projects almost always fail (especially without management buy-in and feature-freezes)

A strategy you can use is to incorporate any refactor into the estimates for a "new feature" development with the idea being that if you have to touch this part of the codebase that it gets refactored.

In this case since there's no framework I suggest to have a framework gradually take over the functionality of the monolith and the fact all the routes are in nginx will actually help you here because you can just redirect the route to the new framework when the functionality is refactored and ported into the new framework.

Do not refactor the database, as interoperability between the legacy project and the new project can fail; migrations should, however, be executed from the new project.

What I do suggest is to get development, staging, pre-production and production environments going because you will have to write a lot of pure selenium tests to validate that you didn't break important features and that you did correctly recreate/support the expected functionality.

You can run these validation tests against a pre-production environment with a copy of production. This also gives you feedback if your migrations worked.
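A sketch of what one such validation check might look like, assuming the php-webdriver package and a Selenium server on the usual local port; the URL and selector are placeholders:

    <?php
    // checks/smoke_products.php -- illustrative pre-production smoke check.
    use Facebook\WebDriver\Remote\RemoteWebDriver;
    use Facebook\WebDriver\Remote\DesiredCapabilities;
    use Facebook\WebDriver\WebDriverBy;

    require __DIR__ . '/../vendor/autoload.php';

    $driver = RemoteWebDriver::create('http://localhost:4444/wd/hub', DesiredCapabilities::chrome());
    try {
        $driver->get('https://preprod.example.com/products'); // pre-prod copy of production
        $heading = $driver->findElement(WebDriverBy::cssSelector('h1'))->getText();
        if (trim($heading) === '') {
            fwrite(STDERR, "FAIL: products page rendered no heading\n");
            exit(1);
        }
        echo "OK\n";
    } finally {
        $driver->quit();
    }

A pile of small checks like this, run against pre-production on every deploy, is what makes the port-route-by-route approach safe.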

On the team, that's the hard part. If they walk out on you, you will lose all context of how this thing worked.

As a precaution, get them to record a lot of video walkthroughs of the code as documentation, and keep them maintaining the old project while you educate them on how to work in the new system. The video walkthroughs will be around forever and are a good training base for new senior devs you bring in.

Last, make sure you have good analytics (amplitude for example) so you know which features are actually used. Features that nobody uses can just be deleted.

Over time, you will have ported all the functionality that mattered to the new project and feature development in the new project will go much faster (balancing out the time lost refactoring).

A business making 20 million/year should be able to afford a proper dev-team though, what are they doing with all that money?

You should be able to get budget for a team of 5 seniors and leave the juniors on maintenance of the old system.


I know that in this era we no longer recommend books, but get the book "Working Effectively with Legacy Code" by Michael Feathers for ideas and approaches here. It's hugely helpful. It won't solve all the problems (that's on you) but it will show you starting points, give you hope, and show lights at the end of the tunnel. It's a bit focused on testing and moving to TDD, but don't make the mistake of thinking that's all it offers.


I would send the management a link to this post. It contains a lot of good quality, honest opinions so they could understand the scale of the situation.


You can't fix the code until you fix the team. Do they trust each other? Are they capable / comfortable working together? Is there a process in place that works and they all understand? Are there skills deficiencies? Is the culture bad? Are they afraid to make or admit mistakes due to blame culture? You can't fix anything you fear to acknowledge.

Get the team functioning well, then improve the code.


Doesn’t sound like the worst code really; sounds like the average older php codebase I run into. I maintain (inherited) products that are over 15 years old and I find it enjoyable. I would be able to slowly move this thing to modern standards without rewriting or breaking anything; been doing that on very large php projects for a decade. Probably doesn’t need a rewrite, just see it as a bonsai tree.


Try to "fix", or at least go to the bottom of people issues first.

For example, not having version control was already unacceptable 12 years ago. Someone on the team must be strongly opposed to it. Find why. If no-one is against it, just set it up yourself. If it's management, and you know it's not going to change, find some other management to work for.

Rinse and repeat for all the low-hanging fruit.

After that... Good luck.


Get the hell away from that project asap. From a technical standpoint it's fixable, but it will be a lot of work.

However the lack of budget and that the management has an 'aggressive roadmap' says that the management team is toxic, ill-informed, negligent, and ignorant.

Your mental health takes priority. Get the fuck away from that tyre fire of a company.


I’ve been in this exact situation, and honestly - just move on.

I mean, if you see this as a fantastic opportunity to grow or whatever, then fine, have at it.

However, you’re going to be fighting a two-front battle, both against the devs and against management, for widely different reasons. It’s going to take a toll on you.

Ask yourself if you really want to spend the next few years doing work you probably won’t see any recognition for.


If you don't manage them directly that sounds like you don't have authority.

Without the authority to make changes, this will be very hard to do, given the scope of changes required. Soft skills and influence work up to a point, but given your remarks about resistance to change, this is a big challenge.

You need to ask for the proper remit and authority, or decline and move onto another project or job.


Like several others have posted here you are in a great position to make some very simple changes that will have big results.

If you're looking for reading resources, I found "Working Effectively with Legacy Code" by Michael Feathers to be very useful in helping me build a plan. Yes, it's an older book, but that helped me appreciate that this is not a new problem.


You may create a new system from scratch and write the new features there, while temporarily leaving the mess where it is.

The team will be able to try out how good programming can be and perhaps support you more. From there you should gradually move the old features in the new system. Even if you were to never fully complete the refactoring the situation would be much better.


Haha, that's what the life of a contract developer is! I've seen all of this so many times - minus the $20M of annual revenue.


Is it doing anything novel? What functionality does it do above and beyond CRUD?

If the answer is no, pick a common MVC framework like Django or Rails or Adonis, generate the models from a copy of your database, and make a minimal proof of concept.

This will go a lot longer way than just complaining about how bad everything is, and how everything needs a rewrite.


I have a slightly different take than most other commenters. Perhaps you won't like what I write, since it goes against some widely held beliefs.

If the code generates 20 million in revenue, then it is very successful code. It might be ugly, but clearly something is working right. You say "the mess is just too huge to be able to build anything" - nevertheless these three juniors have managed to build something with great business value. Most likely they are more productive, as measured in revenue per unit of development effort, than most of the experts giving you advice in this comment section. The worst code is code which doesn't work or doesn't fulfill its purpose - regardless of how many patterns and best practices it implements.

The dirty secret in software development is most advice and "best practices" have no empirical basis. If "bad" code is highly successful, is it really bad? If theory does not match reality, is it reality that is wrong?

So before you try to change everything, you should eat a bit of humble-pie and try to understand how the code became successful in the first place. Otherwise you very easily throw the baby out with the bathwater.

For example:

> it doesn't use composer or any dependency management. It's all require_once

I'm not familiar with PHP patterns, but I would venture a guess that this "require_once" pattern is also the simplest? If you talk to real seasoned experts, they will harp on "keep it simple", while complex patterns are often being pushed by sophomores and consultants.

> no code has ever been deleted. Things are just added . I gather the reason for that is because it was developed on production directly and deleting things is too risky.

Perhaps, but this is actually reminiscent of the open/closed principle, part of SOLID, which at least at one point was considered best practice: improve code by adding and extending, not by rewriting working code already in use.

> no MVC pattern of course, or whatever pattern.

Great! Patterns are an antipattern. Or slightly less flippant: Patterns are not a sign of quality or a goal in themselves. Patterns are solution to problems, so only appropriate if you have that problem in the first place.

Bottom line: You might learn a lot from working on this project.

> Resistance to change is huge.

I can understand that, if they have built something highly successful, and now you waltz in and declare that they are doing everything wrong because they are not using enough patterns.

You are right about source control though.


If the code, whatever state it's in, generates 20 million in revenue, you're going to have a hard time arguing that it needs to change, especially with only 3 people; that's pretty damn good.


I think something a lot of commenters are missing is that people who have worked like this for a long time are often massively resistant to using source control, even after having it explained to them.

Even getting that process to stick properly ("Step 1") will be a challenge, never mind resolving the other 10 complaints in OP's list.


Have you read https://www.martinfowler.com/books/refactoring.html ? It's a pretty good guide and it's very useful to have a reference / authority to point to when asking a team to do something.


Agree about what most people are saying about a full code rewrite, also about coming in as a new manager.

Start building a wiki and get knowledge from your team - they built everything in the first place. Embrace what they know and go from there.

Have you heard of Swimm for knowledge base/wikis/documentation?


> And post COVID, budget is really tight.

> this code generates more than 20 million dollars a year of revenue

Budget is probably not as tight as you think


Revenue is quite different from profit.


That doesn’t change anything I said.


There are 4,000 code teams scattered across the country right now asking themselves "is this us?"


You are thinking like an engineer and not a business person. A three person team that generates 20mm/yr is a huge success.

Your job isn't to fix the technical mess, but rather not kill the product. As an owner I wouldn't care how fast features can be released if my revenue started to drop.


I think this can only be solved by some kind of consultant, who is not part of the company and can talk frankly about the issues here. As programmers we see everything as a technical problem, but in your case it's more than that. It's unwillingness to change and blindness to the actual situation.


There’s a lot of sane suggestions here, but one thing I’m not grokking: what is your role here? You say you don’t manage this team in your post. In what sense have you inherited this responsibility, and what levers do you have to set direction for the project or people involved.


Before you upload it all to Github etc, for folks recommending that...

Keep in mind that there may be passwords / keys in the spaghetti etc...

iykyk...


Lots of great advice in this thread, heed it.

I would just add -- embrace the challenge. It actually sounds like a fun problem. After many years in tech, I've learned that I'd rather work on improving a pile of shit codebase that produces a lot of value than a pristine perfect codebase that does not.


> with no source control

This is bad.

> It's all require_once.

> it doesn't use any framework

This is not necessarily bad.

> no code has ever been deleted

This is bad

> Multiple versions of jQuery

This is bad

> a full rewrite is necessary, but how to balance it?

You never need to fully rewrite something. You can always take an incremental approach. If the code and 3 people generate 20 million dollars of revenue (on their own? or with massive sales support? what is the cost of goods?) then it's got to be doing something right.

I'd start with the source control. Just check everything in, so it's easy to go back. Do that on the server they develop on, even. (But have a script that pushes the checked-in code to offsite.)

Second, make it possible to spin up a second instance of the same application, in some automated fashion, out of bare source control. This may mean dumping schemas and checking them in, and probably figuring out what data in the database are "necessary configuration" versus "user payload data."

Then, you can initiate integration testing on top of the second cluster. You can also turn this into some kind of local sandbox development setup.

Once that is done, you may be able to change the code quicker, because you can do and test it locally, and perhaps have some acceptance tests on top of the mudball. At that point, you can switch over to doing development locally and deploying (and, ideally, having the ability to un-deploy.)

After that, starting to clean up should at least be possible with less risk, because you can test it in isolation. You can then start pulling on threads in the code, such as standardizing library versions, detecting and deleting unused code, putting like modules together, and so on.

You don't need fancy tools for managing this, shell scripts and command-line git are probably plenty enough. Resist the temptation to spend six months engineering the build system of the future!

Of course a lot will depend on details, but from your brief description, this sounds like the path forward -- focus on making it possible and safe and cheap to iterate, and then you can get on with actual iteration. Don't waste time on big rewrites; instead do things incrementally. Don't believe that any one tool will save the day, because it won't.


Many comments are permutations of the same suggestions, so here's mine:

Get the business to buy into "fixing" this before doing anything. Convince them to hire more, sounds like the current team is already swamped.

If you don't get business buy in, it may be the wrong place for you.


No you don't need a full rewrite. You said it already: this code generates more than 20 million dollars a year of revenue (reality: it's probably not just the code generating that revenue).

You need to introduce things bit by bit to convince the team. Start with version control.


One word: quit.

Any experience you gain from improving this situation won’t benefit you in a future job change. The team will resent you for rocking the boat and implying their code sucks. Management won’t care and will fight anything that puts revenue at risk (rightfully so).


Fire the front/backend developers and hire the best php full stack you can. Start a new php framework project and move code over. 2003 php code is straightforward and easy to port. If you try to move to a new cool language you will lose whatever is special now.


Have this team do read-only deep dives in the system and document what they find, highlight any potential opportunities to improve the system (without edit), and have them form a plan for replacement or migration to something that you have a new team build.


Sounds like everything works just fine. Nothing to worry about, as long as there are no dependencies it will just continue to work. Add some more code and see if the millions continue to flow in. Logs and tests sounds fun.

index-new_2022-test-whattodochange_v1.php


> team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.

I don't get this: if it's three people who are junior, how do they resist any changes?

Since it is only three, could you get to hire someone senior and start untangling?


There is nothing wrong with baremetal 2003 style PHP. It is NOT a compiled language. Every line of code in PHP translates to one or more lines of Machine code.

As such a framework in PHP is counter intuitive and will only slow things down more.


If management aren't on board, have an aggressive roadmap, and it's pulling in $20mil, it deserves to fail. They're on borrowed time, and likely they're the reason for the current situation. Run like the wind.


It’s your career etc. but as a well meaning random stranger:

>> I know a full rewrite is necessary

Rewrite it in rust! /s

You’re most likely focussing on the wrong thing here. The tech doesn’t matter. It’s a business, this bit matters:

>> this code generates more than 20 million dollars a year of revenue

You need to be able to quantify which lines of code you’re going to change to increase that 20 number to something higher, or at the very least, increase the amount of that the business gets to keep rather than burn on costs.

This might sound like a hard problem at first glance but it’s really not.

>> This business unit has a pretty aggressive roadmap

This is a positive. To be clear the worst case is an apathetic business unit. This is huge, you’re already ahead. People want things from you so you’re free to exchange what they want for what you need. Think of other business units as part of your workforce, what can they do to help you?

>> management and HQ has no real understanding of these blockers

Yeah that’s the way it is and it’s totally ok, management doesn’t fully appreciate the blockers impacting the HR unit or plant maintenance or purchasing or customer service or etc etc but they DO NEED to know from you the problems you can see that they care about.

That means issues about how the code quality is problematic are out of scope, but informing management that your team is going to continue to be slow for now is in scope.

Issues about developing in production are out of scope. "Our working practice is unsafe and there is a high risk of breaking the revenue stream unexpectedly over the coming weeks and months" - that is in scope for being communicated. At the same time, take them through the high level of your mitigation plan. Use neon lights to point out the levers they can pull for you, e.g. "we need SaaS product X at a cost of $$$ for the next year to help us deliver Y $$$ in return".

For every strategic piece of work you line up, be clear on how many $$$ it’s going to unlock.

Be clear on how you personally can fail here. Transparency and doing what you say you will go a long way.

Practice saying no.

You’re an unknown quantity to them so get ahead of that. For example, make it so you’re always first to tell the other units when the product has broken for a customer, rather than customer service telling you about a support ticket that just came in.


no code is ever deleted. things are just added.

Yeah, we hate that. On the one hand, it's impossible to build off a shaky foundation. On the other hand, software quality rarely correlates with revenue. That's why we call it work?


If this is an impression of the butt-hurt individual defending his work, then bravo because this is pretty funny.


Maybe pessimistic: leave and find a place to work on something you enjoy. If you stay you will be left fighting everyone and doing all the grunt work by yourself whilst others keep pushing more code in for you to cleanup.


I would suggest you take some money and hire someone from the outside to tell management and the devs exactly that. They will listen because it is coming from the outside. Then after that you can decide whether to change something or not; it does not matter what. Personally, I would also go for the low-hanging fruit first:

- backups of the database

- git push all the code which lives on production

Rewrites are super dangerous, doubly so if the team is junior: they would need to rebuild all the features, migrate, and develop the skills they lack now; otherwise you end up with the same mess, just in a new framework, and so on.


The challenge is quite interesting IMHO, but the fact that the budget is tight and the team is a tiny mess of junior people with "resistance to change" makes me think burnout would be just around the corner.


Instead of a full rewrite, have you considered just continuing the shitfest?


As others, I would not recommend a full rewrite.

But first, I'd take apart some assumptions:

- this code generates more than 20 million dollars a year of revenue

This is GOOD! This means this project is important. There WILL be budget for this.

You need to find out two things:

1. What is the PROFIT margin?

2. HOW does this generate revenue?

If you can increase profits (or promise to) by either making it easier to generate more revenue (onboarding woes of new customers / sales UI, etc) or a bigger margin, you'll be golden.

- it runs on PHP

This is not necessarily bad; check your code wars at the entrance.

- it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )

THIS is one of the things that need to be remedied - source control NOW. Get a professional course on git for all devs, and add a nice dinner for team building.

- it doesn't use composer or any dependency management. It's all require_once.

This needs to be addressed.

- it doesn't use any framework

This needs to be addressed.

- the routing is managed exclusively as rewrites in NGInX ( the NGInX config is around 10,000 lines )

This needs to be addressed!!!!!!!

- no code has ever been deleted. Things are just added . I gather the reason for that is because it was developed on production directly and deleting things is too risky.

Source control should take care of some. The rest is on you sitting with the product manager to find out WHAT is the core, what needs to be removed, what features need to be kept.

(A bunch of horrible code smells that ALL need to be addressed.)

- team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.

OK... if the team is junior, you need to lead.

And WTF is an iOS / Android developer doing in your team?

- productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.

This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers.

You need to sit with the business and make them understand the blockers, the challenges, the timelines, AND the opportunities that your solutions will bring.

And post COVID, budget is really tight.

No it is not. See above.


The only thing you must do IMO is detect and resolve any possible security issues or SQL injections. Everything else doesn't need to be touched until it becomes incompatible with business needs.


Not addressing the larger questions but first obvious thing (to me) is to get this into git/version-control asap and put in place a method of deployment so no one touches production.


Write e2e tests before you change or consider refactoring anything.


Seeing a lot of people say that a rewrite is a terrible idea, but (as someone who doesn’t understand why) I’d love to hear a more fleshed out explanation re: why exactly that’d be a bad idea.


Rewrites frequently fail or go massively over budget/schedule.

It can be difficult to fully replicate the existing system and there are frequently important but subtle reasons why the existing system has the architecture it does.

To the extent that one can make modular changes and address the most-important pain-points, one probably should.

Sometimes a complete rewrite is a better choice, but if embarking on that path, a fail-fast attempt at an MVP might be the right style to do so. If the MVP crushes the existing system in performance/benefits, then subsequent iterative development may yield a viable re-written replacement system.


having seen several rewrite attempts in my career, none of which were fully successful, here are some thoughts:

- The ultimate reason: it will take too long and be over budget. The business will (rightfully) ask why should they invest x amount of capital just to get essentially the same feature set back. Businesses do not care about whats under the hood.

And here is why:

- the rewriting team usually does not fully understand the edge/corner cases that the current mess handles but obscures

- the rewrite inevitably ends up following the same patterns that the original did, leading to unusual/weird cases

- rewrite teams get too ambitious and attempt to over abstract and over engineer, eventually creating another mess understood by only them


A rewrite is likely to take a lot of time, and if you're not careful, it's easy to end up running two systems rather than one at the end: the new one that doesn't quite do everything, and the old system that still does some important things.

In addition, if you don't change the development conditions, you're likely to end up with a similar mess at the end. Sometimes, code is messy because you didn't know what you were doing when you started and a rewrite could help; but sometimes code is messy because the requirements are messy and change rather a lot --- a rewrite can't help much with that.

That doesn't mean never do a rewrite, but you've got to have a pretty good reason, and it sure helps to have an incremental plan so that you don't end up with two systems and so that you start seeing the fruits of your labor quickly.





The place to start is version control, a dev environment and a CI process. Only then can you start to tackle the code. Priority number one is to have some control over the whole thing.


I think you have to be prepared to wait 4 years to build up the political capital to even suggest some of the massive changes you’d be advocating.

Or at least, have the team listen.


Replace these 3 "junior" developers generating 20m revenue with a hand-picked team of Kubernetes bros and you'll be -2.5m in the hole.



so, what's the problem of the code base business-wise?


"Business team have an aggressive road map" and "productivity is abysmally low".


There is no logical connection from "aggressive road map" to a messy code base. There are lots of ways to solve the problem of "productivity is abysmally low", e.g. training or coding guidelines, given the team are all juniors. Without an objective analysis, the problem you can see is always the phenomenon, not the root. It appears to me more like an internal power struggle OP wants to win than a real tech problem he/she wants to address.


Maybe this will help: http://laputan.org/mud/mud.html


I would check out low-code tools. They are insanely powerful now. For the web use www.weweb.io and for mobile use www.flutterflow.io.



My experience with dumpster fire legacy systems is to at least ensure you have proper backups in place so you can roll back if the worst happens.


Show this thread to management

Create a monorepo

Design and setup new architecture to use going forward

Allocate x%-time to write tests and port old stuff over time (continuous weekly process)


Refactor-driven development assumes these next steps: 1. Add VCS. 2. Cover all working code with tests. 3. Start changing code in very small steps.


Sounds like Canada Computers. I'd start by introducing source control and tests. Those are low cost with a high impact on stability.


How are you "inheriting" this if you don't manage it directly? Why is it your responsibility to "fix" it?


I'd view it as 12 years of experience and data gathering upon which to build a really stellar rebuild with all the lessons learned ;)


It sounds like you're overwhelmed. A rewrite is definitely not necessary. You will never get there and you will make it worse.


Cover in acceptance tests using something like Selenium and modify things slowly, making test seams to add more tests.


> - team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.

With good reasons I might say :).


I think you have to balance your own mental health with your desire to help a company that doesn't want to be helped.


The trouble with these efforts is that when you're done, it's bigger, slower, and was expensive.

Looks like the bow wave here has swamped the boat.


Don't. Just leave. To each, their own mess.


Are you looking to hire at all? Seems like a worthy challenge.

There have been some great replies here and I agree step one is version control.


Start with source control, you can get that done in an afternoon and it could save you 20 million dollars a year.


You don't have a software problem, you have a people problem.

You have some directly measurable consequences of the underlying issues, as well as some obvious risks that are generally being ignored. Start with those:

1. Productivity is abysmal. Measure the time to implement a feature to get a feel for how long things actually take. How long does it take a feature from being requested by management to being released?

2. Unstated, but I'm guessing that release quality / safety is generally low. (due to lack of testing / staging / devops / source control). Measure this by looking at whatever system of bugs you can get (even if that's just show me all the emails about the system in the last year).

3. An aggressive roadmap. You're going to have to find some balance and negotiate this. If you happen to find a way to make the software better, but don't deliver any value to the business, you've failed. Learning this developer failure mode the hard way kinda sucks as it's usually terminal.

4. Resistance to change is huge. The team have so far been successful in delivering the software, and their best alternative to changing what they're doing for something else might just be to quit and do that something else somewhere else. What incentive do they have instead to change what they're doing here? This likely involves spending time and money on up-skilling. You've identified a bunch of areas that could be useful, now you've gotta work out how to make that change. E.g. actual time to attend paid courses during work hours on how and why to use git. You mentioned budget issues, but it's worth considering this old homily:

> CFO: "What happens if we spend money training our people and then they leave?"

> CEO: "What happens if we don't and they stay?"

5. You can see a bunch of risks, and the team knows them too. Right now, the team probably mitigates them informally with practices learnt from experience. (E.g. the add a new table with a join approach). Because the risks are adequately mitigated in their minds, there really isn't a problem. You're the problem for not seeing their mitigations. That said, by taking the approach of getting the team involved in risk planning, you may see them reevaluate those approaches and come to some opinions about what they need (i.e. source control, tests, devops, etc.)

6. Your people problem is such that you're going to have to convince the existing team to accept that they made mistakes. However you do that, you're asking the team to stop seeing their output as a success and instead accept that they are failing. This might be the hardest part of any of this. Doing so is going to take untangling the team's identity from their output. If you don't have the soft skills for this, you'll need a mentor or stakeholder who can help you develop them. You will fail if you don't accept this.

7. Lastly, you're fighting against one of Einstein's quotes "We cannot solve our problems with the same thinking we used when we created them". Are you sure you can fix the problems created by the team, using only the members of the team? Unless you can change their thinking significantly, or add more people with different thinking (yourself and one more developer), then you're bound to fail.

I'd echo a bunch of jeremymcanally's comments below [1]

On the technical sides:

1. Buy each developer a copy of "Working Effectively with Legacy Code" by Michael Feathers [2]. Book club a couple of chapters a week. Allocate actual work time to read it and discuss. Buy them lunch if you have to. The ROI of $100 of food a week and several hours of study would be huge. Follow this up with "Release It!" by Michael Nygard [3].

2. Don't rewrite, use the strangler fig pattern [4] to rewrite in place. Others in this post have referred to this as Ship of Theseus, which is similar (but different enough). Spend some time finding some good youtube videos / other materials that go a bit deeper on this approach.

3. In the very short term, try to limit the amount of big changes you're bringing at once. Perhaps the most important thing to tackle is how each page hits the DB (i.e. stand up an API or service layer). If you try to change too many things at once, you end up with too many moving pieces. Once the impact of the first thing is really bedded in and accepted, you've earned enough trust to do more.

4. Stop looking at the symptoms as bad, instead always talk in terms of impact. By doing this you ensure that you're not jumping to a solution before examining whether the issue is as big as it seems, and you acknowledge that each suboptimal technology choice has real business level effect. E.g.:

- Lack of dependency management isn't bad, the problems it causes are the real issue (spaghetti code, highly coupled implementations, etc.). The business values predictability in implementation estimates.

- Lack of source control isn't bad, not being able to understand why a change was made is the real problem. The business values delivering the correct implementation to production.

- Lack of automated testing isn't bad, but spending time on non-repeatable tasks is a problem. The business values delivering bug free software in a reasonable time.

- Lack of caching isn't a problem, but users having to wait 30 seconds for some result might be (or might not if it's something done infrequently). The business values its users time as satisfied users sell more product.

[1]: https://news.ycombinator.com/item?id=32883823

[2]: https://www.oreilly.com/library/view/working-effectively-wit...

[3]: https://pragprog.com/titles/mnee2/release-it-second-edition/

[4]: https://martinfowler.com/bliki/StranglerFigApplication.html


Start with getting your source control and deployment in order. If you have to, lock down production so that the only way to deploy is via a checkin. Then fix the rest of the ops and get all the configs into source control, especially the NGInX config. Make sure memcache is set up for scaling later.

Then start in on the code. Start by writing some basic tests (you'll probably have to do this as a series of curl commands because it's unlikely the interfaces are clean enough to do it any other way). You'll need the tests to make sure everything else you do doesn't break major functionality.
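
As a sketch of what those curl-style smoke tests could look like (the base URL, the endpoint list, and the golden-file scheme below are placeholders, not details from the post):

    <?php
    // smoke.php - hit key URLs and compare against saved "golden" responses.
    // Run with: php smoke.php
    $base      = 'https://www.example.com';        // placeholder host
    $endpoints = ['/', '/products', '/cart'];      // placeholder: your real top pages

    foreach ($endpoints as $path) {
        $ch = curl_init($base . $path);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_TIMEOUT        => 10,
        ]);
        $body   = curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        $golden = __DIR__ . '/golden/' . md5($path) . '.html';

        if ($body === false || $status !== 200) {
            echo "FAIL    $path (HTTP $status)\n";
        } elseif (is_file($golden) && md5($body) !== md5_file($golden)) {
            echo "CHANGED $path (differs from $golden)\n";
        } else {
            if (!is_file($golden)) {
                file_put_contents($golden, $body); // first run records the baseline
            }
            echo "OK      $path\n";
        }
    }

In practice you'd compare selected fragments or just status codes rather than whole bodies, since dynamic bits (timestamps, CSRF tokens) will otherwise flag every run as CHANGED.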

Then do the easy stuff first. Fix the parts that curl itself and make it a real API call. Fix the dependency management. Compress the NGInX file by eliminating whatever rewrites you can by adding routing into the code. Test often, deploy often.
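
For the routing piece, one hedged sketch of what "adding routing into the code" can mean: a tiny front controller that nginx falls back to, so blocks of rewrite rules can be deleted once their paths are listed here (the file names below are invented):

    <?php
    // index.php - catch-all entry point. In nginx, a single
    //   try_files $uri /index.php?$args;
    // can then replace whole blocks of rewrites as they get migrated.
    $routes = [
        '/products' => __DIR__ . '/legacy/products-list_v3.php',  // invented names
        '/cart'     => __DIR__ . '/legacy/cart_new_final.php',
        '/account'  => __DIR__ . '/legacy/account-john_v2.php',
    ];

    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

    if (isset($routes[$path])) {
        require $routes[$path];   // hand off to the untouched legacy script
    } else {
        http_response_code(404);
        echo 'Not found';
    }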

Enable tracing to figure out what code can be safely deleted. See if you can find old versions sitting around and do diffs.

Replace all the code that accesses the data store with a data access layer. Once you've done that, you can bring up a new data store with a proper schema. Make the data access layer write to the new data store and do queries by joining the old and new as necessary. If possible have the data access layer write any data it reads from the old data store into the new one after it serves the request, and read first from the new data store. Log how often you have to read from the old data store. In theory this will go down over time. Once there isn't a lot of reads from the old data store, write a program that runs in the background migrating the remaining data.
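
A rough sketch of that read-through/backfill data access layer, assuming MySQL and invented table/class names:

    <?php
    // Read from the new store first, fall back to the old one, and backfill.
    class ProductRepository
    {
        private $newDb; // PDO for the new, properly structured schema
        private $oldDb; // PDO for the legacy schema

        public function __construct(PDO $newDb, PDO $oldDb)
        {
            $this->newDb = $newDb;
            $this->oldDb = $oldDb;
        }

        public function find($id)
        {
            $stmt = $this->newDb->prepare('SELECT * FROM products WHERE id = ?');
            $stmt->execute([$id]);
            if ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
                return $row;
            }

            // Miss: log it so you can watch legacy reads trend toward zero.
            error_log("legacy read: product $id");
            $stmt = $this->oldDb->prepare('SELECT * FROM products_legacy WHERE id = ?');
            $stmt->execute([$id]);
            $row = $stmt->fetch(PDO::FETCH_ASSOC);

            if ($row) {
                // Backfill so the next read is served from the new store.
                $this->newDb->prepare('INSERT IGNORE INTO products (id, name, price) VALUES (?, ?, ?)')
                            ->execute([$row['id'], $row['name'], $row['price']]);
            }
            return $row ?: null;
        }
    }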

Most likely you can do all of that without anyone really noticing, other than teaching them a new way to write code by doing a checkin instead of in production. Also you'll have to teach them to use the data access layer instead of directly going to the data store.

After you've done all that, don't try and rewrite the code. Spin up a new service that does some small part of the code, and build it properly with frameworks and libraries and dependency management and whatever else makes sense. Change the main code to call your service, then delete the code in the main service and replace with a comment of where to find the new code. Maybe if no one else is working on that service they won't notice. Make sure new functionality goes in the new service with all the dependency management and such.

Keep doing that with small parts of the code by either adding into the new service or spinning up new micro services, whichever way you think is best. Ideally do this in the order of how often each function is called (you still have tracing on right?). Eventually most of the important stuff will be moved, and then you can decide if you want to bother moving the rest.

Hopefully by then you'll have a much better velocity on the most important stuff.


This sounds like a dream to fix.

I'd start by small incremental changes. A big change will be resisted.

Deployments first, separate environment next, etc.


I've not yet seen any comments along the lines of: ask your team what they think is wrong and what can be improved.


Remember that if any changes you implement impact that $20M in revenue, you will be blamed, not the code.


Hard to say from afar, but having a v2 MVP totally from scratch might actually be better in this instance.


Currently 15 months into a similar situation. Successful product with year-on-year revenue growth. Key lessons learned:

- You need to get an understanding of why things are the way they are. A team of 3 people seems small. Is the team always in firefighting mode due to the business constantly dropping things in their lap?

- Do not attempt a full rewrite. Here be dragons & krakens.

- One of the first things to do is to get your code into source control before you do anything else. That gives you insight into how often the code changes and in what way it changes.

- The routing, templating, caching, curl requests, and dependency management issues all stem from the no-framework issue.

- You are going to face varying levels of resistance. Part of that is going to come from the business side of things.

My suggestions:

- You need to get management to understand the problems and on board with reform as soon as possible. Avoid framing the issues as technical problems. Explain the potential risks to the bottom line resulting from business continuity failure or regulatory/compliance failure (especially if your industry is health/finance/insurance). If management is not on board, your reforms are very likely going to be dead in the water. Might be best to cut your losses.

- Get your code as-is into git ASAP.

- You will need more hands. At the very least, you need a senior who can help hammer things into a structured pattern that the juniors can follow.

- Carrot is going to be much more effective than stick for convincing your devs to adapt to new changes. Understand their pain points and make sure to frame things as not questioning their competence. The understanding needs to be that their time is valuable and should be spent on things that deliver the most value to them and to the business.

- The business unit needs to rework their aggressive roadmap. I suspect there's an element of "we always have delays in releasing, so we need to keep the pressure up on developers to keep momentum up". You need some kind of process in place for managing roadmaps (we're currently working our way towards scrum across the business; it's difficult, but persistence even in failure is important).

- We've attempted rewrites of one of our products. It took much longer than we planned (currently still in progress). What we're currently doing is using Laravel as a front end to the legacy apps (Laravel receives the request and passes it on to the legacy app in the same request). It is working well so far and has the advantage of allowing us to use Laravel's tools (query builder, Eloquent, views, etc.) in the legacy app. Then we can progressively modernize the legacy functionality and move it fully into Laravel.

Also, remember to breathe and take a break now and then. Wishing you good luck. If you want to talk more or just vent, hit me up at voltageek [at] gmail.com.


Why are running on PHP and not using any framework among the items that make this the 'worst'?


In a situation like this, a full rewrite is almost always tempting and almost always a bad idea.


Revenue is $20 million annually, there are 3 developers and budget is tight? Can we drill into this?


You don’t manage this team directly. You don’t have authority or responsibility to fix this.


First off, congratulations! I think you’ve found yourself with an amazing opportunity.

First and foremost, always remember that you and your team are there to support that revenue stream. At the moment the junior developers have done that, but it sounds like they are at an inflection point and need help moving on.

Either one of two cases exists. The current state of software development is holding the business back from growing, or the business is near its limit, but the software is still a possible source of expense, reducing profit, either through excess maintenance or potential for failures.

In either case, your job is not to fix, it’s to lead and help.

First listen to each one of the developers in detail. Find out what they think are the problems. What difficulties they have on a day to day basis. Then teach them.

Perhaps they complain about losing work, or problems merging code. Teach them source control. Perhaps they really fear production changes causing outages. Teach them how to use a staging stack.

If the business is making revenue and is sustainable, then you've got time and space. And always remember: that revenue is your goal, along with your team's productivity. Your goal is not your own happiness with the stack.

If you stick with this company, the opportunity for personal and professional growth is incredible. You’ll learn skills you’ll use for the rest of your career.

So stick with it. And just remember, everything you know about how to run software development is the end goal. It’s where you want those developers to be at the end of the journey. But always listen to them first, and help them by teaching them how to help themselves.


20 million dollars a year, team is 3 people, budget is really tight????


Build a deployment server and a dev server. You can do this without the team knowing.

Do a SWOT analysis with the team. Make them answer why it takes days to make simple changes. Make them answer how they'd recover prod if the disks died.

Block access to prod. The team has to code on dev and upload their artifact to CI/CD.

They'll hate the change but it's policy and it's enforced. What are they going to do?

Block artifact upload to deployment. They have to merge a branch instead. Be extremely available to help them learn the SCM tool.

They'll hate the change but policy, etc.

Set up a work tracker that lets you link bugs to features. Populate it with historic data. Triage it extensively. Show the team how each bug comes from an earlier change. Show the team git bisect. (You'll need a test server at this point.)

Set them a target: average time per feature or issue. You'll abolish this metric once it's attained for the first time. In the meantime, it's hard to game the metric, because the codebase is fucked.

Wait, and see if they come up with anything on their own - dinner is cooked when it starts making interesting thoughts.

If they fail to work it out, you'll need to coach them. Give them little breadcrumbs.

You want them to understand:

- slow delivery == poor business outcomes

- bugs == poor business outcomes

- git helps with bugs

- CI/CD lets you write code

- testing reduces (delivery time + bugfix time)

Only when the team understands this can they do the work of fixing the app. (IMO that's a total rewrite, but you're not short of advice ITT.)


How can the budget be tight when the code generates $20 million a year of revenue?


My $work had been in that situation a couple years before I'd started. I've been working on revitalizing/removing some of the C we had for several years, while working alongside people similarly managing the php.

I don't have silver bullets for you, but hopefully you can benefit from my experiences.

> - this code generates more than 20 million dollars a year of revenue

Priority 0: don't fuck this up. Proceed cautiously, with intention. Focus on observability before you make changes. Get some sort of datadog type product, or run something in house.

Start building the culture of understanding risk, mitigating risk by having monitors. Get the other developers on a pager duty rotation, work to get them personally invested in operational excellence.

Get management on board with investing time in it: it's risk mitigation for their business. Get any incidents in front of them. Explain how and why each one happened, what led up to it, and the things you're considering doing to remediate. Track how much time winds up getting spent there, and use that as an argument to proactively fix things. Most management will understand that if you're getting randomized, you're not able to make progress on any single issue.

Work on getting a docker compose setup going so you can easily create a dev environment that looks exactly like production.

Use that to start creating black box tests. Consider things like selenium or postman. Your goal is to test as though you were your user and have no clue about the internals of the program. You do this so that when you make changes, you're not having to update tests as well. Write the tests first. Think in terms of TDD given/when/then. As you add new code, write unit (and integration) tests. Don't try to unit test existing code unless it's very simple.

> it runs on PHP

I feel the pain. Part of the engineering challenge here is accepting unfortunate initial conditions. Your goal is to raise the bar to sustainable.

> it has been developed for 12 years directly on production with no source control ( hello index-new_2021-test-john_v2.php )

Priority 1: get this in source control. If you can't get the other devs immediately onboard, copy what's in production to your local machine and start a git repository. If you need to copy down files after they've changed them in production, that's ok, just start building the repository and tracking some history.

After rsyncing changes down, you'll be able to diff with your latest checkout to see what changes have been made.

This is another mentoring opportunity. Show the other devs how using git is making your life easier. Show them how it's helping you manage the risk that they're afraid of.

Ideally, get a gitlab or github account for it, and start getting a CI pipeline going. Proceed slowly here and make sure you build consent from everyone. Maybe start with a private gitlab account and again, show the other devs how it's saving you time.

The first iterations may just be starting a Dockerfile to recreate the production environment.

> - it doesn't use composer or any dependency management. It's all require_once.

> - it doesn't use any framework

The silver lining here is that it means you don't have any external dependencies :) My biggest concerns here would be:

- is it using PDO/mysqli, or is it on the legacy mysql extension?

- is it using parameterized queries, or are you going to need to audit for SQL injections?
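
If the audit does turn up string-built queries, the shape of the fix is mechanical; a before/after sketch (connection details are placeholders):

    <?php
    // Before (injectable, and relies on the removed mysql_* extension):
    // $res = mysql_query("SELECT * FROM users WHERE email = '" . $_GET['email'] . "'");

    // After: PDO with a prepared, parameterized statement.
    $pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'secret', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
    $stmt = $pdo->prepare('SELECT * FROM users WHERE email = ?');
    $stmt->execute([isset($_GET['email']) ? $_GET['email'] : '']);
    $user = $stmt->fetch(PDO::FETCH_ASSOC);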

Chip away at this over time. It's not urgent. I'm sure many HN heads may explode at that thought -- but until you've triaged everything, everything seems urgent. You've got unmet basic prerequisites here. Say you start fixing sql injections before observability -- how do you know you haven't accidentally broken some page?

> - no caching ( but there is memcached but only used for sessions ...)

Nothing to fix! Wonderful! You can figure out a good caching strategy after everything else is under control.

> - the routing is managed exclusively as rewrites in NGInX ( the NGInX config is around 10,000 lines )

Having it centralized is actually a bit of a blessing. It means you're not having to scour the application for where it's being routed.

Start collecting nginx access logs, and getting metrics on what the top K endpoints are. Focus on those. Configure it to have a slow request log as well as an error log.

Do yourself a favor and setup the access log to use tabs to delimit fields. It'll make awking it, or pulling it into a database for querying much easier.
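
As a sketch of the payoff: once the log is tab-delimited, a throwaway script (or a couple of awk one-liners) can give you the top-K list. The column position below is an assumption about your log_format:

    <?php
    // top_paths.php - tally the 20 most-hit request paths.
    $counts = [];
    $fh = fopen('/var/log/nginx/access.log', 'r');
    while (($line = fgets($fh)) !== false) {
        $cols = explode("\t", rtrim($line, "\n"));
        $uri  = isset($cols[3]) ? $cols[3] : '';   // assumption: $request_uri is column 4
        $path = strtok($uri, '?');                 // drop the query string
        if ($path !== '' && $path !== false) {
            $counts[$path] = isset($counts[$path]) ? $counts[$path] + 1 : 1;
        }
    }
    fclose($fh);
    arsort($counts);
    foreach (array_slice($counts, 0, 20, true) as $path => $hits) {
        printf("%8d  %s\n", $hits, $path);
    }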

> - no code has ever been deleted. Things are just added . I gather the reason for that is because it was developed on production directly and deleting things is too risky.

The silver lining is that the unused code is inert. This is another "chip away with time" type task. Start finding paths that haven't had requests in N months. Use analysis tools to show that something isn't ever called. When someone starts with the "well, what if...", remind them that it's in the repository, and isn't gone forever. It's just a revert away.
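
One cheap "analysis tool" for this is the tombstone trick: drop a one-line logger at the top of files you suspect are dead and see whether it ever fires. A sketch (the log path is a placeholder):

    <?php
    // tombstone.php - require_once this at the top of a suspected-dead file:
    //     require_once __DIR__ . '/tombstone.php'; tombstone();
    // If the log stays silent for a few months, the file is a safe delete.
    function tombstone()
    {
        $caller = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS, 1);
        $entry  = sprintf("%s\t%s\t%s\n",
            date('c'),
            isset($caller[0]['file']) ? $caller[0]['file'] : 'unknown',
            isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : 'cli'
        );
        file_put_contents('/var/log/app/tombstones.log', $entry, FILE_APPEND | LOCK_EX);
    }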

A big theme here is fear. You need to start instilling confidence and resiliency in the team.

> - the database structure is the same mess, no migrations, etc... When adding a column, because of the volume of data, they add a new table with a join.

This isn't the worst thing in the world. It's also not urgent. Start putting together ERD diagrams, get the schema in source control, get a docker image going so that you can easily stand up a test database in a known state, nuke it, and start over.

> - JS and CSS is the same. Multiple versions of jQuery fighting each other depending on which page you are or even on the same page.

Slowly work on normalizing the jquery version. Identify all the different versions used, where they are, and make a list. Chip away at the list.

> no MVC pattern of course, or whatever pattern. No templating library. It's PHP 2003 style.

Not the end of the world -- this is pretty low on the priority list. Both are luxuries, and you're in Sparta. Start identifying the domain models, define POD classes for them, and start moving the CRUD functions nearby. The CRUD functions can just take the POD classes and a database connection.
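
A hedged sketch of what "POD class plus nearby CRUD" can look like (names are invented; the upsert is MySQL-flavored):

    <?php
    class Product
    {
        public $id;
        public $name;
        public $price;
    }

    function product_find(PDO $db, $id)
    {
        $stmt = $db->prepare('SELECT id, name, price FROM products WHERE id = ?');
        $stmt->execute([$id]);
        $stmt->setFetchMode(PDO::FETCH_CLASS, 'Product');
        $product = $stmt->fetch();
        return $product ?: null;
    }

    function product_save(PDO $db, Product $p)
    {
        $stmt = $db->prepare(
            'INSERT INTO products (id, name, price) VALUES (?, ?, ?)
             ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price)'
        );
        $stmt->execute([$p->id, $p->name, $p->price]);
    }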

> In many places I see controllers like files making curl requests to its own rest API (via domain name, not localhost) doing oauth authorizations, etc... Just to get the menu items or list of products...

Same as with the domain model and jquery: make a list, chip away over time. Be sure the curl calls have timeouts. Slowly replace the self-http-requests with library calls. Explain that if you only have N request workers and all N requests are making subrequests, there won't be any workers left to serve those subrequests, and they'll fail.
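
While those self-requests still exist, a stopgap worth applying everywhere is to bound them; a sketch with an invented endpoint:

    <?php
    $ch = curl_init('https://www.example.com/api/menu-items'); // placeholder endpoint
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 2,   // give up connecting after 2s
        CURLOPT_TIMEOUT        => 5,   // hard cap on the whole request
    ]);
    $response = curl_exec($ch);
    if ($response === false) {
        error_log('menu API call failed: ' . curl_error($ch));
        $response = '[]'; // degrade gracefully instead of tying up a worker
    }
    curl_close($ch);
    $menuItems = json_decode($response, true) ?: [];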

> - team is 3 people, quite junior. One backend, one front, one iOS/android. Resistance to change is huge.

This is a bit of a social problem. 4 people can be very effective though, if you're all working together well. Get to the root of why they're resistant and fix that. Are they just set in their ways? Afraid?

Work with them to rank their top 3 challenges, and work through what solutions may be.

> - productivity is abysmal which is understandable. The mess is just too huge to be able to build anything.

> This business unit has a pretty aggressive roadmap as management and HQ has no real understanding of these blockers. And post COVID, budget is really tight.

Measure, so management gets some visibility. Push back on work if you don't understand it. Be clear about what a "definition of ready" and "definition of done" is.

Don't stop the world to fix things. Consider having one person working on a fixup project while everyone else gives them cover by taking on the management ask.

> I know a full rewrite is necessary, but how to balance it?

I can't emphasize it enough, do not rewrite -- resist the urge, if you can't, find a different job. One of the challenges here is integrating with respect to time.

If you have no observability, and no tests, how are you supposed to even show that your rewrite behaves correctly? And if your team is so afraid of breaking something to the point of never deleting code, how do you expect them to handle deleting all the code?

That's about all I've got in me. I hope you're able to implement some meaningful change. Take it one day at a time, and just try to make it better than it was the day before. Good luck :)


So, I do have a war story about something like this, possibly in a worse state, and possibly with somewhat higher stakes (around $400M/year) at the time. I came in as a consultant with my own "parallel implementation team". In my case I was somewhat lucky because most of the system was composed of batch jobs. They did have "frameworks" with "ORMs", but they had 4 or 5 of them, with many files pinned to some older version, which meant that in practice there were dozens.

There were thousands and thousands of business rules, and no one knew why they were there or whether they were still relevant. I remember one fondly: If product=="kettle" and color=="blue" and volume=="1l" then volume=1.5l... This rule, like many others, would run on the millions of product lines they would import daily. And the cutest thing in the system was that if any single exception happened during a batch run... the whole run would fail. And every run would take close to 15 hours (sometimes more).

Not going into details ... But they couldn't afford the run going over 24 hours... And every day they were inching closer.

Similar to OP they extensively used EAV + "detail tables" to be able to add "things" to the database.

The web application itself was similar but less of a time-bomb. It was using some proprietary search engine that was responsible for structuring much of the interaction (a lot of it was drill-down in categories).

Any change on the system had to happen live with no downtime. Every minute of downtime was $1,000 in lost revenue.

The assumptions we had were:

1. At some point the system will catastrophically fail, so 100% of the revenue will be lost for a long time.

2. Even if it were possible to rewrite the system to the same specs (which it wasn't, because no one knew what the system actually did), such a rewrite would probably be delivered after the catastrophe.

The approach we used was:

1. Instrument the code - see what was used and what wasn't. We set some thresholds, and we explained to the stakeholders that they were potentially going to be losing revenue/functionality. And we started NoOping PHP files like crazy. Remember, whatever those files did, the worst thing they could do is raise an exception.

2. Transform all batch jobs to async workers (we initially kept the logic the same) - combined with #1, this allowed us to group things by frequency.

3. Rewrite the most frequent jobs in a different language (we chose Ruby) to make sure no debt could be carried over. NoOp the old code.

4. Proxy all HTTP traffic and group coherent things together with front controllers that actually had 4 layers: "unclean external" - whatever mess we got from the outside; "clean internal", which was the new implementation; and "clean external" and "unclean internal", which would do whatever hacks were needed to recreate side effects that were actually necessary. The simple mandate was that whenever someone made any change to frontend code, they needed to move the implementation to "clean external".

5. We ported over the most crucial, structuring parts to Ruby as independent services (not really micro-services, just reasonably well-structured chunks that were sufficiently self-contained). If I remember correctly, this was something of the size of "User" and "Catalog browser"; the other things stayed as PHP scripts.

6. And, with savagery, any time we got the usage levels of anything low enough... we'd NoOp them.

Around a year in there was still a huge mess of PHP around but most of it was no longer doing any critical business functions. Most of the traffic was going through the new clean interfaces that had unit tests, documentation etc. I think that 100% of the "write path" was ported over to Ruby. A lot of reports (all of them?) and some pages were still in PHP.

I don't think anyone ever noticed all the functionality that went away. We had time to replace the search engine with Elastic Search. It wasn't clean by any means but it was sturdy enough not to have catastrophes.

The company was bought by some corp around that time... and they transitioned the whole thing to a SaaS solution. I had no longer been involved for quite a while, so I only heard about it later. But we bought them that extra year or more.

So, as far as recommendations go:

1. Instrument the code (backfire.io!)

2. Find bang for the buck and some reasonable layer separation, and do it chunk by chunk.

3. Don't try to reproduce everything you have. Go for major use-cases.

4. Communicate clearly that this is coming with functionality loss.

5. Be emotionally ready for this being a long, long journey.


I've been in a situation like this before. Lots of good advice here. I ran eng for a product that supported an entire company for a while, and the company was eventually acquired. I don't want to repeat stuff from all the comments so feel free to reach out (my contact info is in my profile) if you want some 1:1.

tl;dr - Don't rewrite, focus on the biggest pain points first and work your way down. Build a framework in which the junior devs can work on new stuff while you untangle the big ball of spaghetti - they'll think they're doing the big fun stuff and feel like they've won, while you'll be able to be heads-down making things better in the long run. If there's any analytics, you can use that to justify some big changes if you can show that inefficiencies (like poor DB performance and cache usage) affect revenue.


I think you may need to use the probable security dumpster fire lurking in code as an impetus for change.

I find it a little shocking that 3 junior engineers can’t be convinced to learn/try something new that might look good on their resume or make their lives easier.


Sounds like facebook, but with smaller revenue stream haha


Resign and work on something interesting. Not worth your time.


If it earns 20 million + each year, I'd leave it alone.


New company. New code base. Marketers are a dime a dozen


What's the problem? This is the best work imaginable.


Can you tell us what the code does? Just a bit curious.


If the code is so bad, how is it generating $20M a year?


I would do it layer by layer, like clean architecture.


What does the code actually do? Just a bit curious.


but this surely is not possible since leetcode guarantees that you will hire the best engineers? hmmmm mmm mmmmmmm


Walk away.

Life is limited, do you want to spend 5 years of it here?


There’s only one way to fix this mess. Leave.


> I have to find a strategy to fix this development team without managing them directly

Sorry what? What position are you in here? If you have no authority here then you are in a very precarious situation and you should figure that out first.


Step 1: get everything in source control.


Is this a site for a social platform?


$20m/year? With such a terrible codebase too, huh. Wow. Anyway what's the first letter of the place you work at?


But, the code makes 20 million dollars a year. How bad can it be?


«team is 3 people, quite junior» - run, buddy, run..


Just finished a project like that.

The website was terrible. Mixed encodings messed everything up, all the routing was in htaccess going to hundreds of separate files, it was PHP 5.4, there was no version control, and pretty much everything that could be wrong was.

The company had a hot potato that was generating quite a few million pounds. They actually hadn't had a programmer for a year, but fortunately, because of the pandemic, the old one decided to come back.

I had quite a lot of trust from the owners, as I had already developed two other side projects with them in crazy times. They knew they had to invest in technology or die, and they said they wanted to do that.

We had to start from scratch: there was nothing salvageable there, and files were 100k lines long with no indication of which ones were actually run. I wanted to replace it piece by piece, but because of the DB structure and encoding, I really couldn't find a way. Whatever we would have ended up with wouldn't have been a decent thing; it would just have been a Frankenstein that would then have to be rewritten again (although a drastically better one).

I told the owners how much it takes to develop a similar project, and said that this was not an estimate, as no one could know how long it would take (they had tried to redevelop it twice and didn't manage to).

The project struggled with a lot of issues. The other developer couldn't really contribute anything, even though I tried to pull him onto the new code. He ended up taking over the product owner job on my advice, as he was actually really useful to the company, just not as a coder. We couldn't find anyone to hire, even though we paid pretty well and allowed people from anywhere in the world. We found two developers who were pretty skilled but seriously didn't do anything. I often deal with that, but in such a small project it just kills any productivity, including mine.

We managed to publish the project with a large delay. It ended up being super rushed, but we generally managed to sort out most of the issues quite quickly.

We failed on one thing though: SEO. Even though the site improved in all the stats in webmaster tools, the reindexing still hadn't kicked in 5 months later. We hired an SEO agency, but frankly they didn't help at all. The issue had nothing to do with the new site. We were simply in Google's bad graces with the previous site, and Google just ignored our new links while removing the old ones. I knew that would be the case with the new site, but the benefits should drastically outweigh the cons.

At this stage the company literally refused to pay me my shares and some money they owed me (they were broke waiting for a new round of funding, so I gave them a bit of leeway). I had to stop working and I am suing them now.

The moral of the story is:

- Everything will take way longer than you expect.

You cannot divide and conquer such a large project. Any amount of planning beyond a basic one will just be a waste of time, as no one will be aware of all the features, some features are lacking, and some are just plainly stupid. Any scope will change a thousand times. Rewriting partially is way better, but it wasn't possible in my case.

- Business will say they want to invest the money, but they don't understand tech.

All the time, the proposed solution to too-slow progress was hiring more devs. Man-hours are almost never the solution; time is a far more important investment. There also has to be contingency, as something will go wrong. In my case the SEO issue will be solved, but it might take 3, 6, or 12 months, during which time the business will have to lose some sales.

- Communication with business is very hard.

At the same time, you need to explain that stuff will go wrong (unless you have an unlimited amount of time and resources) and will be delayed, while also asking them to invest money in it. Frankly, that's where I failed the most. What I would make very clear now is that those issues were caused by a lack of investment over the years.

- Only go to it with good team.

I managed to build a very good team in the end, but it took a lot of bad apples, and a lot of my wasted time, to get there. People had the skills, but half of them tried to rewrite every piece of code without developing anything useful, and the other half just did not do anything for weeks.

My view on that is: unless the business understands the need to change, has the money and time to do it, and you have a good team, don't do it. Seems like you are 0 for 4.

Some businesses cannot be saved. It's their fault they didn't invest any money over the years, and if you want to do tech, you need to have tech people in management or on board seats.


This is a tough tough situation. There are no easy answers or quick wins here. So before we even think about code, let's ask some questions...

1) You said you can't manage this team directly. Is it your responsibility to make this team successful? I know it's annoying to see a team with horrible code and who refuse to change. But is your manager expecting you personally to fix this? If not, just leave it.

2) Even if it's your responsibility, is this where you want to spend your time? As a leader you have limited time, energy and political capital. You need to decide strategically where to spend that time to have the best impact on your company and to achieve your personal career goals. The fact that you can't manage them directly makes me think that they're not your only job. If it's just one area of your responsibilities, I'd consider letting this team continue to fail and focus on other areas where you can make some wins.

3) Is how the business views this team wrong? They're making a lot of revenue with a very cheap team who seem to be very focussed on delivering results. Yes I know, it's annoying. They're doing everything wrong and their code is unimaginably dirty. But... They're making money, getting results and neither they nor the business see any problem. So again... should you just let it be?

4) Ok, so if you're absolutely committed that this code base has to be fixed... maybe you should just find a different job? Either in the same company or in a different company.

5) Ok, so it's your problem, you want to solve it and you're unwilling to leave. What do you do?

Well, anyone can make a list of ways to make the code better. Because this team has been doing everything perfectly wrong, it's not hard to find ways to improve: source control, automated testing, CI/CD, modern libraries, SOLID, clean architecture, etc, etc.

You can't quietly make the changes, because the team doesn't agree with you. And even if they did, this hot mess is way past the point of small fixes. You need to put in some solid work to fix it.

So you need buy in from management. You either need to deliver less while you improve the code base or spend more money on building a larger team. But since they see no problem, getting their buy in won't be easy.

Try to find allies, make a pitch, frame the problem in business terms so they understand. Focus on security risks and reputational risks. And don't give up. You may not convince them today, but if you make a pitch, they will remember in 6 months time, when this team is still floundering. They will remember that you were the person who had the answers. And then, they may come back and give you the time and resources you need to clean up the code base.

So in conclusion. If it's not your problem, ignore it. If you have other teams to manage that aren't a mess, focus on them and let this one fail. If you're going to be responsible for this pending disaster, quit. If you absolutely insist on making a change, start with getting buy in from management. Then incrementally work down the technical debt.


When I started at my previous job as an IC, things looked similar - although they were at least using git already to share the code (deployments were made by uploading files to production anyway). The team was made up of a grumpy solo dev, an overly enthusiastic, hacker-type CTO, a very thoughtful but introverted engineering manager, and three junior devs. No tests, no migrations, secrets all over the place, no running locally, layers upon layers of hacks and required files, and a homegrown framework using obscure conventions (my pet peeve: the endpoint handler called was resolved dynamically by combining the request method and the URI part after /api/, so GET /api/foo/bar would call get_bar on the foo controller. As every method was public, this would also work for delete_internal_stuff).

What I did was form a mental plan for how to get the org to a more sensible state - namely, having the application run on a framework, within a container, with tests, and have it deploy from CI into an auto-scaling cluster of container hosts, configurable via environment variables. That was difficult, as the seniors all had reservations about frameworks, tests, and containers. So I went slowly, introducing things one by one, as they made sense:

* I started by rewriting core code as modules, in particular the database wrapper. They had cooked up an OOP abomination of a mysqli wrapper instead of just moving to PDO. So I wrote a proper PDO wrapper that exposed a compatibility layer for the old method calls and provided some cool „new“ stuff like prepared statements (a rough sketch of that kind of wrapper follows after this list). Modules like this could be installed from a private composer registry, which helped justify the need for composer.

* Instead of going for Symfony, I created a very thin framework layer from a few Symfony components on top of Slim. This didn't feel as „magic“ as the bigger options would have, and didn't scare the devs away.

* To build up trust, I added an nginx in front of the old and the new application, which used version-controlled configuration to route only a few endpoints to the new app selectively. This went well.

* Now that we had proper entry points, we could introduce middleware, centralised and env-based config, and more. In the old app, we reused code from the new one to access the configuration. Dirty, but it worked. More and more code was moved over.

* I started writing a few tests for core functionality, which gave confidence that all this was really working fine. I wasn't able to make the other devs as enthusiastic about testing as I would have liked back then, though.

* Testing showed the need for dependency injection, so I introduced PHP-DI, which brought the most elegant dependency injection mechanisms I know of. The senior devs actually surprised me here, as they accepted this without resistance and even appreciated the ability to inject instances into their code.

* Deployments would now require uploading lots of files, so I introduced BuddyCI, which is probably the most friendly CI server. It would simply copy everything from the repository to the servers, which was a large step forward, considering the seniors suddenly couldn't just upload fixes anymore.

* With the deployments in place, I introduced development and production branches, and let the team discover the need for fix and feature branches by itself.

* To avoid having to run both apps and nginx, I added container configuration and docker compose to spin up the stack with a single command. This convinced everyone.

* From there on, I added production-ready containers and set up Kubernetes on Google Cloud (this is something I wouldn't do at most places, but it made sense at this particular org). We deployed copies of the app into the cluster, and set up a load balancer to gradually move requests over.

* One by one, we migrated services to the cluster, until practically all workloads were running as containers. The images were built by the CI, which would also run tests if available, push the images, and initiate the rolling update.

* At this point, things were very flexible, so I could add delicacies like dynamically deployed feature branch previews, runtime secrets, and more.
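
For illustration, a minimal sketch of the kind of compatibility wrapper described in the first bullet (class and method names are invented, not the actual module):

    <?php
    // Exposes the old query-and-fetch-everything style so legacy code keeps
    // working, while offering prepared statements to new code.
    class Db
    {
        private $pdo;

        public function __construct($dsn, $user, $pass)
        {
            $this->pdo = new PDO($dsn, $user, $pass, [
                PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
            ]);
        }

        // Old-style call, kept so existing call sites keep working unchanged.
        public function query($sql)
        {
            return $this->pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
        }

        // The "new" capability that nudges people toward parameterized SQL.
        public function select($sql, array $params = [])
        {
            $stmt = $this->pdo->prepare($sql);
            $stmt->execute($params);
            return $stmt->fetchAll(PDO::FETCH_ASSOC);
        }
    }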

All in all, we went from 80+ bare-Metal servers (some of them not even used anymore) to a 12 node GKE cluster. Instead of manually updating individual files, we got CI deployments from production branches. Secrets in the code were gradually replaced with environment variables, which were moved from source-controlled .env files to cluster secrets. Devs got confidence in their code due to tests, feature branches and local execution. From a custom „framework“, we moved to commonly known idioms, paving the way for a migration to a full framework.

What I didn’t manage was introducing database migrations, disciplined testing, and real secret management.

I hope this helps you, if only to draw inspiration to get started _somewhere_. Best of luck!


As a prerequisite, get the code and database schema (+ scheduled updates to CSV's of number of rows/table, per-table storage size, per-database storage size, etc.) into source control ASAP. You can do this entirely on your own local machine day 1, automating rescanning and committing a diff at least daily. Also regularly commit all of the configuration files for the production environment (devops, installed packages/versions, bash histories, etc.) that you can get access to.

In parallel is a review of the disaster recovery plan... do a full test restore of code + data from scratch!

I would then encourage an evaluation to get the lay of the land. If my intuition is correct, there are high priority problems in production that no one is aware of, well beyond the tech debt.

Start by setting up centralized error logging as quickly as possible, from the simple 404/500 error and database timeout reporting (is there any low-hanging fruit here redirecting URLs or speeding up the DB [indexes]?) to more deeply entangled server-side error reporting... ELMAH was an eye-opener when first dropped into an existing cowboy-style ASP.NET app, I don't know if something similar exists for PHP for free but you could learn a ton just trialing a commercial APM solution (same for db optimization tools).
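
If you end up rolling this by hand before buying anything, here is a sketch of a drop-in prepend file that funnels warnings, uncaught exceptions, and fatals into one log (paths are placeholders):

    <?php
    // errors.php - load on every request, e.g. via auto_prepend_file in php.ini.
    ini_set('log_errors', '1');
    ini_set('error_log', '/var/log/app/php-errors.log'); // central log (placeholder)
    ini_set('display_errors', '0');                       // never show errors to users

    set_exception_handler(function ($e) {
        error_log(sprintf('Uncaught %s: %s in %s:%d',
            get_class($e), $e->getMessage(), $e->getFile(), $e->getLine()));
        http_response_code(500);
        echo 'Something went wrong.';
    });

    register_shutdown_function(function () {
        $err = error_get_last();
        if ($err && in_array($err['type'], [E_ERROR, E_PARSE, E_COMPILE_ERROR], true)) {
            error_log(sprintf('Fatal: %s in %s:%d', $err['message'], $err['file'], $err['line']));
        }
    });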

Then once the fires are identified and maybe even a few are out, analyze available metadata to determine the highest-traffic areas of the application. This combines client-side analytics, server-side logs, and database query profiling, and guides where issues should be fixed and tech debt should be paid down first. You can get down to "is this button clicked" if you need to, but "is this page/database table ever accessed" is helpful when getting started. (It's often nice to separate customers from employees here if you can, such as by IP if working from an office.)

Do you have the option of pursuing hardware upgrades to improve performance? (Is this on-prem?) You might want to dig into the details of the existing configuration, especially if the database hasn't been configured correctly. Which databases are on which drives/how are available iops allocated/can you upgrade RAM or SSDs? One big item here is if you are nearing any limits on disk space or iops that might mean downtime if not addressed quickly.

In the cloud you have opportunity to find resources that are not being used anymore and other ways to cut costs. Here again you can trial commercial solutions for quick wins.

Finally, implement some type of ongoing monitoring to catch anything that happens rarely but may be absolutely critical. This might be best done through an automated scan of logs for new URLs and database queries. After a year to 18 months, you should have a good picture of which portions are completely dead (and can be excised instead of fixed). You can start cutting things out much sooner than that, but don't be surprised if a show-stopping emergency comes up at the end of the fiscal year, etc.!

These are all easily justifiable actions to take as someone hired to get things headed in the right direction, and can earn the political capital necessary to begin pursuing all of the other recommendations in this thread for managing technical debt.

Edit: one mention in the thread of prioritizing restructuring the DB, sounds best but also tough.


Log and review all server-side errors in PHP: https://www.cyberciti.biz/tips/php-howto-turn-on-error-log-f...


simonw's version control plan would be my step 1.

Step -2 is what you are doing now, OP, getting informed about the best way to go about this.

Step -1 is forming the battle plan of what you're going to change and in what order of importance.

Step 0 is communicating your plan to all stakeholders (owners, managers, devs, whoever) so they have an idea what is coming down the pipe. Here is where you assure them that you see this as a long process of continual improvement. Even though your end goal is to get to full VCS/CI/CD/DB Migrations/Monitoring, you're not trying to get there TODAY.

Step 1 is getting the codebase into a VCS. Get it in VCS with simonw's plan elsewhere in this thread. It doesn't have to be git if the team has another tool they want to put in place, but git is a decent default if you have no other preferences.

Step 2, for me, would be to make sure I had DB backups happening on a nightly basis. And, at least once, I'd want to verify that I could restore a nightly backup to a DB server somewhere (anywhere! Cloud/Laptop/On-prem)

Step 3, again, for me, would be to create an automatically-updated "dev" server. Basically create a complementary cronjob to simonw's auto-committer. This cronjob will simply clone the repo down to a brand new "dev" server. So changes will go: requirement -> developer's head -> production code change -> autocommit to github -> autoclone main branch to dev server.

Chances are nobody has any idea how to spin up the website on a new server. That's fine! Take this opportunity to document, in a `README.md` in your autocommitting codebase on the production server, the steps it takes to get the dev server running. Include as much detail as you can tolerate while still making progress. Don't worry about having a complete ansible playbook or anything. Just create a markdown list of steps you take as you take them. Things like `install PHP version X.Y via apt` or `modify DB firewall to allow dev server IP`.

Now you have 2 servers that are running identical code that can be modified independently of each other. Congratulations, you've reached dev-prod parity[1]!

Note that all of these changes can be done without impacting the production website or feature velocity or anyone's current workflow. This is the best way to introduce a team to the benefits of modern development practices. Don't foist your worldview upon them haphazardly. Start giving them capabilities they didn't have before, or taking away entire categories of problems they currently have, and let the desire build naturally.

There are a number of things you mentioned that I would recommend NOT changing, or at least, not until you're well down the road of having straightened this mess out. From your list:

> it runs on PHP

The important part here is that it _runs_ on anything at all.

> it doesn't use any framework

This can come much, much later, if it's ever really needed.

> no code has ever been deleted.

As you make dev improvements, one day folks will wake up and realize that they're confident to delete code in ways they didn't used to be able to.

> no caching

Treat caching as a solution of last resort. If the current site is fast enough to do the job without caching, then don't worry about it.

[1]: https://12factor.net/dev-prod-parity


Do. Not. Full. Rewrite. It would be absolute suicide and almost certainly fail. Just put that option out of your head.

1. Complete a risk assessment. List all the security, business, availability, liability, productivity, and other risks and prioritize them. Estimate the real world impact and probability of the risks, describe examples from the real world.

2. Estimate the work to mitigate each risk. Estimate multiple mitigation options (people are more likely to agree to the least bad of multiple options).

3. Negotiate with leadership to begin solving the highest risk, lowest effort issues.

But before you begin all that, focus on the psychology of leadership. Change is scary, and from their perspective, unnecessary. The way you describe each risk and its mitigation will determine whether it is seen as a threat or an exciting opportunity. You will want allies to advocate for you.

If all of that seems like too much work, then you should probably either quit, or just try to make small performance improvements to put on your resume.


Well, I would prepare 3 envelopes.....

No, seriously, some projects like this are lost causes. The company wants to just get maximal return on minimal effort. A rewrite is going to be a sunk cost with no return.

Basically, your job is to limp it along if you can't prove that a rewrite will make them more money.

If you don't like that answer, you might as well look elsewhere.



