Ask HN: I just inherited 700K+ lines of bad PHP. Advice?
52 points by ohmygord on Sept 22, 2012 | 75 comments
So, I've just inherited a very large, very badly written monstrosity. Including javascript, template files etc, it breaks the 1 million LOC barrier. I'm looking for some advice and strategies that you guys might have used in similar situations, in particular on:

- getting a handle on the code base
- communicating 'progress' to the client
- not losing the will to live

The software is based on vtiger, an open-source CRM that has a (deserved) reputation for being incredibly badly written, and that has since been badly hacked apart by several different companies with wildly differing ideas. My client currently has 150+ installs and 150+ angry clients.

Words fail me trying to describe the state of the software.

- no niceties such as MVC, ORMs, a DBAL, or a modular design
- all DB queries are inline SQL, with tens of inner joins on most queries
- dizzying call stack, yet reams of copy+paste code

The best part: the code will often query the DB and execute PHP code contained in the response, or load and run arbitrary files and modules as dictated by parsing particular DB fields. The one page I have studied in detail generates 105 DB queries in the simple case.

The DB itself is even worse. There are over 600 tables, as well as views, custom functions, cascades and (but of course) triggers. There is no consistent naming schema, very few explicit foreign key references (despite being heavily, heavily entwined) and I have already discovered several tables that don’t have primary keys, but are referenced by exact string matches on things like date stamps.

I won't mention the table-based HTML, javascript, lack of version control etc.

I’m not sure if it’s even possible to give relevant advice (besides perhaps ‘run screaming’), but if anyone here has come through a similar situation and has any advice to share, I would be deeply grateful.

Help me HN - you're my only hope. (PS. 2K char limit sux)

1. Get it onto version control.

2. Make sure there is some workable strategy for deploying and testing the code.

3. Ask somebody to provide you with a list of the changes, or else try to create some kind of diff against the original version of the code. If you can see crazy stuff here then find out who did it...

4. Ask somebody what the biggest bugs are? Which things are causing clients the most problems?

5. Try to establish which convention is 'winning' in the codebase. But you might want to create a more sensible convention which will allow unit testing (start this immediately!)

6. At this point, ask if you can hire people to work on this with you as it's a big problem, and you need to free yourself up for the rewrite.

7. If that isn't possible then leave. You have done enough to make your CV better and a company which passes you something like this does not care about your career.
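
Step 3 above (diffing the inherited code against the original release) could be sketched roughly like this. It is a minimal sketch in Python rather than PHP, chosen only for brevity; the function names `hash_tree` and `diff_trees` are hypothetical, and it assumes you have an unmodified copy of the original source to compare against.

```python
import hashlib
import os


def hash_tree(root):
    """Map each file's path (relative to root) to a SHA-1 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            with open(full, "rb") as fh:
                digests[rel] = hashlib.sha1(fh.read()).hexdigest()
    return digests


def diff_trees(original_root, inherited_root):
    """Classify files as added, removed, or changed relative to the original."""
    original = hash_tree(original_root)
    inherited = hash_tree(inherited_root)
    added = sorted(set(inherited) - set(original))
    removed = sorted(set(original) - set(inherited))
    changed = sorted(
        path
        for path in set(original) & set(inherited)
        if original[path] != inherited[path]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

The "changed" list is the interesting one: those are the files where somebody hacked on the upstream code, and the ones worth a real line-by-line diff.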

First read Fowler's 'Refactoring' book; it was written just for you. Then:

1. Identify a small and easily separable piece of code (what you would call a component in a normal system.)

2. Write tests covering every (important?) edge-case of the piece of code you want to rewrite.

3. Mercilessly refactor until it's nice and squeaky clean.

4. Lather, rinse and repeat.
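
Step 2 above is often done with characterization tests: record what the existing code returns today, odd results included, and check that the rewritten piece returns the same. A minimal sketch, in Python rather than PHP purely for brevity; `legacy_format_phone` and `clean_format_phone` are hypothetical stand-ins, not functions from the actual code base.

```python
import re


def legacy_format_phone(raw):
    """Stand-in for an inherited function whose behaviour must be preserved."""
    out = ""
    for ch in raw:
        if ch in "0123456789":
            out = out + ch
    if len(out) == 10:
        return out[0:3] + "-" + out[3:6] + "-" + out[6:10]
    return out


def clean_format_phone(raw):
    """Refactored version; must match the legacy output on every recorded case."""
    digits = re.sub(r"[^0-9]", "", raw)
    if len(digits) == 10:
        return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    return digits


# Inputs chosen to cover edge cases. Expected values are recorded from the
# legacy code itself rather than taken from a specification.
CASES = ["(555) 123-4567", "5551234567", "", "12", "+1 555 123 4567", "abc"]
RECORDED = {raw: legacy_format_phone(raw) for raw in CASES}


def check_refactor():
    """Return the inputs on which the refactored code diverges from legacy."""
    return [raw for raw in CASES if clean_format_phone(raw) != RECORDED[raw]]
```

An empty list from `check_refactor()` means the rewrite has not changed behaviour on the recorded cases; it says nothing about inputs you did not record.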

And of course, make sure your client acknowledges that it's a giant clusterf... and is on board with you pulling the system out of the stone age.

Also, if you want to make life a bit more interesting for yourself, get the PHP code's AST and programmatically rewrite existing code to shared conventions for kicks.

> First read Fowler's 'Refactoring' book; it was written just for you.

"Refactoring" is not the tool for the job, although it's a nice sidearm.

What OP needs is the big gun, Feathers's "Working Effectively with Legacy Code": http://www.amazon.com/Working-Effectively-Legacy-Michael-Fea...

As the title hints, it was written specifically and expressly for the "I just got a huge amount of complete shit of a codebase shoved onto me, how do I survive?" situation. Just check the TOC of part 2 (the meat of the book): http://my.safaribooksonline.com/book/software-engineering-an...

> And of course, make sure your client acknowledges that it's a giant clusterf...

That's hugely important. Make no promises of delivery, and make sure the client understands it's not a cakewalk.

I started writing some suggestions but you know what - someone is dumping this on you because they didn't care and their predecessors didn't care, etc. They probably make far more than you for doing far less.

The moment you start touching the code, you are going to start being blamed for the nightmare preceding you. It could even affect your career if a future employer researches where you worked previously and is told you made the mess in the first place.

My thoughts are in line with this. Are you (honestly) being hired or promoted to fix this mess, or to keep things running?

If the former, has the incredible scale and scope of this been properly identified, addressed, and acknowledged? Are you guaranteed anything near the resources to (try to) accomplish this (including your own time, without traipsing far into overtime)? Is the current state documented sufficiently to obviate any and all future attempts to blame you?

If the latter (more likely, I suspect), well... I guess the simplest question is, do you have an agenda and an exit strategy that leaves your career intact? (And your health...)

Maybe, given the particulars, this is a real opportunity for you. But that's not spelled out at all, nor obviously implied, in your post. And given that this situation was allowed to develop to this extent in the first place, and that you have angry clients to deal with, right off the bat, it doesn't sound promising.

Do you like playing the role of unacknowledged hero who falls on his sword and is cursed by his clueless fellows, while some other protagonist goes on to get the girl?

There is a lot of downside, here. What's the upside? Do the organization's goals and commitments match your personal ones?

Yes, and the chance that the OP will get the codebase in order sounds minimal. It's simply too large; it would take one person many years to clean up a mess like that.

I've faced similar Augean stables in the past (and present, unfortunately).

I'd suggest that correcting the DB is one of the last things you can do, especially if queries are scattered throughout the code instead of in functions. You could attempt to abstract it with an internal API and as you update the codebase replace with calls to the API. Once fully abstracted, you can then focus on getting the DB corrected and only need to modify the new API functionality.
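
The internal API idea might look something like this. It is a sketch in Python, with the standard-library `sqlite3` module standing in for the real database; the `ContactStore` class, the `contacts` table and its columns are all hypothetical. The point is that callers stop embedding SQL, so the schema can later change behind these two methods.

```python
import sqlite3


class ContactStore:
    """Hypothetical internal API: the only place that knows the contacts SQL."""

    def __init__(self, conn):
        self._conn = conn

    def add(self, name, email):
        cur = self._conn.execute(
            "INSERT INTO contacts (name, email) VALUES (?, ?)", (name, email)
        )
        self._conn.commit()
        return cur.lastrowid

    def find_by_email(self, email):
        row = self._conn.execute(
            "SELECT id, name, email FROM contacts WHERE email = ?", (email,)
        ).fetchone()
        if row is None:
            return None
        return {"id": row[0], "name": row[1], "email": row[2]}


def make_demo_connection():
    """In-memory database standing in for the inherited schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    return conn
```

Each inline query you replace with a call like `store.find_by_email(...)` is one less place to touch when the tables finally get fixed.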

As to the code itself, sit down and map out all the verbs and nouns in the system. If you have a Contact noun, what is the definition of that role and what verbs can be applied to it or what verbs could it do. This gives you a good map for creating functions that can then be used to replace existing inline stuff.

Triage the worst bugs or performance bottlenecks and see if they are particular to a noun and/or verb, which should give you an obvious starting place to begin refactoring. For emergency hotfixes and such, feel free to just tweak the existing crap code but otherwise try and work on your functional units to get ahead of the game.

And always remember, pimpin' ain't easy. ;)

+1 for this. The first job is to stabilise the system by fixing critical bugs.

As you're doing so, move all those queries into one big fat DB class, and when you spot groups of related queries, split them out into their own classes.

The next priority should be to get rid of the PHP from the DB - if need be create another huge class with a zillion if-else statements.
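
One way to read this: instead of executing code fetched from the database, store only a key there and map that key to code that lives in the repository. A sketch in Python for brevity; the handler names and the action keys are hypothetical, and the lookup table is just a tidier form of the big if-else described above.

```python
def send_welcome_email(record):
    return f"welcome email queued for {record['email']}"


def flag_for_review(record):
    return f"record {record['id']} flagged for review"


# Keys are the values a DB field may hold. The code itself stays in version
# control, where it can be read, diffed and tested.
HANDLERS = {
    "send_welcome_email": send_welcome_email,
    "flag_for_review": flag_for_review,
}


def run_action(action, record):
    """Dispatch on a DB-supplied key; unknown keys fail loudly, nothing is eval'd."""
    try:
        handler = HANDLERS[action]
    except KeyError:
        raise ValueError(f"unknown action from database: {action!r}") from None
    return handler(record)
```

Failing loudly on unknown keys also gives you an inventory, over time, of everything the database was being used to trigger.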

You need to modify the code to simplify it. You don't need to improve the design, you just need to dumb it down until you can understand where all the parts are. Stabilise, simplify, then refactor.

But first... version control!

What is most important is what you are trying to achieve. Are you trying to make the system as stable as possible at the lowest cost to the client, or are you trying to bring this system into a future-proof state with a client who is willing to pay for that?

Personally if I were in your position I would explain to the client that I'm happy to temporarily fix some bugs but long term the system needs to be rewritten. 700k lines of code is a lot, but the way you've described it I get the feeling most of that code is needless. Depending on what the system actually does you could conceivably rebuild in a few months.

I've spent quite some time with the cross-component spaghetti code large companies sometimes write.

I've come to believe that the skill of not rewriting from scratch but forcing yourself to slowly refactor (as per Martin Fowler's definition) existing systems into a proper state is one of the most important skills you can develop.

That way, once you've refactored most of the system (which includes adding tests for all the important functionality) you can indeed confidently rewrite everything. If you do it any sooner than that though, you're in for a world of pain.

Second this. The temptation to rewrite from scratch is to be avoided; the code is going to be a mass of edge-cases, and you can't spot them all at once. Rewriting will take just as long as refactoring and introduce new bugs instead of killing old ones.

I was in this exact same situation two years ago, only with an eCommerce platform which shall remain anonymous. The client had gone through 4 companies, trying to get the project built and finished. Nobody had been able to wrangle it clean.

Reluctantly, I took on the project, and started working through it. What I initially estimated would take me a month to untangle ended up taking a year. That's an entire year in the snake pit. And since it was eCommerce, there was serious money on the line when it came to bugs. And there were hundreds that I found.

Just understand the commitment you're making. Make sure your client has the money and the time to make things right. Ultimately, in my project, the client insisted on quick hacks to keep competition at bay, and the code dissolved into a mess. I decided I couldn't keep working with a codebase that was never given a chance.

Understand what you're getting yourself into. Because you're taking over the responsibility of loading a massive crap ton of software into your head for diagnosis. How long do you want to have fragile, crappy, lazy code in your head? Forget about being Superman; you're not going to save a million lines of code, you're going to become the builder of the hacks that work around absurdity. Make sure you understand that.

The burden of broken code you're responsible for, that's always broken in production, is like nothing I've ever encountered. Make sure it's worth it.

So the cold reality for this client is that the codebase will have to be replaced over time. You are not trying to escape legacy simply for the lure of something new, you are trying to escape insanity.

I would talk to the client about focusing your effort on helping them transition, a small piece at a time, to a sane architecture. If they aren't open to that, they aren't your client. I've been a business leader in a position where we had to make really tough and painful decisions about coding projects gone awry. I don't envy their position, but continuing forward with this monster does not seem to be in the long-term interests of the company.

This is the beauty of the web: the transition to a sane architecture can be done page by page, without any visibility to the user. Every time you have to make a change, whether fixing an old feature or adding a new one, you replace it with the new architecture. Even if it's not a whole page, you can load in a partial with javascript. The challenge is holding back from going too deep on the refactoring all at once. Functional tests around the system have to be added to keep your sanity as part of the change process.
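
The page-by-page transition could be sketched as a small front router that sends already-migrated paths to new code and everything else to the old application. Python for brevity; the paths, the `MIGRATED` table and both handler functions are hypothetical.

```python
def legacy_app(path):
    """Stand-in for handing the request to the untouched old code."""
    return f"legacy:{path}"


def new_contacts_page(path):
    """Stand-in for a page rebuilt on the new architecture."""
    return f"new:{path}"


# Grows one entry at a time as pages are rewritten; users see the same URLs.
MIGRATED = {
    "/contacts": new_contacts_page,
}


def route(path):
    """Serve migrated pages from new code, fall back to legacy otherwise."""
    handler = MIGRATED.get(path, legacy_app)
    return handler(path)
```

Removing an entry from `MIGRATED` is also the rollback plan if a rewritten page misbehaves.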

This is the best response. To be a bit more general: evaluate whether the client is actually willing and able to take the sane course of action starting NOW and continuing over the long term. If not, and you stay, there's no advice that's going to be much help. Your life will be hell.

1. I'm so sorry.

2. Set up a development environment and deploy the code there. Get it working. With code that large (and with the added wrinkle of executing code out of the database) changing things is going to be a nightmare of unintended consequences. Getting a testable environment up will let you find those things and help you understand what it does.

3. Get it in version control. This should be number 1: Before you make changes get a baseline of where it was.

4. Find a bug that exemplifies the nastiness of the whole situation and make a fuss. Let everybody know why this bug is so bad and what caused it. This will give your employer a concrete example to look at when you say "this code is shite". Harp on this bug.

5. Fix that one bug. Roll it out. Be a hero.

At this point you'll have a good base line, some credibility, and the organization will understand what a mess they've got. Now you'll have to figure out what you want to accomplish: keep it limping along? Improve it? Rewrite? The above steps will get your feet under you.

I would start by spending a week or two wrapping the application in acceptance tests (Cucumber/Capybara), just so that when you do make a change you can quickly find out, with a semi-decent level of confidence, that things are OK.

I would also recommend Working Effectively with Legacy Code By Robert C. Martin.

Good luck!

"Working Effectively with Legacy Code" is by Michael Feathers. http://www.amazon.com/Working-Effectively-Legacy-Michael-Fea...

Can't agree more with this. The way you reduce risk when migrating code (especially code you don't fully understand) is to build strong integration and end to end tests.

This will allow you to be more aggressive when replacing crap code with new functionality.

Given this huge mound of code, this process will be quite a drag, but it will pay off in spades in the long run. Trust us.

My tips:

Raise your rates. If you just took the gig, you'll have to wait a bit. But you should price your work such that whether they say yes or no, you have no regrets.

Expose the problem. Start inventorying the issues. Track them in the same way that you track other work. As you do things that the client has requested, track actual vs. ideal time. E.g., "This change took 12 hours; if the code base were clean, it would have taken 1."

Estimate the size of the problem. Talk in terms of technical debt. E.g., "Module X needs 120 hours of work to bring the code to commonly accepted standards of code quality." The clients are thinking, "We have a system we paid $1m for, so it's an asset worth $1m." Expose the debt and they will have a better idea of the true value of the code base.
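
The time tracking in the two tips above reduces to simple arithmetic. A sketch in Python; the function name `debt_overhead` and the second task are made up for illustration, and only the 12-versus-1 figures come from the example above.

```python
def debt_overhead(tasks):
    """tasks: list of (name, actual_hours, clean_estimate_hours)."""
    actual = sum(a for _name, a, _c in tasks)
    clean = sum(c for _name, _a, c in tasks)
    return {
        "actual_hours": actual,
        "clean_hours": clean,
        "overhead_hours": actual - clean,
        "multiplier": actual / clean,
    }


TASKS = [
    ("change from the example above", 12, 1),
    ("hypothetical report fix", 6, 2),
]
```

Reporting the overhead per module, rather than one grand total, is what lets the client see where the debt actually sits.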

Look for opportunities to declare tactical bankruptcy. Once you have numbers, you can show that some portions of the code base will be cheaper to rewrite than to clean up. Help your clients make good financial decisions about when to just toss and rewrite particular parts of the code.

Don't let them make you crazy. I'd recommend something like a kanban board to track work and strictly limit work in process. This system is probably a mess because the client is insane. Develop some very clear, very firm boundaries that keep them from driving you crazy as well. If you are lucky, they will, over time, learn from you to behave rationally about software.

You should quit. Life is short, there is no reason for you to spend it doing that.

* Rewrite.

* NEVER modify the existing one. Once you change even one line of a comment, you own all the code and its problems from that point on.

* If a rewrite is not allowed, then ask for a huge pay raise for this work. Basically it is not about money; it is about bringing everyone onto the same page about the status of the existing solution.

* If the above does not work out, prepare to switch to another project, or quit the job totally.

There is absolutely no way to rewrite a million lines of business logic without ending up with an even bigger mess. See also: http://www.joelonsoftware.com/articles/fog0000000069.html

I read that article before and totally agree with the point.

But that situation is different from the one we discuss here.

I do not know more about ohmygord's project, but I mainly want to point out the non-technical side of it. For example, people on the same team may not be technical, and/or may think maintaining the existing solution is simple. I was in a similar situation before, and I was lucky enough to pick the right strategy for dealing with it.

This might sound sacrilegious to the many vim fans here, but get a good IDE. It will help you get a handle on what the code is doing and let you navigate around faster, which is especially handy if the execution path for accomplishing any one thing involves dozens of files. A good IDE will also point out blatant errors, and a really good IDE will point out potential errors as well. I personally really like PHPStorm by JetBrains; the code inspection tool is quite good. I was recently able to cut the size of our code base in half by using it to identify tens of thousands of bugs. A lot of them turned out on inspection to be "this never worked" type bugs in code which, with a little digging, I was able to confirm could never be called. Eliminating code also makes refactoring the remaining code easier because you have fewer interdependencies to worry about.

Don't touch anything. Run. There's no glory for you in this.

First of all, make sure the client and you agree on what the long-term strategy is. Then get buy-in for a first step in this direction.

If the system is as bad as you make it sound, the long-term goal has got to be a complete replacement of the existing codebase. That will usually require as much effort, and thus money, as writing the existing version did. (Experience shows there is no reason to confidently assume otherwise.)

Then, explain how to get there without doing a (hopeless) complete rewrite in one big bang:

First, you need to make a set of decisions for the new code you're going to write, i.e. the language, framework(s), architecture, whatever you want the new code to be based on.

Next, you try to modularize the existing system so that you can replace one tiny part.

That's going to be really, really hard - modularizing a system after it's in production always is. Don't do it all at once: If you've isolated some small piece so that there's a clear interface (based on a programming language API, a database interface or (my favorite) some RESTful HTTP API), rewrite that small piece using your new technology stack and integrate it with the existing monstrosity.

Once you have done that successfully for some small aspect, you have some sort of proof that this approach can work.

Then, over the next months (or more likely: years), rinse and repeat.

This is a hugely expensive thing to do, but that shouldn't come as a surprise – after all, you're replacing the organs in a living body while it's running a marathon on its last breath. An MBA should understand that the additional cost is because this strategy drastically reduces the risk.

You can explain to the customer that they can try this out using just a small part, and decide whether or not they want to continue afterwards. Point out that you're going to start with those parts that produce most of the existing pain. Explain that you're helping them to return to a situation in which they have fewer bugs, can introduce new features quickly and easily, and best of all, that the end result will be a system that's modularized, hopefully ensuring that they won't run into the same situation again.

If they expect you to do magic, i.e. maintain the mess and magically turn it into a good piece of software without being allowed to actually change it significantly, get out of the contract as quickly as you can.

Stefan humbly omitted a reference to his excellent treatment of breaking up a monolithic giant. It's well worth a watch, so here you go: http://www.infoq.com/presentations/Breaking-the-Monolith

You should make sure to set expectations. If your employer only wants the minimal amount of maintenance done, then don't do any more. You could go to heroic lengths to repair the codebase, but if that's not what they're asking for it will be in vain.

Second, I suggest applying as many tools as you can. A modern version control system, of course, and keeping any version control history that you inherited (although it sounds unlikely).

A powerful IDE might also let you start cutting out crap immediately, so try PHPStorm or Eclipse+PHP (or both!) and see what they can tell you.

And start writing tests as you start making changes, because you'll likely break something seemingly unrelated when you start changing things.

700k+ codebase and a single developer? That sounds crazy.

Run away from this. Trying to work with this code would make you stressed and frustrated, which will have a significant negative impact on your productivity.

If the company plans to add features to this software, they should hire more than one developer and perhaps rewrite it from scratch.

Edit: Also send your boss a link to this discussion :)

A few weeks ago I was committed to something like that. I just quit the job; I couldn't sleep at night and I was not making any progress in the first days. I know there is a learning curve, but it had been two weeks and I couldn't do anything. Be sure you can work with that before accepting, because afterwards it is really hard.

I had a similar experience a few months ago, myself, with a huge, very poorly written PHP app. I lasted three weeks; three weeks in which I didn't sleep and felt constantly agitated while I spent nearly every waking moment at the computer trying to be productive amidst an ocean of stress. My whole family suffered from this project, because I became very difficult to live with.

The first thing I would do, before doing any work, would be to sit down with the client/your boss and explain just how bad the situation is, that drastic solutions are in order, and that it will take years to get this under control (with 1m LOC and 150 clients presumably all running customised software, this would take years to sort out even with a large team working on it). Unless they understand that from the beginning you will never get the backing you need to sort this out.

This might be one of the few occasions where a complete rewrite is justified (if you can keep scope limited to reproducing what you have). You've said the code is incredibly complex, and if the problem domain is incredibly complex too, you're probably stuck refactoring. If the problem domain is pretty simple (a CRM without too many extra features might be), you may be better starting with your smallest client who uses the product the least, asking for all the pain points, and things they love, about the current software, and writing a simple CRM to cover their needs which replicates the features of the current product, then gradually porting other clients over to the new system and adding new features to it, while keeping the old code-base in maintenance mode and fixing serious bugs only. If you do a rewrite you'd have to port the 150 clients over 1 by 1, and leave the other code in maintenance mode - your primary client may not be at all happy with that.

If that's not possible, you'll have to refactor it slowly while keeping the code in place, so the first step is to get it into version control, sort out a sane deployment strategy with testing servers, then try improving some small isolated areas of the code for one of the clients in isolation. Good luck!

Lots of good advice in here. I lean toward the "run screaming" side myself.

The fact that they've burned through several contracting companies and still think it's possible to get large numbers of bug fixes in the first month suggests that they're pretty clueless. It's going to take you two or three months (if not longer) just to get your head into the code enough that you can fix anything nontrivial.

If I were advising them, I would say they need two teams: one team of just two or three people (or maybe just one) solely trying to fix bugs in the existing code, and the other team of three or four people doing a complete rewrite from scratch. As others have commented, complete rewrites are normally a bad idea, but this code base is so far gone I don't think it can be incrementally refactored into sanity. Oh, and they should expect the rewrite to take two years.

But despite their experience to this point, it sounds like they're still not ready to hear that. Which leaves you little choice but to run screaming.

EDITED to add: what these people need to understand is that their demand for results in a hurry is what got them into this mess in the first place. Until they get that I don't think there's any hope.

I have actually worked on a code base like you're describing, on a contract basis, for a client. I loathe maintenance programming, so the relationship didn't last very long -- just a few months. So:

1. Make sure you have a rock-solid contract in place with the client that will ensure that you get paid, get paid well, and get paid often. Receiving a check in the mail makes it easier to look at the code. If your payment terms are anything like, "payment-upon-completion of ...", or, "paid net 30 after invoice", or anything like that, you simply won't want to work on the code.

2. If "soul-crushing", "depressing", or "makes me want to hang myself" are phrases you'd use to describe the code or your state of mind when looking at it, then go into this project knowing that you're not going to last long. There are people who genuinely enjoy working on stuff like this. You aren't one of them.

3. Everybody that says "rewrite" is dreaming. It is impossible to rewrite something that large without breaking something and spending too much money. Re-factoring a function is doable. Re-factoring a thousand-line file is doable. Re-factoring part of a database is doable. Re-factoring all of it all at once is starry-eyed fiction. Not gonna happen.

4. But, if taking ownership of this code base is something you want to do, then add re-factoring time in to your agreement with the client -- something like, "20% time spent replacing bad code" -- and focus on the tiniest little ugly thing you can find, and re-factor that. Start on it, don't stop until it's done. Keep it in small bite-sized chunks.

5. Make sure you're getting paid for time spent just getting familiar with the code base. If you work with it long enough you'll actually get pretty familiar with most of it, but you want to do that on their dime, not yours.

6. Get help. They surely realize by now that they've got a mess on their hands. Talk with them about whether or not you can bring on additional help. If they flat-out refuse, run. (That is what killed my work with my client; I wanted to move into a position where I managed a junior programmer and focused on code rewrites and higher-level stuff; they refused, I quit. They wanted an employee, not a contractor.)

7. Version control and a sane bug tracking system (Mantis isn't horrible) are must-haves. If they don't have these, again, make sure they pay for it.

Dealing with a code base like this one is as much about state-of-mind as anything else. Either you can handle it or you can't. No amount of advice here will make it more palatable to you if you're not the sort of person that's OK with inheriting a disaster.

Also, even if you've got some kind of agreement in place with the client already, it sounds like you've just now gotten your first look at the code. This, in my opinion, makes it totally OK to go back to the client and re-negotiate. You can open it with, "I'd like to work with you, but now that I've seen the project that you want me to work on, I can understand why this has been a problem for you, and I need to make sure that we can come to an agreement that will work for both of us so that I can fix this for you." (Or something.)

" Everybody that says "rewrite" is dreaming. It is impossible to rewrite something that large without breaking something and spending too much money. Re-factoring a function is doable. Re-factoring a thousand-line file is doable. Re-factoring part of a database is doable. Re-factoring all of it all at once is starry-eyed fiction. Not gonna happen."

I don't think so. Yes, the opportunities to do something like this are rare. It is up to the project lead and the client to decide whether or not this makes sense.

Also, keep in mind that a "complete re-write" doesn't necessarily literally mean that every line of code must be re-written. There's often tons that can be salvaged.

If the code base is an absolute disaster I would not touch it without the understanding that the project might entail massive re-writing of portions of the code base as well as significant structural modifications. Maybe I'm lucky in that I've never really had to go look for work. I would flat-out reject a project like this without massive client buy-in.

If it is mission critical for the client and they can afford it there is no reason not to step back, truly evaluate the situation and consider a significant redo of the app.

For any non-trivial enterprise having a solid and maintainable code base is nearly priceless. Is it worth investing a year and the corresponding financial commitment to fix the problem once and for all? For the right business, yes! The alternative is to live with a patch-work of code for the next ten years or more.

Because I move across disciplines I have seen this sort of thing in many areas outside of just software code-bases.

I have, as an example, seen data processing facilities with millions of dollars in equipment designed in patch-work fashion that bleed money on a daily basis. In one such case I proposed a complete redoing of the facility (in staged fashion in order to not affect business). It was very costly, but the owners were under such pain due to the constant bleed that they saw the intelligence in investing a lot of money to lay down an infrastructure that would withstand the test of time, not to mention stopping the bleeding.

Similarly, I have seen this in faulty processes. Process optimization or redesign can be critical to a business. The most well known example of this is the automobile industry.

Car manufacturers like Mercedes were devoting fully 20% of their factory floor space to repairs. Cars would come off the assembly line with defects that would have to be repaired after the fact. This consumed a tremendous amount of time, money and resources.

In sharp contrast to this, companies like Toyota were using an approach that aimed to have cars come off the line with zero defects. They'd stop the assembly line when a defect was detected. At first they nearly couldn't make cars. The philosophy was to ensure that detected defects never re-occurred. With time cars started to come off the line with few, if any, defects. Most car manufacturers have now adopted these ideas.

The point is that sometimes a "complete rewrite" is warranted and even necessary. One cannot categorically state that the idea of a re-write is "fiction" any more than stating that it is an absolute necessity while being completely detached from the players and their circumstances. I suggest that it is for the client and consultant to evaluate and decide.

On a personal note: I don't enjoy working with crap. I enjoy my craft, whether it is writing code, designing electronics or mechanical work. I enjoy doing good work and working on quality projects. Life is too short to work on shit projects. You learn nothing and nobody is happy.

Yes, I have often said that you should never, ever throw (significant amounts of) code away and start from scratch. I have worked on two bodies of legacy code before and more or less used most of the techniques discussed in the comments, but this is unprecedented - for me - in both size and badness. Client and I have agreed that I will work for a month and then see where we stand... I'm hoping that after a month I will have a better feel for what's going on and can outline either a staged or complete rewrite.

Lots of great advice in here, has lifted my spirits a bit. Especially getting complete client buy in (which I have internalised but I guess haven't expressed, either to myself or the client).

That's a good approach. Remember, it isn't your problem. It is your client's problem. You are there to help solve his problem. If he doesn't care enough you certainly shouldn't.

In a month you'll know a lot more about what you might be walking into. It is critical that your client also learn what he has to contend with. In other words: Communicate profusely throughout the process.

Funny, I had been considering replying to your other comment in this thread advocating for the rewrite. I don't really think you and I would find that we have very different opinions if we could sit down and talk this out over tea.

For example, I agree that sometimes a rewrite is warranted, even necessary. I just don't think this is one of those times.

First, you have to think about this from the client's point of view. To borrow some of patio11's recent wisdom, this is, to them, a business problem, not a technical problem. You, I, OP, others are inclined to look at the code and tell the client, "You have a problem with your code." But, the client doesn't care if the code is beautiful or ugly. They care about just a few things: whether or not they can hire people to work on it, whether or not bugs can be fixed, and whether or not it does what they want it to do. Does it matter to them if the code produces HTML tables as output or stores called code in the database? Nope.

So when it comes to a rewrite, you have to sell that to them as a solution to a business problem that they have, and unlike your examples, that's much harder than pointing to defects on an assembly line or hemorrhaging money. Put yourself in the client's shoes: why should you pay for the software that you already have to be completely rewritten -- and in the end, if you're lucky, end up with the thing that from your point of view you already have -- instead of just hiring a different programmer who's willing to just keep his head down and fix the bugs?

Of course it's possible to make a bunch of arguments to the client that they are losing money on it and that it is a problem they need to address, and if you're really good at that kind of thing, maybe even convince them that a rewrite is worth their money. But it's hard to do, in practice.

Then we have to consider the costs involved. First, there's downtime. To get it done as expediently as possible, they'd have to give up any hope of having any bugs fixed for, what, the next six months at least? They'd be dealing with the very real business problem of angry customers in the meantime, complaining that the software is broken and it's not getting fixed in a timely manner.

Second, there's monetary cost. Another difference between big, ugly software and your examples is that software can be far, far worse in terms of the amount of effort required to understand it before changing it. That software didn't get to be a million lines of code overnight; it probably started out, in its infancy, as maybe a few tens of thousands piled on top of a bad framework. The rest is probably largely business logic edge-cases and lazy programming. There is almost certainly some really important business logic deeply hidden somewhere in that code; one way or another that business logic needs to be in whatever version of the software they're using. If a full rewrite is done, that means either the programmers read and fully understand the existing code before rewriting it, or they build the new version without that business logic and force the business to deal with a problem that they've already had and solved once before.

What would be the cost of reading and understanding every line of code, and then writing a new version? Conservatively, $1/line? Heck, assuming you could somehow side-step that whole problem, what would the cost be for an entirely new version? Still $1/line? And how much smaller could the new version be? 100,000 lines, a 90% reduction from the previous version? That's still awfully expensive.
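To make that back-of-the-envelope arithmetic concrete, here is a minimal sketch. The $1/line rate and the 90% reduction are the rough guesses from this comment, the one-million-line total is the approximate size mentioned in the thread, and the function name is made up for illustration.

```python
def flat_rate_cost(lines_of_code: int, dollars_per_line: float) -> float:
    """Rough cost of touching every line at a flat per-line rate."""
    return lines_of_code * dollars_per_line

OLD_LOC = 1_000_000   # approximate size of the existing code base
NEW_LOC = 100_000     # hoped-for size of a rewritten version
RATE = 1.0            # dollars per line, a guess rather than a measured figure

read_cost = flat_rate_cost(OLD_LOC, RATE)    # understanding the old code
write_cost = flat_rate_cost(NEW_LOC, RATE)   # writing the smaller replacement
reduction = 1 - NEW_LOC / OLD_LOC            # fraction of code eliminated
```

Under these assumptions the reading alone costs ten times what the new code does, which is the crux of the argument: the expensive part is recovering the buried business logic, not typing the replacement.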

So I think that a full rewrite for this project isn't realistic. It's what many of us would like to dream about -- taking something crappy, throwing it out, erasing it from the world, building something beautiful in its place -- that's psychological sugar. But it's not realistic.

But what can be done is to rewrite it piecemeal. It's like home improvement in that sense -- you don't move in to a home and then demolish it and raze the entire lot and build something from scratch and landscape it and everything. But, you can move in, and replace a water main one month, put in a sprinkler system another month, put sod in the following month, and so on. You can take a really ugly, overwhelming, huge project and break it down into manageable pieces and fix it one piece at a time.

I agree wholeheartedly that life is too short to work on shit projects, assuming of course that you have the money to be in a position to never work on a shit project. That's why I was pretty up-front about this maybe not being a good project to work on. (And also why I didn't stick with my client's project when I was presented with something really similar to OP's.)

My secret weapon is a folding editor called Code Browser. http://tibleiz.net/code-browser/ -- with this, you can do a non-destructive (well, it only adds comments) folding of the source. This is a very fast way to get a better view of what's going on. I've used it many times when trying to make sense of legacy code.

That is, if you choose to go through with it. My real advice would be to avoid it altogether, as many others here have said. It's going to be extremely frustrating no matter how you attack the problem.

Make sure the code (including any stored procedures or code that is otherwise stored in the database) is in a version control repository, because any change in this code might have unexpected, subtle consequences (i.e. introduce more bugs), in which case you'd do better rolling back that particular change.

The next step would be to get a grip on deployment. Automate it, so you can roll out updates and roll back updates to all clients without breaking a sweat.

Then set up a proper backlog and bug tracking system, where you can prioritize bugs and work items. (And maybe open it up for bug reports by clients?)

Just like with a real debt, with a technical debt, seeing progress can help to keep you going. At this point, you should have a grip on it, it's just still going to be a lot of hard work. There's good advice on how to approach the refactoring.

Finally, and this is not related to the code, educate stakeholders in your organization about the concept of technical debt. (Back it up by time tracking various work items from the bug tracker.) Somehow your organization got into this situation, so there may be a problem where new features or custom features for clients get priority before bugfixes, and are written without much guidance. Joel Spolsky has written on this subject, you may find his writings help explain the concept, as well as find a way out of this mess (like the '12 steps to better software').

Good luck!

The problem you will face is that you have no way to verify that you haven't broken something unrelated when you make a change, because the current behavior of the system is unknowable; you can't write comprehensive tests for a codebase that large and that bad because you don't even know what it's supposed to be doing. The chickenshit nature of PHP and the lack of a sensible type system and refactoring tools will make things even more difficult.

So, run. Seriously. You're doomed.

1. Put it under version control, preferably Git. You will need a lot of the tools that Git and GitHub provide. A private GitHub account will do, but hosted GitHub is what I'd prefer.

2. Get a Test System that has enough horsepower.

3. Create a deploy script.

4. Deploy until it seems to work.

5. Start working with CI and static code analysis. You might get lucky when it comes to copy-paste code. Copy-paste detection and coding standards come to mind first, but there are a lot more helpers.

6. Automatically create some API documentation. The worst code can't hide what is inheriting from which class etc. Integrate generation of docs into the CI.

7. Create some basic so-called "smoke tests". I'd prefer some very basic Selenium tests opening the most important parts of the app. This is straightforward. Run them against the app with error logging turned on and error reporting set to E_ALL. This error.log is your scary list.

8. Set up single builds and try to integrate with more than one version of vtiger, PHP and MySQL. Since you have 150 customers, chances are good that you have 150 different setups.

Note: You haven't changed one line of code yet. Sit down with the customer and discuss all your findings and metrics.

9. Start creating different Git repos with the above process for all the modules that were added by your customer. Integrate with the build and run the tests until you have the same number of errors as before. Then start extending the build to run against your MySQL and PHP versions.

... I could go on forever .. but basically this will get you up and running.
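As a rough illustration of turning that error.log into a ranked "scary list", here is a small sketch. It assumes the log uses PHP's default `error_log` line format; the file paths and sample lines are invented for the example, and the helper name is hypothetical.

```python
import re
from collections import Counter

# Matches lines in PHP's default error_log format, for example:
# [22-Sep-2012 10:15:01 UTC] PHP Notice:  Undefined variable: x in /app/a.php on line 12
LINE_RE = re.compile(
    r"^\[[^\]]+\] PHP (?P<level>[A-Za-z ]+?):\s+(?P<msg>.*) "
    r"in (?P<file>\S+) on line (?P<line>\d+)$"
)

def summarize(log_lines):
    """Count errors per (level, file) so the worst offenders sort first."""
    counts = Counter()
    for raw in log_lines:
        match = LINE_RE.match(raw.strip())
        if match:  # lines in any other format are skipped, not guessed at
            counts[(match.group("level"), match.group("file"))] += 1
    return counts.most_common()

# Invented sample data, only to show the shape of the input.
sample = [
    "[22-Sep-2012 10:15:01 UTC] PHP Notice:  Undefined variable: x in /app/a.php on line 12",
    "[22-Sep-2012 10:15:02 UTC] PHP Notice:  Undefined index: id in /app/a.php on line 40",
    "[22-Sep-2012 10:15:03 UTC] PHP Warning:  Division by zero in /app/b.php on line 7",
    "not a PHP error line",
]
report = summarize(sample)
```

Tracking this count per build gives you a number to show the customer before any code has changed.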

* A situation like this is no longer only a technical issue. Your solution needs to reflect that.

* This is not the best situation to be in, but if you learn to deal with it and emerge from it, the experience will make you so much stronger. So be ready to quit, but do not quit too early.

Good luck!

(I replied earlier, but the above two points are so important that they are worth a different post.)

Take a few steps back and relax. If you were looking at this from the Space Station, who would you say has a problem with the code base? That's right, your client. Not you. Your client.

If he/she has 150 installs and 150 angry clients he/she knows that this thing is rotten somewhere. Your client may or may not have some technical understanding but rest assured that they understand business.

Life often boils down to a binary decision. You have two choices. Gracefully exit and move on or try to help your client.

If you choose option B you've also made another choice: Your first job is NOT to be a programmer. No, you are going to have to be a teacher.

You have to do your best to explain to your client why he might be sitting on a ticking time bomb (or whatever you might want to call it). It is imperative that your client understand that he has handed you an ugly, stinking, putrid and smelly mess. Without client buy-in I would walk away.

Now, here's the challenge: You have to find a way to communicate the problem that is not menacingly full of CS jargon and acronyms that mean exactly zero to your client.

I've had to deal with these kinds of problems before. On one or two occasions I made the mistake of not securing an understanding with my client and suffered the consequences. These were miserable walking-through-feces-infested-mud experiences. Never again. Once I learned that lesson things changed. My most memorable experience was when I got client buy-in from a major international corporation and, once they realized that they had a huge problem, they put me up at the Waldorf Astoria in Manhattan for a full month (these guys are so big that they have rooms pre-paid for "emergencies"). Imagine a guy in a t-shirt, jeans and sandals showing up at the Astoria. I've never been looked at like that before. Once they realized who my employer was things changed. Fuck, the room had marble and gold-plated crap everywhere.

But I digress, the point of that last example is that once a client understands the degree of the problem in their hands things change. If having a solution to this problem is important enough there is no end to what they will spend to fix it. Is it a business-killing problem? Even better.

Judging from your description my proposal to your client --after they really, really get it-- is to re-write their entire app from scratch.

I would further propose that you are going to need to hire a few more people (two to five?) in order to get this done as quickly as possible. And, yes, this will be expensive.

You can use many analogies to explain the problem. I'll leave that up to you. I've used ideas like that of constructing a building on a foundation of sand rather than concrete while using substandard supplies rather than industry-accepted good quality building components. Whatever analogy you use, it has to convey the severity of the problem without resorting to CS. If your client has some technical chops you can get into it a little AFTER you are done with your analogy.

Finally, the most important part: You have to be willing to walk away from it. You state the problem and explain that it will be expensive. You also state that you are not interested in anything other than a full re-write of the app because you are not in the business of doing further damage to your clients. Respectfully suggest that without full buy-in you'll need to move on and he will need to find another developer who might be willing to patch this thing up.

In many ways, it's that simple. Two choices.

Excellent, thoughtful and presumably experience-based reply. This gets my vote.

This is a case where having a defined development process and good sharp tools can be very helpful. Here are the steps I'd use to tackle this problem code (though it's based on what you wrote above and might need to be adapted as you learn more).

1) Study how the software is actually used and design the "ideal architecture" (this may be a moving target).

2) Get the software into a version control system.

3) When a section of the code needs work, first write tests that pass for the current functionality of the module but fail for the behavior you're trying to fix.

4) As you repair code in step 3, also migrate the code "towards" your preferred architecture ... this is going to be a very gradual process so don't try to complete it in one step and use your tests to verify you haven't broken the system. This is also a good time to start inserting patterns like MVC/MVP as it will help. - http://c2.com/cgi/wiki?TestEveryRefactoring

5) When you've found "reams of copy+paste code", refactor that code into utility classes (files, whatever). - http://martinfowler.com/refactoring/

6) Establish processes for migrating the database both forwards and backwards between versions (you'll need a rollback someday).

7) Treat the database schema as source code and refactor it as you work. It sounds like you're a long way from being able to use an ORM, but have a plan for migrating the database towards the day you can. - http://martinfowler.com/articles/evodb.html

8) Get the PHP code out of the database ... that's going to be painful but worthwhile.

9) Get some help! I've used the Sonar source code quality analysis tools on Java projects for years. There's a PHP plugin for it here (http://docs.codehaus.org/display/SONAR/PHP+Plugin) and it will help you determine what areas might be worth targeting. It also helps by establishing style and practice rules that will help get a team coordinated.

One of the hallmarks of a project like this is that coding styles changed dramatically during the project's existence - Establishing a style guide (including patterns and forbidding anti-patterns) can be very helpful.

So in short ... don't "run screaming" but rather sit and think when you feel overwhelmed. If you can solve a complex problem when writing source code, you can also solve systemic problems.

Good luck!

I hope you're getting paid well for this.

One of the most important things is just to manage expectations - it sounds like you've got a huge task ahead of you and people will underestimate how long it will take you to fix stuff.

It might also pay to just focus on getting the software into a maintainable but ugly state.

This sounds like a decent candidate for being put into maintenance-only mode while you gameplan a new product. It's pretty impressive the kind of distance you can get with a modern framework these days. You've got the other application there to refer to, so it shouldn't be too hard to port over the more core logic into a service layer that you can actually test.

What I like about the "start from scratch" approach, though most people argue against it, is that it gives you an opportunity to shape the entire development process and architectural philosophy of the product. Sometimes the tree of good software must be refreshed with the blood of bad projects.


But what do you have to do with that? Is it maintenance? Do you need to fix bugs?

If it's maintenance/bug fix, I'd suggest starting by writing tests and fixing things on a day to day basis.

So, for all small tasks that you'll have to do with the code base, just analyse the safest way to tweak it. More often than not, you'll see that it's just changing a couple of lines. If you need to add new features, just code them correctly in another part of the program.

And before you know, you'll understand the code-base. But, tests are really the most important thing here. Don't try to refactor if you can't make sure you're not breaking everything.

I have a similar story... only I inherited terribly written JSP. It was just as bad if not worse than you describe. I ended up re-writing everything from scratch; now, I am so glad I did.

I remember a client asking for vtiger once. I downloaded the source, and didn't even get to installation before I fired the client.

I really, really feel for you on this one.

The only thing I can say is: Make a beachhead of clean, working code. Slowly work your way out. Make it very clear to this client how much of a favor you're doing them, and give them meaningful status reports (even if all you did is rewrite the glue between two pieces of code).

Chances are infinitesimally small that you can fix it. If you start to maintain it, you'll be the one who gets the blame if it doesn't work. In an absolute best-case scenario you get to rewrite it whole in your spare time, while trying to keep the legacy code together with duct tape and chicken wire on your employer's time.

Get out, get out now. They don't need a maintenance programmer, they need a ninja programmer, the liquidator kind.

OK, so everyone else has pretty much covered the arguments for "don't do it, run" and for "re-write it". But assuming that you either

a) have to maintain it anyway (can't afford to lose job etc.)


b) are going to re-write it but don't have a definite spec to know what it has to do

then you are going to need to try and understand the code base. Here are some PHP specific tools to help you.

- Use XHGUI[1] (which is a fork of Facebook's XHProf) to profile the code as it runs. It can draw call-graphs for you (if you have Graphviz installed) which will help you to visualise the code flow.

- Use PHPdoc[2] to generate API docs. This will help you get a simplified overview of the code to use as a reference.

- Use Xdebug[3] as you make changes and execute code to get more insight into how it is running and to trace variables etc. through the execution. You can use KCacheGrind [4] to visualise the output of Xdebug.

- Use a staging/development environment for everything you do with this code, and don't push any changes into production until you really, really have to. When you do, use version control (e.g. Git, SVN etc.) and use an automated build system (Phing[5] is a great PHP specific one) to try and keep everything consistent.

Good luck! Quick plug : I'm currently writing a book [6] about PHP development (called PHP Everywhere : Programming beyond the web with PHP) which covers the tools above (albeit not for the kind of job you are taking on!). The one small mercy you may have when tackling a project like this is that it is written in PHP. PHP is usually quite a verbose language, which, while it doesn't always produce sexy code, does mean that it's straightforward to read and understand (at the local level!). An extra space here and there doesn't usually alter the meaning of the code as it does in some languages!

[1] https://github.com/preinheimer/xhprof [2] http://www.phpdoc.org/ [3] http://www.xdebug.org/ [4] http://kcachegrind.sourceforge.net [5] http://www.phing.info/ [6] http://leanpub.com/php

Hey RobAley, thanks a lot for the tool recommendations... also big thanks for the person who suggested SONAR+PHP, will definitely look at that too.

I was planning on using phpdoc and xdebug, but haven't ever looked at XHGUI. Is it significantly different from xdebug? At first glance there seems to be a fair bit of overlap in terms of functionality.

There is a fair amount of overlap in what they do, the main variation is in the interfaces and how the information is presented. I tend to use one or the other depending on the task at hand. They're both of good "pedigree": xdebug has been around now for about 10 years I think and so has a good amount of history behind it, and XHProf, which XHGui is based on, was developed by Facebook and used against their code base, which is probably somewhat larger than yours (though hopefully better written!). At the end of the day they're both pretty easy to get up and running (and of course they're free), so I would suggest giving them both a test run and see which you prefer the feel of and which better suits your needs in terms of the information it gives you for your task. Given that you look like you will need all the help you can get, you might even end up using multiple tools like this to get as much insight into the code as you can. If you do, be aware that they can often interfere with each other (or so I've read, I've never tried those two on the same code base at the same time) so you might need to deploy them on separate virtualised but identical environments with the same code.

Edit: Just to say, I usually use XHGui for profiling existing code and code in production, and xdebug for profiling changes to code and code under development. But that's just because that's how the tools "feel" right to me, and there's no reason why you can't do both with both.

If you haven't done anything like this in the past I would say that as much as it is a pain in the ass you probably can also learn a lot from it and you will get out of the job with a lot of experience in refactoring, testing, bugfixing and deploying. You could also see it as a chance to establish a long lasting relationship and a boost in confidence and salary if you do it right.

What does "inherit" mean here? You were most likely not hired to refactor 700 KLOC, because they would not have been in that position if they had a decent engineering process in the first place. Obviously, do not rewrite from scratch: it is a 700 KLOC piece of code so even at a completely unrealistic rate of 500 LOC / day, it would take you 5-10 man-years to do it, and I doubt the system is well specified.
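For what it's worth, the estimate works out roughly like this. The 220 working days per year is my own assumption, not something stated in the thread; the other two figures are from the comment above.

```python
LOC = 700_000
LOC_PER_DAY = 500             # the "completely unrealistic" rate quoted above
WORKING_DAYS_PER_YEAR = 220   # assumption: weekends and holidays excluded

person_days = LOC / LOC_PER_DAY                       # 1400 days of writing
person_years = person_days / WORKING_DAYS_PER_YEAR    # a little over 6 years
```

That lands inside the 5-10 year range even before counting any time spent understanding the old code.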

First, I would focus on doing something visible for the client: show that you can deliver, and do it as quickly as possible. This means: do not try to understand everything, do not try to get a mental model of the whole thing. Once you get some buy-in from your customer and people within your client, you will have more flexibility to negotiate things, and be able to use most of the technical advice you were given.

If the customer is not willing to enter this kind of discussion after you have shown you can deliver, I would just walk away if you can.

you are doomed. flee.

Is there an option to migrate the DB to another CRM with similar functionality? If memory serves, vtiger is a SugarCRM clone.

Even though the DB is a monstrosity, I'd start by familiarising yourself with it intimately. Once you know what data is meant to go where, it should be a good starting point for fixing things.

Explain the situation as calmly and thoroughly as possible to the client, and suggest a complete rewrite, if they ever want future development to be possible.

Are you working on this by yourself? How long until your employer is expecting bugs fixed and features added?

Yes, by myself. The client so far has been understanding... they've already burnt through several contracting companies, and I think they're starting to understand what a mess it is. But still, they want to see serious progress within a month (eg. large number of bug fixes).

You're not going to have serious progress for a year. The DB is borked, so you have no foundation at all.

Software Engineering is serious business, there's bugs, new features, maintenance, testing, etc. They failed to manage their code. You need to be realistic that with a team of 2-5 people it could take years to fix.

It might be best to put it out of its misery if they can't hold off their clients' demands and buy you the time needed to rebuild it.

They've burned through several contractors, and now you're the latest one to be headed for the auto-da-fé?

Unless you're can't-pay-your-bills broke and there are no other jobs in your area, the correct response is "I'm sorry. This can't be fixed. It needs to be scrapped completely and replaced with something maintainable. If that's not an option, I'm going to have to resign."

Even if you manage to fix it, it'll still be a pile of crap, and you'll never get the credit you deserve.

On the other hand, if you don't manage to fix it you're going to get blamed.

Zero credit on the upside, major blame on the downside. There's nothing for you in this but pain, my friend.

I think that programmers frequently adopt a new product, see it is a mess, and burn-out trying to clean up the mess on the first pass.

I think you should start figuring out where it forked from the original code base & generate a diff from that. Then throw a bunch of tools, from Cacti to Xdebug/APC's control panel, to get a good handling of the current benchmarks. Get the code into a SCM (probably git), then start tackling current bugs & feature-requests - BEFORE you start tearing down to the baseboards.
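A first pass at "where did it fork" can be as simple as comparing the client's tree against a clean copy of the closest upstream release. The sketch below uses Python's standard `filecmp.dircmp`; the function name and the idea that you have both trees checked out side by side are assumptions for illustration.

```python
import filecmp
import os

def changed_files(upstream_dir, fork_dir):
    """Walk two source trees and list files that differ or exist on one side only."""
    report = {"modified": [], "added": [], "removed": []}

    def walk(cmp, prefix=""):
        # diff_files: present in both trees but judged different
        report["modified"] += [os.path.join(prefix, n) for n in cmp.diff_files]
        # right_only: present only in the fork; left_only: only in upstream
        report["added"] += [os.path.join(prefix, n) for n in cmp.right_only]
        report["removed"] += [os.path.join(prefix, n) for n in cmp.left_only]
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(upstream_dir, fork_dir))
    return {kind: sorted(paths) for kind, paths in report.items()}
```

Note that `dircmp` checks file size and modification time before looking at contents, so treat the result as a map of where to look rather than an exact diff. Once the code is in an SCM, committing upstream first and the fork on top gives you the precise line-level picture.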

After a month or two of that, you'll see the first pain points. Going after the pain points one at a time, rather than all at once, will keep your client happy and you sane.

You will be lucky to get serious numbers of bug fixes on that timetable I suspect. These codebases are hard to work with. Been there done that.

I would look at migrating the clients to another product that works, or even a less forked version of vtiger.

You need to manage their expectations, explain they can only expect one or two fixes in the first month. They will probably tell you that isn't good enough but you have to hold your ground.

I've turned around a few failing projects and rarely is the problem technical, most of the issues start with poor management.

But still, they want to see serious progress within a month (eg. large number of bug fixes).

You need to sit down with them and explain that just isn't possible, then explain clearly what is possible, and what the options are (painful rewrite in parallel to maintaining the old software, or painful and slow refactoring), and just how much it's going to cost them. They aren't going to like either option but it's better for them to understand upfront exactly how much of a problem they have, and that the problem was not necessarily with the previous contracting companies, but with their codebase.

I love this kind of stuff - if you fancy an extra pair of hands then contact details are in my profile

Charge them a dollar per line?
