Ask HN: Good resources about legacy code?
56 points by ASVBPREAUBV on Dec 30, 2017 | 38 comments
Hello HN: I got offered my first consulting job at a company with a really old and bad (no documentation, spaghetti, monolithic...) PHP codebase. Most of the codebase is working fine in production, but some parts have to be replaced. Can you recommend any good books/papers/websites on how to get started? I don't need language-specific material. I need methodical/abstract advice.

I second Working Effectively with Legacy Code. A lot of the advice comes down to writing tests so you don’t break existing functionality. You should write a lot of tests, particularly high-level ones that exercise the entire system, because with tightly coupled systems you’ll modify part A but part Q will break. Integration-level tests help find this stuff out.

You have two problems on your hands. One is understanding what the code is doing from a technical perspective; the other is understanding the business rules.

If you haven’t already, get a high level view of the system. Maybe it can be divided into 4 chunks, and chunk 1 can be broken down into 8 components, etc. Then start documenting the different components in the codebase. Try to understand what the different components do — how are they called, what’s the input, output, do they mutate objects, etc.

Once you have a road map, you search for “seams” where you can break things apart. Maybe components A, B and C are tightly coupled, but you can split A into two parts, A1 and A2, and write something that encapsulates all of them (A1, A2, B, C) into a cleaner interface. Try to write wrappers that use existing code; then you can have higher confidence that behavior isn’t changing. If you rewrite low-level components, there’s no telling what the side effects may be.

Lastly, learn the language well. I work on a similar code base, but it’s in Python. Knowing “advanced” features of the language has helped. Often a lot of boilerplate code can be eliminated by an advanced language feature. By knowing the “seams” of the system and the language, you can bend the system to your will.

Can you share a Python example of replacing boilerplate code with advanced features?

Context managers come to mind; they can often save a lot of error-handling code.
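As a rough illustration of that point, here is a small sketch using `contextlib.contextmanager` from the standard library. The `acquired` helper and the `"db-lock"` resource name are made up for the example; the idea is that the acquire/release pairing is written once instead of repeating `try`/`finally` at every call site:

```python
from contextlib import contextmanager


@contextmanager
def acquired(resource, log):
    """Acquire on entry and always release on exit, even if the body raises."""
    log.append(f"acquire {resource}")
    try:
        yield resource
    finally:
        log.append(f"release {resource}")


def do_work(log, fail=False):
    # No explicit try/finally needed here: the context manager handles cleanup.
    with acquired("db-lock", log) as name:
        log.append(f"working with {name}")
        if fail:
            raise ValueError("boom")
```

The exception still propagates to the caller; only the cleanup is centralized.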

I have learned that it's good to approach something like this with a level of humility. What looks like a big pile of spaghetti code may actually have a structure, just not one you may like. I often think "How stupid can these people be?" only to learn later that they actually had a reasonable design. It pays to take the time to understand the code.

Otherwise I'd try to refactor the code into testable modules as much as possible. Unfortunately PHP is not on your side when it comes to refactoring. Especially in older versions, people used a lot of "tricks" that make refactoring hard.

"Working Effectively with Legacy Code" is a classic read. It dates back to 2004 but the techniques are still relevant.


It’s on my list of books that every developer should read.

What's the rest of that list?

The Pragmatic Programmer, The Design of Everyday things, Peopleware, Hacknot, Clean Code

Clean Code is good, as well as anything by Bob Martin or Martin Fowler.

> spaghetti, monolithic


It can be tempting to declare entire sections of legacy code - or even the entire project - to be an unmaintainable mess that isn't worth fixing. Reading code for a true understanding of how it works can be a slow, laborious, annoying process. Simply rewriting everything from scratch often appears to be easier than trying to read, understand, and fix a legacy pile of spaghetti.

Giving in to that temptation and rewriting a project instead of fixing and refactoring the existing code is almost always the wrong decision. The messy spaghetti probably started out a lot cleaner. The mess that accreted over time is often important bugfixes and design changes. Some problems only show up in the field and sometimes requirements change. The spaghetti of bugfixes, workarounds, and changes might be the most valuable part of the codebase. Throwing it out might be throwing away the accumulated knowledge and experience of many expensive developer man-years.

Instead of rewriting, preserve the bugfixes and real-world experience by refactoring the spaghetti. It can be annoying and tedious, but it's probably less work than re-debugging old problems until the "clean rewrite" accretes its own spaghetti-like layer of bugfixes and workarounds.

My advice would be: don't put your soul into it. I've had lots of projects supporting/rewriting legacy code. The less emotion you invest in it, the better. Otherwise it will be torture, and when you finally get the chance to rewrite it, you will be exhausted and empty.

Are you me?

This is the exact position I was in 6 months ago, and it wasn't just legacy code (legacy code isn't inherently bad); it was/is bad legacy code.

https://leanpub.com/mlaphp is very very good, it's a roadmap/process to get a legacy PHP project to a reasonable state efficiently.

If pieces of a codebase need to be changed, I break it into a few general steps (varies depending on specifics):

1) Look at the code and examine what it would take to make it testable.

2) Make tiny, safe refactorings to prepare the code to be tested -- only if you are absolutely certain these changes can cause no side-effects (usually I'll rely on tools to help me do this, just to increase the confidence level). If you can safely extract related code into appropriately small & related files / objects, that can be a great start.

3) Put the existing code under tests. Write tests around that code -- preferably unit tests that exercise the legacy code, as written, to verify its behavior.

4) Refactor the code that the tests are using to represent a cleaner codebase -- maybe you extract some objects / functions.

5) Write new tests to demonstrate the desired changed behavior. Write them as if you're writing them for a brand new codebase.

6) Make the code pass those tests. Some old tests may fail -- if they represent the old requirement, you can delete them. If they don't represent the old requirement, you know you have a bug.

Repeat this process for each piece of the application that needs to be changed. NOTE: In a legacy app, you sometimes have to make peace with the fact that some of the app in production will remain legacy, untested code. If it doesn't need to change, then you don't necessarily need to sink a huge amount of time into refactoring it / putting all the code under test. Get in the habit of doing it whenever you need to make a change (and factor that time into any estimates, etc.) -- over time, the cost of a change will hopefully go down as you pay off that technical debt.
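The "put the existing code under tests" step above is often done with what Feathers calls characterization tests: tests that record what the code does today, quirks included, rather than what it ought to do. A minimal Python sketch, where `legacy_shipping_cost` is an invented stand-in for real legacy code and the expected values were obtained by running it:

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    # Hypothetical legacy function, left exactly as found.
    cost = 5
    if weight_kg > 10:
        cost = cost + weight_kg - 10
    if express:
        cost = cost * 2
    if weight_kg == 0:
        cost = 0
    return cost


class CharacterizeShippingCost(unittest.TestCase):
    """Pins down current behavior so later refactoring can be checked against it."""

    def test_recorded_behaviour(self):
        # Inputs mapped to the outputs observed from the unmodified code.
        recorded = {
            (1, False): 5,
            (12, False): 7,
            (12, True): 14,
            (0, True): 0,
        }
        for args, expected in recorded.items():
            with self.subTest(args=args):
                self.assertEqual(legacy_shipping_cost(*args), expected)
```

Once these pass against the untouched code, the refactoring step can proceed with the tests acting as a safety net; tests for the new requirement are then written separately.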

If you're trying to figure out what the cost of change will be, sometimes you can use static analysis to look at a codebase and show potential issues for a given section. Using such tools can sometimes help you understand how heavy of a lift it will be to modernize the code.

As has been suggested, "Working Effectively with Legacy Code" is a good guidebook here. You'll likely also want to seek out language-specific material about refactoring, unit/integration testing patterns & tooling, etc.

Good luck!

Since you are working with a dynamic language, step 2 should include adding type annotations or whatever they are called in PHP. Specifying stricter types makes refactoring much easier.

Agreed. Adding type annotations is already a huge refactor in itself though and pretty risky with PHP. Once you have that done life will be much easier.

For those unfamiliar with PHP, or its type annotations, why is adding type annotations a risky step? Do they become enforcing?

    function foo(string $bar) { ... }
    foo(1234) // kaboom?

Yeah, pretty much. It's risky because type checking produces fatal errors that aren't always easy to trap properly. You don't realize how much a code base relies on PHP's type coercion until you add annotations and watch it blow up, often in subtle ways.

"Subtle" is the keyword here.

I think your code will work because 1234 can be converted to a string. I think the trouble starts once you pass around objects.

It actually doesn't according to http://sandbox.onlinephpfunctions.com/code/8804b49c23d620228..., though different PHP versions could produce different results.

This works with PHP 7

    <?php
    function foo(string $bar) { return $bar . $bar; }

    echo foo(1234);

Wait? You mean you get runtime errors? That's well...good in a way but as you say dangerous on old code.

To get the true benefits of typing you should also run a static type analyzer.

Are there any for PHP?

In addition to Working Effectively with Legacy Code, Re-Engineering Legacy Software[1] is pretty decent. It's really a lot of the same recommendations; put a test harness in place to guard against regressions, then start pulling out things and making them into more sane, SOLID components. For that aspect, you can look at things like the Uncle Bob Clean Code series, or my personal favorite Adaptive Code via C#[2] which despite the title is pretty general and all about writing code with the SOLID principles.

[1] https://www.manning.com/books/re-engineering-legacy-software

[2] http://amzn.to/2CfxK8w

I've worked with Paul Jones in the past and he has actually modernized one of my own legacy PHP codebases. He wrote down his experiences and advice in "Modernizing Legacy Applications in PHP": https://leanpub.com/mlaphp

It might be worth your time. (Edit: noir_lord also recommended the same book in this thread.)

Others have already recommended Working Effectively with Legacy Code by Michael Feathers; that's the place to start. The only other suggestion I have is Refactoring: Improving the Design of Existing Code by Martin Fowler.

"Object-Oriented Reengineering Patterns" has some good advice on how to approach a legacy system and rewrite parts of it, pdf available from the authors website:


This mirrors my experience joining a company when my prior PHP usage was also custom code. I struggled for about a month with the existing workflow of writing code, pushing it live to a hidden test area, and then getting feedback on the changes I made. Fortunately, Vagrant was newish and I learned of the site https://puphpet.com/.

I set out on a mission to recreate the production environment as closely and as primitively as possible. Instead of the full 16 GB legacy database, for instance, I only dumped the structure and added rudimentary test data. Now my primary workflow involved local, manual testing, but the feedback loop was orders of magnitude faster than waiting for Subversion changes to get deployed. Recreating the production environment 1:1 was fraught with large, annoying challenges.
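To make the "structure only plus a little test data" idea concrete, here is a self-contained sketch. It uses Python's built-in `sqlite3` module purely so that it runs anywhere; the database described above was presumably not SQLite, and the `users` table and `copy_schema_only` helper are invented for the example:

```python
import sqlite3


def copy_schema_only(source, target):
    """Recreate tables, indexes, etc. from `source` in `target` without any rows."""
    rows = source.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE sql IS NOT NULL AND name NOT LIKE 'sqlite_%' "
        "ORDER BY CASE type WHEN 'table' THEN 0 ELSE 1 END"  # tables first
    ).fetchall()
    for (statement,) in rows:
        target.execute(statement)
    target.commit()


# Stand-in for the large production database.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
prod.executemany("INSERT INTO users (name) VALUES (?)", [("a",), ("b",), ("c",)])

# Local test database: same structure, one hand-written row.
test_db = sqlite3.connect(":memory:")
copy_schema_only(prod, test_db)
test_db.execute("INSERT INTO users (name) VALUES ('test-user')")
```

With a server database, the vendor's own dump tool in a schema-only mode would normally play the role of `copy_schema_only`.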

Barring full conversion, there are various techniques that require less effort. Using scratch scripts and running the local server in PhpStorm helps, but stripping code down to run locally can be cumbersome. Another option I took was getting lightweight functions working in an environment like http://sandbox.onlinephpfunctions.com/ and slowly integrating them into local scratch scripts or production.

Fortunately, the future at the company involved selling Laravel as a viable option, which makes everything so much easier. I'm a big proponent of frameworks or packages over custom code or NIH, as they often soften edge cases or work around quirks in the language.

The core idea is to imagine what state the artifact in question would need to be in for the required modifications to be reasonable, and then (slowly) push the environment towards that state, filling in the gaps.

IMHO it's to be expected that you won't get where you want to be, you'll fill some of the gaps but not all of them.

The key issue is to identify the major pain points that prevent you from doing this. For example, in a similar situation, the key points for me were (a) the ability to reliably build a deployment package that's sure to work; (b) brief documentation about the functionality of the main components of the software and their interaction/interfaces; (c) a basic suite of tests to ensure that key functionality keeps working as intended if we change/rewrite certain parts of the codebase.

The pain points will be different for you, but that's the direction that needs to be identified and taken to proceed properly.

This has been the bread and butter of my work life - I either inherit a pile of garbage or get a green field.

Your first chore: getting a working dev instance with a debugger. Depending on the stack, and dependencies this might be a mountain that is rather hard to climb but it is going to make your life easier.

Second chore: look at the history you have. God bless you if you have source control with commit history and any sort of sane commit messages. Bug trackers are also your friend. Lastly, there has to be SOMEONE with lore/knowledge of how the code base got to be the way it is: find them and TALK to them. Knowing WHY is almost as important as knowing WHAT.

Third chore: pull the schema. Get a schema dump of the production database, and look at what query logging is set to (might not be sane) and what it should be (query sampling might be your friend). If you're lucky enough to have a MySQL setup, use Workbench to help generate a diagram, or any other tool you prefer. You want to have an artifact when you're done, one that you should maintain.

The fourth and fifth tasks are going to occur concurrently: walking the code base, and understanding or building in logging. You're going to walk the hot spots in the code base first: think home page, login, and the core functionality. Every time you find something interesting, add a comment, and LOG where you think it is appropriate (remember you can always shut this off later).
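One low-risk way to add that kind of logging while walking hot spots is to wrap functions rather than edit their bodies. A sketch in Python using only the standard library; the `traced` decorator, the `legacy.trace` logger name, and the `login` function are illustrative assumptions:

```python
import functools
import logging

logger = logging.getLogger("legacy.trace")


def traced(func):
    """Log the arguments and result of each call without changing behavior."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.debug("return %s -> %r", func.__name__, result)
        return result

    return wrapper


@traced
def login(username, password):
    # Hypothetical hot-spot function standing in for real legacy code.
    # (Real code should not log credentials; this only shows the mechanism.)
    return username == "admin" and password == "secret"
```

Because the messages go through one named logger, raising its level later (for example `logger.setLevel(logging.WARNING)`) shuts the tracing off without touching the decorated code.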

Really this is an exercise in reading and understanding what exists today with as much context as to WHY as you can discover (see step 2). Don't be afraid of either using the @bug and @todo syntax in comments and opening up tickets against yourself/the codebase. You may end up with a list of 200 things to change in the first week and that is OK.

Once you can READ the code base as it exists, make sure you REVIEW the code for what you're replacing -- even money says that there are bug fixes and edge cases that have already been solved for in that code, ones that your replacement may have to solve for even if they aren't in the "requirements".

Lastly, find someone to commiserate with and someone you can "bounce ideas off of" - rubber ducking works up to a point, but sometimes when you explain things to someone else they ask the critical question, and it sets off your thinking/exploration in a new direction. They don't HAVE to work where you do, but if they don't, having history with them (even at another job) sure does help.

Some advice from someone who has done the same thing multiple times: find yourself someone who knows the application really well and become their best friend. You'll find some weird things in the codebase, and often they'll be able to give you context.

Some really good suggestions here, so I'll just add: become acquainted with a code search tool (I used grep for a long time, now ag -- I don't think the tool matters that much for most purposes as long as you are comfortable with it).

The Legacy Code Rocks podcast is pretty good. They have a nice community on Slack too: http://legacycode.rocks

It's funny how you can write something terrible now and put it into production anyways, skipping the process of slow but steady degradation altogether. BAM! Instant legacy code.

How did you get this job with no experience? How did you estimate the price and time to completion? Congrats and good luck

Possibly the OP has an hourly/daily rate gig.

Yes exactly. My first task is to examine the system and give recommendations on how to improve it.

I'm working for the same company for a different product as a developer. I did not estimate anything yet.
