
Ask HN: Good resources about legacy code? - ASVBPREAUBV
Hello HN: I got offered my first consultant job for a company with a really old & bad (no documentation, spaghetti, monolithic...) PHP codebase. Most parts of the codebase are working fine in production, but some parts have to be replaced. Can you recommend any good books/papers/websites on how to get started? I don't need language-specific material.
I need methodical/abstract advice.
======
wirthjason
I second Working Effectively with Legacy Code. A lot of advice comes down to writing
tests so you don’t break existing functionality. You should write a lot of
tests, particularly high level stuff that tests the entire system because with
tightly coupled systems you’ll modify part A but part Q will break.
Integration level tests help find this stuff out.

You have two problems on your hands. One is understanding what the code is
doing from a technical perspective but another is understanding the business
rules.

If you haven’t already, get a high level view of the system. Maybe it can be
divided into 4 chunks, and chunk 1 can be broken down into 8 components, etc.
Then start documenting the different components in the codebase. Try to
understand what the different components do — how are they called, what’s the
input, output, do they mutate objects, etc.

Once you have a road map you search for “seams” where you can break things
apart. Maybe component A, B and C are tightly coupled, but you can split A
into two parts — A1 and A2 — and write something that encapsulates all of them
(A1, A2, B, C) into a cleaner interface. Try to write wrappers that use
existing code, then you can have higher confidence that behavior isn’t
changing. If you rewrite low level components there’s no telling what the side
effects may be.

Lastly, learn the language well. I work on a similar code base but it’s in
Python. Knowing “advanced” features of the language has helped. Often a lot of
boiler plate code can be eliminated by an advanced language feature. By
knowing the “seams” of the system and the language you can bend the system to
your will.

~~~
matt_the_bass
Can you share a python example of replacing boilerplate code with advanced
features?

~~~
auxym
Context managers come to mind and can often save a lot of error handling code.
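
A small sketch of what that can look like, using `contextlib.contextmanager` from the standard library. The `Connection` class and the `transaction` and `save` helpers are made up for illustration; the pattern being replaced is a try/except/finally block repeated at every call site.

```python
from contextlib import contextmanager


class Connection:
    """Hypothetical resource, only here to make the example runnable."""

    def __init__(self):
        self.open = True
        self.committed = False
        self.rolled_back = False

    def commit(self):
        self.committed = True

    def rollback(self):
        self.rolled_back = True

    def close(self):
        self.open = False


@contextmanager
def transaction(conn):
    """Commit on success, roll back on error, always close."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


def save(conn, value):
    # Each call site shrinks to the `with` line plus the real work.
    with transaction(conn):
        if value < 0:
            raise ValueError("negative value")
        return value * 2
```

The commit/rollback/close logic is written once in `transaction`, instead of being copied around every block that touches the connection.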

------
maxxxxx
I have learned that it's good to approach something like this with a level of
humility. What looks like a big pile of spaghetti code may actually have a
structure, just not one you may like. I often think "How stupid can these
people be?" only to learn later that they actually had a reasonable design. It
pays to take the time to understand the code.

Otherwise I'd try to refactor the code into testable modules as much as
possible. Unfortunately PHP is not on your side when it comes to refactoring.
Especially in older versions, people used a lot of "tricks" that make
refactoring hard.

------
yarinr
"Working Effectively with Legacy Code" is a classic read. It dates back to
2004 but the techniques are still relevant.

~~~
mirceal
+1

It’s on my list of books that every developer should read.

~~~
djuralfc
What's the rest of that list?

~~~
mirceal
The Pragmatic Programmer, The Design of Everyday Things, Peopleware, Hacknot,
Clean Code

------
pdkl95
> spaghetti, monolithic

[https://www.joelonsoftware.com/2000/04/06/things-you-should-...](https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/)

It can be tempting to declare entire sections of legacy code - or even the
entire project - to be an unmaintainable mess that isn't worth fixing.
_Reading_ code for a true understanding of how it works can be a slow,
laborious, _annoying_ process. Simply rewriting everything from scratch often
appears to be easier than trying to read, understand, and fix a legacy pile of
spaghetti.

Giving in to that temptation and rewriting a project instead of fixing and
refactoring the existing code is almost always the wrong decision. The messy
spaghetti probably started out a lot cleaner. The mess that accreted over time
is often important _bugfixes_ and design changes. Some problems only show up
in the field and sometimes requirements change. The spaghetti of bugfixes,
workarounds, and changes might be the most valuable part of the codebase.
Throwing it out might be throwing away the accumulated knowledge and
experience of many expensive developer man-years.

Instead of rewriting, preserve the bugfixes and real-world experience by
_refactoring_ the spaghetti. It can be annoying and tedious, but it's probably
_less work_ than re-debugging old problems until the "clean rewrite" accretes
its own spaghetti-like layer of bugfixes and workarounds.

------
8ezhikov
My advice would be: don't put your soul into it. I have had lots of projects
supporting/rewriting legacy code. The less emotion you invest in it, the
better. Otherwise it will be torture. One day, when you get the chance to
rewrite it, you will be exhausted and empty.

------
noir_lord
Are you me?

This is the exact position I was in 6 months ago and it wasn't just legacy
code (legacy code isn't inherently bad) it was/is _bad_ legacy code.

[https://leanpub.com/mlaphp](https://leanpub.com/mlaphp) is very very good,
it's a roadmap/process to get a legacy PHP project to a reasonable state
efficiently.

------
SeanKilleen
If pieces of a codebase need to be changed, I break it into a few general
steps (varies depending on specifics):

1) Look at the code and examine what it would take to make it testable.

2) Make tiny, safe refactorings to prepare the code to be tested -- only if
you are absolutely certain these changes can cause no side-effects (usually
I'll rely on tools to help me do this, just to increase the confidence level).
If you can safely extract related code into appropriately small & related
files / objects, that can be a great start.

3) Put the existing code under tests. Write tests around that code --
preferably unit tests that exercise the legacy code, as written, to verify its
behavior.

4) Refactor the code that the tests are using to represent a cleaner codebase
-- maybe you extract some objects / functions.

5) Write new tests to demonstrate the desired changed behavior. Write them as
if you're writing them for a brand new codebase.

6) Make the code pass those tests. Some old tests may fail -- if they
represent the old requirement, you can delete them. If they don't represent
the old requirement, you know you have a bug.

Repeat this process for each piece of the application that needs to be
changed. NOTE: In a legacy app, you sometimes have to make peace with the fact
that some of the app in production will remain legacy, untested code. If it
doesn't need to change, then you don't necessarily need to sink a huge amount
of time trying to refactor / put all the code under test. Get in the habit of
doing it whenever you need to make a change (and factoring in that time into
any estimates, etc.) -- over time, the cost of a change will hopefully go down
as you pay off that technical debt.
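
Step 3 might look like the following sketch, written with Python's `unittest` (PHPUnit would play the same role in a PHP project). `legacy_shipping_cost` is a hypothetical legacy function, and the expected values are simply what the code returns today, recorded by running it, not what any spec says it should return.

```python
import unittest


def legacy_shipping_cost(weight_kg, country):
    # Hypothetical legacy function, deliberately left as written.
    cost = 5
    if weight_kg > 10:
        cost = cost + weight_kg * 2
    if country != "US":
        cost = cost * 3
    if country == "CA" and weight_kg > 10:
        cost = cost - 1  # unexplained adjustment; pinned by the test, not "fixed"
    return cost


class ShippingCostCharacterization(unittest.TestCase):
    """Pins current behavior so a refactoring cannot change it silently."""

    def test_current_behavior(self):
        # Expected values were recorded from the existing code.
        recorded = [
            ((1, "US"), 5),
            ((11, "US"), 27),
            ((1, "DE"), 15),
            ((11, "CA"), 80),
        ]
        for args, expected in recorded:
            with self.subTest(args=args):
                self.assertEqual(legacy_shipping_cost(*args), expected)
```

Run it with `python -m unittest`. If one of these tests fails after a refactoring, behavior changed; whether that is a bug or an intended change is then a deliberate decision rather than a surprise.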

If you're trying to figure out what the cost of change will be, sometimes you
can use static analysis to look at a codebase and show potential issues for a
given section. Using such tools can sometimes help you understand how heavy of
a lift it will be to modernize the code.

As has been suggested, "Working effectively with legacy code" is a guide book
here. You'll likely also want to seek out language-specific material about
refactoring, unit/integration testing patterns & tooling, etc.

Good luck!

~~~
Too
Since you are working with a dynamic language, step 2 should include adding type
annotations or whatever they are called in PHP. Specifying stricter types
makes refactoring much easier.

~~~
maxxxxx
Agreed. Adding type annotations is already a huge refactor in itself though
and pretty risky with PHP. Once you have that done life will be much easier.

~~~
mdaniel
For those unfamiliar with PHP, or its type annotations, why is adding type
annotations a risky step? Do they become enforcing?

    function foo(string $bar) { /* ... */ }
    foo(1234); // kaboom?

~~~
w0rd-driven
Yeah pretty much. It's risky because type checking produces fatal errors that
aren't always easy to trap properly. You don't realize how much a code base
relies on PHP's type coercion until you add annotations and watch it blow up,
in often subtle ways.

~~~
maxxxxx
"Subtle" is the keyword here.

------
megaman22
In addition to _Working Effectively with Legacy Code_, _Re-Engineering Legacy
Software[1]_ is pretty decent. It's really a lot of the same recommendations;
put a test harness in place to guard against regressions, then start pulling
out things and making them into more sane, SOLID components. For that aspect,
you can look at things like the Uncle Bob Clean Code series, or my personal
favorite _Adaptive Code via C#[2]_ which despite the title is pretty general
and all about writing code with the SOLID principles.

[1] [https://www.manning.com/books/re-engineering-legacy-
software](https://www.manning.com/books/re-engineering-legacy-software)

[2] [http://amzn.to/2CfxK8w](http://amzn.to/2CfxK8w)

------
carbocation
I've worked with Paul Jones in the past and he has actually modernized one of
my own legacy PHP codebases. He wrote down his experiences and advice in
"Modernizing Legacy Applications in PHP":
[https://leanpub.com/mlaphp](https://leanpub.com/mlaphp)

It might be worth your time. (_Edit_: noir_lord also recommended the same
book in this thread.)

------
web007
Others have already recommended Working Effectively with Legacy Code by
Michael Feathers; that's the place to start. The only other suggestion I have
is Refactoring: Improving the Design of Existing Code by Martin Fowler.

------
vincenv
"Object-Oriented Reengineering Patterns" has some good advice on how to
approach a legacy system and rewrite parts of it. A PDF is available from the
authors' website:

[http://scg.unibe.ch/download/oorp/](http://scg.unibe.ch/download/oorp/)

------
w0rd-driven
This mirrored my experience joining a company when my prior PHP usage was also
custom code. I struggled for about a month with the current workflow of
writing code, pushing it live to a hidden test area, and then getting feedback
from the changes I made. Fortunately, vagrant was newish and I learned of the
site [https://puphpet.com/](https://puphpet.com/).

I set out on a mission to recreate the production environment as closely and
as primitively as possible. Instead of the full 16 GB legacy database, for
instance, I only dumped the structure and added rudimentary test data. Now my
primary workflow involved local, manual testing, but the feedback loop was
orders of magnitude faster than waiting for Subversion changes to get
deployed. Recreating the production environment 1:1 was fraught with large,
annoying challenges.
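
The structure-only dump can be sketched with SQLite from Python's standard library. This is an illustration only: the database engine I had is not the point here, and for MySQL the usual route would be `mysqldump --no-data`. The `customers` table and the seed row are made up.

```python
import sqlite3


def copy_schema_only(src, dst):
    """Recreate tables and indexes from src in dst, without copying rows."""
    rows = src.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE sql IS NOT NULL AND name NOT LIKE 'sqlite_%' "
        "ORDER BY CASE type WHEN 'table' THEN 0 ELSE 1 END"
    )
    for (ddl,) in rows:
        dst.execute(ddl)
    dst.commit()


# Stand-in for the large production database (hypothetical schema).
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
prod.execute("CREATE INDEX idx_customers_email ON customers (email)")
prod.executemany(
    "INSERT INTO customers (email) VALUES (?)",
    [("user%d@example.com" % i,) for i in range(1000)],
)

# Local copy: same structure, a handful of hand-written rows.
local = sqlite3.connect(":memory:")
copy_schema_only(prod, local)
local.execute("INSERT INTO customers (email) VALUES ('test@example.com')")
```

Tables are created before indexes so that each index finds its table already in place.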

Barring full conversion, there are various techniques that require less effort.
Using scratch scripts and running the local server in PhpStorm helps, but
stripping code down to run locally can be cumbersome. Another option I took
was getting lightweight functions working in an environment like
[http://sandbox.onlinephpfunctions.com/](http://sandbox.onlinephpfunctions.com/)
and slowly integrating them into local scratch scripts or production.

Fortunately, the future at the company involved selling Laravel as a viable
option, which makes everything so much easier. I'm a big proponent of
frameworks or packages over custom code or NiH as they often soften edge cases
or work around quirks in the language.

------
PeterisP
The core idea is that you need to imagine what condition the artifact in
question would have to be in for the required modifications to be _reasonable_,
and (slowly) push the environment towards that state, filling in the gaps.

IMHO it's to be expected that you won't get where you want to be, you'll fill
_some_ of the gaps but not all of them.

The key issue is to identify what is the major pain point that prevents you
from doing this. For example, in a particular similar situation for me the key
points were (a) ability to reliably build a deployment package that's sure to
work; (b) brief documentation about the functionality of the main components
of the software and their interaction/interfaces; (c) creating a basic suite
of tests to ensure that key functionality keeps working as intended if we
change/rewrite certain parts of the codebase.

The pain points will be different for you, but that's the direction that needs
to be identified and taken to proceed properly.

------
zer00eyz
This has been the bread and butter of my work life - I either inherit a pile
of garbage or get a green field.

Your first chore: getting a working dev instance with a debugger. Depending on
the stack, and dependencies this might be a mountain that is rather hard to
climb but it is going to make your life easier.

Second chore: look at the history you have. God bless you if you have source
control, with commit history, and any sort of sane commit messages. Bug
trackers are also your friend. Lastly there has to be SOMEONE with
lore/knowledge of how the code base got to be the way it is - find them
and TALK to them. Knowing WHY is almost as important as knowing what.

Third chore: Pull the schema. Get a schema dump of the production database,
and look at what query logging is set at (might not be sane) and what it
should be (query sampling might be your friend). If you're lucky enough to have
a MySQL setup then use Workbench to help generate a diagram, or any other tool
you prefer. You want to have an artifact when you're done, one that you should
maintain.

The fourth and fifth tasks are going to occur concurrently: Walking the code
base and understanding or building in logging - You're going to walk the hot
spots in the code base first - think home page, log in, and the core
functionality. Every time you find something interesting add a comment, and
LOG where you think is appropriate (remember you can always shut this off
later).
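
As a sketch of logging you can shut off later, here is one way to do it in Python with the standard `logging` module. The `traced` decorator, the `legacy.trace` logger name, and `check_login` are all hypothetical; in PHP the same idea would go through whatever logger the project already has.

```python
import functools
import logging

log = logging.getLogger("legacy.trace")


def traced(func):
    """Log the arguments and result of a hot-spot function at DEBUG level."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        log.debug("return %s -> %r", func.__name__, result)
        return result

    return wrapper


@traced
def check_login(username, password):
    # Hypothetical hot spot; the body stands in for existing legacy logic.
    return username == "admin" and password == "secret"


if __name__ == "__main__":
    # Raising the level to WARNING silences the trace without code changes.
    logging.basicConfig(level=logging.DEBUG)
    check_login("admin", "secret")
```

Be careful what gets logged: a real login path should not write passwords to a log file, so in practice you would log only the argument names or a redacted form.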

Really this is an exercise in reading and understanding what exists today with
as much context as to WHY as you can discover (see step 2). Don't be afraid of
either using the @bug and @todo syntax in comments and opening up tickets
against yourself/the codebase. You may end up with a list of 200 things to
change in the first week and that is OK.

Once you can READ the code base as it exists make sure you REVIEW the code for
what you're replacing -- even money says that there are bug fixes and edge cases
that have already been solved for in that code, ones that your replacement may
have to solve for even if it isn't in the "requirements".

Lastly, find someone to commiserate with and someone you can "bounce ideas off
of" - rubber ducking works up to a point but sometimes in explaining to
someone else they ask the critical question and it sets off your
thinking/exploration in a new direction. They don't HAVE to work where you do
but if they don't, having history with them (even at another job) sure does
help.

------
twunde
Some advice from someone who has done the same thing multiple times. Find
yourself someone who knows the application really well and become their best
friend. You'll find some weird things in the codebase and often they'll be
able to give you context.

------
ioddly
Some really good suggestions here, so I'll just add: become acquainted with a
code search tool (I used grep for a long time, now ag -- I don't think the
tool matters that much for most purposes as long as you are comfortable with
it).

------
camnora
The Legacy Code Rocks podcast is pretty good. They have a nice community on
Slack too: [http://legacycode.rocks](http://legacycode.rocks)

------
mschwaig
It's funny how you can write something terrible now and put it into production
anyways, skipping the process of slow but steady degradation altogether. BAM!
Instant legacy code.

------
newbear
How did you get this job with no experience? How did you estimate the price
and time to completion? Congrats and good luck

~~~
imhoguy
Possibly the OP has an hourly/daily-rate gig.

~~~
ASVBPREAUBV
Yes exactly. My first task is to examine the system and give recommendations
on how to improve it.

