
Ask HN: How to be productive with big existing code base - maheshs
I have just started working with a client who has existing Node.js code, built over the last 3 years.

Is there any guiding principle that is beneficial when working with an existing code base?
======
cbanek
My #1 rule for existing codebases: Just because you wouldn't have done it the
way they did doesn't mean they did it wrong.

I think it's developer nature to look at a huge pile of code that someone else
wrote and immediately think: "This is a pile of crap. I can do better, so the
first thing to do is rewrite all of this, my way (which just so happens to be
_The Right Way_)."

Figure out what you're trying to do, and what is keeping you from doing it.
Take an iterative approach to get things done. Realize that after 3 years,
they have hopefully fixed a lot of bugs and got to a solution that is somewhat
mature and better than you can do in a week.

~~~
debt
"This is a pile of crap. I can do better,"

I'll take this one step further and say: if you think this, you're unqualified
for the position. You are an amateur.

~~~
marcell
I dunno, I've seen quite a few piles of crap in my day as a SWE. The bar is
not that high to do better.

~~~
fsloth
The point is "doing better" is pointless unless it serves a specific business
value. Beauty is _not_ a sufficient reason to refactor a code base in
production - because refactoring always has risks, and those risks should
always be offset by some tangible expected reward.

~~~
jamesmus
Definitely. "Is it good enough for the business?". If yes, then leave it
alone. Try to add new features in a safe way with automated tests.

I once worked a contract where it was expressly forbidden by the dev manager
to refactor any code unless it was demonstrably necessary to support a new
feature. Not much fun for devs at times but I respect the reasons behind the
decision (esp. as it was a derivatives trading platform and it can get pretty
expensive pretty quickly when they go wrong).

------
nisa
I have a similar problem to yours, except it's Java and more like 15 years
old...

What helped? Using a debugger and stepping through the code was useful. It's
more or less a REST API here (built on top of the system; before that it was
SOAP, etc.) and I just picked some heavily used endpoints and stepped through
all the way...

Another huge boost in understanding was using flamegraphs (not sure what's hip
for nodejs maybe this?
[https://github.com/davidmarkclements/0x](https://github.com/davidmarkclements/0x))

This was really an eye opener because that app also used an external huge Java
ECM and there was lots of AOP magic; reading the flamegraphs and looking at
the source was a big boost in understanding.

It's also a really useful tool to get visibility for performance problems that
are not directly visible in the code.

If there are tests, reading them might also be worthwhile.

And take your time... it took me a few months to get a basic understanding of
how it works (I'm more sysadmin, not really a dev there), so don't expect to
grasp everything in one week.

Ask your colleagues - maybe where to find documentation or if you don't
understand something while reading the source.

~~~
sriku
This is good advice.

I'd add that producing something as part of your process of understanding the
code base would be helpful to your colleagues.

If there is hard to understand code, figure it out and capture it in some
documentation. If some code doesn't look robust, add tests. If you find bugs,
log them and write tests. That way, your colleagues also understand the
process you're taking during your "grok the code" period, see progress and can
jump in to help. Having a top level understanding of the system and its
business functions is important so you can evaluate code you read relative to
them.

Adding type annotations can be useful, so try FB's "flow" which gives useful
results without having to do any annotations yourself for starters.

------
spricket
It depends. Number one, find out if the codebase is bad or just big. This will
take a few months, so I try to keep my mouth shut for a while.

If it's really that bad, build a world in a teacup. Try to make one small new
area of code that's nice and slowly work existing code into it whenever you
get the excuse. It's very unlikely they'll allow you to rewrite or even make
substantial changes to existing code. If it was allowed, somebody would have
done it.

It's also unlikely you'll ever have the codebase migrated completely. In my
case, this meant migrating part of the app to a new web framework while
keeping the ORM layer relatively the same. Focus on the worst parts. Kinda bad
stuff can wait. Expect to write some glue between the worlds on your own time.

In your case, IMO Node is a bad platform for large code bases. My approach
would be to introduce TypeScript into a small corner of the app and grow it
over time. Even in the existing code, TypeScript will type-check the JS and
make work/refactoring easier.

Once you have TypeScript up and going, pull in some add-ons to make Node work
with async code. The biggest downsides of Node are dynamic typing and callback
hell. TypeScript + async + heavy linting so this doesn't happen again should
put you on a good path unless there are more demons lurking in their stack.
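
As a rough illustration of the "TypeScript + async" idea, here is a minimal sketch of wrapping one callback-style function so new code can use async/await with types. Everything named here (`loadUserLegacy`, `promisify1`, `greet`) is hypothetical and only stands in for whatever the real codebase contains. Node itself ships `util.promisify` for this job; the hand-rolled adapter below is used only to keep the sketch self-contained.

```typescript
// Node-style callback: error first, result second.
type Callback<T> = (err: Error | null, result?: T) => void;

// Hypothetical legacy function. The resolved promise stands in for real
// asynchronous I/O such as a database call.
function loadUserLegacy(id: number, cb: Callback<{ id: number; name: string }>): void {
  Promise.resolve().then(() => {
    if (id <= 0) {
      cb(new Error(`invalid id: ${id}`));
    } else {
      cb(null, { id, name: `user-${id}` });
    }
  });
}

// Generic adapter: turns a (arg, callback) function into one returning a Promise.
function promisify1<A, T>(fn: (arg: A, cb: Callback<T>) => void): (arg: A) => Promise<T> {
  return (arg: A) =>
    new Promise<T>((resolve, reject) => {
      fn(arg, (err, result) => {
        if (err) reject(err);
        else resolve(result as T);
      });
    });
}

// Typed, promise-based view of the legacy function. The old code is untouched.
const loadUser = promisify1(loadUserLegacy);

// New code can now be written without nested callbacks.
async function greet(id: number): Promise<string> {
  const user = await loadUser(id);
  return `Hello, ${user.name}`;
}
```

The point of the design is that the legacy function stays as it is, so the change can be introduced one call site at a time.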

~~~
zoul
_If it was allowed, somebody would have done it._

People are often scared, don’t care or don’t know how to. I have found it best
to include a batch of refactoring each time I touch something during
development. (Because adding features or fixing bugs requires understanding
that part of the code, so you may as well improve it since you already have it
in your head.) This makes everything take more time, but that has to be
expected when dealing with a lot of technical debt.

A huge +1 for TypeScript. I usually work in a well-typed language (Swift), but
have been recently working in JavaScript and it was amazing how much
TypeScript improves the situation. I can’t imagine working on a serious code
base without types. (Some people can.)

------
mannykannot
You haven't said much about your role here, but let's assume that programming
figures prominently. Your immediate problem is that you will be given tasks
that involve making changes to a codebase that you don't understand very well.
There are many different ways to understand a codebase, and to be effective,
you will need to learn some of all of them.

Firstly, there is the purpose of the system, which means getting to know what
its users want from it and how they use it to achieve that. I put that first,
because everything else follows from it, but that does not mean that you have
to know all there is to know about that aspect before tackling anything else.

You will need to learn about your environment's process: whatever is used for
task assignment, scheduling and tracking; version control; building; testing
and verification, inspections, test setup and execution; integration and
deployment. Of these, you will need to know how to get the source code and
test any changes you make, before you can do any programming, and a
significant landmark in getting to know the system is when, given nothing but
a backup of the source and configuration files, you could resurrect it.

The more you know about the architecture, the better - it is the first step in
understanding how the system meets (or fails to meet) the users' needs. The
architecture often imposes requirements and constraints on how you approach
completing a given task.

Understanding how everything works at the code level would be a desirable
goal, but not one that can be achieved quickly (if at all), so you will have
to be guided by what you need to understand in order to do your assigned
tasks.

It is also useful to know who knows what among the people you will be working
with.

------
otras
I recommend the book _Working Effectively with Legacy Code_ by Michael
Feathers. I've been reading it recently, and I've enjoyed the lessons and
advice so far.

~~~
slipwalker
second that. but in a much more concise way, i would suggest the same answer
to the question "how to eat an elephant"...

------
fsloth
Based on my years of experience working with a decades-old codebase of
millions of lines of code:

Be aware of the abstraction fallacy: developers are often guided by this
insane notion that they can get _rid of_ accidental complexity by wrapping it
away behind an abstraction layer. You can't. It's better to accept the
complexities of the existing system, and to prefer explicit, procedural
copy-and-pasting over trying to invent your own abstraction layer on top.

The problem with the after-the-fact abstraction layer is that if the original
team members are not available, you are likely not in possession of the whole
theory of the software. Hence it is not likely you can in the beginning choose
the right abstractions.

The correct way - if possible - to simplify existing code is to refactor the
code itself.

The specific anti-pattern you will reach by following the false abstraction
strategy is the lasagna architecture:
[https://herbertograca.com/2017/08/03/layered-architecture/](https://herbertograca.com/2017/08/03/layered-architecture/)

Two literary works that helped me enormously to grok working with legacy code
and programming in particular:

* Peter Naur's paper "Programming as theory building" - this was an amazing eye opener to me. It specifically highlights several problems that may arise when working with legacy code when the original developers have left the building.

* Michael Feathers: Working Effectively With Legacy Code - not to be read necessarily as a "how to" recipe book, but rather as a collection of philosophies and techniques to utilize when faced with a huge in-production codebase. It can be read as a recipe book if the examples match your situation, but that's not the point.

------
nephrenka
A large codebase under active development presents a moving target: even if
you knew how something worked last week, that code might have changed twice
since then. Detailed knowledge in the solution domain gets outdated fast.

To address this issue, I use something I call behavioral code analysis. In a
behavioral code analysis, you prioritize the code based on its relative
importance and the likelihood that you will have to work with it and, hence,
need to understand that part. Behavioral code analysis is based on
data from how the organization works with the code, and I use version-control
data (e.g. Git) as the primary data source. More specifically, I look to
identify _hotspots_. A hotspot is complicated code that the organization has
to work with often. So it's a combination of static properties of the code
(complexity, dependencies, abstraction levels, etc) and -- more important -- a
temporal dimension like change frequency (how often do you need to modify the
code?) and evolutionary trends.

I have found that identifying and visualizing hotspots speeds up my
onboarding time significantly as I can focus my learning on the parts of the
code that are likely to be central to the solution. In addition, a hotspot
visualization provides a mental map that makes it easier to fit the codebase
into your head.
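
A minimal sketch of the hotspot idea, assuming you have already captured the text printed by `git log --name-only --pretty=format:` (one path per line, blank lines between commits) and a line count per file. Line count is only a crude stand-in for complexity, and multiplying it by the revision count is an assumed scoring rule for illustration, not the formula used by CodeScene or any other tool. All names and sample data are hypothetical.

```typescript
// Count how many commits touched each file.
function churnFromGitLog(logOutput: string): Map<string, number> {
  const churn = new Map<string, number>();
  for (const line of logOutput.split("\n")) {
    const path = line.trim();
    if (path === "") continue; // blank lines separate commits
    churn.set(path, (churn.get(path) ?? 0) + 1);
  }
  return churn;
}

interface Hotspot {
  path: string;
  revisions: number;
  lines: number;
  score: number;
}

// Combine change frequency with size and sort the riskiest files first.
function rankHotspots(churn: Map<string, number>, lineCounts: Map<string, number>): Hotspot[] {
  const result: Hotspot[] = [];
  churn.forEach((revisions, path) => {
    const lines = lineCounts.get(path);
    if (lines === undefined) return; // deleted or deliberately filtered-out file
    result.push({ path, revisions, lines, score: revisions * lines });
  });
  return result.sort((a, b) => b.score - a.score);
}

// Made-up sample data: three commits touching three files.
const sampleLog = [
  "src/orders.js", "src/util.js", "",
  "src/orders.js", "",
  "src/orders.js", "README.md",
].join("\n");
const sampleLines = new Map<string, number>([
  ["src/orders.js", 900],
  ["src/util.js", 120],
  ["README.md", 40],
]);

const hotspots = rankHotspots(churnFromGitLog(sampleLog), sampleLines);
```

With this sample, `src/orders.js` ranks first because it is both the largest file and the one changed most often.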

There is a set of public examples and showcases based on the CodeScene tool
here: [https://codescene.io/showcase](https://codescene.io/showcase)

I have an article that explains hotspots and behavioral code analysis in more
depth here:
[https://empear.com/blog/prioritize-technical-debt/](https://empear.com/blog/prioritize-technical-debt/)

I also have a book, Software Design X-Rays: Fix Technical Debt with Behavioral
Code Analysis, that goes into more details and use cases that you might find
useful for working with large codebases:
[https://pragprog.com/book/atevol/software-design-x-rays](https://pragprog.com/book/atevol/software-design-x-rays)

~~~
thecupisblue
Nice! I'm working on internal tooling for us that does a lot of the same
things - gonna buy the book, thanks for that, weird I've never heard about it.
For now I'm measuring: churn, complexity, linting, test coverage, test quality
and am going to add a dependency graph. It seems to me that churn, complexity
and dependencies are the biggest indicators of a hotspot.

Got any tips for possible problems I'll encounter along the way?

~~~
nephrenka
Cool - thanks! While the measures are simple in theory, there are some practical
challenges; git repositories tend to be messy. So part of the practical
challenge is to clean the input data (e.g. filter out auto-generated content,
checked in third party libraries).

Another challenge is that version-control data is quite file centric, while
many actionable insights are on a higher architectural level. In CodeScene we
solve this by aggregating files into logical components that can then be
presented and scored.
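
One assumed way to sketch that kind of aggregation is to map path prefixes to component names and sum per-file churn by component. This is not CodeScene's actual implementation. The prefix rules, component names and function names below are all hypothetical.

```typescript
// Hypothetical rules: the first matching path prefix decides the component.
const componentRules: Array<[string, string]> = [
  ["src/billing/", "Billing"],
  ["src/api/", "API"],
];

function componentOf(path: string): string {
  for (const [prefix, name] of componentRules) {
    if (path.startsWith(prefix)) return name;
  }
  return "Other"; // anything not covered by a rule
}

// Roll file-level revision counts up to component level.
function churnByComponent(fileChurn: Map<string, number>): Map<string, number> {
  const totals = new Map<string, number>();
  fileChurn.forEach((revisions, path) => {
    const component = componentOf(path);
    totals.set(component, (totals.get(component) ?? 0) + revisions);
  });
  return totals;
}

// Made-up per-file revision counts.
const fileChurn = new Map<string, number>([
  ["src/billing/invoice.js", 4],
  ["src/billing/tax.js", 2],
  ["src/api/routes.js", 3],
  ["README.md", 1],
]);

const componentChurn = churnByComponent(fileChurn);
```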

~~~
thecupisblue
I'm already onto data cleanup, tbh I'm focusing on Android repos for now. But
great idea, now that I think of it as an architect I'd mostly like the option
to group a few classes/packages or let's call them "modules" together and then
visually see which pieces are too dependent on outside sources and which are
the hotspots/connections/dependencies inside that group.

There goes my weekend...

------
tomlagier
The things I do when starting on a new codebase:

1) Ask if there is onboarding documentation, or someone who can give you a
high-level overview of the codebase. Typically finding a person with a lot of
context on the code is the fastest and most thorough way to understand the
responsibilities and layouts of a codebase. Ask if they can draw an ER
diagram, it's extremely valuable documentation for any additional developers.

2) Read all the documentation possible, especially design documentation. This
should hopefully give you some clues as to both function (what) and purpose
(why). The discussion around this will also introduce you to the major players
in the architecture of the codebase.

Note this does not necessarily mean formalized design docs, it could just be
searching for any READMEs or relevant wiki pages. You're just gathering
threads at this point, and documentation tends to be a lot more compact and
easily digestible than foreign code.

3) Look at the models - there will be compact representations of data at some
point. This gives good insight into the shared language of the code and can
give a lot of clues about how things are done. They also tend to be a lot more
human-readable than other pieces of code, so this is a plus.

4) Find and skim the largest files. Typically these perform the majority of
the work, have the most responsibility, and introduce the most bugs. Knowing
roughly where the major players are and what they do makes it a lot easier to
read any individual file.

5) Run the application, find some small behavior (a single, simple endpoint)
and debug it, stepping through the application code so you can see how a
particular request flows through the system. This can show you how a lot of
different concerns within the code are tied together and also ensures that
you're set up for both running and debugging the codebase.

At this point you should have a fairly solid understanding of at least the
most critical points of the codebase, and also be set up to run and debug it.
You should also have at least one or two points of contact to ask questions.
This gives you a good framework for figuring out how to modify the codebase
moving forward.

------
sandreas
Some of my rules for a big legacy code base:

- don't plan or do a full rewrite - it'll almost never work

- learn and use tools to automate the build system and quality assurance
(jenkins, sonarqube, docker, git, etc.)

- take the time to improve your skills and the skills of your team (coding
dojos, experiments)

- write automated tests (unit, integration, acceptance) for existing code
wherever possible - write at least unit tests and integration tests for new
code

- do refactoring and first focus on cross-cutting concerns (APIs,
translations, caching, logging, database, etc.)

- migrate things to well tested isolated APIs (e.g. use REST / GraphQL APIs
with new endpoints in the frontend and try not to use untested code for these
APIs)

- don't be too backwards compatible (move fast and break things)

Hope it helps ;)

~~~
WhompingWindows
These text boxes are not great...even on my 4k monitor I have to click the
scroller at the bottom to see 1/2 of your longest bullet points; must be even
worse on mobile. Better to just write it in plain text instead of a box.

~~~
sandreas
Thx for the hint... edit done :-)

------
keithnz
A lot of people are giving advice on changing/testing/refactoring etc, I'm not
sure if that's what you are asking, as opposed to how to learn a new code base.

My technique for learning code bases, other than asking other devs questions
like :-

"What architectural patterns are you using?" "How do you deal with testing?"
"How is it deployed?" "how do you deal with data access?" "How do you deal
with data migration" "how do you deal with scaling / concurrency / security /
authentication" etc... broad stroke things.

If there are no other devs... look at the code and look around at some of these
broad-stroke things.

One thing you may find is they are using an architectural pattern you aren't
familiar with, so be on the lookout for things that look odd but look very
deliberate and duckduckgo to see if you can find any information around names
in the code. Like if you saw FooActor BarActor, google for Actor / software
etc.

Another thing to keep in mind when you find odd stuff is that quite a lot of
devs copy paste stuff from the internet, and implement partial ideas they have
learnt about, so duckduckgo for snippets you suspect.

Then the main thing I like to do is take a usecase from the software and
follow it right through every layer, either through static inspection, or
using a debugger. I then follow up on things that initially seem confusing. I
then start to sketch out a bit of an architectural diagram (throwaway).

For Embedded systems code I tend to start from boot and draw out a bit of a
flow diagram of what's happening.

------
luisehk
Do they have good test coverage? That's key. If they don't, start with that.

~~~
mikekchar
One thing to consider: large code bases are large. To get any kind of useful
test coverage it may take you a very long time. While you are doing that, you
won't be visibly making any progress as seen from other parts of the company.
I often describe it as a kind of horizon -- you can do what you want until you
get to a time horizon. After that, the rest of the company feels like they
have lost sight of you. This causes them to panic and they will usually alter
your priorities dramatically. My rule of thumb is that your time horizon is
about 2 weeks. If you don't make visible progress in that time frame, your
project takes on more and more risk as you sail away.

Instead, try to find as much low hanging fruit as you can find. Fix bugs. Add
small features. Address nagging complaints that have been around for a long
time. At the same time, start filling out your test framework (and also beef
up your build process). Keep pairing refactoring and janitorial tasks with
"work that pays the bills".

One last thing. Try to identify areas where your users/paying
customers/stakeholders think, "This shouldn't be hard" and cross reference
that with areas that _are_ hard due to code complexity. Make sure to pick some
of that work so that you can make the tests in that area robust. Once that's
done, start refactoring so that you can do those requests very quickly. Once
you do _that_ don't forget to advertise your success! "Remember how much of a
pain X used to be? Now we can do it quickly because we worked hard on fixing
the problems". This will give you much needed political capital that you will
spend investing on improving other more tricky areas of the code.

~~~
Slartie
This guy got it!

Intertwining refactoring work with quick wins that fix visible issues is key.
Do not blow all your load in the first timeframe by just crunching away on all
quickly fixable issues - even if that would probably cast you in a light as
bright as the sun, resist the temptation, and just fix as much visible stuff
as you need to justify your position and to please the Excel number-juggler
crowd. You should project an image of being productive, but not overly
productive - while you are in reality actually being overly productive, it's
just that you put the remaining time into refactoring, increasing test
coverage, improving build/dev infrastructure, all that stuff that people don't
grasp the value of.

If you keep that up for some time, you should get the system to a point at
which your "hidden" investments start to deliver actual, visible value. Maybe
build time goes down, people can iterate faster, maybe your deliveries get
better in quality because your tests catch more bugs earlier...whatever, at
some point in time you can come out of the dark and start talking about some
of the improvements you already did, even actively advertise their value. This
is just as crucial as the earlier "shadow phase" - if you want constant
improvements and refactorings to be done all the time in a sustainable way
(especially long after you've left the project) you need to change the
attitude towards such "janitorial tasks", you need to showcase their value.
Ideally, you (and others in the project) will be able to actually get time
allotted for some refactorings and you won't have to do those in the dark.

------
xupybd
I’d say the most important thing is to learn the domain and the business you
are working with. Never assume the code is doing things the right way for the
business. Get to know your client really well and try to understand what they
need the software to do. Keep them in the loop as much as possible.

~~~
fendy3002
This should be the best course of action before diving into the technical
aspects. Look for any up-to-date business documentation. If there isn't any,
ask the users of the system to find use cases and replicate them in a dev
system. Try to document the business flow and emulate any hidden behaviors.
Finally, open the codebase, match the business processes with the code, and
comment everything possible.

Then you can begin fixing or refactoring with use cases / test cases in hand.
If there isn't time for that, try to shift the responsibility to whoever is
giving you the task (the PM).

------
andreasklinger
This is super specific to each project but here are things that worked for me
in previous projects.

Two assumptions: You plan to work on this longer-term (not a 1-month project
stint) and there are things worth improving (eg barely used legacy app might
not be worth your time)

#1 Get the team on board

if there are multiple people you need their buy-in and support for whatever
approaches you want to do

#2 Plan for "health by a thousand small improvements"

it will be an iterative approach and you will refactor as you go.

#3 Don't assume different = bad

people might have done things differently; consider using their approaches. you
might do it differently, but it's better if you keep consistency within the
codebase. in codebase management consistency trumps cleverness

#4 Create space

Consider introducing a fix-it friday where everyone can work on little
improvements

#5 Create non-blame culture

Stuff will break if people risk improving things. Avoid blame shifted to them.
If bug trackers ping individual people consider pinging the whole team instead

#6 Consider automation

introduce linters, autoformatting, codemods, danger.js, code complexity
analysis, etc

#7 Introduce tests

This one is the most annoying. But worth doing: whenever you improve a feature
a bit try adding a test - often in legacy apps there are no good tests. A lot
of people recommend writing a test suite for the whole app before you do
anything. If you are lucky enough to do this try it. I always found the
iterative approach more realistic as you can also do feature work while
refactoring.

When doing tests focus on integration (vertical/functional/etc) and not unit
tests (unless the "unit" contains critical or complex logic). Your goal is to
know "that you broke something" - you get by if you don't always know "what
you broke"

#8 Acknowledge tech debt

not everything needs refactoring. If it's not critical and nobody needs to
touch it consider acknowledging it as tech debt. Add larger notes above the
problematic areas and explain why you aren't refactoring it, explain things
worth knowing to understand the code better, etc. Whenever you leave comments
remember that comments should explain "why" not "what" the code does.

hope that helps! good luck.

~~~
abraae
> "code that doesn't get touched dies" \- so you want to "touch up" code as
> often as possible and get into a habit of small improvements.

I've seen many cases where this is far from true. Tightly and well-written
back end code in a well-designed system can run for years - even decades -
hardly being touched.

User-facing UI code less so of course.

~~~
humbleMouse
I agree. That's one of the most ridiculous code tips I've ever heard.

~~~
andreasklinger
disagree here but in favor of simplicity of the main message removed it.

------
ioddly
Lots of good comments here, but I didn't see a code search tool recommended
while skimming.

I use this one:
[https://github.com/ggreer/the_silver_searcher](https://github.com/ggreer/the_silver_searcher)

But the important thing is to be comfortable popping open the console and
using it. Makes it so much easier to "research" a particular part of code
quickly.

~~~
Jach
I just linked ag too, then decided to skim the comments again since it's grown
to >100 and saw yours... Someone did mention OpenGrok, my company supports it
but I find it less useful than ag because it's on the web and (due to company
policy) gated behind another layer of authentication despite my browser being
SSO'd... For monstrous code bases it's also prudent to ag on an SSD or at
least make sure there's enough RAM to have most files in cache.

Another simple CLI tool I like to use is tree.
([https://linux.die.net/man/1/tree](https://linux.die.net/man/1/tree)) Seeing
the full project layout, with everything expanded, is occasionally extremely
useful.

------
dusklight
Reading Refactoring by Martin Fowler helped me a lot. The examples are in Java
but many of the concepts apply across all languages. However I would say the
examples make the most sense for statically typed languages. I wonder if
anyone knows of a book that covers the concepts in Refactoring but with
examples in a dynamically typed language like javascript?

~~~
kbp
I haven't read it, but the second edition of that book used Javascript for the
examples.

------
pizlonator
Read a lot of code!

Spend time reading code in the codebase even if it doesn’t seem to make sense,
even if you don’t think you’ll need to know that part of the code. Keep
reading until it starts to make sense.

My trick has always been to just read. I even avoid tools that automatically
navigate the code because I like to just read it until I know where things
are.

------
nslindtner
Very simple inputs, but they have served me very well :-)

1: Make sure you have development and test environment available (including
data transfer from prod -> test)

2: Source control and Easy deployment (to all environments)

3: Map the code into importance. (not all code is equal).

4: If possible spend time with the main user / product owner (especially in
peak periods). You're blessed to have users, who understand your system.

And remember: code that has been live for 3 years has earned a lot of experience

PS: Typescript +1

------
sqs
First, get your tooling set up, especially a code search tool with go-to-
definition and find references. A good code search tool will make you much
faster and better at understanding code, finding correct usages, debugging
problems, etc.

It also makes it easy to get a URL to any line/region in a code file to paste
into email/Slack to ask/answer questions. (Of course, GitHub has URLs, too,
but you probably aren't browsing code on GitHub already because it lacks code
navigation/intelligence features, so getting the GitHub URL would add an extra
clunky step.)

Here is a study of Google's internal code search tool with some example use
cases and interesting stats:
[https://research.google.com/pubs/archive/43835.pdf](https://research.google.com/pubs/archive/43835.pdf).
Most(?) engineers at companies with large codebases use code search frequently
_if_ they've ever tried a good code search tool (i.e., it's hard to give it up
once you've used it).

(Disclaimer: I work on a tool that does this, but I'm omitting the name/URL
because the advice is general.)

------
TheOtherHobbes
1. Assess how much of the code is actually understood. Is there any record of
the design decisions, the edge cases, the debugging process, the paths that
weren't taken? Who knows the most about the codebase and how it got to be how
it is?

2. What's the current specification? Don't look at the stack, look at input
and output cases. How well does the code meet the spec? Where is it failing?

3. Before you change anything you need to know what the change process is.
You probably do already know this, but if you don't, you need to find out
whether there are any demarcations of responsibility, even if they're only
informal and unstated areas of interest.

4. When you have all that, you can start working on the code with some
knowledge of the context you - and the code - are operating in.

5. If code works, don't rewrite or refactor for style without a very very
good reason. And don't do it unless you can change all the "bad" code at once.
Otherwise you'll end up with a mess of incompatible idioms that make future
changes hard to read.

6. Write your own docs as you go. Best case is other people will benefit from
reading them, worst case is you'll remind yourself what you were doing six
months from now - because you'll have forgotten by then.

If you're a junior you may not have access to all of the above, so the
fallback is to find out what you specifically are supposed to do, and where
you're supposed to do it.

If that's vague or unspecified, I'd suggest studying the code to make your own
model of it and then running possible actions past other team members and the
client before you make the first few changes - to establish a working pattern.

------
arkh
Get a copy of "Working effectively with Legacy Code".

The definition of Legacy Code for the author is "untested code". The book is
mainly a list of situations a code base can be in and how to add tests.

Once your application is end-to-end tested you can play with the code with
better peace of mind. Just adding those tests requires you to get all the
specifications the app has to fulfill so it is a good way to learn them.
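
A minimal sketch of pinning down existing behavior before changing it, in the spirit of what the book calls characterization tests. The function `legacyShippingCost` and its rules are entirely hypothetical. In a real project the baseline would normally be saved to a file and checked by a test runner rather than held in a variable.

```typescript
// Hypothetical legacy function whose exact rules nobody remembers.
function legacyShippingCost(weightKg: number, express: boolean): number {
  let cost = weightKg <= 1 ? 5 : 5 + Math.ceil(weightKg - 1) * 2;
  if (express) cost = cost * 1.5;
  return cost;
}

interface ShippingCase {
  weightKg: number;
  express: boolean;
}

// Run every case through the current code and serialize inputs plus outputs.
function recordBehavior(cases: ShippingCase[]): string {
  return JSON.stringify(
    cases.map((c) => ({ ...c, cost: legacyShippingCost(c.weightKg, c.express) }))
  );
}

const shippingCases: ShippingCase[] = [
  { weightKg: 0.5, express: false },
  { weightKg: 2.2, express: false },
  { weightKg: 2.2, express: true },
];

// Step 1: capture today's behavior, right or wrong, before touching the code.
const baseline = recordBehavior(shippingCases);

// Step 2: after each change, re-run and compare against the recorded baseline.
function behaviorUnchanged(): boolean {
  return recordBehavior(shippingCases) === baseline;
}
```

The test does not claim the recorded output is correct. It only reports when a change alters it, and collecting realistic cases is itself a way to learn the specification.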

------
notacoward
I just happened to write a blog post about this a while ago:

[http://obdurodon.silvrback.com/navigating-a-large-codebase](http://obdurodon.silvrback.com/navigating-a-large-codebase)

Short version: document everything as you learn, master your tools, look at
message and data formats before code, follow some of the important code paths,
change something and see how the system responds.

------
mariopt
Been there multiple times.

Before making such judgement, you need to be sure you know the community
guidelines for that language being used, libraries, etc. You'll need to get
familiar with the language ecosystem so that you're not the one doing things
your way. You can not live inside your head.

You should never aim for a full refactor but rather try to refactor a module
at a time. It might happen that the codebase in question doesn't use modules,
but you can start to commit one at a time.

The rule I follow is: I work on a ticket and will only clean/modify the files
that are relevant to the ticket I'm solving. This way you get more familiar
with the codebase as you go and you might notice that some parts might
actually be well written.

Do not feel tempted to wipe out the old code, because it contains a good chunk
of specification that got lost in past conversations and is probably crucial
to the business logic and/or fixes bugs. Another funny side effect is: it might
be buggy but actually works according to the spec :D

------
Pooparadox
Never try to refactor everything. Sometimes logic may look like it was
implemented incorrectly, but from my experience it may have been written like
this on purpose. Business logic can be really twisted.

Also remember the scout rule: "Leave things BETTER than you found them." It
will help you slowly yet steadily improve the codebase.

~~~
gwbas1c
> Never try to refactor everything

That's certainly true when you have a working product that just needs
incremental work.

When you inherit something that doesn't work, then you really do need to
refactor "everything." Most likely, problems arise from incorrect low level
assumptions or design decisions.

What do I mean by quoting "everything?" This is the kind of refactoring that
feels like you're refactoring everything, but in reality, you're still keeping
higher-level assumptions and design decisions.

In this case, if you don't refactor away the mistakes that lead to something
that doesn't work, then you'll never have something that works.

(BTW, I know this from personal experience, I had to refactor a shipping
product shortly after starting the job because the shipping product did not
work. Now I'm the lead architect on the product.)

------
hello_jerry
Step 1: investigate. find out where the program is breaking and/or delivering
unacceptable performance. This means asking stakeholders "is anything broken?
What is wrong with the app? what do you want to change?", reviewing
exceptions, and crash logs. Keep in mind that often times stakeholders don't
know what they want - determine the business objective and then work from
that. Determine where the broken code is, and write narrative comments about
what you think it does. If there is documentation, read it. There often isn't
any documentation, and often times the comments are useless on a good day.
Resign yourself to playing a combination garbage man/forensic psychologist for
the next few months.

Step 2: triage. determine which broken parts you can get away with leaving
alone for the short term, and which parts need immediate attention.

Step 3: Fix the most critical broken parts incrementally. If there are no
tests (there are never tests...), write a test for each block of code you
modify. Avoid wholesale redesigns if possible. make sure that you write tests.
Try to avoid getting mad about the previous person's style - focus on getting
things working to a borderline acceptable level, writing comments to explain
your decisions so you or someone else has a frame of reference. The goal of
doing this is to buy yourself time to clean the entire thing up.

Step 4: Once the app is working at a baseline acceptable level, examine the
codebase and determine which areas (if any) require redesigns, and determine
the cost/benefit of each redesign, based on what the stakeholders need, want
and expect. If any redesign is necessary, negotiate with stakeholders to buy
time for it - your bargaining chip should be an additional feature or two that
the previous guy shat the bed on. Basic criteria for a redesign: is the
current design impossible to understand? Does the current design impose
unacceptable costs in terms of performance or development time? if yes to
either, a redesign is probably worthwhile.

------
Noumenon72
You'll come back to the same pieces of code over and over, forgetting most of
the details and context each time. If it takes you a while to figure out,
write it down. If you had to use the debugger to find out what's in a map,
leave a comment with an example of what the keys and values look like and
where they're populated from.

Once you have added comments, it lets you hover over a function to remind
yourself "This does X to Y when the deposit is a check", so you never have to
read the internals of that function again when you're not tracing a check.

When you have to go 12 levels deep in the call stack to find the source of
parameter Y, make a note in a side wiki so you can recover that detective work
the next time.

Your knowledge of the code base grows like compound interest when you only
have to figure out what each piece of code does _once_ and can skip over it
after that.

------
OliverJones
If you use an IDE, learn and exploit its code-exploration features. Use them
all the time.

Do global searches whenever you aren't sure how things work. If they aren't
fast enough for you, get an SSD. Use them to look for all kinds of things to
convince yourself your change is safe.

Add comments as you figure things out. For example, "this algorithm seems to
match the one at /src/core/foobar.js:123".

You can also add "DEBT" or "REFACTOR" to your IDE's list of TODO tags, and use
it in comments where you see something you think might need cleaning up. If
you mark these places now and change some of them later, you'll avoid doing
damage; you'll get a chance to learn more before changing things.

Put ticket numbers (Jira, whatever) in your comments too.

Think "dig safe". You're spray-painting warnings near significant
opportunities to break things.
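
If your IDE does not collect such tags for you, a small script can. This is a rough sketch in Python, assuming the tags are written in upper case inside comments; `find_tags` and the sample source are made up for illustration.

```python
import re

# Matches an upper-case tag, an optional colon, and the rest of the line.
# It does not check that the tag sits inside a comment, so expect some noise.
TAG_PATTERN = re.compile(r"\b(TODO|DEBT|REFACTOR)\b:?\s*(.*)")


def find_tags(source, filename="<memory>"):
    """Return (filename, line number, tag, note) for every tagged line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = TAG_PATTERN.search(line)
        if match:
            hits.append((filename, lineno, match.group(1), match.group(2).strip()))
    return hits


# Hypothetical source file contents, including a ticket number in the note.
sample = (
    "const a = 1; // DEBT: duplicated in foobar.js (PROJ-123)\n"
    "let b = 2;\n"
    "// REFACTOR split this function\n"
)

for hit in find_tags(sample, "example.js"):
    print(hit)
```

Feeding it every file under the source tree gives you a running list of the places you marked, which you can revisit once you understand them better.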

------
rdiddly
It's not exactly a guiding principle, just more of an analysis & learning
technique: Get ahold of a big piece of drafting paper, or a whiteboard, or one
of those big-ass pads of paper they're always putting on an easel for silly
brainstorming sessions. Something physical, big, writable, and not a computer.
Set it up or pin it up or tape it up, semi-permanently, and use it daily to
diagram and map out each new thing you learn about the system and its
interrelationships. I'm assuming here that you have the space to set something
like that up. If you don't, you're in a 3rd world programming situation and
you have my sympathies, but you can always do something graphical within the
computer, too, especially if you have a big screen or multiple screens. I just
have always found it quicker to do it in a physical medium.

------
poisonborz
In my experience _nothing works that isn't directly beside the code / isn't
generated by it_. So schema generation and/or a javadoc-like mechanism.
Nobody will read or want to maintain separate structures, especially not
arbitrary tomes like wikis.

------
huuugo
Keep the goals of the code in mind: providing its users value. It's a tool,
even if you think code should be art. The code's purpose is not to look pretty
or to get best marks in static analysis. It's this code that earned the
company the money they are spending now on you. Clean code pays off in the
long run, but might make you go bust in the short term. In the beginning of a
project, it's usually a prototype and it needs to prove itself. Once it has
generated some value, you can consider either spending time for quality
improvements to negate the tech debt, or you re-write it with everything you
have learnt from the previous version.

------
villeez
Run the code through the Softagram analyzer and start digging into the
structures using the visual browsing capabilities in the Softagram Desktop app.
There is a free trial at softagram.com where you can do it easily if you happen
to have your code in some Git repo in cloud services such as Bitbucket or
GitHub... Shameless self-promotion this is, however, as I work for the company.
But I have also personally been using that approach for more than 10 years:
static dependency analysis coupled with excellent visual dependency browsing. I
think some expensive version of Visual Studio also has similar stuff available.

------
Jtsummers
Document everything as you explore it. I'm an advocate for literate
programming but accept it's not going to be accepted by most organizations. So
I use it as a personal tool.

Tools: emacs, org mode, org babel.

Create a parallel directory structure, hypothetical project:

    
    
      ./src
      ./project/src/main.js
      ./project/src/some-file.js
    

Create a new directory structure with one org file per source file and one
index org file:

    
    
      ./project-org/src/main.org
      ./project-org/src/some-file.org
      ./project-org/index.org
    

(You can organize it differently, this has worked for me.)

 _index.org_ will be a simple tree view of the folder hierarchy:

    
    
      * Project Name
      Description
      ** Source Files
      *** src
      **** [[file:src/main.org]]
      **** [[file:src/some-file.org]]
    

You may add in some notes about the general purpose of each of those files.

 _main.org_: copy the entire code of main.js into the main.org file, like:

    
    
      * main.js
      #+BEGIN_SRC js
      // all the code from main.js
      #+END_SRC
      * [[file:../index.org][Project Root]]
    

Start splitting the contents of _main.js_ into separate snippets, I don't know
Javascript very well so let me make some quick C example:

    
    
      * main.c
      #+BEGIN_SRC c :noweb yes :tangle yes
        <<includes>>
        <<structs>>
        <<functions>>
      #+END_SRC
      ** includes
      #+NAME: includes
      #+BEGIN_SRC c
        ,#include <stdio.h> // [0]
      #+END_SRC
      ** structs
      #+NAME: structs
      #+BEGIN_SRC c
        // structs
      #+END_SRC
      ** functions
      #+NAME: functions
      #+BEGIN_SRC c :noweb yes
        <<some_func>>
        <<main>>
      #+END_SRC
      *** main
      #+NAME: main
      #+BEGIN_SRC c
        int main (...) {...}
      #+END_SRC
      *** some_func
      This function will initialize a block of memory
      to be used as a shared buffer between two processes.
      #+NAME: some_func
      #+BEGIN_SRC c
        void some_func (...) {...}
      #+END_SRC
    

As you go through this you can make cross-references to other files and
functions/structures. Eventually you'll find a smallest reasonable unit. A
long, but clear, function doesn't need to be dissected. But a short, complex
one, may end up with each line broken out and analyzed.

I don't just import a massive code base and do this in one go. Instead I
import parts of it and break down all the related files to a particular topic
("How does X happen?"). I trace it from start to end, and then repeat with the
next question. Good, modular code makes this much, much easier. The more
tightly coupled, the harder it is to understand no matter the method.

[0] The comma is inserted by org babel to distinguish the line from its own
#-prefixed content.

~~~
dpflan
I like this and have considered this approach using a git branch for
annotations (although specific to using git, not familiar with other version
control software). Have you done the git branch (or equivalent) approach?

~~~
Jtsummers
I have made a new repo or a branch. Yes, but I typically keep it to myself and
generate reports for others (if used at work).

EDIT: I was on mobile earlier, so extending my thoughts.

I typically make a new branch or repository but keep it on my own machine.
I've gotten zero interest from colleagues in collaborating on this sort of
thing, but they usually like the _output_. Org mode (my tool of choice, but
not the only one) creates decent HTML output (you may want to play around with
your own CSS or color schemes for the code blocks). So what I've done when we
on-boarded a new project was to start doing this for certain critical sections
that were under-documented. I then generated HTML output as a sort of white
paper, and a PowerPoint deck that walked through the structure and control
flow (would be best if I used flowcharts, but usually this is just text).

If we had _good_ development machines at work, I'd definitely do the above
with PlantUML or something similar to do text-based diagrams. Org will produce
and embed images in the HTML output. This would make the flow for producing
documentation much easier, I disliked trying to embed flowcharts created in
Visio (tool available at work) into the HTML. I had to generate them, export
to an image, link the image in org, and then keep it up to date manually. For
a few charts it's not bad, but if you make a lot it's tedious to switch
between tools and correctly export the image.

=====

For non-work stuff, I try to use literate programming from the start, but it's
always solo projects so there's no "selling" this method. If I were
collaborating with others, I'd have to reconsider the method. Leo has (from
what I've read) an effective literate->code->literate story (that is, edit the
code and the changes show back up in the literate format). Org mode _can_ do
that, but I haven't explored it. I'd really want that if I was to pursue
literate programming in a collaborative environment (so that those
uninterested in my method could still contribute).

~~~
dpflan
Amazing, thank you for following up!

[I had constructed a reply to your comment while it was being edited so when I
posted the comment was much longer and had provided more than enough detail!
Revised this comment accordingly]

------
franee
Took me 1 month to be productive.

Went into a large Rails codebase as a backend developer in the middle of a v1
to v2 rewrite, and it had custom folders with their own conventions on where to
put stuff.

All that with no documentation except for the setup.

What I did was just ask how the app works + specific workflows on the front-end
side of things and just connect the dots on the backend part.

Most of my troubles were about where to put modules because of their existing
conventions.

Tips:

1\. Ask a lot

2\. Read the code

3\. A debugger helps

4\. Make sure you add tests for areas you touch.

------
aasasd
When improving existing code, only do gradual changes. Don't do rewrites,
don't replace existing code with better solutions in one swoop.

This way, you won't be in a situation where the new solution doesn't work in
some cases and you've already thrown all the code it replaces under the bus.
You'll always have a mostly working app.

OTOH, if you introduce an alternative solution, finish migrating to it before
beginning improvements in other places in the codebase.

------
tomduncalf
On a practical level, getting really familiar with grep/similar for searching
the codebase and using a debugger (I like the one built into VS Code for
working with Node) will help you when trying to work out how it all fits
together.

Depending on time and other constraints, upgrading the codebase to TypeScript
would be a great way to both familiarise yourself with it and to make working
with it more productive, but obviously you’d need client buy-in.

------
sidcool
Apart from what others have mentioned, it may help you to run some stats on
the Git (assuming it's Git) repos/code. The most changed files would be the
important ones. Modules/Classes with most test coverage would be important
ones. Look out for God classes
([http://wiki.c2.com/?GodClass](http://wiki.c2.com/?GodClass))

There is no substitute for talking to people, though.
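
A rough way to get the "most changed files" number, sketched in Python. It assumes `git` is on your PATH and relies on `git log --name-only --pretty=format:`, which prints only the paths touched by each commit; the function names here are my own.

```python
import subprocess
from collections import Counter


def count_file_changes(log_output):
    """Count how often each path appears in the name-only log output."""
    counts = Counter()
    for line in log_output.splitlines():
        path = line.strip()
        if path:  # blank lines separate commits
            counts[path] += 1
    return counts


def most_changed_files(repo_dir=".", top=10):
    """Return the `top` most frequently changed paths in the repository."""
    log_output = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return count_file_changes(log_output).most_common(top)
```

Calling `most_changed_files("/path/to/repo")` returns a list of `(path, change_count)` pairs. Renamed files are counted under each name they had, so treat the result as a hint about where to look, not a precise metric.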

~~~
mjul
I can recommend trying out Empear’s CodeScene tool - it takes source control
analysis to a whole new level and highlights many issues.

For example it will help you figure out relationships like when I add another
widget here I should remember to add new switch/case clauses there and there
and there.

Another nice approach is using static analysis tools to look at metrics like
cyclomatic complexity, coupling and the dependency structure matrix to find
the most important and troublesome parts of the code base.

------
JustSomeNobody
1\. Build the code. Don't do anything else until you can build the code.

So simple, yet so many places get it wrong. Lazy devs check in code that is
broken. Project structures that depend on you having things on your machine
and in a particular place. Circular references. You name it, I've cursed it.

If it's a project that you should be able to check out and build locally, then
you should be able to check the code out in any directory and build it.
Period.

------
badfrog
Are you able to talk to the original author or the most recent maintainer? If
so, I'd spend a couple hours reading the code on my own to get a basic
familiarity, then sit down with the original author and ask them to give you
an overview. They'll probably even still remember which parts they consider
bad or hacky, and knowing about that stuff could save you a lot of trouble.

------
AnimalMuppet
It's a three-year-old code base? That's _good_. It means that you can probably
talk to some of the authors, to figure out what they were trying to do. A big
20-year-old code base is worse, because many of the original authors are gone,
and there's been a lot more time for it to be patched by people who didn't
understand what the original author was up to.

------
djklanac
Has anybody tried software intelligence tools? Seeing the architectural
components and control flow of the code seems like a quick way to document the
codebase and figure out which “clusters” to study.
[https://en.m.wikipedia.org/wiki/Software_intelligence](https://en.m.wikipedia.org/wiki/Software_intelligence)

------
pelle
Git blame is one of the best tools for understanding the history and
archeology of a large legacy code base.

There is nothing worse, when you're trying to solve some problem, than
discovering a giant reformatting commit, typically instituted by a youngish
developer.

It obviously doesn't remove the history, it just makes it so much harder to
actually find out why the code was written as is.
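
If such a commit is already in the history, newer versions of Git (2.23 and later) can skip it during blame via `--ignore-revs-file`. A small sketch in Python that wraps the command; the file name `.git-blame-ignore-revs` is only a common convention and is assumed here, and the helper names are mine.

```python
import subprocess


def blame_command(path, ignore_revs_file=".git-blame-ignore-revs"):
    """Build a `git blame` command that skips the commits listed in the file.

    The file holds one full commit hash per line (for example the big
    reformatting commit); lines starting with '#' are comments.
    """
    return ["git", "blame", "--ignore-revs-file", ignore_revs_file, "--", path]


def blame(path, repo_dir="."):
    """Run the command from the repository root and return the blame text."""
    result = subprocess.run(
        blame_command(path),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Setting `git config blame.ignoreRevsFile .git-blame-ignore-revs` once makes plain `git blame` behave the same way, so the wrapper is mostly useful in scripts.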

------
franzwong
What is your problem with the code base actually?

~~~
badfrog
> What is your problem with the code base actually?

All OP has told us about the codebase is that it's big, it's in Node, and it's
3 years old. It doesn't sound like they think there's any problem with it.

------
edem
I'd suggest the book "Working Effectively with Legacy Code":

[https://www.amazon.com/Working-Effectively-Legacy-Michael-
Fe...](https://www.amazon.com/Working-Effectively-Legacy-Michael-
Feathers/dp/0131177052)

------
Clanan
Before modifying an existing line of code, understand what its purpose
is/was. Even if it doesn't appear to have one, or make any sense, someone
created it for a reason.

Also IDE/search tools for determining where a function is used are great for
removing stale, unused cruft.

------
rusk
If you have the time, make sure that you have adequate coverage from a suite
of automated tests. This goes doubly so for a dynamic language like
javascript. This will free you up to make changes and will make the whole
process of introducing changes less nerve wracking.

~~~
speedplane
You beat me to the punch. Yes if I'm hired to work on a large, buggy, and
antiquated code-base, the first thing I'd do is build tests. Many of them,
both unit tests and big functional selenium-style tests. I would definitely
try to set up some coverage analysis, but I wouldn't be religious about it. The
focus should be to cover popular features, not lines of code.

With those in place, you can start chopping away left and right. Even if you
make big changes, when you see all of those tests pass green (or blue if
you're on Jenkins), it gives you a level of comfort that no careful reading of
the code can supply.

------
dev_dull
If you inherit a big code base my suggestion is to always spend _a lot_ of
time going through the tests before jumping into the code. Learn about the
business logic. What things have a lot of coverage? What broke a lot or caused
regressions? You will learn so much.

------
gwbas1c
Unit tests / automated tests are critical.

If you're lucky, you will have a suite of unit tests and automated tests with
high code coverage. Rely heavily on these tests as you refactor and debug.

If you don't have working tests, consider writing them before making any major
change.

------
sorryforthethro
That's nothing. Try a 10 year old PHP codebase with remnants of a failed GWT
integration.

~~~
acemarke
Oh dear. I wrote a GWT app at one point (and recently replaced the client with
a new React+Redux implementation).

How would you even go about integrating GWT and PHP in the first place?

------
JamesBarney
Fear large changes to existing code. If you come into a large existing code
base with a mindset of a smaller one you will cause 3 bugs for every one you
fix.

Instead always try to make the smallest possible change to achieve your end
goals(bug fix/feature etc).

------
jrochkind1
This book is pretty great:

[https://www.amazon.com/Working-Effectively-Legacy-Michael-
Fe...](https://www.amazon.com/Working-Effectively-Legacy-Michael-
Feathers/dp/0131177052)

------
tonymet
focus on outcomes. have an objective like better monitoring, better perf,
better code coverage, and use that as your guiding principle. make sure any
change benefits the customer or your team. avoid refactoring just to
modernize, every change is a potential regression. and what is in style now
will be legacy in due time.

once you decide on the objective, try to divide components up into interfaces
so you can test and refactor a component with reduced side effects.

set a goal for each objective. like improve coverage 5% a month or no net neg.

long story short, be methodical and focus on outcomes

------
kilon
The only thing that gets on my nerves is the lack of comments and
documentation, and the fallacy that code can be remotely as easy to understand
as natural language text. The rest I can endure and tolerate.

------
ehnto
Get familiar with your debugger. You may be surprised by how deep in a call
stack a bug can live. It will also make you intimate with the different ways
your codebase is orchestrated together.

------
grigjd3
Debuggers are your friend.

~~~
throway88989898
... singular?

------
vladimir-y
If you need to maintain and extend the project, I'd consider converting the
JavaScript code to TypeScript with the strict compilation option enabled.

------
sreedharbukya
I have been there a couple of times now. The first thing I look into is how to
refactor when I see something that needs it.

Luckily, I had one of the best mentors, who taught me how to refactor any
piece of code without breaking it.

It really works if the code base has test cases. If it doesn't have test cases,
I think we have to take one step back, do one small piece of functionality at a
time, and move on from there.

The bad thing about big code repositories is that they are mostly still using
some version of a framework which is very uncomfortable for a newbie to get
their head around.

------
jarechiga
Just squash every commit from the repo into a single commit with message
"Legacy code" and force-push to master.

-David Winterbottom

------
ngneer
OpenGrok supports JS and helps to navigate the codebase easily, in turn
helping you to reason about it. Good luck!

------
IloveHN84
Rule #1: use the boy scout approach: clean up where others left dirty stuff.

Sure, many here would argue that "never touch a running system" is a golden
rule, but what if the running system is running in the wrong way?

If everyone held to this rule, there wouldn't be what we call
innovation.

Clear example: why develop Windows 7, 8, 10 / Mac 10.10, .11, .12... if their
predecessors were working fine?

------
altmind
my #1 rule for working with existing code bases is that you need to get a
debugger working. nothing can explain the processing workflow better than
tracing the execution.

also having any kind of schemas, protocol definitions and tests can help grasp
what's happening in the system.

------
bitL
You can't really. Especially in the JavaScript world where velocity is everything.
You'd spend at least ~1 year if it is really large and hit all kinds of hacks
in the form of callbacks within callbacks within callbacks that are tuned in a
way that (mostly) works. I'd recommend you to run away, seriously.

------
peterbraden
Learn to use grep well.

------
purple_ducks
\- put a watch on the app dir filesystem

\- put a watch on the network

------
egfx
Same exact boat. Case sensitive search.

~~~
egfx
I think this was downvoted because JS is case INsensitive. But they are
talking about large code bases modified over years. In my experience,
especially with TypeScript, case sensitive search is better for figuring out
what you need in this case.

------
mamcx
Devil's advocate:

I have done so many rewrites that you could say they are most of my job.

My first real project was moving a FoxPro DOS app to Visual FoxPro. I probably
did not read any (old) code at all. The next was a move towards N-Tier with SQL
Server (that was circa 2000). Then, at another company, I moved a Fox desktop
app to ASP.NET 1.

And that is only counting my first years.

\----

How could I tackle this WITHOUT READING THE OLD CODE?

TALKING TO THE OLD DEVELOPERS!

If it is possible for you, let them talk about how the old app works. Even
better, let them (or anyone in marketing, support, etc.) explain the problems
that this app solves, and the new ones this app has failed to solve.

I have the luxury of mostly working with apps with an RDBMS in the back, and
rarely with fancy (and NASTY) architectures like micro-services. Understanding
an RDBMS is orders of magnitude easier than understanding the codebase:

[https://quotes.yourdictionary.com/author/fred-
brooks/31361](https://quotes.yourdictionary.com/author/fred-brooks/31361)

    
    
        Show me your flowcharts and conceal your tables,
        and I shall continue to be mystified. 
    
        Show me your tables, and I won’t usually need your flowcharts; 
        they’ll be obvious.

\-- Fred Brooks

So, before you get deep into the code: MAKE A DATABASE, and PICTURE THE FLOW OF DATA.

Sometimes it is even possible to cut a massive amount of (legacy) code that is
the result of iterative development (under pressure and without planning)
that resulted in a terrible "flow of data". Fix the flow, fix the schemas, and
suddenly the code is short and easy!
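
As a small illustration of "show me your tables", here is a sketch in Python that dumps every table, its columns and its foreign keys. It assumes SQLite purely because it ships with Python (other databases expose the same information through their own catalogs), and the `customers`/`orders` schema is invented for the demo.

```python
import sqlite3


def describe_tables(conn):
    """Map each table name to its (column, type) pairs and foreign keys."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # table_info rows: cid, name, type, notnull, dflt_value, pk
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # foreign_key_list rows: id, seq, referenced table, from, to, ...
        fkeys = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        schema[table] = {
            "columns": [(col[1], col[2]) for col in columns],
            "foreign_keys": [(fk[3], fk[2], fk[4]) for fk in fkeys],
        }
    return schema


# Hypothetical schema, created in memory so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "total REAL)"
)

for table, info in describe_tables(conn).items():
    print(table, info)
```

The foreign keys are the interesting part: they tell you which way the data flows between tables before you have read a single line of application code.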

Also:

Complicated software is an infection caused by complicated business
requirements (i.e. the company). When something is a pile of mud, DO NOT FIX
THE MUD.

Fix the business requirements until they get easier to handle. This also leads,
most of the time, to a massive reduction in messy code.

Also:

Say you have applied all of the sensible advice given elsewhere: instead of a
rewrite, you added tests and all that.

IF YOU FEEL IT IS STILL TERRIBLE AND YOU KNOW IN YOUR GUT YOU WILL BE STUCK HERE
FOR ALL ETERNITY

cut that code without mercy. Do not push on when you have proved it is a dead
end. I made a mistake like this with a rewrite of an iOS app made by a
consulting firm and lost 6 months(!!!!) trying to be reasonable.

Did this cost me the contract? You bet it did. However, in the last 2 weeks I
removed almost 60-70%(?) of the code and rewrote it to be more in line with the
Apple guidelines. I still lost the contract, but the next team? They finished
it in a month.

------
lugg
Grok.

Make a small change.

Loop.

-

When you inevitably come across a trade-off, choose the option which is easiest
to change later.

Everything else is just noise.

Large changes in legacy systems often suffer from the second-system effect,
among other problems.


