
Strategies to quickly become productive in an unfamiliar codebase - sabarasaba
http://devblog.nestoria.com/post/96541221378/7-strategies-to-quickly-become-productive-in-an
======
scott_s
I'm surprised I didn't see something similar to what I do: the deep-dive.

I start with a relatively high level interface point, such as an important
function in a public API. Such functions and methods tend to accomplish easily
understandable things. And by "important" I mean something that is fundamental
to what the system accomplishes.

Then you dive.

Your goal is to have a decent understanding of how this fundamental thing is
accomplished. You start at the public facing function, then find the actual
implementation of that function, and start reading code. If things make sense,
you keep going. If you can't make sense of it, then you will probably need to
start diving into related APIs and - most importantly - data structures.

This process will tend to reach a point where you have dozens of files open,
with non-trivial relationships to each other, spanning a variety of interfaces
and data structures. That's okay. You're just trying to get a
feel for all of it; you're not necessarily going for total, complete
understanding.

What you're going for is that _Aha!_ moment where you can feel confident in
saying, "Oh, _that's_ how it's done." This will tend to happen once you find
those fundamental data structures, and have finally pieced together some
understanding of how they all fit together. Once you've had the _Aha!_ moment,
you can start to trace the results back out, to make sure that is how the
thing is accomplished, or what is returned. I do this with all large codebases
I encounter that I want to understand. It's quite fun to do this with the
Linux source code.

My philosophy is that "It's all just code", which means that with enough
patience, it's all understandable. Sometimes a good strategy is to just start
diving into it.

~~~
NotAtWork
Your strategy is similar to mine, except that I only focus on the types.

Which types are required to define the function, both as parameters and
through any global/inherited scope/other state? Which types are directly
related, e.g., containers to strings, containers themselves, indexing, etc.?

By the time you have some sense of what things go where in the program, and
what they turn into, you usually have significantly narrowed down what the
program can be doing.

Note: This works considerably better with some languages than others, but can
work reasonably well on dynamically typed languages like Python.
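For annotated Python, the standard library can do a first pass of this type survey. The sketch below assumes the code carries type hints: it lists the declared types of a function's parameters and return value, then walks outward through dataclass fields to find directly related types. `Order`, `Fill` and `place_order` are hypothetical stand-ins for the codebase being read.

```python
import dataclasses
from typing import get_type_hints


# Hypothetical types standing in for the codebase you are reading.
@dataclasses.dataclass
class Order:
    symbol: str
    quantity: int


@dataclasses.dataclass
class Fill:
    order: Order
    price: float


def place_order(order: Order, limit: float) -> Fill:
    """Hypothetical entry point whose types we want to survey."""
    return Fill(order, limit)


def type_names(obj) -> dict:
    """Declared types of a function's parameters/return, or of a class's fields."""
    return {name: getattr(tp, "__name__", repr(tp))
            for name, tp in get_type_hints(obj).items()}


def reachable_types(obj) -> list:
    """Breadth-first walk from obj's annotations through dataclass fields."""
    seen = []
    queue = list(get_type_hints(obj).values())
    while queue:
        tp = queue.pop(0)
        name = getattr(tp, "__name__", repr(tp))
        if name in seen:
            continue
        seen.append(name)
        if dataclasses.is_dataclass(tp):
            queue.extend(get_type_hints(tp).values())
    return seen


if __name__ == "__main__":
    print(type_names(place_order))       # {'order': 'Order', 'limit': 'float', 'return': 'Fill'}
    print(reachable_types(place_order))  # ['Order', 'float', 'Fill', 'str', 'int']
```

Unannotated code gives `get_type_hints` nothing to work with, which is part of why this approach pays off more in some languages than others.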

~~~
scott_s
You inevitably spend much of your time understanding the types and the
concepts they represent, but I find using a specific, high-level function to
drive the understanding helpful. One, it provides a "theme" for the deep-dive,
and two, the logic of that function gives you particular paths through those
types to focus on.

------
shadowmint
Am I the only one reading this thinking "as written by someone who seldom
dives into foreign code bases".

After a few months you'll be familiar with it?

Wow, I'm lucky to get 8 hours on a new code base before I have to ship a
bugfix. Months?? O_o luxury. Sip your pina colada as you write your
immediately outdated documentation. Great advice.

No, that code base was probably written by several people _some_ of whom knew
what they were doing, some of whom were just 'getting the job done' and some
of whom were idiots.

Probably, the only useful suggestion in that list is code reviews. Hack a fix
in, get someone who seems clued up to review and suggest a better way.

Look for logging; that's the first thing I do. You're probably not the first
person to be given this code to work with, and if you're lucky the last
folk(s) made some debugging tools. If not, be a pro and leave some good ones
behind when you go.

...not documentation.

~~~
davvolun
"Wow, I'm lucky to get 8 hours on a new code base before I have to ship a
bugfix." Spotted your problem. If you're constantly working in completely
unfamiliar code with at best an 8-hour lead time to get a fix in, you're being
set up for failure. Yeah, sometimes you have to just get a fix fast, but
spending your whole life coding like that is going to burn you out. I only
have my own anecdotal evidence to go by, but I'm inclined to think your
experience is not the only way.

~~~
gizmo686
Most teams I have worked with deliberately introduced new members to their code
base by:

A) Making sure they can compile it, and showing them where the various
locations for documentation are.

B) Giving them simple bugs to work on while they get familiar with the
codebase. 8 hours is more than enough time for a competent developer to fix
many simple bugs.

You also have situations where the nature of the problem dictates that you
fairly often need to make changes in unfamiliar code-bases. For example, at my
previous job I was developing an Android fork. Android has a fair amount of
its own code, but also includes a large number of external projects. It was a
fairly common occurrence where I would have to fix a bug in one of those
projects that I had never seen before (and will likely never see again).

------
goshx
From my experience, the "be humble" strategy is by far the most important one.
Many developers tend to dislike any code they didn't write before even looking
at it, or they get discouraged because
they have a hard time learning how it works. For whoever is in this situation,
be humble... pretend your way of writing software is not the only way and see
if you can learn something. You may get surprised.

------
paperwork
I once had to get my head around over 120k lines of complex, concurrent, buggy
spaghetti Java code (for a real-time trading system).

My first attempt was to reverse engineer the code into a UML diagram. For some
reason I keep making this mistake. A messy code base will result in an
extremely messy diagram. It can give a few insights, but between finding a
tool which will work and trying to make sense of a tangled mess of lines,
visual diagrams usually aren't worth the time.

I found that a tool called Chronon was somewhat useful (google "DVR for
Java"). This tool records a single run of a program. It is great for going
forwards AND BACKWARDS as you step through the code, looking at different
threads, the state of various objects, etc.

My strategy was to run the server and have it execute a small and simple bit
of functionality (execute a single order). Follow it all the way from input to
output. Make the scenario a bit more complex and follow that through to
completion. This way you get to understand the core functionality, edge case
code and start to get a sense of performance enhancements, etc.
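Without a recorder like Chronon, a rough Python approximation of "run one simple order and follow it from input to output" is to install a trace hook around a single call and record every Python function entered, with its nesting depth. This is a sketch, not a replay debugger: it cannot step backwards, it relies on the standard `sys.settrace` hook (which only covers the calling thread and replaces any debugger or coverage hook already installed), and `handle_order`, `validate` and `route` are hypothetical.

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func once; return (result, [(depth, function name), ...]) for Python calls."""
    calls = []
    depth = 0

    def tracer(frame, event, arg):
        nonlocal depth
        if event == "call":
            calls.append((depth, frame.f_code.co_name))
            depth += 1
        elif event == "return":  # also fired when a frame exits via an exception
            depth -= 1
        return tracer  # keep tracing inside this frame so we see its 'return'

    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(None)
    return result, calls


# Hypothetical code under study: one simple order, input to output.
def validate(order):
    return order["qty"] > 0


def route(order):
    return "NYSE" if order["symbol"].isupper() else "OTC"


def handle_order(order):
    if not validate(order):
        raise ValueError("bad order")
    return route(order)


if __name__ == "__main__":
    result, calls = trace_calls(handle_order, {"symbol": "IBM", "qty": 10})
    for depth, name in calls:
        print("  " * depth + name)  # indented call tree
    print("->", result)
```

Making the scenario a bit more complex is then a matter of calling `trace_calls` with a richer input and comparing the two call trees.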

I found myself making steady progress and fixing a number of bugs, until I hit
heisenbugs, caused by overly clever concurrency/object pooling. It is enough
to drain your soul :)

------
metatation
Another strategy that I've often used is to just fix a few bugs that seem to
be in the vicinity of the area you want to familiarize yourself with. Heck, I even do
this on my own codebases that I haven't touched in a while.

I find it helps focus the mind, provides a clear definition of success and
forces you to think about a specific area of the code without requiring too
much context.

------
asuffield
I've done this a lot in my career. My strategy is simple: start by finding a
trivial bug, and fix it. Then find a slightly less trivial bug. Do this four
or five times. Don't ask anybody about anything unless I'm really lost, just
chase the thread and let the bugs lead me through the active code paths.

After doing a few of these I've got a fair understanding of the structure of
the code and can figure out where to go next.

------
inversion
I find it helps to focus on a 'successful' path through the code, starting
with an important entry-point and ignoring error conditions and validation.
Once I've covered a fair chunk of what the software does, I can go back and
delve into the parts I didn't understand.

I'm also wary of comments as I find they can often have 'drifted' in old code
and become misleading. I keep detailed notes, separate from the code, of
things that look dodgy or could be improved.

I'd add that if the project has poor-to-no build scripts that require a lot of
manual steps, it's worth at least bandaging it up with a shell script early on.

I recently worked on a project that had many separate modules with independent
build scripts, requiring copy-and-pasting built artifacts that others depended
on, and several manual configuration tweaks post-packaging. That stuff is very
tedious and sucks your energy. It's not generally a priority to rewrite all
the build scripts, but if you're editing several modules it's worth being able
to do a one-shot build from early on.
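A minimal sketch of that kind of bandage, written in Python rather than shell to keep to one language here: run each module's existing build command in dependency order, copy the artifacts that downstream modules need, and stop at the first failure. The module names, the `mvn` commands and the artifact paths in `PLAN` are hypothetical placeholders for whatever the project's manual steps actually are.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    module: str             # subdirectory to build in
    command: List[str]      # that module's existing build command
    artifacts: List[str]    # files (relative to the module) that others depend on
    copy_to: Optional[str]  # directory (relative to root) to copy them into


# Hypothetical plan: replace with the project's real manual steps, in dependency order.
PLAN = [
    Step("common", ["mvn", "-q", "package"], ["target/common.jar"], "server/lib"),
    Step("server", ["mvn", "-q", "package"], [], None),
]


def run_build(root: Path, plan: List[Step] = PLAN) -> None:
    """Build every module in order, copying artifacts; stop at the first failure."""
    for step in plan:
        workdir = root / step.module
        print(f"== {step.module}: {' '.join(step.command)}", flush=True)
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(step.command, cwd=workdir, check=True)
        if step.copy_to:
            dest = root / step.copy_to
            dest.mkdir(parents=True, exist_ok=True)
            for rel in step.artifacts:
                shutil.copy2(workdir / rel, dest / Path(rel).name)


if __name__ == "__main__":
    run_build(Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd())
```

Post-packaging configuration tweaks can be added as further steps once they are known; the point is that one command reproduces the whole manual sequence.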

------
caissy
I think that the two most important strategies are to actually pair with
someone and ask questions after doing some research.

Being able to do some pair programming helped me understand a new codebase,
bit by bit, in a language I had never used before. I could ask questions and
actually help with the issue. Asking questions is important, but as stated in
the post, you should try to find the answer yourself first. I actually time
myself for 10 minutes, and if I can't find an answer, I just poke the most
prominent person, based on a git blame of the file related to my
question.
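The "who do I poke" step can be scripted. A minimal sketch, assuming `git` is on the PATH: `git blame --line-porcelain` repeats the commit metadata, including an `author` line, for every line of the file, so counting those lines ranks people by how many of the file's current lines they last touched. The path in the usage note is hypothetical.

```python
import subprocess
from collections import Counter


def blame_authors(porcelain: str) -> Counter:
    """Count lines per author from `git blame --line-porcelain` output."""
    counts = Counter()
    for line in porcelain.splitlines():
        # Metadata lines look like "author Jane Doe"; "author-mail" etc. don't match,
        # and the file's own source lines are prefixed with a tab.
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


def top_authors(path: str, n: int = 3):
    """Run git blame on path and return the n authors owning the most lines."""
    out = subprocess.run(
        ["git", "blame", "--line-porcelain", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return blame_authors(out).most_common(n)
```

Usage from inside the repository would be something like `top_authors("app/models/order.rb")`. Line counts are only a proxy for who understands the file, so treat the result as a hint about whom to ask, not an org chart.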

This is my second week working on a Ruby codebase, without any prior
experience with Ruby, or Rails (I've mostly been doing Python with Django and
Pyramid for the past 3 years). I managed to get quickly up to speed this way.

------
karlb
What types of tests—or testing software—could be run on a WordPress site? For
example, after we update a plugin on a WordPress site we manually check that
the site hasn't changed in appearance. What software – or methodology – would
you recommend for automating this game of spot-the-difference?

The nearest I've found (but not tried) are:

Screenster:
[http://www.creamtec.com/products/screenster/index.html](http://www.creamtec.com/products/screenster/index.html)

Wraith: [https://github.com/BBC-News/wraith](https://github.com/BBC-News/wraith)

~~~
albemuth
There's also Huxley:
[https://github.com/facebook/huxley](https://github.com/facebook/huxley)

------
simmons
There's a good episode of Software Engineering Radio on this topic, as well:

Episode 148: Software Archaeology with Dave Thomas [http://www.se-radio.net/2009/11/episode-148-software-archaeology-with-dave-thomas/](http://www.se-radio.net/2009/11/episode-148-software-archaeology-with-dave-thomas/)

------
scottmwinters
I recently started a new job with a new language and did a few of these steps.
The only one that has really helped me learn this codebase is "make
something".

Making unit tests can be great if you already understand the language (and the
IDE), but when you're new to Xcode, don't expect a unit test to make any more
sense than the code.

Pairing can be a great tool, if you are pairing with the right person. You
can't always find the right programmer (with the time or care) to sit down and
work with you like that.

I was certainly humbled by this process, whether by choice or not, and asked
my share of dumb questions, but I don't think you can really understand a
codebase until you work on it.

------
cratermoon
Working Effectively with Legacy Code, by Michael Feathers, is still my go-to
reference, even though it's now ten years old.
[https://cmdev.com/isbn/0131177052](https://cmdev.com/isbn/0131177052)

------
jakozaur
Nice 10,000-foot overview, but here are some more hands-on tips:

1. Run the program, look at stacktraces/profiler. Sometimes it's easier to
analyze the runtime behavior and figure out what the program is doing and what
the main/common patterns are.

E.g. Java - jstack, C++ - gdb, gprof

2. Use some static tools to get a call/caller graph and be able to browse the
program quickly. Jump between definitions, see what's used, what's not.

E.g. C++ - docgen, Java - most modern IDEs will do just fine

~~~
kaokun
Tools like gdb and docgen are useful for smaller open source projects, but I
think they quickly lose effectiveness on a large code base. For example, I
think the original post's high-level view would work great when starting a new
job as well as when diving into an open source project.

That said, one step down the chain from stack traces and profiler output might
be to look at log files, or run the code in verbose/debug/trace mode if it
exists.
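If the code uses Python's standard `logging` module, a verbose mode is often just a lower log level, and it is also a cheap debugging aid to leave behind for the next person. A minimal sketch; the `--verbose` flag, the `orders` logger name and `process` are hypothetical.

```python
import argparse
import logging

log = logging.getLogger("orders")  # hypothetical module logger


def process(order_id: int) -> str:
    log.debug("processing order %d", order_id)  # only shown with --verbose
    status = "filled" if order_id % 2 == 0 else "rejected"
    log.info("order %d %s", order_id, status)
    return status


def main(argv=None) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="log DEBUG messages as well as INFO and above")
    args = parser.parse_args(argv)
    # force=True replaces any handlers already configured on the root logger.
    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
        force=True,
    )
    process(42)


if __name__ == "__main__":
    main()
```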

------
zorbo
How can you write tests if you're not familiar with what the code is supposed
to do?

~~~
hrjet
The tests are one way to express your understanding of the code. If the tests
you have written fail, then you can focus and improve your understanding in
that part of the code. And if they pass, you can be reasonably assured of your
understanding.
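A minimal sketch of that idea using Python's `unittest`: each test states what you currently believe the code does, and a failure points at exactly where that belief is wrong. `legacy_price` and its bulk discount are hypothetical stand-ins for the code under study; in practice you would import the real function instead of defining it.

```python
import unittest


def legacy_price(qty: int, unit: float) -> float:
    """Hypothetical legacy function whose behaviour we are trying to pin down."""
    total = qty * unit
    if qty >= 100:
        total *= 0.9  # undocumented bulk discount
    return round(total, 2)


class LegacyPriceCharacterization(unittest.TestCase):
    # Each test records a belief about current behaviour, not a spec.
    def test_small_order_is_plain_multiplication(self):
        self.assertEqual(legacy_price(3, 2.5), 7.5)

    def test_bulk_orders_get_a_discount(self):
        self.assertEqual(legacy_price(100, 1.0), 90.0)

    def test_zero_quantity_costs_nothing(self):
        self.assertEqual(legacy_price(0, 9.99), 0.0)


if __name__ == "__main__":
    unittest.main()
```

Tests like these describe what the code does today, bugs included; deciding whether that behaviour is correct is a separate step.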

------
wsc981
For my current employer I landed on the job during their regression test
sprint.

Just running through the tests and communicating a lot with testers and
developers helped me understand how the app (ought to) behave.

------
_RPM
"Ask questions", I found that if you ask too many questions, management starts
to get annoyed and actively avoids you and/or has an attitude in their responses.

------
LilBibby2342
Somewhat adjacent to the conversation, but Bowery
([http://bowery.io/](http://bowery.io/)) is building a toolset for getting a
full dev environment in a few seconds.

Saw them on the "Made in NY" Product Hunt collection yesterday. Has anyone
played with Bowery?

------
phpnode
Can't overemphasise the importance of a good IDE for this kind of thing, even
if you're a hardened vim or emacs zealot. A tool like IntelliJ IDEA makes
navigating new/large codebases a million times easier.

~~~
szabba
With Vim and Emacs you can get this if you have a plugin that autogenerates a
ctags/etags file. Then (in Vim) just <C-]> to an identifier's definition and
<C-o> back.

~~~
phpnode
For many languages ctags doesn't come close, unfortunately. For code
exploration a good IDE is essential, even if you don't use it for actual
editing.

------
EGreg
This is very useful, and I think I will post or link to something like that on
my own open source project's page.

~~~
joshdance
You should also try to make sure your project has tests, is clean, commented,
and easy to read. :)

