

Ask HN: How do you learn your way around a new codebase? - nmb

I've been spending the past few weeks learning about various new technologies/libraries that interest me. While it's been fun, I find that my ignorance and lack of a systematic way of exploring new codebases lead to simple mistakes that take me hours to track down. Also, when a class spans multiple source code files, it can be difficult to keep it all in my head. In principle abstraction barriers should mitigate this, but in practice this isn't always the case, and sometimes I need (or, perhaps, want) to go to the lower level to see what's going on. As of late I've been drawing inheritance and call stack diagrams to help with my understanding.

Do these problems sound familiar? Are there any general strategies that you use when you want to quickly understand how a codebase works?
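For the inheritance diagrams specifically: in Python (one of the languages mentioned later in the thread), the interpreter can print a rough text version of the diagram for you. A minimal sketch, assuming the classes have already been imported; `Shape`, `Circle`, `Ring` and `Square` are hypothetical stand-ins for classes in the codebase being explored:

```python
def class_tree(cls, depth=0):
    """Return cls and all of its (transitive) subclasses as indented lines."""
    lines = ["    " * depth + cls.__name__]
    for sub in cls.__subclasses__():  # only classes that are already defined/imported
        lines.extend(class_tree(sub, depth + 1))
    return lines


def mro_chain(cls):
    """Return the method resolution order: the order Python searches classes for an attribute."""
    return " -> ".join(k.__name__ for k in cls.__mro__)


# Hypothetical classes standing in for the codebase you are exploring.
class Shape: pass
class Circle(Shape): pass
class Ring(Circle): pass
class Square(Shape): pass


if __name__ == "__main__":
    print("\n".join(class_tree(Shape)))
    print(mro_chain(Ring))
```

`__subclasses__()` only sees classes that exist at the time of the call, so import the package's modules first; a class with several parents appears once under each parent.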
======
stevenjowens
Find the edges and work your way in.

Find the core and work your way out.

The edges depend on what kind of application you're talking about. A web app
will have different edges than a GUI app, and so forth.

Find the main method or equivalent, and follow the trail of instantiation.
Usually you'll find that configuration takes place somewhere in this code
path, which is another place to come back to later and start connecting the
dots.
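For a Python project, "the main method or equivalent" is usually an `if __name__ == "__main__":` block. A minimal sketch that walks a source tree with the standard `ast` module and lists those blocks as starting points. It assumes UTF-8 source, skips files that don't parse, and won't find entry points declared elsewhere (console scripts in packaging metadata, or a `__main__.py` without a guard), so treat its output as a first pass:

```python
import ast
import os


def is_main_guard(node):
    """True for a module-level `if __name__ == "__main__":` (either operand order)."""
    if not isinstance(node, ast.If) or not isinstance(node.test, ast.Compare):
        return False
    test = node.test
    if len(test.ops) != 1 or not isinstance(test.ops[0], ast.Eq):
        return False
    operands = [test.left, test.comparators[0]]
    has_name = any(isinstance(o, ast.Name) and o.id == "__name__" for o in operands)
    has_main = any(isinstance(o, ast.Constant) and o.value == "__main__" for o in operands)
    return has_name and has_main


def find_entry_points(root):
    """Yield (path, line) for every `__main__` guard in .py files under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                try:
                    tree = ast.parse(f.read(), filename=path)
                except SyntaxError:
                    continue  # e.g. Python 2 code or a template file
            for node in tree.body:  # top-level statements only
                if is_main_guard(node):
                    yield path, node.lineno
```

Calling `list(find_entry_points("."))` from the project root gives a list of files and line numbers where execution can start, which is where the "trail of instantiation" begins.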

When all else fails, identify and follow a single trail of execution all the
way through the code. Find another and follow it. Pretty soon you'll start to
get a sense of where to find the core of the code.
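One way to get a trail of execution without reading every file first is to let the interpreter record it. A minimal Python sketch using `sys.setprofile`, which invokes a hook on every function call and return; `start_app`, `load_config` and `render` are hypothetical stand-ins for real application code:

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func(*args, **kwargs) and return an indented list of the Python functions it called."""
    trail = []
    depth = 0

    def profiler(frame, event, arg):
        nonlocal depth
        if event == "call":
            code = frame.f_code
            trail.append("  " * depth + f"{code.co_name} ({code.co_filename}:{frame.f_lineno})")
            depth += 1
        elif event == "return":
            depth -= 1
        # "c_call"/"c_return" events (built-ins written in C) are ignored.

    sys.setprofile(profiler)
    try:
        func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always remove the hook, even if func raises
    return trail


# Hypothetical application code to trace.
def load_config():
    return {"debug": True}


def render(config):
    return "ok" if config["debug"] else "quiet"


def start_app():
    config = load_config()
    return render(config)


if __name__ == "__main__":
    for line in trace_calls(start_app):
        print(line)
```

Indentation shows who called whom. On a real application the trail gets long quickly, so trace one request or command at a time.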

Identify the presentation classes and work back from them to identify the
business logic.

Identify the data access layer (either file or database) and look at the app
in terms of what it's pushing into and pulling out of the data store.
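A rough way to locate that layer in a Python codebase is to list every call whose name suggests file or database I/O. A minimal sketch using the standard `ast` module; the set of names is an assumption and should be adjusted for whatever libraries (or ORM) the project actually uses:

```python
import ast

# Assumed names that usually mark file or database access; edit for your project.
DATA_ACCESS_NAMES = frozenset({"open", "connect", "execute", "executemany", "commit", "query"})


def data_access_calls(source, names=DATA_ACCESS_NAMES):
    """Return sorted (line, call_name) pairs for calls whose final name is in names."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):          # open(...)
            name = func.id
        elif isinstance(func, ast.Attribute):   # cursor.execute(...)
            name = func.attr
        else:
            continue                            # e.g. calls on subscripts or lambdas
        if name in names:
            hits.append((node.lineno, name))
    return sorted(hits)
```

Run it over each source file; the files with the most hits are a reasonable guess at where the data layer lives. It matches names only, so an unrelated method that happens to be called `open` or `query` will show up too.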

There are some code analysis tools out there. I've never quite managed to
fully grasp them, but using them to look at the code can sometimes help you
get a sense of the structure. One that is tantalizing, for Java, is SA4J, aka
Small Worlds:

<http://www.theserverside.com/news/thread.tss?thread_id=24255>

There are others out there. There used to be a copy-and-paste detection tool on
SourceForge... I might be thinking of this one:

<http://pmd.sourceforge.net/cpd.html>
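The basic idea behind a copy-and-paste detector fits in a few lines. This is a deliberately simplified Python version that compares windows of whitespace-stripped lines; it is not how PMD's CPD works internally (CPD compares token sequences), and the default `window` of 4 lines is an arbitrary assumption:

```python
from collections import defaultdict


def find_duplicate_blocks(files, window=4):
    """Map each run of `window` identical non-blank lines to every (file, start_line) where it occurs.

    files: dict of filename -> source text. Lines are compared after stripping
    surrounding whitespace, so re-indented copies still match.
    """
    seen = defaultdict(list)
    for filename, text in files.items():
        # Keep the original line numbers while skipping blank lines.
        lines = [(n, line.strip()) for n, line in enumerate(text.splitlines(), start=1)
                 if line.strip()]
        for i in range(len(lines) - window + 1):
            chunk = tuple(stripped for _, stripped in lines[i:i + window])
            seen[chunk].append((filename, lines[i][0]))
    # Only chunks seen in more than one place are duplicates.
    return {chunk: locs for chunk, locs in seen.items() if len(locs) > 1}
```

A duplicated block longer than `window` lines is reported as several overlapping windows, so expect some repetition in the output.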

Add comprehensive logging:

Add a log statement to every method/subroutine/function/procedure, and then
run the app while watching the logs, to get a sense of what gets called, and when.

You can try to go a step further, and log all returns (and the values
returned) but that can be a more challenging task. On the other hand, figuring
out where to insert those log statements will also force you to engage with
the code, so it's worth it if you've already picked the other low-hanging
fruit.
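In Python, both the call logging and the return-value logging can be done without editing every function by hand. A minimal sketch, using only the standard library, that wraps every top-level function in a module so each call logs its arguments and each return logs its value; it does not touch methods inside classes, and exceptions pass through unlogged:

```python
import functools
import inspect
import logging

log = logging.getLogger("calltrace")


def logged(func):
    """Wrap func so each call logs its arguments and each return logs its value."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("call %s args=%r kwargs=%r", func.__qualname__, args, kwargs)
        result = func(*args, **kwargs)
        log.debug("return %s -> %r", func.__qualname__, result)
        return result
    return wrapper


def log_all_functions(module):
    """Replace every plain function defined in module with a logged version."""
    for name, obj in list(vars(module).items()):
        # Skip functions the module merely imported from elsewhere.
        if inspect.isfunction(obj) and obj.__module__ == module.__name__:
            setattr(module, name, logged(obj))
```

For example, with a hypothetical `myapp.core`: call `logging.basicConfig(level=logging.DEBUG)`, then `log_all_functions(myapp.core)`, then run the code path you want to watch. Because the wrappers replace the module's own global names, calls between functions in the same module are logged too; code that ran `from myapp.core import helper` before the wrapping keeps the unwrapped version.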

Speaking of engaging with the code, don't underestimate the value of
refactoring the code as a way of learning your way around it. It's much more
engaging and less likely to induce highway hypnosis if you're _actively_
working the code instead of just skimming it.

Obviously you should use good revision control, and ideally keep a pristine
install of the code alongside your experimental working set, to compare
behaviors.
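Comparing behaviors can be partly automated: run the same command in both copies and diff the output. A minimal Python sketch; it only compares what the program prints, not files it writes or state it changes:

```python
import difflib
import subprocess


def run_in(directory, command):
    """Run command (a list of arguments) inside directory and return stdout followed by stderr."""
    result = subprocess.run(command, cwd=directory, capture_output=True, text=True)
    return result.stdout + result.stderr


def behavior_diff(pristine_output, experimental_output):
    """Return a unified diff of two runs' output; an empty string means no visible change."""
    return "".join(difflib.unified_diff(
        pristine_output.splitlines(keepends=True),
        experimental_output.splitlines(keepends=True),
        fromfile="pristine",
        tofile="experimental",
    ))
```

For example, with hypothetical checkouts in `../app-pristine` and `../app-work` and `cmd = ["python", "main.py"]`, `behavior_diff(run_in("../app-pristine", cmd), run_in("../app-work", cmd))` shows exactly which printed lines your refactoring changed.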

Don't be afraid to throw away the refactored code if you find you've gotten
bogged down (and keeping that clearly in mind can help you avoid analysis paralysis).

Good luck, Jim!

------
wpeterson
Some quick advice:

1) Pair Program w/ someone Senior - best way to learn is to ride shotgun,
literally.

2) Fix bugs first - trace a single feature, take things apart

3) Use the Source - Make sure you've got the source to both the primary
product and any 3rd party/open source libraries. You should be able to chase
any line of code down to its source.

4) Draw Diagrams or Make Tables

5) Don't Hesitate to Break Out the Debugger

6) Use source control "blame" - find out who wrote the code and talk to them
about it
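The "blame" tip in item 6 can be summarized per file to see who touched it most, i.e. who to go talk to. A minimal Python sketch around `git blame --line-porcelain`, which repeats an `author` header for every line; it assumes `git` is on the PATH and that it is run from inside the repository:

```python
import subprocess
from collections import Counter


def count_authors(porcelain):
    """Tally `author <name>` header lines from `git blame --line-porcelain` output.

    Source-code lines in porcelain output are prefixed with a TAB, so a code line
    that happens to start with "author " is not miscounted.
    """
    return Counter(line[len("author "):] for line in porcelain.splitlines()
                   if line.startswith("author "))


def blame_authors(path):
    """Count how many lines of path each author last changed."""
    out = subprocess.run(["git", "blame", "--line-porcelain", "--", path],
                         capture_output=True, text=True, check=True).stdout
    return count_authors(out)
```

`blame_authors("src/app.py").most_common(3)` (hypothetical path) lists the three people who last touched the most lines. Keep in mind that blame shows who last changed a line, which after a mass-reformatting commit may not be who wrote the logic.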

------
tarmstrong
Using a debugger has been useful to me lately. They can take a while to learn
(depending on the language and environment), but they're well worth it. You
get to step through everything that the interpreter sees. The only drawback is
you can get lazy, as it lets you read only what you need to read. You won't be
disappointed unless you know the codebase well enough for the debugger to be
tiresome.

------
NonEUCitizen
I compile the source with a good IDE, e.g. Visual Studio, so I can:

1\. set breakpoints

2\. right-click on an identifier and "Go to Definition"

3\. search in entire "Project" or "Solution"
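Outside an IDE, Python's standard `inspect` module gives a rough equivalent of "Go to Definition" from an interactive session. A minimal sketch; it raises `TypeError` for built-ins implemented in C, which have no Python source file:

```python
import inspect
import json


def where_defined(obj):
    """Return "file:line" where a Python function, class, or method is defined."""
    filename = inspect.getsourcefile(obj)
    _source_lines, line = inspect.getsourcelines(obj)
    return f"{filename}:{line}"


if __name__ == "__main__":
    # Example: locate the standard library's json.dumps.
    print(where_defined(json.dumps))
```

`inspect.getsource(obj)` returns the source text itself, which is handy for reading a library function without hunting for its file.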

On Unix systems, I've found these helpful:

U1. grep -R ...

U2a. ls -lR > lslr.txt [if tree is big, this is slow]

U2b. view lslr.txt in an editor [no longer slow, even if tree is big]
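The same two tricks can be sketched in Python, which also works on Windows and makes it easy to skip version-control and dependency directories (the `SKIP_DIRS` set is an assumption; edit it to taste):

```python
import os
import re

# Assumed "noise" directories to leave out of listings and searches.
SKIP_DIRS = {".git", ".hg", ".svn", "node_modules", "__pycache__"}


def list_tree(root):
    """Like `ls -lR`, but one "size<TAB>relative path" line per file, skipping SKIP_DIRS."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            lines.append(f"{os.path.getsize(path)}\t{os.path.relpath(path, root)}")
    return lines


def grep_tree(root, pattern):
    """Like `grep -Rn`: yield (relative path, line number, line) for lines matching pattern."""
    regex = re.compile(pattern)
    for entry in list_tree(root):
        rel = entry.split("\t", 1)[1]
        with open(os.path.join(root, rel), encoding="utf-8", errors="replace") as f:
            for n, line in enumerate(f, start=1):
                if regex.search(line):
                    yield rel, n, line.rstrip("\n")
```

To reproduce the U2a/U2b workflow, write `"\n".join(list_tree("."))` to `lslr.txt` once and then browse that file in an editor.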

~~~
nmb
Thanks for the tips. My recent projects have been in Python or JS, but I've
been looking into things like PyDev or PyCharm so I can do stuff like this.
Thus far I haven't been using anything much more heavyweight than a text
editor.

When I code in Java I'm used to doing this stuff already, but the codebase
sizes for those projects tend to be much larger.

------
huntero
I've recently started working with a decent size codebase in C. At first I
spent quite a while looking through the code, but didn't really learn that
much about it.

Eventually I bit the bullet and just dove right in, not afraid to break
things. I figured out more in 10 minutes than I had in a whole hour of
passively looking through the code.

------
keeptrying
You have to just dive in and change something and then build/run it.

I.e., you have an expectation that if you change A, then A' will happen.

Make the change, build and test it. If A' does happen then make a bigger
change B with the expectation that B' will happen.

Again build and test this. Soon you'll find that either B', C' or D' doesn't
happen and then you'll have to debug it.

This is the fastest way to really learn the codebase IMHO.

Passively looking at files is not going to help. And it's just a big waste of
time.

Now doing this will be painful, but it's better to go through the pain now
rather than when you have a customer breathing down your neck for a fix.
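One way to make those expectations explicit is to write each one down as a small test before making the change, so every build-and-run also tells you which prediction failed. A minimal sketch with Python's `unittest`; `format_greeting` and both changes are hypothetical:

```python
import unittest


# Hypothetical code under exploration, after making two changes:
#   change A: the greeting ends with "." instead of "!"
#   change B: names are trimmed and title-cased
def format_greeting(name):
    return "Hello, " + name.strip().title() + "."


class ExpectationTests(unittest.TestCase):
    """Each test states the effect you predicted (A', B') before making the change."""

    def test_change_a_punctuation(self):
        self.assertEqual(format_greeting("Ada"), "Hello, Ada.")

    def test_change_b_normalized_name(self):
        self.assertEqual(format_greeting("  ada lovelace "), "Hello, Ada Lovelace.")
```

Saved as, say, `expectations.py`, it runs with `python -m unittest expectations`; the first failing test marks the prediction that didn't hold, which is where to start debugging.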

------
ryanfitz
Try to make a modification that should be simple. For example, in a web app
update the text shown when there is a validation error on the server side.
Make the modification, get it deployed and working on your local dev machine.

This should take you through most high-level areas of the system and, in my
experience, show you just how nice or complicated a system you've just
inherited.

------
pbreit
Having someone who knows the code base spend even just one hour with you can
be extremely valuable.

------
runjake
Fixing bugs, or refactoring small things.

