
Ask HN: How do you dive into a new codebase? - reinhardt
I recently started working on a non-trivial 3-year-old codebase (~50KLOC, mostly Python, developed in-house, plus a dozen or two 3rd party dependencies) and I'm not sure how to ease my way into it. The developer I replaced, as well as most current developers on this project, are working remotely and the main way of communication is chat and email; they don't even use Skype for voice chat.

As for documentation, although there is some, it seems kind of unstructured and random; it's hard to see the big picture and how the pieces fit together. Plus most last-modified timestamps are from 2008/2009, which from experience makes me question how relevant the docs are to the currently working system.

So direct communication and documentation aside, how should I start learning about the system and the code on my own? Should I go for a top-down approach, i.e. start from the top directory and try to figure out what each subdir is responsible for, down to the actual files? Or rather pick a narrow subtask and drill down to it, ignoring everything but the absolute minimum required to complete it? Any insight will be much appreciated, I feel kinda overwhelmed at this point.
======
tomasr
Personally, I wouldn't worry about directories or code layout; focus instead
on functionality/use cases.

When I've had to do this, I usually try a dual approach:

\- Reading code: identify an interesting function / use case and trace the
code (reading + stepping through it in a debugger) from the top to the bottom.
Example: start with a webpage / API and drill down to see how it works.

\- Adding features / fixing bugs: once I've got a basic idea of the code
layout, trying to do small bug fixes or small features is a great way to learn
more about the system.
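As a rough illustration of the "trace it from the top to the bottom" part: when stepping through a debugger by hand gets tedious, a minimal Python sketch like the one below can record which functions a single entry point ends up calling. It uses the standard `sys.settrace` hook; `record_calls`, `handle_request` and `load_user` are made-up names for illustration, not anything from the codebase being discussed.

```python
import os
import sys


def record_calls(func, *args, path_fragment="", **kwargs):
    """Run func and return (file, first line, function name) for each Python call.

    Only calls whose source file path contains path_fragment are recorded;
    the empty default matches every file.
    """
    calls = []

    def tracer(frame, event, arg):
        code = frame.f_code
        if event == "call" and path_fragment in code.co_filename:
            calls.append((os.path.basename(code.co_filename),
                          code.co_firstlineno, code.co_name))
        return None  # no per-line tracing, only "call" events

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore any debugger/coverage hook
    return calls


# Hypothetical stand-ins for a real entry point and the code beneath it.
def load_user(user_id):
    return {"id": user_id}


def handle_request(user_id):
    return load_user(user_id)


if __name__ == "__main__":
    for filename, line, name in record_calls(handle_request, 42):
        print(f"{filename}:{line} {name}")
```

Passing something like the project's package directory as `path_fragment` should keep standard-library and third-party calls out of the list, so the output reads as a call path through your own code.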

I've never had any problem with using chat to talk with the original
developers; that's what I do most of the time, and several times have had to
write some initial "get started"-kind of docs for new developers coming into
the project. It's something to ask for; can't hurt.

------
thibaut_barrere
It's really like a world map: I think it's better to have a good overall view,
then dive in only when needed. You can literally spend weeks in a tiny part of
such codebases.

What I used when dealing with a very large codebase (>500kloc) was putting the
classes and their relationships (inheritance etc) as some kind of graph (UML,
MindMap) on a large (A3 or A0) sheet of paper.

To achieve that, I used either an existing tool (reverse engineering UML model
in C# for instance) or a home-baked approach (use reflection or even home-made
file parsing) then graphviz or similar.

In any case, you'll want to have an automated way of doing this, so you can
refine afterward (i.e. put colors on what you already visited, add a summary
extracted from comments, etc).
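For a Python codebase, the home-baked route could look roughly like the sketch below: parse each file with the standard `ast` module, collect (subclass, base) pairs, and emit Graphviz DOT text. This is only one possible approach, and it assumes Python 3.9+ (for `ast.unparse`) and that the files parse under the interpreter running the script; files that don't parse are skipped. The function names and the `visited` coloring are illustrative choices.

```python
import ast
from pathlib import Path


def class_edges(source):
    """Yield (subclass, base) pairs for every class defined in a source string."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            for base in node.bases:
                yield node.name, ast.unparse(base)


def to_dot(edges, visited=()):
    """Render edges as Graphviz DOT text; classes in `visited` are shaded."""
    lines = ["digraph classes {", "  rankdir=BT;"]
    for name in sorted(visited):
        lines.append(f'  "{name}" [style=filled, fillcolor=lightgrey];')
    for sub, base in sorted(set(edges)):
        lines.append(f'  "{sub}" -> "{base}";')
    lines.append("}")
    return "\n".join(lines)


def project_dot(root, visited=()):
    """Walk every .py file under root and build one inheritance graph."""
    edges = []
    for path in Path(root).rglob("*.py"):
        try:
            edges.extend(class_edges(path.read_text(encoding="utf-8")))
        except (SyntaxError, ValueError):
            continue  # skip files that cannot be decoded or parsed
    return to_dot(edges, visited)


if __name__ == "__main__":
    print(project_dot("."))
```

Saving the output to a file and running `dot -Tpdf classes.dot -o classes.pdf` gives something printable; re-running with an updated `visited` set is how the "color what you already looked at" refinement would work here.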

Good luck! If it's anything like it was for me, it will become fun :)

------
jey
I think the best way to get started is to have some _small_ bugs or features
to implement, and start diving in. By tackling a series of small problems
you'll get familiar with different parts of the code base as you try to figure
out why a particular buggy behavior is happening, or where to put a new
feature. Make sure you give yourself a decent amount of time as you do this;
you obviously aren't expecting to fix the bug as quickly as you would if you
already understood the code.

That said, reading and diving into existing codebases is definitely a learned
skill, and a skill that you'll get better at as you do it more and more.

~~~
togasystems
I have to dive into new codebases on a daily basis.

The best way to start wrapping your head around it is to first actually try
to use all of the features of the software.

Then as the above poster said, start fixing small bugs and adding small
features.

Be happy that there is documentation, even a small amount. Then also be joyous
that it was written in Python. It is a lot easier to read than diving into
some legacy C++ code.

------
henryfarbles
I would second some of the advice about working on a set of small bugs or
features. At my current job we have 2 million+ lines of code in a proprietary
language with absolutely no documentation! I found that working on smaller
issues gave me an opportunity to familiarize myself with the rest of the
codebase. I also drew a diagram of the class hierarchy as I went but quickly
ran out of paper. It's a daunting task but you will become familiar with the
code in no time. Good luck!

------
arethuza
Some rough process like:

\- Identify the main use cases (i.e. what the thing is actually for)

\- Work through each one until they bottom out in calls to a database,
webservice, message on a queue, file update etc.

\- Try and sketch out a rough architecture of how the "modules" in the
application hang together
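One hedged way to get a first cut at where use cases "bottom out" in a Python project is to scan each file's imports for modules that talk to the outside world. The sketch below uses the standard `ast` module; the marker set you pass in (say `sqlite3`, `urllib`, `smtplib`) is an assumption you would adjust to whatever database, web service, or queue libraries the project actually uses.

```python
import ast


def imported_modules(source):
    """Return the sorted top-level module names imported by a source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative ones are skipped
            found.add(node.module.split(".")[0])
    return sorted(found)


def external_touchpoints(sources, markers):
    """Map each file name to the marker modules it imports.

    sources: dict of file name -> source text
    markers: set of module names that indicate I/O (chosen by the reader)
    """
    result = {}
    for name, source in sources.items():
        hits = [m for m in imported_modules(source) if m in markers]
        if hits:
            result[name] = hits
    return result
```

Files that show up in the result are good candidates for the bottom layer of the rough architecture sketch; everything that imports them sits one level up.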

Once you get used to doing this it can be quite fun in a masochistic kind of
way.

------
adrianscott
neat question, thanks for asking.

i like to figure out roughly what:

\- the major modules are,

\- the communication paths between different pieces of the system / different
servers -- Drawing a picture is really helpful for this

\- key objects or function libraries

And then start with a very small upgrade; sometimes as simple as a little
html/view change.

hope this helps -- good luck!

------
hackinthebochs
Was version control used? Preferably something like Git or Hg where you can
see each changeset? Start at the beginning of the project to get a feel for
the overall structure then follow how it grew from there.
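If the project is in Git, a small sketch along these lines can replay the history oldest-first and show which top-level directories each early commit touched. It shells out to `git log` with the `--reverse`, `--name-only`, `--date=short` and `--format` options; the `@` prefix on header lines is just a marker chosen here to tell commits from file names (a file whose name starts with `@` would confuse it), and the function names are illustrative.

```python
import subprocess
from collections import Counter

# Oldest commit first; each commit is one "@hash date subject" line
# followed by the files it changed.
LOG_COMMAND = ["git", "log", "--reverse", "--name-only", "--date=short",
               "--format=@%h %ad %s"]


def parse_log(text):
    """Split the log text into (header, [changed files]) pairs."""
    commits = []
    for line in text.splitlines():
        if line.startswith("@"):
            commits.append((line[1:], []))
        elif line.strip() and commits:
            commits[-1][1].append(line.strip())
    return commits


def top_level_dirs(files):
    """Count changed files per top-level directory (or root-level file)."""
    return Counter(f.split("/")[0] for f in files)


def read_history(repo_path, limit=20):
    """Return the first `limit` commits of the repository at repo_path."""
    out = subprocess.run(LOG_COMMAND, cwd=repo_path, capture_output=True,
                         text=True, check=True).stdout
    return parse_log(out)[:limit]


if __name__ == "__main__":
    for header, files in read_history("."):
        print(header, dict(top_level_dirs(files)))
```

Reading the first few dozen entries this way tends to show which directories came first and which were bolted on later.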

