
Ask HN: How do you familiarize yourself with a new codebase? - roflc0ptic
My question is pretty straightforward: how do you, hacker news enthusiast, familiarize yourself with a new codebase? Obviously your answer is going to be contingent on the kind of work that you do.

Some background: What's motivating me to ask is that I am flirting with the idea of trying to add a couple of features to SlickGrid (https://github.com/mleibman/SlickGrid), Michael Leibman's phenomenal javascript grid widget. Unfortunately Leibman got busy and isn't actively supporting it anymore.

The codebase is something like 8k lines of javascript, so it's not ludicrously big, but I'm kind of intimidated thinking about trying to make sense of it. My first strategy is just to open up important-looking javascript files (slick.core.js, slick.grid.js) and read through for comprehension. This seems like a pretty slow way to build a mental model of the code, though. Some features I want to implement are 1. an ajax data source that doesn't require paging, and 2. frozen columns. Someone else has implemented a buggy version of frozen columns (and since abandoned the project), and I might like to use it, but I can't tell if it's buggy because it's a hard problem, or because their implementation strategy was poor (or both!). So at the moment I can't evaluate if I should implement my own, or try to fix the issues with theirs.

Picking up other people's code seems to be one of the harder tasks developers face, as evidenced by how much code gets abandoned, so I wondered if the voices of experience on here could point me in the right direction, either by talking about this problem in particular, or more generally, how you build knowledge about a new codebase.

Thanks!
======
tessierashpool
I wrote some simple bash scripts around git which allow me to very quickly
identify the most frequently-edited files, the most recently-edited files, the
largest files, etc.

[https://github.com/gilesbowkett/rewind](https://github.com/gilesbowkett/rewind)

it's for assessing a project on day one, when you join, especially for "rescue
mission" consulting. it's most useful for large projects.

the idea is, you need to know as much as possible right away. so you run these
scripts and you get a map which immediately identifies which files are most
significant. if it's edited frequently, it was edited yesterday, it was edited
on the day the project began, and it's a much bigger file than any other,
that's obviously the file to look at first.
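
A minimal sketch of the "most frequently-edited files" idea, written in Python rather than bash. It is not taken from the rewind scripts; it only assumes you feed it the text produced by `git log --name-only --pretty=format:`, which lists the paths touched by each commit. The function name and the sample paths are made up for illustration.

```python
from collections import Counter


def rank_by_edit_count(git_log_output: str, top: int = 10) -> list[tuple[str, int]]:
    """Count how often each path appears in the log text and return the top entries."""
    # Every non-blank line is one path touched by one commit.
    paths = [line.strip() for line in git_log_output.splitlines() if line.strip()]
    return Counter(paths).most_common(top)


# Hypothetical log text: three commits, separated by blank lines.
sample_log = """
src/grid.js
src/core.js

src/grid.js
README.md

src/grid.js
"""

for path, edits in rank_by_edit_count(sample_log):
    print(edits, path)
```

In a real checkout you would capture the log text first, for example with `subprocess.run(["git", "log", "--name-only", "--pretty=format:"], capture_output=True, text=True).stdout`.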

we tend to view files in a list, but in reality, some files are very central,
some files are out on the periphery and only interact with a few other files.
you could actually draw that map, by analyzing "require" and "import"
statements, but I didn't go that far with this. those vary tremendously on a
language-by-language basis and would require much cleverer code. this is just
a good way to hit the ground running with a basic understanding which you will
very probably revise, re-evaluate, or throw away completely once you have more
context.
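
For a rough idea of what analyzing "require" statements could look like, here is a small Python sketch. It assumes CommonJS-style `require('...')` calls with string literals only, which is a simplification; the file names and contents are hypothetical, and a real tool would need per-language parsing as noted above.

```python
import re
from collections import Counter

# Matches require('x') or require("x") with a plain string literal argument.
REQUIRE_RE = re.compile(r"""require\(\s*['"]([^'"]+)['"]\s*\)""")


def build_dependency_map(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each file name to the set of module paths it requires."""
    return {name: set(REQUIRE_RE.findall(text)) for name, text in sources.items()}


def fan_in(graph: dict[str, set[str]]) -> Counter:
    """Count how many files depend on each module (a crude centrality measure)."""
    counts = Counter()
    for targets in graph.values():
        counts.update(targets)
    return counts


# Hypothetical file contents, keyed by file name.
sample_sources = {
    "app.js": "var grid = require('./grid');\nvar core = require(\"./core\");",
    "grid.js": "var core = require('./core');",
    "core.js": "// no dependencies",
}

graph = build_dependency_map(sample_sources)
print(fan_in(graph).most_common())
```

Modules with high fan-in are the "central" files; modules with none are on the periphery.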

but to answer your actual question, you do some analysis like this every time
you go into an unfamiliar code base. you also need to get an idea of the basic
paradigms involved, the coding style, etc. -- stuff which would be much harder
to capture in a format as simple as bash scripts.

one of the best places to start is of course writing tests. Michael Feathers
wrote a great book about this called "Working Effectively with Legacy Code."
brudgers's comment on this is good too but I have some small disagreements
with it.

~~~
fokz
Thanks for sharing. I often get lost in large projects. Blindly jumping around
is quite inefficient and frustrating.

How hard do you think it is to write a tool to draw a dependency map for a
specific language?

Maybe there are built-in code analysis tools in compilers for popular
languages that I'm not aware of?

~~~
h8liu
For golang std lib, since packages have no circular deps, it can be
automatically drawn out like this:

[http://lonnie.io/gostd/dagvis/](http://lonnie.io/gostd/dagvis/)

If your project has no circular deps even among files, and all files are
small (like mine,
[https://github.com/h8liu/e8vm](https://github.com/h8liu/e8vm)), you can draw
a similar graph but at a much finer granularity, like this:

[http://8k.lonnie.io/](http://8k.lonnie.io/)
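
Why the absence of circular deps matters can be sketched in a few lines of Python: an acyclic graph can always be split into layers where each layer depends only on earlier ones. This is not how the linked tools are implemented (that is not stated here); the package names below are invented.

```python
def layer_dag(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into layers; layer 0 has no deps, later layers depend only on earlier ones."""
    remaining = {node: set(d) for node, d in deps.items()}
    # Make sure every dependency is also a node, even if it was not listed as a key.
    for d in list(remaining.values()):
        for name in d:
            remaining.setdefault(name, set())
    layers = []
    while remaining:
        # Nodes whose dependencies have all been placed already.
        ready = sorted(n for n, d in remaining.items() if not d)
        if not ready:
            raise ValueError("circular dependency detected")
        layers.append(ready)
        for n in ready:
            del remaining[n]
        for d in remaining.values():
            d.difference_update(ready)
    return layers


# Hypothetical package graph: package -> packages it imports.
sample_deps = {
    "app": {"net", "util", "log"},
    "net": {"util", "base"},
    "util": {"base"},
    "log": {"base"},
    "base": set(),
}

for depth, names in enumerate(layer_dag(sample_deps)):
    print(depth, names)
```

With a cycle in the input the loop finds no "ready" node and raises, which is exactly the case where a clean layered drawing is not possible.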

~~~
baby
wow! What did you use to make these? It looks awesome!

~~~
h8liu
Thanks.

Here are my hand-made tools to generate those:
[https://github.com/h8liu/e8tools](https://github.com/h8liu/e8tools)

Javascript version (I actually wrote this first for go std lib):
[https://github.com/h8liu/dagvis](https://github.com/h8liu/dagvis)

------
scott_s
A post from last year, "Strategies to quickly become productive in an
unfamiliar codebase":
[https://news.ycombinator.com/item?id=8263402](https://news.ycombinator.com/item?id=8263402)

My comment from that thread:

I do the deep-dive.

I start with a relatively high level interface point, such as an important
function in a public API. Such functions and methods tend to accomplish easily
understandable things. And by "important" I mean something that is fundamental
to what the system accomplishes.

Then you dive.

Your goal is to have a decent understanding of how this fundamental thing is
accomplished. You start at the public facing function, then find the actual
implementation of that function, and start reading code. If things make sense,
you keep going. If you can't make sense of it, then you will probably need to
start diving into related APIs and - most importantly - data structures.

This process will tend to have a point where you have dozens of files open,
which have non-trivial relationships with each other, and they are a variety
of interfaces and data structures. That's okay. You're just trying to get a
feel for all of it; you're not necessarily going for total, complete
understanding.

What you're going for is that Aha! moment where you can feel confident in
saying, "Oh, that's how it's done." This will tend to happen once you find
those fundamental data structures, and have finally pieced together some
understanding of how they all fit together. Once you've had the Aha! moment,
you can start to trace the results back out, to make sure that is how the
thing is accomplished, or what is returned. I do this with all large codebases
I encounter that I want to understand. It's quite fun to do this with the
Linux source code.

My philosophy is that "It's all just code", which means that with enough
patience, it's all understandable. Sometimes a good strategy is to just start
diving into it.

~~~
bentcorner
I find it frustrating that language features work actively against you when
you're trying to understand something.

Wide inheritance and macro usage are probably the worst. Good naming can aid
understanding, but basic things like searchability are harmed by this.

Of those two, macros are the most trouble. You can't take anything for
granted, and must look at every expression with an eye for detail. Taking
notes becomes essential.

------
JustSomeNobody
1\. I make sure I can build and run it. I don't move past this step until I
can. Period.

After that, if I don't have a particular bug I'm looking to fix or feature to
add, I just go spelunking. I pick out some interesting feature and study it. I
use pencil and paper to make copious notes. If there's a UI, I may start
tracing through what happens when I click on things. I do this, again with
pencil and paper first. This helps me use my mind to reason about what the
code is doing instead of relying on the computer to tell me. If I'm working on
a bug, I'll first try and recreate the bug. Again, taking copious notes in
pencil and paper documenting what I've tried. Once I've found how to recreate
it, I clean up my notes into legible recreate steps and make sure I can
recreate it using those steps. These steps are later included in the bug
tracker. Next I start tracing through the code taking copious notes, etc, etc.
yada yada. You get the picture.

------
monk_e_boy
Debugger! Surprised no one has mentioned it yet. I work in js and php, both of
which I use the debugger a _lot_.

Set a breakpoint, burn through the code. Chrome has some really nice features
- you can tell it to skip over files (like jQuery) you can open the console
and poke around, set variables to see what happens.

Stepping through the code line by line for a few hours will soon show you the
basics.

~~~
shogun21
What debugger do you use for PHP? I've yet to find one I really like.

~~~
spdionis
Xdebug is de facto the only debugger AFAIK. The integration in Phpstorm is
great.

------
kabdib
I just crack open the source base with Emacs, and start writing stuff down.

I use a large format (8x11 inch) notebook and start going through the
abstractions file by file, filling up pages with summaries of things. I'll
often copy out the major classes with a summary of their methods, and arrows
to reflect class relationships. If there's a database involved, understanding
what's being stored is usually pretty crucial, so I'll copy out the record
definitions and make notes about fields. Call graphs and event diagrams go
here, too.

After identifying the important stuff, I read code, and make notes about what
the core functions and methods are doing. Here, a very fast global search is
your friend, and "where is this declared?" and "who calls this?" are best
answered in seconds. A source-base-wide grep works okay, but tools like Visual
Assist's global search work better; I want answers _fast_.

Why use pen and paper? I find that this manual process helps my memory, and I
can rapidly flip around in summaries that I've written in my own hand and fill
in my understanding quite quickly. Usually, after a week or so I never refer
to the notes again, but the initial phase of boosting my short term memory
with paper, global searches and "getting my hands to know the code" works
pretty well.

Also, I try to get the code running and fix a bug (or add a small feature) and
check the change in, day one. I get anxious if I've been in a new code base
for more than a few days without doing this.

~~~
rymndhng
Totally agree with the point of pen/paper.

Something that complements that approach is in-code annotation. Recently, I've
been trying out
[https://github.com/bastibe/annotate.el](https://github.com/bastibe/annotate.el)
which is pretty sweet. Check it out!

~~~
john2x
Off topic, but anyone know what font and theme (it looks like the default
theme but I'm not sure) are used in the project's screenshots?

~~~
systemfreund
The font is PragmataPro, which I am also using. Best font ever, but expensive.

------
agentgt
There is a significant number of answers that may interest you on
Stackoverflow. Specifically:
[http://stackoverflow.com/questions/215076/whats-the-best-way...](http://stackoverflow.com/questions/215076/whats-the-best-way-to-become-familiar-with-a-large-codebase)

I do two things to familiarize myself with a code base. The first is to look
at how the data is stored. Particularly if it's using a database with
well-named tables, I can get some rough ideas of how the system works. Then
from there I look at other data objects. Data is easier to understand than
behavior.

The other is watching the initialization process of the application with a
debugger or logger. Along those lines, if you're lucky (my opinion) and the
application uses dependency injection of some sort, you can look to see how the
components are wired together. Generally there is an underlying framework to
how code pieces work together, and that generally reveals itself in the
initialization process if it's not self-evident.

------
bite_victim
Side rant:

I just cannot believe people praising 'Unit Test'-ing. Fellow programmers, how
exactly do you unit test a method / function which draws something on the
canvas for example? You assert that it doesn't break the code?!

I see some really talented people out there who write unit tests as proof that
their code works without issues, that it's awesome and it cooks eggs and bacon
etc. They write such laughable tests you cannot even tell if they are joking
or not. They test if the properties / attributes they are using in methods are
set or not at various points in the setup routine. Or if some function is
being called after an event is being triggered.

My point is this: unit testing can only cover such tiny, tiny scenarios and
mostly logic stuff that it is almost useless in understanding what is going on
in the big picture. Take for example a backbone application like the Media
Manager in WordPress. Please tell me how somebody can even begin to unit test
something like that.

Unit testing is a joke. And sometimes a massive time consuming joke with a
fraction of a benefit considering the obvious limitation(s).

~~~
fourier
We've used an image comparison tool, which produces a pixel-wise diff against
the expected image, to verify exactly this kind of thing (we've been developing
a rendering tool). In addition to this, unit tests combined with coverage tools
allow you to find potential problems, crashes, etc. in your code. Different
levels of testing are for different things; unit tests are just one of the
pieces in the equation.

Your point is just for some tiny tiny scenarios of the software you are
working on.

You don't need to think about 'how could I write a unit test', you need to
think about how could you improve the quality of the code, and unit tests are
just one of your tools available to solve this problem.
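
To make the pixel-wise diff idea concrete, here is a minimal Python sketch. It is not the tool described above; it assumes images are already decoded into rows of `(r, g, b)` tuples, and the `tolerance` parameter is an added assumption to absorb small rendering differences.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def pixel_diff(expected: Image, actual: Image, tolerance: int = 0) -> list[tuple[int, int]]:
    """Return the (x, y) coordinates where any channel differs by more than tolerance."""
    same_shape = len(expected) == len(actual) and all(
        len(row_e) == len(row_a) for row_e, row_a in zip(expected, actual)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    mismatches = []
    for y, (row_e, row_a) in enumerate(zip(expected, actual)):
        for x, (pe, pa) in enumerate(zip(row_e, row_a)):
            # Compare channel by channel and keep the largest difference.
            if max(abs(a - b) for a, b in zip(pe, pa)) > tolerance:
                mismatches.append((x, y))
    return mismatches


# Hypothetical 2x2 reference image and a rendering with one wrong pixel.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
expected_image = [[WHITE, WHITE], [WHITE, BLACK]]
actual_image = [[WHITE, WHITE], [BLACK, BLACK]]

print(pixel_diff(expected_image, actual_image))
```

A drawing test then asserts that the list of mismatches is empty for a known-good reference image.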

~~~
bite_victim
That is pretty awesome that you've written such a tool (although I can only
imagine how long it took to create and how it affected the project time
frame).

From a web developer's mind: the cool thing is that the tool can be further
developed and taken in new directions. For example, implement the capability to
take snapshots of pages and see if they've changed in layout and notify the
user of changes (pretty awesome for scraping).

I totally agree, unit testing is such a small cog in the wheel of software
quality that it is truly a shame how something like this takes all the scene.

------
Mithaldu
This may or may not apply to you, since i work with Perl. Typically i'm in a
situation where i'm supposed to improve on code written by developers with
less time under their belt.

As such my first steps are:

1\. tidy/beautify all the code in accordance with a common standard

2\. read through all of it, while making the code more clear (split up
if/elsif/else christmas trees, make functions smaller, replace for loops with
list processing)

While doing that i add todo comments, which usually come with questions like
"what the fuck is this?" and make myself tickets with future tasks to do to
clean up the codebase.

By the end of it i've looked at everything once, got a whole bunch of stuff to
do, and have at least a rough understanding of what it does.

~~~
lukaslalinsky
Please don't take this as a criticism, but how long have you been programming?
I'm asking because I used to have an opinion like this when I was just
starting, but after a few years I realized that changing all of the code as
the first thing is one of the worst things to do.

~~~
Mithaldu
For pay since 2005. Do keep in mind that the code i am working with usually
has some sort of test suite available, and that over time i have become very
good at transforming code between different forms of expression without
changing the effects it causes. (Excluding memory use and performance, which
is not something one usually has to consider much in Perl.)

~~~
lukaslalinsky
Ah, so long enough for the advice to be based on real experience.

I was really surprised to see it, because it's exactly the way I was learning
C. I switched to Linux around the same time, so I'd take some abandoned DOS
program that had source code available and port it to Linux. During the
process, I'd read the entire source code, make sure I understood it and
reformatted everything. A few years later, the original author of one of the
programs released a new version and thanks to my reformatting, it was pretty
much impossible to merge. After a few similar experiences at work, I have
decided to always stick with the original style of any code I touch, because
any unnecessary changes are just going to make life harder for me in the
future.

~~~
Mithaldu
Three things to keep in mind here:

1\. Perl is MUCH more concise than C, since we have institutionalized code
sharing (see CPAN), whereas most C devs i know (and maybe i don't know too
good ones) seem to at most reuse code others wrote by way of ctrl+c/ctrl+v.

2\. Perl's reformatting tools are automatic. I just have a little config file
that says how long my lines are supposed to be and where i'd like the spaces
on my parens (inside, between the parens and arguments) and then i hit ctrl+e
and boom it's done. If i need to do it to many files, find + perltidy. In your
case i would've just taken his new version, automatically formatted everything
in less than 5 minutes, and merged on top of that.

3\. When i do this, i'm doing it with team lead consent and team consensus, in
an authority position, not with random code maintained by people i never even
talked to. :)

~~~
mercurial
The big problem (apart from the fact that ill-formatted codebases often have
much more serious problems...) is that reformatting is an excellent way of
messing up your VCS' diff ability, which is extremely precious in trying to
understand why things have been done the way they have.

------
vineet
I studied a lot of people doing this as part of my PhD. The thing is that
there are not many answers that work well in a lot of situations. Given that
though, my suggestion is to iterate on developing three views of the code:

1\. The Mile High View: A layered architectural diagram can be really helpful
to know how the main concepts in a project are related to one another.

2\. The Core: Try to figure out how the code works with regards to these main
concepts. Box and arrow diagrams on paper work really well.

3\. Key Use Cases: I would suggest tracing at least one key use case for your
app.

------
jpgvm
I usually work on more traditional command line applications and daemons so my
approach might be a little different to a web developer.

I always start by gauging how much source code there is and how it's
structured. The *nix utility "tree" and the source code line counter "cloc"
are usually the first 2 things I run on a codebase. This tells me what
languages the application uses, how much of each, how well commented it is
and where those files are.
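
If "cloc" is not installed, a crude stand-in is easy to sketch in Python. This only counts raw lines per file extension; unlike cloc it does not separate comments or blank lines from code, and the `(none)` label for extension-less files is an arbitrary choice made here.

```python
import os
from collections import Counter


def lines_by_extension(root: str) -> Counter:
    """Walk a directory tree and total the raw line count per file extension."""
    totals = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata so it does not skew the numbers.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            ext = os.path.splitext(name)[1] or "(none)"
            try:
                with open(os.path.join(dirpath, name), "rb") as f:
                    totals[ext] += sum(1 for _ in f)
            except OSError:
                continue  # unreadable file: ignore it
    return totals


if __name__ == "__main__":
    for ext, count in lines_by_extension(".").most_common():
        print(f"{count:8d}  {ext}")
```

Run it from the top of the checkout to get a quick picture of which languages dominate.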

The next thing I usually do is find the entry point of the program. In my case
this is usually an executable that calls into the core of the library and sets
up the initial application state and starts the core loop and routine that
does the guts of the work.

Once I have found said core routine I try to get a grasp of what the state
machine of the program looks like. If it's a complicated program this step
takes quite a while but is very important for gaining an intuitive
understanding of how to either add new features or fix bugs. I like to use my
pen and paper to help me explore this part as I often have to back track over
source files and re-evaluate what portions mean.

Once I have what I think is the state machine worked out I like to understand
how the program takes input or is configured. In the case of a daemon that
often means understanding how configuration files are loaded and how the
configuration is represented in memory. Important to cover here is how default
values are handled etc. I actually prioritise this over exploring the core
loop's ancillary functions (the bits that do the "real" work) as I find it hard
to progress to that stage without understanding how the initial state is
setup.

Which brings us to said "real" work. Hanging off of the core loop will be all
the functions/modules that are called to do the various parts of the program's
function. By this time you should already know what these do even if you don't
know how they work. Because you already have a good high level understanding
at this point you can pick and choose which modules you need to cover and when
to cover them.

------
droppedasakid
Whatever your IDE/editor of choice is, I think having these three functions
is critical to learning a new codebase, or even developing for that matter:

1\. Go to definition

2\. Find all references

3\. Navigate back

This allows you to go down any code rabbit hole, figure stuff out, then get
back to where you were. If you can't do those things it will take much longer
to understand how things are interconnected.
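
What "go to definition" and "find all references" do under the hood can be approximated with a syntax tree. The sketch below uses Python's standard `ast` module on Python source purely as an illustration; editors use richer, language-specific indexes, and this version only sees plain names, not attribute access such as `obj.render`. The sample source is invented.

```python
import ast


def find_definitions(source: str, name: str) -> list[int]:
    """Line numbers where a function or class called `name` is defined."""
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, kinds) and node.name == name
    )


def find_references(source: str, name: str) -> list[int]:
    """Line numbers where the plain name `name` is read (for example, called)."""
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name) and node.id == name and isinstance(node.ctx, ast.Load)
    )


# Hypothetical module to search.
SAMPLE = '''\
def render(row):
    return str(row)

def draw(rows):
    return [render(r) for r in rows]

print(draw([1, 2]))
print(render(3))
'''

print("defined on lines:", find_definitions(SAMPLE, "render"))
print("referenced on lines:", find_references(SAMPLE, "render"))
```

"Navigate back" is then just a stack of the locations you jumped from.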

~~~
SloopJon
Absolutely. In Emacs, I depend heavily on etags and the occasional rgrep to
find my way around a fairly large project written mostly in C.

I haven't dealt with a JavaScript project large enough that I've bothered
setting up tags, but I imagine something similar is available.

------
gshx
I start with running the tests if there are any. Typically peeling layers of
the onion starting with the boundary. If there are no tests, then I'll try to
write them. Then running tests in debug mode helps step through the code. If I
have the luxury of asking questions to an engineer experienced with the
codebase, I request a high level whiteboarding session all the while being
cognizant of their time.

Some others have mentioned recency/touchTime as another signal. For large
complex codebases, that may or may not always work.

------
brudgers
When you think you understand something write a test and test your belief. If
the test passes then both your knowledge and the code base are better for it.
If the test fails then rewrite the test to the failure and write another test.
Again you will know more and the code base will be better.
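
As an illustration of "rewrite the test to the failure", here is a small Python `unittest` sketch. `legacy_format_price` is a made-up stand-in for some unfamiliar function in the code base; the second test records what the code actually does after an initial belief turned out to be wrong.

```python
import unittest


def legacy_format_price(cents):
    """Hypothetical stand-in for an unfamiliar function found in the code base."""
    dollars, rest = divmod(cents, 100)
    return "$%d.%02d" % (dollars, rest)


class TestMyBeliefs(unittest.TestCase):
    def test_whole_dollars(self):
        # Belief: 500 cents formats as five dollars. This one held up.
        self.assertEqual(legacy_format_price(500), "$5.00")

    def test_negative_amounts(self):
        # Initial belief was "-$0.50". The test failed, so it was rewritten to
        # record the actual behavior, which is now documented and pinned down.
        self.assertEqual(legacy_format_price(-50), "$-1.50")
```

Run it with `python -m unittest` from the directory containing the file. The next test to write would probe why negative amounts behave this way and whether any caller relies on it.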

Good luck.

~~~
tessierashpool
I feel your comment should be the top one. but I disagree with this bit:

> the code base will be better.

I'd change "will be" to "might get." because this is true if you're doing unit
tests that the code base can use. but sometimes you do characterization tests,
which are not worth keeping around. or you might build a couple variations on
"hello world" with the unfamiliar code base, just to be sure that it works the
way you think it does.

~~~
brudgers
When writing a test exacerbates a too-many-tests problem, it's both a rare and
good problem to have because reading and running such tests is a more
expeditious route to understanding than writing tests in the blind.

------
nissimk
I agree with what many others on here have said. It's also a personal thing.
In general I like to try to force myself to learn only the minimum required to
do what I need to do. If that philosophy sounds good to you, I would recommend
taking the buggy version of frozen columns and try to fix the bugs. You may
learn that the bugs are structural and you need to implement it differently,
or you might be able to fix it with minimal changes. You will certainly get an
understanding of the parts of slickgrid that you need to interact with to add
this feature.

For the ajax data source thing, I would try to modify or extend the existing
data source code to add the behavior you are looking for. As you mess around
with it trying to figure out what you need to change, you will encounter the
areas of the code that you need to understand.

With this sort of strategy you can avoid having to fully understand all the
code while still being able to modify it. You might end up implementing stuff
in a way which is not the best, but you will probably be able to implement it
faster. It's the classic technical debt dilemma: understanding the complete
codebase will allow you to design features that fit in better and are easier
to maintain and enhance, but it will take a lot longer than just hacking
something together that works.

------
fourier
I work a lot with huge legacy codebases in C/C++. Here is some
advice:

1\. Be sure that you can compile and run the program

2\. Have good tools to navigate around the code (I use git grep mostly)

3\. Most apps contain some user or other service interaction - try to
get some easy bit (like a request for capabilities or some simple operation) and
follow it until the end. You don't need a debugger for it - grep/git grep is
enough; these simple tools will force you to understand the codebase deeply.

4\. Sometimes writing UML diagrams works -

\- Draw the diagrams (class diagrams, sequence diagrams) of the current state
of things

\- Draw the diagrams with a proposal of how you would like to change

5\. If it is possible, use a debugger, start with the main() function.

~~~
PirateDave
_4\. Sometimes writing UML diagrams works_

I found myself doing this more often and find it very useful. I've been using
Freemind and it seems to do the trick.
[http://freemind.sourceforge.net/wiki/index.php/Main_Page](http://freemind.sourceforge.net/wiki/index.php/Main_Page)

~~~
fourier
Yes this could work for personal things. But often you need to present
your understanding of the current state of affairs and propose your ideas, so
UML is a good tool for it - everybody understands it or could learn it in
about half an hour.

------
Sakes
I wish I had a better answer, but I honestly just stumble around it. I
typically start by trying to understand how they structured their files, then
I'll start diving into the code. I wouldn't try to "understand" it completely.
Just look over it until you feel comfortable enough to try to make some
modifications.

Michael's code looks clean and well organized. Shouldn't be terribly difficult
for someone proficient at JS.

~~~
amorphid
Glad I'm not the only one who stumbles around :)

Another thing I do is try and replicate one core feature of the thing I'm
trying to understand. Like others have suggested, I like using debuggers.
Recently, I wondered how debuggers work. So I built the following to find out.

[https://github.com/amorphid/rails-debugger_example](https://github.com/amorphid/rails-debugger_example)

------
eterm
My approach is to break stuff. If I can break it (and I am good at finding
bugs, so I usually can) then I now have a narrow focus which helps me avoid
getting "lost" in the code base.

Once I've found and fixed a few things, or if the code base is so
small or clean that I can't find bugs to fix, I'll set about hacking in the
feature I'd like.

I usually start by doing it in the most hacky way possible. That sounds like a
bad approach but it narrows the search of how to implement it and means I'm
not constraining myself to fit the code base that I don't yet appreciate.

In hacking that feature I'll often break a few things through my carelessness.
In then trying to alter my hacked approach so it no longer breaks stuff I'll
become more aware of the wider code base from the point of view of my initial
narrow focus. This lets me build up the mental model.

Eventually I'll be comfortable enough I can re-write the feature in a way more
consistent with the wider code base.

I don't normally start by trying to "read all the code" because that
guarantees I won't understand much of it (I'm not quick at picking up function
from code). I might have a skim if it is well organised, but I find the
"better" written a lot of stuff is, the harder it is to grok what it is
actually doing from reading it. To me, reading good code is often like trying
to read the FizzBuzz Enterprise Edition[1].

I've worked on many legacy systems: I was last year implementing new features
into a VB6 code base, this year (at a different job) I am helping migrate from
asp webforms to a more modern system. I've found that starting with trying to
fix an issue to be the best way to dive into the code base.

Use good source control so you're never "worried" about changing anything or
worrying that you might lose your current state. Commit early, commit often,
even when "playing around".

[1]
[https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpris...](https://github.com/EnterpriseQualityCoding/FizzBuzzEnterpriseEdition)

------
jdefr89
I tend to use a hybrid approach, but in general I try to identify the entry
point of the code, which will lead me to the core data structures and possibly
event loops that act as a central hub for any other code that is called. That
is, I look for some kind of dispatch pattern that integrates the rest of the
system, routing and calling different code when needed. Once you identify this
"hub" you will have a good mental model of the system and its high level
components. From there you can delve into different subsystems and slowly
tweak and make changes to be sure a code path does what you conjecture it may
do. Using a debugger is helpful at certain points to explore the depth of the
code. When you can get a small tweak working as expected you probably have a
decent starting model of the code base that you can easily add to.

------
spion
Another thing that is helpful, especially if you don't even have knowledge of
the problem domain of the codebase: Write a glossary.

As you read the code and encounter terms/words you don't know, write them
down. Try to explain what they mean and how they relate to other terms. Make
it a hyperlinked document (markdown #links plus headings on github works
pretty well), that way you can constantly refresh your memory of previous
items while writing.

Items in the glossary can range from class names / function names to datatype
names to common prefixes to parts of the file names (what is `core`? what
belongs there?)

Bonus: parts of the end result can be contributed back to the project as
documentation.
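
One possible way to keep such a glossary hyperlinked without editing links by hand is to generate the markdown from a plain dictionary. The Python sketch below is only one option: the `[[term]]` cross-reference syntax is invented here, the anchor rule in `slug` is a simplification of how GitHub derives heading anchors, and the sample entries are made up.

```python
import re


def slug(term: str) -> str:
    """Simplified heading anchor: lowercase, non-alphanumerics collapsed to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", term.lower()).strip("-")


def render_glossary(entries: dict[str, str]) -> str:
    """Render entries as markdown headings, turning [[term]] into #anchor links."""
    lines = ["# Glossary", ""]
    for term in sorted(entries, key=str.lower):
        # Replace each [[other term]] with a markdown link to that heading.
        text = re.sub(
            r"\[\[([^\]]+)\]\]",
            lambda m: f"[{m.group(1)}](#{slug(m.group(1))})",
            entries[term],
        )
        lines += [f"## {term}", "", text, ""]
    return "\n".join(lines)


# Hypothetical entries written while reading an unfamiliar grid widget.
sample_entries = {
    "viewport": "The visible part of the grid. Rows outside it live in the [[row cache]].",
    "row cache": "Rendered rows kept around so the [[viewport]] can scroll quickly.",
}

print(render_glossary(sample_entries))
```

Regenerating the file after each reading session keeps the cross-links consistent as terms are added.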

------
aleem
Some good pointers and links here, surprisingly they miss both my favourite
approaches.

1\. If it's on Github, find an issue that seems up your alley and check the
commits against it. Or the commit log in general for some interesting commits.
I often use this approach to guide other devs to implement a new feature using
nothing more than a previous commit or issue as a reference and starting
point.

2\. Unit tests are a great way to get jump started. They function as a
comprehensive examples reference, with both simple and complex examples and
workflows. Not only will they contain API examples, but they will also let you
experiment with the library using the unit test code as a sandbox.

------
jameshart
For clientside JavaScript, one useful way in is to run the Chrome profiler on
it. That will produce a treeview of the calling hierarchy, and give you an
idea of what are the code's 'hotspots' \- the functions that are called from
everywhere, or the functionality which dispatches everything.

This can be especially useful for event driven code (looks like SlickGrid is
jQuery-based, so that definitely applies here); you can start a recording
profile, carry out the action you're interested in, then stop recording, and
you can then find out exactly which anonymous function is handling that
particular click or scroll or drag.

------
etagwerker
This is usually how I do it for libraries:

* Read the README.

* Install it and start using it with a couple of sample cases. That will give you an idea of what it does.

* Read the test suite. This will give you a better idea of what the library does.

* Look at the directory structure. This should tell you where things are.

* Start reading the core files.

* Start looking at open issues. Try to solve one by adding a test and changing the code.

* Submit a pull request.

------
barnacs
I think a top-down approach is pretty much the only way to do it: Start at a
high level of abstraction: packages, modules, namespaces, etc and their
relations. Pick one that seems related to some core functionality or central
to the change you intend to make and dive deeper: interfaces and data
structures within that unit and possibly other related units they depend on.
Ideally, up to this point you shouldn't even have to worry about function
definitions and algorithms, just declarations, types and relations.

While static typing helps a lot with this kind of exploration and navigation,
I don't know of any IDEs or other tooling for any language that would really
help you with it. Sure, you can probably generate UMLs or something, but it
usually requires some additional tool and the output is pretty static. You
can't just zoom in from a package-level view to an interface-level and then
keep zooming until you are eventually shown line-by-line implementation of a
specific function.

I've been thinking about this lately, and I've come to the conclusion that the
way we think and reason about code is pretty far from the way our tools
present it to us. I tend to think in terms of various levels of abstraction and
relations between units, yet the tools just show me walls of text in some file
system structure (that may or may not mirror the abstractions) and hardly any
relationships.

------
mavidser
Well, I'm not very good at this either but here's what I do. I usually work on
modular projects where there are hundreds of files in the project. I usually
skip directly to locating the file where I've to make amends (using a lot of
grep. grep for function and object definitions, grep for usage patterns, grep
for checking how to implement something). Thus, I learn about the codebase as
I go along.

Sure, this is not the best practice, and unsuitable for many, but it's what
works for me.
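
A minimal sketch of that grep-driven workflow in Python, for cases where you want definitions and usages separated in one pass. The function name `find_symbol` and the regular expressions are assumptions made for illustration; they only recognise a few common JavaScript definition shapes and will miss others.

```python
import re
from pathlib import Path


def find_symbol(root, name, suffixes=(".js",)):
    """Return (definitions, usages) of `name` as lists of (path, line_no, text)."""
    # Hypothetical heuristic: a "definition" is `function name(`, `name = function`
    # or `name: function`; any other line mentioning the name counts as a usage.
    definition = re.compile(
        r"function\s+{0}\s*\(|\b{0}\s*[:=]\s*function\b".format(re.escape(name))
    )
    mention = re.compile(r"\b{0}\b".format(re.escape(name)))
    definitions, usages = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(errors="replace").splitlines()
        for line_no, text in enumerate(lines, start=1):
            if definition.search(text):
                definitions.append((str(path), line_no, text.strip()))
            elif mention.search(text):
                usages.append((str(path), line_no, text.strip()))
    return definitions, usages
```

Calling `find_symbol(".", "render")` in a checkout would give one list to read first (where the thing is defined) and one to skim afterwards (how it is used).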

~~~
jdavis703
If this is your modus operandi your life will be much improved with Ack. It's
specifically designed for searching codebases.

~~~
harrumph
Amen. Once you've had ack, you never go back.

~~~
rane
Isn't ag a better option these days?

------
geertj
My typical workflow for checking out new open source projects:

\- find . -type f

\- find . -name '*.ext' | wc -l (get an idea of complexity)

\- git log (is this thing maintained?)

\- find . -name '*.ext' | ctags -L -

\- find main entry points, depending on platform and language

\- vim with NERDTree enabled

\- CTRL-], CTRL-T to jump around and browse tags in vim

Generally a lot of find, grep and vim gets me started.
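
The counting steps above can be sketched in Python when you want a per-extension summary instead of one number. This is only an illustration: `survey` is a made-up name, and skipping `.git` is an assumption about what you want excluded.

```python
from collections import Counter
from pathlib import Path


def survey(root):
    """Count files and lines per file extension under `root`, skipping .git."""
    files, lines = Counter(), Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        ext = path.suffix or "(none)"
        files[ext] += 1
        # Read as bytes so binary or oddly encoded files do not raise.
        with path.open("rb") as handle:
            lines[ext] += sum(1 for _ in handle)
    return files, lines
```

`files.most_common()` on the first result then shows which languages dominate the project, which is a rough proxy for where the complexity lives.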

------
shawnps
Get it running locally and then see what happens when you delete some stuff,
especially stuff that you don't understand when reading through the code.

------
ignoramous
I work on AOSP, which is a fairly large code base. During the early years,
the documentation on the internals of Android was close to non-existent.
Plenty of tutorials in Mandarin/Cantonese, but not many in English.

A good way to get the hang of the code base was to read it (usually using a tool
like sourcegraph [0], pfff [1], OpenGrok [2], doxygen [3], javadocs [4]).
A lot of people have argued that code is not to be treated like
literature [5], but in this case, there was no choice.

The second step was to check whether my assumptions about what the code does
were correct. This is usually achieved by adding log statements, writing sample
apps, and debugging in general.
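
As a rough illustration of the log-statement step, here is a small Python decorator that logs every call and result of a function you are unsure about. The names `traced` and `clamp` are hypothetical; in practice you would wrap an existing function from the code base rather than define a new one.

```python
import functools
import logging

log = logging.getLogger("explore")


def traced(func):
    """Log each call to `func` with its arguments and its result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        log.debug("return %s -> %r", func.__name__, result)
        return result
    return wrapper


# Hypothetical function under study. For existing code you could instead
# rebind it, e.g. `module.parse = traced(module.parse)`.
@traced
def clamp(value, low, high):
    return max(low, min(value, high))
```

With `logging.basicConfig(level=logging.DEBUG)` enabled, each call prints what went in and what came out, so a wrong assumption shows up as a surprising log line.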

Repeat the steps above, over and over again.

Checklist:

1\. No matter what you do, you absolutely need to document everything you
understand / misunderstand about the code base.

2\. Never underestimate the value of having a different pair of eyes look at code
you have a hard time reasoning about.

3\. Be in constant search for resources (like books, blogs) available on the
code / topic of your interest. You'd learn an amazing amount by reading through
other people's analysis. Stack Overflow is a great start. Heck, you can even
ask well thought-out questions on Quora/Stack Overflow.

4\. Hang out on related IRC channels / community mailing lists. For things
written in esoteric languages such as OCaml, I found these to be pretty
helpful.

5\. You could blog about it, share the information you know over email lists,
set up wikis; and people who know better would correct you. It's a win-win.

Good luck.

[0] [http://sourcegraph.com/](http://sourcegraph.com/)

[1] [https://github.com/facebook/pfff](https://github.com/facebook/pfff)

[2]
[https://opengrok.github.io/OpenGrok/](https://opengrok.github.io/OpenGrok/)

[3]
[http://www.stack.nl/~dimitri/doxygen/](http://www.stack.nl/~dimitri/doxygen/)

[4] [http://www.oracle.com/technetwork/articles/java/index-
jsp-13...](http://www.oracle.com/technetwork/articles/java/index-
jsp-135444.html)

[5] [http://www.gigamonkeys.com/code-
reading/](http://www.gigamonkeys.com/code-reading/)

------
awinder
You got a little bit lucky with this project because there's a decently built-
out test suite. I would start by digesting the tests because if they're good,
you'll be able to see the mechanics about how the exposed interfaces in the
code work, and this should also give you a good idea if changes you're making
are breaking the expected workflow or not.

From my experience, there are really two ways that learning a new codebase can
happen. One is that there's an existing test suite that's fairly
comprehensive, and you can learn a lot by examining the tests, making changes
to add features / make bug fixes, and then validate that work by rerunning the
tests and adding new ones. That's really a great place to be as someone
unfamiliar with a new codebase. The other is that there are no tests, and you
inevitably need to rely on people familiar with the code, and make peace with
the idea that you're going to write bad code that breaks things as you learn
the depth of how the project works.

------
lukaslalinsky
I'm working with somebody else's code more often than writing something new
from scratch. It takes some time to get used to that, but it's far from
being one of the hardest tasks developers face.

A couple of things that I typically do:

\- Start with a fully working state, i.e. setup your environment, make sure
tests (if there are any) are passing. If you can't get things to work
properly, that's your first issue to investigate and fix.

\- Don't try to understand all of the code at once. You don't need it yet. I'm
assuming you want to take over the project for a particular issue. So just
focus on that and ignore the rest of the code. If you ask any senior developer
about something in their project, there is a great chance they will not
remember the exact details, but know where in the code to look. Aim to get
to that level, not memorizing how everything works at the lowest level.

\- Don't make any changes to the code that you don't understand. I have a
recent example of this. Yesterday I was trying to find a bug in the Phoenix
database, which was failing to start after an upgrade. I have never seen the
code in my life. After some debugging I realized it's doing something with an
empty string that shouldn't be empty. The obvious "solution" is to add a
check if the string is empty and be done. Don't do that. Understand exactly
why the problem is happening and only make a change like that after you are sure
of all the implications. This has two effects: you are not introducing new
bugs and you are learning about the codebase. At the end, the fix from my
example was just a simple "if", but without understanding how it was ending up
with an empty string, I might have caused more problems than I fixed.

\- Use the VCS a lot when figuring out why something is done the way it's
done. Use "blame" to see when things have been changed, read through the logs,
etc. This is one of the main reasons why I don't like people
rebasing/squashing their commits before merging. There is so much information
they are throwing away this way.

\- Adopt the coding style of the existing code. Don't try to push your style,
either by having inconsistent style in different parts of the code or re-
formatting everything. It's just not worth it.

\- Don't be afraid to change things that need changing. There is nothing worse
than making a copy of some module, calling it v2 and then having to maintain two
versions. If you are afraid to make a change in the existing code, make
yourself familiar with the part of the code first.

------
hal9000xp
I probably won't say anything new here. For the last five years, I have done the
following to get my feet wet with a new project (some projects I worked on
contained more than two million lines of code):

1\. Just make sure I can build the project;

2\. Play around with the services/application (just run it, send some requests,
get responses);

3\. Pick the simplest case (for example, some request/response);

4\. Find places for breakpoints connected with this simplest case (for example,
code that is hit when I send the request) and set them up in the debugger.
Usually, I find where to put a breakpoint by just searching for a keyword
associated with my request;

5\. Play around with these breakpoints while performing the simplest case (for
example, sending the request) and try to work out the call graph;

6\. Try to change the code and see what happens.

After doing this for several days/weeks, I become more and more familiar with
the project.

------
physicsmichael
A very simple method that helps me is to make sure I tackle a new code base on
a large monitor, vertically oriented, with small font size. Add to that a pane
that shows the file/class structure. Seeing more at once helps ground me in
the types of interactions in the code and the code landscape.

------
dave_ops
I take it out for dinner and drinks. Spend some time getting to know about it and
where it comes from and what it does for a living. Then after we're a few
cocktails in we get all philosophical. Really start asking the hard questions
like, "Why do I even exist? Is any of this real or is it all some weird
virtual world?"

We become fast friends and feel like we really understand each other.

But days pass, and each encounter feels less magical. It's almost like we
have nothing in common. Like we're from two completely different worlds. One
where it's stuck in the past and one where I'm ambitious and excited about the
future.

After a while we don't really speak to each other anymore, and after some
pretty ugly fights at work that get too personal... I rewrite it.

------
matthewrhoden1
I've worked in a lot of legacy code bases. Here's my approach:

* Skim around to get a general idea of what components are involved.

* Try to understand that one module/class that keeps getting used a lot or is really important.

* I mentally trace through that code, as if I'm a debugger.

* Most importantly, I write down my discoveries/understanding as I go to help me retain this idea.

* Re-skim with my new understanding and/or reorganize the code to be more concise or simpler. Depending on how ambitious you are, you might try to keep these changes. But with legacy code, it typically breaks as a result.

Every code base takes time to digest all the information. Sure the information
passed your eyes, but is it committed to memory?

------
VeejayRampay
Drawings will help tremendously. Extract the big masses, their respective
interfaces to each others and the means through which they communicate. This
will help build a mental map of the code and reduce the cognitive load needed
to understand each separate part.

~~~
h8liu
If the project does not have circular dependency, it can be automatically
drawn from the code like this:

[http://lonnie.io/gostd/dagvis/](http://lonnie.io/gostd/dagvis/)

or this:

[http://8k.lonnie.io/](http://8k.lonnie.io/)

~~~
wocram
What did you use to generate the graphs?

------
vpeters25
This answer is going to be rather unorthodox and might get downvoted but this
is how I do it:

I just skim through all the sources, then somehow I am able to point to the
approximate file and line of code where a specific question might be answered.

This might sound "out there" but I realized during college I had the ability
to recall the approximate location of specific information I needed from a
textbook if I just skimmed through the whole book at the start of the semester.

For years I did this out of intuition, then about 10 years ago I took a course
named "photoreading" and to my surprise they were teaching my "ability" but
with clear steps so anybody could use it effectively.

~~~
JoeAltmaier
This is underrated - folks are scared to just read the whole thing. Most of
our thinking isn't conscious. The sooner you have an impression of the whole
code, the sooner you can start having insights. I read the whole thing, every
time I start with a new code base that I'm going to be spending time with.

------
tjallingt
I personally like to reverse engineer functions within a certain codebase to
better understand what is happening.

For example I would start by looking up a basic example of that codebase
and for each of the function calls go through the files and see what is
happening. This gives me an idea of how the code base is written and how it
works. It also gives a clear understanding of the level of
separation/specificity of the different functions.

Disclaimer: not very experienced so there might be better ways of
familiarising oneself with a new codebase; this is just one way of doing it
and it has worked for me in the past.

------
scrabble
I generally just start by fixing small bugs in different areas of the system.
I find that debugging various areas of the system helps me understand them
better and allows me to start forming a cohesive picture in my mind.

------
wiresurfer
The most critical step is to get the lib into your workflow, preferably with
(build-introspect-debug) capabilities. This increases the upfront time to
start, but leads to much quicker "code understanding", in my opinion.

TL;DR; Start with the minimum exposed surface area of the project (API), dig
through these functions first. Definitely know the initialization sequences
the library needs.

This is my approach for JS projects and for dealing with other people's
code in general.

First, I make a mental model of what I want to do. !important. Then I write
the smallest wrapper needed to start fleshing out points where
"separation-of-concern" happens.

At this point I should have an idea of what the other person's library exposes
as API. I also should have an idea of what can be done with an unmodified
library, and what would need patching.

Then comes monkey-patching the lib at individual function levels with a
healthy dose of TODO markers and NotImplemented Method signatures.

By this point I should have a good picture of what goes on in the library
apart from what gets exposed and would probably have forked a branch by now.

This strategy has been useful not just for JS projects but bigger codebases of
java/scala libraries like Lucene Core/Solr or Play framework, Django in the
python realm and to limited success with Research code releases like Stanford
Core NLP.

------
wazari972
I like to use interactive debuggers like gdb (for C) or pdb (for python) for
that.

You first have to locate a region (function) you want to study, then you
reach one of its executions with a breakpoint, or a conditional breakpoint.

Then, you inspect:

\- the callstack: under which conditions the function was called

\- the parameters / local variables

\- the subfunctions: in both tools, you can manually call any (reachable)
function, try different parameter values and check the result. Pay attention
though to the side effects!

------
pmontra
Build it, if there is something to build. Scripting languages usually don't
have builds but JS minification and dependencies installation could be a
build. Find and read the code paths that perform some recognizable action. Run
tests, read them. Add a new feature with tests or pick an open issue and fix
it. You're going to have to debug something and that will give you more
insight in the inner workings of the code.

------
l-jenkins
I see a lot of comments talking about code that is in a repository. And that
is great, if you have it available. There have been many, many times where our
team is handed an application that is broken (or has a bug) and asked to fix
it. In many, many of those cases we don't have access to the original
repository, or there wasn't one.

We generally approach it with heavy customer/owner involvement at first. We
need to know what the application's intended purpose is. It is sort of like a
lightning BA session. We get what the application should do, and what it
isn't doing properly out of this session (and more importantly, what it should
be doing instead).

Our first step: get it into a repo.

Now that we have an understanding of what the application's intended purpose
is, we can dive into the code. We don't have any analysis tools (but if there
are some that people could recommend, I'm all ears) outside of our IDE (Visual
Studio). We generally look for the last-modified date as an indicator of what
needed work most recently. Of course, we don't have file history so we don't
know exactly what changed, but it gives us a rough idea of what was worked on
and when.
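
When there is no file history, the last-modified check can be scripted. The sketch below is an assumption about how one might do it in Python (`recently_modified` and `report` are illustrative names), and it trusts filesystem timestamps, which copying or unzipping a project may have reset.

```python
import time
from pathlib import Path


def recently_modified(root, limit=20):
    """Return up to `limit` (mtime, path) pairs under `root`, newest first."""
    entries = [
        (path.stat().st_mtime, path)
        for path in Path(root).rglob("*")
        if path.is_file()
    ]
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries[:limit]


def report(root, limit=20):
    """Print the most recently modified files with a readable timestamp."""
    for mtime, path in recently_modified(root, limit):
        stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime))
        print(stamp, path)
```

Running `report(".")` before the first commit into the new repo captures the timestamps while they still mean something.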

Next we usually try and use the application in our development environment. We
chase each action a user takes in the code to determine what is the
core/central part of the application. After that, we try to determine the
cause of the problem (and while we are at it, we generally do a security
review of the code).

It takes time, and is painstakingly nuanced and very boring. But I'm not sure
what other options we have in such cases. As I said, I'm all ears as to what
others might do in these situations.

------
aethertap
The first thing I do is try to get a handle on the libraries it pulls in
(maybe spend a day just going through the high-level readme material for each
one). That will usually tell me where to start looking for the entry points
where I might want to start modifying things. After that, I give myself a
series of small functionality changes to implement, kind of like capturing a
bunch of little flags. After doing that for a bit I usually have a decent idea
of how things work, and it's easier to go forward, at which point I can dig
into the relevant parts of the codebase with more confidence.

The first few mods are inevitably disgusting hacks, so don't pick anything you
want to keep for your first couple of goals. It is pretty easy to go back and
do them right once you've got your head around the rest of the project if you
do end up wanting to keep them though.

I've used this method on some decently large C++ and javascript projects
(around 100k-200k lines) and it works pretty well for me. I don't learn very
well by just reading the code, but doing the little mods seems to make it
stick.

------
datashovel
If it's not obvious just by looking at how the directories are structured, and
files are named, generally I find that everything is (or should be) relatively
easy to understand if you start from the perspective of a user.

1) Read docs for how to USE the library if they exist

2) Review example code that describes how a person would use the library to
accomplish tasks.

3) In order to start diving in, find a specific example that does something
interesting, then hop in from there. Read the code within the methods /
functions the user calls, then the functions / methods called inside those,
etc.

4) As you dig deeper you may start finding that you understand, or you'll
start building up your own hypotheses like "If I change X to Y in this
function then something different should happen when I call it". Try it out,
and see if your hypothesis is correct.

After a few iterations of doing something like this you'll probably start
getting an idea of how the code is structured and where you'd need to go in
order to make the changes you'd like to make, or add the features you want to
add.

------
corysama
I pick a function or an outcome and type out the pseudocode stack traces
leading to that in notepad.

I include function names and the names of the variables passed as parameters.
But, no braces or other syntax. Almost always omit branches/variable
decls/error checking. Include all interesting function calls along the path,
but omit any branches/function bodies that lead off the desired path. Inline
callbacks as function calls with additional notation. If the process has
separate steps that aren't a single call/callback tree, start a new tree with
the note "then later..."

To do this, I have to start from the line of code that enacts the outcome and
determine the backtrace with a combo of debugger stack traces and examining
the code for branches/callbacks of interest.

But, when it's complete, I'll have the start-to-finish process of some
complicated task in the code --usually on a single screen of text. It's a
tremendously better use of my short term memory to scan over that than to
constantly bounce around the actual code base.
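
For Python code, a first draft of such an outline can be generated and then trimmed by hand. This is a sketch under assumptions, not the method described above: `call_outline` is a made-up helper built on `sys.setprofile`, it records every Python-level call (so you still delete the uninteresting branches yourself), and the three small functions are hypothetical stand-ins for real code.

```python
import sys


def call_outline(func, *args, **kwargs):
    """Run `func` and return an indented outline of the Python calls it makes."""
    outline = []
    depth = 0

    def profiler(frame, event, arg):
        nonlocal depth
        if event == "call":  # a Python function was entered
            outline.append("  " * depth + frame.f_code.co_name)
            depth += 1
        elif event == "return":  # a Python function was left
            depth -= 1
        # c_call / c_return events for built-ins are deliberately ignored

    sys.setprofile(profiler)
    try:
        func(*args, **kwargs)
    finally:
        sys.setprofile(None)
    return outline


# Hypothetical code under study.
def load_rows():
    return [3, 1, 2]


def sort_rows(rows):
    return sorted(rows)


def render():
    return sort_rows(load_rows())
```

`print("\n".join(call_outline(render)))` gives `render` with `load_rows` and `sort_rows` indented beneath it, which is the single-screen shape the notes aim for.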

------
jdavis703
When I have a new code base that I'm unfamiliar with and need to understand it
quickly, I'll go line-by-line and add comments about what I believe to be the
intended behavior. As I gain more knowledge I'll update the comments. For me
explaining something I've learned helps me commit it to memory better, and
makes sure I really did comprehend what I just read.

------
jajaBinks
For a large C/C++ code base, I use an editor called SourceInsight. This is the
most invaluable tool for navigating code I've come across in my 3 year career
as a software developer. I work in a very large software company, and there
are several code bases running into millions of lines of C/C++ code. My
previous team had 60,000+ files, with the largest file being about 12k loc.

If you have access to logs from a production service / component, I find
TextAnalyzer.net quite invaluable. I take an example 500 mb log dump - opened
in TextAnalyzer.net and just scroll through the logs (often jumping, following
code paths etc), while keeping the source code side by side. This allows me to
understand the execution flow, and is typically faster than attaching a
debugger. If it's a multi-threaded program, the debugger is hard to work with
- and logs are your best friend. You are lucky if the log has thread
information (like threadId etc)
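
If the log does carry a thread id, splitting it per thread makes each execution flow readable on its own. The sketch below assumes a line shape of `date time [tid=N] message`; that format, the `LINE` pattern and the `split_by_thread` name are all invented for illustration and need adjusting to the real logs.

```python
import re
from collections import defaultdict

# Assumed line shape: "2015-04-01 12:00:00 [tid=7] message".
LINE = re.compile(r"^(?P<time>\S+ \S+) \[tid=(?P<tid>\d+)\] (?P<message>.*)$")


def split_by_thread(lines):
    """Group log messages by thread id, keeping their original order."""
    threads = defaultdict(list)
    for line in lines:
        match = LINE.match(line.rstrip("\n"))
        if match:  # lines that do not fit the assumed shape are skipped
            threads[int(match["tid"])].append((match["time"], match["message"]))
    return dict(threads)
```

Passing an open file object works as well as a list, since the function only iterates over lines.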

------
git-pull
I love wrapping my brain around large codebases in my spare time. I wrote an
application to help me download source code repositories in git, svn and
mercurial and keep them in sync:

[http://vcspull.readthedocs.org/en/latest/](http://vcspull.readthedocs.org/en/latest/)

I keep the applications I want to study in a YAML file
([https://github.com/tony/.dot-
config/blob/master/.vcspull.yam...](https://github.com/tony/.dot-
config/blob/master/.vcspull.yaml)) and type "vcspull" to get the latest
changes.

You can read my ~/.vcspull.yaml to see some of the projects I look over by
programming language. You can set up your config any way you want (perhaps you
want to study programming language implementations, so you have ~/work/langs
with cpython, lua, ruby, etc. inside it).

------
dnprock
I don't do code reading or comprehension study. Reading code is boring. I
typically create a list of small tasks that I want to achieve with the
project. If the task is big, I break it down into smaller tasks. Then I rank
the tasks from easy to hard. This way, I can start learning about the codebase
and achieve my tasks.

In your case, frozen columns seems to be a hard feature. So I would start with
ajax data source. I'd start with a simple SlickGrid example and get it to run.
Then go find how SlickGrid sets up data source. Expand that piece of code to
add ajax data source. Once I finished ajax data source, I'd dig into frozen
columns.

If you are working on a new codebase and worry about bugs, you just give
yourself more stress. Bugs (that are not yours) are expected. If they aren't
blocking your task, ignore them. Most likely, they aren't relevant to what you
are trying to do.

------
goblin89
Document the codebase, in my experience it helps.

In case of JavaScript you’d probably use something like JSDoc. Describe your
units and make the tool automatically create beautiful HTML out of that. You
don’t have to document everything at once but be sure to lay the groundwork,
automate documentation build process, and in general try to make maintaining
the docs effortless (for yourself and for others). Take some existing well-
documented JavaScript codebase as an example.

This’d make a great contribution already: SlickGrid’s codebase is somewhat
poorly documented, which is a barrier to the involvement of interested
developers.

As you write the docs weak spots in existing implementation will come to your
attention, helping you figure out what to fix first.

One downside is that writing down and structuring your knowledge in a way that
is easy for others to grasp is a challenge in itself, though arguably a useful
exercise.

------
yellowapple
It depends on the language, the libraries, the tooling, etc.

My dayjob is with a Ruby on Rails consultancy. Said dayjob involves
familiarizing myself with a _lot_ of different codebases. My strategy here is
rarely to try and digest the whole codebase all at once, but rather to focus
on the portions of code specific to my task, mapping out which models,
controllers, views, helpers, config files, etc. I need to manipulate in order
to achieve my goal.

The above strategy tends to be my preference for most complex projects. The
less I have to juggle in my brain to do something, the better. I tend towards
compartmentalizing my more complex programs as a result. For simpler programs
(and portions of compartmentalized complex programs), I just start at the
entry point and go from there.

Languages with a REPL or some equivalent are _really_ nice for me, especially
if they support hot-reloading of code without throwing out too much state.
Firing up a Rails console, for example, is generally my first step when it
comes to really understanding the functionality of some Rails app. For non-
interactive languages, this typically means having to resort to a debugger or
writing some toy miniprogram that pulls in the code I'm trying to grok and
pokes it with function calls.
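
A toy version of that kind of miniprogram, sketched in Python: import the module under study, list what it exposes, then call something. `public_api` is a hypothetical helper, and the standard library's `json` module stands in for the unfamiliar code.

```python
import importlib
import inspect


def public_api(module_name):
    """Map each public callable in a module to its signature string."""
    module = importlib.import_module(module_name)
    api = {}
    for name, obj in inspect.getmembers(module, callable):
        if name.startswith("_"):
            continue
        try:
            api[name] = str(inspect.signature(obj))
        except (TypeError, ValueError):
            api[name] = "(...)"  # some built-ins expose no signature
    return api


# Stand-in for the code being explored: list the API, then poke one function.
api = public_api("json")
round_trip = importlib.import_module("json").loads('{"rows": 3}')
```

Printing `api` gives a quick table of entry points, and the last line is the "poke it and see" step in its smallest form.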

For some non-interactive languages, like C or Ada, I'll start by looking at
declaration files (.h for C and friends; .ads for Ada) to get a sense of what
sorts of things are being publicly exposed, then find their definitions in
body files (.c/.cpp/etc. for C and friends; .adb for Ada) and map things out
from there. Proper separation of specification from implementation is a
godsend for understanding a large codebase quickly.

For a rigorously-tested codebase, I'll often look at the test suite, too, for
good measure. When done right, a test suite can provide benefits similar to
specification files as described above; giving me some idea of what the code
is _supposed_ to do and where the entry points are.

------
techbio
I've written scripts to read files and match function calls to their
definition/body and output text "trees"; but the process deserves some better
visualization, navigation of dependency graph/comprehension specific
highlighting. I'd be interested in trying an IDE that can do this.
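
A rough sketch of such a script for Python sources, using the standard `ast` module. It is an illustration only: `call_map` and `text_tree` are invented names, and the matching is deliberately naive (top-level functions called by bare name), so methods, imports and dynamic dispatch are not followed.

```python
import ast


def call_map(source):
    """Map each top-level function name to the locally defined functions it calls."""
    tree = ast.parse(source)
    functions = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    calls = {}
    for name, node in functions.items():
        # Sort call sites by position so callees appear in source order.
        found = sorted(
            (child.lineno, child.col_offset, child.func.id)
            for child in ast.walk(node)
            if isinstance(child, ast.Call)
            and isinstance(child.func, ast.Name)
            and child.func.id in functions
        )
        ordered = []
        for _, _, callee in found:
            if callee not in ordered:
                ordered.append(callee)
        calls[name] = ordered
    return calls


def text_tree(calls, root, depth=0, path=()):
    """Render the call map as indented lines, marking recursion instead of looping."""
    if root in path:
        return ["  " * depth + root + " (recursive)"]
    lines = ["  " * depth + root]
    for callee in calls.get(root, []):
        lines.extend(text_tree(calls, callee, depth + 1, path + (root,)))
    return lines
```

`text_tree(call_map(open("some_file.py").read()), "main")` returns the lines of the tree, where `some_file.py` and `main` are placeholders for a real file and entry point.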

------
orthoganol
First, have in your mind what the function of the chunk of code is. If it's
not important to the system, skip it, don't read it. If it is important to the
system, take a guess how you think it should work, how you would probably
implement it if you were the original developer. Then begin reading it.

------
LoneWolf
At least for me there is no specific method. I work mainly with Java, so since
your specific case is JavaScript, this may not even apply.

If the problem is some bug and there are stack traces that is my starting
point, debugger and a few breakpoints chosen from the trace and then follow
the stack and from there I start knowing how it is structured, and then the
next bug and so on (fixing them of course). For code where I need to add
features things get a little more tricky, but there is always some entry
point, a web-service invocation, some web page, and try to understand what it
is currently doing, again using the debugger to follow the calls and how the
data is changed (sometimes even going into libraries).

Reading the docs if there are any is also a good place to start.

Once again, use the debugger a lot, makes it easier to understand than just
reading the code.

(edit: formatting)

------
chipsy
I try to seek out the data structures first. If I need help doing it, I either
run a profiler or insert some debug prints to get an idea of what parts of the
callstack are "hot" and then progress from that to discovering the data.
(Languages that don't require type signatures everywhere often have this
problem of hidden structures.)
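
In Python, the profiler route might look like the sketch below, which ranks functions by how often they were called during one run. `hottest` is a made-up helper and `helper`/`workload` are hypothetical stand-ins; call counts are used here as a simple proxy for "hot", which is an assumption, since time spent can tell a different story.

```python
import cProfile
import pstats


def hottest(func, *args, top=5):
    """Profile one call to `func` and return (name, call count) pairs, busiest first."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (primitive calls, total calls, ...)
    ranked = sorted(stats.stats.items(), key=lambda item: item[1][1], reverse=True)
    return [(key[2], value[1]) for key, value in ranked[:top]]


# Hypothetical code under study.
def helper(x):
    return x + 1


def workload():
    total = 0
    for _ in range(50):
        total = helper(total)
    return total
```

The functions at the top of `hottest(workload)` are the ones worth opening first, and the arguments they pass around usually point at the central data structures.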

Once I know what the data is I can look at the code with an eye towards
maintenance of data integrity. I might still need some "playtime" to grok the
system but the one truism of large software is that data is always getting
shoved from one big complicated system to another, and I can usually identify
boundaries on those systems to narrow the search space.

(the exception to this is if you have code that leaks global state across the
boundaries. Then much swearing will occur.)

------
meadori
I enjoyed this presentation by Allison Kaptur on how to understand CPython
better:

[http://pyvideo.org/video/3465/exploring-is-never-boring-
unde...](http://pyvideo.org/video/3465/exploring-is-never-boring-
understanding-cpython)

While it is focused on CPython, most of the techniques are applicable
elsewhere. It also mentions a great article by Peter Seibel
([http://www.gigamonkeys.com/code-reading/](http://www.gigamonkeys.com/code-
reading/)) that discusses why we don't often read code in the same way we
would literature.

Essentially, as the complexity of software has grown people have been forced
to take a more experimental approach to understand software even though it was
created by other people.

------
tracker1
Try to fix/change/adjust something in the front-end and look back from
there... This can be frustrating depending on the codebase, but your
best bet in learning something new is to try to do something... even something
small. If you want to go the extra mile, add comments to stuff that doesn't
make sense as you go, and tag things for refactoring with TODOs and
corresponding tickets.

Going a step farther still would be to add to the user documentation as you
go...

Do something small, and iterative, and go out from there... for that matter,
just getting a proper build environment is hard enough for some projects...
automate getting the environment setup if it's complex. I've seen applications
with 60+ step processes for getting all the corresponding pieces setup.

------
twunde
The first thing I try to do is to understand the directory structure, ie where
should I be looking for files? Hopefully there should be a standard structure
that's used. After that I'll typically try to dig in and fix a minor bug or
two. This is especially helpful if you can narrow down the part of the
codebase you're working on. I also recommend using an IDE like WebStorm which
will give you the ability to jump to a function definition and will help you
find the functions you're calling.

One thing I do NOT recommend is changing the code style, unless you're ready
to take full ownership of the project. It can make it much harder for the
project owner to merge in and if there are any lingering PRs those will
typically need work to merge in properly.

------
buremba
I use debuggers a lot for that purpose. It really helps to find the code paths
for specific operations. Instead of reading code file by file, just setup a
debugger, set a few breakpoints to the code, perform an operation and follow
the real application code through the paths.

------
macNchz
A proper IDE can go a long way towards understanding a large codebase. It will
be able to index everything so you can really quickly jump around the
project–being able to jump directly from a method call to its declaration
without momentarily context switching to search for where it lives is very
valuable.

As you start to add to a project the IDE can also prove valuable in
discovering how everything fits together, since it will provide smart and
helpful completions with docstrings, method signatures, types etc. This can
really help you start writing new code a lot faster.

Also, an IDE will usually also have a decent UI for running the code with a
debugger attached, which can be incredibly useful for understanding the
changing state of a running program.

------
dmuth
As someone who hates debuggers and is a fan of "learning by doing", I make
heavy use of console.log() or similar, and I start putting breakpoints all
over the code that print out sentinels ("hey, I'm in this part of the code")
and data ("the content of this variable are: XXXX").

Then I run the app and put it through its paces, while watching the output in
another console.

If there's some code that doesn't make sense, I use console.log() more heavily
in that section, to help me fully understand what it does. Once I have that level
of understanding, I then write some comments in the code and commit them so
that other contributors may benefit in the future.
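A minimal sketch of the sentinel-and-data logging described above. The `trace` helper and the `applyDiscount` function are hypothetical stand-ins invented for illustration; in practice the calls would go into whatever functions you are trying to understand.

```javascript
// Hypothetical helper: prints a sentinel plus optional data, and returns the
// line it printed so the output can also be collected or inspected.
function trace(label, data) {
  const line = data === undefined
    ? `[trace] ${label}`
    : `[trace] ${label}: ${JSON.stringify(data)}`;
  console.log(line);
  return line;
}

// Stand-in for a function from the codebase being explored.
function applyDiscount(order) {
  trace("applyDiscount: entered", order);
  const total = order.price * order.quantity;
  if (total > 100) {
    trace("applyDiscount: discount branch", { total });
    return total - 10;
  }
  trace("applyDiscount: no discount", { total });
  return total;
}

applyDiscount({ price: 30, quantity: 5 });
```

Giving every message a common prefix such as `[trace]` makes the lines easy to filter in the console and easy to find and delete afterwards.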

------
bliti
This codebase is documented and well structured. I would simply begin by
tackling the issues on GitHub first and sending pull requests. No need to take
over it right away. After you feel comfortable reading the code and knowing
what is where, you can ask to become a maintainer.

I'd try to fix it using the same style used in the codebase. This way anybody
else reading, maintaining, or using it won't have to make sense of the new
style. Pay attention to how each method is defined. They are very readable.
Very few traces of complex one line statements.

Most importantly, be patient. You won't be any good with it in less than 2
weeks of constant tinkering. Good luck.

------
shurcooL
This is not a comprehensive answer, but it's additive.

If you're looking at a large Go codebase with many packages, I find it helpful
to visualize their import graph with a little command [0].

Here are the results of running it on the consul codebase:

    $ goimportgraph github.com/hashicorp/consul/...

[http://virtivia.com:27080/cehy9dnqaq92.html](http://virtivia.com:27080/cehy9dnqaq92.html)

[0]
[https://github.com/shurcooL/cmd/tree/master/goimportgraph](https://github.com/shurcooL/cmd/tree/master/goimportgraph)

------
padator
I use CodeMap [1], which is a kind of Google Maps where the countries are the
code, and CodeGraph [2], which helps to understand code dependencies at
different granularities (package, module, files, functions).

[1]
[https://github.com/facebook/pfff/wiki/CodeMap](https://github.com/facebook/pfff/wiki/CodeMap)

[2]
[https://github.com/facebook/pfff/wiki/CodeGraph](https://github.com/facebook/pfff/wiki/CodeGraph)

disclaimer: I am the author of those tools.

------
makmanalp
I do divide-and-conquer. Find some part or feature of the tool you know from
an outsider perspective, and then try to find it within the code. Then work
backwards from there. Maybe even try to fiddle with it to change how it works,
and see what happens.

I think reading each file or reading the data structures is more difficult
because you have no familiarity as to what is going on and you have no
knowledge of why things are structured as they are, so it'd end up like
reading a math paper straight down: memorize a ton of definitions without
knowing why, until you finally get to the gist of it.

------
deepaksurti
I first try to familiarize myself with the high level design/org of the code
base, going through the README, other docs, looking at the test code if any
and just generally scanning the important files/modules etc.

Then I prefer to jump into fixing any existing issue. Working on fixing an
issue teaches a lot, more fixes, then features, rinse, lather, repeat.

While this post talks about fixing compiler bugs, the overall steps are broadly
applicable:
[http://random-state.net/log/3522555395.html](http://random-state.net/log/3522555395.html)

------
bonestamp2
I like to try two impractical tasks (impractical in the sense that they might
not be possible, which is fine).

1\. Access some data in the highest level component from one of the lowest
level components

2\. Access some data in one of the lowest level components from one of the
highest level components

In a lot of cases, good architecture will prevent one or both of these from
being possible, but identifying how data flows through the app seems to be a
good way to understand the general architecture, limitations and strengths of
most apps. These two tasks give concrete starting points for tracing the data
flow.

------
kalari
I usually skim the code to get an idea of patterns and organization, get it
working in a local environment and then run/step-through the code. This
usually gives a good idea of what different pieces do.

------
gregulator
I've had to ramp up quickly on a number of projects so far during my career,
and I can tell you there's no substitute for simply reading the heck out of
the code. Yes it takes discipline to go through code line-by-line, and at
times may seem pointless or like it's "not sticking". But persistence here pays
dividends.

The first read-through is not about comprehending everything. It's about
exposing your mind to the codebase and getting it to start sinking into your
subconscious. It's kinda like learning a new piece on the piano.

------
niuzeta
First build and run. See what it does. Check what it does and what I think it
does, see how they differ.

Start from main() and start from the one click event (or any end-game action).
Try to connect the two.

------
fsloth
I try to compose a formal model and algebra of the codebase - quite
informally, mind you. Takes a bit of pen and paper and a few caffeinated
drinks usually.

People really do learn quite differently and everyone needs to find their mode
of learning - there is no one single true way. This is one of the most
important skills in software development, IMO. Once you learn how you learn
you can apply it to most new contexts.

I write stuff down because for me the process of writing seems to be the
most effective way to learn.

------
crcsmnky
While this generally works best for larger code bases, I tend to start reading
through open bugs/tickets and find things that appear easy. Then I will assign
them to myself and do what I can to fix it or at least track it down.

Generally I find it hard to just start reading through packages, source,
functions, etc. and find it much easier to try and solve some sort of problem.
By tracking and debugging a particular issue through to the end, I find I
learn a lot about the codebase.

------
perlgeek
git grep.

I search for strings that appear in the frontend (or generated HTML source, or
whatever), and then I use a search tool (git grep) to find where it comes
from. And then I use the same search tool again to trace my way backwards from
there to where it's called, until I find the code that interests me.

And then I form a hypothesis how it works, and test it by patching the code in
a small way, and observe the result.

Oh, and don't forget 'git grep'. Or ack, or ag, or your IDE's search feature.

~~~
mVChr
I find Sublime's Cmd-Alt-↓ (goto definition) to be very useful since you jump
straight to the source code for that function/class/method. When you grep you
may also get all the usage instances which can be quite a lot of noise.

------
zaphar
Other than the many great answers here I will frequently start by doing
cleanups of the codebase.

I'll start reading the files using any of the strategies mentioned here and
looking for things I can clean up: formatting, simple refactors, normalizing
names.

These are all things that are comparatively easy to do and safe but force you
to reason about the code you are reading. Asking yourself what you can
refactor or fix the naming for is a decent forcing function for actually
understanding the code.

------
zzzcpan
I found call tracers to be the most efficient way to do this kind of thing. It
could be as simple as a perl script inserting printfs on every call and every
return, since not every compiler supports instrumentation.

Simply digging through code, tests or reading commit messages in an unfamiliar
code base takes at least an order of magnitude more time.

EDIT: tried call graphs too, better than reading through code, but still
require you to understand and filter out a lot of unnecessary information.
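As a rough illustration of a call tracer, here is a sketch that wraps functions at runtime instead of rewriting the source with a script as described above. That is a swapped-in technique, chosen because JavaScript allows it. `traceCalls` and the `geometry` object are hypothetical names, and the wrapper only covers enumerable function properties of the object it is given.

```javascript
// Wraps every function-valued property of `target` so that each call and
// return is recorded, indented by call depth. `traceCalls` is a hypothetical
// helper, not part of any library.
function traceCalls(target, log = []) {
  let depth = 0;
  for (const name of Object.keys(target)) {
    const original = target[name];
    if (typeof original !== "function") continue;
    target[name] = function (...args) {
      log.push(`${"  ".repeat(depth)}-> ${name}(${args.map(String).join(", ")})`);
      depth += 1;
      try {
        const result = original.apply(this, args);
        log.push(`${"  ".repeat(depth - 1)}<- ${name} = ${String(result)}`);
        return result;
      } finally {
        // Keep the indentation correct even if the call throws.
        depth -= 1;
      }
    };
  }
  return log;
}

// Stand-in module with one function calling another.
const geometry = {
  square(x) { return x * x; },
  hypotenuse(a, b) { return Math.sqrt(this.square(a) + this.square(b)); },
};

const calls = traceCalls(geometry);
geometry.hypotenuse(3, 4);
console.log(calls.join("\n"));
```

The indented output reads as a small call tree for one operation, which is usually less noisy than a static call graph of the whole program.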

------
bozoUser
I have recently started working on a huge codebase at work. In
general here are a few tricks that helped me. 1) Look at the unit tests and
see the flow of the code. 2) Try to make a mental picture of how the code is
organized (doing it on paper is more helpful). 3) Every codebase has a few core
classes that do a lot of heavy lifting; talk to other contributors and ask them
to point you to these. Step 2 also helps you achieve this. Good luck.

------
sown
I only recently developed this skill a little.

The Ruby application server I looked at was for doing social network feeds.
Posts/Likes/Comments go in, feeds come out.

I followed some common code paths for things such as posting a comment and
getting a feed. I would write the stack trace down on paper as I went.

It also helped that I happen to know that this ruby server used wisper and
sidekiq. This way I didn't overlook single lines of code such as 'publish:
yada yada'

------
ivan_ah
On that note, could someone recommend a tool for automatically generating the
graph that shows the class dependencies/hierarchies in a Java code base? I'm
sure there are good tools out there, but all the ones I tried so far
(JArchitect, CodePro Analytix, SonarQube) don't seem to have a good graph
layout engine.

I'd like to print out a big graph and stick it to the office walls so I'll
have a good view of the logical structure.

------
thoman23
If it's code that I need to understand in intimate detail, I actually trace
through the code keeping notes with pen and paper. I complement a simple
reading of the code with actually exercising the code with test data and a
debugger. I go through a few iterations, each time learning a little more
about what is important and what can be safely ignored, until I eventually
build up a Gliffy diagram of the important parts.

------
coolsunglasses
If it's in Haskell, I start cleaning up and refactoring datatypes.

For example, changing a function like:

    Text -> Text -> IO ()

into:

    ServerHost -> Path -> IO ()

Changing the types will naturally lead you through the codebase and help you
learn how everything fits together via the type errors.

In any language I'll try to read the project like the Tractatus.

In stuff that isn't Haskell? Break stuff and run the tests.

------
amenghra
When you find interesting pieces of code, look at the commit that brought it
to life. Commits contain precious gems of information: you'll understand what
files are related, who worked on which parts of the codebase, how the commit
was tested, related discussions, etc.

Some people use graphical tools to visualize a codebase (e.g. codegraph). It
can help you understand what pieces of code are related to each other.

------
misterjinx
This is one of the reasons I've always thought that each project should have a
minimal developer documentation that should include the project's scope, how
it's structured, what are its main components and how they are connected etc.
This would help a future developer start working on the actual project faster
and reduce the initial time spent figuring out what it is all about.

------
kh_hk
When adding support for small new features or fixing bugs on large codebases
the answer is: you don't [1].

You do not need to familiarize yourself with the full codebase at the start.
It's too time-consuming and mostly not worth the effort. Set up an objective
and go for it slashing your coding axe around until it works.

[1]: Unless you have a special interest or you are expected to familiarize
yourself with the codebase.

------
xarien
I'd speak to the last person who worked on it face to face with a whiteboard
and a marker handy. Get a brain dump ASAP. Even if the person no longer works
there, you can take some time to contact them for a lunch. Most people would
not say no to this type of request (especially if you're buying). Just make
sure you have questions ready so you don't waste their time.

------
exacube
One idea is to use Linux's `perf` to sample stack traces, as the program is
running, over a minute or so and see where the code flows.

------
richardlblair
IMO, the one tool you can't do without is grep.

My typical strategy is to get the project running, then just get to work.
Start fixing bugs, and adding requested features. Use the code around you as a
guide on what is right and wrong within that company, and forge forward. When
you are unsure of something turn to grep, find some examples, and keep going.

------
estsauver
I try to work backwards from the public api to get a sense of the operations
that are supported by the system. A trick I picked up from a thoughtbot
training video a couple years ago for Rails applications is to look at the
routes file. If you work with webapps, the routes generally define the things
that people can do.

~~~
almog
This routes-trick is my starting point as well on web apps.

The next place I try to understand is the persistence layer, be it a
database schema or models working against remote APIs. Building a mental model
of the data (around which the app revolves) serves as a map for the rest
of the code tour.

------
ludwigvan
I gave a talk on this subject at At the Frontend conference in Denmark
recently. Take a look:
[https://vimeo.com/129469530](https://vimeo.com/129469530) It goes over
general techniques and then drills down into the React JS code base.

------
mtrn
Related question on programmers.se:
[http://programmers.stackexchange.com/q/6395/436](http://programmers.stackexchange.com/q/6395/436)

> What tools and techniques do you use for exploring and learning an unknown
> code base?

------
benjamg
Assuming there is some form of bug list associated with it that is often my
preferred way to learn a new code base.

Try to fix a bug and you'll soon find yourself having to learn how the code
involved works, and with a goal your focus will be better than just reading
through the code flow.

------
ausjke
I use Source Navigator to understand the code base. I wish someone would keep
improving it; the fonts in particular don't look great under Linux, though
under Windows it's all I need. I'm unsure if other tools can provide as many
functions for code base analysis.

------
MarkMc
One thing helps me enormously: I sketch a class diagram as I explore the code.
Here's an example:

[https://s1.whiteboardfox.com/s/494b923d01d7ad05.png](https://s1.whiteboardfox.com/s/494b923d01d7ad05.png)

------
fasteo
Brute force: Choose a new feature to implement and start looking for the place
to write your first line of code.

This is probably not the best way to approach this, but I am somehow ADHDish
and I need a clear task to avoid perpetual diving in the codebase.

------
lucidguppy2000
Write characterization tests for modules, see what inputs produce which
outputs. Then you have the start of unit tests.

Programming with unit tests really helps. And it points out where certain
parts are too entangled and bound to implementation.
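A small sketch of what a characterization test could look like: run the existing code on a handful of inputs, record whatever it returns today, and later check that the recorded behaviour still holds. `legacyShippingCost`, `characterize` and `findRegressions` are hypothetical names invented for this example.

```javascript
// Stand-in for a legacy function whose rules are not documented.
function legacyShippingCost(weightKg, express) {
  let cost = weightKg <= 1 ? 5 : 5 + Math.ceil(weightKg - 1) * 2;
  if (express) cost *= 2;
  return cost;
}

// Run the code on a set of inputs and record whatever it currently returns.
function characterize(fn, inputs) {
  return inputs.map((args) => ({ args, output: fn(...args) }));
}

// Re-run the recorded inputs and report any case whose output has changed.
function findRegressions(fn, recorded) {
  return recorded.filter(({ args, output }) => fn(...args) !== output);
}

const inputs = [[0.5, false], [1, false], [2.5, false], [2.5, true]];
const recorded = characterize(legacyShippingCost, inputs);
console.log(recorded);

// After refactoring, this list should still be empty.
console.log(findRegressions(legacyShippingCost, recorded));
```

The recorded pairs say nothing about whether the behaviour is correct, only what it currently is, which is the point when the goal is to learn the code before changing it.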

------
ak39
"You cannot understand a system until you try to change it." ~ Kurt Lewin

------
stuaxo
Back when I did Java, using static analysis tools like findbugs, then going
and fixing all the issues found was a good way to get coverage of the
codebase... I'm sure for JS there must be similar analysis tools.

------
antoinevg
Read it until I can identify which fad of the moment the author was following.

~~~
mVChr
This is a particularly insightful (if snarky) comment.

The project I just had to refactor had a DSL that was completely unnecessary
and had a ton of business logic tangled up within the domain language itself.
I ended up being able to remove the DSL and parser completely in favor of
simple config files and extracting the business logic into middleware.

The DSL was probably originally created due to some Slideshare presentation
that had just hit the top of HackerNews 3 years ago. The original devs molded
the problem to fit the ability to use a DSL and cool parsing library rather
than figure out what the most suitable design for the problem was.

------
puissance
I don't.

Take the extreme programming approach. Don't try to familiarize yourself with
a new codebase all at once. Start small. Work on a small ticket. It will,
organically, help you assimilate what's happening.

------
Lord_Cheese
If there is a bug list handy, I find tackling a few small ones is often an
excellent way to get to know a codebase. It also gives some good insight into
the codebase's quirks and oddities.

------
makuchaku
Start with smaller bugs & try to fix them. Bugs help you to focus your
understanding on very small parts of code/paths. This helps in time spent vs
output vs confidence.

------
pbreit
Best thing by far is to find someone familiar with the code and spend 15-30
minutes with them in person or by phone. That should be possible in the vast
majority of situations.

------
IanCal
Try doing some profiling. It'll take you through some of the more heavily used
parts of the code, is useful in and of itself, and provides a target / some
focus.

------
netoarmando
Good resource for Code Spelunking:
[http://www.codespelunking.com/](http://www.codespelunking.com/)

------
OpenDrapery
Pick a class and new it up from a unit test. You will quickly find out where
the dependencies are, and how tightly coupled things are.
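To make the idea concrete, here is a hedged sketch of instantiating a class from a test and letting the failures reveal what it depends on. `ReportService` and its `database` and `mailer` collaborators are invented for illustration. A real class will rarely fail this politely, but the workflow is the same.

```javascript
// Stand-in for a class from the codebase. In real code the dependencies are
// what you are trying to discover; here they are invented for illustration.
class ReportService {
  constructor(database, mailer) {
    if (!database) throw new Error("ReportService needs a database");
    if (!mailer) throw new Error("ReportService needs a mailer");
    this.database = database;
    this.mailer = mailer;
  }

  sendDailyReport(recipient) {
    const rows = this.database.query("daily");
    this.mailer.send(recipient, `${rows.length} rows`);
    return rows.length;
  }
}

// Step 1: try to construct it with nothing and read the failure.
let firstError = null;
try {
  new ReportService();
} catch (err) {
  firstError = err.message;
}
console.log(firstError);

// Step 2: supply the smallest stubs that let the object be built and used.
const sent = [];
const service = new ReportService(
  { query: () => [1, 2, 3] },
  { send: (to, body) => sent.push({ to, body }) },
);
console.log(service.sendDailyReport("dev@example.com"), sent);
```

The number and size of the stubs needed before the first method call succeeds is a rough measure of how tightly coupled the class is.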

------
blago
The first thing I do is turn on db and http request logging. Sometimes this
alone can be quite a challenge.

------
nickbauman
I always read the tests first. (If there are no tests, I don't take the job.
Life is too short.)

------
elkhourygeorges
Pick a couple of bugs and fix them. Best way to familiarize yourself with a new
codebase.

------
chris_wot
Answer: with great difficulty.

------
lloyd-christmas
Break it one line at a time.

------
aikah
sourcegraph.com can help.

~~~
Dowwie
looks useful, thanks

------
AdrianRossouw
read tests, and then start writing tests for things.

something usually comes up.

------
dm03514
Build and run the project locally

Then I write unittests

------
gdubs
Fix a bug. Repeat.

------
dm03514
I write unit tests

------
latenightcoding
grep -r "function()" .

------
mVChr
I've spent the last year rebuilding a huge business-critical system from
scratch (along with one other engineer). Yes, usually complete rewrites are a
Bad Idea®, but in this case product and business decided it was the only way
to move forward because the system was in maintenance hell and it was way too
difficult and risky to add new features. I discovered why as I learned the
architecture, business logic and features of this behemoth pile of spaghetti.
Here's what I recommend to do if you're in a similar situation, whether it be
a large and great project or a large and horrible project...

\- Get a functional dev environment set up where you can mess around with
things in a risk-free manner. This includes setting up any dev databases and
other external dependencies so that you can add, update and delete data at
will. There's nothing that gives more insight than changing a piece of code
and seeing what it breaks or alters. Change a lot of things, one at a time.

\- Dive deep. This is time consuming, but don't be satisfied with
understanding a surface feature only. You must recursively learn the
functions, modules and architecture those surface features are using as well
until you get to the bottom of the stack. Once you know where the bottom is
you know what everything else is based on. This knowledge will help you
uncover tricky bugs later if you truly grok what's going on. It will also give
you insight as to the complexity of the project (and whether it's inherent to
the problem or unnecessary). This can take a lot of time, but it pays off the
most.

\- Read and run the tests (if any). The tests are (usually) a very clear and
simple insight into otherwise complex functionality. This method should do
this, this class should do that, we need to mock this other external
dependency, etc.

\- Read the documentation and comments (if any). This can really help you
understand the how's and why's depending on the conscientiousness of the prior
engineers.

\- If there's something that you really can't untangle, contact the source.
Tell him what you're attempting, what you tried, exactly why and how it's not
working as you expect, and ask if there's a simple resolution (I don't want to
waste your time if there's not). You may not get an answer, but if you've done
a lot of digging already and communicate the issue clearly you might get a "Oh
yeah, there's a bug with XYZ due to the interaction with the ABC library. I
haven't had time to fix it but the problem is in the foo/bar file." You may be
able to find a workaround or fix the bug yourself.

\- When you do become comfortable enough to add features or fix issues, put
forward the effort to find the right place in the code to do this. If you
think it requires refactoring other things first, do this in as atomic a
manner as possible and consult first with other contributors.

\- Pick a simple task to attack first, even if it's imaginary. Get to more
complicated stuff after you've done some legwork already.

There are other minor things but this is generally my approach.

------
kungfooman
Overwrite functions in dynamic languages (like JavaScript) with some "dump all
arguments" code that calls the original function and returns its result, to get
a quick glimpse into the code. Though this doesn't work with closures without
some extra eval tricks.
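A minimal sketch of that kind of override, assuming the function is reachable as a property of some object. `dumpArguments` and the `grid` object are hypothetical names. Functions that exist only as local variables inside a closure cannot be replaced this way, which is the limitation mentioned above.

```javascript
// Replaces obj[name] with a wrapper that dumps the arguments, then calls and
// returns the original. Returns a function that restores the original.
function dumpArguments(obj, name, sink = console.log) {
  const original = obj[name];
  obj[name] = function (...args) {
    sink(`${name} called with ${JSON.stringify(args)}`);
    return original.apply(this, args);
  };
  return () => { obj[name] = original; };
}

// Stand-in for an object from the codebase being explored.
const grid = {
  rows: [],
  addRow(id, label) {
    this.rows.push({ id, label });
    return this.rows.length;
  },
};

const restore = dumpArguments(grid, "addRow");
grid.addRow(1, "first");
restore();
grid.addRow(2, "second"); // no longer logged
```

Returning a restore function keeps the experiment reversible, so the override can be dropped into a console session and removed without reloading the page.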

