
Ask HN: Ways for reading open source code bases? - furtivefelon
Hi all,

Can anyone suggest a good way of understanding a largish code base (thousands of lines)? Are there any tools to help you visualize/understand a code base? In particular, I'm looking for tools for JavaScript/Ruby/Python code bases. Thank you very much!
======
patio11
Once upon a time, a client of the day job dropped a substantial Perl codebase
on our desk and said "Tell us what this does." My boss gave me the job, and
expected me to actually READ all the code, but given that I had no desire to
read through 100 kloc of Perl code commented only in Japanese, I went for
visualization instead:

1) Inspect several files for commonalities. Thankfully, the author was
obsessive compulsive about coding standards.

2) Write a parser for the Perl they used. Use it to glean what pages of the
site were connected to each other and what the flow control was like. (a -> b,
b -> c, b -> d, etc)

3) Plot that on a graph (all hard work already done for me:
<http://rgl.rubyforge.org/rgl/index.html> )

4) Visually inspect the graph to learn non-obvious things about the codebase
like "Oh, there is an English language version of the site embedded in here.
Isn't that TOTALLY UNDOCUMENTED." Do a bit more code to chop the graph into
subgraphs by related functionality (signup flow, admin functions, etc etc).

5) Spit out all the code into HTML pages with appropriate autogenerated
navigation, inline flow control graphs, and syntax highlighting. Do a bit of
quality control, add in some comments about notable things I had learned, burn
on CD and hand to customer.

6) Charge customer $X0,000 for the CD. The customer was overjoyed they got it
done so cheaply. (Did I mention _100 kloc of Perl_?)
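
Steps 2 and 3 above can be sketched roughly as follows. This is an illustration in Python, not the Perl parser or the Ruby RGL library mentioned above. The link pattern (`"name.cgi"` string literals), the helper names, and the toy input are all assumptions; a real codebase would need its own pattern. The output is Graphviz DOT text, which any DOT renderer can draw.

```python
import re
from collections import defaultdict

# Hypothetical convention: a page refers to another page by naming its script
# in a quoted string, e.g. redirect("signup_confirm.cgi").
LINK_PATTERN = re.compile(r"""["']([\w/]+\.cgi)["']""")

def extract_edges(sources):
    """Map each page to the set of pages it mentions.

    `sources` is a dict of {page_name: source_text}.
    """
    graph = defaultdict(set)
    for page, text in sources.items():
        for target in LINK_PATTERN.findall(text):
            if target != page:  # ignore self-references
                graph[page].add(target)
    return graph

def to_dot(graph):
    """Render the graph as Graphviz DOT text (a -> b, b -> c, ...)."""
    lines = ["digraph pages {"]
    for page in sorted(graph):
        for target in sorted(graph[page]):
            lines.append(f'    "{page}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)

# Toy input standing in for files read from disk.
sources = {
    "a.cgi": 'redirect("b.cgi");',
    "b.cgi": 'link("c.cgi"); link("d.cgi");',
}
print(to_dot(extract_edges(sources)))
```

A regex is much cruder than a real parser, so it only works when the code follows consistent conventions, as it did in the case described above.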

~~~
cperciva
_(Did I mention 100 kloc of Perl?)_

Hey, you got off easy. Just imagine how much more work it would have been if
everything had been done in a _single_ line of Perl. :-)

~~~
patio11
70 hour workweeks, 3 hour daily commutes, salary below the median for a
starting graduate from my alma mater, and debugging code written in India
based on instructions translated by Babelfish: all of these I will endure
without complaint. But I will not blackbox analyze Perl one-liners. That is
just abusive.

~~~
tptacek
... why DO you work 70 hour workweeks with 3 hour commutes for below-market
wages? "Because, Japan"?

~~~
mahmud
This happened "once upon a time"; he might have done it while learning the
ropes and picking up the language.

------
nostrademons
Google Code Search:

<http://www.google.com/codesearch>

I use it for working my way around Google's codebase, which is a few orders of
magnitude bigger than that.

Also, there's no substitute for getting your hands dirty and diving into the
code. You don't really understand something until you've changed it a few
times. Grab a couple of low-priority bugs and write some patches for them;
you'll learn far more than if you just sit down and study things.

------
gtani
ruby:

<http://eigenclass.org/hiki/rcodetools>

<http://railroad.rubyforge.org/>

<http://www.pathf.com/blogs/2008/12/read-the-source-luke-a-readers-guide-to-the-rails-source/>

<http://stackoverflow.com/questions/37105/how-do-you-actually-read-source-code>

\---------------------------

(you'll see lots of questions on Stack Overflow about navigating, inspecting,
and reading source repos)

<http://stackoverflow.com/questions/1623906/programmatically-inspect-net-code>

<http://stackoverflow.com/questions/935516/how-does-macos-developer-navigate-large-code-base>

python:

<http://code.activestate.com/recipes/213898/>

<http://stackoverflow.com/questions/1568544/given-a-python-class-how-can-i-inspect-and-find-the-place-in-my-code-where-it-is>

emacs, vim, ctags, etags:

<http://stackoverflow.com/questions/1220456/navigating-effectively-through-source-code-in-linux>

------
rajasaur
It'd be good to get a debugger, connect it to the process, and step through
the code. You will see a pattern emerge as you play with the application, say,
for different clicks in a web app. After a few days, you will know how the
flow works.
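
For Python code, a lighter-weight variant of the same idea is to log which functions run for a given action instead of stepping through them by hand. The sketch below uses the standard `sys.settrace` hook; the `trace_calls` helper and the three toy handlers are hypothetical stand-ins for a real application.

```python
import sys

def trace_calls(func, *args, **kwargs):
    """Run func and return (result, names of Python functions called, in order)."""
    calls = []

    def tracer(frame, event, arg):
        if event == "call":
            calls.append(frame.f_code.co_name)
        return None  # no per-line tracing needed

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, calls

# Hypothetical request handlers standing in for a real web app.
def load_user(user_id):
    return {"id": user_id}

def render(user):
    return f"<h1>user {user['id']}</h1>"

def handle_click(user_id):
    return render(load_user(user_id))

result, calls = trace_calls(handle_click, 7)
print(calls)  # ['handle_click', 'load_user', 'render']
```

Running this for a few different actions and comparing the call lists gives a rough picture of the flow before you reach for a full debugger.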

That said, I've been more successful looking at the forums of an open source
project, figuring out what problems folks have, and trying to solve them. You
will be amazed to see how much you can learn about the code base and its
undocumented features by solving those problems.

------
tptacek
Is it a web app? Open up Firebug, look for interesting URLs, and grep for
segments of the URL. Find the function that implements it, read bottom-up.
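
If plain grep isn't handy, the search step can be approximated in a few lines of Python. This is only a sketch: `grep_tree` is a made-up helper, and the extension list and the example URL segment are assumptions.

```python
import os

def grep_tree(root, needle, extensions=(".py", ".rb", ".js")):
    """Yield (path, line_number, line) for each source line containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going past odd encodings.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield path, number, line.rstrip("\n")

# Example: search the current directory for a URL segment seen in the browser.
# for path, number, line in grep_tree(".", "account/settings"):
#     print(f"{path}:{number}: {line}")
```

Each hit is a candidate for the route definition or handler to start reading from.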

------
cmars232
I used to use pdb.set_trace() liberally to poke around at the internals of
Django to figure out what it was doing in undocumented places.
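
A minimal sketch of that pattern, using a toy function rather than actual Django internals (`resolve_view`, the `debug` flag, and the routing table are hypothetical):

```python
import pdb

def resolve_view(path, routes, debug=False):
    """Look up the handler for a URL path in a routing table."""
    if debug:
        # Execution pauses here. Inspect `path` and `routes`, then use
        # n (next), s (step into), w (where), or c (continue).
        pdb.set_trace()
    return routes.get(path)

routes = {"/signup": "signup_view"}  # hypothetical routing table
print(resolve_view("/signup", routes))
```

In practice you would paste the `pdb.set_trace()` line straight into the library function you are curious about and remove it when done; the `debug` flag here only keeps the example from pausing unless asked.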

~~~
forsaken
I recommend ipdb as well. It's IPython's PDB replacement. Install it with:
pip install ipdb

------
ssamuli
I guess you could also take a look at a tool called "LXR Cross Referencer" at
<http://sourceforge.net/projects/lxr/>. Page <http://fxr.watson.org> includes
LXR-generated cross references for multiple operating systems.

------
xenoterracide
Maybe try this book?
<http://www.amazon.com/Code-Reading-Open-Source-Perspective/dp/0201799405>
I'm not sure how good it is.

------
gte910h
Doxygen is great if it understands the language in question.

~~~
tptacek
Doxygen is great when it Just Works, and pretty much useless when it doesn't.
Even for languages it groks.

The problem with the Big Code Navigation Tools is that when they're wrong,
they're misleading.

