
Ask HN: How do you read and understand an open-source project? - amerf1
Looking for different approaches. Would love to hear your experiences.
======
_hardwaregeek
First, RTFM. If there's documentation, I read it. README, CONTRIBUTING,
whatever is available really.

Then, I start hunting through the codebase. Sometimes despite people's best
efforts at compartmentalization, there's one or two files that are the heart
of the project. Depending on the project they'll be different things. For
instance, TypeScript has checker.ts, which contains the core typechecking
logic. Ruby has vm.c, compile.c and parse.y. If that's the case, that's
actually very helpful, as I can spend the majority of my time in one file.

To aid in this hunting, I use a few tools. Stuff like grep and find (although
I prefer ripgrep and fd) are a huge help, cause you can search through large
codebases with relative ease. IDEs are great too. I particularly like being
able to goto definition, then go back, then go forwards, etc. Switching
between call site and definition makes understanding functions easier.

I take notes on occasion, although I don't always reference them. It's more to
process what I'm reading. I try to write notes about types, functions and
files. I do it in org mode and embed urls so that I can link definitions
together.

Definitely run the code as soon as possible. Then add print statements and see
where they go. I've used flamegraphs on occasion to see the stack trace.
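One way to get the "add print statements and see where they go" effect without editing every function body is a small tracing decorator. This is only a sketch, assuming a Python codebase; `make_tracer`, `double` and `double_then_add` are made-up names for illustration.

```python
import functools


def make_tracer(log=print):
    """Return a decorator that reports each call and return, indented by call depth."""
    depth = 0

    def traced(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal depth
            arg_text = ", ".join(
                [repr(a) for a in args] + [f"{k}={v!r}" for k, v in kwargs.items()]
            )
            log("  " * depth + f"-> {func.__name__}({arg_text})")
            depth += 1
            try:
                result = func(*args, **kwargs)
            finally:
                depth -= 1  # restore the depth even if the call raises
            log("  " * depth + f"<- {func.__name__} = {result!r}")
            return result

        return wrapper

    return traced


# Hypothetical functions standing in for the code being explored.
calls = []
traced = make_tracer(calls.append)


@traced
def double(x):
    return x * 2


@traced
def double_then_add(x, y):
    return double(x) + y


total = double_then_add(3, y=1)
print("\n".join(calls))
```

Passing `calls.append` instead of `print` keeps the trace in a list, which is handy when you want to diff two runs.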

------
letientai299
If the project is small (just a handful of files), then I will just jump
directly to the code.

But if it's medium to large (several folders, tens or hundreds of files), then I
prefer "working" with the code to just reading it.

The process usually follows this order:

\- Check the documentation for how to build and run the tests. Make sure the
tests pass before doing anything else.

\- Once the tests pass, go back to the documentation and look for the program
entry point (if it's built into a binary) or the exposed interface (if it's a
library). Skim through that to get an overview.

\- Load the project into my IDE and try debugging the tests to understand the
flow. Sometimes I'll also write new code to check my understanding.

That said, if the goal of reading that OSS project is to hunt down a bug, I
would just start from my own code and debug into the library itself, skipping
the overview part.

------
simonw
After glancing over the README here are a few of the other things I take a
look at.

The "contributors" tab on GitHub helps get an idea for the health of the
project: how long has it been actively maintained? Who did most of the work?
[https://github.com/dgraph-io/dgraph/graphs/contributors](https://github.com/dgraph-io/dgraph/graphs/contributors)

For software libraries, I like looking at the brand new "used by" tab -
[https://github.com/huge-success/sanic/network/dependents](https://github.com/huge-success/sanic/network/dependents)
\- in addition to indicating project health
it's also a great source of examples to look at later when I'm trying to
figure out how to do advanced things with the library.

I love reading through the CI configuration - .travis.yml or
.circleci/config.yml - because at the very least it shows what it takes to run
the test suite, and often I'll pick up some fun new CI and automation tricks.

I use GitHub code search extensively: sometimes for searching within the
project, but I'll also search the whole of GitHub for examples of people doing
something I want to do with the library:
[https://github.com/search?q=sanic+cookies&type=Code](https://github.com/search?q=sanic+cookies&type=Code)

~~~
TehShrike
Good tip about Github's code search, I always forget it's a thing and don't
use it nearly as much as I should

------
Uptrenda
I read it like a book and Google all the tooling I don't understand. Usually
the first page will be something like a make file. Understanding every detail
of this file will reveal the many hacks needed to get the project to run and
that tells you a lot about the project.

After that, I look for the entry point and try to get a sense of how the code
files have been organised. If it's good code, it will look like a curated library of
small functions with few inter-module dependencies. It should be easy to add
and remove functions without having to change countless files.

From this, you can start thinking of improvements and how to add them to the
existing software. It's a lot easier if it's only for your use as you can get
by with something that 'works' rather than something that 'works well.'

I second what another poster here said too: if there are too many files, it's
easier to work with the code than to try to read all of it.

------
trebligdivad
Following one feature through the code base can be one interesting way -
especially if that feature is something you understand from previous
experience. E.g. follow the life of a packet or a block read, say, in a kernel;
or one particular library call somewhere and just follow the path it takes.
You do have to be a little wary that you might be following an unloved/old
part of the project that needs work; so you might not be learning the ways
that they want new contributions to use.

~~~
mikekchar
This is pretty much what I do. Very few people do literate programming, so
it's hard to read code like a book. Instead, I think, "How did they do X" and
try to find out. This introduces you to the code. Another interesting way to
read code is to ask yourself the question, "If I wanted to add feature X, what
would I have to do"? This allows you to peruse the structure of the code. I
find that searching code is much, much easier than reading it and once I've
answered my questions, I usually have a better understanding.

The last way I read code is to add a unit test. Most code is poorly structured
for unit testing (by "unit test" I mean testing some function with real
collaborators -- the real collaborators are important for this exercise). Then
I try to refactor the code so that it's easy to write the unit test. Just
trying to create the collaborators usually leads to a _lot_ of insight into
the code (and is one of the reasons why I recommend to people that they avoid
over-mocking their tests -- but that's a story for another time). You may not
succeed in writing your test, but you will almost certainly understand how the
code fits together and where the smooth and rough parts are.

------
wyc
In general:

1\. Find key data structures, guess at their purpose, and examine frequently-
called functions that operate upon them

2\. Outline program entry points: command line, API calls, RPC, etc.

3\. Identify major systems and interfaces, especially any module management
code
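For a Python project, the first step in the list above can be roughed out with the standard `ast` module: list the class definitions and count which function names get called most often. This is a sketch only; `summarize_source` and the sample source are hypothetical, and counting by bare name will lump together unrelated functions that share a name.

```python
import ast
from collections import Counter


def summarize_source(source: str):
    """Return (class names, Counter of called function names) for Python source text."""
    tree = ast.parse(source)
    classes = [node.name for node in ast.walk(tree) if isinstance(node, ast.ClassDef)]
    calls = Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):  # plain call: validate(x)
                calls[func.id] += 1
            elif isinstance(func, ast.Attribute):  # method call: obj.validate(x)
                calls[func.attr] += 1
    return classes, calls


# Made-up source standing in for a real module.
SAMPLE = '''
class Packet:
    def __init__(self, payload):
        self.payload = payload

def route(packet):
    validate(packet)
    return forward(packet)

def retry(packet):
    validate(packet)
    return route(packet)
'''

classes, calls = summarize_source(SAMPLE)
print(classes, calls.most_common(3))
```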

------
jolmg
I usually get to the code when I have a purpose beyond merely understanding
the project, like wanting to make a change or just understand why a certain
behavior is like it is.

After I've downloaded the project, I'll think of a few words that are related
to what I want. For example, in the program "sweep", an audio editing program,
there's currently a bug where the arrow in the horizontal ruler gets redrawn,
but the horizontal ruler as a whole doesn't. That causes overlapping drawings
of the arrow to be drawn in a similar fashion as when a program with a window
freezes and you move another window over it.

So, I'll think of the words "arrow", "cursor", "ruler", "point", etc. and grep
them in the code. The -C option of grep/ag is awesome for this, too. I'll
look over the matches and visit the matches that look the most relevant to
what I'm looking for.
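The keyword search with context lines described above can be imitated in a few lines of Python when grep isn't at hand. This is a rough sketch, not a replacement for grep: `search_tree` is a hypothetical helper, the `.c`/`.h` suffix filter is an assumption, and the sample file is invented.

```python
import tempfile
from pathlib import Path


def search_tree(root, keywords, context=2, suffixes=(".c", ".h")):
    """Yield (path, line_number, snippet) for each line containing any keyword.

    The snippet includes `context` lines before and after the match,
    similar in spirit to grep's -C option.
    """
    lowered = [k.lower() for k in keywords]
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(errors="replace").splitlines()
        for index, line in enumerate(lines):
            if any(k in line.lower() for k in lowered):
                start = max(0, index - context)
                end = min(len(lines), index + context + 1)
                yield path, index + 1, lines[start:end]


# Demo on a throwaway directory with one invented source file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "ruler.c"
    sample.write_text("int x;\nvoid draw_arrow(void) {\n  redraw();\n}\n")
    hits = [
        (path.name, number, snippet)
        for path, number, snippet in search_tree(tmp, ["arrow", "cursor"], context=1)
    ]

print(hits)
```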

This is easier when what you want is logically near text that the program
outputs or that you otherwise know must be in the code. That way, you don't
need to guess. For example, to modify gnupg so that you can change the
directory it uses for socket files with an environment variable, you can just
grep for something like the filename of a certain socket like "S.gpg-agent"
and look at the code for where the directory path comes from. That's pretty
much guaranteed to quickly take you to where you want to go.

grepping is awesome. It's simple and works with every language.

------
PopeDotNinja
I get it up and running, build something trivial, read the API a bit, and
maybe try running the test suite. You can learn a fair amount of stuff just by
using a thing. Then if it's remotely interesting, I start the RTFM slog (slog
meaning I don't really learn efficiently from reading). I also like to skim
the changelog, and fish for high quality video summaries on YouTube.

A recent example: I was curious how to write a PostgreSQL client. I used a
PG client, skimmed the source code to see how it was wired up, read the public
API docs, and then watched this video on the PG wire protocol...
[https://youtu.be/qa22SouCr5E](https://youtu.be/qa22SouCr5E).

If you're trying to learn and don't have much of a foundation for evaluating an
open source project, try building it from scratch. When I wanted to better
understand the DSL of a testing framework, I implemented a basic testing
library of my own. It was really informative.
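To give a feel for how small such an exercise can start, here is a minimal sketch of a test registry in Python. It is not how any particular framework works; `Suite`, `test` and `run` are made-up names, and only `AssertionError` is treated as a failure.

```python
class Suite:
    """A tiny test registry: collect functions with @suite.test, then run them."""

    def __init__(self):
        self.cases = []

    def test(self, description):
        """Decorator factory that registers a test function under a description."""

        def register(func):
            self.cases.append((description, func))
            return func

        return register

    def run(self):
        """Run every case in registration order; return (description, passed) pairs."""
        results = []
        for description, func in self.cases:
            try:
                func()
                results.append((description, True))
            except AssertionError:
                results.append((description, False))
        return results


suite = Suite()


@suite.test("addition works")
def _():
    assert 1 + 1 == 2


@suite.test("this one fails on purpose")
def _():
    assert "a".upper() == "a"


results = suite.run()
for description, passed in results:
    print("PASS" if passed else "FAIL", description)
```

Even this much raises the questions a real framework has to answer: how tests are discovered, what counts as a failure, and how results are reported.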

------
iamgopal
What is the git command to find the most frequently changed files? (Apart
from compiled or temp files one forgot to gitignore.)

~~~
tuckerpo
git log I guess, assuming you're on the most recently pushed-to branch
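One possible way to turn `git log` into an answer: `git log --name-only --pretty=format:` prints only the paths touched by each commit, and counting those lines gives a rough churn ranking. The Python sketch below works on that output as text. `most_changed_files`, the ignored suffixes and the sample log are assumptions for illustration.

```python
from collections import Counter


def most_changed_files(log_output, top=10, ignore_suffixes=(".o", ".pyc", ".tmp")):
    """Count how often each path appears in `git log --name-only --pretty=format:` output."""
    counts = Counter(
        line.strip()
        for line in log_output.splitlines()
        if line.strip() and not line.strip().endswith(ignore_suffixes)
    )
    return counts.most_common(top)


# Invented sample of what the git command prints: one path per line,
# with a blank line between commits.
SAMPLE_LOG = """
src/checker.ts
README.md

src/checker.ts
build/out.o

src/checker.ts
src/parser.ts
"""

ranking = most_changed_files(SAMPLE_LOG, top=2)
print(ranking)
```

In a real checkout the text could come from `subprocess.run(["git", "log", "--name-only", "--pretty=format:"], capture_output=True, text=True).stdout`.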

------
kureikain
The only approach I found to understand any non trivial open source project is
to have a specific goal of what you want to do.

Once you have a goal it feels so much easier to understand anything because
you narrow down the scope and you have some keywords you can grep the code for.

For example, let's say you found a Redis driver for your language. Now Redis 6
includes some new commands which you want to add immediately instead of
waiting for your driver. Now you know to search for a similar command (grep
the heck out of it) and try adding breakpoints or just printf to see where the
code path goes.

I enjoy reading open source code and publish a newsletter[0] with a section
called "Code to read" that has some repos you can try to read to see how they
do things.

\---

[0] [https://betterdev.link](https://betterdev.link)

------
gizmoduck
If you're starting from scratch, as other users have mentioned, the READMEs,
CONTRIBUTING, etc. are all good sources.

After that, take a peek into the Issues. Many people open bugs and/or create
pull-requests to resolve something. It's quite plausible that you can glean a
lot of information about what's going on behind the scenes from these,
especially if they're open-source projects with a lot of public consumption.

If it's in a language I don't understand (presumably because I have never had
the need to use it), I'll try to write the basic "Hello, world!" apps or
something slightly more complicated, just to get the gist of the language. That
helped a lot with Rust, for example.

------
ctas
Start with the available documentation.

After that, getting an abstract overview over the packages/modules and their
responsibilities is key IMO. It helps you understand the structure and how the
logic is tied together.

Developers often stay away from contributing to open source projects, because
of the initial hurdle to understand a large grown code base written by someone
else.

My team and I are currently building a developer productivity tool. The goal
is to help developers grasp code quicker with visuals / graphs. If anyone’s
interested — preferably OSS maintainers, contributors — feel free to reach
out. We would love to get some feedback.

------
ksherlock
One trick I've learned is to browse through recent commits. Aside from recent
activity, it's also a shortcut to where the source files are located. Some
larger projects turn into a jungle of directories.

------
kodachi
Adding to all that's been said, the first thing I do is count the lines of
code using cloc and look at the contributors list. There is generally one key
person who has the most commits, so it's good to know that person's philosophy
and style. For example, for reading the Redis source code, I learned a lot
from its creator's blog posts and the Redis manifesto
([http://oldblog.antirez.com/post/redis-manifesto.html](http://oldblog.antirez.com/post/redis-manifesto.html)). By the
way, I love Redis' style.

------
brailsafe
Visual debuggers are really nice. PyCharm's is mostly trivial to set up. One
thing I've learned in debugging the framework I'm currently using
([https://ckan.org/](https://ckan.org/)) is that even if I can trace code
execution, legacy code might make no sense at all if the reasoning for a block
isn't explained.

------
werber
For me, if there's not comprehensive documentation, I will not try to
understand it. I'm not smart enough to grasp it quickly and not willing to
dedicate the time to doing so. I'm forever in debt to those who have the time
and patience to make that documentation possible because I wouldn't have a
career without them

------
TehShrike
1\. Read/skim the API documentation to get the shape of the project

2\. Pick an entry point to dig into, and read the code on Github. Octolinker
is a lifesaver: [https://octolinker.now.sh/](https://octolinker.now.sh/)

------
alacombe
If I'm getting to the source, it's generally because I need to add/tweak a
feature, so the first step generally involves running `grep` through the
source tree and pulling the string from there.

------
jmakov
What I am missing in almost any project is a second README (the first should
be an introduction to the project) where architecture and design decisions are
discussed.

------
JoeAltmaier
To use it? To find out how it works? To begin contributing to it? Different
approaches to all these.

------
git-pull
1\. Clone the source

2\. Try to build it

3\. If there are tests, try to run them

Just from doing this many times over, I learned a lot about programming.

Also fork the repo, then do `git clone <url>` for the original repo, `cd
repo`, then do `git remote add yourusername <yourgithubforkurl>`

To ease the above process, I created vcspull: [https://vcspull.git-pull.com](https://vcspull.git-pull.com)

Here is an example of my vcspull file: [https://github.com/tony/.dot-config/blob/master/.vcspull.yam...](https://github.com/tony/.dot-config/blob/master/.vcspull.yaml)

This also helps with studying and reading code, and with doing open source in
general, since it's easy to set up the original repo and the fork.

For generic open source: Download the source, check README.md/rst to see if
there are testing/development instructions. Check the .travis.yml commands;
those show what packages/steps are needed to build and probably test the code.

If it's node: do `npm install` and check the "scripts" in the package.json.
Those commands can be run like "npm run <task>"

If it has CMakeLists.txt, it uses CMake. Download and install cmake, then do
`cmake .`. cmake will let you know if you're missing libraries and those are
easy to google package names for. Then `make && [sudo] make install`

If it has Makefile.am/autogen.sh... download and install
autotools/autoconf/automake. Run `./autogen.sh`, then `./configure` (google
for package names of any libraries that show missing headers, .h, or symbols).
Then `make && [sudo] make install`

If it is python, and there's a Pipfile, download and install pipenv. Then do
`pipenv install .`. If it has requirements.txt, `pip install -r
requirements.txt`

Carried forward, if the project has anything resembling a package manifest
(e.g. Gemfile, composer.json) google them to find the appropriate package
manager for your OS. That gets you 75%-100% of the way to running locally a
lot of the time.
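The manifest-spotting routine above is mechanical enough to script. Here is a rough Python sketch that checks a checkout for the marker files mentioned and suggests the first commands to try. `BUILD_HINTS` and `suggest_build_steps` are hypothetical names, the command lists are condensed from the text above (with plain `pipenv install` assumed for a Pipfile), and real projects often need more than this.

```python
import tempfile
from pathlib import Path

# Marker file -> first commands to try (condensed from the description above).
BUILD_HINTS = {
    "package.json": ["npm install", "npm run <task>"],
    "CMakeLists.txt": ["cmake .", "make"],
    "autogen.sh": ["./autogen.sh", "./configure", "make"],
    "Pipfile": ["pipenv install"],  # assumption: plain install reads the Pipfile
    "requirements.txt": ["pip install -r requirements.txt"],
}


def suggest_build_steps(project_dir):
    """Return {marker file: suggested commands} for markers found in project_dir."""
    root = Path(project_dir)
    return {
        marker: steps
        for marker, steps in BUILD_HINTS.items()
        if (root / marker).is_file()
    }


# Demo on a throwaway directory that looks like a mixed node/python project.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "package.json").write_text("{}")
    (Path(tmp) / "requirements.txt").write_text("")
    found = suggest_build_steps(tmp)

for marker, steps in found.items():
    print(marker, "->", "; ".join(steps))
```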

For the more complex projects, they typically have dedicated setup
instructions and sometimes very detailed overviews (e.g.
[https://github.com/OpenTTD/OpenTTD/blob/master/docs/Readme_W...](https://github.com/OpenTTD/OpenTTD/blob/master/docs/Readme_Windows_MSVC.md),
[https://devguide.python.org/setup/](https://devguide.python.org/setup/),
[https://www.kernel.org/doc/html/latest/process/index.html](https://www.kernel.org/doc/html/latest/process/index.html))

Reading the source of your favorite interpreted programming language can be
rewarding, e.g.
[https://github.com/python/cpython](https://github.com/python/cpython)

