Would love to hear your experiences
Then, I start hunting through the codebase. Sometimes, despite people's best efforts at compartmentalization, there are one or two files that are the heart of the project. Depending on the project they'll be different things. For instance, TypeScript has checker.ts, which contains the core typechecking logic. Ruby has vm.c, compile.c and parse.y. If that's the case, that's actually very helpful, as I can spend the majority of my time in one file.
To aid in this hunting, I use a few tools. Stuff like grep and find (although I prefer ripgrep and fd) is a huge help, because you can search through large codebases with relative ease. IDEs are great too. I particularly like being able to go to definition, then go back, then go forwards, etc. Switching between call site and definition makes understanding functions easier.
I take notes on occasion, although I don't always reference them. It's more to process what I'm reading. I try to write notes about types, functions and files. I do it in org mode and embed URLs so that I can link definitions together.
Definitely run the code as soon as possible. Then add print statements and see where they go. I've used flamegraphs on occasion to see the stack trace.
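One cheap way to do the print-statement thing in Python is a small tracing decorator. This is only a sketch: `trace` and `parse_port` are made-up names for illustration, and `parse_port` just stands in for whatever function you are trying to understand.

```python
import functools


def trace(func):
    """Print every call to func along with its arguments and return value."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"-> {func.__name__} args={args} kwargs={kwargs}")
        result = func(*args, **kwargs)
        print(f"<- {func.__name__} returned {result!r}")
        return result
    return wrapper


# Hypothetical function standing in for the code you are exploring.
@trace
def parse_port(text):
    return int(text.strip())
```

Decorate a handful of functions, run the program once, and the output shows which of them are actually reached and in what order.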
But if it's medium to large (several folders, tens or hundreds of files), then I prefer "working" with the code to just reading it.
The process usually follows this order:
- Check the documentation for how to build the project and run the tests. Make sure the tests pass before doing anything else.
- Once the tests pass, go back to the documentation and look for the program entry point (if it's built into a binary) or the exposed interface (if it's a library). Skim through that to get an overview.
- Load the project into my IDE, and try debugging the tests to understand the flow. Sometimes, I'll also write new code to check my understanding.
That said, if the goal of reading that OSS project is to hunt down a bug, I would just start from my own code and debug into the library itself, skipping the overview part.
The "contributors" tab on GitHub helps get an idea for the health of the project: how long has it been actively maintained? Who did most of the work? https://github.com/dgraph-io/dgraph/graphs/contributors
For software libraries, I like looking at the brand new "used by" tab - https://github.com/huge-success/sanic/network/dependents - in addition to indicating project health it's also a great source of examples to look at later when I'm trying to figure out how to do advanced things with the library.
I love reading through the CI configuration - .travis.yml or .circleci/config.yml - because at the very least it shows what it takes to run the test suite, and often I'll pick up some fun new CI and automation tricks.
I use GitHub code search extensively: sometimes for searching within the project, but I'll also search the whole of GitHub for examples of people doing something I want to do with the library: https://github.com/search?q=sanic+cookies&type=Code
After that, I look for the entry point and try to get a sense of how the code files have been organised. If it's good code, it will look like a curated library of small functions with few inter-module dependencies. It should be easy to add and remove functions without having to change countless files.
From this, you can start thinking of improvements and how to add them to the existing software. It's a lot easier if it's only for your use as you can get by with something that 'works' rather than something that 'works well.'
I second what another poster here said, too: if there are too many files, it's easier to work with the code than to try to read all of it.
The last way I read code is to add a unit test. Most code is poorly structured for unit testing (by "unit test" I mean testing some function with real collaborators -- the real collaborators are important for this exercise). Then I try to refactor the code so that it's easy to write the unit test. Just trying to create the collaborators usually leads to a lot of insight into the code (and is one of the reasons why I recommend to people that they avoid over-mocking their tests -- but that's a story for another time). You may not succeed in writing your test, but you will almost certainly understand how the code fits together and where the smooth and rough parts are.
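To make "real collaborators" concrete, here is a minimal sketch with Python's unittest. `PriceList` and `Cart` are invented classes, not from any real project; the point is only that the test builds the actual `PriceList` instead of a mock.

```python
import unittest


class PriceList:
    """Hypothetical collaborator: looks up unit prices by item name."""

    def __init__(self, prices):
        self._prices = dict(prices)

    def price_of(self, item):
        return self._prices[item]


class Cart:
    """Hypothetical class under test; it depends on a PriceList."""

    def __init__(self, price_list):
        self._price_list = price_list
        self._items = []

    def add(self, item, quantity=1):
        self._items.append((item, quantity))

    def total(self):
        return sum(self._price_list.price_of(i) * q for i, q in self._items)


class CartTotalTest(unittest.TestCase):
    def test_total_uses_real_price_list(self):
        # Constructing the real collaborator shows what Cart needs from it.
        prices = PriceList({"apple": 3, "pear": 5})
        cart = Cart(prices)
        cart.add("apple", 2)
        cart.add("pear")
        self.assertEqual(cart.total(), 11)
```

In a real codebase the interesting part is how painful it is to construct the equivalent of `PriceList`; that pain tells you where the coupling is.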
1. Find key data structures, guess at their purpose, and examine frequently-called functions that operate upon them
2. Outline program entry points: command line, API calls, RPC, etc.
3. Identify major systems and interfaces, especially any module management code
After I've downloaded the project, I'll think of a few words that are related to what I want. For example, in the program "sweep", an audio editing program, there's currently a bug where the arrow in the horizontal ruler gets redrawn, but the horizontal ruler as a whole doesn't. That causes overlapping drawings of the arrow to be drawn in a similar fashion as when a program with a window freezes and you move another window over it.
So, I'll think of the words "arrow", "cursor", "ruler", "point", etc. and grep for them in the code. The -C option of grep/ag, which prints lines of context around each match, is awesome for this, too. I'll look over the matches and visit the ones that look the most relevant to what I'm looking for.
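If it helps to see what that kind of keyword search with context is doing, here is a rough Python imitation. It is only a sketch: `grep_context` is a made-up helper, it works on a list of lines already in memory, and it does not try to reproduce grep's output format.

```python
import re


def grep_context(lines, keywords, context=2):
    """Return (line_number, line) pairs for lines matching any keyword,
    plus `context` lines before and after each match."""
    if not keywords:
        return []
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    keep = set()
    for i, line in enumerate(lines):
        if pattern.search(line):
            start = max(0, i - context)
            stop = min(len(lines), i + context + 1)
            keep.update(range(start, stop))
    # Line numbers are 1-based, like grep -n.
    return [(i + 1, lines[i]) for i in sorted(keep)]
```

Something like `grep_context(source.splitlines(), ["arrow", "ruler"], context=3)` gives you the matches with their surroundings, which is usually enough to decide which file to open first.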
This is easier when what you want is logically near text that the program outputs or that you otherwise know must be in the code. That way, you don't need to guess. For example, to modify gnupg so that you can change the directory it uses for socket files with an environment variable, you can just grep for something like the filename of a certain socket like "S.gpg-agent" and look at the code for where the directory path comes from. That's pretty much guaranteed to quickly take you to where you want to go.
Grepping is awesome. It's simple and works with every language.
A recent example: I was curious how to write a PostgreSQL client. I used a PG client, skimmed the source code to see how it was wired up, read the public API docs, and then watched this video on the PG wire protocol... https://youtu.be/qa22SouCr5E.
If you're trying to learn and don't have much of a foundation for evaluating an open source project, try building it from scratch. When I wanted to better understand the DSL in a testing framework, I implemented a basic testing library of my own. It was really informative.
Once you have a goal, it feels so much easier to understand anything, because you narrow down the scope and you have some keywords you can grep the code for.
Example: let's say you found a Redis driver for your language. Redis 6 includes some new commands that you want to add immediately instead of waiting for your driver to support them. Now you know how to search for a similar command (grep the heck out of it) and try adding breakpoints or just printf to see where the code path goes.
I enjoy reading open source code and publish a newsletter with a section called "Code to read" that has some repos you can try to read to see how they do things.
After that, take a peek into the Issues. Many people open bugs and/or create pull requests to resolve something. It's quite plausible that you can glean a lot of information about what's going on behind the scenes from these, especially if they're open-source projects with a lot of public consumption.
If it's in a language I don't understand (presumably because I have never had the need to use it), I'll try to write the basic "Hello, world!" app or something slightly more complicated, just to get the gist of the language. That helped a lot with Rust, for example.
After that, getting an abstract overview over the packages/modules and their responsibilities is key IMO. It helps you understand the structure and how the logic is tied together.
Developers often stay away from contributing to open source projects because of the initial hurdle of understanding a large, organically grown codebase written by someone else.
My team and I are currently building a developer productivity tool. The goal is to help developers grasp code quicker with visuals / graphs. If anyone’s interested — preferably OSS maintainers, contributors — feel free to reach out. We would love to get some feedback.
2. Pick an entry point to dig into, and read the code on GitHub. Octolinker is a lifesaver: https://octolinker.now.sh/
2. Try to build it
3. If there are tests, try to run them
Just from doing this many times over, I learned a lot about programming.
Also fork the repo, then do `git clone <url>` for the original repo, `cd repo`, then do `git remote add yourusername <yourgithubforkurl>`
To ease the above process, I created vcspull: https://vcspull.git-pull.com
Here is an example of my vcspull file: https://github.com/tony/.dot-config/blob/master/.vcspull.yam...
This also helps with studying, reading code, and doing open source in general, since it's easy to set up the original repo and the fork.
For generic open source: Download the source and check README.md/rst to see if there are testing/development instructions. Check the commands in .travis.yml; they show what packages/steps are needed to build and probably test the code.
If it's node: do `npm install` and check the "scripts" in the package.json. Those commands can be run like "npm run <task>"
If it has CMakeLists.txt, it uses CMake. Download and install cmake, then do `cmake .`. cmake will let you know if you're missing libraries and those are easy to google package names for. Then `make && [sudo] make install`
If it has Makefile.am/autogen.sh... download and install autotools/autoconf/automake. Run `./autogen.sh`, then `./configure` (google for package names of any libraries that show missing headers, .h, or symbols). Then `make && [sudo] make install`
If it is python, and there's a Pipfile, download and install pipenv. Then do `pipenv install .`. If it has requirements.txt, `pip install -r requirements.txt`
Carried forward, if the project has anything resembling a package manifest (e.g. Gemfile, composer.json), google it to find the appropriate package manager for your OS. That gets you 75%-100% of the way to running locally a lot of the time.
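The "look for a manifest, then pick the tool" routine above can be sketched in a few lines of Python. Treat it as an illustration only: `BUILD_HINTS`, `suggest_build_steps` and `npm_scripts` are names made up here, the command lists are adapted from the steps described above, and real projects often need more than this.

```python
import json
from pathlib import Path

# Marker file -> commands to try first (adapted from the steps above).
BUILD_HINTS = {
    "package.json": ["npm install", "npm run <task>"],
    "CMakeLists.txt": ["cmake .", "make"],
    "autogen.sh": ["./autogen.sh", "./configure", "make"],
    # Assumption: a plain `pipenv install` is enough to install from the Pipfile.
    "Pipfile": ["pipenv install"],
    "requirements.txt": ["pip install -r requirements.txt"],
}


def suggest_build_steps(project_dir):
    """Return {marker_file: commands} for each known marker found in project_dir."""
    root = Path(project_dir)
    return {
        name: commands
        for name, commands in BUILD_HINTS.items()
        if (root / name).is_file()
    }


def npm_scripts(project_dir):
    """Return the "scripts" table from package.json, or {} if there is none."""
    manifest = Path(project_dir) / "package.json"
    if not manifest.is_file():
        return {}
    data = json.loads(manifest.read_text(encoding="utf-8"))
    return data.get("scripts", {})
```

Calling `suggest_build_steps(".")` in a freshly cloned repo gives a first guess at what to run, and `npm_scripts(".")` lists the task names you can pass to `npm run`.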
For the more complex projects, they typically have dedicated setup instructions and sometimes very detailed overviews (e.g. https://github.com/OpenTTD/OpenTTD/blob/master/docs/Readme_W..., https://devguide.python.org/setup/, https://www.kernel.org/doc/html/latest/process/index.html)
Reading the source of your favorite interpreted programming language can be rewarding, e.g. https://github.com/python/cpython