
Interesting Codebases - markpapadakis
https://medium.com/@markpapadakis/interesting-codebases-159fec5a8cc
======
joshmarlow
If you enjoy going through interesting code bases and learning some new tricks
and patterns, you would probably enjoy "The Architecture of Open Source
Applications" ([0]) - each chapter is a description of the history and
architecture of a separate open source project. Whenever learning a new
technology, I usually try to find a chapter in this series about it.

[0] - [http://aosabook.org/en/index.html](http://aosabook.org/en/index.html)

~~~
erichdongubler
I've actually done undergraduate research with AOSA, and by far the most
interesting part was analyzing the differences and commonalities in
architectural approaches between FOSS projects in similar domains.

------
swang
one of my biggest issues with looking through codebases is, where do i start?

if i'm not familiar at all with the language, or more specifically how
programs in that language are usually structured, i'm just going to be
spending a lot of time looking at stuff that probably isn't the meat and bones
of the library/app.

take for example, his first suggested codebase, seastar. i haven't done c++ in
years (school, using turbo borland) so where do i look? "apps" maybe? nope
just seems to be a folder of libraries. ah, probably the core folder. whoops
there are 20+ files/headers. should i dig into this assuming that's where most
of the code for the app/library is?

i suppose it would probably be nice if there's a site that explains how most
programming languages lay out their code. e.g. javascript generally is laid out
similarly now, as is ruby/rails. so a site to explain the general layout
structure would be kinda cool. it's kinda late at night so maybe i'm
overthinking this.

~~~
JustSomeNobody
First and foremost, clone and build it. If you cannot clone, build and get it
running within an hour, it's generally crap and not worth your time[0].

Once you get it building and running, you can now start making simple changes
in order to find your way around.

[0] Developers should pride themselves on how (relatively) simple this process
is for their project(s).

~~~
imtringued
That would disqualify every C++ codebase I've worked on.

------
runlevel1
Redis is a code base I've been impressed with.

Short, readable functions and good comments make it quite easy to follow.

[1]: [https://github.com/antirez/redis](https://github.com/antirez/redis)

~~~
nullpunkt
An important thing it has is an actual source code layout in the readme,
something that is missing from nearly all open source projects, which makes it
a hassle to map out a project and start reading it...

~~~
spike021
After seeing your comment I decided to click through and check out the readme.
That's a great amount of documentation. Wish the codebases I work with would
be so detailed.

------
apetresc
I guess I'll be that guy and say that I highly doubt the author has taken more
than a cursory glance at more than half his list. He certainly hasn't had the
deep epiphanies he's implying from each of them.

Seriously, he expects people to believe he's evaluated the code of the Linux
kernel, the Chrome browser, Postgres, LLVM, Tensorflow (just to name a few,
less than half of his list), deeply enough to be able to make statements like
"finest codebase in [x, y category] that I've seen", while also being the CTO
of a company?

~~~
markpapadakis
I never implied that I studied the Linux kernel codebase inside out --
specifically, I was interested in the networking stack and the VFS layer, and
from there I looked into the memory subsystem and certain driver
implementations. I did map the codebase to the extent that I can more or less
find my way around easily.

Same is true for Chrome. I was interested in the code for the various UI
components, but I looked into other bits here and there.

I've studied most of the Postgres, LLVM and Tensorflow codebases - as in, I
went through pretty much most if not all files looking for interesting bits
(and finding plenty). I guess there's no way to "prove" anything to anyone, but
I don't really care or want to do that either -- it's not about bragging
rights, if that's what you implied; I just thought I'd share a list of
codebases I came across that I thought were interesting and worthy of other
people's time.

As for my job, and my work, you may want to check out my Github profile
([https://github.com/markpapadakis](https://github.com/markpapadakis)) --
though there's only public stuff there. I guess I like to spend my free time
learning instead of, say, watching TV or wasting time elsewhere :)

~~~
rents
Can you also suggest how much time one can expect to spend on a particular
codebase (of those listed) to get any meaningful take-aways?

I am interested in exploring codebases now that I have sufficient experience
and feel confident (and not intimidated). This will be useful because then I
can start from the smaller (simpler) ones and move on to the complex ones
later.

If you have a recommended path, please do suggest it.

~~~
markpapadakis
It depends on the size and what you are trying to do. For example, lately I've
spent maybe a week studying Lucene, and kept going back to it every other day,
because I needed to understand pretty much everything about it, to get ideas
for improving Trinity (
[https://github.com/phaistos-networks/Trinity](https://github.com/phaistos-networks/Trinity)
).

Some other codebases are so vast it takes a lot, lot longer to understand them
enough to feel 'comfortable' navigating them (e.g. the Unreal Engine codebase).

Most codebases, however, are quite small, and within 1 hour, or a few, you'll
be able to understand where to go to find what you need, which are the primary
data structures and functions, etc.

------
nextos
Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp

[https://norvig.com/paip.html](https://norvig.com/paip.html)

Classic expert systems, but IMHO not outdated. I think they will make a
comeback soon once we understand how to integrate probabilistic reasoning,
logic and connectionist approaches.

------
irfansharif
There was some discussion on the very same for an Ask HN[0]. Copying my
comment from there: I've found the google/leveldb[1] source code, authored by
Jeff Dean and Sanjay Ghemawat, to be immensely educational. The
implementation of leveldb is similar in spirit to the representation of a
single Bigtable tablet[2].

[0]:
[https://news.ycombinator.com/item?id=13854431](https://news.ycombinator.com/item?id=13854431)

[1]: [https://github.com/google/leveldb](https://github.com/google/leveldb)

[2]:
[http://research.google.com/archive/bigtable.html](http://research.google.com/archive/bigtable.html),
section 5.3

------
simplegeek
Great achievement to have read so much code. Honestly, I felt a little
depressed for not having read this much code.

As an aside, how do you read codebases? Do you read every single line to
understand what's going on? Or do you get a general idea of the
design/architecture?
What are the proven strategies to read code bases?

------
laGrenouille
If you are interested in reading codebases but find some of the larger
projects intimidating, I suggest checking out Timothy Davis' sparse matrix
library CSparse:
([http://people.sc.fsu.edu/~jburkardt/c_src/csparse/csparse.ht...](http://people.sc.fsu.edu/~jburkardt/c_src/csparse/csparse.html)).

It is used internally for sparse matrix representations in Python, R, and
Matlab. The entire library fits into 2100 lines of concise yet well-documented
C code. It is now mostly installed bundled with SuiteSparse, but the link
above has the 2006 codebase from the original standalone library.

------
dunham
Back in high school and college I spent a lot of time reading code. I was on
the Amiga, and remember reading Matt Dillon's stuff (a C compiler with
library, the DME editor, and Dnet), Tim Budd's smalltalk, David Betz' "advsys"
(which inspired me to read a book on parsing and automata), and other stuff
that I've since forgotten.

These days, it's hard to find time between work and an absolute deluge of
interesting stuff to study and play with.

------
wiremine
This is great! We need more of these sorts of posts. There is so much the
community can learn from excellent code bases. Thx for posting!

~~~
mbparsa
Start by looking at the unit tests; they will help you split the code base
into units and understand them one by one.

------
drej
Adding golang/go to the list. It's interesting to read how an actual language
is implemented, and it's fairly well documented (most of the documentation is
extracted from the codebase, so all the crucial bits have to be there).

There are a few parts of the codebase that were automatically transpiled from
C, but the rest is usually very readable.

------
ben174
I've always found thefuck to be a great Python code base which is nicely
organized and easy enough to wrap your head around. Also an easy project to
contribute to.

[https://github.com/nvbn/thefuck](https://github.com/nvbn/thefuck)

------
rdiddly
The codebase I inherited and now maintain is "interesting" too, but more like
this:

[https://en.wikipedia.org/wiki/May_you_live_in_interesting_ti...](https://en.wikipedia.org/wiki/May_you_live_in_interesting_times)

------
LiamPa
I always look to requests for how Python should be used:

[https://github.com/requests/requests](https://github.com/requests/requests)

