
Ask HN: How do you learn to read source code? - ywecur
I'm a decent solo programmer and freelance web developer and I've created some simple back end programs together with a friend.

But when it comes to understanding even the simplest open source projects I can't for the life of me understand what's going on. There are so many functions relating to each other, so many classes that I just get lost. I'd probably get lost in everything my friends have written as well if they weren't there to explain their architecture to me in detail.

So how do you learn this? There are plenty of books and resources on how to program but I've yet to find any on how to read source code.
======
jlg23
* Start at obvious entry points: main(...) for standalone apps, exported functions in libraries and top-level handlers for web applications.

* Use a debugger and breakpoints for stack traces to understand program flow (if not available, throw an exception in a function of interest or insert print statements for the same effect).

* Either use an IDE that allows you to jump to definitions or map the project with grep (e.g.: grep -H 'class ' * > classes.txt) to save time when manually going to the definition (or anything in between, but what's available depends on the language).

* Focus: Especially when you are not used to reading code, it is easy to get lost trying to understand every single statement at once. Focus on what you want to understand in this reading session and develop a habit of taking the next step towards that goal once you have a good idea of what you are looking at. Perfect understanding is not necessary - if it turns out you missed a crucial part, you can still go back and re-read.

* Practice: The more you read and write code, the more proficient you'll become, the more intuition you will develop and thus the faster you'll be when skimming over unknown code.
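
As a minimal sketch of the stack-trace tip above, in JavaScript: creating an `Error` captures the current call stack without having to throw it, which shows how execution reached a function of interest. The function names `handleRequest`, `loadUser` and `whoCalledMe` are hypothetical stand-ins for an unfamiliar codebase, and the exact format of the `stack` string depends on the engine.

```javascript
// Hypothetical call chain standing in for code you are trying to read.
function whoCalledMe() {
  // An Error object records the call stack at the point it is created.
  return new Error("trace").stack;
}

function loadUser() {
  return whoCalledMe();
}

function handleRequest() {
  return loadUser();
}

// Print the chain of callers that led to whoCalledMe().
const stack = handleRequest();
console.log(stack);
```

In a browser console or Node, `console.trace()` placed inside the function of interest prints a similar trace directly.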

~~~
ywecur
Your comment touches a lot on how to get an idea of the overall architecture,
and I thank you for it.

My main problem though is that I can't seem to see the purpose of functions,
because more often than not they are written in a, to me, foreign way.

An example I have is this uncommented if statement:

`if (tab.url.toLowerCase().indexOf("https://facebook.com") > -1)`

Should it be clear to me immediately that this means "If the target website is
open"? It took me a solid minute just to understand this statement.

Are there some common design patterns I should memorise?

~~~
jlg23
>> * Practice: The more you read and write code, the more proficient you'll
become, the more intuition you will develop and thus the faster you'll be when
skimming over unknown code.

> if (tab.url.toLowerCase().indexOf("https://facebook.com") > -1)

> Should it be clear to me immediately that this means "If the target website
> is open"? It took me a solid minute just to understand this statement.

In theory you should figure out what the _url_ property of _tab_ is (e.g. by
logging its value to the console). Then you see it is a string and check the
JavaScript reference for _toLowerCase()_ and _indexOf()_.

When you've gained some experience you'll recognize those two functions
immediately and know what they do and just assume that _tab.url_ is a string
(and you'll be mad at the original developer if it isn't).
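
One way to work through a statement like this is to split it into named steps and log each one. The sketch below assumes _tab_ is a plain object with a string _url_; the sample URLs and the variable names `lowered`, `position`, `isTargetSite` and `elsewhere` are made up for illustration.

```javascript
// Hypothetical stand-in for the real tab object; only `url` matters here.
const tab = { url: "HTTPS://Facebook.com/groups" };

// Step 1: normalize case so the comparison is case-insensitive.
const lowered = tab.url.toLowerCase();

// Step 2: indexOf returns the position of the substring, or -1 if absent.
const position = lowered.indexOf("https://facebook.com");

// Step 3: "greater than -1" therefore means "the substring was found".
const isTargetSite = position > -1;

console.log(lowered, position, isTargetSite);

// A URL that does not contain the substring yields -1.
const elsewhere = "https://example.org/".indexOf("https://facebook.com");
console.log(elsewhere);
```

Note that `indexOf(...) > -1` matches the substring anywhere in the string, not only at the start. In modern JavaScript the same check is often written with `includes()`.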

------
curtis
The simple fact of the matter is that reading code is hard, maybe even
impossible in the general case. You _can_ understand code with some amount of
effort, but it often boils down to an exercise in reverse engineering.

One thing this means is that in any substantial codebase you are never going
to understand all of it. You will typically only have time to learn a fraction
of the system, so if you are going to proactively explore the codebase, you
will need to prioritize. You probably (but not necessarily) want to get a
handle on the top-level architecture before digging deep anywhere.

My final piece of advice is that I personally find it impossible to understand
just about any non-trivial piece of code without running it, and running it
multiple times (1). Perhaps even many, many times. You can run under a
debugger (single stepping or breakpoints) and this seems to work for many
people. I still rely on print statements sprinkled through the code myself,
adding and removing them as I run the code in question over and over again as
my current point of interest moves from place to place in the code. This might
sound scary, but it's not that different from the way you normally debug code.

(1) It's entirely possible that the person that wrote the code in the first
place also ran it many, many times (testing each small change) as they wrote
it. So it's perhaps not unreasonable that you yourself may need to run it
many, many times in order to understand it later.

------
tyingq
Knowing which source file to start with is important. That depends quite a bit
on the language, and the type of software.

Hard to make a short summary here, because there's no easy rule of thumb. For
example, you could say, for C, "find the main() function first." That doesn't
help, though, if the open source project is a library, like pcre.

------
zzzcpan
It's not that hard, really, just use a call tracing tool. (Doing it without
such a tool is very hard, of course, and is not very smart.)
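
To illustrate the idea only (this is a hand-rolled sketch, not any particular tracing tool), call tracing can be approximated in JavaScript by wrapping an object's functions so that every call is recorded. The names `traceCalls`, `cart` and `log` are hypothetical.

```javascript
// Replace each function on `target` with a wrapper that records the call
// (name and JSON-encoded arguments) in `log` before delegating to the original.
function traceCalls(target, log) {
  for (const name of Object.keys(target)) {
    const original = target[name];
    if (typeof original !== "function") continue;
    target[name] = function (...args) {
      log.push(`${name}(${args.map((a) => JSON.stringify(a)).join(", ")})`);
      return original.apply(this, args);
    };
  }
  return target;
}

// Hypothetical module under study.
const cart = {
  items: [],
  add(item, price) {
    this.items.push({ item, price });
    return this.total();
  },
  total() {
    return this.items.reduce((sum, entry) => sum + entry.price, 0);
  },
};

const log = [];
traceCalls(cart, log);
const result = cart.add("book", 12);

// The log lists calls in the order they happened, including nested ones.
console.log(log.join("\n"));
```

Reading the recorded sequence shows which functions a single top-level call touches, which is the kind of overview a real tracing tool gives for a whole program.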

