

Ask YC: Learning to read code? - noahlt

I'm a high schooler who has worked on a few small projects of my own.  Recently I've started to work with a friend[1] and I've discovered that reading code really is an order of magnitude harder than writing it.  What's the best way to learn to read code?  If the solution is just "read a lot of code", do you have any suggestions for reading material?

--

[1] pg is right about cofounders.  At least for some people, including me.
======
lbrandy
There are actually two separate skills here. One is learning to just read code,
as in, understanding what the code is saying. The second is understanding the
code, which would be what the code is doing.

Learning to just read the code requires writing a lot in that particular
language (I find writing teaches you a lot more than reading). When some piece
of code uses some aspect of a language in a way you don't understand, go
figure out what they are doing, internalize that concept, and go write some
code yourself that uses that same concept in that language.

In my experience, writing a lot of code is the best way to learn to read it.
You'll discover and truly understand why certain things are done certain ways
once you've had to go through the pain of doing it any other way. One of the
'first' examples you might run into is how every single header file in a C/C++
library seems to start with an #ifndef/#define combo, and end with an #endif.
It might seem a bit bewildering at first. But when you run into your first
case of multiple includes of the same file, you'll quickly see why they do it.

As for actually 'understanding' a big pile of code, that's more difficult.
This is where you understand the language just fine but you need to try to
wrap your mind around a giant pile of someone else's code. You have to immerse
yourself in their codebase, figure out where all the important files are, and
so on. Take it in small chunks. Pick a feature, trace the code, put in print
statements, etc. And you -really- need a way of quickly navigating the code
(jumping to declarations, switching between header/source, etc). Many IDEs
provide some level of functionality to do this. Personally, I use emacs and
etags.
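
Here is a rough sketch of the "put in print statements" idea, written in
Python for brevity rather than C/C++. The `traced` decorator and the two toy
functions are made up for illustration and are not from any real codebase. The
point is only that wrapping the functions along one feature's path shows the
call order without editing every function body.

```python
import functools


def traced(func):
    """Print each call and its return value, then pass the result through."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        shown_args = ", ".join(map(repr, args))
        print(f"-> {func.__name__}({shown_args})")
        result = func(*args, **kwargs)
        print(f"<- {func.__name__} returned {result!r}")
        return result
    return wrapper


# Hypothetical functions standing in for the feature being traced.
@traced
def parse_price(text):
    return float(text.strip("$"))


@traced
def order_total(prices):
    return sum(parse_price(p) for p in prices)


total = order_total(["$1.50", "$2.25"])
```

Running it prints one arrow line on the way into each function and one on the
way out, which is usually enough to see who calls whom.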

~~~
swombat
I'd add a third level - debugging the code.

The reason I'd count it as separate is because there have been many times when
I've fixed a broken piece of code without fully understanding it, going by gut
feeling rather than analytical understanding.

I would suggest this is like a higher skill built on top of 10+ years of
experience of both of the skills you mention.

------
blogimus
Do you have a favorite open source project? Download the source and try to
build the runtime files yourself. Then you have an environment to poke through
the code and play with. For example, do you like Python? Download the source
and create a debug build of it.

Another thing to do is to go to code samples online at the example sites. Type
them in and/or download them and play with them.

Yet another thing to try is to do a port of a program to another platform.
Even if you are not successful, you should learn a lot.

The key is to take more than one approach, as there is no one right way, there
is not one true path. Immerse yourself in different programming activities.
The important part is practice, practice, practice, like the story on going to
Carnegie Hall.

Me? One of my earliest bigger projects was porting an old Star Trek game
written in Altair BASIC to Apple BASIC. I had only mimeographed printouts
published in a book of computer games back in the 1970's. Arrays were not
handled the same between the two versions of BASIC, so I had to come up with a
different array scheme than was in the book.

------
gcv
Try stepping through the code in an interactive debugger. I typically run a
gdb-style debugger under Emacs, but anything which lets you step through code
and inspect data works well.

Some people really loathe these tools, but I think they're invaluable when
someone drops a pile of spaghetti code in your lap and tells you to support
it. Figure out the entry point to the code, set a breakpoint in main() or
something like that, and start stepping. After ploughing through a few of the
code's basic transactions, you should have a sense for the data structures and
abstractions it relies on, if any. Then, you should be able to read individual
functions and such to get the gritty details. The use of threads substantially
complicates the picture, of course.
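
As a small sketch of the "set a breakpoint at the entry point and start
stepping" workflow, here it is with Python's built-in pdb instead of gdb. The
program itself is hypothetical, and the `debug` flag is just my way of keeping
the breakpoint switched off by default.

```python
# Hypothetical program, used only to show where a breakpoint would go.
def load_orders(lines):
    """Turn 'name,qty' lines into (name, qty) tuples."""
    orders = []
    for line in lines:
        name, qty = line.split(",")
        orders.append((name, int(qty)))
    return orders


def total_quantity(orders):
    return sum(qty for _, qty in orders)


def main(debug=False):
    if debug:
        # Drops into pdb at the entry point. From there: 's' steps into a
        # call, 'n' steps over it, 'p orders' prints a variable, and 'c'
        # continues to the end.
        breakpoint()
    orders = load_orders(["widget,2", "gadget,5"])
    return total_quantity(orders)


result = main()
```

Calling `main(debug=True)` from a terminal starts the stepping session; after a
couple of passes through `load_orders` the list-of-tuples data structure should
be obvious.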

------
andrewcooke
i'm not sure there's a good answer to this, apart from "keep trying and become
a better programmer", but maybe if i explain why it is hard to read code it
will help you see how to improve.

the trouble with most programming languages is that they are pretty much stuck
at one level of detail. so at the level of detail of "add these numbers" or
"print this text" they are ok.

but when you get to higher levels, like "while reading the file, get the input
from the user" they don't do as well. as a programmer you are "trapped" using
the same language that was designed to make working with the low level details
possible.

this is what makes programming hard/interesting, really. and of course there
are lots of ways to try to solve the problem. one approach you might have met if
you've been reading what pg writes is to use a language like lisp which is
extensible (i'm thinking of his book "on lisp", which i think is online now).
then you can build the language up in parallel with your ideas so that you
continue to use a language at the right level for what you are describing.

in theory at least. in practice it is not so easy, and we have to read code
written by poor or average programmers, as well as good ones.

another way to deal with the problem is to keep _thinking_ about things at a
higher level, even if you are stuck with a language that forces your writing
to spell things out in detail.

if a program is written by someone working in this way (thinking big thoughts)
then what you need to do is guess what they were thinking. once you
(correctly) guess the ideas behind the code then you can see the structure
that is otherwise obscured by all the details (like not being able to see the
wood for the trees).

obviously that's an impossible task in general - you cannot guess what someone
else is thinking. but in practice it turns out that some ideas are a lot more
popular than others. often because they are good ideas; sometimes because they
are common mistakes.

so one way of getting better at reading code is to learn what the possible
ideas are. then, when you read the code, you can pick that up. it might take
some effort at first, but eventually you get good at picking up "the scent".

for example, yesterday i was trying to understand why some code wasn't working
(part of the excellent sqlalchemy library for making python work well with sql
databases). stepping through the code in the debugger i was completely lost -
i couldn't have told you in any detail what was happening. but at the same
time, i was pretty sure that i knew what was happening in broad terms. the
library was written in a way that made it very customizable, with lots of work
being delegated to function calls that could be replaced in various ways. this
is a common idea for complex libraries, so i wasn't too worried that i didn't
know the detail of function calling function calling function because i was
pretty sure that the end result was just to delegate the process to the right
part of the library. understanding what the idea was made it a lot easier to
read (or skip parts of) the code.
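
To make the "work delegated to replaceable function calls" idea concrete, here
is a minimal sketch. It is not sqlalchemy's actual API. The `Exporter` class,
its `formatter` hook and the `_dispatch` method are all invented for
illustration. Once you recognise this shape, you can skip the chain of
forwarding calls and look only at whichever function was plugged in.

```python
class Exporter:
    """Hypothetical library class whose formatting step can be swapped out."""

    def __init__(self, formatter=None):
        # Fall back to the built-in behaviour when no hook is supplied.
        self.formatter = formatter or self.default_formatter

    @staticmethod
    def default_formatter(record):
        return ", ".join(f"{key}={value}" for key, value in record.items())

    def export(self, records):
        return [self._dispatch(record) for record in records]

    def _dispatch(self, record):
        # Pure delegation: this layer does no work of its own.
        return self.formatter(record)


default_output = Exporter().export([{"id": 1, "name": "ann"}])
custom_output = Exporter(formatter=lambda r: r["name"].upper()).export(
    [{"id": 1, "name": "ann"}]
)
```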

sorry, i am writing too much. almost done.

anyway, for oo languages (particularly those with fairly rigid type systems -
java, c++, etc) some of these ideas are documented as "patterns". even in
other languages, some of these ideas are so general they appear there too
(but with different names :0). so one way to improve your code reading skills
is to first understand the patterns, so that you have a catalogue of ideas
that you can check against the code (or against what you think was in the
programmer's mind when they wrote the code). the most famous book that talks
about patterns was also the first (afaik) and is called "design patterns"
-<http://en.wikipedia.org/wiki/Design_Patterns>

------
zemariamm
If your problem is finding your way through big source code repositories you
may want to look at <http://www.spinellis.gr/codereading/>

------
vikram
When reading code, don't treat the task like that of reading a book. The best
way, I find, is to interact with the code that you are reading. So it's crucial
to get it to run, then get it to do something useful, then refactor it
slightly by making functions shorter and giving the new functions specific
names. E.g. instead of saying (first lst) maybe say (node-name lst). Ideally
the library functions should have names to do with the languages/library data
structures and your fns should have names which correspond to the problem you
are looking at. This is the approach I take when I want to rewrite something.
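
A rough Python equivalent of the (first lst) to (node-name lst) renaming, as a
sketch. The tree layout, where each node is a list of a name followed by child
nodes, is an assumption made up for this example.

```python
# Assumed layout: each node is [name, child, child, ...].
def node_name(node):
    """Problem-level name for what would otherwise be node[0]."""
    return node[0]


def node_children(node):
    """Problem-level name for what would otherwise be node[1:]."""
    return node[1:]


def count_nodes(node):
    # Reads in terms of the problem (nodes, children), not list indexing.
    return 1 + sum(count_nodes(child) for child in node_children(node))


tree = ["root", ["left"], ["right", ["leaf"]]]
size = count_nodes(tree)
```

The accessors add nothing computationally. They just record what you worked
out while reading, so the next pass over the code is easier.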

If you want to use the code as in a library, then try not to read the details
of it until you really have to. Use the interface or write an appropriate
interface for you to use. Work through some examples in the debugger to see
what it does.

If you need to fix the code then you need to read the code. I try to focus on
just the problem, rather than understand the whole thing. This approach makes
it much simpler, I get in and out really quick, just focus on the problem and
fix it. If I find something else that's a problem then I look at that too,
otherwise I avoid figuring out how the code works.

------
mark-t
I don't find reading well-written code any more difficult than reading a math
book. But of course it is an order of magnitude harder than reading prose. You
have to force yourself to slow down and understand what each part is saying.
As opposed to ordinary text, every word is important, and you need to know
what it's doing there.

Some of this is just proficiency in the specific language, but it's more about
the concepts and idioms of programming in general. I've fixed a bug in an open
source python project, and I couldn't write a "Hello World" program in python
without a reference.

But do note that "well-written" is mandatory. If the code isn't well-written,
it will take you a lot longer to figure out. The code I wrote in high school
was pretty bad.

Variables should be named to describe what they represent. You should have a
reasonable number of comments, not too many or too few (ironically, this
amount changes as you get better). Divide your code into manageable chunks and
split them into subroutines. Within subroutines, separate the chunks with at
least one blank line.

------
kjldskjh
Thanks to the people who wrote really long comments. I was stuck "trying to
hack a project" and thought I could never really get a 200 file, 2K lines per
file project in my head. The points about choosing your level of abstraction
and trying not to go deep into a library unless needed were spot on. Thanks,
a grateful newbie "hacker".

------
whycombo
This might be a good place to start: Python Source Walkthrough Series
<http://showmedo.com/videos/series?name=qVIxaDJxY>

------
henryw
read a book on the language your code is written in. write some code,
starting with really simple stuff and keep pushing yourself to hard stuff.

------
xlnt
write tests for the code to learn about how to use it. just write what you
guess should work, then see if it passes.

if there are already tests, try reading those.
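
A small sketch of what "write what you guess should work, then see if it
passes" can look like with Python's unittest. Here the built-in `str.split`
stands in for the unfamiliar code. The test names and the `run_guesses` helper
are my own invention.

```python
import unittest


class SplitGuesses(unittest.TestCase):
    """Each test records one guess about how the code under study behaves."""

    def test_explicit_separator_keeps_empty_fields(self):
        self.assertEqual("a,b,,c".split(","), ["a", "b", "", "c"])

    def test_default_separator_collapses_whitespace(self):
        self.assertEqual("  a   b ".split(), ["a", "b"])

    def test_maxsplit_limits_the_number_of_cuts(self):
        self.assertEqual("a,b,c".split(",", 1), ["a", "b,c"])


def run_guesses():
    """Run the guesses and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SplitGuesses)
    return unittest.TextTestRunner(verbosity=0).run(suite)


outcome = run_guesses()
```

A failing guess is the useful case: it points at exactly the behaviour you had
misunderstood.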

