
How to Read Other People's Code -- And Why - cschanck
http://designbygravity.wordpress.com/2009/10/23/how-to-read-other-peoples-code-and-why/
======
Mongoose
_Read the comments, but don’t believe them_

Love that one. A grad student friend I work with, whenever he catches me
poring over documentation, always tells me to read the code. Such a seemingly
simple tip, but so valuable.

~~~
cgs
As my boss likes to say, the extension for Ruby documentation is ".rb".

~~~
eru
The extension for Haskell documentation is `.lhs`. But you can compile it to a
`.ps` or `.pdf` with LaTeX if you want to.

------
pgbovine
One technique I often use when trying to understand someone else's code is to
add comments myself in my private branch (of the form "I think that X works
like Y and Z"), or even better, to add run-time asserts that I think ought
to hold, and then run tests to make sure they do hold.

Of course, I never check my comments/asserts into the main branch, since I'm
not sure whether my understanding is correct.
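A minimal JavaScript sketch of the assert half of that technique. `applyDiscount` is a hypothetical stand-in for "someone else's code" and `hypothesis` is a hypothetical helper playing the role of a run-time assert; the lines marked "private branch only" are the reader's additions.

```javascript
// Hypothetical helper: a run-time assert that states the reader's belief.
function hypothesis(condition, note) {
  if (!condition) {
    throw new Error("Hypothesis failed: " + note);
  }
}

// Hypothetical "someone else's" function, with private-branch asserts added.
function applyDiscount(order) {
  let total = order.items.reduce((sum, item) => sum + item.price * item.qty, 0);
  const before = total; // private branch only: remember the pre-discount total

  if (order.coupon === "TENOFF" && total >= 100) {
    total -= 10;
  }

  // private branch only: "I think the discount is a flat 10 off..."
  hypothesis(before - total === 0 || before - total === 10,
    "discount is either nothing or a flat 10 off");
  // "...and that it only ever applies to orders of at least 100"
  hypothesis(total === before || before >= 100,
    "discount only applies when the subtotal is at least 100");

  return total;
}

// Run a few inputs (or the real test suite) through the annotated code.
const results = [
  { items: [{ price: 50, qty: 2 }], coupon: "TENOFF" }, // subtotal 100
  { items: [{ price: 30, qty: 1 }], coupon: "TENOFF" }, // below threshold
  { items: [{ price: 20, qty: 3 }] },                   // no coupon
].map(applyDiscount);

console.log(results); // [ 90, 30, 60 ]
```

If a hypothesis throws, the understanding was wrong, which is exactly the information the technique is after; the asserts themselves never leave the private branch.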

~~~
JoeAltmaier
Code management tools need a way to add notations without getting in the way
of code: something that can be viewed through a link, but is out of the way when
poring over the code. ASCII text: anachronism. We're not that far from
80-column IBM punched cards...

~~~
omouse
You want something like literate programming, then.

------
sophacles
On the how bit: there is no such thing as reading code once. It is a simple
matter of reading a method/object/what-have-you, then rereading it until you
can see it called/referenced etc. without asking yourself "wtf does foo do".
Trace various code paths often enough and you'll 'get it'.

As for the why: I used to decide I didn't like module X, then decide I was
going to implement it myself. After some time I would end up with something
looking like X, but not nearly as refined. As such, I learned it's better to
read others' code and understand it -- much faster than reaching the same
conclusions as the original authors w/ crappier code. Of course if after
understanding you decide you still think it's crap -- have at it :).

------
Aschwin
One of the reasons I don't write comments is that they get outdated so
quickly. And I don't need them myself, because I read only the code, even my
own. Reading the code also doubles as reviewing it: I quickly change bits so
they make better sense (like computing counts outside the for-loop instead of
in the loop statement), etc. And I use a version-control system to make
changes, test them and roll them back (revert). TortoiseSVN helps a lot with
this for me. I also try to make the code more readable by applying my coding
style or the conventional coding style (I still think mine is best ;-)

~~~
dasil003
Now that you know how comments can fail, you should use that knowledge to
write good comments rather than not writing any. There are plenty of very good
reasons to write comments.

The most common reason I write comments is to explain the purpose of something
that is unintuitive from the pure code. Examples are comments in CSS about a
particular browser quirk, or hacking around an edge case in an efficient but
opaque way.

~~~
dbshapco
. . . and one of those many good reasons is producing API documentation
automatically via javadoc, doxygen, etc. Having the API documentation source
and code live together is a big win. It's easier to maintain the inline
documentation so it doesn't go stale. I hate being forced to go read code when
I just want to use an API (I'm looking at you, Dojo :-/ ).

One of the first things I'll do when encountering a feral code base is run an
automatic documentation generator over it, even if there are no API comments,
because many will produce at least some level of documentation from pure code,
including cross references, a type index, call graphs, type diagrams, etc.
This can be especially helpful when the code is poorly organized, and trying
to trace simple program flow in an editor means navigating a dozen modules
manually. Doxygen, for instance, will produce hyperlinked program listings, so
that I can use a browser in a natural fashion to navigate the code structure
and program flow. The browser can maintain virtually unlimited context,
whereas my brain loses track of where I am once I'm seven levels deep in
function call nesting.

Some IDEs and UML tools are also capable of reverse engineering documentation
from the code base. The TogetherSoft tools used to be excellent at grinding
through code (and may still be, but I haven't used them in years).

RE'ed documentation of feral code can reveal how well (or more frequently
poorly) the code base is structured, and identify key areas for architectural
or design refactoring (if that luxury is possible).

In writing my own code, I decompose until each function or method has a single
purpose (f() does X, not X & Y & Z!), and therefore the API documentation
suffices to document the code itself. Rarely do I write a comment inside the
body of a function or method. That happens when I re-visit the code, and
discover that its operation is non-obvious. The non-obvious stuff tends to be
the tricky stuff it took some time to get right, and so it doesn't get mucked
around with, and such internal comments rarely go stale.

I wait until re-visiting the code because authorial bias (my code effectively
becomes someone else's after several weeks, sometimes faster :-) ) obscures
what is and is not obvious. I used to over-comment from a tendency to perform
a mini-brain dump in comments -- but the knowledge required to WRITE the code
(this is what I was thinking at the moment) is no reliable indicator of that
required to READ the code (this is what _you_ need to know).

(I've theorized that having someone else comment the code from the start, just
like having an unbiased tester, could make for better comments -- wherein the
commenter is also necessarily a code reviewer as well. I've never gotten any
of the places I've worked to agree to 'cross-commenting' as a standard
practise, but most love worthless, perfunctory desk-checks prior to check in.)

To avoid comment churn, and because I refactor aggressively when creating
brand new code, writing API comments is the LAST step in coding.

Finally, I developed a habit of writing comments exclusively in point form,
because context switching from programming constructs to proper English
grammar broke my flow. The point form comments feel like a miniature brain
dump, whereas otherwise I'd pause to think about how to put the information
into a proper sentence, and then make nice paragraphs, and suddenly I'd be
channeling my 7th-grade self in composition class. It's also easier to scan and
digest comments as point form notes.

That's what I do, and I leave it at that, because telling someone else how to
code is like telling them how to raise their children.

tl;dr version

- at least write API comments, pls

- doc generators (and other tools) sometimes are a great way to RE docs for
feral code

- write comments in point form

- write API comments as the FINAL step in coding

- try to remove authorial bias from comment writing
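To make the API-comment and point-form bullets concrete, here is a hedged sketch in JSDoc syntax, the JavaScript counterpart of the javadoc/doxygen comments mentioned above. `orderSubtotal` is a hypothetical single-purpose function; the comment is the kind written last, once the code has settled, and lists only what a caller needs to know.

```javascript
/**
 * Order subtotal.
 * - sums price * qty over all line items
 * - empty list -> 0
 * - no coupons, tax or rounding applied here
 * - assumes price and qty are already validated numbers
 * @param {{price: number, qty: number}[]} items - order line items
 * @returns {number} subtotal before discounts and tax
 */
function orderSubtotal(items) {
  return items.reduce((sum, item) => sum + item.price * item.qty, 0);
}

console.log(orderSubtotal([{ price: 2, qty: 3 }, { price: 1, qty: 4 }])); // 10
```

Because the function does one thing, the point-form API comment is enough; nothing inside the body needs explaining.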

~~~
dstorrs
>In writing my own code, I decompose until each function or method has a
single purpose (f() does X, not X & Y & Z!),

I hear this one a lot, from a lot of different sources. In practice, it never
seems that achievable. Here's a trivial example: the dashboard-page function in
our webapp. It needs to do all of the following:

1) verify that the user is logged in. If not, bounce them to the login page,
then bring them back on successful login

2) collect the list of all their providers

3) collect the list of all their profiles

4) collect the list of all their account information

5) load up and display the appropriate templates to render the web page with
the requisite information.

Now, points 1-4 are accomplished by calling other functions, true. But
dashboard-page still needs to do 5 separate things.

~~~
dbshapco
In this specific case, the dashboard-page function is actually sequencing
operations, not performing them (except for 5, and I'd wonder why that
couldn't be moved to a separate function), and can be described as such:

    
    
      // - control the login sequence
      // - each step is delegated to a lower-level function
      function login_sequence() {
        var user = verify_login();  // falsy if the user is not logged in
        if (!user) {
          user = login();           // bounce to login, come back on success
        }
        var providers = collect_providers(user);
        var profiles = collect_profiles(user);
        var account_info = collect_account_info(user);
        load_templates(user);
        display_templates(providers, profiles, account_info);
      }
    

(Forgive the guess at what your code might look like.)

State may be passed between called functions, and used in control decisions,
but state should not be grossly manipulated in sequencing functions (I do find
with this style of programming that at high levels the state passed around
tends to be large 'context' objects, rather than granular arguments
encountered at lower levels). What I would NOT want to see in such a
hypothetical login function is ALL the actual lower level code to do the
login, collect the data, etc., so that essential higher order detail is
obscured by the lower level operations.

A function's API comments do not need to repeat the purpose of called
functions.

As I come up with API comments last, I usually think about them in reverse --
it's not 'I need to think of the single purpose of this function before
writing it', but 'what single purpose did this function end up serving?'. Not
being able to think of a decent answer for the latter is a possible symptom of
sub-optimal decomposition. Then again, cutting blocks of code and pasting them
into their own functions has become an instinct rather than conscious decision
for me, so I'm effectively anticipating writing the 'single purpose' API
comments.

At a certain level of detail I don't need to know the minutiae of login, just
that there is some black box function that controls the lower level details.
And if I need to know the details, I break open the function and follow its
call flow (or look at the autogenerated call graph in the doxygen docs or
similar).

Having read a lot of feral code, I find the major indicator of quality is the
static navigability of the code base (i.e. can I find my way around just by
reading the code in an editor, without resorting to debuggers or autogenerated
documentation), and having a level of detail structure, akin to the zoom
feature on Google maps, is one method of achieving navigability (and partially
the value of OO techniques). So it's okay to have functions/methods that
simply sequence or aggregate calls to lower levels, and to describe them as
such.

It was mentioned elsewhere on the thread that debuggers are useful tools in
understanding a code base -- and I do often find myself setting breakpoints on
code because it's near impossible to understand how particular functions get
invoked by just reading the code. Then, examining the call stack at the
breakpoint, I see that the event loop called the network code, which invoked
some code to read a database, which called into some code to instantiate
widgets, which
called back into the database code, which called the code that calculates
order totals and tax, which called the widget code again to update those
fields, all of which goes 30+ levels deep.
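When reading alone can't reveal how a function gets invoked, a lighter-weight cousin of the breakpoint is to record the call stack at each call. A hedged JavaScript sketch: `traceCalls` and the call chain (`onDatabaseRead`, `refreshWidget`, `updateOrderTotals`) are hypothetical, and it relies on the non-standard `Error.prototype.stack`, whose format varies by engine (the slicing assumes V8, i.e. Node or Chrome).

```javascript
// Hypothetical helper: wrap a function so each call records the call stack
// that led to it, similar to reading the stack at a breakpoint but without
// stopping the program. Assumes V8's stack format: an "Error" header line,
// then one "at ..." line per frame, the first being this wrapper.
function traceCalls(fn, log) {
  return function traced(...args) {
    const frames = new Error().stack
      .split("\n")
      .slice(2) // drop the "Error" header and the traced() frame itself
      .map((line) => line.trim());
    log.push({ name: fn.name, frames });
    return fn.apply(this, args);
  };
}

// Hypothetical call chain standing in for the deep, hard-to-read flow.
function updateOrderTotals(order) {
  return order.qty * order.price;
}
const callLog = [];
const tracedUpdate = traceCalls(updateOrderTotals, callLog);

function refreshWidget(order) {
  return tracedUpdate(order);
}
function onDatabaseRead(order) {
  return refreshWidget(order);
}

onDatabaseRead({ qty: 2, price: 5 });
// Print the innermost callers first, e.g. "at refreshWidget (...)".
for (const frame of callLog[0].frames.slice(0, 3)) {
  console.log(frame);
}
```

Unlike a breakpoint, this keeps the program running, so it can collect stacks from many calls in one session; a debugger is still the better tool for inspecting local state in a single frame.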

------
jmostert2
If the code is really intricate, I refactor it to understand how it works (and
how it _ought_ to work, which can be an important way of spotting bugs).

But even though refactoring is supposed to be perfectly mechanical and
"harmless" (especially when supported with unit tests) I'll usually throw away
my refactoring, because the risk of breaking the existing code is just too
high. It depends on how invasive the new feature is -- if I'm going to have to
change a lot to get it done anyway, I might as well incorporate the
refactorings. But if the change literally is finding the right position to add
a single line of code, no way I'm going to change things once I find it.

When it's clear that adding even simple features takes an extraordinary amount
of time because the code is that hard to understand and maintain,
getting/making time for proper refactoring is easier, but it's worthwhile even
as a way of creating a mental model.

I've never tried unit tests to figure out the semantics of existing code,
though, even if it seems obvious in retrospect. I think the code base would
have to become very complicated indeed before I go into full scientist mode
and construct hypotheses on the semantics and verify them with unit tests. If
the code is a tangled mess, it could also be tough to extract the proper bits
to test on and/or set up a test environment complete enough for that.

------
wenbert
Maybe it's just me. But sometimes I draw diagrams describing how the code will
work and how the methods interact with each other. A quick drawing with a
pencil and paper or on a whiteboard will do. It helps me understand how the
over-all system works...

~~~
JoeAltmaier
Agreed. If it won't all fit in my head at first, put some of it on paper.
Especially when learning the code. And re-visit the drawing, testing it
against what I'm reading as I go until I'm sure it's right. Once it's in my
head, throw it away. I wish there were some way to use drawings as comments...
My friend Bob learns differently: he needs words and skips all the
illustrations in manuals, because they just don't sink in for him. He's
brilliant, just built differently.

~~~
wenbert
That'd be the day :D Drawings as comments would be a great idea.
Sometimes I wish I could "draw" something in my code (arrows, bookmarks, quick
diagrams, etc. beside comments). Think GitHub with this kind of feature.

------
known
One productive technique for reading other people's code is to step into the
code using a debugger (e.g. gdb).

~~~
tetha
I disagree with this. The debugger is a very, very precise tool and it can be
used to gain very, very precise insights into certain code, however, very
often the debugger is just too precise. It is pretty much like trying to
understand a large chip by looking at how gates flip and flop.

Of course, if the code is horrible enough, then you might need to switch down
to actually tracing line by line and opcode by opcode, potentially even using
a debugger, but for most sane code, I think it is possible and faster to
understand larger blobs of code at once.

~~~
bendotc
Agreed.

In general, if you're having trouble tracing through a particular function
with a narrow band of input, then the debugger can be useful. If you're trying
to figure out a larger system and/or how a system works across a large set of
inputs, then stepping through with a debugger is useless.

Put another way, the debugger is to programming what the microscope is to
medicine: incredibly useful for some things, but not a very good general
diagnostic tool. Metrics data and checking assumptions (via unittest and/or
asserting expectations based on reading the code) are much better for getting
an over-all idea of what's going on. Once you've localized a problem, then the
debugger can help you get a precise idea of the issue, or can help you test a
hypothesis (to use the medical example, you can use the microscope to test a
theory that there's a bacterial infection).

------
wakeless
I don't mind using code folding when I'm doing this. I fold away the code I
don't care about, so there isn't as much to read and get lost in.

~~~
lsb
If you're not folding functions, but parts of functions, that's a sign that
it's poorly divided up, and a refactor would benefit it.

~~~
zeugma
Indeed, but you should understand it before even trying to rewrite it, and
folding can help a lot with poorly divided code.

------
notthinking
It might be my relative newness to software, but I enjoy looking at other
people's code to see what works (and pick up new language features) and what
doesn't.

------
axod
I'd also heartily recommend learning to read other people's code from the
assembly: SoftICE/disassembly listings.

~~~
jcl
Could you elaborate? It doesn't sound like it would improve your understanding
of the code.

------
olliesaunders
Michael Feathers' Working Effectively with Legacy Code is the authority on
this, I think.

~~~
silentbicycle
Not quite - it's primarily focused on safely retrofitting unit testing onto
legacy codebases, since restructuring code to make it testable can easily
break it.

His definition of legacy code is _code that doesn't have tests_, so you can't
fix it or replace units of it without unknowingly introducing bugs.

------
simplegeek
Also, using a debugger (if it can be used) can be really helpful

------
berntb
I quite liked this, regarding this subject:

<http://perlmonks.org/?node_id=788328>

------
alexkearns
I don't know about the rest of you, but I got into software development to
create software (ideally from scratch) and to learn clever new stuff, not to
tinker with someone else's code. Of course, just like every other software
developer, I have on occasions ended up working in someone else's codebase.
But I have done this not out of choice - and except in a handful of cases, I
have not gained any knowledge or mental satisfaction from doing so. I have
done it because I have been told to by my boss.

Let's not con ourselves. Becoming a dab hand at maintaining other people's
software does not make you a good software developer (writing your own
software does that). It makes you a good employee who is willing to do the
boring shit. The two are not the same. I for one hate maintaining other
people's code, and if I ended up doing that the majority of the time, I would
either get a new job, switch careers or kill myself. Probably one of the first
two.

~~~
axod
I think you're missing the point. The idea isn't to have to maintain someone
else's rubbish code.

The idea is that reading _good_ code makes you a better coder. Which it
certainly does. Having said that, bad code can sometimes teach you lessons in
how not to do things.

Good authors probably read other books as well.

