
Learn to read the source, Luke - mgrouchy
http://www.codinghorror.com/blog/2012/04/learn-to-read-the-source-luke.html
======
quanticle
I find it a bit ironic that Jeff Atwood is advocating reading the source when
he's built his company on top of a closed-source stack. Can he read the source
of ASP.Net when he encounters an issue with the web frontend? Can he debug
into MSSQL Server when he has an issue with database performance?

Jeff can talk about the importance of having source available, but his actions
speak louder than his words. He's built a very successful startup on top of a
closed-source stack. Having the source isn't as important as it seems, then.

~~~
bunderbunder
_Can he read the source of ASP.Net when he encounters an issue with the web
frontend?_

Yes. Recently they even went a step beyond the traditional "shared source"
thing by releasing it under the Apache 2.0 license.

<http://aspnetwebstack.codeplex.com/>

Less so with MSSQL. But it's less of an issue there, because MSSQL provides a
very good view of what's going on under the hood to begin with, and Microsoft
has done an extremely good job of documenting the whole thing.

~~~
Locke1689
Having worked on the MSSQL source code -- even if you could see it it would
not help. It's an incredibly complicated monolith with a lot of historical
baggage. Not kind to fresh eyes. Remember that SQL Server was originally
Sybase SQL Server.

~~~
quadrant
That apocryphal last comment is irrelevant though, as the code bases split
entirely many, many versions ago (pre-7.0).

~~~
Tyrannosaurs
And if memory serves it's been through at least one, possibly two ground up
rewrites since then.

~~~
endersshadow
It's been through massive rewrites, but they didn't throw away the code and
start from scratch. They took five years (between SQL 2000 and SQL 2005) and
completely revamped the offering. Looking at SQL 2012, it's really quite
impressive how far they've come along, albeit in 12 years.

------
greenyoda
Atwood wrote: "The idea that you'd settle down in a deep leather chair with
your smoking jacket and a snifter of brandy for a fine evening of reading
through someone else's code is absurd."

Not so absurd, really. Reading well-written source code is a great way to
learn the finer points of the art of programming; it's not just for fixing
bugs. In fact, entire books have been published that consist of annotated
source code, the most famous probably being Lions' Commentary on Unix and
Knuth's "TeX: The Program".

~~~
kevinpet
Entire books have been published that consist of annotated source code, the
least famous probably being Lions' Commentary on Unix and Knuth's "TeX: The
Program", since they are pretty much the only widely distributed books
consisting primarily of annotated source.

~~~
Danieru
How can they be the least famous if they are the only ones?

~~~
TeMPOraL
If they're the only ones, then they're the least famous and the most famous at
the same time.

------
rollypolly

      Nobody reads other people's code for fun.
    

Not true for me. I /love/ reading code from great engineers. I've learned a
lot doing so.

~~~
spacemanaki
Yeah I think he's really off-base there, and the rest of the paragraph is
wrong too: "The idea that you'd settle down in a deep leather chair with your
smoking jacket and a snifter of brandy for a fine evening of reading through
someone else's code is absurd."

Absurd? It's basically what I did recently with ClojureScript One, replacing
brandy with beer and deep leather chair with kitchen table and chair. I found
it very enjoyable and enlightening. And I'm not trying to brag, I really don't
think this is a mark of anything special.

IIRC publishing and highlighting code that was interesting to read was one of
the goals that Peter Seibel wanted to tackle with Code Quarterly (which didn't
pan out, but still, he didn't think it was an absurd suggestion). I also seem
to recall reading code being something a lot of the people interviewed in
Coders At Work described as being valuable. And one of the stock questions
Seibel asked everyone in that book was if they had tried literate programming
("a la Knuth"), which is really just a way to make large pieces of code easier
to be read by someone else. All's to say, absurd it clearly is not.

------
btn
Something he misses is that _good_ documentation isn't just an English
translation of what the source code does, it's also a contract that describes
what the source code _may_ or _may not_ do now and _in the future_.

If "the source code is the ultimate truth", then your source code is indelible
---you can never change your implementation because you've given users freedom
to depend upon the behaviour produced by any line of it. If you don't want
people depending on implementation details, then you _need_ documentation to
hide those away.

~~~
elviejo
Again, go to the source... Design by contract, a way to embed and execute the
contracts in the source code, is a great way to increase code quality. Take a
look at the Eiffel programming language; it's impressive what having contracts
in the code gives you.
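
To make the idea concrete outside Eiffel, here is a minimal sketch of contracts embedded in code using plain Python assertions. The `withdraw` function and its rules are hypothetical, chosen only for illustration; Eiffel expresses the same thing with `require` and `ensure` clauses.

```python
def withdraw(balance: int, amount: int) -> int:
    """Return the new balance after withdrawing `amount` (hypothetical example)."""
    # Preconditions: what the caller must guarantee.
    assert amount > 0, "precondition: amount must be positive"
    assert amount <= balance, "precondition: cannot withdraw more than the balance"

    new_balance = balance - amount

    # Postcondition: what the function guarantees in return.
    assert 0 <= new_balance < balance, "postcondition: balance must shrink, never go negative"
    return new_balance


if __name__ == "__main__":
    print(withdraw(100, 30))
```

The contract is readable as documentation and is also checked at run time, so it cannot silently drift away from the implementation.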

------
nickm12
This post seems to take the attitude that "documentation will always suck, so
just go right to the source". I think this attitude can impede software
projects with changing code (that is, all of them).

The problem is that the source can only tell you what a program does, not what
it is _supposed_ to do. If you don't know what it is supposed to do, it can be
difficult for consumers of the code to know whether some behavior is
intended or a side-effect of the current implementation. Likewise, code
maintainers can be prevented from changing the implementation when they don't
know if consumers are relying on undocumented behavior. It's more difficult to
file bugs against undocumented code; how do you know it's a bug if you don't
know what the code is supposed to do?

In brief, good documentation and good code are a virtuous cycle. Reading the
source is often necessary but it should be viewed as a failure of
documentation.

------
groby_b
Sigh. This is a rather myopic view of documentation.

Of course the source code is the ultimate arbiter of truth. But having a few
roadmaps to that source code is _incredibly_ valuable. And as long as the
underlying code does "what it says on the box", there's no reason to read the
code.

Reading your stack's source should be a last recourse, not the default mode of
operation. (Yes, I do read source code of my stack. Plenty of it. Which is why
I appreciate any occasion where I don't have to.)

And when I see his "brilliant HN post" mention that suggests that "sometimes,
you recompile your compiler", I'd like to smack some sense into people. You
really don't. I've been working on low-level software for a _loooong_ time,
and I find about one compiler bug a year. I even do have a bit of a background
in compiler writing. And yet, the sane choice is to write a small repro case,
file it with the maintainers, and write your code to work around that bug, at
least in most cases.

~~~
brianpan
I didn't read source as the "default mode of operation", but rather a
necessary fallback that you 1) shouldn't shy away from and 2) you should
demand lest you be unable to access the ultimate truth of what you're building
on top of.

I definitely see a hallmark of experienced/skilled coders as not being afraid
to follow the trail of code farther than I sometimes have patience for.

~~~
groby_b
It's a very thin line to walk. I know I've spent days reading Linux or Mach
sources when I could just have coded around the issue, and I wouldn't list
that as something that makes me a skilled coder. It just means for me
sometimes shiny outweighs expedient ;)

------
fleitz
Reading the source gives you the confidence to tackle bigger projects. Once
you realize what a complete and utter hack job most of the projects you use
are it gives you the confidence to just build your own hackjob, or take their
project and fix it.

For me the biggest one was an FTP library, all it did was figure out when the
server stopped sending data for a particular command and then run a Regex over
it, populate an array of objects and return them.
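
A rough sketch of that "regex over the response, build a list of objects" step, assuming a Unix-style `LIST` response has already been received as text. The `Entry` class, the `parse_listing` function and the pattern are illustrative assumptions, not the library described above, and real servers vary in listing format.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    size: int
    is_dir: bool


# Matches lines such as: "-rw-r--r--   1 ftp ftp    10240 Apr 17 09:30 README.txt"
LIST_LINE = re.compile(
    r"^(?P<type>[-dl])[rwxsStT-]{9}\s+\d+\s+\S+\s+\S+\s+"
    r"(?P<size>\d+)\s+\w{3}\s+\d{1,2}\s+[\d:]+\s+(?P<name>.+)$"
)


def parse_listing(raw: str) -> list[Entry]:
    """Turn raw LIST output into Entry objects, skipping lines that do not match."""
    entries = []
    for line in raw.splitlines():
        match = LIST_LINE.match(line.strip())
        if match:
            entries.append(
                Entry(match["name"], int(match["size"]), match["type"] == "d")
            )
    return entries


SAMPLE = """\
drwxr-xr-x   2 ftp ftp     4096 Apr 16 2012 pub
-rw-r--r--   1 ftp ftp    10240 Apr 17 09:30 README.txt
"""

if __name__ == "__main__":
    for entry in parse_listing(SAMPLE):
        print(entry)
```

Detecting when the server has stopped sending data is not covered here; the sketch only shows the parsing half.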

Unread source is like a David Copperfield trick, it's magic, once you read the
source and know how it's done the magic is lost and you understand what is
really going on behind the hand waving.

------
munin
this is a good skill to have. however, there are a lot of problems / domains
where reading the source isn't enough. debugging the operation of compiler logic,
for example. what's really important is to know how the algorithm has been
implemented and how the algorithm, as implemented, is interacting with your
current problem / use case.

if understanding the algorithm involves boning up on two semesters of type
theory or graduate-level courses in algorithms, number theory and abstract
algebra, as debugging problems in modern databases, compilers and high-
performance integer libraries would, then having the source code is probably
not going to help you as much as you think it would...

~~~
ajross
Long term, I think having a solid grounding in (to continue your example) real
world compiler infrastructure and the ability to fix bugs in your tool chain
is going to "help" you an awful lot more than getting whatever instantaneous
problem you have fixed.

I mean, sure: for everyone there are some problems that are so obscure as to
be near-impossible. But if you go through life always deferring those
solutions (by calling tech support, or giving up, or playing voodoo games
until the problem goes away), that set of problems will never shrink. You'll
end your career, broadly, just as incompetently as you started it.

If, on the other hand, you make a practice of always digging for bugs, even
across library boundaries into "other people's" code, you'll find over time
that things like compiler bugs stop looking so scary.

------
otterley
In my view, documentation -- not just any documentation, but correct, complete
documentation -- is at least as important as code. I don't release any of my
own personal projects for general consumption until the documentation is done.

Not having good documentation demonstrates a lack of respect for the user's
time. To be a successful project, people of varying skill levels should be
able to use it.

In order to compete successfully with closed-source Unix variants, GNU had to
have as good documentation as its competitors and the result was excellent
documentation (even if Info files were a bit baroque). The result was
comprehensive and useful manuals for GNU projects such as GCC, Bash, Emacs and
so forth. It's a real shame developers today haven't followed in their
footsteps.

------
Bamafan
Serious question for Jeff - If I gave you a 500,000 line app (any language)
with zero documentation and asked you to start adding features and fixing
bugs, you'd be cool with that because you had "the source"?

Also, how did you get so far in your career as an MS developer with such
limited access to source code?

~~~
adrianhoward
_Serious question for Jeff - If I gave you a 500,000 line app (any language)
with zero documentation and asked you to start adding features and fixing
bugs, you'd be cool with that because you had "the source"?_

I'm not Jeff - but that situation has occurred multiple times in my career.
Along with the more problematic one of there being documentation, and there
being serious discrepancies between the docs and the code.

I like to have both by preference, but if I had to pick one I'd pick the
source. I can figure out what it does from the code. I can't figure out the
bugs from the docs.

Both of these situations outnumber the times I've had large code bases with
good accurate documentation.

 _Also, how did you get so far in your career as an MS developer with such
limited access to source code?_

I'm not an MS developer, but from those I know there seems to have been pretty
wide access to lots of source for some years now - you just can't fix and
redistribute it :-)

------
pestaa
I absolutely agree with Jeff. Often times I catch myself choosing open source
over better features so that I can fix the damn thing on my own if it goes
wrong.

------
DanielHimmelein
I really like reading source code, that's why I put together some advice on how
to do it: [http://himmele.blogspot.de/2012/01/how-do-you-read-source-
co...](http://himmele.blogspot.de/2012/01/how-do-you-read-source-code.html)

From reading source code both for fun and with a purpose (e.g. Android,
Minix, QNX, Linux and NetBSD, network protocol stacks, filesystems, web
frameworks, CouchDB, etc.), I got a lot of insights into interesting software
technologies and architecture patterns. Good software engineers and architects
should be good and fast at reading code.

------
ishbits
Anyone thought they were pretty good at reading source until they encountered
a Spring application context spread over multiple XML files.

~~~
ajuc
We had a 2-tier application: a rich client in a Qt3-based framework and PL/SQL
for the "server side".

We moved to J2EE using EJB3, Hibernate, Eclipse RCP. Our application was meant
to be 3-tier, but actually it's more like 10-tier. We have Hibernate mappings,
a Java model, XML files specifying possible queries and reports, Java EJB3 beans
wrapping these XML files, Java classes for DTOs, XML files specifying possible
views and editors in RCP, and XML files specifying how to map from query to
view or editor. And Java classes for custom code in views/editors.

When I want to see which database column is shown in a view, I need to start
with the view class and descend through all those layers down to the Hibernate
mapping.

In our previous Qt framework we had one XML file per client view, specifying
columns/sorts/filters/etc just for this view. Our consultants understood these
files and changed them when they needed to. Now they would need to understand
all those layers.

Now I think more than 2 layers in an application is an antipattern.

------
snprbob86
Brandon Bloom here. Glad you liked my post, Jeff!

Shameless plug: My startup, <http://www.thinkfuse.com> is hiring developers
who already know how to read the source! Email me at brandon@thinkfuse.com if
you're in Seattle and looking to join a bunch of great developers who know how
to build cool stuff and have a fun time.

------
dmethvin
With great power comes great responsibility. Sure, read the source. But don't
think that because you followed some internal code path and figured out that
it's "safe" to pass `null` to a function that it will _always_ be safe. If the
docs for a project don't say it's safe, ask them to clarify.

------
ceol
_> >"That project is too big, I'll never find it!" or "I'm not smart enough to
understand it"_

That rings so true here. When I started Python web development, I needed to
understand some concept related to middleware and handlers (somewhat foreign
coming from PHP). My first thought was to look for blog posts explaining how
it works in Django, but that wasn't satisfactory. I took a chance and dove
into the Django source code— going against the voice in my head telling me,
"You'll never understand it!"— and found myself learning so much. It was
great!

In software development, we're taught to abstract everything and only think of
the smallest problem, but this sometimes forces us to think of libraries as
magic. This was a problem for me as a beginner, but it's been getting better
as I've progressed.

------
philh
Depending on the problem, there might be other tools that lie between 'source
code' and 'documentation' on the readability-versus-accuracy scale.

I'm specifically thinking of strace, which I've used to diagnose problems that
I was having with apache and chromium, among others. I don't think I would
have got anywhere from reading the source. lsof is another, though I can't
offhand think what I've used it for.

If you do find yourself in the source code, being willing to play around with
it is invaluable for working out what's going on. If nothing else, you can
insert a printf to confirm that you're looking in the right place.

------
RyanMcGreal
> That's why, when it comes to code, all the documentation probably sucks. And
> because writing for people is way harder than writing for machines, the
> documentation will continue to suck for the foreseeable future.

I don't think it sucks because it's harder to write. I think it sucks because
it's not strictly necessary to document your code in order to compile/ship it,
and it's easy to justify putting it off. Poorly documented code is a form of
technical debt, a compromise between getting it done _right_ and getting it
done _right now_.

------
drothlis
Some environments make it easier to browse the source than others. See for
example the way you would figure out how to customize indentation in Emacs, by
asking Emacs for the source code to the command that is run when you press
TAB: <http://david.rothlis.net/emacs/customize_c.html#style>

(Note that the above refers to browsing the source of Emacs itself, not using
Emacs to browse any arbitrary source -- unfortunately the tools for that are
still very primitive).

------
ConstantineXVI
There's something to be said for a language that's written in itself. Been
picking up Clojure over the weekend; just for fun I clicked "go to definition"
on defn and there[1] I was looking at the source, all Clojure. Not that I
think I'd be spending a lot of time looking for bugs in defn, but it's a neat
feeling to be able to go under the hood like that.

[1]
[https://github.com/clojure/clojure/blob/master/src/clj/cloju...](https://github.com/clojure/clojure/blob/master/src/clj/clojure/core.clj#L272)

~~~
mquander
FYI, it's not actually all Clojure; stuff like the parser and the core data
types are still written in Java. (Of course, it's still open-source.)

[https://github.com/clojure/clojure/tree/master/src/jvm/cloju...](https://github.com/clojure/clojure/tree/master/src/jvm/clojure/lang)

I think translating the rest is a slow ongoing project (as performance, etc.
gets up to par.)

~~~
ConstantineXVI
Aware of that; still feels like you can get pretty deep in the language
without hitting the java. It's still IMHO worlds away from always seeing C
(vs. the language you're actually working in at the moment) whenever you open
up the source

------
MBlume
Note that this applies to hiring. A candidate who can write FizzBuzz given a
spec but can't derive a spec given FizzBuzz is not a programmer. Do not hire
them.
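
For reference, a minimal sketch of the usual FizzBuzz; the function name and the range printed are illustrative choices. The point is that the spec can be read straight off the code.

```python
def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz word for n, or n itself as a string."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


if __name__ == "__main__":
    for i in range(1, 16):
        print(fizzbuzz(i))
```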

------
singingfish
Basically bollocks. Be kind to your users. Provide complete trivial working
examples for everything whenever possible.

------
Aissen
At first I thought it was going to be about this hilarious response from Jon
Corbet, to a newbie asking about kernel development:
<https://lkml.org/lkml/2012/4/15/114>

------
one-man-bucket
One of the main reasons I hang out on programming help channels on IRC is to
hone my skills in reading other people's messy code when helping them solve a
problem. Plus, there's the bonus good feeling from helping people :)

------
alexchamberlain
My only problem with reading source code of larger projects is architecture.
It can be quite difficult sometimes to understand the architecture of some of
the larger projects in order to actually read the source.

------
mattbriggs
He should get into Ruby. 99% of the time all you get is an uncommented
generated API doc that is more confusing than helpful; the only choice you
HAVE is to "read the source".

------
beernutz
Someone requested a shirt:

[http://www.zazzle.com/read_the_source_tshirt-235957482677132...](http://www.zazzle.com/read_the_source_tshirt-235957482677132269)

~~~
beernutz
Sorry, link got hosed.

[http://www.zazzle.com/read_the_source_tshirt-235248361224605...](http://www.zazzle.com/read_the_source_tshirt-235248361224605555)

This one should do the trick.

------
sad_panda
"The transformative power of "source always included" in JavaScript is a major
reason..."

Unless it's minified, in which case you might as well be running cat on a
binary.

------
goggles99
Next time my boss razzes me for not writing documentation about something, I
am going to point him to this article :)

Seriously though, the article does seem to belittle documentation.
Documentation gives an API context, it is often invaluable. Even if you have
the source with some comments, this often does not give a clear picture right
away.

------
drivebyacct2
Maybe I'm unique, but this is regularly my MO in my own work or projects. If
it feels like there's a bug in Django, or an edge case isn't documented, I do
two things: ask in an IRC channel whether anyone knows off the top of their
head, and then go and read the source while I wait to see if anyone has any
sage advice.

Sure beats trying to post an unformattable block of text into an MSDN forum
and then, well, having nothing happen after that.

I also agree with the implication of the other comment here: reading source
code from others is a great way to get insight into how to use APIs, how to
write idiomatic code, and how to avoid pitfalls.

