Ask HN: What source code is worth studying?
125 points by idlewords on Oct 13, 2009 | 114 comments
I realized recently that it's been a long time since I looked at anyone else's code outside of the context of debugging or working on it for hire. In your opinion, what are some examples of particularly well-designed or implemented software projects worth looking at to broaden one's own horizons as a programmer?



I'm going to get downvoted for picking a non-open source option but...

The NT kernel is the most beautiful piece of code I've seen. Dave Cutler and team wrote some very, very elegant code that anyone (even if you're not a kernel hacker) can understand. If I need 'code inspiration', I spend some time looking through their code.

Now, the NT kernel itself doesn't have source out there but for folks who are students, you should be able to get your hands on the Windows Research Kernel (http://www.microsoft.com/resources/sharedsource/windowsacade...). This has all the good stuff (and is much easier to build too).


I spent several years writing drivers for NT and I don't know which part of the NT kernel you were looking at. The parts I saw were an utter mess lacking any sort of elegance whatsoever. If there were two ways to implement something, one simple and the other one contrived, you can bet your MCSD certificate NT will be using the latter.

The overall impression was that it was hacked together in the worst sense of the word. Like there were several teams developing it that did not communicate with each other. Pieces of the kernel are forced together rather than fit snugly by design. I cannot stress enough how mind-bogglingly repulsive it was to work with it. And that's not even touching NDIS, the networking module, which can be used for scaring some serious shit out of young programmers.

If you put BSD or Linux code side by side with NT, look me in the eye and tell me that the NT kernel is "the most beautiful piece of code you've seen", you will need to see a head doctor.


I'm not sure you can actually see the kernel outside the base OS team at Microsoft, or under a unique license. It is not included in MSDN or the DDK.

Having worked on that team and having seen it for myself, it is truly elegant, concise, and very well written. I myself never touched it, and that is for the best. I probably wrote some of the crappy drivers for which you hold so much contempt.


You can get the Windows Research Kernel through the MSDNAA (which most major universities should be hooked up to). The WRK has all the good stuff


Agreed, but slamming Dave Cutler's work on the NT kernel, as huhtenberg did, sounds more like trolling than anything else to me. (Regardless of your opinion of Microsoft.)


The NT kernel is the most beautiful piece of code I've seen

Modulo the Plan9 kernel. It's also a much smaller read (ca. 50 KLoC and that's it).

NT kernel itself doesn't have source out but for folks who are students

And folks who are hackers should be able to look in certain places to find the leaked version ;>


All the more heartbreaking that it was forced to don the hideous mantle of the Win32 API (itself further scarred from numerous assaults such as C2 compliance!)


I agree that it's very beautiful code. I wouldn't go as far as "the most beautiful I've seen", personally, but it's really, really good. Also, when you come to it, you're typically shell-shocked from reading lots of crappy Win32 code, and it's just such a nice change.


Your compiler's source code. That should shake your confidence in the world.


for an easier time, your favorite language interpreter's source code ... interpreters are easier to comprehend than compilers, since you can 'follow along' with the flow of execution in the main interpreter loop just like you're executing a program ... e.g., creating stack frames, allocating heap memory, assigning variables, etc.
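To make the "follow along with the main interpreter loop" idea concrete, here is a minimal sketch of such a loop in Python. It is a toy, not taken from any real interpreter: the opcode names, the `Frame` class, and the convention of passing a single argument as the local `x` are all made up for illustration, and heap allocation is left to Python itself rather than modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One activation record: local variables plus an operand stack."""
    code: list
    locals: dict = field(default_factory=dict)
    stack: list = field(default_factory=list)
    pc: int = 0  # index of the next instruction to execute


def run(functions, entry="main"):
    """Execute functions[entry] and return the value it returns."""
    frames = [Frame(functions[entry])]  # the call stack
    while True:
        frame = frames[-1]
        op, arg = frame.code[frame.pc]
        frame.pc += 1
        if op == "PUSH":
            frame.stack.append(arg)
        elif op == "LOAD":
            frame.stack.append(frame.locals[arg])
        elif op == "STORE":
            # Assigning a variable: pop a value into the frame's locals.
            frame.locals[arg] = frame.stack.pop()
        elif op == "ADD":
            right = frame.stack.pop()
            left = frame.stack.pop()
            frame.stack.append(left + right)
        elif op == "CALL":
            # A call creates a new stack frame; the single argument is
            # passed as the local "x" (a hypothetical convention).
            callee = Frame(functions[arg], locals={"x": frame.stack.pop()})
            frames.append(callee)
        elif op == "RET":
            result = frame.stack.pop()
            frames.pop()
            if not frames:
                return result
            # Hand the return value back to the caller's operand stack.
            frames[-1].stack.append(result)
        else:
            raise ValueError(f"unknown opcode: {op}")


# A hypothetical two-function program in the toy instruction set.
PROGRAM = {
    "add_ten": [("LOAD", "x"), ("PUSH", 10), ("ADD", None), ("RET", None)],
    "main": [
        ("PUSH", 5),
        ("STORE", "n"),
        ("LOAD", "n"),
        ("CALL", "add_ten"),
        ("RET", None),
    ],
}

result = run(PROGRAM)
print(result)
```

Real interpreters are far bigger, but many have a dispatch loop of roughly this shape somewhere near the core, which is a reasonable place to start reading.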


Squeak Smalltalk. For Java, if you can get hold of it, the old Acme Webserver. That was very clean code.


Squeak is pretty sweet. I was looking at the Kernel category and the Parser classes. Pretty clean stuff.


This really depends on the language. Ruby's code is (or at least was) quite nice. It is pretty much OO written in plain C.

By contrast Perl makes very heavy use of macros for portability, to an extent that may induce brain lock in many people, and makes your debugger very hard to follow. This is not to say that its use of macros is a bad thing in the end, but it is a definite shock.


Yeah, after hacking with Perl for a while, I got interested in how it worked under the hood, and had some trouble following.

Check out the regex compilation module: http://cpansearch.perl.org/src/GBARR/perl5.005_03/regcomp.c

And execution: http://cpansearch.perl.org/src/GBARR/perl5.005_03/regexec.c


Well, anything touched by Ilya Zakharevich is going to be hard to read, even by the standards of the Perl source code. That means that the regular expression engine will be particularly hard to read.


One project I found fascinating reading a while back when I was working on graphics, is AGG. It's a vector graphics toolkit, a bit like Cairo, Quartz 2D or Java 2D. It uses a particular style of C++ - a mixture of template programming together with regular polymorphism - to build up a collection of rendering components. The components are almost Unix-like - you plug them together to build your customized rendering pipeline, and the C++ compiler's template logic takes care of making everything hardwired (or dynamic, if you choose to use the polymorphic components):

  http://www.antigrain.com/
The end result is a toolkit which is at a lower level than Quartz 2D et al, but could easily be used to build such an API. It can also be adapted for embedded, or high precision uses.
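The "plug components together" idea can be sketched roughly like this. This is Python, not AGG's actual C++ API, and every name below (`polygon`, `scale`, `translate`, `bounding_box`) is invented for illustration. Python generators are resolved at run time, so this resembles the dynamic, polymorphic flavour; the compile-time hardwiring that C++ templates give has no direct equivalent here.

```python
def polygon(points):
    """Vertex source: yields (x, y) pairs one at a time."""
    yield from points


def scale(source, factor):
    """Transform stage: scales every vertex coming from `source`."""
    for x, y in source:
        yield (x * factor, y * factor)


def translate(source, dx, dy):
    """Transform stage: shifts every vertex coming from `source`."""
    for x, y in source:
        yield (x + dx, y + dy)


def bounding_box(source):
    """Sink: consumes the pipeline and returns (min_x, min_y, max_x, max_y)."""
    xs, ys = zip(*source)
    return (min(xs), min(ys), max(xs), max(ys))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]

# Each stage only knows how to pull vertices from the stage before it,
# so stages can be reordered or swapped without touching the others.
pipeline = translate(scale(polygon(square), 2), 10, 20)
box = bounding_box(pipeline)
print(box)
```

A real rendering pipeline would end in a rasterizer rather than a bounding box, but the composition pattern is the part being illustrated.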


http://www.haiku-os.org/documents/dev/painter_and_how_agg_wo...

http://bit.ly/131cxL (.doc file)

those help. if you can figure out the source code without the documentation at the agg site + those files, more power to you. but basically you are nuts.


Thanks for the links, I've never seen those before. I read the agg code without the docs, but it would have been difficult without the example programs. The docs definitely help when looking at some of the algorithms though. I remember spending a good day or two staring at the anti-aliasing code, together with the freetype rasterizer (on which it was based), trying to figure out what the hell was going on. Then it just clicked.


"Code reading requires its own set of skills, and the ability to determine which technique to use when is crucial. In this indispensable book, Diomidis Spinellis uses more than 600 real-world examples to show you how to identify good (and bad) code: how to read it, what to look for, and how to use this knowledge to improve your own code."

http://www.spinellis.gr/codereading/


This is a good read and uses lots of examples from one of the BSDs (NetBSD as I remember).


Quake source code: www.idsoftware.com/business/techdownloads/

From a graphics and game engine perspective, it was very informative to go through pieces of the source - I was mainly interested in the client/server and collision detection areas of the code.


Check out this recent analysis of parts of the source: http://www.fabiensanglard.net/quakeSource/


I'm still pretty wet behind the ears, but looking through the Quake source (and Quake 2) was quite the experience. I thought the code was beautiful, and I was impressed with how DRY everything seemed to be... oh, and variable naming, small things like that, really impressed me.


I second this. It's cool that the ideas behind some of the simpler things in the Quake game (such as the console, bindings, commands, etc.) are still being used as the backbone of other products (see: Valve). They are implemented very elegantly in the Q1 source.


This is half answer/half tangent, but I like http://www.google.com/codesearch a lot for browsing and reading code. A fun thing to do is to look at how others implement something you want to implement, or use some library you want to use. You end up finding a lot of different ways of doing the same thing, and if you find something that you want to explore more, it's easy to browse around.


I learned a lot from Qt:

http://qt.nokia.com/

And it's not just their code, everything they do is exemplary. It's one extremely well run software project.

See for example their documentation:

http://doc.trolltech.com/

http://doc.trolltech.com/qq/qq13-apis.html



For systems programming, I've learned a lot from reading the source for open solaris: http://src.opensolaris.org/source/

And the LLVM compiler is far more understandable than gcc sources: http://llvm.org/viewvc/llvm-project/llvm/


I agree with this, and along the same thought, OpenBSD (or any BSDs really, but OpenBSD tends to favor simpler implementations) source is very clear and concise.


Minix, not being as 'real world' as the BSDs or Linux, is even more readable, if you're just interested in a quick glance at how an OS works.


From Minix' site: MINIX 1 and 2 were intended as teaching tools; MINIX 3 adds the new goal of being usable as a serious system on resource-limited and embedded computers and for applications requiring high reliability.

Is Minix 3 still readable, or do you recommend Minix 2?


Good question. My experience was with Minix 2: I needed a floppy driver to weld on to eCos, and Minix's was by far the easiest to deal with. I haven't looked at Minix 3.


Minix3 has comments galore, but seems plagued by one/two letter variable names like 'c' and 'ip'


SQLite's source listing is great. I learned a lot about good C practices and documentation reading it.


You will enjoy "C Interfaces and Implementations", along with the full, literate source-code for LCC.

This is also good, Standard Function Library:

http://legacy.imatix.com/html/sfl/


Where can one find the full literate source code for LCC? I haven't been able to find it anywhere, and god knows I searched.



There's no literate source linked to on that page. Just plain, normal, barely-commented C.


"There's no literate source linked to on that page. Just plain, normal, barely-commented C."

The book by David Hanson is the literate program.


Ok, so the literate source, in a form that I can actually play with, isn't actually available. It doesn't even seem to be maintained in literate form in SVN.

That's what I was trying to determine.


If you like that style of C code, Tcl's is similar (and it's not a coincidence, as Dr. Hipp is part of the Tcl core team).


Thanks for the tip, I didn't know that. I haven't ever used Tcl besides some tinkering a few years ago, whereas SQLite is used incredibly often, but I'll read any C that's well written.


Especially its deep test architecture.



Qmail: http://cr.yp.to/qmail.html

It feels well organized almost throughout, and it's a joy to experience the clarity of thought that went into it. In fact, most of djb's code has a similar feeling to it.

Arthur Whitney's code: http://www.nsl.com/papers/origins.htm

Being able to read this (and not merely decode it, but read it, in a manner similar to how you would read a book) will broaden your horizons like nothing else I've seen.


More of crazy Whitney code available at http://aplusdev.org


One way to find code worth reading is to select inspiring developers and look at their code. Most of the developers in "Coders At Work", for example, have Open Source code we can look at and learn from.

The best code I've seen:

Common Lisp - "Paradigms of Artificial Intelligence Programming" by Peter Norvig and "On Lisp" by Paul Graham

C - "C Interfaces and Implementations" and "LCC- a compiler for ANSI C" both by David Hanson. I also found the code for the Player/Stage robotic sim framework surprisingly readable.

I liked the Scala Actors library code as well.

If any HNers know any great codebases in Haskell or Erlang, please post here.


For Haskell I would recommend Xmonad. The code is short, with lots of interesting functionality, including a well-written plug-in system.

http://xmonad.org/


Xmonad is nice code indeed. For an introduction to a piece of it, check out Simon Peyton Jones's talk at OSCON 2007 - he uses Xmonad code as an introduction to Haskell, which gives you a taste of how the program is constructed.


I highly recommend the Stanford GraphBase, written by Donald Knuth. It's C code written by Knuth using the literate programming tool CWEB. If you don't know about CWEB, it's not hard to learn how to use, and probably is already installed on your system if you have TeX. The Stanford GraphBase is available for free download on Knuth's web-site, and also comes bound in a nice paperback book, that was reprinted in 2009. Knuth also makes available many other programs written using CWEB on his web-site, but I would start with the GraphBase.


I haven't perused it myself, but every time I see this question posed, someone always mentions Lua: http://www.lua.org/ftp/

Similar previous discussions:

http://news.ycombinator.com/item?id=225577

http://www.reddit.com/r/programming/comments/26dyh/ask_reddi...


Thanks for these links - I figured the topic must have come up before, but couldn't Google my way to the right HN thread. Hopefully there will be a search box on this site one day.


Just use http://searchyc.com/ :)


For Java, all of Apache's stuff is good - particularly Jakarta. http://jakarta.apache.org/ This covers a wide range of topics and is designed to have a public API. http://code.google.com/p/google-collections/ and http://code.google.com/p/guava-libraries/ and http://code.google.com/p/google-guice/ are interesting in that they come out of Google, beyond their own individual merits.


Django's code is well written, and very well documented.


I think you tend to learn more studying badly written badly documented code. Certainly there's a lot of knowledge to be gained being able to decipher spaghetti code with no documentation. I guess it depends on your aims though.


The FreeType 2 font rasterizer is beautiful code. Geometric computation meets performance optimization meets just-well-written-and-commented C.

http://git.savannah.gnu.org/cgit/freetype/freetype2.git/tree...


If you're an iPhone developer, Joe Hewitt's three20 is the best open-source codebase I know of: http://github.com/joehewitt/three20


Three20 is really useful. It's also quite powerful. But I wouldn't recommend it to someone trying to learn/study idiomatic Cocoa.

Joe tends to do things his own way, which works out fine, but does tend to depart from most people's Cocoa/Touch code.


Ward's Wiki (The Portland Pattern Repository) has a short list of "Programs to Read" at http://c2.com/cgi/wiki?ProgramsToRead.


DVD John's DeDRMS written in C#. It was beautiful, but I can't seem to find the source code right now. If anyone can, please share.

--EDIT---

Found it: http://web.archive.org/web/20050315135351/http://nanocrew.ne...


If you're interested in literate programming: TeX


Diomidis Spinellis has written two books (Code Reading and Code Quality) on this subject that take you through annotated examples of industrial strength systems code (mostly from BSD IIRC): http://www.spinellis.gr/codereading/


Any recommendations for PHP? I wrote a PHP/MySQL web application for SaaS customers. I am looking at ways to improve performance, caching and error handling, or simply to write better code.


For a smallish project that you can figure out fairly easily, I recommend the Kohana framework. It's extremely well documented and it's quite easy to read. Many people actually recommend reading its source to supplement the documentation and after having to do so myself I can see why.


I learnt a lot by using frameworks such as symfony. The initial learning curve is steep, but then you can understand architecture better. The jump to other languages such as Python and Java was really easy after that.


There's a lot of excellent PHP code in the Horde Project's repository: http://cvs.horde.org/


There is no such thing as excellent PHP code.


The symfony framework and Doctrine ORM are both well written projects.


Perhaps MediaWiki?


Personally, I wouldn't recommend it. Unfortunately I can't give a better alternative, but I've done some hacking on MediaWiki since version 1.7 and while it's one of the better examples of PHP out there, it still leaves a lot to be desired in places.


Thanks. My post was just a guess.



Any recommendations for Ruby/Rails projects?


I'm a big fan of ThoughtBot's stuff (http://www.thoughtbot.com/projects). Mostly I use Shoulda and FactoryGirl, but I've been consistently impressed with their ideas and code quality.

For idiomatic Ruby and Rails stuff I have probably learned more from Rick Olson's stuff (http://github.com/technoweenie/) than any other single source. Sometimes I think he tends to be too clever for his own good, but the code is good to read for that reason even if simpler things are better in production.


Ah! Somebody published a timely list of high quality rails apps http://jetpackweb.com/blog/2009/10/14/high-quality-ruby-on-r.... Discussion here http://news.ycombinator.com/item?id=882071


Ryan Tomayko recommends Unicorn as a reference for Unix programming (primarily, signals and sockets):

http://tomayko.com/writings/unicorn-is-unix

I found Sinatra to be an interesting read (and specifically how it parses the app configuration into methods):

http://github.com/sinatra/sinatra


I liked Jamis Buck's (aka the creator of Capistrano) Bucketwise: http://github.com/jamis/bucketwise


Read the source to Marcel Molina's AWS::S3.


I highly recommend both Paul Graham's source code for Arc (http://ycombinator.com/arc/arc3.tar), as well as the source code for Quake and Quake II (http://www.idsoftware.com/business/techdownloads/).


Anyone willing to recommend a Java project worth studying?




One way I've found to start reading code (something that I do not find easy at all) is to read code you use. Lately, I've been reading pyparsing and the Python markdown module. As an added bonus, I also discovered features in those libraries that I never knew existed.


I learned about sockets and networking by looking at the source for wget.

That's also where I saw calloc.c - A portable implementation of calloc. I distinctly remember it being one of the cleverest hacks I've seen.


I've heard this book was good http://www.amazon.com/Beautiful-Code-Leading-Programmers-Pra.... It asks 30 different developers for pieces of beautiful code. I read an excerpt or two back when it was released and put it on my to-read list. Unfortunately I haven't gotten to it yet, but I still plan to.


Any recommendations for good C# code? Much of the web is open source oriented so I don't hear too much about great C# code.


There are some code examples on Eric Lippert's blog that are made of good C# code.


Mono's C# compiler is written in C#. SharpDevelop is another good choice.


http://banshee-project.org/

(disclaimer: I'm one of the developers)


I don't know if this is good advice today, but back in the day I learned a lot both from STL and the Boost libraries, specifically Spirit, which I still think is an amazing feat: http://spirit.sourceforge.net/


I will say that unless you plan to make small changes, you won't get much out of studying the source code. Fixing small bugs in an open source project will expose you to a lot of portions of the source code and also give you some rep in case you want to become a contributor later.


The Linux kernel: http://miller.cs.wm.edu/


Equally or more valuable than trying to sit down and read the kernel source is to follow the patches and discussions thereof on the linux-kernel mailing list: http://lkml.org/

I've found that doing so a) improves my knowledge of what's going on under the hood tremendously, and b) has taught me a fair bit about how to do code reviews and make difficult design decisions. Studying the kernel itself line-by-line would take ages, but reading a few threads a week depending on what piques my interest is totally manageable.


I don't treat it as a novel, but as a reference. I'll often wonder how certain things are implemented, and I'll poke around for the data structures and algorithms until I gain some understanding of how it works.




Could anyone recommend something for Python? If something is out there for ruby, there's got to be at least one thing for python.


How about Ned Batchelder's implementation of Frank Liang's hyphenation algorithm:

http://nedbatchelder.com/code/modules/hyphenate.html

or AI:

http://aima.cs.berkeley.edu/python/readme.html


I hear good things about the webkit source code



Any recommendations for good objective-C code?


Flex: FlexLib (mostly by Doug McCune)

http://code.google.com/p/flexlib/


Can anyone point to some good JavaScript code?


I find the sources of ExtJS nice to read. They have an OOP approach.


I'd really like to see some functional-style JS code.

[ background : I've used OOP a lot, and thought it was the ultimate, until I discovered lisp... ]


Hmm.. How about the ADsafe code - http://www.adsafe.org/adsafe.js

by Douglas Crockford, author of 'JavaScript : the good parts' - http://www.crockford.com/


Any good event-driven JavaScript (jQuery/Prototype is also fine) projects out there?


Event driven JS tends to go spaghetti really quickly unless you're using a good events framework. The yui3 and sproutcore frameworks are pretty close to opposite ends of the spectrum in terms of philosophy, but both have very well defined processes for dealing with events. The two frameworks you mention are fine for adding "ajax" to a more or less working server based app but they aren't so hot for building a complete app. FWIW, MochiKit's Signal framework is also nice (though the codebase is now fallow) and Dojo's system has a sort of nifty aop feel, but dojo is kind of hairy in that there are a lot more chefs in the kitchen there.


Check out sammy.js. It's basically a web framework on top of jQuery with a plugin architecture, etc. Seems pretty neat.


I've been led to understand that the OpenBSD source code is of amazing quality.



That of a metacircular evaluator such as the one described in SICP.


openssh



