Ask HN: What are some well written/engineered open source software?
98 points by inapis on Feb 11, 2017 | 67 comments

I've spent a lot of time reading C sources. Standouts are nginx, mbed TLS, Amazon s2n. Clean coding styles, consistent in checking function return values (very important! significant source of vulnerabilities in C software), comments where due, no hacks.

Among the most convoluted codebases I've read is Tor. It works (apparently), and it isn't even very insecure per se (the code is littered with hard asserts that abort execution if an expected condition isn't met), but it is unnecessarily dense. Example: I use software to analyze the call graph (which function calls which function), and when I ask it to find potentially recursive loops (A() calls B() calls A(), etc.) it spews out tens of thousands of potential recursions.

By comparison, mbed TLS only has a couple of these, and a large project like OpenSSL has 50 or so.
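
For readers wondering what "finding potential recursion in a call graph" amounts to, here is a minimal sketch, not the tool the commenter used. It assumes the call graph is already available as a small adjacency matrix; the `call_graph` type, `MAX_FUNCS` and the function names are made up for illustration. A depth-first search reports a cycle when it reaches a function that is still on the current search path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 8   /* hypothetical upper bound, only for this sketch */

/* Hypothetical call graph: calls[i][j] is true when function i calls function j. */
typedef struct {
    size_t n;                          /* number of functions in use */
    bool calls[MAX_FUNCS][MAX_FUNCS];
} call_graph;

enum visit_state { UNVISITED, IN_PROGRESS, DONE };

/* Depth-first search from fn; true if it reaches a function still on the search path. */
static bool dfs_finds_cycle(const call_graph *g, size_t fn, enum visit_state *state)
{
    state[fn] = IN_PROGRESS;
    for (size_t callee = 0; callee < g->n; callee++) {
        if (!g->calls[fn][callee])
            continue;
        if (state[callee] == IN_PROGRESS)
            return true;               /* back edge: A() -> ... -> A() */
        if (state[callee] == UNVISITED && dfs_finds_cycle(g, callee, state))
            return true;
    }
    state[fn] = DONE;
    return false;
}

/* Returns true if some chain of calls can lead back to its starting function. */
bool call_graph_has_recursion(const call_graph *g)
{
    assert(g != NULL && g->n <= MAX_FUNCS);
    enum visit_state state[MAX_FUNCS] = { UNVISITED };
    for (size_t fn = 0; fn < g->n; fn++) {
        if (state[fn] == UNVISITED && dfs_finds_cycle(g, fn, state))
            return true;
    }
    return false;
}
```

A real analyzer would also have to build the graph from source and deal with function pointers, which is presumably where many of the "potential" recursions come from.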

Conversely, C software that isn't consistent in error signaling (return -1 on error in function A, return 0 in function B, set parameter int* err in function C, etc.), doesn't perform due error checking, has a spaghetti call graph, mindlessly performs multiplication (leading to overflows with certain inputs), or uses signed or unsigned int where size_t is better suited is usually susceptible to bugs and abuse (vulnerabilities). The projects I mentioned are very clean in this regard.
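
To make the points about consistent error signaling, size_t and unchecked multiplication concrete, here is a minimal sketch. It is not taken from nginx, mbed TLS or s2n; the `ERR_OK`/`ERR_FAIL` convention and the names `checked_mul` and `alloc_array` are assumptions made up for this example. Every function signals errors the same way, and the size computation is checked before it is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical convention for this sketch: 0 on success, -1 on error, everywhere. */
#define ERR_OK    0
#define ERR_FAIL (-1)

/* Stores a * b in *out, or fails if the product would not fit in size_t. */
int checked_mul(size_t a, size_t b, size_t *out)
{
    assert(out != NULL);
    if (a != 0 && b > SIZE_MAX / a)
        return ERR_FAIL;
    *out = a * b;
    return ERR_OK;
}

/* Allocates count elements of elem_size bytes, refusing sizes that overflow. */
int alloc_array(size_t count, size_t elem_size, void **out)
{
    size_t total;

    assert(out != NULL);
    *out = NULL;
    if (checked_mul(count, elem_size, &total) != ERR_OK)
        return ERR_FAIL;   /* propagate the error instead of ignoring it */
    if (total == 0)
        return ERR_FAIL;   /* this sketch treats empty allocations as an error */
    *out = malloc(total);
    return *out != NULL ? ERR_OK : ERR_FAIL;
}
```

A caller then checks one thing only, for example `if (alloc_array(n, sizeof(int), &p) != ERR_OK) { /* handle */ }`, regardless of which function failed.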

Noted below, but I'd also highly recommend Redis for clean code.

What software do you use to analyze call graphs?

> it is unnecessarily dense

Underhanded perhaps?

Facebook Presto, an MPP SQL engine written in Java.


I have learned a lot from reading the source code and watching it develop. It is written in modern Java 8. The authors are obviously experts of the language, JVM and ecosystem. Since it is an MPP SQL engine performance is very important. The authors have been able to strike a good balance between performance and clean abstractions. I have also learned a lot about how to evolve a product. Large features are added iteratively. In my own code I often found myself going from Feature 1.0 -> Feature 2.0. Following Presto PRs, I have seen how for large features they go from Feature 1.0 -> Feature 1.1 -> Feature 1.2 -> ... Feature 2.0 very quickly. This is much more difficult than it sounds. How can I implement 10% of a feature, still have it provide benefits and still be able to ship it? I have seen how this technique allows for code to make it into production quickly where it is validated and hardened. In some ways it reminds me of this: https://storify.com/jrauser/on-the-big-rewrite-and-bezos-as-.... You shouldn't be asking for a rewrite. Know where you want to go and carefully plan small steps from here to there.

Asterisk PBX. A well-chosen small set of module types (channel drivers, applications, functions, resources, codecs & formats) makes it possible to implement literally any behaviour and to integrate with any conceivable external technology. I haven't worked in VoIP for quite a long time, but the clarity of Asterisk's design has deeply influenced me.

GStreamer. The pipeline is a very powerful model for software, and its potential is tremendous. Unfortunately, I find the level of development and maintenance of the GStreamer project itself quite poor: the code is horribly complicated for questionable reasons (it's said to be non-blocking everywhere; I find that a bad excuse for being riddled with subtle bugs and for failing to support custom pipelines as blocks in higher-level pipelines).

I find projects such as ffmpeg and the Linux kernel quite well engineered, but I have nothing special to say about them except that they are reasonably well organized and get better day by day.

For user-interface apps where high user productivity matters, I find that software such as readline, tmux, mutt and a bunch of others follows the wise pattern of extensible, scriptable software: if you want hotkeys, you need a domain-specific language, and bindings must be

  key: action[, action...]

  action: key

I am grateful to work with a few of the Asterisk developers, and they strive hard for quality. A project that long-running and feature-rich is not easy to keep up to date, stable and well architected. If you want to see a project with professional commit messages, it is a solid example (for the past several years at least).


Just in case anyone were to be led to believe this:

Asterisk's code base is a pile of crap.

It's been getting a bit better over the years, but it is still terrible: tons of conceptual blunders, protocol implementations that are only loosely inspired by the specification, system APIs used incorrectly, lots of code that doesn't bother with dynamic string lengths but simply truncates strings arbitrarily if they don't fit into some fixed-size buffer, ...

The only reason it kind of works is that bugs that happen often enough do end up being fixed at some point, but that's about it. If you know your C and POSIX APIs and you don't believe me, just go and have a look at the code; I promise you'll find a bug in less than an hour.
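
As an illustration of the fixed-size buffer complaint (this is not Asterisk code; the function name and the user@host format are made up for this sketch), the usual minimum is to check the return value of `snprintf`, which reports how many characters the full result would have needed, and to fail loudly instead of carrying on with a silently truncated string:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Formats "<user>@<host>" into dst. Returns 0 on success and -1 if the
 * result did not fit (or formatting failed), so the caller can react
 * instead of working with a truncated string.
 */
int format_address(char *dst, size_t dst_size, const char *user, const char *host)
{
    assert(dst != NULL && user != NULL && host != NULL);
    if (dst_size == 0)
        return -1;

    /* snprintf returns the length the complete output would have had. */
    int needed = snprintf(dst, dst_size, "%s@%s", user, host);
    if (needed < 0 || (size_t)needed >= dst_size) {
        dst[0] = '\0';     /* do not leave a half-written value behind */
        return -1;
    }
    return 0;
}
```

Code that needs arbitrary lengths can use the same return value to size a heap allocation and format again; this sketch only covers detecting the truncation.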

Yes, I know Asterisk is riddled with bugs and has very nasty spots at its core (e.g. "channel cloning" or whatever it is called). It was my job to debug the code with gdb and valgrind :)

What is still amazing to me is the set of core design concepts I've listed: channels, applications... I have a case for comparison here, where the project is of comparable complexity but all features are bolted on ad hoc, without the kind of compartmentalization of complexity that Asterisk has.

This is surprising to read considering the following article:


FreeSWITCH is an alternative to Asterisk.

If the explanation looks like this, I am not sure I want to see what the code looks like.


You might find this resource helpful:


Anything written by djb (https://en.wikipedia.org/wiki/Daniel_J._Bernstein): qmail, djbdns, ucspi-tcp, daemontools, etc.

I'll second the recommendation for djb software. http://perl.plover.com/yak/qmail/ has slides from a presentation about qmail internals.

Discourse is a really solid codebase with some nice patterns (their auth/auth checking, for example); probably the best OSS Rails app I know. I routinely answer questions about how the product or API works with 30 seconds of examination of the code.

Thank you! Came here to mention them. Thrilled to see you feel the same way about the readability of that code base.

I've learned quite a bit from reading through the Laravel source - https://github.com/illuminate

edit for details: The authors are quite meticulous (notoriously, every line in a multi-line comment is 3 characters shorter than the previous one) and stick to the "convention over configuration" mantra, no doubt inspired by Ruby on Rails. It's interesting to see how they create abstractions to simplify so many common web dev tasks.

I especially like the collection class: https://github.com/illuminate/support/blob/master/Collection... It reads almost like natural language.

Many names put out there, but not much substantiation. If you are going to drop a name, could you explain why it is well written/engineered?

One could argue the question is way too vague. What is a "well written/engineered" software to begin with?

I like the engineering aspects of VS Code:


Seconded. For anyone interested in looking at a real-world Haskell codebase, this is a classic.

The Stockfish chess engine: https://github.com/official-stockfish/Stockfish

I learned a ridiculous amount from reading the source code to TeX (https://www.amazon.com/Computers-Typesetting-B-TeX-Program/d...) but it is written in a very 1970s style.

Take a look at PostgreSQL.

SQLite might be a good bet too, especially for engineering. I have their famous test suite in mind.

I think Lua deserves to be added here: https://www.lua.org/source/5.3/

Isn't Chrome quite buggy and leaky? (for any user of Chrome.) The same goes for Chromium, right?

I'm a fan of the underused dlib C++ library[0]. It has a lot of uses, and work transfers across platforms with no problem. I know I can do all the work on my Linux machine, and when it comes time to export for Windows, I just open up a VM, re-download the repo, compile with cmake, and it just works.

The thing I like most about it, though, is the examples, which exist for every feature. The person who wrote it actually understands what I want out of an example: code I can look at and immediately understand what is going on and why. I want examples I can refer to when mine does not work, so I can compare and see what I did wrong. Take the GUI example[1], for instance: anything that happens that is specific to that example has a comment. It makes no assumptions about your prior knowledge other than that you understand C++.

[0] http://dlib.net/

[1] http://dlib.net/gui_api_ex.cpp.html

Redis, the most cleanly written and easily extensible code in C you can find.

OpenBSD for correctness and avoiding bloat. One of them told me MuPDF was cleanly coded, too. Rare for PDF readers.

Vyatta's firewall distribution had some documentation which struck me as being remarkably well-written back in the day. Usage appeared to be well thought out. Don't know if their code is nice or whatever but if other aspects are any indication, I'd imagine it too is well done.

Quake 2 and Quake 3.

So far, it's the cleanest code I've ever worked with while still being very self-contained.

I really like reading the go std lib and runtime source.

RxSwift https://github.com/ReactiveX/RxSwift is gorgeous. Cycle.js and RxJS also. Chromium + LLVM also (minus the x-platform parts but those suck everywhere).

Redis. SQLite.

Second Redis. I don't even know C yet I find it surprisingly easy to follow.

I've always found the git source a pleasure to read.

Can you really learn from just reading source code? It seems like you need an annotated guide to understand why this was done along with the how.

Somewhat. I just dug through Laravel's source code and the comments helped.

Having an annotated guide for each software would be difficult but all of us have to start somewhere.

Commit messages help.

If someone cares about GTK+ 3 and C, I would recommend gnome-recipes[0]. It is a well-written and smaller codebase (it is still in active development, so not yet feature complete).

It should be helpful for learning object-oriented design in C with GObject.

[0] https://wiki.gnome.org/Apps/Recipes

* PostgreSQL

* Varnish Cache

* qmail

* Mercury Programming Language

I'm only familiar with varnish 2.1, but as to that version I think it's a bit of a stretch to say varnish is well written. VCL is very complicated - just check the request flow diagram [1]. Some of the documentation is very poor - try to find out the properties available on beresp for example (you have to grep the source code [2]), or try to understand the precise function and implications of grace mode, saint mode, or hit_for_pass. The best redeeming quality is varnishtest and some of the other tools that are provided.

[1] http://book.varnish-software.com/3.0/_images/request.png

[2] https://github.com/varnishcache/varnish-cache/blob/2.1/lib/l...

PostgreSQL, Nginx, fio, SublimeText (3), nmap, libcurl (and curl itself), ffmpeg (parts are also in asm), rsync, XLD, and the list goes on...

Watch out for biases based on how much people like the end product vs how well it's actually implemented though.

Lua's source code is very nice to read, even if you're not a C guy


Especially the kernel in src/os

Answering the question - that would be Blender.

Postfix is a good example of a system written in C with separate components (running in different processes for security).

I always thought Postfix was nice. Maybe I'm wrong as I haven't seen it mentioned.


LLVM is a fantastic example of well written C++ code.

Apache Solr and Tomcat

Solr source code is a mess and sometimes worse than that. Test coverage is pretty good tho.




Google's Guava library, at least the parts I've seen, is incredibly well written and organized.

I assume most "standard library" type stuff is where you will find the cleanest code.

Digging into the code reviews of Guava is impressive: if you've ever felt a code reviewer was being too strict, that's probably nothing compared to Guava reviews. And it shows in the quality of the library.

I implemented a data structure similar to one in guava and I thought my code was pretty good. I looked at guava source out of curiosity and immediately refactored my data structure.

I ended up refactoring it again, and the code is still not as clear as guava.

For instance, I noticed they were using an enum for functions and I was like, WTF, who does that? Later I decided to make my library serializable so we could save to disk. Well, it turns out that's exactly why they used an enum. My solution was to make a utility class to wrap the non-serializable objects, but their solution was much clearer and less code.

I have used the Mono source code as a go-to reference for poorly documented .NET Framework code, because it was usually clean, quality code. Some quick comparisons with the coreclr and .NET reference sources also supported my impression. (A lot of code is being merged from Mono now.)


For example, NetBSD, which also had a book[0] written about it.

[0] http://www.spinellis.gr/codereading/
