Ask HN: What are some well written/engineered open source software? - inapis
======
guidovranken
I've spent a lot of time reading C sources. Standouts are nginx, mbed TLS,
Amazon s2n. Clean coding styles, consistent in checking function return values
(very important! significant source of vulnerabilities in C software),
comments where due, no hacks.

Among the most convoluted source codes I've read is Tor. It works
(apparently), and it isn't even very insecure per se (the code is littered
with hard asserts that will abort code execution if an expected condition
isn't met), but it is unnecessarily dense. Example: I use software to analyze
the call graph (which function calls which function) and when I ask it to find
potentially recursive loops (A() calls B() calls A() etc) it spews out tens of
thousands of potential recursions.

By comparison, mbed TLS only has a couple of these, and a large project like
OpenSSL 50 or so.
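
To make the check concrete, here is a minimal sketch of how such potential recursions could be found, assuming the call graph is already available as a mapping from each function to the functions it calls. The `find_cycles` helper and the sample graph are hypothetical; this is not the tool I use, just a plain depth-first search that reports one cycle per back edge.

```python
def find_cycles(call_graph):
    """Return one call cycle for every back edge found by depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {fn: WHITE for fn in call_graph}
    stack = []   # functions on the current call path
    cycles = []

    def visit(fn):
        color[fn] = GREY
        stack.append(fn)
        for callee in call_graph.get(fn, ()):
            state = color.get(callee, WHITE)
            if state == GREY:
                # callee is already on the current path: potential recursion
                cycles.append(stack[stack.index(callee):] + [callee])
            elif state == WHITE:
                visit(callee)
        stack.pop()
        color[fn] = BLACK

    for fn in list(call_graph):
        if color[fn] == WHITE:
            visit(fn)
    return cycles


# Hypothetical call graph: A() calls B() calls A(), and D() calls itself.
graph = {"A": ["B"], "B": ["A", "C"], "C": [], "D": ["D"]}
print(find_cycles(graph))  # [['A', 'B', 'A'], ['D', 'D']]
```

A real code base would need an iterative search to avoid Python's recursion limit, and function pointers make the true call graph larger than what static analysis sees.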

Conversely, C software that isn't consistent in error signaling (return -1 on
error in function A, return 0 in function B, set parameter int* err in
function C, etc), doesn't perform due error checking, has a spaghetti call
graph, mindlessly performs multiplication (leading to overflows with certain
inputs), or uses signed or unsigned int where size_t is better suited is
usually susceptible to bugs and abuse (vulnerabilities). The projects I
mentioned are very clean in this regard.
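
To illustrate the multiplication point, here is a small Python model of the check a C allocation helper might perform before computing `count * size`. Python integers never overflow, so the wraparound is simulated; `SIZE_MAX` (assumed to be 2**64 - 1, as on a typical 64-bit platform) and both function names are illustrative assumptions, not code from any of the projects mentioned.

```python
SIZE_MAX = 2**64 - 1  # assumption: size_t is 64 bits wide


def unchecked_mul(count, size):
    """Model C's unsigned wraparound: the product is reduced modulo SIZE_MAX + 1."""
    return (count * size) & SIZE_MAX


def checked_mul(count, size):
    """Return count * size, or raise OverflowError if it would not fit in size_t."""
    if count < 0 or size < 0:
        raise ValueError("sizes must be non-negative")
    # Divide instead of multiplying, so the check itself cannot overflow in C.
    if size != 0 and count > SIZE_MAX // size:
        raise OverflowError("count * size exceeds SIZE_MAX")
    return count * size


print(unchecked_mul(2**32, 2**32))  # 0: a "successful" allocation of zero bytes
print(checked_mul(3, 4))            # 12
```

The unchecked version silently yields a tiny buffer size for huge inputs, which is exactly the kind of bug that turns into a heap overflow later.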

~~~
nemild
Noted below, but I'd also highly recommend Redis for clean code.

------
nfa_backward
Facebook Presto, an MPP SQL engine written in Java.

[https://github.com/prestodb/presto](https://github.com/prestodb/presto)

I have learned a lot from reading the source code and watching it develop. It
is written in modern Java 8. The authors are obviously experts in the
language, the JVM and its ecosystem. Since it is an MPP SQL engine, performance
is very important. The authors have been able to strike a good balance between
performance and clean abstractions. I have also learned a lot about how to
evolve a product. Large features are added iteratively. In my own code I often
found myself going from Feature 1.0 -> Feature 2.0. Following Presto PRs, I
have seen how for large features they go from Feature 1.0 -> Feature 1.1 ->
Feature 1.2 -> ... Feature 2.0 very quickly. This is much more difficult than
it sounds. How can I implement 10% of a feature, still have it provide
benefits and still be able to ship it? I have seen how this technique allows
for code to make it into production quickly where it is validated and
hardened. In some ways it reminds me of this:
[https://storify.com/jrauser/on-the-big-rewrite-and-bezos-as-...](https://storify.com/jrauser/on-the-big-rewrite-and-bezos-as-a-technical-leader).
You shouldn't be asking for a
rewrite. Know where you want to go and carefully plan small steps from here to
there.

------
andrey_utkin
Asterisk PBX. A well-chosen small set of module types (channel drivers,
applications, functions, resources, codecs & formats) lets you implement
literally any behaviour and integrate with any conceivable external
technology. I haven't worked in VoIP for quite a long time, but the clarity of
Asterisk's design has deeply influenced me.

GStreamer. The pipeline is a very powerful model for software, and its
potential is tremendous. Unfortunately I find the level of development and
maintenance of the GStreamer project itself quite poor: the code is horribly
complicated for questionable reasons (it's said to be non-blocking everywhere;
I find that a bad excuse for being riddled with subtle bugs and for failing to
support custom pipelines as blocks in higher-level pipelines).

I find projects such as ffmpeg and the Linux kernel quite well engineered, but
have nothing special to say about them except that they are reasonably well
organized and get better day by day.

For user-interface apps where high user productivity matters, I find that
software such as readline, tmux, mutt and a bunch of others follows a wise
pattern of extensible, scriptable design: if you want hotkeys, you need a
domain-specific language, and bindings must be

      key: action[, action...]

not

      action: key
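
A rough sketch of why the `key: action[, action...]` direction matters, assuming a tiny line-based config format: each key owns an ordered list of actions, so one hotkey can chain several commands. The `parse_bindings` and `dispatch` helpers, the key names and the action names are all made up for illustration and are not the syntax of readline, tmux or mutt.

```python
def parse_bindings(text):
    """Parse 'key: action[, action...]' lines into {key: [action, ...]}."""
    bindings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, actions = line.partition(":")
        bindings[key.strip()] = [a.strip() for a in actions.split(",") if a.strip()]
    return bindings


def dispatch(bindings, actions, key):
    """Run, in order, every action bound to key and return their results."""
    return [actions[name]() for name in bindings.get(key, [])]


config = """
# hypothetical bindings
C-s: save
C-q: save, quit
"""
actions = {"save": lambda: "saved", "quit": lambda: "quit"}
bindings = parse_bindings(config)
print(dispatch(bindings, actions, "C-q"))  # ['saved', 'quit']
```

With `action: key` the config can only say which single key triggers an action, so composing `save, quit` under one hotkey would need a new built-in action.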

~~~
zAy0LfpBZLC8mAC
Just in case anyone were to be led to believe this:

Asterisk's code base is a pile of crap.

It's been getting a bit better over the years, but it still is terrible: tons
of conceptual blunders, protocol implementations are only loosely inspired by
the specification, system APIs are used incorrectly, lots of code doesn't
bother with dynamic string lengths, but instead simply truncates strings
arbitrarily if they don't fit into some fixed-size buffer, ...

The only reason it kind of works is because bugs that happen often enough do
end up being fixed at some point, but that's about it. If you know your C and
POSIX APIs and you don't believe me, just go and have a look at the code, I
promise you'll find a bug in less than an hour.

~~~
andrey_utkin
Yes, I know Asterisk is riddled with bugs and has very nasty spots at its core
(e.g. "channel cloning" or whatever it is called). It was my job to debug the
code with gdb and valgrind :)

What is still amazing to me is the set of core design concepts which I've
listed - channels, applications... I have a case for comparison here, where
the project is of comparable complexity but all features are bolted on ad hoc,
without the compartmentalization of complexity that Asterisk has.

------
rch
You might find this resource helpful:

[http://aosabook.org/en/index.html](http://aosabook.org/en/index.html)

------
cure
Anything written by djb
([https://en.wikipedia.org/wiki/Daniel_J._Bernstein](https://en.wikipedia.org/wiki/Daniel_J._Bernstein)):
qmail, djbdns, ucspi-tcp, daemontools, etc.

~~~
tjalfi
I'll second the recommendation for djb software.
[http://perl.plover.com/yak/qmail/](http://perl.plover.com/yak/qmail/) has
slides from a presentation about qmail internals.

------
patio11
Discourse is a really solid codebase with some nice patterns (their auth/auth
checking, for example); probably the best OSS Rails app I know. I routinely
answer questions about how the product or API works with 30 seconds of
examination of the code.

~~~
nstart
Thank you! Came here to mention them. Thrilled to see you feel the same way
about the readability of that code base.

------
mr_anich
I've learned quite a bit from reading through the Laravel source -
[https://github.com/illuminate](https://github.com/illuminate)

edit for details: The authors are quite meticulous (notoriously, every line
in a multi-line comment is 3 characters shorter than the previous one) and
stick to the "convention over configuration" mantra, no doubt inspired by Ruby
on Rails. It's interesting to see how they create abstractions to simplify so
many common web dev tasks.

~~~
t20n
I especially like the collection class:
[https://github.com/illuminate/support/blob/master/Collection...](https://github.com/illuminate/support/blob/master/Collection.php)
It reads almost like natural language.

------
Cieplak
Erlang OTP

[https://github.com/erlang/otp](https://github.com/erlang/otp)

------
norswap
Many names put out there, but not much substantiation. If you are going to
drop a name, could you explain _why_ it is well written/engineered?

~~~
camus2
One could argue the question is way too vague. What is a "well
written/engineered" software to begin with?

------
vitoc
I like the engineering aspects of VS Code:

[https://github.com/Microsoft/vscode](https://github.com/Microsoft/vscode)

------
SirensOfTitan
xmonad [https://github.com/xmonad/xmonad](https://github.com/xmonad/xmonad)

~~~
kornish
Seconded. For anyone interested in looking at a real-world Haskell codebase,
this is a classic.

------
dfan
The Stockfish chess engine:
[https://github.com/official-stockfish/Stockfish](https://github.com/official-stockfish/Stockfish)

I learned a ridiculous amount from reading the source code to TeX
([https://www.amazon.com/Computers-Typesetting-B-TeX-Program/d...](https://www.amazon.com/Computers-Typesetting-B-TeX-Program/dp/0201134373/))
but it is written in a very 1970s style.

------
frunzales
Take a look at PostgreSQL.

~~~
scotty79
SQLite might be a good bet too, especially on the engineering side. I have
their famous test suite in mind.

------
bungle
I think Lua deserves to be added here:
[https://www.lua.org/source/5.3/](https://www.lua.org/source/5.3/)

------
terrble
Chromium

edit: [https://www.chromium.org/developers/design-documents](https://www.chromium.org/developers/design-documents)

~~~
c_shu
Isn't Chrome quite buggy and leaky? (for any user of Chrome.) The same goes
for Chromium, right?

------
wazanator
I'm a fan of the underused dlib C++ library[0]. It has a lot of uses, and work
transfers across platforms with no problem. I know I can do all the work on my
Linux machine, then when it comes time to export for Windows just open up a VM,
redownload the repo and compile with cmake, and it just works.

The thing I like about it the most, though, is the examples, which exist for
every feature. The person who wrote it actually understands what I want out of
an example: I want code I can look at and immediately understand what is going
on and why. I want examples I can refer to when mine does not work so I can
compare and see what it is I did wrong. Take the GUI example[1], for instance:
anything that happens that is specific to that example has a comment. It makes
no assumptions about your prior knowledge other than that you understand C++.

[0][http://dlib.net/](http://dlib.net/)
[1][http://dlib.net/gui_api_ex.cpp.html](http://dlib.net/gui_api_ex.cpp.html)

------
rkwasny
Redis, the most cleanly written and easily extensible code in C you can find.

------
nickpsecurity
OpenBSD for correctness and avoiding bloat. One of them told me MuPDF was
cleanly coded, too. Rare for PDF readers.

------
cdvonstinkpot
Vyatta's firewall distribution had some documentation which struck me as being
remarkably well-written back in the day. Usage appeared to be well thought
out. Don't know if their code is nice or whatever but if other aspects are any
indication, I'd imagine it too is well done.

------
CoolGuySteve
Quake 2 and Quake 3.

So far, it's the cleanest code I've ever worked with while still being very
self-contained.

------
shanemhansen
I really like reading the go std lib and runtime source.

------
adamnemecek
RxSwift
[https://github.com/ReactiveX/RxSwift](https://github.com/ReactiveX/RxSwift)
is gorgeous. Cycle.js and RxJS also. Chromium + LLVM also (minus the
x-platform parts but those suck everywhere).

------
danielvf
Redis. SQLite.

~~~
Zikes
Second Redis. I don't even know C yet I find it surprisingly easy to follow.

------
xyzzy_plugh
I've always found the git source a pleasure to read.

------
PretzelFisch
Can you really learn from just reading source code? It seems like you need an
annotated guide to understand why this was done along with the how.

~~~
inapis
Somewhat. I just dug through Laravel's source code and the comments helped.

Having an annotated guide for each software would be difficult but all of us
have to start somewhere.

------
pksadiq
If anyone cares about GTK+ 3 and C, I would recommend gnome-recipes[0]. It is
a well written and smaller codebase (it is still in active development, so not
yet feature complete).

It should be helpful for learning object-oriented design using C and GObject.

[0] [https://wiki.gnome.org/Apps/Recipes](https://wiki.gnome.org/Apps/Recipes)

------
sdfiogjijd
* PostgreSQL

* Varnish Cache

* qmail

* Mercury Programming Language

~~~
greenleafjacob
I'm only familiar with varnish 2.1, but as to that version I think it's a bit
of a stretch to say varnish is well written. VCL is very complicated - just
check the request flow diagram [1]. Some of the documentation is very poor -
try to find out the properties available on beresp for example (you have to
grep the source code [2]), or try to understand the precise function and
implications of grace mode, saint mode, or hit_for_pass. The best redeeming
quality is varnishtest and some of the other tools that are provided.

[1] [http://book.varnish-software.com/3.0/_images/request.png](http://book.varnish-software.com/3.0/_images/request.png)

[2] [https://github.com/varnishcache/varnish-cache/blob/2.1/lib/l...](https://github.com/varnishcache/varnish-cache/blob/2.1/lib/libvcl/vcc_obj.c)

------
mrmondo
PostgreSQL, Nginx, fio, SublimeText (3), nmap, libcurl (and curl itself),
ffmpeg (parts are also in asm), rsync, XLD, and the list goes on...

Watch out for biases based on how much people like the end product vs how well
it's actually implemented though.

------
thiht
Lua's source code is very nice to read, even if you're not a C guy

------
chromanoid
[http://netty.io](http://netty.io)

[http://infinispan.org](http://infinispan.org)

------
vandyswa
[https://github.com/vandys/vsta](https://github.com/vandys/vsta)

Especially the kernel in src/os

------
ptrptr
Answering the question - that would be Blender.

------
informatimago
Postfix is a good example of a system written in C with separate components
(running in different processes for security).

------
rmu09
[http://www.jclark.com/sp/](http://www.jclark.com/sp/)

------
robertcope
I always thought Postfix was nice. Maybe I'm wrong as I haven't seen it
mentioned.

------
deepnotderp
TensorFlow.

------
juancn
LLVM is a fantastic example of well written C++ code.

------
bobosha
Apache Solr and Tomcat

~~~
guard-of-terra
Solr source code is a mess and sometimes worse than that. Test coverage is
pretty good tho.

------
c_shu
Boost

------
gaze
Newos

------
rokosbasilisk
django

------
throwawaydbfif
Google's Guava library, at least the parts I've seen, is incredibly well
written and organized.

I assume most "standard library" type stuff is where you will find the
cleanest code.

~~~
euyyn
Digging into the code reviews of Guava is impressive: if you've ever felt a
code reviewer was being too strict, that's probably nothing compared to Guava
reviews. And it shows in the quality of the library.

~~~
throwawaydbfif
I implemented a data structure similar to one in Guava and I thought my code
was pretty good. I looked at the Guava source out of curiosity and immediately
refactored my data structure.

I ended up refactoring it again, and the code is still not as clear as Guava's.

For instance, I noticed they were using an enum for functions and I was like
WTF who does that? Later I decided to make my library serializable so we can
save to disk. Well, turns out that's exactly why they used an enum. My
solution was to make a utility class to wrap the non-serializable objects, but
their solution was much clearer and less code.
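
To show the idea rather than Guava's actual Java code, here is a loose Python analogue: the pluggable functions live in an enum, so saving the structure only has to record a member name, and loading looks it up again. `KeyFunction`, `Index` and the JSON layout are all hypothetical names invented for this sketch.

```python
import json
from enum import Enum


class KeyFunction(Enum):
    """A closed set of named, stateless functions."""
    IDENTITY = "identity"
    LOWERCASE = "lowercase"

    def __call__(self, value):
        if self is KeyFunction.LOWERCASE:
            return value.lower()
        return value


class Index:
    """Toy data structure whose behaviour depends on a pluggable function."""

    def __init__(self, key_function, items=()):
        self.key_function = key_function
        self.items = {}
        for item in items:
            self.add(item)

    def add(self, item):
        self.items[self.key_function(item)] = item

    def dumps(self):
        # Only the member's name is written; no code object is serialized.
        return json.dumps({"key_function": self.key_function.name,
                           "items": list(self.items.values())})

    @classmethod
    def loads(cls, text):
        data = json.loads(text)
        return cls(KeyFunction[data["key_function"]], data["items"])


index = Index(KeyFunction.LOWERCASE, ["Foo", "BAR"])
restored = Index.loads(index.dumps())
print(restored.key_function, restored.items)
```

An arbitrary lambda has no stable name to write to disk, which is roughly the problem my wrapper class was working around.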

------
kodfodrasz
I have used the Mono source code as a go-to reference for poorly documented
.NET Framework code, because it was usually clean, quality code. Some quick
comparisons with the CoreCLR and .NET reference sources also supported my
impression. (A lot of code is being merged from Mono now.)

------
apeacox
A *BSD OS

~~~
bch
For example, NetBSD, which also had a book[0] written about it.

[0]
[http://www.spinellis.gr/codereading/](http://www.spinellis.gr/codereading/)

