Ask HN: What open source project, in your opinion, has the highest code quality?
458 points by chefqual on Sept 21, 2018 | 286 comments

I hold the source code of the Go standard library & base distribution (i.e. compiler, etc.) in very high regard. The standard library especially is, in my opinion, stunningly easy to read, explore and understand, while at the same time being well thought through, easy to use (great and astonishingly well documented APIs), very performant, and covered by huge amounts of (also very readable!) tests. The compiler (including the runtime library) is noticeably harder to read and understand (especially because of sparse comments and somewhat idiosyncratic naming conventions; that's partly explained by it being constantly in flux). But still doable for a human being, and I guess probably significantly easier than most modern compilers. (Though I'd love to be proven wrong on this account!)

At the same time, the apparent simplicity should not be mistaken for lack of effort; on the contrary, I feel every line oozes with purpose, practicality, and to-the-point-ness, like a well-sharpened knife, or a great piece of art where perfection is reached not when there is nothing more to add, but when there is nothing left to remove.

This is one of the great things about many Go libraries; the language is so simple that it's difficult to overcomplicate a Go project. This makes reading any Go source code (projects, libraries, the stdlib) a joy. The only times I've found Go libraries to be a PITA to read is when they are autogenerated from some other language (protobuf compilations, parts of the compiler that came from C, AWS/GoogleCloud/Azure libraries, etc.), but that's to be expected in every language.

Kubernetes is another great example of a project that is so unbelievably complex in its function that it should be completely impenetrable to anyone who isn't a language expert. But go check it out; it's certainly complex and huge, but actually grokable.

I would argue that while Kubernetes is a great piece of software, and it's definitely practical to go in with relatively little experience and tweak a single line or function, Kubernetes is not easy to grok or reproduce in its entirety; for example, it has its own implementation of generics and a custom type system [1].

[1]: https://medium.com/@arschles/go-experience-report-generics-i...

I'd agree, but only as far as aesthetics go. When you have to understand the time complexity and runtime characteristics of the standard library sorting algorithms, I think Go does a very bad job - the standard `sort.Sort(data sort.Interface)` will run poorly if the data is already mostly sorted. I expect these kinds of things to be documented properly.

Golang's `sort.Sort(data sort.Interface)` will sort mostly-sorted data in nearly its fastest possible time, because it basically uses median-of-three quicksort, falling back to insertion sort for small partitions. Median-of-three on sorted or nearly-sorted data picks the optimal or nearly optimal partitioning element for quicksort. The code is simple, readable, and well-commented. Moreover, its average and worst-case complexity is documented in the godoc.

In short, your comment is wrong from beginning to end. What led you to believe that anything in it was true?

It's guaranteed to run in O(n log n) time. Currently it uses quicksort with heapsort as a fallback to prevent quicksort's quadratic worst-case time.


I was pretty certain most libraries shuffle and then quicksort. No need for documentation. Does Go not do this?

JVM and Python use timsort, which is O(N) on mostly sorted data of all kinds.

timsort is full of win, and such a clever approach.

I'm sorry, but Timsort is a bit of a hack. It's a "this seems to work well" algorithm, and it shows. It took 13 years until its claimed running time was finally proven in 2015. The four (originally three) rules for merging sequences from the stack are rather arbitrary. Multiple issues were found well after it was already widely deployed.

Recently, it was also shown that Timsort doesn't optimally use the information it has about runs. As an alternative, powersort was proposed, which seems to outperform Timsort both on randomly ordered inputs as well as inputs with long runs: https://arxiv.org/pdf/1805.04154.pdf

Totally! And there are much better algorithms for the internet routing protocols, very well documented and tested and all, still just sitting on desks under paperweights...

> a bit of a hack

Isn't that most of computing?

Ever heard of Timsort?

People were still finding bugs in common implementations of timsort as of 3 years ago. It's not unreasonable to stick with a somewhat more conservative choice for a core library function until there's more reason to have confidence in the implementations of timsort.

This comment got me looking into timsort bugs. This was a really interesting read: http://envisage-project.eu/proving-android-java-and-python-s...

You know what, I'm not surprised at all.

Didn't one of the simplest algorithms, binary search, suffer from a bug in a standard library (was it Java?) a few years ago? IIRC it was a corner case; I should check it because I don't recall the details, but it looked like robust code.

Edit: I think it's this one https://ai.googleblog.com/2006/06/extra-extra-read-all-about...

If I recall correctly, the bug only applied for arrays/lists of length greater than about 2^30 (over a billion elements). Not something that anyone ever encountered in the wild, because current hardware doesn't have enough memory.

Edit: The article linked in the other comment says the Java dev team didn't even bother to implement the "proper" fix, but merely adjusted how much space is allocated.

Timsort always reminds me of monkeysort ( http://leonid-shevtsov.github.io/monkeysort/ )

Being pretty certain isn't the same as being certain. I really don't care what most libraries do as long as they document what exactly they have chosen to do. Go's `sort.Sort(data Interface)` definitely does not shuffle.

> standard library is, in my opinion, stunningly easy to read

Reading this brought to mind the JDK. All well structured, neatly formatted and well documented. I’ll often just click thru to the source to get the nitty-gritty on a function, I rarely need to consult the actual docs!

Not the C++ STL. Very obscure.

Some would consider that a virtue

How is hard-to-read code a virtue?

The fact that almost everyone uses the same style and standards through the go tools has made learning easy. I can dip into the most advanced package and make sense of what's going on quickly.

I think one of the things that makes Go library code so easy to read is the lack of generics. Everything you need to understand the code is right in front of you, without the barrier of having to learn new sets of complicated abstractions or worrying that some obscure code in some other file impacts/is invisibly called by the function. With large code bases written in other programming languages, I have to spend an inordinate amount of time studying the code base and object relations before making changes.

For me, code readability is such a high value that, on these grounds alone, I oppose the introduction of generics and hope the current proposals ultimately fail.

As far as I know, parts of the compiler are still code automatically translated from C, so this may be part of the reason.

It is not; the compiler has been pure Go since, I think, v1.4.

That doesn't invalidate the parent comment :) The code is pure Go, but parts of it originate in C by means of automatic translation during development of Go 1.4 (or whichever version it was).

Exactly, here's the tool that's been used: https://github.com/rsc/c2go


and for this reason alone!


    As of version 3.23.0 (2018-04-02), the SQLite library consists of approximately 
    128.9 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in 
    other words, lines of code excluding blank lines and comments.) 

    By comparison, the project has 711 times as much test code and test scripts - 
    91772.0 KSLOC.

Automated testing is useful and good. But I really feel it's reached a level of fetishisation that is quite concerning.

Testing code is code which needs to be written, read, maintained, refactored. Very often nowadays I have to wade through tests which test nothing useful, except syntax. Even worse, with developers who adopt the mock-everything approach, I often find tests which only verify that the implementation is exactly the one they wrote, which is even worse: it makes refactoring a pain, because, even if you rewrote a method in a better way which produces exactly the results you wanted, the test will fail.

So, the ratio of testing code vs implementation code is a completely wrong proxy for code quality.

EDIT: I'm not criticising SQLite and their code quality - which I never studied - but the idea that you can judge code quality for a project just by the ratio of test code vs implementation code.

They actually have to test to that degree to follow aviation standards (DO-178b [0]) because they're used in aviation equipment.

Dr. Hipp said he started really following it when Android came out and included SQLite and suddenly there were 200M mobile SQLite users finding edge cases: https://youtu.be/Jib2AmRb_rk?t=3413

Lightly edited transcript here:

> It made a huge difference. That that was when Android was just kicking off. In fact Android might not have been publicly announced, but we had been called in to help with getting Android going with SQLite. [Actually], they had been publicly announced and there were a bunch of Android phones out and we were getting flooded with problems coming in from Android.

> I mean it worked great in the lab it worked great in all the testing and then [...] you give it to 200 million people and let them start clicking on their phone all day and suddenly bugs come up. And this is a big problem for us.

> So I started doing following this DO-178b process and it took a good solid year to get us there. Good solid year of 12 hour days, six days a week, I mean we really really pushed but we got it there. And you know, once we got SQLite to the point where it was at that DO-178b level, standard, we still get bugs but you know they're very manageable. They're infrequent and they don't affect nearly as many people.

> So it's been a huge huge thing. If you're writing an application, you know, a website, DO-178B is way overkill, okay? It's just very expensive and very time-consuming. But if you're running an infrastructure thing like SQL[ite], it's the only way to do it.

[0]: https://youtu.be/Jib2AmRb_rk?t=677 "SQLite: The Database at the Edge of the Network with Dr. Richard Hipp"

SQLite is very high quality software, but they use a DO-178B-"inspired" testing process. As far as I know they don't have a version of the software that is or can be used in safety-critical systems, despite their boasting.

They say in their site that:

> Airbus confirms that SQLite is being used in the flight software for the A350 XWB family of aircraft.

Flight software does not imply safety critical parts of avionics. It can be the entertainment system or some logging that is not critical.

Correct. The key word is "inspired". Multiple companies have run a DO-178B cert on SQLite, I am told, but the core developers did not get to participate, and I think the result was level-C or -D.

While all that was happening 10+ years ago, I learned about DO-178B. I have a copy of the DO-178B spec within arms reach. And I found that, unlike most other "quality" standards I have encountered, DO-178B is actually useful for improving quality.

I originally developed the TH3 test suite for SQLite with the idea that I could sell it to companies interested in using SQLite in safety-critical applications, and thereby help pay for the open-source side of SQLite. That plan didn't work out as nobody ever bought it. But TH3 and the discipline of 100% MC/DC testing was and continues to be enormously helpful in keeping bugs out of SQLite, and so TH3 and all the other DO-178B-inspired testing and refactoring of SQLite has turned out to be well worth the thousands of hours of effort invested.

The SQLite project is not 100% DO-178B compliant. We have gotten slack on some of the more mundane paperwork aspects. Also, we aggressively optimize the SQLite code base for performance, whereas in a real safety-critical application the focus would be on extreme simplicity at the cost of reduced performance.

However, if some company does call us tomorrow and says that they want to purchase a complete set of DO-178B/C Level-A certification artifacts from us, I think we could deliver that with a few months of focused effort.

I just bought a copy of DO-178C after reading these posts here and the Wikipedia article on it. $290, but if it's good, it should be worth it, right?

I haven't seen -C, only DO-178B, though I'm told there isn't much difference. It is not a page-turner. It took me about a year to really understand it.

Yeah, DO-178B gives several assurance levels for software, from DAL A (highest) to DAL E (lowest). If DAL A software fails, the results are catastrophic; if DAL E software fails, there is no effect on the aircraft. Since DAL E is usually just test equipment and such, they might be at a DAL D level. So it still requires a lot of testing, but not nearly to the level that DAL A requires.


I think it's possible that parts of SQLite, for example the file format in read-only mode and a few fixed queries, are certified as part of some safety-critical software.

Hipp's Hwaci consulting company would probably help to do the work, but it has no relation to SQLite as a library.

Good point. The video I linked to merely says that he was contacted by someone in the aviation space about the standard, which I took to mean that it was used in avionics.

"They actually have to test to that degree to follow aviation standards..."

I didn't know that, and that's very cool.

Makes me think that the name SQLite is a misnomer.

While I agree in general, I disagree here. If you read about the SQLite tests you will find that they do test sensibly.

One suite I'm particularly impressed with runs the tests starting from zero bytes of available memory, slowly increasing it until the program passes. The tests verify that at no point is the DB corrupted by an OOM event.

Just to clarify, I wasn't criticising sqlite, I was criticising the idea of judging their code quality "for this reason alone!" - ie that they have so much test code vs implementation.

Sure! Can't judge a book by its size.

As a heuristic the code versus test-code ratio serves well as an indicator of quality. Just like consistent indentation does. You don't know whether a well-indented program is good. But if the indentation is inconsistent you'll expect worse.

You may not be able to judge code quality by the presence of test code, but you can by its absence.

Agreed on the testing fetish; the tests usually aren't valuable. Uncle Bob wrote about this last year and it's something I wish everyone would read: https://blog.cleancoder.com/uncle-bob/2017/10/03/TestContrav...

Yes, bad tests are bad. Yes, mocks are bad. Good tests, however, are good.

To expound on that, designing for testability allows you to sidestep the need for mocks almost entirely, and forces you into easier, more reliable and more consistent code. Then when you choose to test it, the tests are simple, straightforward and valuable.

Oh boy, I would give your comment an infinite number of up-votes. Yes, testing has reached fetish-like levels.

Some of the test code I've encountered recently has been more voluminous, complex and has taken more man hours to develop and maintain than the application or library it's assigned to.

For the love of God, develop the damned software! It's either going to work or it's not.

> has taken more man hours to develop and maintain than the application

But that is perfectly normal when developing quality code.

There is no rule that says that test code development should take LESS time.

Certainly different applications have different quality requirements. Perhaps the software you are developing doesn't have that high requirements?

Yeah, yeah. Just venting. The products I work on aren't the most important, but they certainly are quite important and most of the testing infrastructure that has been built thus far has a lot of goofy sh@t in it.

I just don't have a high tolerance for needless complexity and gee-whiz-look-what-I-can-do while the clock is running.

And this:


"SQLite can be configured so that, subject to certain usage constraints detailed below, it is guaranteed to never fail a memory allocation or fragment the heap."

A better comparison would be some sort of defect rate. Does SQLite manage fewer defects per line of code per month (or whatever) than PostgreSQL, with that test suite?

Is there a distinction between the best codebase and the best test suite? Probably.

Tangent: I always thought SLOC meant "significant lines of code". Since "code" is shorthand for "source code", expanding SLOC as "source lines of [source] code" makes little sense, IMO.

"Source lines" includes lines with comments, empty lines and code. Source lines of code... Specifies only lines with code, no comments

I always thought it meant "single lines of code", like if line breaks added for formatting had been removed so every statement had its own line.

And all that is public domain: https://www.sqlite.org/copyright.html

Not all of the test suite is public domain. That's how the authors maintain a monopoly on improvements to the code.

As a point of information, I wasn't using monopoly in a particularly pejorative sense, but more just to explain the situation. Feel free to replace with a less loaded word in your head.

That doesn't mean their code is necessarily nice.

So you measure good code by the number of test lines?

No but it could be a good proxy.

It's like code coverage, though. Zero might indicate a problem, but lots doesn't necessarily mean "good".

Inversely, this is counteracted by the logic to not use Git:


But to be honest, this is mostly fair criticism. Not sure why they need some of those things, but if they need it, then it's fine.

It's explained in more depth here: https://fossil-scm.org/fossil/doc/trunk/www/fossil-v-git.wik...

Essentially, git is designed for the "bazaar" development model, while Fossil is designed for the "cathedral" one, which is what SQLite uses, being developed by just three guys working very closely together.

This is not the reason. The real reasons are explained quite nicely here [0], and I agree with all of them - though I still use Git because of network effects. My alternative of choice would be hg.

[0] https://sqlite.org/whynotgit.html

The most interesting thing (IMO) left out of this page is the fact that D. Richard Hipp is the author of Fossil. This has a particular smell factor to it, not sure if it is a good or bad smell.

how would using a specific vcs improve the code quality?

This was a response to test coverage used as a benchmark. I see them both as an indirect project quality benchmark. Neither is directly representing the actual code quality but does indicate if it is a project that matches my view of a mature project. Sure, others will disagree - but the whole idea of quality in this context is subjective anyway.

There's a lot to admire in OpenBSD but sometimes the peripheral tools they deliver don't work well. The example I know best is OpenNTPD (last link in your comment). It has a bunch of problems, including relatively poor clock discipline compared to other NTP implementations. And it doesn't even try to handle leap seconds. That causes problems on the machine itself which may or may not matter to you, but it's catastrophic if that OpenNTPD serves time to other servers. Unfortunately there's a bunch of OpenNTPD servers in the NTP Pool actively providing bad time. Some details from the 2016 leap second: https://community.ntppool.org/t/leap-second-2017-status/59/1...

Again I mostly admire OpenBSD. But OpenNTPD is not the best example of their work.

Have you had a chance to look at systemd-timesyncd? I'm curious how it stacks up as a NTP client. Was thinking about switching my systems to it (from chrony) as I don't need the NTP server functionality.

This is seriously bad, but I do get why they thought OpenNTPd was necessary (bad/perfectible code in the other implementations).

Maybe I'll check that at home (where I replaced FreeBSD's ntpd with OpenNTPd).

Yeah the stock old ntpd had a lot of unused code and various security problems over the years. It makes sense OpenBSD would replace it. Just a shame they didn't do it completely. I think that describes a lot of OpenBSD tools; you're trading off some functionality for very good security.

There are better NTP implementations now. Chrony is great, it's the default in Ubuntu now. NTPsec is coming along although I haven't tried to use it myself. Also good ol' ntpd is greatly improved.

https://chrony.tuxfamily.org/ https://www.ntpsec.org/

Just wanted to +1 this.

Once had to make some changes to OpenSSH for an internal project and it was surprisingly easy to find the relevant code and make the necessary changes. One of the few times my code worked on the first compile.

For sure not. OpenBSD makes no attempt at proper performance, which is critical for a kernel. There are so many naive ad-hoc data structures and algorithms that it's a shame to walk through.

Meh. Given the recent Spectre, Meltdown, etc. maybe we shouldn't focus on performance first, but security.




OpenBSD's niche is security -- that's the point.

How is performance related to code quality? That makes no sense. If anything, if you had to inline ASM, for example, the code would suffer in readability.

Using comma-separated string options (which have to be split) instead of normal OR'ed-together bit flags in their public API loses all credibility in their engineering abilities. It would not survive any professional code review.

Shouldn’t good performance be one of the goals of good code?

Depends on the design goals. If you want secure code, you'll make it readable. Here's true(1):

A single "noop" in a 755 file.

A C true would be: https://cvsweb.openbsd.org/src/usr.bin/true/true.c?rev=1.1&c...

Here's a much faster true(1) if you need it: https://github.com/coreutils/coreutils/blob/master/src/true....

Is the gnu-coreutils true much faster? I have a hard time seeing how return 0 can be so slow.

I did say “one of the goals”.

I don’t see how those examples are relevant. Why would that last one be faster?

I agree that the OpenBSD code here is good, no more and no less than needed.

I assumed the grandparent was referring to cases where an O(n) algorithm is used where it might be O(log n) or O(1) with just a little more effort. It’s a tradeoff, sure, and in some cases linear searches can work surprisingly well, but in general I think this kind of thing should always be considered in good code.

Micro-optimizations like inline assembly for inner loops may or may not be a good idea, depending on the application. All else being equal, I’d certainly agree that good clean code would not use assembly.

How is the coreutils true faster?

I would expect the openbsd true to be the fastest, it doesn't need to spawn a subshell and it doesn't do more than the posix specification requires (afaik --help/--version should be ignored).

Experience shows it's faster. It's just weird, but it's like that.

  time { for i in $(seq 1 10000); do /path/to/true; done; }

What are you comparing against what there? Two C executables with the same compiler flags on the same OS?


Why? I was able to do substantial changes to the kernel when I was a teenager (late 90s), mostly on my first try. There was no giant wall of abstraction I had to climb over or some huge swath of mutually interacting code I had to comprehend. There was also nothing that required fancy code navigation and the creation of something like the ctags database in order to find out what on earth was happening.

No action at a distance or lasagna style dereferencing or mysterious type names that are just typedef'd and #define'd around dozens of times back to something basic like char. No fancy obscure GNU preprocessor extensions or exotic programming patterns.

Nothing had obtuse documentation that tried my patience or required much more than enthusiasm and basic C knowledge.

I did things like getting a wireless card working from code written for one with a similar chipset, and getting various other things, like the IrDA transmitter on my laptop at the time, to do a slattach and thus work as a primitive wireless network, all in the late 90s.

I likely had no idea what, say, the difference between network byte order and host byte order was at the time or how the 802.11b protocol worked or what a radiotap header was or any of that. The separation of concerns was so good however, that none of that knowledge was actually needed.

Compare that to say, the Qualcomm compatible WWAN I just dealt with over the past few weeks where I needed to have in-depth knowledge of an exhaustive number of things (very specific chipset and network details) to get a basic ipv4 address working. Then I needed to read up on GNSS technology and NMEA data to debug codes over USBmon to get the GPS from the wwan working. Then after I had the qmi kernel modules doing what I wanted and the qmi userland toolsets, I had to write some python scripts to talk to dbus to get the data from the modemmanager that I needed in order to log the GPS. All the maintainers of these pieces were very nice and helpful and I have nothing negative to say. This is just how it usually is these days.

Back then, however, I wasn't a good programmer; I was likely pretty terrible, in fact. But with the NetBSD codebase I was able to knock out whatever I wanted every time, fast, on a 486.

I miss those days.

> No action at a distance or lasagna style dereferencing or mysterious type names that are just typedef'd and #define'd around dozens of times back to something basic like char. No fancy obscure GNU preprocessor extensions or exotic programming patterns.

Ah, I see you've also looked at the Linux kernel code.

What's your relation with it nowadays? I'm very curious about NetBSD but never tried it yet. I sincerely wonder what's your opinion on it now, and why you speak about the situation only as "those days" now? :)

I have no idea, haven't kept up with it. I'd recommend 1.x (<=4) any day though, simply for the education alone.

I don't really use it these days because I need systems that future cheap devs can maintain and once you enter userland it takes commitment and time I simply don't have to stay with netbsd.

Debian permits me to usually not have to care and that's pretty invaluable

It's older than some HN posters, but the GPLed DOOM source code was one I liked.

The performance reached by the game was considered impossible until Carmack showed us otherwise. So I expected lots of ASM and weird hacks, especially as compiler optimization wasn't as good as it is today.

Surprise, surprise: the thing was easy to read, easy to get going, easy to port, and reasonably documented. It showed me what a good balance between nice code and usable code looks like.

If you want to browse: https://github.com/id-Software/DOOM

FYI, "Surprise, surprise" is generally meant sarcastically. I think you mean "Surprisingly".


  I can speak English!  I learn it from a book!

Though not always very good. Thanks for the bugfix.

Actually, Hypeman is using this correctly, in my opinion. We all knew the code would be good; it was only him that doubted it. So it's not surprising to us, the readers, that the code is good.

You are reading cleaned up source code that only compiles and runs on Linux. That's why it looks nice.

  Many thanks to Bernd Kreimeier for taking the time to clean up the
  project and make sure that it actually works.  Projects tends to rot if
  you leave it alone for a few years, and it takes effort for someone to
  deal with it again.

I didn't have internet at the time, so I didn't check GitHub 20 years ago ;-)

On the more serious side, I wanted to say something about the TODOs as an example of the balance, but couldn't find any. I thought I was confusing it with Quake, but the cleanup might explain it better.

These 666 forks can't be a coincidence, right?

It seems to be, judging by the fact that it's now at 667 forks.



The DOOM code is so straightforward. You don't ever experience that feeling of having zero understanding of the code when you look into a file.

This is not necessarily about the code, but I've been really impressed for a while by the lodash project and its maintainer's dedication to constantly keep the number of open issues at 0. Any issues get dealt with at record speed, it's quite a sight to see.


JDD, the maintainer, is also incredibly devoted and overall a nice guy to talk to. He has something like 5 years (and counting) of making a commit every single day, including weekends and holidays and sick days. They may not always be world-changing commits, but it still shows an incredible amount of dedication

Not necessarily relevant, but 15% of the issues are labeled "wontfix".

With such a big project, being quick to hand out wontfix isn't necessarily a bad thing. To be honest, seeing as this project is used by a huge part of the… rather diverse JS crowd, 15% wontfix is astoundingly low.

As an open source maintainer myself, that seems like a pretty low percentage.

It's not always a good thing. In the haste of fixing things introspection of root causes may be neglected...

Never noticed this.

Strictly talking about code quality, I will nominate RCP100, which is a small, virtually unknown, now-abandoned routing software written in C [0]. I started programming with C way back in the 90s, and this is one of only two projects I can recall being immediately struck by the beauty of the code (Redis being the other). I know almost nothing about the author but he seems not to want to be known by name. You can browse the source on Github [1], which I uploaded myself, since you can only get a tarball from sourceforge. Anyway, as someone else mentions, C is usually a mess, but RCP100 struck me as beautiful.

[0] http://rcp100.sourceforge.net/

[1] https://github.com/curtiszimmerman/rcp100

Hi Curtiz,

Thanks for uploading RCP100. Your comment is a timely one. I wanted to learn how a router works and is built and was looking for a simpler implementation.

Can you recommend any resources from which I could learn more about network programming, so that I could understand RCP100 code better?


Maybe you just send the guys an email ;)

I actually did send fan mail to the author, heh, thinly-disguised as a courtesy to let them know that I mirrored their project on Github.

I'm surprised no one has mentioned the Linux Kernel!


It is quite clean when you consider the task that it accomplishes.

Being able to compile across multiple architectures/endian-ness,32/64-bit/scale up/down from server/desktop/router/phone while accepting contributions from thousands of people..

It's not clean at all. Thousands of different styles, no single convention on function-naming, etc.

Want a clean kernel, go look at the BSDs.

One thing the Linux kernel has going for it is that there are a lot of books that describe how the various parts work and how to use the various internal interfaces. I can't think of any other open source project that has multiple books on how to contribute.

(Sadly, most of those good kernel books were written in the 90's and early 2000's. I don't know if there are any recent kernel hacking books.)

Going to throw Elixir Lang into the mix.

- The tooling is excellent.

- The code is well-documented and readable.

- The core team committed to never needing to introduce breaking changes.

The Elixir community tends to produce work that is actually considered "Done". An elixir package is not stale when it hasn't seen a commit in a few months. Instead, the feeling is: "It's feature complete and only needs maintenance from here on out."


> The core team committed to never needing to introduce breaking changes.

Is this why Elixir seems to have many different ways of doing the same thing though?

I think that's one reason. The other is that classic Erlang (Elixir is built on top of the Erlang BEAM VM) sometimes does things one way, but Elixir has a more elegant way of doing the same thing. However, in Elixir you can still call into Erlang libraries to achieve the same thing if that's more familiar to you.

Scikit Learn comes to mind - not just because I can dig into the source code and immediately know what's happening, but also for the stellar documentation that goes above and beyond telling your what the functions do.

For example their Cross Validation documentation is amazing:
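The cross-validation idea those docs explain so well can be sketched in a few lines of plain Python. This is only a toy illustration of the k-fold concept; scikit-learn's actual KFold and cross_val_score add shuffling, stratification, scoring, and much more on top:

```python
# Minimal sketch of k-fold splitting, the core idea behind
# scikit-learn's cross-validation docs. Each sample lands in
# exactly one test fold; the rest form the training set.
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

splits = list(k_fold_indices(10, 5))
# Every sample appears in exactly one test fold.
assert sorted(i for _, test in splits for i in test) == list(range(10))
```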


When we take the language into consideration, I would like to mention Redis.

Often codebases written in C are a mess to understand and a mess to read. The Redis source code is understandable even without deep knowledge of C.

Yep. I was going to say Redis and SQLite. Both are really well commented. They almost read like a manual.

Although still in beta, I'd like to add BearSSL to the mix of well written and documented C libraries, in particular compared to the OpenSSL "documentation". It's also nice to see a TLS implementation without any memory allocations at all.

> Redis Source Code is understandable even without deep knowledge of C

Came here to say exactly this - Redis is very cleanly written.

I'd prefer if people said why they consider the code good, instead of throwing out a bunch of random projects.

Python: I really like requests, scikit-learn, the pathlib module from the standard library, Keras, Django.

C: Redis, SQLite, Lua.

Java: Joda Time, Guava
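Among the Python picks above, the standard library's pathlib (the Path module mentioned) is a nice example of API design worth a small taste. The paths and file names here are purely illustrative:

```python
from pathlib import Path

# pathlib composes paths with the '/' operator and exposes parts
# as attributes, which is a big part of why code using it reads
# so cleanly compared to os.path string juggling.
p = Path("/tmp") / "project" / "notes.txt"
assert p.name == "notes.txt"
assert p.suffix == ".txt"
assert p.parent == Path("/tmp/project")
assert p.with_suffix(".md").name == "notes.md"
```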

+1 for requests, I forgot about that one but it's quite good.

Joda Time is one of my all time favourite libraries.

After struggling with JVM stdlib time nonsense, JodaTime was a breath of fresh air and actually made programming with time fun.

Java 8 time module is now considered the replacement for JodaTime for new projects. It is separate from the older Java time libraries, and fixes many of the problems in Joda. Give it a try!

ARM mbed TLS [1], Amazon S2N [2], nginx [3] have a super consistent code style throughout and are prime examples of how C application programming should be done (in my opinion).

[1] https://github.com/ARMmbed/mbedtls

[2] https://github.com/awslabs/s2n

[3] https://github.com/nginx/nginx

+1 for s2n. It's one of the select few C codebases that is actually a pleasure to read.

There are too many in very different domains and languages.

However, I opt for jQuery here. It is one of the greatest examples of how constant refactoring and thoughtful usage of design patterns gets you a very long way.

If you are designing JavaScript libraries, please have a look at jQuery. So many great design decisions, aka great code quality.

Pushing all dom manipulation through global evals seems like the exact opposite of thoughtful design to me. I have a long list of places where I want to implement strict CSPs, but can’t purely for minor use of jQuery.

Sequel, a database ORM for Ruby: https://github.com/jeremyevans/sequel

The quality of the code is amazing, it's simple to use and even simpler to look through the docs to reason about.

I also want to praise the author of the library (Jeremy Evans), his support through the IRC is second to none, you can talk directly with him pretty much on a daily basis.

And even after 8+ years, the project is still constantly being updated (last commit 4 days ago). I haven't seen too many projects of this calibre, especially when it is run mostly by a single person.

Julia. Julia / Julialang is so pedantically tested and the names are pretty meticulously chosen. The algorithms in Base are almost all generic and handle a very wide variety of inputs without catering to them. If you want to learn Julia, along with good software engineering, looking at the Base library is quite recommended.

Did not look too much into it, but at least a packager from Alpine Linux does not think Julia's compiler ecosystem is clean/easy to work with: http://lists.alpinelinux.org/alpine-devel/6248.html

But as said, I did not really check this claim for validity myself...

Julia requires patched versions of things like LLVM in order for all tests to pass, because upstreaming bugfixes takes time. This has given some Linux package managers an issue, since they try to build using the system LLVM/OpenBLAS/etc. with the known bugs. I agree this does cause some distribution problems, but as a scientist and mathematician I do like that the standard distribution of Julia uses the most numerically correct versions (as of current knowledge) of the dependencies that it can, and has tests to identify known potential issues. To me this is good practice.

But anyways, I was talking about the Julia Base library and its numerical routines. I just look at the Julia code and don't touch the build systems.

Does assembly count? Prince of Persia's source code (not really open source...): https://github.com/jmechner/Prince-of-Persia-Apple-II

One look at any of the assembly files and you can get a sense of how properly organized the source code is.

Thanks for that! I love looking at historical game code.

I'm a bit surprised nobody mentioned qmail yet: https://cr.yp.to/qmail.html

qmail cheats a bit because it's so simple that most people end up using something with messy code on top. Not that I think it's an unsound engineering decision, but when comparing its code cleanliness with other SMTP stacks, this needs to be mentioned.

Or djbdns!

djb is a legend

I don’t know about qmail, but postfix sources are really nice.

They're better than most C software of the era, but not better than qmail --- qmail has a better vulnerability record than Postfix does (perhaps because it does less, but that's beside the point).

Granted I haven't read much open source code but when I was working in Flask, I found the source code to be awesomely clear and well-documented. I actually learned quite a bit about Python by reading Flask code. Also, no-one could explain "g" in a way that made sense, but the source code made it obvious. Would recommend reading it if you're into Python at all.


The global request object if I recall properly.
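For anyone who hasn't read that source yet: `g` is roughly a per-context scratch namespace backed by a context local, so each request sees its own attributes. A rough, hypothetical sketch of the idea follows; Flask's real implementation builds on Werkzeug's context locals, while this toy uses a plain threading.local and treats each thread as a "request":

```python
import threading

class RequestGlobals:
    """Toy version of Flask's `g`: attribute storage isolated
    per thread, mimicking per-request isolation (NOT Flask's
    actual implementation)."""
    def __init__(self):
        self._local = threading.local()

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for user attributes.
        try:
            return getattr(self._local, name)
        except AttributeError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        if name == "_local":
            super().__setattr__(name, value)
        else:
            setattr(self._local, name, value)  # thread-specific storage

g = RequestGlobals()
g.user = "alice"  # visible only in this thread

seen = []
def handler():
    # A different "request" (thread) gets a clean namespace.
    seen.append(hasattr(g, "user"))

t = threading.Thread(target=handler)
t.start(); t.join()
assert g.user == "alice"  # unchanged in the original thread
assert seen == [False]    # the other thread never saw it
```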

I'd like to suggest:

1. Don't simply list projects.

2. Give some notion of why you're nominating code.

3. A sense of what you consider to be quality.

Enough to spark discussion, inquiry, or comparison. Doesn't have to be much.

This is rudimentary, but affords purchase: https://news.ycombinator.com/item?id=18037815

This does not: https://news.ycombinator.com/item?id=18038047

(Both reference the same project.)

Both Vue and PostgreSQL. Both have great codebases, amazing documentation, and amazing communities.

Nice to see you include / mention docs and community. I believe a code-based product has a UX. That UX is the code (with comments), documentation and community. That UX is your (i.e., a dev / engineer) end to end experience with "the product." It's not simply the code.

Put another way, there's more to a product that's easy and sensible to work with than code quality.

I'm surprised nobody cited TeX from Knuth. It's an absolute standard in quality of implementation, documentation and computer science background. Perhaps unsurpassed.

I definitely admired PostgreSQL's code when I first looked at it.

Projects written in C require a fair amount of care and discipline to be scaled up to larger codebases and teams. PostgreSQL is such a codebase.

I've also seen various parts of Spring's codebase and found all of it to be consistently solid and careful. They take a lot of care to structure carefully and comment immaculately.

Disclosure: I work for Pivotal, which sponsors Spring. Which is why Spring is highly visible in my working life.


Even though it’s a fairly complex transpiler, the authors did a good job modularizing and leaving lots of contextual comments on what each part does.

Also typescript baseline tests are a simple but very effective way to get lots of coverage on the compiler.

I’ve read source code for Babel, typescript, coffeescript and flow. Typescript architecture stands out.

Typescript not only does fascinating things like magical code completion abilities and great tooling for IDEs but their codebase has been an inspiration for me to build better front end code.

I may be a bit biased since I’ve worked at Microsoft before.

I found the TypeScript type checker pretty hard to read through, though it may be my lack of, well, almost any knowledge about type theory. I didn't dig much into the other parts of the codebase however. What parts of it do you enjoy reading?

While submitting a PR, I found the parser, lexer and emitter fairly easy to understand.

LLVM and associated projects such as clang. Bazel is good too. OkHttp and Retrofit by Square.

I've been working with LLVM for a few years and I still find the code difficult to navigate and badly documented. And every single function's argument list is a random jumble of pointers and references (almost all arguments should be references, but many aren't).

Indeed. And it's not just medium to low-level stuff that's not well documented, it's the high-level stuff too. I personally don't mind that much if I have to spend a few minutes to understand something on a very local scope, but if the bigger picture is unclear, that's quite bad. For LLVM one largely has to grep for a bunch of other users and try to figure it out from that.

While I think it has some clear deficiencies, I found a lot of e.g. the optimization passes in GCC a lot easier to read. It's probably above par, but e.g. https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple-ssa... is really well explained imo.

LLVM is remarkable; the domain is both difficult and critical. Still, the code is consistent enough that I can often guess how things work based on what I think would be reasonable!

Don't look at how inline assembly is handled between clang and LLVM.

The coding standard for variables in LLVM drives me nuts. Both class names and variable names must be upper camel case, so if you're lucky the code looks like this:

Analyzer TheAnalyzer;

but more commonly:

Analyzer A;

with A being utterly unhelpful to read many lines later.

The requests codebase is really well written and it has a beautiful API.


Tcl! See https://www.tcl.tk/doc/engManual.pdf to start to understand why. (For code written in Tcl itself there's also some proposed conventions: https://www.tcl.tk/doc/styleGuide.pdf)

I really liked the clojure core, I read it quite a lot when learning the language.

I have heard good things about sqlite, and some day, I plan to read it :-)

Dolphin Emulator


We try to keep up, but the truth is that it's a 15-year-old C++ codebase implementing some weird hardware in even weirder ways. We're far from where we'd want to be code-quality-wise: close to no automated testing infrastructure, code full of module-level globals, inconsistent conventions, etc.

How would you even test an emulator except manually? It seems like automated website testing, but even worse. I guess screenshots + scripted input?

That seems like it'd be terrible to try to get running reliably.

I've never installed it, or read the code, but their progress-report writeups are fascinating to me.

e.g. The most recent https://dolphin-emu.org/blog/2018/09/01/dolphin-progress-rep...

The JUCE C++ library is very nice: https://github.com/WeAreROLI/JUCE

I’m no C expert so I’m somewhat guessing, to me, PostgreSQL source looks remarkably clean, well structured and nicely commented.

The Quake 3 source was fairly good...

As a C beginner getting into writing larger projects, especially in that sort of context, the quake source has been my reference on how to structure my code.

Oh, this +1. I ported it to another C dialect (test case for the compiler) and found those parts I touched well structured and easy to understand.

I was going to say the GNU version of /bin/false and /bin/true, but I actually took a look at the source and it is terrible.

The original /bin/true is probably the highest quality code ever written, but unfortunately I don’t think the license is OSI approved: http://trillian.mit.edu/~jc/%3B-)/ATT_Copyright_true.html

Gosh, it seems like copyright lawyers will stop at nothing.

The GNU coding style does not help with readability, in my opinion (he said, donning flame-proof underwear)

Python core libraries have great code. You can open pretty much any module and be able to understand the source without much context.

I don't know how you can say this. The standard lib isn't even very pythonic, let alone "great" along other dimensions.

Agreed. Almost every time I've looked deeply into stdlib code I was surprised by how hard to follow it is and how frequently antipatterns are employed. Doubly so for anything near a C module.

I consider the Python stdlib in a similar vein as the C++ stdlib or Boost: Yes, some useful bits in there, but (1) lots of rot (2) you don't want to have your code look anything like it.

The only core library code I needed to look at was namedtuple, which is pretty incomprehensible even with context.
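For context on why namedtuple reads so strangely: the stdlib historically built the class from a source-code template and exec(), rather than constructing it directly. A simplified sketch of the same idea using dynamic class creation (this is an illustration, not the stdlib's actual code, and it skips _replace, _asdict, keyword arguments, etc.):

```python
from operator import itemgetter

def simple_namedtuple(typename, field_names):
    """Tiny sketch of what collections.namedtuple does. The real
    stdlib version assembles and exec()s source code; this one
    builds an equivalent tuple subclass dynamically."""
    fields = tuple(field_names.split())

    def __new__(cls, *args):
        if len(args) != len(fields):
            raise TypeError(f"expected {len(fields)} arguments")
        return tuple.__new__(cls, args)

    namespace = {"__new__": __new__, "_fields": fields, "__slots__": ()}
    for index, name in enumerate(fields):
        # Each field is a read-only property indexing into the tuple.
        namespace[name] = property(itemgetter(index))
    return type(typename, (tuple,), namespace)

Point = simple_namedtuple("Point", "x y")
p = Point(3, 4)
assert p.x == 3 and p[1] == 4  # works both by name and by index
assert isinstance(p, tuple)
```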

You obviously did not dig in :D There are absolutely terrible parts, would not recommend!

Though core has some bad APIs due to maintaining backwards compatibility, a lot of third-party libraries like requests and Flask have a great focus on API design and code quality.

The authors have good quality repos :



I agree with Flask, much more readable code than Django for example. I would also add Django Rest Framework (and Tom Christie) to the list.

I think that django-rest-framework is one of the best software implementations out there!

Agreed with the rest, I've ended up reading pypy's implementation of some functions sometimes to see how it works after trying CPython first. From the few I've read I'd say pypy looks nice by the way (I'm talking about standard library).

In PHP land where I spend time for work.

Hands down Symfony.

Golang and Kubernetes have been highly regarded as high quality. I particularly found the Golang code for Kubernetes to be well documented and well architected.

Knuth did a good job on TeX, and it has been closely examined for many years since so there are very few bugs.

The TeX language itself, and the logs and error messages of TeX are so bad, that I would hardly believe it.

Are you sure you aren't thinking of LaTeX?

TeX (plain TeX, not LaTeX) has phenomenally good logging and error messages IMO — everything you need is there, each error message comes in a “formal” and “informal” form and points you to exactly the place the error happened, and TeX lets you fix things on-the-fly without restarting the program. All this of course assumes you use TeX the way it is described in the manual (The TeXbook). The experience is opposite with LaTeX, so I find it worth giving up all the convenience of LaTeX just for the wonderful experience with TeX.

As for “the TeX language”, there is no such thing. As Knuth has said many times, TeX is designed for typesetting, not programming. Sure it has macros to save some typing, but if you're writing elaborate programs in it (as is nearly inevitable if you're using LaTeX) you're doing something wrong. Knuth said:

> When I put in the calculation of prime numbers into the TeX manual I was not thinking of this as the way to use TeX. I was thinking, “Oh, by the way, look at this: dogs can stand on their hind legs and TeX can calculate prime numbers.”

But of course LaTeX does every such thing imaginable :-)

More on TeX not being a programming language: https://cstheory.stackexchange.com/a/40282/115

On the TeX error experience: https://news.ycombinator.com/item?id=15734980

Plain TeX is different to TeX:

..."virgin" TeX...knows just primitive commands, no macros. Plain TeX is the set of macros (developed by Knuth) which makes TeX usable in everyday life of a typist. ... The available commands can be classified into primitive commands and macros. ... The "virgin" TeX knows only the primitive commands. ... Formats (plain TeX, LaTeX, etc.) extend TeX's vocabulary by defining macros. ...For example, plain TeX defines macros \item, \rm, \newdimen, \loop, etc. Plain TeX defines about 600 macros.


Yes of course; see this answer I wrote about typesetting with “virgin” TeX: https://tex.stackexchange.com/a/388360/48 (it's not easy). “Virgin” TeX is never (and was never) used by typical users, and is used only by the system administrator (or these days, the people behind the TeX distributions) to pre-load formats (like plain or LaTeX).

Knuth wrote both the TeX program and the “plain” set of macros; when you start `tex` it is with `plain` that it starts up, and The TeXbook describes both the TeX program and the plain format without being careful to distinguish what comes from where (you have to look at Appendix B to see the proper definition of plain.tex), so when we speak of TeX as Knuth intended/imagined it to be used, it is plain TeX that is meant.


To me that would be the Appleseed rendering engine.


Even though I can't code C++, I can read it here and understand most of it (besides the maths).

I particularly like reading code from Upspin (upspin.io). It's probably partially because I think the project design is interesting and I write Go. Regardless, it's a great ground-up Go project by some of the original Go authors and contributors.

Very well organized code, and it feels like they got the project off the ground, fixed bugs for a few months, and have now largely trailed off from maintaining it because it just works (I use it), which lends some credibility to their coding style. Of course, I'd like to see the project evolve conceptually, but right now it does what it says it does, reliably, for a project that hasn't even cut a single release.

radare2 - https://github.com/radare/radare2/ More GNU than actual GNU sources, more UNIX than the linux kernel. Huge codebase but extremely easy to get involved with, orthogonal design with no compromise on speed. Best codebase I ever encountered

PostgreSQL and Quake3 are good candidates. Both are C codebases which are surprisingly readable even by relative novices.

I think musl libc [1] has good quality code. If anything their build system is great. It makes the code much easier to navigate.

[1] https://www.musl-libc.org

I still think musl overall is quite readable, but my goodness, that switch statement in your second example. What a monster. I didn't think it was possible to be this confusing without the preprocessor.

What's the problem, out loud?

On the JavaScript side I've enjoyed reading the code for Backbone and Underscore, helped also by the awesome in-line documentation. Very easy to see what is going on.

Also big fan of Sidekiq for similar reasons.

BabylonJS is wonderfully clean code, made for writing 3D on the Web. https://www.babylonjs.com/

Most people are talking about clean code and good design constructs, but I feel many are missing the point: we're talking about code quality here. Design is the grit and grind that all developers go through to develop great software. Certainly there are better-designed software projects out there, which leaves them more maintainable and prone to fewer bugs, but the fact of the matter is that for complicated code, designs go through many iterations and refactorings over time (e.g. the Linux kernel), and all software projects have bugs, even well-designed or well-tested software. But the significance of good testing and good processes is not being highlighted here: unit testing, code coverage, functional testing, end-to-end testing, scale testing, performance testing, code review, fault injection, debuggability, test automation, static code analysis, etc. I am shocked not to see lots of discussion of these things (aside from the SQLite mention) and of testing techniques. Probably a more developer-friendly crowd here at HN, but testing is a significant and game-changing part of what separates developers from great developers.

I like the Laravel framework. It has a clean style to it.


Definitely agree with this. Both the documentation and code are of excellent quality. Others that come to mind are sqlite and zeromq.

my first experience with high quality code was with the quake2 engine.

i was both amazed by the simplicity of the architecture (a huge single event loop), and the attention to code presentation and indentation.

Interesting to see so many John Carmack projects in this thread. He's a good candidate for "best programmer of all time", if there were such a thing.

Gambit, Chicken, Racket, Chez and Guile Schemes

Twisted. Not only highly organized and sensibly delineated, but also a lot of fun to read - borderline comical at places.

How do you think the asyncio (formerly Tulip) sources compare?

asyncio is more modern, more stylish, and more concrete.

Twisted is more timeless, more patterned, and more self-aware.

I can imagine Twisted's asyncio reactor becoming its default (and the Twisted flow control slowly declining in importance), but Twisted's protocols, control structures, and execution models becoming more popular.

Twisted has undergone a great resurgence in quality engineering since asyncio became more viable. This was surprising to me, but is actually probably reasonably consistent with the historical influence of the standard library.

Overall, I think that Twisted is a great project; I almost always reach for it when my python codebase becomes mature enough to need more thoughtful abstractions around network I/O.
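For readers unfamiliar with either: asyncio's stdlib Protocol class keeps the callback shape Twisted pioneered (connection_made / data_received mirror connectionMade / dataReceived), which is part of why the two interoperate so well. A toy illustration, with names of my own choosing:

```python
import asyncio

# A minimal echo protocol in asyncio's Twisted-influenced style:
# the transport is handed in on connection, and incoming bytes
# arrive via a callback rather than an explicit read loop.
class Echo(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)  # echo bytes straight back

# Drive the protocol by hand with a stand-in transport; no event
# loop needed, which is also how such protocols can be unit-tested.
class FakeTransport:
    def __init__(self):
        self.written = b""
    def write(self, data):
        self.written += data

proto, transport = Echo(), FakeTransport()
proto.connection_made(transport)
proto.data_received(b"ping")
assert transport.written == b"ping"
```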

Does 'Physically Based Rendering' count? It's a book... which is also source. It was written as only the 2nd work of true 'Literate Programming' that I know of. I believe Knuth wrote a book about TeX which was the first example. But basically it is prose interleaved with source, readable as a book.

Actually, I think early versions (like from pre-1.0 through maybe 1.5 or so) of Docker had some very high quality code and was also very pleasing to look at. It was very clean and super approachable and readable, and I felt sort of like how the NetBSD commenter felt as described in their comment.

"Highest" I don't know, but "code whose quality I look up to", then:

For C: Process Hacker and some similar code that is designed like and written around Windows kernel APIs: https://github.com/processhacker/processhacker/blob/master/p...

For C++: Some of the Boost code, and stuff like it, such as P-Stade Oven: https://github.com/himura/p-stade/blob/master/pstade/pstade/...

For others: (need to look later, I forget)

Can't say I've seen enough to be confident on the best library but redux (https://github.com/reduxjs/redux) is just so simple, and has great, readable/understandable code.

In Dan Abramov's excellent egghead redux course [0] he implements the `createStore` from scratch which is the core of redux, it's simple enough to post here:

  const createStore = (reducer) => {
    let state;
    let listeners = [];

    const getState = () => state;

    const dispatch = (action) => {
        state = reducer(state, action);
        listeners.forEach(listener => listener());
    };

    const subscribe = (listener) => {
        listeners.push(listener);
        // unsubscribe
        return () => {
            listeners = listeners.filter(l => l !== listener);
        };
    };

    // populate initial state
    dispatch({});

    return { getState, dispatch, subscribe };
  };
[0]: https://egghead.io/lessons/react-redux-implementing-store-fr...

After spending about a month of concerted effort poring through the zlib sources, looking for vulnerabilities, I can say that zlib is the most astonishingly bug-free code I've ever seen. But in the conventional understanding of "code quality", it's pretty bad.

Julian Storer (JUCE library) did a talk on code quality using zlib as an example. Might be interesting to you if you've not seen it already.


I'd have to go with Monocypher. It makes very tasteful use of comments, functions and macros to maximize readability and clarity.


Toybox by Landley (https://github.com/landley/toybox) is probably the best example of a modern C implementation I have ever seen. Surprised no one has mentioned it yet.

I admit I don’t have the knowledge to make my own assessment, but I’ve read some downright poetic praise on djb’s work [1], and more than once.

[1] https://cr.yp.to/

For Rust, many say that the regex crate sets a high standard for excellence: https://github.com/rust-lang/regex

I study codebases as a hobby. I highly recommend Seastar, Folly, Aeron and Disruptor, SQLite, PostgreSQL, LMDB, TensorFlow, HashiCorp's Vault, and the Linux kernel as prime examples of high-quality codebases.

For clean C++, I like leveldb (a key-value DB library) and re2 (a regex engine). Random files from each of them:



I really like the source code of prosemirror:


It's not typical JS, but very good nonetheless.

The open source code I know from web development has to be fixed with various hacks - PHP and the frontend javascript that goes with it. Therefore the code I know is not 'highest code quality'. If it was 'highest code quality' then I would not know the code.

Therefore the highest code quality is likely to be in projects where I do not have to go under the hood, e.g. the Chromium project where all contributors are vastly more educated and capable than myself.

Libuv is one of my favourites https://github.com/libuv/libuv

Good code bases that inspired larger projects: MINIX, KHTML

With respect to the C++ language: there was a book published in 1996, Large-Scale C++ Software Design by John Lakos. He's about to publish the second edition of the book while also expanding its reach to span two volumes.

Anyhow, while we await the publication of that book, John has been working at Bloomberg. Some of the code written there has been published to GitHub [1]. He's also done a five-hour lecture series [2], available on Safari Online (paid service), that covers the topics of his book and introduces the open source Bloomberg repo as an example of code written in that style.

I can't offer you a review as I've just found this all myself, but I'll be eagerly studying it along with some of the other items mentioned here.

[1] https://github.com/bloomberg/bde [2] https://www.safaribooksonline.com/videos/large-scale-c-livel...

The Linux kernel, purely for the reason that it's probably one of the most used pieces of software out there; along those lines, probably also the kernel libraries and user libraries like libstdc that are a part of its ecosystem. I don't know how the Linux kernel is tested, but I know that production use of the kernel on different platforms, at large scale, probably exercises it more than any other open source software in the market.

I would not judge things on aesthetic quality, but simply on results. In general, code faces difficulties that grow exponentially with time, size, and the number of contributors. Millions of lines of code, thousands of contributors, decades of development, and it's still at the top of its game? In spite of its complete lack of aesthetic appeal, that's the Linux kernel.

High quality code and one of the best APIs i ever used: https://github.com/requests/requests

Best source code layout, architecture, maintainability: https://github.com/rg3/youtube-dl

As far as C++ code goes, the Lean Prover is really well maintained: https://github.com/leanprover/lean

I'd also say GHC is quite good.

And Pandoc as well.

I don't think I can compute enough variables to consider the "highest" though... so the aforementioned are only examples of what I think are good.

Redis. I have to say antirez not only is an amazing engineer but from the way the code is written, you can see he is a very clear thinker.

I hold the Redis codebase as an example of what good C code should be. On the other hand, the OpenCV codebase is an example of what C code should not be: it is really inconsistent, with quite a bit of unreadable spaghetti.

CVEdetails.com lists the number of (reported) vulnerabilities by year for software projects that have a CVE identifier.

Here's bitcoind: https://www.cvedetails.com/product/22744/Bitcoin-Bitcoind.ht...

XMonad window manager written in Haskell.

What do you think about the GHC sources in comparison?

I am surprised no one mentioned ScyllaDB (https://github.com/scylladb/scylla). Redis and ScyllaDB have often been pointed out as examples of good codebases for C/C++ devs to look at.

Gensim : https://github.com/RaRe-Technologies/gensim

[Can't speak for the 'highest' part of the qn, but Gensim upholds very high code quality standards]

I like Spectrum for both their architecture and code quality. Node.js/JavaScript.

https://github.com/withspectrum/spectrum https://spectrum.chat

Referred by thousands and available since 2004 without one single bug reported in the last decade: http://users.telenet.be/AphexSoft/

It is not yet on Github.

Pretty sure PostgreSQL will have a place at the top quarter of this page.



I prefer the early versions, before it was softened up for the vulgo.

GTKmm. GTK uses GObject to implement inheritance between C structs and it's easy to go wrong when extending. GTKmm wraps GTK in C++. It's a joy to use and is safer.

I like the design philosophy behind BoringSSL:


If some portion of the library is overly complex, look into the use case and delete it wherever possible. It maintains a long-term bound on code complexity, which I quite like.

Edit: a nice explanation on the design philosophy here https://www.imperialviolet.org/2015/10/17/boringssl.html
