I hold the source code of Go standard library & base distribution (i.e. compiler, etc.) in very high regard. Especially the standard library is, in my opinion, stunningly easy to read, explore and understand, while at the same time being well thought through, easy to use (great and astonishingly well documented APIs), of very good performance, and with huge amounts of (also well readable!) tests. The compiler (including the runtime library) is noticeably harder to read and understand (especially because of sparse comments and somewhat idiosyncratic naming conventions; that's partly explained by it being constantly in flux). But still doable for a human being, and I guess probably significantly easier than in most modern compilers. (Though I'd love to be proven wrong on this account!)
At the same time, the apparent simplicity should not be mistaken for lack of effort; on the contrary, I feel every line oozes with purpose, practicality, and to-the-point-ness, like a well sharpened knife, or a great piece of art where it's not about that you cannot add more, but that you cannot remove more.
This is one of the great things about many Go libraries; the language is so simple it's difficult to overcomplicate a Go project. This makes reading any Go source code (projects, libraries, the stdlib) a joy. The only times I've found Go libraries to be a PITA to read are when they get autogenerated from some other language (protobuf compilations, parts of the compiler that came from C, AWS/GoogleCloud/Azure libraries, etc), but that's to be expected in every language.
Kubernetes is another great example of a project that is so unbelievably complex in its function, it should be completely impenetrable to anyone who isn't a language expert. But go check it out; it's certainly complex and huge, but actually grokable.
I would argue that while Kubernetes is a great piece of software, and it's definitely practical to go in with relatively little experience and tweak a single line or function, Kubernetes is not easy to grok or reproduce in its entirety. For example, it has its own implementation of generics and a custom type system [1].
I'd agree, but only as far as aesthetics go. When you have to understand the time complexity and runtime characteristics of the standard library sorting algorithms, I think Go does a very bad job - the standard `sort.Sort(data sort.Interface)` will run poorly if the data is already mostly sorted. I expect these kinds of things to be documented properly.
Golang's `sort.Sort(data sort.Interface)` will sort mostly-sorted data in nearly its fastest possible time, because it basically uses median-of-three quicksort, falling back to insertion sort for small partitions. Median-of-three on sorted or nearly-sorted data picks the optimal or nearly optimal partitioning element for quicksort. The code is simple, readable, and well-commented. Moreover, its average and worst-case complexity is documented in the godoc.
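To make the description above concrete, here is a rough sketch of median-of-three quicksort with an insertion-sort fallback for small partitions. It is not the standard library's code: the `cutoff` value, the function names, and the `[]int` signature are all made up for illustration.

```go
package main

import "fmt"

// cutoff is an illustrative threshold below which insertion sort takes over.
// It is an assumption for this sketch, not a value quoted from the Go source.
const cutoff = 12

// insertionSort sorts a in place; it is cheap on short or nearly sorted input.
func insertionSort(a []int) {
	for i := 1; i < len(a); i++ {
		for j := i; j > 0 && a[j] < a[j-1]; j-- {
			a[j], a[j-1] = a[j-1], a[j]
		}
	}
}

// medianOfThree reorders a[lo], a[mid], a[hi] so that a[lo] <= a[mid] <= a[hi]
// and returns the median value, which the caller uses as the pivot.
func medianOfThree(a []int, lo, mid, hi int) int {
	if a[mid] < a[lo] {
		a[mid], a[lo] = a[lo], a[mid]
	}
	if a[hi] < a[lo] {
		a[hi], a[lo] = a[lo], a[hi]
	}
	if a[hi] < a[mid] {
		a[hi], a[mid] = a[mid], a[hi]
	}
	return a[mid]
}

// quickSort partitions around a median-of-three pivot, recurses into the
// smaller side, loops on the larger side, and finishes small pieces with
// insertion sort.
func quickSort(a []int) {
	for len(a) > cutoff {
		hi := len(a) - 1
		pivot := medianOfThree(a, 0, hi/2, hi)
		i, j := 0, hi
		for i <= j {
			for a[i] < pivot {
				i++
			}
			for a[j] > pivot {
				j--
			}
			if i <= j {
				a[i], a[j] = a[j], a[i]
				i++
				j--
			}
		}
		// Everything in a[:j+1] is <= pivot, everything in a[i:] is >= pivot.
		left, right := a[:j+1], a[i:]
		if len(left) < len(right) {
			quickSort(left)
			a = right
		} else {
			quickSort(right)
			a = left
		}
	}
	insertionSort(a)
}

func main() {
	// Nearly sorted input: only 9 and 8 are out of place.
	data := []int{1, 2, 3, 4, 5, 6, 7, 9, 8, 10, 11, 12, 13, 14, 15, 16}
	quickSort(data)
	fmt.Println(data)
}
```

On already sorted input the middle element is the true median, so each partition splits the slice roughly in half, which is the point the comment above is making.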
In short, your comment is wrong from beginning to end. What led you to believe that anything in it was true?
I'm sorry, but Timsort is a bit of a hack. It's a "this seems to work well" algorithm, and it shows. It took 13 years until its claimed running time was finally proven in 2015. The four (originally three) rules for merging sequences from the stack are rather arbitrary. Multiple issues were found well after it was already widely deployed.
Recently, it was also shown that Timsort doesn't optimally use the information it has about runs. As an alternative, powersort was proposed, which seems to outperform Timsort both on randomly ordered inputs as well as inputs with long runs: https://arxiv.org/pdf/1805.04154.pdf
Totally! And there are much better algorithms for the internet routing protocols, very well documented and tested and all, still just sitting on desks under paperweights...
People were still finding bugs in common implementations of timsort as of 3 years ago. It's not unreasonable to stick with a somewhat more conservative choice for a core library function until there's more reason to have confidence in the implementations of timsort.
Didn't one of the simplest algorithms, binary search, suffer from a bug in a standard library (was it Java?) a few years ago? IIRC it was a corner case. I should check, because I don't recall the details, but the code looked robust.
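For reference, the binary search bug usually meant here is the midpoint overflow that Joshua Bloch wrote about in 2006: computing the midpoint as `(low + high) / 2` wraps around once the sum no longer fits in a 32-bit signed int. A small sketch of the idea, using `int32` to mimic Java's `int` (the function names are made up for illustration):

```go
package main

import "fmt"

// midNaive mirrors the classic (low + high) / 2 computation.
// With 32-bit signed integers the sum can wrap around to a negative number.
func midNaive(low, high int32) int32 {
	return (low + high) / 2
}

// midSafe computes the same midpoint without ever forming low + high.
func midSafe(low, high int32) int32 {
	return low + (high-low)/2
}

func main() {
	// Indices just above 2^30, as they would occur in a very large array.
	low := int32(1 << 30)
	high := low + 2

	fmt.Println("naive:", midNaive(low, high)) // wraps and goes negative
	fmt.Println("safe: ", midSafe(low, high))
}
```

A negative midpoint used as an array index is what turns the arithmetic slip into an out-of-bounds access.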
If I recall correctly, the bug only applied for arrays/lists of length greater than 2 to the more-than-astronomical. Not something that anyone ever encountered in the wild, because current hardware doesn't have enough memory.
Edit: The article linked in the other comment says the Java dev team didn't even bother to implement the "proper" fix, but merely adjusted how much space is allocated.
Being pretty certain isn't the same as being certain. I really don't care what most libraries do as long as they document what exactly they have chosen to do. Go's `sort.Sort(data Interface)` definitely does not shuffle.
> standard library is, in my opinion, stunningly easy to read
Reading this brought to mind the JDK. All well structured, neatly formatted and well documented. I’ll often just click thru to the source to get the nitty-gritty on a function, I rarely need to consult the actual docs!
The fact that almost everyone uses the same style and standards through the go tools has made learning easy. I can dip into the most advanced package and make sense of what's going on quickly.
I think one of the things that makes Go library code so easy to read is the lack of generics. Everything you need to understand the code is right in front of you, without the barrier of having to learn new sets of complicated abstractions or worrying that some obscure code in some other file impacts/is invisibly called by the function. With large code bases written in other programming languages, I have to spend an inordinate amount of time studying the code base and object relations before making changes.
For me, code readability is such a high value that, on these grounds alone, I oppose the introduction of generics and hope the current proposals ultimately fail.
That doesn't invalidate the parent comment :) The code is pure Go, but parts of it originate in C by means of automatic translation during development of Go 1.4 (or whichever version it was).
As of version 3.23.0 (2018-04-02), the SQLite library consists of approximately
128.9 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in
other words, lines of code excluding blank lines and comments.)
By comparison, the project has 711 times as much test code and test scripts -
91772.0 KSLOC.
Automated testing is useful and good. But I really feel it's reached a level of fetishisation that is quite concerning.
Testing code is code which needs to be written, read, maintained, refactored. Very often nowadays I have to wade through tests which test nothing useful, except syntax. Even worse, with developers who adopt the mock-everything approach, I often find tests which only verify that the implementation is exactly the one they wrote, which is even worse: it makes refactoring a pain, because, even if you rewrote a method in a better way which produces exactly the results you wanted, the test will fail.
So, the ratio of testing code vs implementation code is a completely wrong proxy for code quality.
EDIT: I'm not criticising SQLite and their code quality - which I never studied - but the idea that you can judge code quality for a project just by the ratio of test code vs implementation code.
They actually have to test to that degree to follow aviation standards (DO-178b [0]) because they're used in aviation equipment.
Dr. Hipp said he started really following it when Android came out and included SQLite and suddenly there were 200M mobile SQLite users finding edge cases: https://youtu.be/Jib2AmRb_rk?t=3413
Lightly edited transcript here:
> It made a huge difference. That that was when Android was just kicking off. In fact Android might not have been publicly announced, but we had been called in to help with getting Android going with SQLite. [Actually], they had been publicly announced and there were a bunch of Android phones out and we were getting flooded with problems coming in from Android.
> I mean it worked great in the lab it worked great in all the testing and then [...] you give it to 200 million people and let them start clicking on their phone all day and suddenly bugs come up. And this is a big problem for us.
> So I started doing following this DO-178b process and it took a good solid year to get us there. Good solid year of 12 hour days, six days a week, I mean we really really pushed but we got it there. And you know, once we got SQLite to the point where it was at that DO-178b level, standard, we still get bugs but you know they're very manageable. They're infrequent and they don't affect nearly as many people.
> So it's been a huge huge thing. If you're writing an application deal ones, you know a website, a DO-178b/a is way overkill, okay? It's just because it's very expensive and very time-consuming, but if you're running an infrastructure thing like SQL, it's the only way to do it.
SQLite is very high quality software, but they use a DO-178B-"inspired" testing process. As far as I know, they don't have a version of the software that is or can be used in safety-critical parts, despite their boasting.
They say on their site that:
> Airbus confirms that SQLite is being used in the flight software for the A350 XWB family of aircraft.
Flight software does not imply safety critical parts of avionics. It can be the entertainment system or some logging that is not critical.
Correct. The key word is "inspired". Multiple companies have run a DO-178B cert on SQLite, I am told, but the core developers did not get to participate, and I think the result was level-C or -D.
While all that was happening 10+ years ago, I learned about DO-178B. I have a copy of the DO-178B spec within arm's reach. And I found that, unlike most other "quality" standards I have encountered, DO-178B is actually useful for improving quality.
I originally developed the TH3 test suite for SQLite with the idea that I could sell it to companies interested in using SQLite in safety-critical applications, and thereby help pay for the open-source side of SQLite. That plan didn't work out as nobody ever bought it. But TH3 and the discipline of 100% MC/DC testing was and continues to be enormously helpful in keeping bugs out of SQLite, and so TH3 and all the other DO-178B-inspired testing and refactoring of SQLite has turned out to be well worth the thousands of hours of effort invested.
The SQLite project is not 100% DO-178B compliant. We have gotten slack on some of the more mundane paperwork aspects. Also, we aggressively optimize the SQLite code base for performance, whereas in a real safety-critical application the focus would be on extreme simplicity at the cost of reduced performance.
However, if some company does call us tomorrow and says that they want to purchase a complete set of DO-178B/C Level-A certification artifacts from us, I think we could deliver that with a few months of focused effort.
Yeah, DO-178B gives several levels for software, from DALA (highest) to DALE (lowest). If DALA software fails, the results are catastrophic; if DALE fails, there is no effect on the aircraft. Since DALE is usually just test equipment and such, they might be at a DALD level. So it still requires a lot of testing, but not nearly to the level that DALA requires.
I think it's possible that parts of SQLite, for example the file format in read-only mode and a few constant queries, are certified as part of some safety-critical software.
Hipp's Hwaci consulting company would probably help to do the work, but it has no relation to the SQLite as a library.
Good point. The video I linked to merely says that he was contacted by someone in the aviation space about the standard, which I took to mean that it was used in avionics.
While I agree in general, I disagree here. If you read about the SQLite tests, you will find that they do test sensibly.
One suite I'm particularly impressed with will run tests from zero bytes with slowly increasing available memory until the program passes. The tests verify that at no point the DB is corrupted by an OOM event.
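The idea behind that kind of suite can be sketched in a few lines. This is only a toy model under loud assumptions: it counts allocations rather than bytes, and `budgetAlloc`, `store`, `workload` and `runUntilPass` are invented names that have nothing to do with SQLite's actual test harness.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoMem = errors.New("simulated out of memory")

// budgetAlloc hands out at most `left` allocations, then starts failing.
type budgetAlloc struct{ left int }

func (b *budgetAlloc) alloc(n int) ([]byte, error) {
	if b.left == 0 {
		return nil, errNoMem
	}
	b.left--
	return make([]byte, n), nil
}

// store is a toy "database": keys and vals must always have the same length.
type store struct{ keys, vals [][]byte }

func (s *store) consistent() bool { return len(s.keys) == len(s.vals) }

// put performs two allocations and commits only after both have succeeded,
// so a failure in between leaves the store untouched.
func (s *store) put(a *budgetAlloc, k, v string) error {
	kb, err := a.alloc(len(k))
	if err != nil {
		return err
	}
	vb, err := a.alloc(len(v))
	if err != nil {
		return err
	}
	copy(kb, k)
	copy(vb, v)
	s.keys = append(s.keys, kb)
	s.vals = append(s.vals, vb)
	return nil
}

// workload is the operation under test: three inserts, six allocations.
func workload(s *store, a *budgetAlloc) error {
	for _, kv := range [][2]string{{"a", "1"}, {"b", "2"}, {"c", "3"}} {
		if err := s.put(a, kv[0], kv[1]); err != nil {
			return err
		}
	}
	return nil
}

// runUntilPass reruns the workload with a growing allocation budget and
// returns the first budget at which it succeeds. It fails immediately if
// the store is ever left inconsistent after a simulated OOM.
func runUntilPass(maxBudget int) (int, error) {
	for budget := 0; budget <= maxBudget; budget++ {
		s := &store{}
		err := workload(s, &budgetAlloc{left: budget})
		if !s.consistent() {
			return budget, errors.New("store corrupted after simulated OOM")
		}
		if err == nil {
			return budget, nil
		}
	}
	return maxBudget, errors.New("workload never passed")
}

func main() {
	budget, err := runUntilPass(10)
	fmt.Println(budget, err)
}
```

The useful property is that every possible failure point gets exercised once, without anyone having to enumerate them by hand.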
Just to clarify, I wasn't criticising sqlite, I was criticising the idea of judging their code quality "for this reason alone!" - ie that they have so much test code vs implementation.
As a heuristic the code versus test-code ratio serves well as an indicator of quality. Just like consistent indentation does. You don't know whether a well-indented program is good. But if the indentation is inconsistent you'll expect worse.
Yes, bad tests are bad. Yes, mocks are bad. Good tests, however, are good.
To expound on that, designing for testability allows you to sidestep the need for mocks almost entirely, and forces you into easier, more reliable and more consistent code. Then when you choose to test it, the tests are simple, straightforward and valuable.
Oh boy, I would give your comment an infinite number of up-votes. Yes, testing has reached fetish-like levels.
Some of the test code I've encountered recently has been more voluminous, complex and has taken more man hours to develop and maintain than the application or library it's assigned to.
For the love of God, develop the damned software! It's either going to work or it's not.
Yeah, yeah. Just venting. The products I work on aren't the most important, but they certainly are quite important and most of the testing infrastructure that has been built thus far has a lot of goofy sh@t in it.
I just don't have a high tolerance for needless complexity and gee-whiz-look-what-I-can-do while the clock is running.
"SQLite can be configured so that, subject to certain usage constraints detailed below, it is guaranteed to never fail a memory allocation or fragment the heap."
A better comparison would be some sort of defect rate. Does SQLite manage fewer defects per line of code per month (or whatever) than PostgreSQL with that test suite?
Is there a distinction between the best codebase and the best test suite? Probably.
Tangent: I always thought SLOC means "significant lines of code". Since "code" is a shorthand for "source code", expanding SLOC as "source lines of [source] code" makes little sense, IMO.
As a point of information, I wasn't using monopoly in a particularly pejorative sense, but more just to explain the situation. Feel free to replace with a less loaded word in your head.
Essentially, git is designed for the "bazaar" development model, while Fossil is designed for the "cathedral" one, which is what SQLite uses, being developed by just three guys working very closely together.
This is not the reason. The real reasons are explained quite nicely here [0] and I agree with all of them - though I still use Git because of network effects. My alternative of choice would be hg.
The most interesting thing (IMO) left out of this page is the fact that D. Richard Hipp is the author of Fossil. This has a particular smell factor to it, not sure if it is a good or bad smell.
This was a response to test coverage used as a benchmark. I see them both as an indirect project quality benchmark. Neither is directly representing the actual code quality but does indicate if it is a project that matches my view of a mature project. Sure, others will disagree - but the whole idea of quality in this context is subjective anyway.
There's a lot to admire in OpenBSD but sometimes the peripheral tools they deliver don't work well. The example I know best is OpenNTPD (last link in your comment). It has a bunch of problems, including relatively poor clock discipline compared to other NTP implementations. And it doesn't even try to handle leap seconds. That causes problems on the machine itself which may or may not matter to you, but it's catastrophic if that OpenNTPD serves time to other servers. Unfortunately there's a bunch of OpenNTPD servers in the NTP Pool actively providing bad time. Some details from the 2016 leap second: https://community.ntppool.org/t/leap-second-2017-status/59/1...
Again I mostly admire OpenBSD. But OpenNTPD is not the best example of their work.
Have you had a chance to look at systemd-timesyncd? I'm curious how it stacks up as an NTP client. Was thinking about switching my systems to it (from chrony) as I don't need the NTP server functionality.
Yeah the stock old ntpd had a lot of unused code and various security problems over the years. It makes sense OpenBSD would replace it. Just a shame they didn't do it completely. I think that describes a lot of OpenBSD tools; you're trading off some functionality for very good security.
There are better NTP implementations now. Chrony is great, it's the default in Ubuntu now. NTPsec is coming along although I haven't tried to use it myself. Also good ol' ntpd is greatly improved.
Once had to make some changes to OpenSSH for an internal project and it was surprisingly easy to find the relevant code and make the necessary changes. One of the few times my code worked on the first compile.
For sure not. OpenBSD makes no attempt at proper performance, which is critical for a kernel. There are so many naive ad-hoc data structures and algorithms that it's a shame to walk through.
How is performance related to code quality? That makes no sense. If anything, if you had to inline ASM for example, the code would suffer in readability.
Using comma-separated string options instead of normal bits OR'ed together in their public API loses them all credibility in their engineering abilities. It would not survive any professional code review.
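For anyone unsure what the two styles look like side by side, here is a minimal sketch. The `Option` type, the flag names and `parseOptions` are hypothetical and are not taken from any OpenBSD interface.

```go
package main

import (
	"fmt"
	"strings"
)

// Option is a hypothetical bit-flag type; the flag names are invented
// for this sketch.
type Option uint

const (
	OptReadOnly Option = 1 << iota
	OptNoExec
	OptNoSuid
)

// Has reports whether every bit of f is set in o.
func (o Option) Has(f Option) bool { return o&f == f }

// parseOptions is the string-based style: a comma-separated list that has
// to be split, trimmed and validated at run time.
func parseOptions(s string) (Option, error) {
	var o Option
	for _, name := range strings.Split(s, ",") {
		switch strings.TrimSpace(name) {
		case "":
			// Ignore empty entries such as a trailing comma.
		case "ro":
			o |= OptReadOnly
		case "noexec":
			o |= OptNoExec
		case "nosuid":
			o |= OptNoSuid
		default:
			return 0, fmt.Errorf("unknown option %q", name)
		}
	}
	return o, nil
}

func main() {
	// Bit-flag style: a misspelled flag name is a compile error.
	flags := OptReadOnly | OptNoSuid
	fmt.Println(flags.Has(OptNoSuid), flags.Has(OptNoExec))

	// String style: the same request, checked only when the code runs.
	parsed, err := parseOptions("ro, nosuid")
	fmt.Println(parsed == flags, err)

	// A typo only shows up as a run-time error.
	_, err = parseOptions("ro,nosiud")
	fmt.Println(err)
}
```

Whether the string form is acceptable depends on who the callers are; the flag form simply moves the validation to compile time.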
I don’t see how those examples are relevant. Why would that last one be faster?
I agree that the OpenBSD code here is good, no more and no less than needed.
I assumed the grandparent was referring to cases where an O(n) algorithm is used where it might be O(log n) or O(1) with just a little more effort. It’s a tradeoff, sure, and in some cases linear searches can work surprisingly well, but in general I think this kind of thing should always be considered in good code.
Micro-optimizations like inline assembly for inner loops may or may not be a good idea, depending on the application. All else being equal, I’d certainly agree that good clean code would not use assembly.
I would expect the OpenBSD true to be the fastest: it doesn't need to spawn a subshell and it doesn't do more than the POSIX specification requires (afaik --help/--version should be ignored).
Why? I was able to do substantial changes to the kernel when I was a teenager (late 90s), mostly on my first try. There was no giant wall of abstraction I had to climb over or some huge swath of mutually interacting code I had to comprehend. There was also nothing that required fancy code navigation and the creation of something like the ctags database in order to find out what on earth was happening.
No action at a distance or lasagna style dereferencing or mysterious type names that are just typedef'd and #define'd around dozens of times back to something basic like char. No fancy obscure GNU preprocessor extensions or exotic programming patterns.
Nothing had obtuse documentation that tried my patience or required much more than enthusiasm and basic C knowledge.
I did things like getting a wireless card working from code written for one with a similar chipset, and getting various other things like the IrDA transmitter on my laptop at the time to do a slattach and thus work as a primitive wireless network - all in the late 90s.
I likely had no idea what, say, the difference between network byte order and host byte order was at the time or how the 802.11b protocol worked or what a radiotap header was or any of that. The separation of concerns was so good however, that none of that knowledge was actually needed.
Compare that to say, the Qualcomm compatible WWAN I just dealt with over the past few weeks where I needed to have in-depth knowledge of an exhaustive number of things (very specific chipset and network details) to get a basic ipv4 address working. Then I needed to read up on GNSS technology and NMEA data to debug codes over USBmon to get the GPS from the wwan working. Then after I had the qmi kernel modules doing what I wanted and the qmi userland toolsets, I had to write some python scripts to talk to dbus to get the data from the modemmanager that I needed in order to log the GPS. All the maintainers of these pieces were very nice and helpful and I have nothing negative to say. This is just how it usually is these days.
Back then, however, I wasn't a good programmer. I was likely pretty terrible, in fact, but with the NetBSD codebase I was able to knock out whatever I wanted every time, fast, on a 486.
> No action at a distance or lasagna style dereferencing or mysterious type names that are just typedef'd and #define'd around dozens of times back to something basic like char. No fancy obscure GNU preprocessor extensions or exotic programming patterns.
Ah, I see you've also looked at the Linux kernel code.
What's your relation with it nowadays? I'm very curious about NetBSD but never tried it yet. I sincerely wonder what's your opinion on it now, and why you speak about the situation only as "those days" now? :)
I have no idea, haven't kept up with it. I'd recommend 1.x (<=4) any day though, simply for the education alone.
I don't really use it these days because I need systems that future cheap devs can maintain, and once you enter userland it takes commitment and time I simply don't have to stay with NetBSD.
Debian permits me to usually not have to care and that's pretty invaluable
It's older than some HN posters, but the GPLed DOOM source code was one I liked.
The performance reached by the game was considered impossible until Carmack showed us otherwise. So I expected lots of ASM and weird hacks, especially as compiler optimization wasn't as good as it is today.
Surprise, surprise, the thing was easy to read, easy to get going, easy to port, and reasonably documented. It showed me what a good balance between nice code and usable code is.
Actually, Hypeman is using this correctly in my opinion. We all knew the code would be good, it was only him that doubted it. So it's not surprising to us, the readers, that the code is good.
You are reading cleaned up source code that only compiles and runs on Linux. That's why it looks nice.
Many thanks to Bernd Kreimeier for taking the time to clean up the
project and make sure that it actually works. Projects tends to rot if
you leave it alone for a few years, and it takes effort for someone to
deal with it again.
I didn't have internet at the time, so I didn't check github 20 years ago ;-)
On the more serious side, I wanted to say something about the TODOs as an example of the balance, but couldn't find any. I thought I was confusing it with Quake, but the cleanup might explain it better.
This is not necessarily about the code, but I've been really impressed for a while by the lodash project and its maintainer's dedication to constantly keep the number of open issues at 0. Any issues get dealt with at record speed, it's quite a sight to see.
JDD, the maintainer, is also incredibly devoted and overall a nice guy to talk to. He has something like 5 years (and counting) of making a commit every single day, including weekends and holidays and sick days. They may not always be world-changing commits, but it still shows an incredible amount of dedication.
With such a big project, being quick to hand out wontfix isn't necessarily a bad thing. To be honest, seeing as this project is used by a huge part of the… rather diverse JS crowd, 15% wontfix is astoundingly low.
Strictly talking about code quality, I will nominate RCP100, which is a small, virtually unknown, now-abandoned routing software written in C [0]. I started programming with C way back in the 90s, and this is one of only two projects I can recall being immediately struck by the beauty of the code (Redis being the other). I know almost nothing about the author but he seems not to want to be known by name. You can browse the source on Github [1], which I uploaded myself, since you can only get a tarball from sourceforge. Anyway, as someone else mentions, C is usually a mess, but RCP100 struck me as beautiful.
Thanks for uploading RCP100. Your comment is a timely one. I wanted to learn how a router works and is built and was looking for a simpler implementation.
Can you recommend any resources from which I could learn more about network programming, so that I could understand RCP100 code better?
It is quite clean when you consider the task that it accomplishes.
Being able to compile across multiple architectures, endianness, 32/64-bit, and scale up/down from server to desktop to router to phone, all while accepting contributions from thousands of people...
One thing the Linux kernel has going for it is that there are a lot of books that describe how the various parts work and how to use the various internal interfaces. I can't think of any other open source project that has multiple books on how to contribute.
(Sadly, most of those good kernel books were written in the 90's and early 2000's. I don't know if there are any recent kernel hacking books.)
- The core team committed to never needing to introduce breaking changes.
The Elixir community tends to produce work that is actually considered "Done". An Elixir package is not stale when it hasn't seen a commit in a few months. Instead, the feeling is: "It's feature complete and only needs maintenance from here on out."
I think that's one reason. The other is that classic Erlang (Elixir is built on top of the Erlang BEAM VM) sometimes does things one way while Elixir has a more elegant way of doing the same thing. However, in Elixir you can still call into Erlang libraries to achieve the same thing if that's more familiar to you.
Scikit Learn comes to mind - not just because I can dig into the source code and immediately know what's happening, but also for the stellar documentation that goes above and beyond telling you what the functions do.
For example their Cross Validation documentation is amazing:
Although still in beta, I'd like to add BearSSL to the mix of well written and documented C libraries. In particular compared to the OpenSSL "documentation". It's also nice to see a TLS implementation without any memory allocations at all.
Java 8 time module is now considered the replacement for JodaTime for new projects. It is separate from the older Java time libraries, and fixes many of the problems in Joda. Give it a try!
ARM mbed TLS [1], Amazon S2N [2], nginx [3] have a super consistent code style throughout and are prime examples of how C application programming should be done (in my opinion).
There are too many in very different domains and languages.
However, I opt for jQuery here. It is one of the greatest examples of how constant refactoring and thoughtful use of design patterns get you a very long way.
If you are designing JavaScript libraries, pls have a look at jQuery. So many great design decisions aka great code quality.
Pushing all dom manipulation through global evals seems like the exact opposite of thoughtful design to me. I have a long list of places where I want to implement strict CSPs, but can’t purely for minor use of jQuery.
The quality of the code is amazing, it's simple to use and even simpler to look through the docs to reason about.
I also want to praise the author of the library (Jeremy Evans), his support through the IRC is second to none, you can talk directly with him pretty much on a daily basis.
And even after 8+ years, the project is still constantly being updated (last commit 4 days ago). I haven't seen too many projects of this calibre, especially ones run mostly by a single person.
Julia. Julia / Julialang is so pedantically tested and the names are pretty meticulously chosen. The algorithms in Base are almost all generic and handle a very wide variety of inputs without catering to them. If you want to learn Julia, along with good software engineering, looking at the Base library is quite recommended.
Julia requires patched versions of things like LLVM in order for all tests to pass because upstreaming bugfixes take time. This has given some Linux package managers an issue since they try to build using system LLVM/OpenBLAS/etc. with the known bugs. I agree this does cause some distribution problems, but as a scientist and mathematician I do like that the standard distribution of Julia uses the most numerically correct versions (as of current knowledge) of the dependencies as it can, and has a test to identify known potential issues. To me this is good practice.
But anyways, I was talking about the Julia Base library and its numerical routines. I just look at the Julia code and don't touch the build systems.
qmail cheats a bit because it's so simple that most people end up using something with messy code on top. Not that I don't think it's a sound engineering decision, but when comparing its code cleanliness with other SMTP stacks it needs to be mentioned.
They're better than most C software of the era, but not better than qmail --- qmail has a better vulnerability record than Postfix does (perhaps because it does less, but that's beside the point).
Granted I haven't read much open source code but when I was working in Flask, I found the source code to be awesomely clear and well-documented. I actually learned quite a bit about Python by reading Flask code. Also, no-one could explain "g" in a way that made sense, but the source code made it obvious. Would recommend reading it if you're into Python at all.
Nice to see you include / mention docs and community. I believe a code-based product has a UX. That UX is the code (with comments), documentation and community. That UX is your (i.e., a dev / engineer) end to end experience with "the product." It's not simply the code.
Put another way, there's more to a product that's easy and sensible to work with than code quality.
I'm surprised nobody cited TeX from Knuth. It's an absolute standard in quality of implementation, documentation and computer science background. Perhaps unsurpassed.
I definitely admired PostgreSQL's code when I first looked at it.
Projects written in C require a fair amount of care and discipline to be scaled up to larger codebases and teams. PostgreSQL is such a codebase.
I've also seen various parts of Spring's codebase and found all of it to be consistently solid and careful. They take a lot of care to structure carefully and comment immaculately.
Disclosure: I work for Pivotal, which sponsors Spring. Which is why Spring is highly visible in my working life.
Even though it’s a fairly complex transpiler, the authors did a good job modularizing and leaving lots of contextual comments on what each part does.
Also typescript baseline tests are a simple but very effective way to get lots of coverage on the compiler.
I’ve read source code for Babel, typescript, coffeescript and flow. Typescript architecture stands out.
Typescript not only does fascinating things like magical code completion abilities and great tooling for IDEs but their codebase has been an inspiration for me to build better front end code.
I may be a bit biased since I’ve worked at Microsoft before.
I found the TypeScript type checker pretty hard to read through, though it may be my lack of, well, almost any knowledge about type theory. I didn't dig much into the other parts of the codebase however. What parts of it do you enjoy reading?
I've been working with LLVM for a few years and I still find the code difficult to navigate and badly documented. And every single function's argument list is a random jumble of pointers and references (almost all arguments should be references, but many aren't).
Indeed. And it's not just medium to low-level stuff that's not well documented, it's the high-level stuff too. I personally don't mind that much if I have to spend a few minutes to understand something on a very local scope, but if the bigger picture is unclear, that's quite bad. For LLVM one largely has to grep for a bunch of other users and try to figure it out from that.
LLVM is remarkable; the domain is both difficult and critical. Still, the code is consistent enough that I can often guess how things work based on what I think would be reasonable!
The coding standard for variables in LLVM drives me nuts. Both class names and variable names must be upper camel case, so if you're lucky the code looks like this:
Analyzer TheAnalyzer;
but more commonly:
Analyzer A;
with A being utterly unhelpful to read many lines later.
We try to keep up, but the truth is that it's a 15-year-old C++ codebase implementing some weird hardware in even weirder ways. We're far from where we'd want to be code quality wise -- close to no automated testing infrastructure, code is full of module-level globals, inconsistent conventions, etc.
As a C beginner getting into writing larger projects, especially in that sort of context, the Quake source has been my reference on how to structure my code.
Agreed. Almost every time I've looked deeply into stdlib code I was surprised by how hard to follow it is and how frequently antipatterns are employed. Doubly so for anything near a C module.
I consider the Python stdlib in a similar vein as the C++ stdlib or Boost: Yes, some useful bits in there, but (1) lots of rot (2) you don't want to have your code look anything like it.
Though the core has some bad APIs due to maintaining backwards compatibility, a lot of the third-party libraries like requests and Flask have a great focus on API design and code quality.
Agreed with the rest. I've sometimes ended up reading PyPy's implementation of a function to see how it works after trying CPython first. From the few I've read, I'd say PyPy looks nice, by the way (I'm talking about the standard library).
Golang and Kubernetes have been highly regarded as high quality. I particularly found the Golang code for Kubernetes to be well documented and well architected.
TeX (plain TeX, not LaTeX) has phenomenally good logging and error messages IMO — everything you need is there, each error message comes in a “formal” and “informal” form and points you to exactly the place the error happened, and TeX lets you fix things on-the-fly without restarting the program. All this of course assumes you use TeX the way it is described in the manual (The TeXbook). The experience is opposite with LaTeX, so I find it worth giving up all the convenience of LaTeX just for the wonderful experience with TeX.
As for “the TeX language”, there is no such thing. As Knuth has said many times, TeX is designed for typesetting, not programming. Sure it has macros to save some typing, but if you're writing elaborate programs in it (as is nearly inevitable if you're using LaTeX) you're doing something wrong. Knuth said:
> When I put in the calculation of prime numbers into the TeX manual I was not thinking of this as the way to use TeX. I was thinking, “Oh, by the way, look at this: dogs can stand on their hind legs and TeX can calculate prime numbers.”
But of course LaTeX does every such thing imaginable :-)
..."virgin" TeX...knows just primitive commands, no macros. Plain TeX is the set of macros (developed by Knuth) which makes TeX usable in everyday life of a typist. ... The available commands can be classified into primitive commands and macros. ... The "virgin" TeX knows only the primitive commands. ... Formats (plain TeX, LaTeX, etc.) extend TeX's vocabulary by defining macros. ...For example, plain TeX defines macros \item, \rm, \newdimen, \loop, etc. Plain TeX defines about 600 macros.
Yes of course; see this answer I wrote about typesetting with “virgin” TeX: https://tex.stackexchange.com/a/388360/48 (it's not easy). “Virgin” TeX is never (and was never) used by typical users, and is used only by the system administrator (or these days, the people behind the TeX distributions) to pre-load formats (like plain or LaTeX).
Knuth wrote both the TeX program and the “plain” set of macros; when you start `tex` it is with `plain` that it starts up, and The TeXbook describes both the TeX program and the plain format without being careful to distinguish what comes from where (you have to look at Appendix B to see the proper definition of plain.tex), so when we speak of TeX as Knuth intended/imagined it to be used, it is plain TeX that is meant.
I particularly like reading code from Upspin (upspin.io). It's probably partially because I think the project design is interesting and I write Go. Regardless, it's a great ground-up Go project by some of the original Go authors and contributors.
The code is very well organized, and it feels like they got the project off the ground, fixed bugs for a few months, and have now mostly trailed off from maintaining it because it just works (I use it), which lends some credibility to their coding style. Of course, I'd like to see the project evolve conceptually, but right now it reliably does what it says it does, for a project that hasn't even cut a single release.
radare2 - https://github.com/radare/radare2/
More GNU than actual GNU sources, more UNIX than the Linux kernel. Huge codebase but extremely easy to get involved with, orthogonal design with no compromise on speed. Best codebase I ever encountered.
I still think musl overall is quite readable, but my goodness, that switch statement in your second example. What a monster. I didn't think it was possible to be this confusing without the preprocessor.
On the JavaScript side I've enjoyed reading the code for Backbone and Underscore, helped also by the awesome in-line documentation. Very easy to see what is going on.
Most people are talking about clean code and good design constructs, but I feel that many are missing the point: we're talking about code quality here. Design is the grit and grind that all developers go through to develop great software. Certainly there are better-designed software projects out there, which leaves them more maintainable and less prone to bugs, but the fact of the matter is that for complicated code, designs go through many iterations and refactorings over time (e.g. the Linux kernel). All software projects have bugs, even well-designed or well-tested ones. But the significance of good testing and good processes is not being highlighted here: unit testing, code coverage, functional testing, end-to-end testing, scale testing, performance testing, code review, fault injection, debuggability, test automation, static code analysis, etc. I am shocked not to see lots of discussion of these things (aside from the SQLite mention) and of testing techniques. Probably a more developer-friendly crowd here at HN, but testing is a significant and game-changing part of what separates developers from great developers.
asyncio is more modern, more stylish, and more concrete.
Twisted is more timeless, more patterned, and more self-aware.
I can imagine Twisted's asyncio reactor becoming its default (and the Twisted flow control slowly declining in importance), but Twisted's protocols, control structures, and execution models becoming more popular.
Twisted has undergone a great resurgence in quality engineering since asyncio became more viable. This was surprising to me, but it is probably reasonably consistent with the historical influence of the standard library.
Overall, I think that Twisted is a great project; I almost always reach for it when my Python codebase becomes mature enough to need more thoughtful abstractions around network I/O.
Does 'Physically Based Rendering' count? It's a book... which is also source. It was written as only the 2nd work of true 'Literate Programming' that I know of. I believe Knuth wrote a book about TeX which was the first example. But basically it is prose interleaved with source, readable as a book.
Actually, I think early versions (like from pre-1.0 through maybe 1.5 or so) of Docker had some very high quality code and was also very pleasing to look at. It was very clean and super approachable and readable, and I felt sort of like how the NetBSD commenter felt as described in their comment.
Can't say I've seen enough to be confident on the best library but redux (https://github.com/reduxjs/redux) is just so simple, and has great, readable/understandable code.
In Dan Abramov's excellent egghead redux course [0] he implements the `createStore` from scratch which is the core of redux, it's simple enough to post here:
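For anyone who hasn't seen it, here is a minimal sketch of what such a from-scratch `createStore` can look like. This is not necessarily the course's exact code, and it leaves out things the real redux does (enhancers, argument validation, a preloaded state). The `counter` reducer, the action names and the `"@@INIT"` placeholder are made up for illustration.

```typescript
type Action = { type: string };
type Reducer<S> = (state: S | undefined, action: Action) => S;
type Listener = () => void;

function createStore<S>(reducer: Reducer<S>) {
  let state: S | undefined;
  let listeners: Listener[] = [];

  // Return the current state tree.
  const getState = (): S => state as S;

  // Compute the next state with the reducer, then notify every subscriber.
  const dispatch = (action: Action): void => {
    state = reducer(state, action);
    listeners.forEach((listener) => listener());
  };

  // Register a listener; the returned function unsubscribes it again.
  const subscribe = (listener: Listener): (() => void) => {
    listeners.push(listener);
    return () => {
      listeners = listeners.filter((l) => l !== listener);
    };
  };

  // Dispatch a placeholder action so the reducer fills in its initial state.
  dispatch({ type: "@@INIT" });

  return { getState, dispatch, subscribe };
}

// Hypothetical reducer, used only to exercise the store.
const counter: Reducer<number> = (state = 0, action) => {
  switch (action.type) {
    case "INCREMENT":
      return state + 1;
    case "DECREMENT":
      return state - 1;
    default:
      return state;
  }
};

const store = createStore(counter);
let notifications = 0;
const unsubscribe = store.subscribe(() => {
  notifications += 1;
});

store.dispatch({ type: "INCREMENT" });
store.dispatch({ type: "INCREMENT" });
unsubscribe();
store.dispatch({ type: "DECREMENT" }); // no longer notifies the listener

console.log(store.getState(), notifications); // prints: 1 2
```

The whole thing is three closures over a `state` variable and a `listeners` array, which is a big part of why the real code is so readable.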
After spending about a month of concerted effort poring over the zlib sources, looking for vulnerabilities, I can say that zlib is the most astonishingly bug-free code I've ever seen. But in the conventional understanding of "code quality", it's pretty bad.
Toybox by Landley (https://github.com/landley/toybox) is probably the best example of a modern C implementation I have ever seen. Surprised no one has mentioned it yet.
I study codebases as a hobby. I highly recommend Seastar, Folly, Aeron and Disruptor, SQLite, PostgreSQL, LMDB, TensorFlow, HashiCorp’s Vault, and the Linux kernel as prime examples of high quality codebases.
The open source code I know from web development has to be fixed with various hacks - PHP and the frontend JavaScript that goes with it. Therefore the code I know is not 'highest code quality'. If it was 'highest code quality' then I would not know the code.
Therefore the highest code quality is likely to be in projects where I do not have to go under the hood, e.g. the Chromium project where all contributors are vastly more educated and capable than myself.
With respect to the C++ language:
There was a book published in 1996, Large-Scale C++ Software Design by John Lakos. He's about to publish the second edition of the book while also expanding its reach to span two volumes.
Anyhow, while we await the publication of that book, John has been working at Bloomberg. Some of the code written there has been published to GitHub [1]. He's also done a five-hour lecture series [2], available on Safari Online (paid service), that covers the topics of his book and introduces the open source Bloomberg repo as an example of code written in that style.
I can't offer you a review as I've just found this all myself, but I'll be eagerly studying it along with some of the other items mentioned here.
Linux kernel, purely on the reasoning that it’s probably one of the most used pieces of software out there. Along those lines, probably also the kernel libraries and user libraries like libstdc that are a part of it. I don't know how the Linux kernel is tested, but I know that, with production testing of the kernel on different platforms at large scale, it is probably the most used open source in the market.
I would not judge things on aesthetic quality, but simply on results. In general, code faces difficulties that grow exponentially with time, size, and the number of contributors. Millions of lines of code, thousands of contributors, decades of development and it's still at the top of its game? In spite of its complete lack of aesthetic appeal, that's the Linux kernel.
Redis. I have to say antirez not only is an amazing engineer but from the way the code is written, you can see he is a very clear thinker.
I hold the Redis codebase up as an example of what good C code should be. On the other hand, I hold the OpenCV codebase up as an example of what C code should not be. The OpenCV codebase is really inconsistent, with quite a bit of unreadable spaghetti sauce.
I am surprised no one mentioned ScyllaDB (https://github.com/scylladb/scylla). Redis and ScyllaDB have often been pointed out as examples of good codebases to look at for C/C++ devs.
GTKmm. GTK uses GObject to implement inheritance between C structs and it's easy to go wrong when extending. GTKmm wraps GTK in C++. It's a joy to use and is safer.
If some portion of the library is overly complex, look into the use case and delete it wherever possible. It maintains a long-term bound on code complexity, which I quite like.
There's a nice introductory lecture series on CPython internals on YouTube that tries to cover how the interpreter works and how Python code maps to bytecode by going through the CPython source: https://www.youtube.com/watch?v=LhadeL7_EIU
Lua. It has everything a good C project should have: small size, a simple build system, portability by using the simplest constructs and not ifdefs, and a clear and well-defined scope that no one dares trespass.
When I used this library, I was impressed with how their design not only kept their own code clean, but made it incredibly intuitive and fun to write clean code on top of their API. Coworkers also looked at that code years later and went out of their way to give positive reviews of Lua.
- Balanced scorecard
- Bugs per line of code
- Code coverage
- Cohesion
- Comment density[1]
- Connascent software components
- Constructive Cost Model
- Coupling
- Cyclomatic complexity (McCabe's complexity)
- DSQI (design structure quality index)
- Function Points and Automated Function Points, an Object Management Group standard[2]
- Halstead Complexity
- Instruction path length
- Maintainability index
- Number of classes and interfaces
- Number of lines of code
- Number of lines of customer requirements
- Program execution time
- Program load time
- Program size (binary)
- Weighted Micro Function Points
- CISQ automated quality characteristics measures
'''
It's funny to see nobody is even questioning the question.
What value does it even hold to be the project with the highest code quality in the world? How can it exist as a consensus if we can't even agree on best practices?
If it's for learning purposes, why even look for the ONE project with the HIGHEST quality? Just go with any GOOD ENOUGH project.
I see this all the time: what's the best editor, the best color scheme, the best font, etc.
How about we just start saying: what's a good enough X for my purpose?
Just wanted to mention some bias in successful open source projects: they are often structured as a number of similar plug-in pieces, like youtube-dl for different video publishers.
This is great for open source, because you can easily discover and navigate to the part you want, and change it. You might need to understand the plugin interface - or you might not. This flat architecture makes it easy for people to contribute, an important aspect of a successful open source project.
But it's not the ideal architecture for every project. In some cases, a cleverer, harder to understand approach is more elegant, shorter, more efficient, simpler.
Of course... one might argue that ease of understanding is more important than anything else.