The language doesn't really matter. I'm trying to see examples of what people consider high quality code (clean, elegant, well-formatted, algorithmically beautiful, etc).
This isn't a github repo - but I think it's an excellent resource and worth mentioning here: The Architecture of Open Source Applications (http://aosabook.org/en/index.html)
edit: I don't see why this is down-voted. The OP asked about examples of high-quality code, and structure and design are a large part of that; it's not hard to find the GitHub repo for any particular application mentioned in the above book. Furthermore, a lot of examples of excellent code are not easy to approach with the 'open some random files and start reading' approach; having a high-level overview often helps.
This is an excellent book, but the funny thing is that it only says "what is" and not "why". It took me years of experience and pain to figure out "why".
The Go standard library is thoroughly documented and implements all sorts of functionality (HTTP, crypto, libc-esque functionality) in a fairly minimal way:
The APIs that you as the programmer are supposed to invoke tend to be well-documented and fairly well-designed.
But there are some parts in there I certainly wouldn't call elegant. For example, if you trace function calls from http.Get() you end up in a lovely function called doFollowingRedirects()[0], which is where the actual request is made inside of a for {} loop.
Be it efficiency, some stylistic tendency I'm unaware of, or some contrived excuse that doesn't make any sense, at the end of the day I'm not likely to look for a function called doFollowingRedirects when I want to see where the initial (and likely only) request is made.
I read `doFollowingRedirects` with an elided comma - that is, 'do, following redirects (as you go)'.
In that context, the function doesn't seem so surprising. If you take a look at the function `Do`[0], you'll notice it calls out to either 'send' or 'doFollowingRedirects' depending on whether it's a request that should 'do, following redirects' or simply 'send a single request'.
Many HTTP libraries I've seen are structured around creating a request object and then handing it to a 'do request' function - the crime here at best seems to be the naming 'doFollowingRedirects', which can be read as 'do the subsequent redirects' which might make it seem odd that the first request was included in the for loop therein.
"High quality" is a strange concept. I would look at code you actually use and rely on - that's the best indication of quality. A lot of critical code deals with inelegant, complex problems correctly and efficiently - I'd consider anything that can be relied on to manage that "high quality", even if it is unclean, inelegant, poorly formatted and algorithmically mundane.
That said, if you want to read elegant code, I'd recommend the stb parser libraries (written in C). They are small self-contained decoders for many common media formats, with excellent documentation:
These libraries are likely insecure, handle many edge-cases incorrectly, implement fewer features, and perform worse than other options. However, they meet your criteria better.
It is not nearly enough for code to just work and be useful. Code quality is what determines how maintainable it is, how long it will stay relevant, and how long it will survive changing requirements and environments. And that is much harder to achieve than just something that (sometimes) works.
Absolutely, by all means look at old code - code that has survived and been useful for a long time. It's either adaptable (Linux) or doesn't need to change or adapt much (TeX).
Do you currently use and rely on software which you expect won't be useful to you in ten years' time? I can't think of much personally.
(I do use IDA Pro, which has clearly adapted poorly to changing requirements - it still has scars of the 32-bit to 64-bit transition that get in the way of day-to-day usage. I hope there'll be something better in ten years. Of course, I could buy a cheaper, "higher quality" tool instead, but none of them are as powerful or as useful.)
Question to C programmers: Every time I encounter some C code, it doesn't take long until the obscurely short variable names start popping up from all directions. Is naming variables in this manner considered acceptable by today's "C best practices" or what's the deal with that?
Think of short variable names as pronouns. Like saying "it", except you have maybe 1-4 distinct "it"s. Short variable names work well for things that have common and obvious meanings. "i" for an iterator or loop index (add "j" and "k" if you have nested loops). "fd" for a file descriptor. "n" for a count of something. "p" for a pointer. "ofs" for an offset. Other conventions can apply within a codebase.
Short variable names work less well when you have more than about 4 of them in the same function/block or there is no common convention suggesting what the variable might be from its name alone.
Short variable names are also not good for globals or struct members, because those names are used across many contexts. Combine this convention with a short local variable name and you end up getting things like:
#include <stdio.h>

typedef struct {
    int retry_count;
} connection_t;

// If "connection_t" is used widely throughout your codebase,
// you get used to seeing a variable "c" that is a connection.
void func(connection_t *c) {
    fprintf(stderr, "Retry count: %d\n", c->retry_count);
}
The prevalent opinion is that using `_t` in your own code doesn't play well with existing standards (POSIX reserves the `_t` suffix), although personally I find a suffix which tells me that it's a type to be very helpful.
Well, as long as the functions are short and the name is descriptive, I always use short names when the meaning is clear. For example, in strlower you don't need more than `s` as the parameter name. The function will probably be short, so using `len` or even just `l` for the string length should be fine in a ~15-line function.
A tip is to look at the function declaration in the header files. There I, at least, use somewhat longer parameter names for descriptive purposes (along with documentation) but usually use short ones in the function bodies.
So, if your functions are short and to the point (low coupling, high cohesion, in Code Complete parlance), then I consider short parameter and variable names a good style. But it's just one ingredient in good code.
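A quick sketch of the same idea in Go (a hypothetical helper for illustration; the standard library's strings.ToLower is what you'd really use):

```go
package main

import "fmt"

// lower returns an ASCII-lowercased copy of s. In a function this
// short, `s`, `b`, and `c` carry the meaning fine; longer names
// would only add noise.
func lower(s string) string {
	b := []byte(s)
	for i, c := range b {
		if c >= 'A' && c <= 'Z' {
			b[i] = c + 'a' - 'A'
		}
	}
	return string(b)
}

func main() {
	fmt.Println(lower("Hello, World")) // hello, world
}
```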
Unless you're looking at competitive programming, I feel the best programmers make clear code with short names.
Not surprising at all, if you're aware that Kernighan and Pike also wrote a book together (The Practice of Programming) in which they persuasively argued that brevity is an important component of clarity.
Too much of "longer names are better" fetishism adds noise and detracts from overall clarity.
When was the last time you read a math formula that was two million lines long? I don't see why short variable names would be better. With a proper editor you don't even save on typing and I've never seen code with short names that was easier to read than the same code with variable names of proper length that reflect their semantics (not necessarily long).
When are you long-variable-name fanatics going to understand that it is NOT about typing ;-) It is in fact about READING! People who think longer is better simply do not have insight into how the human brain processes information.
Read up on good user interface design. One of the key points is that texts should be short and the shape of words easily distinguishable. index1, index2, index3 might be more descriptive than i, j, k but the latter have more unique overall shape which makes it easier to identify for a reader. Likewise CoordinateX, CoordinateY, CoordinateZ is harder to read than x, y, z.
How do you like reading the line below:
divideBy(multiply(rocketmass, multiply(rocketvelocity, rocketvelocity)), 2)
compared to:
(mv^2)/2
Sure the former describes what the individual variables are, but immediately getting an overview or sense of what is being calculated is harder. But using short variable names doesn't mean you can only use short names. You can mix and match to optimize understanding and clarity.
rocketKineticEnergy = (m*v^2)/2
I follow Rob Pike's advice and use long names for global and seldom-used variables and functions, while I use short names for locally defined variables and functions. I might also use short names for key concepts frequently used. If your key domain is geometry then nobody will have problems understanding in context what x, y, w, h, dx, dy etc. mean. You don't have to write XCoordinate, YCoordinate, Width, Height, DeltaX, DeltaY.
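A tiny sketch of that convention in Go (hypothetical types and names, just to show the style):

```go
package main

import "fmt"

// rect uses the domain's short vocabulary: x, y for position,
// w, h for size. In geometry-heavy code these read instantly.
type rect struct{ x, y, w, h float64 }

// translate moves r by (dx, dy); dx and dy are as conventional
// in this domain as i and j are in loops.
func translate(r rect, dx, dy float64) rect {
	r.x += dx
	r.y += dy
	return r
}

func main() {
	r := translate(rect{x: 1, y: 2, w: 10, h: 20}, 3, -1)
	fmt.Println(r.x, r.y, r.w, r.h) // 4 1 10 20
}
```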
Maybe if you read my post, you'd see that I specifically mention "variable names of proper length that reflect their semantics (not necessarily long)," so I don't really appreciate being called a "long variable fanatic." I'm asking what the advantage is outside of loop variables which is everyone's favorite example because they can't find any better. Also, outside of geometry or other mathematics fields as most apps are not in that field. I find variable names that are actual, non-abbreviated words easier to read. What are you going to tell me next, that I don't know what's easier for me to read? Oh wait, you just did. Thank you for your condescending manner.
What code are you reading that variables have scope covering millions of lines of code? That's appalling.
As long as variable scopes are kept tightly controlled, length of the overall code-base is irrelevant. Avoid "spooky action at a distance" and you never have to care that there's a variable named s instead of string_for_truncation_html_aware 500,000 lines away from the function in front of you.
For JavaScript, Underscore is very good in terms of clean readability. LoDash is a better library, functionality-wise, but seems to have lost the quality somewhat.
Backbone and the Coffeescript compiler are also good; jashkenas & the contributors did a good job trying to put literate programming techniques to use, IMO.
The Elixir source code is very high quality; same applies to the Ecto library. Elixir code in general, because of the focus on including detailed documentation (including doctests) within the code, tends to be very readable. The way the REPL is set up means the docs for modules or functions can be accessed at any point, so it's good for pragmatic reasons.
I wish there was an edit button for older posts: I meant to qualify that. The actual pragmatic quality may be higher (particularly re terrific speed optimisations, with bitmasks etc), but the cleanness and readability I feel has dropped (some due to extreme modularisation, which is v useful but makes reading source painful). There aren't as many comments, which seems kinda minor except in the context of Jeremy Ashkenas' attempts at literate programming, esp via Docco, which IMO was fairly successful. I still use Underscore source and docs as reference. LoDash I find has pretty awful docs in terms of how they're arranged vs Underscore, but that could just be due to experience.
Lodash source is available as a monolithic build or modules so you can take your pick.
Lodash has plenty of comments (jsdoc, bug-fix notes, and implementation explanations). Underscore has stripped many of these, leading to devs making the same mistakes or introducing regressions.
Lodash docs are arranged in alphabetical order. Underscore docs aren't, and even have things miscategorized.
Yeah, I've found it the best experience I've ever had with a language in terms of ramping up from zero, and the focus on clean, readable code with documentation built in is a major part of that.
Caveat: came from Ruby, so the syntax was very friendly. But I'd say as an aside that it's the first programming language I've experienced, (coming via a helluva lot of functional JS) that's both functional and completely pragmatic in terms of focussing on what's useful and necessary. Really impressive.
Can't say I agree personally on UI Grid - it's a bit of a mess in some ways, especially in its distribution. It is also a bit of a configuration nightmare.
For high quality JS, I would look to some of the major frameworks out there:
There is a lot one can learn about software design by reading the source code of major libraries, and a major framework is far more reliable than most third-party libraries in the ecosystem.
A smaller, more accessible project would be Mithril, very nice, clean code. I haven't looked at the v0.2 code yet, but I can't imagine it deviates much from the quality of v0.1.
I'll note that while the code quality in Rust is pretty good (clean/usually commented/well formatted/tested), it's not exactly idiomatic Rust.
This is because Rust as a language has changed a lot, and the compiler still has old code that was written the "old way" and not updated to use better or more idiomatic alternatives (e.g. elision, if let, etc). Servo is in a similar situation. This is slowly improving as we run clippy on Rust and in occasional manual refactorings (for example when sty was renamed to TypeVariants and ty_foo was renamed to TyFoo to have the correct capitalization -- the old capitalization was years old), but there is still work to do to make it completely idiomatic.
So if you want to learn how to write idiomatic Rust, I would avoid using the Rust repo as a source. Newer repos and new code in the Rust repo is pretty okay, though.
Yegor from yegor256.com ran an award in 2015[0] for good-quality software projects.
I think his evaluation points are sensible:
"Each project must be:
* Open source (in GitHub).
* At least 5,000 lines of code.
* At least one year old.
* Object-oriented (that's the only thing I understand).
The best projects will feature (more about it):
* Strict and visible principles of design.
* Continuous delivery.
* Traceability of changes.
* Self-documented source code.
* Strict rules of code formatting.
What doesn't matter:
* Popularity.
* Programming language.
* Buzz and trends. "
In the end there were 158 submissions, 12 finalists and 1 winner.
Check them out.
> Let's get back to class names. When you add the "-er" suffix to your class name, you're immediately turning it into a dumb imperative executor of your will. You do not allow it to think and improvise. You expect it to do exactly what you want — sort, manage, control, print, write, combine, concatenate, etc.
For the curious, the conundrum can be trivially solved via `apples.min`, or perhaps `Collections.min(apples)` in a noun-biased language.
It's not literally flawless (hard to find conceptual entry points, short on comments), but when you get to the massive piles of one-off special functions needed to simulate all of Pokemon's moves and abilities (e.g. https://github.com/Zarel/Pokemon-Showdown/blob/master/data/a... ) it's all much more succinct than I'd have expected. The file I linked expresses more "business logic" in 3300 lines than is contained in the entire 40kiloline codebases of some of the enterprise monstrosities I've worked on.
Code organization could use some work.
A lot of shared mutable state.
Event listeners are subscribed with "on" instead of "once" for one time events... that's all I could find in like 30 seconds... Disqualified
It really makes parts of that codebase dreadfully hard to reason about, imo, but at the same time, I appreciate you either do exactly what they did, or you don't support the platform.
Maybe because it's getting a bad rap these days with people doing dumb, repetitive things with it. But I've always found the code quality to be very good.
I've really enjoyed looking at the GraphQL JavaScript reference implementation source [1]. It is well-commented, consistently formatted and structured, and makes frequent references to the spec. The spec itself [2] is very well written too, and I think the project being spec-driven along with the code quality make it a good example.
It feels like a lot of thought was put in to making the code well-documented and easy to follow for people new to the project.
The Laravel framework [1]. It makes the most of the latest features available in PHP (reflection, traits, namespacing, etc.) and its author, Taylor Otwell, has an almost obsessive attitude with the naming of methods and formatting of the source code. For instance, every multiline comment is exactly three lines long, and every line is three characters shorter than the previous one [2].
Laravel is a pretty bad codebase, making use of reflection or just assuming methods are present instead of interfaces, actively laughing in the face of static typing.
They're also awful at managing issues/bugs, where they'll gladly close issues without explanation, or close an issue when a pull request gets submitted, only to reject the pull request and not re-open the issue.
Fair enough, you seem to have contributed quite a bit to the framework after all [1]. However, don't you think they have gotten better with triaging as of late?
Also, I'm wondering, does your criticism still apply to the latest versions of Laravel or does it stem from frustrations with the pre-5.0 versions? Regarding your comments about the lack of interfaces, it's worth mentioning that at least the core components do have an interface now [2].
But it has lots of statefulness, and some of the state is structured as a finite state machine in ways that are easy to break (similar to the "request" module, also an FSM that is incredibly easy to break).
A state machine has states and transitions.
Some transitions are valid, others are not.
Therefore when you implement a state machine in code the consequence is sequential coupling.
Imagine a car class: you have StartCar(), Accelerate(), Brake(), StopCar(), SwitchGear(). Can you accelerate with your car turned off? No. Can you stop your car twice? No.
So there should be validations in state transitions. Since those are missing in this code, it's possible to arrive at invalid states that can cause undesired behavior.
In the "request" npm module (an ambiguous module name that wastes a lot of my time on a regular basis), you can abort a request that has not started. That causes an exception. It took me a long time to find it. It was all because of a broken state machine.
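A sketch of the fix being argued for here, in Go (the `request` type and its method names are hypothetical, not the actual npm module's API): an explicit state enum plus validated transitions, so "abort before start" becomes an ordinary error instead of a surprise exception.

```go
package main

import (
	"errors"
	"fmt"
)

type state int

const (
	created state = iota
	started
	aborted
)

// request carries its state machine explicitly.
type request struct{ s state }

// Start is only valid from the created state.
func (r *request) Start() error {
	if r.s != created {
		return fmt.Errorf("invalid transition: Start from state %d", r.s)
	}
	r.s = started
	return nil
}

// Abort is only valid once the request has started, so the
// invalid transition is reported instead of blowing up later.
func (r *request) Abort() error {
	if r.s != started {
		return errors.New("cannot abort a request that has not started")
	}
	r.s = aborted
	return nil
}

func main() {
	var r request
	fmt.Println(r.Abort()) // cannot abort a request that has not started
	fmt.Println(r.Start()) // <nil>
	fmt.Println(r.Abort()) // <nil>
}
```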
UI Framework built with React and SCSS. The components are nicely modularized, have unit tests, and we put a lot of emphasis on code readability, clear names for things, a scalable folder structure, and intuitive interfaces.
I think the strengths of the C code are: straightforward algorithms without many nonsensical abstraction layers, consistent and clean formatting, sane usage of the preprocessor.
IMO a very well written and very well run project. I follow most issues closely just because of the discussions re: code reviews, new features, and community questions.
The threshold for high code quality was set long ago and remains far beyond the reach of mere mortals.
You will hardly find anything even distantly approaching the quality of TeX and Metafont. Not sure if there are mirrors on github, but you can always grab a copy from CTAN.
Have you actually ever read any of the Linux source code? I find it pretty hard to read to be honest. There are barely any comments and the ones that are there are often not very helpful.
> If links to documentation are not in the code, they might as well be nonexistent.
Yeesh. I must either be getting old, or everyone is significantly more busy than I am. I find that I always have time to at least locate any official documentation for the thing that I'm working with.
You'd have to explain to me what you think makes the Linux code of particularly high quality; I don't think it's a good fit for the criteria in the question.
Whilst it may well be well engineered for a project of this size, the code I have seen leaves a lot to be desired.
Cases that spring to mind are the mm code and cgroups where, as an outsider, I got the impression of a large codebase where the patch delta is kept to a minimum -- perhaps so as not to introduce new bugs as features are added. But this doesn't necessarily mean the resulting code has a logical structure or layering; it means deciphering it involves as much studying of the Git history as it does of the (sparsely commented) code on the screen.
In contrast, my experiences in the FreeBSD code have been much better, with less spaghetti and even to the point of maintained man pages for key internal functions.
I'm going to propose one of my own[1], since I can do only a few projects without a time limit (for fun), so I did my best to properly program, test and document it. I overcommented in the past but I'm trying to correct that now. Also, all of the new contributors follow the same style so things stay consistent. As an example, let's try to understand the function `.addClass`:
- Just to use it you can see the documentation on the website[2], which is normally the first point of contact.
- Then you try to find the relevant files. They are in github's `src/plugins/addclass`, a name that is common and intuitive [3].
- Not only that, but you also get the documentation in that folder on github thanks to the name `readme.md`.
- When opening it [4], the first thing that you notice is that the function has a tiny footprint: 8 lines. There is a brief explanation of what it does and a line explaining what the not-so-intuitive function `this.eacharg()` means.
- Then we use the native `Element.classList` [5] to add classes. Most people familiar with vanilla javascript know it, and otherwise the name `el.classList.add(name)` is quite descriptive.
In contrast, while covering quite a few more edge cases and older browsers, check jQuery's `.addClass()` code [6], or check Zepto.js' `addClass()` code [7].