I love this! It's the "jazz music" of software development. Something which breaks all the "rules" but does so purposefully and explicitly so that it can become better than the "rules" allow.
A naive look at this and my head is screaming that this file is way too big, has way too many branches and nested if statements, has a lot of "pointless comments" that just describe what the line or few lines around it is doing, and has a lot of "logic" in the comments which could quickly become outdated or wrong compared to the actual code.
Yet at the same time, it's probably a hell of a lot easier to maintain and manage than splitting the logic up among tens or hundreds of files; it confines a lot of the inherently complex work to this one file; and it is so well and heavily commented that it should be pretty easy to ensure that any changes also keep the comments up to date (after all, any change that doesn't also touch the surrounding comments should stick out like a sore thumb, and will most likely prompt the reviewer to look into it at the very least).
> I love this! It's the "jazz music" of software development. Something which breaks all the "rules" but does so purposefully and explicitly so that it can become better than the "rules" allow.
I kind of see it as the opposite: “space shuttle style” is code that adheres to heavyweight rules that most software development has abandoned in favor of a more improvisational style.
But in either case it illustrates that code style rules are desirable, or not, based on the context; you need to understand what purpose rules serve and what trade-offs they involve to understand which rules to use; there's no one-size-fits-all solution.
I've had this latent thought for a while that I'm finally putting to words:
The complexity goes somewhere.
It's either into lots of tests, or into something like shuttle style with lots of comments, or into a huge QA department, or into the type system / DB schema. It could even be going into the org structure!
But something, somewhere is handling the complexity, and it is doing so as a partial function of the economic importance of the software to its stakeholders.
In this case, the questionable choice of a "something" to handle the complexity is jarringly at odds with the high economic importance that the comments convey.
> tests
pv_controller.go has 1715 lines. To be generous, we might say half of it is comments. pv_controller_test.go has 359. Hopefully this code is exercised elsewhere in integration tests?
> a huge QA department
That's what you're signing up for when you choose a language that expresses cases mainly using if/then/else, and when you don't feel like testing every case.
> into the type system
Sum types aren't complex, although they're not familiar to everyone in the way i/t/e is. Even a result type (a straightforward example of a sum type, available e.g. in Rust) would simplify a lot of the "if err != nil" boilerplate, and greatly de-indent this code. It would probably also make invalid cases unrepresentable in a few parts of this file, eliminating a few more branches. In fact, a sum type typically requires an exhaustive pattern match, making practices such as their rule "every 'if' statement has a matching 'else'" the default.
My point is, complexity (and economic utility) are not conserved as you choose between these "somewhere"s. A few basic type system features can drastically reduce complexity for the QA team or eliminate comments and cases from the space shuttle.
Still - good on them for this degree of discipline & for so many well-written comments, with clear contracts about what is modified or not. ATBGE.
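To make the sum-type point concrete, here is a minimal Rust sketch (illustrative only - the states and names are made up, not taken from pv_controller.go):

    // A made-up volume state as a sum type: the compiler knows these are
    // the only three cases, so every `match` must handle all of them.
    enum Volume {
        Bound { claim: String },
        Released,
        Failed(String),
    }

    fn describe(v: &Volume) -> String {
        // Exhaustive match: forgetting a case is a compile error, i.e. the
        // "every 'if' has a matching 'else'" rule enforced by the compiler.
        match v {
            Volume::Bound { claim } => format!("bound to {}", claim),
            Volume::Released => "released, awaiting reclaim".to_string(),
            Volume::Failed(reason) => format!("failed: {}", reason),
        }
    }

    fn lookup(name: &str) -> Result<Volume, String> {
        // Stand-in for a real lookup; errors become values, not branches.
        if name.is_empty() {
            Err("empty volume name".to_string())
        } else {
            Ok(Volume::Released)
        }
    }

    fn sync(name: &str) -> Result<String, String> {
        // `?` propagates the error, replacing `if err != nil { return err }`.
        let v = lookup(name)?;
        Ok(describe(&v))
    }

    fn main() {
        println!("{:?}", sync("pv-0001")); // Ok("released, awaiting reclaim")
        println!("{:?}", sync(""));        // Err("empty volume name")
    }

The point isn't that this file should be rewritten in Rust; it's that the discipline the file maintains by convention and review is roughly what an exhaustive pattern match gives you for free.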
Monadic Try, biased Either, Some, pattern matching and destructuting... you don’t even need to know category theory to use and understand them in code, and you let the compiler do all the tough work.
This example - despite the perplexing celebrations - is a product of the limits of Go
That's not an argument, that's a truism. You can make working (complex) software using nothing but machine code keyed into volatile memory via front-panel switches.
> This example - despite the perplexing celebrations - is a product of the limits of Go
All coding style is a product of the limitations of the language (well, except when it is coding style dictated by the limitations of a different language applied in a cargo-cult fashion out of its appropriate context); and celebrating the choice of how to approach this problem given that it was being done in this language isn't the same as celebrating the language choice. Go would be pretty close to my last choice for anything, but that doesn't stop me from recognizing this as a thoughtful way to apply Go to this problem.
> pv_controller.go has 1715 lines. To be generous, we might say half of it is comments. pv_controller_test.go has 359. Hopefully this code is exercised elsewhere in integration tests?
The PV subsystem interfaces with external storage systems, so integration/end-to-end tests are much more useful than unit tests (yes, they do exist).
A language that spends its complexity budget well can save complexity from a lot of programs. E.g. if you look at https://philipnilsson.github.io/Badness10k/escaping-hell-wit... , language A could offer all 5 of those ad-hoc solutions (making it a very complex language) and language B could just offer monads (making it a relatively simple language), but both languages would be just as effective in alleviating the complexity of programs written in that language.
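As a rough sketch of that point (assuming nothing beyond stock Rust): the single `?` operator already covers both the null-checking and the error-checking items from that list.

    // One operator, two "hells": `?` short-circuits on `None` in a function
    // returning Option, and on `Err` in a function returning Result.
    fn first_char_upper(s: &str) -> Option<char> {
        let c = s.chars().next()?; // a nested null check becomes one token
        Some(c.to_ascii_uppercase())
    }

    fn parse_port(s: &str) -> Result<u16, std::num::ParseIntError> {
        let n = s.trim().parse::<u16>()?; // error propagation, no if/else ladder
        Ok(n)
    }

    fn main() {
        assert_eq!(first_char_upper("kube"), Some('K'));
        assert_eq!(first_char_upper(""), None);
        assert!(parse_port("8080").is_ok());
        assert!(parse_port("not-a-port").is_err());
    }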
And then language C(++) just skips unnecessary monads which are syntactic salt and uses exceptions, while providing the optional type for where you really, really want this behavior. Tradeoffs of course, but these were considered.
The main problem here with this "solution" is that any monad looks like every other monad. You can easily lose context and have to rely on naming conventions, which is quite terrible. You cannot spot what the code is doing at a glance.
And if you mess up, the type system and compiler will spout some completely unhelpful message because it parses everything the same.
Code that does very different things should look different, not identical. (Unlike what Lisp and FP people think.) Just think on why we do not use textual buttons everywhere in UI.
It's a major reason fewer people accept Lisp than could...
> And then language C(++) just skips unnecessary monads which are syntactic salt and uses exceptions
Exceptions are an ad-hoc solution to a sixth problem. You still have the other five. (Ok, it's possible to use exceptions to replace null. But that still leaves the other four).
C does not have exceptions, or any solution to error-checking hell at all. C++ is a notoriously non-simple language. Neither is at all convincing as a counterargument.
> Code that does very different things should look different, not identical.
In a language with monads, code that does things that are specific to error-handling still looks very different from code that does things that are specific to async I/O. But code that does a thing that is fundamentally the same (parametric), such as composition, looks the same. This is the very essence of programming (and indeed of mathematics): "2 + 2" does something that is, on the surface, very different from "3 + 5", yet there is an important underlying commonality. Code that sorts a list of integers is doing something that is, on the surface, very different from sorting a list of strings, but there's an important underlying commonality. Monads just take the same thing one level higher.
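For instance (a trivial sketch, not from the thread):

    // The same parametric code sorts integers and strings; the surface
    // difference between the element types is irrelevant to the algorithm.
    fn sorted<T: Ord>(mut v: Vec<T>) -> Vec<T> {
        v.sort();
        v
    }

    fn main() {
        assert_eq!(sorted(vec![3, 1, 2]), vec![1, 2, 3]);
        assert_eq!(
            sorted(vec!["pear".to_string(), "apple".to_string()]),
            vec!["apple".to_string(), "pear".to_string()]
        );
    }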
> Code that does very different things should look different, not identical.
I think the major point you're missing here is that the whole point of pushing monads as a core abstraction is the way they shift what "different things" means.
This is a misreading, almost diametric, of what I wrote:
Me: "complexity is not conserved as we choose beween implementations"
You: "So basically complexity is conserved"
Maybe we're getting confused by this word "complexity", so let's break it down into two things: complexity of task handled (COTH) and complexity for programmer (CFP). For a given task of a given complexity, COTH is tautologically the same no matter how you choose to implement. The level of CFP, on the other hand, depends on how you decide to handle complexity. Implementing it in assembly language? High CFP. Implementing it with i/t/e, and commenting loudly your intention to handle all cases? Medium CFP. Enforcing these good, exhaustive standards with the type system? Low CFP. Why does CFP differ? Because the tradeoff matters. In particular, my point was that sum types add a tiny bit of complexity to the language, and remove a TON of complexity from the code. It's not a wash, or even a close contest.
(Notice that I don't make any super-radical suggestions, e.g. that their discipline around mutation/purity should be enforced with types or monads instead of exclamatory comments -- in which case, the CFP added to the language might outweigh the CFP alleviated from the typical file. But my point holds: CFP is not conserved)
I don’t think this changes the inherent complexity of the code right? The basic logic of what is being done is still the same. De-indenting code but still having the same complexity, some of which is now abstracted by the language is still complexity. Maybe it somewhat helps the humans reading the code? But don’t you still have to reason about the basic state changes of the system the same way?
the code itself has the same complexity, but you are pushing the workload to the compiler, not to the human. Given the two options I typically prefer the compiler.
Compilers and libraries are written by humans and are probably the most fallible things ever, optimization wise. I'm not talking 1% here. I'd love to see an opposing compiler with an automated proof that it does what it claims.
Well, unless your computer can read and parse higher-order proofs over code. Not even Haskell can do that. (It can barely parse F types, much less optimize them.) The closest I have come to that feature is Isabelle's simplify, and even that is limited.
GCC and Clang can do basic bounds proofs at best and otherwise are tough buggy heuristic beasts. Just look at the trackers.
GHC has a slightly easier job, but it's paid for in programming complexity. There are hoops in the type system you have to jump through to get performant code... and even then the generated code is meh quality at best. There is always some joker saying that it could be compiled for SIMD or multithreaded, but that has never materialized in usable form.
Compilers can only tell if programs are internally consistent, they can't help ensure that they are correct. Compilers don't know about inputs. Compilers don't know if the branch you wrote goes in the correct direction for a given input.
I would say this is the fundamental problem of making software, and the reason that the whole world could learn JavaScript and yet being able to make great software would still be a rare skill and why even CRUD apps tax the minds of smart people.
Making the right component tackle the right amount of the right complexity is just a hard problem, and I think when it looks easy it's only because your options were constrained.
>Tesler's Law, also known as The Law of Conservation of Complexity, states that for any system there is a certain amount of complexity which cannot be reduced.
Cute, but objectively not true: using the right tool or the right approach can drastically simplify the solution, sometimes even making intractable problems solvable.
I take this more to mean that the logic you're trying to implement has a fixed, non-zero level of complexity (sometimes called "essential" or "inherent" complexity), which forms the complexity floor of your application. On top of that, your implementation adds additional complexity (sometimes called "accidental" or "incidental" complexity), which is non-zero but not fixed.
So, my reading is that in saying "every application has an inherent amount of complexity that cannot be removed or hidden", the law is referring to the essential complexity. Meaning, the law says "some of the complexity is unavoidable in every application" vs. "the amount of complexity is fixed in every application". I do think the name of the law is a little weird, as it implies the latter meaning.
The trouble with Tesler's Law is that it's often difficult to distinguish between essential complexity and incidental complexity. When I face a UX problem that seems like an insurmountable barrier, if I step back and consider many perspectives, I often find a way to change the context so that the problem becomes easy to solve. What feels like essential complexity is often surprisingly incidental.
What I'm objecting to is the notion that there is a "fixed, non-zero level of complexity." Often real innovations in organization and structure allow for fundamentally simpler implementations. I'm reminded of physics: there's an inherent complexity in solving a rotational problem. But introduce polar coordinates and you fundamentally change the game, removing a dimension from your analysis for some problems, and making the solution trivial.
If "conservation of complexity" were universally true then ANY compression would be impossible.
This isn't a dichotomy. My point is that there are clear examples of situations where you aren't just pushing complexity around, but actually achieving great simplifications.
>If "conservation of complexity" were universally true then ANY compression would be impossible.
No it wouldn't. The complexity of a pattern can usually be conserved while reducing its length, but for each pattern there is a limit. This is the entire concept behind the Kolmogorov complexity of a system and any patterns that cannot be reduced any further without removing complexity are at their limit already.
This is also related to the idea that you cannot have a universal compression algorithm.
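For what it's worth, the counting argument behind "no universal compressor" is short enough to sketch (the usual pigeonhole proof):

    \text{A lossless compressor is an injective } C : \{0,1\}^* \to \{0,1\}^* .\\
    \bigl|\{0,1\}^{<n}\bigr| \;=\; \sum_{k=0}^{n-1} 2^k \;=\; 2^n - 1 \;<\; 2^n \;=\; \bigl|\{0,1\}^n\bigr| ,

so for every n, at least one length-n input cannot be mapped to anything shorter.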
In the most ideal case, the complexity becomes the code structure. People can only keep track of 7 or so things in working memory at a time; we tackle complexity through chunking (abstraction), so that each of these new 7 things is in turn 7 more things, which are in turn 7 more things (ideally).
Your code structure is not my memory chunking structure and it will interfere.
At a point you run the risk of just overloading memory with way too many names and entities through merciless chunking, not to mention creating a file or function maze.
Inlining works surprisingly well because you do not have to memorize things and can just read them. Theoretically it is a tooling problem, but nobody wrote a good enough "inline" tool so instead everyone relies on incomplete and buggy textual descriptions.
I'm not sure what to say, just do the right thing and be more practical than dogmatic I guess? There are situations where it's better not to install an abstraction. There are situations where an abstraction fits super well and is well understood by your dev team.
I had this thought when working with a legacy app the company was migrating to a modern platform. The legacy app was enormously complex and contained huge amounts of business logic. Migrating it didn't get rid of all that business logic. Now, using modern software patterns inside a framework with its own complexities, it was a lot more organised, but perhaps even more complex thanks to the business logic now needing to fit inside the framework's idioms and structures. It was nicer to read through, but no less complex.
I think you are more or less spot on. Personally I'll mention explicitly (one could argue that you already say this above) that well-designed libraries and languages help you by containing some of the inherent complexity while avoiding adding incidental complexity.
> I kind of see it as the opposite: “space shuttle style” is code that adheres to heavyweight rules that most software development has abandoned in favor of a more improvisational style.
Most software other than that written for space shuttles and other seriously critical applications.
A colleague of mine works for a telco, and changes to software that runs on satellites take months to approve and go through verification stages that include running on simulators and on duplicate hardware that is on the ground.
> probably a hell of a lot easier to maintain and manage than splitting the logic up among tens or hundreds of files
I'm only halfway through John Ousterhout's book Philosophy of Software Design but I think it agrees with you on this -- that smallness-of-file or smallness-of-function is not a target to shoot for because it prevents the things you build from being deep. That you should strive to build modules which have deep functionality and small interfaces and should contain their complexity within them so the users don't have to know that complexity.
I just finished his book yesterday; he has a lot to say about size and comments. For size, your summary is spot-on. I'd only add that he notes that overeager splitting of methods and classes means the code involved in a particular abstraction is no longer in one place, leading developers to constantly jump around files, which makes it more difficult to understand the code and increases the chances of introducing bugs.
As for comments, this file is essentially Ousterhout taken to the extreme. Still, I think he would have liked it, given how critical this file is. In the book, he encourages writing more comments than current trends would suggest, pointing out that you can't fully express abstractions in code, so all the things the code doesn't contain - the high-level concepts, the rationale, the caveats - should be documented in comments in appropriate places.
Overall, I'm extremely impressed by the book, and its focus on reducing and mitigating complexity.
> I'd only add that he notes that overeager splitting of methods and classes means the code involved in a particular abstraction is no longer in one place, leading developers to constantly jump around files, which makes it more difficult to understand the code and increases the chances of introducing bugs.
Another way of describing this, that I ran across recently, is that this increases the cognitive load for developers working on the code, and cognitive load is one metric by which code can be measured as "good" or "bad". Things like excessive scrolling and switching between files increases cognitive load.
So heavily-commented code can decrease cognitive load if the programmer can read the comments and the related code block at the same time and if the comments help to explain why the code exists the way it does. Or, heavily-commented code can increase cognitive load if the comments don't accurately describe the code, or if they're so verbose that the programmer has to scroll up and down to digest both the comments and the code together.
Scrolling is fixable by code folding. You've read and understood the part, now you can fold it. It takes some factoring discipline to not end up with a woven structure.
Unlike a single-use function, it does not have a name to remember and is in place - and it is definitely not shared, so it can be assumed safe to modify.
Comments would be much nicer if we could still use column-based commenting which unfortunately is not usable in any modern IDE. Reading code and comments side by side tends to work much better than interleaving.
> Another way of describing this, that I ran across recently, is that this increases the cognitive load for developers working on the code, and cognitive load is one metric by which code can be measured as "good" or "bad". Things like excessive scrolling and switching between files increases cognitive load.
Actually that’s one of the reasons why he advocates not splitting code (arbitrarily) in the book.
Martin Fowler of the Agile world, and Garret Smith of the Erlang community, are both excellent programmers whom I respect, and they both take the approach of breaking code into lots of extremely small functions.
Having tried that style, I notice that I don't particularly favor it, and for the very reason you cite: the code is no longer all in one place.
I've switched to moderately sized methods/functions with comments every few lines. Some say that comments like this are a smell, and that you should refactor the commented section of code into its own function, but honestly comments are easier to read than method names (and again, there's the benefit of locality).
> I assume everyone who splits code into smaller pieces use modern IDEs that makes it trivial to navigate to functions by clicking them etc.
That's... not the point. Jumping around is. Imagine reading this comment thread on a bizarro-HN, where you only get to see a short camelCased summary like: debunk(this.previousComment), and have to click to open each comment in a new tab. This is how jumping around small functions feel.
> I say this because I'm always astonished by the number of "modern" programmers who refuse to use IDEs.
There are reasons for it. Many languages don't have an IDE. Many are not suitable for one (especially ones closer to Lisp on expressiveness spectrum). IDEs are heavy and often mouse-oriented, and not efficient for reading and editing text. Sometimes (read: Java) they are a crutch to work around the expressive deficiencies of the language.
Mind you, I have nothing but good things to say about IntelliJ. I've spent a lot of time in it even recently, and I pick it up any time I have to do anything in Java. But for everything else, I launch Emacs, because it can handle all other languages well, and has superior editing capabilities.
> Many languages don't have an IDE. Many are not suitable for one (especially ones closer to Lisp on expressiveness spectrum).
This is a minor point of your comment, but I'd like to refute it: Lispers actually often cite IDE integration as one of the great features of Lisp. It may have lost pace a bit with some of the very best modern ones, but Lisps have had "modern" IDEs since about the 70s or 80s, with stuff like jumping to function definitions, finding usages of a function, looking up the docs etc. Even better, this is usually a part of the language runtime (predating most other language servers by quite a bit), so it can even apply to dynamic uses.
There is a tradeoff: Small functions make high-level logic clearly visible and easy to find, at the price of forcing you to jump around when you want to dive into implementation details. Putting everything into one big function lets you follow all the implementation details without jumping, at the price of making you read everything to actually understand what the code is doing.
The latter is the biggest price you can possibly make me pay. Jumping around is a minor inconvenience.
You're describing an unstructured single function.
There's a middle ground - sectioned code, esp. with code folding or similar commented sections. Good IDEs support such a feature so you can hide the code you think you understand to not look at it every time.
You will have to read this code at least once or trust the documentation and name. I found that in "mature" code the latter is a recipe for disaster and days of debugging.
Wasn’t that the purpose of the long comments: to sketch out the high-level logic while keeping all the code in one place? Jumping around, you now have to do that to be sure of the details, and the complexity can be hidden in layers of functions. Complexity that often matters.
By choosing the right names for a function or class a lot of jumping around can be prevented. Most IDEs also have a feature (including key combination) to show the documentation of the method/function. The only reason that remains is when you doubt the correctness of the method and need to look at its definition.
That one remaining reason is critical, though - skipping the check will net you tons of debugging all the time, because hidden assumptions are often not documented (or otherwise visible).
That applies recursively to the function you're just reading :).
I.e. I wouldn't be inside a particular function of a particular module if I didn't have to know something about its implementation. There's a good chance I need to understand all of it at the level of abstraction of the module (often because I'm supposed to change something about it). Making that less painful leads to a better and less bug-inducing experience.
Elsewhere[0], 'usrusr brings attention to nested functions, lack of which I see as a huge factor contributing to overeager splitting of code. With nested functions, you can have "best of both worlds" - a function whose implementation is divvied up into well-named pieces, while keeping those same pieces in the correct conceptual place, close to where they're used, and restricted from polluting unrelated code.
I would make no claims to being an exceptional programmer, but fwiw I don't like nested functions - it always takes me a lot longer to reason about what a function is doing, when it has functions defined inside it.
This can depend on the syntax and semantics of nested functions.
Pascal nested procedures are pretty easy to parse and if they're defined before the `var` block then you don't need to worry about non-local state modification (apart from the parameters, but modifying parameters is unusual).
First-class nested functions with variable capture are harder to understand. Nested functions in e.g. JS are more like object construction than function definition, and it's generally expected that such nested functions will be capturing state from the enclosing context.
Standard Pascal permitted outer-scope variable access combined with downward funargs - the nested functions could be passed to other functions, but could not be stored in variables or returned as results. This reduces the non-local effects, and handily doesn't require a GC to ensure memory safety, since the captured state can stay on the stack, because the nested function won't outlive the frame.
For nested decomposition of a problem, I'm more of a fan of Pascal-style nested procedures than nested function expressions like in JS. Idiomatically, one expects function expressions to capture state, while nested procedures are just more procedural decomposition.
I second the answer about having them before var-block: In python I always follow the style of having nested functions directly after the outer function header, and only going one layer deep. This way the only thing in their scope is their parameters as well as the parameters of the outer function. Using this restriction I find them very useful and concise - much better than having one-off helper functions in the outer module scope.
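Rust, for what it's worth, makes this distinction syntactically explicit, which makes for a handy illustration (a sketch, nobody's production code):

    fn total_clamped(xs: &[i32], threshold: i32) -> i32 {
        // A nested `fn` item: it cannot capture `xs` or `threshold` at all;
        // everything must come in through parameters (even stricter than a
        // Pascal nested procedure).
        fn clamp(v: i32, hi: i32) -> i32 {
            if v > hi { hi } else { v }
        }

        // A closure: like a JS nested function, it captures `threshold` from
        // the enclosing scope, so you have to read the context to know what
        // state it depends on.
        let clamped = |v: i32| clamp(v, threshold);

        xs.iter().map(|&v| clamped(v)).sum()
    }

    fn main() {
        assert_eq!(total_clamped(&[1, 5, 10], 4), 1 + 4 + 4);
    }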
I have read an extreme counter-opinion in some J or APL article. It said that it is bad practice to name small and common functions:
The example was maybe the average function - and the reasoning was, if I recall correctly:
1. The definition is shorter than the name "average" ;)
2. Every practicing programmer will recognise the definition as a common idiom
3. From the definition it is immediately clear how corner cases are handled (e.g. a zero-length array)
I just leave this here as an example to show that programming communities/cultures exist with completely different/alien? ideas about what clean code is ;)
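If anyone's curious, the classic J definition of average is `+/ % #` - "sum divided by count" - which is presumably the idiom meant. Point 3 even survives translation out of the APL family; in this Rust rendering of the same argument, the zero-length behavior is visible right in the expression:

    // The definition is the documentation: dividing by `len` makes the
    // zero-length behavior (NaN for an empty slice) visible at a glance.
    fn average(xs: &[f64]) -> f64 {
        xs.iter().sum::<f64>() / xs.len() as f64
    }

    fn main() {
        assert_eq!(average(&[1.0, 2.0, 3.0]), 2.0);
        assert!(average(&[]).is_nan()); // 0.0 / 0.0
    }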
Allegedly. And then you suddenly want to write a SIMD version of an average or optimize it... Mass search and replace time? That'd bloat the code a lot.
Common repeated well defined and mostly immutable code is best left as functions. This is why for example in C strcmp exists instead of everyone writing a two-liner - and the specialized variant gives big performance gains.
For Java programming, I definitely use IDE. Even for Erlang programming I use Emacs with EDTS to get a more or less IDE experience.
I'm saying that, even with that convenience, I don't think wrapping a line or two in its own function is more readable than just adding a comment above those two lines within the context of a (somewhat) larger function.
A technique I use quite a bit is to group functionality within a method using `#{` and `#}` to bound the code doing the thing. It gets you the "grouping" idea of lots of methods, but if the code is only used in one place, there is no reason to pull it out into a method.
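Something like this, I take it - a made-up sketch with the markers adapted to `//` comments (the original `#{`/`#}` suggests a language with `#` comments; some editors can fold on such markers):

    fn sync_volume(name: &str) -> Result<(), String> {
        // #{ validate the request
        if name.is_empty() {
            return Err("volume name must not be empty".into());
        }
        // #}

        // #{ resolve the backing store (stand-in logic for the sketch)
        let store = format!("store-for-{}", name);
        // #}

        // #{ record the binding
        println!("bound {} to {}", name, store);
        // #}
        Ok(())
    }

    fn main() {
        sync_volume("pv-0001").unwrap();
    }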
I like this idea and will try to apply it to our code. Why do you prefer it over splitting the code out in a different method? Is it so you can read everything in one glance? If so, an ‘inline’ feature could be added to IDEs to show definitions inline.
Also, when using applicative or monadic style programming there is little reason not to split things off into small separate functions and chain them together in the right order.
Yea, it probably doesn't make sense for every coding style. I just find that if I split things apart too much it turns into spaghetti that I have to try to weave my way through to understand what's going on.
There are definitely places where it works out really well though.
I think it might come with experience. I definitely gravitate to very small classes/functions, personally preferring classes to fit within a single screen.
Obviously this isn't always achievable.
It comes down to: "What do you want to focus on?" Each drill-down should be to a lower level of abstraction. It is separating the what from the how.
It results in functions/classes that either detail a flow (a set of decisions) or that implement actions. For example, I generally don't need to know how to read a file from disk when I am trying to decide whether I should read a file.
When looking at code you drill down in a very vertical fashion when you want to know "How does it do operation X?" Meaning the operation largely can be understood in isolation. There shouldn't be a lot of context that is required for it. A side-effect is that all the unit tests are very self contained.
This ultimately steers you towards a much more functional and compositional programming style. Certainly it is moving complexity around, but the end result is heavy compartmentalization and limiting the context required to understand any section of code.
I like your description of separating flow from actions, I try to do that too. I want my functions higher up the abstraction stack to read like they are orchestrating black boxes of functionality.
In general I prefer smaller functions. I think the threshold is when they become hard to name. Then I will step back and think about whether it's worthwhile.
I really enjoyed the Ousterhout book as well. It's my current user manual for reviewing pull requests.
Putting my PR reviewer hat on I would say the code in question would pass muster if it were relatively stable so you would not be constantly redoing the comments. I love nice comments but Golang does not give you much help keeping them in sync.
(Weirdly enough I'm working on a PR for persistent volume documentation at VMware today so the code is very apropos.)
Now, I'm squarely in frontend web-app development right now, which definitely changes things (mainly, the complexity is centered around enabling fast changes/additions to the codebase, and not the actual business logic for the most part), but while "deep functionality and small interfaces" sounds good on paper, most of the time giant files with a few exported functions aren't a good way to manage that.
Sure, it solves the problem when viewed from the outside. "users" of the software (users being other devs in this case) get a nice small interface and docs that explain how to use it, but internally it's much harder to work with. Having everything in one file like this without breaking it into "sub modules" for various parts of the module means that you need to almost have a complete understanding of the module before working on it.
In this case, I have a feeling that is a pro not a con. Because this file is so core to the system, and has so much complexity, that breaking it up into smaller parts could cause a dev to feel like they understand it only to find out they don't after it's released. And it means that any devs that truly do understand it top-to-bottom would just waste a lot of time switching around files if it were split up.
Putting it all in the same file here nudges you to really make sure you understand it top to bottom before making any big changes. It's intimidating and scary for a reason, because at its core it is a complex piece of code, and dressing it up in "simple code's clothing" won't help.
Breaking things down into smaller pieces is good when it’s good... but there are drawbacks that I feel are often ignored.
For example, as you say, a small piece of stand alone code implies that someone can do meaningful work on it without understanding the entire context around it. It also implies that it is suitable for reuse. But if it’s both reused liberally and encourages you to keep working on it, it will almost certainly mean that your changes will have unanticipated consequences. So you still can’t get away with being unaware of the context.
When you break things apart you also fossilize that way of approaching the problem, which often makes it more difficult to see orthogonal approaches and refactor towards them later. Instead you keep working within the structure that’s already there, which often leads to concerns being spread out across different modules. Too often people decide on the structure before they even understand the problem.
I think it has similar problems as religiously following the DRY principle. There are so many situations where your code will be much, much worse if you insist on always sticking to DRY.
> When you break things apart you also fossilize that way of approaching the problem, which often makes it more difficult to see orthogonal approaches and refactor towards them later.
This is particularly true on teams where a significant proportion of the developers is reticent to refactor code as they go. It seems that given a developer with a sub-80th (or so)-percentile propensity to refactor, the more broken up the solution is (into smaller functions, methods, classes, modules, etc.), the less likely that developer will be to refactor the solution when an obviously better approach exists.
> In this case, I have a feeling that is a pro not a con. Because this file is so core to the system, and has so much complexity, that breaking it up into smaller parts could cause a dev to feel like they understand it only to find out they don't after it's released.
I think you're on the money here. This was done intentionally, to ensure that devs understand the whole thing before making changes, because the functionality is so critical and so easy to mess up.
Also, breaking single logical things up into smaller files (rather than into a composition of smaller logical things) just leads to unnecessary file hopping and wrecks locality of reference for the dev.
> Having everything in one file like this without breaking it into "sub modules" for various parts of the module means that you need to almost have a complete understanding of the module before working on it.
There's a balance to be struck here; you want to minimize the size of the code a developer has to understand to work on (or with) a given abstraction, but you don't want to split beyond that point, as it only makes the developer jump around files. In my experience (with Java in particular), current development trends involve splitting the code too much.
Java (like many other languages of that generation) suffers from a lack of idiomatic 1:1 visibility. Whenever you split something up in Java it litters a namespace that is much bigger than necessary. Even private is too big when the class is full of tiny methods most of which most will never be meaningful to any of their peers except for that one call site. Sure, you can create inner function objects and with 8+ it's not even completely weird anymore, but that's still a far cry from the nested functions goodness of Pascal style "structured programming".
Agreed. I don't remember much from my brief exposure to Pascal years ago, but spending a good chunk of my programming years in Common Lisp has spoiled me; the simple ability to nest functions is something I sorely missed when working on Java codebases.
One of my favorite features of Haskell is that I can use where to write the auxiliary function definitions after the code that use them, but keep them scoped inside of a single function.
This runs the risk of strongly coupling them to the main binding and making them less reusable. IMHO strong coupling where two bindings are aware of each others' internals should be avoided.
> That you should strive to build modules which have deep functionality and small interfaces...
this. interfaces are not the most important thing in software, but definitely one of the most important. you can live with a crappy implementation, but your interfaces must cater to the use case and should only change when core premises change.
Haven't read the book. It goes on my wish list. I completely agree about the encapsulation of complexity inside the module.
It's similar to UX design principles. Google and other successful sites present a very simple interface (in Google's case the home page is quite clean), but behind it lies very complex code.
> that smallness-of-file or smallness-of-function is not a target to shoot for
I disagree. Unless you are methodical and know what you are doing (like NASA or the authors of Kubernetes), it is hard to create large functions and files while keeping levels of detail consistent and not repeating code.
How do you test a function that has 100 unique outcomes? How do you safely maintain it to ensure it won't break? How do you even know it's working?
Write automated proofs? Probably best but tooling is junk - it's rarely supported.
Constraint or property based tests as opposed to value tests? (That depends on how unique.)
Usually the only thing that does unique things is a direct mapping, everything else is either an equation or has other preconditions, postconditions and internal or external properties.
It also makes the logic more directly visible and refactoring easier presuming tests are reasonably written.
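A hand-rolled sketch of the property-based idea, with no external crates (real projects would likely reach for something like the proptest or quickcheck crates):

    // Instead of asserting specific values, assert an invariant that must
    // hold over many (pseudo-random) inputs.
    fn clamp(v: i32, lo: i32, hi: i32) -> i32 {
        v.max(lo).min(hi)
    }

    fn main() {
        let mut seed: u64 = 42;
        for _ in 0..10_000 {
            // Tiny xorshift PRNG so the sketch stays dependency-free.
            seed ^= seed << 13;
            seed ^= seed >> 7;
            seed ^= seed << 17;
            let v = (seed as i32) % 1000;
            // Property: the result always lands within [lo, hi].
            assert!((-10..=10).contains(&clamp(v, -10, 10)));
        }
    }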
I agree. My day job is working on code that isn't this level of critical, but also has the characteristic of being low level, both closer to the metal than typical backend code and also called by so much frontend and backend code that if there was such a thing as "even backend-ier code" this would be a good example. If you miss a nuance, a horde of angry developers will show up at your desk the moment the build deploys, and if you don't fix it fast, that turns into a horde of angry customers.
With code that far down in the inner loops of mission critical code, the dominant time cost isn't really writing or even understanding the code, it's ramping up on the nuances of the scenarios you serve, checking the direct effects of your changes, and then checking the second order and third order effects of your changes.
When your code is so far down that every change you make has third order effects, it's a lot easier to maintain code that is meticulously commented and written for maximum thoroughness explicitly baked into the file in front of you, because you're much more likely to miss the nuances if you just finished piecing together ten different files to figure out how the happy path works.
Well said. I feel that 80% of this thread is people talking past each other, with different assumptions about what the code in question is going to do.
I hope you mean x (multiplication) rather than ^ (exponentiation). If we're talking about fanout (each of 10 items has 10 subitems, each of which has 10 subitems), multiplication is the relevant operation. And 10^(10^10) or (10^10)^10 is a hopelessly, uselessly, inconceivably large number. (Exponentiation is not associative; 3^(3^3)=3^27=7625597484987 while (3^3)^3=27^3=19683.)
I took it to mean something fairly vague about potential super-exponential blowup in the complexity of the states you have to reason about, when you have to reason directly about third-order effects in a code base.
I completely agree. For code that is unavoidably complex, I love this style too.
I am all for code that is concise and whose syntax/naming is expressive, but sometimes comments are necessary to clearly spell out the logic or business use case. Expressive code can only go so far. Well-crafted comments significantly reduce the amount of time required for other developers to dive in and become productive with an unfamiliar code base.
The key is keeping the comments up to date. There's nothing worse than an inaccurate comment. One's code review process must include a review of the comments accompanying the modified lines.
Outdated comments that explain the business use case or purpose are still better than no comments. It gives you background information how the code evolved or what it was supposed to do.
It's probably because reading only the comments is worse than reading code without comments that some devs have developed an aversion towards outdated comments, and thus towards comments in general.
Comments are additional information, not a source of truth; always take them as that, and read both the code and the comments.
Yeah, and over the course of 2 or 3 decades of development, a lot of software has moved through several different version control systems, ticketing systems, and developers.
So you wind up with files stating an author who no longer works there with an email address that the company used 3 acquisitions ago, a ticket number you aren't even sure what system it was for but you just know it isn't being used anymore and source control history that goes back 5 years out of a total 25 years of development.
I've found it very helpful to rubber duck comments to get into the right why mode. Consider how you'd explain this piece of code to a less experienced guy on the team - and write down exactly that. Why was this added? Why was the old thing in place changed, what broke and had to be changed? Why didn't you do the other obvious thing? And remember - usually you should explain something until it's clear for you, and then one more step.
About five years ago I worked on a codebase with a similar bit of code. It wasn't nearly this big, but it was branchy, procedural, and verbosely commented. I didn't write the initial version but worked on it quite a bit and learned to appreciate the advantages of the style for the nasty bit of logic it implemented. I ended up having to vigorously defend it against another developer's half-cocked attempt at "refactoring", which got deployed more than once (behind my back) and thankfully broke in pretty obvious ways each time.
I warned him at the beginning that he needed to spend a couple of hours wrapping his head around the code before trying to modify it.
I warned him that the code arrived at its current state after long and painful experience.
I warned him that other developers had worked on the code, had exactly his initial reaction, and eventually admitted they couldn't find a way to improve it.
I warned him, after he confidently declared that it just needed to be "more object-oriented," that I had written a lot of object-oriented code, was open to writing it in an object-oriented style if it would improve things, and could not think of any way to do so that would not make things worse.
But he could not even bring himself to read the existing code. He never did. He worked on it for three weeks, wrote thousands of lines of code, and even tried to deploy his own version without ever spending a single contiguous, focused, hour-long block of time reading the existing code. He was "refactoring" the whole time.
The code he produced was exactly how you imagine it. The code went from several hundred lines in a single file with no classes (just a containing class acting basically as a namespace) to thousands of lines scattered over half a dozen files with at least that many classes. The guy kept adding classes and kept adding test after test after test. He couldn't get his code to pass the test cases I had written, so several times he declared that my test cases were wrong and changed or removed assertions. He also claimed that the existing code couldn't "pass" his tests because he had hundreds of lines of tests for components that only existed in his code. According to him, this proved that there was a ton of "hidden untested functionality" in the existing code, which was therefore "dangerous."
I was pretty busy with other things, and this was his project now, so if he had been a little bit more clever he probably could have made his abomination stable enough to replace the existing version over my objections. Thankfully it was never good enough to survive in production.
Why do all anti-OOP posts sound like completely unlikely, exaggerated lies? A few hundred lines to thousands, with dozens of files and classes (gasp). Lol
I don't have anything against OOP as a style to have in your repertoire and use when it's beneficial. Unfortunately, OOP was promoted in a pernicious way for a couple of decades, resulting in stories like this where a guy who
* didn't understand a difficult piece of code
* knew that the code was written and maintained in its current form by multiple programmers known and respected by him who had more exposure to the problem than he did
* knew that said programmers were well-versed in OO programming and had considered and rejected OO style as a way of improving the code
... this guy, because he had learned software development in OOP's heyday, simply knew as a self-evident truth that the code ought to be OO, because OO was better, and if the code was hard to understand then the proper way to approach it and try to understand it was to incrementally refactor it into classes. And he never stopped believing these things even after his multiple failures.
This is not a knock against OOP. This is a knock against the people in the 1990s and 2000s who thought software engineering was a solved problem, thought class-oriented OOP (exemplified by design-patterns-driven corporate Java style) was the solution, and thought the only hard problem left in the field was to teach all programmers to understand and accept these truths.
> A naive look at this and my head is screaming that this file is way too big, has way too many branches and nested if statements, has a lot of "pointless comments" that just describe what the line or few lines around it is doing, and has a lot of "logic" in the comments which could quickly become outdated or wrong compared to the actual code.
I actually understood it immediately when I read it. There's something to be said when you have a decent roadmap for debugging critical code segments. It's not just software maintenance here, it's for the "holy crap something went wrong and we can't understand how we got here!" moments.
This is why I prefer older JavaScript to the more recent stuff, with transpilation pipelines and AI-like optimizing compilers that gnaw on giant state trees.
That stuff is lovely and terse when it works. And when it doesn’t you are at a total loss.
The beauty of imperative code is there’s a beginning and an end, and at every step you can easily see what follows and without too much work what came before.
I don't understand the objection to having more, smaller files, at least in Go where they can all be in the same package.
Once two functions are too far apart to be on screen at the same time, jumping back and forth between two functions in the same file doesn't seem any easier than switching between different files. If anything, switching between two different files is easier since they each get an editor tab.
On the other hand, being wary about extracting functions (which moves things logically together further apart) makes more sense.
For me, it's less about the number of files than it is "how many files do I have to open to figure out how something works? How many levels of indirection do I have to keep in my head?"
I started out writing low-ish level code. Motion control, image processing, digital imaging, and the application-level code that coordinated it all. I've steadily moved up the abstraction tree over the last 13 years and there's one thing I hate about it: too many functions and classes which do far too little. Abstraction for abstraction's sake.
It's a bit subjective of course, and everyone has their own style, but there is a stark difference in philosophy between the two types of groups.
Anyone who actually worked with code requiring so much jumping.
I did, in an IDE, and I can tell you, jump-to-definition removes the problem of "what file do I have to open now?", but still leaves you with the questions like "where am I?", "how did I get here?" and "what was I trying to understand, again?", which you start asking yourself after ~sixth jump.
Not to language bash, but situations like this are why I avoid "enterprise" patterns and abstractions. Not that they aren't ever useful, but they're often implemented before they are needed, or not needed at all in practical terms. Some C# and Java tend to be some of the worst examples I've encountered. Even if the languages themselves can have much simpler implementations.
Java has these scaffolding developers who spend a great deal of time writing 'ticket booking', 'toy/video store', etc. apps with whatever technology, framework, or libs they are looking to learn. None of this code does anything useful, but the developers add it to their resumes, claiming they developed reservation systems and e-commerce platforms with MongoDB or whatever.
I'd add this kind of bullshit started directly from Sun Microsystems when they gave us Java EE blueprints to develop end-to-end enterprise solutions in all seriousness.
It'll take me one level up, which is not sufficient to answer my question, and by the time I find my bearings again, I have to jump back down a couple levels.
I've done a lot of this, both for Common Lisp codebases in Emacs, and for Java codebases in IntelliJ. Having to jump around and remember stuff eats into "7 ± 2" short-term memory limit that you need for the code you're working on.
(With Emacs, it's at least a tad easier to split the window to have 4+ different in view at the same time.)
"far too little" it sounds good but you need to think about what you're saying. Do you want a function to be doing too much or too little. I think any sensible programmer will choose too little. It's easier to understand. This isn't about abstraction and the trendy fashion that is the irrational fear of it nowadays. Is about people being better at tackling problems in small doses.
This is what a lot of Go code looks like. This "space shuttle" code honestly isn't much more verbose than most Go code I interact with. The main difference is they have more comments here.
Likewise, I've seen a lot of Go that is far away from the idiomatic Go heaven that is the standard library. Although that happens in every language as soon as there's reasonable complexity.
As for the "space shuttle" term, it's more of a euphemism for "do not even attempt to refactor this mess, it's so complex that you'll surely fuck up if you do, and you'll make it harder for the one guy who can actually understand this"
It's not used very often here, either. Despite their claims at the top of the code of every `if` having a corresponding `else`, there are 140 `if`s, but only 20 `else`s and a mere 5 `else if`s.
It's freaking Kubernetes, so obviously it will be a file well understood by a LOT of people. However, this is the kind of file that also exists in unknown, non-open-source products, and while it may have been understood by lots of people (at the respective companies), sometimes a period of 5 years of inactivity and layoffs and hiring goes by... But gosh darn it - this is it. It's the reason we have shitty code but great products.
Straightforward concepts should be coded concisely and complex concepts should be coded verbosely. The commenting in this file slows the reader's thinking down to an appropriate level.
To be honest, I remember struggling a lot reading code back during my first dev steps, since a lot of the senior programmers tend to code very tersely, so it was extremely hard to follow an end-to-end solution without terrible headaches.
This is just brilliant, not only because, as an educational exercise, it explains perfectly what the code is doing, but also the business context and the thinking process behind it.
Never read 1k lines so quickly before, and never enjoyed it so much.
As a novice programmer, I was absolutely stunned that this was not standard practice. A typical source file provides zero context, background on the subject, pointers to reference material/blog posts/books explaining the concepts, information on how it fits into the program's 'bigger picture', or (most importantly) the thought process that resulted in the file (i.e., why the choice was made to do _this_ rather than _that_, challenges faced, trade-offs, etc.).
It still baffles me. Every programmer has to start from scratch dealing with a new codebase, and it makes improving any non-trivial program impossible unless one is willing to spend hours of archaeological examination. To open-source developers: if you want to get people contributing to a project (and make everyone's effort much more enjoyable!), these sorts of comments are essential. Not to mention they'll save everyone boatloads of time; it's a shame that every programmer has to piece together knowledge from scratch, rather than being 'tutored' by their peers' comments.
There's plenty of good reasons to not write 95% of code with big walls of explanation. The first is a matter of cost: Writing a good explanation around everything is very expensive to do at first. A whole lot of the custom code you find in random companies, from the shiny SV startup to the old enterprise, is unimportant, cobbled together pieces. We have no idea of whether we are writing code that will be thrown away in a week, a month, a year or whether it will last two decades. Context can change fast enough that the comments become worse than useless, as the terminology might have changed, or had been misguided in the first place, turning the long comments into outright unintended deception. This gets even worse as we do not evaluate all the comments in all the files whenever there's a code change: It's crazy how a large comment block in one place can become harmful after it's forgotten, and someone else makes a correct, business critical change in another file. No matter where I am working, it's rare for me to not find multiple examples every year where the code and the comments provide very different impressions of what is going on, and it's the code that is accurate.
This is not to say that there aren't reasons to write large comment blocks, or architecture documents, but that they are often better written not while the system is being first built, but later, in a maintenance cycle, when someone already had wished for the comments, and has regained the knowledge the hard way. By then, it's clear which part of the system are dangerous, suspicious and unclear. Where there's more need for high quality error handling, and where thousands of lines of error handling never get hit, because the failing case that was originally considered doesn't really happen in this dimension anymore.
Writing code so that someone, even a future you, can pick it back up and improve it when it's needed, while still delivering the code at a good pace, is a kind of skill that many just don't learn, either because they are always in greenfield teams that never pay for their mistakes, or because they have an approach to maintenance that involves not becoming intimately familiar with a system, and instead either hacking or rewriting.
But nobody looks great in a resume by saying that they are a specialist in software maintenance.
> This is not to say that there aren't reasons to write large comment blocks, or architecture documents, but that they are often better written not while the system is being first built, but later, in a maintenance cycle, when someone already had wished for the comments, and has regained the knowledge the hard way.
I don't think the "Lean Manufacturing" approach works here. By the time someone "pulls" you for a comment, you've already lost the most important knowledge that should go into the comments - why the code is the way it is. Maybe you'll recall it when asked, hopefully not missing anything crucial. Meanwhile, comments are extremely cheap to write as you're writing the code, and even before you're writing the code (you did spend the time thinking about what you'll write, and aren't just "coding from the hip", right?).
Commit logs are also cheap to write, and it's easier for people to realize that whatever you read in "$vcs log" or "$vcs blame" might be severely outdated.
I downvoted because, although I think what you are saying is the conventional wisdom, it is really, really bad advice and mixes up causality. Projects which both do their job and are this well documented attract developers, both to maintain and to reuse. Look, for example, at everything done by Armin Ronacher, or at SQLite.
And writing documentation only seems like a waste of time to the developer who just finished writing the code. It wastes their time. But spending time now to write good documentation saves the organization as a whole much more of other developers' time in the long run. It's a smart investment to make, even when the collateral damage is some time wasted documenting code that really does get thrown out.
(But, if you actually document the code well you'll find out that you throw things out much less often.)
I don't think downvoting is a good way to show disagreement.
I disagree with what you wrote. It does not waste developers' time, it wastes the money of the business owners. Developers are usually paid for their time even if they are reading HN instead of working.
Now the question to the people who pay the money is whether they want to pay for "something in the future maybe will be useful". They will say hell no! They want the time to market to be as short as possible and as much end-user value delivered as possible.
You take SQLite or Armin Ronacher as examples, but those are exceptions. There is a lot more software out there that was not written by Armin and is not SQLite.
This is why the business is usually not asked to make engineering decisions. Things like backups, disaster recovery, and redundancy could all be described as "something in the future maybe will be useful".
> We have no idea whether we are writing code that will be thrown away in a week, a month, a year or whether it will last two decades.
Sometimes code is also meant to be temporary, but is later deemed too useful not to keep using, while no resources are allocated to its maintenance (or, in this case, refactoring). That's the case with many internal tools.
I remember my first internship: I was on the configuration management team and responsible for the smoke tests. I made some tools for myself to speed up the process, and once my manager saw how useful they were, he was interested in giving them to the QA department. I refused, because they broke every few builds, and I knew that after my internship I wouldn't be there to maintain them; it would just have created more problems. There are so many instances where the tool doesn't break as often, or where the person doesn't consider that they will no longer be there to maintain it, or won't have the time.
I think that the example given is a great example of code that will be around for a while... as mentioned, the logic was in 3 files and needed heavy refactoring. That refactoring has happened and the logic is well documented. In this case it isn't going anywhere.
That said, I mostly agree... I tend to favor a discoverable code base, where the directory structure is organized around features... for example, a given feature may have a few controls, some actions, state logic, an API client for a foreign system, etc. I organize by the feature, not by the types of files/components that make up those features.
I tend to favor using abstractions only when they are simple enough, or when they make the related code much simpler and more understandable. The exception to this is necessary complexity, such as an enterprise product that needs to support being deployed on Oracle, SQL Server, etc.
Simple, replaceable code is often better. That said, when you have a broadly used product and a piece of functionality that is well used, moderately complex and unlikely to change dramatically, this level of detail isn't a bad thing.
Code without specification is a maintenance nightmare, a pure liability, a ticking time bomb.
And sure, unless you're creating the control system of a nuclear reactor/warhead you don't have to go full Coq and CMMI level 5 and "high assurance" and whatnot, but spilling a few sentences as a minimal kind of pseudocode before writing what you want (a function, a class, a method, a build-system change, a refactor of a big ball of ifs) is almost ideal. It helps you double-check yourself, and it helps reviewers too. (Throwing in links to some issue tracker is also nice, but the links alone are not very helpful.)
If you have to write docs later, it's hacking (reverse engineering), not engineering.
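To make that concrete, here's what "a few sentences first" can look like in Go (all names are hypothetical, loosely themed on the volume controller under discussion - a sketch, not the actual code):

    package main

    import (
        "errors"
        "fmt"
    )

    // reclaimVolume decides what happens to a volume whose claim is gone.
    // The plan below was written as plain sentences before the code:
    //   1. Look up the volume's reclaim policy.
    //   2. "Retain"  -> do nothing; an operator must act manually.
    //   3. "Delete"  -> remove the underlying storage asset.
    //   4. "Recycle" -> scrub the volume and make it available again.
    //   5. Any other value is a programming error and must fail loudly.
    func reclaimVolume(policy string) error {
        switch policy {
        case "Retain":
            return nil // step 2: the operator's responsibility
        case "Delete":
            return deleteStorageAsset() // step 3
        case "Recycle":
            return scrubAndReuse() // step 4
        default:
            return fmt.Errorf("unknown reclaim policy %q", policy) // step 5
        }
    }

    // Stubs so the sketch compiles; the real work would live elsewhere.
    func deleteStorageAsset() error { return errors.New("not implemented") }
    func scrubAndReuse() error      { return errors.New("not implemented") }

    func main() {
        fmt.Println(reclaimVolume("Retain"))
    }

The sentences survive as the comment block, and a reviewer can check the code against the stated plan step by step.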
Personally, I consider that if the code doesn't speak for itself, it's most probably bad code; if it's not, add a comment explaining why it's not.
> spilling a few sentences as a minimal kind of pseudocode before writing what you want
Isn't that simply repeating the code you'll write? Which, in the end, will be just as bad as the code alone, or will waste the time of whoever reads it afterward.
Sure it's hard to write good code, and sure it's not always obvious what's bad code, but that's a big reason why the review process is there. It's not only to catch mistakes; it's also to see whether someone else can understand what the code means, the context, the decisions, etc. Sadly, many just go over it quickly and only check for obvious mistakes.
> As a novice programmer, I was absolutely stunned that this was not standard practice.
I agree with your point, and I would benefit from this style if it were the standard, too. But don't you think a good community culture can make people maintain a good git history for this purpose?
My day job is Linux kernel development. I've found that source code is only the "What" part; commit messages can and should state the "How/Why" part, and if all of that still makes no sense to me, I look to the original mailing list for the deeper "Why". Most of the time the information is sufficient.
I've never seen a git commit comment describing the "why" of code. "Added foo.\n\nImplemented az Bar because of blorgz." isn't nearly enough of a rationale, and that's the best description I see people making.
Also, the whole point of putting something in a comment is that you have to read it when working through the code around that comment. Putting a note in a commit log instead ensures that crucial information about the code will not be visible, and most likely not read at all, when working on that code.
Look at projects with high discipline and experience, such as the Linux kernel. You will find plenty of "why" examples.
It seems to be easy to find plenty of projects with bad Git commits though.
IMO the big architectural guidelines, the structure, other highest-level concerns, API contracts, etc. should live in an external file (not code). The high-level details around a particular implementation belong in the commit logs for that file or files. Relevant implementation details and important notes belong in the code. The lower you go, the closer to a comment line in the code you should get. This also reflects the rigidity of the code: the high-level architecture should not change often; if it does, then the architecture is not really ready.
Things should certainly be documented, but not in great detail inside the code files. A lack of documentation is, in my opinion, OK; the code is always the last word on how things actually work anyway. Misleading or wrong documentation is the absolute worst.
The problem with trying to document anything in a commit message is that a commit message can't easily and unambiguously point to the specific section of code that each part of the commit message refers to. You can get general intent in a commit message, specific details in the code, and a yawning void in between.
Sure, the back-and-forth on a mailing list or a review system such as Gerrit can provide more of those direct links between code and explanation, but then it's often buried in a bunch of other discussion and dead ends. It's the "oh noes, comments can become out of date" problem times ten, so hardly a good alternative. And even when that's not the case, it's a horribly context-switchy way to get that information. Separating the explanation from the concrete expression just isn't a great idea. Funny how many people who would rather die than write a design spec will also suggest that an even greater separation is a good thing.
> it's often buried in a bunch of other discussion and dead ends
If your "why" doesn't contains this, I have an hard time understanding why you want it in the first place.
The "why" is important to not make the same mistake. Theses others discussions and dead ends are all related to the issue, they are all questions that was asked/answered during it, they are all mistakes that may have been made.
Personally, every time I needed only a tiny bit more context, the commit message was enough; when I needed more, the full related ticket was essential (and not a simple "why", but the full thought process that led to the decision).
> Funny how many people who would rather die than write a design spec
I'd rather die than lose time over something I consider won't save time in the long term. Writing any documentation takes a long time, and if that time is greater than the time the documentation ever saves, it's a net loss.
It's pretty rare that we need to go back to that information; when we do, the ticket information is enough, and if it's not, redoing the thought process the rare time it happens isn't so bad (and the new ticket's discussion will provide more context, too).
For sure, if your ticket just says:
- Break when we enter text
- FIXED
you are going to have a pretty bad time, and you need to change that.
> If your "why" doesn't contains this, I have an hard time understanding why you want it in the first place.
The "why" doesn't need to include every minor style nit (all the way down to variable naming) that came up during the review. It doesn't need to include every "I would have done it this way instead" comment which was in neither the original nor final version. That's noise, not signal.
> Personally, every time I needed only a tiny bit more context
Lucky you. For much of the code I have read or written, that was not the case. In general it tends to be less and less the case as code climbs up the complexity/innovation scale.
> I'd rather die than lose time over something I consider won't save time in the long term.
Is that an unavoidable issue with the medium, or more of a reflection of how some people are bad at writing? I've whipped out a spec for a medium-complexity feature or component in under an hour, and been thanked for it five years later. Many times. If you've never had that experience, then I can only say I hope you'll be able to some day.
> when we do, the ticket information is enough, and if it's not
Again, lucky you. Others have a different experience.
Not all organizations have such robust communication structures, low levels of dysfunction, or git hygiene. Also, as mentioned, in a corporate environment where the software is proprietary and certain source files may sit dormant long enough for the original authors to move on from the company, it's important that they don't take the "how/why" part with them, so to speak, and leave the next maintainer SOL.
As much as I would implore you to document the why as a comment in the code after your experience, I have also learned the hard way that 'why' can be a moving target given enough time.
The project I'm working on has over 150 commits per day. It would take literally months of work to get up to speed with the codebase if you did it by reading just the git headers, not the diffs. This is considered a small project.
Histories of active projects grow with age. A well maintained artifact should plateau in complexity.
I think both comments and Git messages will depend on culture; in my local / current software development culture, an emphasis is put on the code being readable enough to be obvious (this isn't true by the way, even if it's good code). Nobody reads comments, nobody reads git history - mostly because they will always be outdated / no longer relevant.
The assumption is that the original author knew what he was doing, and if not, there was a code review and anyone can fix it if they see a problem.
Finally, it's code that probably won't be around anymore five years down the line, so detailed commit messages etc feel like a waste.
(Mind you, I don't agree with the above. At the same time, I don't want to do this type of meaningless throwaway work anymore)
> unless one is willing to spend hours of archaeological examination
Currently the only person in-office over Christmas on my first software dev job. Debugging a 50k LOC COBOL beast that digs into three other beast programs and ends up in a final 20k LOC uncommented piece where things are supposed to happen and be returned back.
Nothing is commented, the programs are huge, and one can only debug one program at a time, requiring me to submit untested changes in one of them to the shared dev environment (since I'm making changes in two) before debugging the other program.
If it just said somewhere what half of the stuff is I would save an insane amount of time getting to know the system.
I'd heard about COBOL for ages, have always been into retro computing, and had set up z/OS on the Hercules emulator at home, so I was offered COBOL training from a company, plus two years of guaranteed employment (with pay even if they decide I'm not needed), to join a team that force-retired 40% of its developers this year.
I have no real previous development experience, only some Java and Python, since I chose an industrial engineering program at uni, and I got the offer the semester before my CS-classes (my chosen specialization) started.
Changed my university classes to equivalent distance classes and took six months off from school to do the training and start the job.
I think your desire is right, but think about doing this every time you create a file, and how much slower your work would be. The question then becomes: "exactly how much commenting is needed before it costs more time than the technical debt it prevents?"
I think this type of summary should not be per source file but per package/folder/module/project. A high-level developer overview with sufficient depth will also help stop repetition of philosophy in subsequent files.
In this case, I do agree with its existence though because of both the length and complexity of the file. For many files, neither are the case.
I think this commenting only makes sense if the design is atypical. Probably 90% of the code I write is following design patterns already used throughout the code base.
Since the commenter above apparently does not understand what I meant: it does not matter whether the design is typical. The functionality still has to be described, so the pattern itself buys you nothing on its own. The overarching idea and reasoning also have to be described to make it easier for everybody else to understand.
I've met more than one codebase which tried to use a big-hammer pattern to open every door: heavy indirection to do simple things because of design choices, usually caused by a forced framework, instead of easily factorable simplicity. Patterns should be emergent, not forced. Or more specifically, the choice of an appropriate pattern should be based on the problem being solved; rarely can one pattern cleanly solve all problems.
Similarity is very seductive, but the end result is often complex and hard to reason about when it is pursued religiously.
Sometimes a lot. Because what I'm really doing in my head then is the design work (writing a sentence actually helps with that) - the work that's actually necessary to write the code well.
When I read something saying "writing thoughtful comments is too much work" I really read this as "thinking about the code you write is too much work, it's better to write whatever crap that first comes to mind".
Well people keep picking arbitrary starting points in the coordinate system. I'm saying for any arbitrarily picked starting point in the coordinate system, the time taken is always twice that of getting from the arbitrarily picked starting point to the halfway mark.
The information you are talking about is called documentation and it should exist outside of actual code files.
If you need to read comments to understand code, you don’t know how to read code. Comments can lie, only code is the source of truth, once you gain experience you won’t bother with comments.
> The information you are talking about is called documentation and it should exist outside of actual code files.
The file in the OP is Go. In Go, you are encouraged to write documentation and code in one place[0]. The end result is very nice, auto-generated, human readable documentation[1] which is tightly coupled to the code it documents.
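For instance (hypothetical package and function, not from the file in question), a Go doc comment sits directly above the declaration it documents, and go doc renders it:

    // Package pvutil contains helpers for volume bookkeeping.
    package pvutil

    import "errors"

    // Bind records that claim is bound to volume. It is idempotent:
    // binding an already-bound pair is a no-op, and callers are
    // expected to retry on error.
    //
    // This text is what `go doc pvutil.Bind` prints, so drift between
    // the docs and the code tends to be conspicuous in review.
    func Bind(volume, claim string) error {
        if volume == "" || claim == "" {
            return errors.New("pvutil: empty volume or claim name")
        }
        // ... bookkeeping elided ...
        return nil
    }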
Sure, and then the next day the whole thing is just a long boring text which has no correlation to reality because the business rule changed and the developer next to you refactored the code.
If the developer next to you refactored the code without refactoring the comments, they did a shit job. Period. They need to be told to go back and fix it.
Or maybe the original developer, who wrote code so poor that it needed 4 paragraphs of explanation, did a shitty job. Period. They need to be told to go back and fix it. You see! Your argument works both ways, yay!
If the comments are completely redundant with the code, they're bad. But you can't express in code why the code is the way it is, and why it does what it does. After you've expressed in code every piece of important information that's expressible as code, whatever important information remains must go into comments.
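For example - the numbers and the load-balancer behavior below are invented purely for illustration - the "why" often lives where no amount of renaming could carry it:

    package main

    import (
        "fmt"
        "time"
    )

    // flushInterval is 47s rather than a round 60s: in this invented
    // scenario the load balancer drops idle connections after 50s, so
    // we need traffic on the wire before that. A better name can say
    // *what* this is; only a comment can say *why* it isn't 60.
    const flushInterval = 47 * time.Second

    func main() {
        fmt.Println("flushing every", flushInterval)
    }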
False dichotomy. You are trying to make this black or white while there are many more options besides comments. There are even obvious ones besides comments.
You could. It's called an automated theorem prover, with the cheap limited version being constraint-based programming. Sadly, it's largely unavailable for most programming languages.
Sometimes writing the proof of why it should be like this is tricky, especially when complexity or performance is involved. (But then the tests are hard too.)
Linters are essentially constraint systems for code, but not for the program.
Theorem provers can still only talk about what the code does and whether or not it does it correctly. Specifying why the code is there is an AI-complete problem. Hence, comments.
One thing that I wish Linux kernel code had. Maybe it does but the few times I have found myself reading Linux code I go to the top and there is zero context in the comments, just a bunch of licensing information.
There are a few Linux drivers for particularly buggy hardware that are written in this style. Although, OTOH, the one I think of is hme.ko, where the comments are more on the hilariously funny side than descriptive.
The kernel code is nowhere near ideal nor "space-shuttle" code. It has a ton of single-use functions, which is essentially the opposite. For example:
- Magic numbers and timing values without explanation of why.
- Rarely used technical descriptions such as "Lance mode".
- Tons of unchecked assertions on locks, when the kernel provides a lockdep check for this.
- Custom logging macros.
- Flying in the face of the kernel's "goto error" error-handling convention in a few places.
- XXX comments with obvious bugs mentioned, unfixed.
- And more...
Splitting it into more files wouldn't help at all.
This. Documentation (comment blocks in code, as well as all other forms of documentation) tends to become outdated because it is not maintained in sync with every code change.
Solution: write clear, simple, modular code that is self-documenting and does not need extensive commenting.
How soon? I've rarely seen it happen in a meaningful way (i.e. other than someone using automated refactoring on some name, and breaking some comment reference that wasn't written in a way the autorefactor tool could read). When it happens and you see it, you should fix it like any other bug.
Key trick is writing comments as close as possible to the code they affect. Then it's hard to miss affected comments if one's not doing a shoddy job.
> Key trick is writing comments as close as possible to the code they affect. Then it's hard to miss affected comments if one's not doing a shoddy job.
That also helps it show up in a PR so it may be caught in code review. Even so, IME comments get missed enough that over time I can't be as confident as we'd like that they match the code.
I think it's also worth calling out that this key trick is not being applied in the recommendation up-thread, which asks for a block comment at the top of the module.
Overall liked the original post. But also liked what you said.
But here are a couple of Martin Fowler quotes (from his Refactoring book) I tend to follow:
“A heuristic we follow is that whenever we feel the need to comment something, we write a method instead.”
“Whenever I have to think to understand what the code is doing, I ask myself if I can refactor the code to make that understanding more immediately apparent.”
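A tiny, hypothetical Go sketch of the first heuristic - the comment's content moves into a method name, and the comment itself disappears:

    package main

    import "fmt"

    type volume struct {
        boundClaim string
        phase      string
    }

    // Before, the call site needed a comment:
    //
    //     // a volume can be released when its claim is gone
    //     // but it hasn't been reclaimed yet
    //     if v.boundClaim == "" && v.phase != "Released" { ... }
    //
    // After, the method name says it and the comment is gone:
    func (v volume) isReleasable() bool {
        return v.boundClaim == "" && v.phase != "Released"
    }

    func main() {
        v := volume{phase: "Bound"}
        fmt.Println(v.isReleasable()) // true: no claim, not yet released
    }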
Having spent 25+ years writing, viewing, commenting on and reviewing code in a multitude of languages, this is good stuff to see - regardless of the 'style' of programming (or the language broadly-speaking).
Stepping back, and whilst we can all overlook it, good code comments can make an enormous difference in productivity - for an individual, a team, and indeed a business. They aid repository knowledge (something that is easily lost between current and prior teams/individuals), which shouldn't be mistaken for the intelligence of someone looking at the code...
I've spent far too much time, personally and otherwise, attempting to reverse-engineer code written by someone with no comments or explanation. At times, super-experienced programmers/developers will take performance-enhancing shortcuts that the less experienced don't understand; they'll compress routines and functions in ways that stem from their deep knowledge of a language and/or domain, but without explanation...
On a basic level, comments should: inform, educate, outline and help others understand the sometimes complex routines and functions that we all create and often under an enormous amount of pressure.
There are those that believe good code shouldn't need explanation and to some degree that's true, but you can't apply that brush to every codebase. Code can become complex, awkward, spaghetti-like and almost unfathomable at times.
I've always strived to teach less experienced developers to comment well, efficiently and with a little humor/humour (where possible); Something that allows us to understand code quickly, appreciate the efforts of those before us and smile/grin at the complexity of a challenge.
Personally, I don't really care about the code/comment ratio - It's a complete red herring. At times, code comments can be worth more than the code itself. At other times, they just help you get your job done; quickly, efficiently, no fuss, just great code.
> There are those that believe good code shouldn't need explanation and to some degree that's true, but you can't apply that brush to every codebase. Code can become complex, awkward, spaghetti-like and almost unfathomable at times.
I have yet to find a codebase that couldn't be made clear as soon as a programmer actually put some effort into doing so. Far too often adding a comment is used as an excuse to give up on making the code readable before you've even started.
3. Is the comment adding more information, making the code more clear?
If Yes: Put that information into the code. Rename variables. Pull out code into subroutines. (A small sketch of this follows below.)
If No: Delete the comment.
You’ll be amazed at how often this practice works. Doing it all the time will make your code more readable.
We have a second, enforced practice at work, thanks to code review. The question in your head is simply: “Am I gonna get a comment about this at review time?” If yes, you gotta make the code clearer/simpler/better. Because you’ll have to answer and address the comment, and that’s just gonna slow you down more than if you just fix the problem now.
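Going back to step 3 above - a toy Go sketch (hypothetical names) of "put that information into the code", where a rename makes the comment redundant:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        syncPeriod := 15 * time.Second

        // Before the rename, the code needed a comment:
        //     d := syncPeriod * 2 // wait two sync periods before retrying
        // After it, the comment adds nothing and gets deleted:
        retryBackoff := syncPeriod * 2

        fmt.Println("retrying after", retryBackoff)
    }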
There are concerns and aspects of software engineering that are important to document but don't sit well in code, but they become important when reading / maintaining that code. Design decisions (implementation details), specification details, other design or engineering constraints. And of course business modeling fundamentals and constraints.
Sure, in theory, you can do everything in code, but I find that usually this trade off is taken, as there's no time/budget to go that 80% extra time. (Or however the Pareto curve looks for the actual problem. And usually it's not less than 80, but more.)
I would never suggest documentation in code alone... Your point is very much noted, echoed, and a hat-tip to you here. Yes, documentation is incredibly important, and it should be used as a driver/accompaniment alongside comments.
If that works for you then great, but I don't see the value in adding the comment only to remove it one way or another a moment later. What I do is 1. write some code 2. is code clear enough? If yes, stop. 3. clarify the code 4. goto 2.
I agree with you entirely but I can also see that the strategy mentioned could be a good bridge (for those who currently litter code with comments) to a better world where comments are rarely needed.
Well, you seem to be providing data against your own theory.
Usually codebases are a mess, so if people commented more on what the goal is, refactoring them would be easier.
Software is very susceptible to the "perfect is the enemy of done" mantra. Software works as soon as it works for the first time. And then it gets shipped. It's 1.0, even if it looks like a mess internally. Because IT development is expensive, there is rarely budget to go that extra mile (which would again usually cost a lot more than the first few miles).
> On a basic level, comments should: inform, educate, outline and help others understand the sometimes complex routines and functions that we all create and often under an enormous amount of pressure.
Take the time to simplify the complex routines, clean them up and make them readable and don't waste your precious time in writing "good comments".
> Code can become complex, awkward, spaghetti-like and almost unfathomable at times.
The solution for this is not "comment more". It's "clean up the mess".
> At times, code comments can be worth more than the code itself.
That doesn't make any sense. If the code has no value, you can remove it.
> I've always strived to teach less experienced developers to comment well, efficiently and with a little humor/humour (where possible); Something that allows us to understand code quickly, appreciate the efforts of those before us and smile/grin at the complexity of a challenge.
Please don't do that. Of course it's your code base, but usually it's best to find better venues to express your humor than code comments. Also: teach the junior developers to write clean code first.
Comments too often are used as a bad device to fix code. A developer first writes a horrible mess of code and stops. He may then realize that perhaps no-one will understand it. But if we now teach them not to clean up the code but rather just "write some funny comments", do you think it's any better?
I've seen too many code bases written with this attitude. The comments usually don't help at all, they distract the reader and they distract the author. It's too often useless noise. Many developers hide the comments from these code bases so that they can concentrate on what ACTUALLY is relevant: the code.
The worst part about comments is that they just add a cognitive load of reading and parsing extra lines of text.
What if the underlying logic changes? Do I need to update the original witty comment as well?
How do I know if the comment is still relevant? No compiler or test will tell me that, and now I'm left with the task of parsing a language with many more degrees of freedom (English), without any support from the IDE. It becomes even harder in international teams.
Well said, I was cringing when I was reading the bad advice and you addressed the points very well.
It is so frustrating to hear when someone thinks commenting more and adding humour is somehow cleaning up the code. But to then hear that they are teaching this to juniors just hurts so much.
Simpler advice to juniors would be to read Clean Code by Robert C. Martin, apply some of those techniques, and then politely ignore “more experienced” developers who think writing humorous essays as code comments is a good thing.
I’ve even heard phrases bandied around like “the more comments the better” - I mean, seriously, WTF.
I like the analogy of...
When in an unfamiliar city, having no map at all is much better than having a map you have no idea whether you can trust. Code comments are absolutely that untrustworthy map; it doesn’t matter how many code reviews or how convoluted a PR process takes place, you still need to read the code to know the truth, so putting more effort into making the code communicate is the key to maintainable code. Simple things like good variable names and grouping behaviour into functions at the same level of abstraction can completely negate the need for comments.
Comments have their place, but should be the exception not the norm.
I disagree with nickharr's opinion on humor just as starkly as you, but agree with the need for comments very much. And his/her point still stands regardless of how much you name and group functions. There are concerns and aspects (design/engineering decisions become implementation details that might need rationale) that can't be presented efficiently that way. (Workarounds for issues benefit a lot from links to the relevant issue trackers, and so on.)
Point taken, but my point wasn't about adding humor as some form of default - just something that's amusing to come across when you find it as a developer and where others have struggled/suffered. You are reading too much into it, but appreciate I wasn't clear enough in my initial description.
One of my pet peeves is JS projects. I am not sure why but the front-end developers refuse to add any comments at all. This trend, oddly enough, started with ES6. Pre-ES6, JSDoc style comments at least, were quite common.
Just because some popular JavaScript project doesn't have comments doesn't mean yours shouldn't. Straightforward code may not need comments, but most of these projects definitely need to explain why or how things are supposed to work, or why things are done a certain way.
> Straightforward code may not need comments, but most of these projects definitely need to explain why or how things are supposed to work, or why things are done a certain way.
Code comments usually are not the best place to elaborate on "why things are done a certain way". One can also write documentation about the architecture, design etc.
That'd be great, but it's rarely done. And usually there's a big impedance mismatch between the design document's level of detail and the level that would actually be useful to someone trying to understand the code. (Hence comments seem to me the ideal place for documentation, especially since they can be maintained at the very moment the code changes.)
"it became clear that we needed to ensure that every single condition was handled and accounted for in the code"
This is a feature of several (mostly functional) programming languages, e.g. Haskell. Fun to see that people often figure out that these kinds of concepts are a smart way to write code. Too bad it usually means many people reinvent the wheel instead of learning from computer science history and other languages.
I know a business coach who regularly asks his audience "Who here makes better burgers than McDonalds?". When half the audience raises their hand, he asks them why they don't outsell this giant company.
Functional programming advocates, especially for the "pure" languages like Haskell, always strike me as odd. It seems that all the beauty of those languages makes people obsess over that beauty and purity while keeping them from being productive.
Meanwhile, people with simpler languages like Go just get stuff done that is useful and makes people happy. Now if I _ever_ came across a useful Haskell product, I'd be happy to test drive it, shouldn't be a problem by now with all the container technology. But the closest I ever came to using a functionally developed product was RabbitMQ (written in Erlang). That one was _such_ a pain to use and operate — must have been the developers still dreaming in the purity of its code instead of writing some installation docs. I moved on to Kafka later and didn't regret it a minute.
Selling a lot of burgers encompasses much more than making good burgers. By the same token, a good product entails much more than a programming language choice.
Functional programming, at its heart, is about using self-imposed constraints to avoid certain classes of programming mistakes.
If your application domain doesn't have big consequences for these classes of programming mistakes, then it can seem like functional purity can be a luxury or frivolous. However, if your application domain suffers greatly from those classes of programming mistakes, such as distributed systems, then it may be worth it to consider what functional programming might buy you.
So yes, just because you use a functional programming language won't help you sell your widgets or make a great product. And you can spend lots of time fucking around with it for its own sake and still not sell widgets or make a great product. However, if you understand what its constraints buys you, then you can make your job of building these things in certain domains much easier.
Now that's an excellent point, and I agree. However, at its heart, Kubernetes does not even have that many distributed features. It's a rather classic master-slave architecture that offloads most of the distributed primitives to etcd (which could be a much better candidate for the points you mentioned); on the other hand, it's mainly a system orchestration system that juggles an operating system, syscalls, SECCOMP, firewalls and related machinery — exactly the area where Golang shines and Haskell would have a rather hard time.
In this way, I rather like the tradeoff that the OP post shows: It's at least _possible_ to write Go code that's almost as defensive, safe and exhaustive as a safer language could provide, it's just a lot of manual work and discipline. For the small subset of code that needs these guarantees, it's possibly overall more consistent and efficient to take this route than splitting code into different languages with different strengths.
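As a toy illustration of that trade-off (hypothetical and far simpler than the controller's actual code): plain Go, written in the exhaustive, defensive style the post describes, with every branch spelled out and commented, including the inconsistent states.

    package main

    import "fmt"

    // bindStatus reports how a claim and a volume relate. Every branch
    // is written out explicitly, and inconsistent states are named and
    // returned instead of being silently ignored.
    func bindStatus(claimBound, volumeBound bool) string {
        if claimBound {
            if volumeBound {
                // Both sides agree: fully bound.
                return "bound"
            } else {
                // The claim believes it is bound but the volume
                // disagrees; this must be repaired, never skipped.
                return "claim-dangling"
            }
        } else {
            if volumeBound {
                // The volume believes it is bound but the claim
                // disagrees; also a repair case.
                return "volume-dangling"
            } else {
                // Neither side is bound: nothing to do.
                return "unbound"
            }
        }
    }

    func main() {
        fmt.Println(bindStatus(true, false)) // claim-dangling
    }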
> So yes, just because you use a functional programming language won't help you sell your widgets or make a great product. And you can spend lots of time fucking around with it for its own sake and still not sell widgets or make a great product.
It comes down to trust. You're either fucking with your code in a powerful programming language that lets you do everything, or you're fucking with the language restrictions to get your code to compile in the first place.
You can either go eat at McD's which sells pre-cooked burgers from minimum wage employees which kills the flavor and the taste of the meat being served but is extremely safe, or you can go to an upscale burger joint where artisans grind their meat in house and cook it to a perfect medium rare.
> It comes down to trust. You're either fucking with your code in a powerful programming language that lets you do everything, or you're fucking with the language restrictions to get your code to compile in the first place.
> ...or you're fucking with the language restrictions to get your code to compile in the first place
I think that's a wrong and outdated view on strong type systems.
A good type system is also an ergonomic one. What people start to experience is that the compiler is actually a friend that helps you write code and keep yourself true to your own promises. When new people start Elm, PureScript or Haskell (or another lang with ADTs and type inference), they might be a bit overwhelmed by the paradigm shift if they come from elsewhere, but if you are new to programming, there's nothing inherently more difficult in Haskell than in other languages—the cost of wrong code just becomes apparent earlier.
It doesn't come down to trust, it comes down to the realization that your mind can keep track of less information than a computer, and that you, as a programmer, are forgetful and make mistakes. The compiler is there to help you when you stumble over your own feet.
NOTE: I'm only including strongly, statically typed languages with type inference in the above. "FP langs" by itself is far too broad to be a useful categorization.
Even experienced FP folks complain about how difficult it is to program in Haskell due to its purity. Further, the high barrier to entry you describe is an even bigger deal to organizations than to individuals: it’s a cost that needs to be paid for every employee. And it’s not just the language features; FP languages tend to have issues like poor documentation, poor editor integrations, multiple standard libraries, multiple build tools, multiple string types, home-grown project file syntax, multiple preludes, many extensions, etc. All of these steepen the learning curve on top of the issues with the language itself.
In my limited experience with functional programming, I've come away with the impression that the problems you're describing are why you don't see a lot of companies that use FP exclusively.
But I think the advantages that OP is lauding are also there, and "space shuttle code" might just be where it shines. Reading that comment in this post made me immediately think of Haskell. The Clojure components at my company fits the "space shuttle" description of importance, and its dependability is striking compared to the rest of our code. Part of it may be that being written in a different language allows its concerns to be separate from the rest of the application too, but I do think the FP paradigm is simply good in this domain.
Sure, but I doubt all of Kubernetes is written in this style, so it’s probably not worth writing everything in Haskell. Note also that there’s nothing about FP that prohibits it from addressing the aforementioned practical problems. Some Haskell-like language could swoop in and totally steal Go’s lunch if it simply prioritized practicality over experimentation.
> Some Haskell-like language could swoop in and totally steal Go’s lunch if it simply prioritized practicality over experimentation.
I don't know. It seems like it would be easy enough to build an AST to make sure there were no unknown conditions that didn't lead to a return statement.
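That intuition is roughly right; Go ships its parser and AST in the standard library. A rough sketch (simplified to flag switch statements that lack a default branch, rather than doing full condition/return analysis):

    package main

    import (
        "fmt"
        "go/ast"
        "go/parser"
        "go/token"
    )

    const src = `package p

    func f(x int) int {
        switch x {
        case 1:
            return 1
        }
        return 0
    }`

    func main() {
        fset := token.NewFileSet()
        file, err := parser.ParseFile(fset, "p.go", src, 0)
        if err != nil {
            panic(err)
        }
        // Walk every node and report switches lacking a default case.
        ast.Inspect(file, func(n ast.Node) bool {
            sw, ok := n.(*ast.SwitchStmt)
            if !ok {
                return true
            }
            hasDefault := false
            for _, clause := range sw.Body.List {
                if cc, ok := clause.(*ast.CaseClause); ok && cc.List == nil {
                    hasDefault = true
                }
            }
            if !hasDefault {
                fmt.Printf("%s: switch without a default case\n", fset.Position(sw.Pos()))
            }
            return true
        })
    }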
I don’t understand how your post relates to mine. I was saying that if a static functional language focused more on simplicity, readability, good tooling, documentation, etc., then it would eat Go’s lunch without trading off any of Haskell’s important characteristics (i.e., robust type safety).
> I think that's a wrong and outdated view on strong type systems.
I wasn't really talking about type systems specifically, I was thinking along the lines of the Rust borrow checker here, and lower level programming like assembly and C.
> A good type system is also an ergonomic one. What people start to experience is that the compiler is actually a friend that helps you write code and keep yourself true to your own promises.
I see a lot of people make the mistake that type safe code is bug free code. "It compiles, therefore ship it."
> It doesn't come down to trust, it comes down to the realization that your mind can keep track of less information than a computer, and that you, as a programmer, are forgetful and make mistakes. The compiler is there to help you when you stumble over your own feet.
You trust the compiler to find bugs. Awesome. Luck be to you. I trust unit tests more than the compiler. Mostly because I wrote them. The compiler, I didn't.
You don't write the compiler, of course, but you do write or rather design the types that your program uses. The typechecker ensures that they're consistent within the system of logic that they set up.
By your logic, it's not enough to have faith in unit tests because you wrote them; you must also write the test framework and runner.
I have also developed a strong preference for the latter. I was always on the fence until I got my hands on generative testing (quickcheck, etc).
Now I want the powerful language and I can just slam it with generated tests to get the same level of confidence as the guardrail languages (in practice definitely, although I understand this is not true in theory, so unspit your coffee Haskell people).
Oh right and this is currently Clojure, so a functional language that definitely does help me make a great product. Less time implementing correctly means more time for product refinement.
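For what it's worth, since the file under discussion is Go: the standard library ships a small version of this idea as testing/quick. A minimal property-based test (toy example, runnable with `go test`) looks like this:

    package reverse

    import (
        "testing"
        "testing/quick"
    )

    // Reverse returns s with its bytes in reverse order.
    func Reverse(s string) string {
        b := []byte(s)
        for i, j := 0, len(b)-1; i < j; i, j = i+1, j-1 {
            b[i], b[j] = b[j], b[i]
        }
        return string(b)
    }

    // The property is checked against generated inputs rather than
    // hand-picked cases: reversing twice must give back the original.
    func TestReverseRoundTrip(t *testing.T) {
        prop := func(s string) bool {
            return Reverse(Reverse(s)) == s
        }
        if err := quick.Check(prop, nil); err != nil {
            t.Error(err)
        }
    }

It's nowhere near a full QuickCheck (no shrinking of failing inputs, for instance), but the generate-and-check idea is the same.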
As a professional Haskell dev, it definitely makes me more productive, even discounting the higher quality of the results. Everything the language does for me is one thing fewer I have to think about. And reusable abstractions, surprise, reduce work later, making things go faster still.
Hopefully sometime soon I can share an example of this.
Your language example and the burger example seem flipped. I assume you prefer artisan burgers, which means you like having no language restrictions.
Badly chosen restrictions are not good, we can all agree. But what do good restrictions look like? Back in the day, programmers prided themselves on being able to do their own memory management, and bristled at the idea of a compiler or the runtime doing it for them. Now, that's the exception, as most of the time, we don't think about memory management in most languages we program in.
As time marches on, we'll find more of these restrictions that we all eventually agree are good practices, and the next generation of programmers will take it as a given in programming.
As a look beyond functional programming restrictions, I encourage you to check out Peter Alvaro's talk on distributed systems. In it, he talks about how queries over distributed systems are really hard to reason about, because time is now relative--there's no central clock to measure it. However, if we restrict ourselves to a language whose queries cannot express negation, a lot of the hard stuff about distributed systems goes away.
Functional programming, at its heart, is about imposing a particular kind of constraints with the hope that it will have an impact on program quality. The evidence does not suggest that FP is successful at that goal. Its chosen constraints are likely the wrong ones. There are a few programming paradigms that are "about constraints," and perhaps one of them will end up making a big difference, but it appears FP is not the one.
> I know a business coach who regularly asks his audience "Who here makes better burgers than McDonalds?". When half the audience raises their hand, he asks them why they don't outsell this giant company.
There is indeed a lesson here, but which one do you think it is? It's certainly not that McDonald's makes better (or "simpler") hamburgers: once you taste good hamburgers you can never go back to McDonald's (and yes, I make better hamburgers too, even though mine won't win any prizes!). To me, the lesson is that there's more to success than product quality. The established brand matters, the scale at which you can sell a (possibly inferior) product matters, how low you can get away with paying your employees matters, how well you can survive PR disasters matters, etc.
If you had to make burgers, would you rather make cheap and mediocre ones, or would you rather enjoy making premium burgers? :)
> It's certainly not that McDonald's makes better (or "simpler") hamburgers
At the risk of derailing the thread, that would be the lesson I wish people would take away from that example.
Criticizing fast food like that is dumb signalling IMO; McDonald's!hamburger != homemade!hamburger. It's an entirely different product sharing the same name and some of the ingredients. It tastes different, and has a different form factor. People like this, even if many don't want to admit it to others (or themselves). When I go for a McD's hamburger instead of a foodtruck one, or when I say I prefer chain restaurant pizza over a home made one, it's because I want a different product. Treating the two as the same category is like treating tea and coffee as the same thing.
Among my circle of friends it is. People either like it or not, and if they like, say, the Big Mac, they either want that or they don't, and wouldn't really substitute it with a Burger King Whopper or a KFC Grander. They usually like those too (more or less, depending on the individual), but then it's a conscious decision: okay, let's eat something different. Even if "burger" is in the name.
This is what kills me: McDonald's doesn't even make good burgers for its price range and speed of service. Plenty of competitors where I live make equally mediocre burgers cheaper and faster. Of course, if you wanted to enjoy a good burger you wouldn't go to any of them, McD's included.
>People like this, even if many don't want to admit it to others (or themselves)
People only like the cheapness and the convenience (and perhaps the no-surprise factor).
Everything else being the same (price and time to prepare), nobody would eat McDonalds vs a quality burger (except the kind of people who eat Hot Pockets for the taste, but that's a much smaller demographic than McDonalds buyers).
I disagree. I like the taste of McD!burger and I like the taste of foodtruck!burger. They are entirely different tastes; sometimes I want to experience the former, sometimes the latter.
No-surprise factor isn't that big of an issue if you're buying; restaurants tend to have consistent food quality & taste too.
Disagree. I’m not a hardcore foodie, but I enjoy the entire spectrum of food, from fast-food to food trucks to homemade to Michelin 3-star restaurants. And sometimes I want Taco Bell, or yes, even McDonalds.
Tons of the chefs who work at or own those Michelin-starred restaurants would agree with you - they almost universally love to eat low-tier, mass-market fast food.
The fact that there is a mass(ive) market for fast food is proof alone that people like it (enough to eat it and spend their money on it). I’m with any of the posters above who like fancy burgers as well as fast-food burgers.
Where McD’s really shines, though, is their breakfast offerings. And with that said, I think I will head there now for a quick breakfast (I live ~5 blocks from McD’s).
I remember watching a TV program where they took some typical fast food (McDonalds, KFC, etc.) and presented it on a plate the way you would expect in a more expensive restaurant. Everybody who tasted it rated it higher than the food from the fast food chain.
It seems many or even most people can't tell the difference, and the ambience and service are as important as the food.
Penn & Teller's "Bullshit" show did a similar experiment with "bottled" (actually, tap) water. It turns out most people rate tap water better if it's served in a bottle and told it comes from a natural spring.
It surprises me in the case of frozen patties like McDonald's, though. A frozen patty looks nothing at all like an actual burger made from ground meat. It tastes different as well, of course, but what matters is that it looks completely different!
> Meanwhile, people with simpler languages like Go just get stuff done that is useful and makes people happy.
Kubernetes' reputation is just the opposite: that far from being a simple and useful thing, it's an overengineered, overcomplicated solution to a self-inflicted problem (deploying a distributed monolith).
> But the closest I ever came to using a functionally developed product was RabbitMQ (written in Erlang). That one was _such_ a pain to use and operate — must have been the developers still dreaming in the purity of its code instead of writing some installation docs. I moved on to Kafka later and didn't regret it a minute.
Erm, Kafka was developed in Scala, whose advocates have far more of a reputation for purist pontification than Erlang developers do. Maybe all that beauty and purity is actually good for something?
Calling Scala "purist language" sounds riddiculous to me. It is a hybrid of object-oriented and functional styles, with tons of weird hacks like case classes for pattern matching, implicits, type inference that kind of works, but is not full Hindley-Milner, Java bindings that require weird conversions between Java and Scala collection types, _ as a wildcard in few different places. Scala is the opposite of purism. There's a reason why Odersky's "Programming in Scala" has over 800 pages - even listing all the hacks used in the language is not trivial.
Erlang is not at all a pure functional language, fyi. Functions definitely have three very important core side effects that are optional: message sending, message retrieving, and dying (raising an exception).
I would call the erlang system the "one white lie" category of purity on the FP purity scale. It is, by the way, a huge tradeoff that makes operating in EVM languages far easier than, say Haskell.
> Kubernetes' reputation is just the opposite: that far from being a simple and useful thing, it's an overengineered, overcomplicated solution to a self-inflicted problem (deploying a distributed monolith).
That's why companies pay big bucks to get 24/7 support. Because it's that good!
The paucity of useful tools/libraries written in Haskell compared to languages like Go is due more to the fact that there are far more people in the Go ecosystem than in Haskell's, rather than to Haskellers being too busy navel-gazing.
This disparity in numbers is in turn primarily because Golang/Python/C are inherently much more approachable than Haskell: the average programmer has cut his/her teeth writing imperative code for a significant portion of their early career. The jump to the functional way of thinking requires a certain leap of the mind that most people don't want to take the trouble of making, because they seem happy "getting things done" in Golang or Python.
However, that is not to say we shouldn't be striving for the correctness in code that Haskell fosters, or the inherent simplicity it forces on you by making you separate your IO and effectful code from the pure bits. These are things worth striving for - broad principles worth emulating even when you are coding in an imperative language.
To turn your McDonalds analogy around: sure, a McD will let you just get your food requirements out of the way quickly and cheaply (i.e. just "get stuff done"). But in the long run, it's bad for you.
Haskell is a healthy salad to an [insert favourite imperative language] burger.
The things that keep me from using Haskell aren’t the functional bits but the overall complexity of everything. There are apparently several build tools, preludes, compiler extensions, string types, etc and figuring out when to use which is a major pain. And to top it all off, there’s the tediousness of the syntax, the pervasive use of symbol identifiers, and the general preference for code-golf programming vs readable programming. And THEN there is the tedium of the functional purity.
I really, really want to use Haskell because its type system seems neat in general, but the type system doesn’t justify all of the tedium. Go definitely loses on the type system, but it wins in many of these other areas, and so the trade-offs are just better.
I agree with the general sentiment of your post, but the idea that Kafka is somehow less painful to operate than Rabbit is ridiculous. I love Kafka, but I can't think of a more painful piece of software to maintain in production.
In my experience, Kafka has been relatively painless. Sure, it needs ZooKeeper. But zk has turned out to be a start-it-and-forget-it kind of infrastructure for us.
Until one day ZK fails in the most spectacular way, nobody knows its internals, and you lose data. Just because something is fine for a time does not mean that it is reliable. Without telling us how many nodes you are running ZK/Kafka on, this information is useless.
I'm not claiming it's perfect. Our cluster of > 15 Kafka nodes and 3 node zk ensemble has had no major issues in over 4 years in production. That's something. And from the bug - "GC logging/tuning is clearly where I dropped the ball, just using the defaults;"
>But the closest I ever came to using a functionally developed product was RabbitMQ (written in Erlang). That one was _such_ a pain to use and operate — must have been the developers still dreaming in the purity of its code instead of writing some installation docs. I moved on to Kafka later and didn't regret it a minute.
Err, RabbitMQ is one of the easiest-to-set-up and most stable queue/messaging systems out there. And I'm no user/fan of Erlang...
I found Kafka easier to set up. RabbitMQ is a bit finicky when it comes to config, monitoring, administration, but it's pretty okay, and works well enough if you really need AMQP semantics (selective ack of messages).
I agree with the spirit of what you're saying. Too often I've seen folks on HN dismiss languages because they didn't have features X or Y, usually with a condescending "maybe this language's creators should read some PL theory". I remember circa 2013 or 2014, it became impossible to read threads related to Go because the entire thread would always be "a Real Language(TM) needs Generics". Look at the success Go has seen in the last decade while doing the opposite of what HN thought was correct.
Go and Haskell are really at different ends of a spectrum. Haskell is a very high level language, which makes building and reasoning about very complex software much easier (at the expense of a learning curve and fine control of space usage).
Go is a low-level language with no learning curve and very limited facilities for abstraction (by design, it seems).
Personally I would have written something like Kubernetes in Haskell or OCaml, with perhaps the odd performance-critical section in C or Rust.
It's part of the Blub Paradox[1]. If you don't know about better things, you won't miss them. But once you do know about more powerful things, it's hard to go back to living without them.
I wonder if it’s not the language that causes the products, but more that programmers that are drawn to functional languages are less likely to care about product.
That said, lots of things are functional: chunks of Facebook, Twitter, and Microsoft (and I assume Google) are written in OCaml, Haskell, Reason, and F#. Jet is built in F#. Jane Street famously uses OCaml, and Scala is becoming the standard language for hedge funds. Spark apps are usually written in Scala. Etc.
Erm, excuse me? Idiomatic Erlang is definitely not the purely functional, obsessed-with-correctness-and-types oasis that Haskell is. And the community that birthed Kafka (Scala) has much more in common with the Haskell community than with the Erlang community.
Furthermore, why are you using the user interface as a yardstick to judge language paradigms?
> I know a business coach who regularly asks his audience "Who here makes better burgers than McDonalds?". When half the audience raises their hand, he asks them why they don't outsell this giant company.
Because I'm not in the business of making hamburgers nor am I interested in doing so.
Struggling to see this business coach's point. Is he saying that McDonalds makes better burgers than me because I'm not selling a million burgers a day? Because that's a load of crap.
He's saying that when it comes to serving customers, there are critical concerns besides how good the burger is.
E.g. low cost, fast preparation, scalability (easy to train workers from any culture, ingredients that can be sourced at scale across the globe, etc.), and consistency (the burger is always up to standard regardless of time or place).
Serving customers is not a contest of who can make the best burger.
Still, his point is easy to turn around on him. Why is he "just" a business coach, selling his personal labor? Why trust what he has to say if he isn't CEO of the world's largest MNC for business coaching services? It's easy to take it ad absurdum as well: if any pundit or political scientist thinks they have better ideas than Trump, why aren't they the sitting leader of the free world?
OCaml is not Haskell, but it does provide almost all of the guarantees Haskell does. Here's a list of popular software you might have used written in OCaml:
- the Flow and Hack compilers
- Facebook's Messenger app
- Jane Street's entire trading infrastructure (and everything else they do)
- XenServer
- some parts of Docker, including the TCP/IP networking stack on OS X and Windows
Seems like you have had bad experiences with functional programming, but it's a little strange to rag on the advocates who are trying to figure out how to take potentially useful functional programming concepts and make them mainstream, and/or explore alternative ways to quickly build robust systems.
Good examples of this translating to huge gains for the overall community are React + Redux.
I'm quick to admit that functional language ecosystems are not as mature, which might lead to lower organizational productivity, but there's no need to rag on functional programming in general.
A lot of things seem to get lumped into “functional” programming. The lexicon changes so I may just be behind the times, but ensuring every condition of branch logic is covered is not an aspect of functional programming as I understand it. And I may be wrong.
Without getting into what is and isn’t functional programming, exhaustive pattern matching is a very useful feature present in some functional languages.
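For instance, a minimal sketch in OCaml (hypothetical type and constructor names, nothing from the actual controller):

    type phase = Pending | Bound | Released | Failed

    (* The compiler warns that this match misses the Failed case;
       with warnings treated as errors, it refuses to compile. *)
    let describe = function
      | Pending -> "waiting for a volume"
      | Bound -> "bound to a volume"
      | Released -> "claim deleted, volume not yet reclaimed"

Add a new phase later and every match that doesn't handle it gets flagged - which is the guarantee the hand-maintained if/else chains are trying to approximate.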
That largely depends on supply and demand in the particular locality. The primary value of McDonald's is that they are global enough that such effects are mostly averaged out: the core products are essentially the same globally, and the pricing is consistent at least at the country level. One thing that recently surprised me is that KFC does not work the same way; the menu is totally different not only across EU countries, but even across franchise holders in the same country.
While I ashamedly admit to having used pandoc myself already (it's been a while, but it wasn't even a bad experience), it still feels a little odd to me that this relatively obscure document converter is already the most famous product of a language I see hyped in every second HN thread.
If you're a professional chef, I would hope you're familiar with other methods of producing burgers than McDonald's, even if they don't sell as well.
Haskell is a highly opinionated research language. This makes it a great language to talk about. People could make many of the same points with, say, Scala or OCaml, but because those languages aren't as opinionated, it is harder to use them as a basis for discussion than a language which is heavily opinionated about the subject in question.
With all due respect, I wouldn't consider Golang any less opinionated than Haskell, just opinionated in a completely different direction. Whether or not that's a good thing: I don't know.
FP advocates strike me as odd in that they try to argue FP can reduce the intrinsic complexity of the problem itself, but they have no proof of it, or their "proofs" are essentially faith.
Maybe it was. Currently the Kafka codebase has twice as much Java code as Scala. Over time I have seen the Scala code keep shrinking and the Java code keep growing in the repo.
Why do you say Haskell is an example of this? You can return undefined for anything, and have incomplete pattern matching, where if you don't mention a particular case, there is a runtime crash.
An example would be head, which takes the first element of a list and throws an error on an empty list.
I really love Haskell but it seems as if they are going for something different here than what Haskell provides.
In all seriousness though, Rust does get a little closer - it requires exhaustive matching (or explicit opt out) and deliberate error handling (or explicit opt out). There are still ways around it, but the happy path in Rust is handling errors.... well, maybe not happy. It is a little verbose.
For "Haskell-in-practice", incomplete pattern matching isn't an issue - there's a warning for that, you should have it on, and you should have it error. `undefined` is more of an issue, although easy to exclude with code review - relying on habits isn't great but especially when the habits are this simple it's not a problem in practice.
That said, Haskell still isn't actually an example, chiefly because exceptions can be thrown by any code, and are reasonably often used in practice.
That's not really a limitation of the "language" but more of an implementation detail of GHC and Prelude imo. There are very few escape hatches from the type system and you can run the compiler so that these cases are treated as a compile time error.
My point was not really to mention a particular programming language, but rather to make the point that people seem to continuously reinvent the wheel rather than reading up on history and investigating other programming languages.
If you only know C/Java/Python/Go and you learn about <radically-different-language>, it will probably make you a better programmer even if you never write a single line of <radically-different-language>.
I've learned the most from shifting levels of abstraction, rather than between programming languages.
When I started at Oracle, on the first day my head near exploded as everyone communicated in ER diagrams instead of pseudo code or more procedural form. Learning to design databases at scale, and learning how far to go in making things metadata driven was hugely educational.
More recently, breaking up legacy application domains into microservices and defining the APIs between them feels to me like a "higher" skill than database design/object modeling.
Probably doing IoT projects would help bring some other skills to the fore.
I really like the idea of "sound" programming languages. Elm is a great example of a reasonably simple, very sound programming language that forces you to handle all cases (short of a compiler bug, hardware failure, or an explicit fail-the-world statement, it basically cannot throw exceptions).
Unfortunately the odds of this ending up in a mainstream language this decade are pretty low: the extreme focus on terse code and DRY means the average dev balks at the verbosity (thus the comment in the linked piece of code being necessary). It's a shame, as it's objectively superior by many metrics.
Code terseness and case analysis (handling all cases) are totally orthogonal: OCaml is pretty terse and yet the compiler is great at letting you know if you forgot to handle some case.
"In case X, do Y". You can make that pretty terse with clever pattern matching, in term of amount of keystrokes typed, but someone will always come in and feel they can save even more keystrokes by not handling all of the cases.
Not really that easy. The amount of production bugs that end up getting measured are the ones that got reported, there are an unknown (and unknowable) number of unreported bugs that you never measure.
> This is a feature of several (mostly functional) programming languages
This is just a common programming practice, not a feature of functional languages. These practices are not even "reinventing the wheel".
I mean, this is obvious in many areas: from implementing complex logic like Kubernetes to writing hardware drivers in C. Programming languages themselves can't automagically ensure every single condition is handled, because these conditions often arise outside of the program (e.g. in hardware state). We need to cover and test all cases by hand anyway.
This is one of the painfully obvious facts that tends to be totally ignored by most. Even if you formally verify all of your code, it does not mean that you have covered all the possible states of the outside world. And this can mean both unexpected states on the outside and unexpected hardware failures.
On one project I work with a guy with significant railway signalling experience, and this is one of the issues that I'm somewhat unable to explain to him. Probably because our product's design target is not to fail fast and safe but to fail secure.
> This is a feature of several (mostly functional) programming languages, e.g. Haskell
I didn't see that.
In fact this says to me that mission-critical software like this needs to be as tediously documented as possible to eliminate surprises. Those branches and conditions were collected through a huge pool of trial and error; implying that Haskell can provide those valuable use cases out of the box is misleading. No, it can't.
A model checker like TLA+ or PlusCal might do the trick.
I think what the parent comment referred to is that in Haskell if/then/else is an expression (like everything else), so you must, by definition, have an else "branch". Basically it frees you from a subtle type of error.
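For what it's worth, OCaml behaves the same way for anything that produces a value (a sketch with made-up names): `if` is an expression, so an `else` can only be dropped when the result is unit, and a forgotten branch is a type error rather than a silent fall-through.

    let is_production = false

    (* Fine: both branches present; the whole `if` is the value. *)
    let timeout = if is_production then 30 else 5

    (* Rejected by the type checker: without an `else`, the then-branch
       must have type unit, so `if is_production then 30` would not
       compile here. *)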
Those are micro-level features. For a mission-critical module like this, it is unlikely that such low-level bugs would be left unnoticed.
But the true complexity comes from states, and combinations of states, and many of those are unknown to the developer until some incident kicks in. Good programming practice can't relieve programmers of the cognitive burden of reasoning about the outcome of those combinations.
In general, I think that if you find yourself "needing" to write in an intentionally complex and verbose style, this is symptomatic of bad design choices elsewhere, maybe even at the language level.
The business wants to know all conditions are handled appropriately, to a certain level of confidence. Tests are one way of achieving this but by no means the only way.
My takeaway from reading this code is that it is a huge mess that may be impossible to clean up. At some point they failed to introduce abstractions that would remove the need for all this complexity. They are probably right that, now that it works, it will be hard to refactor without leaving out some critical case. However, I pity anyone who works on this code base.
Given where they are, it seems that now the major abstraction they are missing is type safety. Strict types that allowed a monadic structure would simplify this whole mess into a series of maps and folds.
Not only does this add the stricture of a compiler enforcing correct return types for all conditionals, it is also idiomatic of any language that supports this.
Having said all that, changing languages is rarely an option. Maybe this is the best solution given the tools available. If so, it demonstrates that maybe this tool isn't right for this job.
Yeah, I just don't buy this. It says nothing. This is the bottom-barrel suggestion most people bring up, and there's more to simplifying codebases than "strict types."
What in this <2k LOC file suggests strict types help anything whatsoever? It's not massive. It's large, but fold this and it immediately becomes more readable. Any attempts to simplify this would simply be reasoning about it differently.
Would you agree that sometimes abstractions just distribute and hide complexity, when actually all of that context is needed to comprehend the algorithm or process at hand (some things just ARE complex)?
In this case right here, what's your counter-example, or what would you use instead of their specific approach?
> Would you agree that sometimes abstractions just distribute and hide complexity, when actually all of that context is needed to comprehend the algorithm or process at hand (some things just ARE complex)?
No, or at least not often enough to be worth thinking about. It is of course possible to use abstractions badly, but the problems that business software has to solve are always fairly simple because they're human problems; human business processes were never that complex. So if a program looks really complicated, the overwhelmingly likely cause is failing to use appropriate abstractions.
> In this case right here, what's your counter-example, or what would you use instead of their specific approach?
As others have said, a result type would greatly simplify this code without sacrificing any safety. No doubt after such a simplification made the code shorter and easier to comprehend, further simplifications would become apparent.
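As a rough sketch of the shape that takes (OCaml 4.08+; the step functions are invented stand-ins, not the controller's real logic): each step returns Ok or Error, and Result.bind threads them together, so the error path is one flat pipeline instead of a staircase of checks - and the caller cannot touch the value without first dealing with the Error case.

    let ( let* ) = Result.bind

    (* Hypothetical stand-ins for the controller's steps. *)
    let find_volume claim = Ok (claim ^ "-volume")
    let bind_volume claim volume = Ok (claim ^ " -> " ^ volume)
    let update_status _claim binding = Ok binding

    (* Any step returning Error short-circuits the rest. *)
    let sync_claim claim =
      let* volume = find_volume claim in
      let* binding = bind_volume claim volume in
      update_status claim binding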
Your metaphor doesn’t really apply — this isn’t business software, it’s part of a distributed job scheduler. That is inherently complicated. There is of course an open argument of language choice, but I think there’s something to learn from the lack of software built in $BETTER_LANGUAGE.
It’s also an open source project, so if anyone wanted to throw up a branch with an example of simplification without sacrificing logic branch completeness that door is open. Code might even convince the k8s team to change what you think is misguided behavior.
> That is inherently complicated. There is of course an open argument of language choice, but I think there’s something to learn from the lack of software built in $BETTER_LANGUAGE.
Sometimes the best software is no software. Kubernetes exists to solve problems that people using better languages don't generally have.
> It’s also an open source project, so if anyone wanted to throw up a branch with an example of simplification without sacrificing logic branch completeness that door is open.
The "Please do not attempt to simplify this code" comment suggests otherwise. In any case, the team have chosen a language in which good solutions are not possible.
> Kubernetes exists to solve problems that people using better languages don't generally have
This statement doesn't make sense, and the arrogant tone just stinks of ignorance.
Kubernetes isn't perfect, but if there's a better provider-agnostic, language-agnostic way to dynamically scale an application written in heterogeneous languages across a cluster, or on a competitive choice of cloud platforms with dynamic provisioning, I'd like you to tell us, and explain why it's better than Kubernetes.
Language agnosticism isn't something to discard lightly. Different languages have different strengths and weaknesses.
In my experience the costs of language agnosticism outweigh the benefits even without considering deployment. Good languages are general-purpose; a language that is drastically unsuited to some particular thing is probably not worth using at all, and if you're using a language that is decent at most things then the overhead of switching languages is higher than the benefit of using a language that gives a small advantage in some specialised area.
What language do you have in mind? All the languages I have known or used are drastically unsuited for one thing or another. So they all seem not worth using.
I mostly work in Scala these days. But really any ML-family language is fine for everything. Garbage collection is not the barrier to writing system tools or soft realtime that it's reputed to be (the fact that people advocate Go for those use cases is proof of that).
(The JVM startup can be a bit slow for scripting one-liners, particularly if you're running them on every file in a directory or some such. There was a time when I would switch to Python or Bash for those, but I realized I was wasting more time making mistakes in those languages than I ever saved at runtime)
> “human business processes were never that complex.”
That’s a falsifiable claim. Human problems / laws are some of the most difficult to code for, IMO. Examples: legal software, economics software, etc. Usually complexity increases as you delve out of the abstract and deal with real-life processes with humans.
You are right that they move the complexity somewhere else, but when abstractions are done right, you don't have to worry about how they are implemented when you use them. That is a huge benefit.
I believe that premature abstraction is terrible. But in some cases good abstractions may help with handling complexity and may make testing easier. I have a feeling that because authors say that this class should not be changed / refactored, they failed to introduce a good abstraction. This also implies that their tests are inadequate.
I agree with the direction of your opinion, but would say it isn't just a matter of helpfulness. In some cases it is a necessity.
It must be understood that complexity is an economic consideration, not a financial or technological one. It is always already present. Complexity is not something that is created or destroyed, but retained, absorbed, or transferred.
Abstractions become necessary when they separate different functional layers to perform different respective responsibilities. In this case complexity is transferred both into and out of a system at a given layer, but you don't care, so long as you aren't doing the jobs of the other layers. That is referred to as separation of concerns, which results in the hardening of a system (risk reduction) and has the side effect of reducing costs by restricting the requirements each layer has to satisfy.
Many abstractions exist solely to provide a layer of convenience. In the case where an abstraction does the same job as the code it abstracts, risks and costs increase. This is because the system continues to simultaneously absorb and transfer complexity as described above, but the requirements of the various layers aren't clearly separated. That results in fulfilling the same requirements simultaneously at various layers. This has various names, like scope creep, technical debt, and so forth. It is bad because risks and costs increase in direct correlation with increased code and increased requirements. This is what makes the law of leaky abstractions valid.
It is easy to tell the difference between a necessary abstraction and a wasteful abstraction by the forcefulness of the separation. If you can perform the same job at a lower level, the abstraction isn't necessary and you are probably better off without it. Most JavaScript frameworks are unnecessary abstractions.
A file is an abstraction that hides an incredibly complex process, possibly even involving quantum effects. I have never found a reason to care about the actual details and managed to do quite a significant amount of work using just the abstraction.
As much as I like the style where the code logic is commented very thoroughly, it also falls into the trap of comments no longer matching the current code. I assume some variable renaming happened.
In the function (line 320 of https://github.com/kubernetes/kubernetes/blob/ec2e767e593953...)
the variable "pvc" seems to have been renamed into "claim" and "pv" into "volume", judging from the code/comment mismatch. Comments in the lines 339, 358, 360, 370, 380, 395, 411, 422, 427 point to the old names. Furthermore in line 370 the comment reads:
    } else /* pvc.Spec.VolumeName != nil */ {

while the matching if is:

    if claim.Spec.VolumeName == "" {
So not only do the variable names mismatch, the comment is also wrong: VolumeName appears to be a string, so it is never nil; the else comment should say that VolumeName is non-empty.
The more verbose and detailed the comments are, the more work has to be spent ensuring that they are correct.
> Space shuttle style is meant to ensure that every branch and condition is considered and accounted for…
FTFY: …hopefully!
Only if they’d used a language with algebraic data type (ADT) support would the compiler enforce that “every branch and condition is considered and accounted for.” The only PL with ADTs that I’ve used is Haskell, but I’ve heard that Rust has them too (“enums”).
People are arguing that “code is what the computer executes, comments don’t ensure anything!” and so on, but besides being executed, (this Go) code does not ensure anything either. It’s human eyes, skimming through all the cases and looking for a matching branch for every one of them, that do the “ensuring.”
In my humble opinion, this so-called “space shuttle style” is just one of many workarounds for Go’s by-design limitations (the most famous one being the lack of generics) - a language that was designed only 9 years ago.
> It’s human eyes, skimming through all the cases and looking for a matching branch for every one of them, that do the “ensuring.”
This a thousand times. The praise in this thread is disturbing.
The absurdity of this code is the logical conclusion of ignoring decades of PL advances in favor of Go's "simplicity."
When you insist on "space shuttle" era language design, is it any surprise when you're reduced to "space shuttle" era programming? I can't imagine anything more fitting.
I agree with this, but snarky comments like these neglect that ADTs (or generics) are not the only nor remotely the most important factor in choosing a programming language. Go certainly bests Haskell and Rust in many important areas even if it loses in safety.
Two years ago I moved one of my biggest projects from Python 3 to Go because my program was inherently concurrent, and at the time - and I think still - Python has 3 competing approaches to concurrency: threading, multiprocessing (for parallelism), and asyncio.
Although it’s nice to have a variety of options, I think this balkanisation affected the community in negative ways because the options are not compatible with each other (see the emerging “SansIO” libraries for asyncio).
This is actually the reason why Python gets so much praise: thanks to its standard library, there is often a single canonical way to do something (if you need sets use `set`, if you need matrices use NumPy, …), which means that the vast majority of libraries are interoperable with each other. The same goes for Go when it comes to concurrency, and that’s what guided my choice.
And I wasn’t trying to defend Go :) Just that I hear a lot of comments like these, but they all ignore the possibility that $IMPERATIVE_LANG is the right choice IN SPITE of the inferior type system. In particular I would love an FP lang with Go’s focus on user-friendliness, simplicity, and practicality.
>It’s human eyes, skimming through all the cases and looking for a matching branch for every one of them, that do the “ensuring.”
Worse, they've made an exception for simple error checking, and the result is that the majority of the if statements at the top of the file have no else condition. Quickly scanning by eye doesn't help me determine if someone screwed up and missed a scenario.
But this gargantuan pile of garbage warns that it cannot and should not be refactored, and it is like the shuttle code (may their souls rest in peace): so solid and perfect as is. It is also written in the sacred language of Go.
So if these conditions are met, then yes, the above praises will be and must be sung.
The jazz music analogy implies that it breaks the rules but does so artfully and intelligently - that wouldn't necessarily be a valid defense for any old block of poorly engineered code.
Also, poorly engineered code is seldom irreducibly complex.
No. Most jazz, including by the jazz greats, just sounds plain terrible. The worst is jam sessions, where everyone steps all over each other, but at times it manages not to sound abysmal because they're in the same key.
This code reminds me of why ML-like languages with Maybe-style types and case expressions that generate compiler errors for missed alternatives are good. I bet rewriting this in OCaml or Haskell would lead to tighter code and might even unearth a couple possible states that haven't been accounted for.
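For a flavor of how that unearthing happens, a toy OCaml model (invented, and far simpler than the controller's actual state space): each state carries only the data that is valid for it, so "bound volume without a claim reference" is simply not a value you can construct, and no branch or comment is needed for it.

    type claim_ref = { namespace : string; name : string }

    type volume_state =
      | Unbound                       (* no claim involved at all *)
      | Bound of claim_ref            (* a bound volume always has a claim *)
      | Released of claim_ref option  (* the claim may already be deleted *)

    let owner = function
      | Unbound -> None
      | Bound c -> Some c
      | Released c -> c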
Those languages have all of the ML features; they are functional languages in every reasonable sense, unless you consider "functional" to mean the absence of something.
This is basically the style that Rust more or less forces you to write in, while allowing shorter forms where the compiler can automate the checking that all error states are accounted for.
I think it's situationally useful. If I take the author of the OP code at their word, this is one of those situations.
Core, critical plumbing/logic at the kernel of business-critical, long-lived applications will be the source of my stress dreams long into the twilight years of my life, in the form of a lack of documentation and a presence of organic growth.
To criticize myself quite bluntly: If the core code I worked on at work looked like this, I'd feel a great deal more comfortable in some of the changes/digging that inevitably arises.
I would never use it as an absolute metric; but I'd use the level of comfort e.g. a new dev feels when looking at something that might otherwise be a spiderweb and saying "Oh this makes sense" (As I do when looking at OP) as a north star for the most sensitive bits of logic.
As one of the authors of the original code here: this was the result of several days of intense work by a half dozen people, working through every corner case we could dream up, and a bunch we thought of on the spot.
It is in no way a guarantee that we got them all, but after spending so much time reasoning through why those 'else' clauses were correctly empty, we thought it would be rude not to write it down.
In truth it was as much for future-me as anyone. My memory is known... to be spotty. :)
Tim, does someone have a fuzzer running against this? Or even some static analysis ensuring that, say, the enums from various things are actually handled?
As a user of this particular code, and someone who found several bugs in it in the early Kubernetes days that were very hard to trace, I applaud the hell out of this.
Why isn't the core code most of the code? Why isn't it all core code with a tiny dash of UX or data access sprinkled on it? With, maybe, one abstraction layer somewhere (but never two touching!).
Having written safety critical code (and reviewed it) it is very useful.
On the other hand, as soon as someone not safety minded gets their hands on it, trouble happens. Comments aren't updated (and there's no way to make sure they are checked, other than GREAT code reviews by the original authors, usually with at least two or three people doing critical reviews). Then the comments can become misleading and a liability as people will take them for truth, as they should.
If you have code that has a lot of subtle dependencies or edge cases, really great comments can help enormously.
I generally find that a high comment to code ratio, if the comments are of the form "Do this thing because of this reason", indicates quality. It indicates the programmer both knows what they wrote and why they wrote it, and it helps future maintainers figure out under what circumstances the code can be changed, refactored, or removed and under what circumstances the original behavior must be kept.
A high comment to code ratio, where the comments are of the form "Do this thing in this way," indicates a lack of quality - generally a sign that the programmer is not confident enough in the language that they're writing in, and is trying to solve language-level problems instead of business-level problems.
Uncommented code better come with some reference for why the code exists in the form it does. Sometimes commit logs and the VCS "annotate"/"blame" feature works. Sometimes commit logs link to bug trackers or feature requests. Sometimes there's a README. If you don't have any of those, I tend to find that it's generally low-quality code.
Our purpose is to deliver business value. (Or non-business value, as the case may be; if you're writing a free video game for fun, you want people to successfully have fun.) Our purpose is not to generate lines of code. All code is, to some extent, legacy code; comments can help it be manageable legacy code, or make it even more unmanageable.
The way I think about comments is that you should always be able to articulate what the consequences of deleting any line of code is. If the code itself is insufficient to do that, it needs a comment.
There are three kinds of comments: why, what, and how. "How" comments are almost always a sign that the design is poor or the complexity is too clever. "Why" comments are necessary to understand the code and are almost always a good thing. "What" comments can be useful guideposts for skimming code, but they are also extremely prone to code rot. I suspect "what" comments generally end up being net neutral in value.
You want a high ratio of "why" comments to code, but I suspect most high comment-to-code ratios arise from "what" comments, which severely attenuates the utility of a raw comment-to-code ratio.
The comment:code ratio is similar to some legacy enterprise C/C++ systems I've worked on.
I've been on Rails/React teams where comments were seen as a possible smell. Not talking about useless literal comments, just that their need was seen as pointing to possible bad design, and that a well-factored codebase was self-documenting - i.e. if you had to comment something, perhaps methods/vars were poorly named, SOLID principles were not adhered to, methods needed to be broken out, or it was just a sloppy approach. Even explaining design decisions was considered more in the domain of git messages and having nicely packaged atomic commits.
While I see that aspect of it, there's no getting around the constraints of the real world and that some problems are just difficult and much easier to grok with a user guide in plain english, so to speak. And mission critical stuff needs as many safeguards as possible.
That said, inaccurate comments can be dangerous and when your code is highly commented there is real danger things can get out of sync. If you're working on a 5000 line file that 100 developers have touched over a 20 year period... and no one has taken it upon themselves to do a recent comment audit, there be dragons.
I have worked on teams with this same attitude, and in my case it was just a systemic way for the group to rule-away having to write comments. The codebase suffered for it.
This. The code base should be the authoritative source of the behavior of the system. The comments should be the authoritative source of what is expected of the system.
I comment to myself before I write code. It’s in English. Then I write code. So every line of code is commented by default.
I think this provides me higher quality, less bug ridden results. So if others use comments in this style I would tend to believe it does increase code quality.
If a line of code doesn’t match the comment, something is clearly wrong. ;)
Maybe a loose correlation? Highly commented code was probably not written under tremendous time pressure; uncommented code can go either way. Wrongly commented code is painful, though.
And then there's something I recall running into, a decade ago:
I'm not sure there are many cases where there should be long amounts of expressive code.
If you're doing something obvious, you should generally be able to program it concisely, in which case you have a high comment-to-code ratio because the amount of code is low. Sometimes this will be because you're importing an external library to do something, or because you're calling out to an internal library. Sometimes this will be because you found a straightforward implementation. If you're finding yourself writing hundreds of lines of code to do a single obvious task then chances are high you're implementing it poorly (and, specifically, in a way where your defect rate is likely proportional to the number of lines of code).
And if you're doing several obvious things, then the point of the code is not to explain what the code is doing, but why it's doing that. What is the business purpose of the code? Which customer cares about this edge case that you're handling, and under what circumstances can you stop handling it? Why did you decide that the common library wouldn't actually work here? If you're converting data from an awful legacy format, why are your ingesters / parsers for the legacy format designed in this way? If you're micro-optimizing for performance, why are the optimizations sound (i.e., why do they accomplish the same thing as the unoptimized version), how do they work, and why did you decide these spots need to be optimized? Each individual thing you do might be obvious on its own, but the arrangement of the whole thing needs comments for each step, which again gives you a high comment-to-code ratio.
I might have misused the word expressive, I don't mean bloated code with more than necessary logic.
I just meant simple to understand variable, function, and class names. That combined with small classes and functions, makes following the logic of your program extremely easy.
Following concepts like DRY (don't repeat yourself) and the single responsibility principle ensures that you're making more easily testable code, and I'm sure fewer overall LOC.
While your fundamental point is very valid, there are plenty of times where a comment to flag up a fine point of your clean and precise code will save future-you hours of head-scratching.
I absolutely do not comment enough, but knowing this, I try to stick to the principle that if I have had to stop and think through an expression before I write it, then I am likely to eventually thank myself for leaving a short explanatory note.
And moreover, it may not be me scratching my head over that nest of ternaries in a year's time - it may be some other poor soul. And while that poor soul won't thank me for leaving a comment, he or she will certainly curse my name - quite possibly vocally and publicly - for not leaving one.
Well, let's consider an example from this very code:
    // The binding is two-step process. PV.Spec.ClaimRef is modified first and
    // PVC.Spec.VolumeName second. At any point of this transaction, the PV or PVC
    // can be modified by user or other controller or completely deleted. Also,
    // two (or more) controllers may try to bind different volumes to different
    // claims at the same time. The controller must recover from any conflicts
    // that may arise from these conditions.
How would you rewrite the code so that this information was explicit in the code alone, and as obvious as when it is stated in these comments? Note that simply being able to handle any conflicts that may arise from these conditions is not necessarily the same as saying that they can occur and must be handled, as any particular implementation is invariably an over-specification.
As for redundant comments explaining the obvious, that seems to be something of a straw man, at least in my experience - personally, I have very rarely seen such code. The person who is not motivated to write useful comments apparently prefers to write no comments rather than useless ones.
My comment was directed to the OP's question of comment:code ratio in general, not in this exact circumstance.
Additionally, in no way am I advocating for no comments, that's obviously not possible (like your example). Comments are useful, even necessary, for code that might have an otherwise confusing logic to them.
I've seen plenty of code with documentation for a method with nothing more than:
    /**
     * Bills the user
     *
     * @param user The user to bill
     */
    public void billUser(User user) {
        //
    }
In my opinion, that comment is completely redundant, and I think it's driven by the idea that we should comment EVERYTHING.
Though there should definitely be documentation here. What happens if `user` is null? What if the user doesn't have enough balance for the transaction to complete? What if the transaction fails?
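One way to make those questions unavoidable is to push them into the signature rather than the prose - here an OCaml-flavored sketch with invented types (a Java version might use a result object or checked exceptions): the failure modes become constructors the caller has to match on, and the doc comment shrinks to whatever is genuinely non-obvious.

    type receipt = { user_id : int; amount_cents : int }

    type billing_error =
      | Insufficient_balance of int  (* how many cents short *)
      | Gateway_failure of string

    (* Stub body; the point is the type: callers must handle both
       failure modes before they can touch the receipt. *)
    let bill_user ~user_id ~balance_cents ~amount_cents =
      if amount_cents > balance_cents then
        Error (Insufficient_balance (amount_cents - balance_cents))
      else Ok { user_id; amount_cents }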
I see this as someone trying to fool a linter that demands they have documentation. I think it's better to say "comment everything" because it puts documentation as a first-class consideration rather than an afterthought.
If it is a general principle, then would one not expect it to apply in this case? More importantly, this is not a corner case; situations where there are specific conventions and protocols that have to be handled consistently in various cases are extremely common in software. It is also not uncommon to see optimized code that is much easier to understand when it is explained as a modification of a simpler implementation.
With regard to the sample, if that is your experience, I cannot deny it, but, irritating as it may be, it seems fairly harmless. It appears to date from a time when it was thought that extremely prescriptive coding style standards were the way to fix programming - an even less realistic belief than the idea that code can be entirely self-documenting.
Code shows how something happens (i.e., a string comes into a function and is parsed and only the date from it is returned), but it's so bad at showing WHY something needs to happen. My comments are almost always about why I'm doing it in the way I am, complete with examples of test cases where the users broke things in ways I wasn't originally expecting. Ten years from now, the code part will be rewritten using whatever crazy new stuff the language supports, but the underlying need for doing it at all will probably still be around.
As a counter-example, here is a C file of 20,000 lines and no comments. I pushed this to GitHub a long time ago, as it was the most gigantic "real" C file I have encountered.
Comments are very barebones. There is structure, but needing to mess with this kind of code would be scary. Granted, most games are write-once and never look back.
You made a good point, because this code is neither clean nor expressive. The parent comment talked about clean and expressive code. Can you show some clean and expressive code and try to make the same argument again? (I am not holding my breath that you will.)
I use comments to document assumptions that are likely to be wrong, either now as I write it, or later when someone (probably myself) changes it.
It is absolutely useful to do, and really not too difficult.
Languages that allow for more formal assumption-checking (especially before runtime) are even better, but comments have an additional benefit of being understood by a human directly.
I wish languages with static analysis could somehow allow authors to encode the human-friendly/semantic errors that you often see as runtime exceptions into the static analysis itself...
I've seen horrible inheritance/convoluted refactors done in pursuit of DRY.
I'm a bigger fan of WET (Write Everything Twice). Usually in the first iteration of a component you don't understand enough of the domain space to get the abstractions right. So use that first attempt to explore the issues/problems/corner cases. Once they're well understood, rewrite it into something concise and well abstracted.
I've also found that if you try to rewrite a third time you'll end up being too clever in trying to predict where the system will evolve, and get right back into the same situation as the first iteration.
> And then you end up with a codebase which indicate A but comments which clearly spell out B, and you as a maintainer have no idea what to believe.
Can you name a few examples where you encountered this? In my career (30 years programming) I've never seen it. I believe it's a common, poor excuse for not writing enough comments.
The benefits of comments are well understood. For me personally they often helped compensate for sloppy code and (non-obvious) assumptions, and prevented bad solutions because I reconsidered while writing (embarrassing) comments.
When you have to maintain a large codebase that has been modified thousands of times over 15+ years, every single comment is invaluable.
Then you must have been very lucky; I have seen it happen probably hundreds of times in a mere 15 years on the job.
The inconsistencies I experienced ranged from doc strings telling you to pass a parameter that didn’t exist any more, to parameters with different names, to parameters with the correct names but in a different order.
As for actual comments, I have seen plenty of comments like

    //here we go baby!
    //do not touch!
    //1 ... //2 ... //3 .. //8

The last one apparently was meant to indicate the order of some overengineered stuff that could have been written properly. Or:

    //This happens only for CompletedOrders (while in reality it was the opposite)
    //This calculates the notional in the same currency (while it was converting it to GBP)
    //This is extremely important, never delete (and after that there was a bunch of commented code)
Honestly I don’t remember all of them - I tend to use my memory for more important things - but I bet that if you had seen even 1/10 of the bad comments that I have encountered, you would have a different opinion.
Btw in your post you are explaining exactly why I hate comments:
“For me personally they often helped compensate for sloppy code” - the solution is obviously to fix the sloppy code, not to write a comment because you are too lazy to fix it.
“(non-obvious) assumptions” - this is probably the only legitimate reason to write a comment.
“and prevented bad solutions because I reconsidered while writing (embarrassing) comments.” - this has nothing to do with the comments, given that in the end you didn’t write them. Thinking more about the code instead of trying to comment it gives you the same result, if not a better one.
I was asking specifically about comments that directly contradict the code in a way that confuses the reader. If obsolete comments like "never delete" stand before commented code, what makes you think the commented code is actually used or useful?
> “(non-obvious) assumptions” - this is probably the only legitimate reason to write a comment.
Are you serious? Almost every codebase that has comments has code that doesn’t exactly match the comments. The main obvious reason being that the comments aren’t executed whereas the code is. Sure, meticulous attention to detail could keep the two in sync. But are you suggesting that they have always been in sync in every piece of code you’ve looked at?
As someone who is clearly a fan of comments, are you saying you’ve never modified a comment to make it more accurate to what the code is doing?
If every function is just a few lines long, the comments are easier to keep synchronized, and if a function drops out of service, it should eventually be garbage collected with its now-irrelevant comments.
But what if comments pertain to unexpected states the system as a whole can be in?
Kind of the whole problem is when there are weird corner cases going on that straddle function boundaries.
I'm not saying that's a good thing, mind you - but nor is it always trivially avoidable, especially if code needs to be concurrency- and/or exception-safe - or in general whenever the statements your function consists of have surprising and opaque behavior based on system state, particularly if said state is hard to grasp due to being implicit or dynamic, or simply large and complex.
> Kind of the whole problem is when there are weird corner cases going on that straddle function boundaries.
If the problem has "hub and spokes" topology, i.e. it's relevant to multiple places in code that all reference a single location, put a comment describing the issue in that single location, and everywhere else put a comment with a reference. //Warning. See comment in [that location].
If there's no single best place for the detailed comment, put it in some design notes file, and put a comment with a reference to that file in all the affected places.
DRY can, and should be, applied to comments as well.
Yeah, references to a centralized document is such an obvious thing... once you read about it. It's another thing I recently picked up from Ousterhout's book, and looking back, I can now see the places in past codebases where I wish I thought of that myself.
Agreed, but that’s why I prefer writing code that has no global state. In Erlang, for example, it’s unusual (and the language lends itself via pattern matching to having short function clauses).
It gets a little tiresome threading the relevant state to every function that needs it, but it’s worthwhile in the end.
Even a pure function has "state" - namely its (arbitrarily complex) inputs. But sure, it's a little less of a landmine.
The fundamental issue remains that sometimes your knowledge about that state (whether the classical kind or a proper parameter) can be complex and dependent on what happened elsewhere, especially if the codebase you're in grew into that situation rather than being designed that way per se. A comprehensible set of preconditions and postconditions isn't always a luxury you have, certainly not at first.
I think it indicates pride more than anything. If I write something that is just business as usual or commonplace my comments are pretty lacking. If it is something very interesting or that I am proud to have done, I usually write some very detailed comments. This probably correlates to better quality just because it was something I was interested in doing rather than shoveling code.
It's really only useful in areas of codebases that either a) are very complex, b) are touched by many people, or c) both. When that happens, everyone prefers that there is a lot of documentation, especially about the why. With older codebases the question is always whether this is an actual bug from the developer, or whether there is a reason why it's doing this super-weird thing, and if so whether it's still applicable. What's happened over the last 5 years is that automated testing has become so mainstream that places without tests are the exception AND the tests have replaced the need for comments.
You've missed d) the codebase lives longer than a few months and someone else than the original author has to make changes. Comments describing the intent and caveats are extremely useful in ensuring the future developer gets adequate understanding quickly, and reduces the chance they'll introduce bugs.
Tests can help understand the interface, but they don't help to understand the rationale behind it, the underlying abstraction, or implementation caveats.
Agree 100%, along with accompanying documentation that lays out architecture, rationale, challenges, etc. All of those are invaluable for any code that will outlive the tenure of the developers who built it. And given how often people in tech change jobs that's virtually all code.
Oh yes. And even disregarding tenure, you have cases like illness (see e.g. the Word 1.0 postmortem[0], page 14, talking about losing a key developer), or people changing project.
One time, I inherited a big steaming pile of spaghetti my co-worker wrote to meet a tight deadline before being shifted to another project. That code implemented one of the key functions of the application, and half a year later the customer demanded extensive changes. Believe me, I would have paid half my monthly salary just to have a third of the comments that we see in this Kubernetes file.
How do you successfully express "We need to treat all transactions on February 29 as happening on February 28, see customer ticket #4321 for rationale" in code?
One thing I always push back on is references to ticket numbers or other external systems (except perhaps e.g. ISO standards). Repos should be self contained and perpetual. One might not have access to the ticket system now, or ten years from now.
Since we're referring to Bob Martin, I suspect the answer is through a functional test that captures that requirement. Thus, if a naive editor makes a change that breaks the requirement, it will not pass the test and cannot be committed to mainline.
True enough, but in my limited experience it's more likely that such little requirements are hidden in code (and possibly commented) than that they have been properly captured by specifications and tests.
By naming as much as possible. I'd need to know the rationale in the ticket to be able to try and codify it, but here's how I'd try and do the rest: https://codepen.io/anon/pen/Jwyzdv
I'm not sure that reducing the comments-to-code ratio by increasing the complexity of the code really helps anything. You've made the code more generic for what you currently think future changes are going to look like, which may or may not be accurate. And in the process you've split dateIsOnLeapDay and convertLeapDayToPreviousDay into separate functions, so if someone is tracking down a bug in line 10, they need to jump to lines 20 and 21 to figure out that the associated code is in line 15 (think "wait, did you say leap day? I meant leap second"). In a large program these would get even further separated over time - someone is going to decide that dateIsOnLeapDay should be in a common utils class because they want to use it somewhere else - and I think there's a lot of merit in keeping lines 4 and 5 next to each other.
> I'm not sure that reducing the comments-to-code ratio by increasing the complexity of the code really helps anything.
More lines doesn't mean more complex, it's the same logic just the logic is named now and more reusable. It's possibly not the best example, as the logic is minimal, but when the logic becomes more complex, wrapping it and naming it becomes very powerful. We're creatures of abstraction.
> You've made the code more generic for what you currently think future changes are going to look like, which may or may not be accurate.
I'd argue that I've reduced the number of reasons the code has to change, which should be a goal while programming. If we change how we calculate a leap day, we don't touch how we modify a leap day, which means we're less likely to cause adverse side effects.
> And in the process you've split dateIsOnLeapDay and convertLeapDayToPreviousDay into separate functions, so if someone is tracking down a bug in line 10, they need to jump to lines 20 and 21 to figure out that the associated code is in line 15
They should be separate functions, they are separate things.
> In a large program these would get even further separated over time
Is it really a problem if they are separated? What links them? There could be plenty of reasons for wanting to call one without the other.
> I'd argue that I've reduced the number of reasons the code has to change, which should be a goal while programming. If we change how we calculate a leap day, we don't touch how we modify a leap day, which means we're less likely to cause adverse side effects.
Maybe this is just different instincts/experience and I'm not saying you're wrong, but my feeling here is that you do actually want to change them at the same time. Suppose we decide instead of adding a day to February 29, we keep the months the same and add a festival day at the end of every fourth year, numbered 13/1. Then modifying the festival day to 13/0 is wrong - the day before 13/1 is now 12/31.
If you have one function for "fix leap days for reporting purposes" then you're fine, and you've set the abstraction in a good place (or at least good for my example case, I will totally concede there are other examples!). When you edit the is-it-the-leap-day line of code, you'll see the subtract-one line of code directly below, and if you forget, your reviewers are likely to notice. And you haven't really made things noticeably worse for the case where the customer says "Actually we need February 29 rounded to March 1, instead", it's not distracting to have that line of code above where you are (and if anything it's useful to have that comment, so that if this is a different customer asking you realize that you need to not break expectations for your first customer).
I am something of a skeptic of reusing very short pieces of code - for instance, my team's own codebase has a poorly-designed function for calling a subprocess and swallowing certain types of errors from a very specific command, and in a code review I had to tell someone to just use subprocess.check_output(), which does the same thing but without the modified behavior which they probably didn't want. Abstraction makes sense when there is a meaningful concept to abstract. (Similarly, I am also very much not a fan of getters and setters; I think most people are better off with a structure with public fields, because I have very rarely seen it be useful to convert a trivial getter/setter to a non-trivial one without bothering to look at how callers use it, and it is useful to rename the field and see which code fails to compile / no longer passes tests now.)
Perhaps a function to get the "true" transaction date given a transaction date, with a name that describes the rationale in brief - expressing the "why" through the name itself.
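Something like this, perhaps (an OCaml sketch, names invented): the name carries the "what", and a one-line comment carries the pointer to the "why".

    (* Customer ticket #4321: transactions on February 29 are treated
       as happening on February 28; see the ticket for the rationale. *)
    let effective_transaction_day ~month ~day =
      if month = 2 && day = 29 then 28 else day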
Sometimes it's good to have a block comment explaining the motivation behind a chunk of code or particular line, or just to clarify the individual steps in small modules, but you have to remember you aren't writing prose.
I think around 10% of your code as comments is a good measure, but also remember that you may not revisit a module for years, and you will come to appreciate each and every breadcrumb you left which leads back to your original state of mind when you wrote it. If you measure code quality as maintainability, then comments can indeed increase code quality, just non-linearly.
>However, it does give me some comfort. When it’s not gamed, do other HNers also feel that a high comment:code ratio probably indicates quality?
I think people should spend more time commenting/documenting across the board. I'd much rather have verbose commenting that is unhelpful that I can skip, versus minimal commenting and code that is overly optimized and hard to parse. The less thinking I have to do to pick up where the last person left off, the better.
I would say that yes, in general high comment:code ratio tends to be higher quality.
I think it's usually a sign of a poorly understood domain or a poorly modeled problem. It's not a good sign; it's not necessarily a bad sign; it's (at best) an admission of one's limitations.
Comments become useful when behavior is implicitly tricky. Ideally you'd make the "trickiness" tangible and expressible in-whatever-language you're in, but that's not always easy to do.
> When it’s not gamed, do other HNers also feel that a high comment:code ratio probably indicates quality?
Just the opposite. It typically indicates a history of rigorously documenting terrible code. Sometimes comments come from complex business requirements or other external constraints; documenting the former is largely an anti-pattern, while documenting the latter is hugely useful.
I think repeating the same concept in different ways just makes things harder to read, not easier. Comments also sometimes reference code outside of where they are placed. This leads to them becoming misleading and incorrect.
I find that comments can be a last ditch effort to make hacky convoluted code look better than it is. It can be an indication of lack of thought and planning and later obsessive documentation to make up for it.
There's a lot of talk about comments becoming stale and code being self-documenting in the replies, which makes me wonder: do people genuinely not read comments and just make code changes without updating them? And do reviewers not look at the context of the surrounding code and just let commits in? What's the point of having code reviews then?
It's very easy to miss comments, since they aren't type-checked, nor are they unit-testable.
Comments don't need to be near the code they affect. They don't even need to be in the same file - consider if you've organized your code by features, but the comment relates to a layer that spans multiple features.
I once spent two hours on figuring out why a log file wasn't being modified when there was an error. I knew the location of the file but it just wasn't showing that error.
Eventually I tracked down this line:
    // writes to the log file at c:\...\xyz.log
    AppendToLog(message);
Yeah, that was the correct path and everything, and yet the line wasn't executing!
Eventually I looked inside the AppendToLog method. It was writing to another file in a completely different path :)
That was when I stopped bothering to read comments. They always lie, and I couldn't even blame the programmer who changed the AppendToLog method - the comment wasn't inside the method, it was at a call site. I can't honestly expect someone who changes a method to look for all the places where that method is called and make sure any existing comments match the change.
Comments straddle this weird line between code and human process. They're in the source code files themselves, and there is an appeal to trying to evaluate a project by looking at the source code alone and trying to gauge abstract technical merit. But I think the real truth here is that comments are a tool in the service of a development process, which includes things like having code reviews, having code reviewers be sufficiently motivated (intrinsically or extrinsically) to care in useful ways and neither nitpicking nor rubber-stamping, having motivated people on the project in the first place, having shared values about what code you're going to write and what code you're not, having tests and doing the operational work to keep tests running, retaining people on the team, etc.
I'd say it's mostly a meme. If a code review doesn't involve reviewing comments around the modified code, it's a bug in the process.
Elsewhere in this thread Ousterhout's book is mentioned; I like his advice about always placing comments in the most obvious places, as close to the code they affect as possible. This way you can't miss them, and it's hard to forget to update them.
Everyone can have comment blindness to some extent, but I've worked with two people who auto-collapsed docstrings and didn't read and hence update comments, which is enough (one person writing code without updating comments/docstrings and one person inadequately reviewing). Sure, the problem only appears in a bit of the code, but it means people stop trusting all the comments.
Woah, that sounds like a pretty dumb feature. Auto-collapsing whole functions is useful, but auto-collapsing docstrings sounds like a recipe for disaster. People write docstrings and inline comments for a reason.
No. Auto-collapsing docstrings is not dumb at all, it's an amazing feature. Most of the time, in my experience, docstrings are completely useless when you are writing code; they may be useful only for the caller, because they give you info in the autocompletion (and most of the time it's just "get this value" / "set this value"), and they can be used to automatically generate API docs.
If they are extensively used in a private project that no one will call from a different one, I will auto-collapse them.
They take so much space for nothing, and they slow me down terribly. And the projects in which I have seen this behaviour had horrible method naming, clueless architecture and completely arbitrary method subdivision. The docstrings were just the wrong solution for the wrong problem.
This file isn't that heavily commented. Do you look at many OSS projects for comparison? Though when things get complicated with many branches and function reentries it makes me wonder whether the problem would have been better solved with declarative logic that handles the procedural mess for you. (It might also be much higher quality since you may unlock access to various formal methods and go beyond unit tests. Though perhaps for example there's a vetted TLA+ spec not shown that this controller is based on.)
I don't think doc'ing every function is unusual, usefully doing so is less common though. Comments in the function body also aren't that rare, though it might indicate a place for better factoring e.g. just more function calls on descriptive/suggestive names. (Having more functions will help in not having to stub out (and deeply stub) so much in a behavioral test, too, since you can get away with just mocking the function call instead of the potentially hairy state logic the function does underneath.)
I see an example at a random spot for a couple improvements in naming (in my ignorant opinion, I don't know about kubernetes) -- though the fact I feel able to express even a weak opinion on an improvement suggests the comments were reasonable. I've seen code less hairy but with no comments or useful tests and without a need to really understand it I just want to move along pretending I saw nothing.
Look at the set of ifs at L591. The first if is a null check, with part of the explanation on L592; better to remove that part and have a function call, something like "claimWasDeleted(claim)". The matching else-if on 615 checks for an empty string name; I'm not sure, but I think its explanation is at L634, and the empty-string check could be "isClaimPending(claim)" - and maybe move the mode-mismatch check to its own else-if before the isClaimPending block and give it a better name. I appreciate the comment on L635 telling me why the next line of code on 641 is done (it likely wouldn't be clear from the commit history, which can be another place for whys), though with the isClaimPending change the comment and code might be replaced with a fn call with the details in the fn doc. I'm also reminded of an idea in more expressive languages: annotate purely-optimization metadata of any kind (inlining being the simplest) and be able to toggle it on/off for extra QA in a test suite. Anyway, the next else-if on 643 and its comment could be something like "isVolumeBoundToClaimProperly(claim, volume)". You get the idea.
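Sketching those helpers out - the names come from the suggestion above, and the field checks are my guess at the v1 API, not lifted from pv_controller.go:
package sketch

import v1 "k8s.io/api/core/v1"

// Hypothetical predicates wrapping the checks discussed above.
func claimWasDeleted(claim *v1.PersistentVolumeClaim) bool {
    // stand-in for "the claim object is gone"
    return claim == nil
}

func isClaimPending(claim *v1.PersistentVolumeClaim) bool {
    // an empty VolumeName conventionally means "not bound yet"
    return claim.Spec.VolumeName == ""
}

func isVolumeBoundToClaimProperly(claim *v1.PersistentVolumeClaim, volume *v1.PersistentVolume) bool {
    // bound both ways: the volume's ClaimRef points back at this claim
    return volume.Spec.ClaimRef != nil && volume.Spec.ClaimRef.UID == claim.UID
}
Each branch then reads as a sentence: if claimWasDeleted(claim) { ... } else if isClaimPending(claim) { ... } else if isVolumeBoundToClaimProperly(claim, volume) { ... }.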
It is rare to find code that comprehensively explains (without comments) why it exists, or often more importantly, why some superficially-equivalent code doesn’t exist there.
So are politicians when not lying and well-behaved children.
Correctly done comments are a one in a million thing. In my experience, they are utterly surpassed by "i = i + 1; // increment i" style comments. (Seriously, I'm working on code written by someone who teaches programming and he writes this type of comment.)
Some languages, like Go, don't prioritize concise code; it often takes a few lines to do something trivial. For example: find the object in a list with the lowest lexicographically ranked value of some property.
The code to do this is simple, but not concise, so leaving a comment that lets me scan the function and skip the 5-10 lines doing something trivial is nice.
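Something like this, with a made-up item type; the doc comment is what lets you skip the loop while scanning:
package main

import "fmt"

type item struct{ name string }

// lowestName returns the item whose name ranks lowest
// lexicographically - trivial, but still ~10 lines of Go.
func lowestName(items []item) (item, bool) {
    if len(items) == 0 {
        return item{}, false
    }
    lowest := items[0]
    for _, it := range items[1:] {
        if it.name < lowest.name {
            lowest = it
        }
    }
    return lowest, true
}

func main() {
    fruit := []item{{"pear"}, {"apple"}, {"plum"}}
    if it, ok := lowestName(fruit); ok {
        fmt.Println(it.name) // apple
    }
}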
This sounds like a self-fulfilling prophecy. If you only think comments are for telling you what a line of code does literally, then you're only going to see comments where the code is obscure. If you use them as a form of high-level communication to help the reader understand the broader context and reasoning behind code (like at the top of the linked page), then they will be useful, because the person writing the comments understands why documenting your code is a good thing.
Code can and should be a form of high-level communication too. In a well-structured program, the high-level code will explain the high-level context, and the low-level detailed code will explain the low-level details.
> do other HNers also feel that a high comment:code ratio probably indicates quality?
Nope, imho code with lots of comments is generally crappy code. It's littered with comments to explain the sloppy code they couldn't make clear because they're bad programmers. Good programmers use few comments, write simple clear code that doesn't require explaining, and leave comments about why something was done rather than simply trying to explain what the code does.
Code never lies, comments often do; don't trust comments that explain the code.
I would say, like you say, it depends on the quality of the comments documenting the code. If they are correct, they show a thorough understanding of what is written in the code, and a complete stranger to the code can easily find what they need. However, like you said, you will need to maintain comments more than code, which is a pain and will lead to inconsistencies in the comments, leading to a crappy file with meaningless junk scattered in it, which in turn means you can never trust comments, making them kind of useless to have. :D But since that's a circular argument, and those tend to be just cynical in nature, I do prefer properly commented code over uncommented code. I'd do it less verbosely myself, so I don't need to maintain so much of it, trying to keep it more consistent over time.
To me, comments are noise, and code is signal; the code is what actually executes.
It's one thing to have a summary of intent at the start of a listing, that should not count towards the code:comments ratio.
Once the code begins however, there should be a minimum of comments necessary - especially in a high-level language not constrained to assembly-level instructions.
In assembly listings it was common to have two columns, the code on the left and comments which often resembled high-level pseudo code on the right. Here's some representative apollo guidance computer source:
MAKEPRIO    CAF     ZERO
            TS      COPINDEX
            TC      LINUSCHR
            TCF     HIPRIO      # LINUS RETURN
            CA      FLAGWRD4
            MASK    OCT20100    # IS PRIO IN ENDIDLE OR BUSY
            CCS     A
            TCF     PRIOBORT    # YES, ABORT
When you're already working in a high-level language like C or Golang, you should be able to clearly communicate what is going on without the need for littering it all with comments.
>To me, comments are noise, and code is signal; the code is what actually executes.
Comments are noise to the compiler, but code is both a communication between humans and from humans to machines. To imply that only what executes is signal and all else noise is to ignore half the purpose of code, which is documentation.
And despite what a lot of people want to believe, code itself is often not sufficiently self-documenting.
More importantly, today's code is an optional basis of tomorrow's code. When NASA wrote the code for Skylab, I bet the comments in this codebase were the only useful thing in it and the code was all noise.
If you don't want to spend your career rewriting the same thing over and over again for slightly different business use cases and platforms, comments are incredibly valuable. (On the other hand, I guess there's a lot of job security in being hired to write the same thing many times....)
Well, the first 500-ish lines of that 1500-line file are comments, and there are block comments throughout, too...
You did, to be fair, say that comments at the top shouldn't count. But I think that depends a lot on personal (and language) style towards multiple files and multiple units within a file - for instance, none of this code is object-oriented, and comments above each class make sense in an object-oriented language. I think as the language gets more concise - Go is a lot more concise than the Apollo assembly language - you're going to need to have the same amount of prose to explain what you're doing but a lot fewer lines of code to get anything done, and it makes sense to have comments above each function or each block, because that's really the comparable unit.
One simplification that might be tempting is to replace the "if !condition" with "if condition".
For example, line 463 shows:
if !found {
    // handle missing
} else {
    // handle found
}
I would simplify this to:
if found {
    // handle found
} else {
    // handle not found
}
Or even:
if missing {
    // handle missing
} else {
    // handle not missing
}
The test-negative style is repeated throughout the file, but inconsistently. Sometimes the negative is tested in the if branch, sometimes it's tested in the else branch. Why?
My guess is that this is in line with the Go tradition of first handling all the odd cases, leaving the essential part of what the function (or block of code, in this case) attempts to do as the last part. So "if !found" to me suggests that this is more the exception than the rule, and that is why the code is written so as to deal with it first.
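That is, the usual Go shape (a sketch, not code from the file):
package sketch

import "fmt"

// describe handles the odd case first; the essential work
// stays at the lowest indentation, as the "rule".
func describe(volumes map[string]string, name string) (string, error) {
    desc, found := volumes[name]
    if !found {
        // the exception, dealt with up front
        return "", fmt.Errorf("volume %q not found", name)
    }
    // the essential part the block is really about
    return desc, nil
}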
Obviously I'm not the intended audience, but I'm not sure it's wise to have CloudVolumeCreatedForClaimNamespaceTag, CloudVolumeCreatedForClaimNameTag, and CloudVolumeCreatedForVolumeNameTag in the same file.
This is almost the worst of both worlds: the trouble of wading through a pile of words, together with the lack of clarity.
I can't say I've written overly branchy code, but I have become a fan of erring on the side of "verbose" regarding comments in code - specifically in nested if statements. So many people in the Ops area, "just" writing scripts for themselves (that the rest of the team starts depending on), will pound out a script that works, with no notes on its function, design or requirements.
15 months later, they've moved to a less stressful team, the app stops functioning because the servers got upgraded, and you wonder how the hell this thing works. And if it doesn't work, you have 60 hours of redevelopment to do.
Love this thread. I see this a lot, where engineers blindly follow best practices and have urges to refactor code when it's not necessary. Big files are not necessarily bad, and I love that a lot of the comments are with me on this. Having to open several tabs and remember where you are in the stack can be hard once there are more than a couple of frames / function calls in. There is a lot of benefit to keeping logic in 1 file or 1 function, and there is a time and a place for writing really granular DRY code. As with all engineering, there are always trade-offs to every decision, and I think it's about time we put to rest some of the traditional rules of thumb and 'code smells' new engineers learn and adhere to like a bible.
> As with all engineering, there are always trade-offs to every decision
This is the cliche that needs to be put to rest.
Yes, often there are tradeoffs. But just as often one thing is better than another thing, and there is no tradeoff.
A worldview in which everything has pros and cons and is ultimately subjective is fertile ground for entrenched habits, because it means never having to admit you're plain wrong, that there is a better way, or that other approaches are simply that much better than yours.
I believe code design is ultimately subjective. Unlike other metrics like performance which can be easily and definitively measured, you can't easily measure 'good' code. The definition of 'good' changes based on the context of the code base and function the code is trying to achieve. In this case, choices like having giant functions and files is definitely subjective.
The comment claims that every branch is accounted for, and yet a few functions below you can see this is certainly not the case.
They should either have fixed that first and then made such a comment, or not made the comment at all.
Otherwise this looks a bit cringey.
It says "(exception: simple error checks for a client API call)" and that seems accurate to me.
In particular Go (like C) has no built-in exception "throwing" / unwinding support, so for any function call where you want to pass an error onto the caller, you need to do something like
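result, err := doSomething()
if err != nil {
    return err
}
// ...and only then carry on with result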
(It would be nice in theory if there were better language support for making this lexically obvious but they're in a language where that's not doable.)
Is someone not familiar with the code competent enough to decide what is a "simple error check" and not a bug? This is very weak, as they suggest that even the branches that would result in a no-op are accounted for. So if someone introduces code with a branch that is unaccounted for, that automatically means the code is either faulty or a "simple error check". For something supposedly trying to be space-shuttle-worthy code, the lack of a definition of "simple error check" is very worrying. Would I want this code to fly me to the moon? I'd be wary.
I think you can syntactically state that anything where the check is on the second return value (which is, by convention, the error return) is a "simple error check", and their rule for if statements is always for things that come from a first return value.
For instance, this would not be a simple error check:
server, err := find_current_server()
if server != nil {
    ...
}
because if find_current_server() believes that it's a non-exceptional case that there might be no server at all (i.e., it might return nil, nil instead of nil and an error), then you absolutely want to handle that case.
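Whereas a check keyed off the conventional second return value would qualify as a "simple error check" (same invented style as above):
data, err := read_config()
if err != nil {
    return err
}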
What if the second returned variable is not an error, but the code using it assumes it is (i.e. the code breaks the convention)? That will not be accounted for, which means a lot more discipline must be used to analyse the code being called from that module.
I think that is highly unusual in Go (but someone who actively uses Go would have to correct me).
Besides, this syntactic rule is not implemented by code but by the author and reviewer, who should know what the function returns. (And shouldn't name it "err", then.) They're doing the best they can in a language without syntactic support for what they want, I think.
Go pretty much assumes decent editor support, which should inform you of the type of err as you use it. It's not perfect, but it hasn't been a problem for me in practice, so far.
For background on how actual Space Shuttle flight control software was written I recommend reading the "NASA Manager's Handbook for Software Development". It contains some great guidelines for writing safety-critical software. AFAIK, no Space Shuttle mission ever suffered a serious safety incident due to a software defect.
This looks like an example of why I'm trying to learn more about model driven development these days.
There's probably a lot of wisdom in this file embedded in a lot of noisy Go code. If someone tries to implement this code in a different programming language, a lot of this wisdom would have to be painfully extracted from the Go code. Some sort of extracted decision table format (examples [1]) would be objectively easier to read, could be rendered in different formats, hyperlinked, etc. Maybe a small subsection of this code could be generated from some structured data specification.
Mbeddr [2] is an example of a language that allows embedding decision tables directly in C99. It can also embed state machines and other goodies directly into the code. This is just one way to go... I'm sure some ppl would complain about lock-in in a specific IDE.
It is not clear to me exactly which model-driven techniques could be applied to this particular piece of Go code, but it could be worth it to find out.
From what I see out there, model-driven design seems to be applied to areas where really convoluted, and sometimes nonsensical, logic needs to be mapped to code - insurance policy management and that kind of stuff - or to other complicated code like firmware, drivers, protocols, etc. Why can't we apply these techniques to general programming problems?
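Even without Mbeddr-style tooling, plain Go can get partway there by pushing the branching into data. A toy sketch, with every name invented:
package sketch

// state is a deliberately tiny stand-in for the controller's inputs.
type state struct {
    claimBound  bool
    volumeBound bool
}

// rule pairs a condition with an action label; rows are evaluated
// top to bottom, like a decision table.
type rule struct {
    name string
    when func(state) bool
    then string
}

var table = []rule{
    {"fresh", func(s state) bool { return !s.claimBound && !s.volumeBound }, "bind"},
    {"half-bound", func(s state) bool { return s.claimBound != s.volumeBound }, "repair"},
    {"steady", func(s state) bool { return s.claimBound && s.volumeBound }, "verify"},
}

// decide walks the table; an exhaustive table plays the role of
// "every if has a matching else".
func decide(s state) string {
    for _, r := range table {
        if r.when(s) {
            return r.then
        }
    }
    return "unreachable" // the three rows cover all four combinations
}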
Literate programming [0] is frequently re-invented, in various forms, because it is simply the right way to do things- and each iteration of it is almost immediately discarded, because we're all sinful and impatient creatures.
A code style with lots of verbose if statements? Go is the perfect language!
More seriously, I've done this style unintentionally before, in JavaScript, for a similar situation with plenty of branching logic. Stuff that easily could have been half the lines, but not as easy to verify by hand. I'm happy to see it being formalized for certain scenarios.
What if I actually wanted the ability to get a thing going and quickly working within a few lines, while also being able to spell things out as needed when the use case arises? Crazy I know.
Obviously there are many people on HN that consider that to be a good thing. I personally agree that this is typical comment rot. Only obviously blinded people wouldn't see this.
I don't think that anyone here believes that comment rot is a good thing, or that they are particularly "blinded". I just think that the fact that the comments lie is a compelling argument against the prose comment-heavy style used in this file. In this case it was not a big one because it's trivial to determine that the comment is not true, but in general I think that no comment is better than the wrong comment.
I have not cared to read the file much further so I can't comment on whether comment rot is really prevalent.
Actual "space shuttle code" is AFAIK built to satisfy a specification, not to serve as a specification in itself. When you find that the specification is lacking or somehow faulty, you get the people responsible for it to revise it, wait for the new revision and adjust the code to match. The key is that they are two separate processes. Someone responsible for the specification doesn't need to be concerned with the exact details of the implementation, and those responsible for the implementation only need to be concerned with satisfying the specification.
When the specification and implementation are all mixed up as prose and code in the same file, you don't get to enjoy the benefits of separating those concerns. Changing the spec is the same process as changing the code, and the order of change is indistinguishable. You can change some code, but then you have to find all the references to the changed code in the prose and adjust that too. You can change the prose, but then you have to adjust the code accordingly, plus any prose that assumed the code worked as before. You'll be relying on tests and reviews (reviewers suffering the same drawbacks as the submitter) to enforce this, which IMO isn't necessarily a bad thing, but isn't exactly rigorous enough to earn a "space shuttle" stamp.
> Space shuttle style is meant to ensure that every branch and condition is considered and accounted for
I thought this was what you were supposed to do. Not that I do it every time, because I'm lazy, but Zed Shaw used to say every `if` conditional should have a corresponding `else` (which can also take the form of guard clauses). In Ruby, this is really easy to do because every method has to return something, so you build "returning nil or some String" as a concept into your program. There's a whole discussion around whether _that_ is a good idea, but I digress...going through the motions of enumerating every possible condition for a given piece of logic is not a bad exercise, and can result in very robust code at the expense of the code looking a bit more confusing than necessary.
We have a very similarly-written piece of code to this in our eCommerce platform. This code, written in Ruby, is used to calculate prices of discounts with regards to discount compatibility. There's a big warning atop the method stating something like "Please don't decorate/override this unless you ABSOLUTELY need to, the consequences could be very difficult to debug!". Not the easiest piece of code to look at, but it gets the job done in an efficient way without having to rely on C extensions for performance. We were also focused on correctness here, because calculating the wrong discount group could result in zero or even negative order totals, which our clients would _not_ be happy about. So the code is very verbosely written, isn't optimized for legibility, and strictly specifies both if and else sides of each conditional.
This kind of pattern indicates that the underlying language has inadequate metaprogramming facilities. You should be able to robustly verify certain guarantees via the type system instead of relying on a specific code formatting convention to do it.
DRY is one of the most overused software adages. It more often leads to premature abstraction and overly complex code. My personal preference is explicit code.
This style of code with an emphasis on branch completeness is the biggest virtue Go gets from the lack of exceptions.
I personally dislike that style, since that level of detail seems extraneous for normal tasks; but when you need to write code with absolute assurances like this, then preventing the stack from being unwound without an explicit return (or panic, because some stuff is just terrible) is quite helpful for inspecting and ensuring you have full coverage.
I've already seen a few places where this is compared to literate style programming. I agree and can see the comparison. However, a main draw of literate style, as used by most tools that support it, is precisely that it lets you bend the code around a narrative. Instead, this is having to fit a narrative to the code.
As an example of what I mean, look at some places where the if chunk is as big as the else chunk. By the time you are at the else, you might have to scroll back a ton to get to what it is you are in the else of.
Now, yes, to an extent, you could split this between functions. Such that you could have:
if foo {
    doSomething
} else {
    doSomethingElse
}
Literate style, though, would be more akin to
if foo {
    <<since we have a foo, we can ...>>
} else {
    <<Without a foo, we have to ...>>
}
This is somewhat subtle, but not having to define everything in terms of functions with inputs and single outputs can free certain parts of your code to just dealing with the core logic that you care to explain.
Pattern matching would be nice in this scenario. Would help with isolating responsibility by breaking out the if statements into functions responsible only for the case they know to handle. The additional opportunities to name with more granularity could minimize the comments needed.
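Go has no sum types, but a type switch over a small closed interface is the nearest approximation (a sketch, all types invented):
package sketch

import "fmt"

// A poor man's sum type: one struct per case behind a sealed-ish interface.
type syncResult interface{ isSyncResult() }

type bound struct{ volumeName string }
type pending struct{}
type mismatch struct{ reason string }

func (bound) isSyncResult()    {}
func (pending) isSyncResult()  {}
func (mismatch) isSyncResult() {}

// handle gives each case its own small, nameable arm; unlike a real
// pattern match, though, Go won't force this switch to be exhaustive.
func handle(r syncResult) string {
    switch v := r.(type) {
    case bound:
        return "bound to " + v.volumeName
    case pending:
        return "waiting for a volume"
    case mismatch:
        return fmt.Sprintf("cannot bind: %s", v.reason)
    default:
        return "unhandled case" // the hole pattern matching would close
    }
}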
It's a code smell whenever a function doesn't fit on a page or has more than 2 levels of indentation. Smaller functions are easier to test, easier to read and easier to debug. Leviathan kitchen-sink functions are a sign of slap-dash engineering.
Interesting to run a "blame" on this code, and see that almost every line has been touched by a different commit, and that 41 contributors collaborated on the file.
It seems that the "do not attempt to simplify this code" is necessary rather because the cost of having so many people relearn what they already know is too large, and not because the code itself cannot be written in a simplified manner.
It's a beautiful example of the difference between real-life constraints of a codebase developed by multiple collaborators, and the elegance that we strive for when coding alone.
There is no "one size fits all" coding style in software engineering.
This style is common in large enterprise C++ code bases and is definitely useful when the LOC count exceeds a couple hundred thousand, the business-logic complexity is high, and the project is mission-critical and expected to be around for many years.
I would argue that if they had solid unit tests of the expected behavior for their `pv_controller`, they wouldn't need this level of documentation - nor the scary space shuttle warning - the code would self-document.
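For instance, the usual table-driven style pins one branch per row (a sketch against a made-up size check, not the real pv_controller API):
package sketch

import (
    "fmt"
    "testing"
)

// volumeSatisfiesClaim is an invented stand-in for the kind of
// check the controller performs.
func volumeSatisfiesClaim(volumeSize, requestedSize int64) error {
    if volumeSize < requestedSize {
        return fmt.Errorf("requested PV is too small")
    }
    return nil
}

func TestVolumeSatisfiesClaim(t *testing.T) {
    cases := []struct {
        name      string
        volume    int64
        requested int64
        wantErr   bool
    }{
        {"exact fit", 10, 10, false},
        {"oversized volume", 20, 10, false},
        {"undersized volume", 5, 10, true},
    }
    for _, c := range cases {
        err := volumeSatisfiesClaim(c.volume, c.requested)
        if (err != nil) != c.wantErr {
            t.Errorf("%s: got err=%v, wantErr %v", c.name, err, c.wantErr)
        }
    }
}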
If the code is complicated, sometimes it's better to make it extra explicit than to just rely on the reader's comprehension (including empty else blocks, for example; see the fragment below).
Because in a nested A/B/C condition it is very easy not to notice what happens in the case A && !B && C, and/or why it is different from A && B && !C, especially when some of those can't happen together (but then the code does "something", and "can't happen together" turns out to mean "it will happen, in that very weird case").
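A contrived fragment of what that explicitness buys you - the empty else records that the case was considered:
if a {
    if b {
        doX()
    } else {
        // explicitly empty: a && !b is a no-op by design,
        // not an oversight
    }
}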
Honestly, to me this looks like normal good code. It reads from top to bottom, exits early, does not have conditions so long that they require lines to be wrapped, it has functions that are long or wide but not both... It does not seem to superfluously refactor code into functions called only once.
Maybe there are too many comments, but for code that is critical and might be harder to understand at a glance that is better.
The longest functions don't follow the same guard-clause style as many of the shorter functions, and I think that's a mistake. That half the file avoids else clauses and half doesn't tells me there's an argument here that was never adequately resolved. I think we're missing about six functions and a couple more refactors; then tone down a couple parts of the lecture.
As it is, it's obvious that many different hands were in it (and if you look at git blame, one of them doesn't know how to revert changes properly and needs to stay after school for extra lessons. If this file is so important, it shouldn't have its most critical lines carrying a "revert changes" comment as the commit message. You done goofed, son.)
If you ever programmed embedded software for big companies (automotive, medical, etc.) you're used to this style. Hell, at Siemens we were forbidden to even use the ternary operator; a rule said it was to be replaced by a complete if/else statement, no matter what.
1) One would have to make sure all the branches are accounted for during unit/integration tests.
2) The problem I am struggling with is the interviewers. So much of what I code, I keep thinking about interviewers questioning it. "Why so many branches? Couldn't you have done this in a more DRY manner? Couldn't you have done a better job at naming functions and variables so that so much commenting is not needed?"
This thing we do is truly an art form that can more easily be picked apart than properly understood. Ultimately I think we need to be like MMA fighters... able to properly pick the correct response for the situation with no ideological preference... only results matter.
The key thing about complexity is that you can move it around, but you can't make it go away.
The complexity has to be there, somehow. It's also highly subjective how one sees and grasps complexity.
Breaking code up for the purpose of reducing complexity is more often a symptom of clearing your mental space.
But IMO most often, complex code is much more well off being put in the same place, and with extensive comments too.
I also believe it's a true flaw to believe that a lot of comments are unnecessary when writing "important" code.
Comments are written in a human language, and we interpret them as such. Computer language is different and, more often than not, takes more mental power to understand and to change.
Setting aside the discussion of when, exactly, this approach is appropriate, I wish we had more languages that would make it easier and more organized. The kinds of languages where e.g. code documentation is part of the syntax, where there's a way to explicitly express various contracts (e.g. design-by-contract in the language), and where, in general, the language design errs on the side of readability over terseness.
Some examples (none of which are perfect) would be COBOL, Ada and Eiffel. But they all have their own issues, and then of course there's a question of long term support and maintenance.
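For what it's worth, the contract half can be approximated in today's Go, just without language support (a sketch; require/ensure are Eiffel's keywords, and bindQuota is invented):
package sketch

// bindQuota grants exactly the requested amount. Eiffel would put the
// checks in require/ensure clauses; in Go they are just code.
func bindQuota(requested, available int) (granted int) {
    // precondition ("require" in Eiffel)
    if requested <= 0 || requested > available {
        panic("bindQuota: need 0 < requested <= available")
    }
    // postcondition ("ensure" in Eiffel), checked on the way out
    defer func() {
        if granted != requested {
            panic("bindQuota: postcondition violated")
        }
    }()
    granted = requested
    return granted
}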
On a related note, please note that most people have the opposite issue, myself included. I know we are highly biased and tend to overcomplicate things, so I follow the KISS rule by heart.
But when I do have to write complex parts of software that cannot be simplified, I will take a lot longer because I'll need to prove, first to myself, that it really cannot be simplified. Then I'll still add a big warning, either to others or to my future self (which is effectively a different person).
How about: code should be self-explanatory, functions should not take more than a screen, and behaviour should be ensured with tests.
Otherwise, to me it looks more like Space Shuffle style.
Readability is not a valid requirement as it does not specify a valid end condition.
Likewise simplicity is not a valid requirement.
Code coverage by tests, proofs and comments is.
A valid requirement could be considered complexity, number of names and entities and a lot of other linter constraints.
Heck, aggressive deduplication is not even always good.
Self similarity is not necessarily good either (the reuse of patterns) as you could have problems trying to spot the differences and special cases as they may no longer stand out.
The code referenced here is extreme. It does not consider diminishing returns or acknowledge tradeoffs in readability. I would advise a more balanced approach.
A more compact style allows more code to be viewed at once making it easier to understand. Of course, taken to the other extreme, overly compact code becomes hard to read.
It's called, "the art of computer programming", because blindly applying programming principals rarely yields good results.
Of course this can be improved and simplified.
Handling 12123123 err checks is a problem of avoiding short-circuit failures. That's fine, but you have multiple return values, so there's no reason not to leverage that properly... or even consistently. The whole file is littered with assignments deferred until a comparison, except in some places where the if and else-if statements use the return values inline but not in others. It's just random.
// checkVolumeSatisfyClaim checks if the volume requested by the claim satisfies the requirements of the claim
func checkVolumeSatisfyClaim(volume *v1.PersistentVolume, claim *v1.PersistentVolumeClaim) error {
    // check if PV's DeletionTimeStamp is set, if so, return error.
    if utilfeature.DefaultFeatureGate.Enabled(features.StorageObjectInUseProtection) {
        if volume.ObjectMeta.DeletionTimestamp != nil {
            return fmt.Errorf("the volume is marked for deletion")
        }
    }
    volumeSize := volume.Spec.Capacity[v1.ResourceStorage].Value()
    requestedSize := claim.Spec.Resources.Requests[v1.ResourceName(v1.ResourceStorage)].Value()
    if volumeSize < requestedSize {
        return fmt.Errorf("requested PV is too small")
    } else if v1helper.GetPersistentVolumeClass(volume) != v1helper.GetPersistentVolumeClaimClass(claim) {
        return fmt.Errorf("storageClassName does not match")
    }
    // function here, obviously
    return volumeAccessModeChecks(volume, claim)
}

// Here's the function
func volumeAccessModeChecks(volume *v1.PersistentVolume, claim *v1.PersistentVolumeClaim) error {
    // even the naming sucks. DISTINCT case error name
    isMismatch, volumeModeErr := checkVolumeModeMismatches(&claim.Spec, &volume.Spec)
    if volumeModeErr != nil {
        return fmt.Errorf("error checking volumeMode: %v", volumeModeErr)
    } else if isMismatch {
        return fmt.Errorf("incompatible volumeMode")
    } else if !checkAccessModes(claim, volume) {
        return fmt.Errorf("incompatible accessMode")
    }
    return nil
}
This whole function is already a function separated out of the middle of the file; no reason to stop there. No need for separate files when the function calls are going to be distinct and inline beneath their usage.
I imagine it reduces errors and I admit it's very tempting.
But doesn't this style add a considerable cognitive load such that thinking several layers up becomes difficult? Perhaps there is simplified documentation of lower or mid level components so that thinking on the higher levels is not so difficult.
As a fairly junior programmer, I'll play devil's advocate on the Bad bit – I find that kind of thing very helpful sometimes. Having plain-English demarcations for what a piece of logic does – or where in the loop something is – can shave 80% off the time it takes to get reacclimatized when needing to make a quick change. There are moments when I know the conceptual bit that needs changing, and would prefer to hunt for it in English rather than spend time translating on the fly.
1. My comment assumes the code itself is well-written and clear. If it is, the plain English explanation will be superfluous, and you'll be able to understand the code just as fast. If it's not, then the comment was the wrong solution to a real problem. The correct solution is to rewrite the code.
2. What counts as "well-written code you can understand instantly" is of course experience-dependent. And not just "years of experience" dependent, but primarily "familiarity with the paradigm and style of coding experience" dependent.
Steve Yegge talks about over-commenting by junior developers here:
Yeah, 500+ upvotes on HN apparently think otherwise, but I also dare to disagree... it's not clear, despite claims to the contrary.
Comprehending this code relies on reading a long series of verbosely referenced assumptions around an implicit state machine with numerous error conditions and edge cases. The author's argument is that extreme verbosity in control structures assists with ensuring no such conditions escape an explicit code block - ie. they admit that they are (ab)using code to deal with their personal challenge of achieving a rigorous and precise conception of the problem. I would posit that approach is the wrong tool for the job, and further that they have simply failed to clearly articulate the problem, and are writing illegible code and documentation as a result. A clear symptom of the resulting fragility is the title / opening plea to others not to change their carefully constructed house of cards. Fragility does not good code make.
If I were to rewrite this, the implicit state machine and its valid transitions would be documented first and foremost, eg. with a railroad diagram or ABNF.
The benefit of using [a formal specification language] is that it teaches you to think rigorously, to think precisely, and the important point is the precise thinking. So what you need to avoid at all costs is any language that's all syntax and no semantics. - Leslie Lamport ... via http://github.com/globalcitizen/taoup
Space shuttle coding is like preparing code for your future self to quickly make sense of.
A key part of the software lifecycle is being able to easily onboard newcomers to be productive, or coming back to a part of a code base months, or even years later.
The hubris in some of these comments is staggering. No, you wouldn't have more cleanly (and successfully) written this in $BETTER_LANGUAGE, even with your super power of perfect hindsight.
People seem to be missing that "simplify" here is a droll insult to too-clever hackers. The code is not too complex, it is well written and documented.
This is pretty smart given the nature of open source. It's a good safeguard for someone messing with code that runs like half of cloud deployments now.
You have to handle the errors somehow... though it's amusing that Go settled on what is essentially Java's checked exceptions everywhere, with the handicap of not being able to automatically propagate them (or, equivalently, an Either type, albeit a real Either lets you build expressive functional pipelines). So you always complect the attempt at doing work in the body, the ability to fail in the type signature, and the requirement to notice and handle the error at the immediate call site, after the failure has wiped out any local state there may have been in the called function before it returned. (Hey, sometimes it's nice to have a way to force callers into something, but I want it as a design choice.) The notion of fully decomplecting error signaling from error handling escapes both languages, though, even though the idea is only as new as the '60s with PL/I or the '80s with Zetalisp (or, modernly, Common Lisp)...
A good alternative is language-level support for returning early from a function with an error. Off the top of my head, both Rust and Haskell have good support for this. (I'm sure there are others, these are just the ones I know.)
Rust has a macro try!(x) that takes a Result<T, E> - a two-variant type, either a successful result of type T or an error of type E - and translates it to, if x is the first variant, evaluate the output of try! to the value of type T inside, otherwise return an error with the value of type E (on the assumption the calling function returns a Result<something, E>, too). So you can do things like `let file = try!(open(...));` and operate on the file. People liked it so much that a couple years back Rust added the question-mark operator, so you can just do `let file = open(...)?`, and chain it to `let data = open(...)?.read(...)?`. It's still visible that this is how you're handling errors, and you can always leave off the question mark and write out `match open(...) { Ok(file) => ..., Err(error) => ... }` instead, if you'd like.
Haskell has its monads, which for the present purpose can be interpreted as just a wrapper type. Given a wrapped object of type T, you can give it a function that takes a T and returns a similarly-wrapped object of type U, and have it apply the function to the data. If the wrapped object is in a failure state, it can choose to "apply" the function by just not calling it at all and instead returning the same failure state. Haskell has special notation with the "do" keyword for calling several monadic functions repeatedly, where it looks like you're writing regular, imperative code and assuming errors don't happen. But again it's obvious when you're using it and you can always handle the exceptional case specially as soon as you want.
Thanks, fixed - I was trying to gloss over it for the sake of readers for whom X<Y> itself isn't familiar syntax but I think it makes more sense this way. (I'm also glossing over the fact that try! will convert error types, I think?)
What is Uncle Bob's claim to fame? What has he written that is production-ready and sturdy and has held up to change?
I'm sure he can have opinions - he is very experienced at having convincing-sounding opinions on software development. But where is the test of whether he's right?
Nice. At some point you roll up your sleeves and say "right, this part is important and needs to work, and I need to be sure I have got it right". At that point space shuttle style works pretty well (usually, for me, it's when I am writing some security / control dispatch for the umpteenth time).
this is how my code looks until it's done, where i take out all of the nonsense and make it unmaintainable, shooting myself in the foot for any development i need to do on it in the future. love it :D
yesss thought I was the only one that thought this style was a good idea ... Makes my last company look foolish and explains why most of the devs weren't as productive as they could have been.
Dangerous is not delivering on time because you want the code to be fancier than it should be or needs to be when all you had to do was write it out plainly and leave obvious comments.
Go uses tabs instead of spaces and Github displays them as 8 character width. It can get annoying and there are several extensions to change this behavior. Tab Size on Github [1] is the one I personally use.
Code that "can't be refactored" is a huge smell. This code can be written in a simpler and easier to follow fashion even if the authors of the code were unable to do so. The proof of that is the thousands of free software projects of equal level of complexity whose code bases are a joy to read.
The amount of circlejerking in this thread is unbelievable. Go is a horrible language, and if this code were posted anonymously, these same idiots calling it "the jazz music of software development" and "space shuttle style" (copied verbatim from the actual code comments to boot) would be deconstructing and denigrating it to the last fine detail. Hilarious.