One thing that stood out in the article was calling out the percentage of functions with fewer than X lines. I don't think that is a good goal to work toward. Having functions that do one thing and do it well is, and function length can be a very rough proxy measure for that. However "doing one thing" almost always means "perform more than one step in an algorithm." I've seen too many code bases where "keep functions short" is used as an excuse to break locality by piecing functions out into smaller functions that are only ever called in one place (the function they split up). It can make code harder to read and maintain and be less efficient besides.
If you have a function with 15 locals, it could be hard to keep straight in your head, even if that function were 50 lines or so. If you can split it up into 3 functions with 5 locals each (and a small number of arguments), that is usually more readable.
As long as the functions are named well and communicate (only) with params and return values, it doesn't hurt readability to split it up, even if it's only used in one place.
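A minimal sketch of what that looks like in C (names are hypothetical): the extracted helper is called from exactly one place, but it communicates only through its parameters and return value, so the caller no longer carries the helper's locals in its own scope.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: called from one place only, but its narrow
 * interface (arguments in, value out) keeps its locals out of the caller. */
static int checksum(const int *vals, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += vals[i];
    return sum;
}

/* The caller reads as a sequence of named steps instead of one big scope. */
static int record_is_valid(const int *vals, size_t n, int expected) {
    return checksum(vals, n) == expected;
}
```

The point is the interface, not the line count: the reader of record_is_valid can trust that checksum touches nothing but what's passed in.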
On the other hand, I suppose you can have a 500 line function, and if it has few locals, then you probably don't need to split it up. The canonical example would be a function that only prints stuff to the screen, maybe based on a couple parameters. But I don't really run into those situations very much -- more often I run into the big functions that can be split up.
I am highly skeptical of codebases with lots of functions that are called from only one place. I would even put that forward as a core metric for "bad code", with code duplication as its foil (not just duplication in the small but also structurally: places where you should be building higher-order primitives, including your own control structures or scaffolding) -- since it usually isn't OK to not have a function if it would be called twice. If you have too many locals... use a block! That's what they are for :/.
That is, every new piece of code has to be compared with all the existing code, in order to maintain global coherence. So there is a hard limit on the size of a program you can write this way.
Also, new contributors are forced to understand the whole thing.
This is why "local reasoning" is a desirable property in software architecture -- so the program can grow smoothly. Most programs grow badly, but you want the cost to be closer to O(n) rather than O(n^2).
So these types of program eventually either rot or die -- that's why dpkg "isn't as good as it was before".
As always the answer is somewhere in the middle. Neither extreme is good. You want some modularity but not too much. I guess people are disagreeing on how much there should be.
I concede that modularity has a much greater cost in C than in other languages, e.g. because char* in a function parameter list tells me nothing about ownership. See the sibling comment where I was puzzling through Python interpreter source code like Py_SetPath(char *).
So in C it might make a lot more sense to have huge functions because there is a cost of tracking ownership across function calls. In languages with garbage collection that's much less of a problem.
It's interesting how, even in the context of C, the virtues that functional programming dials up to eleven come up.
(Git is similar: it's a dirty C program with lots of mutation under the hood, but when you use it you can sort-of pretend it's purely functional, since the main things you care about are append-only, and even rewriting history just creates a new slightly altered copy of history, and the old one is available until the GC kicks in. Similar for the recent craze of copy-on-write filesystems.)
Also, what does OOP share with functional programming? Being disciplined about state. When OOP is used well, you're protecting invariants on state inside classes.
Basically the anti-pattern is global mutable state, which most (old) codebases still have. It works poorly for concurrency too (e.g. Python and the GIL).
Here are a couple related comments which touch on OOP and FP, which a lot of people seemed to like:
https://news.ycombinator.com/item?id=14523728 -- Go without globals; pure dependency injection is really functional programming; state as an explicit parameter
https://news.ycombinator.com/item?id=11841893 -- there's no real advantage to expressing something like "split a string by a delimiter" in a purely functional style. Either way, you have a trivial referentially transparent function you can reuse without causing complexity in your program. You might as well do the obvious imperative thing.
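For concreteness, here's the obvious imperative thing in C for a piece of that task (a sketch, not any particular library's API): it mutates a local counter while walking the string, yet the function as a whole is referentially transparent and trivially reusable.

```c
#include <assert.h>
#include <stddef.h>

/* Count delimiter-separated fields by walking the string once.
 * Only local state is mutated; same input always gives the same output. */
static size_t count_fields(const char *s, char delim) {
    size_t n = 1;                /* a string with no delimiter is one field */
    for (; *s; s++)
        if (*s == delim)
            n++;
    return n;
}
```

Written functionally or imperatively, the caller sees the same pure interface either way.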
Also, I neglected to mention that this style actually doesn't work all that well in C. It works better in C++.
Not only because C++ has classes and C doesn't, but because of char *. I was recently hacking on the Python interpreter, and wondering about functions like Py_SetPath(char *) and setenv(char *, char *) (from libc). The problem is that I have no idea whether those functions copy their arguments or store pointers to them. I had to go read three levels deep.
So in that sense, functions in C fail miserably... they are not functions in the mathematical sense, so you can't reason about them the same way without strong conventions.
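A contrived illustration of the problem (both setters are made up; neither is the real Py_SetPath): the prototypes look the same, but one borrows the caller's buffer and one copies it, and nothing in the signature tells you which.

```c
#include <assert.h>
#include <string.h>

static char *g_alias;     /* keeps a pointer into the caller's buffer */
static char  g_copy[64];  /* owns its own copy */

/* Borrows: the caller must keep the buffer alive and unchanged. */
static void set_path_borrowing(char *p) { g_alias = p; }

/* Copies: the caller may free or reuse the buffer immediately. */
static void set_path_copying(char *p) {
    strncpy(g_copy, p, sizeof g_copy - 1);
    g_copy[sizeof g_copy - 1] = '\0';
}
```

After calling the borrowing version, mutating (or freeing) the buffer silently changes what the library sees -- and only reading the implementation tells you which contract you are under.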
Yes. Hence my astonishment and use of "even in C".
Indeed, the most important bits for functional programming in the large are well-defined interfaces and a very disciplined use of state. In the languages I use I see no need to resort to imperative features for "split a string by a delimiter"---but if your language is a bit clunky that might be the path of least resistance.
(To give a counterexample to my point, even in Haskell you might want to use eg a mutable hashtable in an imperative fashion. Even if that means going out of your way to do so.)
Avoiding global (or singleton) state is only the start of what's necessary. Even then, OO programs tend to have lots of little pockets of state whose network of interactions grows large again and doesn't compose well.
Breaking out a function that's only ever called once can still be a good idea: functions---especially pure ones---communicate with their environment via narrow interfaces, their arguments and return values.
Compare that to inlining the same block of code: the reader will have to carefully check that the inlined block is self-contained, instead of having the syntax provide the hint.
Granted, this argument mostly makes sense for pure functions, and doesn't apply nearly as often in C as in eg OCaml. Haskell's usual style is perhaps the most extreme in that regard, any individual definition only very seldom goes on for longer than one screen. (Forth people strive for a similar style of ravioli code, but I don't have the experience with that language to say much.)
You introduce comments that are basically just noise. The name of the method already states what it does. The rest is just very simple details, actually memcpy's. Most of the comments are totally unnecessary.
Code comments are not pure good, they are not Schindler's List. They are a necessary evil. Every time you need to add a code comment, you have failed to write clean code that is easy to understand.
Every comment that you add has a cost. Someone needs to read it _in_addition_ to the code that it tries to describe. The code itself either is correct or isn't. The comment cannot fix that. But it can slow down the reader to understand what the code does. Whether there is a comment or not, the reader still needs to read the code.
Let me explain that in more detail: while `malloc((sizeof(char*) + sizeof(i16) + 1) * N)` tells me how big the buffer is, it does not tell me how it is organized / how it will be used. Of course you could introduce a struct for this, but that would get out of hand quickly. Having the comment one line prior illustrating the layout helps one to understand what comes next. Hence offsetting `zExtra` after each `memcpy` part becomes obvious.
`resize azColl` (and similar) serves as a heading for the following block, allowing one to easily find a specific step in a list of multiple steps. Scanning the whole function with your eyes for such headings is much faster than parsing for `pIdx->azColl` inside a call to `memcpy`.
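A sketch of the allocation pattern under discussion (illustrative names, not SQLite's actual code): one malloc is carved into three back-to-back arrays, and the layout comment is what makes the pointer arithmetic legible.

```c
#include <assert.h>
#include <stdlib.h>

typedef short i16;

/* One allocation carved into three regions of N slots each:
 *
 *   zExtra: [ char *azColl[N] | i16 aiCol[N] | u8 aSortOrder[N] ]
 *
 * Pointers come first so each region stays suitably aligned. */
static int zextra_demo(size_t N) {
    char *zExtra = malloc((sizeof(char *) + sizeof(i16) + 1) * N);
    if (!zExtra) return 0;

    char **azColl = (char **)zExtra;                      /* first region  */
    zExtra += sizeof(char *) * N;                         /* advance past it */
    i16 *aiCol = (i16 *)zExtra;                           /* second region */
    zExtra += sizeof(i16) * N;
    unsigned char *aSortOrder = (unsigned char *)zExtra;  /* third region  */

    azColl[N - 1] = (char *)"BINARY";
    aiCol[N - 1] = 7;
    aSortOrder[N - 1] = 1;

    int ok = aiCol[N - 1] == 7 && aSortOrder[N - 1] == 1;
    free(azColl);  /* azColl still points at the start of the allocation */
    return ok;
}
```

Without the layout comment, a reader has to reverse-engineer the three regions from the pointer arithmetic alone.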
In the end, whether to put these kinds of comments into the code base or not should be part of the coding convention chosen for the project. People have different opinions, and after all it is a trade-off.
The way I did it in this example is the way I'd do it in a big (community-driven) code base like SQLite. If the convention stated otherwise, I wouldn't add them.
Also, I usually find that just adding one or two line feeds between code blocks helps to add enough structure.
I'll take a codebase with no comments over pages of
// assign 10 to a
a = 10;
As far as possible, the code should tell you 'what' it is doing. But pure code alone has a hard time telling you 'why' it does what it does, and an even harder time telling you 'why not'.
(Ie 'Why are we using this complicated approach, and not a simpler one?', 'Why can we get away with this simple approach, why don't we have to worry about corner case x?')
For the final delivery of source code we'd make a binder that had a code listing on the pages that were on the right side, with a detailed running commentary on the pages that were on the left. On the left hand pages we could go into great detail (we'd add vertical space on the right hand pages as needed to keep things lined up).
The left hand pages in essence were almost a book on how to do whatever it was the code was doing, making it easy for the client to hand over the project to their own programmers after we were gone. We did not rush writing the left hand pages. Sometimes I spent more time writing the left hand pages than I had spent writing and debugging the code. (Our contracts were all fixed bid contracts, so spending all that time didn't raise the price for the client.)
I find this to be a rather curious statement. I don't believe "clean code" and "easy to understand" are necessarily related as strictly as you seem to.
Sometimes solutions to problems are messy and difficult to reason through with just the text of the solution ("code"). Code comments that elucidate meaning when the solution is thorny make code clean. The absence of comments might make code pure (in some sense), but not clean.
Well, if we take "clean code" to be what Robert C. Martin defines in his book "Clean Code", it very much is related to code that is maintainable and readable (i.e. easy to understand). I'm not saying they are "strictly" related, as you seem to suggest.
The absence of comments is not a necessary property of clean code. But the absence of unnecessary comments is. Most comments are unnecessary.
Also, most importantly, the presence of comments is not a sufficient criterion for clean code. Code can be of very high quality without any comments.
You seem to say that _good_ comments may improve the quality of code and I agree.
It's not that we are not capable of understanding the original code block, but the 'breathing space' and just some minor sparse comments now allow much of that understanding in less than a few seconds -- a real practical benefit when working on a large body of code.
And it does that without interfering with the 'good practices' highlighted in the article.
It must not be left out of any discussion of style in the SQLite codebase.
Without this approach, many stylistic choices would not be nearly as acceptable.
(But one thing I never looked into: do the developers use a more readable, better-commented and -spaced style, fewer tricks that produce fewer warnings, to write their test suites?
Or are they written in the same style?)
I think that's the rationale: you are supposed to explain what you do, not how you do it. The function is short enough that a global comment is enough.
I agree with you, that a couple of comments in the function body wouldn't have hurt.
I'm not sure all the casts are necessary, however, and I'd probably reduce usage of some sizeofs just using the constant (sizeof(i16) should obviously be 2, if the name is any indication.)
Why? The sizeof is resolved into a constant at compile time, i.e. no runtime cost. I see 'sizeof(i16)' as conveying more information than a '2'.
-- the fact that the code didn't use int16_t from stdint.h but rather defined its own type could hint that a signed integer of exactly 16 bits might be unavailable in the ABI. People defining their own exact-width integral types (instead of pulling in stdint.h) rarely think about how that makes the code harder to interpret for a reader.
Now, if the type i16 might indeed be wider than 16 bits, sizeof becomes perfectly justified, but '2' is not.
(A note on code style though: some of your lines are a bit too long. Have pity on those with small screens. Seriously, it's not that hard to break up line 94.)
Where can I learn enough about X to read this code?
It's still one file, but it has more features & is less obfuscated
Otherwise I mostly learnt by reading XCB header files
That said things are going the way of Wayland, so I've since switched: https://github.com/serprex/noway
How small are your screen(s)? What resolution do you run them at? What size font? Do you have eyesight issues?
I have decent vision at monitor distance. On a 13-inch Macbook Pro, using a font of reasonable size, I can easily diff two files with a line length of 108 - side-by-side.
I'm curious, then, why others can't do that. (Eyesight issues aside, of course.)
(Whether code lines should be 108 characters long is a separate issue.)
But in this case, all that is not relevant. The link was to Github, and Github enforces a width of less than a thousand pixels, regardless of screen size.
Excellent, thanks. That's something I hadn't considered.
But in this case, all that is not relevant. The link was to Github, and Github enforces...
Ah. I assumed you were speaking generally.
Perhaps I should have added a 'mostly' in my original reply, though. But I want to always keep comments to a minimum.
One major WTF in this code that stood out for me is the place where char *zExtra is cast to const char **. (Pointer-to-pointer casts should be alarming for any C programmer.) Only later I realised that zExtra was a block of memory containing three arrays of different types. There should at least be a brief comment explaining the memory layout of zExtra, and maybe one explaining why the code makes a single allocation instead of three separate allocations (is it performance critical?)
I'm not a purist by any means. I don't think comments are pure evil either. They have their uses but to say that code without comments is
Clean Code principles only state that code comments are not good _per se_. Most comments are just noise. Yet they are sometimes needed and bring value. It's not black and white.
I found Redis and OpenBSD much more organized and documented; it just made more sense since the structure of those projects is really nicely organized. Of course there are reasons why sqlite3 is one C file, I am well aware of that, but I just wanted to point out 2 more awesome sources for reading and learning C.
(self bootstrapping lisp)
When the return statement is kept on its own line, this is much easier.
When on VS, I spend most of my time on the .NET side, so I thought it was on the C++ side as well.
What worked for me was diving in with a specific question (I think it was "I wonder how indexes are represented on disk?") until I found the answer, picking up a lot on the way.
Not an easy codebase to read, but full of good architecture, good ideas and best practices.
To be fair, there is only one level of indentation. Kind of difficult to screw that up.
1. Elegant use of white space makes for an easier read.
2. Intermingled declarations and code showed up in C99.
3. C++ style comments rather than block comments. There should be a commentary (and not a recital).
4. camelCase for local variables? Really?
5. Single line if statements make for a hard read.
Question: What's ru, rs?
unsigned int u0 = 0;
unsigned int u1 = 1;
int s0 = 0;
int s1 = 1;
int ru = 0 > u0 - u1;
int rs = 0 > s0 - s1;
prog.cc: In function 'int main()':
prog.cc:8:16: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits]
int ru = 0 > u0 - u1;
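What's going on: u0 - u1 is computed in unsigned arithmetic, so it wraps around to UINT_MAX instead of yielding -1, and 0 can never be greater than an unsigned value. A small demo (helper names are mine):

```c
#include <assert.h>

/* 0u - 1u wraps to UINT_MAX; the int 0 is converted to unsigned for the
 * comparison, and 0u > UINT_MAX is always false -- hence the warning. */
static int ru_demo(void) {
    unsigned int u0 = 0, u1 = 1;
    return 0 > u0 - u1;   /* always 0 */
}

/* Signed subtraction really yields -1, so the comparison holds. */
static int rs_demo(void) {
    int s0 = 0, s1 = 1;
    return 0 > s0 - s1;   /* 1 */
}
```

So ru is always 0 and rs is 1, which is exactly why -Wtype-limits flags the unsigned comparison as always false.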
An example off the top of my head: a function which finds an integer in an array "should" take that array as a "const int *". But if the caller wants to modify the found element, it needs to do ugly casting (really bad practice).
Similar problems exist in most other programming languages. Most have only "shallow" const. You can mark a Java array member as final, but you can't disallow mutations to that array.
Const-correctness is similar to types - it breaks very quickly, and when it breaks it gets in the way.
That means I resort to indices when possible. When pointers are needed I typically avoid const.
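The same "shallow const" shows up in C itself: making a struct const freezes its pointer member, but not the memory that member points to. A sketch with made-up names:

```c
#include <assert.h>

struct buf { int *data; };

static int shallow_const_demo(void) {
    int storage[1] = { 1 };
    const struct buf b = { storage };
    /* b.data = storage;    -- rejected: the member itself is const */
    b.data[0] = 2;          /* accepted: the pointed-to int is not const */
    return storage[0];
}
```

The const on b stops reassignment of b.data but happily lets you write through it, which is the C analogue of Java's final array member.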
My friend, your example is wrong! The caller can happily work with a mutable array; the function accepting a "const" pointer is just a guarantee that the function itself won't modify the content.
size_t look4(const int *ptr, int val);

void caller(int *ptr, int old_val, int new_val) {
    size_t index = look4(ptr, old_val);
    ptr[index] = new_val;
}
Imagine look4 returned a pointer to the found element. Making look4's signature const-correct in C is not possible.
(Think of strchr).
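strchr is exactly this bind: it takes a const char * and returns a plain char *, which requires casting the const away internally. A pointer-returning look4 hits the same wall (sketch, names are mine):

```c
#include <assert.h>
#include <stddef.h>

/* To let a caller with a mutable array modify the found element, the
 * finder must return non-const -- forcing the same const-stripping
 * cast that strchr performs internally. */
static int *look4p(const int *ptr, size_t n, int val) {
    for (size_t i = 0; i < n; i++)
        if (ptr[i] == val)
            return (int *)&ptr[i];   /* cast away const */
    return NULL;
}
```

C has no way to say "the return type has the same constness as the argument", so the signature lies one way or the other. (C++ solves this by overloading; C23's generic selection can approximate it.)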
Try to reduce some of the problems these people have: https://stackoverflow.com/questions/856542/elegant-solution-...