
Elevate your C programming skills by exploring the SQLite codebase - cpprepository
http://cppdepend.com/blog/?p=214
======
sidlls
It is a nice/ok code base. I find Redis' a bit better.

One thing that stood out in the article was calling out the percentage of
functions with fewer than X lines. I don't think that is a good goal to work
toward. Having functions that do one thing and do it well is, and function
length can be a very rough proxy measure for that. However "doing one thing"
almost always means "perform more than one step in an algorithm." I've seen
too many code bases where "keep functions short" is used as an excuse to break
locality by piecing functions out into smaller functions that are only ever
called in one place (the function they split up). It can make code harder to
read and maintain and be less efficient besides.

~~~
chubot
Eh, IMO it depends on how many locals there are more than the number of lines.
Reducing the state space definitely improves readability.

If you have a function with 15 locals, it could be hard to keep straight in
your head, even if that function were 50 lines or so. If you can split it up
into 3 functions with 5 locals each (and a small number of arguments), that is
usually more readable.

As long as the functions are named well and communicate (only) with params and
return values, it doesn't hurt readability to split it up, even if it's only
used in one place.

On the other hand, I suppose you can have a 500 line function and if has few
locals, then you probably don't need to split it up. The canonical example
would be a function that only prints stuff to the screen, maybe based on a
couple parameters. But I don't really run into those situations very much --
more often I run into the big functions that can be split up.

~~~
saurik
I don't know. My absolutely favorite codebase ever was dpkg. It isn't as good
now as it was before, but essentially they arranged so that anything "unusual"
(error handling and the unwind process) happened in a side channel via setjmp
and then the core of the program was in a file called processarc.c which
contained one giant function that you could use to see, in order, all of the
middly steps that dpkg performed. Doing it like that meant that the code read
like a specification.

I am _highly_ skeptical of codebases with lots of functions that are called
from only one place; I would even put that forward, along with "code
duplication (not just in the small but also structurally: where you should be
building higher-order primitives, including your own control structures or
scaffold) as its foil (as it usually isn't OK to not have a function if it
would be called twice), as core metrics for "bad code". If you have too many
locals... use a block! That's what they are for :/.

~~~
chubot
I'm definitely a fan of global coherence, but it only goes so far. I briefly
worked on Rob Landley's toybox project, and he takes it to an extreme. There
are good things about the project, but it will eventually fall down because
adding code is an an O(n^2) algorithm.

That is, every new piece of code has to be compared with all the existing
code, in order to maintain global coherence. So there is a hard limit on the
size of a program you can write this way.

Also, new contributors are forced to understand the whole thing.

This is why "local reasoning" is a desirable property in software architecture
-- so the program can grow smoothly. Most programs grow badly, but you want
the cost to be closer to O(n) rather than O(n^2).

So these types of program eventually either rot or die -- that's why dpkg
"isn't as good as it was before".

As always the answer is somewhere in the middle. Neither extreme is good. You
want some modularity but not too much. I guess people are disagreeing on how
much there should be.

I concede that modularity has a much greater cost in C than in other
languages, e.g. because char* in a function parameter list tells me nothing
about ownership. See the sibling comment where I was puzzling through Python
interpreter source code like Py_SetPath(char *).

So in C it might make a lot more sense to have huge functions because there is
a cost of tracking ownership across function calls. In languages with garbage
collection that's much less of a problem.

------
ivanbakel
The first example made my skin crawl. No spacing between related blocks of
code, no indentation of conditional statements to help pick them out against
the background, and since when is a lack of comments in the body of a method a
good thing?

~~~
w4rh4wk5
I fully agree with you. Here is how I'd write the first example. Note that I
don't know anything about SQLite and that I may have introduced typos. The
comments 'resize ...' are just place holders.

[https://gist.github.com/W4RH4WK/57a8701190b682bcac5b8daadb43...](https://gist.github.com/W4RH4WK/57a8701190b682bcac5b8daadb43cded)

~~~
kyberias
I like your formatting of the code, but...

You introduce comments that are basically just noise. The name of the method
already states what it does. The rest is just very simple details, actually
memcpy's. Most of the comments are totally unnecessary.

Code comments are not pure good, they are not Schindler's List. They are
necessary evil. Every time you need to add a code comment you have failed to
write clean code that it easy to understand.

Every comment that you add has a cost. Someone needs to read it _in_addition_
to the code that it tries to describe. The code itself either is correct or
isn't. The comment cannot fix that. But it can slow down the reader to
understand what the code does. Whether there is a comment or not, the reader
still needs to read the code.

~~~
w4rh4wk5
Of course you can omit the comments in the body of the function. Yet I do not
think they are "just noise". In my point of view they help with two things
when going over the code: First they reduce the cognitive load, second they
help maintaining a red thread throughout the code.

Let me explain that in more detail: while `malloc((sizeof(char _) +
sizeof(i16) + 1)_ N)` tells me how big the buffer is, it does not tell how it
is organized / how it will be used. Of course you could introduce a struct for
this but that would get out of hand quickly. Having the comment one line prior
illustrating the layout helps one to understand what comes next. Hence
offsetting `zExtra` after each `memcpy` part becomes obvious.

`resize azColl` (and similar) serves as a heading for the following block,
allowing one to easily find a specific step in a list of multiple steps.
Scaning the whole function with your eyes for such headings is much faster
than parsing for `pIdx->azColl` inside a call to `memcpy`.

In the end, whether to put these kinds of comments into the code-base or not
should be part of the coding convention chosen for the project. People have
different opinions and after all it is a trade-off. The way I did it in this
example is the way I'd do it in a big (community driven) code base like
SQLite. If the convention would state otherwise, I wouldn't add them.

~~~
kyberias
Have you considered that you can structure your code with something else than
"header comments"? For example, if you find that you need to split a function
into multiple steps and you end up adding "header comments", you could also
split that code into subroutines and name those functions accordingly?

Also, I usually find that just adding one or two line feeds between code
blocks helps to add enough structure.

~~~
w4rh4wk5
I often consider that in languages like Python or Haskell. There you can
easily do this in very few lines of code and with limited scope (inner
function, Haskell's where). In C this is often too much overhead in my
opinion.

------
Philipp__
But reading that amalgamation sqlite3.c file wasn't the best C reading-for-
learning experience I had.

I found Redis and OpenBSD much more organized and documented, it just made
more sense since the structure of those projects is really nicely organized.
Of course there are reason why sqlite3 is one C file, I am well aware of that,
but I just wanted to point out 2 more awesome sources for reading and learning
C.

~~~
agumonkey
One of my favorite C reading experience is maru
[http://piumarta.com/software/maru/](http://piumarta.com/software/maru/)

(self bootstrapping lisp)

------
minipci1321
When the 'return' statement is put on the same line as the parent 'if'
statement, setting a breakpoint on returning requires dealing with the
assembly code and hex values (identifying the instruction that implements the
return (could be a ret or a jump), then using its address to set the
breakpoint.

When the return statement is kept on its own line, this is much easier.

~~~
pjmlp
Or using advanced GUI debuggers that allow middle statement breakpoints.

~~~
to3m
Which debuggers for C or C++ do this, and which compilers do they work with?

~~~
pjmlp
Visual C++, if I am not mistaken it was introduced on VS 2015.

~~~
to3m
C/C++ breakpoints still appear to be line-based in VS2017. (C# breakpoints
have been statement-based for a long time...)

~~~
pjmlp
I see, my failure then.

When on VS, I spend most of my time on the .NET side, so I thought it was on
the C++ side as well.

------
SonOfLilit
I feel like TFA makes a very bad case for it (they use structs? only 8
functions have many parameters? your commercial tool can query this kind of
thing with SQL?), but regardless, SQLite is a very good codebase to delve into
as a seasoned programmer wanting to improve even more.

What worked for me was diving in with a specific question (I think it was "I
wonder how indexes are represented on disk?") until I found the answer,
picking up a lot on the way.

Not an easy codebase to read, but full of good architecture, good ideas and
best practices.

------
vmp
> The function body is well indented

To be fair, there is only one level of indentation. Kind of difficult to screw
that up.

------
omginternets
Is it me or is the scrolling behavior on this site awful?

~~~
photonios
Yeap. I wanted to swipe back on the trackpad of my backbook and it just
scrolled down. Not ok.

------
CalChris
This strikes me as programming like it's 1999 (1998 actually). It's not the
worst by far. But then I wouldn't call it an elevation either.

1\. Elegant use of white space makes for an easier read.

2\. Intermingled declarations and code showed up in C99.

3\. C++ style comments rather than block comments. There should be a
commentary (and not a recital).

4\. camelCase for local variables? Really?

5\. Single line if statements make for a hard read.

------
jankotek
For Java I would recommend to study H2 SQL database. It is small self-
contained project, with well written and consistent code.

------
kerneldeveloper
I think the Linux kernel source code is also very clean and it's worth
reading. I prefer its coding style.

------
wav-part
While we are talking about C skills, recently I got hit by this gotcha. I
never knew a simple comparision could be this complex.

Question: Whats ru, rs ?

    
    
        unsigned int u0 = 0;
        unsigned int u1 = 1;
        int s0 = 0;
        int s1 = 1;
        int ru = 0 > u0 - u1;
        int rs = 0 > s0 - s1;
    

Answer: ru is 0. rs is 1.

~~~
MaulingMonkey
For this specific case, warnings can save you:

[https://wandbox.org/permlink/wwbCYysH039z3YqN](https://wandbox.org/permlink/wwbCYysH039z3YqN)

    
    
      prog.cc: In function 'int main()':
      prog.cc:8:16: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits]
           int ru = 0 > u0 - u1;
                    ~~^~~~~~~~~

~~~
wav-part
I had -Wall set but I forgot that its misleading. -Wextra is needed for
-Wtype-limits and -Wsign-compare.

------
jstimpfle
Const-correctness is really difficult to get right. One person's const is
another person's mutable.

An example from the top of my head, a function which finds an integer in an
array "should" get that array as a "const int *". But if the caller wants to
modify the found element, it needs to do ugly casting (really bad practice).

Similar problems exist in most other programming languages. Most have only
"shallow" const. You can mark an Java array member as final, but you can't
disallow mutations to that array.

Const-correctness is similar to types - it breaks very quickly, and when it
breaks it gets in the way.

That means I resort to indices when possible. When pointers are needed I
typically avoid const.

~~~
pif
> a function which finds an integer in an array "should" get that array as a
> "const int _". But if the caller wants to modify the found element, it needs
> to do ugly casting (really bad practice).

My friend, your example is wrong! The caller can happily work with a mutable
array; the function accepting a "const" pointer is just a guarantee that _the
function itself* won't modify the content.

    
    
      size_t look4 (const int *ptr, int val);
      void caller (int *ptr, int old_val, int new_val)
      {
          size_t index = look4(ptr, old_val);
          ptr[index] = new_val;
      }

~~~
jjnoakes
Sure, if you return an index. I bet the parent poster was discussing returning
a pointer, though.

Imagine look4 returned a pointer to the found element. Making look4's
signature const-correct in C is not possible.

(Think of strchr).

