
Notes on Programming in C – Rob Pike (1989) - mabynogy
http://www.lysator.liu.se/c/pikestyle.html
======
gens
>Rule 5. Data dominates. If you've chosen the right data structures and
organized things well, the algorithms will almost always be self-evident. Data
structures, not algorithms, are central to programming. (See Brooks p. 102.)

I came to the same conclusion, after a while. In the end, code is there to
process data. (There is metaprogramming, and things like state machines, but
most programs hold data in some kind of structure to be processed.)

>Simple rule: include files should never include include files.

Yes, please!

~~~
planteen
>>Simple rule: include files should never include include files.

> Yes, please!

Let's say I have a C++ class defined in a header that uses std::string. So I
require that anyone who uses my header already has <string> included in
version 1.0.0. Then, in version 1.1.0, I added to the class in the header so
that now it requires std::map. If the caller doesn't have <map> included,
there will now be a compilation error! How is that better than simply
including string and map in the class header itself?

~~~
gens
Because then people know what they are using. And that is more valuable than
fixing simple compiler errors.

I've seen programs that have an "uberinclude", usually named "common.h" or
something like that, which includes not only standard libraries but also other
headers. It is horrible. There is no way to understand what code written with
that mindset is doing, other than slaving over a notebook trying to make
sense of it. Granted, all-encompassing/nested headers don't necessarily mean
unreadable code. But in my experience with my own code, clear and separate
headers (including standard ones) make me write clear and separate .c files.

~~~
planteen
Yeah, I agree splitting functionality into separate headers is a good idea. But
I don't feel like the rule "don't include headers in headers" is good advice in
lots of situations.

Redis is often thrown around as an example of an excellent, clean C source
code base. Look at server.h. It has 34 include statements. Would you rather
that every C file that includes server.h have to include these 34 files before
including server.h? Seems insane to me.

[https://github.com/antirez/redis/blob/unstable/src/server.h](https://github.com/antirez/redis/blob/unstable/src/server.h)

~~~
gens
>Redis is often thrown around as an example of an excellent, clean C source
code base. Look at server.h. It has 34 include statements. Would you rather
that every C file that includes server.h have to include these 34 files before
including server.h? Seems insane to me.

Good point. This is exactly why _I would_ want that, yes.

~~~
planteen
Do you know of a large C (or C++) open source project that follows the rule of
no headers included in headers? I'd like to take a peek at its source.

~~~
gens
Having looked around, no. LibPNG seems to keep it at a minimum, though.

Guess I'm the only one doing that. (And I have no problems.)

------
unscaled
Most of the advice holds true, but there's one bit you should happily ignore:
don't uglify your code with external include guards; that is, put your include
guards in the included file itself.

Modern compilers perform include guard optimization:
[http://www.bobarcher.org/software/include/results.html](http://www.bobarcher.org/software/include/results.html)

There is one exception to this rule which I think still holds true: MSVC
doesn't perform this optimization correctly, perhaps due to its love affair
with precompiled headers.

I personally prefer solving that issue with #pragma once, as I find the risk of
an include-guard name clash higher than the chance of the same file being
reachable through two different hardlinks or copies, let alone compiling
anything on a network share. It's also a lot more readable.

------
combatentropy
Lots of good advice from a long-time programmer
([https://en.wikipedia.org/wiki/Rob_Pike](https://en.wikipedia.org/wiki/Rob_Pike)).
I have bookmarked these notes before, because his simple but effective tips
resonate with me. From the article:

> Algorithms, or details of algorithms, can often be encoded compactly,
> efficiently and expressively as data rather than, say, as lots of if
> statements.

I have read elsewhere about moving your code complexity to your data. But I
can't find that other article. It's hard to find mention at all of this
strategy. But I have found it to be true. Moving details from PHP into the
database results in shorter code overall. The first example that comes to mind
is replacing a bunch of if-statements with one or more columns in the
database, like for some kind of categorization.

~~~
ycmbntrthrwaway
> I have read elsewhere about moving your code complexity to your data. But I
> can't find that other article.

[http://www.catb.org/esr/writings/taoup/html/ch01s06.html#id2...](http://www.catb.org/esr/writings/taoup/html/ch01s06.html#id2878263)

ESR actually references Pike's "Notes on C Programming".

------
glangdale
These are instructive to look at now. Important to recall the '1989' date of
course, but with hindsight...

I love Rule 5 ("choose the right data structures and the algorithms will
almost always be self-evident"), especially when combined with STL and strong
typing. There is a degree of irony in taking this advice from Pike, given the
design of Go.

Much of the material on complexity assumes you are hand-coding things from
scratch. I am happy to take on complexity (as long as I understand what I'm
getting into) from well-designed libraries rather than building something
"simple and robust" from scratch. The statement about binary trees vs splay
trees is illustrative; unless I need to see the bare data structure again I
would much rather take on something from the STL, complex or otherwise.

~~~
zjarek_s
STL is exactly those simple data structures referenced in this article. For a
long time it didn't even have something as basic as a hash table.

------
DSMan195276
Generally speaking all this stuff is good, but I'd add that his note on
include files is dated and I wouldn't recommend it anymore. Any compiler worth
half its salt at this point will recognize the include-guard pattern and not
parse through included files multiple times, so the worry about wasting tons
of time due to that pattern is largely gone.

The big problem with what he's suggesting is that if you're designing a fairly
big system with lots of reasonably small headers (which is generally good -
simple headers with easy-to-read APIs are good), you'll end up with a crazy
number of includes in every file - and if you change something to use a new
dependency, you'll have to change every location it is included in as well.
There is something to be said for avoiding things like circular dependencies,
but this requirement really doesn't make those any harder to create; it just
creates more problems and annoyances. It is not a very scalable solution.

If you look at the Linux kernel source (which is arguably one of the largest
and most successful C programs), each source file has around 10 to 30
includes at the top (or more in some cases), and that's _with_ the headers
including other headers. If instead Linux had taken the approach Rob is
recommending, that number would probably be an order of magnitude larger and
extremely hard to manage, even if they combined a bunch of their headers to
reduce the total number (which, again, I would consider a huge anti-pattern).

~~~
adrianratnapala
I think I agree that includes-within-includes are probably a necessity.

But I am not sure that it allows things to be broken into "simple headers with
easy-to-read APIs". That is true for headers that are small and only #include
system headers and other basic dependencies. But if your own code has a nest
of interrelated headers, then a single big header file is probably easier to
read.

~~~
DSMan195276
Well, I mean, perhaps I should have clarified: "simple headers with easy-to-
read APIs" has to be judged case by case. I would argue that as a general rule
it is what you should aim for if possible (and I find it is possible in a lot
of cases; if it's not, things may be getting a bit too interconnected), but
you're right that there are situations where things are too complicated to
split up easily, so a single header for multiple things makes sense. I think
that if you're constantly in that situation, though, you should reevaluate how
you're designing your components and headers.

------
pcwalton
This seems clearly dated. For example:

> For example, binary trees are always faster than splay trees for workaday
> problems.

I assume by this Pike means unbalanced binary trees (if not, then red-black
trees are decidedly _not_ simpler than splay trees, especially if you need to
delete). In that case, I don't really believe it. Nobody uses unbalanced
binary trees anymore for good reason: they have awful performance when you do
something as simple as inserting keys in sorted order.

------
adrianratnapala
I'd like to see Pike vs. the MISRA guidelines
([https://en.wikipedia.org/wiki/MISRA_C](https://en.wikipedia.org/wiki/MISRA_C)).
Rob's notes here are not about the kind of safety critical, hopefully small,
programs that MISRA claims to improve.

It would be interesting to hear his thoughts about just those kinds of
programs, and whether pointers and function pointers are still helpful.

~~~
kbob
It's useful to apply the Steve Yegge political axis metaphor[1] here. C forces
a fairly conservative approach because it stocks a full arsenal of footguns.
Pike and most of the early Unix guys, though, are about as liberal as they can
be within C's constraints. MISRA, on the other hand, is Idaho survivalist camp
conservative.

Both parties have good reason for their ideologies. Early Unix programs were
tiny enough that it was easy to keep an entire application's rules in your
head and take full advantage of them. So it makes sense to play fast and
loose. MISRA-compliant systems, OTOH, are developed by large teams where no
member understands the whole system, and the consequences of getting something
wrong are measured in attorney man-years.

[1]
[https://plus.google.com/110981030061712822816/posts/KaSKeg4v...](https://plus.google.com/110981030061712822816/posts/KaSKeg4vQtz)

------
filleokus
Hmm, I wonder why this is hosted by Lysator (a student association at
Linköping University in Sweden), does anyone know?

~~~
smarks
Lysator has been around since forever. The English information page [1] says
it was founded in 1973. I remember it from the early days of the Internet and
possibly even from Usenet days.

Among other things it has a large repository of historical documents and
papers on C programming and standardization dating back to the 1980s. See [2].

For what it's worth, I bookmarked the latter link in 2001.

[1] [http://www.lysator.liu.se/english/](http://www.lysator.liu.se/english/)

[2] [http://www.lysator.liu.se/c/](http://www.lysator.liu.se/c/)

------
dwringer
Nobody's ever gonna convince me that "maxphysaddr" is a good example of a
well-named variable.

