
Rob Pike: Notes on Programming in C - ssp
http://www.lysator.liu.se/c/pikestyle.html
======
drblast
Can anyone who's worked on a well-written, large C project comment on the best
way to structure headers and include files?

I don't like including stdio.h and stdint.h in every .c and .h file, but the
"standard" way to build with Makefiles seems to be to compile each .c file
(with the headers it includes) into its own object file and then link them all
together at the end. So every file needs its own includes, or the compiler
barfs.

Every time I try to structure a C project I feel like I'm doing it wrong and
that there must be an elegant solution that I'm missing.

~~~
anonymousDan
I'm not a C/C++ expert but I can highly recommend the book "Large-Scale C++
Software Design" by John Lakos. It gives an extensive discussion on how to
structure programs and addresses the issue above in detail, in addition to
many others. I'd be very interested to hear the opinions of any HN C++ experts
on the principles espoused in this book (assuming a few have read it!).

~~~
jbn
It's a great book with valuable insights; I have successfully used them
multiple times to decrease build times of multi-million LoC projects.

------
rcfox
I disagree with the include files section.

As long as each file has a proper include guard, it won't matter how many
times you've included a file. I'd much rather have the preprocessor figure out
when and where to include a file than do it myself. Maybe this was a
significant performance issue in 1989, but it shouldn't be anymore.

That said, you should only include files that are direct dependencies. There's
no need to include stdio.h in every other file.

~~~
palish
That's a naive viewpoint. It's more of a concern in C++ ... if your codebase
grows to massive proportions, and you #include headers within headers, then
modifying a header can result in ~60% of your codebase being rebuilt. (At
work, I deal with this pretty much every day.)

Including header files within headers is simply unnecessary, especially in C.
You need to forward declare in header files, then include in source files
where you _actually use_ that header file.

~~~
gte910h
C is not C++.

In C you shouldn't be chaining header files, because you don't make function
calls in header files in C code (well, most good C code at least; people abuse
macros sometimes).

In C you should only include header files in header files if you need types
from the other header files. It's _okay_ to forward declare if you must, but
it's often quite a bit clearer not to.

Additionally, you should always include every .h file that declares the
external functions you call (relying on include chaining only for header
files, at most). So if DogModule.c calls Sounds.c functions and exports a
public function called "PlayBark(struct soundNode);", then both DogModule.h
and the things that use DogModule should include Sounds.h.

Good C habits and structure are _very different_ than good C++ habits and
structure. They're not the same language, not even close. Mangling your
concepts of Good C and Good C++ is a great way to write both poorly.

~~~
palish
Insightful, thank you.

------
rauljara
I think it's helpful to have criteria like this out there, so that we can
discuss them. That said, what's with the bias against capital letters?
maxphysaddr is more readable than MaxPhysAddr? godisnowhere is more readable
than GodIsNowHere? Or is that GodIsNowhere? I'm personally a fan of coding
styles that cut down on ambiguity.

~~~
jmulho
I prefer god_is_nowhere. The easiest thing to read is what you are used to. To
fairly compare godisnowhere, GodIsNowhere, and god_is_nowhere, you need to
stick with each for several days. I bet you'll discover, as I have, that
god_is_nowhere is the easiest to read.

~~~
TheNewAndy
It makes the distinction between god_is_no_where and god_is_nowhere much
clearer than GodIsNowhere and GodIsNowHere. A slightly more subjective one is
for things with acronyms in them. Compare: check_url_index, CheckURLIndex and
CheckUrlIndex.

------
pgbovine
"The following data structures are a complete list for almost all practical
programs: array, linked list, hash table, binary tree."

This is why it's such a pleasure to program in modern high-level languages
(e.g., Python, Ruby): these basic data structures (essentially sequences and
mappings) are baked into the language with simple, concise syntax.

~~~
chc
Except for one. Quick: What's the syntax for a binary tree in Python?

OK, I'll give away the answer: It's not there. I don't think binary trees are
even in the standard library for Python or Ruby. It's a sadly neglected data
structure because, when exposed for what it is, it's a little bit too computer
science-y for most people's taste.

~~~
scott_s
Balanced binary tree versus hash table is an _implementation choice_ for the
associative array abstract data type. Similarly, growable array versus linked
list are implementation choices for the list abstract data type.

I don't buy that balanced binary trees are "too computer sciencey for most
people's taste" as an argument for not having them as implementations of
native types in Python. Designing a good hash table is just as subtle, in both
theory and implementation. But those details are not exposed through the
_interface_.

~~~
shadowmatter
"Balanced binary tree versus hash table is an implementation choice for the
associative array abstract data type."

You can treat it as an implementation detail, but it's a good idea to let the
client know that the underlying key-value collection is actually sorted by
key. This allows the client to iterate over the key-value pairs in sorted
order without dumping the keys to an array and then sorting it, retrieving the
k smallest keys through forward iteration, retrieving the k largest keys
through reverse iteration, etc. I think Java solves this nicely by creating
the SortedMap subinterface of Map; if you have a SortedMap, you're assured
these properties hold. (The typical implementation of SortedMap is a balanced
tree, while the typical implementation of Map is a hash table.)

------
yason
I agree with just about everything, but the best part is the function pointer
chapter.

Function pointers are an insanely novel concept. They're not even a C thing
per se, but since C has first-class pointers they fit right in.

Nevertheless, most design patterns, closures, higher-order functions, more
complex dispatching, etc. are effectively just a function pointer and possibly
some userdata. You can think in Lisp but do it all in C with function
pointers.

~~~
viraptor
> _They're not even a C thing per se_

Really? From stdlib.h:

    void
    qsort(void *base, size_t nel, size_t width,
        int (*compar)(const void *, const void *));

~~~
yason
Not even a C thing _per se_, as in "C has function pointers, but they're not
really a C-specific feature, since some other languages have them too. Like
assembly, very obviously."

Many other languages don't. If they're sufficiently high-level they might have
function objects or closures, which often cover everything you want. If
they're Java they have neither but you can write a Visitor-style class or some
other scheme from the design patterns book that emulates the behaviour of a
function pointer.

But fundamentally (and when the program is compiled down to machine code also
hopefully) they're just function pointers.

~~~
eru
I agree. Though what it gets compiled down to depends on the compilation
strategy. If your compiler is smart enough, and your use of functions as
variables / function pointers is weak [1] enough, your functions may, say,
just get inlined into a big case-statement-like construct.

[1] In the sense of <http://en.wikipedia.org/wiki/Strength_reduction>

------
contol-m
There is a lot more in his book, "The Practice of Programming" -
<http://cm.bell-labs.com/cm/cs/tpop/toc.html>

------
wtracy
"Thus I say maxphysaddr (not MaximumPhysicalAddress) for a global variable,
but np not NodePointer for a pointer locally defined and used."

Yes, because "maxphysaddr" is extremely readable for people who are not native
English speakers.

Wait, was that "maxiphysadd" or "maxmphyaddrss"? Better stop coding (breaking
up my train of thought) and scroll back up to the variable declaration to
double-check how I spelled it.

(Yes, I know that modern IDEs can remember variable names for you, but there's
_always_ those emergency situations where you wind up trying to debug
something with VIM over SSH via a slow-as-molasses hotel wifi service.)

~~~
tedunangst
If you can't remember maxphysaddr vs maxiphysadd, I doubt you'll do much
better remembering MaxPhysAddr vs MaxiPhysAdd.

~~~
chc
MaxPhysicalAddress is more readable than all the above. Pike simply had it
wrong, IMO, probably from having been immersed in a culture of poor
readability for so long that he had become blind to some of it.

~~~
caf
I agree with Pike on this one - and I think it's likely true what he says
about being used to reading prose. For those of us who do a _lot_ of non-
programming reading, Funny Out-Of-Place Capitals interrupt the smooth flow
when reading variable names.

I do_tend_to_use underscores, though; I find that they don't have the same
semiotic noise value as WeirdCapitals.

~~~
chc
I find the opposite — underscores are hard to read, but innercaps are almost
effortless. At any rate, they're equivalent — you can mechanically transform
from one to the other without data loss. I would say that either is preferable
to not delimiting your words at all. It sounds to me like you agree with me
more than Pike, modulo an insignificant matter of taste.

------
basman
The rule against nested includes seems dated — he doesn't like include guards
in header files because "[t]he result is often thousands of needless lines of
code passing through the lexical analyzer, which is (in good compilers) the
most expensive phase." Surely not any more?

~~~
cdavid
Indeed. I have read several times that parsing is the slowest part of
compilation, but I guess this is very dated. For example, compiling numpy (a
Python package with ~100 kLOC of C code):

    - CFLAGS="-O0" -> ~20 sec
    - CFLAGS="-O1" -> ~35 sec
    - CFLAGS="-O3" -> ~60 sec

Since the amount of parsed code is approximately the same for all three modes
(for numpy it is exactly the same, but one could imagine differences in glibc
headers, etc.), this shows that parsing is no longer the bottleneck for
non-debug builds.

~~~
mfukar
No, that shows that the optimizer work takes a non-trivial amount of time.
Optimizations (how often?) happen after parsing, type-checking and
constructing the intermediate representation.

~~~
cdavid
Sure, that's exactly the point I am trying to make: if compiling at -O1 takes
75% more time than at -O0 (35 sec vs. 20 sec), then parsing takes at most ~60%
of compilation time (20/35 ≈ 57%), even with fairly conservative
optimization. I think it is quite common to compile above the lowest
optimization level.

For an accurate estimate, one could use clang and run only the tokenizer.

------
kqueue
I disagree with him on the include part. It is very difficult to remember the
order of includes + dependencies for each file you want to include. It becomes
very messy.

~~~
__david__
I agree. I have better things to do than to memorize the random prerequisites
of header files. Not to mention cluttering my C source with junk that isn't
important to the source in that particular file. My rule is that I should be
able to make an empty C file and put a single #include "header.h" line. If it
doesn't compile then there is a bug in that .h file.

~~~
rbetts
One also sees the common advice to include the paired header (i.e., foo.c
should include foo.h) as the first #include, to verify that foo.h is
dependency-complete.

The problem with dependency-complete headers is that once a .c file relies on
the header to pull in a symbol indirectly, changing the header so that it no
longer needs (and no longer includes) that symbol means all those .c files
have to have their #includes fixed. This can take a _lot_ of time to iterate
through in projects that take hours to compile.

------
gte910h
Notice the date, folks: 1989.

Lots of development has gone into C for the better since then.

