
Armin Ronacher: Collections in C - jgalvez
http://lucumr.pocoo.org/2010/11/24/collections-in-c/
======
tptacek
Armin seems to have reinvented the BSD queue macros.

Having once had the pleasure of inheriting a codebase animated by CPP-macro
collections, allow me to be a voice in favor of running, not walking, from
programmers who embrace them. They're tricky, they blow up, they're extremely
noisy in the code, and (worst of all) they don't encapsulate, offering new
devs a myriad of ways to write tangly hard-to-understand subtly broken code
that works directly with the collection structure.

Type safety is entirely overrated. C requires so much deliberation to deploy
even the simplest collection that the likelihood of you picking up a _foo_
where a _bar_ was what was provided is minimal, and easily diagnosed. Just use
voidstar.

One of the first things I did at that job was to port STLport's Red-Black tree
(from <map>) to C code, specialized on voidstar. It worked beautifully. If
there are times when you don't want to use a void* collection (or a gossamer-
thin wrapper around one), those are also times when you don't want a generic
collection library to begin with.

~~~
strlen
There's also a much cleaner way to have interfaces and implementations in C,
without relying on macro hackery.

Why, there's even a whole book on that: [http://www.amazon.com/Interfaces-
Implementations-Techniques-...](http://www.amazon.com/Interfaces-
Implementations-Techniques-Creating-Reusable/dp/0201498413/)

Like Norvig's _Paradigms of Artificial Intelligence: Case Studies in Common
Lisp_ and Joshua Bloch's _Effective Java_ , it's one of those books which
despite having a specific programming language in the title, is really about
programming in general.

~~~
tptacek
This is my favorite C book ever, I recommend it wholeheartedly, and it
recommends voidstars. =)

~~~
nkurz
Respectful of you but not of the book, I have to ask: what do you like about
it? My initial reactions were very negative
(<http://news.ycombinator.com/item?id=1705332>). I love K&R, but recoiled in
horror at every page I managed to read of this book.

I've written entire macro-based systems similar to those recommended by the
author here, and find them very useful. Void pointers are great where they can
be used, but I love the efficiency and clarity of the generated code approach.

~~~
tptacek
I found your objections to this book pedantic.

For instance, in a chapter on string atoms (symbols) --- a concept which
virtually no C program in the world takes advantage of, despite the centrality
of the concept to Lisp, Python, and Ruby --- and this is _the first chapter in
the book_ \--- you were shocked by his use of a 43 byte string to hold a
number string... because someone might be using that code on a 192 bit
machine?

You missed the forest for the trees. If you don't want your code tainted by
the number 43, don't write that code. The point of the book is how you
structure your code, divide it into subsystems, and present coherent
interfaces to the rest of your program.

~~~
nkurz
I'm not bothered by the idea of a fixed length for strings, rather the
explanation that it is best to use the number 43 instead of a named constant
(MAX_STRING_LEN) to avoid namespace pollution. If the first paragraph I quoted
had been standalone, or better as a short comment, I would have been in full
approval.

Anyway, I appreciate your response. It's good try to appreciate what others
see that I do not. Yes, if you ignore the details of the code and the
explanations, there are some good parts. If you look _only_ at the big
picture, it's probably a fine book. And I love his clearly prefixed naming
conventions. But I think you'd do a lot better reading something like the
SQLite source code rather than this book if you want to see examples of good
C.

I guess I have to ask: do you feel that chapter 4 on using setjmp() and
longjmp() plus some brittle macros for error handling is also good for
learners? I thought it was technically very clear but about 40 years out of
date as to good practice. Is this a forest or a tree?

------
btmorex
Why not just pick and choose what you need from C++? For all the hate that C++
gets, one nice thing is it doesn't force anything down your throat. Between
simple not using features, and compile time options (disabling exceptions for
example), you NEVER have to pay for a feature that you don't want or need.

He could have literally written C-style code except used C++ just to create a
few generic data stores instead of using all of this preprocessor magic which
I guarantee is more fragile/harder to debug than basic C++ templates.

~~~
jedbrown
FWIW, C++ does force some things on you, such as automatically typedefing
structs instead of giving you a separate namespace, not warning about multiple
declarations with different arguments (because function overloading is a
feature), not warning about multiple definitions in different compilation
units (linker just picks one, assumes they are equivalent), having to cast
assignment of void* to a typed pointer (ugly and redundant), slower
compilation, and generally later errors/warnings (thus usually more cryptic
and less useful, although it does stricter type checking in some ways, so this
point is not completely clear-cut).

~~~
shin_lao
Hi, just want to bounce on the _"cast your void_ to a pointer"*, actually you
don't need to do that if you don't use an object oriented paradigm à la Java,
but go generic full speed and use return by value. No pointers, little or no
casting.

~~~
jedbrown
> go generic full speed

What does this mean?

As a simple example, suppose I use malloc to allocate memory (because I don't
want to allocate with new which would oblige me to handle exceptions and
prevent me from handing ownership over to a C client or using an allocator
provided by a C client (without extra work)). I have to cast the return value
as in

    
    
      struct Foo *x = (struct Foo*)malloc(n*sizeof(*x));
    

instead of

    
    
      struct Foo *x = malloc(n*sizeof(*x));
    

This looks purely cosmetic, but in C, I can write the macro

    
    
      #define MallocA(n,p) (!(*(p) = malloc((n)*sizeof(**(p))))
    

which then works with a standard error checking convention

    
    
      struct Foo *arr;
      err = MallocA(n,&arr);CHK(err);
    

This doesn't work in C++ without non-portable typeof or evil and less safe
(not conforming for function pointers)

    
    
      *(void**)(p) = malloc((n)*sizeof(**(p)))
    

Similarly, if a client registers a callback with a context, I store their
context in a void* and pass it back to them

    
    
      int UserCallback(void *ctx,...) {
        struct User *user = (struct User*)ctx;
    

instead of

    
    
      int UserCallback(void *ctx,...) {
        struct User *user = ctx;
    

I understand that this is just cosmetic. I don't see how "going generic full
speed" helps with this. Also note that aggregate returns are slower for all
but trivially small structures, and downright bad for big structures.

------
zedshaw
If you really want to learn how to do this, and do it really really well,
check out sglib:

<http://sglib.sourceforge.net/>

It's got nearly every data structure you can imagine, all implemented as CPP
meta hackery. Brilliant.

~~~
chancho
I used this library a while back, and it was nice at first, until I noticed
that his quicksort always picked the first element as the pivot, which makes
it O(n^2) on already-sorted input. Then I started second guessing everything,
and when you do that with a library you may as well write it yourself without
the CPP hackery. At least then the compiler errors will make sense and you can
debug it.

I went to go check out the code to see if this issue had been fixed, but the
download link has gone bad.

I mentioned this library on HN some years back, when I was still excited about
it, and the reply I got was something like "No! Not that macro boneyard!" He
was right.

------
roel_v
I don't agree with one of his premises:

"I normally like to avoid tools that generate new C files for the very simple
reason that these usually generate ugly looking code I then have to look at
which is annoying or at least require yet another tool in my toolchain which
causes headaches when compiling code on more than on operating system. A
simple Python script that generates a C file sounds simple, but it stops being
simple if you also want that thing to be part of your windows installation
where Python is usually not available or works differently."

I have moved most of my macro code generation to using another language, with
a proper template engine, to generate code (in my case C++ but the same holds
for C). If he doesn't trust Python across platforms, he can take a fixed
version with known properties and code around it, or take another language
(which presumably will have the same issues). I use PHP, I'm a bit careful in
cross-platform features and it works great.

Apart from this, he can still write his code generator in C, so that on a new
platforms it can be bootstrapped with a regular C compiler, then process his
templates, then compile his actual code. It's painful to do string processing
in C, but this generator only needs to be written once anyway, and it's not
much work. Plus if he uses a small template engine, that'll take most of the
pain away (most of the work will be in modifying the templates, not the code
generation engine).

------
mrb
A good C library implementing common data structures (lists, trees, hash
tables, etc) is GLib:

<http://library.gnome.org/devel/glib/>

It does many other things, provides abstractions for threads, files, etc. It
is a general-purpose utility library originally written for GTK+. I never
really understood why GLib is not more often used.

------
sukuriant
"This worked really well up to the point where I needed two kinds of lists.
One that accepted floats and another one that works with arbitrary pointers."

Had he never heard of unions?

Even without knowledge of unions, you could treat a pointer as a, depending on
the architecture, sequence of 32 bits. That sequence can be cast to whatever
you want. So long as you have a clear way of describing what you've stored in
that 32 bit sequence, you should never get confused.

------
kvs
Another way is probably to use a Linux Kernel-like list:
<http://isis.poly.edu/kulesh/stuff/src/klist/>

~~~
pmjordan
The kernel list approach also has the advantage that there's only one
implementation of the list operation functions, whereas the one outlined in
the article generates a new set of functions for each type.

------
heresy
There is something to be said for not taking DRY to extremes. This appears to
be one of those extremes.

------
rbranson
Oh man, with that kind of macro "metaprogramming" (if it can be called that),
you're bringing on a world of error message pain. It's almost not worth it.

~~~
burgerbrain
Painful as it may look, it's used in the while _a lot_.

Most of the time it actually works well.

------
jerf
I find myself reminded of the famous Dr. Sagan quote for some reason: “If you
want to make an apple pie from scratch, you must first create the universe.”
(From "cloning Minecraft" to "implementing a list" in only five sentences!)

------
st3fan
This is both awesome and very terrible at the same time.

This proves that you can do magic in C but it also proves that C is really a
low level assembler :-)

For me, stuff like this has been one of the prime reasons to prefer C++ over
C. You can totally get the same performance and compiled code when using
templates. But you gain compile-time checking and much more robust code.

(These preprocessor tricks are also popular in the BSD kernel)

------
endgame
`#define _CAT(A,B) A##B` is going to invoke undefined behaviour, I think:

""" None of these macro names, nor the identifier `defined`, shall be the
subject of a #define or a #undef preprocessing directive. Any other predefined
macro names shall begin with a leading underscore followed by an uppercase
letter or a second underscore. """ (s6.10.8)

~~~
zb
Not really, all that says is that it risks the name colliding with a built-in
macro defined by the compiler. At worst that seems like implementation-defined
behaviour.

~~~
endgame
I was too hasty and confused implementation-defined with undefined.

------
angrycoder
I give thanks every day that there are people like the author and others who
can write and understand code like that so that I don't have to. I'll be
damned if that doesn't look like trying to build a boat using a chainsaw and
popsicle sticks with instructions written in Farsi.

------
lukesandberg
this kind of stuff has always scared me. Its seems like it would be really
easy to get confused just trying to use this stuff. Generally if i need a list
i either use the void* interface and cast as neccesary, or if i want to do
something like store a list of floats, or anything where i want to store the
value of the item in the list (not just a pointer) then i use a 'fat'
datastructure like this

[https://github.com/lukesandberg/Regex/blob/master/src/util/f...](https://github.com/lukesandberg/Regex/blob/master/src/util/fat_stack.h)

when you construct it you just tell it how big each item is then it copies the
value into the stack element for storage.

Sometimes it a few extra copy/cast statements but at least its very clear what
you are doing.

------
srparish
<http://uthash.sourceforge.net/utarray.html>

<http://www.openbsd.org/cgi-bin/man.cgi?query=queue>

------
jchonphoenix
For those who are curious, this solution has a name:

Variable Queue

