Writing a C library, part 1: best practices for writing C libraries

shaggyfrog · on June 28, 2011

In the section on memory management, I was surprised that there was no mention of letting the caller supply malloc/free functions (or some other memory allocator) for the library to use. This is especially handy in limited-memory environments prone to fragmentation, where you might want the library to call your custom allocator, which manages small memory pools.
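A minimal sketch of the hook shaggyfrog describes, assuming a hypothetical library called `mylib`. Every internal allocation goes through a pair of function pointers that default to `malloc`/`free`, and the application can install its own pair before making any other call. The replacement shown is a toy static pool (a bump allocator whose free is a no-op), standing in for the kind of small-pool allocator an embedded application might use. All names here are illustrative, not taken from any real library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hooks for a library called "mylib". */
typedef void *(*mylib_malloc_fn)(size_t size);
typedef void (*mylib_free_fn)(void *ptr);

static mylib_malloc_fn mylib_malloc = malloc;  /* defaults: the C runtime */
static mylib_free_fn mylib_free = free;

/* Call once, before any other mylib function. */
void mylib_set_allocator(mylib_malloc_fn m, mylib_free_fn f)
{
    assert(m != NULL && f != NULL);
    mylib_malloc = m;
    mylib_free = f;
}

/* Internal library code only ever allocates through the hooks. */
char *mylib_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = mylib_malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Application side: a toy fixed-size pool (bump allocator). Blocks are
   never returned individually; this is a deliberate simplification. */
static _Alignas(max_align_t) unsigned char pool[1024];
static size_t pool_used = 0;

static void *pool_malloc(size_t size)
{
    size_t align = _Alignof(max_align_t);
    size_t rounded = (size + align - 1) / align * align;  /* keep blocks aligned */
    if (rounded < size || rounded > sizeof pool - pool_used)
        return NULL;  /* size overflowed, or the pool is exhausted */
    void *p = pool + pool_used;
    pool_used += rounded;
    return p;
}

static void pool_free(void *ptr)
{
    (void)ptr;  /* no-op: memory is reclaimed only by resetting the pool */
}
```

Global hooks are the simplest design, but they are shared by every user of the library in a process. zlib, mentioned further down the thread, instead takes its hooks per stream through the `zalloc`, `zfree` and `opaque` fields of `z_stream`, which avoids that problem.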

tptacek · on June 28, 2011

What's a well-regarded C library of medium complexity (something that isn't going to be the entire kernel of an application) that does that? I can't think of an example off the top of my head.

sg2342 · on June 28, 2011

I would require this in every C library that provides a crypto framework or provides support functionality for such a framework. And to extend this: I would require it in every C library that touches cleartext (in a cryptographic application).

tptacek · on June 28, 2011

What popular encryption library does this? Also: what would you do specially in your allocator? It seems wrong to me that the users of a library should be expected to know (for instance) the precise incantation required to clear cache footprints on Core2 processors; that seems like an internal library concern.

sg2342 · on June 30, 2011

Off the top of my head: libgmp (not a crypto library, but used by some), libgcrypt, openssl (requires a dynamic engine to do this), libssh2, libmcrypt...

IMHO the user of an encryption library (the developer of a crypto application) should at least be able to ensure that every bit of heap that was touched by the library is zeroed after it was freed.
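One way an application could meet this requirement, assuming the library accepts `malloc`/`free`-style hooks: wrap the system allocator so that each block carries its size in a small header, and wipe the block before releasing it. The wipe writes through a `volatile` pointer so the compiler cannot discard it as a dead store right before `free`; platform functions such as `explicit_bzero` do the same job where they are available. The `zeroing_*`, `secure_wipe` and `block_header` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header stored in front of every block so the free function knows how
   many bytes to wipe. The union with max_align_t keeps the pointer handed
   to the library suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} block_header;

/* Overwrite memory through a volatile pointer so the compiler cannot
   drop the writes as dead stores just before free(). */
static void secure_wipe(void *p, size_t n)
{
    volatile unsigned char *vp = p;
    while (n--)
        *vp++ = 0;
}

/* malloc-compatible hook: allocates size bytes plus a hidden header. */
void *zeroing_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(block_header))
        return NULL;  /* would overflow the header arithmetic */
    block_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;  /* caller's memory starts right after the header */
}

/* free-compatible hook: wipes the caller's bytes, then releases them. */
void zeroing_free(void *ptr)
{
    if (ptr == NULL)
        return;
    block_header *h = (block_header *)ptr - 1;
    assert(h->size <= SIZE_MAX - sizeof *h);
    secure_wipe(ptr, h->size);  /* e.g. cleartext or key material */
    free(h);
}
```

This only covers heap memory that passes through the hooks; copies the library keeps on the stack or in static storage are not affected.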

endgame · on June 28, 2011

Lua: http://www.lua.org/manual/5.1/manual.html#3.7
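Lua's approach (see the manual link above) is a single allocation function with a `realloc`-like contract plus an opaque user-data pointer, supplied when a state is created. The sketch below imitates that shape without depending on Lua: the four-argument signature mirrors `lua_Alloc` as documented for 5.1, while the `ctx` and `budget` names are hypothetical. The example allocator uses its user-data pointer to enforce a per-instance memory budget.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same shape as lua_Alloc: free when nsize == 0, otherwise behave like
   realloc. osize is the block's previous size (0 for a new block). */
typedef void *(*ctx_alloc_fn)(void *ud, void *ptr, size_t osize, size_t nsize);

/* Hypothetical library context: each instance carries its own allocator. */
typedef struct {
    ctx_alloc_fn alloc;
    void *ud;
} ctx;

void ctx_init(ctx *c, ctx_alloc_fn alloc, void *ud)
{
    assert(c != NULL && alloc != NULL);
    c->alloc = alloc;
    c->ud = ud;
}

/* Library-side helpers: all allocation goes through the instance. */
void *ctx_malloc(ctx *c, size_t n)
{
    return c->alloc(c->ud, NULL, 0, n);
}

void ctx_free(ctx *c, void *p, size_t n)
{
    (void)c->alloc(c->ud, p, n, 0);
}

/* Example allocator: tracks bytes in use and enforces a limit via ud. */
typedef struct {
    size_t in_use;
    size_t limit;
} budget;

void *budget_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
    budget *b = ud;
    if (nsize == 0) {
        free(ptr);
        b->in_use -= osize;
        return NULL;
    }
    /* Refuse growth past the limit; shrinking is always allowed. */
    if (nsize > osize && nsize - osize > b->limit - b->in_use)
        return NULL;
    void *p = realloc(ptr, nsize);
    if (p != NULL)
        b->in_use = b->in_use - osize + nsize;
    return p;
}
```

Because the library passes the old size back in, the allocator can do its bookkeeping without storing headers of its own; the price is that the library must remember the size of everything it allocates.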

shaggyfrog · on June 28, 2011

I would say it's not a common pattern, if only because most libraries aren't written with embedded systems that have only a few megabytes (or even kilobytes) of available memory in mind.

I've at least seen it firsthand in C++ systems written for game consoles (PS2) from my time in that industry.

shaggyfrog · on June 30, 2011

Stupid me, I forgot about libxml2. I even asked their mailing list about it 3 years ago: http://mail.gnome.org/archives/xml/2008-June/msg00085.html

defen · on June 28, 2011

libpng lets you specify your own malloc/free functions.

tptacek · on June 28, 2011

That's a great example. Thanks.

jallmann · on June 28, 2011

libstrophe (xmpp) allows you to use your own allocator, but it's pretty niche.

cpeterso · on June 28, 2011

zlib and libpng.

asher · on June 28, 2011

Exactly. Also, if the library uses a hash function, let the caller override that.
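A sketch of what asher suggests, using a hypothetical string-set API: the constructor takes an optional hash function pointer and falls back to a library default when given `NULL`. The default here is 32-bit FNV-1a, picked only as an example, and the set stores key pointers rather than copies to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hash function signature the caller may supply. */
typedef uint32_t (*strset_hash_fn)(const char *key);

/* Library default: 32-bit FNV-1a (an illustrative choice). */
static uint32_t fnv1a(const char *key)
{
    uint32_t h = 2166136261u;
    for (const unsigned char *p = (const unsigned char *)key; *p; p++) {
        h ^= *p;
        h *= 16777619u;
    }
    return h;
}

#define NBUCKETS 64

struct node {
    const char *key;
    struct node *next;
};

typedef struct {
    strset_hash_fn hash;
    struct node *buckets[NBUCKETS];
} strset;

/* Pass NULL to use the library's default hash. */
strset *strset_new(strset_hash_fn hash)
{
    strset *s = calloc(1, sizeof *s);
    if (s != NULL)
        s->hash = (hash != NULL) ? hash : fnv1a;
    return s;
}

int strset_contains(const strset *s, const char *key)
{
    const struct node *n = s->buckets[s->hash(key) % NBUCKETS];
    for (; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return 1;
    return 0;
}

/* Stores the key pointer, not a copy. Returns 0 on success, -1 on OOM. */
int strset_add(strset *s, const char *key)
{
    if (strset_contains(s, key))
        return 0;
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    size_t i = s->hash(key) % NBUCKETS;
    n->key = key;
    n->next = s->buckets[i];
    s->buckets[i] = n;
    return 0;
}

void strset_free(strset *s)
{
    if (s == NULL)
        return;
    for (size_t i = 0; i < NBUCKETS; i++) {
        struct node *n = s->buckets[i];
        while (n != NULL) {
            struct node *next = n->next;
            free(n);
            n = next;
        }
    }
    free(s);
}
```

A caller who needs different hashing (for example a keyed hash, or one tuned to their key distribution) passes it to `strset_new` without touching the library.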

malkia · on June 28, 2011

If possible, provide non-callback versions of your functions. This is very useful when calling the C API from a different language, where callbacks are very hard to support (C -> other language -> back into C via callbacks).
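A sketch of malkia's suggestion, with a hypothetical `items` collection: the callback-based `items_foreach` is convenient from C, while the equivalent pull-style iterator (`items_iter_init`/`items_iter_next`) lets a foreign-language binding drive the loop itself, so no function pointer ever has to be created on the other side of the FFI boundary. `sum_visit` is just an example callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical collection: a borrowed array of ints. */
typedef struct {
    const int *data;
    size_t len;
} items;

/* Callback style: the library calls back into the user's code. */
typedef int (*items_visit_fn)(int value, void *user);

/* Stops early if visit returns non-zero, and returns that value. */
int items_foreach(const items *it, items_visit_fn visit, void *user)
{
    for (size_t i = 0; i < it->len; i++) {
        int rc = visit(it->data[i], user);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Example callback: accumulates a sum into a long. */
static int sum_visit(int value, void *user)
{
    *(long *)user += value;
    return 0;
}

/* Non-callback style: the caller drives the loop, and only plain data
   crosses the API boundary. */
typedef struct {
    const items *src;
    size_t pos;
} items_iter;

void items_iter_init(items_iter *iter, const items *src)
{
    assert(iter != NULL && src != NULL);
    iter->src = src;
    iter->pos = 0;
}

/* Returns 1 and stores the next value in *out, or 0 when exhausted. */
int items_iter_next(items_iter *iter, int *out)
{
    if (iter->pos >= iter->src->len)
        return 0;
    *out = iter->src->data[iter->pos++];
    return 1;
}
```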

sigil · on June 28, 2011

Or, instead of making C a more pleasant language for developing applications and middleware libraries, you could use a programming language better suited to those tasks. Call out to C only when necessary and at the lowest layers. Don't layer C on C. Even C++ is an order of magnitude more pleasant when it comes to serious application development and building large, layered systems.

Would I write a C library that gives callers a transitive dependency on D-Bus, or contains an epoll event loop? No.

When I write C, it's typically either (1) small djb-style applications that make a bunch of system calls and don't need any data structures beyond C arrays, or (2) libraries of essentially functional code that do performance-critical computations. I love C, and it has its place: near the bare metal.

unshift · on June 28, 2011

I'd actually much prefer that a lot of handy libraries (e.g., boto comes to mind) be written in C and exported to higher-level languages.

That way, each language wouldn't have to duplicate every other language's effort. The end result would probably be more unified APIs and more complete libraries.

burgerbrain · on June 28, 2011

Exactly. Imagine how much harder life would be if openssl were written in Perl instead of C. It might be a self-fulfilling prophecy (C is good for that sort of thing because C is popular, in large part because C is good for that sort of thing,...) but that's just how real life works sometimes.

burgerbrain · on June 28, 2011

So because you don't use C commonly, you propose that we avoid making it easier for others to do so and collaborate?

This is just language trolling in thin disguise.

sigil · on June 28, 2011

> So because you don't use C commonly

Where did you get that idea?

The author talks about writing C libraries that depend on GTK+, GLib, APR, D-Bus, and the like. There's no way in hell I'd write a C library that depends on that stuff. Above some layer, C is no longer the right tool for the job; you're better off switching to C++ or to a dynamic language. From the author's examples I see no appreciation for this fact.

burgerbrain · on June 28, 2011

The author talks about conventions for C libraries in general. Among other things, it advocates not reinventing the wheel. Sound advice. Pretty much the entire reason we have libraries at all in the first place.

I'm still not seeing a reasoned explanation of why you should be forced to re-implement linked lists, hash tables, red-black trees, etc. It's great and all that your C code is simple enough to get by just using arrays, but some of us need a bit more. These elementary data structures are hardly anything to be afraid of using in C, particularly when someone's already implemented them for you. What in the world could be the harm in pulling in some handy string buffer functions? It's not as though dynamic linking is a black art.
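For a sense of scale, here is a minimal sketch of the kind of string buffer functions burgerbrain mentions (GLib, which comes up in this thread, ships a fuller one as `GString`). The `strbuf_*` API is hypothetical, not GLib's: the buffer is always NUL-terminated and doubles its capacity as it grows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable, always NUL-terminated string buffer. */
typedef struct {
    char *data;
    size_t len;  /* bytes used, excluding the terminating NUL */
    size_t cap;  /* bytes allocated */
} strbuf;

/* Returns 0 on success, -1 if the initial allocation fails. */
int strbuf_init(strbuf *sb)
{
    assert(sb != NULL);
    sb->len = 0;
    sb->cap = 16;
    sb->data = malloc(sb->cap);
    if (sb->data == NULL)
        return -1;
    sb->data[0] = '\0';
    return 0;
}

/* Appends n bytes from s, doubling capacity as needed.
   Returns 0 on success; on -1 the buffer is left unchanged. */
int strbuf_append(strbuf *sb, const char *s, size_t n)
{
    if (n > SIZE_MAX - sb->len - 1)
        return -1;  /* length would overflow */
    size_t need = sb->len + n + 1;
    if (need > sb->cap) {
        size_t cap = sb->cap;
        while (cap < need)
            cap = (cap > SIZE_MAX / 2) ? need : cap * 2;
        char *p = realloc(sb->data, cap);
        if (p == NULL)
            return -1;  /* old buffer is still valid */
        sb->data = p;
        sb->cap = cap;
    }
    memcpy(sb->data + sb->len, s, n);
    sb->len += n;
    sb->data[sb->len] = '\0';
    return 0;
}

int strbuf_append_str(strbuf *sb, const char *s)
{
    return strbuf_append(sb, s, strlen(s));
}

void strbuf_free(strbuf *sb)
{
    free(sb->data);
    sb->data = NULL;
    sb->len = sb->cap = 0;
}
```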

Furthermore, you completely disregard people who either 1) are unable to use anything besides C, or 2) choose to use C because they enjoy it.

That's why you are language trolling.

sigil · on June 28, 2011

> The author talks about conventions for C libraries in general. Among other things, it advocates not reinventing the wheel.

All fine things. The author has some good points on C library design. I have no problem with C (which I use heavily in both my startups), or with C libraries (which I've published, see my github), nor do I need a lecture on the benefits of reuse. Your imagination seems to be running wild here.

My problem is with the author's notion of reuse. Specifically, his lack of appreciation for the flipside: dependencies. We are told to write C libraries that in turn depend on GLib, GTK+, APR, and the like. As a consumer of C libraries, there is no way I'm going to be happy about pulling in GTK+ just to use a library that does Unicode manipulation. (His example, I'm not even making this up.)

That's what I meant by "middleware libraries" in my original comment. I like C libraries that do one thing and do it well, without adding unnecessary transitive dependencies. So, using C at the "lowest layer" of the dependency graph? Absolutely, all for it.

> I'm still not seeing a reasoned explanation of why you should be forced to re-implement linked lists, hash tables, red-black trees, etc.

You shouldn't re-implement them. Use C++. STL has had fast, stable implementations of these for years. Destructors + exceptions (the only really useful parts of C++ imho) save you boatloads of nasty C code of the form:

    int foo()
    {
        hashtable *h = hashtable_new();
        ...
        if (err1) {
            hashtable_free(h);
            return err1;
        }
        ...
        if (err2) {
            hashtable_free(h);
            return err2;
        }
        ...
        if (h) hashtable_free(h);
        return 0;
    }

Or the almost equally repulsive:

    int foo()
    {
        int err = 0;
        hashtable *h = hashtable_new();
        ...
        if ((err = error1) != 0) goto error;
        ...
        if ((err = error2) != 0) goto error;
        ...
    error:
        if (h) hashtable_free(h);
        return err;
    }

Any large, layered C program using dynamic data structures inevitably fills up with this (error prone!) junk.
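For comparison, a self-contained version of the single-exit `goto` pattern above, managing two resources of different kinds instead of one hashtable. `load_file` and its error codes are hypothetical, and the size lookup via `fseek`/`ftell` assumes a regular, seekable file. Each resource is released in exactly one place, and because `free(NULL)` is a no-op and the file handle is checked, the cleanup is correct no matter which step failed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error codes for this sketch. */
enum { LOAD_OK = 0, LOAD_EOPEN = -1, LOAD_EIO = -2, LOAD_ENOMEM = -3 };

/* Reads a whole file into a malloc'd, NUL-terminated buffer owned by the
   caller. On failure nothing leaks and *out / *out_len are untouched. */
int load_file(const char *path, char **out, size_t *out_len)
{
    int err = LOAD_OK;
    FILE *f = NULL;
    char *buf = NULL;
    long size = 0;

    assert(path != NULL && out != NULL && out_len != NULL);

    f = fopen(path, "rb");
    if (f == NULL) { err = LOAD_EOPEN; goto done; }

    if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0 ||
        fseek(f, 0, SEEK_SET) != 0) {
        err = LOAD_EIO;
        goto done;
    }

    buf = malloc((size_t)size + 1);
    if (buf == NULL) { err = LOAD_ENOMEM; goto done; }

    if (fread(buf, 1, (size_t)size, f) != (size_t)size) {
        err = LOAD_EIO;
        goto done;
    }
    buf[size] = '\0';

    *out = buf;
    *out_len = (size_t)size;
    buf = NULL;  /* ownership moved to the caller, so don't free it below */

done:
    /* Single exit point: release whatever was acquired, in reverse order. */
    free(buf);
    if (f != NULL)
        fclose(f);
    return err;
}
```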

Don't like C++? Need a special data structure not in STL? Fine, use C. Hell, I've forked and contributed to a trie implementation in C [1] because I needed it in one project, and there were no quality implementations in C++. It worked out nicely because the program was small and the error prone junk (above) was minimal.

Another option is to write your app in a high-level interpreted language to begin with, using whatever data structures it provides, and rewrite the critical path in C. I've been doing this like crazy recently [2] [3] [4].

> Furthermore, you completely disregard people who either 1) are unable to use anything besides C, or 2) choose to use C because they enjoy it.

I enjoy C as much as the next person, I hope I've clarified that. I also work in embedded contexts where C is the only option -- in one case, an AVR microcontroller with 8K of program memory. Do I still sometimes wish it was possible to program it in lisp or some other higher level language, like the JPL guys did when they debugged a space probe 100 million miles away [5]? Hell yeah I do.

[1] https://github.com/acg/critbit

[2] https://github.com/acg/lwpb

[3] https://github.com/acg/python-percentcoding

[4] https://github.com/acg/python-flattery

[5] http://www.flownet.com/gat/jpl-lisp.html

smcl · on June 27, 2011

Wow, link overload in that post. Did he really feel the need to link to the US Government Wikipedia page?

gue5t · on June 27, 2011

I didn't even notice that there were a large number of links in the post. I think the majority of them would be useful for someone unfamiliar with some of the topics described, though. The great thing about hypertext is that links are optional, so an abundance, as long as the linked content is of quality, hurts no one.

spacemanaki · on June 28, 2011

I disagree: if you link to something, there's an implied recommendation, and also a possibility for subtle commentary. (For example, linking to something critical of the anchor text.)

When I see a link and I'm curious what's being linked, I often hover to see where it's going. If it turns out it's a link to Wikipedia for a mundane, off-topic article like the US Gov't, I think that kind of wastes brain cycles, and I'm a bit disappointed.

I know this is a small, hair splitting issue, but I think it's valid. Some writers try to cut words where they can to make their writing concise and tight. I think this falls in the same category.

The worst case I've seen is in one of the earlier Clojure (print) books (not Joy of...) that actually had a few footnotes with Wikipedia URLs in them, for CS topics.

forgotusername · on June 28, 2011

Rusty Russell had a slightly tongue-in-cheek rant about such things recently, http://rusty.ozlabs.org/?p=140

There are a bunch more recent articles on his blog covering similar topics.