
Collections-C, generic data structures for C - marxo
https://github.com/srdja/Collections-C
======
kev009
I try and plug [http://concurrencykit.org/](http://concurrencykit.org/)
whenever I see someone looking for something like this. It's built around
providing portable atomic intrinsics for parallel code, but the data
structures are also very well done and work just as nicely for single threaded
code.

~~~
rurban
Yes, ck and preshing
[https://github.com/preshing/junction](https://github.com/preshing/junction)
are state of the art. This one really not.

------
eps
Nice, but it's rather naive and through that - needlessly wasteful.

It's C.

Why on Earth you'd want to allocate a separate list node for each piece of
data when you can embed this node directly into the data and then use
container_of or similar offsetof() derivative to get a pointer to the data by
a pointer to a list item? Saves you _at least_ sizeof(void*) per item and
eliminates a chance of list_add ever failing among other things.

The same goes for custom allocators - why would you drag around pointers to
malloc, calloc and free, when all you need is just a pointer to realloc?

This sort of thing. The code is nice, but it's not how one would write a
container library after few years of hands-on C experience.

~~~
seiji
_allocate a separate list node for each piece of data_

So many programmers don't even consider "overhead" of data when writing
things. But, most of the time it doesn't matter. Do you need a list of ten
things? Great. Do whatever. Do you need a list of a billion things? Then you
need to rethink everything from the bottom up.

 _when all you need is just a pointer to realloc_

Wrong!
[https://github.com/Tarsnap/libcperciva/commit/cabe5fca76f6c3...](https://github.com/Tarsnap/libcperciva/commit/cabe5fca76f6c38f872ea4a5967458e6f3bfe054)

~~~
catnaroek
> So many programmers don't even consider "overhead" of data when writing
> things.

Because they're using managed languages where the amount of overhead per
object can't be reduced to zero. And also because, in exchange for that
overhead, they get other benefits, like compacting garbage collection. But you
don't get this in C.

------
vc98mvco
A quick look shows this is storing void* pointers only. My excitement was
quickly over as I was expecting something with _Generic macros or ability to
manage memory for any type.

[https://github.com/srdja/Collections-C/blob/master/src/hasht...](https://github.com/srdja/Collections-C/blob/master/src/hashtable.h)
shows the container is manually configurable. Look at the defines at the
bottom. This would mean the container file source-header pair, would have to
be duplicated for every type, which is somewhat manageable.

Why isn't this written in the readme?

~~~
deadgrey19
<shameless self promotion> Take a look at this instead. Work started on this
before C-generics came about, so there is pretty heavy macro usage, but all
data structures are type safe:
[https://github.com/mgrosvenor/libchaste](https://github.com/mgrosvenor/libchaste)

~~~
vc98mvco
I see you use C macro templates. I have been experimenting with containers
that use those. The advantages are speed and full type safety, and
disadvantages are long compilations times (which have to be done only once
though), the need to define a new template for every type in (only one)
separate file, code bloat, and somewhat long function names (I don't use
member function pointers).

I'm not sure if all that is worth it.

~~~
deadgrey19
I've not noticed the compile time increase to be honest. But I have noticed
the type safety. A few times it has saved me from bugs that would have taken
days to find if I was using void* casts. In my world (mostly high perf
networking) I'm happy to pay a (small ~= unnoticeable) compile time cost if I
get both safety and performance.

To make it more user friendly, I've provided (yet more) macros to make
defining new types super easy. Using the macros does not necessarily involve
adding a separate file, although the code is much cleaner if you do. And I
don't have anything against adding extra files. There's almost no cost to me.
I often group a couple of vectors or linked lists into a single .c/.h file
pair to keep things manageable.

The code bloat you get is on par with using C++ templates. In fact, I'd say
that it is probably better because the type safe macro's all devolve to a
void* underlying implementation rather than generating separate instances. So
it's pretty thin. You can always throw away the standard types list that I
provide so that you don't have to pay for anything you don't use. And if you
really want to, you can interface to the void* underlying implementation
directly, loose the indirection costs, the function pointers and the macro
costs. Basically, you can choose which world you want to live in, or live in
both at the same time.

The long function names are handled by macros again. And since i used function
pointer indirection you almost never see them. The costs with this sort of
thing are minimal. A typical invocation looks something like

    
    
      CH_VECTOR(MYVECCLASS)* myvec =  NEW_CH_VECTOR(MYVECCLASS);
      myvec->append(myvec,object);
    
    

There are certainly tradeoffs, the implementation is not nearly as mature as
I'd like and debugging problems inside the containers is a HUGE PITA. But in
my experience it lets me kee the benefits of working in C (can pull inside the
kernel, can port to different machines, easy control over memory and
performance) and gives me the flexibility of doing higher level things when I
want.

------
wangchow
Would be nice to have parallel structures like "Intel threading building
blocks" provides.

[https://en.m.wikipedia.org/wiki/Threading_Building_Blocks](https://en.m.wikipedia.org/wiki/Threading_Building_Blocks)

I recommend this book to understand such structures:

[https://www.goodreads.com/book/show/14788830-structured-
para...](https://www.goodreads.com/book/show/14788830-structured-parallel-
programming)

~~~
srdja
I'll definitely look into it.

------
srdja
Hello everyone, I'm the author of this library. I was quite surprised to find
it on the front page of HN! I must thank you for all the great feedback, it's
been very helpful. I agree that it has rough edges and that it also lacks some
useful features but I guess this is mostly because I was the only user of it.

I very much welcome your suggestions on how to make better and more useful. :)

------
kazinator
Kazlib: since 1996.

[http://git.savannah.gnu.org/cgit/kazlib.git/tree/](http://git.savannah.gnu.org/cgit/kazlib.git/tree/)

[http://www.kylheku.com/~kaz/kazlib.html](http://www.kylheku.com/~kaz/kazlib.html)

Used in e2fsck since 2002:
[http://git.kernel.org/cgit/fs/ext2/e2fsprogs.git/commit/e2fs...](http://git.kernel.org/cgit/fs/ext2/e2fsprogs.git/commit/e2fsck/dict.c?id=838e773e7a6899cec10884ad6c3fdcdaef72b82b)

------
naasking
klib [1] is another good one I recently came across. In fact, I'll just leave
this great curated list of free software libraries right here:

[https://notabug.org/koz.ross/awesome-c](https://notabug.org/koz.ross/awesome-c)

[1]
[https://github.com/attractivechaos/klib](https://github.com/attractivechaos/klib)

------
deadgrey19
<shameless self promotion> If you're interested in this sort of thing, take a
look at
[https://github.com/mgrosvenor/libchaste](https://github.com/mgrosvenor/libchaste).
Work started on this before C-generics came about, so there is pretty heavy
macro usage, but all data structures are type safe rather than the "cast to
void*" approach which is common in this space. It's a bit less mature, but
reasonably well covered by unit tests where it matters.

------
brongondwana
I tried reading the documentation, which is awfully sparse, to see how to use
the hash table and if it would be a good replacement for the current one in
cyrus-imapd.

The destroy function in cyrus-imapd has both a destroy that takes a cleanup
function for values, and an interator:

[https://github.com/brong/cyrus-
imapd/blob/master/lib/hash.c](https://github.com/brong/cyrus-
imapd/blob/master/lib/hash.c)

I couldn't see how to do either with this library until reading the source
code I see you can set a mem_free at creation instead.

The documentation didn't really describe that, and the example is actively
wrong.

I see from the code that there's a way to get the keys, from which you can do
an external iterator. It's less efficient, but it works. You wouldn't know
that from the documentation though, you have to read the source code.

~~~
fit2rule
Any chance you're going to submit a pull request to fix the documentation? I'd
like to check this library out for use in a project, but I'm not sure I'd get
so far as you have with regards to matching the documentation to the sources
..

------
mschuster91
From hashtable
([https://github.com/srdja/Collections-C/blob/master/src/hasht...](https://github.com/srdja/Collections-C/blob/master/src/hashtable.c#L58))

> HashTableConf htc;

Hold short, that structure is allocated from the stack and not via malloc? I'd
expect that one to blow up after the second call to hashtable_new.

edit: seems like htc is immutable and the values copied over in
hashtable_new_conf, but that's downright scary in case one forgets this
semantic and accidentally uses htc for something.

------
jepler
It looks like all collections store void pointers. I don't see any
abstractions to help with pointer lifetimes, which remains the main problem I
encounter when trying to write large programs in C that dynamically allocate
memory.

~~~
akp__
I'm no expert here, just curious. I've seen this brought up a couple times
now: what's wrong with void pointers? How else would you do this in C? If you
wanted abstractions to help with pointer lifetimes wouldn't you just be better
off using C++?

~~~
lucozade
The limitation with void pointers is that it can only inline values <= the
size of a pointer. If you have something larger then you need to allocate it
separately which may have cache efficiency issues.

The main alternative, other than hardcoding an element type, is to use macros
and pass the element type in the definition.

Sometimes you just don't have the option of a C++ compiler. Also, and this was
a situation I was in recently, I needed a collection in a small area of my C
code so it really didn't warrant a toolset change.

------
gibsjose
Has this library been stress tested? I am interested in using this library for
a spaceflight application, but reliability is paramount in our situation.

~~~
asveikau
I stopped looking when I saw it doesn't check for malloc failure. I am going
to say space flight is a no-go.

~~~
david-given
But for this kind of application, wouldn't you just panic and do a soft
restart on malloc failure anyway? Actually _recovering_ from an out-of-memory
situation is incredibly fraught and unreliable; most of the realtime operating
systems I've used don't even bother to try.

~~~
gibsjose
That depends on when a failure were to occur. If we have a failure while the
spacecraft is en route to its destination, perhaps we can recover because we
may have enough time.

If we have a failure during rendezvous with our target, it could be a very bad
day.

~~~
david-given
...necropost, but:

You always need to be able to recover from a soft reboot, even during
maneuvers; you're in a high radiation environment and any passing high energy
particle or cosmic ray can trigger this.

------
Mellowcandle
nice. I actually built something very similar a while ago. although I only
supported lists, queue and stacks...
[https://github.com/mellowcandle/liblist](https://github.com/mellowcandle/liblist)

------
ape4
Something like this should be in standard C.

------
YesThatTom2
Take that, Golang!!!!

J/K

------
rewqfdsa
Honestly, why? Most of the time, you want to write C++, with exceptions, and
the STL. You can make this kind of programming as robust as you want against
memory allocation failure.

If you don't want to use this style of programming, for whatever reason, check
out sys/queue.h. It's already on your system, if you're using some kind of
Unix.

~~~
gibsjose
If you are using an RTOS (which many timing-critical applications run on), you
should be worried about memory allocations happening under the hood.

If you don't have full control over when memory is being
allocated/reallocated, your system is now non-deterministic.

With pure C, you can know exactly when those few extra instructions for
resizing your dynamic array are going to happen.

~~~
catnaroek
Same in C++. There's `std::vector::reserve`, which grows the vector's
underlying physical buffer, without logically adding any new elements. If you
`reserve` enough capacity before inserting anything, it's even a `O(1)`
operation.

~~~
gibsjose
Vector and possibly string have that. The vast majority of STL structures do
not.

~~~
catnaroek
All containers from the C++ standard library (please don't call it STL, that's
Stepanov's original library) can be parameterized by an allocator. You can use
whatever allocation policy you like best. However, most people use the default
allocator because it's good enough.

In any case, while C++ has lots of defects, “loss of control relative to what
C gives you” isn't one of them.

~~~
optforfon
(a bit of topic - but I'm humbly trying to learn more...) I've got zero
experience writing allocators. Is there some common ones you use provided
somewhere? Do you write your own? (in which case can you point me to where to
learn to do that properly)

~~~
catnaroek
I don't write allocators myself, but Boost has a pool allocator library
[[http://www.boost.org/doc/libs/1_60_0/libs/pool/doc/html/inde...](http://www.boost.org/doc/libs/1_60_0/libs/pool/doc/html/index.html)],
which conforms to the Allocator concept defined in the standard library
[[http://en.cppreference.com/w/cpp/concept/Allocator](http://en.cppreference.com/w/cpp/concept/Allocator)].

~~~
optforfon
Thank you

