
Show HN: libcodr7 – fundamental collections in the spirit of C - codr7
https://github.com/codr7/libcodr7
======
haberman
I didn't realize you could implement a red-black tree with this little code.

The code shown here seems to indicate that insert has at least four cases, and
delete has six:
[https://en.wikipedia.org/wiki/Red%E2%80%93black_tree](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree)

But the code I see here is very small:
[https://github.com/codr7/libcodr7/blob/master/source/codr7/r...](https://github.com/codr7/libcodr7/blob/master/source/codr7/rbnode.c)

I guess it's not handling delete, which probably accounts for a lot of the
difference.

EDIT: I see that it's implementing a left-leaning red/black tree. I see some
caution about that approach here? Would love to get the author's opinion about
this:
[http://read.seas.harvard.edu/~kohler/notes/llrb.html](http://read.seas.harvard.edu/~kohler/notes/llrb.html)

~~~
hairtuq
Red-black tree presentations are often more complicated than necessary, mostly
because of (premature?) optimizations. A simple presentation can be found
here: [https://www.cs.tufts.edu/~nr/cs257/archive/chris-
okasaki/red...](https://www.cs.tufts.edu/~nr/cs257/archive/chris-
okasaki/redblack99.pdf)

I haven't checked how this relates to the code in this library.

------
liuliu
Anyone who does too much C would know that manually implementing associative
array (dictionary) is not fun. I started to use two implementations for my
work, both worked pretty well for the things they are good at (and all of them
are as "generic" as you can get from C):

khash:
[https://github.com/attractivechaos/klib/blob/master/khash.h](https://github.com/attractivechaos/klib/blob/master/khash.h)

The interface I really liked, and you can customize the hash equality
functions.

rb from jemalloc:
[https://github.com/jemalloc/jemalloc/blob/dev/include/jemall...](https://github.com/jemalloc/jemalloc/blob/dev/include/jemalloc/internal/rb.h)

The interface is a bit wacky, but it is flexible enough for my needs. rb
shines when you need range queries (keys less than, more than etc.)

~~~
codr7
I prefer ordered sets with value semantics (which allows using as maps without
overhead) to hash tables since I find the performance easier to reason about
and often want ordering anyway. The easiest one to implement would be a binary
searched vector, but comes with the disadvantage of unstable pointers because
of reallocations. rbtrees are definitely trickier, though this one is left
leaning which reduces the complexity somewhat (see PDF linked from README).

~~~
liuliu
The rbtree implementation from jemalloc allows you to maintain the value
semantics (it is an intrusive design as well). I believe it is also a left-
leaning 2-3 red-black tree to be exact. The API is a bit harder to understand
but once you get use to it, can be pretty easy to use. It also implemented one
trick that embeds the color bit into the right child node pointer, otherwise
you are paying a full 8-byte size penalty.

~~~
codr7
Yeah, that's usually where I draw the line, embedding data in pointers :)
Bitfields and packed structs is another angle that I usually don't bother
with. Too many details to keep in mind for my feeble brain.

------
rwbt
If I were starting a new project in C, I would definitely rely on a solid
foundation library like APR[0] or CoreFoundation[1] without reimplementing a
bunch of fundamental data structure algorithms by myself. Glib is another
alternative, but too bloated in my view.

[0] - [https://apr.apache.org](https://apr.apache.org) [1] -
[https://github.com/opensource-apple/CF](https://github.com/opensource-
apple/CF)

~~~
codetrotter
Note that the GitHub link you have there is an unofficial mirror and the git
version history consists simply of the official source dumps.

To each their own but IMO repositories created in that manner are generally
not very useful (though there are exceptions).

The official location of the CoreFoundation (actually CFLite) source dumps is
[https://opensource.apple.com/tarballs/CF/](https://opensource.apple.com/tarballs/CF/)

The most recent CFLite tarball is
[https://opensource.apple.com/tarballs/CF/CF-1153.18.tar.gz](https://opensource.apple.com/tarballs/CF/CF-1153.18.tar.gz)

You can browse the CFLite source online at the official location as well:
[https://opensource.apple.com/source/CF/CF-1153.18/](https://opensource.apple.com/source/CF/CF-1153.18/)

The docs for CoreFoundation are at
[https://developer.apple.com/documentation/corefoundation](https://developer.apple.com/documentation/corefoundation)

~~~
comex
These days you’re probably better off using the version of CoreFoundation
that’s maintained as part of the Swift project:

[https://github.com/apple/swift-corelibs-
foundation/tree/mast...](https://github.com/apple/swift-corelibs-
foundation/tree/master/CoreFoundation)

...as it’s more up-to-date and developed in the open.

~~~
saagarjha
I wouldn’t go as far as to say it’s “developed in the open”. It’s
significantly more opaque than Swift or WebKit, for example; I’d describe it
as “accepting of patches”.

------
thestoicattack
I can't figure out how the linked list is used. Where is the per-element data?

I thought there would be a usage example in the tests, but there wasn't.

~~~
DSMan195276
These structures are intended to be used like the Linux Kernel's data
structures, via `container_of` (Though in the library the macro is called
`c7_baseof`. But it does the exact same thing as `container_of`).

They have somewhat of an example usage of the list in the `deque.c` file:
[https://github.com/codr7/libcodr7/blob/2b598e3d89d878ef53e05...](https://github.com/codr7/libcodr7/blob/2b598e3d89d878ef53e05161ef0fd83320e1e19f/source/codr7/deque.c#L20)

If you don't know about `container_of`, the idea is that it allows you to take
a pointer to an inner field, and transform it into a pointer to an outer
'container'. For the linked-list, this means that if you want to make a list
of `struct car` objects, you simply embed a `struct c7_list` into `struct
car`, pass the address of that internal `struct c7_list` object to the list
manipulation functions provided here, and then use `container_of` (Or
`c7_baseof`) to transform the pointer to the `struct c7_list` into a pointer
to the containing `struct car`.

In general, I think `container_of` is one of the best things for writing C
code, and it provides a surprising amount of flexibility and nice patterns you
can use. It does have the obvious issue of type checking (You could have a
`struct wheel` that also has a `struct c7_list` on it, and accidentally add ti
to the same `struct c7_list` that has `struct car`), and if your `struct car`
needs to be on more than one list at a time it can get confusing knowing which
`struct c7_list` entries belong to which list, but in general I'd call it a
big net positive.

It's also possibly worth noting, you can use this to implement a non-intrusive
list if you want, just simply make an object that wraps a `struct c7_list` and
a void pointer (Or a typed pointer, if you want), and then write some extra
logic around it for allocating nodes and such.

~~~
thestoicattack
Okay, I'm getting it. Not a knock on the author, but looking at this macro'd,
pointer-to-void'd stuff certainly makes me appreciate the STL.

Edit: aha, I see "intrusive" was the keyword I was missing to learn all about
this different style.

~~~
rafa1981
Intrusive lists are very different beasts than lists by value.

There are less allocations involved (perf and points of failure), values can
be inserted on different lists without new allocations, deletion is O(1),
elements can be heterogeneous (different sizes and types), etc

etc

I seldomly use linked lists, but most of the time i prefer them to be
intrusive. There of course are intrusive list implementations on C++.

~~~
jcheng
> values can be inserted on different lists without new allocations

Not multiple lists at the same time, surely...?

~~~
DSMan195276
They were talking about taking a value off of one list and adding it to a
different one, which doesn't require an allocation, but the entity is never on
more than one list at a time in that scenario.

That said, you can add an entity onto multiple lists if it contains multiple
nodes embedded inside. You have to keep track of which nodes are attached to
which lists though (Which generally just means being consistent on which you
use where). If you give them decent names, then it's not usually a problem,
but it can sometimes get a little confusing if you're not careful.

------
sersnth
Why have comparison callbacks return this enum

    
    
      enum c7_order {C7_LT, C7_EQ, C7_GT};
    

instead of a signed int which seems to be the idiomatic way in C?

~~~
codr7
Because I always end up switching on the return value, and switching on enums
is a lot nicer than ints. Especially when there is no agreement on what values
are returned, LT is mostly -1 but is often specified as simply less than zero.

When you have a finite set of cases, enum is just the right solution period. I
have no idea why people still insist on using ints.

------
slabity
"in the spirit of C" is pretty ambiguous. What does this do better or
differently from other libraries?

There's no working examples shown. There's a "real-world examples" link, but
it seems to just point to another project of yours.

There's no documentation of the API. If I wanted to use this, I would need to
manually read the header files and hope that I'm using them correctly.

And to top it all off, there's not a single comment in the entire project.

I don't mean to be harsh, but if you want other developers to use this, then
you need to provide more resources.

~~~
codr7
The descriptions of the containers contain more of the kind of information
you're looking for, it's a bit much to expect the caption to explain it all.

It's as real world as it gets for now, the test suite should give you an idea.

I will typically add comments where I feel the code doesn't do a good enough
job of explaining itself.

One step at a time.

It's ok, I don't expect anything.

------
tptacek
I've only glanced at these; can someone more familiar explain how these are
different from the BSD "queue" macros (and their offspring, like the tree
headers)?

------
Twirrim
"In the spirit of C". What exactly is meant by that?

~~~
arcticbull
The spirit of chainsaws with no hand guards haha.

But actually, to me that means a thin, light API with a couple of functions
and some macros. Does what it needs to do, no more. Error handling is basic if
even provided, usually via return of numerical error codes.

~~~
codr7
That's correct, it does mostly what you tell it to and stays out of the way.
If you want error checking beyond the test suite you have to do it yourself.
The issues people have with C comes mostly from treating it like any other
language. C is all about control and flexibility.

Further, special care is taken to reduce the number of allocations, often
using pools/slab allocators. And value semantics, or rather byte array
semantics, which means items are treated as blocks of N bytes rather than
pointers/references.

------
jlarocco
So what's the advantage of this over existing libraries?

~~~
codr7
That would depend on which libraries we are discussing, I don't know them all.

This is simply how I would do it based on my experience.

------
95014_refugee
So, like
[http://troydhanson.github.io/uthash/](http://troydhanson.github.io/uthash/)
but without the documentation?

(NetBSD has had an all-macro red/black tree implementation forever, plus the
ancient and well-understood BSD queue.h)

~~~
codr7
From a quick glance; it's hashed (as in not ordered), not pool allocated and
intrusive as opposed to offering value semantics. Couldn't get more different.

There's nothing new under the sun, I never claimed novelty. This is simply how
I would do it, take or leave.

------
juped
Definitely cool, although the spirit of C is to be overly paranoid about
portability and not link to a _library_ , God forbid. (Single pairs of h/c
files like this are usually okay with people)

~~~
codr7
Or you could link to a library written by someone equally paranoid about
portability :)

------
mastrsushi
This is exactly why you shouldn't code in C unless you absolutely have to. So
many ugly boiler plates and macros.

~~~
colonwqbang
There are myriad reasons why you might "absolutely have to" code in C. There
are even a few reasons you might code in C even if you have a choice.

------
totalperspectiv
Is there a reason C libraries generally prefix with 'lib'? Or is it just
historic?

~~~
colonwqbang
If you pass -lcodr7 to the Unix C linker, it would typically go looking for a
file called libcodr7.so. You could perhaps view this as mainly historic
behaviour.

Some libraries like Cairo seem to be generally known as "Cairo", not
"Libcairo". But there is also a trend for libraries to have very straight-to-
the-point names. If the PNG parsing library is called libpng.so and the XML
parsing library is called libxml.so, it would be confusing to call these
libraries just "Png" and "Xml".

As an other example, the two C programs "systemd" and "libsystemd" live in the
same repo, but are really two different beasts: libsystemd might be useful to
you even on a system which doesn't use systemd.

~~~
codr7
Besides this; codr7 is my alias, and I wouldn't want to be confused with a C
library.

