
Show HN: C library of generic, reference-counted data structures - theck01
http://theck01.github.io/offbrand_lib/
======
comex
A small observation: CoreFoundation typedefs CFTypeRef (the generic object
type) to void * , rather than having obj * be a separate type as in this
library. This lets you pass pointers to any CF type to functions that expect
CFTypeRe without the ugly cast: it's unsafe, as you can pass a pointer to any
type without the compiler complaining, but so is explicitly casting. If you're
not going to go the void * route, it would be nice to provide some macro like
obj(x) to safely cast only valid types to object.

Anyway, it's nice to see a portable, modern CF-like library, although of
course it's too slow for some cases where code specialization is called for.

Edit: On the general subject of C data structures, in case anyone's
interested, I wrote a little portable C header that provides minimal hash
table functionality using macros, with a large amount of control - still
generic, but totally opposite to the approach used by this library, much lower
level. It's modeled after the BSD <sys/queue.h> header, provides three
versions (open, closed, and a higher level version), and is completely
undocumented, but I'm somewhat happy with it as a proof of concept. Just
throwing it out there.
[https://github.com/comex/cbit/blob/master/hash.h](https://github.com/comex/cbit/blob/master/hash.h)

------
jzwinck
I was interested to know how this compares with existing C libraries, so I
"ported" the prime number example program to use GMP and GLib instead. It's
not quite working yet (see the description) and I have to run now, but here's
the code--I think it's nearly complete and would be interesting to benchmark
once done.
[https://gist.github.com/jzwinck/5787359](https://gist.github.com/jzwinck/5787359)

~~~
theck01
Shoot, if I had known someone was going to do this I would have made the prime
number program a bit more efficient. Since it looks like you've ported the
algorithm exactly it shouldn't affect the result though. Let me know what the
test reveals, I tried to write the fasted arbitrary precision algorithms I
could.

------
andrewcooke
nice. it's heartwarming to see things like this in c. in my spare time i am
working on struct-sql mapping. it sometimes feels like the popular c libraries
don't try hard enough to provide the convenience we get in other languages.
with a little imagination (as here) i think we can push things a little
further...

~~~
jessedhillon
Is your project hosted somewhere? I would love to see the source for that, no
matter how early/unpolished it is.

~~~
andrewcooke
sure, there's an example at
[https://bitbucket.org/isti/c-orm/src/4ec76f741d756bf70c2b1b8...](https://bitbucket.org/isti/c-orm/src/4ec76f741d756bf70c2b1b8bcf98848638a09d63/example/src/phonebook3.c?at=default)
but i don't have any text explaining it yet, and it's a bit hard to see what
is happening.

but basically there's a python script that parses your struct definitions and
then auto-generates a library (this isn't great i know, but what else can you
do? if you don't want to use that, you can still use it as an SQL library, but
you need to write your own callback functions to do the work of setting values
in the struct). that library is

    
    
        #include "phonebook.corm.h"
    

in the linked code. then you can do things like

    
    
        static int find_name(isti_db *db, const char *text, name** name) {
          corm_name_select *select = NULL;
          *name = NULL;
          STATUS;
          CHECK(cname.select(&select, db));
          CHECK(select->name(select, "like", text)->_go_one(select, name));
          EXIT;
          if (status == ISTI_ERR_NO_RESULT) status = ISTI_OK; // see NULL name
          if (select) status = select->_free(select, status);
          RETURN;

}

(the STATUS, CHECK, EXIT and RETURN are just macros for the usual "return an
int as status and goto exit on error" handling). the snippet above populates
the name (a struct typedef) pointer with values (id and name value in this
case) from the database. the thing called select is a struct with function
pointers that generates SQL select functions (so there's the usual objects-
in-C pattern of passing the struct in to the function in "select->name(select,
..." for example). and the SQL being generated is "select * from [table] where
name like [value]". it's all escaped correctly to avoid SQL injection.

the API for the database backend is isolated out, but the only current
implementation is sqlite.

the complete repo is at
[https://bitbucket.org/isti/c-orm/src](https://bitbucket.org/isti/c-orm/src)
(if you download it and run doxygen you should see better docs - but they're
incomplete and not online yet)

if people are interested, email me at andrew@acooke.org and i'll get back to
you when it's in a usable state (it mainly needs docs, polishing, and perhaps
another database backend - probably postgres).

also, of course, there are many limitations compared to ORM systems. there's
no way to retrieve related objects, for example (so if a struct has a pointer
to another struct, that pointer isn't retrieved - the best you can do is also
store a pointer to the primary key and then make a second call based on that).
and currently table and column names must exactly match struct and field
names.

------
obiterdictum
Along the same lines, there is also CCAN.

[http://ccodearchive.net/list.html](http://ccodearchive.net/list.html)

[https://github.com/rustyrussell/ccan](https://github.com/rustyrussell/ccan)

And GMP for arbitrary precision math (integers and floating point numbers).

[http://gmplib.org/](http://gmplib.org/)

------
huhtenberg
I find that the decision to adopt a C library for a project frequently boils
down to whether it is using a compatible naming notation. Especially for
simpler libraries.

Linked library in nice, but how many C projects do you know that use
camelCase() function naming as opposed to K&R's lower_case_naming()? If it
were a complex library, like OpenSSL, then - sure, to the hell with the
notation, just put a wrapper around it and use it anyway. It's barely an
issue. But if it is a simpler library that is meant to be weaved into the
code, like data containers, the choice of naming notation is always a thing to
consider.

There is obviously an _indent_ tool, but resorting to it means using a
modified version of the original with all the consequences that follow.
Perhaps, it might be the next thing for GitHub to tackle - "Download in Xyz
naming notation"... I know I'd use it.

------
mappu
Was there a rationale to choosing performActionOnStruct() names instead of the
perhaps-more-idiomatic struct_perform_action() style?

I strongly advise against putting identifiers like release() and getCString()
in the global namespace, that's probably not the wisest idea if you plan to
use libraries other than your own.

~~~
theck01
Some bad habits from Java are the only reason. I thought about having
obrelease and obgetCString. Putting ob at the beginning of every function
seems like overkill, and I don't like that only some functions had would have
ob at the beginning if I didn't do a global name change. Any suggestions?

~~~
mappu
> _Putting ob at the beginning of every function seems like overkill_

That's "C-style namespaces" \- really the only sane way to avoid an identifier
collision, sorry.

------
McUsr
Hello. May I suggest looking for some code by tptacek for the red-black trees.
I am looking too, but so far , I haven't been able to find it. I read about it
first in this link:
<[https://news.ycombinator.com/item?id=4455225>](https://news.ycombinator.com/item?id=4455225>).

I think your project looks very interesting. Reference counted datastructures
is a great convenience.

Thanks, and Good Luck further.

------
canadev
I've been meaning to refresh and expand both my algorithms knowledge and my C
programming with a project like this, myself (as an application developer I
find I've barely used either since graduating from university). Was that part
of your goal at all?

I think this is pretty cool. I like the presence of the tests as well.

~~~
theck01
I just graduated, so most of the data structures are still pretty fresh in my
mind. I started after getting tired of implementing data structures in my many
C based classes, hoping that at least I would get some use out of them during
school. Sadly after that I never took another C based class, but the project
became interesting in its own right and I stuck with it.

------
icodestuff
Definitely forking this to provide a saner API. And some of the implementation
is a little iffy (like objIsOfClass). But in general, very cool. Thanks for
contributing to the C community!

Any particular reason why you're avoiding the C99 _Bool?

------
rdtsc
I personally like and use uthash (no reference counting) but:

* only a header file * has hash, array, string, list, implementations

[https://github.com/ned14/uthash](https://github.com/ned14/uthash)

------
simgidacav
This recalls me one of my oldest project:
[http://libdacav.sourceforge.net/](http://libdacav.sourceforge.net/)

However I didn't do reference counting. Good job OP!

------
songgao
star-able url:
[https://github.com/theck01/offbrand_lib/](https://github.com/theck01/offbrand_lib/)

------
pepijndevos
What are "class[es] built from functions and incomplete types"?

I once built classes using structs and function pointers, but this sounds
different?

~~~
theck01
An incomplete type is a struct where the definition of the structure is hidden
from the view of most of your source code. This makes it impossible to
directly access struct elements, and is like data hiding in a object oriented
language. Of course in this library you can always include the header
containing the struct definition and go bananas so its a soft restriction.

------
enqk
I would suggest offering the ability to plug custom allocators.

~~~
ambrop7
If you feel a need to do that, check cmccabe's comment on the top about
intrusive data structures. These let you do your own memory management, in a
way that the data structure implementation doesn't need to know about.

------
talloaktrees
Glad to see this.

------
cmccabe
It's nice that you made these available. You might also want to look at
queue.h from BSD (see
[http://fxr.watson.org/fxr/source/sys/queue.h](http://fxr.watson.org/fxr/source/sys/queue.h))
and tree.h
([http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/sys/tree.h](http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/sys/tree.h)),
single-file "libraries" that can generate a few different kinds of linked list
and binary tree. These don't require typecasting at all, since they generate
functions for your particular type. They are also intrusive data structures,
meaning you control your own allocation.

With regard to OBString: don't. Really. C strings are very flexible and
powerful with just what's in the standard library. You don't need
charAtStringIndex, for example: you have rindex and index. You don't need
splitString; you have strtok_r. You don't need findSubstring; you have strstr.

If you think you need regexes, consider fnmatch, a libc function which matches
shell wildcard patterns. It's always there, it's simple, and it's fast. Or you
could bring in something like PCRE. In general, if you need full regexes in a
small program, it might be worth rethinking whether your program should be in
C anyway.

For some reason, I don't find myself using resizable array utility code much
in C. It's pretty easy to do it yourself with realloc-- that's really one of
the first things they should teach people learning practical C. Perhaps a
small, minimal set of macros could wrap the reallocs and make it a little bit
nicer.

~~~
ExpiredLink
> _You might also want to look at queue.h from BSD_

CPP macros, entirely. No, thanks.

~~~
pmjordan
I'm not a fan of macros either, which is why I've ended up implementing hybrid
C data structures, with the logic in functions, and some optional type helpers
in macros:

[https://github.com/pmj/genccont](https://github.com/pmj/genccont)

Anyway, these are largely intrusive and don't do reference counting, but that
can be seen as an advantage or disadvantage, depending on the situation. (I
use these heavily in kernel code) The hash tables (chaining and open addressed
linear probing) use function pointers to be type-generic, which might not be
to everyone's taste either.

~~~
cmccabe
Well, this is a good effort. I think you are quite right to leave out
reference counting. Reference counting is something that you should pay for
only if you need it.

I don't really understand the fear of macros. void pointers are much scarier
than macros, since they open you up to the threat of bugs going undetected by
the type system.

You're not by any chance maintaining one of those much-maligned "driver
portability" layers, are you?

~~~
pmjordan
For better debugging and diagnostics, I like my macros to do as little as
possible. Plus, if the generated code is the same for all types apart from an
offset calculation, duplicating it is silly.

If you look at the slist, slist_queue and dlist, you'll find there are typed
macros (via a container-of macro) that call down to the generic functions. I
might try to extend this to macros that generate a full set of shim inline
function for a particular type of list element, if it turns out to be less
error prone. You won't see a void pointer being passed around on any of these.

Where function pointers come into play (binary tree, hash tables) I'll admit
it's maybe not quite so clear cut as the indirection messes with branch
predictors, etc. and the callbacks all take void pointer arguments. Maybe for
those containers it would be better to have generator macros that produce
function prototypes and implementations.

And no, I'm not doing anything particularly focused on driver portability. I
mostly write OS X kexts, and the built-in data structures in xnu aren't
terribly helpful. (the OSContainer family uses blocking memory allocations,
can only hold refcounted objects and the map and set use linear search) Some
kernel code I write _is_ designed to be portable, and I use these data
structures for that code, but it's pretty much just a standard factored design
- general core and platform specific setup and API use. Nothing with any
pretence of being a general solution.

