
CDSA – A library of generic data structures and algorithms in ANSI C - Mike_TriHard_Cx
https://github.com/MichaelJWelsh/cdsa
======
tzs
A word has been dropped from the title. The title at the site is: "A library
of generic _intrusive_ data structures and algorithms in ANSI C" (emphasis
added).

For those, like me, who did not know what makes a data structure "intrusive",
it means that the things needed for the data structure are stored together
with the data.

For example, suppose our application needs to keep track of people. As far as
we are concerned, a person looks like this:

    
    
      struct person
      {
        int age;
        const char * name;
      };
    

If we want to make a linked list of people, and do so by adding a link field
to our person struct, like this:

    
    
      struct person
      {
        person * next_person;
        int age;
        const char * name;
      };
    

and then make our list by linking our person structures through the
next_person field, that would be an intrusive linked list.

A non-intrusive linked list would be to do something like this:

    
    
      struct list_element
      {
        list_element * next;
        void * data;
      };
    

and then make our linked list by making a linked list of list_element
structures, with their data pointers pointing to our person structures. If we
were using C++, another way to do a non-intrusive linked list of people would
be to use std::list<person>.

~~~
DSMan195276
Yes, that word is extremely important. With that, intrusive data-structures
really are great in a lot of cases, it's unfortunate that most languages can't
really handle them. That said, for languages with a GC they aren't really as
useful from a programming standpoint (They may still be useful from a speed
standpoint). Still, they allow you avoid tons of small allocations, as well as
avoid the extra 'hop' necessary when you store the node information separate
form the data.

Another great part about intrusive data structures is that you can place an
object on multiple structures without having to allocate any separate data for
them. So to give an example, `std::list<person>` will actually create an
'intrusive' list - it will store the node information along with `person`.
However, that `person` is completely tied to the list. If you want to take a
`person` off of that list and place it into another one, you can't, you would
have to either copy it into a node on the other list, or make the all the
lists `std::list<person *>` and then store a pointer to the same person in
both lists. With intrusive structures, you can instead use the same set of
node information for every list, so you can detach a node from the first list,
and then attach it to the second one without having to do any allocations.

As a note, the Linux Kernel makes fairly extensive use of intrusive data-
structures, which is an area they particularly shine in since you really want
to avoid all those small allocations if you can.

------
_sdegutis
Neat trick using “list_for_each” to shorten a verbose for loop that’s nearly
always the same. Although it did make me do a double take. And since it’s
hiding control flow, it would make ‘break’ and ‘continue’ look out of place.
Is this a common convention and you just kind of get used to it?

~~~
eggy
Hidden control flow is why I decided to learn Zig [1]. I was learning Rust,
and I intend to keep at it, but I have chosen Zig to be my C replacment
because I am personally finding it easier to be more productive. Andrew
Kelley's doing a great job with the language!

    
    
      [1] - ziglang.org

------
steiger
Wouldn't it be preferable to write only a forward declaration of the struct in
the header files, and define the structs themselves at the .c files? In other
words, wouldn't it be better to define them as Abstract Data Types (ADT) ?

That way, users of the library would have no direct access to the fields of
the struct (e.g. hashtable->num_buckets;), they'd have to use a function
declared at the correspondent header file. (e.g. get_num_buckets(hashtable);).

Direct access to member fields could break the data structure because data
structures always needs to follow certain rules in insertion and deletion.

~~~
rkangel
That's not possible - the compiler needs to know the size so that it can
allocate space for the structure.

You can get away with forward declarations of things if you only ever need a
pointer to the struct, because the compiler knows how big a pointer is and
only needs to know the size of the structure when you dereference it.

You also need to know the size when you allocate it - whether you're using
malloc, or allocating statically. As you want to leave control of allocation
up to the caller, then the caller needs to know the size.

Theoretically I suppose you could allocate through a macro that used an
explicit size number in the header, but going to those lengths to take the
struct out of the header seems a bit extreme.

------
atrn
And, as always, check out Phong Vo's CDT:
[https://www.usenix.net/legacy/publications/library/proceedin...](https://www.usenix.net/legacy/publications/library/proceedings/ana97/full_papers/vo/vo.ps)

------
jopsen
This is cool.. I remember last I did something in C I needed a hashtable and
had to write my own.

The sad part is that the more fancy header functions and tricks that gets
added to a library like this, the closer you are to reinventing C++ :)

------
stevefan1999
Also check out TinyGenericC:
[https://github.com/stevefan1999/TinyGenericC](https://github.com/stevefan1999/TinyGenericC)

------
rootw0rm
i applaud the effort...but it looks like someone posted a couple of their
homework assignments to github. Not exactly comprehensive or groundbreaking.

------
abhimt
Huge collection of data structure problems Mostly in C++ -

[http://www.techiedelight.com/list-of-
problems/](http://www.techiedelight.com/list-of-problems/)

