
C99/C11 dynamic array that mimics C++'s std::vector - AlexeyBrin
https://solarianprogrammer.com/2017/01/06/c99-c11-dynamic-array-mimics-cpp-vector/
======
tines
> What if we want to be able to store more than integers in our dynamic array
> ? [...] The array_push_back function also needs to be refactored in order to
> account for the size and type of what we store in the data buffer. A
> possible approach is to use a macro, instead of the original function:

This is a situation where C++ really shines: you can use C++ with templates
and not only have much cleaner code, but also avoid forcing the compiler to
inline every single call to array_push_back. And you don't even need to use
anything more than structs and functions, you don't need to go "full OO".

I wish C programmers would be more open to C++ but it seems like they're for
the most part pretty closed. That's probably the C++ community's fault, but
I'm not sure how to make amends.

~~~
noselasd
In that regard, I also think C++ programmers should be more open to C
programmers who just need a few new language constructs.

It is not an all or nothing approach. It's fine to write code that looks and
feels mostly like C, but still takes advantage of a few nice C++ features.

~~~
Vindicis
That's pretty much how I use C++. Granted, my knowledge of it is very minimal.
I was forced into C++ because of my Broker's api so...

One of my largest gripes making the switch, is the OO paradigm. I get it, I
understand it, and I really want to love it. But the theory of it vs. the
implementations that I've seen, ugh. And the number of ways you can initialize
variables etc... just makes no sense to me. It's like they just keep adding
new ways to do things, for no other reason than because they can. What's wrong
with only having one way to do simple things like that?

I don't know, maybe it's just me but I strive to keep my code as simple and
concise as I possibly can. I have enough complexity to deal with in solving
problems without having to wrestle with my language on top of it. Just to note,
I'm strictly talking about having to use other people's code merged with my
own. If it was solely me writing, I'd just use C++ for some of the nice things
built in and toss classes and all that into the fire. YMMV.

~~~
jjawssd
We need a book with modern best practices like what happened in the Javascript
world with "JavaScript: The Good Parts" from which we can glean a clean
effective dialect, leading to a new C++ uptake and renaissance similar to what
happened in the Javascript world in 2008

~~~
swift
They're not really equivalent, but Scott Meyers' series of books on C++ are
some of my favorite technical writing ever, and his most recent book
"Effective Modern C++" should be on the shelf of every working C++ programmer.

~~~
72deluxe
I agree with this - I finished reading this just before last year and I cannot
recommend this book enough. It really is excellent (including topics like
reference collapsing, type deduction in different scenarios, idiosyncrasies of
async tasks). A great book and essential.

------
krylon
I wrote pretty much this exact implementation of a dynamic array once. For
several data types.

And I had that same idea, let's use macros to fake generic programming. But
while I admire the trickery some people pull off using the C preprocessor, I
admire them from afar. My coworkers would not have let me get away with that,
anyway.

I am not a C++ programmer, but templates are immensely powerful, and after
learning about them (a little, at least), I found statically typed languages
without some form of type-generic programming to be very bothersome.

Looking at C++ as "C with Templates" instead of "C with Classes" gives a very
different picture (plus, Classes and such are still around in case they are
needed, anyway). Every other year or so, I try to get my C++ up to usable
standards, but I do not need it for work (except for that one time about three
years ago), so I eventually lose interest. Maybe approaching C++ as "C with
Templates" is a more promising route.

~~~
tines
> Maybe approaching C++ as "C with Templates" is a more promising route.

Genius. I'm doing this from now on.

~~~
akkartik
For many years now I have treated C++ as "C with destructors and the STL". Examples:

1\. RAII:
[https://github.com/akkartik/mu/blob/61fb1da0b6/010vm.cc#L484](https://github.com/akkartik/mu/blob/61fb1da0b6/010vm.cc#L484)

2\. STL:
[https://github.com/akkartik/mu/blob/61fb1da0b6/020run.cc#L50](https://github.com/akkartik/mu/blob/61fb1da0b6/020run.cc#L50)

I really don't use anything else.

~~~
krylon
Oh yes, don't forget RAII as I did! (The name is super-awkward, but the
concept is mind-blowingly awesome when it fully sinks in.)

Other languages have begun to pick up on this, think of Python's with
statements and C#'s using (var x = new SomeClass()) blocks, but C++ still
makes it easier to take advantage of this feature.

Unless you play around with setjmp/longjmp. But to do that, you have to be ...
special enough to not care about deterministic invocation of destructors in
the first place. ;-)

------
kzrdude
The dynamic memory allocation of the array's fields itself does not mimic
std::vector, it's an extra indirection that C++ does not pay for. You can make
it a non-opaque struct in C and copy it around.

~~~
tines
std::vector does indeed have at least one pointer member (otherwise you
couldn't have a std::vector with automatic storage duration because the size
would be unknowable) so there is some indirection. Maybe you're thinking of
std::array?

~~~
kr7
kzrdude is referring to the allocation of the Array struct on the heap. It
should be something like this instead:

    
    
        Array array_create(size_t size, size_t sizeof_data) {
            Array result;
            result.size = size;
            result.capacity = size;
            if(size) {
                result.data = malloc(size * sizeof_data);
            } else {
                result.data = NULL;
            }
            return result;
        }

~~~
oconnor663
I think kzrdude is actually referring to the void* pointers that the
individual elements of the array live behind. Each of those requires an
allocation to insert them, and an extra pointer traversal to read them. In C++
they would live side by side instead, as in regular C arrays. In C you'd need
the struct to be redefined for each element type (maybe with another macro) if
you wanted the same efficiency.

~~~
kr7
I'm not seeing that?

In ARRAY_PUSH_BACK, it just inserts the element directly into the buffer,
provided there is enough capacity. There is no separate allocation for an
element.

You don't need to redefine the struct for each element type, since the macro
casts 'data' (type void *) to whatever the array's type is.

~~~
oconnor663
You're totally right, I don't know what I was reading >.<

------
kev009
I would recommend David R. Hanson's "C Interfaces and Implementations:
Techniques for Creating Reusable Software" (Addison-Wesley Professional
Computing Series, 1997, ISBN 0-201-49841-3).

[https://github.com/kev009/cii/blob/master/src/array.c](https://github.com/kev009/cii/blob/master/src/array.c)
\- this leaves resizing to the caller, but that could be retrofitted. Most
important is how the book explains everything.

------
WalterBright
> A possible approach is to use a macro

When macros start being used for metaprogramming in C, it's time to reconsider
using C++.

~~~
wott
When it's time to reconsider using C++, it's time to consider using something
else :-)

~~~
Ace17
Like the D programming language? :-)

~~~
WalterBright
I do see people extending C++ using the preprocessor for more complex
metaprogramming. Those should consider D.

------
faragon
I wrote a similar thing for C99 [1], but "safe" (bounds-checked), having both
stack and heap allocation, and many other array/vector functions [2],
including integer-optimized sort (in-place MSD binary radix sort, which is
available in typical C++ sort implementations but not in C, as the default
qsort() relies on comparison functions). With some benchmarks, too [3]

[1] [https://github.com/faragon/libsrt](https://github.com/faragon/libsrt)

[2]
[https://faragon.github.io/svector.h.html](https://faragon.github.io/svector.h.html)

[3]
[https://github.com/faragon/libsrt/blob/master/doc/benchmarks...](https://github.com/faragon/libsrt/blob/master/doc/benchmarks.md)

------
im3w1l
Probably more efficient to store the data array inline, with a _flexible array
member_. That way creating takes only one malloc, and destruction only one
free.

~~~
krylon
Those are great. I haven't used C a lot in a long time, but I remember back
when I wrote C code for a living that I ran into the exact situation flexible
array members are made for around the time I also learned of them.

"That will solve my problem elegantly", I thought, but unfortunately, the
compiler we used only understood C89, so my hands were tied.

~~~
userbinator
_but unfortunately, the compiler we used only understood C89, so my hands were
tied._

You can do it in C89 too, just allocate sizeof(header) + n * sizeof(element).

~~~
krylon
Yes, but it's not the same. :( { It _is_ fun, though - I once did it to
write/read a data structure with two levels of indirection (i.e. an array of
arrays) - Slurp the whole thing to memory, then adjust the pointers, and
voila. }

In C89 it's trickery and a little bit of black magic (at least a whiff of it),
while in C99, it is an officially supported convention.

On x86, where the C code I wrote ran, it wouldn't make much of a difference,
but allocating header + array manually also means - if the code needs to run
across a variety of CPU architectures - that one needs to look at alignment
issues.

~~~
makapuf
IIRC You don't have to play with pointers if you use the null array trick (put
an array of size zero at the end of your struct and use it anyway)

------
ensiferum
The author could have just used glib.

------
crossroads1112
I wrote something similar in C99 as well for another project. Initially, I
used the same form used in this blog post however it's easy to see that the
ergonomics for accessing the data are pretty terrible.

I eventually moved to a solution where I prepended the capacity and size to
the block returned to the caller and then wrote helper functions that
accessed/modified these values. This way the caller can access values in the
returned array just as they would one returned from malloc.

The code (note, the `vec` type is just a typedef'd `void*`):
[https://github.com/crossroads1112/marcel/blob/master/src/ds/...](https://github.com/crossroads1112/marcel/blob/master/src/ds/vec.c)
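
A rough sketch of the prepended-header idea, not the linked code: the names `VecHeader`, `vec_create`, `vec_reserve_one`, `vec_size` and `vec_free` are assumptions for illustration. The union with `max_align_t` (C11) is one way to keep the element block aligned for any element type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header stored immediately before the elements. */
typedef union {
    struct { size_t size, capacity; } info;
    max_align_t align;   /* C11: pads the header to the strictest alignment */
} VecHeader;

/* Steps back from the element pointer to the header in front of it. */
static VecHeader *vec_header(void *v) { return (VecHeader *)v - 1; }

static size_t vec_size(void *v) { return vec_header(v)->info.size; }

/* Returns a pointer to the elements; the header sits just before it. */
static void *vec_create(size_t capacity, size_t elem_size) {
    VecHeader *h = malloc(sizeof *h + capacity * elem_size);
    assert(h != NULL);
    h->info.size = 0;
    h->info.capacity = capacity;
    return h + 1;
}

/* Makes room for one more element and bumps size; the block may move. */
static void *vec_reserve_one(void *v, size_t elem_size) {
    VecHeader *h = vec_header(v);
    if (h->info.size == h->info.capacity) {
        size_t new_cap = h->info.capacity ? h->info.capacity * 2 : 4;
        h = realloc(h, sizeof *h + new_cap * elem_size);
        assert(h != NULL);
        h->info.capacity = new_cap;
    }
    h->info.size++;
    return h + 1;
}

static void vec_free(void *v) { if (v) free(vec_header(v)); }
```

The caller keeps a plain `double *xs` (or any other element pointer), indexes it as `xs[i]`, and only goes through the helpers to grow, query the size, or free.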

------
krylon
Also, if one is not restricted because of license conditions, the Judy Array
comes to mind:
[https://en.wikipedia.org/wiki/Judy_array](https://en.wikipedia.org/wiki/Judy_array)

The API is very easy, and it's _really_ fast.

------
jagger11
With the use of defer
[http://pastebin.com/EXZuRAdT](http://pastebin.com/EXZuRAdT) you could create
it w/o the need to use array_free.
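
I have not reproduced the linked snippet here. As an assumption about what such a defer could build on, GCC and Clang offer a non-standard `cleanup` variable attribute that calls a function when the variable goes out of scope. The names `free_int_buffer`, `sum_first` and `frees_seen` below are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

static int frees_seen = 0;   /* only here so the effect can be observed */

/* A cleanup handler receives a pointer to the variable leaving scope. */
static void free_int_buffer(int **p) {
    free(*p);
    frees_seen++;
}

/* Sums 0..n-1 via a temporary heap buffer; expects n > 0. */
static int sum_first(int n) {
    /* GCC/Clang extension, not standard C. */
    int *buf __attribute__((cleanup(free_int_buffer))) =
        malloc((size_t)n * sizeof *buf);
    assert(buf != NULL);
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        buf[i] = i;
        sum += buf[i];
    }
    return sum;   /* free_int_buffer(&buf) runs as the scope is left */
}
```

This is not portable to compilers without the extension, which is probably the main argument for keeping an explicit array_free in the API.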

------
antirez
AFAIK when the array is created, _p->size_ should be set to 0, not to the
_size_ argument.

~~~
AlexeyBrin
This mimics C++ vector, e.g.:

    
    
        vector<int> vec(5);
        vec.push_back(11);
        vec.push_back(12);
    

Now you have in _vec_ :

    
    
        0, 0, 0, 0, 0, 11, 12
    

But your suggestion is better from an API point of view.

~~~
antirez
PS please note that there is no memset nor any other means to zero-initialize
the elements. So the code appears to have a bug anyway.

~~~
AlexeyBrin
Correct, this is a bug. _calloc_ will better mimic the way C++ std::vector
works.
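
A sketch of what the calloc variant might look like, using the by-value `Array` layout suggested elsewhere in this thread. The function name `array_create_zeroed` is made up. Note that all-bytes-zero reads back as 0 for integer types; for floating-point and pointer types that holds on common platforms but is not promised by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t size;
    size_t capacity;
    void *data;
} Array;

/* Creates an array of `size` elements whose bytes are all zero. */
static Array array_create_zeroed(size_t size, size_t sizeof_data) {
    Array result;
    result.size = size;
    result.capacity = size;
    /* calloc zero-fills the block, unlike malloc. */
    result.data = size ? calloc(size, sizeof_data) : NULL;
    assert(size == 0 || result.data != NULL);
    return result;
}
```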

------
wollstonecraft
[https://github.com/attractivechaos/klib/blob/master/kvec.h](https://github.com/attractivechaos/klib/blob/master/kvec.h)

------
knorker
Ugh. Multi-line macro.

how about #define PUSH_BACK(a,x,t) push_back(a,&x,sizeof(t))

No multi-evaluation problems or other madness.

Edit: Actually that won't work for expressions. So

#define PUSH_BACK(a,x,t) do { t tmp = (x); push_back(a,&tmp,sizeof(t)); } while(0)

slightly better.
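
For illustration, here is how that one-line macro could sit on top of an ordinary function. The `Array` layout and the body of `push_back` are assumptions (the article's struct may differ), and the temporary is given an unlikely name to reduce clashes with caller variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t size, capacity;
    void *data;
} Array;

/* Copies elem_size bytes from *elem to the end of the buffer, growing if needed. */
static void push_back(Array *a, const void *elem, size_t elem_size) {
    if (a->size == a->capacity) {
        size_t new_cap = a->capacity ? a->capacity * 2 : 4;
        void *p = realloc(a->data, new_cap * elem_size);
        assert(p != NULL);
        a->data = p;
        a->capacity = new_cap;
    }
    memcpy((char *)a->data + a->size * elem_size, elem, elem_size);
    a->size++;
}

/* x is evaluated exactly once; the temporary makes literals addressable. */
#define PUSH_BACK(a, x, t) \
    do { t push_back_tmp_ = (x); push_back((a), &push_back_tmp_, sizeof(t)); } while (0)
```

Because the real work happens in a function, an argument with side effects such as `n++` is only evaluated once, and the compiler is free to decide whether to inline.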

------
to3m
This sort of approach is a pain to use, because you keep having to cast when
you're in the debugger, and there's zero type safety. And I'm afraid I don't
have much positive to say about something like "((Vector2i * )arr3->data)[0].x
= 333".

You can do better than this!

What is an array? It's 3 variables: base, length and capacity. So why not
decide that an array is just that. 3 variables of the right size and type.

    
    
        #define ARRAY(T,S) T *S;size_t S##_length;size_t S##_capacity
    

Then you can make one like this:

    
    
        ARRAY(int,xs);
    

You'll also need to initialise and destroy these array "objects".

    
    
        #define ARRAY_INIT(S)     \
            do {                  \
                S=NULL;           \
                (S##_length)=0;   \
                (S##_capacity)=0; \
            } while(0)
    
        #define ARRAY_DESTROY(S) \
            do {                 \
                Array_Free(S);   \
                ARRAY_INIT(S);   \
            } while(0)
    
        

And you'll probably want to add an item to an array too.

    
    
        #define ARRAY_ADD(S,X)                     \
            do {                                   \
                if((S##_length)>=(S##_capacity)) { \
                    S=Array_Grow(S,                \
                                 sizeof *S,        \
                                 &(S##_length),    \
                                 &(S##_capacity)); \
                }                                  \
                S[S##_length++]=(X);               \
            } while(0)
    

So you might use them like this:

    
    
        ARRAY(int,xs);
        ARRAY_INIT(xs);
        for(int i=0;i<100;++i)
            ARRAY_ADD(xs,i);
        ARRAY_DESTROY(xs);
    

Array_Free is very simple, and Array_Grow is barely more complicated (however
I wrote it off the cuff, so of course it could still be wrong). Both of these
mainly exist just to keep stdlib.h out of the header.

    
    
        void Array_Free(void *p) {
            free(p);
        }
    
        void *Array_Grow(void *base,size_t stride,size_t *length,size_t *capacity) {
            *capacity+=*capacity/2;
            *capacity=MAX(*capacity,MAX(MIN_CAPACITY,*length));
            return realloc(base,*capacity*stride);
        }
    

Array accesses and iteration and the like are just done in the traditional
way:

    
    
        for(size_t i=0;i<xs_length;++i) {
            printf("%d\n",xs[i]);
        }
    

Even performs nicely with -O0.
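
To try this out end to end, here is a self-contained sketch of the pieces above, with the base variable declared as a pointer. Some details are my assumptions rather than the original code: `MIN_CAPACITY` is given the value 8, `Array_Grow` takes the length by value and always leaves room for at least one more element, allocation failure is handled with assert, and `sum_through_array` is a made-up test driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MIN_CAPACITY 8   /* assumed value; not defined in the comment above */

#define ARRAY(T,S) T *S; size_t S##_length; size_t S##_capacity
#define ARRAY_INIT(S) \
    do { S = NULL; (S##_length) = 0; (S##_capacity) = 0; } while (0)
#define ARRAY_ADD(S,X)                                                  \
    do {                                                                \
        if ((S##_length) >= (S##_capacity))                             \
            S = Array_Grow(S, sizeof *S, S##_length, &(S##_capacity));  \
        S[S##_length++] = (X);                                          \
    } while (0)
#define ARRAY_DESTROY(S) do { free(S); ARRAY_INIT(S); } while (0)

/* Grows capacity by about 1.5x, but always to at least length + 1 elements. */
static void *Array_Grow(void *base, size_t stride, size_t length, size_t *capacity) {
    size_t wanted = *capacity + *capacity / 2;
    if (wanted < MIN_CAPACITY) wanted = MIN_CAPACITY;
    if (wanted < length + 1) wanted = length + 1;
    void *p = realloc(base, wanted * stride);
    assert(p != NULL);
    *capacity = wanted;
    return p;
}

/* Sums 0..n-1 through a growable array, to exercise the macros. */
static int sum_through_array(int n) {
    ARRAY(int, xs);
    ARRAY_INIT(xs);
    for (int i = 0; i < n; ++i)
        ARRAY_ADD(xs, i);
    int sum = 0;
    for (size_t i = 0; i < xs_length; ++i)
        sum += xs[i];
    ARRAY_DESTROY(xs);
    return sum;
}
```

Element access stays a plain `xs[i]` with the real element type, which is what keeps the debugger and the type checker useful.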

For a full implementation you'll probably also need a way of generating a
static array. (I mainly found myself needing this for test code, which uses
globals for convenience; most arrays I create normally are locals, or parts of
structs.)

You'll also need a parameters list for use in a function declaration or
definition, and a macro that expands to all 3 variables.

    
    
        #define ARRAY_PARAMS(T,S) T *S,size_t S##_length,size_t S##_capacity
        #define ARRAY_ARG(S) S,S##_length,S##_capacity
    

Then you might have a function that takes a pointer to an "array":

    
    
        void FunctionThatTakesAnArray(ARRAY_PARAMS(T,*p));
    

And you call it like this:

    
    
        ARRAY(T,myarray);
        FunctionThatTakesAnArray(ARRAY_ARG(&myarray));
    

(I found this cropped up often enough that I needed the macro, but it was less
common than I thought.)

There's more you can do, but the above is the long and the short of it.

This might all look terrible - or perhaps it sort of looks OK, but you're just
not sure that it would actually work - but I've used this in a prototype
project and thought it worked out well. (I've been using C for 20+ years, so
hopefully even if I've got no taste, I've at least got a rough feel for what
works out OK and what's going to end up a disaster.)

~~~
tines
This gets messy in a couple of ways; for example, what if you want to pass two
of them to a function? Then the names of the parameters generated by the
ARRAY_ARG macro will clash and you'll have to add a counter to it, etc. (Also
I'm not sure that you can concatenate `* p` with `_length` in `S##_length`
where `S` is `* p`, and the same thing for the other, but I understand what
you meant.) You'll also have potentially very confusing errors for the users
of your library when they happen to create a variable whose name collides with
one that the macro generates. And those are just the cursory observations.

~~~
to3m
ARRAY_PARAMS (as I assume you mean?) has no problem with two arrays. You just
give them different names. Suppose you do this:

    
    
        void CopyIntArray(ARRAY_PARAMS(int,*dest),ARRAY_PARAMS(int,src));
    

Now you end up with this:

    
    
        void CopyIntArray(int **dest,size_t *dest_length,size_t *dest_capacity,
                          int *src,size_t src_length,size_t src_capacity);
    

And you can call it like this:

    
    
        ARRAY(int,xs);
        ARRAY(int,ys);
        CopyIntArray(ARRAY_ARG(&xs),ARRAY_ARG(ys));
    

Your point about token pasting with * p is a very good one, and I don't think
that had occurred to me... but none of clang, gcc or VC++ seems to mind (and I
used a number of different versions of each). I need to go and look up what
the C standard has to say about this now.

I do note that I didn't use ARRAY_PARAMS or ARRAY_ARG all that much in my code,
though - but I don't remember whether this is because I found some problem
with them in practice, or whether it just ended up that way.

(I'm on OS X right now and I just tried my code with clang. Probably-relevant
compile flags were "-std=c1x -Wall -Wuninitialized -Winit-self -pedantic
-Werror=implicit-function-declaration -Wsign-conversion -Wunused-result
-Werror=incompatible-pointer-types -Werror=int-conversion -Werror=return-type
-Wno-overlength-strings -Wunused-parameter".)

~~~
tines
Ah good catch, I misread it.

------
inlined
Why does array_push_back need three parameters? Couldn't the macro just use
sizeof on the second param?

~~~
AlexeyBrin
Not without some serious modifications, see line 6 of the macro:

    
    
        data_type *pp = arr->data;\
    

this is expanded to something like (when data_type is double):

    
    
        double *pp = arr->data;
    

You can get rid of the third parameter. Just store the size of the data type
in the container struct and use _memcpy_. Something like this (probably slower
than the original):

    
    
        char *pp = arr->data;\
        memcpy(pp + (size - 1) * arr->size_of_data, &(x), arr->size_of_data);\

------
samfisher83
You could add some function pointers to the struct, initialize them, and then
you could do:

Array a; a.add(&a, item).
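
A small sketch of that idea for an int-only array. The names `array_add_impl` and `array_make` are hypothetical, and allocation failure is handled with assert for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Array Array;
struct Array {
    size_t size, capacity;
    int *data;
    void (*add)(Array *self, int item);   /* hypothetical "method" slot */
};

/* Appends item, doubling the buffer when it is full. */
static void array_add_impl(Array *self, int item) {
    if (self->size == self->capacity) {
        size_t new_cap = self->capacity ? self->capacity * 2 : 4;
        int *p = realloc(self->data, new_cap * sizeof *p);
        assert(p != NULL);
        self->data = p;
        self->capacity = new_cap;
    }
    self->data[self->size++] = item;
}

/* Returns an empty array with its function pointer already wired up. */
static Array array_make(void) {
    Array a = {0, 0, NULL, array_add_impl};
    return a;
}
```

This costs one pointer per array instance and the object still has to be passed explicitly as `&a`, so it mostly buys call-site syntax rather than genericity.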

------
piker
Is there a simple way to write the ARRAY_PUSH_BACK without a macro?

~~~
AlexeyBrin
Probably, if you use _memcpy_ and store the size of the data type as an extra
parameter in the containing struct.
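
Roughly along those lines, a macro-free version could look like the sketch below. The field name `size_of_data` follows the earlier snippet in this thread; the rest of the struct and the growth policy are assumptions. The caller passes the address of the element, so literals need a temporary variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t size, capacity;
    size_t size_of_data;   /* element size, fixed when the array is set up */
    void *data;
} Array;

/* Appends a copy of the element that x points to. */
static void array_push_back(Array *arr, const void *x) {
    if (arr->size == arr->capacity) {
        size_t new_cap = arr->capacity ? arr->capacity * 2 : 4;
        void *p = realloc(arr->data, new_cap * arr->size_of_data);
        assert(p != NULL);
        arr->data = p;
        arr->capacity = new_cap;
    }
    /* Byte-wise copy, since the element type is unknown at this point. */
    memcpy((char *)arr->data + arr->size * arr->size_of_data, x, arr->size_of_data);
    arr->size++;
}
```

Nothing here checks that `x` really points to an object of the stored type, so the type safety question raised elsewhere in the thread still applies.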

~~~
inlined
I was thinking about that but I don't know what to do with the type of the
param you're accepting. Could you do a const void* ref? Might just be easier
to make it variadic.

~~~
AlexeyBrin
See this comment
[https://news.ycombinator.com/item?id=13346432](https://news.ycombinator.com/item?id=13346432)
. At this point creating a function like:

    
    
        array_push_back(int nr_arguments, ...);
    

that will replace the original macro ~~should be possible~~. Actually it won't
work without using non-standard compiler extensions like _typeof_.

------
halayli
For cases where malloc checks complicate the example, it doesn't hurt to use
assert instead.

