DYN_ARR_RESET should probably be called DYN_ARR_INIT instead, as calling it more than once will leak memory.
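To illustrate, here's my guess at the shape of the macro (not the author's exact code; the field names are assumptions):

    #include <stdlib.h>

    /* Hypothetical RESET-style macro. */
    #define DYN_ARR_RESET(a) do {                                   \
            (a).buffer = malloc((a).capacity * sizeof *(a).buffer); \
            (a).endptr = (a).buffer;                                \
        } while (0)

    /* A second call overwrites (a).buffer without freeing the previous
     * allocation -- hence "INIT" would be the honest name. */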
The handling of endptr in DYN_ARR_RESIZE seems to be incorrect. If I have an array with 2 elements and capacity of 3 and I DYN_ARR_RESIZE it to 5, I now have an array with 5 elements, 3 of which are garbage values.
> The handling of endptr in DYN_ARR_RESIZE seems to be incorrect. If I have an array with 2 elements and capacity of 3 and I DYN_ARR_RESIZE it to 5, I now have an array with 5 elements, 3 of which are garbage values.
Why do you say this is wrong? That's exactly what I would expect.
this somewhat mirrors what happens with a std::vector when you call resize() except of course c doesn't have default ctors. the point of resize is to make exactly "size" elements of the dynamic array addressable.
As fpoling points out, capacity is a 32-bit unsigned. That can overflow. There is no safety in append: if capacity is zero no new space will be made, if capacity overflows the realloc will not have enough space -- in either case, you end up writing past the end.
Allocating a [capacity] zero array and appending to it is extremely common. That you'll write past the end in that case shows that the author has barely even used this.
The code is unacceptably bad and unsafe. I don't usually do this, but I'm going to flag this post. I encourage everybody to do the same.
Apologies to the author. Golf is fun. This is uncool.
Is this overflow really a realistic scenario? I imagine most machines won't even permit you to allocate the sorts of sizes that would cause this bug.
In that scenario (where you're allocating a multi-gigabytes array), you typically know the size ahead of time, instead of growing it dynamically, as the latter doesn't really perform too well.
2Gb (1<<31 + 1 = boom) just isn't that much memory these days. If you're parsing large text files, for example, it's really easy to blow that limit. And what's the point of this dynamically-growing array if not to forget about managing capacity?
Where do you check that capacity is nonzero? When capacity is zero, what happens on this line?
a.capacity <<= 1u;
"all c code is unsafe" is not an excuse to permit bloody obvious, undocumented memory overruns.
I write a lot of c. Avoiding the unsafe bits, avoiding UB, is the skill required to write good c. "C code is unsafe" is a Rustacean marketing slogan. Don't believe it, but definitely don't practice it.
Why not just fix it? It's a trivial fix and has the added benefit that a zero-initialized DYN_ARR_OF(x) struct would be in a valid state, which is always nice. A struct with several dynamic arrays is then much simpler to initialize, for example.
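Something along these lines would do it (a sketch; the starting size and the way the error is reported are my choices, not the author's):

    #include <stdint.h>

    /* Grow step that tolerates capacity == 0 and refuses to let the
     * 32-bit doubling silently wrap around to zero. */
    static int grow_capacity(uint32_t *capacity) {
        uint32_t new_cap = *capacity ? *capacity << 1u : 8u;
        if (new_cap <= *capacity)   /* doubling overflowed uint32_t */
            return -1;              /* let the caller report it     */
        *capacity = new_cap;
        return 0;
    }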
For a library, if the user input is wrong, throw an error or an assertion failure. It shouldn't be causing an uninformative segmentation fault, which your library does. The fix is trivial. Why so defensive?
I don't like the use of macros for things like this. Macros in general should be used rarely and sparingly.
I'm also not certain one should use "end pointers." Conventionally, it seems more advisable to use `size_t capacity`, `size_t length`, and `void *data`.
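i.e. something like:

    #include <stddef.h>

    typedef struct {
        void  *data;
        size_t length;    /* elements in use    */
        size_t capacity;  /* elements allocated */
    } dyn_arr;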
Great use of Cunningham's Law, though! I appreciate C posts on Hacker News.
The lexer makes progress by consuming bytes from the buffer, incrementing the read position. The buffer is reallocated whenever such a read would overrun the write position: I/O functions are called to fill up the buffer and push the write position further ahead.
Had they been pointers within the [buffer, buffer + capacity[ range, they would have become invalid whenever the buffer is reallocated for expansion, since its location in memory would change. So I chose to implement the read and write heads as offsets from the buffer's base address, and to calculate those pointers only when they're needed.
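Roughly like this (the names are mine, not from the actual lexer):

    #include <stddef.h>

    typedef struct {
        char  *buffer;
        size_t capacity;
        size_t read_pos;   /* read head, stored as an offset  */
        size_t write_pos;  /* write head, stored as an offset */
    } lex_buf;

    /* Pointers are derived only at the point of use, so a realloc of
     * buffer never leaves a stale head behind. */
    static char *read_head(lex_buf *b)  { return b->buffer + b->read_pos;  }
    static char *write_head(lex_buf *b) { return b->buffer + b->write_pos; }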
Yes, but GLib arrays are also lossy. They require you to externally manage capacity or to know with perfect foresight exactly what your capacity will be for the lifetime of the memory allocation.
I don't think those are impossible scenarios, but the cost of one additional pointer in terms of size leaves you with a lot more functionality. Saving one pointer in size and not having capacity makes GLib arrays nearly useless, which I find confusing.
You could simply pass a pointer and a size around instead. Which is what most people actually do when they don't need resizable data layouts.
If you're working with C, I think you have to just accept that void pointers happen. Working around losing compile-time type data requires you to create runtime structures, which I don't find acceptable.
What does it actually gain by using macros instead of proper functions? The only generic macro that can't be written as a function is the one that uses `type`, but saving the type size in the struct is enough for this kind of code.
The latter makes the macro usable without requiring a semicolon, which some people don't like; the former will cause a syntax error if the semicolon is missing, so they feel that it regulates syntax a bit better.
Works if you want to e.g. skip braces in if...else statements. Some would say always putting braces in is good style, but I like skipping them myself so I won't defend it :)
OK, but I guess what's really being relied on there is that it's syntactically still a single statement when you follow it with ';', whereas {...}; is two statements (which is what breaks the if-else).
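Concretely, the classic demonstration:

    #define SWAP_BRACES(a, b)  { int t = (a); (a) = (b); (b) = t; }
    #define SWAP_DOWHILE(a, b) do { int t = (a); (a) = (b); (b) = t; } while (0)

    void demo(int x, int y) {
        if (x > y)
            SWAP_DOWHILE(x, y);   /* the trailing ';' completes a single
                                     do-while statement, so the else below
                                     still attaches to the if */
        else
            x = y;

        /* With SWAP_BRACES the same code expands to "{ ... };" -- a block
         * plus an empty statement. The empty statement ends the if, the
         * else has nothing to attach to, and compilation fails:
         *
         *     if (x > y)
         *         SWAP_BRACES(x, y);
         *     else
         *         x = y;
         */
    }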
I just discovered that gcc also supports non-standard "statement expressions" of the form ({...}) which would serve the same purpose at the cost of portability.
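e.g.:

    /* GCC/Clang extension: the parenthesised block evaluates to its last
     * expression, so the macro is usable wherever an expression can go. */
    #define MAX_SE(a, b) ({                \
            __typeof__(a) _a = (a);        \
            __typeof__(b) _b = (b);        \
            _a > _b ? _a : _b;             \
        })

    int use_it(int x, int y) { return MAX_SE(x, y) + 1; }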
This seems pretty sensical to me except that it achieves its small size by sacrificing error checking and handling, even in the (aiui) non-exceptional case of realloc() returning NULL. This is of course classic C program behaviour so perhaps that's fine ;-)
On one hand that's obviously true, on the other hand it's just 60 lines of code, which is easier to check and audit than (for instance) a typical std::vector implementation - I don't know what the equivalent is in D, please forgive my ignorance :)
A typical std::vector implementation supports a lot more things, so it's not an apples-to-apples comparison. It's like saying that a horse-drawn carriage is a lot easier to repair than a motor vehicle. It's true, but doesn't really say much.
I don't have this luxury if I want to learn about the ffmpeg libraries; it's all written in C, almost all media and hardware accelerator libraries are written in C, and if you go to another language they just bind to C. It's so frustrating, but I've no choice but to stick with C.
Yes, but if you're writing code that you want to be bindable to other languages, you essentially have to follow the C ABI, in which case, writing in C is a natural choice (yes, you can expose a C ABI from other languages, but that's usually not natural and you sort of have to understand C anyway to do that).
Unless your API surface constitutes the majority of your code, writing in C still doesn't make sense, because you're paying the tax every time you have to manually juggle strings and whatnot. C++ is a more sensible choice for this - you can still write C-compatible headers (and it's trivial to verify in builds - just include it in a .c file and see if that compiles!) while using STL etc in the implementation.
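The header ends up looking like plain C either way (names here are invented for illustration):

    /* widget.h -- C-callable surface; the implementation behind it can be
     * C++ and use the STL freely. */
    #ifdef __cplusplus
    extern "C" {
    #endif

    typedef struct widget widget;            /* opaque to C callers */

    widget *widget_create(const char *name);
    void    widget_destroy(widget *w);

    #ifdef __cplusplus
    }
    #endif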
Mixing high level C++ stdlib classes with low level C APIs is often not exactly trivial; for instance, most C++ containers expect that their items are RAII compatible, and the resulting memory corruption errors will be harder to debug than plain old C code because the shit hits the fan deep inside some completely unreadable C++ stdlib implementation code.
Usually increased build system complexity, outdated manually maintained bindings, etc etc... - in some languages (like Zig) this is really trivial, in other languages it can be much harder.
Also a far worse language though. Macros have their place, but trying to do anything complex with them just turns into a total nightmare. They're hard to reason about, walk through, or modify.
If heavy macro usage is found, it's definitely time to reconsider the approach.
It really depends on how macros are used. If you're just using them to implement "high level language features", it's not a problem; sure, you might have trouble figuring out what STAILQ_INSERT_TAIL does internally, but you're going to have just as much trouble figuring out what the Lisp or Perl or Python "add this item to the end of that list" operations do internally.
Macros can be a nightmare, but when they're used properly they're not.
Yeah that's a great opinion and all but generally C is used because it HAS to be used these days. No one is going to use this code for building a web application or will be tossing it into legacy code.
Probably many have written a vector header like this. It baffles me why this one reaches the front page of HN given that the implementation is not that great. A few problems (some have been mentioned by others as well).
1. not using "do {} while (0)". This may lead to compile errors.
2. using uint32_t for capacity. On 64-bit machine, this doesn't save memory.
3. DYN_ARR_RESIZE() may have quadratic time complexity if someone calls DYN_ARR_RESIZE(a, 10); DYN_ARR_RESIZE(a, 11); DYN_ARR_RESIZE(a, 12); etc. in a loop. kv_resize() in kvec.h wouldn't have this problem (see the sketch below).
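For reference, here's roughly what a kv_resize()-style reserve looks like (field names are my own guess at the struct, and realloc failure handling is omitted, as in the rest of the gist):

    #include <stdlib.h>

    /* Round the requested size up to the next power of two so that a
     * loop of size+1 resizes stays amortized O(1) instead of O(n^2). */
    #define DYN_ARR_RESERVE(a, n) do {                                  \
            if ((size_t)(n) > (a).capacity) {                           \
                size_t cap_ = (a).capacity ? (a).capacity : 8;          \
                while (cap_ < (size_t)(n)) cap_ <<= 1;                  \
                (a).data = realloc((a).data, cap_ * sizeof *(a).data);  \
                (a).capacity = cap_;                                    \
            }                                                           \
        } while (0)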
It’s better that the code isn’t perfect. Reading the code review comments in this thread has been really valuable for me personally. Like, I’m not sure I would’ve learned about the do-while trick had so many people not pointed it out.
Anyway, I think this HN posting has this "crazy" flavor of using macros to simulate generics, and that's the specific kind of implementation that I meant, which klib also does.
Still waiting for a revolutionary OS written in $virtuous_lang to save us from the scourge of unsafe Linux. They've had since Ada-83 to get something off the ground. Guess they aren't so productive either.
Counterpoint to people suggesting size_t for capacity: we recently changed our in-house std::vector replacement from { T* first, last, end; } to { T* first; u32 size, capacity; }.
Not only did this shave 8 bytes off of each instance but in many cases it saved more because they became 16 bytes which reduced alignment padding in other objects that use SIMD types that require 16-byte alignment.
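For the curious, the two layouts side by side (typical 64-bit target):

    #include <stdint.h>

    struct vec_3ptr { int *first, *last, *end; };             /* 24 bytes */
    struct vec_u32  { int *first; uint32_t size, capacity; }; /* 16 bytes */
    /* 16 bytes also means no padding when the vector sits next to a
     * 16-byte-aligned SIMD member in a larger struct. */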
In game development, you'll want to do things like this often, but general purpose programming shouldn't adopt this practice. It's literally what the type is for.
yes, i know growth by a factor of 2 has issues under certain usage patterns. those are less likely to be problematic when there is more stuff going on than just growing the buffer - you can use the "hole" for other allocations.
Hard to tease out what you're saying here, but maybe a Hashed Array Tree (HAT) is what you're referring to? I have a slightly extended version here [0][1]
The biggest problem with this is that you cannot assign, pass, or return an "instance" of the array struct itself. That's because it's declared as an unnamed struct and in C two unnamed structs are not type compatible even if they have identical fields. C23 did improve struct compatibility [1] but unfortunately not for unnamed structs.
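The underlying rule, independent of the gist's macro:

    /* Two unnamed struct types are distinct even with identical members: */
    struct { int *data; unsigned capacity; } a;
    struct { int *data; unsigned capacity; } b;
    /* a = b;   -- error: incompatible types */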
I dislike this - having the “array” be a struct containing the pointer and size fields makes it easy to “copy” the array such that you get dangling pointers. Similarly there’s no way to track lifetime or ownership of the array.
There are long term ABI benefits to these data structures just being opaque pointers outside of the implementing libraries
C++ is annoying because of name mangling and things like static initialization calling constructors.
The result is that you can easily link C code to almost any language, including C++, with almost any linker. But for C++, you usually have to use the linker that comes with the C++ compiler that compiled your library. You can write code in C++ that is as compatible as C, but you have to go out of your way to achieve that, extern "C" is only the beginning.
As a result, when the overhead of using C instead of a more complete language is not too great, I prefer to write my libraries in ANSI-C, for maximum compatibility.
The post demonstrated one thing: don’t share your code.
Not to say I think this particular implementation is my favorite, but why anybody would expect a free online code snippet to be bulletproof is beyond me; it's a damn gist, a demo.
Because the API needs to work for random item types. C doesn't have templates (and for stuff like this C11's _Generic isn't helpful because that just maps specialized function names to generic function names, but doesn't help with the generic implementation code).
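e.g. _Generic can pick a pre-written, type-specific function, but someone still has to write each of those functions:

    void push_int(void *arr, int v);
    void push_double(void *arr, double v);

    #define push(arr, v) _Generic((v),   \
            int:    push_int,            \
            double: push_double)(arr, (v))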
This isn't suitable for arrays that large regardless of the resizing multiplier.
realloc() generally allocates a new block of memory of the new size, copies the entire content over, and then frees the old block. Only sometimes you get to be lucky and have enough spare room after the existing data in the virtual address map to not need that copy.
By the point this copying becomes relevant, a continuous dynamic array like this becomes fundamentally the wrong data structure.
Three pointers (or two and a size, or one and two sizes) is pretty standard for dynamic arrays. It lets you allocate more than one element a time when repeatedly appending one element.
No checking malloc or realloc though. The anonymous struct is dubious too, though maybe being unable to pass these things to functions is a feature.
functions can work with just pointer / size / stride. think of it as stl iterators, there's no real point to be passing the container itself around.
however, you can still do that, if you really want.
just `typedef DYN_ARR_OF(widget) widget_array;` and now you have a name-able type, and can even have dynamic-arrays-of-dynamic-arrays (`DYN_ARR_OF(widget_array)`).