> When an application frees an object it always has the original, requested allocation size on hand. After all, it’s a necessary condition to use the object correctly.
This is quite untrue in real-world C codebases unless you have a lot of discipline. I know from reading a previous article (https://nullprogram.com/blog/2023/10/08/) by this author that they hate null-terminated strings. But in your typical C code, you often allocate a bunch of memory to be used as a null-terminated string, return it or pass it to another function transferring ownership, and expect a different function to free() it. The length of the string is not necessarily the allocation size! The code could have allocated some predetermined max size for the buffer, build up the string gradually, and then add a null terminator. Thus, the function doing the free() can't pass the right size.
if i have a function foo which calls functions bar and baz to build up a dynamically allocated string (using a custom allocator), then calls function quux to transfer ownership of the string, foo needs to store the information about how big the allocation currently is, and how to allocate more of it, in a place where functions bar and baz can access it. if it doesn't do that, bar or baz can easily overflow the buffer because they don't know how big it is. this is a particular case of 'having the requested allocation size on hand is a necessary condition to use the object correctly'
but if foo is passing this data structure to bar and baz, it can pass it to quux too, so the problem as you described it just doesn't exist
(unless your code is only working by the merest of chances and will have some cves later this year the first time some competent programmers look at it)
'bar or baz can easily overflow the buffer because they don't know how big it is' is a particular case of 'having the requested allocation size on hand is a necessary condition to use the object correctly'
You are describing an ideal scenario where code is written by experts and those experts did not take any shortcuts. Real world C code is unfortunately anything but that. I haven't had the luxury of only working on expertly-written C code which frankly also makes me hate C code.
What if the function bar only appends a number to a string? Then the maximum length it needs could become embedded knowledge by the foo function ("10 characters is enough to store this uint32_t so no need to pass anything related to the allocation" thought the programmer who hated tedium).
What if bar and baz are not allowed to grow the allocation because foo has already enough buffer to store the maximum possible size (say a path on Windows, 260 bytes)?
What if the function quux is in a different third-party library whose API cannot be modified to take the allocation size?
well, i do agree that there's a lot of code out there that only works by the merest of chances and will have several cves opened the first time a competent programmer looks at it; the kinds of implicit thoughts and prayers you're describing (sweet jesus, pvg, that isn't a uint32_t you're formatting) are precisely what i was talking about
but i'm not talking about code where the authors didn't take any shortcuts, just boringly competent c code with the regular quantity of bugs in it. i'm sorry you've been subjected to such torments as you report. but the author of this article isn't talking about how to write such astoundingly shitty code; he's talking about how to write good code
if the function quux is in a different third-party library, its api probably cannot be modified to call the custom allocator either (my initial comment draft suggested that you might be talking about such a case, but then i realized that didn't make any sense and so i deleted it)
There are many reasons it might be inconvenient for foo to pass that data to quux. Perhaps it’s actually quux that called foo rather than the other way around (in other words, foo is passing ownership through its return value rather than through a function argument), in which case C makes it easier to return one value than two. Or perhaps there’s just a lot of existing logic that passes owned strings around without an accompanying size parameter, which is perfectly safe to do if you’re not planning to ever mutate them.
Of course you could easily design your codebase to not have that issue, if you’re writing it from scratch. But that’s not the question.
i agree that parts of the code that are not changing the length of the string can use it safely without knowing the buffer size
it's been easy to return a struct by value in c for over 30 years, though in very early versions of c, structs in rvalue context implicitly decayed to pointers, like arrays. it's been efficient for about 20 years
so it's not easier to return one value than two if you've already defined the struct type, which is presumably the case here
there's people who avoid it because they're imitating c code from the 01980s (or, charitably, performance-sensitive c code from the 01990s), or because they're using some horrible abomination like common lisp's cffi, but that's really quite marginal
and even if you didn't originally design your codebase that way, you can certainly fix the design. it'll probably even make it easier to understand and maintain, because you're always passing around a stralloc or whatever, instead of sometimes passing around a stralloc and other times passing around a raw buffer pointer
Your personal coding preferences and discipline will switch the jobs together with you. The new hires on the project you left behind are going to have their own perspective (likely very wrong from your POV) on how things should be done.
If it's an open source project, idiosyncratic code will only increase friction for contribution for little practical gain.
broadly i agree except for your last point. but i am unclear on how this relates to the thread, other than explaining why many projects fail to even approach the quality demonstrated in wellons's blog
Indeed, and more — there are code bases (ours, FRRouting) that do the opposite: expect the allocator to track sizes and invoke malloc_usable_size() to do our own (per-type-of-allocation grouped) accounting (to subtract on realloc/free; for adding it's not needed of course since you have the size already.)
Unless all objects allocated by the library have one of the fundamental alignment requirements, I'd add a fourth item to the checklist:
4. All functions take the alignment requirement as a parameter. The alignment requirement must be a power of 2. The reallocation and deallocation functions must be passed the same alignment requirement value as was passed to the allocation function when the block was first allocated.
A common motivation for rolling out your custom allocator is dealing with extended alignment requirements. This should be reflected in the allocator's API.
Alternatively, "the allocator should have an equivalent of posix_memalign()".
Also the default alignment should be well documented, since 99% of calls won't care about a specific alignment, and then you start hunting why your math routines are slow when the CPU is doing unaligned SIMD loads…
C11 has aligned_alloc, which is more standard than posix_memalign, but there is no standard "aligned_realloc".
There is no default alignment if all the functions of the allocator take it as a parameter, but it is reasonable for an allocator implementation to be optimized for a particular alignment requirement such as _Alignof(max_align_t).
> C11 has aligned_alloc, which is more standard than posix_memalign,
I don't have data on this but I feel that posix_memalign might be more widely supported than aligned_alloc. Then again C11 is 12 years old now so hopefully this changed.
> There is no default alignment if all the functions of the allocator take it as a parameter,
I would assume you can pass 0 as the requested alignment, which makes the allocator use some default. If the allocator always requires an alignment value, it becomes a serious pain to go through all allocations and figure out a reasonable alignment value. Especially since in most cases you really don't care.
> I don't have data on this but I feel that posix_memalign might be more widely supported than aligned_alloc.
Possibly. IIRC Microsoft's CRT does not support aligned_alloc() because their implementation of free() cannot deal with extended alignment requirements, so they have separate functions for aligned (de)allocation. To be fair, they probably do not support posix_memalign() either.
> it becomes a serious pain to go through all allocations and figure out a reasonable alignment value
Sure, you can pass _Alignof(max_align_t) in almost all cases, or have generic (= not reimplemented by each allocator) convenience wrapper functions that do that for you. If you want to pass the least restrictive alignment in all cases, you can enlist the help of a macro, but this is a bit more awkward. For instance:
max_align_t is only the max for scalar types, typically 8 or 16 bytes. But often math-intensive code wants 16 or 32 byte alignment for float arrays, to enable SIMD operations.
> max_align_t is only the max for scalar types, typically 8 or 16 bytes. But often math-intensive code wants 16 or 32 byte alignment for float arrays, to enable SIMD operations.
Indeed, but _Alignof(max_align_t) is a good compromise for a default alignment requirement in an allocator interface, which is what eqvinox calls for. Standard C does exactly that: malloc(), realloc(), and calloc() promise that they satisfy _Alignof(max_align_t), but nothing stricter. To reach for extended alignment requirements, your only standard tool is aligned_alloc().
> The context parameter already addresses that need.
How so? The context parameter identifies the allocator instance. What if the library needs to allocate objects with different alignment requirements with the same allocator instance?
Alignment requirement is just like allocation size (*). It varies from call to call, and generic allocators need to deal with heterogeneous alignments and sizes.
Not really I think, the context pointer (I would call it "user data" though since that's more common in the C world) is normally passed around as an opaque pointer which is used by the allocator functions to find their own state, but not for passing additional parameters to the allocator.
The fact that zig's standard library design (and general code guidelines) makes you always carry allocator with you, makes it very easy to manage memory overall. Most of time I set up arena allocator and then I don't even have to care about clean up unless it's something holding a OS resource.
The article presents essentially the same view, for a language which doesn't have that convention, and doesn't have slices (pointers that also contain size information). Zig allocators check all three requirements illustrated here. As do most modern allocator interfaces.
Carrying the allocator everywhere does introduce a new class of bugs — if you call into a library at a later point in time to interact with something that was previously allocated, but you pass a different allocator this time… how exactly does that work?
(As opposed to: set up the allocator once, in an initialization call.)
There are also often Unmanaged variants where you pass allocator to every method. In these scenarios you have to make sure you use the same allocator though. Unmanaged can be useful for short lived allocations or type that combines many standard containers, saving memory.
I wonder how many security-critical memory vulnerabilities would've been avoided over the decades if glibc had been designed with this API instead of the one that became the de-facto standard.
Or introduced. I would think that trusting the (possibly compromised) caller to pass the correct size of a to be freed or resized object opens up new possibilities for security vulnerabilities.
Note that libc-style `free` often has to do work to walk through all possible arenas/buckets/whatever to see which one contains the given pointer. If given a size (even just a hint) it can do a much shorter walk without reducing security.
"Even just a hint" doesn't deserve to be squirreled away in a mere paranthetical. It's what distinguishes some bullshit (the article's suggestion, whereby the allocator carefully checks that you supplied the right value that it already knows and barfs if not) from the actually potentially useful improvement to the API that you suggest.
The code that frees very often does not know how much was allocated.
An allocator that fully trusts the size provided to `free` is hugely unsafe when the size turns out to be incorrect. As an extreme example, the allocator may know that every 4-byte allocation is put into the arena that is aligned to 0x1000-byte block, so it may jump straight into the arena header which won't be there if the size was incorrect. So a real-life allocator is likely to do some checking even when the size is provided, but that checking can be much more simplified in that case.
More accurately speaking though, this information is not directly available from the pointer (unless we have a universal pointer tagging support today...) and should be mapped via the allocator internal data structure. This is what o11c referred by "arenas/buckets/whatever", and having a size information does simplify this process.
The metadata around the allocation is comparably expensive especially for smaller allocations anyway, so modern allocators would have at least two classes of memory depending on the size. So even when the allocator doesn't make use of any other size information, it is possible that the size information can still be helpful.
Having the size is definitely useful, because you can skip the metadata. But since needing this information in some form or the other is very important it is generally easy to derive-no walks or anything, because that would be slow.
Please note that "glibc" is the GNU C library, which is highly distinct from "glib" which is the general-purpose utility library underlying GTK and many other GNOME-adjacent technologies.
> Note that the context pointer came after the “standard” arguments. All things being equal, “extra” arguments should go after standard ones. But don’t sweat it! In the most common calling conventions this allows stub implementations to be merely an unconditional jump.
I didn't know this. Does this optimization have a name? Where can I read more about it?
void free_with_extra_args(int *a, int *b) {
free(a);
}
then *a is in already in the correct slot (the RDI register) for the first argument when free_with_extra_args is being called. Whatever is put into *b is never touched. If you compile this with gcc -O2 you get
free_with_extra_args:
jmp free
If you make the function call free(b) instead, you'll have to move b into the right place before calling free:
Wikipedia also has a nice summary of calling conventions on other platforms like ARM. All modern calling conventions are similar: pass the first args in registers and then use the stack as needed https://en.wikipedia.org/wiki/Calling_convention
That's only one half of it though, the other interesting part is the jmp (instead of call) to hand over control to a subroutine without pushing a new return address to the stack (since there's no code after the function call, and the calling function doesn't require its own stack frame).
This trick stops working as soon as outer_function needs local variables or does anything other than returning the exact return value of inner_function. In that case you need a stack frame
Since the times of K&R, most C calling conventions are defined so that you can call a function with more arguments than it expects. While the ISO C standard does not permit such calls in strictly conforming code, support for them is still ubiquitous.
[One very common use of this is that the C libraries on common desktop systems call main with three (or more!) arguments, and this works whether the programmer has declared an int main(void), an int main(int argc, char *argv), or an int main(int argc, char *argv, char *envp). I once again have to say that the third form, with an envp argument, is not sanctioned by either ISO C or POSIX, and that a C implementation could definitely use some special handling for an external-linkage function called “main” to allow the first two to be used.]
In practical terms, a function call goes along the lines of “pack the first few arguments into caller-saved registers[1] (using more or less complicated rules), then the rest on the stack, then do a call (using either the stack or the link register depending on the architecture), then after returning pop off the stack part of the arguments (the register part being assumed clobbered and not requiring any cleanup).” This kind of convention is called “caller cleanup” (in contrast to “callee cleanup” conventions—like non-vararg __stdcall on Win32—which have the callee pop the arguments off). While you could imagine other ways to permit extra arguments, it’s certainly the most common one.
That, then, implies that a C function can tail call (jump to) any function whose argument tuple is a prefix of its own. Good compilers will recognize this (and unlike you giving a function extra arguments, they are within their standard-given rights to do so!). If you want proper tail calls to functions of other types... that might be possible, but it’s much more gnarly. See for example the description of the musttail attribute in the Clang documentation[2].
[1] For each machine register, a calling convention will define which the callee has to preserve (“callee-saved”) and which the caller will have to if it cares about what’s in them (“caller-saved”).
I'm not sure if there's a particular name for it, but you could consider it to be a special case of a tail call optimisation https://en.wikipedia.org/wiki/Tail_call
> The standard library allocator keeps its state in global variables. This makes for a simple interface, but comes with significant performance and complexity costs
If you assume there’s almost no crosstalk, you can likely get away with a thread-id and an optimistic mutex per pool: the mutex will only contend if one thread tries to (de)alloc while an other thread has a cross-thread free running.
If you assume there will be more contention than that, then scaling up complexity might be a good idea e.g. lock per size class / arena, or push cross-thread frees to a queue and have the owning thread check the queue once in a while. You can then make that queue lock-free or even wait-free if that is better for your workload / issues.
Looks like it's 10.7 kLOC whereas mimalloc is 8.8 kLOC, so it's not much more complex at that...amount of engineering, I guess. Also spotted that the mimalloc README says
> Free-ing from another thread can now be a single CAS without needing sophisticated coordination between threads
which may not be the worst thing in the world; depends on what your app does.
> When an application frees an object it always has the original, requested allocation size on hand.
This is false in general. If it's true in your library that you have this old size on hand, then design the allocator callbacks with old-size arguments in free and realloc. If it's not convenient, then don't design those arguments in.
Most of the time, the allocator implementation will the be the default one via malloc and free, where the size is not required, so you're making the caller calculate or retrieve a value that is thrown away before the allocator is called.
There are situations in which a program would have to calculate the size in order to recover it. A trivial example is when a string is duplicated (as with the POSIX strdup function).
If the free function required the original request size, we would have to free the string using free(str, strlen(str) + 1) rather than just free(str).
Another example is when we have a header structure followed by a flexible array member. We can't just free(obj) but something like free(obj, offsetof(obj_t, array) + sizeof(obj->array[0]) * obj->nelems).
The allocated size could be stored in an extra field, but that's not something required when the allocator is malloc/free.
Good article. The caller can also pass arguments hinting whether the requested allocation will be long or short lived, zero allocated, etc... You can also go beyond the realloc/free idiom and into specialized interfaces, like a push/pop stack allocator.
This is great!! I learned a lot from this post. I'm working on a freestanding C project and had to write my own allocator. I ended up replicating the libc design with a doubly linked list of sized blocks which get merged with free neighbors when deallocated.
Never occurred to me to pass the object sizes to the deallocation and reallocation functions, even though such sizes are passed around everywhere in other parts of the code. Redesigning the whole thing would be a big effort but it might be worth it.
If the library reallocates non-trivial objects, it may need a more sophisticated reallocation function, that does not simply bitwise copy a block to its new location. Some additional work may be needed (e.g. pointer fixup). The reallocation function could take a "move constructor" callback as parameter.
I don't think I have seen it in publicly available allocators either, but I once implemented it in a custom one. I suppose that this design is not very popular because it is quite fragile: you end up running arbitrary code (the move constructor) in the middle of an allocation routine, when some of the heap's invariants have not yet been restored. Consequently, you typically require that the move constructor not recursively call the allocator, which is a reasonable limitation.
Alternatively, the allocator could treat reallocation as a multi-step process ( a) reserve new space, b) move data, c) free old space), with enough state tracking to allow a) and c) to be different calls to the allocator, so that b) could be performed by the caller.
There is no perfect solution to this problem. I suppose that's why C++ does not have a standard reallocation counterpart to new/delete/new[]/delete[].
A good API can use only a single ABI-level function, but it needs more parameters than that. I imagine 4 arguments, at least 2 of which are themselves composite:
auto utopia_alloc(Allocation, AlignAndSkew, Size, Flags) -> Allocation;
That said, while "one function" makes interposing easy, it does mean significant branching cost compared to single functions. There's probably a way to get the best of both worlds (support several functions in the ABI, but forward them all to `utopia_alloc` if interposed).
Oh that is indeed a good list, you should submit that gist independently. That said, I believe alignments should be a suggestion and not a mandatory request, mainly because the library can fix a bad alignment by overallocation. Charles Bloom has suggested a similar thing in the past [1].
Out of possible flags the zeroing flag is probably the most used and possibly only flag needed for library uses; others are also important but only a small number of libraries will ever want them. Again, a failure to do so should ideally be somehow detected and fixed, but that detection can't be actually checking for non-zeros, so the provided function should somehow return whether the memory was actually zeroed or not.
(I also share the same concern about the site, by the way.)
I think you are imagining an aligned allocator out of an unaligned allocator, where the misalignment should be stored somewhere around the actual memory block, but if the main allocator itself is aware of aligned allocations it is not necessary.
I can still imagine the case where allocation bins are structured by the allocation size and alignment size and an incorrect alignment will result in an incorrect bin, but such bins are only beneficial when the alignment itself is too large compared to allocation sizes, and I'm not sure who really wants them in the first place.
I can’t agree. Combining alloc and free into one API is a false economy. You’re not really going to save any significant amount of space, and at best you’ll have an extra predictable branch or two (but probably unpredictable, I’d guess).
More importantly, it’s harder to interpret the calling code’s intent in the edge cases. What does alloc(NULL, 0) mean? Did the caller try to free a null pointer, or allocate a zero-byte object? With separate functions, you can support either or both or neither, but with the combined model the only safe thing is to support neither and panic, lest you interpret an alloc as a free or vice versa.
Depends on your goals. Lua aimed to be an incredibly tiny language with a very low footprint. The wanted you to need to implement very few functions if you needed to integrate it in your codebase. It has very few data types and function calls for that purpose.
The goals of lua may not align with the goals of most projects, but it does make sense to have a single allocator function there, the same way it makes sense to have a single array/set/hash table data type (what lua calls a Table).
Alloc(null, 0) can be implementation defined. It could be a noop in release builds and a panic in debug builds. Just how certain allocators do extra work in debug builds to detect leaks anyway.
Keep in mind this is an allocator interface. Behavior is supposed to change and the allocator maker has control over that. Likewise free() doesn't always actually release memory (arena allocators for example), and that's the whole point.
It's a pet peeve of mine that C people seem to think that 0 isn't a number. Or that 0 (or -1) mean infinity. (Like a timeout of 0 meaning an infinite timeout, instead of failing immediately.)
I guess it's mostly a consequence of C having such poor support for data types.
For a low-level allocator interface this might be an acceptable compromise, but C people also tend to use this kind of mangling when it's not warranted.
> It's a pet peeve of mine that C people seem to think that 0 isn't a number. Or that 0 (or -1) mean infinity.
It's not just C people :D - Zero may or may not be a natural number depending on who you ask, and negative numbers definitely are not.
And since in low-level programming, which is kind of C's kingdom, every bit might count and you'd rather use that useless sign bit for something else. -1 being particularly liked by compilers because it is "all bits set" in the dominant two's complement sign encoding, which sometimes allows optimization tricks.
> I guess it's mostly a consequence of C having such poor support for data types.
Technically nothing prevents anyone from returning a small two fields structure - one for a tag and one for the value - to have "tagged unions" or "optional types" almost like your favorite language. But in the eyes of a low-level C programmer, that's one field too much (omg that doesn't fit in a register, you will be held responsible for global warming if you publish this etc.).
So, it's not that we think that anything non-positive is not a "true number", it's more like we've been using these conventions sometimes since assembly programming days (OR AX,AX to test if register AX is zero in old x86 assembly, because the OR instruction sets the CPU's Z flag) in order to get maximum performance.
You are right about cramming things into bits. But it often makes code more complicated.
Eg if you have two operations you do one after another and they have time-outs of t1 and t2, then normally, you know that after t1+t2, either both will have succeeded, or they will have failed.
But that breaks down with the 'C-conventions' when either t1 or t2 equals 0. Thus you need extra logic to deal with these.
As an aside: OCaml uses tagged pointers to cram more information into machine words. That's an implementation technique, and (mostly) hidden from users of the language. (Apart from integers having 63 bits, instead of 64 bits size by default.) So that's an example where the language is 'clean', but the implementation is still efficient and uses all the low level tricks.
And my pet peeve is more that many people always do this trickery, even when not every bit counts.
> It's not just C people :D - Zero may or may not be a natural number depending on who you ask, and negative numbers definitely are not.
Yes, but I never mentioned any restriction to Natural numbers. Btw, in the context of computers 0 is definitely a number that comes natural, even if it's not a Natural number by some people's definition.
The number 0 is interesting in the sense that it is actually a comparatively late and abstract invention. The natural numbers are also mathematically defined as either including or excluding 0 depending on the author.
That said the issue is really in-band signaling. “I already have this data type which I don’t fully use, let’s use one of the unused values to represent something completely unrelated..”
You could definitely use it for something useful; you can't store anything through it, but if an allocator supports allocations of size 0 with the same semantics of non-zero allocations, you get a "unique" ptr from the allocator dynamically.
Anywhere you need a dynamic list identifier this would suit (yes, a counter could do this too, this would be effectively a globally unique token)
Unless you need zero byte (and other) pointers to be consistently unique.
But that would mean allocating and deallocating some memory for zero byte requests, to provide unique addresses. Or some reserved memory address span, outside of available (real and virtual) RAM, for zero byte pointers.
> Unless you need zero byte (and other) pointers to be consistently unique.
Yes, this was the problem. People had some different ideas for contracts that `malloc` etc. should guarantee---the most lax requirement would be a "usable" pointer on success, which does almost perfectly constrain non-zero cases but leaves lots of freedom for zero cases. At the opposite side there is a "unique non-NULL" pointer requirement, as you've said, which essentially makes `malloc(0)` almost same to `malloc(1)`.
Rust also has this issue because its reference, a pointer after compilation, is expected to be non-null even for zero-sized types. Rust's rule is somewhat in between: any pointer should be non-null, but it doesn't have to be unique for zero-sized types. The canonical non-null pointer for such types (as returned by `NonNull::dangling`) is the smallest non-zero pointer that satisfies the alignment.
alloc and free are the most common operations, and having them be their own thing makes it much easier to reason about the code. Folding them into a common function makes the code much harder to review and maintain, both for humans as well as for tools (e.g. coccinelle).
That isn’t efficient in cases where the caller cannot know the size to allocate up front. For example:
To use sprintf, you’d have to guess at the buffer size to pass in, or suffer a buffer overflow.
snprintf is a bit better in that you can recover from passing a buffer that’s too small, but still either requires you to be generous in sizing the buffer, or accept that you often call snprintf twice, doing double work.
The way around that is to pass a pointer to an allocation routine, but that’s a simple form of the allocator interface that this article favors.
>That isn’t efficient in cases where the caller cannot know the size to allocate up front
This can typically be avoided though. With your sprintf example you could have one function that does the first half that calculates the size and then a second function that does the second half that actually copies into the destination buffer. Alternatively if you want to use realloc you could have a function like snprintf which allows resuming from where it ran out of space.
This strategy, while is worth mentioning as another separate post, can only be applied to simple interfaces. A whole API can be much larger and can't avoid allocations inside. There are some cases where the whole API was carefully designed to avoid such allocations, but it is difficult and not always suitable for users.
For the first idea: Yes, but both of those would have to interpret the format string and determine the character lengths of the arguments which is duplicated work. I'd imagine its a large portion of the work. I'm not convinced its any more efficient than snprintf.
The second idea is neat though... it could return an extra argument which is a pointer to the first part of the format string that wasn't printed
A more realistic version of the former is that the call is opportunistically given a buffer and returns an error with the required buffer size if the buffer is too small. If the buffer is big enough you only need a single call, otherwise you need two calls with a redundant processing, which is better then the original version---this even works when the caller can't preallocate a buffer. This approach is pretty common in the Windows API.
----
ADDED: I thought charcircuit was talking about always calling two functions in a row, missing the original comment that charcircuit was replying to. My bad, so I'd like to update this comment as follows...
I think the former approach would indeed work if the first call is always given a pointer to the space that can be used to record what has been done so far. Like, `snprintf(ptr, n, &recover, "format", ...)`, and `recover` can be relatively small, like 16 bytes. I would record original `ptr` and `format` arguments to be safe, and largest offsets to `ptr` and `format` that are known to be written and synchronized to each other. Of course this is just a workaround for C's inability to construct and return a sum type.
How do you know how much memory you need to allocate when you need to do the work in the first place to determine it? Maybe if snprintf was resumable by supplying a larger buffer while it was doing its work, but this seems like a lot of complexity.
What if the function does more than a single allocation? Imagine a library like libxml2 that can parse a whole XML document into a DOM tree. In many cases you it would be nice to just pass an arena allocator and throw away the whole DOM in one go. Unfortunately libxml2's allocator customization kinda sucks and it's not easy to do.
This is about the case when the caller can’t know how much memory to allocate but is able to provide an estimate for most cases. Either it becomes possible to avoid repeated allocation and make deallocation faster, or another classic dynamic allocator can be asked for additional memory for outlier cases.
If the lifetime of the allocation works out this can end up with a nice pattern of "try to use a cheap stack buffer, and fall back to a more expensive heap allocation when that fails"
This is quite untrue in real-world C codebases unless you have a lot of discipline. I know from reading a previous article (https://nullprogram.com/blog/2023/10/08/) by this author that they hate null-terminated strings. But in your typical C code, you often allocate a bunch of memory to be used as a null-terminated string, return it or pass it to another function transferring ownership, and expect a different function to free() it. The length of the string is not necessarily the allocation size! The code could have allocated some predetermined max size for the buffer, build up the string gradually, and then add a null terminator. Thus, the function doing the free() can't pass the right size.