OP's approach will indeed work for most "minimalist"/single-header libraries, but I personally feel it pollutes the API you're exposing to your users.
Depending on the specific situation, I may sometimes choose to expose a MODULE_CreateObject() and a MODULE_CreateObjectEx(custom_allocator, custom_deallocator).
Internally, MODULE_CreateObject() calls MODULE_CreateObjectEx(), passing the module's default allocators and deallocators (i.e. HeapAlloc and HeapFree). This strikes me as a more balanced approach.
One caveat here is that you must enforce consistency across usage: you don't want some API calls to use malloc() for allocation whilst others use HeapFree() for deallocation; that would be a recipe for disaster.
To ensure that, I would often set the allocators and deallocators once, when the object is first created. They may be set through the object's initialization function, and they persist as part of the object itself.
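A minimal sketch of that pattern, using the MODULE_CreateObjectEx naming from above (the struct layout and MODULE_DestroyObject are my own invention, and I'm using malloc/free as the defaults instead of HeapAlloc/HeapFree to keep it portable):

```c
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t);
typedef void  (*free_fn)(void *);

typedef struct MODULE_Object {
    alloc_fn alloc;    /* persisted as part of the object itself */
    free_fn  dealloc;
    /* ... actual payload ... */
} MODULE_Object;

MODULE_Object *MODULE_CreateObjectEx(alloc_fn a, free_fn d)
{
    MODULE_Object *obj = a(sizeof *obj);
    if (!obj)
        return NULL;
    obj->alloc = a;    /* set once, at creation time */
    obj->dealloc = d;
    return obj;
}

MODULE_Object *MODULE_CreateObject(void)
{
    /* the plain version just forwards the module's defaults */
    return MODULE_CreateObjectEx(malloc, free);
}

void MODULE_DestroyObject(MODULE_Object *obj)
{
    if (obj)
        obj->dealloc(obj);  /* always the matching deallocator */
}
```

Because the allocator pair is captured at creation, every later call on the object is guaranteed to use the same matched pair.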
Pollution is in the eye of the beholder. There are many circumstances where a project or subset of a project needs to work without a heap, they just don't necessarily overlap with the "application layer code in a virtual memory process" world your intuition is calibrated against.
And sometimes this stuff needs to read a JSON object or decode base64 or utf8 too, and can't because the library is too thick.
That's an argument in favor of offloading allocation/deallocation to the library's users, which is exactly the core of my, and OP's, proposals. We're saying the same thing here - developers should be able to determine/control how memory is allocated and deallocated.
> "And sometimes this stuff needs to read a JSON object or decode base64 or utf8 too, and can't because the library is too thick."
I'm losing you here. I honestly feel that my proposal is all about keeping the API as simple as humanly possible, without compromising the library's flexibility in the scenarios you mentioned earlier.
In your case:
BASE64DECODER_DecodeEx(..., allocator, deallocator)
BYTE * BASE64DECODER_GetDecodedBuffer(handle)
It could be an OS bootstrapping layer, a signal handler, an ISR, a process control project operating under strict "No dynamic allocation!" rules, a thunking layer for legacy code modes (BIOS says hi!), ...
You're imagining a world where everything is Node or Python or Java, or at the worst C on top of the well-defined standard library. And I'm telling you that the world is bigger than that.
And more specifically, that those weird layers sometimes need library code too.
 (Edited to add) A malware payload, a tracing layer, a compiler-generated stub, a benchmarking hook that can't handle heap latency, ...
Why do you keep putting words in my mouth?
> "But what if I don't have a heap? Not even a wrappable heap."
I'm forced to repeat myself yet again. At no point does my proposed API force you to rely on a heap. On the contrary, it lets you rely on whatever solution works best for you, in your specific case.
In your custom kernel project, your custom allocator() can return a buffer from a memory pool you handle yourself. Your custom deallocator() will reclaim that memory back into your custom memory pool.
In a different project, say a desktop app for Windows 10, the allocator() will simply call malloc(), and the deallocator() will call free().
This way, your allocator() can do whatever. Your deallocator() can do whatever. How is this restrictive in any way, shape, or form?
I don't have either. I have a statically allocated buffer big enough for one frame of data, and I need to guarantee that it never gets used twice. My code does not have a custom allocator. It does not allocate.
void * your_custom_allocator(size_t size)
{
    // Handle your locks.
    // Sanity checks, assertions, bounds checks, etc...
    void * result = g_your_buffer + g_position;
    // "Commit" memory from your buffer.
    g_position += size;
    return result;
}

// Some more code...
BASE64DECODE_DecodeEx(..., your_custom_allocator, your_custom_deallocator);
A guarantee that this "allocator function" is only ever called once.
Use whatever mechanism is available to you. Use a global condition variable, check it atomically every time you're entering your custom allocator, increment after a successful allocation. I don't know your system's constraints, nor should I...
If I pass in an allocator that returns the statically allocated buffer, then the second call to it must abort loudly.
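One way to get that guarantee, sketched with C11 atomics (all names, the buffer size, and the abort policy here are invented for illustration):

```c
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define FRAME_SIZE 4096

static unsigned char g_frame[FRAME_SIZE];            /* statically allocated */
static atomic_flag g_frame_taken = ATOMIC_FLAG_INIT;

void *one_shot_allocator(size_t size)
{
    /* A second call while the frame is outstanding, or an oversized
       request, aborts loudly rather than handing the buffer out twice. */
    if (atomic_flag_test_and_set(&g_frame_taken) || size > FRAME_SIZE) {
        fputs("one_shot_allocator: frame buffer already handed out\n", stderr);
        abort();
    }
    return g_frame;
}

void one_shot_deallocator(void *p)
{
    (void)p;
    atomic_flag_clear(&g_frame_taken);  /* frame may be reused next cycle */
}
```

The test-and-set makes the "only ever called once per frame" invariant enforceable even if the library is later used from more than one context.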
You realize you're arguing that a custom, probably buggy, heap implementation isn't a heap, right?
We're done. "It's OK, you can just write your own heap-like API!" is just not remotely responsive to the kind of problems I'm talking about, and that you think it is is sorely tempting me to put more words in your mouth.
If you don't think these libraries are useful, that's fine. Don't use them. Don't presume to understand the application realm before you've worked in it.
// Put this in the header to help the user calculate allocation needs but hide the size from user code
size_t LIBNAME_alloc_size(param1, param2, ...);
// Put this in the header to hide the size from user code but allow inlined size calculations
extern const size_t LIBNAME_ALLOC_X;
// Put this in the header to make size known to user (for static const allocation)
#define LIBNAME_ALLOC_Y ((size_t)42)
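User-side, the three options then look something like this (the LIBNAME_* stand-in definitions and the parameters of LIBNAME_alloc_size are invented here, since the declarations above elide them):

```c
#include <stdlib.h>

/* Stand-ins for what the library itself would provide: */
size_t LIBNAME_alloc_size(int w, int h) { return (size_t)w * (size_t)h; }
const size_t LIBNAME_ALLOC_X = 64;
#define LIBNAME_ALLOC_Y ((size_t)42)

/* Option 3: compile-time constant -> static allocation works. */
static unsigned char buf_static[LIBNAME_ALLOC_Y];

int demo(void)
{
    /* Option 1: runtime query; the caller picks heap, pool, whatever. */
    void *buf_heap = malloc(LIBNAME_alloc_size(8, 8));

    /* Option 2: link-time constant; usable as a C99 VLA on the stack,
       but not for static or file-scope arrays. */
    unsigned char buf_vla[LIBNAME_ALLOC_X];

    (void)buf_vla;
    (void)buf_static;
    free(buf_heap);
    return 0;
}
```

The trade-off runs from most flexible (function call, size can depend on parameters) to most restrictive but most statically checkable (#define).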
I’d recommend writing a Fortran 95 wrapper which creates work arrays for you if you really need to use such routines.
Sometimes you can probably do without any "complex ADT"; in that case, the no-malloc advice forces you to find the clean solution without one, which is really great.
In the rare cases when you intrinsically need a complex ADT, then you do it. The advice is a spirit, not an unbreakable constraint. Just like not using goto.
If I'm making an image editor, allocating bitmaps myself is fine; if I'm writing a CRUD app, not so much.
What's wrong with malloc?
Furthermore if the authors of the library think they know better with regard to allocation, they can provide an allocator as a separate addition to the library.
Trying to minimize the library by pushing work to the application is wrong every time you expect the library to be used more than once. Despite these objections, I like libraries that are free of IO and mallocs!
Mine? Just two functions: dns_encode() and dns_decode(). No I/O. No malloc().
 The ones I was looking at are written in C.
...by having your own arena allocator! I do agree that it is quite doable in this particular case, but I always remember that OpenSSL's custom memory allocator made Heartbleed much more devastating.
void bmp_init(void *, long width, long height);
It has 50 functions. That's too much, but it could be reduced to 10 if the user sticks to the highest-level facilities. There is no dynamic memory allocation and no I/O (actually, it doesn't even depend on libc). The structures are defined in the header to allow the user to allocate them on the stack, but looking inside is unneeded and discouraged.
In general unless you have a lot of AVX512 code to run (several ms worth), you're usually better off avoiding those instructions IME :(. (The same is also true for many AVX2 instructions...)
(See https://en.wikichip.org/wiki/intel/frequency_behavior#Base.2... for some more info, as well as the Intel optimization guide).
Regarding the lower bits being statistically poor: it has long been known that this is the case for the huge class of simple PRNGs (effectively all that are faster than the alternatives, unless maybe there's some specialized instruction in the CPU); the question is whether that is critical for your purposes. The author of xoroshiro128+ is of course aware of that issue, and he also writes:
"For general usage, one has to consider that its lowest bits have low linear complexity and will fail linearity tests; however, low linear complexity can have hardly any impact in practice, and certainly has no impact at all if you generate floating-point numbers using the upper bits (we computed a precise estimate of the linear complexity of the lowest bits)."
In short, if you don't know how you're going to use the PRNG and you don't have problems related to speed, sure, use the safest one. Note that "safety" is still differently understood in different use cases, e.g. take care to note that most of fast PRNGs still aren't cryptographically secure:
and that sometimes even "standardized", "cryptographically secure" generators turn out to be something else, e.g. the subtitle on that Wikipedia page:
"NSA kleptographic backdoor in the Dual_EC_DRBG PRNG"
Also, other considerations come into play when you have specific needs and understand the consequences: then it's not just a black-and-white "safe" vs. "not safe." For some purposes (such as the mentioned generation of floating-point numbers in some use cases), speed matters enough to sacrifice some "perfectness."
I have tested xoroshiro128+ vs splitmix64 in several procedural generation & simulation code bases in C and Swift. I could never confirm the numbers on http://xoshiro.di.unimi.it/. In fact, splitmix64 was slightly faster in all my tests with different optimizations enabled. I always assumed that's because its state only occupies a single register which certainly matters in practical applications (especially in C with its restricted calling conventions). I am not absolutely sure whether that was always the reason, though.
And on another side, 2om3r questions the speed measurements of the author, and I think he has a point: the speed measurements should be made in the context of the real use, otherwise the compilers are able to "cheat" (optimize pieces of the code away) if the example is too simple.
AES could likewise benefit from reduced rounds, but since its security margin is lower than Chacha, there's a chance it would perform a bit worse at the same quality level.
Chacha is very fast with vector instructions. Over 2.3GB per second on my core i5 skylake laptop.
Also look at BearSSL: https://www.bearssl.org/constanttime.html
2.4GB per second for AES-NI is comparable to my own measurements with AVX2 Chacha20.
Chacha is slightly faster than Salsa, mostly because it removed some word shuffling Salsa needed for matrix transposition.
Besides, they started it, talking about AES.
What is the recommended way to generate 64-bit numbers with PCG? Just generate two 32-bit numbers and stick them together? Or does that introduce bias or bad performance?
If you want to see why, you could play with small RNGs with 4 bits of state each, with 2-bit outputs. Then concatenate the outputs from each and check for uniformity.
For example, say the first RNG is given by the sequence (with the value of low 2 bits following the slash):
3/3, 13/1, 8/0, 11/3, 7/3, 15/3, 0/0, 2/2, 5/1, 10/2, 12/0, 9/1, 14/2, 1/1, 6/2, 4/0
and the second RNG is given by
12/0, 11/3, 15/3, 1/1, 10/2, 13/1, 4/0, 2/2, 14/2, 5/1, 8/0, 0/0, 9/1, 6/2, 7/3, 3/3.
Each has 16 unique states, and each of the four possible 2-bit outputs appears exactly 4 times in the output of a generator. So each generator is uniform.
Now create a 4-bit rng by concatenating 2-bit outputs from each generator: 3|0, 1|3, 0|3, 3|1, 3|2, 3|1, 0|0, 2|2, 1|2, 2|1, 0|0, 1|0, 2|1, 1|2, 2|3, 0|3.
These are all 16 outputs we can get from the two generators with a period of 16 each, but you can already tell that some outputs appear more than once (0|0, 0|3, 1|2, 2|1, 3|1) and thus, obviously, there are others, such as 0|1 or 0|2, that never appear!
For 4 bits of output, you really need a larger period. But even that does not guarantee uniformity when you're concatenating outputs from two independent RNGs. In fact the likelihood of getting uniformity by concatenating two random RNGs is practically nil.
On the other hand, for a single linear congruential generator, it is easy to guarantee uniformity by choosing the parameters according to the well known rules.
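The sequences above are small enough to check by machine. This sketch tallies the concatenated 2-bit outputs exactly as described (the arrays are the low-2-bit output columns copied from the two sequences above):

```c
/* Low-2-bit outputs of the two generators, in sequence order. */
static const int out1[16] = {3,1,0,3,3,3,0,2,1,2,0,1,2,1,2,0};
static const int out2[16] = {0,3,3,1,2,1,0,2,2,1,0,0,1,2,3,3};

/* Count how often each concatenated 4-bit value (first|second) occurs
   over one full period of both generators. A uniform combined generator
   would make every one of the 16 counts equal 1. */
void tally(int counts[16])
{
    for (int v = 0; v < 16; v++)
        counts[v] = 0;
    for (int i = 0; i < 16; i++)
        counts[(out1[i] << 2) | out2[i]]++;
}
```

Running this reproduces the claim above: counts[0] (0|0) and counts[3] (0|3) come out as 2, while counts[1] (0|1) and counts[2] (0|2) are 0.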
IIRC, the PCG C++ distribution has a 64 bit variant (it uses 128 bit integers, which are implemented in software). I don't know if the performance is better or worse than calling the 32 bit variant twice.
It's almost the opposite of good style elsewhere: aggressively procedural code with an elegant boundary.
I'd say it would be a good lesson to many who are down the OO or pure-functional rabbit holes: perhaps there is some other sense of modularity that goes missing when "single purpose" is applied too narrowly.
Rust seems well adapted for it:
> (1) Small number of functions, perhaps even as little as one.
One annoyance with that kind of stuff is that you'll get many small dependencies rather than a few big ones. Having a simple and easy way to acquire & maintain these dependencies is useful, and Cargo provides that.
> (2) No dynamic memory allocations.
> (3) No input or output.
That's pretty much what a #[no_std] (libcore-only) library is.
> (4) Define at most one structure, and perhaps even none.
That's the bit I disagree most about, but if that's what you want the language won't stop you.
unless it requires nightly and uses alloc directly, but that's not too common right now, I think
Beyond NEWP, PL/I, BLISS, Concurrent Pascal, PL/S, PL/8, PL/M, XPL, Mesa that is.
I'd appreciate it if the response is not about JS and/or C, but about the minimalist libraries in JS, C, or any other language. Should I use a large number of small libraries? Should I wrap up some of the code which I use into a library even if that code is just a couple of functions without any data structure?
Moreover, I'd admit that I'm a great fan of Chris Wellons blog posts which are pretty technical and original, and use some of his emacs libraries on a daily basis.
It's different in that in C you don't pull in 200 dependencies which in turn bring in another 10+ dependencies each.
You just use 2-3 libs you need (and that they, in turn, don't require anything, or at best the POSIX standard libs), and that's it.
When the OP says a library should ideally have one function, they mean that the one function should do the brunt of the work, but leave any setup and takedown to the caller. When a node lib has a single function, this is because you are expected to import another lib for other purposes (eg left-pad for left padding, right-pad for right padding, rather than a single, generic pad library).
Each of the guidelines listed independently represent good practices, in general, when applicable. It's worth considering the design choices when developing a library. But promoting a definition of minimalism implicitly promotes an all-or-nothing approach to development. The choice to not include memory allocation should be entirely independent of the choice not to include I/O. Pretending otherwise indicates that the choices aren't driven by pragmatism.
Also, in the last 5 years people have abandoned the JVM as an end-all-be-all platform, which puts you squarely back into native code again.
The interface of the library is intended to help the user of the library, not the developer of the library.
The simplest interface is the best for the user, who does not want to know anything about the implementation details.
In the ideal case, the developer and the API designer will be different persons who are not on good terms with each other. The more the developer hates the API designer, the better.
1. User-facing complexity: Keep the user interface just big enough to get the job done. Put the "90% of people" interface first and foremost and if you need to cover the other 10%, expose a different interface that's CLEARLY marked "Advanced. You probably don't need to use this". Don't get sucked into chrome plating everything.
2. Internal complexity: Keep your structures simple. Make your functions do one thing and exactly that thing, well. Keep your side effects to a minimum. And keep your dependencies low, because you can't trust that other people have done the same in their libraries.
C++ is good enough for me, but it's so slow to compile, and I don't use its most advanced features.
I wish there was a language between C and C++, without the complex semantics you can find in Rust and other exotic syntax.
I don't necessarily love C or C++ in terms of features, but the syntax is just what I need. Why can't language designers write a language that is closer to C, with fancy features that don't change the language so much?
D stays true to C in this regard and offers a lot of fancy features. And the BetterC mode sounds suited to your requirements, in that the language features don't overcomplicate things.
This would eliminate most of the ugliness in the post-allocation calls, allow for the deletion of most error checking code in each library, and would harden BMP parsing “for free”.
The limitations seem similar to #[no_std] in Rust, and while that's not something to strive for at all costs if you can do without it allows e.g. embedded developers or kernel/OS developers to use the work.
I certainly agree that's one of the strongest reasons to avoid allocating memory etc. It's pretty clear that not many commenters have done any work outside a hosted environment... but I guess that makes the point that we're in fairly specialised territory.
Sometimes those aren't so great. For example, C has errno, a thread-local variable that gets set to the error code of the last function you called. Why can't the function just return the error code? I think it's strange how all the Linux system calls do return error codes but the standard library puts them in errno anyway.
I really like writing freestanding C because I can avoid most of the legacy.
Functions of my own design almost always return status codes only. Actual data is returned through pointer parameters. This allows me to quickly determine the exact set of variables that are affected by any function call.
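A sketch of that convention, with invented names (status-code return, data through pointer parameters, no errno or other hidden state):

```c
#include <limits.h>

typedef enum {
    STATUS_OK = 0,
    STATUS_ERR_RANGE
} status_t;

/* The return value carries only the status; the actual result comes
   back through the out-pointer. The full set of variables this call
   can affect is visible at the call site: just *out. */
status_t checked_double(int in, int *out)
{
    if (in > INT_MAX / 2 || in < INT_MIN / 2)
        return STATUS_ERR_RANGE;   /* reported directly, not via errno */
    *out = in * 2;
    return STATUS_OK;
}
```

The call site then reads `if (checked_double(x, &y) != STATUS_OK) ...`, which makes error handling explicit and keeps data flow obvious.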