
Minimalist C Libraries - Cmerlyn
http://nullprogram.com/blog/2018/06/10/
======
xtrapolate
> "The library mustn’t call malloc() internally. It’s up to the caller to
> allocate memory for the library. What’s nice about this is that it’s
> completely up to the application exactly how memory is allocated. Maybe it’s
> using a custom allocator, or it’s not linked against the standard library."

OP's approach will indeed work for most "minimalist"/single-header libraries,
but I personally feel it pollutes the API you're exposing to your users.

Depending on the specific situation, I may sometimes choose to expose a
MODULE_CreateObject() and a MODULE_CreateObjectEx(custom_allocator,
custom_deallocator).

Internally, MODULE_CreateObject() calls MODULE_CreateObjectEx(), passing the
module's default allocators and deallocators (e.g. HeapAlloc and HeapFree).
This strikes me as a more balanced approach.

One caveat here is that you must enforce consistency across usage: you don't
want some API calls to use malloc() for allocation whilst others use
HeapFree() for deallocation; that would be a recipe for disaster.

To ensure that, I would often set the allocators and deallocators once, when
the object is first created. They may be set through the object's
initialization function, and they persist as part of the object itself.
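A minimal sketch of that pattern, with hypothetical names (MODULE_Object and friends are invented here, and malloc/free stand in for whatever platform defaults you prefer): the allocator pair is captured once at creation time and reused for the object's whole lifetime, which is what enforces the consistency mentioned above.

```c
#include <stdlib.h>
#include <string.h>

typedef void *(*module_alloc_fn)(size_t);
typedef void (*module_free_fn)(void *);

typedef struct {
    module_alloc_fn alloc;   /* stored so every later call stays consistent */
    module_free_fn  dealloc;
    /* ... module state ... */
} MODULE_Object;

/* Extended constructor: the caller supplies the allocator pair. */
MODULE_Object *MODULE_CreateObjectEx(module_alloc_fn a, module_free_fn f)
{
    MODULE_Object *obj = a(sizeof *obj);
    if (!obj)
        return NULL;
    memset(obj, 0, sizeof *obj);
    obj->alloc = a;          /* remembered for the object's lifetime */
    obj->dealloc = f;
    return obj;
}

/* Convenience constructor: defaults to the module's standard pair. */
MODULE_Object *MODULE_CreateObject(void)
{
    return MODULE_CreateObjectEx(malloc, free);
}

void MODULE_DestroyObject(MODULE_Object *obj)
{
    if (obj)
        obj->dealloc(obj);   /* always the pair it was created with */
}
```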

~~~
ajross
> I personally feel it pollutes the API you're exposing to your users.

Pollution is in the eye of the beholder. There are _many_ circumstances where
a project or subset of a project needs to work without a heap, they just don't
necessarily overlap with the "application layer code in a virtual memory
process" world your intuition is calibrated against.

And sometimes this stuff needs to read a JSON object or decode base64 or utf8
too, and can't because the library is too thick.

~~~
xtrapolate
> "Pollution is in the eye of the beholder. There are many circumstances where
> a project or subset of a project needs to work without a heap, they just
> don't necessarily overlap with the "application layer code in a virtual
> memory process" world your intuition is calibrated against."

That's an argument in favor of offloading allocation/deallocation to the
library's users, which is exactly the core of my, and OP's, proposals. We're
saying the same thing here - developers should be able to determine/control
how memory is allocated and deallocated.

> "And sometimes this stuff needs to read a JSON object or decode base64 or
> utf8 too, and can't because the library is too thick."

I'm losing you here. I honestly feel that my proposal is all about keeping the
API as simple as humanly possible, without compromising the library's
flexibility when it comes to the scenarios you mentioned earlier.

In your case:

    
    
      BASE64DECODER_Decode(...)
      BASE64DECODER_DecodeEx(..., allocator, deallocator)
      BYTE * BASE64DECODER_GetDecodedBuffer(handle)
      BASE64DECODER_Free(handle)

~~~
ajross
But what if I don't have a heap? Not even a wrappable heap.

I could be an OS bootstrapping layer, a signal handler, an ISR, a process
control project operating under strict 'No dynamic allocation!' rules, a
thunking layer to get legacy code modes (BIOS says hi!), ...[1]

You're imagining a world where everything is Node or Python or Java, or at the
worst C on top of the well-defined standard library. And I'm telling you that
the world is bigger than that.

And more specifically, that those weird layers sometimes need library code
too.

[1] (Edited to add) A malware payload, a tracing layer, a compiler-generated
stub, a benchmarking hook that can't handle heap latency, ...

~~~
xtrapolate
> "You're imagining a world where everything is Node or Python or Java, or at
> the worst C on top of the well-defined standard library. And I'm telling you
> that the world is bigger than that."

Why do you keep putting words in my mouth?

> "But what if I don't have a heap? Not even a wrappable heap."

I'm forced to repeat myself yet again. At no point does my proposed API force
you to rely on a heap. On the contrary, it lets you rely on whatever solution
works best for you, in your specific case.

In your custom kernel project, your custom allocator() can return a buffer
from a memory pool you handle yourself. Your custom deallocator() will reclaim
that memory back into your custom memory pool.

In a different project, say a desktop app for Windows 10, the allocator() will
simply call malloc(), and the deallocator() will call free().

This way, your allocator() can do whatever. Your deallocator() can do
whatever. How is this restrictive in any way, shape, or form?

~~~
ori_b
> In your custom kernel project, your custom allocator() can return a buffer
> from a memory pool you handle yourself. Your custom deallocator() will
> reclaim that memory back into your custom memory pool.

I don't have either. I have a statically allocated buffer big enough for one
frame of data, and I need to guarantee that it never gets used twice. My code
does not have a custom allocator. It does not allocate.

~~~
xtrapolate

      // Assuming a statically allocated pool of your own, e.g.:
      // static unsigned char g_your_buffer[YOUR_POOL_SIZE];
      // static size_t g_position;

      void * your_custom_allocator(size_t size)
      {
          // Handle your locks.
          // Sanity checks, assertions, bounds checks, etc...

          void * result = g_your_buffer + g_position;

          // "Commit Memory" from your buffer.
          g_position += size;

          // Some more code...

          return result;
      }

Now, you can happily use:

    
    
      BASE64DECODER_DecodeEx(..., your_custom_allocator, your_custom_deallocator);
    

What else do you need? You seem very disturbed by the use of the word
"allocator" here; feel free to rename it to whatever works for you.

~~~
ori_b
> What else do you need?

A guarantee that this "allocator function" is only ever called once.

~~~
xtrapolate
Are you seriously suggesting this is an issue? That's entirely up to you to
solve in your custom allocator.

Use whatever mechanism is available to you. Use a global counter: check it
atomically every time you enter your custom allocator, and increment it after
a successful allocation. I don't know your system's constraints, nor should
I...

~~~
ori_b
I'm not talking about concurrency. I'm talking about needing to know _exactly_
how many bytes are being allocated ahead of time, because I've got 192k of
RAM, and 112k of them are spoken for by I/O buffers.

If I pass in an allocator that returns the statically allocated buffer, then
the second call to it _must_ abort loudly.
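For what it's worth, that last requirement can be mechanized in the same shape; here is a hypothetical sketch (all names and the 80k figure are invented) of an "allocator" that hands out one static buffer and aborts loudly on any second call or oversized request. It does not, of course, remove the deeper objection that the exact byte count must be knowable up front.

```c
#include <stdio.h>
#include <stdlib.h>

#define FRAME_SIZE (80 * 1024)     /* the one statically allocated frame */

static unsigned char g_frame[FRAME_SIZE];
static int g_frame_used;

void *one_shot_alloc(size_t size)
{
    if (g_frame_used || size > FRAME_SIZE) {
        /* The second call must abort loudly, never silently reuse the frame. */
        fputs("one_shot_alloc: called twice or request too large\n", stderr);
        abort();
    }
    g_frame_used = 1;
    return g_frame;
}
```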

------
panic
These “single-file libraries” share a lot of the same ideals:
[https://github.com/nothings/single_file_libs](https://github.com/nothings/single_file_libs)

------
nwmcsween
The importance of no malloc within the library cannot be overstated: let the
user of the library decide what is best, and do not do some junk like
`some_fn()` that requires an implicit `some_free()`.
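One hedged illustration of the alternative (all names here are invented): instead of `some_fn()` returning heap memory that demands a matching `some_free()`, the library reports the size it needs and writes into storage the caller owns, whether that's the stack, a static buffer, or a heap allocation.

```c
#include <stddef.h>
#include <string.h>

/* Reports how many bytes greet() needs, terminator included. */
size_t greet_size(const char *name)
{
    return strlen("hello, ") + strlen(name) + 1;
}

/* Writes the greeting into the caller's buffer; never allocates. */
int greet(char *dst, size_t cap, const char *name)
{
    if (cap < greet_size(name))
        return -1;              /* caller's buffer is too small */
    strcpy(dst, "hello, ");
    strcat(dst, name);
    return 0;
}
```

The caller keeps full control: a stack array, a static buffer, or malloc'd memory all work, and no hidden free is owed to the library.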

~~~
torstenvl
I disagree. This is an artificial limitation that severely hampers the
library's capabilities. That kind of discipline may be fine for normal linear
string-crunching, but it's cumbersome to an asinine degree if your library
needs to do any sort of complex ADT manipulation.

~~~
jgtrosh
It calls to mind Fortran-style function docs which would give a formula to
compute the size of a work array for the caller to provide. That really ties
the implementation to the header. It was really boring to work with, though it
often pushed you to understand what was going on inside.
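That Fortran-style contract translates directly to C; here is a toy sketch (names invented) where the library publishes the size formula and the caller provides the work array, so the routine itself never allocates.

```c
#include <stddef.h>
#include <string.h>

/* Documented formula: the routine needs n doubles of scratch space. */
size_t reverse_worksize(size_t n)
{
    return n * sizeof(double);
}

/* Reverses `in` into `out` using caller-provided scratch of at least
 * reverse_worksize(n) bytes; no internal allocation. */
int reverse(size_t n, const double *in, double *out, void *work)
{
    double *tmp = work;
    size_t i;
    memcpy(tmp, in, n * sizeof *tmp);   /* stage the input in scratch */
    for (i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    return 0;
}
```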

~~~
pletnes
This kind of interface was common in Fortran 77. I’m guessing you’ve used
BLAS/LAPACK? In more modern Fortran, it’s not so usual. Also, Fortran is much
more high-level than C, and how memory is allocated etc. is compiler-dependent
to a much larger degree than for C.

I’d recommend writing a Fortran 95 wrapper which creates work arrays for you
if you really need to use such routines.

------
Mankaninen
With respect to the bmp example:

1. Removing bmp_get implies that the application needs to implement a shadow
image if it needs to check the color of a certain pixel. Or, even worse, take
a direct peek into its void memory area.

2. Using void pointers instead of bmp_pointers makes it easier to create a
mess; the compiler will not tell you that you called the bmp library with a
pointer to a jpg memory area.

3. Not doing range checking in the library, but imposing that burden on the
caller, is bad practice. If the caller does the same, expecting his caller to
do the checking, we end up with a security risk.

Trying to minimize the library by pushing work to the application is wrong
every time you expect the library to be used more than once. Despite these
objections, I like libraries that are free of IO and mallocs!

~~~
spc476
When I fell down into the rabbit hole of DNS, I wrote code to just encode and
decode DNS packets [1]. All the existing libraries [2] had a complex API that
provided a separate function for querying a few record types (A, AAAA, MX,
TXT, SRV, maybe NS and SOA), leaving the rest unimplemented. They also tended
to have complex network architectures to handle retries, caching, and parallel
queries, which could be hard to integrate into a project that had an existing
network framework.

Mine? Just two functions: dns_encode() and dns_decode(). No I/O. No malloc().

[1] [https://github.com/spc476/SPCDNS](https://github.com/spc476/SPCDNS)

[2] The ones I was looking at are written in C.

~~~
lifthrasiir
> No malloc().

...by having your own arena allocator! I do agree that it is quite doable in
this particular case, but I always remember that OpenSSL's custom memory
allocator made Heartbleed much more devastating.

~~~
spc476
... of memory passed in by the user! So it's up to the caller to make sure
memory contains unclassified information.

------
loup-vaillant
I'm happy to say my crypto library¹ satisfies most of his criteria:

It has 50 functions. That's too much, but it could be reduced to 10 if the
user sticks to the highest-level facilities. There is no dynamic memory
allocation, and no I/O (actually, it doesn't even depend on libc). The
structures are defined in the header to allow the user to allocate them on the
stack, but looking inside is unneeded and discouraged.

[1]: [https://monocypher.org](https://monocypher.org)

------
stochastic_monk
It's worth pointing out that his two favorite RNGs
(xoroshiro128+/xorshift128+) both fail BigCrush. According to [0] and the
associated github [1], for a statistically strong RNG which is still fast,
AES-CTR or splitmix64/lehmer64 are probably your best bet, unless you have
AVX512, in which case a SIMD-accelerated PCG is the way to go [2]. (The other
methods cap out at 1 cycle per byte, while the AVX512 PCG is 1 cycle per
32-bit integer, 4x as fast as the fastest previously tested.) While I don't
doubt it could be further accelerated, I've added STL compatibility, templated
unrolling, and provided some extra utilities (including random access) in a
package based off code from [1] (provided by Samuel Neves) which I now use in
most of my projects, and which is available at [3].

[0] [https://lemire.me/blog/2017/09/08/the-xorshift128-random-number-generator-fails-bigcrush/](https://lemire.me/blog/2017/09/08/the-xorshift128-random-number-generator-fails-bigcrush/)

[1]
[https://github.com/lemire/testingRNG](https://github.com/lemire/testingRNG)

[2] [https://lemire.me/blog/2018/06/07/vectorizing-random-number-generators-for-greater-speed-pcg-and-xorshift128-avx-512-edition/](https://lemire.me/blog/2018/06/07/vectorizing-random-number-generators-for-greater-speed-pcg-and-xorshift128-avx-512-edition/)

[3] [https://github.com/dnbaker/aesctr](https://github.com/dnbaker/aesctr)

~~~
acqq
The way the mentioned PRNGs "fail" is when testing just the lower bits (search
for the occurrences of "lsb" in [1] above) and this may be important in your
use cases or not. The same [1] claims in "Visual Summary" that the
"cycles/byte" is 1 for various PRNGs but
[http://xoshiro.di.unimi.it/](http://xoshiro.di.unimi.it/) seems to show that
the reason splitmix64 is not preferred everywhere is that xoroshiro128+ is
roughly two times faster than splitmix64.

Regarding the lower bits being statistically poor: it has been known forever
that this is the case for the huge class of simple PRNGs (effectively all that
are faster than the alternatives, unless maybe there's some specialized
instruction in the CPU); the question is whether that is critical for your
purposes. The author of xoroshiro128+ is of course aware of that issue, and he
also writes:

"For general usage, one has to consider that its lowest bits have low linear
complexity and will fail linearity tests; however, low linear complexity can
have hardly any impact in practice, and certainly has no impact at all if you
generate floating-point numbers using the upper bits (we computed a precise
estimate of the linear complexity of the lowest bits)."
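The conversion the author alludes to is easy to show concretely; a sketch of the usual trick, which keeps only the top 53 bits so the statistically weak low bits never reach the result (the function name is made up):

```c
#include <stdint.h>

/* Map a 64-bit PRNG output to a double in [0, 1) using only the upper
 * 53 bits; the weak lowest bits are discarded entirely. */
double u64_to_unit_double(uint64_t x)
{
    return (x >> 11) * (1.0 / 9007199254740992.0);   /* divide by 2^53 */
}
```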

In short, if you don't know how you're going to use the PRNG and you don't
have problems related to speed, sure, use the safest one. Note that "safety"
is still understood differently in different use cases; e.g. take care to note
that most fast PRNGs still aren't cryptographically secure:

[https://en.wikipedia.org/wiki/Cryptographically_secure_pseud...](https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator)

and that sometimes even "standardized", "cryptographically secure" generators
turn out to be something else, e.g. the subtitle on that Wikipedia page:

"NSA kleptographic backdoor in the Dual_EC_DRBG PRNG"

Also other considerations come into play when you have some specific needs and
you understand the consequences: then it's not only black-and-white "safe"
vs. "not safe." For some purposes (as the mentioned generation of the
floating-point numbers in some use cases) speed matters enough to sacrifice
some "perfectness."

~~~
stochastic_monk
I use fast RNGs for kernel projections, FHT-accelerated JL transforms, and
data generation for numerical experiments. I don’t need cryptographic security
for these purposes.

~~~
acqq
And if you generate floating point numbers maybe you don’t have to worry about
lsbs alone either? The “failed” tests drop away the bits that matter the most
when floating point randomness is constructed.

~~~
stochastic_monk
I don’t know the details on which bits matter in the ziggurat algorithm, which
is the one I use. Is this the case for all floating point random number
generators?

~~~
acqq
The PRNGs you mentioned before all generate integers. The conversion from the
integer PRNG to the floating point, and the conversion from one integer range
to another range needed in the ziggurat algorithm both need to be done "right"
to give correct results (I can imagine that even using splitmix64 the wrong
implementation could be programmed by somebody not knowing what has to be
done), so if you aren't sure about these steps you should surely check their
quality yourself. If these are done right, I'd personally expect xoroshiro128+
could be "good enough" even when having that specific weakness (poorer quality
when using only lsb bits) that you worried about. It's important, of course,
not to drop the highest bits away.

And on another side, 2om3r questions the speed measurements of the author, and
I think he has a point: the speed measurements should be made in the context
of the real use, otherwise the compilers are able to "cheat" (optimize pieces
of the code away) if the example is too simple.
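A hedged sketch of the usual counter-measure to that pitfall: fold every output into a value the program actually uses afterwards, so the compiler cannot delete the generator loop as dead code. splitmix64 below is the public-domain reference algorithm; the benchmark framing around it is invented.

```c
#include <stdint.h>

/* Reference splitmix64 step. */
static uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Runs the generator `iters` times and returns a sink value that the
 * caller must print or check, keeping the loop observable. */
uint64_t benchmark_sink(uint64_t seed, long iters)
{
    uint64_t sink = 0;
    while (iters--)
        sink ^= splitmix64(&seed);  /* every output is consumed */
    return sink;
}
```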

------
baybal2
This man deserves a monument. The number of disciplined C coders who can
articulate what proper C coding looks like is dwindling with each year.

~~~
mjburgess
It's a style of programming C was built for, and not many other languages.

It's almost the opposite of good style elsewhere: aggressively procedural code
with an elegant boundary.

I'd say it would be a good lesson to many who are down the OO or pure
functional rabbit holes: perhaps there is some other sense of modularity that
goes missing when "single purpose" is applied too narrowly.

~~~
masklinn
> It's a style of programming C was built for, and not many other languages.

Rust seems well adapted for it:

> (1) Small number of functions, perhaps even as little as one.

One annoyance with that kind of stuff is that you'll get many small
dependencies rather than a few big ones. Having a simple and easy way to
acquire & maintain these dependencies is useful, and Cargo provides that.

> (2) No dynamic memory allocations.

> (3) No input or output.

That's pretty much what a #[no_std] (libcore-only) library is[0].

> (4) Define at most one structure, and perhaps even none.

That's the bit I disagree most about, but if that's what you want the language
won't stop you.

[0] unless it requires nightly and uses alloc directly, but that's not too
common right now I think

------
tushartyagi
I just want to understand: given the mostly negative reaction here at HN to
the npm ecosystem, which is based on the same "minimalist libraries" idea, how
is this different?

I'd appreciate it if the response were not about JS and/or C specifically, but
about minimalist libraries in JS, C, or any other language. Should I use a
large number of small libraries? Should I wrap some of the code I use into a
library even if that code is just a couple of functions without any data
structure?

Moreover, I'll admit that I'm a great fan of Chris Wellons's blog posts, which
are pretty technical and original, and I use some of his emacs libraries on a
daily basis.

~~~
coldtea
> _I just want to understand that given the mostly negative reaction that's
> here at HN to the npm ecosystem which is based on the same "minimalist
> libraries" idea, how is this different?_

It's different in that in C you don't pull in 200 dependencies which in turn
bring in another 10+ dependencies each.

You just use the 2-3 libs you need (which, in turn, don't require anything, or
at most the POSIX standard libs), and that's it.

------
goofballlogic
Love reading this stuff. When I grow up I want to be a C programmer.

------
sevensor
Maybe I'm just noticing it more, but I think there's more discussion of how to
write C well than there used to be. I speculate that the attention Rust has
brought to safe systems programming has caused an uptick in interest in
closer-to-the-metal languages in general, and spurred C programmers to show
that there are reasonable ways to write C as well. It may be an unanticipated
result of Rust's popularity that the quality of C programming improves. (Or
perhaps that was the plan all along?)

~~~
Gibbon1
What I've noticed is that the lead time for all sorts of parts used in small
embedded systems has gotten terrifyingly long, which says to me that there is
a lot of embedded work going on.

Also, in the last 5 years people have abandoned the JVM as an end-all-be-all
platform, which puts you squarely back into native code again.

------
vortico
Most things here are reasonable, but I don't see the point of having only one
struct. If your state is better organized in lots of hierarchical structs,
within lists, within other structs, your data will be easier to move around,
copy, and zero in smaller chunks, and you can write functions which process
isolated segments of data rather than a huge global state.

~~~
matthiasv
The point was about the user-facing side, not the internal representation of
state.

~~~
vortico
Even then, it seems weird to prefer a big flat state rather than a state made
of hierarchical sub-structures and arrays.

~~~
yason
As the user of the library I _probably_ don't care about the hierarchies. If I
want X the library can provide me a function to get X by looking up its
internal substructures and arrays so that I don't have to.

------
kstenerud
This basically echoes my library building philosophy. The two biggest things
are:

1. User-facing complexity: Keep the user interface just big enough to get the
job done. Put the "90% of people" interface first and foremost and if you need
to cover the other 10%, expose a different interface that's CLEARLY marked
"Advanced. You probably don't need to use this". Don't get sucked into chrome
plating everything.

2. Internal complexity: Keep your structures simple. Make your functions do
one thing and exactly that thing, well. Keep your side effects to a minimum.
And keep your dependencies low, because you can't trust that other people have
done the same in their libraries.

------
jokoon
I don't think there are languages similar to C in terms of simplicity,
closeness to the metal, and speed.

C++ is good enough for me, but it's so slow to compile, and I don't use its
most advanced features.

I wish there was a language between C and C++, without the complex semantics
you find in Rust and other exotic syntax.

I don't necessarily love C or C++ in terms of features, but the syntax is just
what I need. Why can't language designers write a language that is closer to
C, with fancy features that don't change the language so much?

~~~
felixangell
Maybe you would like the BetterC mode in the D programming language, if you
mean you like C for its syntax too. A lot of modern systems programming
languages seem to adopt a more modern syntax, i.e. types after the names
rather than before, no semicolons, etc.

D stays true to C in this regard and offers a lot of fancy features. And the
BetterC mode sounds suited to your requirements, in that the language features
don't overcomplicate things.

------
joveian
Personally, I'd add "don't write to your own memory" (with various stack use
rules based on expected library use) and relax the "no structures" rule to
encourage code that can be used from multiple threads simultaneously. Make the
first argument always be the internal use data.

------
hedora
I’d argue these APIs could be further minimized by using a struct with a void
* and a size_t in it, along with bounds checking accessors.

This would eliminate most of the ugliness in the post-allocation calls, allow
for the deletion of most error checking code in each library, and would harden
BMP parsing “for free”.
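A sketch of that idea (the names are invented): the pointer and size travel together, and every read goes through an accessor that fails instead of overreading, which is where the hardening "for free" would come from.

```c
#include <stddef.h>
#include <string.h>

/* Pointer and length bundled together, per the suggestion above. */
typedef struct {
    const unsigned char *data;
    size_t size;
} span;

/* Copies `len` bytes at offset `off` into `dst`; reports an error
 * instead of reading out of bounds (overflow-safe comparison). */
int span_read(span s, size_t off, void *dst, size_t len)
{
    if (off > s.size || len > s.size - off)
        return -1;
    memcpy(dst, s.data + off, len);
    return 0;
}
```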

------
sytelus
Please do not do this! There is so much bad advice in this article. Remember
the rule: your code should be as simple as possible, but no simpler. There is
absolutely no need to abandon the great facilities afforded by the language
and libraries to make things unreadable, undebuggable and unmaintainable.

~~~
masklinn
Abandoning _some_ facilities does not "make things unreadable, undebuggable
and unmaintainable" and can allow them to be used much more widely.

The limitations seem similar to #[no_std] in Rust, and while that's not
something to strive for at all costs, if you can do without it, it allows e.g.
embedded developers or kernel/OS developers to use the work.

~~~
berti
> e.g. embedded developers or kernel/OS developers to use the work.

I certainly agree that's one of the strongest reasons to avoid allocating
memory etc. It's pretty clear that not many commenters have done any work
outside a hosted environment... but I guess that makes the point that we're in
fairly specialised territory.

