
A simple heap memory allocator in about 230 lines of C - ingve
https://github.com/CCareaga/heap_allocator#shmall---simple-heap-memory-allocator
======
katastic
I remember years ago my friend was going to college for CSCI and had to write a
malloc for a course. He procrastinated till the last... night... and wrote it.
In the free() routine, he simply wrote "return true;". The TAs unit-tested the
code because there were over a hundred classmates to test. Well, the unit
tests must not have been very good, because he said he scored a 100.

~~~
specialist
Ha.

Ages ago, I was told to add copy protection to one of the products I managed.

I didn't want to. Hackers posted cracked versions of our apps within a few
days anyway. And our legit customers hated it.

Our company's QA/Test person in charge of these things signed off on my
implementation. Surprising. I knew it didn't work (as expected). But what the
hell. I embraced her acceptance (buyoff), burned those gold master CDs, met
our deadline, pleased our dealers, and had a new release to show off at the
trade show. Woot.

About a year later, this QA/test person figured out that implementation (for
that release) never worked. She wasn't very happy with me.

~~~
kristianp
"I embraced her acceptance (buyoff)".

What does that mean? My understanding is that a buyoff is a bribe, but your
last paragraph implies it wasn't a bribe.

~~~
specialist
I meant the tester acknowledging the acceptance tests passed, so we had her
approval for release.

~~~
kristianp
Ah ok, signoff then.

------
pm215
Coincidentally, the libc malloc/free implementation in 7th Edition Unix is
also about 200 lines of C: [http://www.tuhs.org/cgi-
bin/utree.pl?file=V7/usr/src/libc/ge...](http://www.tuhs.org/cgi-
bin/utree.pl?file=V7/usr/src/libc/gen/malloc.c)

~~~
shubhamjain
Is it my unfamiliarity with C, or is the code really terrible? It seems that
coding practices we have come to think of as common sense are anything but.

~~~
pm215
It's certainly not the style I would expect C to be written these days, but I
don't think it's terrible code. It was, 40 years ago, genuine shipping real-
world code. There's a related quote about this in the foreword that Dennis
Ritchie wrote for the reprint of the Lions commentary: "You will find linear
searches, primitive data structures, C code that wouldn't compile in 1979 let
alone today, and an orientation towards a machine that's little more than a
memory. But you will also see in the code an underlying structure that has
lasted for a long time and has managed to accommodate vast changes in the
computing environment." We're still calling malloc() and free() and realloc()
today with the semantics that the Unix designers assigned them in the 1970s,
even if the underlying implementations are significantly different. The bones
of the design are strong; the initial implementations may have been naive but
they were good enough to ship (and remember there were only a fairly small
group of people writing the whole OS and userland).

~~~
hawski
It seems that it's not entirely the same semantics:
[https://www.spinellis.gr/blog/20170914/](https://www.spinellis.gr/blog/20170914/)
(
[https://news.ycombinator.com/item?id=15246690](https://news.ycombinator.com/item?id=15246690)
)

~~~
pm215
That blog post is talking about the kernel's internal allocator, not the libc
one.

------
huhtenberg
It's not a "simple" allocator. It's an overly simplistic and largely
unoptimized one.

E.g. try to make it work well with 2 threads. Now try to make it work well
with 2 threads, one allocating, the other freeing, etc.

Writing a memory allocator is a _fantastic_ exercise in data structures and
optimization. It's also an easy one, so making a reasonably fast allocator
from scratch is not that hard, but in the end it's much more fun to write one
than to look at someone else's results. This is also what makes these toy
allocators a dime a dozen.

~~~
senatorobama
Is there an allocator that uses lockless programming?

~~~
masklinn
IIRC jemalloc has per-thread pools & caches which I assume can be accessed
locklessly. I don't think the entire thing is lockless, and neither is
tcmalloc. And as far as I know these are the only significantly thread-aware
allocators (or at least production-ready ones).

~~~
bonzini
glibc has also grown a good thread-aware allocator, though that is more recent
than tcmalloc or jemalloc.

------
ChuckMcM
_"Each chunk of memory has a node struct at the beginning and a footer struct
at the end."_

As a public service announcement, if you are building a heap manager for C
code, don't do this (put your heap control structures next to the allocated
memory). Sure it looks elegant, however a _VERY_ common C bug is the 'off by
one' error, and when an off-by-one can corrupt the structures that define the
heap, well it gets out of hand quickly.

Have your heap structures "far" away from the allocated memory. Yes you burn
another N bits (where N is log2(max_address)) in your heap control structures,
and yes you may have heap control structures you "don't need" pre-allocated,
but your future self will thank you for doing it this way.
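The advice above can be sketched as follows. This is a minimal illustrative layout, not code from the linked project, and every name in it (`block_desc`, `oob_alloc`, `oob_free`, the pool sizes) is made up: the block descriptors live in their own table, well away from the pool handed to callers, so an off-by-one write past the end of a user block scribbles on neighbouring user bytes rather than on the allocator's bookkeeping.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 32
#define POOL_SIZE  1024

/* Metadata kept out-of-band: one descriptor per live allocation. */
typedef struct {
    size_t off;   /* block's offset within the pool */
    size_t size;  /* block length in bytes */
    int    used;  /* 0 = free slot, 1 = live allocation */
} block_desc;

static uint8_t    pool[POOL_SIZE];     /* user data only */
static block_desc table[MAX_BLOCKS];   /* control structures, far away */
static size_t     next_off;            /* naive bump placement, no reuse */

static void *oob_alloc(size_t n) {
    if (next_off + n > POOL_SIZE)
        return NULL;                   /* pool exhausted */
    for (int i = 0; i < MAX_BLOCKS; i++) {
        if (!table[i].used) {
            table[i].off  = next_off;
            table[i].size = n;
            table[i].used = 1;
            next_off += n;
            return pool + table[i].off;
        }
    }
    return NULL;                       /* descriptor table exhausted */
}

static int oob_free(void *p) {
    for (int i = 0; i < MAX_BLOCKS; i++) {
        if (table[i].used && pool + table[i].off == (uint8_t *)p) {
            table[i].used = 0;
            return 1;
        }
    }
    return 0;                          /* not a live allocation */
}
```

The cost is the extra table (and the linear descriptor lookup in this toy version), but a wild write through a user pointer can no longer take down the heap's own metadata.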

------
masklinn
> In order to initialize this heap, a section of memory must be provided. In
> this repository, that memory is supplied by malloc (yes, allocating a heap
> via heap). In an OS setting some pages would need to be mapped and supplied
> to the heap (one scenario).

brk/sbrk is a bit of a pain but would it really be more complicated to use
mmap?

~~~
bjourne
mmap isn't available on Windows, and on the platforms where it is available
there can be subtle differences. If you don't care about memory protection,
there is no reason not to use malloc.

~~~
slededit
It's there in spirit. VirtualAlloc is extremely flexible. brk is really
intended for a segmented environment, and it makes it almost impossible to
return committed memory to the OS.

A lot of unices allow overcommit because of this, whereas Windows will fail
the allocation. The Windows design is much better if you care about handling OOM.

------
senatorobama
What's the state of the art in memory allocation?

~~~
imtringued
The application acknowledges that a general-purpose allocator can't be tuned
to its specific needs and therefore implements its own very restrictive,
specialised, simple and fast allocators.

The simplest, and therefore fastest, possible memory allocator is usually
just a contiguous byte array and an offset. On allocation you check whether
there is enough memory left, and if not you request another very large block
of memory from the general-purpose allocator. Otherwise you merely return the
current offset and bump it by the size of the allocation. Of course, the
restriction is that you can only delete all allocations at once or none at
all.
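The scheme described above (often called an arena or bump allocator) fits in a few lines of C. This is an illustrative sketch with made-up names, not code from the linked repository; growing a new block when the arena fills is left as a stub:

```c
#include <stddef.h>
#include <stdint.h>

/* An arena: one contiguous buffer plus an offset. Individual frees are
 * impossible; you can only reset the whole arena at once. */
typedef struct {
    uint8_t *base;   /* start of the backing buffer */
    size_t   cap;    /* total bytes available */
    size_t   off;    /* bytes handed out so far */
} arena_t;

static void arena_init(arena_t *a, void *buf, size_t cap) {
    a->base = buf;
    a->cap  = cap;
    a->off  = 0;
}

/* Round up to 8-byte alignment, check remaining space, bump the offset. */
static void *arena_alloc(arena_t *a, size_t n) {
    size_t aligned = (a->off + 7) & ~(size_t)7;
    if (aligned + n > a->cap)
        return NULL;    /* a real version would grab a fresh block here */
    a->off = aligned + n;
    return a->base + aligned;
}

/* "Delete all allocations at once" -- the only free an arena supports. */
static void arena_reset(arena_t *a) {
    a->off = 0;
}
```

Allocation is a comparison and an addition, which is why per-frame or per-request arenas are so popular in games and servers.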

Generational garbage collectors go one step further. They can detect whether
an object is alive or dead, which makes it possible to keep live objects and
move them into a different memory area. Unfortunately the garbage collector
has to stop all mutator threads, since otherwise the mark-and-sweep, copying
and compaction phases would not work. Some concurrent garbage collectors
exist, but they are not widely used.

------
samblr
Some time ago I came across the jemalloc and tcmalloc memory allocators while
solving a performance limitation of a platform (in a multithreaded
environment). I think it was jemalloc that gave a real boost in performance.
Worth trying them too.

------
joelthelion
Could you use this as a malloc/free replacement or is it missing something?

~~~
masklinn
Aside from not being a good allocator, it's implemented on top of an existing
malloc so it doesn't actually work.

~~~
apenwarr
Removing the calls to existing malloc on an embedded system would be trivial.
Just point it at a predefined RAM area instead. It uses malloc only so you can
play with it on a full featured OS, not because they left something out.

------
bjourne
I suspect that some of the functions for iterating the linked lists are a big
drag on performance. Such as:
[https://github.com/CCareaga/heap_allocator/blob/master/llist...](https://github.com/CCareaga/heap_allocator/blob/master/llist.c#L68)
An optimization that most memory allocators use is to instead store the nodes
in a balanced tree, such as a red-black tree, or even better, a B-tree. That
turns those functions from O(n) to O(log n).

~~~
eps
> _An optimization that most memory allocators use..._

Got any evidence to support this claim? I call b/s, especially on the "most"
part.

Balanced trees introduce an _extra_ per-node overhead, they tend to thrash the
cache and are generally inferior to a simple array of double-linked lists
indexed by the block size range.

* An AVL tree doesn't have the per-node overhead, but it is very cache-unfriendly.
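The "array of double-linked lists indexed by the block size range" mentioned here is usually called segregated free lists. A minimal sketch (illustrative names, power-of-two size classes assumed) might look like this; freed blocks are filed under the largest class capacity they cover, so any block found at or above a request's class is guaranteed to fit:

```c
#include <stddef.h>

#define NUM_CLASSES 8   /* class c holds blocks of at least 16 << c bytes */

typedef struct free_node {
    struct free_node *prev, *next;
    size_t size;
} free_node;

static free_node *bins[NUM_CLASSES];

/* Largest class whose capacity the block still covers (round down). */
static int floor_class(size_t n) {
    int c = 0;
    while (c + 1 < NUM_CLASSES && ((size_t)16 << (c + 1)) <= n)
        c++;
    return c;
}

/* Smallest class whose capacity satisfies the request (round up). */
static int ceil_class(size_t n) {
    int c = 0;
    while (c < NUM_CLASSES - 1 && ((size_t)16 << c) < n)
        c++;
    return c;
}

/* Push a freed block onto the head of its bin -- O(1). */
static void bin_push(free_node *b) {
    int c = floor_class(b->size);
    b->prev = NULL;
    b->next = bins[c];
    if (bins[c])
        bins[c]->prev = b;
    bins[c] = b;
}

/* Pop a block big enough for n, scanning larger bins if the exact
 * class is empty -- a bounded scan, no tree rebalancing. */
static free_node *bin_pop(size_t n) {
    for (int c = ceil_class(n); c < NUM_CLASSES; c++) {
        if (bins[c]) {
            free_node *b = bins[c];
            bins[c] = b->next;
            if (b->next)
                b->next->prev = NULL;
            return b;
        }
    }
    return NULL;
}
```

Both push and pop touch only list heads, which is the cache-friendliness argument: no pointer-chasing down a tree, just an index computation and one or two writes.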

~~~
bjourne
How about the memory allocators used by Linux, Windows, BSD and Solaris? :)

It doesn't matter if the allocator uses buckets (indexed by the block size
range) or not, it still has to find blocks of suitable size, for some
definition of "suitable." So look at his get_best_fit and add_node functions.
These ensure that his alloc and free functions are O(n) at best. Clearly, we
can do better.

~~~
eps
What about them?

Are you making an educated guess that there should be a balanced tree
somewhere in there? If yes, then it's a wrong guess, because there are better
options that aren't based on a boatload of conditional branching and that
_are_ routinely used in a lot of allocators. If no, it shouldn't be hard to
produce the relevant code segments, especially since these trees are
virtually everywhere, as per your opening remark.

~~~
bjourne
See f.e:

Address spaces in modern operating systems. Address spaces can consist of a
large number of virtual address regions. As one example, on a typical Linux
desktop, GNOME applications and web browsers such as Firefox and Chrome use
nearly 1,000 distinct memory regions in each process. To manage this large
number of regions, most modern operating systems use structures like the ones
in Figure 1 to represent an address space. Linux uses a red-black tree for the
regions, FreeBSD uses a splay tree, and Solaris and Windows (prior to Windows
7) use AVL trees [18, 24].

[https://people.csail.mit.edu/nickolai/papers/clements-
bonsai...](https://people.csail.mit.edu/nickolai/papers/clements-bonsai.pdf)

If you know of any better search structures, then I'd love to hear about them.
I wrote my own memory allocator for a vm project and found that storing the
free list in a red-black tree was absolutely required for decent performance.

~~~
eps
This is unrelated, you are switching the subject. The context is an
application-level memory allocators, not the underpinnings of the virtual
memory systems.

You said -

> _An optimization that most memory allocators use are to instead store the
> nodes in a balanced tree, such as a red-black one_

Yet, you still haven't shown a single memory allocator that actually does
that. Except for your own.

~~~
bjourne
Now I'm beginning to think that you are trolling me. Memory allocators within
the kernel work exactly the same as those outside of it. But for one stand-
alone library using balanced trees, look at jemalloc.

~~~
eps
Okay. "Exactly the same as outside of it."

jemalloc uses trees (off the fastpath nonetheless), but hoard doesn't,
dlmalloc doesn't, etc.

Common sense derived from CS101 data structures doesn't directly translate to
what happens in the real world. Your opening comment was pure armchair
athletics.

~~~
bjourne
Most memory managers worth their salt tend to cache commonly used sizes,
that's why tree accesses aren't on the fast path.

I have now looked at Hoard. It does use std::map which is indeed a red-black
tree.

------
saurik
I was extremely happy to see this, as I have been trying to find a single-
threaded memory allocator (one with no reliance on any form of threading
primitives, which I simply don't have on the platform I am dealing with -- I
point this out to make clear that I am not looking for this out of any
delusion about performance). But since the code doesn't have a license, it is
essentially useless :(.

~~~
yxhuvud
Did you consider opening an issue about it?

------
Keyframe
Neat! See also TLSF
[https://github.com/mattconte/tlsf](https://github.com/mattconte/tlsf)

------
mbrumlow
A silly allocator I wrote for fun one night is also under 200 lines of code.

[https://github.com/mbrumlow/toyos/blob/master/kernel/heap.c](https://github.com/mbrumlow/toyos/blob/master/kernel/heap.c)

Admittedly mine has no tests and doesn't do anything fancy -- I just needed a
simple allocator so I could move forward with a toy kernel.

------
8fingerlouie
Reminds me of an old project of mine for an embedded platform that needed a
memory allocator. Also around 200 lines if you leave out the stdlib functions.
[https://github.com/jinie/memmgr/blob/master/memmgr.c](https://github.com/jinie/memmgr/blob/master/memmgr.c)

------
pjc50
The implementation of malloc in K&R is about 20 lines...

~~~
masklinn
It's closer to 50 lines if you don't exclude the morecore helper, and that's
using the pretty opaque, outdated and golfy style of K&R itself.

