
A quick tutorial on implementing C memory management functions - redraga
http://danluu.com/malloc-tutorial/
======
k4st
I semi-recently implemented a malloc tester (am TAing an OS course). Students
write their own implementation of malloc and free, and then I "intercept"
their calls to brk and sbrk by using macros to re-define those symbols as my
own versions of those functions.

It is quite nice: I have knobs to "fuzz" the alignment of returned addresses
from sbrk, as well as to inject arbitrary EAGAINs, to see if they're following
the man pages. Another thing that I looked for was re-entrancy. This was a bit
hand-wavy, but ultimately took the form of a try-lock in my sbrk
implementation, where if the try-lock fails, their invocations of sbrk are not
being synchronized.

Another thing I did to evaluate their implementations was to poison the entire
unallocated and allocated sbrk space, poison all allocated memory blocks that
my test runner allocates, and poison all freed memory blocks before my test
runner frees memory. This all enabled me to estimate the meta-data overhead of
their malloc implementation at a byte-granularity. That is, I can search the
sbrk space for the various poisons, and then report on their relative
frequency. If the sum of the frequencies is under 100%, then I claim the
remaining percent as meta-data overhead.

The poisoning also also allowed me to detect simple bugs: if they write beyond
the returned sbrk pointer, then I can potentially catch that because the
values in memory don't match the expected poison.

Overall, it was fun to do and I could present some nice stats. For example, I
produced a histogram of malloc call sizes and sbrk call sizes. This allowed
one to quickly see if they're invoking sbrk unnecessarily often.

~~~
aktau
This sounds like a generally useful thing. It would wager that any non-toy
malloc implementation has such a thing. Perhaps it's a good idea to open-
source it so we all benefit from this spec-testing, fuzzing and histogramming?

------
nkurz
Great tutorial, Dan! Since it sounds like you are planning to continue the
series, I have a few thoughts on potential directions to go with it.

First, a link to Doug Lea's classic malloc page might be a good addition to
the resources section. His dlmalloc() is the basis for GCC's current ptmalloc.
His code is wonderfully clear and commented, both for the implementation and
the rationale behind it:
[http://g.oswego.edu/dl/html/malloc.html](http://g.oswego.edu/dl/html/malloc.html)

Second, I wonder if it would make sense to jump straight to using mmap()
instead of the classic brk()/sbrk(). I think it's no more complicated, has
more uses elsewhere, is conceptually more portable, and allows multiple
arena's to be added in a straightforward way. Are there advantages I'm not
seeing to sticking to the ancient ways?

Last, on the debugging side, I think you might want to start with an
introduction to Valgrind rather than gdb. It's a much easier learning curve,
and even for an expert it's often the better tool for the memory allocation
type bugs that are going to be most common here. Alternatively (or
additionally) some examples of the more modern Address Sanitizer that's now in
GCC and CLang would be slick: [https://code.google.com/p/address-
sanitizer/wiki/AddressSani...](https://code.google.com/p/address-
sanitizer/wiki/AddressSanitizer)

~~~
duaneb
> Second, I wonder if it would make sense to jump straight to using mmap()
> instead of the classic brk()/sbrk().

Yes, it will. brk/sbrk are terrible ways to allocate anything but more stack.
Use mmap instead, both for more control over the layout in memory, and for
more portability.

~~~
nkurz
Perhaps you know: on Linux, how do mmap() and brk() differ as far as page
initialization? Does brk() conceptually include the effects of MAP_POPULATE?
Do both of them cause virtual to physical memory mapping at the time of the
call?

~~~
duaneb
I'm not sure as to the specifics, because brk is a strict subset of mmap;
however, I would be surprised if it doesn't use the full VM substructure to
allocate the mapping immediately.

I would amend my statement before to say that, if you're looking into memory
allocation, conceptually, mmap is where everything happens these days. brk is
kept for backwards compatibility.

------
Animats
There is a much better discussion of this in vol. I of Knuth.

(Amusingly, in the original edition Knuth uses a simple algorithm (linking the
allocated spaces) in the text and leaves the better algorithm (linking the
free spaces) to an exercise. In Unix V6 (yes, this really dates me), the
"malloc" in the C library used the simple algorithm from Knuth, variable names
and all, causing O(N^2) performance problems.)

~~~
djdjenej
Fascinating---do you have a pointer to the story or some discriminating
keywords?

~~~
carussell
Here go:
[http://tech.slashdot.org/comments.pl?sid=26793&cid=2895270](http://tech.slashdot.org/comments.pl?sid=26793&cid=2895270)

------
emcrazyone
The HN title doesn't match the article title "A Quick Tutorial on Implementing
and Debugging Malloc, Free, Calloc, and Realloc" but from the article it is
obviously C.

The mentioning of sbrk in the article seems to do so in passing. Nothing
describes what it is. I would spend time on it.

I would also address virtual memory and at least let the reader know that the
underlying operating system is ultimately responsible for allocating memory.
When a modern OS uses virtual memory, you get a virtual memory pointer which
would be different than the physical address.

Linux, in particular, uses a process referred to as Optimistic Memory
Allocation to honor malloc requests.

I mention all this because I think anyone interested in these lower level
details would also be interested in how the OS and hardware are involved.

------
yason
I have "a thing" for basic system services like malloc, linker, etc.

Writing a memory allocator is a nice exercise but it's also fundamental
research: by toying around, because of sheer interest, with what everyone else
takes for granted you might come up with something that changes things for
all, big time.

For example, memory allocators were considered a "well enough solved" problem
for a long time until suddenly we had a rush of new, experimental, and/or
optimized allocators like jemalloc, tcmalloc etc. While they might not be
revolutionary they still blast the old 80's/90's implementations like no
tomorrow.

~~~
Animats
Linkers don't get as much attention as they should. The linker has the last
chance to look at the entire program before it runs. It's a good place for
final checks and optimizations.

In Modula, each module had an "init" section. The Modula linker traced the
dependencies of the "init" sections and arranged for them to be executed in a
valid order. If there was a dependency loop, that was an error. That's way
ahead of the C/C++ rules on order of initialization.

Elimination of duplicate code at link time is possible. For template and
generic instantiations, this can be a big win, especially for C++.

Some linkers support "weak links" \- if A has a weak link to B, and there are
no strong links to B, B is not loaded and the link is null. This is useful for
optional components which need some kind of startup.

~~~
nitrogen
Vegedor, it looks like an early negative comment got your account auto-killed.

Quoting dead comment:

 _vegedor 17 hours ago | link [dead]

GCC with --flto, literally link-time-optimizations, is new to me and seems to
be good. It turns everything it can into constants, even inlines functions
from pointer calls and resolves them if the arguments are also constant. That
didn't work for me without it. Dunno if that's std compliant._

------
ccurtsinger
There are quite a few extra issues that come up when you implement a usable
allocator. I'm surprised the article didn't mention them. Here are just a few:

Alignment: different systems have different minimum alignment requirements.
Most require all allocations are 8 or 16 byte aligned.

Buffer overruns: using object headers is risky, since a buffer overrun can
corrupt your heap metadata. You'll either need to validate heap metadata
before trusting it (e.g. keep a checksum) or store it elsewhere. This also
wastes quite a bit of space for small objects.

Size segregation: this isn't absolutely essential, but most real allocators
serve allocations for each size class from a different block. This is nice for
locality (similarly-sized objects are more likely to be the same type,
accessed together, etc). You can also use per-page or per-block bitmaps to
track which objects are free/allocated. This eliminates the need for per-
object headers.

Internal frees: many programs will free a pointer that is internal to an
allocated object. This is especially likely with C++, because of how classes
using inheritance are represented.

Double/invalid frees: you'll need a way of detecting double/invalid frees, or
these will quickly lead to heap corruption. Aborting on the first invalid free
isn't a great idea, since the way you insert your custom allocator can cause
the dynamic linker to allocate from its own private heap, then free these
objects using your custom allocator.

Thread safety: at the very least, you need to lock the heap when satisfying an
allocation. If you want good performance, you need to allocate objects to
separate threads from separate cache lines, or you'll end up with false
sharing. Thread segregated heaps also reduce contention, but you need a way of
dealing with cross-thread frees (thread A allocates p, passes it to B, which
frees it).

The HeapLayers library is very useful for building portable custom allocators:
[https://github.com/emeryberger/Heap-
Layers](https://github.com/emeryberger/Heap-Layers). The library includes
easily-reusable components (like freelists, size classes, bitmaps, etc.) for
building stable, fast allocators. HeapLayers is used to implement the Hoard
memory allocator, a high performance allocator optimized for parallel
programs.

------
zellyn
A (timeless) shoutout to microcorruption.com - you'll learn far more than you
wanted to about malloc and free, all while having a lot of fun :-)

------
clarry
I think fixing the calloc example and explaining it would be a perfect
opportunity to educate people about the dangers of integer overflows.

