Hacker News new | past | comments | ask | show | jobs | submit login
Memory Allocators 101 – Write a simple memory allocator (2016) (arjunsreedharan.org)
271 points by DonbunEf7 5 months ago | hide | past | web | favorite | 51 comments

FreeRTOS also has a nice collection of different heap implementations with varying degrees of sophistication. They are a good read for those interested in such things.



For the uninitiated, RTOS == Real-Time Operating System. Real-time OSes have constraints that don’t exist in non-real-time OSes.

RTOS on Wikipedia: https://en.m.wikipedia.org/wiki/Real-time_operating_system

K&R includes an example memory allocator as well: https://stackoverflow.com/questions/13159564/explain-this-im...

To me, the K&R one seems much less readable and also doesn't include the global malloc lock, since even the updated edition of K&R predates standardised threading.

I note it also does the "bp = (Header *)ap - 1;" trick, so if that's undefined behaviour then it's a good example of how hard it is to write C without relying on UB.

Good timing. I literally had to do this on a whiteboard in an interview about 11 hours ago.

Interesting things to consider: Fragmentation prevention, real-time performance, minimizing locking(lock-free techniques, or per-thread free lists), and reusing the freed memory to contain the free list structure. I basically started out whiteboarding what the article lays out and by the end of the interview realized everything wrong with it. It's a good starting point though.

If I may ask, what position/designation you were interviewing for?

I was asked this as well few months ago for a New Graduate Software Engineer position to work on embedded avionics.

Embedded/Platform engineering, working on a new OS based around a microkernel.

Please don't write custom memory allocators for production code without seriously considering the security implications of doing so. The glibc allocator has benefited from years of security hardening against very creative attack vectors.

You don't want a simple double-free to lead to an RCE bug, do you...

I gotta say this is a very topical read. Was messing around last week or so trying to learn some x86 Assembly from scratch (Linux subsystem on Windows,) and memory was very much a sticking point. Seeing this is helping me grok just what is sorta going on, which is the best way for me to learn I think.

While I strongly support the idea that no one should write a generic allocator in production, writing it as an exercise is a very good idea.

The article looks at lot like the tutorial I wrote a long time ago ... (Every now and then, I see my old PDF in post about implementing allocators, which is disturbing since I wrote it in a hurry as a quick support for a lecture and I found it very poorly written ... )

I think it's interesting to note that using sbrk(2) is a bit deprecated, it's way easier to implement malloc(3) using mmap(2) ...

There's also better ways to handle pointers arithmetic and list management. Someday I'll put only cleaner version only ... Someday ...

It's worth noting that

    realloc(ptr, 0) 
behavior is undefined. The vast majority of modern C libraries will implement it as

    return free(ptr), NULL;
and it will be documented on man pages as such, but there are systems where this will be equivalent to

    return free(ptr), malloc(0);
Furthermore, in theory, this is also permitted:

    return NULL;
so as tempting as realloc() might be as a single override for implementing custom allocators, there are some worms.

Realloc's behaviour is well-defined: https://port70.net/~nsz/c/c11/n1570.html#

If the size of the space requested is zero, the behavior is implementation-defined: https://port70.net/~nsz/c/c11/n1570.html#7.22.3

Being implementation defined is hardly well defined.

Well, it's the return value in realloc(ptr, 0) case that is implementation-specific, but its behavior _is_ well-defined - it is expected to "deallocate the old object".

My point was that in reality the behavior varies. "Undefined" in a common tongue sense, not in the terms of the C standard.

No... maybe? It's probably arguable, even if it's not an interesting argument.

Either way, realloc's behaviour is well-defined when the pointer is null.

Knuth in one of his volumes ( I think vol I) has described this very nicely. way better than what was in K&R.

  header = (struct header_t*)block - 1;
Isn't this UB?

The cast is defined by ISO/IEC 9899:1990 §, since block is a pointer to void, and struct header_t is an object type:

"A pointer to void may be converted to or from a pointer to any incomplete or object type."

The subtraction is defined by §, provided that block points to an element of a large enough array object:

"When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression."

(There's similar text in other versions of the C standard.)

That part of the standard only covers the cast. It means that you won't mangle a pointer if you cast it to a void pointer and then back to the original pointer type.

Accessing the data that is being pointed at is another matter entirely. You must satisfy alignment constraints. You also must not read any memory as a type other than what it was written as, aside from a very limited exception for type char.

There doesn't seem to be any pointer dereference on the line that paavoova quoted, so I don't see how your comment applies.

The trouble is that it is very easy to interpret things the wrong way. You showed that the casting is OK. People will tend to wrongly assume that they are home free at that point, and everything will be standards-compliant. Most people don't realize that the dereference itself can be a problem. Casting is very frequently followed by non-compliant dereferences. The gcc warnings about strict aliasing do not catch all the problems. Adding more casts, including one to a void pointer, is a common way to make warnings go away without actually stopping the compiler from breaking non-compliant code.

Looking at the full code on the web site, I think it is compliant but dangerous. It is decently likely to trigger gcc bugs.

Not quite, (void *) is special in this regard IIRC, specifically for cases like this. It's a valid pointer but you're required to ensure that any platform requirements like alignment and size will match the requirements of what you're doing.

No, (void⁎) is not special in this way. It just makes things look nicer.

The code also isn't undefined behavior... but you are really asking to hit compiler bugs! This is an easy way to confuse gcc into wrongly determining that the code has undefined behavior, and if gcc gets confused then it may determine that a code path can't be taken. Code paths that can't be taken may be deleted.

The main rule here is that memory has a type which is determined by what was last written into it, and you may only read or examine the memory using that type. (for the type, we ignore attributes like the distinction between signed and unsigned) There is a minor exception that is just enough to implement something like memcpy by using a (char⁎) to read and then write as a char. You still aren't supposed to look at that char. These rules apply to memory accessed via pointers, no matter how you cast them, and to memory accessed via union members.

Real compilers differ from that:

Every compiler I'm aware of will not enforce the rules for unions. The gcc compiler promises not to enforce the rules in this case.

Every compiler I'm aware of will let you look at any data that has been read as a char, so the memcpy trick works and you can do things like determine endianness at runtime.

It is legit to initialize a type X variable, take the address of it, cast it from (X⁎) to (Y⁎), pass it through arbitrary data structures and functions to hide the origin from the compiler, cast the (Y⁎) back to (X⁎), and then access the type X variable. If you do this, gcc may generate bad code.

Is there a way then to write compliant/non-UB/non-buggy memory allocator/GC in C/C++?

The moment you call sbrk or mmap, you're outside of standard C, so no. Treating a pointer as an integer in order to mess with the bits is also a violation.

Aside from that, the style used here is probably OK. It is hard to say what exactly would trigger the gcc bugs, but I'm pretty sure that a recent gcc would be OK for this code.

> Treating a pointer as an integer in order to mess with the bits is also a violation.

Violation of what exactly? Converting a pointer to an integer, and vice versa, is implementation defined. As long as you're not trying to write implementation-independent code, it's perfectly fine.

Sure, you can convert, but the whole point of converting is to do things that violate the C standard. In theory, a standard-conforming C implementation could have the bits of a pointer be encrypted by the CPU. There is nothing meaningful that you could do to those bits. Rounding pointers up or down for alignment is impossible in standards-conforming C code.

So UB, then? All complaint solutions I could find to this use memcpy() instead. Some use __attribute__ ((__packed__)) for structs, but that seems to have its own gotchas (https://stackoverflow.com/a/7956942), but not relevant here.

You can't use memcpy here, you have to modify the original memory to mark it as freed. You also can't allocate a new buffer to do the copy into because you are the allocator.

The C standard specifically states that a cast (T ) to and from (void ) is validly defined behavior and that you must get the original pointer back. It's also valid to go from (T ) to (U ) and back again if and only if T and U have the same alignment requirements. (void ) is required to have no alignment requirements because you can't directly de-reference it since the result would have type (void).

__attribute__((__packed__)) is a completely different topic about how the layout of the struct gets decided and what padding might be used.

What's going on here is that malloc() gives the caller a (void ) that points to one position past a (struct header_t ) and then free is casting it back from (void ) to (struct header_t ) and then going back one element in the set to get the original header with the meta-data about the allocation, this is perfectly fine because the pointer is never anything other than (void ) or (struct header_t ) as far as the language is concerned. The caller might turn the element after the (struct header_t ) into something else but the original (struct header_t *) is always the same.

"You also can't allocate a new buffer to do the copy into because you are the allocator."

You can use the stack in this case.

> The caller might turn the element after the (struct header_t ) into something else

Isn't this exactly the issue with this malloc implementation? The pointer returned that points past the header by sizeof(header) may not be aligned for subsequent types.

This is a problem inherit to all malloc() implementations, since it never gets any type information it can't ever make any adjustments about alignment of the final type that's involved. I'm not actually sure what the fully correct way to ensure that would be.

You too are losing your asterisks. Maybe a bit of Unicode can help. The only thing not stripped out by Hacker News seems to be this:

⁎ e2 81 8e

Damn markdown :). Looks like it's far too late for me to fix it now, but I'll have to keep that in mind in the future.

( * Or an asterisk surrounded by two spaces; * )

The piece of memory was previously allocated with a struct header_t at sizeof(struct header_t) bytes below the passed in block pointer.

So just recovering and using a pointer to that struct header_t from the block pointer in this way is fine.

So I suppose that header = (struct header_t*)(block - 1); would be UB then.

Haven't read the article but if block is void *, this won't compile because sizeof(void) is not defined. And they are probably trying to subtract the sizeof(struct header) hence the expression in your original comment.

I think it would be better to go with char* instead of void*. A much better solution is to use a hashtable for your blocks, but I understand this is just a toy example.

I think he also forgot to implement memalign(3) and posix_memalign(3).

Is there any effort to extract ORCA from Pony (which is better than Erlang's and Azul's C4)?

Good for didactic purposes but in the real world you may want to try an allocator like tcmalloc or jemalloc.

In the past I've sometimes achieved 20% or better speedups by writing custom allocators (speedup on overall performance, not just malloc performance). tcmalloc or jemalloc are great for the general case, but sometimes you know invariants about object sizes, alloc patterns and free patterns that allow much more performant allocation.

The simplest case is if you know that you will free everything at once, or nothing at all. This allows you to eliminate most bookkeeping and allows a completely lock-free architecture. But there are also more complex cases where you can still get big benefits from exploiting known invariants.

> but sometimes you know invariants about object sizes, alloc patterns and free patterns that allow much more performant allocation.

jemalloc allows you to query these invariants if you don't know them, and to use the information to re-configure the allocator to match them :/

I'm pretty sure most modern allocators allow you to do this as well.

> The simplest case is if you know that you will free everything at once, or nothing at all.

That's pretty much a one liner with jemalloc.

Or even the standard memory allocator provided by your system. I'm pretty sure this article was meant as a way to understand how malloc works and not as a high-performance replacement for the one you're currently using.

I’ve heard many gamedevs make specialized allocators for some of their code (the most being arena allocators)...

It's fairly common to avoid naked malloc()/free() in systems with real-time requirements. Memory pools are a great way to go if you want deterministic behavior and better reliability.

This. If you're going to malloc/free equally sized objects often but randomly, a pool allocator can be a great improvement.

With real time requirements you often care about your response time or worst case execution time. In some areas of embedded, safety critical systems you're usually prohibited from using heap at all (instead, stuff is put in global variables or on the stack - so you're only growing in one direction).

All major game engines have custom memory allocator.

Not sure why you are downvoted, but this is a toy allocator.

It's downvoted because it's missing the entire point of this article?

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact