
Memory Allocators 101 - bradly
http://jamesgolick.com/2013/5/15/memory-allocators-101.html
======
js2
And if you want to descend down the rabbit hole - "What every programmer
should know about memory":

<http://www.akkadia.org/drepper/cpumemory.pdf>

The serialized version is at <http://lwn.net/Articles/250967/>

~~~
npsimons
Another good read along these lines:
<http://gee.cs.oswego.edu/dl/html/malloc.html>

------
silentbicycle
I had to implement my own malloc/free/etc. on an embedded project recently,
due to a lack of thread-safety in the vendor's C stdlib*. I wrote a blog
post about it ([http://spin.atomicobject.com/2013/04/17/embedded-memory-unde...](http://spin.atomicobject.com/2013/04/17/embedded-memory-undefined-behavior/)),
with a couple of lessons I learned along the way.

* There were compiler flags to generate a version of the stdlib with user-provided mutex functions for an RTOS. I did so, then confirmed (by reading the stdlib's malloc's disassembly) that it never called them!

------
tptacek
This survey paper has been one of my favorite papers (using the term loosely;
it's like a small book) since 1996 or so; it's part of what transitioned me
from network administration with a sideline in development to fulltime
software development:

[http://www.cs.cmu.edu/afs/cs/academic/class/15213-f98/doc/ds...](http://www.cs.cmu.edu/afs/cs/academic/class/15213-f98/doc/dsa.pdf)

------
mrbrowning
Bryant and O'Hallaron's _Computer Systems: A Programmer's Perspective_ has a
pretty solid overview of memory allocation strategies at just the right level
of abstraction for someone who's just getting into this kind of thing: doubly-
vs. singly-linked free lists, first-fit vs. best-fit vs. segregated fits and
so on. That was the book I used in my Intro to Systems Programming class in
college and I really loved it.
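To make the free-list vocabulary concrete, here is a minimal sketch in C of a singly-linked, address-ordered free list with a first-fit search, block splitting, and coalescing on free. The fixed 4 KiB static arena, the 16-byte alignment, and the names `ff_malloc`/`ff_free` are illustrative assumptions, not taken from the book or the linked article.

```c
#include <stddef.h>

/* Sketch: first-fit allocator over a fixed static arena (sizes are assumptions).
 * Every block starts with an inline header; free blocks form a
 * singly-linked list kept in address order so neighbours can merge. */

#define ARENA_SIZE 4096
#define ALIGN 16
#define ALIGN_UP(n) (((n) + ALIGN - 1) & ~(size_t)(ALIGN - 1))

typedef struct block {
    size_t size;          /* payload bytes that follow the header */
    struct block *next;   /* next free block; meaningful only while free */
} block_t;

#define HDR_SIZE ALIGN_UP(sizeof(block_t))

static _Alignas(ALIGN) unsigned char arena[ARENA_SIZE];
static block_t *free_list;
static int initialized;

static void ff_init(void) {
    free_list = (block_t *)arena;            /* one free block spanning the arena */
    free_list->size = ARENA_SIZE - HDR_SIZE;
    free_list->next = NULL;
    initialized = 1;
}

void *ff_malloc(size_t n) {
    if (!initialized) ff_init();
    if (n == 0 || n > ARENA_SIZE) return NULL;
    n = ALIGN_UP(n);
    block_t *prev = NULL;
    for (block_t *b = free_list; b; prev = b, b = b->next) {
        if (b->size < n) continue;           /* first fit: take the first block big enough */
        block_t *next = b->next;
        if (b->size >= n + HDR_SIZE + ALIGN) {   /* split off the unused tail as a new free block */
            block_t *rest = (block_t *)((unsigned char *)b + HDR_SIZE + n);
            rest->size = b->size - n - HDR_SIZE;
            rest->next = next;
            next = rest;
            b->size = n;
        }
        if (prev) prev->next = next; else free_list = next;   /* unlink b */
        return (unsigned char *)b + HDR_SIZE;
    }
    return NULL;                              /* nothing large enough */
}

void ff_free(void *p) {
    if (!p) return;
    block_t *b = (block_t *)((unsigned char *)p - HDR_SIZE);  /* header sits just before the payload */
    block_t *prev = NULL, *cur = free_list;
    while (cur && cur < b) { prev = cur; cur = cur->next; }   /* find the address-ordered slot */
    b->next = cur;
    if (prev) prev->next = b; else free_list = b;
    if (cur && (unsigned char *)b + HDR_SIZE + b->size == (unsigned char *)cur) {
        b->size += HDR_SIZE + cur->size;     /* merge with the following free block */
        b->next = cur->next;
    }
    if (prev && (unsigned char *)prev + HDR_SIZE + prev->size == (unsigned char *)b) {
        prev->size += HDR_SIZE + b->size;    /* merge with the preceding free block */
        prev->next = b->next;
    }
}
```

Best fit would change only the search loop: walk the whole list and remember the smallest block with `size >= n`, which can leave smaller leftover fragments at the cost of always scanning every free block.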

------
bcantrill
It continues to amaze me how many workloads can be dominated by malloc()
performance. In particular, last summer we at Joyent found that a production
node.js load was surprisingly dominated by small object malloc()/free()
performance -- and achieving maximum performance ultimately required not just
adding a per-thread cache, but cycle bumming the hell out of it.[1] Point is:
malloc() still very much matters, despite (or maybe because of) modern
interpreted environments.

[1] [http://dtrace.org/blogs/rm/2012/07/16/per-thread-caching-in-...](http://dtrace.org/blogs/rm/2012/07/16/per-thread-caching-in-libumem/)
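For readers unfamiliar with the idea, a per-thread cache sits in front of a shared allocator: frees push small objects onto a thread-local list, and later allocations pop them without taking any lock. The C sketch below assumes a single 64-byte size class, a cap of 32 cached objects, and hypothetical names `tc_alloc`/`tc_free`; it illustrates the general technique and is not libumem's implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Sketch: per-thread cache for one small size class (assumed 64 bytes).
 * Freed objects go onto a thread-local list; allocations pop from it
 * lock-free. Misses and overflow fall back to the shared malloc/free. */

#define OBJ_SIZE   64
#define CACHE_MAX  32   /* cap so one thread cannot hoard freed memory */

typedef struct cached { struct cached *next; } cached_t;

static _Thread_local cached_t *tc_head = NULL;
static _Thread_local size_t tc_count = 0;

void *tc_alloc(void) {
    cached_t *obj = tc_head;
    if (obj) {                     /* fast path: thread-local, no lock */
        tc_head = obj->next;
        tc_count--;
        return obj;
    }
    return malloc(OBJ_SIZE);       /* slow path: shared allocator */
}

void tc_free(void *p) {
    if (!p) return;
    if (tc_count < CACHE_MAX) {    /* keep the object for this thread's next allocation */
        cached_t *obj = p;
        obj->next = tc_head;
        tc_head = obj;
        tc_count++;
        return;
    }
    free(p);                       /* cache full: hand it back to the shared allocator */
}

size_t tc_cached(void) { return tc_count; }
```

The cap is a design choice: without it, a thread that frees many objects it never reuses keeps them out of reach of every other thread.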

~~~
seanmcdirmid
I am sort of horrified by a garbage collected VM that uses small malloc/free
calls to manage memory. I also can't believe the linked article doesn't
mention what VM the node.js code was running on.

Edit: reading more of the linked articles, these benchmarks seem to be running
on V8? Which seems to be the only platform that node.js runs on...facepalm.
Still confused why V8 would be using standard malloc/free calls.

~~~
akkartik
_malloc_ doesn't have to suck: <http://www.cs.umass.edu/%7Eemery/pubs/berger-oopsla2002.pdf>

~~~
JoachimSchipper
If you mean "Lea malloc is pretty fast", ok; but going to regions, while an
enormous win in some cases, is moving the goalposts.

(Still, thanks for posting the paper - the reaps concept is very nice. I did
expect regions to win by more than they did, though. Oh well.)
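For context, a region (or arena) allocator hands out memory by bumping an offset and releases everything in a single operation, which is where its speed comes from. A minimal C sketch with one fixed-capacity buffer and hypothetical names `region_init`, `region_alloc`, `region_reset`, and `region_destroy`; the reaps in the paper additionally allow freeing individual objects, which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdlib.h>

/* Sketch: region allocator. Objects are carved out by bumping an offset
 * and are never freed individually; the whole region is reset at once. */

typedef struct {
    unsigned char *base;  /* start of the region's memory */
    size_t used;          /* bytes handed out so far */
    size_t cap;           /* total bytes in the region */
} region_t;

int region_init(region_t *r, size_t cap) {
    r->base = malloc(cap);           /* malloc returns suitably aligned memory */
    r->used = 0;
    r->cap = r->base ? cap : 0;
    return r->base != NULL;
}

void *region_alloc(region_t *r, size_t n) {
    size_t align = _Alignof(max_align_t);
    size_t off = (r->used + align - 1) & ~(align - 1);   /* align the bump offset */
    if (off > r->cap || n > r->cap - off) return NULL;   /* region exhausted */
    r->used = off + n;
    return r->base + off;
}

void region_reset(region_t *r) { r->used = 0; }          /* "free" everything in O(1) */

void region_destroy(region_t *r) {
    free(r->base);
    r->base = NULL;
    r->used = r->cap = 0;
}
```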

------
minimax
Memory Allocators 102: The jemalloc paper.

[http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemall...](http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf)

~~~
jamesgolick
More like Memory Allocators 405 haha

~~~
minimax
True enough, but I think the introduction is pretty tractable and contains
some pretty important (but subtle) points about memory allocator performance.
Quoth the paper:

 _It is not sufficient to measure the time consumed by the allocator code in
isolation. Memory layout can have a significant impact on how quickly the rest
of the application runs, due to the effects of CPU cache, RAM, and virtual
memory paging._

 _The only definitive measures of allocator performance are attained by
measuring the execution time and memory usage of real applications. This poses
challenges when qualifying the performance characteristics of allocators.
Consider that an allocator might perform very poorly for certain allocation
patterns, but if none of the benchmarked applications manifest any such
patterns, then the allocator may appear to perform well, despite pathological
performance for some work loads. This makes testing with a wide variety of
applications important. It also motivates an approach to allocator design that
minimizes the number and severity of degenerate edge cases._

------
lgeek
This example implementation should come with a disclaimer: keeping metadata
inside the allocated objects is a bad idea because it pollutes the caches and
increases fragmentation.

I think the problem of memory allocation becomes much more interesting in the
context of multithreaded applications. I really like the streamflow paper[0].
For similar implementations that are more widely used nowadays, see hoard[1],
jemalloc[2] and tcmalloc[3].

Edit: it turns out one of the authors of Streamflow was posting in this thread:
<https://news.ycombinator.com/item?id=5722049>

[0] [http://haiocl.googlecode.com/svn-history/r21/trunk/doc/paper...](http://haiocl.googlecode.com/svn-history/r21/trunk/doc/papers/ismm06.pdf)

[1] <http://www.hoard.org/>

[2] [http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemall...](http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf)

[3] <http://goog-perftools.sourceforge.net/doc/tcmalloc.html>

~~~
scott_s
Thanks for making my day. I would have emailed that to you, but you don't have
any contact information in your profile.

------
null_ptr
Writing my own malloc with an eye out for multi-core performance as part of an
OS university course was one of the most fun assignments I had the pleasure of
working on. It's so satisfying to shed yet another layer of magic and get a
thorough understanding of how the fundamentals work.

~~~
richan90
We did exactly the same thing in my C programming course. It's funny how the
best way to learn is to reimplement functions and algorithms that were
revolutionary 30 years ago =)

~~~
jared314
That is why I love, and advocate, the nand2tetris[0] course. Reimplementing a
simple machine from the nand up gives a new perspective.

[0] <http://www.nand2tetris.org/>

------
sb
Managing free memory is indeed an interesting problem. Besides the simple
free-list approach sketched by the author (sidestepping the first-fit vs.
best-fit search strategy, which affects fragmentation, too), I always found
the buddy-list to be particularly interesting:

<https://en.wikipedia.org/wiki/Buddy_memory_allocation>

Unfortunately, the article is not a very good explanation; I remember having
seen a drawing of the data structure, but cannot recall where...

 _edit_ :

It turns out that the mentioned jemalloc internally uses buddy lists as well.
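The core trick of the buddy system is that a block of size 2^k at offset `off` has its buddy at `off ^ 2^k`, so a block's merge partner is found with one XOR rather than by walking neighbouring blocks. Below is a small C sketch over a 1 KiB pool with 32-byte minimum blocks. The pool size, the per-order arrays of free offsets, and the convention that the caller passes the block's order back to `buddy_free` are simplifying assumptions; real allocators typically record the order themselves.

```c
#include <stddef.h>

/* Sketch: binary buddy allocator handing out offsets into a 1 KiB pool.
 * Level k holds free blocks of size 2^(MIN_ORDER + k). */

#define MIN_ORDER 5                         /* 2^5  = 32-byte smallest block */
#define MAX_ORDER 10                        /* 2^10 = 1024-byte whole pool   */
#define NLEVELS (MAX_ORDER - MIN_ORDER + 1)
#define MAX_BLOCKS (1 << (MAX_ORDER - MIN_ORDER))

static size_t free_off[NLEVELS][MAX_BLOCKS];  /* free block offsets per level */
static int nfree[NLEVELS];

static void buddy_push(int k, size_t off) { free_off[k][nfree[k]++] = off; }

static int buddy_take(int k, size_t off) {    /* remove off from level k if it is free */
    for (int i = 0; i < nfree[k]; i++)
        if (free_off[k][i] == off) { free_off[k][i] = free_off[k][--nfree[k]]; return 1; }
    return 0;
}

void buddy_init(void) {
    for (int k = 0; k < NLEVELS; k++) nfree[k] = 0;
    buddy_push(NLEVELS - 1, 0);               /* one free block: the whole pool */
}

/* Returns the offset of a block of at least n bytes, or (size_t)-1. */
size_t buddy_alloc(size_t n, int *order_out) {
    int k = 0;
    while (k < NLEVELS && ((size_t)1 << (MIN_ORDER + k)) < n) k++;   /* round up to a power of two */
    int j = k;
    while (j < NLEVELS && nfree[j] == 0) j++; /* smallest non-empty level >= k */
    if (j >= NLEVELS) return (size_t)-1;
    size_t off = free_off[j][--nfree[j]];
    while (j > k) {                           /* split, keeping the lower half */
        j--;
        buddy_push(j, off + ((size_t)1 << (MIN_ORDER + j)));  /* upper half becomes a free buddy */
    }
    *order_out = k;
    return off;
}

void buddy_free(size_t off, int k) {
    while (k < NLEVELS - 1) {
        size_t buddy = off ^ ((size_t)1 << (MIN_ORDER + k));  /* flip the size bit */
        if (!buddy_take(k, buddy)) break;     /* buddy in use: stop merging */
        if (buddy < off) off = buddy;         /* merged block starts at the lower address */
        k++;
    }
    buddy_push(k, off);
}
```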

~~~
scott_s
In the allocator I wrote (see the Streamflow heading: <http://www.scott-a-s.com/projects/>),
I used both a free-list and the buddy algorithm.

The free-list was for small object allocations; it maintained free-lists of
different sizes. (Say, the 8 byte free list, the 16 byte free list, the 32
byte free list, etc.) However, I obtained the memory for these lists from a
page manager, and that page manager used the buddy system. The relationship
here is that the small-object part of the allocation obtained its memory from
the page allocator; the page allocator obtained its memory from the operating
system. This system allowed me to have the benefits of a free-list (good cache
behavior for small objects allocated and used together), but low overall
fragmentation and good reuse of the free lists.

Full details are in our paper: <http://www.scott-a-s.com/files/ismm06.pdf> I
have portions of a draft of a more detailed explanation that I never got
around to finishing and publishing.
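The size-class layering described above can be sketched roughly as follows: one free list per power-of-two class from 8 to 2048 bytes, refilled by carving a whole 4 KiB page into equal slots. Here `malloc` stands in for the buddy-based page manager, pages are never returned, and passing the size to `seg_free` is a shortcut; a real allocator would typically find an object's class from per-page metadata. All names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Sketch: segregated free lists, one per power-of-two size class.
 * An empty class is refilled by carving a page into equal-sized slots. */

#define PAGE_SIZE 4096
#define NCLASSES  9                  /* 8 << 0 ... 8 << 8 = 2048 bytes */

typedef struct obj { struct obj *next; } obj_t;
static obj_t *class_list[NCLASSES];

/* Smallest class whose size fits n, or -1 if n is too large for any class. */
int size_class(size_t n) {
    int c = 0;
    size_t sz = 8;
    while (sz < n && c < NCLASSES) { sz <<= 1; c++; }
    return c < NCLASSES ? c : -1;
}

static int refill(int c) {
    size_t sz = (size_t)8 << c;
    unsigned char *page = malloc(PAGE_SIZE);  /* stand-in for the page manager */
    if (!page) return 0;
    for (size_t off = 0; off + sz <= PAGE_SIZE; off += sz) {  /* thread each slot onto the list */
        obj_t *o = (obj_t *)(page + off);
        o->next = class_list[c];
        class_list[c] = o;
    }
    return 1;
}

void *seg_alloc(size_t n) {
    int c = size_class(n);
    if (c < 0) return NULL;          /* large requests would go to the page manager directly */
    if (!class_list[c] && !refill(c)) return NULL;
    obj_t *o = class_list[c];        /* pop the head of this class's list */
    class_list[c] = o->next;
    return o;
}

void seg_free(void *p, size_t n) {   /* caller supplies the size (simplification) */
    int c = size_class(n);
    if (!p || c < 0) return;
    obj_t *o = p;
    o->next = class_list[c];
    class_list[c] = o;
}
```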

------
rlu
By the way, for anyone interested in how the memory manager is implemented in
Windows 8, I watched this video a few months back and remember it being pretty
interesting.

[http://channel9.msdn.com/Shows/Going+Deep/Inside-Windows-8-G...](http://channel9.msdn.com/Shows/Going+Deep/Inside-Windows-8-Greg-Colombo-Heap-Manager)

