
Memory by the Slab: The Tale of Bonwick's Slab Allocator [video] - snw
http://paperswelove.org/2015/video/ryan-zezeski-memory-by-the-slab/
======
bcantrill
This paper brings back many great memories -- it's one of those that I can
remember the cafe I was in when I first read it -- and Ryan has done an
excellent job capturing its importance and influence. Speaking personally, the
influence Bonwick's work had is substantial: it was in part as a result of
being inspired by this paper that I ended up working with Jeff for the next
decade on systems software.

In terms of follow-on work, Ryan mentioned the later libumem work[1], but it's
also worth mentioning Robert Mustacchi's work in 2012 on per-thread caching in
libumem[2]. And speaking for myself, I am indebted to the slab allocator for
my work on postmortem object type identification[3] and on postmortem memory
leak detection; both techniques very much relied upon the implementation of
the slab allocator for their efficacy.

Thanks to Ryan for bringing broader attention to a terrific systems paper --
and truly one that I personally love!

[1] https://www.usenix.org/legacy/event/usenix01/full_papers/bonwick/bonwick_html/

[2] http://dtrace.org/blogs/rm/2012/07/16/per-thread-caching-in-libumem/

[3] http://arxiv.org/pdf/cs/0309037v1.pdf

------
tkinom
With C/C++, one can get A LOT OF performance by writing a custom memory
allocator that fits the usage patterns of a large-scale app.

I designed a custom allocator, via new/delete operator overloading, for a C++
OO app. Think of an app like MS Word: when you open or create a new doc, you
need a lot of malloc() calls. In my case it was usually between a few million
and a few tens/hundreds of billions of records.

There was a lot of overhead in the standard new/delete. After profiling, I
ended up writing my own allocator with the following properties:

* It mallocs 1, 2, 4, 8, 16, 32, or 64 MB at a time (progressively increasing, to optimize the app's RAM footprint for both small and large doc use cases).

* All the large block allocations are associated with the "Doc/DB".

* When the doc closes, only a few large blocks need to be freed. This change took the doc/DB close operation from 30+ seconds for a large Doc/DB to less than 1 second.

* I later modified the allocator to get the large block memory directly from an mmap() call, so all the memory returned is automatically persistent. The save operation also went from 30+ seconds for a large multi-GB DB to < 1 second. (Just close the file and the OS handles all the flushing, etc.)
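The big-block scheme above can be sketched as a simple arena allocator. This is a minimal illustration, not the commenter's actual code: the names (arena_create, arena_alloc, arena_destroy) and the 1 MB starting size are my assumptions; it allocates progressively larger blocks and, on "doc close", frees everything by releasing just the few big blocks, with no per-record free.

```c
#include <stdlib.h>
#include <string.h>

/* One big block; the arena carves small allocations out of it. */
typedef struct block {
    struct block *next;   /* list of big blocks owned by this arena */
    size_t        cap;    /* usable bytes in this block */
    size_t        used;   /* bytes handed out so far */
    unsigned char data[]; /* the block's storage */
} block_t;

typedef struct arena {
    block_t *head;        /* newest block; allocation happens here */
    size_t   next_cap;    /* size of the next block; doubles up to 64 MB */
} arena_t;

arena_t *arena_create(void) {
    arena_t *a = malloc(sizeof *a);
    a->head = NULL;
    a->next_cap = 1u << 20;            /* assumed: start at 1 MB */
    return a;
}

void *arena_alloc(arena_t *a, size_t n) {
    n = (n + 15) & ~(size_t)15;        /* 16-byte alignment */
    if (a->head == NULL || a->head->used + n > a->head->cap) {
        size_t cap = a->next_cap > n ? a->next_cap : n;
        block_t *b = malloc(sizeof *b + cap);
        b->next = a->head;
        b->cap  = cap;
        b->used = 0;
        a->head = b;
        if (a->next_cap < (64u << 20))  /* progressively increase */
            a->next_cap *= 2;
    }
    void *p = a->head->data + a->head->used;
    a->head->used += n;
    return p;                           /* no per-record free exists */
}

/* "Doc close": freeing a handful of big blocks releases everything. */
void arena_destroy(arena_t *a) {
    block_t *b = a->head;
    while (b) {
        block_t *next = b->next;
        free(b);
        b = next;
    }
    free(a);
}
```

Swapping the inner malloc for mmap() of a file would give the persistence trick described above, since the pointers inside the blocks stay valid across the mapping's lifetime.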

Without the ability to customize the memory allocator, plus pointer
manipulation, I can't see how to get similar performance for this type of
large-scale app in Golang, Java, etc.

------
mwcampbell
Perhaps this is naive, but it seems to me that a general-purpose allocator, as
Ryan defines it, is a solution to a problem that doesn't have to exist in
principle. The allocator just needs to be able to move memory blocks around,
so it can compact allocated memory. Probably the most natural way to enable
this in C-ish languages is to use relocatable handles, as implemented by the
original Macintosh memory manager and the Windows GlobalAlloc function (when
using the GHND flag). In this scheme, the allocator doesn't return a pointer,
but a handle, which has to be locked to retrieve a pointer whenever that piece
of memory is being used. As long as most handles are unlocked most of the
time, the memory manager is free to move blocks of memory around when
necessary. In Ryan's parking-lot analogy, it's as if all parked cars could be
moved around the lot at any time to avoid fragmentation. So I wonder if the
inventor of libc's malloc was aware of relocatable handles and simply chose
not to use them. Are there good reasons why we don't use relocatable handles
in C/C++ today? Or is it just the weight of arbitrary history?

