Shuffle vectors guarantee randomized sampling of a known set of elements very cheaply, with the ability to return elements to the set. That seems like it could have some interesting niche uses.
It kind of reminded me of modern Tetris implementations. Instead of picking one tetromino at random for the next piece, they generate random sequences containing each tetromino once. However, this context never requires returning a single element to the set of options - you can just generate a new sequence as a whole once you run out of pieces.
Although... you could "flip" the shuffle vector by swapping with the elements already taken: make a small buffer of seven elements (one for each piece), circularly iterate over it, and whenever you reach element i, swap its value with a random position in the inclusive range from zero to i. This would basically be distributing the Fisher-Yates shuffle over the iterations, rather than running it when the pieces run out.
For some reason though, I doubt that Tetris piece randomization was ever a source of performance instability for these games.
> This would basically be distributing the Fisher-Yates shuffle over the iterations, rather than running it when the pieces run out.
This obviously makes no difference for a sequence of seven tetrominoes, but there may be contexts where one does similar operations over very large sequences, in which case paying the cost lazily could be nice.
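The distributed shuffle can be sketched like this (a minimal Python sketch of the idea, not any particular game's implementation; the emit-then-swap step is the inside-out variant of Fisher-Yates, spread one step per draw):

```python
import random

def lazy_bag(pieces, rng=random):
    """Infinite generator: each consecutive block of len(pieces) draws
    is a uniform random permutation of `pieces`.  Instead of shuffling
    the whole buffer when the bag empties, one inside-out Fisher-Yates
    step is performed per draw, spreading the shuffle's cost lazily
    over the iterations."""
    buf = list(pieces)
    n = len(buf)
    rng.shuffle(buf)  # make the very first cycle random too
    while True:
        for i in range(n):
            piece = buf[i]
            yield piece
            # Re-insert the emitted piece at a random spot in the
            # already-consumed prefix [0, i].  Only consumed slots are
            # ever written, so the current cycle is undisturbed, and
            # after a full cycle the buffer holds a fresh uniform
            # permutation for the next cycle to read out in order.
            j = rng.randrange(i + 1)
            buf[i] = buf[j]
            buf[j] = piece

# Example with the classic seven tetrominoes: every run of 7 draws
# contains each piece exactly once.
bag = lazy_bag(["I", "O", "T", "S", "Z", "J", "L"])
first_seven = [next(bag) for _ in range(7)]
assert sorted(first_seven) == ["I", "J", "L", "O", "S", "T", "Z"]
```

Each draw costs O(1), so for a very large element set nothing ever pays the full O(n) shuffle in one go.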
EDIT: I just remembered that Go deliberately randomizes the iteration order of its built-in maps. I wonder how that is implemented; perhaps something like this might be useful there for situations where one often bails out of such loops early.
OTOH, that actually is kind of an interesting feature; there might be a few situations where that is desired behavior.
I'd argue it's not really important in modern systems due to the use of virtual memory. With virtual memory, you can massively reduce fragmentation since you are no longer managing a single contiguous region of memory.
Fragmentation still exists, but it only applies to "small" allocations (think smaller than a page) and in most scenarios it is a result of modern memory allocators preferring performance over reduced memory usage. Memory is cheap nowadays, and performance is king.
>Does anyone have some kind of list of the most important results in the area of memory management?
There is a survey on the subject of memory allocation that mentions Robson's bounds and other results.
With respect to Rust, isn't it usable as-is through LD_PRELOAD or whatnot since Rust now uses the system allocator by default? Though obviously a proper GlobalAlloc implementation could be provided (and eventually an Alloc).
It's beyond amazing we can still innovate on something like a memory allocator using hardware tricks that have been available for nearly 40 years. Leave no stone unturned!
We thought similarly about cache performance, but weren't able to find specific examples (most of the slow benchmarks we dug into were due to unoptimized allocator performance rather than poorer caching).
Informally I think one of the reasons for this is that in long-running programs, allocations returned from malloc end up being pretty random rather than contiguous -- as allocations are freed they are put on freelists, and freelists are designed for temporal locality rather than spatial locality.
BTW, this is a really cool project. Do you envision any difficulties in merging this into jemalloc?
In terms of contrasts, it'd be like trying to merge Ruby with Python... this project's approach stands independently. It may be better to ask what features it could gain from something like jemalloc.
Unfortunately I would expect this technique not to scale well with increasing page size, and I sure hope that we will be slowly moving to >4k pages as years go by since the TLB is a very common bottleneck.
We didn't look at huge pages in the paper for the reason that today, if you care most about heap footprint you can always use 4K pages rather than 2M ones. (and lots of database-like applications, like Redis, explicitly suggest disabling huge pages)
> Mesh makes meshing both efficient and provably effective (with high probability) by combining it with two novel randomized algorithms. First, Mesh uses a space-efficient randomized allocation strategy that effectively scatters objects within each virtual page, making the above scenario provably exceedingly unlikely. Second, Mesh incorporates an efficient randomized algorithm that is guaranteed with high probability to quickly find candidate pages that are likely to mesh.
> Meshing is only possible when no objects on the pages occupy the same offsets. A key observation is that as fragmentation increases (that is, as there are more free objects), the likelihood of successfully finding pairs of pages that mesh also increases.
> Compacting garbage collection in managed languages:
Also, what happens when such a page with multiple virtual mappings gets paged out and paged in again? Does it stay a merged page?
So long as mprotect() is used to ensure the source page is stable for the duration of the copy and until the mremap() is complete, it should not be possible for the program to observe any difference in behaviour -- except perhaps if it tries to modify an allocation on a page that is currently being merged by malloc() running in another thread (presumably they have some kind of SIGSEGV handler to catch this).
From the pictures, I thought merging only merged the first, but rereading, I guess the former is stored in the same blocks as the latter. They still need to make sure that multi-threaded access through different virtual pages works, but that seems doable, too.
You could try to get the mesh methodology integrated into existing allocators I guess, but there is no such thing as a "mainline allocator".
I think they just mean mainlined on a particular platform. If FreeBSD were to switch its system allocator from jemalloc to mesh, for example, I’d count that as mainlining.