
The Hoard Memory Allocator - vgnet
http://www.hoard.org/
======
gregburd
Hoard is fine, but if we're going to start talking about various scalable
memory allocators designed for concurrent, multi-core/cpu systems then...

libumem is the user-space slab memory allocator first available in Solaris 9
(SunOS 5.9) and now the default allocator on Solaris (and Illumos, SmartOS,
OpenIndiana, etc.). There is a fork of libumem that has been ported to other
popular operating systems, such as Linux, Windows and *BSD systems (including
Darwin/OSX), by OmniTI (<https://labs.omniti.com/labs/portableumem>). I
maintain a fork of portable libumem (<https://github.com/gburd/libumem>) that
includes changes made by Joyent as part of their ongoing work to improve
SmartOS.

I have deployed this allocator to dozens of production systems to improve the
performance of highly concurrent memory-intensive applications (such as Riak)
and found it to be an excellent, stable and fast allocator.

In addition to fast allocations it includes excellent statistics and memory
leak detection
([https://blogs.oracle.com/pnayak/entry/finding_memory_leaks_w...](https://blogs.oracle.com/pnayak/entry/finding_memory_leaks_within_solaris))
as well as a few different allocator heap-fit algorithm choices.
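For the curious, the leak-detection workflow from the linked post boils down to preloading libumem with debugging enabled and then inspecting the live process with mdb. A rough sketch, assuming a Solaris/Illumos box with mdb available (`./myapp` is a placeholder for your program):

```shell
# Run the target with libumem preloaded and debugging/transaction logging on.
# UMEM_DEBUG and UMEM_LOGGING are libumem's documented tuning variables.
LD_PRELOAD=libumem.so UMEM_DEBUG=default UMEM_LOGGING=transaction ./myapp &

# Attach the modular debugger to the live process (works on core files too)
# and ask for leaked allocations.
mdb -p $!
> ::findleaks
```

`::findleaks` reports each leaked buffer along with the allocation stack that produced it, which is what makes this so much nicer than eyeballing heap growth.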

It is licensed under the CDDL.

~~~
vgnet
There's a very interesting comparison between libumem (mtmalloc) and Hoard at
<http://www.oracle.com/technetwork/articles/servers-storage-dev/mem-alloc-1557798.html>
(HN link: <https://news.ycombinator.com/item?id=3755621>). How does this class
of allocators compare to tcmalloc, jemalloc, ptmalloc3, etc., especially in the
concurrent multi-CPU scenario?

Here's what I'll be reading on allocators tonight, any further suggestions
(following all the links in comments is part of the deal)?

<http://en.wikipedia.org/wiki/C_dynamic_memory_allocation>

<http://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919>

<http://locklessinc.com/benchmarks_allocator.shtml>

<http://blog.reverberate.org/2009/02/20/one-malloc-to-rule-them-all/>

------
TwoBit
We've looked into Hoard. It's fast only if you have a large number of threads
that are -constantly- allocating memory, and even then it wastes more memory
than a regular heap. The large majority of real-world applications don't do
this, but some server applications might. I think Hoard could be useful for
that. On the other hand, if performance was critical to you then you'd change
your code to not allocate memory so much.

A practical problem we have with Hoard is that it lacks diagnostics. By
diagnostics I mean extensive heap reports, fragmentation information, free
space, what's the largest block I can safely allocate, give me a callback for
every malloc so I can do stats tracking, etc.

~~~
scott_s
_On the other hand, if performance was critical to you then you'd change your
code to not allocate memory so much._

This was a problem we ran into with the Streamflow project
(<http://people.cs.vt.edu/~scschnei/streamflow/>). There was no benchmark
suite for multithreaded allocators, so we had to pull from HPC benchmark
suites, but what's the first optimization many people do when improving an
application's performance? Remove dynamic memory allocation.

We were motivated to do our work because of runtimes for multithreaded
programming that need to dynamically allocate memory to represent parallel
tasks. And I think that motivation remains, but that pain wasn't felt until
recently.

~~~
onemoreact
I don't think the parent post was suggesting you completely avoid dynamic
allocation, just that you minimize it. It's not a question of dynamic vs.
static memory but of the number of allocations per second per thread. And
often it's a question of delayed optimization: you suspect what you're doing
needs to change, but you avoid optimizing until you have some performance data.

~~~
scott_s
I was more talking about the difficulty of testing the performance of a
multithreaded dynamic memory allocator. Because moving dynamic memory
allocation off the critical path of an application is a common first
optimization, it was difficult to find interesting applications to use as
benchmarks.

------
scott_s
I published a paper about a lock-free allocator back in 2006 that performed
very well against Hoard and TCMalloc
(<http://goog-perftools.sourceforge.net/doc/tcmalloc.html>); you can find the
paper and source code here: <http://people.cs.vt.edu/~scschnei/streamflow/>

I think that our techniques worked very well, but TCMalloc is a better
_engineered_ allocator. That is, it's production quality code. People have
used our allocator in their research projects when they wanted a low-latency,
highly scalable allocator, but I'm not aware of anyone using it for production
- and I no longer maintain it, so that's probably for the best.

There was a paper in PACT last year that extended our design. My advisor and I
would like to release a tech report with a fuller description of our
algorithms and designs, but we're both busy with our current work.

------
dnda
Unfortunately their performance comparison does not include modern memory
allocators like tcmalloc or jemalloc.

~~~
_delirium
It appears to have made a splash around 2000-2003 or so, which is when most of
the publications and testimonials/comparisons are from. Looks like it's been
mostly in maintenance mode since then, with relatively minor performance and
compatibility releases around once per year, which may be why they haven't
done a re-assessment against more recent competitors.

------
willvarfar
Hoard is very nice and this kind of programming really interests me.

An alternative is tcmalloc by Google:

<http://goog-perftools.sourceforge.net/doc/tcmalloc.html>

------
maqr
I see the license is both GPLv2 _and_ proprietary for commercial use. I was
unaware that licensing could be done that way. Is that a valid use of GPL?

~~~
pgeorgi
If all copyright holders agree, then yes (see mysql).

The main issue with that is that you have some problems with accepting
external contributions - you need a copyright assignment process, which is
difficult to do internationally and a hurdle for contributors to overcome.

------
simonster
Jason Evans's benchmarks
(<http://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919>,
3/4 of the way down the page) don't look good for Hoard, but they may be a
little biased.

~~~
scott_s
That's similar to what I found. (See my paper referenced elsewhere in the
thread.)

