
TCMalloc and MySQL - craigkerstiens
https://github.com/blog/1422-tcmalloc-and-mysql
======
jeffdavis
Warning: tcmalloc does not release memory back to the OS, _ever_ :

"TCMalloc currently does not return any memory to the system."[1]

That means if you have many long-running processes, each of them will keep
holding the peak amount of memory it has ever used. Not good for a
multi-tenant setup.

If it's a dedicated server running one multi-threaded application, _maybe_
that's OK, although I'd be a little bit wary anyway.

I should note that, even if the application doesn't let the memory go, the OS
could page out the inactive regions. Not really something that I would like to
rely on, though. There are some other caveats also, like it would make memory
accounting a little trickier ("Wow, that process is huge! Oh, never mind, it's
mostly paged out.").

For what it's worth, I just spent considerable effort to get rid of tcmalloc
due (in part) to problems like this. [2]

[1] <http://goog-perftools.sourceforge.net/doc/tcmalloc.html>

[2] You wouldn't think it would be a lot of effort, but we were using dynamic
libraries that were linking against tcmalloc, which is outright dangerous if
the main executable isn't linked against tcmalloc (you don't want to replace
the allocator in a running executable). And some of those libraries were
actually using the tcmalloc-specific features/symbols, so I had to get away
from that first.

~~~
sghemawat
About [1] Sorry about that: the document you linked to is amazingly stale.
tcmalloc has been releasing memory to the system for many years. See for
example the IncrementalScavenge routine in a version of page_heap.cc from Dec
2008:

[https://code.google.com/p/gperftools/source/browse/trunk/src...](https://code.google.com/p/gperftools/source/browse/trunk/src/page_heap.cc?r=60)

One caveat: physical memory and swap space are released, but the process's
virtual size will not decrease, since tcmalloc uses madvise(MADV_DONTNEED) to
release memory.
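
One way to observe the caveat above on Linux is to compare a process's virtual size with its resident set size. The following is a minimal sketch, assuming a Linux `/proc` filesystem; the helper name `vm_size_and_rss` is hypothetical and not part of tcmalloc or MySQL.

```shell
#!/bin/sh
# Hypothetical helper: print VmSize (virtual size) and VmRSS (resident memory),
# both in kB, from a Linux /proc/<pid>/status file given as $1.
vm_size_and_rss() {
    awk '/^VmSize:|^VmRSS:/ { print $1, $2 }' "$1"
}

# Example usage, assuming a single running mysqld:
#   vm_size_and_rss "/proc/$(pidof mysqld)/status"
```

If memory is being returned to the OS as described, VmRSS should drop after a large release while VmSize stays roughly where it was.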

About [2], code using tcmalloc-specific features/symbols is definitely a
problem. I would strongly advise against doing that and sticking to the libc
interfaces instead for the reason you pointed out.

~~~
jeffdavis
Strange, the page showed up first when I googled "tcmalloc" and the problem
was also present in the version that I was using (at least I think it was). My
apologies.

Yeah, regarding [2], that was definitely not my idea.

~~~
sghemawat
Not your fault. We just plain forgot to update the documentation, so the
freshest available document is a few years out of date.

------
antirez
Good allocators are good for different things, but what the glibc allocator is
good for is yet to be discovered: it fragments like a glass dropped on the
floor and has contention issues.

~~~
scott_s
Stability. Give it credit for being - probably - the most widely used
implementation of malloc in the world.

I say this as someone who has implemented a lock-free memory allocator for
multithreaded applications. I cared about performance, and I was willing to
sacrifice nice things like detecting double-frees. I moved away from the
project largely because I didn't want to be in a performance race with
TCMalloc. (At the end, TCMalloc outperformed my allocator in some benchmarks,
but not in others. But, surprisingly, there were also some places where glibc
outperformed _both_.)

~~~
malkia
It could be that the MSVCRT implementation is the most used one (or actually
maybe the one behind HeapAlloc and the like). How would one know for sure? :)

It's probably also used on all Xboxes, too...

------
HarrisonFisk
jemalloc is the new hotness for MySQL. We are using it at Facebook (and I know
percona/oracle use it for benchmarks and testing as well).

Good benchmark showing the impact of the different options:

[http://www.mysqlperformanceblog.com/2012/07/05/impact-of-mem...](http://www.mysqlperformanceblog.com/2012/07/05/impact-of-memory-allocators-on-mysql-performance/)

~~~
cpeterso
Firefox uses jemalloc, too.

~~~
ihsw
Redis as well.

------
ck2
Apparently once you start having more threads than cores, tcmalloc really
shines:

<http://i.imgur.com/4RzmQD6.png>

Looks like those on CentOS can install it easily via

    
    
      yum install gperftools-libs --enablerepo=epel
    

which installs

    
    
      /usr/lib64/libtcmalloc.so.4
      /usr/lib64/libtcmalloc_minimal.so.4
    

then you just need to edit your mysql init script?

    
    
      test -e /usr/lib64/libtcmalloc_minimal.so.4 && export LD_PRELOAD="/usr/lib64/libtcmalloc_minimal.so.4"
    
    

You can also try jemalloc, which is supposedly nearly as good as tcmalloc but
uses less memory:

    
    
      yum install jemalloc --enablerepo=epel
    

which installs

    
    
      /usr/lib64/libjemalloc.so.1
    

and for your init.d

    
    
      test -e /usr/lib64/libjemalloc.so.1 && export LD_PRELOAD="/usr/lib64/libjemalloc.so.1"

~~~
ck2
Oh, and apparently MySQL 5.5 users (not 5.1) can set this directly in my.cnf:

    
    
      [mysqld_safe]
      malloc-lib=/usr/lib64/libtcmalloc_minimal.so.4
    

or

    
    
      malloc-lib=/usr/lib64/libjemalloc.so.1
    

[http://dev.mysql.com/doc/refman/5.5/en//mysqld-safe.html#opt...](http://dev.mysql.com/doc/refman/5.5/en//mysqld-safe.html#option_mysqld_safe_malloc-lib)

no export or script editing required

~~~
stock_toaster
This is what I do at $dayjob.

We have been using tcmalloc for a while on our databases, as well as disabling
transparent huge pages and transparent huge page defrag (CentOS 6). It made
a big difference for us.
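
For anyone wanting to check the transparent huge page settings mentioned above, here is a minimal sketch. It assumes the usual sysfs locations: `/sys/kernel/mm/transparent_hugepage` on mainline kernels and `/sys/kernel/mm/redhat_transparent_hugepage` on RHEL/CentOS 6. The helper name `thp_active_setting` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: the kernel marks the active choice with brackets,
# e.g. "always madvise [never]". Print just the bracketed value from file $1.
thp_active_setting() {
    sed -n 's/.*\[\([a-z+]*\)].*/\1/p' "$1"
}

# Report the current "enabled" and "defrag" settings wherever they exist.
for dir in /sys/kernel/mm/transparent_hugepage \
           /sys/kernel/mm/redhat_transparent_hugepage; do
    for knob in enabled defrag; do
        if [ -r "$dir/$knob" ]; then
            echo "$dir/$knob: $(thp_active_setting "$dir/$knob")"
        fi
    done
done

# To disable (as root, not persistent across reboots), write "never" to the
# same files, for example:
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled
```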

~~~
ck2
Can I ask you a dumb question? I think I just turned it on properly, but I
have no idea how to proactively confirm that mysql is actually using jemalloc,
rather than just waiting for better performance numbers.

Because it's an external environment variable, it doesn't actually show inside
any of mysql's settings. Having no startup errors or runtime problems is always
nice, but I really am curious to know for a fact that it worked.

Will probably have to ask this on stackexchange if you don't know.

~~~
stock_toaster
With tcmalloc I get a few messages in the log about a large allocation on
startup, but you can probably find it with this.

    
    
        # as root or sudo
        pmap -x $(pidof mysqld)|grep malloc
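
A variant of the same check reads the process's memory map directly, which avoids depending on pmap being installed. This is a minimal sketch assuming Linux `/proc`; the helper name `mapped_allocators` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct tcmalloc/jemalloc libraries mapped
# into a process. $1 is a maps file such as /proc/<pid>/maps (Linux only).
mapped_allocators() {
    # Field 6 of each maps line is the backing file path, when there is one.
    awk '$6 ~ /lib(tcmalloc|jemalloc)/ { print $6 }' "$1" | sort -u
}

# Example usage (as root or the mysql user), assuming a single running mysqld:
#   mapped_allocators "/proc/$(pidof mysqld)/maps"
```

Empty output would suggest the preload did not take effect and mysqld is still on the default allocator.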

------
josephscott
jemalloc is another strong option -

[http://www.mysqlperformanceblog.com/2012/07/05/impact-of-mem...](http://www.mysqlperformanceblog.com/2012/07/05/impact-of-memory-allocators-on-mysql-performance/)

[http://www.quora.com/Is-tcmalloc-stable-enough-for-productio...](http://www.quora.com/Is-tcmalloc-stable-enough-for-production-use)

------
ComputerGuru
nedmalloc [0] is my absolute favorite and pretty much owns everything else in
terms of performance (esp. multi-threaded memory allocations), though I would
not use it at the scale Facebook and GitHub are running on. It has subtle bugs
that creep in and get fixed down the road. jemalloc and tcmalloc are very
heavily tested and vetted, though, and are great options. Basically, anything
other than the default allocator on Windows/Mac/Linux is fine :)

The author of nedmalloc is working on a very exciting C++ API (actually, I
think it's API-complete now) to make it a drop-in STL allocator. I personally
use the C API in my C++ applications without a problem, mainly as a pool
allocator. For me, the Windows allocators (both the old default and the new
"low-fragmentation" default) are absolutely abysmal at deallocation. Pool
allocators in general make that go away.

0: <http://www.nedprod.com/programs/portable/nedmalloc/>

------
xal
Shopify runs TCMalloc for mysql as well.

------
telemachos
I don't work with MySQL or Rails, but I read this all the way through, mostly
because the story was well told.

Strikes me as a perfect example of a culture that works hard and enjoys the
hell out of it too.

------
SEJeff
They found a proverbial "silver bullet" in performance land. This almost never
happens, but props to them for finding it. Now time to try this out!

~~~
kev009
It's fairly common to see double digit percent changes when swapping out lower
level component implementations or versions (compiler, JVM, OS, kernel, etc.).
That can be a good thing, in the case here where they found a win, or an awful
thing.

------
minimax
This is interesting and from a black box view of MySQL, this is a good
solution. For the MySQL developers, it seems like an opportunity for
improvement. When you get bottlenecked on malloc() it usually means you are
frequently allocating many small objects. To me this sounds like a good
opportunity to use a memory pool allocator (or find a way in the code to do
fewer allocations).

------
malkia
I've had mixed feelings about tcmalloc on Windows - that was 4-5 years ago, so
things might be better. It was doing some hooking, looking for places to
replace the standard malloc/free/etc. throughout the whole address space and
in newly loaded DLLs. Other than that, and except when it was crashing for no
apparent reason (on some Windows 2003 servers, for example), it was pretty
good.

------
thrownaway2424
Try, as well, setting the value of tcmalloc.max_total_thread_cache_bytes to
something larger than 16MB (the default). Reasonable values might range all
the way to 1GB or more. Best to experiment and get data.
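
When tcmalloc is preloaded rather than linked in, the same limit can be set through the `TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES` environment variable that gperftools reads at startup. A minimal sketch follows. The 256 MiB figure is only an illustrative starting point, and the helper name `mib_to_bytes` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a size in MiB to bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Raise the total thread cache bound to 256 MiB (an arbitrary example value).
# This has to be exported in the environment that starts mysqld, for example
# in the same init script that sets LD_PRELOAD.
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES="$(mib_to_bytes 256)"
export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
```

As the comment says, the right value depends on the workload, so it is worth measuring a few settings rather than trusting any single number.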

