
How tcmalloc Works - luu
http://jamesgolick.com/2013/5/19/how-tcmalloc-works.html
======
bcantrill
Interesting stuff. These guys came to some of the same conclusions around
management that we did when implementing per-thread caching for libumem[1]
(in particular, around summing all per-thread caches and managing that
number). It would be interesting to benchmark these two allocators; we do some
dynamic code generation that allows cache sizes to be tuned without
sacrificing performance.

[1] [http://dtrace.org/blogs/rm/2012/07/16/per-thread-caching-in-...](http://dtrace.org/blogs/rm/2012/07/16/per-thread-caching-in-libumem/)
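
To make the "sum all per-thread caches and manage that number" idea concrete, here is a minimal toy sketch. It is not taken from libumem or tcmalloc; the class `ThreadCaches`, its methods, and the `total_budget` parameter are hypothetical names used for illustration only. It takes a lock on every operation to keep the bookkeeping obvious, which a real allocator would work hard to avoid.

```python
import threading


class ThreadCaches:
    """Toy model (hypothetical): per-thread free lists whose combined
    size is capped by one global budget."""

    def __init__(self, total_budget):
        self.total_budget = total_budget   # cap on the sum of all caches
        self.total_cached = 0              # bytes currently held in caches
        self.lock = threading.Lock()       # guards total_cached
        self.local = threading.local()     # per-thread storage

    def _cache(self):
        # Lazily create this thread's free list (a list of object sizes).
        if not hasattr(self.local, "free"):
            self.local.free = []
        return self.local.free

    def free(self, obj_size):
        """Return True if the object stays in this thread's cache,
        False if it would exceed the global budget and must go back
        to the central heap."""
        with self.lock:
            if self.total_cached + obj_size > self.total_budget:
                return False
            self.total_cached += obj_size
        self._cache().append(obj_size)
        return True

    def alloc(self, obj_size):
        """Serve from this thread's cache when possible."""
        cache = self._cache()
        if obj_size in cache:
            cache.remove(obj_size)
            with self.lock:
                self.total_cached -= obj_size
            return "cache"
        return "central"
```

With a budget of 100 bytes, the first `free(64)` is cached and the second one spills to the central heap, because the budget applies to the sum across threads rather than to any single cache.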

------
EliRivers
_This blog post is incredibly long._

It's not. It's really not. I don't know whether to feel slighted that the
author assumes my attention span is so pitiful, or bemused that the author
feels the need to mention the length of the post at all. I suppose someone who
feels the need to open a post with the new hip way of saying "Summary" or
"Abstract" has already given up on his audience anyway.

~~~
personZ
Given how it is mentioned at the outset and conclusion, I almost have to think
it's stated ironically or something.

------
thrownaway2424
IMHO one of the nicest things about tcmalloc isn't the performance, it's the
profiling. It samples your allocations and records the stack trace where an
object was allocated, and records that information over the life of your
process. This can be invaluable when tracking leaks or performance problems
suspected to be due to excessive new and delete.

[http://gperftools.googlecode.com/svn/trunk/doc/heapprofile.h...](http://gperftools.googlecode.com/svn/trunk/doc/heapprofile.html)
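
A rough sketch of what sampling allocations and recording the allocating stack can look like, for readers who have not seen the technique. This is a toy model and not tcmalloc's implementation: the class `SamplingProfiler`, its hooks `on_alloc` and `on_free`, and the `sample_interval_bytes` parameter are all hypothetical, and the assumption that samples are spaced by a random byte count is mine.

```python
import random
import traceback
from collections import Counter


class SamplingProfiler:
    """Toy heap profiler (hypothetical): samples roughly one allocation
    per `sample_interval_bytes` bytes and remembers where it came from."""

    def __init__(self, sample_interval_bytes, seed=0):
        self.interval = sample_interval_bytes
        self.rng = random.Random(seed)
        self.bytes_until_sample = self._next_gap()
        self.live = {}  # object id -> (size, stack of function names)

    def _next_gap(self):
        # Random gap with mean `interval`, so sampling is not periodic.
        return self.rng.expovariate(1.0 / self.interval)

    def on_alloc(self, obj_id, size):
        """Call on every allocation; returns True if it was sampled."""
        self.bytes_until_sample -= size
        if self.bytes_until_sample > 0:
            return False  # cheap path: just a subtraction and a compare
        self.bytes_until_sample = self._next_gap()
        # Drop the last frame, which is on_alloc itself.
        stack = tuple(f.name for f in traceback.extract_stack()[:-1])
        self.live[obj_id] = (size, stack)
        return True

    def on_free(self, obj_id):
        # Freed objects no longer count as live (or leaked).
        self.live.pop(obj_id, None)

    def report(self):
        """Sampled live bytes grouped by the allocating function."""
        by_site = Counter()
        for size, stack in self.live.values():
            by_site[stack[-1]] += size
        return by_site
```

The unsampled path does almost no work, which is why this style of profiling can stay enabled under a live workload; a larger interval lowers the overhead further at the cost of coarser data.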

~~~
jpfr
Have you tried valgrind for this?

~~~
thrownaway2424
Yes, and my experience is that valgrind is a tremendously slow way to do
one-off debugging of a suspected memory leak. The instrumentation in tcmalloc is
really different as it has almost no cost (depending on the size of the system
you might want to adjust the sample parameter for highly multithreaded
programs) and is running at all times so you can use it to troubleshoot in
production. When I've used valgrind the program was so slow it wasn't the kind
of thing you could put under a live workload.

~~~
jpfr
True. Valgrind slows things down considerably.

In the end, what to use is a matter of workflow and convenience. Good to know
how tcmalloc makes debugging memleaks easier.

Just for completeness, a third option would be a tracing tool that hooks
into the kernel syscalls (e.g. the lttng project). Such tools have nearly zero
performance penalty as well.

------
suprjami
James is one of the two hosts of the Real Talk podcast, which is by far the
best podcast I've ever listened to. Such a shame they only made half a dozen
episodes. Check it out: [http://realtalk.io/](http://realtalk.io/)

~~~
orange_sharpie
I actually found their podcasts to be extremely rudimentary. I listened to
week 3, where they discuss an article on "high scalability". I feel these guys
try too hard to sound like hipsters, and are lacking any fundamental training
in computer science. Every three minutes, Joe would state "I don't know what a
unix kernel is" or "I don't know what having a large application on a 4 core
kernel locking up is".

To be completely honest, I was so excited when I saw your link for a
"technical podcast". I thought, "hey, I finally have something with a lot of
content to listen to on my way to work!" There were expectations that weren't
met...

------
general_failure
[http://goog-perftools.sourceforge.net/doc/tcmalloc.html](http://goog-perftools.sourceforge.net/doc/tcmalloc.html)
had fewer architectural insights

