
The Horror in the Standard Library - aw1621107
https://www.zerotier.com/blog/2017-05-05-theleak.shtml
======
bluejekyll
OMG, as I was reading this I thought, "man, this reminds me of a bug I ran
into with std::string back in 2000." A few sentences later, it turns out this
is also about std::string and the STL.

Mine was different though. After tracking down a memory leak that was
happening on the creation of just a new empty string, I discovered that the
stdlib kept a shared pointer to the empty string, with a reference count of
how many locations were using it (ironic that this was intended to save
allocations). This was on Intel, and we had what was rare at the time: a
multi-processor system. It turned out that the std::string empty-string
reference count was just doing a vanilla ++. No locking, no atomics, the
variable not even marked volatile. Nothing.

A few emails with a guy in Australia, a little inline assembly to call a new
atomic increment on the counter, and the bug was fixed. That took two weeks to
track down, mostly because it didn't even cross my mind that it wasn't in my
code.
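
Today that fix is a library call away; a minimal sketch of the safe version
(modern std::atomic is an assumption here, standing in for the original
hand-written inline assembly):

```cpp
#include <atomic>
#include <thread>

// Two threads bump a shared reference count. With a plain `int`, `++n` is a
// non-atomic read-modify-write and increments get lost under contention
// (the bug described above); std::atomic::fetch_add is the portable
// spelling of the atomic-increment instruction the inline assembly provided
// back then.
int concurrent_refcount_demo(int iters_per_thread) {
    std::atomic<int> refcount{0};
    auto bump = [&refcount, iters_per_thread] {
        for (int i = 0; i < iters_per_thread; ++i)
            refcount.fetch_add(1, std::memory_order_relaxed);
    };
    std::thread a(bump), b(bump);
    a.join();
    b.join();
    return refcount.load();  // no lost updates
}
```

With a plain `int` counter the same demo would usually return less than the
expected total.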

From that point on, I realized you can't trust libraries blindly, even one of
the most used and broadly adopted ones out there.

~~~
bostik
> _you can't trust libraries blindly, even one of the most used and broadly
> adopted ones_

There is a corollary to development and debugging. When things break in
mysterious ways, we tend to go through a familiar song and dance. As
experience, skills and even personal networks grow, we can find ourselves
diving ever further in the following chain.

1. "It must be in my code." -- _hours of debugging_

2. "Okay, it must be somewhere in our codebase." -- _days of intense
debugging and code spelunking_

3. "It HAS TO be in the third party libraries." -- _days of issue tracker
excavations and never-before-enabled profiling runs_

4. "It can't possibly be in stdlib..." -- _more of the same, but now
profiling the core runtime libraries_

5. "Please let this not be a compiler bug." -- _you become intensely familiar
with mailing list archives_

6. "I will not debug drivers. I will not debug drivers. I will not debug
drivers."

7. "I don't even know anyone who could help me figure out the kernel
innards."

8. "NOBODY understands filesystems!"

9. "What do you mean 'firmware edge case'?"

And the final stage, the one I have witnessed only one person ever achieve:

10. "Where is my chip lab grade oscilloscope?"

Apart from bullheadedness, this chain also highlights another trait of a good
developer. Humility.

~~~
kosma
> 10. "Where is my chip lab grade oscilloscope?"

11. "Shit, where do I borrow a spectrum analyzer and a set of near-field
probes? These things cost an arm and a leg!"

Yes, STM32F1 MCUs generate interference that jams GPS receivers. No, it's not
documented anywhere.

~~~
makomk
You'd be surprised how much you can find out with just a $10 TV tuner dongle
and a piece of coax with a short section of the outer braid trimmed off at
one end.

~~~
ethbro
You're describing my college tv antenna. Coax is cheap. An actual digital
antenna is like... 50 ramen equivalents.

~~~
eltoozero
50 ramen = $10 USD

------
faragon
The problem is forgetting that dynamic memory usage is not "free" (as in
"gratis" or "cheap"). In fact, using std::string in long-lived server
processes that do intensive string processing (e.g. parsing, text processing,
etc.) has been known to be suicidal forever, because of memory
fragmentation.

For high-load backend data processing, you need at least a soft real-time
approach: avoid dynamic memory usage at runtime (use dynamic memory just at
process start-up or reconfig, and rely on stack allocation for small stuff,
when possible).
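
A minimal sketch of that approach using standard C++17 facilities (std::pmr
here, an assumption; this is not libsrt itself): every allocation is served
from a fixed buffer reserved up front, so steady-state processing never
touches the global heap.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <vector>

// All allocations below come out of `buf`, a fixed stack buffer; using
// null_memory_resource() as the upstream means we get bad_alloc instead of
// silently falling back to the heap if the buffer is exhausted.
std::size_t parse_tokens_on_stack() {
    std::array<std::byte, 4096> buf;
    std::pmr::monotonic_buffer_resource arena(
        buf.data(), buf.size(), std::pmr::null_memory_resource());
    std::pmr::vector<std::pmr::string> tokens(&arena);
    for (const char* t : {"GET", "/index.html", "HTTP/1.1"})
        tokens.emplace_back(t);  // string storage also comes from the arena
    return tokens.size();
}
```

std::pmr::polymorphic_allocator propagates into the contained strings, so
even the per-token character buffers live in the arena.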

I wrote a C library with exactly that purpose [1], in order to work with
complex data (UTF-8 strings, with many functions for string processing;
vectors; maps; sets; bit sets) on heap or stack memory, with minimal memory
fragmentation, suitable for soft/hard real-time requirements.

[1] [https://github.com/faragon/libsrt](https://github.com/faragon/libsrt)

~~~
gravypod
Are you still developing this project?

~~~
faragon
Yes

------
arunc
I encountered exactly the same issue a few years ago at UIDAI, in one of our
large-scale biometric matchers, and the resolution was exactly the same.
After a week of debugging I found that the libstdc++ allocator was the
culprit. I found [1] and confirmed the same, which helped in fixing the
issue.

The thing that was more interesting (or sad) was learning that the GCC
developers didn't expect multithreaded applications to be long-running:

"Operating systems will reclaim allocated memory at program termination
anyway."

[1]
[https://gcc.gnu.org/onlinedocs/libstdc++/manual/mt_allocator...](https://gcc.gnu.org/onlinedocs/libstdc++/manual/mt_allocator_impl.html)

~~~
CamperBob2
_Notes about deallocation. This allocator does not explicitly release memory.
Because of this, memory debugging programs like valgrind or purify may notice
leaks: sorry about this inconvenience. Operating systems will reclaim
allocated memory at program termination anyway._

Wow. This is worth a Linus Torvalds-level rant. Whoever accepted this code
into the source tree needs to be put on GNU's version of a performance
improvement plan.

~~~
badsectoracula
This is actually a fine thing to do. I don't remember where I read it, but a
good analogy for freeing memory at program exit is cleaning the floors and
walls of a building right before it is demolished.

This is also why Valgrind separates reachable and unreachable memory and only
considers unreachable memory to be leaked.

~~~
dom0
A good example is something like doing a "cp -a". To preserve hardlinks you
need a mapping, and trying to free that at exit can and will take time. This
was an actual bug in coreutils.

~~~
CamperBob2
I have to wonder why the C standard library didn't include a Pascal-style
mark/release allocator. A naive implementation wouldn't be much faster than
free()'ing a bunch of allocations manually, but the general idea offers
possibilities for optimization that otherwise aren't available to a
conventional heap allocator.
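
A mark/release allocator of the kind described is only a few lines; a toy
sketch (illustrative only, not a production allocator):

```cpp
#include <cstddef>

// Pascal-style mark/release arena: alloc() is a pointer bump, and
// release(mark()) frees everything allocated since the mark in O(1),
// with no per-allocation bookkeeping at all.
class MarkReleaseArena {
public:
    MarkReleaseArena(void* buffer, std::size_t size)
        : base_(static_cast<char*>(buffer)), size_(size), top_(0) {}

    void* alloc(std::size_t n) {
        n = (n + 7) & ~static_cast<std::size_t>(7);  // round up to 8 bytes
        if (top_ + n > size_) return nullptr;        // arena exhausted
        void* p = base_ + top_;
        top_ += n;
        return p;
    }

    std::size_t mark() const { return top_; }    // remember high-water mark
    void release(std::size_t m) { top_ = m; }    // drop everything after it

private:
    char* base_;
    std::size_t size_;
    std::size_t top_;
};
```

The optimization opportunity mentioned above is visible here: freeing any
number of allocations is a single store, something a general-purpose heap
can't match.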

~~~
dom0
Because (m)alloc predates mmap by a decade or so [1], and you expressly don't
want stack-like allocation semantics when using malloc (otherwise you'd just
put it on a/the stack).

[1] And you cannot have more than one heap without mmap. Without mmap, you
only have sbrk = the one heap. On UNIX and those that pretend to be, anyway.

------
alyandon
I myself ran across this same scenario many years ago with a similar amount of
hair pulling and eventually concluding that the GNU libstdc++ allocator wasn't
reusing memory properly. Unfortunately, I was never able to pare down the
application to the point that I had a reproducible test case to report
upstream.

GLIBCPP_FORCE_NEW was the solution for the near term and since I was deploying
on Solaris boxes I eventually switched to the Sun Forte C++ compiler.

It really bugs me that this problem still exists. :-/

~~~
flamedoge
It really bugs me that nobody has filed a bug or linked to one on the bug
tracker. I know I'm asking a lot and being an ass.

~~~
tjalfi
I searched GCC Bugzilla and found one result[0] for GLIBCPP_FORCE_NEW.

[0]
[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=13823](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=13823)

Edited to add the following text.

There is one result[1] for GLIBCXX_FORCE_NEW.

[1]
[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65018](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65018)

~~~
hcs
Bugzilla's default search doesn't include resolved, verified, and closed
bugs. That said, I only found one slightly relevant bug, which was resolved
as invalid with the advice to use the env var:
[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10183](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10183)

And that's just due to not freeing the pools, not whatever is the complex
problem here.

------
bgd11
All the technicalities aside, the author's writing style is amazing. I would
never have thought that someone could create such an intense narrative with
'malloc' as the main character.

~~~
dclowd9901
I thought the metaphor was pretty tortured, but I generally appreciate
anything that colors up dry technical writing, so I give it a pass.

------
firethief
> Nothing made any sense until we noticed the controller microservice's memory
> consumption. A service that should be using perhaps a few hundred megabytes
> at most was using gigabytes and growing... and growing... and growing... and
> growing...

Not identifying this until many hours after symptoms were impacting users
sounds like a pretty big monitoring blind spot.

~~~
ZoFreX
Yes, although to be fair blindspots are always obvious in hindsight!

------
cyphar
Did you report the issue upstream with a patch? The solution to "the standard
library is broken" is to fix the standard library, no? It's all free software
after all.

~~~
api
It's a known issue according to what I've read. It's never been fixed. This
may be due to the fact that it's hard to trigger and reproduce.

~~~
neilc
Can you link to some of the things you've read? What exactly is the "known
issue"?

~~~
MaysonL
From the documentation (linked by arunc above)[0]: _"Notes about deallocation.
This allocator does not explicitly release memory. Because of this, memory
debugging programs like valgrind or purify may notice leaks: sorry about this
inconvenience. Operating systems will reclaim allocated memory at program
termination anyway. If sidestepping this kind of noise is desired, there are
three options: use an allocator, like new_allocator that releases memory while
debugging, use GLIBCXX_FORCE_NEW to bypass the allocator's internal pools, or
use a custom pool datum that releases resources on destruction."_

[0][https://gcc.gnu.org/onlinedocs/libstdc++/manual/mt_allocator...](https://gcc.gnu.org/onlinedocs/libstdc++/manual/mt_allocator_impl.html)

~~~
mjw1007
That doesn't explain the «Some allocations were much larger than anything
ZeroTier should need.» part.

~~~
api
Yes it does. The secondary C++ allocator is doing its own pooling on top of
the primary allocator.

------
stephen_g
Things like this are why I was happy to see the LLVM project write their own
C++ standard library. libstdc++ has always seemed a bit hacky and fragile to
me. It's great to have the option of a more modern, clean codebase.

Have you tested to see if this works better with LLVM libc++?

~~~
saurik
Alternatively, things like this are why it is great when everyone works
together to improve one library, instead of splitting their efforts across
two essentially-identical ones. You act like libc++ is somehow simply
_better_ and probably doesn't have bugs :/. I have been doing C++ work for
over two decades now, so let me set that record straight in your head:
incredibly basic stuff has been pretty extremely broken in libc++. One horror
story that wasted way, way too much of my life: Mac OS X 10.7 seriously
shipped with a build of libc++ where std::streambuf failed to check EOF
conditions correctly. Despite most of my projects being compiled
simultaneously by numerous versions of both gcc and clang (to target various
weird configurations), I seriously don't remember the last time I ran into a
bug in gcc or libstdc++ (it was pre-2006 for sure), but I continue to run
into annoying issues with clang and libc++. The correct way to read "modern",
when applied to "codebase", is "untested". And hell: while I am totally
willing to believe there is a bug here, this post doesn't have a fix and
doesn't even seem to have led to a bug report. This is like saying "I
compiled my code using -O0 and it started working, so clearly this is a bug
in the optimizer", which we should all know is a dubious statement at best.

~~~
tomohawk
Alternatively, having a choice of more than one thing tends to cause
competition to kick in, which often results in better quality than a single
solution that everyone pretty much has to use.

------
rurban
The "make malloc faster" part was done over a decade ago with the follow-up
to ptmalloc2 (the official glibc malloc): ptmalloc3. But it added one word of
overhead per region, so the glibc people never updated to v3. A perf
regression. Instead, they broke the workarounds they had added. And now they
are breaking emacs with their new malloc.

------
consultSKI
>> Most operators in C++, including its memory allocation and deletion
operators, can be overloaded.

Have I mentioned lately how much I hate C++?

Great read.

------
spyder81
"Then I remembered reading something long ago" is when experienced programmers
are worth their weight in gold.

------
brynet
Interestingly, for recent versions of GCC (>= 4.0) it's GLIBCXX_FORCE_NEW
that is defined for libstdc++, not GLIBCPP_FORCE_NEW.

[https://gcc.gnu.org/onlinedocs/libstdc++/manual/mt_allocator...](https://gcc.gnu.org/onlinedocs/libstdc++/manual/mt_allocator_impl.html)

------
charles-salvia
I'm a bit confused here.

>> Most operators in C++, including its memory allocation and deletion
operators, can be overloaded. Indeed this one was.

Okay, well, firstly: the issue here seems to be a problem with the
implementation of std::allocator, rather than anything to do with overloading
global operator new or delete. Specifically, it sounds like the blog author
is talking about one of the GNU libstdc++ extension allocators, like
"mt_allocator", which uses thread-local power-of-2 memory pools.[1] These
extension allocators are basically drop-in replacements for plain
std::allocator, and should only really affect the allocation behavior of the
STL containers that take Allocator template parameters.

Essentially, libstdc++ tries to provide some flexibility in terms of setting
up an allocation strategy for use with STL containers.[2] Basically, in the
actual implementation, std::allocator inherits from allocator_base, (a non-
standard GNU base class), which can be configured during compilation of
libstdc++ to alias one of the extension allocators (like the "mt_allocator"
pool allocator, which does _not_ explicitly release memory to the OS, but
rather keeps it in a user-space pool until program exit).

However, according to the GNU docs, the _default_ implementation of
std::allocator used by libstdc++ is _new_allocator_ [3] - a simple class that
the GNU libstdc++ implementation uses to wrap raw calls to global operator new
and delete (presumably with no memory pooling.) This allocator is of course
often slower than a memory pool, but obviously more predictable in terms of
releasing memory back to the OS.

Note also that "mt_allocator" will check if the environment variable
GLIBCXX_FORCE_NEW (not GLIBCPP_FORCE_NEW as the author mentions) is set, and
if it is, bypass the memory pool and directly use raw ::operator new.

So, it looks like the blog author somehow was getting mt_allocator (or some
other multi-threaded pool allocator) as the implementation used by
std::allocator, rather than plain old new_allocator. This could have happened
if libstdc++ was compiled with the --enable-libstdcxx-allocator=mt flag.

However, apart from explicitly using the mt_allocator as the Allocator
parameter with an STL container, or compiling libstdc++ to use it by default,
I'm not sure how the blog author is getting a multi-threaded pool allocator
implementation of std::allocator by default.

[1]
[https://gcc.gnu.org/onlinedocs/gcc-4.9.4/libstdc++/manual/ma...](https://gcc.gnu.org/onlinedocs/gcc-4.9.4/libstdc++/manual/manual/mt_allocator.html)

[2]
[https://gcc.gnu.org/onlinedocs/gcc-4.9.4/libstdc++/manual/ma...](https://gcc.gnu.org/onlinedocs/gcc-4.9.4/libstdc++/manual/manual/memory.html#idm140050801737120)

[3]
[https://gcc.gnu.org/onlinedocs/gcc-4.9.4/libstdc++/manual/ma...](https://gcc.gnu.org/onlinedocs/gcc-4.9.4/libstdc++/manual/manual/memory.html#idm140050801737120)

~~~
yokaze
Call me confused, too. When I once traced through the STL allocator because
of some performance issues, it was (more or less) going directly to malloc.

I've now searched the code for GLIBCXX_FORCE_NEW, and it seems it is only
used in the pool_allocator and the mt_allocator [1].

String uses std::allocator [2].

I agree, the blog entry seems to be missing some information needed to
reproduce the issue. It looks to me like the author jumped to a conclusion
that confirmed his initial "insight". Not surprising, if you are under
pressure and working through the whole weekend and nights. Who hasn't been
there.

What bugs me is that the standard answer quite often seems to be that the
whole thing is broken and we have to switch to a completely different
implementation, and/or rewrite it from scratch.

[1] [https://github.com/gcc-
mirror/gcc/search?p=1&q=GLIBCXX_FORCE...](https://github.com/gcc-
mirror/gcc/search?p=1&q=GLIBCXX_FORCE_NEW&type=&utf8=%E2%9C%93)

[2] [https://github.com/gcc-
mirror/gcc/blob/master/libstdc%2B%2B-...](https://github.com/gcc-
mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stringfwd.h#L70)

------
bboreham
The post doesn't actually say what was broken, or indeed prove the location
of the brokenness; just that it went away with a different compile option.

Exciting writing, but lacking a point.

~~~
x4m
Yeah, the post is just an accusation with very indirect and unverifiable
evidence.

It'd be nice if the author had created a repro or actually spotted the bug.

Right now it's just moaning. Quite probably he's shooting himself in the foot
in one of the numerous odd ways.

------
halayli
This conclusion might be wrong. The code in question, while it might not be
leaking allocations, might be trampling memory blocks and corrupting
memory-management structures. Turning the flag on might fix the issue by mere
luck, because memory allocations, locations and structures would all be
different.

~~~
MichaelBurge
Valgrind would've caught those, wouldn't it? Or maybe the C++ layer prevents
it from catching them since it's application-level, which is at least another
reason to remove it.

~~~
fnord123
The tools are good, but not magic. It's always possible that they miss
something (e.g. with multithreading, the bug might not manifest since the
timings are all different when running under valgrind). But this is a red
flag:

"Nothing worked. It's leaking but it's not. It's leaking but the debugger says
no memory was lost. It's leaking in ways that are dependent on irrelevant
changes to the ordering of mundane operations. This can't be happening."

This is a red flag for heap corruption, or for multithreading bugs. (Stack
corruption is usually a crash and a wrong stack trace.) As it's not trivially
reproducible, it's probably a multithreading bug. It's also easy to imagine
that OP wrote a scripty input generator in C++ to run through valgrind, which
produces single-threaded inputs, so running under valgrind the bug won't be
detected. It's never fun to solve, and ever since I changed languages, my
multithreaded debugging skills have become a bit rusty. Hey-o!

But it would be good if he amended his post to reduce the vitriol aimed at
GCC. It's demonstrably false that GCC's default allocator holds a cache. It
makes OP look stupid to people in the know; and makes GCC look bad to people
not in the know. No one wins.

------
SFJulie
Memory fragmentation due to dynamic, non-fixed-size data structures plus
multithreading is an old foe. It may not be fixable in C/C++:

Worker A allocates dynamic stuff; the allocator takes a segment [0, ofA].
Worker B allocates to build the same kind of data structure (a fragment of a
JSON): [ofA, ofB]. Worker A resumes allocating; the boundary of [0, ofA] is
exceeded and there is no free contiguous space up or down, so [ofB, ofC] is
allocated. Worker C then wants to alloc, but sizeof(string) makes the request
bigger than [0, ofA], so [ofD, ofE] is taken... and the more concurrent
workers there are, the more the interleaving of allocations goes on, leaving
memory fragmented.

Since mallocs are costly and the problem is well known, a complex allocator
was created, with pools of slabs and the like, probably with one edge case,
very hard to trigger, hidden in its PhD-proven, really complex heuristics.

CPU power increases, more load and more workers arrive, interleaving comes
in, and the edge case gets triggered.

And C/C++ makes fun of Fortran and its fixed-size data structures, embracing
arbitrary-size, arbitrary-depth data structures for the convenience of
skipping a costly waterfall model before delivering a feature or a change to
the data structure, and of avoiding bikeshedding in committees.

Humans want to work in a way that is more agile than what computers are under
the hood.

Alternative:

Always allocate fixed-size memory ranges for data handling, and make sure
they will be enough. When doing REST, make sure you have an upper bound and
use paging/cursors, which require an FSM; then have all the FP programmers
say mutables are bad, the sysadmins say FSMs are a pain to handle when HA is
required, the CFO say the SLA will not be reached and the business model is
trashed, and the REST fans say that REST is dead when it's stateful.

Well, REST is a bad idea.

~~~
Const-me
Multi-threading ain’t particularly popular on Linux, hence issues like this
one in the STL.

But it’s fixable. On Windows, they implemented the solution in Windows XP
(opt-in), and since Vista it’s used by default:

[https://msdn.microsoft.com/en-
us/library/windows/desktop/aa3...](https://msdn.microsoft.com/en-
us/library/windows/desktop/aa366750\(v=vs.85\).aspx)

------
squidlogic
Amazing write-up. Informative and gripping in its prose.

~~~
saurik
This article was an extremely long rant amounting to "something has a bug,
but templates are hard, so I have no clue what it is". The author figured out
a workaround to the issue, but we still don't know what the bug was, and the
strongly worded conclusion that it is a bug in libstdc++ isn't even defended
well (it is similar to concluding "there is a bug in the compiler's
optimizer" from "I compiled my code with -O0 and it started working")... I
can't really see calling this "informative" :(.

~~~
rhaps0dy
I'm curious, what is usually the cause of code starting to work when it is
compiled with -O0?

~~~
bostik
I don't have an answer to that, but I have seen - and worked with - several
codebases which work fine when compiled with -g, but will crash (in good
cases) or behave irrationally (in bad cases) without.

The crashing ones at least are easy. Somewhere a list or variable-argument
array is missing the NULL terminator...

~~~
tonyarkles
This was on Windows with VC++, but same deal. The code that some cow-orkers
had written was copying strings like "LAX" and "ORD" into `char airport[3]`
using strcpy(). In debug builds, VC was allocating a whole 32-bit word, but
in release builds it was packing everything on the stack. Write to it, and
the terminating null ends up overwriting a byte of the next variable on the
stack. Urgh. Of course, these were several-hundred-line functions, so the
strcpy and the subsequent use of the trashed variable were a long way apart.
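
The arithmetic of that bug, as a sketch (the airport-code buffer is
reconstructed from the comment, not the original code):

```cpp
#include <cstddef>
#include <cstring>

// strcpy() copies strlen(src) characters *plus* the terminating NUL, so a
// 3-letter airport code needs a 4-byte buffer.
bool fits(const char* code, std::size_t bufsize) {
    return std::strlen(code) + 1 <= bufsize;  // +1 for the '\0'
}

// char airport[3]; std::strcpy(airport, "LAX");  // writes 4 bytes: overflow
// char airport[4]; std::strcpy(airport, "LAX");  // OK
```

Whether the stray NUL lands in padding (debug) or in the neighboring
variable (release) is exactly the kind of layout detail that changes between
build configurations.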

------
TimJYoung
I'm not sure if the other debug tools mentioned offer this, but AQTime Pro:

[https://smartbear.com/product/aqtime-
pro/overview/](https://smartbear.com/product/aqtime-pro/overview/)

has an allocation profiler that can be used to track down this sort of
problem. You can take allocation snapshots while the application is running to
see where the allocations are coming from (provided that you can run AQTime
Pro against a binary with debug symbols/info).

I'm not affiliated with the company - just a happy customer that has used them
for years with Delphi development.

~~~
odbol_
Delphi... Now that's a name I haven't heard in a long time

~~~
TimJYoung
It's still kicking. ;-)

------
tripzilch
Upvoted for the Lovecraft and pulp horror lit references, and starting with
"It was a dark and stormy night ..." :-)

Great writing, great read.

------
jcalvinowens
I don't understand the point of this article... if you think there's a bug in
the library, fix it. Don't write a melodramatic blog post lamenting how
horrible it is in the hope somebody else will do it for you.

This isn't particle physics, it's code: we don't have to guess, we can look at
it and see how it works.

------
Pica_soO
One thing not even mentioned in the article is the component-scouting and
profiling phase, which completely failed. You do not go all-in on a crucial
library for a project that you did not profile with the real workload.

One small prototype, never run under full load, with mock-up classes not even
the size of the future classes, mock-up operations (not even close to the
real workload), and sometimes not even the final DB type attached. Yeah, it's
hard to see the future, but why not drive the test setup to its limits and go
from there?

Instead the whole frankenservice is run once for ten minutes and declared by
all involved worthy to bear the load of the project.

Here's to lousy component scouting, and then blaming it on the architect.

------
bogomipz
This was a nice write-up; however, I didn't follow how memory fragmentation
is related to a memory leak. Can someone explain? I understand that alternate
memory allocators would help with the fragmentation issue, but how does the
choice of allocator affect memory leakage?

~~~
Joeri
When you have a rapid sequence of large and small allocations, your memory
starts looking like swiss cheese. To fit a large allocation you need a
contiguous block of free memory, but all the available blocks are too small,
so it gets placed at the end where the free ram starts. Then another
allocation is put right after it, the big chunk gets freed, and a small
allocation is put in its place. Now instead of one big block of free memory
you have a slightly smaller block, which is slightly too small to fit a large
allocation, which has to be put at the end, and...
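
The swiss-cheese effect can be shown with a toy model (a simulation with
made-up numbers, not a real allocator): interleave long- and short-lived
allocations, free the short-lived big ones, and a request smaller than the
total free space still fails.

```cpp
#include <cstddef>
#include <vector>

// One block of the toy heap: a size and whether it is currently free.
struct Block { std::size_t size; bool free; };

struct ToyHeap {
    std::vector<Block> blocks;
    explicit ToyHeap(std::size_t total) : blocks{Block{total, true}} {}

    // First-fit allocation: split the first free block that is big enough.
    // Returns the block index, or -1 if no single hole can hold the request.
    int alloc(std::size_t n) {
        for (std::size_t i = 0; i < blocks.size(); ++i) {
            if (blocks[i].free && blocks[i].size >= n) {
                std::size_t rest = blocks[i].size - n;
                blocks[i] = Block{n, false};
                if (rest)
                    blocks.insert(blocks.begin() + static_cast<std::ptrdiff_t>(i) + 1,
                                  Block{rest, true});
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Naive free: mark the block free, no coalescing of neighbors.
    void release(int i) { blocks[static_cast<std::size_t>(i)].free = true; }

    std::size_t total_free() const {
        std::size_t s = 0;
        for (const Block& b : blocks) if (b.free) s += b.size;
        return s;
    }
};

// Interleave 100-byte "big" and 10-byte "small" allocations, then free the
// big ones: 910 of 1000 bytes are free, yet no hole exceeds 100 bytes, so a
// 200-byte request fails.
bool fragmentation_demo() {
    ToyHeap heap(1000);
    std::vector<int> bigs;
    for (int i = 0; i < 9; ++i) {
        bigs.push_back(heap.alloc(100));  // short-lived, freed below
        heap.alloc(10);                   // long-lived, pins the layout
    }
    for (int b : bigs) heap.release(b);
    return heap.total_free() >= 200 && heap.alloc(200) == -1;
}
```

The numbers are arbitrary; the point is that total free space and the
largest contiguous hole diverge as soon as lifetimes interleave.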

I ran into this issue once debugging a performance issue with a CAD file
parser. It made a lot of copies of large and small chunks, and some CAD files
would cause catastrophic fragmentation. Switching out the allocator for a
smarter one fixed the problem. A few versions of Delphi later, they made that
allocator the default, so now it is not prone to fragmentation anymore.

------
grandinj
I'm guessing the cpp thing is a holdover from the days when the glibc
maintainer was less than entirely helpful. There have been actual
improvements in glibc in this area lately, so hopefully these kinds of hacks
will slowly go away.

~~~
jwakely
The pooling behaviour of the libstdc++ std::allocator was only the default
from 2004 until late 2005, so it went away more than a decade ago.

------
odbol_
Is C++ now going the way of PHP, where to have an actual working program you
have to disable all the defaults in some mysterious but crucial ritual?

------
dummy323
[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80658](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80658)

------
selimthegrim
James Mickens, move over. There's a new sheriff in town.

------
jonnycomputer
The idea that we should always blame ourselves first has merit. But frankly,
some bugs just p e r s i s ttt.

like this one, with fputcsv in PHP.
[https://bugs.php.net/bug.php?id=43225](https://bugs.php.net/bug.php?id=43225)

------
Safety1stClyde
It was only yesterday that I was reading another discussion from hacker news
about problems with Gnu C library.

[https://news.ycombinator.com/item?id=14271305](https://news.ycombinator.com/item?id=14271305)

~~~
jwakely
That's a different library.

------
logicallee
People forget that C++ is just a _tool_ , like a screwdriver or a hammer. A
good carpenter knows when it's time to take a metallurgy class and resmelt his
hammer, because its composition is not correct for being a hammer.

~~~
cortesoft
I am pretty sure there has never been a carpenter who resmelted his hammer.

~~~
theoh
I'm not sure "resmelting" is even a thing. I am sure that the GP is joking.

~~~
logicallee
Guys, yes! I was joking. I was directly riffing on the common claim that C++
is "just a tool". This exact phrase, in quotes, has over a hundred thousand
hits on Google[1], many of them immediately comparing it to a hammer or a
screwdriver. The usual context is that C++ isn't really "unsafe" -- it's just
a tool; how you use it is up to you.

In the case of the standard library being broken, it is like a tool that is
broken. Resmelting might not be a word but the idea of the hammer not being
cast correctly and needing to be re-cast is ridiculous. In essence I was
making fun of the "there's nothing wrong with C++, just like there's nothing
wrong with a hammer" common claim. I was taking it to an extreme. (I thought
it was funny.)

[1]
[https://www.google.com/search?q=%22C%2B%2B+is+just+a+tool%22](https://www.google.com/search?q=%22C%2B%2B+is+just+a+tool%22)

~~~
cortesoft
I have no idea why I thought you were serious. Poe's law?

------
Ono-Sendai
So where's the bug report with repro test code?

------
tbodt
Maybe it'll get fixed now that a post saying "libc++ is broken" got
hackernewsed

~~~
StephanTLavavej
libstdc++ and libc++ are different libraries; this post is talking about
libstdc++. They're like elephants and elephant seals.

~~~
mcguire
" _They 're like elephants and elephant seals._"

Yes, yes they are.

------
ris
The correct response is to file a bug report, not write a clickbait-y article.

------
mtanski
Yeah, malloc() in glibc is pretty terrible by modern standards. For some
workloads it just can't keep up, and it ends up fragmenting space in such a
way that memory can't be returned to the OS (and thus be used for the page
cache), and you end up in this performance spiral.

I always deploy C++ servers on jemalloc. I've been doing it for years, and
while there have been occasional hiccups when updating, it has provided much
more predictable performance.

~~~
kartD
Actually, from my understanding it's libstdc++'s allocator that is causing
the issue, not malloc.

~~~
mtanski
A big reason the small-object optimization exists in libstdc++ containers is
that the system malloc() is not fast enough.

But we're not talking about that optimization (small objects / locality)
here: his issue was caused by the libstdc++ allocation pools, which would not
need to exist in the first place if the system malloc were better. So
libstdc++ ends up reinventing the wheel, poorly.

As the author mentioned, when he disabled the pooling behavior via
GLIBCPP_FORCE_NEW, he ended up burning more CPU in the system (glibc)
malloc(). Once he added jemalloc on top of GLIBCPP_FORCE_NEW, runtime
performance pretty much evened out with the previous behavior.

Hence the conclusion towards the end of the article:

> The right answer to "malloc is slow" is to make it faster.

~~~
jwakely
By default libstdc++ stopped using the pooling allocator in 2005:
[https://gcc.gnu.org/r106665](https://gcc.gnu.org/r106665)

That's one year after the ancient, bitrotted, unofficial copy of the libstdc++
documentation that the blog post links to, but still ancient history.

------
nly
Actually, it is C's malloc and free that are "broken": malloc() takes a size
parameter, but free() doesn't. This imbalance means it can never be maximally
efficient. Whatever GNU libstdc++ is doing is probably, on balance, a net win
for most programs.

It's not exactly roses in C++ either, of course. You can do better than the
standard library facilities. Andrei Alexandrescu gave a great, entertaining,
and technically elegant talk on memory allocation in C and C++ at CppCon 2015
that is well worth watching:

[https://www.youtube.com/watch?v=LIb3L4vKZ7U](https://www.youtube.com/watch?v=LIb3L4vKZ7U)

~~~
charles-salvia
The C++ delete[] operator doesn't take a size parameter either. This is
neither here nor there, and unrelated to the problem the blog post is talking
about.

~~~
nly
> The C++ delete[] operator doesn't take a size parameter either

Yes it does. See overload 6 here

[http://en.cppreference.com/w/cpp/memory/new/operator_delete](http://en.cppreference.com/w/cpp/memory/new/operator_delete)

~~~
theclaw
Note that the size parameter is ignored by the standard library
implementation. It is intended for use by user-defined implementations.

~~~
nly
> Note that the size parameter is ignored by the standard library
> implementation

Only because they all call malloc() and free() under the covers. It was,
however, an ABI break, and one the C++ community thought worth doing.

If you replace your malloc with jemalloc, you can easily wire this up to
sdallocx(). The C++ standard library obviously can't do this out of the box,
because it can't assume you are using jemalloc.
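
Sized deallocation is exactly the mechanism for handing free()'s missing size
parameter back to the allocator; a sketch with a hypothetical Tracked class
(an illustrative name, and the forwarding to something like sdallocx() is
only indicated, not implemented):

```cpp
#include <cstddef>
#include <new>

static std::size_t g_last_size = 0;

// `Tracked` declares only the *sized* class-scope operator delete, so
// `delete p` hands it the object's size: the caller-side knowledge that
// free() throws away. An allocator like jemalloc can pass that size to
// sdallocx() and skip looking it up in heap metadata.
struct Tracked {
    char payload[64];

    static void operator delete(void* p, std::size_t n) noexcept {
        g_last_size = n;       // size supplied by the compiler
        ::operator delete(p);  // a real allocator would use the size here
    }
};

std::size_t delete_reports_size() {
    Tracked* t = new Tracked;
    delete t;                  // calls the sized overload with sizeof(Tracked)
    return g_last_size;
}
```

The global C++14 overload (overload 6 mentioned above) works the same way,
but is only a hint the implementation may ignore; the class-scope form shown
here is guaranteed to receive the size.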

