I find that one of the most important features of a malloc library to debug memory usage.
glibc has these functions (like malloc_info()) -- they are very bugged in that they return wrong results, but after patching them to be correct, they are super useful.
https://sourceware.org/bugzilla/show_bug.cgi?id=24026 - "malloc_info() returns wrong numbers"
https://sourceware.org/bugzilla/show_bug.cgi?id=21556 - "malloc_stats printing size_t fields as unsigned int"
If you're interested in this topic:
After finding one bug after the other due to 32-bit integer overflow in malloc.c, I just searched for "unsigned int" in that file for fun, and 30 seconds later found what I consider a security vulnerability in realloc():
In certain situations, if you realloc() e.g. 32G + 5 bytes (reallocs this large can happen in large programs e.g. for data analysis), it'll copy only 5 bytes, and leave the rest as memory garbage.
That experience taught me that open source code being old doesn't mean anybody ever read it.
It describes only the kernel-userspace parts of memory usage. For example, malloc fragmentation is not inspectable in it.
(If you enable the mallopt() setting that turns each malloc() into a separate mmap(), then it tells you a lot, but that is not a default setting.)
At least the same architecture of allocated chunks management.
As always the thing to do is build and run your own workload and see the results.
Even something as simple as a bunch of threads doing malloc-free in a loop will drop performance of a lot of allocators to the floor, due to some sort of central locking or excessive cache thrashing. This is typically solved by adding per-thread block pools, free lists or some such.
If you go further down the rabbit hole, there's a case when blocks are allocated in one thread and freed in another, your very typical producer-consumer setup. This too further complicates things with the pool/freelist setup and requires periodic rebalancing of freelists and pools.
So once all this is accommodated, a well-tuned allocator inevitably converges to a model with central slabs/pools/freelists and per-thread caches of the same, which are periodically flushed into the former. Then it all comes down to routine code optimization to make fastpaths fast, through lock-free data structures, some clever tricks and what not.
In other words, it's always nice to read through someone's allocator code, but in the end this is a very well-explored area and there's basically a single stable point once all common scenarios are considered.
- it looks like it avoids atomics on common cases of malloc and free. That’s a big deal and not all malloc a accomplish that.
- it looks like it has cleverness specifically for the case that one thread frees an object into another thread’s heap. It seems like this case was given some special consideration - in particular avoiding the need for the thread that owns the heap to stop or synchronize at the moment that happens even though that thread is non-atomically allocating and freeing in that same heap.
So, it’s always easy to sound smart by saying that a field is well explored - but it looks like this thing actually has some cool new ideas in it.
Many mallocs implementations try to avoid one cache per thread because it can waste a huge amount of memory in applications with thousands of threads opting for per cpu pools or similar solutions. These solutions help with contention but still require atomics (unless using something like restartable sequences).
It is a trade-off, but for applications that use one thread per cpu, completely private free lists can branch win of course.
Obviously this would introduce additional creative ways in which to mess up memory management, but at least theoretically it seems like it ought to be doable. When I tried to search for information on the topic just now, I found only a few items about using multiple allocators in C++ that involved the STL and templates, but nothing about low level code or best practices.
Also, you want to make sure that the mallocs can share free pages with each other directly. Otherwise you will likely het higher peak memory overhead or worse perf or both.
"The larson server workload allocates and frees objects between many threads. Larson and Krishnan  observe this behavior (which they call bleeding) in actual server applications, and the benchmark simulates this. Here, mimalloc is more than 2.5× faster than tcmalloc and jemalloc due to the object migration between different threads. This is a difficult benchmark for other allocators too where mimalloc is still 48% faster than the next fastest (snmalloc)."
 I suppose there's a digression to be had here about allocation being trivial and RAII-style deallocation not really being deterministic, but...
I remember DrDobbs and The C/C++ Users Journal having ads for companies whose sole product was a memory allocator for different kinds of deployment scenarios.
It wasn't until well into the 2000s until the standard allocators started to become good enough to make the 3rd party libraries less essential. Many of them you can still buy today and they probably still help with certain types of software.
>We present mimalloc, a memory allocator
that effectively balances these demands, shows significant performance
advantages over existing allocators, and is tailored to support languages
that rely on the memory allocator as a backend for reference counting.
And the paper, which I’ll be presenting tomorrow at ISMM: https://github.com/microsoft/snmalloc/blob/master/snmalloc.p...
This has some quite clever and very carefully considered details around concurrency.
> in the end this is a very well-explored area and there's basically a single stable point once all common scenarios are considered.
I think it's absurd to call allocation a solved problem where there's one known optimal design.
With a lot of interesting new generic allocators that beat the competition hands down popping up in the last 10 years, I'd dare to claim the area is not very well explored. It has indeed been explored a lot but it seems that there's still low-hanging fruit. Every few years someone writes an allocator that makes significant improvements over the previous generation implementations. I love to see such action on "solved problems".
Is there a production-ready allocator that's optimized for single-threaded use?
Edit: They do mention they're all from AMD's EPYC chip, which is a little idiosyncratic. Speculation: perhaps page locality is more important on this architecture.
Looks roughly similar: https://github.com/daanx/mimalloc-bench
The problems I encounter with allocator and heap manager are almost never solved by these types of frameworks. These problems include:
1. Improper usage of the memory returned that contradict implementation.
2. Pool allocators that don't have separation between individual blocks (performance reasons).
3. Specifying the lifetime of the memory to a thread or until specific events happen.
4. Difficult to diagnose corruption, with any tool available.
Here's a specific scenario I deal with very often:
There are N persistent worker threads. These worker threads have their own pool of memory, and prior to getting work we know this pool is clean. After the work is finished and before more work is recieved the memory is cleaned. Any excess requested memory is returned to the global-pool, and any memory that is "unmanaged" is dealt with properly.
This means that people can do whatever heap management call you use (void * obtainMemory(size_t);) in the scope of business logic without having to worry about infrastructure concerns.
Having a faster malloc/calloc doesn't benefit me as much as making the usage of memory easier, and the understanding of what happens easier.
Concurrent algorithms (to which multi-thread capable allocators belong) typically necessitate design and performance compromises which aren't nullified by creating only a single thread.
> I'd suggest you actually prove that the memory barriers are actually a problem for you via profiling, since it's somewhat unlikely it is.
civility's reply to your post is somewhat rude, but they're right. It's rather arrogant to assume that the poster doesn't know what they're doing, without knowing anything about their specific problem. Modern allocators are complex pieces of software, typically with a lot of knobs and dials, and it is entirely plausible that an allocator tuned for single-threaded programs is more performant for a specific use case than a generic multi-thread allocator.
> It's rather arrogant to assume that the poster doesn't know what they're doing
Given the question they asked I don't believe it was arrogant at all to assume they don't really know what they were doing. Their response seems to justify the push to start from the basics as well.
Aggressive comments aren't worse, they're just easier for you to recognize.
We care as much as you do about snarky and condescending comments. I personally share your feeling of being even more averse to those than to the cruder abuses. But for a bunch of reasons, they're harder to moderate. Here's one: people's interpretations of what counts as snark or condescension vary widely. There is little community consensus around this, and moderation can never get too far ahead of the community view. We have to choose our battles wisely if we don't want to spark backlashes, protests, and off-topic distractions that render the cure worse than the disease.
I can't moderate based on my personal views. That's the reason why your advice wouldn't work. We couldn't moderate based on your personal views either. It isn't laziness—it's that you can't impose an individual interpretation set on the community as a whole. People assume we do, but that's only because they haven't learned about moderation the hard way. Moderation is extremely different from extrapolating one's likes and dislikes into site rules and then using power to enforce them.
Does that mean doing nothing about snarky and condescending comments? Hardly, and if you read my comment history (not that I recommend it), you'll find plenty of examples of asking people not to do that. But they're ad hoc and I try to be careful not to demand too big a leap from the reader.
The plan is, over time, to raise the bar for comments so that gradually the snark and shallow dismissals stand out more prominently as abusive, the way that "you're an asshole" comments do now. Then it will be possible for moderators to do more about them. But this needs to happen slowly— frog-boilingly slowly. If we push too hard, the community balks and moderation loses power. We can only do what the community will support. We can lead—but only a bit at a time. Even if https://news.ycombinator.com/item?id=20252539 was trite and unhelpful, I guarantee you that the hivemind is not yet ready to support moderator intervention at that level. It needs to get more refined before that is possible. That's the long-term hope, but it will take years if not decades to get there.
So where are we now? I'll go back to reading the headlines instead of the comments because I'm not clever enough to keep people like that from getting under my skin, and he'll go on to piss off someone else the next chance that comes up.
Anyways, I regret snapping at you. Take care.
It's really a comment-by-comment thing rather than a people thing. If you're seeing cases where we're failing to moderate a post that cries out for it, we'd appreciate links, because we don't come close to seeing everything here. Or of course you can flag the comment (described at https://news.ycombinator.com/newsfaq.html). In egregious cases, emailing firstname.lastname@example.org is best because then we're guaranteed to see it sooner.
> His goal was to antagonize me while flying under the radar
Really, are you sure? Can you point to the comment that demonstrates this? Because either I missed something obvious or you're reading perhaps a bit too much into what was posted. Intent is notoriously difficult to read accurately in these posts, as I'm sure you know.
Pretty sure, and user adwn commented on it too. It's a pretty common pattern to not answer the question and then indicate it was a dumb question in the first place. Then he doubled down on the acceptable insults in the follow-on response.
I'm sure I'm oversensitive to crap like this, but I don't think I'm wrong to notice it.
I wonder if HN would ever add a feature to block/hide users others don't want to see, something per reader. I don't see a browser addon for it, and while I'm sure it could be done purely client-side, you might find stats about who is being blocked and how often interesting if it was done on the server. Heh, I'm sure I would end up on a few people's list, but it doesn't seem likely that we're all just going to get along anytime soon.
Anyways, thank you again for the polite reply.
I feel like we probably shouldn't add a block/hide/killfile feature because it would be a step back into the siloed style of forum (https://hn.algolia.com/?query=by:dang%20siloed&sort=byDate&d...). That would feel easier in the short term but would perhaps be a retreat from the hard problem of building a community that actually works. The thing about that task is that it's always deeply unsatisfying and frustrating. One can only fall short. Yet it seems like the right task to be working on. Perhaps in the end, as Bob Dylan put it, we win the war after losing every battle. Or perhaps it just falls back into the swamp. The ambition of HN has always been to maybe stave off doom a while longer (https://hn.algolia.com/?query=by:dang%20stave&sort=byDate&da...), or as pg put it years ago, "make a conscious effort to resist decline" (https://news.ycombinator.com/newswelcome.html).
Not as rude as I wanted to be. I expect answers like that from Stack Overflow or Reddit - just another condescending ego play without actually addressing the question.
Regardless, thank you for your reply.
I don't suppose you know a good single-threaded (or share-nothing across threads) allocator worth looking at?