In [mimalloc](https://github.com/microsoft/mimalloc) we use a similar strategy to protect the free list in secure mode. However, there are some weaknesses associated with using a plain _xor_ -- if the attacker can guess `P` that reveals the key immediately (`P^key^P == key`) or even if there is a read overflow, and the attacker can read multiple encodings, they can xor with each other, say `(P1^key)^(P2^key)` and then we have`(P1^P2)` which may reveal information about the pointers (like alignment).
What we use in _mimalloc_ instead is two keys `key1` and `key2` and encode as `((P^key2)<<<key1)+key1`. Since these operations are not associative, the above approaches do not
work so well any more even if the `P` can be guesstimated. For example, for the read case we can subtract two entries to discard the `+key1` term, but that leads to `((P1^key2)<<<key1) - ((P2^key2)<<<key1)` at best. (We include the left-rotation since xor and addition are otherwise linear
in the lowest bit).
Just some thoughts. Of course, this may be too much for the use-case. However, we found just little extra overhead for the extra operations (as most programs are dominated by memory access) so it may be of benefit.
(And got better malloc(3) performance at the same time.)
No malloc has absolute detection of all corruption and even tools like Valgrind don't catch everything.
Let's not pretend that a few pointer hacks and checks have solved a problem.
In addition to all the security problems inherent in mingling metadata directly next to user-data, the linked list gives O(Nalloc) performance on free(3) and realloc(3) whereas my malloc had O(1) performance.
That may not seem like a lot, but when the 'O' is a page-in from disk, and Nalloc is from C++ code, it is nothing to sneeze at.
Not only did that make my malloc faster, but it would, not "could", but "would" detect several classes of malloc-usage errors, double-free etc, unconditionally.
Over the next 10-ish years, that practically eliminated entire classes of malloc-mistakes, making a lot of FOSS software safer, no matter which malloc you used it with.
Given the definition of the malloc(3) family API, there is no way you can detect all corruption without hardware support, but there are people working on that too, notably Robert Watsons CHERI project at Cambridge.
So yeah, nice pointer arithmetic, but how about people solved the real problem instead ?
(On big memory and multi-core systems use jemalloc, on small memory and single-core systems use phkmalloc.)
The only problem OP has solved is detecting overwriting of metadata inadvisably comingled with the user data.
That problem does simply not exist in phkmalloc, or jemalloc for that matter, writing past the end or before the beginning of an allocation does not automatically give you the ability to violate the internal integrity of the malloc implementation.
If you actually read the first link you pasted carefully, you will see what an amazing tour-de-force it was to exploit CVE-2003-0015, which was a run-of-the-mill double-free mistake in CVS.
As I remember it, there were proof-of-concept exploits against list-based mallocs in a matter of days, it took months before bbp's article appeared.
When I wrote phkmalloc, most of the exploitable malloc-related problems were usage problems, double frees, freeing modified pointers, writing past start or end of allocation etc.
If you had seen my presentation in New Orleans 1998, you would undoubtedly remember my handwritten slide with around 20 setuid programs phkmalloc had found issues in within the first year.
That is the primary reason why there are so few malloc(3) related exploits these days: phkmalloc and later jemalloc complained about them.
Those are mistakes which list-based mallocs do not find, because it is too expensive to walk the list all the time, and on top of that, they run slower, because the list is O(n)!
There is neither a reason, nor an excuse for using a list-based malloc in this day and age.
Neither OP nor I claim to have solved the "big" malloc problem, because, as I already explained: That takes hardware support.
Did you talk to musl team as well?
Thank you for your work!
A desktop environment and consumer-grade hardware add a lot of noise.
Ideally you'd always benchmark on completely idle dedicated servers, but those are rarely available.
If you _really_ want to avoid any contention, then build the benchmark as a Linux kernel module and call stop_machine(), which will prevent anything else including interrupt handlers from running.