> The standard library allocator keeps its state in global variables. This makes for a simple interface, but comes with significant performance and complexity costs
If you assume there’s almost no crosstalk, you can likely get away with a thread-id and an optimistic mutex per pool: the mutex will only contend if one thread tries to (de)alloc while another thread has a cross-thread free running.
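
To make that concrete, here is a minimal sketch of the "thread-id plus one mutex per pool" idea. `Pool`, `dealloc` and `cross_thread_frees` are made-up names for illustration, and handing out blocks from a `std::vector` is a simplification, not how any particular allocator does it.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical fixed-size-block pool owned by the thread that creates it.
struct Pool {
    std::thread::id owner = std::this_thread::get_id();
    std::mutex lock;                     // almost always uncontended
    std::vector<void*> free_blocks;      // blocks ready for reuse
    std::size_t cross_thread_frees = 0;  // only here to observe crosstalk

    // Owner only: returns nullptr when the pool has nothing to hand out.
    void* alloc() {
        assert(std::this_thread::get_id() == owner);
        std::lock_guard<std::mutex> guard(lock);
        if (free_blocks.empty()) return nullptr;
        void* p = free_blocks.back();
        free_blocks.pop_back();
        return p;
    }

    // Any thread: the lock only contends when a cross-thread free
    // overlaps with something the owner is doing.
    void dealloc(void* p) {
        std::lock_guard<std::mutex> guard(lock);
        if (std::this_thread::get_id() != owner) ++cross_thread_frees;
        free_blocks.push_back(p);
    }
};
```

An uncontended `std::mutex` lock is typically just an atomic operation or two, which is why this can be good enough when cross-thread frees are rare.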
If you assume there will be more contention than that, then scaling up the complexity might be a good idea, e.g. a lock per size class / arena, or pushing cross-thread frees to a queue and having the owning thread check the queue once in a while. You can then make that queue lock-free or even wait-free if that is better for your workload / issues.
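
A rough sketch of the queue variant, assuming a mutex-guarded queue as the starting point (the lock-free version would replace `remote_lock`). `DeferredPool`, `drain_remote` and the field names are hypothetical, and "check the queue once in a while" is simplified here to "check it when the local list runs dry".

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

struct DeferredPool {
    std::thread::id owner = std::this_thread::get_id();
    std::vector<void*> local_free;   // touched by the owner only, no lock
    std::mutex remote_lock;          // guards remote_free and nothing else
    std::vector<void*> remote_free;  // blocks freed by other threads

    // Any thread: the owner takes the lock-free fast path, everyone
    // else parks the block on the remote queue.
    void dealloc(void* p) {
        if (std::this_thread::get_id() == owner) {
            local_free.push_back(p);
        } else {
            std::lock_guard<std::mutex> guard(remote_lock);
            remote_free.push_back(p);
        }
    }

    // Owner only: move all pending remote frees to the local list.
    std::size_t drain_remote() {
        assert(std::this_thread::get_id() == owner);
        std::vector<void*> batch;
        {
            std::lock_guard<std::mutex> guard(remote_lock);
            batch.swap(remote_free);  // lock held only for the swap
        }
        local_free.insert(local_free.end(), batch.begin(), batch.end());
        return batch.size();
    }

    // Owner only: check the remote queue when the local list is empty.
    void* alloc() {
        assert(std::this_thread::get_id() == owner);
        if (local_free.empty() && drain_remote() == 0) return nullptr;
        void* p = local_free.back();
        local_free.pop_back();
        return p;
    }
};
```

The point of the split is that the owner's own alloc/free never touches a lock; only the (hopefully rare) cross-thread path and the occasional drain do.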
Looks like it's 10.7 kLOC whereas mimalloc is 8.8 kLOC, so it's not much more complex at that...amount of engineering, I guess. Also spotted that the mimalloc README says
> Free-ing from another thread can now be a single CAS without needing sophisticated coordination between threads
which may not be the worst thing in the world; depends on what your app does.
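
For anyone wondering what a "single CAS" free can look like, here is a generic sketch using an intrusive lock-free stack. This is not mimalloc's actual code; `FreeBlock`, `RemoteFreeList`, `take_all` and `demo` are names invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical intrusive node: the freed block itself stores the link.
struct FreeBlock {
    FreeBlock* next;
};

struct RemoteFreeList {
    std::atomic<FreeBlock*> head{nullptr};

    // Any thread: one CAS when uncontended, retried if another push races.
    void push(FreeBlock* b) {
        FreeBlock* old = head.load(std::memory_order_relaxed);
        do {
            b->next = old;  // on CAS failure, old is reloaded for us
        } while (!head.compare_exchange_weak(old, b,
                                             std::memory_order_release,
                                             std::memory_order_relaxed));
    }

    // Owner: grab the whole list with one exchange. Never popping single
    // nodes sidesteps the usual ABA problem of lock-free stacks.
    FreeBlock* take_all() {
        return head.exchange(nullptr, std::memory_order_acquire);
    }
};

// Usage: several threads "free" blocks, then the owner collects them.
std::size_t demo(std::size_t n_threads, std::size_t per_thread) {
    RemoteFreeList list;
    std::vector<std::vector<FreeBlock>> blocks(
        n_threads, std::vector<FreeBlock>(per_thread));
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < n_threads; ++t) {
        workers.emplace_back([&list, &blocks, t] {
            for (FreeBlock& b : blocks[t]) list.push(&b);
        });
    }
    for (std::thread& w : workers) w.join();

    std::size_t collected = 0;
    for (FreeBlock* b = list.take_all(); b != nullptr; b = b->next) {
        ++collected;
    }
    assert(list.take_all() == nullptr);  // nothing left after the drain
    return collected;
}
```

Whether this beats a plain mutex depends on how often frees actually cross threads, which is the "depends on what your app does" part.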
Thread-local allocation buffers?