> The standard library allocator keeps its state in global variables. This makes...

MereInterest · on Dec 18, 2023

Unfortunately, because a buffer may be allocated in one thread but freed in another, you can't have purely thread-local state for the allocator.

hayley-patton · on Dec 18, 2023

snmalloc batches up cross-thread frees, though generally any thread-local stuff is better than none.

kragen · on Dec 18, 2023

how simple can that be and still get reasonable scaling?

masklinn · on Dec 18, 2023

Probably depends on your work load.

If you assume there’s almost no crosstalk, you can likely get away with a thread-id and an optimistic mutex per pool: the mutex will only contend if one thread tries to (de)alloc while another thread has a cross-thread free running.
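A minimal sketch of that per-pool scheme, assuming pthreads and a fixed-size slot pool. The names `pool`, `slot`, `pool_init`, `pool_alloc`, `pool_free`, and the sizes `POOL_SLOTS` / `SLOT_BYTES` are hypothetical, and a plain `pthread_mutex_t` stands in for the "optimistic" mutex:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_SLOTS 64   /* hypothetical pool capacity */
#define SLOT_BYTES 48   /* hypothetical payload size */

struct pool;

/* Each slot records its owning pool so a free from any thread can find it. */
typedef struct slot {
    struct pool *owner;
    union {
        struct slot *next;                /* valid while the slot is free */
        max_align_t align;                /* keeps the payload suitably aligned */
        unsigned char bytes[SLOT_BYTES];  /* valid while the slot is allocated */
    } u;
} slot;

typedef struct pool {
    pthread_t owner_tid;    /* thread that created the pool */
    pthread_mutex_t lock;   /* contended only while a cross-thread free runs */
    slot *free_list;
    slot slots[POOL_SLOTS];
} pool;

void pool_init(pool *p) {
    p->owner_tid = pthread_self();
    pthread_mutex_init(&p->lock, NULL);
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        p->slots[i].owner = p;
        p->slots[i].u.next = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Only the owning thread allocates from its pool. */
void *pool_alloc(pool *p) {
    assert(pthread_equal(p->owner_tid, pthread_self()));
    pthread_mutex_lock(&p->lock);
    slot *s = p->free_list;
    if (s) p->free_list = s->u.next;
    pthread_mutex_unlock(&p->lock);
    return s ? s->u.bytes : NULL;
}

/* Any thread may free; the slot goes back to the pool it came from. */
void pool_free(void *ptr) {
    if (!ptr) return;
    slot *s = (slot *)((char *)ptr - offsetof(slot, u));
    pool *p = s->owner;
    pthread_mutex_lock(&p->lock);
    s->u.next = p->free_list;
    p->free_list = s;
    pthread_mutex_unlock(&p->lock);
}
```

In this sketch the owner takes the lock on every call, so the no-crosstalk case still pays for an uncontended lock and unlock.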

If you assume there will be more contention than that, then scaling up complexity might be a good idea, e.g. a lock per size class / arena, or pushing cross-thread frees to a queue and having the owning thread check the queue once in a while. You can then make that queue lock-free or even wait-free if that is better for your workload / issues.
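The queue variant could look roughly like this, assuming C11 `<stdatomic.h>` is available. The names `heap`, `node`, `heap_init`, `remote_free_push`, and `drain_remote` are made up for illustration; this is a sketch of the idea, not how any particular allocator does it:

```c
#include <stdatomic.h>
#include <stddef.h>

/* A freed block is reused as a list node. */
typedef struct node { struct node *next; } node;

typedef struct heap {
    node *local_free;              /* touched only by the owning thread */
    _Atomic(node *) remote_head;   /* other threads push freed blocks here */
} heap;

void heap_init(heap *h) {
    h->local_free = NULL;
    atomic_init(&h->remote_head, NULL);
}

/* Called by a non-owning thread: push the block with a CAS loop. */
void remote_free_push(heap *h, void *ptr) {
    node *n = ptr;
    node *head = atomic_load_explicit(&h->remote_head, memory_order_relaxed);
    do {
        n->next = head;  /* on CAS failure, head is reloaded with the current value */
    } while (!atomic_compare_exchange_weak_explicit(
                 &h->remote_head, &head, n,
                 memory_order_release, memory_order_relaxed));
}

/* Called by the owner once in a while: detach the whole list with one
 * exchange, then splice it onto the private free list. Returns the
 * number of blocks reclaimed. */
size_t drain_remote(heap *h) {
    node *n = atomic_exchange_explicit(&h->remote_head, NULL,
                                       memory_order_acquire);
    size_t count = 0;
    while (n) {
        node *next = n->next;
        n->next = h->local_free;
        h->local_free = n;
        n = next;
        count++;
    }
    return count;
}
```

Because the owner only ever takes the whole list at once, there is no single-node pop and so no ABA problem to handle. The push is lock-free rather than wait-free, since the CAS loop can retry under contention.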

hayley-patton · on Dec 18, 2023

Looks like snmalloc is 10.7 kLOC whereas mimalloc is 8.8 kLOC, so it's not much more complex at that... amount of engineering, I guess. Also spotted that the mimalloc README says

> Free-ing from another thread can now be a single CAS without needing sophisticated coordination between threads

which may not be the worst thing in the world; depends on what your app does.

kragen · on Dec 18, 2023

musl's malloc is 0.6 kloc and right now it has terrible multicore scaling, and i'm wondering how simply that can be fixed

afaik neither c nor posix provides portable cas, so while that's probably the best performing solution, it's probably not suitable for musl

discussion in https://news.ycombinator.com/item?id=38619599

hayley-patton · on Dec 18, 2023

C11 has atomics and CAS, is C11 okay?
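For reference, a small sketch of what CAS looks like through `<stdatomic.h>`. The helper `try_claim` is hypothetical; `atomic_compare_exchange_strong` is the standard C11 call. Note that atomics are an optional part of C11: an implementation that lacks them defines `__STDC_NO_ATOMICS__`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: claim a slot only if it is currently unowned (0).
 * *seen receives the value the CAS compared against: 0 on success, or the
 * current owner's id on failure. */
bool try_claim(atomic_int *slot, int my_id, int *seen) {
    int expected = 0;
    bool ok = atomic_compare_exchange_strong(slot, &expected, my_id);
    *seen = expected;  /* on failure, CAS wrote the observed value here */
    return ok;
}
```

Whether the operation is actually lock-free on a given target can be checked with `atomic_is_lock_free` or the `ATOMIC_INT_LOCK_FREE` macro.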

kragen · on Dec 19, 2023

hm! interesting!