> The standard library allocator keeps its state in global variables. This makes for a simple interface, but comes with significant performance and complexity costs

Thread-local allocation buffers?

Unfortunately, because a buffer may be allocated in one thread but freed in another, you can't have purely thread-local state for the allocator.
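A minimal C sketch of where purely thread-local state breaks down. It assumes a hypothetical allocator with one fixed size class and a header in front of every block that records the owning heap (heap, block, heap_alloc and heap_free are illustrative names, not any real allocator's API). The owner's free list needs no locking at all, but a free arriving from another thread can only detect that the block is foreign. It cannot safely push onto the owner's list.

```c
#include <assert.h>
#include <stdlib.h>

#define PAYLOAD_SIZE 64  /* assumption: one fixed size class */

/* Hypothetical per-thread heap. */
typedef struct heap {
    struct block *local_free;  /* blocks freed by the owning thread */
} heap;

/* Header in front of every block, recording which heap allocated it. */
typedef struct block {
    heap *owner;
    struct block *next;        /* free-list link, only used while the block is free */
} block;

enum free_result { FREE_LOCAL, FREE_REMOTE };

/* Each thread gets its own copy of this heap. */
static _Thread_local heap tl_heap;

heap *my_heap(void) { return &tl_heap; }

/* Owner only: no synchronization needed. */
void *heap_alloc(heap *self) {
    block *b = self->local_free;
    if (b) {
        self->local_free = b->next;                /* reuse a locally freed block */
    } else {
        b = malloc(sizeof(block) + PAYLOAD_SIZE);  /* stand-in for carving fresh memory */
        if (!b) return NULL;
    }
    b->owner = self;
    return b + 1;                                  /* payload follows the header */
}

/* 'self' is the heap of the thread that is calling free. */
enum free_result heap_free(heap *self, void *p) {
    block *b = (block *)p - 1;
    assert(b->owner != NULL);
    if (b->owner == self) {
        b->next = self->local_free;  /* owner-only list: no lock needed */
        self->local_free = b;
        return FREE_LOCAL;
    }
    /* Cross-thread free: pushing onto b->owner->local_free here would race
       with the owning thread, so the block has to go back through shared,
       synchronized state. This sketch only reports the case. */
    return FREE_REMOTE;
}
```

Passing the freeing thread's heap explicitly (instead of always reading tl_heap) just makes the remote case easy to exercise from a single thread.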


snmalloc batches up cross-thread frees, though in general any thread-local state is better than none.
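A rough sketch of what batching cross-thread frees can look like. This is in the spirit of the snmalloc comment but is not how snmalloc actually implements it: each thread keeps a small buffer of blocks it freed on behalf of another heap, then splices the whole chain into that heap's remote list with a single lock acquisition. BATCH_MAX, remote_batch and the "one destination at a time" policy are assumptions for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical owner heap: only the remote list is shared. */
typedef struct heap {
    pthread_mutex_t lock;       /* protects remote_head only */
    struct block *remote_head;  /* blocks freed by other threads */
} heap;

typedef struct block {
    heap *owner;
    struct block *next;
} block;

#define BATCH_MAX 32  /* assumption: flush after this many pending frees */

/* Per-thread buffer of frees destined for one foreign heap. */
typedef struct remote_batch {
    heap *target;
    block *head, *tail;
    size_t count;
} remote_batch;

/* Splice the pending chain into the owner's remote list:
   one lock acquisition per batch instead of one per free. */
void batch_flush(remote_batch *rb) {
    if (rb->count == 0) return;
    pthread_mutex_lock(&rb->target->lock);
    rb->tail->next = rb->target->remote_head;
    rb->target->remote_head = rb->head;
    pthread_mutex_unlock(&rb->target->lock);
    rb->head = rb->tail = NULL;
    rb->count = 0;
}

/* Record a cross-thread free; flush when the batch is full
   or when the destination heap changes. */
void batch_remote_free(remote_batch *rb, block *b) {
    assert(b->owner != NULL);
    if (rb->count > 0 && rb->target != b->owner) batch_flush(rb);
    rb->target = b->owner;
    b->next = rb->head;
    rb->head = b;
    if (rb->tail == NULL) rb->tail = b;  /* first block becomes the chain's tail */
    if (++rb->count == BATCH_MAX) batch_flush(rb);
}
```

The owning thread still has to drain remote_head under the same lock (not shown). A more serious version would keep pending frees per destination rather than flushing every time the destination changes.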


How simple can that be while still getting reasonable scaling?


Probably depends on your work load.

If you assume there's almost no crosstalk, you can likely get away with a thread ID and an optimistic mutex per pool: the mutex will only be contended if one thread tries to allocate or free while another thread is running a cross-thread free on the same pool.
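A sketch of the "thread ID plus a mutex per pool" idea, assuming a single size class and a header that points each block back to its pool (pool, pool_init, pool_alloc and pool_free are hypothetical names). A plain pthread mutex stands in for the "optimistic mutex": every alloc and free takes it, but since the owner is normally the only thread touching its pool, the lock is usually uncontended.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define PAYLOAD_SIZE 64  /* assumption: one fixed size class */

typedef struct pool {
    pthread_t owner;         /* the thread that allocates from this pool */
    pthread_mutex_t lock;    /* taken on every alloc and free; contended only when
                                a cross-thread free overlaps the owner's work */
    struct block *free_list;
} pool;

typedef struct block {
    pool *home;              /* pool the block must be returned to */
    struct block *next;
} block;

void pool_init(pool *p) {
    p->owner = pthread_self();
    pthread_mutex_init(&p->lock, NULL);
    p->free_list = NULL;
}

void *pool_alloc(pool *p) {
    assert(pthread_equal(p->owner, pthread_self()));  /* only the owner allocates */
    pthread_mutex_lock(&p->lock);
    block *b = p->free_list;
    if (b) p->free_list = b->next;
    pthread_mutex_unlock(&p->lock);
    if (!b) {
        b = malloc(sizeof(block) + PAYLOAD_SIZE);  /* stand-in for fresh memory */
        if (!b) return NULL;
    }
    b->home = p;
    return b + 1;
}

/* Any thread may call this. Returns true if it was a cross-thread free. */
bool pool_free(void *ptr) {
    block *b = (block *)ptr - 1;
    pool *p = b->home;
    bool remote = !pthread_equal(p->owner, pthread_self());
    pthread_mutex_lock(&p->lock);
    b->next = p->free_list;
    p->free_list = b;
    pthread_mutex_unlock(&p->lock);
    return remote;
}
```

pthread_equal is used because pthread_t is opaque and can't portably be compared with ==. On mainstream implementations an uncontended lock/unlock is typically just an atomic operation or two, which is what makes this cheap when crosstalk is rare.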

If you expect more contention than that, adding complexity may pay off, e.g. a lock per size class or per arena, or pushing cross-thread frees onto a queue that the owning thread checks once in a while. You can then make that queue lock-free, or even wait-free, if that suits your workload better.
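One way to sketch the "push cross-thread frees to a queue" variant, again with hypothetical names and a single size class: the owner's local free list is unsynchronized, other threads push onto a mutex-protected remote list, and the owner drains that list only when its local list runs dry. Replacing the mutex with an atomic push would give the lock-free version mentioned above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define PAYLOAD_SIZE 64  /* assumption: one fixed size class */

typedef struct heap {
    struct block *local_free;     /* touched only by the owner: no lock */
    pthread_mutex_t remote_lock;  /* protects remote_free only */
    struct block *remote_free;    /* blocks freed by other threads */
} heap;

typedef struct block {
    heap *owner;
    struct block *next;
} block;

void heap_init(heap *h) {
    h->local_free = NULL;
    pthread_mutex_init(&h->remote_lock, NULL);
    h->remote_free = NULL;
}

/* Any non-owner thread: queue the block for its owner. */
void heap_remote_free(void *p) {
    block *b = (block *)p - 1;
    heap *h = b->owner;
    assert(h != NULL);
    pthread_mutex_lock(&h->remote_lock);
    b->next = h->remote_free;
    h->remote_free = b;
    pthread_mutex_unlock(&h->remote_lock);
}

/* Owner only: a local free never touches the lock. */
void heap_local_free(void *p) {
    block *b = (block *)p - 1;
    b->next = b->owner->local_free;
    b->owner->local_free = b;
}

/* Owner only: detach the whole remote queue under the lock,
   then move it onto the local list without holding the lock. */
static void heap_drain_remote(heap *h) {
    pthread_mutex_lock(&h->remote_lock);
    block *list = h->remote_free;
    h->remote_free = NULL;
    pthread_mutex_unlock(&h->remote_lock);
    while (list) {
        block *next = list->next;
        list->next = h->local_free;
        h->local_free = list;
        list = next;
    }
}

/* Owner only: the queue is checked only when the local list is empty,
   i.e. once in a while rather than on every allocation. */
void *heap_alloc(heap *h) {
    if (!h->local_free) heap_drain_remote(h);
    block *b = h->local_free;
    if (b) {
        h->local_free = b->next;
    } else {
        b = malloc(sizeof(block) + PAYLOAD_SIZE);  /* stand-in for fresh memory */
        if (!b) return NULL;
    }
    b->owner = h;
    return b + 1;
}
```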


Looks like snmalloc is 10.7 kLOC whereas mimalloc is 8.8 kLOC, so it's not much more complex, at that level of engineering anyway. I also spotted that the mimalloc README says

> Free-ing from another thread can now be a single CAS without needing sophisticated coordination between threads

which may not be the worst thing in the world; it depends on what your app does.
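The "single CAS" cross-thread free from the mimalloc README can be sketched with C11 atomics as a Treiber-style push: other threads push with a compare-and-swap loop (a single CAS when uncontended), and the owner takes the whole list with one atomic exchange. This illustrates the general technique, not mimalloc's actual code; heap, remote_push and remote_take_all are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct block {
    struct block *next;
} block;

typedef struct heap {
    _Atomic(block *) remote_free;  /* pushed by any thread, taken by the owner */
} heap;

/* Any thread: push b with a CAS loop; uncontended, this is one CAS. */
void remote_push(heap *h, block *b) {
    assert(b != NULL);
    block *head = atomic_load_explicit(&h->remote_free, memory_order_relaxed);
    do {
        b->next = head;  /* re-link on each retry; a failed CAS refreshes head */
    } while (!atomic_compare_exchange_weak_explicit(
                 &h->remote_free, &head, b,
                 memory_order_release, memory_order_relaxed));
}

/* Owner only: take the entire list at once. The acquire pairs with the
   release in remote_push, so each block's next link is visible. */
block *remote_take_all(heap *h) {
    return atomic_exchange_explicit(&h->remote_free, NULL, memory_order_acquire);
}
```

Taking the whole list at once, instead of popping single nodes with CAS, is what keeps the owner side clear of the ABA problem. Pushes alone are ABA-safe, because the new node only ever links to whatever the head currently is.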


musl's malloc is 0.6 kLOC, and right now it has terrible multicore scaling; I'm wondering how simply that can be fixed.

AFAIK neither C nor POSIX provides a portable CAS, so while that's probably the best-performing solution, it's probably not suitable for musl.

Discussion in https://news.ycombinator.com/item?id=38619599


C11 has atomics and CAS (stdatomic.h); is C11 okay?
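One caveat on "is C11 okay?": C11 made atomics an optional feature, and an implementation that lacks them defines __STDC_NO_ATOMICS__. A minimal sketch of feature-testing for that and falling back to a mutex; flag_word, flag_init and try_claim are hypothetical names.

```c
#include <stdbool.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>

/* Atomics available: the claim is a single CAS. */
typedef atomic_int flag_word;

static void flag_init(flag_word *w, int v) { atomic_init(w, v); }

/* Set *w to 'desired' only if it still holds 'expected'. */
static bool try_claim(flag_word *w, int expected, int desired) {
    return atomic_compare_exchange_strong(w, &expected, desired);
}

#else
#include <pthread.h>

/* No C11 atomics: emulate the same compare-and-set under a mutex. */
typedef struct {
    pthread_mutex_t m;
    int v;
} flag_word;

static void flag_init(flag_word *w, int v) {
    pthread_mutex_init(&w->m, NULL);
    w->v = v;
}

static bool try_claim(flag_word *w, int expected, int desired) {
    pthread_mutex_lock(&w->m);
    bool ok = (w->v == expected);
    if (ok) w->v = desired;
    pthread_mutex_unlock(&w->m);
    return ok;
}
#endif
```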


Hm, interesting!