I may be making the mistake of jumping in with a comment before reading the complete article, but regarding the description of the experiment setup at the start:
They want to generate a workload that’s as realistic as possible, but without spending a lot of CPU time on the actual application, in order to isolate the performance of the allocator. I’m not convinced that their simulated application is actually realistic, though. Why not just record the allocations performed by a real application, and replay them?
It’s also not clear if the application code can be entirely separated from the allocator, as they’ll battle over cache space. An allocator with lower peak performance but smaller code might work better in a real world situation, purely because it pushes less of the application code out of cache.
I built a commercial memory allocator library (called HeapManager, for pre-OS X Mac OS / CodeWarrior) in the '90s. I built about 20 different test apps with varying workloads that I ran after every change to make sure it worked and to measure the performance. I probably spent more time on the test apps than on the allocator. But it was a simpler time (single-threaded) and the CPUs were a whole lot less complicated. I did learn a lot about measuring allocators, though it was hard to emulate real-world usage fairly. My customers seemed to find it really fast, so I guess I did a decent job. Today it would be a hell of a lot more work.
I cut my C/C++ teeth on CodeWarrior (and Think C), and vividly remember the incredibly detailed implementation comments in the CodeWarrior malloc headers. I learned a lot from those! They're still one of my go-to examples of great tech docs.
This may come across as pedantic, but I feel you should never enable curve smoothing in these kinds of plots. Smoothing the curves makes them look pretty, but then the plots are no longer entirely truthful.
For that matter though, a linear interpolation isn't a faithful representation of the data either. Would it not be better to just display the raw data in a scatter plot?
I wouldn't say I get "more" from a scatter plot, it's just a matter of accuracy and trusting the data. Smoothed plots add slopes and transitions that just don't exist.