I’ve wondered why there aren’t more tools for predicting how a program fits into...

CoolGuySteve · on June 24, 2019

The best tool for this, in my experience, is callgrind with assembly-level annotation. You can configure its cache simulation to more or less mimic the cache layout of whatever chip you're running on, and then execute your code under it.

You can use the start and stop instrumentation macros from callgrind.h (it sits next to valgrind.h in the Valgrind headers) to capture the cache behaviour of a specific chain of function calls, like the one that runs when a network event happens. Then, in the View menu of kcachegrind, select the L1 Instr. Fetch Miss event type and show the hierarchical function view.
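To make the start/stop idea concrete, here is a minimal sketch in C. The macros `CALLGRIND_START_INSTRUMENTATION` and `CALLGRIND_STOP_INSTRUMENTATION` are the real client requests from `valgrind/callgrind.h`; everything else (`handle_network_event`, `profiled_network_event`, the checksum standing in for real work) is hypothetical and only there for illustration. The no-op fallback is my own assumption so the file still builds on a machine without the Valgrind headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Use the real client-request macros when the Valgrind headers are
   installed; otherwise fall back to no-ops so the file still builds. */
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#    define HAVE_CALLGRIND_H 1
#  endif
#endif
#ifndef HAVE_CALLGRIND_H
#  define CALLGRIND_START_INSTRUMENTATION do { } while (0)
#  define CALLGRIND_STOP_INSTRUMENTATION  do { } while (0)
#endif

/* Hypothetical handler: sums the payload bytes as a stand-in for the
   real chain of calls you want to inspect. */
uint32_t handle_network_event(const uint8_t *payload, size_t len)
{
    assert(payload != NULL || len == 0);
    uint32_t checksum = 0;
    for (size_t i = 0; i < len; i++)
        checksum += payload[i];
    return checksum;
}

/* Hypothetical wrapper: only the call chain between the two macros is
   instrumented when the program is started with instrumentation off. */
uint32_t profiled_network_event(const uint8_t *payload, size_t len)
{
    CALLGRIND_START_INSTRUMENTATION;
    uint32_t result = handle_network_event(payload, len);
    CALLGRIND_STOP_INSTRUMENTATION;
    return result;
}
```

Running the program under `valgrind --tool=callgrind --instr-atstart=no --cache-sim=yes --dump-instr=yes` should then limit the profile to that region. The `--I1`, `--D1` and `--LL` options (size, associativity, line size) are where you would describe your chip's caches. Outside Valgrind the macros do essentially nothing, so the wrapper can usually stay in the build.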

It doesn't mimic the exact branch prediction, or whatever else is specific to your architecture, but when you compare it to actual timings it's damn close.

elcritch · on June 25, 2019

Wow, that's cool!