Unfortunately, it is really difficult to diagnose cache effects.
Do you have a good idea of what levels of cache misses are typical? Do you regression-check cache miss rates between versions?
What about the whole "debugging your program makes it slower" observer-effect? Do you know what effect your profiler has on cache?
And again, if you do something that blows away the cache, it'll show up as cache misses scattered at random across other parts of your program. Even if you catch it, it can (will) be hard to track back to the actual source.
I thought it was regression which was at issue here.
I'm not convinced cache is so different from all the other aspects of performance we have to deal with in HPC systems (insofar as they're isolated), but no matter. At least there's plenty of tool support and literature.
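For what it's worth, the "regression-check cache miss rates between versions" question can be made concrete with very little code. This is only a rough sketch: the `CacheCounts` container, the `miss_rate_regressed` helper, and the 10% tolerance are all made up for illustration, and the counts are assumed to come from whatever hardware-counter tool you already use (for example the `cache-references` and `cache-misses` events in `perf stat`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheCounts:
    """Hardware-counter totals for one run (hypothetical container)."""
    references: int
    misses: int

    @property
    def miss_rate(self) -> float:
        # Guard against an empty run so the ratio is always defined.
        return self.misses / self.references if self.references else 0.0


def miss_rate_regressed(baseline: CacheCounts, candidate: CacheCounts,
                        rel_tolerance: float = 0.10) -> bool:
    """True if the candidate's miss rate exceeds the baseline's by more
    than rel_tolerance (a relative margin; 0.10 is an arbitrary choice)."""
    return candidate.miss_rate > baseline.miss_rate * (1.0 + rel_tolerance)


# Made-up numbers, purely to show the comparison.
old = CacheCounts(references=1_000_000, misses=20_000)
new = CacheCounts(references=1_000_000, misses=35_000)
print(miss_rate_regressed(old, new))
```

Counter values vary from run to run, so a real check would presumably compare several runs per version rather than a single pair, and it still says nothing about which change caused the extra misses.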