My point is that this isn't how performance work is done. You first have to establish that the problem is CPU-bound before the question of whether it's memory-bound can even enter the picture. Time spent waiting on memory is accounted the same as any other CPU work, so it already shows up under the CPU metric.
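To make the accounting point concrete, here's a rough sketch (my own illustration; the names `make_cycle`, `chase`, and `measure` are made up for it). It walks the same number of elements twice, once in order and once in a shuffled order that is much less cache-friendly. The random walk should take longer, but in both cases the CPU time the OS reports should stay close to the wall time on an otherwise idle machine, because a core stalled on a load still counts as busy. Caveat: in CPython the interpreter overhead dominates, so the gap will be much smaller than in C.

```python
import random
import time


def make_cycle(n, shuffled, seed=0):
    """Build one cycle over 0..n-1, visited in order or in a random order."""
    order = list(range(n))
    if shuffled:
        random.Random(seed).shuffle(order)
    next_index = [0] * n
    # Link each element to its successor in `order`, wrapping around at the end.
    for a, b in zip(order, order[1:] + order[:1]):
        next_index[a] = b
    return next_index


def chase(next_index, steps):
    """Follow the chain; each load depends on the result of the previous one."""
    i = 0
    for _ in range(steps):
        i = next_index[i]
    return i


def measure(next_index, steps):
    """Return (wall seconds, CPU seconds, final index) for one traversal."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    end = chase(next_index, steps)
    return time.perf_counter() - wall0, time.process_time() - cpu0, end


if __name__ == "__main__":
    n = 1_000_000  # arbitrary size, chosen only for illustration
    for label, shuffled in (("sequential", False), ("random", True)):
        wall, cpu, _ = measure(make_cycle(n, shuffled), n)
        print(f"{label:10s} wall={wall:.3f}s cpu={cpu:.3f}s")
```

Neither number tells you why the second run is slower. For that you still need hardware counters or a profiler.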
To make an analogy, this would be like adding a function-call counter to Activity Monitor and using it to diagnose quadratic performance. You can't just take that number and immediately figure out the problem; you need to go look at the code and see what it's doing first, and only then go "oh ok, this number is too high". The same applies to waiting on memory. What are you going to do with a number that says the program is spending 30% of its time stalled on loads? Is that too high? Is it good? You need to analyze it in more detail elsewhere first.
You’re really just making a case for firing up a profiler more often. That’s fine, I do that a lot. But what you’re looking for has no meaning outside of that context.
I want to be able to do the same for memory-bound performance problems.
But the top-level tools are stuck decades in the past, when the CPU was the bottleneck.