
Is there any way to analyze memory bandwidth usage?

Over the years I've had a couple of programs where I wondered whether we were simply hitting memory bandwidth limits, but I couldn't find any way to prove it, or even to gather evidence.




It will show up as CPU usage: the hyperthread that is waiting for memory will appear busy executing.

perf can record hardware counter events (use "perf list" to see what's available), including those that count frontend and backend stalls (stalls in the instruction decoding and execution stages respectively). A high backend stall ratio together with low instructions-per-cycle throughput is an indicator that you're waiting on memory, though on its own it doesn't tell you whether the limit is bandwidth or latency.
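A rough sketch of the arithmetic, assuming you already have raw totals from something like `perf stat -e cycles,instructions,stalled-cycles-frontend,stalled-cycles-backend`. The function name, the dictionary layout, and the sample numbers are my own illustration, and not every CPU exposes the generic stalled-cycles events.

```python
def stall_summary(counts):
    """Derive IPC and stall ratios from raw hardware counter totals.

    `counts` maps event names to totals. The keys mirror perf's generic
    event names; that is an assumption about how the data was collected.
    """
    cycles = counts["cycles"]
    summary = {"ipc": counts["instructions"] / cycles}
    # Stall events are optional: some CPUs do not expose them.
    for key in ("stalled-cycles-frontend", "stalled-cycles-backend"):
        if key in counts:
            summary[key + "-ratio"] = counts[key] / cycles
    return summary


# Made-up numbers, purely to show the arithmetic.
example = {
    "cycles": 10_000_000_000,
    "instructions": 4_000_000_000,
    "stalled-cycles-backend": 7_000_000_000,
}
print(stall_summary(example))
```

A low "ipc" next to a high backend ratio is the pattern described above; what counts as low or high depends on the CPU and the workload.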


You've also reminded me that while they do show up as CPU time, they wouldn't be 'user' time (for example, in the output of the 'time' command).


If it's user-mode loads and stores, which for applications is pretty common, it is user time. Easy to test.
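One way to run that test, as a minimal sketch: copy a large buffer over and over and compare the user and system CPU time the process accumulates. The function name, buffer size, and round count are arbitrary choices of mine. The bulk copy stands in for an application's loads and stores, and the warm-up copy is there so first-touch page faults (which do count as system time) stay out of the measurement.

```python
import os


def copy_cpu_times(size_bytes=32 * 1024 * 1024, rounds=20):
    """Return (user_seconds, system_seconds) spent copying a buffer repeatedly."""
    src = bytearray(b"\x01") * size_bytes  # written once, so the pages are real
    dst = bytearray(size_bytes)
    dst[:] = src  # warm-up: take the page faults before measuring
    before = os.times()
    for _ in range(rounds):
        dst[:] = src  # bulk copy done by ordinary user-mode loads and stores
    after = os.times()
    return after.user - before.user, after.system - before.system


user_s, sys_s = copy_cpu_times()
print(f"user {user_s:.2f}s  system {sys_s:.2f}s")
```

If the claim holds on your machine, nearly all of the time should be reported as user time.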


Thanks for that, I'll take another look at it and see what I've been missing then.


Yes, PMCs (CPU performance monitoring counters; also known by many other terms, such as PMU counters, PICs, CPCs, etc). In the past I've written tools that print usage of memory busses (really, CPU interconnect ports) via the PMCs.

They are also a PITA to work with.

As another person said, it shows up as CPU utilization. So I check that along with IPC (instructions per cycle), and if IPC is low (what "low" is depends, but say, < 1.0), then that's a good clue you're blocked on memory.

... but of course, I want actual throughput (usage), bandwidth (maximum), and utilization (ratio), which is more digging with the PMCs.
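For what it's worth, the relationship between those three numbers is simple once a counter gives you a transfer count. A hedged sketch: the function name is hypothetical, the 64-byte line size and the peak figure are assumptions you would replace with values for your hardware, and finding a PMC that actually counts the transfers is the hard part this skips.

```python
def bus_utilization(line_transfers, interval_s, peak_bytes_per_s, line_bytes=64):
    """Return (throughput in bytes/s, utilization as a 0..1 ratio).

    line_transfers:   cache-line transfers counted during the interval
    peak_bytes_per_s: the bandwidth (maximum) of the bus or port
    """
    throughput = line_transfers * line_bytes / interval_s
    return throughput, throughput / peak_bytes_per_s


# Illustrative numbers only: 200 million line transfers in one second
# against an assumed 25.6 GB/s peak.
throughput, util = bus_utilization(200_000_000, 1.0, 25.6e9)
print(f"{throughput / 1e9:.1f} GB/s, {util:.0%} utilized")
```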


I've identified a memory bandwidth issue in the past by keeping an eye on truss/strace output and "counting" mem operations.

After that, I compiled and ran a tiny benchmark called STREAM[1] and got the numbers I needed to explain why one machine was twice as slow as another.

[1] http://www.cs.virginia.edu/stream/
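If you just want a quick sanity check before building the real thing, here is a minimal sketch in the spirit of STREAM's copy kernel. This is not STREAM itself: the function name and sizes are my own, it is single-threaded, and the buffer has to be much larger than the last-level cache for the figure to say anything about main memory. Like STREAM's copy kernel, it counts both the bytes read and the bytes written.

```python
import time


def copy_bandwidth_gbs(size_bytes=64 * 1024 * 1024, rounds=10):
    """Best-of-N copy rate in GB/s, counting bytes read plus bytes written."""
    src = bytearray(b"\x01") * size_bytes
    dst = bytearray(size_bytes)
    dst[:] = src  # warm-up so page faults are not timed
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        dst[:] = src  # one bulk copy of the whole buffer
        best = min(best, time.perf_counter() - start)
    return 2 * size_bytes / best / 1e9


print(f"copy: {copy_bandwidth_gbs():.1f} GB/s")
```

Running the same script on both machines gives a rough side-by-side number; for anything you need to defend, use the actual benchmark.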


Using truss/strace to figure out memory bandwidth issues sounds pretty unreliable. I would not have guessed there was much correlation between the memory syscalls that truss/strace can observe (mmap/munmap, brk) and the CPU loads/stores that consume memory bandwidth.



