Hey, thanks for the response. Is it just a matter of measuring the LLC miss rate and then figuring out the max DRAM bandwidth somehow? What about in a multicore setting? NUMA? It would be nice to have a library or tool that works this out - always surprises me there isn't something off the shelf.
If you have an Intel CPU, you might be interested in Intel VTune. I believe it has a profiling option that shows memory bandwidth over time [1].
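For the back-of-envelope version asked about above, here is a rough sketch only. It assumes 64-byte cache lines and that every LLC miss pulls exactly one line from DRAM, so it ignores hardware prefetching, writebacks, and NUMA remote accesses. The function names are made up for illustration, and the miss count would have to come from whatever counter tool you use. On multicore, you would sum the misses across all cores that share the same memory controller.

```python
CACHE_LINE_BYTES = 64  # assumption: typical x86 cache line size


def peak_dram_bandwidth(channels, mega_transfers_per_sec, bus_bytes=8):
    """Theoretical peak in bytes/s: channels * transfers/s * bytes per transfer.

    bus_bytes=8 assumes a 64-bit wide memory channel.
    """
    return channels * mega_transfers_per_sec * 1_000_000 * bus_bytes


def estimated_bandwidth(llc_misses, seconds, line_bytes=CACHE_LINE_BYTES):
    """Approximate DRAM read traffic in bytes/s from an LLC miss count."""
    return llc_misses * line_bytes / seconds


def utilization(llc_misses, seconds, channels, mega_transfers_per_sec):
    """Fraction of theoretical peak bandwidth that the misses account for."""
    peak = peak_dram_bandwidth(channels, mega_transfers_per_sec)
    return estimated_bandwidth(llc_misses, seconds) / peak


if __name__ == "__main__":
    # Hypothetical numbers: dual-channel DDR4-3200, 100M LLC misses in 1 second.
    print(peak_dram_bandwidth(2, 3200) / 1e9, "GB/s peak")
    print(utilization(100_000_000, 1.0, 2, 3200), "of peak")
```

Real sustained bandwidth is usually noticeably below the theoretical peak, so treat the ratio as a lower bound on how close you are to the limit.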