
Measuring the memory-level parallelism of a system using a small C++ program? - ingve
https://lemire.me/blog/2018/11/05/measuring-the-memory-level-parallelism-of-a-system-using-a-small-c-program/
======
fulafel
Very interesting benchmark. He mentions that multicore systems would get
higher aggregate throughput. Is this true? It would be interesting to see
parallel results from a multithreaded run.

Anyone have results for a Ryzen system?

~~~
BeeOnRope
I would expect this to scale well with at least the first few additional
cores. The total memory bandwidth used in this test is probably 10 GB/s or
less, whereas system memory bandwidth, which depends on the number of memory
channels and the memory speed, ranges from 30 to more than 100 GB/s for most
configurations. That leaves a fair amount of unused capacity in the memory
subsystem.
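
As a back-of-the-envelope sketch of that reasoning: dividing system bandwidth by the bandwidth one core uses gives a rough count of how many cores could run the test before DRAM bandwidth saturates. The function name `cores_until_saturation` is made up for illustration, and the estimate assumes perfectly linear scaling and ignores every shared limit other than raw bandwidth.

```cpp
#include <cassert>

// Rough count of cores that fit before memory bandwidth saturates,
// assuming (hypothetically) perfectly linear scaling per core.
// Both arguments are in GB/s.
inline double cores_until_saturation(double system_gbps, double per_core_gbps) {
    assert(per_core_gbps > 0.0);  // a core that uses no bandwidth never saturates anything
    return system_gbps / per_core_gbps;
}
```

With the figures above, `cores_until_saturation(30.0, 10.0)` gives 3 and `cores_until_saturation(100.0, 10.0)` gives 10. Since 10 GB/s is an upper guess for the per-core figure, the real counts could be higher.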

With enough additional cores, though, you'll hit some kind of shared limit.
How many cores that takes depends strongly on the details of your system, and
it varies by nearly a factor of 10 between some server and client systems.
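
One way to check this on a given machine would be a multithreaded pointer chase. The following is a minimal sketch, not the code from the linked post: all names (`make_cycle`, `chase`, `run_threads`, `Result`) are hypothetical, and it runs a single dependent chain per thread, whereas the post's benchmark also varies the number of parallel chains within one thread. Timing includes thread start-up, so it only makes sense with a large hop count and an array much bigger than the last-level cache.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <thread>
#include <utility>
#include <vector>

// Build a random single-cycle permutation (Sattolo's algorithm), so that
// following next[i] visits every slot once before returning to the start.
std::vector<std::uint32_t> make_cycle(std::size_t n, std::uint64_t seed) {
    assert(n > 0);
    std::vector<std::uint32_t> next(n);
    std::iota(next.begin(), next.end(), 0u);
    std::mt19937_64 rng(seed);
    for (std::size_t i = n - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }
    return next;
}

// Perform `hops` dependent loads starting at index `start`.
std::uint32_t chase(const std::vector<std::uint32_t>& next,
                    std::uint32_t start, std::size_t hops) {
    std::uint32_t p = start;
    for (std::size_t i = 0; i < hops; ++i) p = next[p];
    return p;
}

struct Result {
    double hops_per_second;   // aggregate over all threads
    std::uint64_t checksum;   // sum of final indices; keeps the loops alive
};

// Run one chase per thread over the shared array and time the whole batch.
Result run_threads(const std::vector<std::uint32_t>& next,
                   unsigned threads, std::size_t hops_per_thread) {
    assert(threads > 0);
    std::vector<std::uint32_t> sinks(threads);
    std::vector<std::thread> pool;
    auto t0 = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t) {
        // Spread the starting indices across the array.
        auto start = static_cast<std::uint32_t>((next.size() / threads) * t);
        pool.emplace_back([&, t, start] {
            sinks[t] = chase(next, start, hops_per_thread);
        });
    }
    for (auto& th : pool) th.join();
    auto t1 = std::chrono::steady_clock::now();
    double secs = std::chrono::duration<double>(t1 - t0).count();
    std::uint64_t sum = std::accumulate(sinks.begin(), sinks.end(), std::uint64_t{0});
    return {static_cast<double>(threads) * static_cast<double>(hops_per_thread) / secs, sum};
}
```

Calling `run_threads(next, k, hops)` for k = 1, 2, 4, ... on the same array and watching where `hops_per_second` stops growing with k would show roughly where the shared limit sits on that system.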

