Estimating your memory bandwidth (lemire.me)
62 points by ingve on Jan 16, 2024 | 13 comments



I thought this would be about estimating the memory bandwidth of a human, either calculating the typical memory bandwidth, or giving some tests that would allow someone to approximate the memory bandwidth of their brain. That'd be neat.


That was my first thought when reading the headline. Probably because I'm rather self-conscious about my obvious limitations.


Generally for this type of measurement, you need to measure the bandwidth as a function of buffer size. This reveals the various L1/L2/L3 bandwidths, as well as the main memory bandwidth.
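
For example, here's a quick single-threaded sketch of that sweep (my own illustration, not the article's benchmark; one scalar thread understates peak DRAM bandwidth, but with -O3 the cache plateaus show up clearly):

  // Time sequential reads over buffers of increasing size; the reported
  // GB/s steps down as the working set outgrows L1, then L2, then L3,
  // and finally spills to DRAM.
  #include <algorithm>
  #include <chrono>
  #include <cstdint>
  #include <cstdio>
  #include <vector>

  int main() {
    const size_t total = size_t(1) << 30;  // read ~1 GiB per data point
    for (size_t size = 1 << 12; size <= (size_t(1) << 28); size <<= 1) {
      std::vector<uint64_t> buf(size / sizeof(uint64_t), 1);
      volatile uint64_t sink = 0;
      size_t passes = std::max<size_t>(1, total / size);
      auto start = std::chrono::steady_clock::now();
      for (size_t p = 0; p < passes; ++p) {
        uint64_t sum = 0;
        for (uint64_t v : buf) sum += v;  // touch every cache line
        sink = sink + sum;
      }
      double secs = std::chrono::duration<double>(
          std::chrono::steady_clock::now() - start).count();
      printf("%12zu bytes: %6.1f GB/s\n", size,
             double(size) * passes / secs / 1e9);
    }
  }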

There are some really complicated effects as well, such as TLB misses, huge pages, and NUMA.

For example, on typical two-socket servers, half the memory is "faster" and half the memory is "slower" because it is attached to the remote CPU.

Here's an example: https://images.anandtech.com/doci/17024/BandwidthCPU_575px.p...


The NUMA stuff is a big deal if you want predictable performance; in my database days we'd definitely try to run a database server within a single NUMA node.


Reducing the RAM requirement to 8GB to fit this on my MacBook Air M2 (with room left for other running apps), I get the same figure (66 GB/s) regardless of the number of threads.

If I actually set it to the size of physical memory, performance is _much_ lower, as OSX tries to compress the RAM it is using.


On your typical Linux host you can just run `hdparm -T $BLKDEV` (e.g. BLKDEV=/dev/sda here):

  # hdparm -T /dev/sda

  /dev/sda:
   Timing cached reads:   19110 MB in  1.99 seconds = 9582.36 MB/sec

That quickly tells me a single-threaded memory bandwidth of ~10 GB/s (i7 X230 ThinkPad with 2x8GB SODIMMs, 16GB total).

It's often interesting to run several of these simultaneously and observe how the per-core and aggregate bandwidth change as parallelism increases.


I was curious and got vastly different numbers on my machine.

I grabbed the code from https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/... and compiled it with gcc:

  $ g++ -std=gnu++17 -O3 -o bandwidth bandwidth.cpp && ./bandwidth
  50.0
  ...

About 50 GB/s; it drops slowly to 45 GB/s with 32 threads.

  $ sudo hdparm -T /dev/nvme0n1

   Timing cached reads:   58222 MB in  1.99 seconds = 29230.67 MB/sec

It's a good lower bound, though.


That makes me wonder how much system-wide kernel<->userspace and/or glibc copying performance is sitting on the table there, potentially reclaimable with an optimized build and/or with vulnerability mitigations disabled.


Any easy way to monitor current memory bandwidth consumption using Linux perf or some other command line tool? As opposed to measuring max bandwidth.


You could infer the currently used bandwidth by inspecting the rate of retired loads/stores, which should be readily accessible to perf. That includes L1/L2/L3 bandwidth of course, so you might need to inspect more performance counters to distinguish cache traffic from main-memory traffic.
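
If you'd rather read the counters programmatically than via perf, here is a rough sketch of my own (Linux-only, and using LLC misses rather than the retired load/store counters mentioned above, since misses map more directly to DRAM traffic). It ignores hardware prefetch and write-back traffic and needs root or a relaxed perf_event_paranoid setting, so treat the number as a floor:

  // Count LLC read misses for all processes on CPU 0 for one second and
  // convert to an approximate DRAM read rate (64 bytes per missed line).
  #include <linux/perf_event.h>
  #include <sys/syscall.h>
  #include <unistd.h>
  #include <cstdint>
  #include <cstdio>

  int main() {
    perf_event_attr attr{};
    attr.type = PERF_TYPE_HW_CACHE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CACHE_LL |
                  (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                  (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    // pid = -1, cpu = 0: every process, but only on CPU 0 (kept simple).
    int fd = (int)syscall(SYS_perf_event_open, &attr, -1, 0, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }
    uint64_t before = 0, after = 0;
    read(fd, &before, sizeof(before));
    sleep(1);
    read(fd, &after, sizeof(after));
    printf("~%.1f MB/s of LLC read-miss traffic on CPU 0\n",
           (after - before) * 64.0 / 1e6);
  }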


I'm impressed that this made it to the front page, but don't understand why.


Daniel Lemire has a good reputation on HN due to his work on simdjson and the Eisel-Lemire floating-point parsing algorithm, among other things.


In other words, it's a known source that posts semi-regularly about pleasingly low-level and technical things, writes with authority, and often includes assembly code in the posts. Sweet.

For me it's an almost certain upvote, i.e. I'm glad this got posted and am already looking forward to the next one.



