Estimating your memory bandwidth (lemire.me)
62 points by ingve on Jan 16, 2024 | 13 comments



I thought this would be about estimating the memory bandwidth of a human, either calculating the typical memory bandwidth, or giving some tests that would allow someone to approximate the memory bandwidth of their brain. That'd be neat.


That was my first thought when reading the headline. Probably because I'm rather self-conscious about my obvious limitations.


Generally for this type of measurement, you need to measure the bandwidth as a function of buffer size. This reveals the various L1/L2/L3 bandwidths, as well as the main memory bandwidth.
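
For example, here's a quick single-threaded sketch of that sweep (my own illustration, not the article's benchmark; one scalar thread understates peak DRAM bandwidth, but with -O3 the cache plateaus show up clearly):

  // Time sequential reads over buffers of increasing size; the reported
  // GB/s steps down as the working set outgrows L1, then L2, then L3,
  // and finally spills to DRAM.
  #include <algorithm>
  #include <chrono>
  #include <cstdint>
  #include <cstdio>
  #include <vector>

  int main() {
    const size_t total = size_t(1) << 30;  // read ~1 GiB per data point
    for (size_t size = 1 << 12; size <= (size_t(1) << 28); size <<= 1) {
      std::vector<uint64_t> buf(size / sizeof(uint64_t), 1);
      volatile uint64_t sink = 0;
      size_t passes = std::max<size_t>(1, total / size);
      auto start = std::chrono::steady_clock::now();
      for (size_t p = 0; p < passes; ++p) {
        uint64_t sum = 0;
        for (uint64_t v : buf) sum += v;  // touch every cache line
        sink = sink + sum;
      }
      double secs = std::chrono::duration<double>(
          std::chrono::steady_clock::now() - start).count();
      printf("%12zu bytes: %6.1f GB/s\n", size,
             double(size) * passes / secs / 1e9);
    }
  }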

There are some really complicated effects as well, such as TLB misses, huge pages, and NUMA.

For example, on typical two-socket servers, half the memory is "faster" and half the memory is "slower" because it is attached to the remote CPU.

Here's an example: https://images.anandtech.com/doci/17024/BandwidthCPU_575px.p...


The NUMA stuff is a big deal if you want predictable performance; in my database days we'd definitely try to run a database server within a single NUMA node.


Reducing the RAM requirement to 8GB to fit this on my MacBook Air M2 (with room left for other running apps), I get the same figure (66 GB/s) regardless of the number of threads.

If I actually set it to the size of physical memory, performance is _much_ lower, as OSX tries to compress the RAM it is using.


On your typical Linux host you can just run `hdparm -T $BLKDEV` (e.g. BLKDEV=/dev/sda here):

  # hdparm -T /dev/sda

  /dev/sda:
   Timing cached reads:   19110 MB in  1.99 seconds = 9582.36 MB/sec

That quickly tells me a single-threaded memory bandwidth of ~10 GB/s (i7 X230 ThinkPad with 2x8GB SODIMMs, 16GB total).

It's often interesting to run several of these simultaneously and observe how the per-core and aggregate bandwidth change as parallelism increases.


I was curious and got vastly different numbers on my machine.

I grabbed the code from https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/... and compiled it with gcc:

  $ g++ -std=gnu++17 -O3 -o bandwidth bandwidth.cpp && ./bandwidth
  50.0
  ...

About 50 GB/s; it drops slowly to 45 GB/s with 32 threads.

  $ sudo hdparm -T /dev/nvme0n1

   Timing cached reads:   58222 MB in  1.99 seconds = 29230.67 MB/sec

It's a good lower bound, though.


That makes me wonder how much system-wide kernel<->userspace and/or glibc copying performance is sitting on the table there, potentially reclaimable with an optimized build and/or with vulnerability mitigations disabled.


Any easy way to monitor current memory bandwidth consumption using Linux perf or some other command line tool? As opposed to measuring max bandwidth.


You could infer the currently used bandwidth by inspecting the rate of retired loads/stores, which should be readily accessible to perf. That includes L1/L2/L3 bandwidth of course, so you might need to inspect more performance counters to distinguish cache traffic from main-memory traffic.
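
If you'd rather read the counters programmatically than via perf, here is a rough sketch of my own (Linux-only, and using LLC misses rather than the retired load/store counters mentioned above, since misses map more directly to DRAM traffic). It ignores hardware prefetch and write-back traffic and needs root or a relaxed perf_event_paranoid setting, so treat the number as a floor:

  // Count LLC read misses for all processes on CPU 0 for one second and
  // convert to an approximate DRAM read rate (64 bytes per missed line).
  #include <linux/perf_event.h>
  #include <sys/syscall.h>
  #include <unistd.h>
  #include <cstdint>
  #include <cstdio>

  int main() {
    perf_event_attr attr{};
    attr.type = PERF_TYPE_HW_CACHE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CACHE_LL |
                  (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                  (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    // pid = -1, cpu = 0: every process, but only on CPU 0 (kept simple).
    int fd = (int)syscall(SYS_perf_event_open, &attr, -1, 0, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }
    uint64_t before = 0, after = 0;
    read(fd, &before, sizeof(before));
    sleep(1);
    read(fd, &after, sizeof(after));
    printf("~%.1f MB/s of LLC read-miss traffic on CPU 0\n",
           (after - before) * 64.0 / 1e6);
  }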


I'm impressed that this made it to the front page, but don't understand why.


Daniel Lemire has a good reputation on HN due to his work on simdjson and the Eisel-Lemire floating-point parsing algorithm, among other things.


In other words, it's a known source that posts semi-regularly about pleasingly low-level and technical things, writes with authority, and often includes assembly code in the posts. Sweet.

For me it's an almost certain upvote, i.e. I'm glad this got posted and am already looking forward to the next one.



