From what I can tell, the number of unique keys is pretty small. This would mean the hash map will sit in cache. Parsing is likely the bottleneck; I don't see any use of SIMD in the linked code.
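For illustration, here's a minimal sketch of what SIMD parsing could look like (not taken from the linked code): scanning for the `;` delimiter 16 bytes at a time with SSE2 intrinsics instead of byte by byte. The function name and buffer handling here are my own assumptions.

```c
#include <emmintrin.h>  // SSE2
#include <stddef.h>

// Hypothetical helper: return the offset of the first ';' in buf,
// checking 16 bytes per iteration. Assumes at least `len` readable bytes.
static size_t find_semicolon(const char *buf, size_t len) {
    const __m128i semi = _mm_set1_epi8(';');
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, semi));
        if (mask)
            return i + __builtin_ctz(mask);  // lowest set bit = first match
    }
    for (; i < len; i++)  // scalar tail for the last <16 bytes
        if (buf[i] == ';')
            break;
    return i;
}
```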
The key can be up to 100 bytes. The key stored in the hash map is in cache, but the key it's compared against comes from main memory; essentially the whole file in memory gets scanned and compared. With 1 billion rows at a ~16-byte average per row, that's about 16 GB of data. Streaming that much in roughly a second is already approaching the memory bandwidth limit, which makes this a memory-bound problem.
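Rough back-of-envelope, assuming ~30 GB/s of sustained memory bandwidth (an assumed figure, not a measurement for any particular machine):

```c
#include <stdio.h>

// Back-of-envelope: time to stream the whole dataset through memory once.
int main(void) {
    double rows = 1e9;
    double bytes_per_row = 16.0;   // ~16-byte average row, as estimated above
    double bandwidth = 30e9;       // assumed sustained bandwidth, bytes/s
    double total_bytes = rows * bytes_per_row;
    printf("data: %.1f GB, streaming time: ~%.2f s\n",
           total_bytes / 1e9, total_bytes / bandwidth);
    return 0;
}
```

That works out to ~0.5 s just to read the data once, so any per-row work that isn't overlapped with the memory traffic eats directly into the budget.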