
From what I can tell the number of unique elements is pretty small. This would mean the hash map will sit in cache. Parsing is likely the bottleneck; I don't see any use of SIMD in the linked code.



The key can be up to 100 bytes. The key stored in the hashmap is in cache, but the key being compared against comes from main memory: essentially the whole file is streamed through memory and compared. With 1 billion rows at a ~16-byte average row, that's about 16 GB of data, approaching the memory bandwidth limit for a one-second run, which makes this a memory-bound problem.


This really depends on the number of unique keys. If they sit in L1/L2, the bandwidth of those caches is an order of magnitude greater than main memory's.

This problem is quite a nerd snipe, so it's a good thing that I don't have a computer with more than 16 GB, or I might end up trying it myself.



