From what I can tell, the number of unique keys is pretty small. This would mean the hash map will sit in cache. Parsing is likely the bottleneck; I don't see any use of SIMD in the linked code.
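For illustration, here's a minimal sketch of what SIMD parsing could look like (not taken from the linked code): scanning for the `;` delimiter 16 bytes at a time with SSE2 intrinsics instead of byte by byte. The function name and buffer handling here are my own assumptions.

```c
#include <emmintrin.h>  // SSE2
#include <stddef.h>

// Hypothetical helper: return the offset of the first ';' in buf,
// checking 16 bytes per iteration. Assumes at least `len` readable bytes.
static size_t find_semicolon(const char *buf, size_t len) {
    const __m128i semi = _mm_set1_epi8(';');
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, semi));
        if (mask)
            return i + __builtin_ctz(mask);  // lowest set bit = first match
    }
    for (; i < len; i++)  // scalar tail for the last <16 bytes
        if (buf[i] == ';')
            break;
    return i;
}
```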
The key can be up to 100 bytes. The key stored in the hash map is in cache, but the key it's compared against comes from main memory; essentially the whole file in memory gets scanned and compared. With 1 billion rows at a ~16-byte average per row, that's about 16 GB of data. Streaming that much in roughly a second is already approaching the memory bandwidth limit, which makes this a memory-bound problem.
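Rough back-of-envelope, assuming ~30 GB/s of sustained memory bandwidth (an assumed figure, not a measurement for any particular machine):

```c
#include <stdio.h>

// Back-of-envelope: time to stream the whole dataset through memory once.
int main(void) {
    double rows = 1e9;
    double bytes_per_row = 16.0;   // ~16-byte average row, as estimated above
    double bandwidth = 30e9;       // assumed sustained bandwidth, bytes/s
    double total_bytes = rows * bytes_per_row;
    printf("data: %.1f GB, streaming time: ~%.2f s\n",
           total_bytes / 1e9, total_bytes / bandwidth);
    return 0;
}
```

That works out to ~0.5 s just to read the data once, so any per-row work that isn't overlapped with the memory traffic eats directly into the budget.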