Thanks for your feedback. The linked_lists can definitely be replaced, and I am working on finding a suitable data structure to maintain the individual buckets.
However, the "unordered_map" with one global mutex will not achieve what I am trying to do, i.e., allow multiple threads to simultaneously write into the same hash_map, unless two (or more) of them collide on the same bucket at the same time.
It's true that a single global mutex will not provide that locking granularity, but unordered_map does provide a bucket API that lets you find the bucket for a given key (`bucket(key)`).
This means that for a pair p, you can find the relevant bucket at index i, lock mutex i, and insert into the map. The only problem you need to deal with is rehashing, which can be handled by wrapping the hash function with a modulo operation and tuning max_load_factor so that the bucket count stays fixed.
(I could write a short sample later; it would be a very thin wrapper.)
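
A minimal sketch of the kind of thin wrapper described above, under a few assumptions. The name `StripedMap`, the `Stripes` parameter, and the `insert`/`find`/`size` signatures are made up for illustration. One deliberate change from the description: instead of locking per bucket on a single `std::unordered_map`, it keeps one small `std::unordered_map` per mutex. `insert` on a shared map also updates container-wide state (the element count, for instance), and the standard does not guarantee that concurrent `insert` calls on the same container are safe, even when they land in different buckets. The stripe index is the hash taken modulo a fixed count, so a rehash inside one stripe never changes which mutex guards a key.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical lock-striped map: a fixed number of stripes, each with its own
// mutex and its own std::unordered_map. Threads only contend when their keys
// hash to the same stripe.
template <typename Key, typename Value, std::size_t Stripes = 16,
          typename Hash = std::hash<Key>>
class StripedMap {
    static_assert(Stripes > 0, "need at least one stripe");

public:
    // Inserts (key, value) if the key is absent. Returns true if inserted,
    // false if the key was already present (same convention as
    // std::unordered_map::emplace).
    bool insert(const Key& key, Value value) {
        Stripe& s = stripe_for(key);
        std::lock_guard<std::mutex> lock(s.mutex);
        return s.map.emplace(key, std::move(value)).second;
    }

    // Copies the stored value into `out` and returns true if the key exists.
    // A copy is returned because a reference could dangle once the lock is
    // released.
    bool find(const Key& key, Value& out) const {
        const Stripe& s = stripe_for(key);
        std::lock_guard<std::mutex> lock(s.mutex);
        auto it = s.map.find(key);
        if (it == s.map.end()) return false;
        out = it->second;
        return true;
    }

    // Sums the stripe sizes, locking one stripe at a time. The result is only
    // a snapshot if other threads are inserting concurrently.
    std::size_t size() const {
        std::size_t total = 0;
        for (const Stripe& s : stripes_) {
            std::lock_guard<std::mutex> lock(s.mutex);
            total += s.map.size();
        }
        return total;
    }

private:
    struct Stripe {
        mutable std::mutex mutex;
        std::unordered_map<Key, Value, Hash> map;
    };

    // "Hash wrapped in a modulo": the stripe count never changes, so the
    // key-to-mutex mapping is stable regardless of rehashing inside a stripe.
    Stripe& stripe_for(const Key& key) {
        return stripes_[Hash{}(key) % Stripes];
    }
    const Stripe& stripe_for(const Key& key) const {
        return stripes_[Hash{}(key) % Stripes];
    }

    std::array<Stripe, Stripes> stripes_;
};
```

With this layout, two threads calling `insert` block each other only when their keys map to the same stripe. The number of stripes trades memory for contention; 16 is an arbitrary default, not a tuned value.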