A Tale of Java Hash Tables (andreinc.net)
128 points by nomemory 13 days ago | hide | past | favorite | 87 comments

> I am yet to find an Open Addressing implementation that outperforms HashMap<K,V> by a significant margin

I have a performance-critical framework that went from Java’s Map to Trove (http://trove4j.sourceforge.net/html/overview.html) and now uses FastUtil (https://fastutil.di.unimi.it/).

For my use cases, these massively outperform the built-in java Maps and Sets.

It would be interesting to include these in the benchmark. A quick googling finds plenty of comparisons, but they're a few years old now; I can't find an up-to-date Java map/set benchmark.

That's because they used primitives, no? In that case also look at some other options: https://www.reddit.com/r/java/comments/r0b9o9/a_tale_of_java.... While I still think Koloboke in general is the fastest, I do agree that the difference between it and fastutil will not be very big.

Or was this for Object/boxed primitives, too?

If you are working with primitives then the unboxed collections are a big boon. My code often uses objects though, so I use a lot of the fastutil Object2ObjectMap etc.

There are two areas that are important in my use case:

* the performance of normal get/computeIfAbsent/forEach etc. FastUtil, for example, has fast iterators that avoid allocating Map.Entry objects.

* the memory usage. The total footprint is important because I actually size VMs based on it. But garbage-collection minimization is also really important.

Nice. Thanks. (For other viewers, all the non-standard maps I'm aware of are trying to avoid allocating those pesky Map.Entry objects, that's why they generally take up much less memory.)

I can't really speak about Java, but for C++, std::unordered_map (which uses separate chaining) gets destroyed (performance-wise) by any number of alternate hash map implementations that use open addressing. There's just no question that open addressing is inherently faster for most use cases.

To be fair, unlike std::unordered_map, the open addressing implementations don't come with a guarantee that they won't move the items in the hash table. But in practice that doesn't really matter anyway, since you could always use unordered_map<unique_ptr<T>> instead of unordered_map<T>. But I digress...

The moment you have value types and stack passed small structs you start beating Java pretty hard in precisely the collections area.

Some areas that I explored more deeply in my earlier days w.r.t. hash tables (Java hash tables, at that) that aren't covered by the analysis are:

* Memory footprint and GC load

* Write performance (especially characterizing fast-path vs. rebalance times)

* Loading vs. performance tradeoffs (can the table perform well with 80% occupancy? 99%?)

* Small-table utilization.

If some of the implementations consume substantially less memory and/or generate less GC churn while having largely indistinguishable lookup performance, then they're a net win. This is one place where open addressing can win.

For the libraries we were writing, we didn't know if consumers would be stuffing in one element (the most common case) or millions of elements (uncommon but important), so we ended up implementing a clever abstraction that would start with a pointer to the single object, then replace it with a hash table when the second element was added. Since we used hash tables like country doctors prescribe aspirin, it reduced the heap footprint of the server by ~30%. Of course, Hansen's Law states "Clever is the opposite of maintainable," and that definitely applied here.
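The idea is easy to sketch. The class below is a hypothetical reconstruction (all names invented here, not the commenter's actual library): it holds a single entry inline and only allocates a HashMap once a second distinct key is put (null keys unsupported, for brevity):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical "one-or-many" map: one entry stored inline, a real
// HashMap allocated lazily on the second distinct key.
final class CompactMap<K, V> {
    private K singleKey;       // used while size <= 1
    private V singleValue;
    private Map<K, V> map;     // non-null once we've upgraded

    V get(K key) {
        if (map != null) return map.get(key);
        return key.equals(singleKey) ? singleValue : null;
    }

    void put(K key, V value) {
        if (map != null) {
            map.put(key, value);
        } else if (singleKey == null || singleKey.equals(key)) {
            singleKey = key;           // empty, or overwrite of same key
            singleValue = value;
        } else {
            map = new HashMap<>();     // second distinct key: upgrade
            map.put(singleKey, singleValue);
            map.put(key, value);
            singleKey = null;
            singleValue = null;
        }
    }
}
```

For workloads dominated by zero- or one-element maps, this trades a branch per operation for one fewer HashMap (plus its table array) per instance.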

Didn't have time to write about it, but I've used a GC profiler as well. With a few exceptions, all 5 implementations are comparable.

> But, as described here, the decision to use Separate Chaining vs. Open Addressing is not unanimously accepted by programming languages designers. For example, in python, ruby, and rust, the standard hash tables are implemented using Open Addressing, while Java, go, C#, C++ are all more conservative and use Separate Chaining.

I think C++ using separate chaining is a legacy decision made in the late 1990s. Back then, the CPU-vs-memory speed difference was not as big, so pointer chasing wasn't considered that big of a deal. In addition, those were the days of 32-bit systems with MBs of RAM, so having separate, smaller allocations was seen as better than having one large one (see for example deque).

Nowadays, with a much bigger relative performance penalty for accessing memory, as well as 64-bit systems with GBs of RAM, I think people are moving to open addressing (for example Rust, as well as Google's and Facebook's hash table implementations) unless you need pointer stability.

The fact that std::unordered_map more or less mandated separate chaining is seen as something that holds the standard library back in terms of performance, hence the competing implementations from Google and Facebook.

Could someone explain to me in a clear and high-level fashion what exactly pointer stability (or pointer instability, for that matter) entails? I've watched some of the cppcon talks on the Google hashtable implementation [0], and in that talk Matt Kulukundis also mentioned it affecting the adoption rate, but I always failed to really grasp the concept and its implications. Google searches also ended up rather fruitless, to be honest.

[0]: https://youtube.com/watch?v=ncHmEUmJZf4

An operation maintains pointer stability if, when you take a pointer to an element in a data structure and then execute that operation, the pointer is still valid and points to the same data.

Think about how vector would be implemented under the hood. What happens if you delete an element? Does it shift down all of the elements after it? If you do that, the objects have literally moved in memory. Any pointer that points into those memory locations is now invalid. That's a problem if you wanted to hold onto pointers to elements. But if you declare that your interface has certain pointer stability guarantees then suddenly your implementation is massively constrained. Now your vector implementation will have big gaps that you need to deal with and will waste a ton of memory if you add and remove many elements.

So, am I correct if I then put it as following?

Pointer instability means that it is undefined behaviour to:

1) store (both on stack and heap) the pointer address to a value element outside of its respective container

2) then modify the container's content

3) then thereafter use the stored pointer address

This implies that:

- containers without pointer stability require pass-by-value (or a pointer to a temp copy) to use their values after accessing them, OR

- containers without pointer stability require absolute immutability of the container to guarantee runtime safety, OR

- containers without pointer stability require "relative" immutability during the existence of a pointer to a value element to guarantee runtime safety. Note that I think this last option is a bad idea, because it requires multithreaded blocking/synchronization of the container and is also easy to encapsulate wrongly, leading to UB.

> 2) then modify the container's content

... with a specific set of mutating operations that (may) alter the container's structure (eg, element insertions and removals).

Altering the value of one element of the container doesn't invalidate iterators or references to any element. You can make as many mutating passes over a std::vector as you like without invalidating anything.

Things only start to get dicey when you push_back(), insert(), erase(), etc.

If you allocate your objects inline in your internal table, rehashing will invalidate existing references to those objects as they will need to be copied into the new, larger, table.

Allocating every object separately in its own node, as done in chaining implementations, guarantees that the object location is stable (i.e., it doesn't change), but in addition to making iteration slower it requires a separate allocation for each object.


Looking at some discussions the open-jdk team had, they decided to continue with Separate Chaining. I don't have the link, but then I got interested in seeing how some Open Addressing (nothing fancy) would behave in Java, so that's the reason I've written this "article".

Unfortunately I couldn't beat HashMap with the "classic" approach, but I got close. I am curious to see a better implementation that brings more engineering tricks to the table.

> Unfortunately I couldn't beat HashMap with the "classic" approach, but I got close. I am curious to see a better implementation that brings more engineering tricks to the table.

I think one of the big advantages of using open addressing is memory locality. Since it is harder to control memory layout in Java (you rely on the optimizer/JIT to optimize common patterns), I wonder if you are getting the full benefit of open addressing in Java.

In addition, HashMap is standard, and the Java compiler/optimizer/JIT can make certain assumptions about how it is used, which may enable certain optimizations. Also, a lot of the optimization heuristics were likely tuned with profiling over the years, and because HashMap is so widely used, common usages of HashMap are likely reflected in the profile data and thus in the optimization heuristics.

Exactly. In Java I couldn't even inline my functions, and had zero control over the underlying memory layout.

> i couldn't even inline my functions

You could, manually :). Either way if they're hot, they're inlined.

One common trick for open-addressing maps in Java that I don't see in your implementations is to have an array of keys zipped with values (or two arrays, one for keys, one for values) instead of an array of Entries. This improves locality and reduces indirection a bit. Obviously more is needed for even more speed.

(Mandatory "in the future it will be better": Valhalla is coming in a few years, that's when we'll have much more control over the memory layout.)
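The zipped/parallel-array trick can be sketched like this (a minimal illustration assuming power-of-two capacity and linear probing; resizing and deletion are omitted, so it shows the layout rather than a usable map):

```java
// Sketch of the parallel-array layout: keys and values in two flat arrays
// instead of an array of Entry objects. Linear probing, power-of-two size.
final class FlatMap<K, V> {
    private final Object[] keys;
    private final Object[] vals;
    private final int mask;

    FlatMap(int capacityPow2) {        // capacity must be a power of two
        keys = new Object[capacityPow2];
        vals = new Object[capacityPow2];
        mask = capacityPow2 - 1;
    }

    // Finds the slot holding `key`, or the empty slot where it would go.
    private int indexOf(Object key) {
        int i = (key.hashCode() * 0x9E3779B9) & mask;  // spread bits, then mask
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) & mask;                        // linear probe
        }
        return i;
    }

    @SuppressWarnings("unchecked")
    V get(K key) {
        return (V) vals[indexOf(key)]; // null if the key is absent
    }

    void put(K key, V value) {         // assumes the table never fills up
        int i = indexOf(key);
        keys[i] = key;
        vals[i] = value;
    }
}
```

No Entry objects are ever allocated, and a probe run walks consecutive array slots, which is the locality benefit being described.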

I assumed they were inlined initially, until I profiled and found out one of the swap methods was not. I believe there's also a limit on the amount of bytecode to inline, so no matter how hot a method is, if its bytecode exceeds a certain threshold it won't be inlined. I need to double-check though.

Indeed, max bytecode count (FreqInlineSize and MaxInlineSize) and inlining depth (MaxInlineLevel). Your GC choice will also slightly modify the inlining decisions, and then obviously GraalVM will be completely different.

Is there a full list of the differences between the graal jit and C2? I was only aware of graal having better partial escape analysis over C2's conservative EA.

It's simply a completely different implementation. Some of the optimization passes are the same, obviously, but overall it simply performs ... differently.

But inlining will not cause better performance in every case, will it?

Correct! Inlining obviously costs CPU and code-cache space, and makes the receiving method bigger, so that it's less likely to be inlined itself. If a deoptimization ever occurs, the inlining effort was pretty much wasted.

However it must be pointed out that inlining enables further optimisations.

The tradeoffs are more complicated for a jit, but for an aot compiler it’s one of the most important optimisations in the pipeline.

Out of curiosity, weren't your objects ending up neatly compacted after a GC cycle?

I didn't perform any deletes, and I tried to keep everything clean without allocating intermediary objects. But one of the pitfalls is that I can't easily play with memory layout in Java. For example, in the entries-and-bins implementation I had to keep two separate arrays. In C I would've kept a single array, with a compact zone and a sparse zone, iterating through it using an offset.

Granted, it's a micro-benchmark without a lot of context, but wouldn't the conclusion that the default HashMap implementation is, on average, faster than all the open addressing implementations contradict this claim?

For me it was faster on average and with less variance. But micro-benchmarks are flawed by default, so I believe there's room to improve them.

Also maybe there are some tricks that can be done, to have a faster Open Addressing implementation in Java.

PS: I've also tried Cuckoo Hashing and Hopscotch and they were also slower, but haven't managed to document my findings.

FYI, if you're the author of the article, you deserve some recognition. It's extremely well written, concise and detailed. I've sent it out to my entire team as recommended reading.

So the reason HashMap isn't as fast as it could be is that it converts buckets with linked lists to red-black trees when they reach some threshold. The reason it does that is that in any hashing scheme, specifically crafted input can cause O(n) lookup times. It is reasonably trivial to take advantage of this to launch DoS attacks. Your web server parses HTTP headers with a hash table? Might very well be exploitable. Same thing with parsing JSON objects using hash tables. I posit that in practice there are very few scenarios in which the extra performance of an "optimized" hash table outweighs the benefits of having one that is not vulnerable to hash-flooding attacks.

> Your web server parses HTTP headers with a hash table? Might very well be exploitable.

That is why you shouldn't use hash tables in code that might (even potentially) become a target for DoS attacks.

You can have hash tables that cannot be targeted by DoS attacks.

You can use a family of hash functions and seed the hash table instance with a key.

The attacker won't be able to make assumptions about collisions because they don't have access to the initial key.
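A minimal sketch of that seeding idea in Java (the mixing step here is illustrative only; a vetted keyed hash such as SipHash is what you'd use in anger):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch: mix a per-table random seed into every hash, so an attacker who
// can't observe the seed can't precompute keys that collide in this table.
final class SeededHasher {
    private final int seed = ThreadLocalRandom.current().nextInt();

    int hash(Object key) {
        int h = key.hashCode() ^ seed;  // bucket choice now depends on the seed
        h ^= (h >>> 16);                // fold high bits into the low (index) bits
        return h;
    }
}
```

Two table instances will place the same keys into different buckets, which is exactly what breaks precomputed collision sets.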

That's universal hashing, right? Is there an implementation of this?

The tree conversion does not affect performance at all. It is only triggered by DoS-like inputs, which almost never occur, and counting the collisions is almost free.

The optimizations are cache friendly open-addressing with linear probing, storing the hash, and power of 2 sizes. Another usual optimization would be a sorted array instead of the linked list.

You can always use a cryptographic hash of the key instead of an untrusted key.

Linked list buckets are terrible for cache locality anyway.

Can you show me an implementation of a high-performance hash table with a cryptographic key that is not vulnerable to hash-flooding?

All you have to do is have the hashmap be HashMap<K,V,CryptographicKey> and never reveal the key, or seed it from randomness? One can use an HMAC or some other method to use the key to derive hashes.

Is there a HashMap overload with three type parameters? Yes, you can use a cryptographic hash function like SipHash does, but it wouldn't perform as well as HashMap.

In Java? No. Rust is safe out of the box while retaining O(1) properties at all times.

But there is a heavy multiplier on that O(1) with the default hasher in Rust. My benchmarking has shown that FNVHasher gave me a 24% decrease in execution time over the default hasher. My use case had a high likelihood of a poll on the hash indicating that there was no match, and I got a near doubling of speed just by setting the initial capacity sufficiently large.
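For reference, FNV-1a itself is tiny. Here is the standard 32-bit version written in Java (the thread's main language), with the usual offset basis and prime:

```java
final class Fnv {
    // FNV-1a, 32-bit: xor in each input byte, then multiply by the FNV prime.
    static int fnv1a(byte[] data) {
        int h = 0x811C9DC5;          // FNV-1a 32-bit offset basis
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;         // FNV 32-bit prime
        }
        return h;
    }
}
```

It does one xor and one multiply per byte, which is why it beats heavier keyed hashes like SipHash on short keys, at the cost of having no attack resistance.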

Rust's standard library HashMap is a good example.

As part of project Valhalla they did some performance tests with open addressing as well [1]. Result: java.util.HashMap was the best performing implementation most of the time.

[1] https://cr.openjdk.java.net/~skuksenko/valhalla/hashmaps/has...

> in practice, nobody will use a hash function that returns a constant integer

How much do you want to bet on that?

Not quite that bad, but the hash algo for ... function lookup? (the exact usage escapes me) in PHP was "strlen". Yup, the length of the function name was the hash value.

yes, the early PHP stdlib functions sometimes had mangled names (replacing a "to" with "2", and shortening some names but not others) for this reason. At least, that's what the creator of PHP says here: https://news-web.php.net/php.internals/70691

Also fun read: https://eev.ee/blog/2012/04/09/php-a-fractal-of-bad-design/

Maybe it was even more terrible, because it tried to be "smart".

I did this many years ago. It wasn't production code and I was prioritizing expediency. The class had rather complicated equals() semantics, I wanted "set" and "map" data structures, and I knew there would never be very many objects.

> How much do you want to bet on that?

Isn't it a rather safe bet, since most people won't roll their own hash functions anyway?

I consider myself an optimist.

The post quotes a comment from the Python source code for the dict object:

  This is done by initializing a (unsigned) vrbl "perturb"
Is "vrbl" a term I should know, or just a prvrs abbrvtn for variable?

It's an acronym for "virtual retractable binary link", a concept well known to perturbation specialists.

It’s just an abbreviation.

If the OP has time, it might be nice to experiment with the hash table technique used by sets in Python. It uses a variant of open addressing that runs several linear probes before jumping to another location in the hash table.

The premise is that probing consecutive memory addresses is cheap because of cache locality and because the address computation is a simple increment. This should be cheaper than resolving hash collisions with additional random probes or by chasing linked list references to random memory locations for each link.
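That probe order can be sketched as follows. The constants loosely mirror CPython's setobject.c (PERTURB_SHIFT = 5, LINEAR_PROBES = 9), but this only illustrates the index sequence, not a full table:

```java
final class Probes {
    static final int LINEAR_PROBES = 9;  // consecutive slots tried per run

    // Returns the first `count` slot indices probed for `hash` in a table
    // whose power-of-two size gives mask = size - 1.
    static int[] probeSequence(long hash, int mask, int count) {
        int[] out = new int[count];
        long perturb = hash;
        long i = hash & mask;
        int n = 0;
        while (n < count) {
            out[n++] = (int) i;                      // first slot of this run
            for (int k = 1; k <= LINEAR_PROBES && n < count; k++) {
                out[n++] = (int) ((i + k) & mask);   // cheap consecutive probes
            }
            perturb >>>= 5;                          // consume more hash bits
            i = (i * 5 + perturb + 1) & mask;        // pseudo-random jump
        }
        return out;
    }
}
```

The inner loop is the cache-friendly part (an increment per probe); the `5*i + perturb + 1` recurrence is what eventually visits every slot and avoids pathological clustering.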


I like the link to Spotify profile, seems a much better way to get to know someone.

I've read that HashMap will use a tree if a single bucket gets too large so long as the keys are comparable.

Java, the language where linear probing is considered high-tech.

What's gonna happen once they discover robinhood hashing?

The point of the article was to prove that it's hard to do Open Addressing in Java and beat the performance of HashMap, which uses Separate Chaining.

The article also contains a Robin Hood implementation that overall performs worse than HashMap. If you can give some recommendations on how to improve it, I would be forever in your debt.

Use a programming language where you have control over memory layout.

Check also PHP-7 or 8's "open addressing" hash map. Beats the hell out of everyone else. Like C++ red-black tree default hashmap implementation taking 1.0 second -- vs PHP taking 0.5 second for the same dataset.

Here's the code I tested with.


  #!/usr/bin/env php
  <?php
  $ret = [];
  for($i=0; $i<5000000; $i++){
      $ret["id.".($i%500000)] = "val.".$i;
  }
  print count($ret)."\n";
  print $ret["id.10000"]."\n";

  #include <iostream>
  #include <map>
  #include <string>
  using namespace std;
  int main(){
   map<string, string> ret;
   for(int i=0; i<5000000; i++){
    ret[string("id.")+to_string(i%500000)] = string("val.")+to_string(i);
   }
   cout << ret.size() << endl;
   cout << ret["id.10000"] << endl;
   return 0;
  }
Rust :

  use std::collections::HashMap;
  fn main(){
      let mut ret = HashMap::new();
      for i in 0..5000000 {
          ret.insert(format!("id.{}", i%500000), format!("val.{}",i));
      }
      println!("{}", ret.len());
      match ret.get("id.10000") {
          Some(val) => println!("{}", val),
          None => println!("ERROR")
      }
  }

> C++ red-black tree default hashmap

unordered_map is the hashmap.

map is a red-black tree, which keeps the data sorted. You should use map<> if you need to visit the data in-order with a for-each loop. Hashmaps can't visit the data in order.

unordered_map performs even worse than map. Tried it.

Looking at the code more carefully, I'd actually bet it's std::string that's slow, and not actually map.

unordered_map is actually known to have relatively bad performance on GCC and some other open-source implementations. I don't know if there's a spec-problem that favors a slower-hash table (rather than the modern open-addressing scheme, which is faster on modern computers), but I can believe unordered_map has an issue on common standard libraries today.


> ret[string("id.")+to_string(i%500000)] = string("val.")+to_string(i);

Lets count the mallocs.

> string("id.")

One.

> to_string(i%500000)

Two.

> string("id.")+to_string(i%500000)

Three, then two frees() as the previous two strings are deallocated.

> ret[string("id.")+to_string(i%500000)]

Four. The ret[] operator searches for the string. Since it doesn't exist, it now has to malloc the key and insert the value in there. std::move() might solve that issue and cut this one out easily?

> string("val.")

Five.

> to_string(i)

Six.

> string("val.")+to_string(i)

Seven + 2x frees.

> ret[string("id.")+to_string(i%500000)] = string("val.")+to_string(i);

Eight, to place the value in there. Again, probably solved with std::move. +1 more free.

So eight mallocs and 6 frees?


Yeah, C++ strings make it easy to shoot yourself in the foot performance-wise. They're easy to use and free of stupid malloc/free bugs, but it means that they will malloc/free a LOT more than you might guess with the untrained eye.

I know that Perl / PHP / etc. etc. were explicitly designed to be more favorable towards string manipulation. And I'd bet that there's some old C++98 stuff that might lock in the performance of string to this older, less performant model.

EDIT: I see you managed to avoid the problem of excessive mallocs/frees in the Rust code with "format". The C++ equivalent might be snprintf, though that dips down into C-style strings.

> I don't know if there's a spec-problem that favors a slower-hash table

Yes. std::unordered_map requires that pointers to stored values will always be valid (except if the value is erased). This makes open addressing effectively impossible. If you need max. performance, std::unordered_map is indeed not a good choice.

> The ret[] operator searches for the string. Since it doesn't exist, it now has to malloc the key and insert the value in there.

The optimal method would be https://en.cppreference.com/w/cpp/container/unordered_map/em...

> So eight mallocs and 6 frees?

> Yeah, C++ strings shoot yourself in the foot performance-wise.

Most implementations of std::string use short string optimization. Typically, you can save up to 12 chars (32-bit) or 24 chars (64-bit) inside the actual string container without any memory allocation.

Looking at the code example, I would say that there won't be any mallocs at all for std::string.

I think the performance difference just comes from the suboptimal design of std::unordered_map, where each node has to be allocated on the heap. In open addressing hash tables, as used by PHP (and I guess also by Rust), the nodes live in a single large array. The only memory allocation you see is when the array has to be resized due to a rehash - which happens very rarely!

Thanks for the SSO pointer. I'll keep that in mind in the future.

> Yes. std::unordered_map requires that pointers to stored values will always be valid (except if the value is erased).

Hmm, that requirement also prevents Cuckoo-hashing.

The general pattern here (and this is problematic in a lot of languages), is the idea of having a "shortcut" into an object in a data-structure. For example, C++ and Java's "iterator-invalidation" rules are absolutely arcane, even if we removed the pointer-requirements.

Many data-structures want to move things around (cuckoo hashing, open addressing, Robin Hood hashing... look, three different high-performance strategies, and we're still just talking about hash tables!!). But those moves "invalidate" the shortcuts (be it a pointer or an iterator).

Some data-structures can have those pointers just fine (Red-and-black trees), others cannot. Representing all these possibilities with a fixed, easily understood interface that generalizes among all of these data-structures / decisions seems impossible.


Well, I guess that's why I keep rolling my own data-structures when I need performance guarantees... standard-library just has to make decisions that may or may not match my cases.

> Well, I guess that's why I keep rolling my own data-structures when I need performance guarantees...

Yes, this is what many companies do. std containers are good enough for the general case, but there are certainly situations where you need something more fancy. However, before you roll your own, first check if there's a good library you can use. The world's best hashtable has already been written by other people :-)

For std::unordered_map, I don't know why they decided to add the no-iterator-invalidation requirement. If you really need that property, you could get most of the benefits by using unordered_map<unique_ptr<T>> instead of unordered_map<T>. So maybe not the best example of "don't pay for what you don't use".

> the no-iterator-invalidation requirement

Iterators can be invalidated (e.g. after a rehash). The requirement is rather that the address of the value itself must not change (so that pointers will remain valid).

> If you really need that property, you could get most of the benefits by using unordered_map<unique_ptr<T>> instead of unordered_map<T>.

I agree. I don't know why they have decided for the current design. They even expose the bucket interface (https://en.cppreference.com/w/cpp/container/unordered_map/bu...) which should really be an implementation detail...

I mean, it's a hashmap we're talking about. I'm not sure iterators even make sense in any regard. Things inside a hashmap have no order at all.

And determining whether a cell is full or empty in a hash map isn't exactly efficient either (you pretty much have to visit those locations and see whether a tombstone or empty marker is there).


I recognize that people want to foreach() over a HashMap, but I'm just not convinced it's an efficient operation, nor a good fit for the iterator interface (especially with all this discussion of invalidation and movement...)

> I'm not sure if iterators even make sense in any regard. Things inside of a hashmap have no order at all.

Iteration doesn't necessarily imply a specified order.

> but I'm just not convinced its an efficient operation

Why does iteration have to be efficient?

> nor a good fit for the iterator interface

What would you use instead?

Iterator invalidation only comes into play when you erase elements. If you just iterate over a hashtable and, say, print the elements, you wouldn't have to worry about it at all.

In both Robin Hood and Cuckoo hashing, insertions will move elements around. That means the iterator guarantees (e.g., that you'd visit every element) could be broken.

So a foreach(cuckoo_table) operation may not in fact visit all elements if the table has an insert somewhere in the loop.

Well, inserting elements while iterating is never a good idea :-) This will cause problems even with std::vector.

BTW, inserting elements can invalidate iterators with any hashtable if it might cause a rehash.
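Java's HashMap surfaces the same hazard explicitly: its iterators are fail-fast, so a structural modification during iteration is detected (on a best-effort basis) and throws ConcurrentModificationException. A small demonstration:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

final class FailFastDemo {
    // Returns true if inserting a new key while iterating a HashMap
    // triggered ConcurrentModificationException (it does in practice).
    static boolean insertDuringIteration() {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        try {
            for (String k : m.keySet()) {
                m.put("c", 3);   // structural modification mid-iteration
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }
}
```

The javadoc is careful to call this best-effort rather than guaranteed, which is Java's way of saying "don't write programs that depend on it either way".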

That's what I'm talking about though.

        datastructure.what-functions-are-legal() ??
Vector: Inserts are allowed as long as the element fits in vector.capacity(). Erases (which "shift" the elements down) are allowed at the end of the vector (specifically: all locations "after" the erase are invalidated, so pop_back() is always legal).

List: All operations allowed, except for reading from an erased element.

map: All operations allowed, except for reading from an erased element.

unordered_map: ??? Under debate right now.


So even if you had assurances that unordered_map.insert would work (e.g., with Robin Hood hashing, you check size() and compare to capacity(), much like a vector), the movement of data means that your iterator is basically invalidated on any insert.

So unordered_map(Robin-hood): Nothing. No erases, no insertions allowed during the foreach loop at all.


This comes up in practice. The minute you have 2 threads, and a 2nd thread _might_ insert while the 1st thread _might_ be in a foreach loop, you suddenly have to have assurances for what is or isn't legal from an iterator-invalidation perspective.

I guess you could just have the entire for-each loop inside of a lock, but that seems a bit inefficient?

> So unordered_map(Robin-hood): Nothing. No erases, no insertions allowed during the foreach loop at all.

Yes, but this applies to any hashtable that might rehash on insertion (= most). In practice this is not a big problem. For example, you can just put the new elements / to-be-deleted keys in a std::vector and only insert/erase after the for-loop.

I don't see how this makes iteration itself problematic. You can still iterate over a std::unordered_map perfectly fine. Just don't erase/insert elements while iterating.

> I guess you could just have the entire for-each loop inside of a lock, but that seems a bit inefficient?

Yes, you would have to lock the whole for-loop. It's the same with any other data structure where the iterators can get invalidated on insertion/deletion, e.g. std::vector.

Even for containers where iterators are not invalidated, you have to be careful with fine-grained locks. For example, you can only increment the iterator while holding the lock!

> The C++ equivalent might be snprintf

There's std::format from C++20 too[1]. One can use std::ostringstream instead, I suppose, but at that point I'd probably use snprintf too.

[1] https://en.cppreference.com/w/cpp/utility/format/format

std::ostringstream will certainly allocate its buffer on the heap. When you call the str() method to retrieve the final result, it will have to make another copy. So you will always have an extra allocation.

Note that std::format still returns a std::string (which might allocate memory, depending on the size of the string), but that doesn't matter if it is going to be moved anyways.

If you want to write into an existing buffer, you can use std::format_to_n as a modern replacement for snprintf.

C++20 has std::format for formatting. And an erratum even fixed the typical C++ footgun of letting you write nonsense and only discover the problem at runtime (maybe months later, if the code is in an uncommon path): the committee agreed that, yes, compilers should tell you at compile time if your constant format string won't work, as happens in Rust and many other modern languages.

  std::format("id.{}", i) // is fine

  std::format("{}-{}", oops) // You need two parameters, compiler knows this

that's absolutely untrue in a lot of cases: https://stackoverflow.com/questions/36392583/is-an-unordered...

of course one can always find faster, drop-in replacement libraries: https://martin.ankerl.com/2019/04/01/hashmap-benchmarks-01-o...

e.g. replacing std::unordered_map with robin_hood::hash_map makes the above C++ code go from 1.2 seconds to 0.5 seconds on my machine (std::map is around 1.3/1.4 seconds for libstdc++).

(though as others said, this test is in big part a test of std::string performance... replacing the strings with ints makes times go down to ~0.1s here)

> Rust :

You’re using the default hash function, which is SipHash 1-3, which is known to be quite slow. Not horrendously so but it was picked for attack resistance.

If you need high hashmap performances and control input (or have a rather safe type) it’s pretty common knowledge that you should replace the hash function with something like fnv or fx.

You’re also allocating a ton in a hot loop, also a known mispattern. I’d expect C++ to also be greatly affected there.

My experience with fx has been very mixed and I don't think it's a safe default recommendation. It can often outperform fnv and wyhash because it does so little, but it can also greatly underperform once it's used in a hash table because it can cause a lot of conflicts for some types of keys, especially after modulo.

This can be much slower than just having a slightly slower hash function, because the equality function will be called against many more candidates, potentially with each requiring one or more uncached fetches from main memory.

I now recommend wyhash more often than not, and sometimes fnv can beat that. wyhash has the substantial advantage that it can also be made attack-resistant with a seed given to the builder, with much less overhead than sip.

My most recent problematic use cases for fx were all to do with hashing just one or two scalar integers per key. Its few per-word instructions weren't nearly enough to get a good distribution when there were only 1-2 words (on a usize=u64 target). Even hacking in fmix64 from Murmur3 helped a ton, but made it no faster than wyhash in the end, and harder to reason about. I ended up using wyhash for some keys and fnv for others.

In any case, nobody should switch hash functions without evaluating performance on full-sized realistic production data. You can easily end up with bad overall hash table performance because of poor distribution with your real keys.

Microbenchmarks tend to miss this, especially if they do not create real-sized and real-shaped key sets. Then all you see is the hash function performance, which is misleading if it results in poor overall table performance, which is what happened to me with fx in particular. I think a lot of people choose fx based on such microbenchmarks and assume its distribution is good enough to scale out, but this should not be assumed, only tested.
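A cheap sanity check along these lines is to measure the worst-case bucket load of a candidate hash function over a sample of your real keys. The helper below is a minimal sketch (names are mine, not from any library):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

final class DistributionCheck {
    // Returns the size of the most-loaded bucket for the given hash
    // function and power-of-two table size. A well-distributed hash
    // should keep this close to the theoretical balls-into-bins bound.
    static int maxBucketLoad(List<String> keys,
                             ToIntFunction<String> hash,
                             int tableSize) {
        Map<Integer, Integer> loads = new HashMap<>();
        int max = 0;
        for (String k : keys) {
            int bucket = hash.applyAsInt(k) & (tableSize - 1);
            int load = loads.merge(bucket, 1, Integer::sum);
            max = Math.max(max, load);
        }
        return max;
    }
}
```

Run it with production-shaped keys and each candidate hash; a function that looks fast in isolation but produces a large max load will lose that advantage back in probe/compare work.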

You might also see surprising effects from cache locality when testing tables big enough that they don't fit in cache, and if this matters for your workload then you definitely want to account for that in your choice of hash table data structure.

Wow. That php program is fast! I was skeptical for much the same reasons as the other commentators. But I took the rust version, packed the strings into u128, eliminated the allocations in the loop, --release, lto, codegen-units=1, target-cpu=native (basically all the tricks to make it fast). I tried FxHashMap, AHashMap, and hashbrown, but they didn't make much difference with the other optimizations (and FxHashMap was significantly slower).

Even with all that, the php version was still 2x faster. And it truly is due to the hashtable implementation itself.

Last time I checked, that was around PHP 7.0, it was using chaining for its hash table implementation. One notable thing was that its hash table was also using a more optimal array impl (no hashing) if you were only using numerical consecutive keys starting from zero.

> that was around PHP 7.0

7.0 is when php switched to open addressing: https://www.npopov.com/2014/12/22/PHPs-new-hashtable-impleme...

Interesting, I definitely missed some of that; I didn't know about the replacement of the doubly linked list with the arData array for maintaining insertion order. It does still use chaining though, just not with the classical pointers but with indexes into an array; the indirection is still there for collision cases.

> Last time I checked, that was around PHP 7.0, it was using chaining for its hash table implementation

Chaining was from PHP 5.
