
I did a quick test just now, and with 16 cores + FNV hashing + hyperthreading disabled, I got ~41M ops/s total across those cores. That got me digging a little further, and I decided to use the same benchmark harness to benchmark just a std::collections::HashMap with FNV hashing. Running a single-threaded write-then-read benchmark yields a throughput of 5M reads/s, which is about 2x evmap's per-thread throughput (which seems like a reasonable overhead).
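For reference, here's roughly what "std HashMap with FNV hashing" looks like; this sketch inlines a minimal FNV-1a hasher rather than pulling in the fnv crate the benchmark presumably uses, and the map contents are made up:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Minimal 64-bit FNV-1a hasher (illustrative only; a real benchmark
// would likely use the `fnv` crate, which does the same thing).
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }
}

// A std HashMap that hashes with FNV-1a instead of the default SipHash.
type FnvHashMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn build_demo_map() -> FnvHashMap<u64, u64> {
    let mut map = FnvHashMap::default();
    for k in 0..1000u64 {
        map.insert(k, k * 2);
    }
    map
}

fn main() {
    let map = build_demo_map();
    assert_eq!(map.get(&21), Some(&42));
    println!("{}", map.len());
}
```

Swapping the hasher this way only changes the `S` type parameter of the map; all other HashMap code stays the same.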

Digging a little further, I realized that the benchmarker spends a bunch of time generating random numbers. In particular, generating a Zipf-distributed number (which the benchmarker does even when you run with a uniform distribution) takes about 100ns: https://github.com/jonhoo/rust-zipf. That sets an upper limit of ~10M ops/s on what the benchmarker can measure. With Zipf generation removed entirely, I can get the std HashMap up to 8M reads/s, but no higher, which makes me think that the map really is the bottleneck at that point (generating a uniformly random number takes ~5ns, so it shouldn't be the limiting factor). Running evmap with uniform keys and the Zipf generation removed gives 73M reads/s, so ~4.5M reads/s/thread, which is again about half the throughput of the standard library HashMap with no synchronization.
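To make that ceiling concrete: if every measured operation includes a fixed per-operation cost just to generate the next key, that cost alone bounds the throughput the harness can report, no matter how fast the map is. A trivial sketch of the arithmetic:

```rust
// If every measured operation includes `gen_ns` nanoseconds of key
// generation, the harness can report at most 1e9 / gen_ns ops/s.
fn max_measurable_ops_per_sec(gen_ns: f64) -> f64 {
    1e9 / gen_ns
}

fn main() {
    // ~100ns per Zipf sample caps the harness at ~10M ops/s...
    println!("{}", max_measurable_ops_per_sec(100.0));
    // ...while ~5ns per uniform sample allows ~200M ops/s.
    println!("{}", max_measurable_ops_per_sec(5.0));
}
```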

So, all that said, I do not believe this is an error in evmap. Rather, if you believe 8M reads/s per core on a HashMap is slow, then it's the Rust HashMap implementation in the standard library that is slow. evmap's 2x overhead doesn't seem that bad to me.

I'm glad I dug through this though!




That makes sense, thanks!

For microbenchmarks, I try to remove any runtime overheads by precomputing. That means fully populating the map (so every lookup is a hit) and pregenerating the distribution of keys. Each thread then starts at a random index and increments it to get the key for each operation. Here's a good example in Java: https://github.com/ben-manes/caffeine/blob/master/caffeine/s...
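A single-threaded Rust sketch of that precomputation pattern (the names here are made up; the linked Java harness is the real reference): populate the map up front so every lookup hits, pregenerate the key sequence, and have each worker walk the array from its own starting offset:

```rust
use std::collections::HashMap;

// Pregenerated keys stand in for a sampled Zipf (or uniform) sequence,
// so no random numbers are generated inside the timed loop.
fn run_reads(map: &HashMap<u64, u64>, keys: &[u64], start: usize, ops: usize) -> u64 {
    let mut sum = 0u64;
    let mut i = start % keys.len();
    for _ in 0..ops {
        sum = sum.wrapping_add(*map.get(&keys[i]).unwrap());
        i += 1;
        if i == keys.len() {
            i = 0; // wrap around instead of re-sampling the distribution
        }
    }
    sum
}

fn main() {
    let keys: Vec<u64> = (0..1024).collect();
    // Fully populate the map so every lookup is a hit.
    let map: HashMap<u64, u64> = keys.iter().map(|&k| (k, k)).collect();
    println!("{}", run_reads(&map, &keys, 7, keys.len()));
}
```

The trade-off mentioned below applies here too: the loop only exercises whatever distribution the pregenerated array happens to capture.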

I would expect ~15-30ns per integer key lookup on a decent hashtable, w/o concurrency, at a modest size (few collisions). Using a concurrent map that supports lock-free reads shouldn't be significantly different. Since hashtable design makes such a big difference, it sounds like you've narrowed it down to that as a reasonable bottleneck. Thanks for investigating.

https://tessil.github.io/2016/08/29/benchmark-hopscotch-map....


@NovaX it's worth noting that swapping out the underlying hash map that evmap uses is really easy, which is one of the advantages of the design as far as I'm concerned! For example, here's the diff for moving to someone else's custom hash map implementation: https://github.com/jonhoo/rust-evmap/compare/hashbrown. A better benchmarking harness is a good idea, though I'd like to see something that is somewhat disconnected from evmap! Pre-generating the randomness is something I've done in the past, but it does have the downside that you end up not actually exercising the distribution well (unless you generate vast amounts of keys)...


You might want to try with hashbrown too; there’s an open PR to replace HashMap’s guts with it, but there’s also a crates.io package so you don’t need to rebuild the stdlib yourself.


Someone else just suggested this on Twitter, but sadly the results are pretty much the same: https://twitter.com/Jonhoo/status/1082014596028715008



