
Dashmap: Fast concurrent HashMap for Rust - indiv0
https://github.com/xacrimon/dashmap
======
aratno
If you’re interested in this, you might like this live-coding session
implementing Java’s ConcurrentHashMap in Rust:
[https://youtu.be/yQFWmGaFBjk](https://youtu.be/yQFWmGaFBjk)

------
amelius
Nice! But a bit disappointing that read speed doesn't increase with number of
threads. Curious what the bottleneck is here (first thought is it can't be the
memory because CHT shows higher performance for reads in the graph). A
comparison with a read-only hashmap would be nice here.

~~~
xacrimon
Hey! I think I know what the issue is, and it should be resolved in v4, which
is releasing later this year.

It's an architecture-specific atomics thing.

------
tombert
I know this is a bit out of date, but I remember there was a port of the Scala
Ctrie data structure to Rust using hazard pointers. I would be quite
interested in seeing how the performance of a snapshotable lock-free structure
compares to this.

[https://github.com/ballard26/concurrent-hamt](https://github.com/ballard26/concurrent-hamt)

~~~
xacrimon
The Contrie library is in the benchmarks and is a port. That said, per the
note in the repo, the benchmarks aren't 100% scientific at the moment and will
be revamped later this year.

------
CameronNemo
I am curious how this differs from chashmap:

[https://gitlab.redox-os.org/redox-os/chashmap](https://gitlab.redox-os.org/redox-os/chashmap)

They provide benchmarks, but I would be more interested to know how the
implementation differs.

~~~
xacrimon
Hi! I am the author of dashmap. CHashMap is essentially a table behind an
rwlock where each slot is also behind its own rwlock. This is great in theory
since it allows decently concurrent operations, provided they don't need to
lock the whole table for resizing. In practice this falls short quickly
because of the contention on the outer lock and the large number of locks and
unlocks when traversing the map.

dashmap works by splitting the map into an array of shards, each shard behind
its own rwlock. The shard is decided from the key's hash. This locks and
unlocks only once for any one shard and allows concurrent table-locking
operations provided they are on different shards. Further, there is no central
rwlock each thread must go through, which improves performance significantly.
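
To make the sharding idea above concrete, here is a minimal sketch using only the standard library. It is not dashmap's actual code: the `ShardedMap` type, the fixed shard count, and the use of `DefaultHasher` are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

/// Hypothetical sharded map (illustration only, not dashmap's implementation).
/// Each shard is an ordinary HashMap behind its own RwLock.
pub struct ShardedMap<K, V> {
    shards: Vec<RwLock<HashMap<K, V>>>,
}

impl<K: Hash + Eq, V: Clone> ShardedMap<K, V> {
    /// Create a map with a fixed number of shards (assumed > 0).
    pub fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards = (0..shard_count)
            .map(|_| RwLock::new(HashMap::new()))
            .collect();
        Self { shards }
    }

    /// Pick the shard from the key's hash.
    fn shard_index(&self, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.shards.len()
    }

    /// Takes the write lock of exactly one shard; other shards stay available.
    pub fn insert(&self, key: K, value: V) -> Option<V> {
        let idx = self.shard_index(&key);
        let mut shard = self.shards[idx].write().unwrap();
        shard.insert(key, value)
    }

    /// Takes the read lock of exactly one shard and clones the value out.
    pub fn get(&self, key: &K) -> Option<V> {
        let idx = self.shard_index(key);
        let shard = self.shards[idx].read().unwrap();
        shard.get(key).cloned()
    }
}
```

There is no outer lock in this sketch, so two threads only contend when their keys hash to the same shard.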

------
phibz
I see Jon G referenced in the readme. Is this work based on his livestream
series where he ported Java's concurrent Hashmap to Rust?

I love his youtube streams. He's extremely patient and thoughtfully thinks
through problems in a similar way to me, making it easy to follow.

Regardless, kudos and great work. I'm sure I'll find a use for this in some of
my tokio projects.

~~~
Jonhoo
Nope, Dashmap is all xacrimon, and came on the scene long before my port.
We've been collaborating on writing a shared benchmarking suite over at
[https://github.com/jonhoo/bustle/](https://github.com/jonhoo/bustle/) though.
For the time being, it looks like Dashmap outperforms the port of
ConcurrentHashMap (called "flurry"), often by a significant amount. It seems
to be mainly due to the garbage collection scheme flurry uses, but we're still
digging into it (maybe you want to come help?).

In any case, I'm glad you enjoy the videos!

~~~
phibz
Ha "straight from the horse's mouth."

I'd love to help out if I can.

~~~
Jonhoo
Awesome! Some good places to read up on and join the discussion are
[https://github.com/jonhoo/bustle/issues/2](https://github.com/jonhoo/bustle/issues/2),
[https://github.com/jonhoo/flurry/issues/50](https://github.com/jonhoo/flurry/issues/50),
and
[https://github.com/jonhoo/flurry/issues/80](https://github.com/jonhoo/flurry/issues/80).
Happy to guide you further there!

------
jupp0r
Some feedback:

1. Mind the difference between concurrency and parallelism. This is around
safe parallel access. Concurrent access can happen on one thread without any
synchronization.

2. It’s oftentimes an anti-pattern to model thread safety around primitive
data structures as opposed to higher level concerns. It forces all data that
has to be consistent across thread boundaries to be in this one map. When
circumstances change, you might want to have some of that data in different
data structures and still provide consistent access to them. This change will
be hard when you rely on data structure level thread safety.

~~~
BubRoss
> This is around safe parallel access

I'm not sold that this semantic game is usually worth playing, but here it is
pointless. Concurrent access on the same threads or different threads isn't
going to break.

> It forces all data that has to be consistent across thread boundaries to be
> in this one map

Why would it force that? There is no single technique for concurrency or
parallelism and that silver bullet line of thinking is a dead end.

Concurrent data structures are an important part of the puzzle, especially
maps and queues. Fork join, data flow, message passing, copying, swapping
buffers, read only data, etc. the list goes on. If it was simple it wouldn't
be a problem.

~~~
jupp0r
> Why would it force that?

Because you don’t get atomic writes across multiple data structures that are
each thread safe unless you perform all writes while holding a mutex. If you
do that, you don't need data structures to be thread safe on their own.
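
A rough sketch of the situation described here, with made-up names: `Registry` and `SharedRegistry` are hypothetical, and the two maps stand in for any pair of structures that must change together.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical state: two maps that must always agree with each other.
#[derive(Default)]
struct Registry {
    name_to_id: HashMap<String, u64>,
    id_to_name: HashMap<u64, String>,
}

/// One mutex guards both maps, so the maps themselves are plain HashMaps.
pub struct SharedRegistry {
    inner: Mutex<Registry>,
}

impl SharedRegistry {
    pub fn new() -> Self {
        Self { inner: Mutex::new(Registry::default()) }
    }

    /// Both writes happen under the same lock, so no other thread can
    /// observe one map updated and the other not.
    /// (Simplification: stale entries from earlier registrations are not removed.)
    pub fn register(&self, name: &str, id: u64) {
        let mut reg = self.inner.lock().unwrap();
        reg.name_to_id.insert(name.to_string(), id);
        reg.id_to_name.insert(id, name.to_string());
    }

    /// Reads both maps under one lock and returns a consistent pair.
    pub fn lookup(&self, name: &str) -> Option<(u64, String)> {
        let reg = self.inner.lock().unwrap();
        let id = *reg.name_to_id.get(name)?;
        let back = reg.id_to_name.get(&id)?.clone();
        Some((id, back))
    }
}
```

If each map were instead its own thread-safe map, every single insert would be safe, but a reader could still land between the two inserts and see them disagree.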

~~~
lcy
> unless you perform all writes while holding a mutex

No. Maybe you are not familiar with the concurrent data structures community.
Many techniques have been invented and are used in concurrent data structures,
e.g., intrinsic CPU atomic instructions, like CAS, that make the data
structures "lock-free".

CAS: [https://en.wikipedia.org/wiki/Compare-and-swap](https://en.wikipedia.org/wiki/Compare-and-swap).
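
For a feel of what a CAS loop looks like in Rust, here is a small sketch on a single atomic integer. The `store_max` helper is hypothetical and hand-rolled for illustration (the standard library already offers `fetch_max` for this exact job), and it only covers one memory location, not a whole data structure.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Atomically raise `slot` to at least `candidate` without taking a lock.
/// Returns the value that was in `slot` before this call took effect.
pub fn store_max(slot: &AtomicU64, candidate: u64) -> u64 {
    let mut current = slot.load(Ordering::Relaxed);
    loop {
        // Nothing to do if the stored value is already large enough.
        if candidate <= current {
            return current;
        }
        // Try to swap `current` for `candidate`. This fails if another
        // thread changed the value in between (or spuriously, since this
        // is the weak variant), in which case we retry with what we saw.
        match slot.compare_exchange_weak(
            current,
            candidate,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(previous) => return previous,
            Err(observed) => current = observed,
        }
    }
}
```

No thread ever blocks here: a failed exchange means some other thread made progress, and the loser simply retries.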

~~~
ahupp
From the comment:

> you don’t get atomic writes across multiple data structures

They're saying that lock-free approaches don't help when you have to ensure
consistency across multiple data structures.

------
asdf-asdf-asdf
11 uses of "unsafe".

(this is not a critique of this specific library, it's more a look at the rust
ecosystem as a whole)

i keep looking at Rust, but at the end it seems it is not a language for me.
Rust developers just seem to use more "unsafe" than what i am comfortable
with. generally, if there could be a choice between using "unsafe", and taking
a 2% performance penalty, i personally would go with the performance penalty.
of course, i can understand others have different priorities. the question is,
what are the priorities of the rust ecosystem? i mean, can i find libraries
that go with as-safe-as-possible or are most libraries as-fast-as-possible?

also, the claim that the rust language is fast and safe becomes harder to
accept when the fast libraries use unsafe :) (i do understand code using
"unsafe" can be safe if the developer does not make mistakes. the problem is,
developers do make mistakes.)

~~~
Klonoar
I think... you miss the point.

Usage of unsafe is effectively flagging areas for peer review. You can't build
everything in safe Rust - certain things _require_ the use of unsafe. Having
it gated, reviewed, and so on is effectively a check on a class of bugs that
can be hard to pin down.

IMHO, the community cares way, way too much about the mere sight of an unsafe
in a codebase - it borders on religious zealotry. It's just a tool like
anything else in the (wonderful) language.

~~~
asdf-asdf-asdf
i don't think i missed the point here.

i wrote: "i do understand code using "unsafe" can be safe if the developer
does not make mistakes. the problem is, developers do make mistakes."

you wrote: "Usage of unsafe is effectively flagging areas for peer review. You
can't build everything in safe Rust - certain things _require_ the use of
unsafe. Having it gated, reviewed, and so on is effectively a check on a class
of bugs that can be hard to pin down."

it's the same thing. the difference is that you look at it from the glass-
half-full point of view (it's good that must-be-verified-by-a-person blocks
are limited here), and i do from the other end (it's bad that these blocks are
necessary).

~~~
nickez
I think you missed parent's point. There are constructs that the current
compiler _can't_ prove are correct, and to write such code you need unsafe. It
is often not about a trade-off between speed and safety.
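
A well-known example of such a construct is handing out two mutable slices into non-overlapping halves of the same buffer. The sketch below is a hand-rolled version of what the standard library's `split_at_mut` already provides; the `split_halves` name is made up for illustration.

```rust
/// Split one mutable slice into two non-overlapping mutable halves.
/// The borrow checker cannot see that the halves are disjoint, so safe
/// code alone cannot express this; the proof lives in the SAFETY comment.
pub fn split_halves(data: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = data.len();
    assert!(mid <= len, "mid out of bounds");
    let ptr = data.as_mut_ptr();
    // SAFETY: [0, mid) and [mid, len) lie within `data` and do not overlap,
    // so the two returned slices never alias each other.
    unsafe {
        (
            std::slice::from_raw_parts_mut(ptr, mid),
            std::slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
```

Callers only ever see the safe signature; the unsafe block is confined to a few lines that a reviewer can check against the stated invariant.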

~~~
Jonhoo
I actually gave a talk about exactly this a few weeks back that may be
relevant: [https://youtube.com/watch?v=QAz-maaH0KM](https://youtube.com/watch?v=QAz-maaH0KM)

