It's an atomic architecture specific thing.
I love his YouTube streams. He's extremely patient and thinks through problems thoughtfully, in a similar way to me, which makes them easy to follow.
Regardless, kudos and great work. I'm sure I'll find a use for this in some of my tokio projects.
In any case, I'm glad you enjoy the videos!
I'd love to help out if I can.
They provide benchmarks, but I would be more interested to know how the implementation differs.
dashmap works by splitting the map into an array of shards, each behind its own RwLock. The shard is chosen from the key's hash, so an operation locks and unlocks only the one shard its key maps to, and operations that need a lock can run concurrently as long as they touch different shards. Further, there is no central RwLock that every thread must go through, which improves performance significantly.
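A minimal sketch of the sharding idea described above, using only std's `RwLock` and `HashMap`. `ShardedMap` and its methods are hypothetical names for illustration; dashmap's real internals (its own lock type, hasher, shard count) differ, so treat this as a model of the technique rather than its implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical toy sharded map: each shard is a HashMap behind its own RwLock.
pub struct ShardedMap<K, V> {
    shards: Vec<RwLock<HashMap<K, V>>>,
}

impl<K: Hash + Eq, V: Clone> ShardedMap<K, V> {
    pub fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards = (0..shard_count).map(|_| RwLock::new(HashMap::new())).collect();
        ShardedMap { shards }
    }

    /// The shard is chosen from the key's hash.
    fn shard_index(&self, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.shards.len()
    }

    pub fn insert(&self, key: K, value: V) -> Option<V> {
        let idx = self.shard_index(&key);
        // Only this one shard is write-locked; the other shards stay available.
        self.shards[idx].write().unwrap().insert(key, value)
    }

    pub fn get(&self, key: &K) -> Option<V> {
        let idx = self.shard_index(key);
        // Readers of the same shard can proceed in parallel.
        self.shards[idx].read().unwrap().get(key).cloned()
    }
}

fn main() {
    let map = Arc::new(ShardedMap::new(8));
    let handles: Vec<_> = (0..4)
        .map(|t| {
            let map = Arc::clone(&map);
            thread::spawn(move || {
                for i in 0..100 {
                    map.insert(t * 100 + i, i);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("{:?}", map.get(&250));
}
```

Writers only contend when their keys hash to the same shard, which is the property the comment above describes.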
1. Mind the difference between concurrency and parallelism. This is about safe parallel access. Concurrent access can happen on one thread without any synchronization.
2. It’s oftentimes an anti-pattern to model thread safety around primitive data structures as opposed to higher-level concerns. It forces all data that has to be consistent across thread boundaries to be in this one map. When circumstances change, you might want to move some of that data into different data structures and still provide consistent access to it. That change will be hard if you rely on data-structure-level thread safety.
The use of the word concurrent here is consistent with that definition and the literature, where concurrent data structures are data structures designed for safe concurrent access (whether they are accessed literally at the same time or not).
I'm not sold that this semantic game is usually worth playing, but here it is pointless. Concurrent access on the same thread or on different threads isn't going to break.
> It forces all data that has to be consistent across thread boundaries to be in this one map
Why would it force that? There is no single technique for concurrency or parallelism, and that silver-bullet line of thinking is a dead end.
Concurrent data structures are an important part of the puzzle, especially maps and queues. Fork-join, dataflow, message passing, copying, buffer swapping, read-only data, etc.: the list goes on. If it were simple, it wouldn't be a problem.
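As one example of the alternatives listed above, a sketch of message passing with `std::sync::mpsc`: worker threads build private counts and send them to a single owner, so the final map never needs a lock or a concurrent map type. `count_words` is a hypothetical helper.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: each worker counts its own chunk privately and sends
/// (word, count) pairs over a channel; only the receiving thread owns `totals`.
fn count_words(chunks: Vec<Vec<String>>) -> HashMap<String, usize> {
    let (tx, rx) = mpsc::channel::<(String, usize)>();
    for chunk in chunks {
        let tx = tx.clone();
        thread::spawn(move || {
            let mut local: HashMap<String, usize> = HashMap::new();
            for word in chunk {
                *local.entry(word).or_insert(0) += 1;
            }
            for pair in local {
                tx.send(pair).unwrap();
            }
            // `tx` is dropped here when the worker finishes.
        });
    }
    // Drop the original sender so the receive loop ends once all workers are done.
    drop(tx);
    let mut totals = HashMap::new();
    for (word, n) in rx {
        *totals.entry(word).or_insert(0) += n;
    }
    totals
}

fn main() {
    let chunks = vec![
        vec!["a".to_string(), "b".to_string()],
        vec!["a".to_string()],
    ];
    let totals = count_words(chunks);
    println!("{:?}", totals.get("a"));
}
```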
Because you don’t get atomic writes across multiple data structures that are each thread-safe unless you perform all writes while holding a mutex. If you do that, you don't need the data structures to be thread-safe on their own.
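A sketch of the point above, assuming two maps must always change together: both plain (not thread-safe) `HashMap`s live behind one `Mutex`, so each update touches them atomically with respect to other threads. `Accounts` and `deposit` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Two related maps guarded by ONE mutex. Neither map is thread-safe on its own.
#[derive(Default)]
struct Accounts {
    balances: HashMap<String, i64>,
    history: HashMap<String, Vec<i64>>,
}

fn deposit(state: &Mutex<Accounts>, user: &str, amount: i64) {
    let mut guard = state.lock().unwrap();
    // Both writes happen while the lock is held, so no other thread can
    // observe a balance change without the matching history entry.
    *guard.balances.entry(user.to_string()).or_insert(0) += amount;
    guard.history.entry(user.to_string()).or_default().push(amount);
}

fn main() {
    let state = Arc::new(Mutex::new(Accounts::default()));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let state = Arc::clone(&state);
            thread::spawn(move || {
                for _ in 0..10 {
                    deposit(&state, "alice", 5);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let s = state.lock().unwrap();
    println!("{} {}", s.balances["alice"], s.history["alice"].len());
}
```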
But if you require that, you're in the realm of higher-level transactional semantics that hashmaps don't try to solve (you can still use them as building blocks, of course).
I think the furthest a hashmap API could go is:
1. CaS - simple compare-and-swap on a single key
2. CaSS - compare-and-swap on one key and set another key, plus reading two keys as a tuple
Having 2. would already be quite powerful, and you could solve a lot of tasks with it.
Fancy locking over key ranges/key expressions would also be interesting, but that starts to look more like a database than a hashmap.
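One possible reading of the two operations proposed above, sketched as a hypothetical `CasMap` type. For simplicity every call takes a single global `Mutex`, which makes the two-key step trivially atomic; a real concurrent map would need finer-grained locking or lock-free techniques to offer the same guarantee cheaply. The exact CaSS semantics here (set the second key only if the CAS succeeds, then return both values) are an assumption.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::Mutex;

/// Hypothetical map exposing CaS and CaSS. One global lock keeps it simple.
pub struct CasMap<K, V> {
    inner: Mutex<HashMap<K, V>>,
}

impl<K: Hash + Eq + Clone, V: PartialEq + Clone> CasMap<K, V> {
    pub fn new() -> Self {
        CasMap { inner: Mutex::new(HashMap::new()) }
    }

    pub fn insert(&self, key: K, value: V) {
        self.inner.lock().unwrap().insert(key, value);
    }

    /// 1. CaS: replace the value at `key` with `new` only if it equals `expected`.
    pub fn cas(&self, key: &K, expected: &V, new: V) -> bool {
        let mut map = self.inner.lock().unwrap();
        match map.get_mut(key) {
            Some(current) if *current == *expected => {
                *current = new;
                true
            }
            _ => false,
        }
    }

    /// 2. CaSS: CaS on `cas_key`; if it succeeds, also set `set_key`.
    /// Returns (success, value at cas_key, value at set_key), all read under the same lock.
    pub fn cass(&self, cas_key: &K, expected: &V, new: V, set_key: K, set_val: V)
        -> (bool, Option<V>, Option<V>)
    {
        let mut map = self.inner.lock().unwrap();
        let ok = matches!(map.get(cas_key), Some(v) if v == expected);
        if ok {
            map.insert(cas_key.clone(), new);
            map.insert(set_key.clone(), set_val);
        }
        (ok, map.get(cas_key).cloned(), map.get(&set_key).cloned())
    }
}

fn main() {
    let m = CasMap::new();
    m.insert("lock", 0);
    m.insert("owner", 0);
    // Claim the "lock" key (0 -> 1) and record an owner in one atomic step.
    let (ok, lock, owner) = m.cass(&"lock", &0, 1, "owner", 42);
    println!("{ok} {lock:?} {owner:?}");
}
```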
No. Maybe you are not familiar with the concurrent data structures community. Many techniques have been invented and used in concurrent data structures, e.g., atomic CPU instructions such as CAS, which make the data structures "lock-free".
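A small illustration of the CAS technique mentioned above: a lock-free "store the maximum" built from a `compare_exchange_weak` retry loop on an `AtomicU64`. std already offers `AtomicU64::fetch_max`; the loop is written out by hand only to show the pattern, and `fetch_max_cas` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Lock-free "store the maximum" via a CAS retry loop. No lock is taken:
/// a failed CAS just retries with the freshly observed value.
/// Returns the previous value, like `fetch_max`.
fn fetch_max_cas(target: &AtomicU64, value: u64) -> u64 {
    let mut current = target.load(Ordering::Relaxed);
    loop {
        if value <= current {
            return current; // already at least as large; nothing to write
        }
        match target.compare_exchange_weak(current, value, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(prev) => return prev,
            // Another thread changed it (or a spurious failure); retry.
            Err(observed) => current = observed,
        }
    }
}

fn main() {
    let max = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (1..=8u64)
        .map(|t| {
            let max = Arc::clone(&max);
            thread::spawn(move || {
                for i in 0..1000 {
                    fetch_max_cas(&max, t * 1000 + i);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("{}", max.load(Ordering::SeqCst));
}
```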
> you don’t get atomic writes across multiple data structures
They're saying that lock-free approaches don't help when you have to ensure consistency across multiple data structures.
(this is not a critique of this specific library, it's more a look at the rust ecosystem as a whole)
i keep looking at Rust, but in the end it seems it is not a language for me. Rust developers just seem to use more "unsafe" than i am comfortable with. generally, if there were a choice between using "unsafe" and taking a 2% performance penalty, i personally would go with the performance penalty. of course, i can understand others have different priorities. the question is, what are the priorities of the rust ecosystem? i mean, can i find libraries that go with as-safe-as-possible or are most libraries as-fast-as-possible?
also, the claim that the rust language is fast and safe becomes harder to accept when the fast libraries use unsafe :) (i do understand code using "unsafe" can be safe if the developer does not make mistakes. the problem is, developers do make mistakes.)
And that's a good thing! In most other languages, what would require an "unsafe" in Rust can be done without any visible marker. The Rust language shines a spotlight in these places, which allowed you to notice them.
> the question is, what are the priorities of the rust ecosystem? i mean, can i find libraries that go with as-safe-as-possible or are most libraries as-fast-as-possible?
From what I've seen, the priorities of the Rust ecosystem tend to be "as-fast-as-possible wrapped into an as-safe-as-possible abstraction". That is, presenting a safe interface around a fast core, which can be audited separately; users of the safe interface do not have to worry about unsafety in the implementation.
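A minimal sketch of "a safe interface around a fast core": the public function checks the bounds invariant, and a small `unsafe` block relies on it. It mirrors what std's `slice::split_at_mut` does; `split_halves_mut` is a hypothetical name, not std's actual source.

```rust
/// Safe wrapper around an unsafe core: split one mutable slice into two
/// non-overlapping mutable halves.
pub fn split_halves_mut<T>(slice: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = slice.len();
    // The safe interface enforces the invariant the unsafe code depends on.
    assert!(mid <= len, "mid out of bounds");
    let ptr = slice.as_mut_ptr();
    // SAFETY: `mid <= len`, so both ranges lie inside the original slice,
    // and `[0, mid)` and `[mid, len)` do not overlap, so the two `&mut`
    // borrows never alias.
    unsafe {
        (
            std::slice::from_raw_parts_mut(ptr, mid),
            std::slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut data = [1, 2, 3, 4, 5];
    let (left, right) = split_halves_mut(&mut data, 2);
    left[0] = 10;
    right[0] = 30;
    println!("{:?}", data);
}
```

Callers only ever see the safe signature; auditing the unsafety means reviewing the few lines inside the function.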
Usage of unsafe is effectively flagging areas for peer review. You can't build everything in safe Rust - certain things _require_ the use of unsafe. Having it gated, reviewed, and so on is effectively a check on a class of bugs that can be hard to pin down.
IMHO, the community cares way, way too much about the mere sight of an unsafe in a codebase - it borders on religious zealotry. It's just a tool like anything else in the (wonderful) language.
Strongly agree. I personally find a lot of the people involved in https://github.com/rust-secure-code/safety-dance to be mildly annoying to very unpleasant in their zealotry and snarkiness.
"i do understand code using "unsafe" can be safe if the developer does not make mistakes. the problem is, developers do make mistakes."
"Usage of unsafe is effectively flagging areas for peer review. You can't build everything in safe Rust - certain things _require_ the use of unsafe. Having it gated, reviewed, and so on is effectively a check on a class of bugs that can be hard to pin down."
it's the same thing. the difference is that you look at it from the glass-half-full point of view (it's good that must-be-verified-by-a-person blocks are limited here), and i do from the other end (it's bad that these blocks are necessary).
i did talk about it in my other comment here: https://news.ycombinator.com/item?id=22701550
Data structures are a perfect place for judicious use of unsafe.
Rust isn't about completely eliminating all unsafe code. It never has been. It has always been about building safe abstractions with auditable unsafe internal parts.
In particular, the implementation of data structures. Raw memory allocation, usage of pointers, low-level concurrency primitives - none of that can be done without the programmer manually enforcing invariants.
But unsafe isn't wanton abandon. You still have to obey the type system and ownership rules.
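A sketch of the kind of code described above: raw allocation through `std::alloc`, with the invariants enforced by hand inside a safe wrapper. The `unsafe` blocks still go through the type checker, and ownership (one owner, freed exactly once in `Drop`) still applies. `RawBuf` is a hypothetical type for illustration.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Hypothetical fixed-capacity buffer of u32 built on raw allocation.
pub struct RawBuf {
    ptr: *mut u32,
    cap: usize,
    len: usize,
}

impl RawBuf {
    pub fn with_capacity(cap: usize) -> Self {
        assert!(cap > 0, "zero-sized allocations are not allowed with `alloc`");
        let layout = Layout::array::<u32>(cap).expect("capacity overflow");
        // SAFETY: the layout has non-zero size because cap > 0.
        let ptr = unsafe { alloc(layout) as *mut u32 };
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        RawBuf { ptr, cap, len: 0 }
    }

    pub fn push(&mut self, value: u32) -> Result<(), u32> {
        if self.len == self.cap {
            return Err(value); // full: refuse instead of writing out of bounds
        }
        // SAFETY: len < cap, so the slot is inside the allocation.
        unsafe { self.ptr.add(self.len).write(value) };
        self.len += 1;
        Ok(())
    }

    pub fn get(&self, index: usize) -> Option<u32> {
        if index < self.len {
            // SAFETY: index < len, and every slot below len was written by `push`.
            Some(unsafe { self.ptr.add(index).read() })
        } else {
            None
        }
    }
}

impl Drop for RawBuf {
    fn drop(&mut self) {
        let layout = Layout::array::<u32>(self.cap).unwrap();
        // SAFETY: ptr came from `alloc` with this same layout and is freed only here.
        unsafe { dealloc(self.ptr as *mut u8, layout) };
    }
}

fn main() {
    let mut buf = RawBuf::with_capacity(2);
    buf.push(7).unwrap();
    buf.push(8).unwrap();
    println!("{:?} {:?} {:?}", buf.push(9), buf.get(1), buf.get(2));
}
```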
As for the goal, it depends on the author. Some prioritize one over the other. In general the Rust community these days tries to optimize the balance of safety and speed with as few compromises as possible - which is fundamentally why the language exists.
> If you're looking for concurrent code that doesn't use `unsafe` you won't find any.
The Rust standard library provides a bunch of concurrency primitives. You can write 100% safe Rust code using these primitives to do things concurrently, and Rust guarantees that your resulting program is free of a specific category of bugs: memory errors, data races, etc. No other mainstream language gives you this combination of concurrency safety and efficiency.
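A sketch of 100% safe concurrent Rust using only std primitives (`Arc`, `Mutex`, `thread::spawn`). The compiler checks that everything moved into the threads is `Send`; replacing `Arc` with `Rc`, for example, would be rejected at compile time. This rules out data races, though not deadlocks or higher-level logic bugs. `parallel_sum` is a hypothetical helper.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// No `unsafe` anywhere: shared state goes through Arc + Mutex, and the
/// compiler verifies thread-safety of everything captured by the threads.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    assert!(workers > 0, "need at least one worker");
    let total = Arc::new(Mutex::new(0u64));
    let chunk_size = ((data.len() + workers - 1) / workers).max(1);
    let mut handles = Vec::new();
    for chunk in data.chunks(chunk_size) {
        // Owned copy so the spawned thread does not borrow from the caller.
        let chunk = chunk.to_vec();
        let total = Arc::clone(&total);
        handles.push(thread::spawn(move || {
            let s: u64 = chunk.iter().sum();
            *total.lock().unwrap() += s;
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    let sum = *total.lock().unwrap();
    sum
}

fn main() {
    let data: Vec<u64> = (1..=100).collect();
    println!("{}", parallel_sum(&data, 4));
}
```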
The implementation of the concurrency primitives, however, often includes assembly, or C, or unsafe Rust code, which the Rust compiler can't provide guarantees for. We have to rely on humans for that. This is the norm for almost all programming languages.
Ideally, yes, it would be nice if the safe subset of Rust were powerful enough to implement efficient concurrency primitives, but that would add a TON of complexity to the type system.
People are generally ok with the Rust standard library using assembly, C, or unsafe Rust, because even though it can be tricky to write this kind of code correctly, the standard library is maintained by Rust experts who take every change seriously.
People tend to worry more about third party libraries because, on average, the authors are not Rust language experts and may not understand the subtleties.
[EDIT]: clarified that i mean third-party libraries.
Not sure whether you would consider that to be by "design", "necessity", or a "compiler/language deficiency".