What is the real world use case for a HAMT data structure? Tree based associative arrays (like c++'s std::map) tend to have poor real world performance
std::map and red-black trees are associative & sorted (by key); HAMTs are just plain associative. HAMTs have generally excellent performance: the typical implementations use 32-way branching, which allows for the top levels to be likely in cache.
The best thing about HAMTs is passing immutable ones around, which have free update costs and multi-thread safe. Clojure uses these to represent information that flows through a program. (By free update costs I mean: they are free enough to never rise to the programmer's attention. Obviously ops have costs.)
More efficient (in both CPU and memory) immutable/persistent collections, which have the advantage that they're intrinsically thread-safe. They can also be used to implement lock-free concurrent maps (Ctrie)
> Tree based associative arrays (like c++'s std::map) tend to have poor real world performance
That's something HAMT significantly improve upon. You're not going to get linear performances out of them but basic operations are generally O(ln N) where N is usually 32.
Is there any research on the tradeoff between this kind of thread safety and the penalties incurred on the memory management side (specifically the fact that now you have far more heap allocations with immutable data and they have to synchronize too)?
I don't know of any formal research, but in Clojure, where HAMTs are a fundamental part of the language, the philosophy is more oriented towards paying for things (such as thread safety) with memory and CPU first, benchmarking to find bottlenecks second, and if there are any bottlenecks, optimizing accordingly. It was written with this in mind, which is why it requires world-class VMs like the JVM, CLR, and V8/JSC/SpiderMonkey/etc. that can deal with the GC and (in many cases) make runtime optimizations.
Also, I'm fairly certain synchronizing isn't an issue because there's nothing to synchronize on since the data structures are immutable. Am I understanding you correctly?
> Also, I'm fairly certain synchronizing isn't an issue because there's nothing to synchronize on since the data structures are immutable. Am I understanding you correctly?
No -- I'm referring to synchronization on the heap itself (think new()/delete()). Multiple threads allocating memory need to synchronize in order to avoid trampling on each other. You can't just get rid of synchronization entirely.
Depending on the use case, of course. Sometimes you need a structure that's both associative and ordered, for example, in which case std::map is handy.
But the entire discussion is about HAMT not std::map, the parent comment only talks about std::map as an example of performance issues in tree-based maps in asking how that relates to HAMT, it doesn't actually care a whit about std::map.
So would this be a good structure to store the recently discussed password hash dump, basically a fixed set of 300 million SHA1-sums where the only interesting operation is checking if the sum is or is not contained in the set?
Github repo for those interested: https://github.com/usethesource/capsule/