
Optimizing Hash-Array Mapped Tries for Fast Immutable JVM Collections (2015) [pdf] - tosh
https://michael.steindorfer.name/publications/oopsla15.pdf
======
jcdavis
The author presented a talk about this at the JVM Language Summit last year:
[https://www.youtube.com/watch?v=pUXeNAeyY34&list=PLX8CzqL3Ar...](https://www.youtube.com/watch?v=pUXeNAeyY34&list=PLX8CzqL3ArzUY6rQAQTwI_jKvqJxrRrP_&index=4)

Github repo for those interested:
[https://github.com/usethesource/capsule/](https://github.com/usethesource/capsule/)

~~~
agumonkey
Interesting talk in a way, but god, the amount of neurons wasted on Java-like
OOP... all this to get xi..xn,yi..yn instead of xi,yi,...,xn,yn.

------
VHRanger
What is the real-world use case for a HAMT data structure? Tree-based
associative arrays (like C++'s std::map) tend to have poor real-world
performance.

~~~
masklinn
More efficient (in both CPU and memory) immutable/persistent collections,
which have the advantage that they're intrinsically thread-safe. They can also
be used to implement lock-free concurrent maps (Ctrie).

> Tree based associative arrays (like c++'s std::map) tend to have poor real
> world performance

That's something HAMTs significantly improve upon. You're not going to get
constant-time performance out of them, but basic operations are generally
O(log_k N), where the branching factor k is usually 32.
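
To make the branching-factor point concrete, here is a minimal sketch of how a 32-way HAMT node is typically indexed: 5 bits of the hash pick one of 32 logical slots per level, and a popcount over a 32-bit bitmap maps that slot to a position in a compact array. The class and method names (HamtIndexSketch, slot, isPresent, arrayIndex, maxDepth) are made up for illustration and are not taken from the paper or from Capsule.

```java
public final class HamtIndexSketch {
    // Assumption: 5 hash bits per level, i.e. 2^5 = 32-way branching.
    static final int BITS_PER_LEVEL = 5;
    static final int MASK = (1 << BITS_PER_LEVEL) - 1; // 0b11111

    // Which of the 32 logical slots the hash selects at the given trie level.
    static int slot(int hash, int level) {
        return (hash >>> (level * BITS_PER_LEVEL)) & MASK;
    }

    // True if the node's bitmap marks the logical slot as occupied.
    static boolean isPresent(int bitmap, int slot) {
        return (bitmap & (1 << slot)) != 0;
    }

    // Position of the slot in the node's compact array:
    // the number of occupied slots below it.
    static int arrayIndex(int bitmap, int slot) {
        return Integer.bitCount(bitmap & ((1 << slot) - 1));
    }

    // Number of levels needed to consume a full 32-bit hash.
    static int maxDepth() {
        return (32 + BITS_PER_LEVEL - 1) / BITS_PER_LEVEL;
    }
}
```

With a 32-bit hash this bounds the trie depth at 7 levels before collisions have to be handled some other way, which is why lookups are often described as effectively constant in practice.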

~~~
wfunction
Is there any research on the tradeoff between this kind of thread safety and
the penalties incurred on the memory management side (specifically the fact
that now you have far more heap allocations with immutable data and they have
to synchronize too)?

~~~
joe-user
I don't know of any formal research, but in Clojure, where HAMTs are a
fundamental part of the language, the philosophy is more oriented towards
paying for things (such as thread safety) with memory and CPU first,
benchmarking to find bottlenecks second, and if there are any bottlenecks,
optimizing accordingly. It was written with this in mind, which is why it
requires world-class VMs like the JVM, CLR, and V8/JSC/SpiderMonkey/etc. that
can deal with the GC and (in many cases) make runtime optimizations.

Also, I'm fairly certain synchronizing isn't an issue because there's nothing
to synchronize on since the data structures are immutable. Am I understanding
you correctly?

~~~
wfunction
> Also, I'm fairly certain synchronizing isn't an issue because there's
> nothing to synchronize on since the data structures are immutable. Am I
> understanding you correctly?

No -- I'm referring to synchronization on the heap itself (think
new()/delete()). Multiple threads allocating memory need to synchronize in
order to avoid trampling on each other. You can't just get rid of
synchronization entirely.

------
zokier
So would this be a good structure to store the recently discussed password
hash dump, basically a fixed set of 300 million SHA1-sums where the only
interesting operation is checking if the sum is or is not contained in the
set?

~~~
lord_jim
It would probably be better to use a trie directly, maybe a radix trie or a
b-trie.

------
didibus
Does this add something that the Clojure implementation doesn't?

------
maxpert
I wish the paper carried a GitHub link to the repo of the reference
implementation.

~~~
e12e
Indeed, it wasn't easy to track down:

[https://github.com/msteindorfer/oopsla15-artifact](https://github.com/msteindorfer/oopsla15-artifact)

Still trying to find out if there's a video of the OOPSLA15 presentation.

~~~
e12e
Some commentary on a blog that I encountered along the way:
[https://blog.acolyer.org/2015/11/27/hamt/](https://blog.acolyer.org/2015/11/27/hamt/)

