
Finger Trees: A Simple General-purpose Data Structure - llambda
http://www.soi.city.ac.uk/~ross/papers/FingerTree.html
======
wandermatt
"This is a very nice theoretically clean and powerful sequential data
structure, but in practice it is very very very very very very very very very
very very slow... very very slow. No one uses finger trees because they're so
slow." -Daniel Spiewak

From his 2011 NEScala talk on functional data structures, covering practical
considerations like locality of references and caching. Available on vimeo
<http://vimeo.com/20262239>

~~~
AaronFriel
NEScala -- was he implementing finger trees as Java objects? Java objects have
extraordinary overhead, and Finger Trees are only efficient when the per-entry
overhead is low. So naturally he prefers array-mapped tries, because arrays
have substantially less per-object overhead in Java.

See this overview of the difficulty of building efficient datastructures in
Java:
[http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/mem...](http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-
efficient-java-tutorial.pdf)

~~~
AaronFriel
Some further exposition.

Some takeaways from this slide deck: a TreeMap<Double, Double> has a minimum
of 82% overhead. If you allocate 100 megabytes of memory to that tree map,
only one fifth of that memory is your data, the rest is object overhead. An
array, on the other hand, would have only 16 bytes of data, and thus would
store close to five times as much data per megabyte. Even if you implement
some strange accounting scheme on top of your array to make it act like a
tree, you're probably already reaping performance benefits due to greater data
locality, likelihood of having data in cache lines, etc.

I'm reminded of when I worked with libraries for Judy arrays in 2006 or so.
Judy arrays are a radix tree like data structure with performance optimized
for cache line sizes and dereferences at every node of the structure. Pointers
to leaves are tagged in their lowest 3 bits to indicate which of 8 types of
node they are. A very full node then will usually be allocated as an array, a
node with fewer entries than roughly 3/4 of a cacheline worth of pointers will
have an array of bytes, and an array of pointers to child nodes, located
adjacently. The first list is searched, and if there's a hit, it uses that
index to find the pointer to the child in the second list.

All of these optimizations are really fantastic, Judy arrays not only have
more "features" than hash tables, such as very efficiently stored arbitrary
length keys and the ability to access the first, last, and random elements of
the tree very efficiently, but also a Judy array can be accessed in
lexicographic order which cannot be done with a conventional hash table. On
top of this, in practice, they outperform naive hash trees straight out, due
to their eye to optimizing for real world processors' cache line size.

All of this is fantastic, except if you implemented this in Java it would be,
ahem, "in practice it is very very very very very very very very very very
very slow... very very slow".

------
DanWaterworth
If you're unfamiliar with finger trees, I recommend staring at this [1] for a
while.

[1]
[http://commons.wikimedia.org/wiki/File:2-3_finger_tree,_samp...](http://commons.wikimedia.org/wiki/File:2-3_finger_tree,_sample_tree.svg)

