
Finger Trees: A Simple General-Purpose Data Structure (2006) - tosh
http://www.staff.city.ac.uk/~ross/papers/FingerTree.html
======
keith_analog
A few years ago, our research group published a data structure, called Chunked
Sequence, that is similar to the Finger Tree. In short, Chunked Sequence
features the same asymptotic profile as does Finger Tree (neglecting
persistence) and, in addition, offers strong guarantees with respect to
constant factors. Very roughly speaking, Chunked Sequence is to Finger Tree
what a B-tree is to a red-black tree.

We've implemented Chunked Sequences as a package of C++ header files and
provided source code on GitHub. To the client, Chunked Sequence looks like STL
deque, but with additional methods to allow split at a specified position and
concatenate, both in logarithmic time. The operations which push and pop on
the ends of the Chunked Sequence are approximately as fast as those of STL
deque, and in certain use patterns much faster.

[http://deepsea.inria.fr/chunkedseq/](http://deepsea.inria.fr/chunkedseq/)

~~~
osivertsson
Cool.

But please provide clear licensing information (preferably a permissive free
software license). Right now I can't find anything at all about what terms you
are releasing this source code under.

Therefore most people will be unable to use it, which is a shame.

~~~
keith_analog
Thanks for the comment! We are using the MIT license. I've updated the source
code in the git repository accordingly.

~~~
davexunit
Which MIT license?

~~~
keith_analog
Thanks again! We're using the Expat license.

------
traviswatkins
Finger trees are an example of a truly elegant functional data structure that
can be used for just about everything. The problem is they don't match the
memory model of real hardware so for all of their elegance and theoretical
performance in reality they're too slow to be useful.

~~~
jstimpfle
Is this true? How is this more true for finger trees than for red-black trees,
etc?

~~~
stuckagain
Red-black trees are pretty much terrible for performance as well. I spend most
of my work days trying to eradicate std::map and std::set for performance
reasons.

Academics are always concerned about bounds, but practitioners are always
concerned about cache misses.

~~~
catnaroek
When you don't need persistence, a search tree data structure with fatter
nodes (e.g., B-trees) is almost guaranteed to perform better on real hardware.
But when you do need persistence, red-black trees become competitive again.

~~~
detrino
I would expect B-trees to outperform binary trees even in a persistent data
structure. The necessity of binary trees comes when you need stable
iterators/references.

------
harpocrates
Given the number of comments about the inefficiency of finger trees: yes they
usually have a high constant factor (for their otherwise reasonable asymptotic
complexity) due to cache misses. However, they are immutable and persistent,
which means they have efficient sharing, which in turn makes them good
candidates for

    
    
       * use in multiple threads at once
       * code that needs to be proven correct (I believe the Haskell Data.Sequence implementation is translated from Coq)

------
raphlinus
I looked at finger trees as a possible basis for the string representation in
xi-editor, but ended up going with a simpler b-tree based approach. The better
asymptotic bounds for doing manipulations at the ends are appealing in theory,
but I never saw an actual problem with the O(log n) cost in practice, and it
is possible to optimize the common append-only case a lot (I have a "builder"
API but the current implementation is not as heavily optimized as it might
be).

I _believe_ that the polymorphic recursive type, easily expressible in
Haskell, cannot be expressed in Rust. You'd fake it by just using trees and
having the shape as an invariant maintained by the library (just as the min
and max child count constraint is maintained in a B-tree). I personally think
that's fine; Rust wouldn't be a better language if its type system were made
even richer, but it's interesting to have examples so you know where the
edges are. (There's also the possibility someone will find a way to encode it
anyway.)
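
A minimal sketch of the "shape as an invariant" idea, written in Haskell to match the other code in this thread rather than in Rust. The names `Tree`, `FT`, `depth`, `level` and `wellFormed` are hypothetical and not taken from xi-editor or any library, and digits are plain lists for brevity. The nesting depth that a polymorphic recursive type would enforce at compile time is checked at run time here instead.

```haskell
-- Uniform (non-nested) tree: the type says nothing about how deep it is.
data Tree a
  = Leaf a
  | Node2 (Tree a) (Tree a)
  | Node3 (Tree a) (Tree a) (Tree a)

-- Finger tree over uniform trees; digits are lists (assumed to hold 1 to 4 items).
data FT a
  = Empty
  | Single (Tree a)
  | Deep [Tree a] (FT a) [Tree a]

-- Just d when every leaf of the tree sits exactly d levels down.
depth :: Tree a -> Maybe Int
depth (Leaf _) = Just 0
depth (Node2 a b) = level [a, b]
depth (Node3 a b c) = level [a, b, c]

-- Children must all have the same depth; the parent is one level deeper.
level :: [Tree a] -> Maybe Int
level ts = case mapM depth ts of
  Just (d : ds) | all (== d) ds -> Just (d + 1)
  _ -> Nothing

-- The invariant the library has to maintain by hand:
-- at spine level d, every stored tree has depth exactly d.
wellFormed :: FT a -> Bool
wellFormed = go 0
  where
    go _ Empty = True
    go d (Single t) = depth t == Just d
    go d (Deep l m r) = okDigit d l && okDigit d r && go (d + 1) m
    okDigit d ts =
      not (null ts) && length ts <= 4 && all (\t -> depth t == Just d) ts

-- A tree that respects the invariant, and one that does not.
good :: FT Int
good = Deep [Leaf 1] (Single (Node2 (Leaf 2) (Leaf 3))) [Leaf 4]

bad :: FT Int
bad = Deep [Leaf 1] (Single (Leaf 2)) [Leaf 3]
```

Both `good` and `bad` type-check, which is the point: only `wellFormed` tells them apart, so the library's operations have to preserve the shape themselves.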

------
harpocrates
This is a fun data structure to implement in Haskell, and I've always been
curious about how one would do it in C++, largely due to the fact that

    
    
        data FingerTree a = Empty
                          | Single a
                          | Deep (Digit a) (FingerTree (Node a)) (Digit a)
    

has polymorphic recursion in the last case. What would be the C++ approach for
dealing with this sort of thing? Pass in a compile time integer to represent
the level of nesting?
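
To make the polymorphic recursion concrete, here is a small sketch of a function over that type. `Digit` and `Node` are not defined in the comment above; the definitions below follow the usual 2-3 finger tree layout and are assumptions, as are the helper names `sizeWith`, `nodeSize`, `digitSize` and `example`. The recursive call in the `Deep` case works on a `FingerTree (Node a)`, not a `FingerTree a`, which is why `sizeWith` needs its explicit type signature.

```haskell
-- Assumed supporting types (not given in the comment above).
data Node a = Node2 a a | Node3 a a a

data Digit a = One a | Two a a | Three a a a | Four a a a a

data FingerTree a
  = Empty
  | Single a
  | Deep (Digit a) (FingerTree (Node a)) (Digit a)

-- Count elements, given a way to count whatever is stored at this level.
-- The signature is mandatory: the recursive call is at a different type.
sizeWith :: (a -> Int) -> FingerTree a -> Int
sizeWith _ Empty = 0
sizeWith f (Single x) = f x
sizeWith f (Deep l m r) =
  digitSize f l + sizeWith (nodeSize f) m + digitSize f r

nodeSize :: (a -> Int) -> Node a -> Int
nodeSize f (Node2 a b) = f a + f b
nodeSize f (Node3 a b c) = f a + f b + f c

digitSize :: (a -> Int) -> Digit a -> Int
digitSize f (One a) = f a
digitSize f (Two a b) = f a + f b
digitSize f (Three a b c) = f a + f b + f c
digitSize f (Four a b c d) = f a + f b + f c + f d

-- At the top level every stored item counts as one element.
size :: FingerTree a -> Int
size = sizeWith (const 1)

-- Six elements: two in the left digit, three in a nested node, one on the right.
example :: FingerTree Int
example = Deep (Two 1 2) (Single (Node3 3 4 5)) (One 6)
```

Each level down the spine wraps the counting function in one more `nodeSize`, which mirrors how the element type gains one more `Node` per level.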

~~~
zerofan
I've written this several times in several ways using C++ (and I'll probably
write it at least once more to make it better). In an early version I used an
integer level as you came to, but it made the compiler very unhappy as it
recursively tried to expand nested types at compile time (it couldn't figure
out the recursive types would terminate in practice). Max template recursion
of 256 if I remember correctly, even though you'd never instantiate past 45 or
so on any machine in the world.

In a later version, I implemented the specializations as inheritance on the
abstract FingerTree base class (verbose, but it works), and I added Leafs and
Nodes. Leafs are FingerTrees that hold your data, and Nodes are FingerTrees
that point to other FingerTree instances. This dodges the recursive types
problem. I don't know much Haskell, but I think it would be the C++ equivalent
of:

    
    
        data FingerTree a = EmptyLeaf
                        | SingleLeaf a
                        | DoubleLeaf a a
                        | TripleLeaf a a a
                        | EmptyNode
                        | SingleNode (FingerTree a)
                        | DoubleNode (FingerTree a) (FingerTree a)
                        | TripleNode (FingerTree a) (FingerTree a) (FingerTree a)
                        | FingerSpine (FingerTree a) (FingerTree a) (FingerTree a)
    

Virtual methods on each specialization took the place of pattern matching. Not
super elegant.

------
rohmbus
Could someone give an example where this data structure would be useful?

~~~
willtim
You can create a persistent ordered container that has both prepend and append
in amortised constant time; and concatenate in log time.
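
As an illustration, Haskell's `Data.Sequence` (from the `containers` package) is built on finger trees and exposes these operations directly. Its documentation gives O(1) for `<|` and `|>` and O(log(min(n1, n2))) for `><`. The names `base`, `extended` and `joined` below are only for this sketch.

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq, (<|), (><), (|>))
import qualified Data.Sequence as Seq

base :: Seq Int
base = Seq.fromList [2, 3, 4]

-- Prepend with <| and append with |>.
extended :: Seq Int
extended = (1 <| base) |> 5

-- Concatenate two sequences with ><.
joined :: Seq Int
joined = extended >< Seq.fromList [6, 7]

-- Persistence: base is unchanged by the operations above,
-- so all three values can be used side by side.
contents :: ([Int], [Int], [Int])
contents = (toList base, toList extended, toList joined)
```

Because the old versions stay valid, an algorithm that backtracks can simply keep a reference to an earlier sequence instead of undoing its changes.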

~~~
geocar
Caching isn't great though.

Pagetables are a trie, so it's max log log log time to prepend, append, or
concatenate, and usually constant small for two of them (but admittedly: you
usually have to pick if you don't know how many arrays you want).

------
DanWaterworth
The problem with using lazy data structures for storing things is that deletes
don't necessarily free up storage. Fingertrees are fine when you need to store
a collection in order to implement some algorithm (and being persistent makes
it useful if this algorithm has to backtrack), but they aren't so good for
backing a collection that changes over time, like the set of currently
connected sockets in a network service.

~~~
wyager
Looking at Haskell's Data.Sequence, it looks like it's strict in its elements.
So a delete does free up memory as long as the deleted object isn't being used
somewhere else.

~~~
DanWaterworth
It needs to be spine strict for that to work, but the amortization for finger
trees doesn't work without laziness.

