
Exotic data structures - wslh
http://concatenative.org/wiki/view/Exotic%20Data%20Structures
======
jandrewrogers
This is a pretty good list of data structures that are out there, that are
genuinely useful, and that most programmers never seem to have been
introduced to. I regularly use variants of some of the data structures listed
(plus several that did not make the list) and frequently get questions about
the design and structure from programmers who have never seen them.

In practice, you can get away with knowing just a handful of data structures,
and most programmers do. However, when it comes time to profile, it is valuable
to have alternatives to what the standard libraries and implementations offer.
For any given set of constraints on the intended use case and workload for a
classic container, there is usually a data structure that is nearly ideal for
that use case. Also, there are often neat ideas buried in some of these data
structures that are worth learning for their own sake.

For the work that I do (high-performance C++) the first pass will often use
the C++ STL. When we start profiling, the STL is usually among the first
things that need to be replaced. Having a large number of workload-optimized
data structures in the toolkit for the same abstract container makes this
relatively easy.

~~~
pestaa
That's interesting, I always thought of the STL as a well-designed library.
Or is it just not specialised enough to fit your use case?

~~~
ExpiredLink
The STL uses 'value semantics'. Just call push_back in a loop and observe how
often the entire contents of the std::vector get copied. Not to mention
std::sort.
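
A quick way to see it (a minimal sketch, using a hypothetical copy-counting
Probe type; the exact count depends on the implementation's growth factor):

    #include <iostream>
    #include <vector>

    // Probe counts how many times it is copied, making the reallocation
    // copies visible. No move constructor is generated (the user-declared
    // copy constructor suppresses it), so growth falls back to copying.
    struct Probe {
        static int copies;
        Probe() = default;
        Probe(const Probe&) { ++copies; }
    };
    int Probe::copies = 0;

    int main() {
        std::vector<Probe> v;          // no reserve(): growth reallocates
        for (int i = 0; i < 1000; ++i)
            v.push_back(Probe{});      // each realloc re-copies every element
        std::cout << Probe::copies << " copies for 1000 push_backs\n";
    }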

~~~
cokernel_hacker
vector::emplace_back ameliorates this by allowing new elements to be
constructed in place [1].

[1] http://en.cppreference.com/w/cpp/container/vector/emplace_back
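
A minimal sketch of the difference (it helps with construction cost, not with
the copies made when the vector reallocates):

    #include <string>
    #include <utility>
    #include <vector>

    int main() {
        std::vector<std::pair<int, std::string>> v;
        v.reserve(2);  // avoid reallocation; isolate construction cost
        v.push_back(std::make_pair(1, std::string("temporary, then moved")));
        v.emplace_back(2, "constructed in place, no temporary pair");
    }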

------
Strilanc
Nice article, although a bit terse. One part that jumped out at me was:

> It is possible to implement fully persistent arrays in O(1) space per
> update, with O(lg lg N) time for access and updates [4].

This is surprising because it would guarantee at most a log(log(n)) slowdown
for any immutable data structure (instead of the log(n) bound I was aware of).
But then I checked out the paper and it turns out that the complexity is
conditional:

> Fully persistent array use can also be provided at O(1) amortized cost,
> provided that the algorithm satisfies a simple requirement as to uniformity
> of access. For those algorithms which do not access the array uniformly or
> single-threadedly, array reads or updates take at most O(log n) amortized
> time, where n is the size of the array.

By uniformity they mean no index is used more than K times the average use per
index (for some constant K). A simple example of a data structure breaking
that condition is a heap. The top element is accessed on every query, while
the bottom elements, which dominate the average, are each touched only about
once per n queries; so the root exceeds the average usage by a factor that
grows with n, not by any constant K.

~~~
Evbn
Yeah, something like a finger tree is amortized constant time for sequential
access, but log(n) for random access away from the ends.

What is nearly ALWAYS overlooked is constant factors. A finger tree is in a
nice complexity class, but it still carries a lot of per-object overhead
compared to an array, which can turn a 10MB chunk of data into a 100MB data
structure.
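
To make the constant factor concrete (a minimal sketch, assuming a typical
pointer-per-node tree layout on a 64-bit target):

    #include <iostream>

    // Three pointers of bookkeeping around a 4-byte payload: the node is
    // 32 bytes (24 bytes of pointers + value + padding), an 8x blowup
    // before counting allocator headers and lost cache locality. A plain
    // int array spends exactly 4 bytes per element.
    struct Node {
        Node* left;
        Node* right;
        Node* parent;
        int   value;
    };

    int main() {
        std::cout << "sizeof(Node) = " << sizeof(Node)
                  << ", payload = " << sizeof(int) << '\n';
    }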

------
jayferd
I would especially appreciate seeing how one might go about implementing these
in some of the concatenative languages listed there, as these kinds of data
structures tend to be pointer- and memory-based, which I've found difficult to
replicate in Forth et al.

------
cdelahousse
The first data structure, the compact dynamic array, is explained beautifully
for an undergraduate audience in Pat Morin's open-source textbook Open Data
Structures. He calls them rootish array stacks.

http://opendatastructures.org/ods-java/2_6_RootishArrayStack_Space.html
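
The core idea fits in a few lines (a minimal sketch after Morin's
presentation; block b holds b+1 elements, so n elements waste only O(sqrt(n))
space, versus the O(n) slack of a doubling vector):

    #include <cmath>
    #include <cstddef>
    #include <vector>

    template <typename T>
    class RootishArrayStack {
        std::vector<T*> blocks;  // block b has capacity b+1
        std::size_t n = 0;       // number of stored elements

        // Block containing index i: smallest b with (b+1)(b+2)/2 > i,
        // i.e. b = ceil((-3 + sqrt(9 + 8i)) / 2).
        static std::size_t i2b(std::size_t i) {
            return (std::size_t)std::ceil((-3.0 + std::sqrt(9.0 + 8.0 * i)) / 2.0);
        }

    public:
        T& operator[](std::size_t i) {
            std::size_t b = i2b(i);
            return blocks[b][i - b * (b + 1) / 2];  // b(b+1)/2 elems precede block b
        }

        void push_back(const T& x) {
            std::size_t b = blocks.size();
            if (n + 1 > b * (b + 1) / 2)   // all blocks full: add block b
                blocks.push_back(new T[b + 1]);
            (*this)[n++] = x;
        }

        ~RootishArrayStack() {
            for (T* blk : blocks) delete[] blk;
        }
    };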

------
tveita
I tried to find projects or libraries that have implemented the split
hashtable, but searching for "split hashtables" gave few relevant results.

Does this go by other names?

------
noconflict
For sure read this as "Erotic data structures."

------
erichocean
split-hashtables look pretty cool.

------
itsgeneral
Some of the ones that didn't make it:

Smushed list. Size O(1). The smushed list is a list of variables (of the same
type), stored in a single variable of that type. To produce the smushed list,
simply XOR all the elements of the list together, then store. To get a value
back, simply XOR the smushed list by all the elements other than the one you
want. Smushing is also embarrassingly parallel (you can smush two halves
separately and then smush the results) so producing smushed lists is blazingly
fast.
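
The smushing itself is real enough to write down (a tongue-in-cheek sketch;
"retrieval" assumes you still have every element you did not want, which is
rather the point):

    #include <cstdint>
    #include <numeric>
    #include <vector>

    // The smushed list: the whole list, XORed into one variable.
    uint32_t smush(const std::vector<uint32_t>& xs) {
        return std::accumulate(xs.begin(), xs.end(), 0u,
                               [](uint32_t a, uint32_t b) { return a ^ b; });
    }

    // "Retrieval": XOR the smush with every element other than the one
    // you want.
    uint32_t unsmush(uint32_t smushed, const std::vector<uint32_t>& others) {
        return smushed ^ smush(others);
    }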

Unlinked list. O(n). This is slightly faster than a linked list, and acts as a
"black box". Simply allocate nodes that are not linked to each other in any
way. The data normally stays out of the way of your program, but in case of a
core dump you can find it again. NOTE: If your language does reference-
counting this will not work. Get a real language that does what you say.

Search-bush. Search trees are good at bisecting data, but they are not really
conducive to a random walk for inspiration. Begin by constructing a binary
search tree, keeping track of all the nodes you've added, and simply add a
third, random, pointer to each node - have it point at a random node somewhere
in the tree. In the search algorithm, either follow the left, right, or random
node, depending on how much meandering you are interested in doing. The
journey is the destination.

~~~
prawks
Search-bush sounds pretty neat, thanks!

------
Evbn
compact-array seems like a specialization of a rope, with the tree structure
arbitrarily capped at depth=2, and without the benefits of O(log(n))
insertion.

~~~
qiemem
Capping the depth gives you O(1) gets and sets though. Note that compact-
arrays grow breadth-wise whereas ropes grow depth-wise. Breadth-wise growth
trades insertion speed for random access speed.

