
Replex: A Scalable, Highly Available Multi-Index Data Store [pdf] - craigkerstiens
https://www.cs.princeton.edu/~mfreed/docs/replex-atc16.pdf
======
jandrewrogers
The idea of organizing replicas differently to effect secondary indexing is
old; as I recall, it is used in some commercial data warehouse products. The
reason you don't see it used very often in practice is that it tends to scale
out poorly, particularly if the workload is mixed.

Spatial decomposition to effect secondary indexing (HyperDex kinda sorta does
that) performs relatively poorly at smaller scales, as theory would predict.
In very large scale-out systems, though, the selectivity increases and
converges in the limit on what you would expect if every dimension were
indexed, minus the high overhead and poor performance of secondary indexing at
that scale.
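
A minimal sketch of the selectivity point, assuming a plain hashed grid (this
is not HyperDex's or Replex's actual partitioning code, and every name in it is
hypothetical): each record maps to one cell of a d-dimensional grid, one hashed
coordinate per attribute, and an equality query on a subset of attributes only
has to visit the cells consistent with the coordinates it pins down. With b
buckets per attribute, a query that fixes one attribute touches b^(d-1) of the
b^d cells, so the fraction contacted is 1/b and shrinks as the grid grows.

```python
import zlib
from itertools import product

def coord(value, buckets):
    """Deterministic hashed coordinate of one attribute value (assumed scheme)."""
    return zlib.crc32(repr(value).encode()) % buckets

def cell_of(record, attrs, buckets):
    """Grid cell that stores a record: one coordinate per attribute."""
    return tuple(coord(record[a], buckets) for a in attrs)

def cells_for_query(query, attrs, buckets):
    """Cells that could hold matches for an equality query.

    Attributes present in `query` pin one coordinate; the rest span the axis.
    """
    axes = [
        [coord(query[a], buckets)] if a in query else range(buckets)
        for a in attrs
    ]
    return list(product(*axes))

def fraction_contacted(query, attrs, buckets):
    """Share of all grid cells the query has to visit."""
    return len(cells_for_query(query, attrs, buckets)) / buckets ** len(attrs)

# Hypothetical schema and data, for illustration only.
ATTRS = ("user", "city", "age")
record = {"user": "alice", "city": "Oslo", "age": 31}
query = {"city": "Oslo"}
```

Here `fraction_contacted(query, ATTRS, 4)` is 16/64 and
`fraction_contacted(query, ATTRS, 10)` is 100/1000, which is the "selectivity
increases with scale" effect in miniature. Hashing only supports equality
lookups; range queries would need an order-preserving mapping instead.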

The performance crossover point between secondary indexing and spatial
decomposition is somewhere within a couple orders of magnitude of a terabyte
of data in my experience, depending on the data and workload. It is difficult
to measure these effects across unrelated implementations because orthogonal
storage engine characteristics tend to dominate measured performance. If you
measure it in-memory on large clusters in a pure way (remove the unrelated
database-y stuff), the scaling and efficiency curves are much more obvious.

~~~
eternalban
> Spatial decomposition

Would like to learn more about this. Do you have recommended links?

------
mankurt
Here is a short summary of the paper.
http://muratbuffalo.blogspot.com/2016/07/replex-scalable-highly-available-multi.html

