
Apache Druid is a pretty amazing tool, with one assumption: your data has an event timestamp as a crucial part of the ingest, and it receives no updates at all.

My run-ins with Vertica for BI/PM metrics data are almost a decade old, but it is a bit more powerful in some ways, for instance in how it does projections + distributions.

The most common queries Vertica got hit with were unique-user workloads, which involved intersections: there was a single table being ingested, but 3 projections. One was partitioned by user, one by (user, property), and one by (user, property, date).

The biggest dimension table was the A/B experiment id allocation list, which was duplicated on every single host.

A better storage model for this would be something like a Replex. [1]

Druid can be used for the same sort of workload at high scale (i.e. millions of users), where a best-effort distinct count is as good as the real thing, but much faster.
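Druid does this with HyperLogLog-style sketches under the hood; a simpler way to see why a bounded sketch can stand in for an exact distinct count is a K-Minimum-Values estimator. This is a minimal Python sketch of the idea, not Druid's actual implementation:

```python
import hashlib
import heapq

def approx_distinct(items, k=1024):
    """K-Minimum-Values sketch: keep the k smallest normalized hash
    values; if the k-th smallest is v, roughly (k - 1) / v distinct
    items were seen. Memory is O(k) regardless of input size."""
    heap = []        # max-heap via negation: the k smallest hashes so far
    in_heap = set()  # hash values currently held, to skip duplicates
    for item in items:
        digest = hashlib.blake2b(str(item).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big") / 2**64  # uniform in [0, 1)
        if h in in_heap:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            in_heap.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heappushpop(heap, -h)
            in_heap.discard(evicted)
            in_heap.add(h)
    if len(heap) < k:   # fewer than k distinct hashes seen: count is exact
        return len(heap)
    return int((k - 1) / -heap[0])
```

With k=1024 the relative error is around 3%, which is exactly the "pixel is bigger than the error bar" regime described below.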

If I had to do this today, I would also use the BloomKFilter in Apache Druid for the experiment membership queries, which would work better for approximate queries than anything built to generate accurate results (& store the dimension table in a slowly-changing-dimension store).
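For readers unfamiliar with the structure behind BloomKFilter: a Bloom filter answers "is this key possibly in the set?" with no false negatives and a tunable false-positive rate, which is exactly the trade-off you want for experiment membership. A minimal Python illustration of the data structure (not Druid's Java implementation):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: set k bit positions per key. Membership
    tests may return false positives, but never false negatives."""

    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.m = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive k independent positions by salting the hash.
        for i in range(self.k):
            digest = hashlib.blake2b(key.encode(), digest_size=8,
                                     salt=bytes([i])).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(key))
```

A filter for an experiment's user allocation fits in a few hundred KB and can be shipped to every query node, instead of joining against the duplicated dimension table.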

The real power of Druid is pushing the segments to S3 + being able to rehydrate off Kafka, so it can survive total local data loss without being very expensive with EBS (i.e. by downloading segments to ephemeral SSDs), while answering dashboard queries where a pixel is bigger than the error bar on these approximations.

Plus the immutability of the data means you can maintain a partial-results cache at segment granularity rather than recomputing for every refresh of the dashboard.
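The caching idea above can be sketched in a few lines. Because a published segment never changes, a (segment, metric) cache entry never needs invalidation; a dashboard refresh only recomputes over segments it hasn't seen before. All names here are hypothetical, purely to illustrate the pattern:

```python
# Cache keyed by (segment_id, metric). Segments are immutable once
# published, so entries are never invalidated, only added.
segment_cache = {}

def segment_aggregate(segment_id, rows, metric):
    """Compute (or fetch) the aggregate for one immutable segment."""
    key = (segment_id, metric)
    if key not in segment_cache:
        segment_cache[key] = sum(row[metric] for row in rows)
    return segment_cache[key]

def dashboard_total(segments, metric):
    """segments: {segment_id: rows}. On each refresh, everything but
    the newest segments (e.g. the current hour) is a cache hit."""
    return sum(segment_aggregate(sid, rows, metric)
               for sid, rows in segments.items())
```

On a dashboard refreshing every minute, only the segment covering the current time interval ever misses the cache.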

Picking up this problem today in a web-scale environment, I would pick Druid for experiment data streams and define rollup aggregates ahead of time (over, say, ClickHouse); but as the data gets more mutable and less time-ordered, other tools like Apache Kudu look better at the storage layer.
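"Rollup aggregates ahead of time" means collapsing raw events that share a time bucket and dimension values into one pre-aggregated row at ingestion, which is what makes the query-time scans cheap. A rough Python sketch of the idea (field names are hypothetical, and real ingestion-time rollup is configured declaratively, not hand-coded):

```python
from collections import defaultdict

def hourly_rollup(events):
    """Collapse raw events sharing an hour bucket and dimension values
    into one pre-aggregated row: the essence of ingestion-time rollup."""
    buckets = defaultdict(lambda: {"count": 0, "value_sum": 0})
    for ev in events:
        # Truncate an ISO-8601 timestamp to the hour:
        # "2023-05-01T17:42:00" -> "2023-05-01T17"
        hour = ev["timestamp"][:13]
        key = (hour, ev["experiment_id"], ev["variant"])
        buckets[key]["count"] += 1
        buckets[key]["value_sum"] += ev["value"]
    return dict(buckets)
```

The catch, and why mutable or out-of-order data pushes you toward something like Kudu: once rows are rolled up, individual events can no longer be corrected or deleted.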

[1] https://blog.acolyer.org/2016/10/27/replex-a-scalable-highly...



The Procella [1] paper took a different approach to experiments. They embedded an experiment-ID array in table rows and indexed the rows by experiment ID with a postings list.

Replex looks really neat. I've only skimmed the Acolyer summary so far. What's the difference between a replex and multiple projections of data with different partitions and sort-orders used in C-store and Vertica?

[1]: http://www.vldb.org/pvldb/vol12/p2022-chattopadhyay.pdf


> What's the difference between a replex and multiple projections of data with different partitions and sort-orders used in C-store and Vertica?

The 3 replicas kept for failure tolerance are reused, so the first 3 ordering projections don't add storage cost to the system.

Also, the paper doesn't mention it, but rebuild traffic is better distributed on failure too: since each replex is partitioned differently, the loss of one replica triggers a rebuild that draws from a wider set of machines rather than a single one.



