
Alex: An ML-enhanced range index - ngaut
https://github.com/microsoft/ALEX
======
thesz
The paper [1] does not refer to COLA or Fractal Indices.

[1]
[https://dl.acm.org/doi/pdf/10.1145/3318464.3389711](https://dl.acm.org/doi/pdf/10.1145/3318464.3389711)

ALEX uses gapped arrays (arrays with holes) to amortize key insertion. This
wastes bandwidth - read bandwidth is traded away for the speed of insertion
gained.
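
To illustrate what I mean by a gapped array, here is a toy sketch of my own
(not ALEX's code; the names and the shift strategy are my simplification): a
model predicts a slot, and an insert only shifts elements up to the nearest
hole instead of moving O(n) entries.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Toy gapped array: keys sit in a sparse array with holes so an insert
    // only shifts elements as far as the nearest gap.
    struct GappedArray {
      static constexpr int64_t kEmpty = INT64_MIN;  // sentinel for a hole
      std::vector<int64_t> slots;

      explicit GappedArray(size_t capacity) : slots(capacity, kEmpty) {}

      // Insert at a (model-)predicted slot, assumed to be the correct sorted
      // position; returns false when no gap is left, i.e. when a real
      // implementation would expand the array and retrain the model.
      bool insert_at(size_t predicted, int64_t key) {
        size_t gap = predicted;
        while (gap < slots.size() && slots[gap] != kEmpty) ++gap;
        if (gap == slots.size()) return false;    // no gap to the right
        for (size_t i = gap; i > predicted; --i)  // shift into the gap
          slots[i] = slots[i - 1];
        slots[predicted] = key;
        return true;
      }
    };

In this sketch, the kEmpty slots are exactly the space overhead I am
referring to.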

Cache-Oblivious Lookahead Arrays (hence, COLA [2]) waste bandwidth in a
different way - they keep several levels of inserted data that are merged,
in an amortized fashion, during insertion.

[2]
[https://en.wikipedia.org/wiki/Fractal_tree_index](https://en.wikipedia.org/wiki/Fractal_tree_index)
- the Fractal Tree Index is based on the COLA paper.

That COLA structure allows for variable-size keys, etc. Generally speaking,
it is quite a fruitful and efficient structure, especially for disk storage.
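
To make the comparison concrete, here is a toy sketch of the layered idea (my
own simplification: no lookahead pointers or fractional cascading, so it only
shows the write side, not what makes the real COLA cache-oblivious on
queries). Level i holds up to 2^i sorted keys, and a full level cascades
down by merging:

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <iterator>
    #include <vector>

    struct ToyCola {
      std::vector<std::vector<int64_t>> levels;  // each level kept sorted

      void insert(int64_t key) {
        std::vector<int64_t> carry{key};
        for (size_t i = 0; ; ++i) {
          if (i == levels.size()) levels.emplace_back();
          if (levels[i].empty()) {            // free level: park the carry
            levels[i] = std::move(carry);
            return;
          }
          // Occupied level: merge it into the carry and push one level down.
          std::vector<int64_t> merged;
          std::merge(levels[i].begin(), levels[i].end(),
                     carry.begin(), carry.end(), std::back_inserter(merged));
          levels[i].clear();
          carry = std::move(merged);
        }
      }

      bool contains(int64_t key) const {      // every level must be probed
        for (const auto& lv : levels)
          if (std::binary_search(lv.begin(), lv.end(), key)) return true;
        return false;
      }
    };

Each key is rewritten only about log2(n) times in total, which is where the
amortized insertion cost comes from.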

Because of this, I think the paper lacks a very important part - a comparison
with a modern write- and read-efficient structure. Without that comparison, it
is not worth considering.

~~~
jasonwatkinspdx
ALEX copies the next entry to the right into the gaps, meaning there's no read
bandwidth wasted. It's also key to the cost model they use to decide when to
split a model node. I'd suggest rereading the ALEX paper with a bit more care
rather than using it just as a diving board to talk about unrelated papers
you're fond of. There are some really interesting and novel ideas here.
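
To spell out why that matters on the read path, here is a rough sketch of my
own (not ALEX's code): because every slot holds a real key (its own, or a
copy of a neighbor as described above), the exponential search outward from
the model's predicted slot can treat the array as if it were dense.

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Exponential search around a predicted slot; no "is this slot empty?"
    // branch is needed because gaps carry copies of neighboring keys.
    size_t exponential_search(const std::vector<int64_t>& slots,
                              size_t predicted, int64_t key) {
      if (slots.empty()) return 0;
      size_t lo = predicted, hi = predicted, step = 1;
      if (slots[predicted] < key) {            // widen to the right
        while (hi < slots.size() - 1 && slots[hi] < key) {
          lo = hi;
          hi = std::min(hi + step, slots.size() - 1);
          step *= 2;
        }
      } else {                                 // widen to the left
        while (lo > 0 && slots[lo] > key) {
          hi = lo;
          lo = (lo > step) ? lo - step : 0;
          step *= 2;
        }
      }
      while (lo < hi) {                        // binary search in [lo, hi]
        size_t mid = lo + (hi - lo) / 2;
        if (slots[mid] < key) lo = mid + 1; else hi = mid;
      }
      return lo;                               // first slot >= key
    }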

~~~
thesz
Please forgive me, but no to both.

Having gaps in an array is a waste of bandwidth (TLB reach, for example;
memory controller bandwidth is wasted too).

The COLA approach can also be adjusted - it is not necessary to use array
sizes that are powers of two; any other growth multiplier works, which changes
how often merging is needed. Many contemporary storage structures can be tuned
to the workload and data statistics. Pitting machine learning against
technology that is half a century old (B+-trees were used by IBM in 1973) is,
at the very least, not an honorable fight.
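
To make that trade-off concrete, a back-of-the-envelope sketch (my own
numbers and assumptions, taken from neither paper): with a growth factor g
you probe roughly log_g(N) levels on a lookup, but every key is rewritten
roughly g * log_g(N) times as the levels below it fill up and merge in.

    #include <cmath>
    #include <cstdio>

    int main() {
      const double n = 1e9;                 // assumed number of keys
      const double factors[] = {2, 4, 8, 16};
      for (double g : factors) {
        double levels = std::log(n) / std::log(g);
        double rewrites = g * levels;       // rough write amplification
        std::printf("g=%2.0f  levels=%5.1f  rewrites/key=%6.1f\n",
                    g, levels, rewrites);
      }
      return 0;
    }

Picking g is exactly the kind of knob that can be tuned to the workload.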

Please excuse me for not reading the ALEX paper more thoroughly; I think the
onus is not on me in this case. So, my second no - I will not read the ALEX
paper again until they compare their achievements with more or less current
technology.

------
nn3
I had expected something really fancy for the predictive model, but it's a
really simple linear predictor that even I can understand :)

That's somewhat comforting, but I guess it limits the kinds of key spaces
where it is useful?

[https://github.com/microsoft/ALEX/blob/d2fc3dc79766aa5c36adc...](https://github.com/microsoft/ALEX/blob/d2fc3dc79766aa5c36adc2772abd29b7fcf0e8fa/src/core/alex_base.h#L79)
[https://github.com/microsoft/ALEX/blob/d2fc3dc79766aa5c36adc...](https://github.com/microsoft/ALEX/blob/d2fc3dc79766aa5c36adc2772abd29b7fcf0e8fa/src/core/alex_base.h#L120)
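
For anyone skimming, the model per node really is just position ~ a*key + b.
Here is a paraphrase of the idea (my own sketch, not the code behind those
links); ALEX then corrects the prediction with a local search, so the line
only has to be approximately right:

    #include <cstddef>
    #include <vector>

    // Linear position predictor in the spirit of the linked LinearModel:
    // fit position ~= a*key + b by least squares over (key, rank) pairs.
    struct LinearPredictor {
      double a = 0, b = 0;

      void train(const std::vector<double>& sorted_keys) {
        const size_t n = sorted_keys.size();
        if (n < 2) { a = 0; b = 0; return; }
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (size_t i = 0; i < n; ++i) {
          sx += sorted_keys[i];
          sy += static_cast<double>(i);
          sxx += sorted_keys[i] * sorted_keys[i];
          sxy += sorted_keys[i] * static_cast<double>(i);
        }
        const double denom = n * sxx - sx * sx;
        if (denom == 0) { a = 0; b = sy / n; return; }  // all keys equal
        a = (n * sxy - sx * sy) / denom;
        b = (sy - a * sx) / n;
      }

      // Clamped prediction; the caller searches locally around it.
      size_t predict(double key, size_t num_slots) const {
        if (num_slots == 0) return 0;
        double pos = a * key + b;
        if (pos < 0) pos = 0;
        if (pos > static_cast<double>(num_slots - 1))
          pos = static_cast<double>(num_slots - 1);
        return static_cast<size_t>(pos);
      }
    };

As I read the paper, the key-space limitation is handled by the tree itself:
when a single line fits a node's keys poorly, the node gets split so each
child covers a range that is closer to linear.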

------
yahyaheee
This is an amazing accomplishment; I could see the ideas here being applied
to more data structures.

------
theblackcat1002
This will be quite useful if ported to databases.

~~~
random3
It is / will be, in various shapes. Check out Andy Pavlo's (CMU) work on
self-driving databases:
[https://www.cs.cmu.edu/~pavlo/](https://www.cs.cmu.edu/~pavlo/)

