Only having skimmed the work, read the following as a somewhat educated guess.
I think one could see this similar to repeated applications of interpolation search. If you are looking for x in a sorted array of n numbers between a and b, then index (x - a) / (b - a) * (n - 1) would be a good guess assuming uniform distribution of the numbers.
But as one can not assume a uniform distribution in general, one does that repeatedly. The first interpolation leads to better interpolation coefficients for the relevant subrange, which may lead to even better interpolation coefficient for an even smaller subrange until one eventually finds what one was looking for.
If there is no structure in the data that can be exploited, this degenerates into a more or less ordinary tree as we one certainly fit a line through two points, but if at some level a larger range of the data can be well approximated with the interpolation function, then it can save space and search time because one can get close to all the values in the range with only one set of interpolation coefficients and only one interpolation.
Isn't it sometimes better to assume data has a structure rather than having less performance but knowing it works equally well with unstructured semi-random data?
But if you are inventing or implementing a general purpose data structure, then you can not assume any structure by definition, especially not when it comes to worst case performance. On the other hand you can certainly include special handling of common special cases that will improve the performance or even come up with special data structures that only work under specific circumstances but then outperform general purpose solutions. As always it is a trade-off, here between generality and performance.
I think one could see this similar to repeated applications of interpolation search. If you are looking for x in a sorted array of n numbers between a and b, then index (x - a) / (b - a) * (n - 1) would be a good guess assuming uniform distribution of the numbers.
But as one can not assume a uniform distribution in general, one does that repeatedly. The first interpolation leads to better interpolation coefficients for the relevant subrange, which may lead to even better interpolation coefficient for an even smaller subrange until one eventually finds what one was looking for.
If there is no structure in the data that can be exploited, this degenerates into a more or less ordinary tree as we one certainly fit a line through two points, but if at some level a larger range of the data can be well approximated with the interpolation function, then it can save space and search time because one can get close to all the values in the range with only one set of interpolation coefficients and only one interpolation.