
Z-order curve usage to decrease dimensionality to 1 - delidumrul
https://ssahinkoc.blogspot.com/2017/06/encoding-multi-dimensional-value-into-1.html
======
dweekly
Here is a writeup on Google's S2 library, which addresses the surface of the
Earth as 1D using Hilbert curves.

[http://blog.christianperone.com/2015/08/googles-s2-geometry-...](http://blog.christianperone.com/2015/08/googles-s2-geometry-on-the-sphere-cells-and-hilbert-curve/?a=2)

And 2015 HN thread:
[https://news.ycombinator.com/item?id=10066616](https://news.ycombinator.com/item?id=10066616)

~~~
MarkMMullin
My favorite has always been the Sierpinski curve - the math behind Philip Jose
Farmer's Riverworld :-) Here, you can print one yourself :-)
[https://www.thingiverse.com/thing:622627](https://www.thingiverse.com/thing:622627)

------
jhj
Usually a Morton ordering is used for things like improving the average memory
locality of N-dimensional data (e.g., loop iteration order, data layout, ...).
But it is only the average locality that improves, because there are still
huge jumps between many neighbors.

In a small number of dimensions, without knowing what search algorithm is
being used, this is just more work than comparing the original values. The
article doesn't say what "k-NN algorithm" is being used beyond brute-force
search.

Lossily compressing N-dimensional data (from 2 to 1000s of dimensions) into a
representation that requires fewer bits can also be done via quantization:
either scalar quantization, vector quantization (aka k-means), or product
quantization, if your data has known statistics.
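For concreteness, here is a minimal sketch of the simplest of these, uniform
scalar quantization of floats in a known range down to 8-bit codes (pure
Python, function names are illustrative):

```python
# Uniform scalar quantization: map a float in [lo, hi] to one of
# 2**bits evenly spaced codes, and back. Lossy by construction; the
# round-trip error is bounded by half a quantization step.

def quantize(x: float, lo: float, hi: float, bits: int = 8) -> int:
    levels = (1 << bits) - 1
    t = min(max((x - lo) / (hi - lo), 0.0), 1.0)  # clamp to [0, 1]
    return round(t * levels)

def dequantize(code: int, lo: float, hi: float, bits: int = 8) -> float:
    levels = (1 << bits) - 1
    return lo + code / levels * (hi - lo)
```

Vector and product quantization replace this fixed per-scalar grid with
learned codebooks, which is where the known statistics come in.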

It also matters if you are building a static data structure that is queried
many times, versus one that needs continual updating.

------
mcphage
As the first commenter on the site pointed out, the Hilbert Curve is probably
a better choice
([https://en.m.wikipedia.org/wiki/Hilbert_curve](https://en.m.wikipedia.org/wiki/Hilbert_curve))

~~~
hex12648430
The big advantage of Z-order curves is that the addressing computation is very
cheap, which is why it's used a lot in computer graphics.
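To make the cheapness concrete, here's a pure-Python sketch of 2D Z-order
(Morton) encoding. The mask-and-shift spreading below is the standard
branch-free trick; BMI2's pdep instruction does the same spread in a single
operation:

```python
# Interleave the bits of two 16-bit coordinates into one 32-bit
# Z-order index: x occupies the even bit positions, y the odd ones.

def part1by1(n: int) -> int:
    """Spread the low 16 bits of n so each lands in an even position."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton2(x: int, y: int) -> int:
    return part1by1(x) | (part1by1(y) << 1)
```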

~~~
leni536
While I agree that Z-order curves are simpler, Hilbert curves are also fast to
calculate on modern CPUs. Just to self-plug:

[https://github.com/leni536/fast_hilbert_curve](https://github.com/leni536/fast_hilbert_curve)

I've only implemented the index->XY calculation so far. It compiles to 36
instructions without any branches and takes up 86 bytes.

[https://github.com/leni536/fast_hilbert_curve/wiki/How-effic...](https://github.com/leni536/fast_hilbert_curve/wiki/How-efficient-is-it%3F)

I think I can apply the same tricks for the inverse function too.

~~~
m1el
But using the same set of instructions, z-order encoding and decoding is 8
instructions (5 if you exclude size conversion and return):

    
    
        zorder64_inv:
            movabsq $0x5555555555555555, %rax
            pextq   %rax, %rcx, %rdx
            shrq    %rcx
            pextq   %rax, %rcx, %rcx
            shlq    $32, %rcx
            movl    %edx, %eax
            orq     %rcx, %rax
            retq
    
        zorder64:
            movl    %ecx, %eax
            movabsq $0x5555555555555555, %r8
            pdepq   %r8, %rax, %rcx
            movl    %edx, %eax
            pdepq   %r8, %rax, %rax
            addq    %rax, %rax
            orq     %rcx, %rax
            retq

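For anyone who doesn't read x86, here's a hedged pure-Python sketch of the
same decode. Collecting every other bit is exactly what pextq does in one
instruction with the 0x5555... mask:

```python
# De-interleave a 2D Z-order index: x comes from the even bit
# positions, y from the odd ones (the inverse of Morton encoding).

def compact1by1(n: int) -> int:
    """Collect every other bit of n into the low 16 bits."""
    n &= 0x55555555
    n = (n | (n >> 1)) & 0x33333333
    n = (n | (n >> 2)) & 0x0F0F0F0F
    n = (n | (n >> 4)) & 0x00FF00FF
    n = (n | (n >> 8)) & 0x0000FFFF
    return n

def morton2_inv(z: int) -> tuple:
    return compact1by1(z), compact1by1(z >> 1)
```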
~~~
leni536
Nice! Now I wonder when 36 vs 8 machine instructions becomes a bottleneck.
I've seen applications of space-filling curves in quasi-Monte Carlo
integration; it could potentially be significant there.

------
DocSavage
Space-filling curves have been studied as a way to reduce N-dimensional space
to 1-D. They are very useful for applications like maximizing sequential
access in an N-D datastore. A recent SIGMOD paper analyzed the impact of
space-filling curves on data access:
[http://dl.acm.org/authorize.cfm?key=N37709](http://dl.acm.org/authorize.cfm?key=N37709)

QUILTS: Multidimensional Data Partitioning Framework Based on Query-Aware and
Skew-Tolerant Space-Filling Curves
Shoji Nishimura (NEC Corporation); Haruo Yokota (Tokyo Institute of Technology)

It discusses C-Curve, Z-Curve, and Hilbert curves.

------
albipenne
Here's a great write-up on how you can use Z-curves to do multidimensional
sorting with Redis's otherwise one-dimensional sorted set data structure. It's
one of the best hands-on examples of solving real problems with this technique
within your stack.

[https://redis.io/topics/indexes](https://redis.io/topics/indexes)
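A hedged sketch of the core trick from that page: scale two bounded values
onto an integer grid, interleave their bits into one number, and use that as
the sorted-set member or score (names here are illustrative, not from the
Redis docs):

```python
# Map two bounded values onto an integer grid, then interleave their
# bits so a single sortable integer roughly preserves 2D locality.

def to_grid(value: float, lo: float, hi: float, bits: int = 26) -> int:
    """Scale a float in [lo, hi] onto a grid of 2**bits cells."""
    return int((value - lo) / (hi - lo) * ((1 << bits) - 1))

def interleave(x: int, y: int, bits: int = 26) -> int:
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x on even bit positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y on odd bit positions
    return z

# e.g. interleave(to_grid(lon, -180, 180), to_grid(lat, -90, 90))
# could be used with ZADD, and box queries become score ranges.
```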

------
zocoi
Can someone help explain why this hash method could improve distance
calculation for k-NN? What does it improve compared with a Geohash or a k-d
tree structure?

~~~
imron
> What does it improve compared with Geohash

From what I can tell, it's the exact same algorithm used by Geohash.

~~~
mmalone
Yep. Geohash is just a fancy name for a z-curve.

------
minflynn
I've used Z-order curves and related curves in neuroevolution research. They
allow conversion of an adjacency matrix to a spatial substrate representation
that preserves locality as it grows (thus going from 1 dimension to 2 or 3).
This technique is an alternative to HyperNEAT or ES-HyperNEAT, and experiments
demonstrate higher performance than ES-HyperNEAT on the modular retina task
problem.

------
mmalone
If you're tempted to use space-filling curves (z-curves, Hilbert curves, etc.)
you should take a look at simple multi-dimensional data structures as an
alternative.

As a couple of people have mentioned in this thread, space filling curves
aren't great at preserving locality (i.e., two points that are "close
together" in two dimensional space might end up being "far apart" in one
dimensional space, and vice versa). A k-d tree is easy to code up and, in
general, will be more efficient for queries like k-NN than dimensionality
reduction because it's better at preserving locality.
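As a rough illustration of how little code that takes, here's a minimal
pure-Python k-d tree with nearest-neighbour search (a sketch, not production
code; real libraries add balancing, iterative traversal, and k > 1):

```python
# Build: recursively split points on alternating axes at the median.
# Query: descend toward the target, then backtrack into the far side
# only when the splitting plane is closer than the best match so far.

def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def build(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    if node is None:
        return best
    point, left, right = node
    if best is None or dist2(point, target) < dist2(best, target):
        best = point
    axis = depth % len(target)
    near, far = (left, right) if target[axis] < point[axis] else (right, left)
    best = nearest(near, target, depth + 1, best)
    if (target[axis] - point[axis]) ** 2 < dist2(best, target):
        best = nearest(far, target, depth + 1, best)
    return best
```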

There are also good libraries for multi-dimensional data structures for pretty
much any mainstream language.

------
aeroevan
AKA a geohash (for the lat/lon example), but can easily be extended to other
bounded dimensions.

------
Joboman555
This stuff mostly goes over my head, but I'd like to understand it. In the
original method, why is the OP concatenating the two positions into one
number, rather than just storing the lat/lon pair and using something like
Euclidean distance?

------
leitasat
As the commenters on the website mentioned, this method does not preserve the
closeness of the points (when going from 2D to 1D), so it is unclear how it
can help the author.

~~~
mmalone
True, but it is much better at preserving closeness than the alternative he
mentioned (lexicographical sort).

------
teddyh
So how would [https://xkcd.com/195/](https://xkcd.com/195/) look using a
z-curve instead?

~~~
VMG
I'd imagine it would look less continuous: some of the areas would be ripped
apart and spread across the map.

