How many dimensions can you use this with efficiently? Could you use this to store embeddings from machine learning models and find nearest neighbours of various items, like aNNoy or faiss does?
Postgres has an extension called cube (https://www.postgresql.org/docs/9.6/cube.html) that can be used for up to 100 dimensions (which is a compile-time limit, you can have more if you compile Postgres yourself).
It's a pretty cool extension that computes distances between points, intersections between n-dimensional cubes (hence the name), several distance metrics etc.
It'd be perfect for storing and searching through large amounts of n-dimensional embeddings; I'm guessing it's used for that already.
On one index I'm using OPQ16_64,IVF262144_HNSW32,PQ16 with 128 dimensions initially.
1024 dimensions is a lot! Could you elaborate on what application requires that many? If it's a DNN layer output, your data must be sparse, so dimensionality reduction won't affect your recall if tuned properly.
It's actually a DNN layer output. I haven't considered dimensionality reduction yet. Thanks for pointing me there, I'll look into it. That's probably the better way to go.
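For anyone wanting to check the dimensionality-reduction suggestion before committing to it, here is a rough sketch that reduces vectors with PCA and measures how many of the true nearest neighbours survive. Everything in it is an assumption for illustration: the function names are made up, and the data is synthetic and low-rank by construction, which flatters PCA. Real DNN outputs may behave differently, so the recall number only means something when measured on your own embeddings.

```python
import numpy as np


def pca_fit(data, n_components):
    """Return the mean and the top principal directions of `data`."""
    mean = data.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]


def exact_knn(base, queries, k):
    """Brute-force k nearest neighbours under squared L2 distance."""
    d2 = (
        (queries ** 2).sum(axis=1)[:, None]
        - 2.0 * queries @ base.T
        + (base ** 2).sum(axis=1)[None, :]
    )
    return np.argsort(d2, axis=1)[:, :k]


def recall_at_k(true_ids, found_ids):
    """Fraction of true neighbours that also appear in the reduced-space result."""
    hits = sum(len(set(t) & set(f)) for t, f in zip(true_ids, found_ids))
    return hits / true_ids.size


def demo(dim=256, latent_dim=32, reduced_dim=32, n_base=2000, n_query=50, k=10, seed=0):
    rng = np.random.default_rng(seed)
    # ASSUMPTION: synthetic vectors that live near a latent_dim-dimensional subspace.
    mixing = rng.normal(size=(latent_dim, dim))

    def sample(n):
        latent = rng.normal(size=(n, latent_dim))
        return latent @ mixing + 0.01 * rng.normal(size=(n, dim))

    base, queries = sample(n_base), sample(n_query)
    mean, components = pca_fit(base, reduced_dim)
    base_small = (base - mean) @ components.T
    queries_small = (queries - mean) @ components.T

    truth = exact_knn(base, queries, k)
    found = exact_knn(base_small, queries_small, k)
    return recall_at_k(truth, found)


if __name__ == "__main__":
    for reduced_dim in (4, 16, 32):
        print(reduced_dim, demo(reduced_dim=reduced_dim))
```

Sweeping `reduced_dim` like this on a held-out set of queries is one way to pick a target dimension: stop shrinking once recall drops below what the application can tolerate.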
I had the same initial thought based on the title. Unfortunately, the answer is no.
The article discusses a low-dimensional KNN problem. The curse of dimensionality suggests that the methods here likely will not apply to extremely high-dimensional problems.
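One way to build intuition for that is to look at how the gap between the nearest and the farthest point shrinks as dimensions are added. The sketch below is only an illustration under simple assumptions (uniform random points in the unit hypercube, a hypothetical helper called `relative_contrast`); it says nothing about any particular index, but it shows why spatial partitioning prunes less and less as the dimension grows.

```python
import math
import random


def relative_contrast(dim, n_points=500, seed=0):
    """(farthest - nearest) / nearest distance from one random query
    to n_points uniform random points in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    # The contrast is large in 2 dimensions and shrinks as dim grows,
    # i.e. all points end up at roughly the same distance from the query.
    for dim in (2, 16, 128, 1024):
        print(dim, round(relative_contrast(dim), 3))
```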
faiss actually comes with a lot of excellent documentation that describes the problems unique to KNN on embedding vectors. In particular, for extremely large datasets, most of the tractable methods are approximations that make use of clustering, quantization, and centroid-difference tricks to make computation efficient.
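To make the clustering part concrete, here is a minimal inverted-file sketch in plain NumPy. It is not faiss code and does not use the faiss API; the names (`build_ivf`, `search_ivf`, `nprobe` as a parameter) are chosen for illustration, and it leaves out the quantization and compression steps entirely. The idea it shows is just this: cluster the vectors once, then at query time scan only the few clusters whose centroids are closest to the query.

```python
import numpy as np


def sq_dists(a, b):
    """Pairwise squared L2 distances between rows of a and rows of b."""
    return (a ** 2).sum(axis=1)[:, None] - 2.0 * a @ b.T + (b ** 2).sum(axis=1)[None, :]


def kmeans(data, n_clusters, n_iter=10, seed=0):
    """Plain Lloyd iterations; good enough for a sketch."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        assign = sq_dists(data, centroids).argmin(axis=1)
        for c in range(n_clusters):
            members = data[assign == c]
            if len(members):  # leave an empty cluster's centroid untouched
                centroids[c] = members.mean(axis=0)
    return centroids


def build_ivf(data, n_clusters):
    """Return centroids and, per centroid, the ids of the vectors assigned to it."""
    centroids = kmeans(data, n_clusters)
    assign = sq_dists(data, centroids).argmin(axis=1)
    lists = [np.flatnonzero(assign == c) for c in range(n_clusters)]
    return centroids, lists


def search_ivf(data, centroids, lists, query, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    probe = sq_dists(query[None, :], centroids)[0].argsort()[:nprobe]
    candidates = np.concatenate([lists[c] for c in probe])
    d2 = sq_dists(query[None, :], data[candidates])[0]
    return candidates[d2.argsort()[:k]]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(1000, 16))
    query = rng.normal(size=16)
    centroids, lists = build_ivf(data, n_clusters=10)
    print(search_ivf(data, centroids, lists, query, k=5, nprobe=2))
```

Raising `nprobe` trades speed for recall; with `nprobe` equal to the number of clusters the search degenerates to an exact brute-force scan.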
This is a special case: the implementation code for the '<->' operator reads the internal GiST spatial index used in PostGIS, so no joy for N-dimensional search.
You can use a Python library inside PostgreSQL via PL/Python, but supplying the coordinates to the evaluation is not going to be as compact and specialized as this.