Facebook AI Similarity Search (Faiss) (pinecone.io)
85 points by gk1 on Nov 20, 2021 | 10 comments



This app, now almost 5 years old (warning, NSFW):

http://driftwheeler.com

uses a custom brute force search in CUDA, based on bitonic sort (https://en.m.wikipedia.org/wiki/Bitonic_sorter) and a fractional norm distance metric (f=0.5, i.e., sqrt):

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.23...

In 2017, on a low-end GPU, the indexing took about 10 minutes of 100% utilization, for a dozen patches covering each of 50,000 images (feature vectors were 1,024 32-bit floats).

After that, the app just queries a precomputed list (256 matches per query point) depending on where in the image the user pressed. Even with hundreds of simultaneous users, it can hardly keep a $5 Linode busy.
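
Roughly, the distance part looked like this (a minimal NumPy sketch of the idea, not the actual CUDA kernel; the real top-k ran on the GPU via bitonic sort, and the shapes and values below are just illustrative):

    import numpy as np

    def fractional_norm_topk(query, db, k=256, f=0.5):
        # Fractional-norm score with f=0.5: sum of sqrt(|x - y|) per dimension.
        # The outer 1/f power is omitted since it does not change the ranking.
        diffs = np.abs(db - query)              # shape (n_vectors, dim)
        dists = np.power(diffs, f).sum(axis=1)
        topk = np.argpartition(dists, k)[:k]    # k smallest, unordered...
        return topk[np.argsort(dists[topk])]    # ...then sorted by distance

    # Illustrative sizes: 50,000 feature vectors of 1,024 float32s, one query patch.
    db = np.random.rand(50_000, 1_024).astype(np.float32)
    query = np.random.rand(1_024).astype(np.float32)
    matches = fractional_norm_topk(query, db)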

Good times...


What's the relation between Similarity Search and generating pictures of naked women using Neural Nets?


The app allows you to search through a large collection of images by "pressing" parts of an image.

The images are not generated.


I’ve used annoy [0] for vector-similarity search before. Does anyone know the differences between Faiss and annoy? Thanks!

[0] https://github.com/spotify/annoy


There are benchmarks you can consider (with your use case and setup in mind, of course):

http://ann-benchmarks.com/
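
For a rough feel of the API difference, here is a minimal sketch (the index types, metric, and parameters below are illustrative defaults, not recommendations):

    import numpy as np
    import faiss
    from annoy import AnnoyIndex

    dim = 128
    data = np.random.rand(10_000, dim).astype(np.float32)

    # Faiss: exact L2 search over a flat index (many approximate index types exist too).
    findex = faiss.IndexFlatL2(dim)
    findex.add(data)
    distances, faiss_ids = findex.search(data[:1], 10)

    # Annoy: approximate search over a forest of random-projection trees.
    aindex = AnnoyIndex(dim, "euclidean")
    for i, v in enumerate(data):
        aindex.add_item(i, v)
    aindex.build(10)  # number of trees; more trees -> better recall, bigger index
    annoy_ids = aindex.get_nns_by_vector(data[0], 10)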


Here's a list of vector similarity search projects: https://github.com/currentsapi/awesome-vector-search. You can find other alternatives to Faiss there. Disclosure: I am the maintainer of this list.


If a government had many streams of temporal data, e.g. coordinates of people's locations, is it just the lack of caching and the sheer computation that would slow this down?

If neither of those were concerns, I imagine applying a heuristic to points over time to predict where they would be next. If the time falls in a sleep range, it would expect the same "pixel" on the next step until 9am. That might increase memory and computation costs, but it would increase speed with some reduction in accuracy.

If you're in the government, blink once to confirm.


In 99.999% of cases, "The Government" has multiple primary keys to choose from: IMEI, Bluetooth MAC, Wi-Fi MAC, LNPR, credit card payment records, etc.

Facebook data merely verifies an identity against that key.


Is the Voronoi partition the best/most commonly used solution in practice? Or are there better tricks for this problem?


Voronoi is super popular and is typically a component of the solutions used in practice. We did write another article that covers what some of those might look like: https://www.pinecone.io/learn/composite-indexes/

TL;DR: Voronoi + HNSW is popular.
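
In Faiss terms that composite is roughly an IVF index (the Voronoi cells) with an HNSW graph as the coarse quantizer. A minimal sketch, with nlist, M, and nprobe chosen purely for illustration:

    import numpy as np
    import faiss

    dim, nlist = 128, 1024
    xb = np.random.rand(100_000, dim).astype(np.float32)

    # HNSW graph over the nlist centroids, used as the coarse quantizer
    # that routes each query to nearby Voronoi cells.
    quantizer = faiss.IndexHNSWFlat(dim, 32)
    index = faiss.IndexIVFFlat(quantizer, dim, nlist, faiss.METRIC_L2)

    index.train(xb)     # k-means learns the Voronoi cell centroids
    index.add(xb)
    index.nprobe = 16   # number of nearby cells to scan per query
    D, I = index.search(xb[:5], 10)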



