We designed EVA from scratch for managing unstructured data (e.g., video, audio, images, etc.). EVA leverages relational database systems to manage structured data and widely-used libraries to manage feature embeddings (FAISS library [1]). We aim to leverage decades of experience in relational database systems and reduce risk in production deployment.
Do you support weighted similarity search? That is, when I have several embeddings and need to put a weight factor in front of each cosine similarity when I’m performing a query?
Faiss seems like an excellent choice. How do you get the vectors into it from the database? Or are they stored separately? I’m currently using pgvector and it’s not GPU optimized. But the advantage is that it enjoys the same levels of data protection as the rest of the database.
Actually, are there any sample vector similarity search queries? I see the feature extractor, but can’t seem to find any similarity search samples.
EVA does not currently support weighted similarity search. We are working on a notebook that illustrates similarity queries. In the meantime, EVA already supports queries of this form:
-- Step 1: Extract objects in Reddit images using the YOLO object detector
CREATE TABLE reddit_dataset_object (name, data, bboxes)
AS SELECT name, data, bboxes FROM reddit_dataset
JOIN LATERAL UNNEST(YoloV5(data)) AS Obj(labels, bboxes, scores);
-- Step 2: Build index over features extracted using SIFT
CREATE INDEX reddit_sift_object_index
ON reddit_dataset_object (SiftFeatureExtractor(Crop(data, bboxes)))
USING HNSW;
-- Step 3: Retrieve the top 10 most similar images
SELECT id FROM reddit_sift_object_index
ORDER BY Similarity(SiftFeatureExtractor(Open("input_img_path.jpg")),
SiftFeatureExtractor(data))
LIMIT 10;
EVA persists the feature vectors directly in a FAISS index; it does not use a relational database system for this purpose. FAISS supports retrieving the original vector by its ID, which similarity search requires.
We would love to jointly explore how to support such weighted similarity search queries. Please consider opening an issue with more details on your use case.
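To make the use case concrete for such an issue, here is a minimal sketch of what a weighted similarity search could compute, written in plain Python outside of EVA. It assumes each item carries several named embeddings and that the final score is a weighted sum of per-embedding cosine similarities. The names `weighted_similarity` and `top_k`, the embedding names `visual` and `text`, and the sample data are all hypothetical and not part of EVA or FAISS; the brute-force scan stands in for an index lookup.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_similarity(
    query: Dict[str, Vector],
    item: Dict[str, Vector],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of cosine similarities, one term per named embedding."""
    return sum(
        weight * cosine_similarity(query[name], item[name])
        for name, weight in weights.items()
    )


def top_k(
    query: Dict[str, Vector],
    items: Dict[str, Dict[str, Vector]],
    weights: Dict[str, float],
    k: int,
) -> List[Tuple[str, float]]:
    """Brute-force scan: score every item, return the k best (id, score) pairs."""
    scored = [
        (item_id, weighted_similarity(query, embeddings, weights))
        for item_id, embeddings in items.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical data: each image has a "visual" and a "text" embedding.
items = {
    "img_a": {"visual": [1.0, 0.0], "text": [0.0, 1.0]},
    "img_b": {"visual": [0.0, 1.0], "text": [1.0, 0.0]},
    "img_c": {"visual": [1.0, 1.0], "text": [1.0, 1.0]},
}
query = {"visual": [1.0, 0.0], "text": [1.0, 0.0]}

# Favor visual similarity over text similarity.
weights = {"visual": 0.8, "text": 0.2}

for item_id, score in top_k(query, items, weights, k=2):
    print(item_id, round(score, 4))
```

Changing the weights changes the ranking without touching the stored vectors, which is the property the question asks about. A description of how the weights are chosen and how many embeddings are combined per item would be useful detail to include in the issue.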
[1] https://github.com/facebookresearch/faiss