
As someone who has been using pgvector for a while and is vaguely curious about alternatives without having the bandwidth to investigate -- is there anything out there that offers truly differentiated advantages over pgvector? I'm extremely wary of non-OSS solutions in this area; it seems ripe for enshittification and attempts at vendor lock-in.



I use pgvector myself, but here are the advantages of a true vector DB.

- Vectors are massive data-wise. In our current production database they take up 95% of the memory, so should they be stored separately?

- Better support for easily re-embedding, hybrid search, certain RAG workflows

- Stronger performance once you're dealing with millions of vectors.

I would still stick with pgvector until you're dealing with non-trivial scale.
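To put a rough number on "massive", here is a back-of-the-envelope sketch of raw vector storage. The 5 million row count and the 1536 dimensions are made-up illustration values, not our production figures, and it assumes 4-byte single-precision floats with no index or per-row overhead counted.

```python
def vector_storage_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of dense vectors, ignoring index and per-row overhead."""
    return num_vectors * dims * bytes_per_value


# Hypothetical workload: 5 million embeddings, 1536 dimensions each (assumed values).
raw_bytes = vector_storage_bytes(5_000_000, 1536)
print(f"{raw_bytes / 1024**3:.1f} GiB of raw vector data")
```

Under those assumptions that comes to roughly 28.6 GiB before any ANN index is built on top, which is why the vectors can easily dwarf the rest of the tables.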


I'd also start with pgvector (it's easy to switch), but the limitations around hybrid search and filtering + ANN are real and if you're doing any kind of RAG-like thing it's worth being aware of them upfront. pgvector is also an open-source project with way less manpower behind it than a bunch of venture-backed companies, so while you can expect it to pick up important features, it takes much longer (support for HNSW indices was a good example).


What is taking the most time at scale? Is it ingest, index build, or lookups?


Ingest and index build can take time.


What volumes are we talking about?

There are ways to speed things up dramatically. Index build just became multithreaded (see above).

We have ideas on what to do with ingest.

Also, do you ingest from S3?


np.dot is also multi-threaded, based on BLAS
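For anyone who wants to see what that looks like, a minimal sketch of exact (brute-force) cosine search with np.dot follows. The function name `top_k_cosine` and the random test data are hypothetical, and whether the matrix product actually runs multi-threaded depends on the BLAS build NumPy is linked against.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, corpus: np.ndarray, k: int = 5):
    """Exact nearest neighbours by cosine similarity (no ANN index involved)."""
    # Normalise rows so a plain dot product equals cosine similarity.
    corpus_unit = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    scores = np.dot(corpus_unit, query_unit)  # one matrix-vector product, handed to BLAS
    k = min(k, scores.shape[0])
    # argpartition finds the k best in linear time; only those k get fully sorted.
    candidates = np.argpartition(-scores, k - 1)[:k]
    order = candidates[np.argsort(-scores[candidates])]
    return order, scores[order]


# Hypothetical data: 10,000 random 128-dimensional vectors.
rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 128)).astype(np.float32)
indices, similarities = top_k_cosine(corpus[42], corpus, k=3)
print(indices, similarities)
```

This scans every vector on each query, so it trades index build time for lookup time. It is a baseline to compare against, not a replacement for an ANN index at larger scale.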


If you're still in the "millions of documents" scale range, then PostgreSQL on a beefy EPYC can probably handle everything fast enough so that it doesn't make sense to spend engineering time on using a vector db which would only shave off a few ms in latency.


No



