Hacker News

It makes terrible operational sense. What are the HA/DR, sharding, replica, and backup strategies and tools for pg_vector? What are the embedding integration and relevance tools? What are the reindexing strategies? What are the scaling, caching, and CPU thrashing resolution paths?

You're going to spend a bunch of time writing integrations that already exist for actual search engines, and you're going to be stuck and need to back out when search becomes a necessity rather than an afterthought.




The HA/DR, sharding, replica, and backup strategies would all be the same as before. It's all in PG, so you use the existing methods.

If you have two systems, then you have two (distinct) answers for HA/DR, sharding, replicas, and backups: the PG set and the Vespa set.

That's more complicated, from an operational perspective.

PG FTS is quite good, and there are in-pg methods that can improve it.
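For concreteness, here is a minimal sketch of what PG FTS looks like from the application side. The table (`articles`), column (`body`), and parameter name are hypothetical; in practice you would execute this through your Postgres driver with the search term as a bound parameter.

```python
# A minimal sketch of a Postgres full-text-search query. The table
# `articles` and column `body` are hypothetical; in a real app you'd
# run this through psycopg with `term` as a bound parameter.
FTS_QUERY = """
SELECT id, title,
       ts_rank(to_tsvector('english', body),
               plainto_tsquery('english', %(term)s)) AS rank
FROM articles
WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %(term)s)
ORDER BY rank DESC
LIMIT 10;
"""

print(FTS_QUERY)
```

The usual in-PG improvement over this baseline is a generated `tsvector` column with a GIN index on it, so the `to_tsvector` call isn't re-evaluated per row.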

And, from experience, when it's time to upscale to Solr/ES/etc., it's not a very heavy lift.


What makes most operational sense is going to depend on your context.

From my vantage point, you’re both right in the appropriate context.


What if you don't need those things yet and you just have some embeddings you want to query for cosine similarity? A dedicated vector database is way, way overkill for many people.
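To make that concrete, "just query for cosine similarity" at small scale can be a brute-force scan with no index and no dedicated database at all. The names below are illustrative, not from any particular library:

```python
# Brute-force cosine-similarity search over a handful of embeddings --
# a sketch of the simplest case, where no vector database is needed.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, embeddings, k=3):
    """Return the k (doc_id, score) pairs most similar to `query`."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in embeddings.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k([1.0, 0.0], embeddings, k=2))  # "a" first, then "c"
```

pg_vector's `<=>` operator does essentially this (cosine distance) inside Postgres, which is the appeal: the embeddings live next to the rest of your data.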



