I really don't want another database. I just want to have a solution built in for Postgres, and more specifically, RDS, which we use. I know there will be some extra difficulty that I will have to manage (e.g. reindexing to a new model that is outputting different embeddings), but I really don't want another piece of infrastructure.
If anyone from AWS/Google/Azure is listening, please add pgvector [1] into your managed Postgres offerings!
Totally with you on that - it's annoying to need multiple pieces of infrastructure, especially for solo developers or small teams. Oftentimes, you also want to filter vector search results further based on scalar fields/metadata such as timestamps, strings, numeric values, etc.
That's why we built attribute filtering into Milvus via a Mongo-esque interface. No SQL and not as performant as an RDBMS, but it's an option: https://milvus.io/docs/hybridsearch.md
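To make "attribute filtering" concrete, here is a toy sketch in plain Python. It is not Milvus's API or how Milvus implements it; the names `filtered_search`, `rows`, and the `year` field are made up for illustration, and a real engine would use an index rather than a brute-force scan.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filtered_search(rows, query, predicate, k=2):
    """Brute force: apply the scalar filter first, then rank survivors by distance."""
    candidates = [r for r in rows if predicate(r)]
    return sorted(candidates, key=lambda r: l2(r["embedding"], query))[:k]

# Hypothetical data: each row carries scalar metadata next to its embedding.
rows = [
    {"id": 1, "year": 2021, "embedding": [0.0, 0.0]},
    {"id": 2, "year": 2023, "embedding": [1.0, 1.0]},
    {"id": 3, "year": 2023, "embedding": [0.1, 0.1]},
    {"id": 4, "year": 2022, "embedding": [5.0, 5.0]},
]

# Nearest neighbours of the origin, restricted to rows from 2022 onwards.
hits = filtered_search(rows, [0.0, 0.0], lambda r: r["year"] >= 2022, k=2)
print([r["id"] for r in hits])
```

Row 1 is the closest vector overall, but the filter excludes it, which is exactly the behaviour you want from a metadata-aware search.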
Yes, exactly. My company has asked AWS whether they will add pgvector support to RDS, but they haven't been able to confirm whether that will happen any time soon.
If the vectors are in the same database as the tabular/structured data, then text-to-SQL applications of LLMs become much more powerful. The generative models will be able to form complex queries that find similar items and also aggregate, filter, and join across datasets. Doing this today with a separate dedicated vector DB is quite painful.
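A rough sketch of the kind of query I mean, assuming pgvector is installed. The `documents` and `customers` tables, their columns, and the parameter names are all hypothetical; `<->` is pgvector's L2 distance operator, and the `%(name)s` placeholders are psycopg-style. The code only builds the SQL and the parameters, it does not connect to anything.

```python
def to_vector_literal(embedding):
    """Format a list of numbers as pgvector's text form, e.g. '[1.0,2.5]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"

# One statement that filters, joins, aggregates, and uses vector distance.
# Table and column names are assumptions for illustration only.
SIMILAR_DOCS_PER_CUSTOMER = """
SELECT c.name, count(*) AS matches
FROM documents AS d
JOIN customers AS c ON c.id = d.customer_id
WHERE d.created_at >= %(since)s
  AND d.embedding <-> %(query_vec)s::vector < %(max_dist)s
GROUP BY c.name
ORDER BY matches DESC
"""

params = {
    "since": "2023-01-01",
    "query_vec": to_vector_literal([1, 2.5, -3]),
    "max_dist": 0.5,
}
print(params["query_vec"])
```

With a separate vector DB you would have to run the similarity search there, pull the ids back, and then do the join and aggregation in Postgres or in application code.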
You could write an FDW (foreign data wrapper) that reads/writes to a vector database, with each vector tagged by its Postgres id. You can write to it from Postgres, reference it in queries, join on it, etc. That cuts out a lot of the pain of having separate databases; the only remaining issues are the additional maintenance overhead and hidden performance cliffs.
With Postgres, you can do almost everything, including full-text search, but you still have Elasticsearch, Meilisearch, etc. for when you need performance and advanced features. The multitool approach is suboptimal in most cases.
In small teams, the infrastructure often isn't fully utilized, so performance is not an issue. However, feature richness allows such a team to deliver higher-level features faster. Think of an early-stage startup (one or two engineers) or a hairdresser-type business (they use a ready-made framework that targets a popular database and limits its features to reach a wide range of users). As a result, you can have a lot of such businesses, creating a very long tail.
SaaS products are infrastructure. Each different SaaS used is another piece that needs to be connected to the system and maintained; it thus becomes part of the system infrastructure. Each new SaaS piece has costs (time and money) associated with it.
That said, it's up to the individual company to decide if the added cost is worth it. Just because the cost exists doesn't mean it isn't worth it.
1. https://github.com/pgvector/pgvector