This is the same conversation we have whenever we add a specialized data store. Shortly after MongoDB, Cassandra, Couchbase, Elasticsearch, and Solr appeared, critics asked whether we couldn’t just contort an RDBMS to handle documents for flexible queries or search. History tells a different story: the scaling properties and feature expectations of the specialized stores culminated in tens of billions in market cap, and much more in infrastructure spend.
Could the incumbents simply tack on vector features? Sure, that’s the JSONB story of Postgres. It’s the regex story of all RDBMS offerings after Oracle’s acquisition of Endeca suggested a real commercial opportunity for search-specific databases.
Vector-first storage engines have a place in the market as much as the tack-on solutions do. pgvector will be good enough for most users, and Weaviate (or Milvus) will be better suited for the most ambitious, or for those seeking the best developer experience.
Benchmarks and data volumes aside, pgvector (or a similar extension to a more general-purpose data store) is a much more palatable way for an organization to explore adding vector search to its products than adding new technology to its stack.
There’s definitely some hype in any new data infrastructure trend (see: graph databases, time-series databases). But the problem vector DBs solve, retrieving context efficiently for LLMs, seems real enough. Maybe the question is whether LLM-native applications will be big enough to sustain a separate category. Is the industry moving toward general-purpose DBs that incorporate vector search, or will specialized vector DBs still have a place some time from now?
I know I'm biased on this, but it has always seemed obvious that vector search would be subsumed into other databases. At MongoDB we've made it easy to manage operational and vector data in the same place, simplifying data architecture. While I think we do this better than others, it's also true that other vendors and communities (like Postgres with pgvector) have added vector capabilities and, frankly, always were going to do so. It's just a natural extension. I don't want to be dismissive of purpose-built vector databases, but they're going to have to evolve to suit more general-purpose workloads. One way this could happen is the path Neo4j has taken: making graphs a more general way of thinking about data. It will be interesting to see how it plays out.