A more important question is why Supabase went with one of the slowest implementations [0] instead of building a better plugin, especially since you are a VC-funded company that's hardly lacking in cash. Simply gluing together a few random open source extensions is not great for developer experience.
That's a fair question. It's our policy to support existing open source tools/companies as much as we can[0].
Andrew is working hard on pgvector, and his goals are aligned to ours (and the industry's). We'll continue to support him as long as that's true.
> one of the slowest
Benchmarks are simply a snapshot in time - there's nothing fundamentally flawed with pgvector, it just needs more support and some better indexes (which are already being worked on).
>> [...] why supabase went with one of the slowest implementations
> Benchmarks are simply a snapshot in time - [...] it just needs [...] some better indexes
But the choice of IVF-flat indexes was made by the authors of pgvector, and that is one of the reasons why pgvector is slow: even at its best, O(√n) query cost is just not great with large datasets, and this is especially true for the linked list approach chosen by the authors, as it results in large amounts of random IO.
You can't implement only a slow index and then shift the blame for the extension being slow onto the index that was chosen by the extension authors: it is an inherent design issue. Sure, it can be fixed, but that doesn't mean it doesn't exist.
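To put rough numbers on the O(√n) point, here is a back-of-the-envelope sketch. It is a simplified cost model, not pgvector's code: it assumes evenly sized lists, counts only distance computations, and ignores IO entirely. The function names are made up for illustration. With a fixed number of probes, the cost is roughly `lists + probes * n / lists`, which is smallest when `lists = sqrt(probes * n)`.

```python
import math


def ivf_flat_distance_computations(n_vectors, n_lists, n_probes=1):
    """Rough count of distance computations for one IVF-flat query.

    Hypothetical cost model: assumes evenly sized lists and ignores IO.
    """
    # Step 1: compare the query against every list centroid.
    centroid_cost = n_lists
    # Step 2: scan every vector stored in the probed lists.
    scan_cost = n_probes * n_vectors / n_lists
    return centroid_cost + scan_cost


def best_list_count(n_vectors, n_probes=1):
    """List count that minimizes the cost model above."""
    return math.sqrt(n_vectors * n_probes)


if __name__ == "__main__":
    for n in (10_000, 1_000_000, 100_000_000):
        lists = round(best_list_count(n))
        cost = ivf_flat_distance_computations(n, lists)
        print(f"n={n:,} lists={lists:,} distance computations ~ {cost:,.0f}")
```

Under this model the best case still grows with the square root of the table size, so a 100x larger table costs about 10x more per query, before any random IO is taken into account.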
It would definitely be a design issue if the extension could only support a single index type, but Andrew already has plans to implement HNSW in 0.5.0:
If you say "no, pgvector is not slow, it is the index that is slow", while pgvector has been designed with only a single index type, then it is a design issue of pgvector in that the wrong type of index was chosen as the included index type.
That there are plans to solve this issue doesn't mean it is not an issue; indeed it validates my point that pgvector currently has design flaws.
Telling (potential) users to wait 6 months for performance doesn't mean much if there are more performant alternatives readily available right now; a starving person isn't helped by a meal made available half a world away.
Didn't you (Neon) just announce pg_embedding and claim 20x performance over pgvector, even though pg_embedding just runs in memory and doesn't replicate the data?
Scary that database engineers do not understand how reckless that is.
[0] https://github.com/erikbern/ann-benchmarks