A more important question is why supabase went with one of the slowest [0] imple...

kiwicopple · on July 12, 2023

That's a fair question. It's our policy to support existing open source tools/companies as much as we can [0].

Andrew is working hard on pgvector, and his goals are aligned to ours (and the industry's). We'll continue to support him as long as that's true.

> one of the slowest

Benchmarks are simply a snapshot in time - there's nothing fundamentally flawed with pgvector, it just needs more support and some better indexes (which are already being worked on)

[0] https://supabase.com/docs/guides/getting-started/architectur...

mattashii · on July 13, 2023

>> [...] why supabase went with one of the slowest implementations

> Benchmarks are simply a snapshot in time - [...] it just needs [...] some better indexes

But the choice of IVF-flat indexes was made by the authors of pgvector, and that is one of the reasons why pgvector is slow: O(√n) optimal performance is just not great with large datasets, and this is especially true for the linked list approach chosen by the authors, as it results in large amounts of random IO.
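To make the O(√n) figure concrete, here is a rough back-of-the-envelope sketch of where it comes from. It is a simplified cost model, not pgvector's actual code: it assumes vectors are spread evenly across the lists, counts only distance computations (ignoring IO), and the function names `ivf_scan_cost` and `best_lists` are made up for illustration.

```python
import math


def ivf_scan_cost(n: int, lists: int, probes: int = 1) -> float:
    """Approximate distance computations for one IVF-flat query.

    Assumes evenly sized lists: the query is compared against every
    centroid, then against all vectors in the `probes` nearest lists.
    """
    centroid_comparisons = lists
    vectors_scanned = probes * n / lists
    return centroid_comparisons + vectors_scanned


def best_lists(n: int, probes: int = 1) -> int:
    """List count that minimizes lists + probes * n / lists.

    Setting the derivative to zero gives lists = sqrt(probes * n).
    """
    return max(1, round(math.sqrt(probes * n)))


if __name__ == "__main__":
    for n in (10_000, 1_000_000, 100_000_000):
        k = best_lists(n)
        print(f"n={n:>11,}  lists={k:>6,}  cost~{ivf_scan_cost(n, k):,.0f}")
```

Under these assumptions the best case for a single probe is about 2√n comparisons per query, so the work keeps growing noticeably with the dataset, and probing more lists for better recall only pushes it higher.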

You can't implement only a slow index and then shift the blame for the extension being slow onto the index that was chosen by the extension authors: it is an inherent design issue. Sure, it can be fixed, but that doesn't mean it doesn't exist.

kiwicopple · on July 13, 2023

> it is an inherent design issue

It would definitely be a design issue if the extension could only support a single index type, but Andrew already has plans to implement HNSW in 0.5.0:

https://github.com/pgvector/pgvector/issues/27

mattashii · on July 13, 2023

Could you please not pull things out of context?

If you say "no, pgvector is not slow, it is the index that is slow", while pgvector has been designed with only a single index type, then it is a design issue of pgvector in that the wrong type of index was chosen as the included index type.

That there are plans to solve this issue doesn't mean it is not an issue; indeed it validates my point that pgvector currently has design flaws.

Telling (potential) users to wait for 6 months for performance doesn't mean much if there are more performant alternatives readily available right now; a starving person isn't helped by a meal made available half a world away.

Dhruva23 · on July 16, 2023

Didn't you (Neon) just announce pg_embedding and claim 20x performance over pgvector, even though pg_embedding just runs in memory and doesn't replicate the data?

Scary that database engineers do not understand how reckless that is.

KRAKRISMOTT · on July 13, 2023

> Andrew already has plans to implement HNSW in 0.5.0:

According to the GitHub issue tracker, 0.5.0 has been more than a year in the making and is no closer to release.