Why Vector DBs Are the Wrong Abstraction – and What We Built Instead (topk.io)
22 points by Void_ 1 day ago | 8 comments





Multi-tenancy: how do you keep it fair? If some dude in another team runs a crazy expensive query, does that just wreck everyone else’s perf? We’ve seen this happen in other distributed systems where ‘fair scheduling’ is more of a wish than reality.

We already mix vector and text search with Elastic + Vespa. What exactly makes your thing better? Not trying to be rude, but why should we care about Yet Another Search Engine?

disclaimer: co-founder of topk here :)

Not rude at all, it's a common sentiment. TopK gives you all the capabilities under one roof -- text, vectors, filters, embeddings, re-ranking, all behind a single API (so single SDK, single vendor, single bill, single observability plane, etc). This simplifies your integration (less code on your part) and gives you more flexibility to define custom scoring rules (Elastic-style), but over text + vectors + any other pre-computed factors. If you are interested, take a look at [this](https://docs.topk.io/concepts/unified-retrieval#custom-scori...) docs page, which goes into detail.

I'd be curious to learn about your setup though. Vespa comes with lexical search so what made you choose this combo? Was Elastic already present in your stack before adopting Vespa?


DataFusion wasn’t a fit because it doesn’t do external indexes. But why not extend it?

We actually tried to extend DataFusion at first but ultimately decided against it, since we can get most of the value by using Arrow and its compute kernels directly. DataFusion also executes filters in a way that rebuilds the underlying arrays (a data copy) and requires a strict schema, which is not a good fit for our schemaless, document-oriented model. In the end, switching from DataFusion to Reactor gave us 3x better latencies.
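
To make the "filter rebuilds the arrays" point concrete, here is a minimal sketch of the general difference between a filter that materializes new columns and one that only produces a selection vector of matching row indices. This is plain Python over lists and is purely illustrative: the function names (`filter_by_copy`, `filter_by_selection`, `take`) are hypothetical, and it does not reflect how DataFusion, Arrow, or Reactor are actually implemented.

```python
def filter_by_copy(columns, predicate_col, predicate):
    """Materializing filter: every column is rebuilt, so all kept values are copied."""
    keep = [i for i, v in enumerate(columns[predicate_col]) if predicate(v)]
    return {name: [values[i] for i in keep] for name, values in columns.items()}


def filter_by_selection(columns, predicate_col, predicate):
    """Selection-vector filter: only the matching row indices are produced."""
    return [i for i, v in enumerate(columns[predicate_col]) if predicate(v)]


def take(columns, name, selection):
    """Read a single column through a selection vector, only when it is needed."""
    return [columns[name][i] for i in selection]


columns = {"price": [5, 50, 500], "title": ["a", "b", "c"]}

# The copying filter touches every column up front.
copied = filter_by_copy(columns, "price", lambda p: p >= 50)

# The selection filter defers column reads until a later stage asks for them.
selection = filter_by_selection(columns, "price", lambda p: p >= 50)
titles = take(columns, "title", selection)
```

With a selection vector, later stages (scoring, further filters) can keep working against the original columns, and columns that are never read are never copied.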

So you’re using Arrow but also ‘custom layouts’. What was missing? Arrow’s pretty flexible, and we’ve been able to get it to do some weird stuff in memory without having to hack it. What broke for you?

You’re saying vector DBs are the wrong abstraction, but companies keep throwing money at them. Why? Are they just slow to catch on, or are there legit cases where vectors actually make sense?

Founder of TopK here. There are legit use cases for vector-based retrieval (e.g. semantic search, recommendations, multi-modal search, etc.) but that only requires supporting vectors as a data type, not building the whole database around vectors as a first-class citizen (which is what vector DBs do). In practice, you also want to combine multiple vectors, text filters, and metadata alongside custom scoring functions to optimize relevance in your domain, which is not possible with a database built around a vector index.
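
As a rough illustration of what "combining vectors, text, and metadata with a custom scoring function" can look like, here is a minimal in-memory sketch in plain Python. Everything in it is an assumption made for the example: the document fields, the weights, the naive term-frequency text score, and the `hybrid_top_k` function are hypothetical and are not TopK's API or scoring model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(text, terms):
    """Naive text score: fraction of words that match a query term (not BM25)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(words.count(t.lower()) for t in terms) / len(words)


def hybrid_top_k(docs, query_vec, terms, category, k=2,
                 w_vec=0.6, w_text=0.3, w_pop=0.1):
    """Filter on metadata, then rank by a weighted mix of vector, text, and popularity."""
    scored = []
    for doc in docs:
        if doc["category"] != category:  # metadata filter
            continue
        score = (w_vec * cosine(doc["embedding"], query_vec)
                 + w_text * keyword_score(doc["text"], terms)
                 + w_pop * doc["popularity"])  # pre-computed factor
        scored.append((score, doc["id"]))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy corpus.
DOCS = [
    {"id": "a", "text": "rust vector search engine",
     "embedding": [1.0, 0.0], "category": "db", "popularity": 0.2},
    {"id": "b", "text": "cooking with cast iron",
     "embedding": [0.0, 1.0], "category": "food", "popularity": 0.9},
    {"id": "c", "text": "hybrid search with text and vectors",
     "embedding": [0.8, 0.6], "category": "db", "popularity": 0.5},
]

results = hybrid_top_k(DOCS, [1.0, 0.0], ["search"], "db", k=2)
```

The point of the sketch is that the vector similarity is just one term in the score, alongside the text match and a metadata-derived factor, and the filter is applied in the same pass rather than bolted on after a nearest-neighbor lookup.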


