Just put Zvec, LanceDB, and Qdrant through their paces on a three-collection (text only) dataset with 10k documents per collection.
Average latency across ~500 queries per collection per database:
Qdrant: 21.1ms
LanceDB: 5.9ms
Zvec: 0.8ms
Both Qdrant and LanceDB are running with Inverse Document Frequency (IDF) enabled, which is a slight performance hit; Zvec is running with HNSW.
Overlap of answers between the three is virtually identical with the same default ranking.
So yes, Zvec is incredible, but there's a gotcha: Zvec is fast because it is primarily constrained by local disk performance, and the data must live on local disk. You may have a central repository storing the data, but every instance running Zvec needs a local (high-performance) disk attached. I mounted blobfuse2 object storage to test, and Zvec's numbers went to over 100ms, so disk is almost all that matters.
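For anyone wanting to reproduce the disk-sensitivity observation, here's a minimal sketch (the paths are hypothetical and this is not a Zvec API) that times random block reads against the same file on different mounts, e.g. local SSD vs. a blobfuse2 mount:

```python
import os
import random
import time

def random_read_latency(path: str, reads: int = 200, block: int = 4096) -> float:
    """Average latency in ms of random buffered block reads from `path`."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        t0 = time.perf_counter()
        for _ in range(reads):
            # seek to a random block-aligned-ish offset and read one block
            os.lseek(fd, random.randrange(0, size - block), os.SEEK_SET)
            os.read(fd, block)
        return (time.perf_counter() - t0) / reads * 1000
    finally:
        os.close(fd)

# Usage: point at the same file on local disk vs. an object-storage mount
# (hypothetical paths):
# print(random_read_latency("/mnt/localssd/index.bin"))
# print(random_read_latency("/mnt/blobfuse/index.bin"))
```

Note this measures buffered I/O, so a second run against the same file will largely hit the page cache; the first, cold run is the one that exposes the storage layer's latency.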
My take? The way Zvec behaves right now, it will be amazing for on-device vector lookups, but not as helpful for cloud vectors.
Author here. Thanks for putting Zvec through its paces and sharing such detailed results—really appreciate the hands-on testing!
Just a bit of context on the storage behavior: Zvec currently uses memory-mapped files (mmap) by default, so once the relevant data is warmed up in the page cache, performance should be nearly identical regardless of whether the underlying storage is local disk or object storage—it's essentially in-memory at that point. The 100ms latency you observed with blobfuse2 likely reflects cold reads (data not yet cached), which can be slower than local disk in practice. Our published benchmarks are all conducted with sufficient RAM and full warmup, so the storage layer's latency isn't a factor in those numbers.
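To illustrate the warmup behavior described above, here's a minimal, hypothetical Python sketch (not Zvec's actual code) of the mmap pattern: map a file read-only and touch every page so that subsequent lookups are served from the OS page cache rather than the underlying storage:

```python
import mmap
import os

def warm_mmap(path: str) -> mmap.mmap:
    """Map `path` read-only and fault every page into the page cache."""
    fd = os.open(path, os.O_RDONLY)
    try:
        mm = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)  # map whole file
    finally:
        os.close(fd)  # the mapping keeps its own reference to the file
    for off in range(0, len(mm), mmap.PAGESIZE):
        _ = mm[off]  # touching one byte per page faults the page in
    return mm
```

The first (cold) pass pays the storage layer's latency, which is far higher over blobfuse2 than a local SSD; once warmed, reads are served from RAM regardless of where the file lives, matching the "sufficient RAM and full warmup" benchmark setup described above.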
If you're interested in query performance on object storage, we're working on a buffer pool–based I/O mode that will leverage io_uring and object storage SDKs to improve cold-read performance. The trade-off is that in fully warmed‑up, memory‑rich scenarios, this new mode may be slightly slower than mmap, but it should offer more predictable latency when working with remote storage. Stay tuned—this is still under development!
If you think that launching your app in another region is hard: there is currently a case in Europe evaluating the argument that even if the data never leaves the EU and the provider is a European entity affiliated with, or a subsidiary of, a US company, this is still considered a violation.
So unfortunately, just moving hardware locations may be insufficient; even forming a new entity won't suffice.
In my humble opinion, we are witnessing the nationalization of the Internet, done in the name of good intent, but eventually the risk-vs-reward calculation of doing business across the Atlantic (for either side) will tilt toward avoiding the risk.
Although it could be argued that "good, laws are made for people, not for businesses," I'd counter that a great deal of the free information published by US companies and non-profits will become unavailable in the EEA.
I'm hopeful that the DPAs and courts in Europe will decide to balance these concerns.
FWIW: I run one of the more popular data privacy platforms, Osano, so this is an area we track very closely and which is near and dear to my heart. I built Osano as a Public Benefit Corporation (and certified B-Corp) to try to prevent the nationalization of the Internet by giving businesses an easy way to respect the rights of their customers & visitors.
I mean, I assume the US are interested in this exchange as well. If they are, they could lead by example and reform the CLOUD act or implement some more effective data protection regulations themselves.
We aren't in this mess because the EU somehow wants to nationalize the internet; we're in it because, under current legislation, US companies can be forced to hand over whatever data they possess, no matter where it's stored.
Not a lawyer, but my understanding of current events is more or less the EU saying "if it's subject to the CLOUD Act, it violates the GDPR." That's a pretty clear indication of what's wrong.
Wow, this is super ugly. Not to hijack the thread, but we just launched yesterday at https://www.privacymonitor.com - The stuff we find about what companies bury in their privacy policies is pretty frightening.