https://ann-benchmarks.com is pretty good but I agree it needs an update. I'd li...

jbellis · 2025-05-30T20:29:29 1748636969

modern dimensions, yes

mixed workloads, also yes, especially in an "online" environment rather than the "batch mode" that ann-benchmarks does today

but most importantly, multicore -- ann-benchmarks is limited to a single-core Docker image, which is absolutely ludicrous, and I suspect it is a significant reason that Python-based systems do much better in their benchmark than you would expect from trying to deploy them under concurrent load

binarymax · 2025-05-30T20:36:46 1748637406

Indeed! I'm just looking at JVector which I wasn't familiar with - looks cool. Have you tried it with the billion-scale competition? (not sure if that's still running)

jbellis · 2025-05-30T21:17:57 1748639877

sort of: there was the original bigann, and then they followed up with a couple more specialized contests the following year. I think it's over now

~300M modern-sized vectors is pretty close to jvector's limit in a single index (the Cassandra layer can shard more) https://foojay.io/today/indexing-all-of-wikipedia-on-a-lapto...

that said, I think Mariano (new jvector maintainer) is working on ways to handle larger datasets in a single index, but I'm not sure where that is on his priority list