Hacker News

https://ann-benchmarks.com is pretty good but I agree it needs an update. I'd like to see modern embedding dimensions (384, 768, 1536, etc.) as well as filters and combined read/write latencies.



modern dimensions, yes

mixed workloads, also yes, especially in an "online" environment rather than the "batch mode" that ann-benchmarks does today

but most importantly, multicore -- ann-benchmarks is limited to a single-core Docker image, which is absolutely ludicrous. I suspect that's a significant reason Python-based systems do much better in their benchmark than you would expect from trying to deploy them under concurrent load
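
To make the "online, mixed workload" idea concrete, here is a rough sketch of what such a harness could measure: reads and writes interleaved across several threads, with latency recorded per operation. Everything in it is hypothetical. `ToyIndex` is a brute-force stand-in, not any real library's API, and `run_mixed_workload` and `percentile` are names made up for illustration. In standard CPython builds the GIL keeps these threads from using multiple cores for pure-Python work, so this only shows the shape of the measurement, not real multicore scaling.

```python
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ToyIndex:
    """Hypothetical stand-in for an ANN index: exact brute-force search."""

    def __init__(self, dim):
        self.dim = dim
        self._vectors = []
        self._lock = threading.Lock()

    def add(self, vec):
        with self._lock:
            self._vectors.append(vec)

    def search(self, query, k):
        # Copy under the lock so concurrent writes cannot change the list mid-scan.
        with self._lock:
            snapshot = list(self._vectors)
        ranked = sorted(
            range(len(snapshot)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(snapshot[i], query)),
        )
        return ranked[:k]


def run_mixed_workload(index, n_ops, write_ratio, n_threads, seed=0):
    """Run n_ops interleaved reads/writes on n_threads; return latencies in seconds."""
    rng = random.Random(seed)
    ops = [
        ("write" if rng.random() < write_ratio else "read",
         [rng.random() for _ in range(index.dim)])
        for _ in range(n_ops)
    ]

    def do_op(op):
        kind, vec = op
        start = time.perf_counter()
        if kind == "write":
            index.add(vec)
        else:
            index.search(vec, k=10)
        return kind, time.perf_counter() - start

    latencies = {"read": [], "write": []}
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for kind, elapsed in pool.map(do_op, ops):
            latencies[kind].append(elapsed)
    return latencies


def percentile(values, pct):
    """Simple nearest-index percentile; returns None for an empty list."""
    if not values:
        return None
    ordered = sorted(values)
    rank = round(pct / 100 * (len(ordered) - 1))
    return ordered[rank]
```

Something like `run_mixed_workload(ToyIndex(dim=8), n_ops=200, write_ratio=0.3, n_threads=4)` followed by `percentile(latencies["read"], 99)` would give a tail read latency under concurrent writes, which is the number a batch-mode, single-core run never exposes.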


Indeed! I'm just looking at JVector which I wasn't familiar with - looks cool. Have you tried it with the billion-scale competition? (not sure if that's still running)


sort of. there was the original BigANN, and then they followed up with a couple of more specialized contests the following year. I think it's over now

~300M modern-sized vectors is pretty close to jvector's limit in a single index (the Cassandra layer can shard more) https://foojay.io/today/indexing-all-of-wikipedia-on-a-lapto...

that said, I think Mariano (the new JVector maintainer) is working on ways to handle larger datasets in a single index, but I'm not sure where that is on his priority list



