
Milvus – An Open-Source Vector Similarity Search Engine - maximente
https://milvus.io/
======
peter_l_downs
I'd be curious how they implement updating. AFAICT this is the thorniest part
of working with existing open source solutions. When working with ANNOY in the
past I've had data small enough to recompute the full index in a background
process every few seconds and then swap the built index into the process
serving similarity queries.

(you can see the VERY "research quality" code on GitHub; here's a decent
starting place:
[https://github.com/hyperstudio/spectacles/blob/master/specta...](https://github.com/hyperstudio/spectacles/blob/master/spectacles/nndb/server.py#L51))
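The rebuild-and-swap pattern described above can be sketched roughly as follows. This is a minimal illustration, not the linked project's code: `SwappableIndex` and `BruteForceIndex` are hypothetical names, and the brute-force class stands in for a real ANN index such as one built with ANNOY.

```python
import threading


class BruteForceIndex:
    """Stand-in for a real ANN index: immutable once built."""

    def __init__(self, vectors):
        self._vectors = [tuple(v) for v in vectors]

    def nearest(self, query, k):
        def squared_distance(v):
            return sum((a - b) ** 2 for a, b in zip(v, query))

        order = sorted(range(len(self._vectors)),
                       key=lambda i: squared_distance(self._vectors[i]))
        return order[:k]


class SwappableIndex:
    """Serve queries from the current snapshot while a new one is built."""

    def __init__(self, build_fn, vectors):
        self._build_fn = build_fn
        self._vectors = list(vectors)
        self._lock = threading.Lock()  # guards the raw vector list only
        self._index = build_fn(self._vectors)

    def add(self, vector):
        with self._lock:
            self._vectors.append(vector)

    def rebuild(self):
        with self._lock:
            snapshot = list(self._vectors)
        new_index = self._build_fn(snapshot)  # slow part, done outside the lock
        self._index = new_index               # the swap: one reference assignment

    def query(self, vector, k):
        index = self._index  # read the current snapshot once
        return index.nearest(vector, k)


index = SwappableIndex(BruteForceIndex, [(0.0, 0.0), (1.0, 1.0)])
index.add((5.0, 5.0))
before = index.query((5.0, 5.0), 1)  # new vector is not searchable yet

worker = threading.Thread(target=index.rebuild)  # rebuild in the background
worker.start()
worker.join()
after = index.query((5.0, 5.0), 1)   # new vector is visible after the swap
```

In a real service the rebuild would run on a timer rather than being joined immediately, and queries would keep hitting the old snapshot until the swap happens.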

EDIT: from the insertion docs
[https://milvus.io/docs/guides/milvus_operation.md#Insert-vec...](https://milvus.io/docs/guides/milvus_operation.md#Insert-vectors-into-a-table)
it seems that they still ask you to rebuild your indices after you insert
vectors, although in some cases they can tell that they need to rebuild the
indices for you. Looks like the major value adds here are potentially shifting
computation to the GPU and building multiple indices. I'll certainly evaluate
this next time I'm building a project around vector search.

~~~
gujun720
Milvus allows users to append vectors. Vectors are stored in multiple file
slices. When a file slice reaches the threshold, Milvus will build the index
for that file slice, and new data will be inserted into a new file slice. For
details, please refer to
[https://medium.com/@milvusio/managing-data-in-massive-scale-...](https://medium.com/@milvusio/managing-data-in-massive-scale-vector-search-engine-db2e8941ce2f)

We are now working on vector deletion. Hopefully it will be ready by the end
of Q1 this year.

~~~
peter_l_downs
If I append a single new vector, will it show up in search results without me
needing to ask for the index to be rebuilt? Can I update an existing vector
without having to ask for the index to be rebuilt?

EDIT: from reading the linked article, it seems like newly inserted vectors
will be queried using brute force. Very interesting idea!

~~~
gujun720
Correct, new vectors will first be searched thru brute force until the index
is created on that file slice.
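To make the slice mechanism discussed in this sub-thread concrete, here is a rough sketch under stated assumptions. `FileSlice`, `SlicedStore`, and the threshold value are hypothetical and are not Milvus's API. Marking a slice as indexed stands in for building a real ANN index, and every slice is scanned exactly here, so the sketch only shows why newly appended vectors are searchable right away.

```python
import heapq
import math

SLICE_THRESHOLD = 3  # hypothetical value, kept tiny for the example


class FileSlice:
    def __init__(self):
        self.rows = []        # (vector_id, vector) pairs
        self.indexed = False  # True once the slice is full and "indexed"

    def scan(self, query):
        """Exact distance from the query to every vector in this slice."""
        return [(math.dist(vec, query), vid) for vid, vec in self.rows]


class SlicedStore:
    def __init__(self, threshold=SLICE_THRESHOLD):
        self.threshold = threshold
        self.slices = [FileSlice()]
        self._next_id = 0

    def insert(self, vector):
        current = self.slices[-1]
        vid = self._next_id
        self._next_id += 1
        current.rows.append((vid, tuple(vector)))
        if len(current.rows) >= self.threshold:
            current.indexed = True           # stand-in for building an ANN index
            self.slices.append(FileSlice())  # new data goes to a fresh slice
        return vid

    def search(self, query, k):
        candidates = []
        for file_slice in self.slices:
            # A real engine would consult the index on sealed slices and
            # brute-force only the open one; this sketch scans all of them.
            candidates.extend(file_slice.scan(query))
        return [vid for _, vid in heapq.nsmallest(k, candidates)]


store = SlicedStore()
for v in [(0, 0), (1, 0), (0, 1), (9, 9)]:
    store.insert(v)

sealed_flags = [s.indexed for s in store.slices]
nearest = store.search((9, 9), 1)  # finds the vector in the open slice
```

The fourth vector lands in an open slice that has no index yet, but the search still returns it because results from every slice are merged.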

------
tlack
Another option in this very interesting space is GNES[1], which attempts to do
the encoding/decoding on its own, rather than just working with
feature/embedding vectors.

[1] [https://gnes.ai/](https://gnes.ai/)

------
setib
Does anyone know how to combine vector similarity search with more
conventional field-based search (using elasticsearch for example)? For
example, given a set of labeled images, a user should be able to compose a
query using a combination of filters (like size or description) along with a
reference image (the vector).

~~~
gujun720
We are working on this feature, which will allow users to perform hybrid
search (attributes plus feature vectors). You will also be able to code your
own scoring rules.

Again, hopefully it will be ready by the end of Q1 this year.
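Until something like that ships, one simple way to combine the two is to apply the attribute filters first and rank only the survivors by vector similarity. The sketch below assumes a small in-memory collection; `hybrid_search` and the record fields are made up for illustration and are not part of Milvus or Elasticsearch.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(items, query_vector, predicate, k):
    """Pre-filter on attributes, then rank the matches by similarity."""
    matching = [item for item in items if predicate(item)]
    ranked = sorted(
        matching,
        key=lambda item: cosine_similarity(item["vector"], query_vector),
        reverse=True,
    )
    return ranked[:k]


images = [
    {"id": "a", "width": 800, "description": "red shoe", "vector": (1.0, 0.0)},
    {"id": "b", "width": 200, "description": "red shoe", "vector": (0.9, 0.1)},
    {"id": "c", "width": 1024, "description": "blue hat", "vector": (0.0, 1.0)},
    {"id": "d", "width": 640, "description": "red boot", "vector": (0.7, 0.7)},
]

results = hybrid_search(
    images,
    query_vector=(1.0, 0.0),  # embedding of the reference image
    predicate=lambda img: img["width"] >= 600 and "red" in img["description"],
    k=2,
)
result_ids = [item["id"] for item in results]  # ["a", "d"]
```

Filtering first keeps the results exact but scans every match. The alternative, running an ANN search first and filtering afterwards, is faster on large collections but can return fewer than k hits when the filter is selective.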

------
bra-ket
How does it compare to the state of the art?
([https://github.com/erikbern/ann-benchmarks](https://github.com/erikbern/ann-benchmarks))

~~~
gujun720
We have some test reports in
[https://github.com/milvus-io/milvus/tree/master/tests](https://github.com/milvus-io/milvus/tree/master/tests)

At this moment, the IVF indexes are based on Faiss, so the performance is the
same as Faiss.

IVF_SQ8H is reworked from Faiss IVF SQ8. Its performance is much better, but
it requires a GPU.

We provide benchmark test procedures and tools.

Please check this:
[https://github.com/milvus-io/bootcamp/tree/master/EN_benchma...](https://github.com/milvus-io/bootcamp/tree/master/EN_benchmark_test)

~~~
rainmanwy
Cool stuff! Very easy to use, with good examples for getting started.

------
gravypod
There's not a lot of information on the site about the architecture or storage
solutions used. Do the authors have more info about this space?

~~~
gujun720
You may check our Medium site. We will post more tech details.

[https://medium.com/@milvusio](https://medium.com/@milvusio)

------
bobosha
Great to see another ANN tool available. FAISS and SPTAG were good, but this
appears to be much better. Not sure if this supports "online" learning i.e. is
a training phase required?

~~~
gujun720
Please check
[https://medium.com/@milvusio/managing-data-in-massive-scale-...](https://medium.com/@milvusio/managing-data-in-massive-scale-vector-search-engine-db2e8941ce2f)

It explains how Milvus manages vectors.

~~~
ffast-math
> "As each vector takes 2 KB space, the minimum storage space for 100 million
> vectors is about 200 GB"

Why are you not quantizing the vectors when you insert them? Bolt [1] and
Quicker-ADC [2] make 10-100x compression basically free for approximate search
(and also get you roughly 10x faster querying within a partition...)

[1] [https://github.com/dblalock/bolt](https://github.com/dblalock/bolt)

[2] [https://github.com/technicolor-research/faiss-quickeradc](https://github.com/technicolor-research/faiss-quickeradc)

~~~
gujun720
200 GB is the size of the original vectors. When creating an index, Milvus
supports IVF SQ8 and IVF PQ ADC.

Based on our users' experience, SQ8 is the most balanced one at this moment.
SQ8 provides 8x compression, higher accuracy and better performance.
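
For readers unfamiliar with the idea behind 8-bit scalar quantization, here is a minimal sketch. It is an assumption-laden illustration, not the Faiss or Milvus implementation: each dimension is mapped linearly from its observed range onto 0..255, so one byte is stored per dimension instead of a floating-point value. The function names are hypothetical.

```python
def train_sq8(vectors):
    """Record the per-dimension minimum and maximum of the training data."""
    dims = len(vectors[0])
    lows = [min(v[d] for v in vectors) for d in range(dims)]
    highs = [max(v[d] for v in vectors) for d in range(dims)]
    return lows, highs


def encode_sq8(vector, lows, highs):
    """Map each component onto an integer level in 0..255 (one byte each)."""
    codes = bytearray()
    for x, lo, hi in zip(vector, lows, highs):
        span = hi - lo
        level = 0 if span == 0 else round((x - lo) / span * 255)
        codes.append(min(255, max(0, level)))  # clamp out-of-range inputs
    return bytes(codes)


def decode_sq8(codes, lows, highs):
    """Approximate reconstruction of the original components."""
    return [lo + (c / 255) * (hi - lo) for c, lo, hi in zip(codes, lows, highs)]


data = [[0.0, -1.0], [0.5, 0.0], [1.0, 3.0]]
lows, highs = train_sq8(data)
codes = encode_sq8(data[1], lows, highs)
approx = decode_sq8(codes, lows, highs)
```

For values inside the trained range, the reconstruction error in each dimension is at most half of one quantization step, that is `(hi - lo) / 510`.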

------
azinman2
What are good answers to this in the embedded space, e.g. mobile?

~~~
gujun720
Milvus can run on ARM CPUs. We have ported it to the NVIDIA Jetson Nano and
Raspberry Pi 4 (4 GB memory) so far.

Most people told us running Milvus on ARM looked cool, but they were not sure
if they wanted to do this...

Please tell us your requirements and scenarios on ARM. It will really help.

------
kateshao0510
Any recommendations on which machine learning platforms to use?

~~~
gujun720
It's not about the ML platform.

It's about the ML scenarios. If you want to search through a huge amount of
unstructured data that has been vectorized (for example, by deep learning
models), Milvus will help you a lot.

Our users use Milvus in the scenarios below:

1. Chemical molecule analysis, searching SMILES-format vectors
2. Image retrieval applications, for example shopping websites
3. NLP
4. Recommendation systems
5. And more; we are collecting users' feedback

