As part of developing a system that requires searching by image, we needed to compute feature vectors for use with the Pinecone vector database. All the research we could find focused either on ML approaches, which were untenable because we hoped to perform vector generation in the browser, or on Hamming-distance vector comparison, which is untenable for large-scale search.
The README here contains my research into several algorithms' performance, and the repo contains the code that performed the data gathering.
The site alt-text.org is still alpha quality and under active development, and the library backing it is quite small, so most searches will fail, but feel free to play around with it.
Twitter users can help build the library with the link in the upper right corner, though it does not yet work on mobile.
Hmm, the smallest model I see is still 4.3 MB; are there smaller ones?
My quick read of the stats offered suggests that the smaller model's accuracy suffers considerably. I could definitely see running similar tests on it, though!
All that said, from my understanding the matching TensorFlow offers is also not exactly what I'm after. I'm primarily concerned with matching images that look identical to humans, possibly with small modifications such as size changes. Think "are these two images identical?" rather than "give me pictures of dogs."
That said, I don't see many good models available for download on TF Hub or Hugging Face that are optimized for this. But as a possible alternative, if you truly mean identical-to-humans, you can always programmatically modify your images (change the white balance, crop, rotate, select adjacent frames from videos, etc.), then train a network small enough to satisfy you and see if that works; see the sketch below.
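Something along these lines, as a rough sketch with Pillow; the specific transforms and parameters here are just placeholders, not tuned recommendations:

    # Rough augmentation sketch using Pillow; transform choices and
    # parameters are placeholders, not a recommendation.
    from PIL import Image, ImageEnhance

    def make_variants(path):
        img = Image.open(path).convert("RGB")
        w, h = img.size
        return {
            "resized": img.resize((w // 2, h // 2)),
            "cropped": img.crop((w // 10, h // 10, w - w // 10, h - h // 10)),
            "rotated": img.rotate(5, expand=True),
            "brighter": ImageEnhance.Brightness(img).enhance(1.2),
            "desaturated": ImageEnhance.Color(img).enhance(0.8),
        }

    # Pairing each variant with its original gives cheap positive pairs
    # for training or evaluating a matcher.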
Cool to see a vector search application that leverages traditional image features. I'm curious to know which of these methods performed best on cropped images, which remain somewhat of a challenge for traditional classification models (contrastive models trained specifically with image crops tend to work much better).
Also for the vector database itself, have you considered spinning up your own open-source alternative such as Milvus (https://github.com/milvus-io/milvus), or were you only considering managed services?
The full results on cropped images are in the "Correctness" tab of the Google Sheet linked at the bottom, with more detail in the "Scoring" tab. TL;DR: the intensity vector worked best, with Goldberg (the one I chose) a close second. Goldberg returned the correct image as the top-scored result in 79% of cases, and included it somewhere in the results in 90%.
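For context, by "intensity vector" I mean roughly this sort of thing; this is a simplified sketch rather than the exact code in the repo, and the grid size and normalization here are arbitrary:

    # Simplified sketch of a grayscale intensity vector: downscale to a
    # fixed grid and flatten the per-pixel intensities.
    from PIL import Image

    def intensity_vector(path, grid=16):
        img = Image.open(path).convert("L").resize((grid, grid))
        pixels = list(img.getdata())           # grid * grid values in 0..255
        return [p / 255.0 for p in pixels]     # normalized for cosine similarity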
I'm primarily interested in managed services. I've been an SRE and I hope to not be in that role again.
I also developed a service based on the Goldberg paper in the mid-2010s. We flattened and discretized the signature to make it searchable, which gave us pretty good results: https://github.com/ProvenanceLabs/image-match
Sorry, I haven't maintained the project in years, so it's unlikely to work out of the box. But who knows, maybe you'll find something useful here for your project!
The insertion rates were based on the Elasticsearch backend, with multiple shards/nodes (using the already flattened image signatures). Don't remember the specifics, unfortunately. I never tried it with more than 5 lightness buckets, sorry!
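From memory, the flattening was conceptually along these lines; this is a rough sketch, not the actual image-match code, and the bucketing here is simplified:

    import numpy as np

    def flatten_signature(signature, n_buckets=5):
        # Flatten the 2-D signature and discretize each element into a
        # small number of buckets, so it can be stored and compared as a
        # fixed-length field in a search backend.
        sig = np.asarray(signature).ravel()
        edges = np.linspace(sig.min(), sig.max(), n_buckets + 1)[1:-1]
        return np.digitize(sig, edges)   # integer bucket index per element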