
Show HN: Scalable reverse image search built on Kubernetes and Elasticsearch - alexkern
https://github.com/pavlovml/match
======
rhsimplex
Hi everyone, I'm the author of the underlying image matching library (
[https://github.com/ascribe/image-match](https://github.com/ascribe/image-match)
). First, thank you Alex for your contribution -- it makes image-match
much more useful for the typical user.

Just to answer a couple of questions in the comments:

Goldberg's algorithm (the one used in the image-match library) is not very
robust against arbitrary rotation -- around +/- 5 degrees should be ok. 90
degree rotations, mirror images, and color inversion are handled with the
`all_orientations` parameter but under the hood this is just manipulating the
image array and searching multiple times.
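
As a rough illustration of "manipulating the image array and searching multiple times" (a sketch, not the library's actual code): the function name `orientation_variants` is hypothetical, and the exact set of combinations the library tries is an assumption. Each returned array would be hashed and searched separately.

```python
import numpy as np

def orientation_variants(img, include_inverted=True):
    """Return rotated, mirrored and (optionally) colour-inverted copies
    of a 2-D uint8 grayscale array. Hypothetical helper for illustration."""
    variants = []
    for base in (img, np.fliplr(img)):        # original and mirror image
        for k in range(4):                    # 0, 90, 180, 270 degrees
            rotated = np.rot90(base, k)
            variants.append(rotated)
            if include_inverted:
                variants.append(255 - rotated)  # colour inversion for 8-bit data
    return variants

img = np.arange(6, dtype=np.uint8).reshape(2, 3)
variants = orientation_variants(img)
print(len(variants))  # one search per variant
```

This also shows why the option multiplies the query cost: every variant is an independent search.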

Even though the hash is a vector, and the similarity is a vector distance,
when it's time to search for an image, we don't compute every distance. The
hash vector is binned into integer "words" and we look up against these
columns. Only if there is a hit is the full signature computed. You can find
more details in the Goldberg paper (
[http://www.cs.cmu.edu/~hcwong/Pdfs/icip02.ps](http://www.cs.cmu.edu/~hcwong/Pdfs/icip02.ps)
).
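
A minimal sketch of the word idea, under assumptions: signature values are taken to lie in {-2, ..., 2}, each word is coarsened to {-1, 0, 1} and encoded as a base-3 integer, and the names `signature_to_words`, `is_candidate`, `k` and `n_words` are made up for this example. The library's real parameters and encoding may differ.

```python
import numpy as np

def signature_to_words(signature, k=4, n_words=8):
    """Cut a signature into n_words windows of length k and encode each
    window as one integer (illustrative parameters)."""
    sig = np.asarray(signature)
    # Evenly spaced window start positions across the signature.
    starts = np.linspace(0, len(sig) - k, n_words).astype(int)
    coarse = np.clip(sig, -1, 1) + 1      # map values to base-3 digits 0, 1, 2
    powers = 3 ** np.arange(k)
    return [int(np.dot(coarse[s:s + k], powers)) for s in starts]

def is_candidate(query_words, stored_words):
    """A stored image is a candidate if any word matches at the same position."""
    return any(q == s for q, s in zip(query_words, stored_words))

query = signature_to_words([2, -2, 0, 1, 0, 0, 0, 0], k=4, n_words=2)
near = signature_to_words([1, -1, 0, 2, 0, 0, 0, 0], k=4, n_words=2)
other = signature_to_words([0, 0, 0, 0, 1, 1, 1, 1], k=4, n_words=2)
print(query)                         # [65, 40]
print(is_candidate(query, near))     # True
print(is_candidate(query, other))    # False
```

Because the words are plain integers, the candidate lookup becomes an exact-match query on indexed columns, and the vector distance is only needed for the few rows that hit.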

Our original use case was a web image crawler, and hopefully we can release
that code someday too. In the meantime, if you decide to roll your own
crawler, be sure to use Elasticsearch's bulk API (
[https://www.elastic.co/guide/en/elasticsearch/reference/curr...](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)
) for the crawlers so as not to burden the Elasticsearch cluster
too much. We were able to get well over 10k inserts/s on a 5-node
Elasticsearch cluster (I don't remember how many worker nodes...the whole
thing is IO limited waiting for images to download for processing, so there is
even more optimization to be had there).
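
For anyone unfamiliar with the bulk API, here is a small sketch of the request body it expects: newline-delimited JSON, one action line followed by one document line, with a trailing newline. The index name `images`, the document fields, and the helper `bulk_body` are placeholders for illustration, not what our crawler used.

```python
import json

def bulk_body(docs, index="images"):
    """Build an NDJSON body for POST /_bulk from (doc_id, document) pairs.
    Hypothetical helper; index name and fields are placeholders."""
    lines = []
    for doc_id, doc in docs:
        # Action/metadata line, then the document source line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk endpoint requires the body to end with a newline.
    return "\n".join(lines) + "\n"

body = bulk_body([
    ("img-1", {"path": "http://example.com/a.jpg", "simple_word_0": 65}),
    ("img-2", {"path": "http://example.com/b.jpg", "simple_word_0": 40}),
])
print(body)
```

The body would be sent to the cluster's `/_bulk` endpoint with `Content-Type: application/x-ndjson`; batching many documents per request is what keeps the per-insert overhead low. Older Elasticsearch versions may also expect a `_type` in the action line.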

Thanks again, Alex!

------
teraflop
This is cool!

I'm curious about which perceptual hashing algorithm is being used. The README
says it's "invariant to scaling and rotation" but the approach described in
the linked paper is highly rotation-sensitive. (EDIT: it looks like the
implementation can handle multiples of 90 degrees, which is a bit better than
I thought at first)

~~~
alexkern
Thanks! :)

I've added a bit of documentation for the `all_orientations` parameter on POST
/search and clarified what we mean by "rotation". By default, Match searches for
all 90 degree rotations of the given image. We're open to improvements on the
hashing algo which would make it more flexible at the expense of accuracy when
provided a flag.

------
rjvir
Would be useful if they put up a simple site with an image uploader that demos
the search

~~~
alexkern
That would be awesome. A demo is in the works. :)

------
aub3bhat
Is it built for exact reverse image search or is it able to retrieve
semantically similar objects?

~~~
alexkern
Match is built for finding images that visually look similar. It doesn't
understand the semantics of the image itself. There's no way to attach
keywords, though plain-old-Elasticsearch is pretty good at that.

~~~
aub3bhat
Does the algorithm generate a binary hash, or does it generate a real-valued
vector?

I have been using TensorFlow to build something similar. It's significantly
more computationally expensive, but it lets you find semantically related
images. I still have to find an optimal encoding scheme for efficient nearest
neighbors.

E.g.

[https://raw.githubusercontent.com/AKSHAYUBHAT/VisualSearchSe...](https://raw.githubusercontent.com/AKSHAYUBHAT/VisualSearchServer/master/appcode/static/alpha4.png)

[https://raw.githubusercontent.com/AKSHAYUBHAT/VisualSearchSe...](https://raw.githubusercontent.com/AKSHAYUBHAT/VisualSearchServer/master/appcode/static/alpha3.png)

------
impostervt
How about a scalable web crawler/image indexer? ;)

~~~
alexkern
You read our mind. ;)

~~~
impostervt
So tineye?

~~~
XiZhao
Actually, the API looks remarkably similar. 1-1 even.

------
justinsayarath
wow super cool.

