
Image-Match: Open-source scalable reverse image search - trentmc
https://github.com/ascribe/image-match
======
danso
This is _sweet_. Image matching is one of those functions that external
services do extremely well for the consumer (e.g. Google Image Search and
TinEye) but not for those who need such a service in a private domain, such
as an in-house photo library. I've used pHash for comparisons and have a
decent idea of how to build my own classifiers, but pretty much no idea how
to do reverse-image matching efficiently and in a structured way.

FWIW, John Resig uses pastec for his work:

[http://ejohn.org/blog/image-similarity-search-wanted/](http://ejohn.org/blog/image-similarity-search-wanted/)

[http://ryanfb.github.io/etc/2015/11/03/finding_near-matches_...](http://ryanfb.github.io/etc/2015/11/03/finding_near-matches_in_the_rijksmuseum_with_pastec.html)

------
gobengo
The Wikipedia page on LSH is a fun read and relevant:
[https://en.wikipedia.org/wiki/Locality-sensitive_hashing](https://en.wikipedia.org/wiki/Locality-sensitive_hashing)

Here's a real-world use case:
[http://blog.livefyre.com/architecting-sidenotes/](http://blog.livefyre.com/architecting-sidenotes/)
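
For anyone who wants a feel for the idea before reading the page, here is a minimal sketch of one LSH family, random-hyperplane hashing for cosine similarity. It is not how image-match or the Livefyre system works; the function names, the bit count, and the seed are all made up for illustration.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """Pack one bit per hyperplane: 1 if the vector lies on its non-negative side."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")
```

Vectors separated by a small angle tend to agree on most bits, so a small Hamming distance between signatures suggests a candidate match that can then be checked with an exact comparison.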

------
richmarr
You might want to look at MoreLikeThis queries to boost performance. I worked
on a proprietary version of this and at the time Lucene performance dropped
off nearly linearly with the number of query terms.

We used MoreLikeThis to reduce our queries to the 30-40 most
statistically interesting terms. The one hiccup being an issue in Lucene [1]
where the term cache wasn't operating properly. We just added our own image
query term cache and a custom MLT query to leverage it, which gave us a 10x
speed bump over any other methods we tried.

The interestingness of the terms is assessed on a per-term basis though, so
you might see a relevance drop for some types of images if you set
MoreLikeThis to use too few terms.

[1]
[https://issues.apache.org/jira/browse/LUCENE-1690](https://issues.apache.org/jira/browse/LUCENE-1690)
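
A rough sketch of what capping the term count could look like with Elasticsearch's `more_like_this` query, built here as a plain Python dict. This is not the custom Lucene MLT query described above. The field name `signature_words` and the helper `build_mlt_query` are hypothetical, and the default cap of 35 is just a value in the 30-40 range mentioned.

```python
def build_mlt_query(signature_terms, field="signature_words", max_terms=35):
    """Build a more_like_this request body limited to max_terms query terms.

    `field` is a hypothetical field holding an image's signature words
    as whitespace-separated tokens.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": [field],
                # Free text to find similar documents for.
                "like": " ".join(signature_terms),
                # Keep only the most interesting terms, up to this cap.
                "max_query_terms": max_terms,
                # Each signature word may occur only once per image,
                # so lower the default frequency thresholds.
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        }
    }
```

The resulting dict would be passed as the body of a search request. Lowering `max_terms` trades relevance for speed, which is the trade-off described above.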

~~~
rhsimplex
Thank you for the suggestion. I actually did try restricting the terms by
measuring correlation between columns -- the idea being that more
discriminating terms should be searched first. This did result in modest
speedups.

Fortunately or unfortunately, we were already achieving pretty good speed with
Elasticsearch so we didn't implement it. However, it didn't occur to me to try
a MoreLikeThis query, which should be even simpler -- I will look into it!

~~~
richmarr
Cool. Impressive project by the way; I forgot to say that before.

I tried something similar, but with a different approach. I tried creating
compound words, a bit like n-grams. I didn't get it working as that was a
side-project and I couldn't commit enough time.
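
One possible reading of the compound-word idea, sketched in Python: join adjacent signature words into n-gram-like tokens so that each indexed term is more selective. This is a guess at the general shape, not the code from that side project; `compound_terms` and the `_` separator are invented for illustration.

```python
def compound_terms(words, n=2):
    """Join each run of n adjacent words into a single compound token."""
    return ["_".join(words[i:i + n]) for i in range(len(words) - n + 1)]
```

Each compound token matches only when all of its component words match in order, so there are fewer hits per term at the cost of being less tolerant of small differences between signatures.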

------
mkoryak
Last time I looked into doing some content-based image matching in Node.js,
the best I could find was a node-phash fork that was difficult to get working
on OS X.

Does anyone know if this has changed since?

------
pilooch
An implementation of image similarity search based on deep convnets can be
found at
[https://github.com/beniz/deepdetect/tree/master/demo/imgsear...](https://github.com/beniz/deepdetect/tree/master/demo/imgsearch)

------
th0br0
Uhh... that's a fork. This is the original repo:
[https://github.com/rhsimplex/image-match](https://github.com/rhsimplex/image-match)

~~~
michaelbuckbee
It appears that since your comment was posted the original repo's README was
updated with a notice stating it was no longer maintained and that OP's linked
repo is actually the correct one.

