
Evaluating Perceptual Image Hashes at OkCupid - based2
https://tech.okcupid.com/evaluating-perceptual-image-hashes-okcupid/
======
kaivi
This is all good, but how does one store, index and efficiently search for
near matches? I have a side project with terabytes of photo URLs, and
hashing and indexing them has been an itch I could not scratch.

~~~
mctx
Have you looked at
[https://github.com/dermotte/lire](https://github.com/dermotte/lire)? This
plugin for ElasticSearch uses it:
[https://github.com/kzwang/elasticsearch-image](https://github.com/kzwang/elasticsearch-image)

~~~
kaivi
Looks like it does all kinds of tricks at once -- histograms, SIFT/SURF
features and edge patterns. I was looking for a simpler solution, like an
effective MVP-tree implementation for perceptual hashes. Wouldn't that be
faster and overall better?
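
To make the idea concrete, here is a minimal sketch of a metric-tree index over perceptual hashes. It is a BK-tree rather than an MVP-tree (a simpler structure swapped in for brevity), and it assumes the hashes are fixed-width integers compared by Hamming distance. The names `hamming` and `BKTree` are hypothetical, not taken from any library.

```python
# Sketch only: a BK-tree over integer perceptual hashes (hypothetical names).

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


class BKTree:
    def __init__(self):
        # Each node is a tuple: (hash_value, {edge_distance: child_node}).
        self.root = None

    def add(self, h: int) -> None:
        if self.root is None:
            self.root = (h, {})
            return
        node = self.root
        while True:
            d = hamming(h, node[0])
            if d == 0:
                return  # identical hash is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (h, {})
                return
            node = child

    def search(self, h: int, max_dist: int):
        """Return sorted (distance, hash) pairs within max_dist of h."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            value, children = stack.pop()
            d = hamming(h, value)
            if d <= max_dist:
                results.append((d, value))
            # Triangle inequality: only subtrees whose edge distance lies in
            # [d - max_dist, d + max_dist] can contain matches.
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)
```

Usage would be something like `tree.add(h)` for every stored hash, then `tree.search(query_hash, 4)` to pull everything within 4 bits. Whether this beats a feature-based index at terabyte scale is something I have not measured.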

------
adityapatadia
We have developed a neural-net-based solution for finding images that are
nearly or even distantly similar. It can be trained to work on any type of
image and works pretty accurately.

You can check out the demo at:
[https://www.turingiq.com/demo/image/similar](https://www.turingiq.com/demo/image/similar)

~~~
alexcnwy
I suspect you could just use the content loss approach from the style transfer
examples.

Take, say, the second-to-last layer of a pre-trained convolutional neural
network (e.g. VGG) and just use that to compute cosine similarities between
images.

You would have to pick a cutoff with some manual testing, but I suspect that'd
work, and it can be done in a matter of hours using pre-trained weights in
Keras.
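
A rough sketch of the comparison step, assuming each image has already been turned into a feature vector by some pre-trained network (that extraction step is not shown here). The function names and the 0.9 cutoff are placeholders I made up; the cutoff would need tuning by hand as described above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def similar_pairs(embeddings, cutoff=0.9):
    """Return (name_a, name_b, similarity) for every pair at or above cutoff.

    `embeddings` maps an image name to its feature vector. The cutoff is an
    assumed placeholder, not a recommended value.
    """
    names = sorted(embeddings)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            s = cosine_similarity(embeddings[a], embeddings[b])
            if s >= cutoff:
                pairs.append((a, b, s))
    return pairs
```

This compares every pair, so it is only meant to show the scoring; a large collection would need an approximate nearest-neighbour index instead of the double loop.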

------
based2
[https://www.reddit.com/r/programming/comments/6efoqw/evaluat...](https://www.reddit.com/r/programming/comments/6efoqw/evaluating_perceptual_image_hashes_at_okcupid/)

------
olympusmountain
The obvious (yet politically controversial) solution is to just use one of the
many open face-recognition embedding CNNs and then combine it with a low-level
appearance descriptor.

~~~
lobster_johnson
The author pointed out on Reddit that a large proportion of the photos don't
have faces in them.

~~~
aisofteng
It is obvious that a large percentage of photos are not of people. It is
obvious that most photos uploaded to a dating website will have faces in them.

~~~
lobster_johnson
Did you even read the Reddit comment? [1]

        Largely the issue with this is that a lot of spammers
        don't have photos with faces [...]

[1]
[https://www.reddit.com/r/programming/comments/6efoqw/evaluat...](https://www.reddit.com/r/programming/comments/6efoqw/evaluating_perceptual_image_hashes_at_okcupid/diainlz/)

~~~
jbg_
Wouldn't you then just flag as spam any profiles with a high proportion of
photos with no faces in them?

------
mctx
I wonder if the author has looked into using SURF/ORB etc.?

