

Fast Exact Search in Hamming Space with Multi-Index Hashing (2012) - brettlangdon
http://www.cs.toronto.edu/~norouzi/research/mih/

======
putterson
I used this paper and the associated code in my undergraduate thesis. After
decoupling the C++ code from MATLAB, I was able to turn it into a library and
use it to search binary features instead of floating-point features hashed
with Locality Sensitive Hashing, giving exact k-NN instead of approximate.
The code was fast, but the benefits really show up with large numbers of codes
(pretty much what we want). Contact me if you would like to see the
performance of this code for various m and image corpora, or the rest of my
paper.

EDIT: also it's probably best to link to the homepage for the paper:
[http://www.cs.toronto.edu/~norouzi/research/mih/](http://www.cs.toronto.edu/~norouzi/research/mih/)
and the code:
[https://github.com/norouzi/mih/](https://github.com/norouzi/mih/)

~~~
dang
Ok, url changed from
[http://www.cs.toronto.edu/~norouzi/research/papers/multi_ind...](http://www.cs.toronto.edu/~norouzi/research/papers/multi_index_hashing.pdf).
Also, cool work!

------
brettlangdon
I found this paper while reading an article about Curalate; they used it as
the basis for their internal image de-duplication service.

[http://blog.underdog.io/post/120612462747/curalate-helping-t...](http://blog.underdog.io/post/120612462747/curalate-helping-the-worlds-greatest-brands)

------
billrobertson42
Out of curiosity, what sort of problem would you apply this to?

~~~
putterson
Well, this data structure lets us quickly find the k nearest neighbors in
Hamming space (think binary vectors, where distance is the number of differing
bits). If we can map features (for simplicity, let's think of images) to points
in Hamming space such that similar features have a small Hamming distance, then
we have a fast way of finding similar images. That is the problem this paper
proposes to help with, but I could easily see it also applying to sound clips
or other media, though my imagination is not that great. When I used this code,
I applied it to the problem of image matching using binary codes generated by a
feature detector [1]. The benefit of using binary codes over traditional
floating-point vectors is that they are much faster to compare.

[1]
[http://docs.opencv.org/modules/features2d/doc/feature_detect...](http://docs.opencv.org/modules/features2d/doc/feature_detection_and_description.html#orb)
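
To make the "much faster to compare" point concrete, here is a minimal Python sketch, assuming each binary code is stored as a plain integer. The Hamming distance is just an XOR followed by a bit count, and the baseline exact k-NN is a linear scan over all codes. The function names are hypothetical, and this is the brute-force baseline, not the multi-index method from the paper.

```python
import heapq

def hamming(a, b):
    """Number of differing bits between two codes stored as Python ints."""
    return bin(a ^ b).count("1")

def knn_linear(query, codes, k):
    """Exact k-NN by linear scan: (distance, index) pairs, nearest first."""
    return heapq.nsmallest(k, ((hamming(query, c), i) for i, c in enumerate(codes)))

codes = [0b10110010, 0b10110011, 0b01001101, 0b11110000]
print(knn_linear(0b10110000, codes, k=2))  # [(1, 0), (1, 3)]
```

The linear scan costs one XOR and one bit count per stored code, which is cheap per comparison but still grows with the size of the database. That growth is what an index over the codes is meant to avoid.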

~~~
billrobertson42
Thanks for the info. I've been digging around the OpenCV docs a bit after some
googlefu. Interesting to see other potential uses, though.

