

MinHash for Dummies (2013) - chesterfield
http://matthewcasperson.blogspot.com/2013/11/minhash-for-dummies.html

======
moab
This glosses over so much detail that really helps you understand MinHash and
LSH. I feel like Matt has really solid intentions, but this is one of those
algorithms where actually reading the math and understanding it helps you
write a good implementation. You also miss the intuition (random
projection) behind this sort of method by skipping the math.
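
For anyone who wants the core of that math in runnable form, here is a minimal
sketch (not the article's code). It relies on the standard MinHash property
that, for a random hash function, the probability that two sets share the same
minimum hash value equals their Jaccard similarity, so the fraction of matching
signature slots estimates it. The helper names (`_hash`, `minhash_signature`,
`estimate_jaccard`, `exact_jaccard`) and the use of seeded SHA-1 as a stand-in
for a family of random hash functions are assumptions made for illustration.

```python
import hashlib


def _hash(seed, item):
    # Assumption: seeding SHA-1 with an integer approximates drawing an
    # independent random hash function per seed.
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=256):
    # One slot per hash function: the minimum hash value over the set.
    # Assumes `items` is non-empty.
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    # Fraction of slots where both sets produced the same minimum.
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    # Reference value: |A ∩ B| / |A ∪ B|.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)
```

Comparing `estimate_jaccard(minhash_signature(a), minhash_signature(b))` with
`exact_jaccard(a, b)` on a couple of overlapping sets shows the estimate
tightening as `num_hashes` grows; the standard error shrinks roughly as
1/sqrt(num_hashes).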

For book chapters: Sariel Har-Peled's comp-geo book has a pretty solid chapter
on LSH from a geometrical point of view and builds it up from basics in
Hamming space.

Otherwise the Ullman book the author recommends is pretty excellent.
[http://infolab.stanford.edu/~ullman/mmds.html](http://infolab.stanford.edu/~ullman/mmds.html)

~~~
dang
> Sariel Har-Peled's comp-geo book has a pretty solid chapter on LSH from a
> geometrical point of view

Do you mean this one?

[http://www.amazon.com/Geometric-Approximation-Algorithms-Mat...](http://www.amazon.com/Geometric-Approximation-Algorithms-Mathematical-Monographs/dp/0821849115)

------
colin_mccabe
I thought this was a good CliffsNotes-style summary of MinHash. Thanks for
posting it, Matt.

It looks like your intro paragraph has some people here on HN convinced that
you hate math... might want to change that.

Of course the Ullman chapter is also pretty readable, and if you have the
patience for 50-some pages, there's a lot more detail and rigor there.

------
jamesfisher
> Before I start, please take a look at [this other, clearer
> explanation](http://infolab.stanford.edu/~ullman/mmds/ch3.pdf)

FTFY

