For example, look at Word2Vec. It "indexes" / "hashes" semantically similar words into the same "bucket", which means you can find semantically similar data just by inspecting what is nearby. This has all kinds of applications.
also see https://en.wikipedia.org/wiki/Sparse_distributed_memory
It groups words together based on their distributional characteristics, and with the right parameter tuning, that can be a good-enough model of similarity.
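To make the "nearby means similar" idea concrete, here's a minimal sketch with made-up 3-d vectors (a real Word2Vec model learns hundreds of dimensions from a corpus; these values are purely illustrative):

```python
import math

# Toy "embeddings" — hypothetical values, just to show the lookup idea.
vectors = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
    "pear":  [0.12, 0.18, 0.88],
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest(word):
    # The "semantic search": rank every other word by similarity.
    return max((w for w in vectors if w != word),
               key=lambda w: cosine(vectors[word], vectors[w]))

print(nearest("king"))   # prints "queen"
```

The point is just that once similar things land near each other in the space, "search" reduces to a nearest-neighbor scan.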
Thanks for sharing!
But he linked to my SMHasher.
I Googled "Cache-Conscious Hashing" but didn't quickly find anything promising :(.
You can cache the hash in the entry itself, or not. You can also compress the entries, though mostly with linear collision-resolution structures.
Best paper: "Cache-Conscious Collision Resolution in String Hash Tables", Askitis 2005.
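The hash-caching trick is easy to sketch. This is my own toy version (not from the paper): a linear-probing table whose entries store the full hash alongside the key, so collision checks do a cheap integer compare before any key compare, and resizing never re-hashes a key.

```python
class CachedHashTable:
    """Linear probing; each slot holds (cached_hash, key, value)."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity
        self.size = 0

    def _probe(self, key, h):
        # Walk the probe sequence until we hit the key or an empty slot.
        i = h % len(self.slots)
        while self.slots[i] is not None:
            eh, ek, _ = self.slots[i]
            # Cached-hash compare filters out almost all mismatches
            # before the (potentially expensive) key comparison.
            if eh == h and ek == key:
                return i
            i = (i + 1) % len(self.slots)
        return i

    def put(self, key, value):
        if self.size * 2 >= len(self.slots):
            self._grow()
        h = hash(key)
        i = self._probe(key, h)
        if self.slots[i] is None:
            self.size += 1
        self.slots[i] = (h, key, value)

    def get(self, key):
        i = self._probe(key, hash(key))
        return self.slots[i][2] if self.slots[i] is not None else None

    def _grow(self):
        # The cached hash means no key is ever re-hashed on resize.
        old = [e for e in self.slots if e is not None]
        self.slots = [None] * (len(self.slots) * 2)
        for h, k, v in old:
            self.slots[self._probe(k, h)] = (h, k, v)
```

For string keys the win is that most probe steps never touch the key bytes at all, which is friendlier to the cache.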
"In response to the findings of the Google/MIT collaboration, Peter Bailis and a team of Stanford researchers went back to the basics and warned us not to throw out our algorithms book just yet. Bailis’ and his team at Stanford recreated the learned index strategy, and were able to achieve similar results without any machine learning by using a classic hash table strategy called Cuckoo Hashing"
"We can do your ml hashing with ordinary hashing" is a statement about both ordinary hashing and ml hashing, indeed a fairly strong statement about each.
Whether it's true is a different matter, but to say it's "nothing about ml hashing" seems unsupportable.
>(If you’re already familiar with hash tables, collision handling strategies, and hash function performance considerations; you might want to skip ahead, or skim this article and read the three articles linked at the end of this article for a deeper dive into these topics.)
One can "compress" the Factor Tables using statistical "stereotyping" as a kind of a lossy learning technique. You have systemic control over the size-versus-accuracy tuning of the hashing (indexing).
Similarly, we learn to recognize or respond to patterns without remembering each specific instance we encounter, which can itself be seen as a lossy form of hashing.
The Case for Learned Index Structures