Cool. I was wondering what Tanimoto meant, since I've tried to familiarize myself with all the useful similarity measures, and apparently it's just another name for the Jaccard index (at least for sets and binary vectors).
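To make the equivalence concrete, here is a small sketch (plain Python, function names are my own and hypothetical) computing the Jaccard index on sets, the Tanimoto coefficient on bit fingerprints stored as integers, and the continuous Tanimoto form x·y / (|x|² + |y|² − x·y). For 0/1 data all three agree. Returning 1.0 when both inputs are empty is an assumed convention, not something standard across libraries.

```python
def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|; two empty sets are treated as identical (assumed convention)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def tanimoto_bits(a: int, b: int) -> float:
    # Tanimoto on bit fingerprints: c / (n_a + n_b - c), c = number of shared on-bits
    c = bin(a & b).count("1")
    total = bin(a).count("1") + bin(b).count("1") - c
    return c / total if total else 1.0


def tanimoto_vectors(x, y) -> float:
    # Continuous Tanimoto: x·y / (|x|^2 + |y|^2 - x·y); reduces to Jaccard for 0/1 vectors
    dot = sum(p * q for p, q in zip(x, y))
    denom = sum(p * p for p in x) + sum(q * q for q in y) - dot
    return dot / denom if denom else 1.0


# Same data in three encodings: items {1, 2, 3} vs {2, 3, 4}
print(jaccard({1, 2, 3}, {2, 3, 4}))
print(tanimoto_bits(0b0111, 0b1110))
print(tanimoto_vectors([1, 1, 1, 0], [0, 1, 1, 1]))
```

Each call scores 2 shared items out of 4 distinct ones. The continuous form is where "Tanimoto" stops being a pure synonym: it also accepts real-valued vectors.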
I do think there is a lot of potential in exploring more sensitive measures of similarity or statistical dependence. It seems like the ML community has basically decided that all the heavy lifting should be done at the model embedding level, so you can just use cosine similarity for speed and let the answers "fall out". That's definitely nice, because it lets you search across millions of records per second.
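Part of why cosine similarity is so fast: if every stored embedding is normalized to unit length once, each comparison becomes a single dot product. Below is a minimal brute-force sketch of that idea in plain Python, assuming non-zero vectors; `top_k_cosine` is a hypothetical helper, and real systems would use vectorized matrix products or an approximate nearest-neighbour index instead of a Python loop.

```python
import heapq
import math


def normalize(vec):
    # Scale to unit length so cosine similarity reduces to a plain dot product
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def top_k_cosine(query, records, k=2):
    # Brute-force search returning (score, record_index) pairs, best first
    q = normalize(query)
    index = [normalize(r) for r in records]  # in practice this is done once, offline
    scores = ((sum(a * b for a, b in zip(q, r)), i) for i, r in enumerate(index))
    return heapq.nlargest(k, scores)


records = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k_cosine([1.0, 0.0], records, k=2))
```

Here record 0 matches exactly (score 1.0) and record 2 comes second with a score of about 0.707.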
But there are some lesser-known measures of similarity/dependence that can pick up on more subtle relationships, such as nonlinear or non-monotonic ones. The big drawback is that they are slow. I included a couple of these exotic ones in my project, Hoeffding's D and HSIC, mostly out of curiosity; computed naively, both scale quadratically with the number of samples.
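As an illustration of the kind of relationship these measures catch, here is a minimal sketch of the common biased HSIC estimator, tr(KHLH) / (n − 1)², with H = I − (1/n)11ᵀ. It is not the implementation from my project. Assumptions: 1-D samples, a Gaussian (RBF) kernel with a fixed bandwidth `sigma=1.0` (a median-distance heuristic is a common alternative), and plain Python for readability. The example uses y = x² on a symmetric grid, where Pearson correlation is zero but HSIC is not. The two n×n Gram matrices are where the quadratic cost comes from.

```python
import math


def rbf_gram(values, sigma=1.0):
    # Gaussian kernel matrix: k(a, b) = exp(-(a - b)^2 / (2 * sigma^2))
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in values] for a in values]


def center(gram):
    # Double-centre a Gram matrix, i.e. compute H K H with H = I - (1/n) 11^T
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + grand for j in range(n)] for i in range(n)]


def hsic(x, y, sigma=1.0):
    # Biased HSIC: tr(K H L H) / (n - 1)^2 = sum_ij (HKH)_ij * L_ij / (n - 1)^2 (L symmetric)
    n = len(x)
    kc = center(rbf_gram(x, sigma))
    gram_y = rbf_gram(y, sigma)
    return sum(kc[i][j] * gram_y[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


def pearson(x, y):
    # Ordinary Pearson correlation, for contrast
    n = len(x)
    mx, my = math.fsum(x) / n, math.fsum(y) / n
    cov = math.fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(math.fsum((a - mx) ** 2 for a in x))
    sy = math.sqrt(math.fsum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [(i - 10) / 10 for i in range(21)]  # symmetric grid on [-1, 1]
y = [v * v for v in x]                  # fully determined by x, but not monotonic
print(pearson(x, y))  # essentially zero: no linear relationship
print(hsic(x, y))     # positive: the dependence is detected
```

Because the centred Gram matrices are positive semi-definite, this estimator is never negative; it is zero (up to rounding) when one variable is constant.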