If you're interested in clustering text documents, the canonical algorithm would...

If you're interested in clustering text documents, the canonical algorithm would be latent Dirichlet allocation, which is a topic modeling algorithm. You can find latent Dirichlet allocation in sklearn; however, you're more looking for something that returns a raw similarity score it sounds like, in which case it might be interesting to check out word2vec. Perhaps checkout this stack overflow answer: https://stackoverflow.com/questions/22129943/how-to-calculat...