If you're interested in clustering text documents, the canonical algorithm would be latent Dirichlet allocation (LDA), which is a topic modeling algorithm. You can find LDA in sklearn. However, it sounds like you're looking for something that returns a raw similarity score, in which case word2vec might be worth a look. This Stack Overflow answer may help: https://stackoverflow.com/questions/22129943/how-to-calculat...
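A minimal sketch of the LDA route, assuming scikit-learn is installed. The toy corpus and the choice of two topics are hypothetical. Comparing topic mixtures with cosine similarity is just one option; Hellinger distance is another common choice for probability distributions. The code fits LDA on word counts, treats each document's most probable topic as its cluster, and builds a pairwise similarity matrix:

```python
# Minimal sketch: fit LDA on a toy corpus with scikit-learn, then compare
# documents by the cosine similarity of their topic mixtures.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus; replace with your own documents.
docs = [
    "the cat sat on the mat with another cat",
    "dogs and cats are common household pets",
    "stock markets fell as interest rates rose",
    "investors worry about inflation and interest rates",
]

# LDA works on raw term counts, not TF-IDF weights.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# n_components (the number of topics) is an assumption picked for the toy data.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # shape (n_docs, n_topics); rows sum to 1

# Hard clustering: assign each document to its most probable topic.
clusters = doc_topics.argmax(axis=1)

# Pairwise similarity scores between the documents' topic mixtures.
similarity = cosine_similarity(doc_topics)

print(clusters)
print(similarity.round(2))
```

On real data you would tune n_components. On a corpus this small, the topic assignments can shift with random_state, so treat the output as illustrative only.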



Thank you very much, I'll look into those.
