
Very interesting, but I just tested this and got better performance from a text-similarity process that uses a Word2Vec model to represent text documents as vectors and then computes the cosine similarity between those vectors. Here is that code: https://github.com/jimmc414/document_intelligence/blob/main/... It does require a 3GB download of a pretrained word2vec embedding model. An explanation is provided in https://github.com/jimmc414/document_intelligence/blob/main/...
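
For anyone who wants the gist of the Word2Vec approach without the 3GB download, here is a minimal sketch. It is not the code from the linked repo. It assumes a document vector is the plain average of its word vectors (one common pooling choice; the linked code may do something else), and TOY_EMBEDDINGS is a made-up 3-dimensional table standing in for the real pretrained model. The names document_vector, cosine_similarity and document_similarity are hypothetical.

```python
import math

# Hypothetical toy vectors standing in for a pretrained word2vec model.
# A real model would map each word to a few hundred dimensions.
TOY_EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "pet": [0.85, 0.15, 0.0],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.1, 0.0, 0.95],
}


def document_vector(text, embeddings, dim):
    """Average the vectors of all in-vocabulary words (assumed pooling choice)."""
    total = [0.0] * dim
    count = 0
    for word in text.lower().split():
        vec = embeddings.get(word)
        if vec is None:
            continue  # skip out-of-vocabulary words
        total = [t + v for t, v in zip(total, vec)]
        count += 1
    if count == 0:
        return total  # all zeros: no known words in the document
    return [t / count for t in total]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def document_similarity(text_a, text_b, embeddings=TOY_EMBEDDINGS, dim=3):
    """Compare two whole documents by the cosine of their averaged vectors."""
    vec_a = document_vector(text_a, embeddings, dim)
    vec_b = document_vector(text_b, embeddings, dim)
    return cosine_similarity(vec_a, vec_b)


if __name__ == "__main__":
    print(document_similarity("cat dog", "pet dog"))
    print(document_similarity("cat dog", "stock market"))
```

With real embeddings you would swap the toy dict for the loaded model and pass its vector size as dim. Averaging throws away word order, which probably matters less when comparing entire files than when comparing single sentences.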

Here is the gzip kNN implementation I tested: https://github.com/jimmc414/document_intelligence/blob/main/...
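
For comparison, a rough sketch of the gzip kNN idea, again not the linked code. It assumes the usual formulation: normalized compression distance (NCD) between two texts, followed by a majority vote among the k nearest labeled documents. The function names and the sample documents are made up for illustration.

```python
import gzip
from collections import Counter


def compressed_size(text):
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x, y):
    """Normalized compression distance: smaller means more shared structure."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_classify(query, labeled_docs, k=1):
    """Label the query by majority vote among its k nearest documents by NCD."""
    ranked = sorted(labeled_docs, key=lambda item: ncd(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up training documents, purely for illustration.
    training = [
        ("the central bank raised interest rates to fight inflation", "finance"),
        ("the team scored a late goal to win the football match", "sports"),
    ]
    query = "the central bank raised interest rates to fight rising inflation"
    print(knn_classify(query, training, k=1))
```

No model download is needed, but every query is compressed against every training document, so cost grows linearly with the size of the corpus.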

I will note that I am comparing entire text files in these implementations, not sentences.



