Very interesting, but I just tested this out and got better performance from a text-similarity process that uses a Word2Vec model to represent text documents as vectors and then computes the cosine similarity between those vectors. Here is that code: https://github.com/jimmc414/document_intelligence/blob/main/... It does require a 3GB download of a pretrained word2vec embedding model. An explanation is provided in https://github.com/jimmc414/document_intelligence/blob/main/...

I will note that I am comparing entire text files in these implementations, not sentences.
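
For anyone who wants the gist without opening the repo, here is a rough sketch of the general idea (not the linked code itself). It assumes each document vector is the plain average of its word vectors, which is one common choice and may not match what the repo does. The tiny `TOY_VECTORS` table and all function names are made up for illustration. In practice you would swap in the pretrained vectors, for example loaded with gensim's `KeyedVectors.load_word2vec_format`.

```python
import math
import re

# Hypothetical 3-dimensional embeddings standing in for the pretrained
# Word2Vec model (real vectors typically have hundreds of dimensions).
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "pet": [0.7, 0.3, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.1, 0.0, 0.8],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def document_vector(text, vectors):
    """Average the vectors of all in-vocabulary words (assumed strategy).

    Returns None when no word of the document is in the vocabulary.
    """
    found = [vectors[word] for word in tokenize(text) if word in vectors]
    if not found:
        return None
    dim = len(found[0])
    return [sum(vec[i] for vec in found) / len(found) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def document_similarity(text_a, text_b, vectors=TOY_VECTORS):
    """Similarity of two whole documents; 0.0 if either has no known words."""
    vec_a = document_vector(text_a, vectors)
    vec_b = document_vector(text_b, vectors)
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine_similarity(vec_a, vec_b)


if __name__ == "__main__":
    print(document_similarity("The cat and the dog", "A pet cat"))
    print(document_similarity("The cat and the dog", "The stock market"))
```

With the toy table, the two animal documents should score much closer to each other than either does to the stock market one. Since the whole file is averaged into a single vector, this compares entire documents rather than individual sentences.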