
Word Mover's Embedding: Cheap WMD For Documents - vackosar
https://vaclavkosar.com/ml/Word-Movers-Embedding-Cheap-WMD-For-Documents
======
SomewhatLikely
Thanks for introducing a new (to me) idea. I didn't watch the video but I felt
the write-up could have been more cohesive. Perhaps just a conclusion to tie
all the ideas together. I'm also left wondering why we would use this WME
approach over other document embedding techniques (averaging word vectors,
paragraph vectors, smooth inverse frequency weighting, etc.). Is it faster,
does it give better similarity estimates, etc.?

------
gojomo
Interesting idea!

Perhaps the 'random' docs could instead be generated (or even trained) to give
the new embeddings even greater significance.

For example: after doing LDA, generate a 'paragon' doc of each topic. Or
coalescing all docs of a known label together, then reducing them to D summary
pseudo-words – the D 'words' with minimum total WMD to all docs of the same
label. Or adding further R docs into regions of maximum confusion.
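To make the idea above concrete, here is a minimal sketch of the WME recipe the comment builds on: embed a document as its kernelized distances to R reference docs, where the reference docs could be random samples or, as suggested, topic paragons / per-label summaries. This is a sketch under stated assumptions, not the paper's implementation: it uses the relaxed nearest-neighbor lower bound on WMD instead of exact optimal-transport WMD, toy random word vectors instead of pretrained ones, and a made-up kernel width `gamma`.

```python
import math
import random

random.seed(0)

# Toy word vectors; in practice these would come from word2vec/GloVe.
VOCAB = ["cat", "dog", "pet", "car", "road", "drive", "food", "eat"]
DIM = 4
vectors = {w: [random.gauss(0, 1) for _ in range(DIM)] for w in VOCAB}

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def relaxed_wmd(doc_a, doc_b):
    """Cheap lower bound on WMD: each word in doc_a moves wholesale to its
    nearest word in doc_b, skipping the full optimal-transport solve."""
    return sum(
        min(euclidean(vectors[w], vectors[v]) for v in doc_b)
        for w in doc_a
    ) / len(doc_a)

def wme_embedding(doc, reference_docs):
    """Embed a doc as its kernelized distances to R reference docs."""
    gamma = 1.0  # kernel width; an assumed hyperparameter
    return [math.exp(-gamma * relaxed_wmd(doc, ref)) for ref in reference_docs]

# R reference docs of D pseudo-words each: random here, but per the comment
# they could instead be LDA topic 'paragons' or per-label summary words.
R, D = 3, 2
reference_docs = [random.sample(VOCAB, D) for _ in range(R)]

print(wme_embedding(["cat", "dog", "pet"], reference_docs))
```

Swapping the random `reference_docs` for trained or label-derived ones, as the comment proposes, only changes how that list is built; the embedding step stays the same.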

