>> What is the dimensionality of each word vector and what does a word's position in this space "mean"? What is this dimensionality determined by?
Each dimension is, roughly, a new way that words can be similar or dissimilar. I've got 1000-dimensional vectors, so words can be similar or dissimilar in only a thousand 'ways'; associations like 'luxury', 'thoughtful', 'person', 'place', or 'object' get learned (roughly speaking). Of course, real words are far more diverse, so this is an approximation. The dimensionality is configurable, and in theory more dimensions means more contrast is captured, but you need more training data. In practice I chose 1000 because that maxes out my large-memory machine. That said, the word2vec paper shows good results at 1000 dimensions, so it doesn't seem to be a bad choice.
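To make "similar in this space" concrete, here's a toy sketch of the kind of lookup involved. The vocabulary and vectors below are random stand-ins, not the real word2vec output; with trained vectors the nearest neighbours of 'paris' would actually be meaningful.

    import numpy as np

    # Toy stand-ins for the real data: a tiny vocabulary and random
    # 1000-dimensional vectors (the real ones come out of word2vec).
    vocab = ['paris', 'london', 'france', 'england', 'luxury', 'snow']
    rng = np.random.default_rng(0)
    vectors = rng.standard_normal((len(vocab), 1000)).astype(np.float32)

    # Normalise rows so that a dot product is a cosine similarity:
    # two words are "similar" when their vectors point the same way.
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

    def most_similar(word, k=3):
        # Brute-force nearest neighbours by cosine similarity.
        sims = vectors @ vectors[vocab.index(word)]
        order = np.argsort(-sims)
        return [(vocab[i], float(sims[i])) for i in order if vocab[i] != word][:k]

    print(most_similar('paris'))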
>> Have you tried any dimensionality reduction algorithms like PCA or Isomap?
Yes! I've tried out PCA, and some spectral biclustering using the off-the-shelf algorithms in scikit-learn. I only played around with this for an hour or so and got discouraging results. Nevertheless, the word2vec papers show that this works really well for projecting France, USA, Paris, DC, London, etc. onto a two-dimensional plane where the axes roughly correspond to countries & capitals -- exactly what you'd hope for! I wasn't able to replicate that, but Tomas Mikolov was!
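For reference, the 2-D projection itself is only a few lines with scikit-learn. The sketch below uses made-up random vectors in place of the trained ones, which is exactly why it won't reproduce the nice country/capital axes from the paper:

    import numpy as np
    from sklearn.decomposition import PCA

    # Hypothetical stand-in: map each word of interest to a random
    # 1000-D vector (the real ones would come from word2vec).
    rng = np.random.default_rng(1)
    words = ['france', 'paris', 'usa', 'washington', 'england', 'london']
    vectors = {w: rng.standard_normal(1000) for w in words}

    # Project just these words onto the top two principal components;
    # in the word2vec papers the two axes come out roughly aligned
    # with "country-ness" and "capital-ness".
    X = np.array([vectors[w] for w in words])
    coords = PCA(n_components=2).fit_transform(X)

    for word, (x, y) in zip(words, coords):
        print(f'{word:12s} {x:+.3f} {y:+.3f}')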
>> It would be interesting to find the word vectors that contain the most variation across all of wikipedia.
Hmm, interesting indeed! I'm not sure how I'd go about measuring 'variation' -- would this amount to isolating word clusters and finding the densest ones? Something like finding a cluster with a hundred variations of the word 'snow' (if you're Inuit)? I'd be willing to part with the raw vector database if there's interest.
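If 'variation' does mean dense clusters, one rough way to poke at it would be something like the sketch below: cluster the vocabulary and rank clusters by how tightly packed they are. The vectors here are random stand-ins and 50 clusters is an arbitrary choice.

    import numpy as np
    from sklearn.cluster import KMeans

    # Random stand-in for the real word vectors: one row per word.
    rng = np.random.default_rng(2)
    vectors = rng.standard_normal((2000, 1000)).astype(np.float32)

    # Cluster the vocabulary, then rank clusters by how tightly packed
    # they are (mean distance of members to their centroid). A very
    # dense cluster would be the "hundred words for snow" case.
    km = KMeans(n_clusters=50, n_init=10, random_state=0).fit(vectors)
    for c in range(km.n_clusters):
        members = vectors[km.labels_ == c]
        spread = np.linalg.norm(members - km.cluster_centers_[c], axis=1).mean()
        print(c, len(members), round(float(spread), 3))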
>> Have you tried any other nearest neighbor search methods other than a simple dot product, such as locality sensitive hashing?
Only a little bit, although I'm very interested in finding a faster approach than computing the whole damn dot product against every word (see: https://news.ycombinator.com/item?id=6720359). I worry that traditional locality-sensitive hashes, kd-trees, and the like work well for 3D locations but miserably for 1000-dimensional data like I have here.
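One LSH flavour that does seem suited to this kind of data is random-hyperplane hashing for cosine similarity, since it only cares about directions rather than the high-dimensional distances that hurt kd-trees. A minimal sketch, with random stand-in vectors, an arbitrary 16-bit key, and none of the multi-table tricks a real setup would need:

    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(3)
    n_words, dim, n_bits = 50000, 1000, 16

    # Random stand-in for the normalised word2vec vectors.
    vectors = rng.standard_normal((n_words, dim)).astype(np.float32)
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

    # Random-hyperplane LSH: each hyperplane contributes one sign bit,
    # so vectors pointing the same way tend to share a bucket key.
    planes = rng.standard_normal((n_bits, dim)).astype(np.float32)

    def bucket_key(v):
        return ((planes @ v) > 0).tobytes()

    buckets = defaultdict(list)
    for i, v in enumerate(vectors):
        buckets[bucket_key(v)].append(i)

    def approx_most_similar(query, k=10):
        # Only score the words that landed in the query's bucket,
        # instead of dotting the query with the whole vocabulary.
        idx = buckets[bucket_key(query)]
        sims = vectors[idx] @ query
        order = np.argsort(-sims)[:k]
        return [(idx[j], float(sims[j])) for j in order]

    print(approx_most_similar(vectors[0]))

In practice you'd use several independent hash tables (or shorter keys) so that true neighbours aren't lost when they land one bit away from the query's bucket.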
I should reiterate that most of the hard work revolves around the word2vec algorithm, which I used but didn't write. It's awesome; check it and the papers out here: https://code.google.com/p/word2vec/
Build your kd-tree on the vectors expressed in the eigenvector basis. If the eigenvalues decrease fast enough, you can get bounds on the dot product while going only a few levels deep.
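In case it helps, here's a rough, simplified sketch of that idea: rotate into the PCA/eigenvector basis, build a kd-tree on the top components where most of the variance lives, shortlist candidates there, and only then do exact dot products in the full space. It's a filter-then-verify approximation rather than the exact bounding scheme described above, all sizes are arbitrary, and the vectors are random stand-ins.

    import numpy as np
    from scipy.spatial import cKDTree
    from sklearn.decomposition import PCA

    # Random stand-in for the normalised word vectors.
    rng = np.random.default_rng(4)
    vectors = rng.standard_normal((20000, 1000)).astype(np.float32)
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

    # Rotate into the eigenvector (PCA) basis and keep the top
    # components, then build the kd-tree in that reduced space.
    pca = PCA(n_components=32).fit(vectors)
    reduced = pca.transform(vectors)
    tree = cKDTree(reduced)

    def approx_neighbours(i, k=10, shortlist=200):
        # Shortlist by distance in the reduced space, then re-rank the
        # survivors with exact dot products in the full 1000-D space.
        _, idx = tree.query(reduced[i], k=shortlist)
        sims = vectors[idx] @ vectors[i]
        order = np.argsort(-sims)[:k]
        return [(int(idx[j]), float(sims[j])) for j in order]

    print(approx_neighbours(0))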
Don't most search engines use an inverted index to find the similarity between the query vector and the document vectors? (instead of doing the dot product with every document)
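Inverted indexes pay off when the vectors are sparse (most term weights are zero), which is the search-engine case; dense 1000-D word2vec vectors don't have zeros to skip. A toy sketch of the sparse setup, with made-up documents:

    from collections import defaultdict

    # Toy sparse "documents": term -> weight maps.
    docs = {
        0: {'paris': 0.8, 'france': 0.6},
        1: {'london': 0.7, 'england': 0.7},
        2: {'paris': 0.5, 'luxury': 0.9},
    }

    # Inverted index: term -> postings list of (doc id, weight).
    index = defaultdict(list)
    for doc_id, terms in docs.items():
        for term, w in terms.items():
            index[term].append((doc_id, w))

    def search(query):
        # Accumulate dot-product scores by walking only the postings of
        # the query's terms, never touching unrelated documents.
        scores = defaultdict(float)
        for term, qw in query.items():
            for doc_id, w in index[term]:
                scores[doc_id] += qw * w
        return sorted(scores.items(), key=lambda p: -p[1])

    print(search({'paris': 1.0, 'france': 0.5}))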
Whoa, that was a lot. Thanks!