
Language Models, Word2Vec, and Efficient Softmax Approximations - rvarma
http://rohanvarma.me/Word2Vec/
======
kirillkh
How can this be used for full-text search, e.g. with Lucene? The first step in
indexing a document for full-text search is reducing each word to its base
form, and similarly for a search string. While it's not a difficult problem in
English, in some languages (e.g. Hebrew) it's notoriously hard to determine
the base form of a word and disambiguate its meaning, since the only way to
do so is from context. So how could you build a stemmer/lemmatizer on top of
these tools to perform such a task?

~~~
visarga
Run Doc2Vec (or Word2Vec) on a large corpus of text or download pretrained
vectors. To compute a document vector, take a linear combination of the word
vectors in the document according to TFIDF. Now that you have vectors for each
document, you need to create a fast index with a library called "Annoy". It
can do very fast similarity search in vector space for millions of documents.
I think this approach is faster than grep and doesn't need to bother with
stemming. It will automatically know that "machine learning" and "neural
nets" are related, so it gives you a kind of fuzzy search.
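
For concreteness, a minimal sketch of that pipeline, assuming gensim,
scikit-learn, and Annoy are installed; the two-document corpus and the
pretrained-vector file path are placeholders:

    import numpy as np
    from gensim.models import KeyedVectors
    from sklearn.feature_extraction.text import TfidfVectorizer
    from annoy import AnnoyIndex

    docs = ["machine learning with neural nets",
            "fast similarity search in vector space"]

    # Pretrained word vectors; the file path is a placeholder.
    wv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)
    dim = wv.vector_size

    # TF-IDF weights per document.
    tfidf = TfidfVectorizer()
    weights = tfidf.fit_transform(docs)
    vocab = tfidf.get_feature_names_out()

    def doc_vector(row):
        # Linear combination of word vectors, weighted by TF-IDF.
        vec = np.zeros(dim)
        for idx in row.nonzero()[1]:
            if vocab[idx] in wv:
                vec += row[0, idx] * wv[vocab[idx]]
        return vec

    # Index the document vectors with Annoy for fast approximate search.
    index = AnnoyIndex(dim, "angular")
    for i in range(len(docs)):
        index.add_item(i, doc_vector(weights[i]))
    index.build(10)

    # Embed the search string the same way and fetch the nearest docs.
    query = doc_vector(tfidf.transform(["neural nets"]))
    print(index.get_nns_by_vector(query, 2))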

~~~
garysieling
If you wanted it to know that "machine learning" and "neural networks" were
related, wouldn't you need to do some type of entity extraction first, since
Word2vec is run on tokens?

~~~
kbwt
The original word2vec source code comes with a probabilistic phrase detection
tool. Keyword: word2phrase.
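
For reference, gensim's Phrases implements the same phrase-scoring idea as
word2phrase. A tiny sketch (the sentences and threshold are made up):

    from gensim.models.phrases import Phrases, Phraser

    sentences = [["machine", "learning", "is", "fun"],
                 ["machine", "learning", "with", "neural", "nets"],
                 ["neural", "nets", "learn", "features"]]

    # Bigrams that co-occur unusually often are merged into single tokens.
    bigram = Phraser(Phrases(sentences, min_count=1, threshold=0.1))
    print(bigram[["machine", "learning", "with", "neural", "nets"]])
    # e.g. ['machine_learning', 'with', 'neural_nets']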

~~~
garysieling
Good to know, thanks!

------
kleebeesh
Nice write-up! I also found this one useful in terms of high-level
implementation:
[http://adventuresinmachinelearning.com/word2vec-keras-tutorial/](http://adventuresinmachinelearning.com/word2vec-keras-tutorial/)

Initially it was not obvious to me that the dot product was even part of the
model. In hindsight it's intuitive: a pair of highly similar vectors has a
large dot product, which yields a sigmoid activation near 1. This also
motivates the use of cosine similarity, which is just a normalized dot
product. Likely obvious to some, but it eluded me for the first few days I
studied this model.
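
A quick numeric check of that intuition (the vectors are made up):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([0.9, 2.1, 2.8])    # points the same way as u
    w = np.array([-1.0, 0.5, -2.0])  # points a different way

    print(sigmoid(u @ v))  # large dot product -> activation near 1
    print(sigmoid(u @ w))  # negative dot product -> activation near 0

    # Cosine similarity is just the dot product after normalization.
    print((u @ v) / (np.linalg.norm(u) * np.linalg.norm(v)))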

~~~
rvarma
Thanks for the link! It actually provides a really clear and intuitive
explanation of the notion of similarity.

