
Abstractions my Deep Learning word2vec model made - aliostad
http://byterot.blogspot.com/2015/06/five-crazy-abstractions-my-deep-learning-word2doc-model-just-did-NLP-gensim.html
======
fchollet
The "deep" in deep learning refers to hierarchical layers of representations
(to note: you can do "deep learning" without neural networks).

Word embeddings using skipgram or CBOW are a shallow method (single-layer
representation). Remarkably, in order to stay interpretable, word embeddings
_have_ to be shallow. If you distributed the predictive task (eg. skip-gram)
over several layers, the resulting geometric spaces would be much less
interpretable.

So: this is not deep learning, and this not being deep learning is in fact the
core feature.
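
To make the "single layer of representation" point concrete, here is a rough
numpy sketch of skip-gram with negative sampling; the vocabulary size,
dimensions, and learning rate are arbitrary placeholders, not anything from
the post:

    import numpy as np

    # Minimal skip-gram with negative sampling: one embedding matrix (W_in) is
    # the single layer of representation; everything else is a dot product and
    # a sigmoid. Hyperparameters below are made up for illustration.
    np.random.seed(0)
    vocab_size, dim, lr = 1000, 50, 0.025

    W_in = np.random.normal(scale=0.1, size=(vocab_size, dim))   # word vectors (kept)
    W_out = np.random.normal(scale=0.1, size=(vocab_size, dim))  # context vectors (discarded)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_pair(target, context, negatives):
        """One SGD step for a (target, context) pair plus sampled negative words."""
        v = W_in[target].copy()
        grad_v = np.zeros_like(v)
        for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
            u = W_out[word]
            g = sigmoid(v @ u) - label        # derivative of the logistic loss
            grad_v += g * u
            W_out[word] -= lr * g * v
        W_in[target] -= lr * grad_v

    # e.g. word 3 observed near word 7, plus two random negative samples
    train_pair(3, 7, negatives=np.random.randint(0, vocab_size, size=2))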

~~~
rndn
Couldn't you treat the network as a matrix and perform addition and
subtraction on it in the same manner?

~~~
Smerity
I'm uncertain whether you mean the algorithm or the output. The question is
interesting to both however.

The most common method for producing word vectors, skip-grams with negative
sampling, has been shown to be equivalent to implicitly factorizing a word-
context matrix[1]. A related algorithm, GloVe, only uses a word-word co-
occurrence matrix to achieve a similar result[2].

You can also view the output as an embedding in a high-dimensional space
(hence the name word vectors), but more surprisingly you can learn a linear
mapping between the vector spaces of two languages, which makes it
immediately useful for translation. From [3]: "Despite its simplicity, our
method is surprisingly effective: we can achieve almost 90% precision@5 for
translation of words between English and Spanish".
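
As a rough illustration of that linear-mapping idea (not the exact recipe
from [3], which optimizes the same squared objective with gradient descent),
here is a least-squares sketch in numpy, with random stand-ins for the real
word vectors:

    import numpy as np

    # Fit a matrix W mapping English word vectors onto Spanish ones from a small
    # seed dictionary of translation pairs. X and Z are random placeholders here.
    n_pairs, dim = 5000, 300
    X = np.random.randn(n_pairs, dim)   # English vectors for the seed dictionary
    Z = np.random.randn(n_pairs, dim)   # Spanish vectors for the same words

    # Solve min_W ||X W - Z||^2 in closed form
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)

    def translate(vec_en, spanish_vectors, spanish_words, k=5):
        """Map an English vector into the Spanish space and return the k nearest words."""
        mapped = vec_en @ W
        sims = spanish_vectors @ mapped / (
            np.linalg.norm(spanish_vectors, axis=1) * np.linalg.norm(mapped) + 1e-9)
        return [spanish_words[i] for i in np.argsort(-sims)[:k]]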

[1]: "Neural Word Embedding as Implicit Matrix Factorization"
[http://papers.nips.cc/paper/5477-neural-word-embedding-as-implicit-matrix-factorization.pdf](http://papers.nips.cc/paper/5477-neural-word-embedding-as-implicit-matrix-factorization.pdf)

[2]:
[http://nlp.stanford.edu/projects/glove/](http://nlp.stanford.edu/projects/glove/)

[3]: "Exploiting Similarities among Languages for Machine Translation" \- page
2 has an intuitive 2D graphical representation
[http://arxiv.org/pdf/1309.4168.pdf](http://arxiv.org/pdf/1309.4168.pdf)

------
lelf
> _word2vec is a Deep Learning technique first described by Tomas Mikolov only
> 2 years ago but due to its simplicity of algorithm and yet surprising
> robustness of the results, it has been widely implemented and adopted._

… And patented
[http://www.freepatentsonline.com/9037464.html](http://www.freepatentsonline.com/9037464.html)

~~~
andrewtbham
If you use word2vec in an agent, like Siri or Watson, how would Google know?

~~~
msoad
That's not a question you can ask in a meeting at Apple!

------
lqdc13
I thought word2vec isn't "Deep Learning", as both CBOW and skip-gram are
"shallow" neural models.

~~~
dave_sullivan
How about a single-layer neural net trained with dropout? Not deep learning
because there's only one layer, but the technique is fairly new and used in
deep learning, popularized by some of the guys that popularized deep
learning, usually mentioned in conversations about deep learning. It's really
a shallow neural model, though. word2vec is similarly related, but you're
right in that it is not a perceptron-based neural network with multiple
layers (i.e. a "deep neural network").

Still, it's an interesting blog post, worth reading and googling for more
information. FWIW, I just define "deep learning" as "neural net or
representation learning research since 2006" and find that it fits better.

~~~
fchollet
> _the technique is fairly new and used in deep learning, popularized by some
> of the guys that popularized deep learning, usually mentioned in
> conversations about deep learning_

Logistic regression with regularization is fairly new? 'Pioneered' by the same
people as deep convolutional neural networks? Are you certain about this?

~~~
agibsonccc
I think you need to define regularization. L1/L2? Yes those are old.

Dropout itself IS fairly new [1], as is its newer cousin DropConnect [2].

I agree neural nets themselves are basically just crazier parametric models.
Many of the things we do to modify the gradient are applicable to logistic and
other simpler regression techniques as well.

[1]:
[http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf](http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf)

[2]:
[https://cs.nyu.edu/~wanli/dropc/dropc.pdf](https://cs.nyu.edu/~wanli/dropc/dropc.pdf)
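
To make the distinction concrete (dropout is random masking at training time,
not a penalty term like L1/L2), here is a toy sketch applying dropout to
plain logistic regression; everything below is illustrative, not taken from
the papers above:

    import numpy as np

    np.random.seed(0)
    n_features, keep_prob = 100, 0.8
    w = np.zeros(n_features)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_step(x, y, lr=0.1):
        global w
        # Inverted dropout: zero a random subset of inputs and rescale the rest
        mask = (np.random.rand(n_features) < keep_prob) / keep_prob
        x_dropped = x * mask
        pred = sigmoid(w @ x_dropped)
        w = w - lr * (pred - y) * x_dropped   # plain logistic-regression gradient step

    def predict(x):
        # No masking at test time; the 1/keep_prob rescaling keeps scales consistent
        return sigmoid(w @ x)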

------
bdamos
How did you select words to compare? Did you have to try many poor
combinations before selecting a "good" set?

~~~
danieldk
Mikolov et al. 2013 [1] do a proper evaluation of this. E.g. they found that
the skip-gram model has 50.0% accuracy on semantic analogy queries and 55.9%
accuracy on syntactic queries.

word2vec comes with a data set that you can use to evaluate language models.

[1] [http://arxiv.org/pdf/1301.3781.pdf](http://arxiv.org/pdf/1301.3781.pdf)
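
For reference, that data set ships with word2vec as questions-words.txt (four
words per line, "a b c d", meaning a : b :: c : d). A rough way to score it
against a trained gensim model, assuming `model` is already loaded (exact
method names vary across gensim versions):

    # The prediction for b - a + c should be d; header lines start with ":".
    correct = total = 0
    with open("questions-words.txt") as f:
        for line in f:
            if line.startswith(":"):
                continue
            # lowercase to match a lowercased vocabulary; adjust for your preprocessing
            a, b, c, d = line.lower().split()
            try:
                guess = model.most_similar(positive=[b, c], negative=[a], topn=1)[0][0]
            except KeyError:      # skip analogies with out-of-vocabulary words
                continue
            total += 1
            correct += guess == d
    print(correct / total)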

~~~
rspeer
I would insist on a better dataset before really calling these "semantic
analogies" (and don't just take my word for it: Chris Manning complained about
exactly this in his recent NAACL talk).

The only semantics that it tests are "can you flip a gendered word to the
other gender", which is so embedded in language that it's nearly syntax; and
"can you remember factoids from Wikipedia infoboxes", a problem that you could
solve exactly using DBPedia. Every single semantic analogy in the dataset is
one of those two types.

The syntactic analogies are quite solid, though.

~~~
danieldk
_and "can you remember factoids from Wikipedia infoboxes",_

That's a simplification. E.g. I have trained vectors on Wikipedia dumps
without infoboxes, and queries such as _Berlin - Deutschland + Frankreich_
work fine.

Of course, even the remainder of Wikipedia is nice text in that it will
contain sentences such as 'Berlin is the capital of Germany'. So, indeed, it
makes doing typical factoid analogies easier.

That said -- I am more interested in the syntactic properties :).

~~~
rspeer
I didn't mean that you _have_ to learn the data from Wikipedia infoboxes, just
that that's a prominent place to find factoids.

It's a data source that you could consult to pass 99% of the "semantic
analogy" evaluation with no machine learning at all, which is an indication
that a stronger evaluation is needed.

------
eatonphil
I am not getting the "Obama + Russia - USA = Putin" piece, or the "King +
Woman - Man" bit either. Nothing particularly meaningful came up in a search
for the latter. Could someone explain?

~~~
mwsherman
If I understand your question, the idea is that one can do “arithmetic” on
concepts. Essentially the first equation asks “Obama : USA :: ? : Russia”.
Similarly, “King : Man :: ? : Woman”.

The way the corpus “talks about” Obama in relation to the USA is similar to
how the corpus talks about Putin in relation to Russia. That the system can
reveal this is amazing to me.
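
In gensim terms, that "arithmetic on concepts" is a most_similar query. A
sketch, assuming a model trained on a large enough corpus (the file name is a
placeholder, and in newer gensim versions the call lives on model.wv):

    from gensim.models import Word2Vec

    model = Word2Vec.load("some_corpus.model")

    # "Obama : USA :: ? : Russia"  ->  obama - usa + russia
    print(model.most_similar(positive=["obama", "russia"], negative=["usa"], topn=5))

    # "King : Man :: ? : Woman"    ->  king - man + woman
    print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=5))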

~~~
eatonphil
Oh cool, I see! Thanks for the explanation.

------
tshadwell
When I see things like this, it makes me wonder how much data forms each of
these vectors; if a single article were to say things about Obama, or humans
and animals, would it produce these results?

------
thisjepisje
Anyone tried this with the corpus of HN commentary?

~~~
fchollet
I have, actually. Here's the code for the experiment, with a link to download
the data:
[https://github.com/fchollet/keras/blob/master/examples/skipgram_word_embeddings.py](https://github.com/fchollet/keras/blob/master/examples/skipgram_word_embeddings.py)

I also recommend using Gensim for word embeddings.
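
A minimal sketch of that gensim route, assuming HN comments dumped to a
plain-text file with one comment per line (file names and parameters are
placeholders; newer gensim releases renamed `size` to `vector_size`):

    from gensim.models import Word2Vec

    class HNComments:
        """Stream tokenized comments so the whole corpus never sits in memory."""
        def __init__(self, path):
            self.path = path
        def __iter__(self):
            with open(self.path) as f:
                for line in f:
                    yield line.lower().split()   # naive whitespace tokenization

    sentences = HNComments("hn_comments.txt")
    model = Word2Vec(sentences, size=128, window=5, min_count=10, sg=1, workers=4)
    model.save("hn_word2vec.model")

    print(model.most_similar("javascript", topn=10))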

------
platz
Reminds me of how Chinese words are made up of individual characters that
have semantic meaning themselves.

~~~
sho_hn
The Korean Hangeul alphabet is an interesting compromise. It's an alphabet,
but multiple letters are grouped together into syllabic characters when
written. Those syllables in many cases map back to Chinese Han characters by
way of sound values (a lot of Korean vocabulary is Chinese in origin, even
though the language has a very distinct grammar), which means the boundaries
between morphemes often match the character boundaries. This is reflected in
the orthography: when there are multiple options for how to distribute
letters over characters, the option that keeps the same morpheme spelled
consistently across different words is preferred. So while you can write
phonetically, as with the Latin alphabet, the written language retains a high
level of morphological information and things feel very Lego-like.

~~~
aswanson
There seems to be a pattern in Asian languages of trying to equate the
symbols that represent things with their physical representation. I recall a
friend stating that the Korean word/character for balance looked like a
person holding jugs of water on each shoulder.

~~~
sho_hn
Some Han characters are ideographic in nature, but not all of them. Korean
used to be written with Han characters (using sets of very complicated rules
for how to apply them to the language) prior to the invention of Hangeul, but
other than a handful of them they aren't in widespread use any more outside
specialized or educational contexts. However, some of the Hangeul letters are
featural in design, e.g. the velar consonant ㄱ (g/k) is meant to be a side
view of the tongue when producing its sound.

~~~
aswanson
Very interesting. Can you refer me to some tutorials or texts on these
features of those languages?

------
fauigerzigerk
I wonder if Obama + 2017 == Obama - President

~~~
kmicklas
That would be equivalent to 2017 == -President, which is unlikely.

------
datacog
OP:

Do you have any more results from your model to share?

------
SilasX
So that's the result? That you can find sorta clever vector equations in the
results like "Obama + Russia - USA = Putin"?

~~~
Frompo
More significant, I guess, is how excited the author manages to be over these
coincidences. Maybe a word of caution is needed: overly generous
interpretation is how things like Nostradamus or the hidden code of the Bible
retain their credibility with parts of the population.

And frankly, if you think that stock market ≈ thermometer is insightful, you
should probably be kept away from positions of responsibility.

~~~
MrLeap
Try some yourself:
[http://radimrehurek.com/2014/02/word2vec-tutorial/](http://radimrehurek.com/2014/02/word2vec-tutorial/)

Scroll down and there are text boxes where you can try your own. Here are
some I tried (filtering out repeats and plural forms of the input words;
those artifacts seem to happen a lot and would be easy to ignore):

cat:dog::bread:butter eh? I guess.

sword:shield::attack:protect okay that works.

up:down::left:leaving Eh, not great. I guess if you think they're analogous in
terms of tense it kind of works. Disappointing, word2vec. (to be fair, "right"
was third highest)

drive:car::dick:? Whatever it was, it made me giggle immaturely.

