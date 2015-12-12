Hacker News
King – man + woman is queen; but why? (p.migdal.pl)
37 points by stared 2 hours ago





I really like these posts by Sanjeev Arora:

1) http://www.offconvex.org/2015/12/12/word-embeddings-1/

2) http://www.offconvex.org/2016/02/14/word-embeddings-2/

For a more theoretical explanation:

https://arxiv.org/abs/1502.03520

I've never found the "vector space" of word2vec remotely satisfying. In order to form a vector space, you also need to be able to make sense of scalar multiplication. What is 2x"king"? What is 3*king - 2*man + 0.5*woman? You can kind of make sense of this for adjectives, but it really breaks down with nouns.

I thought it was putting words on the surface of a sphere and just doing math with cosine distance?
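To make the cosine point concrete, here is a minimal sketch. It assumes similarity is measured by cosine, as the comment above suggests; the 3-d vectors are made up for illustration and are not real word2vec output. Under cosine, only direction matters, so 2x"king" points the same way as "king" and is exactly as close to every other word.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def scale(c, v):
    """Multiply every component of v by the scalar c."""
    return [c * a for a in v]

# Made-up 3-d vectors, for illustration only (not real word2vec output).
king = [0.9, 0.8, 0.1]
queen = [0.9, 0.1, 0.8]

# Scaling by a positive number leaves the direction unchanged.
same_direction = cosine(king, scale(2, king))  # 1 up to rounding
# So the similarity to any other word does not change either.
difference = cosine(scale(2, king), queen) - cosine(king, queen)  # ~0
```

So with cosine as the yardstick, positive scalar multiples are well defined but carry no extra information.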

It seems to behave a little like positions, then. London makes sense. Paris makes sense. London - Paris makes sense. But 2*London does not.
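The "differences make sense" reading is roughly how the analogy trick is usually described: take an offset between two words, add it to a third, and look for the nearest remaining word. A rough sketch follows, with hand-made toy vectors whose axes (royalty, male, female, fruit) are an assumption for illustration; real embeddings are learned and have no labelled axes. The `analogy` helper is hypothetical, not a word2vec API.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hand-made toy vectors with axes (royalty, male, female, fruit).
# Real embeddings are learned and have no such labelled axes.
vocab = {
    "king":  [1.0, 1.0, 0.0, 0.0],
    "queen": [1.0, 0.0, 1.0, 0.0],
    "man":   [0.0, 1.0, 0.0, 0.0],
    "woman": [0.0, 0.0, 1.0, 0.0],
    "apple": [0.0, 0.0, 0.0, 1.0],
}

def analogy(a, b, c):
    """Return the word closest (by cosine) to a - b + c, skipping a, b and c."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vocab[w], target))

result = analogy("king", "man", "woman")  # "queen" for these toy vectors
```

Note that the three input words are excluded from the search. With these toy vectors the answer falls out exactly; with learned vectors it is only the nearest neighbour.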

> What is 2x"king"?

War? Murder?

:)

Then you really shouldn't look too closely at physics.

What is a secondmeter? A voltvoltgram?

I don't follow the point you're making. A volt plus a meter is just that, a volt plus a meter. A king minus a man plus a woman is a nonsense statement. Neat, but not exactly illuminating.

I'm glad it can be useful. But I agree it leaves a lot to be desired.

Can someone explain a bit more about the vector space model for words (and documents)? I first saw that approach in Prof. Erik Demaine's lecture on algorithms [1], and also here. It's fascinating how linear algebra and vector spaces pop up in unexpected places.

[1] https://courses.csail.mit.edu/6.006/spring11/lectures/lec01....
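For documents, the simplest version of the vector space model is a bag of words: one axis per distinct word, with the word's count as the coordinate, and similarity measured by the angle between two such vectors. The sketch below is a generic illustration under that assumption, not code from the linked lecture; the function names and the tokenizing regex are my own choices.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Map a document to a sparse vector: word -> number of occurrences."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def angle(doc1, doc2):
    """Angle in radians between the word-count vectors of two documents.

    Assumes both documents contain at least one word.
    """
    v1, v2 = word_counts(doc1), word_counts(doc2)
    dot = sum(v1[w] * v2[w] for w in v1)  # missing words count as 0
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    # Clamp to guard against rounding pushing the ratio slightly above 1.
    return math.acos(min(1.0, dot / (norm1 * norm2)))

same = angle("the cat sat", "sat the cat")           # about 0: word order is ignored
disjoint = angle("the cat sat", "dogs bark loudly")  # about pi/2: no shared words
```

An angle near 0 means the documents use the same words in the same proportions; an angle near pi/2 means they share no words at all. Word embeddings like word2vec replace these sparse count axes with dense learned ones, but the geometry is compared the same way.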

