where you discuss with one of the authors of the excellent paper "Towards Understanding Linear Analogies"
you are discussing ELMo and specifically word senses, e.g. "leaves", which has multiple senses (departs, foliage, ...)
I recently stumbled on a paper from 2016 (last revised in 2018) which IMHO gives a lot of insight, but I had to read both the old and the new version to follow it (I recommend reading v1 first and then the newest)
They illustrate how, for example, the word "leave" in word embeddings is in fact simply a linear combination (with coefficients on the order of 1) of the true positions of each individual sense, i.e. "leave" = A·"leave.1" + B·"leave.2" + ..., with A, B, ... constants close to 1. There are typically fewer than 10 senses for a single word.
These reside in the same vector space as the word embeddings, and they illustrate how these sense vectors can be retrieved from the shallow word embedding vectors by sparse coding!
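To make that claim concrete, here is a toy numeric sketch (the dimensions, vectors, and coefficients are made up, not from the paper): if the two sense vectors for "leave" were known, the coefficients A and B could be read off the word vector with ordinary least squares.

```python
import numpy as np

# Hypothetical toy setup: 300-d embeddings, two senses of "leave".
rng = np.random.default_rng(0)
v_leave_depart = rng.normal(size=300)   # sense "leave.1" (departs)
v_leave_foliage = rng.normal(size=300)  # sense "leave.2" (foliage)

# The observed word vector is a linear combination with
# coefficients on the order of 1, as the paper describes.
A, B = 0.9, 1.1
v_leave = A * v_leave_depart + B * v_leave_foliage

# Given the sense vectors, the coefficients are recoverable by least
# squares, since the senses are (almost surely) linearly independent.
S = np.stack([v_leave_depart, v_leave_foliage], axis=1)  # 300 x 2
coeffs, *_ = np.linalg.lstsq(S, v_leave, rcond=None)
print(np.round(coeffs, 3))  # ≈ [0.9, 1.1]
```

The hard part, of course, is the reverse direction the paper addresses: the sense vectors are not known in advance and must be discovered jointly across the whole vocabulary.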
Interesting. Based on a quick glance, it seems this would answer the question I asked in that thread about whether it might be possible to get word-sense embeddings via two simpler transformations: first a transformation to the space of word-sense compositions (e.g., via GloVe/SGNS), and then a transformation to the space of word senses. I'll take a closer look. Thank you!
corpus -- word2vecOrGloVe--> word embeddings v_w in R^n
word embedding --sparsecoding--> sense embeddings v_s in same R^n
the sparse coding process gives the constants A_ws and the sense vectors v_s, where subscript w is a word index and s is a sense index, so that:
v_w = sum_s A_ws v_s
and for each word w, most A_ws are zero except for a few values of s
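The paper uses SMALLbox for this step; as a rough illustration of the same idea, here is a sketch using scikit-learn's DictionaryLearning instead (a substitute I'm assuming behaves similarly), on synthetic data where each "word" really is a sparse combination of a few "sense" atoms. All sizes and parameters here are assumptions, not the paper's.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_senses, n_words, dim = 20, 100, 50

# Ground-truth sense vectors v_s (the dictionary atoms).
senses = rng.normal(size=(n_senses, dim))

# Each word vector v_w = sum_s A_ws v_s, with at most 3 nonzero A_ws.
A_true = np.zeros((n_words, n_senses))
for w in range(n_words):
    idx = rng.choice(n_senses, size=rng.integers(1, 4), replace=False)
    A_true[w, idx] = rng.uniform(0.5, 1.5, size=len(idx))
words = A_true @ senses

# Sparse coding: learn a dictionary of n_senses atoms plus sparse codes.
dl = DictionaryLearning(n_components=n_senses, alpha=0.1,
                        transform_algorithm='lasso_lars', random_state=0)
A_learned = dl.fit_transform(words)   # the coefficients A_ws
senses_learned = dl.components_       # the sense vectors v_s

# Each word should be well approximated by its sparse code * dictionary.
recon_err = (np.linalg.norm(A_learned @ senses_learned - words)
             / np.linalg.norm(words))
print(recon_err)  # small relative reconstruction error
```

Note the learned atoms match the true senses only up to permutation and scale, which is all that matters for the interpretation above.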
1) polysemy: a word w can have multiple senses, namely those sense vectors with index s where A_ws is nonzero
2) synonyms: a sense s can have multiple synonyms w, again those w where A_ws is nonzero
so the result of sparse coding gives, for each word, a few indices of sense vectors, and for each sense, the corresponding indices of word vectors... and of course the sense vectors themselves.
so to find, say, a synonym of "leaves", you just look at the sense indices corresponding to that string, then you look at the different word indices for each of those senses, and they will refer to the words "foliage" but also "leaves" of course, and possibly others...
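That lookup is just chasing the nonzero pattern of A_ws in both directions. A toy sketch, assuming the sparse codes are already computed (the words and the coefficient matrix below are invented for illustration):

```python
import numpy as np

# Hypothetical sparse coefficient matrix A (rows: words, cols: senses).
words = ["leaves", "foliage", "departs", "exits"]
A = np.array([
    [1.0, 0.9, 0.0],   # "leaves": foliage sense 0 + departing sense 1
    [1.1, 0.0, 0.0],   # "foliage": sense 0 only
    [0.0, 1.0, 0.0],   # "departs": sense 1 only
    [0.0, 0.0, 1.2],   # "exits": some unrelated sense
])

def related_words(word):
    """Words sharing at least one sense with `word` (the word itself included)."""
    w = words.index(word)
    senses = np.nonzero(A[w])[0]                     # senses of this word
    share = np.nonzero(A[:, senses].any(axis=1))[0]  # words using any of them
    return [words[i] for i in share]

print(related_words("leaves"))  # ['leaves', 'foliage', 'departs']
```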
I also believe that once you have the sense vectors, a second pass through the corpus should in theory improve results: the context of each focus word can be used to determine the closest sense vector compatible with it, so that in effect word2vec or GloVe extraction is run on the senses instead of the words.
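A sketch of that disambiguation step under the same assumptions (this is my reading of the idea, not something from the paper): among the senses active for the focus word, pick the sense vector most similar to the average of the context word vectors. All names and sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 50

# Hypothetical sense vectors; the focus word has two active senses.
sense_vecs = rng.normal(size=(5, dim))
active = [0, 1]   # indices s where A_ws is nonzero for the focus word

# Context representation: average of the context word vectors.
# Here the context is built to lie near sense 1.
context_vecs = sense_vecs[1] + 0.1 * rng.normal(size=(4, dim))
context = context_vecs.mean(axis=0)

def pick_sense(context, sense_vecs, active):
    """Return the active sense with highest cosine similarity to the context."""
    sims = [context @ sense_vecs[s]
            / (np.linalg.norm(context) * np.linalg.norm(sense_vecs[s]))
            for s in active]
    return active[int(np.argmax(sims))]

print(pick_sense(context, sense_vecs, active))  # 1
```

On the second pass, each occurrence of "leaves" would then be relabeled as its chosen sense token before rerunning the embedding extraction.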
for the sparse coding they used SMALLbox, and I am still trying to better grasp how exactly the sparse coding works, and what prevents the A_ws and v_s from collapsing to the trivial solution v_s = v_w with A_ww = 1 and A_wx = 0 for x ≠ w...
https://news.ycombinator.com/item?id=18364148
The paper is at https://arxiv.org/abs/1601.03764
Again, I recommend reading v1 first and then v6.