where you discuss with one of the authors of the excellent paper "Towards Understanding Linear Analogies"
you are discussing ELMo and specifically word senses, e.g. "leaves", which has multiple senses (departs, foliage, ...)
I recently stumbled on a paper from 2016 (last revised in 2018) which IMHO gives a lot of insight, but I had to read both the old and the new version to follow it (I recommend reading v1 first and then the newest)
They illustrate how, for example, the word "leave" in word embeddings is in fact simply a linear combination (with coefficients on the order of 1) of the true positions of each individual sense, i.e. "leave" = A·"leave.1" + B·"leave.2" + ..., with A, B, ... constants close to 1. There are typically fewer than 10 senses for a single word.
These reside in the same vector space as the word embeddings, and they illustrate how these sense vectors can be retrieved from the shallow word embedding vectors by sparse coding!
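To make that claim concrete, here is a toy numeric sketch (the dimensions, vectors, and coefficients are made up, not from the paper): if the two sense vectors for "leave" were known, the coefficients A and B could be read off the word vector with ordinary least squares.

```python
import numpy as np

# Hypothetical toy setup: 300-d embeddings, two senses of "leave".
rng = np.random.default_rng(0)
v_leave_depart = rng.normal(size=300)   # sense "leave.1" (departs)
v_leave_foliage = rng.normal(size=300)  # sense "leave.2" (foliage)

# The observed word vector is a linear combination with
# coefficients on the order of 1, as the paper describes.
A, B = 0.9, 1.1
v_leave = A * v_leave_depart + B * v_leave_foliage

# Given the sense vectors, the coefficients are recoverable by least
# squares, since the senses are (almost surely) linearly independent.
S = np.stack([v_leave_depart, v_leave_foliage], axis=1)  # 300 x 2
coeffs, *_ = np.linalg.lstsq(S, v_leave, rcond=None)
print(np.round(coeffs, 3))  # ≈ [0.9, 1.1]
```

The hard part, of course, is the reverse direction the paper addresses: the sense vectors are not known in advance and must be discovered jointly across the whole vocabulary.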
Interesting. Based on a quick glance, it seems this would answer the question I asked in that thread about whether it might be possible to get word-sense embeddings via two simpler transformations: first a transformation to the space of word-sense compositions (e.g., via GloVe/SGNS), and then a transformation to the space of word senses. I'll take a closer look. Thank you!
corpus -- word2vecOrGloVe--> word embeddings v_w in R^n
word embedding --sparsecoding--> sense embeddings v_s in same R^n
the sparse coding process gives the constants A_ws and the sense vectors v_s, where subscript w is a word index and s is a sense index, so that:
v_w = sum_s A_ws v_s
and for each word w, most A_ws are zero except for a few values of s
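The paper uses SMALLbox for this step; as a rough illustration of the same idea, here is a sketch using scikit-learn's DictionaryLearning instead (a substitute I'm assuming behaves similarly), on synthetic data where each "word" really is a sparse combination of a few "sense" atoms. All sizes and parameters here are assumptions, not the paper's.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_senses, n_words, dim = 20, 100, 50

# Ground-truth sense vectors v_s (the dictionary atoms).
senses = rng.normal(size=(n_senses, dim))

# Each word vector v_w = sum_s A_ws v_s, with at most 3 nonzero A_ws.
A_true = np.zeros((n_words, n_senses))
for w in range(n_words):
    idx = rng.choice(n_senses, size=rng.integers(1, 4), replace=False)
    A_true[w, idx] = rng.uniform(0.5, 1.5, size=len(idx))
words = A_true @ senses

# Sparse coding: learn a dictionary of n_senses atoms plus sparse codes.
dl = DictionaryLearning(n_components=n_senses, alpha=0.1,
                        transform_algorithm='lasso_lars', random_state=0)
A_learned = dl.fit_transform(words)   # the coefficients A_ws
senses_learned = dl.components_       # the sense vectors v_s

# Each word should be well approximated by its sparse code * dictionary.
recon_err = (np.linalg.norm(A_learned @ senses_learned - words)
             / np.linalg.norm(words))
print(recon_err)  # small relative reconstruction error
```

Note the learned atoms match the true senses only up to permutation and scale, which is all that matters for the interpretation above.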
1) polysemy: a word w can have multiple senses, namely those sense vectors with index s where A_ws is nonzero
2) synonyms: a sense s can have multiple synonyms w, again those w where A_ws is nonzero
so the result of sparse coding gives, for each word, a few indices of sense vectors, and for each sense, the corresponding indices of word vectors... and of course the sense vectors themselves.
so to find, say, a synonym of "leaves", you just look at the sense indices corresponding to that string, then you look at the different word indices for each of those senses, and they will refer to the words "foliage" but also "leaves" of course, and possibly others...
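That lookup is just chasing the nonzero pattern of A_ws in both directions. A toy sketch, assuming the sparse codes are already computed (the words and the coefficient matrix below are invented for illustration):

```python
import numpy as np

# Hypothetical sparse coefficient matrix A (rows: words, cols: senses).
words = ["leaves", "foliage", "departs", "exits"]
A = np.array([
    [1.0, 0.9, 0.0],   # "leaves": foliage sense 0 + departing sense 1
    [1.1, 0.0, 0.0],   # "foliage": sense 0 only
    [0.0, 1.0, 0.0],   # "departs": sense 1 only
    [0.0, 0.0, 1.2],   # "exits": some unrelated sense
])

def related_words(word):
    """Words sharing at least one sense with `word` (the word itself included)."""
    w = words.index(word)
    senses = np.nonzero(A[w])[0]                     # senses of this word
    share = np.nonzero(A[:, senses].any(axis=1))[0]  # words using any of them
    return [words[i] for i in share]

print(related_words("leaves"))  # ['leaves', 'foliage', 'departs']
```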
I also believe that once you have the sense vectors, a second pass through the corpus should in theory improve results: the context of each focus word can be used to determine the closest sense vector compatible with it, so that in effect word2vec or GloVe extraction is run on the senses instead of the words.
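A sketch of that disambiguation step under the same assumptions (this is my reading of the idea, not something from the paper): among the senses active for the focus word, pick the sense vector most similar to the average of the context word vectors. All names and sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 50

# Hypothetical sense vectors; the focus word has two active senses.
sense_vecs = rng.normal(size=(5, dim))
active = [0, 1]   # indices s where A_ws is nonzero for the focus word

# Context representation: average of the context word vectors.
# Here the context is built to lie near sense 1.
context_vecs = sense_vecs[1] + 0.1 * rng.normal(size=(4, dim))
context = context_vecs.mean(axis=0)

def pick_sense(context, sense_vecs, active):
    """Return the active sense with highest cosine similarity to the context."""
    sims = [context @ sense_vecs[s]
            / (np.linalg.norm(context) * np.linalg.norm(sense_vecs[s]))
            for s in active]
    return active[int(np.argmax(sims))]

print(pick_sense(context, sense_vecs, active))  # 1
```

On the second pass, each occurrence of "leaves" would then be relabeled as its chosen sense token before rerunning the embedding extraction.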
for the sparse coding they used SMALLbox, and I am still trying to better grasp how exactly the sparse coding works, and what prevents the A_ws and v_s from collapsing to the trivial solution v_s = v_w with A_ww = 1 and A_wx = 0 for x ≠ w...
https://news.ycombinator.com/item?id=18364148
The paper is at https://arxiv.org/abs/1601.03764
Again, I recommend reading v1 first and then v6.