
> Basically, every word can be numerically described with up to 300 properties, some properties being more important

I'd give a bit more of a nuanced view here -- we can choose any number of properties (dimensions) to represent words, which are all learned from a corpus. 300 dimensions is a pretty popular choice. These dimensions aren't (generally) interpretable: they represent latent properties. In other words, it's not possible to say which property each dimension represents; it's simply one that your word embedding algorithm has picked up in the data. Generally speaking, feature importance is hard to define for the same reason.
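
To make this concrete, here is a minimal sketch with gensim's Word2Vec (assuming gensim 4.x and a toy tokenized corpus) showing that the dimensionality is just a hyperparameter you pick, and that the resulting components carry no labeled meaning:

    from gensim.models import Word2Vec

    # toy corpus: a list of tokenized sentences
    sentences = [
        ["the", "king", "rules", "the", "land"],
        ["the", "queen", "rules", "the", "land"],
        ["a", "man", "walks"],
        ["a", "woman", "walks"],
    ]

    # vector_size is an arbitrary choice; 300 is simply a popular default
    model = Word2Vec(sentences, vector_size=300, window=2, min_count=1, epochs=50)

    vec = model.wv["king"]   # a 300-dimensional numpy array
    print(vec.shape)         # (300,)
    # no individual component of vec has a human-readable name -- they're latent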




Imagine that we made word vectors out of PCA-reduced sparse tf-idf or count-vectorized vectors. I can tell you exactly what each PCA component explains. I could even do that at the word level, because it's not difficult to do inverse transforms with some simple dimensionality reduction techniques.
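
As a rough sketch of that (scikit-learn, with TruncatedSVD standing in for PCA since it works directly on sparse matrices), the explained variance and the inverse transform look like this:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = [
        "the king rules the land",
        "the queen rules the land",
        "a man walks the dog",
        "a woman walks the dog",
    ]

    # sparse tf-idf document-term matrix
    tfidf = TfidfVectorizer()
    X = tfidf.fit_transform(docs)              # shape: (n_docs, n_terms)

    # LSA-style reduction to a few components
    svd = TruncatedSVD(n_components=3, random_state=0)
    X_reduced = svd.fit_transform(X)

    # each component is an explicit linear combination of vocabulary terms,
    # so you can read off what it "explains"
    terms = tfidf.get_feature_names_out()
    for i, comp in enumerate(svd.components_):
        top = comp.argsort()[::-1][:3]
        print(i, round(svd.explained_variance_ratio_[i], 3),
              [terms[j] for j in top])

    # and the mapping is (approximately) invertible
    X_back = svd.inverse_transform(X_reduced)  # back in the original term space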

The model interpretability goes out the window because we used techniques for the vectorization that kinda suck. NLP is unnecessarily obsessed with self-supervision when the field should be innovating in dimensionality reduction techniques.


Why do you think NLP practitioners are focusing on self-supervision instead of dimensionality reduction?


I agree, and I have an idea for this dimensionality reduction which makes the original unsupervised word vectors interpretable.

It boggles my mind that I haven't seen anyone implement my idea.


SVD has been used for dimensionality reduction of co-occurrence matrices for ages [1], but the resulting word embeddings aren't as performant as those of word2vec/etc. The same is probably true of using PCA.

Word2vec's popularity is the result of people valuing performance (i.e. accuracy) more than interpretability.

[1] https://dl.acm.org/citation.cfm?id=148132
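
For concreteness, a bare-bones version of that count-based pipeline (a sketch, not the exact method from [1]) looks something like:

    from scipy.sparse import lil_matrix
    from scipy.sparse.linalg import svds

    corpus = [["the", "king", "rules"], ["the", "queen", "rules"]]
    vocab = sorted({w for sent in corpus for w in sent})
    idx = {w: i for i, w in enumerate(vocab)}

    # symmetric word-word co-occurrence counts within a +/-1 token window
    C = lil_matrix((len(vocab), len(vocab)))
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - 1), min(len(sent), i + 2)):
                if j != i:
                    C[idx[w], idx[sent[j]]] += 1

    # truncated SVD of the co-occurrence matrix; rows of U * S serve as embeddings
    k = 2
    U, S, Vt = svds(C.tocsc(), k=k)
    embeddings = U * S                         # shape: (vocab_size, k)
    print(embeddings[idx["king"]])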


Well, it might be because it's hard to read your mind from here.


No, that wouldn't be mind-boggling.

What's mind-boggling to me is that I haven't seen anyone else come up with the idea independently.


It's so obvious one wouldn't have to read my mind; it's all implicit in the king - man + woman = queen type of relations... If you really ask a second time, fuck it, I'm not in the ML sector, perhaps I'll just give the idea away...
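
For what it's worth, a quick sketch of that arithmetic using gensim's analogy query (this assumes the pretrained 'word2vec-google-news-300' vectors from gensim-data, a large download on first use):

    import gensim.downloader as api

    # pretrained 300-dimensional word2vec vectors
    wv = api.load("word2vec-google-news-300")

    # king - man + woman ~= queen
    print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))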


I agree. I guess Andrew chose those examples to better illustrate what those properties could represent.



