I'd give a bit more of a nuanced view here -- we can choose any number of properties (dimensions) to represent words, and they are all learned from a corpus; 300 dimensions is a pretty popular choice. These dimensions aren't (generally) interpretable: they represent latent properties. In other words, it's not possible to say which property each dimension represents; each one is simply a regularity that your word embedding algorithm has picked up in the data. Generally speaking, feature importance is hard to define for the same reason.
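For concreteness, here's a minimal sketch of what that looks like in practice, assuming gensim 4.x (the toy corpus and the 300-dimension choice are just placeholders):

```python
from gensim.models import Word2Vec

# Toy corpus; in practice this would be millions of sentences.
corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["dogs", "and", "cats", "are", "animals"],
]

# vector_size is the number of latent dimensions we pick up front.
model = Word2Vec(corpus, vector_size=300, window=2, min_count=1, epochs=50)

vec = model.wv["king"]   # a 300-dim numpy array of floats
print(vec.shape)         # (300,)
print(vec[:5])           # arbitrary-looking floats: no single dimension maps
                         # to a nameable property like "royalty" or "gender"
```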
Model interpretability goes out the window because the techniques we use for vectorization kinda suck. NLP is unnecessarily obsessed with self-supervision when the field should be innovating in dimensionality reduction techniques.
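The comment doesn't say which techniques it has in mind, but for contrast, here's a rough sketch of a classic count-based approach (LSA-style): build a word-by-word co-occurrence matrix and reduce it with truncated SVD. Assumes scikit-learn and numpy; the corpus and window size are illustrative.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-2 word window.
cooc = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i in range(len(sent)):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if i != j:
                cooc[idx[sent[i]], idx[sent[j]]] += 1

# Each word's row of counts becomes a small dense embedding.
svd = TruncatedSVD(n_components=3, random_state=0)
embeddings = svd.fit_transform(cooc)  # shape: (len(vocab), 3)
print(dict(zip(vocab, embeddings.round(2))))
```

Note that the SVD components are still latent dimensions, so this classic approach doesn't buy interpretability by itself; it just illustrates the count-based, dimensionality-reduction family the comment gestures toward.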
It boggles my mind that I haven't seen anyone implement my idea.
Word2vec's popularity is the result of people valuing performance (i.e. accuracy) more than interpretability.
What's mind-boggling to me is that I haven't seen anyone else come up with the idea independently.