For implementation, I am surprised it leaves out https://adoni.github.io/2017/11/08/word2vec-pytorch/. There are many others, including in NumPy and TF, but I find the PyTorch one the most straightforward and didactic, by a large margin.
This is true, but the clustering of points in space is not arbitrary. So while the choice of axes is arbitrary, it becomes non-arbitrary if you choose the axes so as to represent the clustering of points. This is why you end up with different rotations in factor analysis: different definitions of how to best represent the clusterings.
I think there are some ties here to compressed sensing, but that's getting a little tangential. My main point is that while it's true that the default word2vec embedding may lack meaning, if you define "meaning" in some way (even in terms of information loss), you can rotate it into a meaningful embedding.
Instead, you can:
- rotate it with SVD (works really well, when working on a subset of words)
- project it on given axes (e.g. "woman - man" and "king - man")
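Both options above can be sketched in a few lines of NumPy. This is a minimal illustration of the mechanics only: the vectors here are random stand-ins, not real word2vec embeddings, and the word list and axis names are hypothetical.

```python
import numpy as np

# Hypothetical stand-ins for word2vec vectors (random 4-d, for illustration).
rng = np.random.default_rng(0)
words = ["king", "queen", "man", "woman", "apple", "orange"]
vecs = {w: rng.normal(size=4) for w in words}

# 1) Rotate with SVD on a subset of words: the right singular vectors of the
#    centered subset give orthogonal axes, ordered by how much of the subset's
#    variance each one explains. The rotation preserves distances and norms.
subset = np.stack([vecs[w] for w in words])
_, _, vt = np.linalg.svd(subset - subset.mean(axis=0), full_matrices=False)
rotated = {w: vt @ vecs[w] for w in words}  # same geometry, new axes

# 2) Project onto a given semantic axis, e.g. "woman - man":
def project(v, axis):
    axis = axis / np.linalg.norm(axis)
    return float(v @ axis)  # signed coordinate along the axis

gender_axis = vecs["woman"] - vecs["man"]
score = project(vecs["queen"], gender_axis)
```

With real embeddings, the `score` for words like "queen" vs. "king" along such a difference axis is what makes the projection interpretable.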
People are quick to claim that embedding dimensions have no meaning, but if interpretable dimensions are your goal, and your embedding space is good, you're not terribly far from getting there.