Word2vec is definitely an impressive algorithm. But at the end of the day, it's just a tool that cranks out a fine-grained clustering of words based on (a proxy measure for) contextual similarity (or rather: an embedding in a high-dimensional space, which implicitly allows the words to be more easily clustered). And yes, some additive relations between words, when the signal is strong enough.
But to say that it's "learning" the "meaning" of these words is really quite a stretch.
I remember that when I first read A Clockwork Orange, it took me a while but I finally started to understand the meanings of those words/phrases (though I may not have every encountered them before.) It did feel like my brain was re-wiring itself to a new language. It'd be interesting to see how some type of language AI would treat these works.
edited to add there's a wiki article on the language of A Clockwork Orange, Nadsat: https://en.wikipedia.org/wiki/Nadsat
Gender was just an example. There are lots of semantic information learned by word2vec, and the vectors have shown to be useful in text classification and other uses. It can learn subtle stuff, like the relationship between countries, celebrities, etc. All that information is contained in a few hundred dimensions, which is tiny compared to the neurons in the brain.
You say, as many people do, that the operation "king - man + woman = queen" indicates that it understands that the relation between "king" and "queen" is the relation between "man" and "woman". But there is something much simpler going on.
What you're asking it for, in particular, is to find the vector represented by "king - man + woman", and find the vector Q with the highest dot product with to this synthetic vector, out of a restricted vocabulary.
The dot product is distributive, so distribute it: you want to find the maximum value of (king * Q) - (man * Q) + (woman * Q).
So you want to find a vector that is like "king" and "woman", and not like "man", and is part of the extremely limited vocabulary that you use for the traditional word2vec analogy evaluation, but not one of the three words you used in the question. (All of these constraints are relevant.) Big surprise, the word that fits the bill is "queen".
(I am not the first to do this analysis, but I've heard it from enough people that I don't know who to credit.)
It's cool that you can use sums of similarities between words to get the right answer to some selected analogy problems. Really, it is. It's a great thing to show to people who wouldn't otherwise understand why we care so much about similarities between words.
But this is not the same thing as "solving analogies" or "understanding relationships". It's a trick where you make a system so good at recognizing similarities that it doesn't have to solve analogies or understand relationships to solve the very easy analogy questions we give it.
There's also the relationship it has with the world.
If literally the only thing you know is the relationship between words, but you have a perfect knowledge of the relationship between words, you'll quickly determine that "Day" and "Night" are both acceptable answers, and have no means of determining which is the right one. At the very minimum, you need a clock, and an understanding of the temporal nature of your training set, to get the right one.
What do you see?
What do you smell?
What do you hear?
What does the landscape look like?
What memory does this bring up?
These are messages that the language is communicating. If an AI can't understand at least some of the content of the message then can it compose one effectively? I'm not certain it can understand the meaning from words alone, but we can certainly try.
And then there is the fundamentally dynamic aspect of language, which strengthens the need for a rich understanding of the world that words describe and convey.