Of course we have no idea how the Voynich manuscript is encrypted (which would make the assumptions of word2vec wrong), or if it even has any meaning at all. And it's an incredibly small dataset compared to modern text corpuses, so there is probably significant uncertainty and overfitting. And other problems like inconsistent spellings, many errors in transcriptions, etc. But in principle this is a good strategy.
>how would a person do if they were locked in a room with lots of books written in a language unknown to them?
If you spent all day reading them, for years, and you somehow didn't get bored and kept at it, eventually you would start to see the patterns. You would learn how "slithy toves" are related to "brillig", even if you have no idea how that would translate to English. Study it long enough, and you may even be able to produce text in that language, indistinguishable from the real text. You may be able to predict the next word in a sentence, and identify mistakes, etc. Perhaps carry out a conversation in that language.
And I think eventually you would understand what the words mean, by comparing the patterns to those found in English. Once you have guesses for translations of just a few words, you can translate the rest. Because you know the relationships between words, and so knowing one word constrains the possibilities of what the other words can be.
If the translation it produces is nonsense, the words you guessed must have been wrong, and you can try again with other words. Eventually you will find a translation that isn't nonsense, and there you go. This would be very difficult for humans, because the number of hypotheses to test is so large, and analyzing text takes forever. Computers can do it at lightspeed though.
More generally, has any attempt been made to identify the meanings of words in any sufficiently large corpus of text in a known foreign language (for example, Finnish), without being provided with a translation into English, and then compare the identified meanings to the actual meanings, as a first step towards translation?
Doing this without any translated words at all, would be more difficult. But I believe possible. It's actually a project I want to try in the near future.