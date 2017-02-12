The world owes a big THANK YOU to Tomáš Mikolov, one of the creators of Word2Vec[0] and fastText[1], and also to Radim Řehůřek, the interviewer, who is the creator of gensim[2].
The number of software developers and researchers in industry and academia who rely on the work of these two individuals is large and growing every day.
[0] https://code.google.com/p/word2vec/
[1] https://github.com/facebookresearch/fastText
[2] https://radimrehurek.com/gensim/
I also liked how conversational this podcast was, like the interviews on the Talking Machines podcast. I look forward to future episodes.
The things which still amaze me are that meaningful vector operations work (King - Man + Woman ~= Queen), that you only need 300 dimensions (or sometimes even fewer!), and that it is possible to build this vector space so "easily".
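To make the vector arithmetic concrete, here is a minimal sketch using hand-made 3-dimensional toy vectors. The numbers and the names `TOY_VECTORS` and `analogy` are invented for illustration; they are not learned embeddings and not part of Word2Vec, fastText, or gensim. Real models learn vectors of a few hundred dimensions from text, and the match is only approximate there.

```python
import numpy as np

# Hypothetical toy vectors, chosen by hand so the analogy works exactly.
# Rough reading of the axes: [royalty, gender, fruit-ness].
TOY_VECTORS = {
    "king":  np.array([0.9,  0.8, 0.1]),
    "queen": np.array([0.9, -0.8, 0.1]),
    "man":   np.array([0.1,  0.8, 0.0]),
    "woman": np.array([0.1, -0.8, 0.0]),
    "apple": np.array([0.0,  0.0, 1.0]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' by ranking words against b - a + c."""
    target = vectors[b] - vectors[a] + vectors[c]
    # Exclude the three query words, as is usual for this kind of test.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "king", "woman", TOY_VECTORS))
```

With trained vectors loaded in gensim, the comparable query would go through `most_similar(positive=[...], negative=[...])` on the keyed vectors rather than a hand-written loop like this one.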
(1) This representation is learned, essentially, by trying to predict a word from the surrounding words in a sentence, or the other way around.
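As a rough sketch of what those two prediction tasks look like as training data: the functions below only build the (input, target) examples from a tokenized sentence and do not train anything. The names `context_window`, `cbow_examples`, and `skipgram_examples` and the `window` parameter are made up for this illustration. Predicting a word from its neighbours is usually called CBOW, and predicting the neighbours from the word is usually called skip-gram.

```python
def context_window(tokens, i, window):
    """Return the words within `window` positions of tokens[i], excluding it."""
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    return tokens[lo:i] + tokens[i + 1:hi]

def cbow_examples(tokens, window=2):
    """CBOW direction: (surrounding words) -> center word."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]

def skipgram_examples(tokens, window=2):
    """Skip-gram direction: center word -> each surrounding word."""
    return [(tokens[i], ctx)
            for i in range(len(tokens))
            for ctx in context_window(tokens, i, window)]

sentence = ["the", "queen", "rules"]
for context, center in cbow_examples(sentence, window=1):
    print("CBOW     ", context, "->", center)
for center, ctx in skipgram_examples(sentence, window=1):
    print("skip-gram", center, "->", ctx)
```

A real trainer would feed examples like these to a small network and keep the learned input weights as the word vectors; that part is left out here.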
These gender-based vector differences can be explained as capturing syntactic rather than semantic information, but semantic relations (Rome:Italy -> Paris:France) can also work.
