
GloVe: Global Vectors for Word Representation
http://nlp.stanford.edu/projects/glove/
======
forefeather
This is really interesting work and a good paper. But it should also be
mentioned that some concerns have been raised about the evaluation. As
troublesome as that is, it is a common problem with word-representation
papers, since there is not yet a solid, standardized way to approach
evaluation.

[https://plus.google.com/114479713299850783539/posts/BYvhAbgG...](https://plus.google.com/114479713299850783539/posts/BYvhAbgG8T2)

~~~
chrman
Yoav Goldberg's comments - and those of the anonymous reviewers - were indeed
very useful in encouraging us to do a better job at the evaluation in the
paper. These were comments on the submission version, and so the improvements
were included in the camera-ready version following the usual procedure.

To say that there were troublesome concerns about the evaluation is a bit too
strong. Yoav's experiment showed that for the one data setup he ran, training
both models on the same data, GloVe outperformed word2vec by a smaller margin
than in our results, where we compared against the publicly released word2vec
vectors. But it still outperformed word2vec. And Yoav's comparison isn't the
last word: in
his experiments, he ran GloVe for only 15 iterations, but, as we already knew
and were taking advantage of, GloVe's performance continues to improve for
many more iterations. This is now much more clearly documented in the final
version of the paper (see fig. 4).

But at the end of the day, these numeric differences were never the point of
the paper. The contribution of the paper is to show how the kind of good
results word2vec gets with online learning on a token stream can be achieved
also by working from a global co-occurrence count matrix, more in the style of
the traditional SVD, but changing the loss function and frequency scaling, and
that you could expect working in this way to be somewhat more statistically
efficient. Yoav has actually been involved in some very interesting work along
the same lines himself: [https://levyomer.files.wordpress.com/2014/09/neural-word-emb...](https://levyomer.files.wordpress.com/2014/09/neural-word-embeddings-as-implicit-matrix-factorization.pdf)
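
For concreteness, here is a minimal sketch of the weighted least-squares
objective described above: a sum over non-zero co-occurrence counts X_ij of
f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2, with the paper's frequency
scaling f. The dense-matrix layout and variable names below are illustrative,
not the authors' code:

    # Sketch of the GloVe weighted least-squares objective (not the authors' code).
    import numpy as np

    def glove_loss(X, W, W_tilde, b, b_tilde, x_max=100.0, alpha=0.75):
        """X: (V, V) co-occurrence counts; W, W_tilde: (V, d) word/context
        vectors; b, b_tilde: (V,) biases. Returns the scalar objective."""
        loss = 0.0
        rows, cols = np.nonzero(X)  # the sum runs only over non-zero counts
        for i, j in zip(rows, cols):
            # frequency scaling from the paper: (x / x_max)^alpha, capped at 1
            f = (X[i, j] / x_max) ** alpha if X[i, j] < x_max else 1.0
            diff = W[i] @ W_tilde[j] + b[i] + b_tilde[j] - np.log(X[i, j])
            loss += f * diff ** 2
        return loss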

@jo_: We don't use word2vec for training, only for experimental comparison,
and the style of training is fairly different.

~~~
forefeather
Doctor Pennington? I really like the paper, as I said ("This is really
interesting work and a good paper."), but I disagree that "troublesome" would
be too strong. The original version claimed an 11% improvement; Professor
Goldberg found that it would be more along the lines of 2-3%. Even if this is
not the point of the paper, it is how the paper was empirically evaluated, it
was the key performance statement in the original abstract, and it could
mislead a reader into expecting larger model improvements. Nevertheless, I am
a fan of your work and excited to see what you publish next.

------
mdda
The most remarkable take-away from this whole genre of word-embedding work is
that just by doing 'dumb averages' of word contexts and then optimizing the
'vector[word]' on the input (and output) sides, you end up with a SEMANTIC
understanding of the English language in the word vectors.
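
To make that concrete: with the pre-trained vectors from the project page you
can do analogy arithmetic directly on the vectors. A minimal sketch, assuming
the glove.6B.50d.txt file from the downloadable glove.6B.zip (any of the
released sets works the same way):

    # Analogy demo on pre-trained GloVe vectors (filename is an assumption;
    # adjust to whichever set you downloaded from the project page).
    import numpy as np

    def load_vectors(path):
        vecs = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                word, *nums = line.rstrip().split(" ")
                vecs[word] = np.array(nums, dtype=np.float32)
        return vecs

    def nearest(vecs, query, exclude, k=5):
        # cosine similarity against every word, skipping the query words
        sims = {w: query @ v / (np.linalg.norm(query) * np.linalg.norm(v))
                for w, v in vecs.items() if w not in exclude}
        return sorted(sims, key=sims.get, reverse=True)[:k]

    vecs = load_vectors("glove.6B.50d.txt")
    query = vecs["king"] - vecs["man"] + vecs["woman"]
    print(nearest(vecs, query, exclude={"king", "man", "woman"}))  # 'queen' ranks at/near the top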

This paper is the latest in the series (across multiple researchers), and
seems to boil the task down to its bare minimum: just a raw least-squares
optimization works. And instead of the 'linguistic knowledge' smuggled into
the problem set-up _increasing_ (initially, people used tree-embeddings and
WordNet bootstrapping, in the 2003 papers), this is getting rid of almost all
structure, and ending up with better results.

So, instead of semantics being a naturally very deep problem, apparently
common sense understanding can be derived from surface statistics. IMHO, more
people should be excited about this (from an AI standpoint).

------
jo_
How is this different from the heretofore prolific Word2Vec? I see they
mention it but don't provide information about how it is distinct from their
approach.

EDIT: My fault. I was only reading through the site instead of the paper. It
looks like they utilize a similar approach to training (even making use of
Word2Vec), but their approach involves using a smaller, specially chosen
subset of the data to improve the robustness of the comparison between two
word vectors.

