
Distributed Representations of Sentences and Documents (2014) [pdf] - espeed
https://arxiv.org/abs/1405.4053
======
argonaut
The _second author of this paper_ was unable to replicate the results.

He was merely advising the first author, who actually wrote the code. Source:
(requires Google login) [https://groups.google.com/forum/#!topic/word2vec-
toolkit/XC7...](https://groups.google.com/forum/#!topic/word2vec-
toolkit/XC7j6BhR4XM) and [https://groups.google.com/forum/#!msg/word2vec-
toolkit/Q49FI...](https://groups.google.com/forum/#!msg/word2vec-
toolkit/Q49FIrNOQRo/DoRuBoVNFb0J).

This highlights something people on HN don't appreciate about machine
learning: how hard it is to actually trust results, and how likely it is that
the results were affected by bugs in the code or how the dataset was handled.
In this case, the second author was only able to replicate the results if he
didn't shuffle the dataset. Graduate students almost never write tests for
their code.

~~~
pigscantfly
You're right, and I'm surprised (and concerned) that no retractions or errata
have been published for the paper cited. That said, the sentence-to-vector and
paragraph-to-vector models are obvious extensions of Mikolov's original
word2vec architecture, whose performance has been extensively verified by
many people and a myriad of reimplementations. I can also attest that they've
outperformed traditional text vectorization techniques (largely different
bag-of-words parameterizations) as features for every task I've evaluated them
on (I can't be too specific publicly, unfortunately). I guess what I'm saying
is that ideas can be decent in spite of a poor first implementation.

~~~
lrei
> I'm surprised (and concerned) that there haven't been any retractions or
> errata published for the paper cited

There is a subsequent paper by Mikolov and Mesnil at ICLR with the correct
results, and code to replicate them.

A retraction? You want to retract a great paper because there is a 1-3%
accuracy discrepancy for a result on a more or less random text
classification task?

~~~
pigscantfly
No, I think errata would be more appropriate; the gist of my comment was in
support of the idea behind the paper, but it's disingenuous to pretend that
the model works as well as described on that task.

It's also disingenuous to trivialize the difference as 1-3%: that difference
is more than a third of the actual error rate (7.42% vs. 11.27%), assuming
you're referring to [1]. True, the IMDB dataset isn't a very important one,
but it's important to clarify when you've made a mistake, especially if your
paper has been cited hundreds of times.

[1]
[https://arxiv.org/pdf/1412.5335v7.pdf](https://arxiv.org/pdf/1412.5335v7.pdf)
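For concreteness, the relative gap between those two error rates works out
like this (numbers taken from the comment above):

```python
# IMDB error rates (%): the paper's reported number vs. the replication's.
paper_error = 7.42
replicated_error = 11.27

# The absolute gap looks small, but relative to the replicated error rate
# it is over a third.
relative_gap = (replicated_error - paper_error) / replicated_error
print(f"{relative_gap:.1%}")  # prints "34.2%"
```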

------
espeed
For background on "Distributed Representations" as cited in the paper, see:

Distributed Representations (1986)
[http://stanford.edu/~jlmcc/papers/PDP/Chapter3.pdf](http://stanford.edu/~jlmcc/papers/PDP/Chapter3.pdf)

" _Each entity is represented by a pattern of activity distributed over many
computing elements, and each computing element is involved in representing
many different entities._ "

Full Book:
[http://stanford.edu/~jlmcc/papers/PDP/](http://stanford.edu/~jlmcc/papers/PDP/)

------
bglazer
How does this compare to LSTMs for sentence embedding?

------
mining
I experimented with gensim's implementation of doc2vec this year; despite not
being able to achieve similar results in sentiment analysis (because of the
unshuffled datasets in the original paper), it's still really impressive. I
analysed some document relations in Wikipedia, and it finds some really
unusual / neat relationships, e.g. Autism - Cat + Dog ~= ADHD.
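Mechanically, that query is just nearest-neighbour search after vector
arithmetic. A minimal numpy sketch with made-up toy vectors (real embeddings
would come from a trained doc2vec model, e.g. gensim's `Doc2Vec`; the numbers
below are purely illustrative):

```python
import numpy as np

# Toy 3-d "document vectors" -- illustrative stand-ins for embeddings a
# trained doc2vec model would produce; not real data.
docs = {
    "Autism": np.array([0.9, 0.1, 0.2]),
    "Cat":    np.array([0.1, 0.8, 0.1]),
    "Dog":    np.array([0.2, 0.9, 0.3]),
    "ADHD":   np.array([1.0, 0.2, 0.4]),
    "Paris":  np.array([0.0, 0.1, 0.9]),
}

def nearest(query, exclude):
    """Return the doc whose vector is most cosine-similar to the query."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((d for d in docs if d not in exclude),
               key=lambda d: cos(docs[d], query))

query = docs["Autism"] - docs["Cat"] + docs["Dog"]
print(nearest(query, exclude={"Autism", "Cat", "Dog"}))  # prints "ADHD"
```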

~~~
tmsam
That relationship makes so much sense to me in the form Dog - ADHD ~= Cat -
Autism. Brilliant!

------
mitbal
Has anybody tried comparing this algorithm to simpler strategies, like
averaging word vectors, on a document classification task? Or to a
pre-trained skip-thought sent2vec model?
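For reference, the simpler baseline mentioned is just a mean over per-token
vectors. A minimal numpy sketch with hypothetical toy word vectors (a real
setup would load pretrained word2vec or GloVe embeddings instead):

```python
import numpy as np

# Hypothetical toy word vectors, purely illustrative.
word_vecs = {
    "good":  np.array([1.0, 0.5]),
    "great": np.array([0.9, 0.6]),
    "bad":   np.array([-1.0, 0.4]),
    "movie": np.array([0.0, 1.0]),
}

def doc_vector(tokens):
    """Average the vectors of the tokens we have embeddings for."""
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    if not vecs:
        return np.zeros(2)  # no known tokens: fall back to a zero vector
    return np.mean(vecs, axis=0)

# The resulting fixed-length vector can be fed to any standard classifier.
print(doc_vector("a good great movie".split()))
```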

