
Word2vec, LDA, and introducing a new hybrid algorithm: lda2vec - juxtaposicion
http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec-57135994
======
nl
This looks really interesting, but it's pretty hard to follow without the video.

My summary after a quick flick-through is that it is a better
classification (/clustering?) model for text, because it takes word2vec-style
similarity into account, which plain LDA doesn't. That sounds like a
reasonable approach to me, and it's nice to see someone get it working.

I think. Comments?

Here is the version with notes. I haven't read this through yet:
[http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and...](http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec?next_slideshow=1)

Code here, BTW:
[https://github.com/cemoody/lda2vec](https://github.com/cemoody/lda2vec)

~~~
fnl
Here's my attempt at an in-a-nutshell summary for those familiar with the
underlying material. Warning: This might be complete nonsense!! Chris Moody
proposes to replace the technique of _summing_ paragraph vectors (into word
vectors) with sparse "LDA vectors"; then he appends categorical variables
(features) to these summed word+LDA vectors and estimates a multinomial
mixture over the latent word topics. All this is applied in a conditional
probability model to predict the final topic assignments ("topic vectors") for
some set of pre-defined groupings of input documents. Finally, he claims the
resulting posterior topic assignments are even good enough to predict
(supervised) outcomes. A less mathematical explanation: imagine analyzing
books with word2vec, summing LDA results into the word vectors, and adding in
a few categorical variables like the year or country when/where each book was
written. Then use that "super-vector" to assign LDA topic (distributions) to
the respective _authors_ of the books. The final claim (which he takes care to
point out rests on weak evidence) is that you could use that "author-specific
topic vector" to predict, e.g., how popular each author is.
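
If my reading above is right, the core construction is: context vector =
pivot word vector + a sparse mixture of topic vectors, trained with a
skip-gram-style objective. A toy numpy sketch of just that construction (all
names, sizes, and the sparsity handling are illustrative assumptions, not the
actual lda2vec code):

```python
# Toy sketch of the lda2vec-style context vector described above.
# Names and dimensions are illustrative assumptions, not the repo's API.
import numpy as np

rng = np.random.default_rng(0)
n_topics, dim, vocab = 20, 50, 1000

word_vectors = rng.normal(size=(vocab, dim))     # as in word2vec
topic_matrix = rng.normal(size=(n_topics, dim))  # topics live in word space

# Per-document topic weights: a softmax keeps them on the simplex;
# sparsity would come from a Dirichlet-like penalty during training.
raw = rng.normal(size=n_topics)
doc_weights = np.exp(raw - raw.max())
doc_weights /= doc_weights.sum()

# Document vector = mixture of topic vectors; context = pivot + doc.
doc_vector = doc_weights @ topic_matrix
pivot = word_vectors[42]
context = pivot + doc_vector

def sgns_prob(context_vec, target_id):
    """Skip-gram-with-negative-sampling probability that target_id
    appears near the pivot, given this document's topics."""
    return 1.0 / (1.0 + np.exp(-context_vec @ word_vectors[target_id]))

print(sgns_prob(context, 7))
```

Because the topic vectors live in the same space as the word vectors, each
topic can be described by its nearest word vectors, which is where the
human-interpretability comes from.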

------
warrenmar
Previous work by Chris on word2vec

[http://multithreaded.stitchfix.com/blog/2015/03/11/word-is-w...](http://multithreaded.stitchfix.com/blog/2015/03/11/word-is-worth-a-thousand-vectors/)

~~~
stared
It is a wonderful post! And here are the HN discussions:

[https://news.ycombinator.com/item?id=9185091](https://news.ycombinator.com/item?id=9185091)

[https://news.ycombinator.com/item?id=10123041](https://news.ycombinator.com/item?id=10123041)

------
ginger_beer_m
I know word2vec and LDA separately, but what does this work do? Somehow
combine the word similarities from word2vec when forming LDA topics?

~~~
juxtaposicion
It combines the (arguably) best properties of both algorithms. Word2vec is
local and creates word representations that are powerful and flexible. LDA is
global, creating document representations that are less flexible but very
interpretable to humans. lda2vec mixes both ideas.

Ultimately, the goal is to use all of the information that is usually
available alongside text. Word2vec treats text like one long string. LDA
has the notion of documents. But lda2vec can use more features: for example,
the zip code a client comment comes from (so you get regional topics,
like outerwear in Vermont or cowboy boots in Texas), or the client ID a
comment comes from (so you learn that a client might be sporty, or an
expecting mother), in addition to document-level topics (which might surface
customer comments like "perfect service!" or package delivery problems).
Those topics are readily consumed by analysts and can be used to understand
the business from the client's perspective; word2vec, on the other hand,
produces representations that are hard for anything but machines to consume.
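
A hedged extension of the same idea to the multi-feature setup described
here: each grouping (document, zip code, client) gets its own topic mixture,
and all of them are summed into one context vector. The region/client names
and sizes below are hypothetical, not the library's API:

```python
# Hypothetical sketch: per-document, per-region, and per-client topic
# mixtures summed into one skip-gram context vector, as the comment
# above describes. Names and sizes are assumptions, not lda2vec's API.
import numpy as np

rng = np.random.default_rng(1)
dim = 50

def topic_mixture(raw_weights, topic_matrix):
    """Softmax the raw weights and mix the topic vectors."""
    w = np.exp(raw_weights - raw_weights.max())
    w /= w.sum()
    return w @ topic_matrix

doc_topics    = rng.normal(size=(20, dim))  # document-level topics
region_topics = rng.normal(size=(10, dim))  # e.g. zip-code-level topics
client_topics = rng.normal(size=(30, dim))  # e.g. client-level topics

word_vectors = rng.normal(size=(1000, dim))
pivot = word_vectors[7]

# Each grouping contributes topics at its own granularity; the sum is
# the context vector used to predict surrounding words.
context = (pivot
           + topic_mixture(rng.normal(size=20), doc_topics)
           + topic_mixture(rng.normal(size=10), region_topics)
           + topic_mixture(rng.normal(size=30), client_topics))
```

The upshot is that regional and per-client topics fall out of training for
free, because each feature's topic vectors must explain whatever word
co-occurrence the document-level topics can't.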

------
meeper16
Word2Vec is based on an original approach from Lawrence Berkeley National Lab.
This was around the same time that David Blei was working on LDA at Berkeley.
[https://www.kaggle.com/c/word2vec-nlp-tutorial/forums/t/1234...](https://www.kaggle.com/c/word2vec-nlp-tutorial/forums/t/12349/word2vec-is-based-on-an-approach-from-lawrence-berkeley-national-lab)

~~~
nl
It seems like you have made this claim in a few places (Kaggle, and here under
at least two different usernames).

Few people seem to agree with you, and whilst there certainly are
similarities, it looks to me like there are more differences.

I understand that you think your patent is being ignored, but I don't think
commenting on everything that mentions Word2Vec is going to help you.

------
xuewei4d
What does v_client mean on page 108 of the slides?

------
aerioux
awesome work :)

