
Hello HN,

Author here. I wrote this blog post attempting to visually explain the mechanics of word2vec's skip-gram with negative sampling (SGNS) algorithm. It's motivated by:

1- The need to develop more visual language around embedding algorithms.

2- The need for a gentle on-ramp to SGNS for people who are using it for recommender systems, a use case I find very interesting (there are links in the post to such applications).

I'm hoping it could also be useful if you wanted to explain to someone new to the field the value of vector representations of things. Hope you enjoy it. All feedback is appreciated!

Nice work jalammar! Author of gensim here. Quotes from Dune are always appreciated :-)

Here's some more layman reading "from back when", for people interested in how word2vec compares to other methods and works technically:

- https://rare-technologies.com/making-sense-of-word2vec/ (my experiments with word2vec vs GloVe vs sparse SVD / PMI)

- https://www.youtube.com/watch?v=vU4TlwZzTfU&t=3s (my PyData talk on optimizing word2vec)

I read some of your posts a few weeks ago when searching for more info about gensim. They're well explained and understandable even for a beginner. Thanks.

The Dune references aren't limited to this article. :)

The BERT article [1] has 'em too!

[1] https://jalammar.github.io/illustrated-bert/

You're the first to point that one out! Nice catch!

Oh wow. Hi Radim! Huge fan of Gensim! Thanks for the links!

I'm half-way through your excellent article. How do you produce such great artwork?

I believe I understand the concepts of CBOW and skip-gram. But I'm a little bit stuck. I kind of don't understand this [0]. In fact I understand it so poorly that I can't even formulate a question around it.

Now what do we do?

[0] https://skymind.ai/images/wiki/word2vec_diagrams.png

Edit: An attempt at formulating a question: is it the process of feeding the model with the [context][context][output] vector that you are depicting?

Thanks! Mostly Keynote, and lots of iteration.

I'll be honest, I personally found this figure puzzling. Still not 100% clear on it, but I don't believe it refers to the negative sampling approach. My best guess is that it's referring to earlier word2vec variants where the input in skip-gram (or the sum of the inputs in CBOW) is multiplied by a weights matrix that projects the input to an output vector.

It shows the input/output pairs you would use to train the network. The projection is simply a fully connected layer whose dimension is the embedding size you want (e.g., something like 300). The output column is what is being predicted by the model, for which you have the true data, so you calculate a loss and backprop as usual. In the CBOW case you take multiple context words and predict the middle word (as shown in your diagram), and skip-gram is the opposite approach.
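To make that concrete, here's a toy sketch of how those input/output pairs could be generated from a sentence. The window size, helper names, and example sentence are all made up for illustration, not taken from any particular implementation:

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: each center word predicts each of its context words."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    """CBOW: the combined context words predict the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, center))
    return pairs

tokens = "thou shalt not make a machine".split()
print(skipgram_pairs(tokens)[:4])
# first pairs: ('thou','shalt'), ('thou','not'), ('shalt','thou'), ('shalt','not')
```

Each skip-gram pair is one training example; in CBOW the context words are combined (e.g., averaged) before the projection layer.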

Great post, thanks!

Is there a reason why the training is started off with two separate matrices - the embedding and the context matrix? If the context matrix is anyway discarded at the end, why not start and work with only the embedding matrix?
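For context, my mental model of the two-matrix setup is roughly the following toy sketch (all sizes, the learning rate, and the update loop are made up for illustration; this isn't any library's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 10, 4
embedding = rng.normal(scale=0.1, size=(vocab_size, dim))  # kept after training
context = np.zeros((vocab_size, dim))                      # discarded after training

def sgns_step(word, ctx_word, neg_words, lr=0.025):
    """One gradient step: push the (word, ctx_word) score toward 1,
    and scores against the sampled negative words toward 0."""
    for target, label in [(ctx_word, 1.0)] + [(n, 0.0) for n in neg_words]:
        score = 1.0 / (1.0 + np.exp(-embedding[word] @ context[target]))  # sigmoid
        grad = score - label
        ctx_update = grad * embedding[word]
        embedding[word] -= lr * grad * context[target]
        context[target] -= lr * ctx_update

# word 1 with true context word 2, negatives 3 and 4
for _ in range(50):
    sgns_step(1, 2, [3, 4])
```

So the score of a pair is a dot product between a row of one matrix and a row of the other, which is why I'm wondering what breaks if both rows came from the same matrix.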

Thank you so much for writing this. As a software developer with some familiarity with ML concepts and terminology (I’d heard of word2vec for example) I found this post really easy to follow along with.

What great work, man! It makes ML way simpler to understand. For those interested in seeing similar content for learning advanced math, here is a good YouTube channel: https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw

Then you definitely succeeded! Those are the parts where I learned the most.

Another great article as always Jay.

Also thumbs up for the Dune references :)

Thanks for this, beautiful work!

This is great!
