
Item2Vec: Neural Item Embedding for Collaborative Filtering - ukz
https://arxiv.org/abs/1603.04259
======
rm999
I don't get the innovation in this paper - are they just running word2vec on
groups of items? If so, Spotify has been doing this on playlists for years
now: [https://erikbern.com/2013/11/02/model-benchmarks/](https://erikbern.com/2013/11/02/model-benchmarks/)

Also, I know the paper isn't claiming state-of-the-art, but their SVD results
are horrendous. Standard CF would create much better artist-artist pairings
with even a medium-sized dataset.

As an aside, I've run some quantitative and qualitative tests and have found
the best recommendations come from a combination of user-item and item-item. I
co-gave a talk at the NYC machine learning meetup recently
([https://docs.google.com/presentation/d/1S5Cizi9LFQ7l0bMYtY7g...](https://docs.google.com/presentation/d/1S5Cizi9LFQ7l0bMYtY7gASvOPqxNsQk0-NuP5KWAl-4/pub?start=false&loop=false&delayms=3000&slide=id.p4))
that shows how this can work, starting at slide 20. The idea is to create a
candidate list of matches using item-item, and then reorder using item-user.
I've found this creates "sensible" suggestions using item-item, but truly
personalizes when re-ordering. You can remove obvious recommendations by
removing popular matches or matches the user has already interacted with (I
consider this a business decision rather than something inherent in the
algorithm).
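A rough sketch of that candidate-then-rerank flow, with toy numpy vectors (all the item and user vectors here are invented for illustration; nothing is taken from the slides):

```python
import numpy as np

# Toy data: 5 item vectors and 1 user vector, invented for illustration.
item_vecs = np.array([
    [1.0,  0.0, 0.0],
    [0.9,  0.1, 0.0],
    [0.0,  1.0, 0.0],
    [0.1,  0.9, 0.0],
    [0.05, 0.0, 1.0],
])
user_vec = np.array([0.0, 1.0, 0.2])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def recommend(seed_item, user_vec, item_vecs, n_candidates=3, n_final=2,
              seen=()):
    # Stage 1: candidate list via item-item similarity to the seed item,
    # dropping the seed itself and anything filtered out as "seen"
    # (the business-decision step mentioned above).
    sims = np.array([cosine(item_vecs[seed_item], v) for v in item_vecs])
    order = np.argsort(sims, kind="stable")[::-1]
    candidates = [i for i in order
                  if i != seed_item and i not in seen][:n_candidates]
    # Stage 2: reorder the candidates by user-item affinity
    # (the personalization step).
    candidates.sort(key=lambda i: cosine(user_vec, item_vecs[i]),
                    reverse=True)
    return [int(i) for i in candidates[:n_final]]
```

With these toy vectors, `recommend(0, user_vec, item_vecs)` shortlists the items closest to item 0 and then promotes the ones this user's vector points toward.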

~~~
rahimnathwani
From the Spotify blog post: "We train a model on subsampled (5%) playlist data
using skip-grams and 40 factors."

Any idea what those 40 factors might be?

(The item2vec paper describes using pairs of items that occur in the same set,
i.e. just like using n-grams, but without a fixed n, and ignoring ordering.)
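If I've read the paper right, pair generation is just every ordered pair of distinct items within a set; a minimal sketch of that idea (my own paraphrase, not the authors' code):

```python
from itertools import permutations

def set_to_pairs(item_set):
    """All ordered (target, context) pairs of distinct items in the same
    set: the whole set acts as the window, and item order is ignored."""
    return list(permutations(item_set, 2))

# A 3-item set yields 3 * 2 = 6 training pairs.
pairs = set_to_pairs(["a", "b", "c"])
```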

~~~
rm999
That's the dimensionality of the resulting word vectors in word2vec; in the
item2vec paper this is the "dimension parameter m".

------
praccu
Fascinating.

The qualitative comparison suggests that item2vec may produce _more_
homogeneous / boring results, which is kinda unfortunate; the interesting
question in recommendations is how to find "aspirational" recommendations
(things the shopper would not have looked for on their own).

I would really love to see an A/B test comparing more traditional CF against
this, with the revenue lift measured, because "accuracy" as measured here
doesn't necessarily map onto the objective you care about in the real world.

On the other hand, I played with using collaborative filtering to improve the
personalization of language models for speech recognition for shopping, and in
that context this approach sounds like it might have been super useful,
because it was actually fairly challenging to get broad enough coverage of the
full set of items from a small number of purchases for the purposes of
language modeling. Having good embeddings would have helped a lot.

~~~
flashman
It may be an urban myth, but somebody told me Amazon tweaked their
recommendation algorithm to occasionally provide random items, the thinking
being that people might be persuaded to buy something on the mere suggestion
that they would like it.

~~~
aab0
A multi-armed bandit will occasionally provide 'random' items as part of the
exploration phase. Perhaps that's what's going on, and not any sort of
diabolical self-fulfilling prophecy.

------
apstls
I wonder if the item vectors capture semantics and behave in a way analogous
to word vectors. So, for example, would a PS4 - a PS4 controller = an Xbox -
an Xbox controller, the same way France - Paris = Greece - Athens? Something
along these lines could maybe be used as a way to find relevant addons/upsells
to show on the checkout page.
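A toy version of that arithmetic (the item names and vector values below are made up purely to show the nearest-neighbor-of-the-difference trick word2vec analogies use):

```python
import numpy as np

# Hypothetical embeddings; values invented for illustration only.
vecs = {
    "ps4":      np.array([1.0, 0.0, 1.0]),
    "ps4_pad":  np.array([1.0, 1.0, 0.0]),
    "xbox":     np.array([0.0, 0.1, 1.0]),
    "xbox_pad": np.array([0.0, 1.1, 0.0]),
    "game":     np.array([0.5, 0.5, 0.5]),
}

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

def analogy(a, b, c, vecs):
    """Answer 'a - b + c = ?' by nearest cosine neighbor, excluding the
    query items themselves (standard word2vec analogy evaluation)."""
    target = vecs[a] - vecs[b] + vecs[c]
    return max((k for k in vecs if k not in (a, b, c)),
               key=lambda k: cos(vecs[k], target))
```

Here `analogy("ps4_pad", "ps4", "xbox", vecs)` asks "PS4 controller - PS4 + Xbox = ?" and, with these toy vectors, lands on the Xbox controller.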

~~~
brg
They do. In my current research I've been working on metric embeddings to
solve analogy questions of the flavor "Favorite Sushi Restaurant:Current
City::???:Foreign City". It takes some work to remove the geographic signal
that is overwhelmingly present in fan and check-in data.

------
olh
Does anyone know good resources/research about generating latent vector
representations with iterative processes using numerical analysis algorithms
and not neural networks?

The black-box nature of word2vec and similar methods holds back some
applications, like generalizing linguistics methods to bioinformatics.

~~~
RockyMcNuts
hmmh... I don't believe word2vec or item2vec would be considered neural
network algorithms.

You come up with a model where a numerical vector represents the attributes of
the word or item, you predict the likelihood of a match between words/items by
multiplying vectors together, and then you use numerical optimization, i.e. an
iterative gradient descent algorithm starting from randomly initialized
vectors, to estimate the vectors that work best.
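That description maps pretty directly onto skip-gram with negative sampling; here's a bare-bones numpy sketch of the update (my paraphrase of the standard algorithm, with fixed rather than frequency-sampled negatives for simplicity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(W_in, W_out, target, context, negatives, lr=0.1):
    """One negative-sampling SGD update: push sigmoid(v . u) toward 1
    for the observed (target, context) pair and toward 0 for negatives."""
    for idx, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[idx].copy()
        g = sigmoid(W_in[target] @ u) - label   # prediction error
        W_out[idx] -= lr * g * W_in[target]
        W_in[target] -= lr * g * u

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(10, 8))   # random init, as above
W_out = np.zeros((10, 8))
for _ in range(2000):
    # Real implementations sample negatives by frequency; fixed here.
    sgns_step(W_in, W_out, target=0, context=1, negatives=[2, 3])
```

After training, the dot product of item 0 with its observed context item 1 is high, and with the negatives low, which is the whole trick.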

~~~
ves
They're NNs because you learn the representation using RNNs. Everything
afterwards is trivial since you're in a Hilbert space. But getting the
representations is the hard part.

~~~
RockyMcNuts
oh, ok. Do you have to use RNNs? I think I've done them without RNNs.

Would love a good RNN word2vec type example with Tensorflow if anyone knows
one.

~~~
olh
Or you could use a pre-trained list like the ones from Google [1]. If not,
you've probably solved an open problem in the area, and publishing it would
help the rest of us not lose time trying to solve it again.

[1] -
[https://code.google.com/archive/p/word2vec/](https://code.google.com/archive/p/word2vec/)

Edit: word2vec on tensorflow tutorial
[https://www.tensorflow.org/versions/r0.7/tutorials/word2vec/...](https://www.tensorflow.org/versions/r0.7/tutorials/word2vec/index.html)

~~~
RockyMcNuts
Yeah, I implemented something based on the code from the Udacity course that a
Googler (Vincent Vanhoucke) did on TensorFlow; basically the same thing, I
think.

their version
[https://github.com/tensorflow/tensorflow/blob/master/tensorf...](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/5_word2vec.ipynb)

my version
[https://github.com/druce/streeteye_word2vec/blob/master/word...](https://github.com/druce/streeteye_word2vec/blob/master/word2vec.ipynb)

------
galaxy911
This is a great model. I applied it to online retailer data and movies and it
works amazingly well, much better than SVD++ or SVD. I have found it to
perform very well on items with low usage too. I took the authors' advice to
change the window size dynamically according to the set size.
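For anyone curious what that looks like in practice: since the sets are unordered, one way (my sketch, not galaxy911's code) is to shuffle each set and use the set's own length as the window, so every item lands in every other item's context:

```python
import random

def sets_to_sequences(item_sets, seed=0):
    """Turn unordered item sets into (sequence, window) pairs for a
    word2vec-style trainer: shuffle away any spurious ordering and size
    the window to cover the whole set."""
    rng = random.Random(seed)
    out = []
    for s in item_sets:
        items = list(s)
        rng.shuffle(items)
        out.append((items, len(items)))
    return out

seqs = sets_to_sequences([["a", "b", "c"], ["d", "e"]])
```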

------
karmacondon
GitHub! This should be on GitHub

~~~
akkartik
[https://tensortalk.com/posts/ISw1FSTgJiwaymJXL/item2vec-neural-item-embedding-for-collaborative-filtering-oren-barkan-noam-koenigstein](https://tensortalk.com/posts/ISw1FSTgJiwaymJXL/item2vec-neural-item-embedding-for-collaborative-filtering-oren-barkan-noam-koenigstein)

