
Geometry, Data and Neighbors Predict People’s Favorite Movies - pseudolus
https://www.quantamagazine.org/how-geometry-data-and-neighbors-predict-your-favorite-movies-20190522/
======
alister
The article holds up Netflix as a shining example of movie recommendation
engines but Netflix has only 3000 to 5000 distinct movie titles (depending on
your country). You cannot get interesting or surprising movie recommendations
from such a small search space. Imagine the extreme case where Netflix had
only a single movie, say "Twilight". They'd always recommend "Twilight" even if
Donald Knuth wrote their recommendation engine.

You need something like IMDb's catalog of 500,000 movie titles to do something
interesting.
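
To make the catalog-size point concrete, here is a minimal nearest-neighbor
sketch. It assumes each movie is a hand-made vector of (romance, horror,
comedy) scores; the numbers and the `recommend` helper are hypothetical and
are not how Netflix actually works. Whatever similarity measure you use, a
recommender can only rank titles that are in the catalog, so with one title
it returns that title for every user.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(taste, catalog, k=1):
    """Return the k catalog titles whose vectors are most similar to `taste`."""
    ranked = sorted(catalog, key=lambda title: cosine(taste, catalog[title]), reverse=True)
    return ranked[:k]

# Hypothetical feature vectors: (romance, horror, comedy).
one_title = {"Twilight": (0.9, 0.4, 0.1)}
bigger = {
    "Twilight": (0.9, 0.4, 0.1),
    "Airplane!": (0.1, 0.0, 1.0),
    "The Shining": (0.0, 1.0, 0.0),
}

comedy_fan = (0.0, 0.0, 1.0)
print(recommend(comedy_fan, one_title))  # prints ['Twilight']
print(recommend(comedy_fan, bigger))     # prints ['Airplane!']
```

Swapping cosine similarity for another distance changes the ranking, but
never the set of candidates.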

~~~
dmos62
MovieLens [0], by far my favorite movie recommendation engine, says that the
dataset it was based on [1] contained "2,811,983 ratings entered by 72,916
[users] for 1628 different movies". By comparison, its current biggest
published dataset [2] contains "20 million ratings and 465,000 tag
applications applied to 27,000 movies by 138,000 users. Includes tag genome
data with 12 million relevance scores across 1,100 tags".

On a side note, if you like discovering movies, I can't recommend MovieLens
enough. Its accuracy and ability to recommend obscure movies still blow me
away. On top of all that, it's free, as in no subscriptions or ads, and it's
developed and maintained by "GroupLens, a research lab in the Department of
Computer Science and Engineering at the University of Minnesota".

[0] https://movielens.org/
[1] https://grouplens.org/datasets/eachmovie/
[2] https://grouplens.org/datasets/movielens/

------
forgottenentry
How are the data points assigned to their locations in the geometric space? I
didn't notice any mention of that. What makes something "more like" a thriller
and "less like" a comedy, for example? We can all tell the intuitive
difference, but what markers serve best to precisely assign a mathematical
distance?

I know very little about this problem space, so these are actual questions,
not at all rhetorical.
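
For anyone with the same question: one common approach (an assumption here,
not necessarily what the article describes) is item-based collaborative
filtering, where a movie's coordinates are simply the ratings users gave it.
"Thriller-ness" is never labeled; two movies end up close because the same
people rated them alike. The ratings below are made up, and treating 0 as
"not rated" is a simplification. Larger systems often learn lower-dimensional
coordinates instead, for example with matrix factorization, but neighbors are
still found by comparing vectors.

```python
import math

# Hypothetical ratings matrix: rows are users, columns are movies (0 = not rated).
movies = ["Heat", "Ronin", "Airplane!", "Hot Shots!"]
ratings = [
    [5, 4, 1, 0],  # user A
    [4, 5, 0, 1],  # user B
    [1, 0, 5, 4],  # user C
    [0, 1, 4, 5],  # user D
]

def column(j):
    """A movie's coordinates: the ratings every user gave it."""
    return [row[j] for row in ratings]

def cosine(a, b):
    """Cosine similarity: 1.0 means the two rating vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(title):
    """The other movie whose rating vector is most similar to this one's."""
    j = movies.index(title)
    others = [i for i in range(len(movies)) if i != j]
    best = max(others, key=lambda i: cosine(column(j), column(i)))
    return movies[best]

print(nearest("Heat"))       # prints Ronin
print(nearest("Airplane!"))  # prints Hot Shots!
```

Nobody told the code which films are thrillers or comedies; the two clusters
fall out of users A and B liking one pair and users C and D liking the other.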

