

Comparing taste in films using pairwise vector comparisons - johnb
http://blog.goodfil.ms/blog/2012/04/20/project-ingen/

======
baddox
I'd be interested in reading a more general blog article about their theory
behind using "quality" and "rewatchability" as their key user ratings. It
sounds reasonable at first, but when I think more deeply about it, I wonder
how "quality" is supposed to be interpreted. Is it "how much I enjoyed the
_first_ viewing of the film," something more specific like "how skillful was
the camera work" or "how good was the acting," or something more meta like
"how good I think _critics_ or movie buffs would think the film is"?

I've gone through stages of armchair film criticism, so I've thought about
personal ratings a lot. I even drafted a web app to track my viewings and
watchlist, and the rating idea I've liked the most is a stupidly simple
boolean rating. You could call it almost anything: "Like/Dislike," "Good/Bad,"
"Enjoyed/Didn't Enjoy," or even something a bit different like "I'm glad I
watched it/I wish I hadn't watched it."

~~~
geelen
Check out this post on the topic:
http://blog.goodfil.ms/blog/2011/10/07/a-better-way-to-rate-films/

'Quality' is intended to be a more objective score of the craft of the film:
the quality of the writing, directing, and acting; the originality of the
idea; how influential it is.

'Rewatchability' is where your enjoyment gets factored in. We think it's
important to consider 'watching it again' rather than first-time enjoyment
because it separates films out better. For example, Avatar is quite
enjoyable the first time round, but IMO not particularly worth rewatching.

Btw, I'm the author of both posts :)

~~~
baddox
Thanks for the reply. I've thought about this sort of thing a lot. I made a
super-basic personal web app a few years ago for recording the movies I've
watched and keeping a queue of movies I want to watch. I never added multiuser
support or rating, but I gave ratings a lot of thought. My main problem is
that a lot of words end up being tied together semantically: "good,"
"enjoyable," etc. really just end up meaning personal preference. I
experimented with the whole gamut of ratings: from the most granular (rating
camerawork, acting, humor, effects, etc.) to the least granular ("good" or
"not good").

Should "quality" be affected by your film preferences, or is it meant to be an
objective (and testable) hypothesis about what some group of people (critics,
film bloggers, etc.) would report? If the latter, then I think the
quality/rewatchability metric potentially misses information about "first
viewing enjoyability," which I personally think is much more important than
rewatchability.

In the end, I still have trouble with the "quality" metric. If it's meant to
be objective, why use user input rather than some aggregation of real data
(e.g. Metacritic or Rottentomatoes)? If "quality" is meant to be subjective,
then it doesn't seem much different than "first viewing enjoyability," since
"enjoyability" and "personal opinion of quality" seem by definition
equivalent.

~~~
ileitch
Another thought is that there may be some ratio between quality and
rewatchability that signifies the first-time (but not repeated) watchability
of a film.

For example, 5 stars for quality and 0 for rewatchability doesn't tell me I
should watch the film if I haven't already. But maybe 2 stars for
rewatchability/subjective enjoyment is enough justification to watch it to
appreciate the quality?

------
Locke1689
How is this better than a normalized cosine similarity? The vectors could be
arbitrary, but in this case they'd be the normalized quality and
rewatchability values.

Cosine similarity would also let you express pairwise similarity as a single
normalized value, instead of a 9-way comparison.
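
Roughly what I have in mind, with invented numbers (one quality/rewatchability
pair per shared film, flattened into a single vector per user):

    import math

    def cosine_similarity(a, b):
        # cos of the angle between two rating vectors; 1.0 = same direction
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # (quality, rewatchability) for three shared films, per user:
    user_a = [4, 2, 5, 5, 1, 3]
    user_b = [5, 1, 4, 5, 2, 2]

    print(round(cosine_similarity(user_a, user_b), 3))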

------
ileitch
Interesting read. Vector Victor sounds like a linear correlation algorithm. I
wonder what coefficient they're using under the hood...

~~~
geelen
It's not actually linear correlation, since we effectively normalise the
pairwise scores to {-1, 0, +1} x {-1, 0, +1} (nine possible combos). We're
exploring blending in a few other signals along the way, but we wanted to see
how far we could get by discretising the pairwise comparisons in this way.
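
In code, the discretisation step looks something like this (illustrative
names and ratings, not our actual code):

    def sign(x):
        # -1, 0, or +1
        return (x > 0) - (x < 0)

    def pairwise_comparison(ratings, film_a, film_b):
        # Collapse one user's comparison of two films into one of the
        # nine combos in {-1, 0, +1} x {-1, 0, +1}.
        dq = sign(ratings[film_a]["quality"] - ratings[film_b]["quality"])
        dr = sign(ratings[film_a]["rewatch"] - ratings[film_b]["rewatch"])
        return (dq, dr)

    me = {"Alien": {"quality": 5, "rewatch": 4},
          "Avatar": {"quality": 4, "rewatch": 2}}
    print(pairwise_comparison(me, "Alien", "Avatar"))  # (1, 1)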

Once we've collapsed all pairs down to a Vector Victor, we treat matching
Vector Victors as a thumbs up and non-matching as a thumbs down, take the
square root of both then take the lower bound of the Wilson interval as our
ranking function.
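
And the ranking step, roughly (z = 1.96 for a 95% interval; the square root
is the damping tweak mentioned above):

    import math

    def wilson_lower_bound(ups, downs, z=1.96):
        # Lower bound of the Wilson score interval for the proportion
        # of "thumbs up" observations.
        n = ups + downs
        if n == 0:
            return 0.0
        phat = ups / n
        return ((phat + z * z / (2 * n)
                 - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n))
                / (1 + z * z / n))

    # Matching vs. non-matching Vector Victors between two users:
    matches, mismatches = 40, 10
    print(wilson_lower_bound(math.sqrt(matches), math.sqrt(mismatches)))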

More questions? Shoot!

~~~
ileitch
I assume each vector has its own weight? So "Better in both respects" is a
stronger sign of similarity than just "Higher quality but same
rewatchability."

So say:

    "Same in both dimensions" = 0
    "Same quality but more rewatchable" = +1
    "Same quality but less rewatchable" = -1
    "Higher quality but less rewatchable" = +2
    "Higher quality but same rewatchability" = +3
    "Better in both respects" = +4
    etc.

Then you could pass those to a coefficient like Pearson's r:

    x = [0, 1, 2, -1, -3, 4, -4]
    y = [0, 1, 1, 2, -1, -2, 0]

It'd be an interesting experiment to see what results that gives vs. your
current algorithm.
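
Plain Python sketch, using the numbers above:

    import math

    def pearson_r(x, y):
        # Pearson correlation coefficient of two equal-length lists.
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy)

    x = [0, 1, 2, -1, -3, 4, -4]
    y = [0, 1, 1, 2, -1, -2, 0]
    print(round(pearson_r(x, y), 3))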

~~~
geelen
That's something we haven't tested, but my gut tells me contrasting Vector
Victors (like "better in both dimensions") are 'worth' more than similar
Vector Victors.

The really significant change would be that agreeing in one dimension (yes A
is better quality than B, but we disagree on which is more rewatchable) still
contributes to your correlation with someone. We're not doing that at the
moment, because it felt like pairwise partial agreement would weaken the
signal - I wanted _real_ agreement (in both dimensions) to stand out.

While there might be a way to capture that with a linear function, I've
favoured solutions that reflect that our ratings are two-dimensional.
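
Concretely, the current strict rule vs. the partial-credit alternative would
look something like this (a sketch, not our actual code):

    def strict_match(v1, v2):
        # What we do now: thumbs up only on full agreement.
        return 1 if v1 == v2 else -1

    def partial_credit(v1, v2):
        # Alternative: agreeing in one of the two dimensions
        # still contributes something.
        agree = sum(1 for a, b in zip(v1, v2) if a == b)
        return {2: 1.0, 1: 0.5, 0: -1.0}[agree]

    print(strict_match((1, 1), (1, -1)))    # -1
    print(partial_credit((1, 1), (1, -1)))  # 0.5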

~~~
ileitch
Also, if you avoid the normalisation step you could easily factor in the
degree to which user A preferred one film's quality over another's vs. user
B, instead of just a 'more' or 'less' question.

If you scale your vector weights by the range of your quality rating (0-10?),
then if user A rated film X's quality 6 points above film Y's while user B's
difference was only 1 point, this would give you a more accurate correlation.
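
Rough sketch of what I mean (ratings invented):

    def preference_delta(ratings, film_a, film_b, dimension="quality"):
        # Signed size of a preference between two films, on the raw
        # rating scale rather than collapsed to -1/0/+1.
        return ratings[film_a][dimension] - ratings[film_b][dimension]

    # Invented ratings on a 0-10 quality scale:
    user_a = {"X": {"quality": 9}, "Y": {"quality": 3}}
    user_b = {"X": {"quality": 6}, "Y": {"quality": 5}}

    # Both prefer X, but A's +6 is a much stronger preference than
    # B's +1 -- keeping the magnitudes lets a correlation see that.
    print(preference_delta(user_a, "X", "Y"))  # 6
    print(preference_delta(user_b, "X", "Y"))  # 1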

Anyway, food for thought. A very fun problem to be working on!

------
chrisberkhout
It's cool they're taking a new approach. I wonder if there aren't scientific
papers on this stuff.

~~~
johnb
I wouldn't be surprised if there were. If anyone knows of any, please share
here.

Glen is bringing a lot of what he learned doing transport modeling and
advertising analysis to the project - but we don't have any reference material
specific to what we're doing now.

