
A data science investigation of IMDB, Rotten Tomatoes, Metacritic and Fandango - funspectre
https://medium.freecodecamp.org/whose-reviews-should-you-trust-imdb-rotten-tomatoes-metacritic-or-fandango-7d1010c6cf19
======
davidad_
I have many problems with this analysis, perhaps enough to write my own blog
post, but I am lazy so I will just outline my disagreements here (at least for
now).

First: I claim that what we seek in a movie rating is information about
whether we will like the movie, and that this can be formalized as the
expected KL-divergence (information gain) between the Bayesian posterior
distribution (probability of enjoying the movie conditional on its rating) and
the prior distribution (probability you would enjoy a randomly selected
movie). Of course, this will depend on your taste in movies, especially how
much it correlates with others. But, we can _bound_ it by taking the Shannon
entropy of the rating distribution: there is no way we can get more
information from a rating than this! It is this bound that allows us to
penalize the distributions that are heavily biased towards one side of a
discrete scale, like Fandango. However, the "ideal" shape in this context is
far from a Gaussian - it is uniform! The uniform distribution can also be
justified as being calibrated such that the quantile function is linear - a
score of 90/100 from a uniform distribution means a 90th-%ile movie.
Determining a quantile is a transform we often try to perform intuitively on
ratings, so a scale on which that transform is trivial seems useful.
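
To make the bound concrete, here is a toy sketch in Python; the counts are
invented for illustration, not real Fandango data:

    import numpy as np

    def rating_entropy_bits(counts):
        """Shannon entropy of an empirical rating distribution, in bits."""
        p = np.asarray(counts, dtype=float) / np.sum(counts)
        p = p[p > 0]  # treat 0 * log(0) as 0
        return float(-(p * np.log2(p)).sum())

    # Hypothetical counts for half-star ratings 0.5, 1.0, ..., 5.0
    skewed = [0, 0, 0, 0, 0, 1, 2, 10, 40, 30]  # top-heavy, Fandango-style
    flat = [10] * 10                            # the uniform "ideal"

    print(rating_entropy_bits(skewed))  # well under the bound
    print(rating_entropy_bits(flat))    # log2(10) ~= 3.32 bits, the maximum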

Second: The Gaussian distribution does not have bounded support! That is, a
rating scheme with what you claim as the "ideal" distribution would have
_some_ ratings with values that are negative or otherwise "off the scale". Not
so ideal! If you wanted to model movie-goodness on an unbounded scale such
that a Gaussian would make sense, then you should transform that scale into a
bounded one, e.g. with a logistic function, yielding an "ideal" shape of a
logitnormal distribution, which incidentally can fit the strange bimodal
Tomatometer distribution quite well. Even if you specifically wanted a
unimodal, bell-shaped distribution, at least pick a bounded one like the beta
distribution.
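
For illustration, a logit-normal sampler is a couple of lines; the parameters
here are made up, just to show the bimodal shape:

    import numpy as np
    from scipy.special import expit  # the logistic function

    rng = np.random.default_rng(0)
    z = rng.normal(loc=0.5, scale=2.5, size=100_000)  # unbounded "goodness"
    x = expit(z)                                      # squashed onto (0, 1)

    # With a large sigma the mass piles up near 0 and 1, a bimodal shape
    # reminiscent of the Tomatometer distribution.
    hist, _ = np.histogram(x, bins=10, range=(0, 1))
    print(hist)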

Third: setting aside which distribution you want to penalize distance from,
or why, dividing the space into three arbitrary intervals to facilitate the
comparison seems ridiculous. There is already a perfectly good measure on
probability distributions: the mutual information.
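
To spell out the link between the first and third points: the expected
information gain is exactly the mutual information between the rating and
your enjoyment. A toy computation, with an invented joint table:

    import numpy as np

    # Invented joint distribution over (rating bucket, enjoyment):
    # rows = four rating buckets, cols = (dislike, like)
    joint = np.array([[0.15, 0.05],
                      [0.10, 0.10],
                      [0.05, 0.20],
                      [0.02, 0.33]])

    p_rating = joint.sum(axis=1, keepdims=True)
    p_enjoy = joint.sum(axis=0, keepdims=True)

    # I(rating; enjoy) = sum p(r,e) * log2(p(r,e) / (p(r) p(e))), in bits
    mi = (joint * np.log2(joint / (p_rating * p_enjoy))).sum()
    print(mi)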

~~~
tmoertel
Along the lines you suggest, a while ago I took IMDB's ratings and used their
empirical cumulative distribution function to "flatten" them into something
more useful, percentile scores:

[http://blog.moertel.com/posts/2006-01-17-mining-gold-from-th...](http://blog.moertel.com/posts/2006-01-17-mining-gold-from-the-internet-movie-database-part-1.html)

This was about a decade ago, so I'd expect the resulting decoder ring to be
somewhat miscalibrated for today's movie ratings. But the same process would
be straightforward to apply to a more up-to-date data set of ratings.
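
The core of it is just the empirical CDF; a minimal sketch with invented
scores (not the real IMDB data):

    import numpy as np

    def percentile_score(ratings, x):
        """Map a raw rating to a percentile via the empirical CDF."""
        ratings = np.sort(np.asarray(ratings, dtype=float))
        return 100.0 * np.searchsorted(ratings, x, side='right') / len(ratings)

    ratings = np.array([5.1, 6.0, 6.4, 6.8, 7.0, 7.2, 7.5, 7.9, 8.3, 8.8])
    print(percentile_score(ratings, 7.5))  # -> 70.0, a 70th-percentile movie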

------
beloch
Another pitfall to be wary of in analyses like these is that IMDB's ratings
_change_ over time. New releases typically have inflated scores that regress
over time. Ideally, this sort of analysis shouldn't use anything under a year
or two old, so using _only_ movies from 2016 and 2017 puts this particular
study off to a really bad start.

~~~
alex_g
But isn't that the case for all of these websites, not just IMDB?

~~~
ko27
Rotten Tomatoes/Metacritic ratings don't change anymore after a month or so,
unlike IMDB, where newer movies trend downward for at least a year or two.

------
SubiculumCode
I was disappointed by the 'data science' in this article: selecting the shape
of the distribution as a criterion for the quality of a metric, using Pearson
correlation with non-normal data, using non-correlation with Fandango as a
tie-breaking criterion, failing to use external criteria (e.g. ticket sales)
to validate or compare the metrics, and, to be picky, failing to discuss (for
example) generalized linear models with link functions to deal with
non-normal error distributions.
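
On the Pearson point specifically, a rank correlation like Spearman's makes
no normality assumption. A quick sketch with invented, skewed data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.exponential(size=500)            # heavily skewed, non-normal
    y = x ** 3 + rng.normal(0, 1, size=500)  # monotone but nonlinear in x

    print(stats.pearsonr(x, y)[0])   # attenuated by the nonlinearity
    print(stats.spearmanr(x, y)[0])  # near 1, since the ranks line up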

------
minimaxir
I did a similar analysis a year and a half ago in which I found that yes, IMDb
movie ratings are not uniform while RT/MetaCritic are, but that's only part of
the story. All forms of movie ratings are actually poor predictors of box
office success (especially with Indies/Documentaries; I'd love to look at
rating/BO data again while faceting by genre of movie). Full blog post:
[http://minimaxir.com/2016/01/movie-revenue-ratings/](http://minimaxir.com/2016/01/movie-revenue-ratings/)

The Four Point Scale
([http://tvtropes.org/pmwiki/pmwiki.php/Main/FourPointScale](http://tvtropes.org/pmwiki/pmwiki.php/Main/FourPointScale)),
while a problem from a utilitarian point of view, is still practical from a
_consumer psychology_ point of view, which is why the popular ratings systems
won't change easily.

~~~
ghaff
I guess it doesn't surprise me that a fan rating site like IMDb skews high.
While there are films that elicit strong negative reactions, on average I
would think that people see and rate films that they expect to like. I know
that there are a vast number of movies out there I would never expect to
enjoy; mostly, I just ignore them. If I'm going to spend a couple of hours
watching something, I almost certainly have at least neutral to mildly
positive expectations going in.

------
bradknowles
So, the first question I would ask is: does Metacritic include reviews of
movies that are not in IMDB, or vice versa? That could definitely skew the
scores.

The second question I would ask is whether there is a relatively simple
transform that could make the IMDB and maybe even the Fandango scores more
uniform in their distribution, over the same set of movies.
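
For the second question, my guess is that a rank/quantile transform would do
it; a sketch with scikit-learn's QuantileTransformer (scores invented):

    import numpy as np
    from sklearn.preprocessing import QuantileTransformer

    imdb = np.array([[6.1], [6.8], [7.0], [7.2], [7.4], [7.5], [7.8], [8.1]])
    qt = QuantileTransformer(n_quantiles=8, output_distribution='uniform')
    flat = qt.fit_transform(imdb)  # ranks spread evenly over [0, 1]
    print(flat.ravel())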

~~~
danso
Seriously doubt that Metacritic would have movies that IMDB would not, as
Metacritic entries exist, ostensibly, when a movie is reviewable. IMDB has
listings for every kind of movie project (in-development, pre-production).

------
platz
I'm not sure that the justification for normality applies when considering
that movies across different historical periods may be regarded as having a
different average quality. Therefore the distribution across historical time
would not be stationary.

That said, I've always preferred Metacritic's scores over the others.

------
ZoeZoeBee
Fandango, Rotten Tomatoes, IMDB and Metacritic are all owned by content
distributors/creators: Fandango and Rotten Tomatoes are owned by Comcast/Time
Warner, Amazon owns IMDB, and CBS owns Metacritic.

It's no coincidence that the accuracy of ratings sites has deteriorated over
the past few years. One of the most glaring violations of trust has been
Rotten Tomatoes pre-certifying movies as "Fresh" and keeping them so despite
aggregated reviews that would contradict a "Fresh" rating. Even more
troubling is the trend of "Sponsored" movies like Step receiving the
certification.

It should come as no surprise that the number of newly released Certified
Fresh films is increasing despite the quality of films at the box office
decreasing.

~~~
O1111OOO
Sadly... so much of what we get these days is manipulated. Can hardly wait to
see how big business plans to tweak "AI" to further muddy the info we
receive. /s

I usually first go to Wikipedia and get a summary of the critical reviews,
the length of the film, and some other stats. Then I head over to IMDB for a
synopsis of the film (Wikipedia doesn't do summaries very well, focusing
instead on the entire plot). Maybe skim over a couple of user reviews (both
positive and negative).

If it looks interesting, I'll head over to YouTube and watch the trailer.
It's always the trailer that decides it for me. Having watched plenty of
films over the years, I pick up a tremendous amount of info from a 2-3 minute
trailer.

~~~
albertgoeswoof
You might as well watch the movie instead of doing all that then :-)

~~~
O1111OOO
Honestly, it takes all of 5 minutes:) Saves me from throwing time and money
away.

------
wslh
Cinephile here: I can't follow any ranking for a non-popular movie, and for
blockbusters I can't either. I don't buy that you can have a good ranking for
everyone. You should separate movies into different clusters. For example,
Wonder Woman's score at Metacritic? 76. Are you kidding me?

BTW this is my shared ranking:
[https://docs.google.com/spreadsheets/d/1ojCTmnu8-uIXxnas142M...](https://docs.google.com/spreadsheets/d/1ojCTmnu8-uIXxnas142MAksw38qBJjgkmdDtU4YQ2OU/edit?usp=drivesdk)

EDIT: changed 7.6 to 76 based on the comment below.

~~~
ghaff
Clearly the less mainstream your tastes are, the less useful a mainstream
rating is going to be. A site can, as Netflix tries to do, adjust ratings
based on expressed individual preferences, but it's only marginally effective
in my experience.

For myself, it's fairly rare that I find myself way off from the critical
consensus. I'm more likely to not care for big box office action films but,
then, these aren't usually at the top of critics lists either.

~~~
wslh
The problem I see is that mainstream film scores are trivial to calculate and
pretty unusable, because you just need to see the list of the few successful
films worldwide. Then you have a lengthy list of films that should be
weighted in a different way.

~~~
ghaff
Worldwide box office definitely skews to big-budget action films, though.
Unless that's your thing, critic ratings are probably better.

Recommendation is a really tough problem, even within a fairly narrow domain
like film or music. I find Amazon and Netflix to mostly be pretty bad, and
you can be sure they've invested heavily.

------
Sukotto
Whenever this sort of analysis shows up on the homepage, I like to post a
link to my favorite movie statistical-ranking site:
[http://www.phi-phenomenon.org/](http://www.phi-phenomenon.org/)

I like it a lot and have found it valuable in figuring out which movies to
watch in my limited free time.

------
mulvya
There must be platforms that try to do 'graph-matching', i.e. ask a new user to
rate 25-30 popular movies. Compare against scores of other users who have
rated the same movies. Identify a cluster of users who rate similarly; our new
user is matched with this group. So, for other movies, the user can check the
mean/modal score by members in this cluster. Sound familiar?
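
A rough sketch of that matching step, assuming a tiny dense rating matrix
(all numbers invented; 0 means "not rated"):

    import numpy as np

    ratings = np.array([
        [9, 7, 6, 3, 8],  # user 0
        [8, 7, 5, 4, 7],  # user 1: tastes close to user 0
        [2, 3, 9, 8, 1],  # user 2: opposite tastes
    ])
    new_user = np.array([9, 8, 5, 3, 0])  # 0 = hasn't seen movie 4

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Compare only on the movies the new user has rated
    seen = new_user > 0
    sims = [cosine(row[seen], new_user[seen]) for row in ratings]

    nearest = int(np.argmax(sims))
    print(nearest, ratings[nearest, ~seen])  # the like-minded user's scores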

~~~
dandermotj
You are loosely describing collaborative filtering [1], a very common
recommendation technique.

[1]
[https://en.wikipedia.org/wiki/Collaborative_filtering](https://en.wikipedia.org/wiki/Collaborative_filtering)

~~~
mulvya
Thanks for the tip. Is there any media recommendation engine that implements
it?

Note though that I'm not talking about the engine itself recommending movies.
It places you within a group and shows you the scoring data of like-minded
reviewers.

------
j_s
[https://www.cinesift.com/](https://www.cinesift.com/) lets you sort by
combined rating.

source:
[https://news.ycombinator.com/item?id=14964701](https://news.ycombinator.com/item?id=14964701)

------
robertwiblin
Can anyone explain why he preferred a lower correlation with Fandango scores
as a tie-breaker between Metacritic and IMDB? I couldn't follow the argument
that a lower correlation shows Metacritic has more reliable results.

~~~
Tomminn
I'm pretty sure there is no sound explanation. If he wanted to pick the one
that was most normal, he should have just found the distribution that was
quantitatively closest to a normal distribution.
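
For instance, fit a normal to each site's scores and compare
Kolmogorov-Smirnov distances; a toy version with invented samples:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    imdb_like = np.clip(rng.normal(6.5, 1.0, 2000), 1.0, 10.0)     # bell-ish
    fandango_like = np.clip(rng.normal(4.5, 0.5, 2000), 0.5, 5.0)  # piled at 5

    for name, sample in [('imdb', imdb_like), ('fandango', fandango_like)]:
        d, p = stats.kstest(sample, 'norm', args=(sample.mean(), sample.std()))
        print(name, round(d, 4))  # smaller D = closer to a fitted normal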

The only rub in doing this would have been deciding what an "optimal" value
for the standard deviation in a 10-point rating system is, which would have
been an interesting discussion. To me, this is the most important question of
this whole approach. If really big standard deviations are fine, then Rotten
Tomatoes' uniform distribution might turn out to be the most "normal". But he
seems to have totally glossed over this point, which is the real difference
between IMDB and the other systems. With IMDB, the standard deviation of the
scores is small compared to the range of possible scores. As a result, if a
movie is rated 9.0, you know it's going to be pretty damn great, and a high 9
or a 10 would suggest this is one of the greatest movies ever made, by some
distance. That's the kind of information you can't get on a rating system
where 5% of movies get 5 stars.

