
Movie Review Aggregator Ratings Have No Relationship with Box Office Success - minimaxir
http://minimaxir.com/2016/01/movie-revenue-ratings/
======
femto113
This is a classic case of learning nothing by averaging together two different
populations (as acknowledged in the article, critically-acclaimed indies and
mainstream blockbusters). It's like averaging the height of an NBA team with a
kindergarten class. It would probably be more informative to group movies by
size of release (number of theaters they open in), as well as advertising
budget. Just by eye there seems to be a correlation for well reviewed
blockbusters to do better than poorly reviewed ones.

~~~
sago
Right. For movies over $10m budget, there is a positive correlation, and
because the OP's graphs are log, it isn't a minor one.

Indies form the second clump high in the rankings.

The negative correlation is caused, as the density contours show, by a knot of
highly reviewed indies showing up in the data, confusing the positive
correlation of the studio movies.

~~~
minimaxir
Alright, I double-checked, with appropriate filters on the data, using RT scores:

Indie cluster alone:

log-corr: -0.1243865

corr: -0.1673577

Blockbuster cluster alone:

log-corr: 0.230726

corr: 0.2570628

Maybe I should have separated the clusters, and I'll definitely do that for
further analysis (even though at best, the correlation is weak in either
direction, and the log doesn't change much). I did, however, want to address
the general reliance on RT score for any movie, so I stand by the post.
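Per-cluster numbers like the ones above can be reproduced with a quick sketch along these lines (synthetic stand-in data here, since the actual dataset isn't attached; `np.corrcoef` gives the Pearson correlation):

```python
import numpy as np

rng = np.random.default_rng(0)

def cluster_correlations(scores, gross):
    """Pearson correlation of score vs. gross, raw and log10-transformed."""
    corr = np.corrcoef(scores, gross)[0, 1]
    log_corr = np.corrcoef(scores, np.log10(gross))[0, 1]
    return corr, log_corr

# Synthetic stand-in for one cluster: scores weakly coupled to log-gross.
scores = rng.uniform(20, 100, 200)
gross = 10 ** (6 + scores / 50 + rng.normal(0, 0.8, 200))

corr, log_corr = cluster_correlations(scores, gross)
```

Run per cluster (indie and blockbuster separately) after filtering, rather than on the pooled data.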

~~~
sago
> the general reliance on RT score for any movie

The reliance on RT score for what? What are people routinely extrapolating
from the RT score that you think is not valid?

Extrapolating budget from the RT score is definitely invalid: you've shown
that. But who does that?

People might go the other way: extrapolate quality from a big budget (it must
be good, it's a big blockbuster), but as you say, that correlation _is_ valid
(though noisy).

So not quite sure what you're standing by, sorry. I don't want to sound rude
and argumentative. I'm not trying to be a jerk, I'm very happy to concede your
analysis is applicable where it is, I just genuinely don't understand what
you're referring to here.

~~~
minimaxir
Very frequently I've heard the question "is this a good movie?" and the
response is "what does the RT score say?" Many online stores include the RT
score inline when you buy a movie too.

This avenue doesn't immediately distinguish whether a movie is a blockbuster
or indie or how much money it made. But then again, some people do read movie
descriptions thoroughly...

~~~
sago
[edit: Read the article again, deleted previous response, sorry]

So you think box office is a better correlate of 'good movie' than RT score? I
didn't get that from the post, but if that's what you're saying, then yes, I
concede you have shown that correlation is not valid.

~~~
minimaxir
Maybe not _better_, but more important.

I'll update the post with a footnote tomorrow clarifying things.

------
MiguelVieira
Having worked for a short time in the movie industry (at DreamWorks
Animation), I thought this was obvious. And it was certainly true for
DreamWorks' movies. Even people working directly on the movies were terrible
at predicting how they would perform at the box office.

The reason for this is simple: movie critics are self-selecting movie
aficionados. Their tastes just don't reflect the tastes of the public at
large. When you aggregate critical reviews, you're only aggregating the
opinions of a tiny sample of the movie-going public, and that sample largely
shares the same tastes.

~~~
ghaff
And specifically, I suspect that a lot of the high box office films are big,
loud action films which appeal to the theater-going demographic far more than
they do to movie critics.

~~~
jdietrich
The British film critic Mark Kermode wrote about this at length in his book
_The Good, The Bad and The Multiplex_. Spend enough money on a movie and it is
almost guaranteed to be profitable, because sheer extravagance turns it into a
"motion picture event".

[https://www.youtube.com/watch?v=cDmluUKLbYc](https://www.youtube.com/watch?v=cDmluUKLbYc)
[http://www.theguardian.com/books/2011/sep/08/good-bad-mulitplex-mark-kermode-review](http://www.theguardian.com/books/2011/sep/08/good-bad-mulitplex-mark-kermode-review)

------
sweezyjeezy
I don't find this convincing because it feels like there must be confounding
variables here. For example, high budget movies may be judged more harshly
overall than indie movies, which are clearly not going to make as much money.

It would be interesting to see how the picture looks when controlled for movie
budget, time of year released, etc.

~~~
jrcii
As a dedicated reader of movie critiques I can say with absolute certainty
that well known actors and directors are in general held to a higher standard.

------
ohashi
I actually looked at this briefly in my Master's thesis.
[https://lup.lub.lu.se/student-papers/search/publication/1626974](https://lup.lub.lu.se/student-papers/search/publication/1626974)

Rotten Tomatoes was the only one that had any hint of meaningfulness; the
others (IMDb, Metacritic) had none. Twitter volume, sentiment, and the number
of theaters it opened in were pretty good indicators, though.

~~~
igravious
And ad or promotional spend? This is a missing factor, surely?

A movie will have a reputation: good, bad, or indifferent. But a movie also
has to be known. Ad spend gets the movie in front of eyeballs, at least
insofar as it gets to be known to exist. But there's also word-of-mouth, which
is like free ad spend. This is why there are slow burners and mega-flops.

Incidentally this is why I have refused to see Titanic, Avatar, and the latest
Star Wars in the theatre. If you scream in my face I'm not going to watch your
movie in the theatre. Hype turns me off seeing something, but that's just me
not wanting to roll with the crowd perhaps, and not a function of hype itself.
Seems like most people don't react this way though.

~~~
ohashi
Didn't investigate it at all. Not really sure if it would or wouldn't be
meaningful. Just thinking off the top of my head, word of mouth should be
generated by ad spend. It's measuring ad efficacy, in a way, which might be
better than ad spend (total dollars) for predicting success.

------
pessimizer
Movie reviews are evaluations of the quality of a movie, not the marketability
of a movie. I'd be surprised if there _were_ a direct relationship.

Next article: Evaluation of Paintings by Well-Regarded Contemporary Critics
Has No Relationship to Future Dorm Room Poster Sales Figures.

Edit: Also, a measure of success that doesn't find a way to divide by the
budget somewhere is not a good measure. A lot of flops have made 100 million
dollars; they just cost 120 million to make.

~~~
bfaviero
Gross vs. budget is huge; just look at any Blumhouse film.

------
cwyers
Not finding a significant effect when you fail to control for confounding
variables is not evidence of no effect. This is ridiculously bad analysis. His
contour maps basically disprove the headline. How is it on the front page?

~~~
cwyers
Just real quick here. Box Office Mojo has data on 702 movies released in 2014
(I only wanted to do one year because there's no API I know of for this and I
was doing it quickly; I chose 2014 instead of 2015 because many 2015 releases
are still earning money, and I wanted the most recent "complete" year). The
correlation between total box office gross and the number of theaters shown in
is very robust, at .79.

So if you look at the top Metacritic movie from 2014, Boyhood, you can see it
has a low gross, ranked 100th in Box Office Mojo's data. But it was only in
775 theaters. Mr. Turner, the second movie on Metacritic's rankings for 2014,
was only shown in 120 theaters. Birdman, the Best Picture winner and the 16th
ranked movie on Metacritic's list, was in 1,213 theaters. Meanwhile, you have
to go to the 21st movie on the top grossing list to find a film that was in
fewer than 3,000 theaters.

It's a classic example of Simpson's paradox[1]. You have two populations,
wide-release movies and small-release pictures, and within each population
there's a positive correlation between critical approval (as measured by
Metacritic) and box office gross, which is picked up by the contour maps. But
because the highest Metacritic scores go to movies released in 2,000 theaters
or fewer, without controlling for the number of theaters a movie shows in, it
looks like there's a negative relationship between critical acclaim and box
office totals.

1)
[https://en.wikipedia.org/wiki/Simpson%27s_paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox)
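A minimal illustration of the paradox with made-up numbers: both groups trend positive on their own, but pooling them flips the sign.

```python
import numpy as np

rng = np.random.default_rng(1)

# Wide releases: lower scores, higher grosses; positive slope within group.
wide_scores = rng.uniform(30, 70, 300)
wide_gross = 1e8 + 1e6 * wide_scores + rng.normal(0, 2e7, 300)

# Limited releases: higher scores, lower grosses; positive slope within group.
indie_scores = rng.uniform(70, 100, 300)
indie_gross = 1e6 + 1e5 * indie_scores + rng.normal(0, 2e6, 300)

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

within_wide = corr(wide_scores, wide_gross)    # positive
within_indie = corr(indie_scores, indie_gross) # positive
pooled = corr(np.concatenate([wide_scores, indie_scores]),
              np.concatenate([wide_gross, indie_gross]))  # negative
```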

~~~
minimaxir
That's a fair counterpoint. I'll see if I can find theater showing data (that
was not in my dataset, and as you note, it's hard to get), and see if I can
normalize.

------
tamana
Ticket sales count how many people thought the movie was good enough to watch
(and that depends highly on marketing spend), not how good they thought it was.

I would like to see this analysis run on box office as a _percentage of
budget_, not just top-line revenue.

------
DanielBMarkham
A few years ago I was listening to "How to Listen to and Understand Great
Music"[1] and the speaker made an interesting point about art.

Let's say you take a friend from another country to a baseball game. He's
never seen baseball and knows nothing of the game. He can certainly enjoy
himself -- there will be lots of colors, activity, and it's quite a spectacle.
He may even want to come back.

As he learns more and more about baseball, however, he will have a completely
different experience. It's the same game, but now he sees a lot more of the
complexity and nuance.

As a film fan, I use the meta-ratings almost exclusively. I'll go and watch
popular films. Sometimes I'll even enjoy them. But there's a lot going on in
cinema. I like understanding the nuance and detail, and I think well-made
movies make for better experiences.

1\. [http://www.thegreatcourses.com/courses/how-to-listen-to-and-understand-great-music-3rd-edition.html](http://www.thegreatcourses.com/courses/how-to-listen-to-and-understand-great-music-3rd-edition.html)

~~~
pluma
As someone who's not from a baseball country: I strongly doubt that friend
would enjoy himself. It's not exactly action packed. I think the decisive
factor isn't knowing more about the game but knowing more about the meta-game.

Sure, you will experience it differently if you know what good tactics look
like but most people enjoy it because they are emotionally involved with the
teams and players. This seems to be true with most sports, even for the more
"knowledgeable" fans.

------
rcpt
Aside from carefully constructed surveys, the best indicator of US box office
success that I have found is Wikipedia page views
[http://journals.plos.org/plosone/article?id=10.1371/journal....](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0071226)

Outside the US it's a harder problem
[http://arxiv.org/abs/1405.5924](http://arxiv.org/abs/1405.5924)

------
wmeredith
There is no effing way this is true. Everyone I know under 40 checks the RT
score of a movie and factors that into their decision to see it.

~~~
uptownJimmy
I think it is safe to say that fully 1/2 of Americans would seem almost alien
to you if you were to interact with them in their natural habitat(s). It's
true for me, and I was raised among them, before fleeing as soon as I was old
enough to drive away.

I mean no offense to anyone. But there's a lot of folks out there living a
very different lifestyle than the average HN'er.

------
baccheion
Marketing budget (or maybe degree of awareness of the movie's existence),
theatre/screen count, and audience score predict how well a movie will do.
Appeal and marketing, rather than movie quality, have always been what draws
people to the movie theatre.

------
DubiousPusher
Normally, I'm all about research, experiments, and data crunching that confirm
or challenge ideas we pretty much all take for granted, but this makes me wish
there were some very polite way to say, "well, duh!"

~~~
minimaxir
Given the "There is no effing way this is true" comment in this very thread,
this may not be as "well duh" as you think.

That's why data analysis is always important, even if it's dumb.

------
heed
A lot of people don't realize it, but RT also shows a mean rating (out of 10),
which is the same type of scoring as Metacritic. It might not be exposed in
the API mentioned in the article, though.

------
jermaink
Interesting! My question is: what do the p-values of your Pearson correlations
look like? Correlations alone are vague. Please insert :-)

~~~
minimaxir
I deliberately avoided using P-values since that metric is used to imply a
_causal_ relationship between review scores and box office gross, which is
definitely not true.

~~~
jermaink
In the evaluation of correlations, it can always be informative to know the
confidence interval for r, with all caution towards p-value interpretation.

Surely, correlation provides information on association rather than cause and
effect (causation should rather be modeled with Granger and other regression
models). Sample sizes and variances will certainly contribute to different
p-value outcomes. This is because p-values reward low variance more than the
magnitude of impact (Type I/II error etc.). If you have p-values, better
report them and add a footnote on how to interpret them.
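For what it's worth, the standard confidence interval for r uses the Fisher z-transform; a quick sketch (the n = 500 below is a made-up sample size, purely for illustration):

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via the Fisher z-transform."""
    z = math.atanh(r)            # z = 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)  # standard error of z
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)  # back-transform to the r scale

# e.g. a correlation of 0.23 with a hypothetical n = 500 movies:
lo, hi = fisher_ci(0.23, 500)
```

If the interval excludes 0, the correlation is distinguishable from 0 at roughly the 5% level, without any explicit p-value machinery.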

~~~
minimaxir
> _However, the significance or p-value reflects the probability that the
> correlation does not imply a causal relation._

Technically, in this case, a significance test would answer the question "is
the Pearson correlation significantly different from 0?" We would expect it to
_fail_ here, since the correlation clearly isn't, and therefore the test is
less helpful/important. (Even if it passed, the conclusion would still be
"correlations are low in magnitude and therefore do not matter," as noted in
the post anyway.)

Finding the exact P-value of a Pearson correlation requires setting up
bootstrapping, which is not something I have handy at the moment but will work
on in future posts.
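Since bootstrapping was mentioned, a minimal sketch of a percentile-bootstrap interval for a Pearson correlation (synthetic data here for illustration; if SciPy is available, `scipy.stats.pearsonr` also reports an analytic t-test p-value directly):

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_corr_ci(x, y, n_boot=2000, alpha=0.05):
    """Percentile bootstrap CI for the Pearson correlation of paired data."""
    n = len(x)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # resample pairs with replacement
        stats[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Illustrative data with a weak positive relationship:
x = rng.normal(size=300)
y = 0.25 * x + rng.normal(size=300)
lo, hi = bootstrap_corr_ci(x, y)
# If the interval excludes 0, the correlation is distinguishable from 0.
```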

Again, I'm not looking at R^2 and the P-value of a linear regression, which is
different.

~~~
jermaink
It's just a recommendation to improve the reporting, no general defense of
p-values. Pearson does not imply to analyze a causal relationship. I see the
point it's not linear (then you would have had a fitted linear reg, I assume)
but still can tell you that missing p-values may cause arching eyebrows :)

In small sample sizes, a correlation can easily be significant, often at the
cost of low confidence. Conversely, in large sample sizes, the magnitude of
the effect may be lower but at higher confidence. In both cases, results have
to be interpreted with caution. The recent p-value debate points towards a lot
of issues here. For instance, there have been medical studies overestimating
correlations in small sample sizes, while other authors seemed to
underestimate their long-term large-sample results with correlations in the
ballpark of 0.15 (p<0.05).

