A blog post on the topic:
Somehow the number of movies exploded last(?) year.
If a long movie is a real stinker, not just like a five or six, but a 2/10... it's costing tons of money. Someone's going to know and walk off the project or the producers are going to pull in emergency editors... you're not going to distribute a three-hour film reel of pure trash.
You could also observe the same behavior in the rating vs. # votes graph: as the number of votes increases, the number of movies decreases. However, rating and # votes correlate quite strongly.
Secondly, psychology will come into play. The more time an audience invests in a film, the more likely they are to seek a positive reward for their time so they don't feel like they have got a bad deal. Thus, they're more likely to rate the film higher than it perhaps otherwise would be. I also believe this holds true for 'art house' films that are difficult to follow and perhaps less enjoyable than a more mainstream film. Audiences will rate them higher to reassure themselves that they haven't just wasted 2 hours watching something boring that they don't understand.
Some links for further reading:
I did a quick and dirty project involving IMDB and Neo4j when I had some time off between jobs over the holidays. I used screen scraping to get the list of IMDB ids for the AFI top 100 movies and then made calls to MyMovieAPI to pull down IMDB data about each AFI film. I wasn't aware of imdb.com/interfaces at that point, but it wasn't really my goal to do the "best" possible implementation since it was just a learning experience. For those interested, there's a simple overview of the project that shows what (I thought) were interesting questions about the data: for instance, which actors, if any, have appeared in 2 or more of the top 25 AFI films?
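For anyone curious what the "2 or more films" question looks like outside of a graph database, here is a minimal sketch in plain Python. The cast lists are made-up placeholders (not real AFI data), and `repeat_actors` is a hypothetical helper name, not something from the project described above.

```python
from collections import Counter

# Hypothetical cast lists keyed by film title; placeholder names only.
casts = {
    "Film 1": ["Actor A", "Actor B"],
    "Film 2": ["Actor A", "Actor C"],
    "Film 3": ["Actor B", "Actor A"],
    "Film 4": ["Actor D"],
}


def repeat_actors(casts, min_films=2):
    """Return {actor: film_count} for actors in at least `min_films` films."""
    # set(cast) guards against an actor being listed twice for one film.
    counts = Counter(actor for cast in casts.values() for actor in set(cast))
    return {actor: n for actor, n in counts.items() if n >= min_films}


repeats = repeat_actors(casts)
```

With real data you would restrict `casts` to the top 25 titles before counting.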
After looking at imdb.com/interfaces, I'm not sure that it has what I'm looking for. My plan for expanding this project at some point in the future is to start with data from Freebase, since it's already presented in a normalized format, and then fill in missing details via IMDB as necessary.
My ultimate goal is to generalize the N-degrees-to-Bacon trivia question to work with any two actors, but that requires getting a lot more data to work with.
All in all, it's a fun dataset to play with.
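The generalized degrees-of-separation question mentioned above boils down to a shortest-path search on a co-star graph. A rough sketch using breadth-first search, assuming you already have cast lists per film; the data and the function names `build_costars` and `degrees_between` are illustrative placeholders, not part of the Neo4j project:

```python
from collections import defaultdict, deque


def build_costars(casts):
    """Map each actor to the set of actors they share at least one film with."""
    graph = defaultdict(set)
    for cast in casts.values():
        for actor in cast:
            graph[actor].update(a for a in cast if a != actor)
    return graph


def degrees_between(graph, start, goal):
    """Fewest co-star hops from start to goal, or None if unconnected."""
    if start not in graph or goal not in graph:
        return None
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        actor, dist = queue.popleft()
        for costar in graph[actor]:
            if costar == goal:
                return dist + 1
            if costar not in seen:
                seen.add(costar)
                queue.append((costar, dist + 1))
    return None


# Hypothetical placeholder data: a chain A-B-C-D plus an isolated actor E.
casts = {
    "Film 1": ["Actor A", "Actor B"],
    "Film 2": ["Actor B", "Actor C"],
    "Film 3": ["Actor C", "Actor D"],
    "Film 4": ["Actor E"],
}
graph = build_costars(casts)
```

Breadth-first search visits actors in order of distance, so the first time it reaches the target it has found a shortest chain. On a full-size dataset a graph database or a bidirectional search would likely be the more practical choice.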
I'd really like to see whether directors or writers have a bigger impact on quality of films. Like a smallish number of critics, including Pauline Kael, I'm deeply suspicious of the auteur theory that everyone kind of unquestioningly accepts.
“A filmgoer seeking out pictures written by, say, Eric Roth or Charlie Kaufman won’t always see a masterpiece, but he’ll see fewer clunkers than he would following even a brilliant director like John Boorman, or an intelligent actor like Jeff Goldblum. It’s all a matter of betting on the fastest horse, instead of the most highly touted or the prettiest.” - David Kipen
This may not hold true. A while ago I was looking into it and they seemed to use a more complex weighted average without publishing the exact details (possibly using internal user scoring). This may affect the final rating in many ways. More detailed analysis here: http://www.quora.com/Movies/What-algorithm-does-IMDB-use-for...
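To illustrate how a weighted average can differ from a plain mean, here is a sketch of one common form, a Bayesian-style shrinkage estimate of the kind IMDB has described for its Top 250 chart. This is an assumption for illustration only; it is not IMDB's actual per-title algorithm, whose details are not public. The parameter names (`R`, `v`, `m`, `C`) and the example numbers are hypothetical.

```python
def weighted_rating(R, v, m, C):
    """Shrink a title's mean rating toward a global mean.

    R: mean rating of the title
    v: number of votes for the title
    m: weight given to the prior, in "votes" (assumed tuning parameter)
    C: mean rating across all titles
    """
    if v + m == 0:
        raise ValueError("v + m must be positive")
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical numbers: with v == m the result sits halfway between R and C.
example = weighted_rating(R=9.0, v=100, m=100, C=7.0)
```

The effect is that titles with few votes are pulled toward the overall mean, while heavily voted titles keep something close to their raw average, which is one way a displayed rating can diverge from a simple mean of the votes.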
I was wondering how your post got 3 million facebook shares, then I realized that you left in the default data-href attribute for the facebook docs. You might want to change that.
Thanks for the bug report about the Facebook widget as well; I will fix it.
A couple of comments:
* The first two tables could be joined, with the movies from the first table bolded to distinguish them as "best rated".
* Should be: "not average runtimes(>70 and <120)" (not the other way around)
* The labels of the certificate graphs are on the wrong axis.
In the table, I also gave the release year of the movie in order to avoid name conflicts.
Actually, my first thought was that Melancholia might have been 450 minutes long. Because it felt that long when I watched it.
I think it would be interesting to look at those stats next to economic stats, etc.
I'd also like to see a more granular breakdown of attributes of each movie (movies relating to technology, movies with a workers' union being a strong component of the film, race relations, international relations, etc.) and the # of each of those per year, but that would be much more work.