

Analyzing IMDB Data on 90,000 TV Series - dfkoz
http://dfkoz.tumblr.com/post/88884293461/the-best-of-tv-analyzing-imdb-data-on-90-000-tv-series

======
jonnathanson
Some very interesting analysis, and in some ways, it raises even more
interesting questions about the sampling at IMDB. Who is filling out IMDB
reviews and ratings? How many segments are there among the raters, and to what
extent are they truly representative of the general, TV-watching population?

Looking at the list, for instance, you see that Dragon Ball Z shows up as one
of the highest-rated shows of its decade. Now, perhaps a truly randomized,
representative sample of the population really would rank Dragon Ball Z as one
of the greatest TV series ever made. But my gut tells me that's unlikely. More
likely is that a big cluster of fans of the show have self-selected into
rating it highly. What you're actually seeing, in this case, is the power of a
very vocal, very rabid niche. [1]

This same effect, in reverse, can artificially deflate the ratings of shows
and movies that a vocal minority of the population rallies against. This
appears to be the case with Gunday, one of the lowest-rated movies on the
site, which David Goldenberg of FiveThirtyEight discusses in a very
interesting post:
[http://fivethirtyeight.com/features/the-story-behind-the-wor...](http://fivethirtyeight.com/features/the-story-behind-the-worst-movie-on-imdb/)

Finally, we need to filter the data set for size. Ice TV, a show from 1996
with all of 6 ratings on its IMDB page, probably wouldn't pass our gut check
for "Best of the 90s," alongside shows like The Sopranos and Friends, which
probably got ratings more or less representative of public opinion.
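
A minimal sketch of what such a size filter could look like, assuming each show is reduced to a title, a mean rating, and a vote count. The `Show` class, the `min_votes` threshold, and the `prior_mean` / `prior_weight` values are all hypothetical choices for illustration, not anything taken from the original analysis. The second function shows one common alternative to a hard cutoff: shrinking each show's average toward a global mean so that shows with very few votes cannot dominate the ranking.

```python
from dataclasses import dataclass


@dataclass
class Show:
    title: str
    mean_rating: float  # average user rating on a 1-10 scale
    num_votes: int      # number of ratings behind that average


def filter_by_votes(shows, min_votes):
    """Keep only shows with at least `min_votes` ratings."""
    return [s for s in shows if s.num_votes >= min_votes]


def shrunk_rating(show, prior_mean, prior_weight):
    """Pull a show's mean toward `prior_mean`; fewer votes means a stronger pull."""
    total = show.num_votes + prior_weight
    return (show.num_votes * show.mean_rating + prior_weight * prior_mean) / total


# Made-up numbers for illustration only, not real IMDB data.
shows = [
    Show("Tiny Show", 9.5, 6),
    Show("Big Show", 8.5, 50000),
]

kept = filter_by_votes(shows, min_votes=1000)
print([s.title for s in kept])

for s in shows:
    print(s.title, round(shrunk_rating(s, prior_mean=7.0, prior_weight=1000), 2))
```

With these made-up inputs, the hard filter drops the 6-vote show entirely, while the shrunk rating keeps it in the list but ranks it close to the assumed global mean.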

This is a fascinating subject and a really compelling start. But the next step
in this sort of analysis is developing a reasonable methodology for parsing
and reconciling quality, quantity, and clusters. To give the author due
credit, he expresses the appropriate amount of skepticism or reservation when
showing results. And his observations about genre and overall trends are
fascinating to see.

[1] Of course, this raises some intriguing questions about what, exactly, a
"quality" show is. A show that has a large, thriving fanbase is a great show
to that fanbase. And to the extent it's attracted a sizable, rabid following,
it can probably be said to be a great show in general -- even if it's not
everyone's cup of tea. And that's probably fine. Obviously not every show is
going to appeal to everyone in the population. Nevertheless, strong-niche
shows confound our idea of "generally" great shows and make normative
comparisons very hard to do, unless we start factoring for things like
strength of sentiment.
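
As a rough illustration of "factoring for strength of sentiment," here is a sketch that assumes a per-score vote breakdown is available for each show. The histograms and the `extreme_share` measure are hypothetical, just one crude way to separate a polarized niche favorite from a broadly liked show with a similar average.

```python
def mean_rating(histogram):
    """Average rating from a {score: vote_count} histogram on a 1-10 scale."""
    total = sum(histogram.values())
    return sum(score * count for score, count in histogram.items()) / total


def extreme_share(histogram):
    """Fraction of votes that are a 1 or a 10, a crude proxy for strength of sentiment."""
    total = sum(histogram.values())
    return (histogram.get(1, 0) + histogram.get(10, 0)) / total


# Made-up histograms for illustration only, not real IMDB data.
niche = {1: 20, 5: 10, 10: 70}
broad = {6: 20, 7: 40, 8: 30, 9: 10}

for name, hist in [("niche", niche), ("broad", broad)]:
    print(name, round(mean_rating(hist), 2), round(extreme_share(hist), 2))
```

The two made-up shows end up with similar means, but almost all of the "niche" votes sit at the extremes, which is the kind of signal a plain average hides.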

------
BenderV
"Using a simple scraper written in Ruby, I was able to grab data on the
>90,000 TV shows in the IMDB database"

I know it may be a bit harder to use than a crawler (sadly), but IMDB offers
its dataset for free (under conditions):
[http://www.imdb.com/interfaces](http://www.imdb.com/interfaces)

(IMDB strictly forbids crawlers)

Edit: nice analysis! Thanks!

~~~
yaeger
True, but I think the rationale behind that is that IMDB doesn't want someone
creating software that gets used by a lot of people and that implements such a
crawler.

I doubt, although I can't say for certain, that IMDB cares much about someone
who made a scraper to gather data for a one-time aggregation like this.

Of course, for a use case like OP's, the static data IMDB offers would also
have sufficed, I'd say. That data is not as "up to date" as the data you can
scrape off the web pages, but it would have been sufficiently "up to date"
for such an analysis, I think.

------
stephenaturner
Interesting. Though, as mentioned, the ratings of shows are largely skewed by
time -- i.e., because no one was reviewing shows from the 50s-80s at the time of
airing, it's all viewed through nostalgia and history, so it skews to certain
shows and avoids even reviewing the dreck.

When you get into the 90s and beyond, shows were being viewed and reviewed
contemporaneously, so a greater range of shows was covered and overall ratings
for the period actually went down...

Nice analysis of the available data anyway. Also interesting to see the
appearance of certain non-English language shows as well.

------
gggggggg
What happened to The West Wing...?

That aside, it's hard to compare years with such a service: as hardcore fans
come on board to IMDB, this will impact the results for new shows. As he says
in the link, it's hard to compare them to shows from years ago.

