Your own movie database in 5 minutes with IMDb API and Perl

danso · on Nov 21, 2011

The terms and conditions seem pretty clear that you can't save site content except for page caching: http://www.imdb.com/help/show_article?conditions

>IMDb grants you a limited license to access and make personal use of this site and not to download (other than page caching) or modify it, or any portion of it, except with express written consent of IMDb. This site or any portion of this site may not be reproduced, duplicated, copied, sold, resold, visited, or otherwise exploited for any commercial purpose without express written consent of IMDb. This license does not include any resale or commercial use of this site or its contents or any derivative use of this site or its contents.

(And no, I'm not trying to be a negative Nancy. I've wanted to analyze IMDb for a while but have known that it's not possible to do so without breaking the ToS.)

AdamTReineke · on Nov 21, 2011

There is some data available for personal, non-commercial use. http://www.imdb.com/help/show_leaf?usedatasoftware

However, licensing the database is super expensive: "We offer licensing packages that start at US$15,000 per year." http://www.imdb.com/help/licensing/contact

Jun8 · on Nov 21, 2011

Right, which brings us to the question: who actually owns the data, i.e. where did IMDb get its data? Is a movie's crew public information? Will the studios give this information to me if I ask them?

I think it would be interesting and very useful to have a db of information on all Hollywood movies (my guess ~50-60K) and make it freely available.

sp332 · on Nov 21, 2011

It doesn't matter who owns the information. If you break the Terms of Service, you don't get to use their service to access the information.

Jun8 · on Nov 21, 2011

I think it matters a lot if your intention goes beyond playing with the data into using it commercially. The question is: can I download the IMDb data (by any means necessary) and use it for my startup? To me, their license rules this out.

My question was: if the data is public, can IMDb enforce this hold on it? Probably not, as there's precedent of courts not siding with museums that tried to shut off access to images of the objects they hold, citing the effort required to take photos, etc.

sp332 · on Nov 21, 2011

Well, facts are generally not copyrightable. But the specific representation of the data on the IMDB website or through the API probably is copyrightable. So making an unauthorized copy might be illegal, and redistributing that raw data is definitely a violation of copyright.

But if you get the data from IMDB, and "substantially transform" the actual representation into something else, they can't claim an infringement just because you copied their facts.

But then again, if you break the ToS, they might get you on "unauthorized access of a computer system" with commercial intent, so huge fines etc. even if there is no copyright violation.

hvs · on Nov 21, 2011

Who would be paying for this DB to be hosted? It's always interesting to talk about making things "free", but in the end someone is paying for it.

Jun8 · on Nov 21, 2011

Well, we're not talking about many GB of data here, or are we? Let me see: I think about 1K movies are produced every year; halve that and multiply by 100, and you get about 50K movies total. Let's double that to account for shorts, independents, etc. and we get 100K movies in the db. How much info is there for one movie? The crew of Titanic from IMDb as text is around 85K. Let's say less than 256K per movie. So we're looking at about 26GB of data at most (100K movies × 256K). With lossless compression, e.g. LZW, and some austerity, say ~10GB.

That's not tiny, but not huge. And the amount of download bandwidth will not be much, due to long-tail effects. A company like Google or Amazon, who stand to benefit a lot from such a db can easily accommodate this.
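The back-of-envelope estimate above can be re-run as a few lines of Perl. This is only a sketch of the arithmetic using the comment's own round numbers (1K films a year, 100 years, 256K per film); the variable and sub names are made up for illustration. Note that the pre-doubling count of 50K films works out to roughly 13GB, while the doubled count of 100K films works out to roughly 26GB, both before compression.

```perl
use strict;
use warnings;

# Back-of-envelope inputs taken from the comment above (all rough guesses).
my $films_per_year = 1_000;    # "about 1K movies are produced every year"
my $years          = 100;      # "multiply by 100"
my $kb_per_film    = 256;      # assumed upper bound per film, in KB

# Halve the yearly figure, then double the total for shorts and independents.
my $feature_films = ( $films_per_year / 2 ) * $years;    # 50_000
my $all_films     = $feature_films * 2;                  # 100_000

# Convert KB to GB using decimal units (1 GB = 1_000_000 KB).
sub size_in_gb {
    my ($film_count) = @_;
    return $film_count * $kb_per_film / 1_000_000;
}

printf "%d films: %.1f GB\n", $_, size_in_gb($_) for $feature_films, $all_films;
```

Either way the total stays in the tens of gigabytes, which is the point of the estimate.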

swores · on Nov 21, 2011

Amazon already have a database like that, called imdb. ;)

Jun8 · on Nov 22, 2011

Yeah, I meant one that wouldn't charge ~$15K for commercial access. One that would charge per db access would be nice.

But it's not exactly the money (after all, fifteen grand, although excessive, is not prohibitive) but the unreliability factor: if you're building a business on a db API, there should be a warranty that the company won't decide to abandon it or cut your access unreasonably.

swores · on Nov 22, 2011

My point was that, seeing as Amazon own imdb, they probably don't charge themselves ~$15k for access.

dashr · on Nov 21, 2011

Does anyone know of a good place to get movie data (and possibly cast/crew info) for free or at low cost for a commercial project?

huskyr · on Nov 21, 2011

Wikipedia has loads of movie data. You can use the API to extract it, or use dbpedia.org for a more friendly view.

senthil_rajasek · on Nov 21, 2011

freebase.com

pyre · on Nov 21, 2011

themoviedb

phaylon · on Nov 21, 2011

There's also a CPAN distribution for that: https://metacpan.org/release/TMDB

cabirum · on Nov 21, 2011

Why not just download the original db files from http://www.imdb.com/interfaces ? I mean, pulling the whole db through the API may not be a good idea. There's a tool (http://www.jmdb.de/) which parses the tables you need into MySQL or PostgreSQL.

1010010111 · on Nov 21, 2011

Wow. Props to AMZN. How long have these files been available?

I prefer to use my own choice of parsing tools and database software and really appreciate provision of raw files like this.

But I'm not sure what the purpose of requiring attribution is if the data can only be used for personal use. Assuming we comply with the license and do not share the data, who else besides the user is going to see it?

If a user builds a better movie database with this data, he must not share it with the world or even his neighbor. Sorry, them's the rules.

ig1 · on Nov 22, 2011

Since IMDb started; it was originally a volunteer-contributed project.

10101010101 · on Nov 22, 2011

Interesting.

So, 1. the text files have long been publicly available, since before the acquisition, and 2. it was not a commercial project prior to the acquisition.

Who wrote the non-commercial license? Was that also written before the acquisition?

ig1 · on Nov 22, 2011

Wikipedia has the history of imdb:

http://en.wikipedia.org/wiki/Internet_Movie_Database

10101010101 · on Nov 22, 2011

User-generated content. Make a list of factual information. Accept submissions from users. Watch it grow. Then partner with a large company to sell products. And sell licenses to the data for $15K/year. (Give the top contributors a "free" membership to something to keep them from suing you.) Yeah, that could work.

Movie information, restaurant reviews, video clips, you name it. The miracle of user-generated content.

jc4p · on Nov 21, 2011

Why are you giving props to Amazon?

1010010111 · on Nov 21, 2011

Because I believe they acquired IMDb many years ago and would have the final say in how the data may be used.

bgaluszka · on Nov 22, 2011

The files were there before the acquisition.

szabgab · on Nov 21, 2011

Unfortunately that Perl code is not a good representative of how one should write Perl in 2011.

gglanzani · on Nov 21, 2011

Would you care to rewrite it for us (me)? I'm just curious how Perl should look now.

peteretep · on Nov 21, 2011

I'm a complete cowboy, and would probably write something like: http://pastebin.com/mvjTG9XG

Edit: spot where I missed a comma :-P

autarch · on Nov 21, 2011

Don't quote the variables, use placeholders. It's much simpler, and it's safer.

$dbh->do( 'INSERT INTO movie_collection VALUES ( ?, ?, ... )', undef, @{ $data->{movie} }{@fields} );
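A slightly fuller sketch of the placeholder approach, for anyone following along. It assumes a `$data->{movie}` hash like the one in the snippet above; the column names in `@fields`, the sample record, and the helper `build_insert` are made up for illustration, and `$dbh` is expected to be an already connected DBI handle.

```perl
use strict;
use warnings;

# Hypothetical column list; the real table may have different fields.
my @fields = qw(title year director);

# Build an INSERT with one "?" per column, plus the values to bind to them.
sub build_insert {
    my ( $table, $movie, @columns ) = @_;
    my $sql = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        join( ', ', @columns ),
        join( ', ', ('?') x @columns );
    return ( $sql, @{$movie}{@columns} );    # hash slice keeps column order
}

# Example record; the apostrophe needs no manual escaping with placeholders.
my $data = {
    movie => {
        title    => "Ocean's Eleven",
        year     => 2001,
        director => 'Steven Soderbergh',
    },
};

my ( $sql, @bind ) = build_insert( 'movie_collection', $data->{movie}, @fields );
print "$sql\n";

# With a connected DBI handle in $dbh, the statement would be run as:
#   $dbh->do( $sql, undef, @bind );
```

Generating the `?` list from `@fields` keeps the number of placeholders and bound values in sync when columns are added or removed.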

pyre · on Nov 21, 2011

  > knee-jerk reaction

Fixed that for you.

peteretep · on Nov 21, 2011

Sure. Where's the dbh coming from in this example though? And how do I get it out again nicely to print the SQL to the command line?

rjbond3rd · on Nov 21, 2011

It doesn't exist in the example, which is only creating the INSERT statements, presumably to pipe into, e.g., the mysql command line client:

mysql -u user -p < inserts.sql

If the $dbh were in the example, then:

(a) you could avoid that (eek!) archaic escapeSingleQuote() sub, e.g., my $released = $dbh->quote($released), but much better:

(b) as mentioned, use the SQL placeholders to avoid quoting altogether, but

(c) if you really want to print the INSERTs, just do method (a). Start by declaring the $dbh, e.g. for MySQL:

my $dbh = DBI->connect( 'DBI:mysql:my_db', 'db_user', 'db_pass' );
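To illustrate option (a), here is a rough sketch that prints an INSERT using the handle's `quote` method instead of a hand-rolled escaping sub. The sub name `insert_statement` and the sample values are invented for the example; `$dbh` can be any connected DBI handle, and the exact escaping depends on the driver.

```perl
use strict;
use warnings;

# Build a printable INSERT statement. Every value goes through $dbh->quote,
# which adds the surrounding quotes and escapes embedded ones for the driver.
sub insert_statement {
    my ( $dbh, $table, @values ) = @_;
    my $quoted = join ', ', map { $dbh->quote($_) } @values;
    return "INSERT INTO $table VALUES ($quoted);";
}

# Usage, assuming DBI and DBD::mysql are installed and the database exists:
#   use DBI;
#   my $dbh = DBI->connect( 'DBI:mysql:my_db', 'db_user', 'db_pass',
#                           { RaiseError => 1 } );
#   print insert_statement( $dbh, 'movie_collection', "Ocean's Eleven", 2001 ), "\n";
```

The printed statements can then be piped into the mysql client as shown earlier in this comment.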

pyre · on Nov 21, 2011

Made the SQL portion a little more readable: http://pastebin.com/MNzD0NSE

rjbond3rd · on Nov 21, 2011

Ah, much better.

[It doesn't need saying but one could be even lazier and instead of generating a timestamp, one could use NOW() or an automatic timestamp field.]

eCa · on Nov 21, 2011

I don't have time for that..

Looks good though, much more readable.

anonova · on Nov 22, 2011

If anyone is looking for a movie database with less strict terms of use, check http://www.themoviedb.org/

This is the service "used by many popular media centers like Moovida, XBMC, Plex, MythTV and MediaPortal."

They have a pretty nice API too. http://api.themoviedb.org/2.1/

pavel_lishin · on Nov 21, 2011

Argh, a screenshot of code.

tristanperry · on Nov 21, 2011

The code is linked in a few places:

http://bobbelderbos.com/src/moviecollection/getMovieData

It is annoying though; a link doesn't do those using screenreaders (or search engine bots!) much good.

patrickk · on Nov 21, 2011

Interesting. I've got a big XBMC media setup, and scraping metadata from a local source could certainly speed things up nicely instead of downloading from IMDb constantly.

See:

http://lifehacker.com/5536963/the-ultimate-start-to-finish-g...

In particular:

http://lifehacker.com/5505849/how-to-whip-your-movie-and-tv-...

ImprovedSilence · on Nov 23, 2011

I found this really cool. Last night I knew nothing about Perl or MySQL, and just working through little things like this can really get you going on learning to code and grasping the basics of the syntax. But there's no PHP code linked, just a video? I was hoping to get that part going as well.

bbelderbos · on Nov 23, 2011

Thank you. I uploaded the PHP code of the example site you saw in the video: http://bobbelderbos.com/src/moviecollection/site_php

ImprovedSilence · on Nov 28, 2011

Awesome, thanks. I saw lots of other interesting hacks on your blog too, I intend to play around with some of them, it's really the best motivator for me when it comes to learning programming.

bbelderbos · on Nov 28, 2011

I am glad to hear that. Just let me know (via my contact form) if you have any questions, feedback or interesting topics I could investigate. thanks

bbelderbos · on Nov 22, 2011

thanks all for your feedback. this is just for personal use. thanks for the perl suggestions, I am just learning