>IMDb grants you a limited license to access and make personal use of this site and not to download (other than page caching) or modify it, or any portion of it, except with express written consent of IMDb. This site or any portion of this site may not be reproduced, duplicated, copied, sold, resold, visited, or otherwise exploited for any commercial purpose without express written consent of IMDb. This license does not include any resale or commercial use of this site or its contents or any derivative use of this site or its contents.
(And no, I'm not trying to be a negative Nancy. I've wanted to analyze IMDb for a while but have known that it's not possible to do so without breaking the ToS.)
Right, which brings us to the question: who actually owns the data, i.e. where did IMDb get its data? Is a movie's crew public information? Will the studios give this information to me if I ask them?
I think it would be interesting and very useful to have a db of information on all Hollywood movies (my guess ~50-60K) and make it freely available.
I think it matters a lot if your intention goes beyond playing with the data into using it commercially. The question is: can I download the IMDb data (by any means necessary) and use it for my startup? To me their license rules this out.
My question was: if the data is public, can IMDb enforce this hold on it? Probably not, as there's precedent of courts not siding with museums who tried to shut off access to images of the objects they hold, citing the effort required to take photos, etc.
Well, facts are generally not copyrightable. But the specific representation of the data on the IMDb website or through the API probably is copyrightable. So making an unauthorized copy might be illegal, and redistributing that raw data is definitely a violation of copyright.
But if you get the data from IMDb and "substantially transform" the actual representation into something else, they can't claim an infringement just because you copied their facts.
But then again, if you break the ToS, they might get you on "unauthorized access of a computer system" with commercial intent, so huge fines etc. even if there is no copyright violation.
Well, we're not talking of GB of data here, or are we? Let me see: I think about 1K movies are produced every year; halve that and multiply by 100, and you get about 50K movies total. Let's double that to account for shorts, independents, etc., and we get 100K movies in the db. How much info is there for one movie? The crew of Titanic from IMDb as text is around 85K. Let's say less than 256K per movie. So we're looking at about 26GB at most (about 13GB for the 50K alone). With lossless compression, e.g. LZW, and some austerity, say ~5-10GB.
That's not tiny, but not huge. And the amount of download bandwidth will not be much, due to long-tail effects. A company like Google or Amazon, who stand to benefit a lot from such a db, can easily accommodate this.
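For what it's worth, here is the size estimate above as a few lines of Python, so the guesses are easy to change. The constants are only the rough figures from the comment (about 1K films a year, halved, over 100 years; doubled for shorts and independents; a 256K ceiling per title, with K taken as 1024 bytes), and the function names are my own, not anything from IMDb.

```python
KB = 1024  # assumption: "K" in the comment means 1024 bytes


def estimate_bytes(titles, bytes_per_title):
    """Upper-bound size of the raw text, assuming every title hits the ceiling."""
    return titles * bytes_per_title


def to_gb(num_bytes):
    """Convert bytes to decimal gigabytes."""
    return num_bytes / 1e9


# Guesses taken from the comment above, not measured values.
studio_titles = (1000 // 2) * 100   # ~1K films/year, halved, times 100 years
all_titles = studio_titles * 2      # doubled for shorts, independents, etc.
per_title = 256 * KB                # ceiling per title (Titanic's crew was ~85K)

studio_gb = to_gb(estimate_bytes(studio_titles, per_title))
all_gb = to_gb(estimate_bytes(all_titles, per_title))

print(f"{studio_titles} titles: {studio_gb:.1f} GB")  # 50000 titles: 13.1 GB
print(f"{all_titles} titles: {all_gb:.1f} GB")        # 100000 titles: 26.2 GB
```

Since 256K is a ceiling and most titles have far smaller crews than Titanic, the real figure is probably well below either number.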
Yeah, I meant one that wouldn't charge ~$15K for commercial access. One that would charge per db access would be nice.
But it's not exactly the money (after all, fifteen grand, although excessive, is not prohibitive) but the unreliability factor: if you're building a business on a db API, there should be a warranty that the company won't decide to abandon it or cut your access unreasonably.
Why not just download the original db files from http://www.imdb.com/interfaces ? I mean, parsing the whole db through the API may not be a good idea.
There's a tool (http://www.jmdb.de/) which imports the tables you need into MySQL or PostgreSQL.
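If you'd rather roll your own than use a ready-made importer, a minimal sketch in Python with the standard-library sqlite3 module might look like the following. Big caveat: the line format here (`Title (Year)`, then tabs, then the year) is an assumption made up for illustration, so check the actual files before relying on it. The table layout and function names are hypothetical too, and SQLite is swapped in for MySQL/PostgreSQL only to keep the example self-contained.

```python
import re
import sqlite3

# ASSUMPTION: each data line looks like "Title (Year)<tab(s)>Year".
# The real files may use a different layout; adjust the pattern to match.
LINE_RE = re.compile(r"^(?P<title>.+?) \((?P<year>\d{4})\)\t+\d{4}$")


def parse_line(line):
    """Return (title, year) for a matching line, or None for anything else."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.group("title"), int(match.group("year"))


def load_movies(lines, conn):
    """Insert every parseable line into a (hypothetical) movies table."""
    conn.execute("CREATE TABLE IF NOT EXISTS movies (title TEXT, year INTEGER)")
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany("INSERT INTO movies (title, year) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


# Tiny in-memory demo with made-up sample lines instead of a real download.
sample = ["Titanic (1997)\t\t\t1997\n", "not a movie line\n"]
conn = sqlite3.connect(":memory:")
count = load_movies(sample, conn)
print(count, conn.execute("SELECT title, year FROM movies").fetchall())
```

For the real thing you would pass an open file object instead of `sample` and a file path instead of `":memory:"`. Lines that don't match the pattern are skipped silently, which is convenient for headers but can also hide a wrong guess about the format.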
Wow. Props to AMZN. How long have these files been available?
I prefer to use my own choice of parsing tools and database software and really appreciate provision of raw files like this.
But I'm not sure what the purpose of requiring attribution is if the data can only be used for personal use. Assuming we comply with the license and do not share the data, who else besides the user is going to see it?
If a user builds a better movie database with this data, he must not share it with the world or even his neighbor. Sorry, them's the rules.
User-generated content. Make a list of factual information. Accept submissions from users. Watch it grow. Then partner with a large company to sell products. And sell licenses to the data for $15K/year. (Give the top contributors a "free" membership to something to keep them from suing you.) Yeah, that could work.
Movie information, restaurant reviews, video clips, you name it. The miracle of user-generated content.
Interesting. I've got a big XBMC media setup, and scraping metadata from a local source could certainly speed things up nicely instead of downloading from IMDb constantly.
I found this really cool. Last night I knew nothing about Perl or MySQL, and just working through little things like this can really get you going on learning code and grasping a basic understanding of the syntax. But no PHP code linked, just a video shot? I was hoping to get that part going as well.
Awesome, thanks. I saw lots of other interesting hacks on your blog too, and I intend to play around with some of them. It's really the best motivator for me when it comes to learning programming.