

MDB: IMDB data for a folder of movie files. Sort by ratings/genre/runtime - legaloslotr
http://legaloslotr.github.com/mdb

======
click170
How do you interface with IMDB? There are two ways I can think of but neither
would work well in this scenario. Option one is to download their offline
dataset[0] which is huge and which you aren't allowed to distribute with your
program so each user would have to redownload it. Option two is to scrape
their site, which they don't allow but is possible. There is no API to my
knowledge. I tried to write a similar app once before, and they refused to
give permission to scrape their site.

[0] <http://www.imdb.com/interfaces/>

~~~
legaloslotr
Umm... well, there is this site, <http://imdbapi.com>, which does the scraping
(or maybe they have the offline datasets!), and then you can get JSON replies
from them.

~~~
waterlesscloud
Their source is shown as Freebase, which doesn't have anything to do with
IMDB. It's freely usable data as long as it's attributed.

I suspect they may have trademark issues with their site name, but their data
looks free and clear.

EDIT: Nirvana has a good point about the imdb ratings. That may not be legally
obtained.

~~~
nivla
I use IMDBapi for one of my projects and I can assure you that they are
indeed scraping IMDB. The ratings, description and poster I get back are
identical to those on IMDB. It is a great service and I have also donated to
them, but as far as legality goes, I'm not sure how they comply with IMDB's
terms and conditions.

------
jimmyjim

        Automagically resolves file names like "Eva.2011.R5.XviD.VaLeBo.mkv"
    

Out of curiosity, how? Regex matching? Does it miss a good amount of the time
or what?

~~~
legaloslotr
Yeah, mostly regex. Plus some heuristics. Works most of the time. Pretty
efficient for the small number of heuristics it uses. You can see the code
here
[https://github.com/legalosLOTR/mdb/blob/master/MDB/DBbuilder...](https://github.com/legalosLOTR/mdb/blob/master/MDB/DBbuilder.py)
Check out the function get_movie_name!

~~~
wingerlang
Wouldn't this break a movie like 2012?

    
    
        #2 remove year and stuff following it
        filename = re.sub('\d\d\d\d.*', ' ', filename)
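
To make the failure concrete: run on a name like `2012.2009.BRRip.mkv`, that substitution eats everything from the first four digits on and leaves only a space. One possible workaround, sketched below, is to cut at the last year-like token, and only when some title text is left in front of it. The helper name `strip_year_and_after` and the second sample file name are made up for illustration; this is not the code in MDB.

```python
import re

# A "year" here is 19xx or 20xx that is not part of a longer digit run.
YEAR_RE = re.compile(r"(?<!\d)(?:19|20)\d\d(?!\d)")


def strip_year_and_after(filename):
    """Hypothetical helper: drop the release year and everything after it.

    Candidates are tried from the last year-like token backwards, and a cut
    is accepted only if some letters or digits remain in front of it.
    """
    for match in reversed(list(YEAR_RE.finditer(filename))):
        head = filename[:match.start()]
        if re.search(r"[A-Za-z0-9]", head):
            return head.rstrip(" ._-([")
    return filename  # no usable year found, leave the name alone


# The quoted substitution wipes out the title entirely.
print(repr(re.sub(r"\d\d\d\d.*", " ", "2012.2009.BRRip.mkv")))
print(strip_year_and_after("2012.2009.BRRip.mkv"))
print(strip_year_and_after("Eva.2011.R5.XviD.VaLeBo.mkv"))
```

A bare `2012.mkv` comes back unchanged, since nothing precedes the year. This is still a heuristic: a title that contains a year in the middle and carries no release year would be cut short.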

~~~
legaloslotr
Yeah it totally would! Thanks for pointing this out. Btw, I am planning to use
another library that has better detection rates -
<https://github.com/wackou/guessit>

------
pooriaazimi
This is great. I always wanted to create something like this!

But shouldn't there be a button that lets you open the video you're viewing?
Clicking and double-clicking don't do anything...

~~~
legaloslotr
Yeah! That is a high-priority item in my todos! :)

~~~
pooriaazimi
Great.

Another feature that might be trickier to implement: like in web browsers,
pressing the middle button could present a little gizmo that helps scrolling
_faster_ (I thought about it for five minutes and can't think of a better way
to describe that thing).

~~~
legaloslotr
Yeah.. this might be a bit trickier! Anyways, isn't the scrolling good enough
with the mouse scroll wheel?

~~~
pooriaazimi
Well, I'm almost always on my Mac and use the trackpad, so I always feel a
little awkward holding a mouse in my hand. But even so, I think if you have a
large enough library (say 200 movies), it would take a lot to scroll to the
end (I tried it right now: 40 seconds).

~~~
legaloslotr
Btw, so MDB works fine on a Mac? Never tested it! Any Mac-specific
comments/bugs/feature reqs?

~~~
pooriaazimi
Unfortunately I haven't tested it on a Mac yet (I didn't know it could work,
as the Windows GUI seemed native). I use Mac all the time, but my movies are
on my old Windows PC.

I'll test it later on and see if there are any bugs :)

------
ancymon
Similar project is here: <http://code.google.com/p/movies-js/>

------
CrazedGeek
Ooo, I like this! Will it work on TV episodes or just movies?

~~~
legaloslotr
Glad to hear that you liked it! It works for everything that IMDB supports!
Right now, it tries to show the IMDB info for _each_ episode separately.
Maybe, later I will add an option to consider the series as a whole.

------
nirvana
This reminds me of the Tragedy of IMDB, once again. Many people may not know
or remember, but IMDB (and CDDB which had a similar fate) were some of the
great early "open source" works of the internet-- millions of movie (and
music) buffs painstakingly compiled information to create these massive
databases. They gave the sites the data for free as part of an agreement that
the sites would make that data available for free to everyone.

They grew like crazy and became really useful-- for instance in the early days
of playing CDs on computers the CD programs (this is before MP3 rippers) would
connect to CDDB to look up the tracks to display while playing the CD (since
this info wasn't on the CD in digital form.) Later MP3 rippers used this data
to automagically put the right metadata into MP3 files.

But eventually the dotcom mania happened and everything with a dotcom suddenly
was valuable. So it isn't a surprise that these guys who created these
volunteer sites sold them out. IMDB, which uses data created voluntarily by
members of the internet, sold to Amazon which now charges a massive premium
price for access to the data (they keep a subset of it free on restrictive
terms, possibly as terms of the sale, but they really try to limit the utility
to maximize their profits among those who want to know about in production
movies, etc.) It used to be that IMDB was a great place to find out info about
in-production movies, etc., but that data is now stripped (though the titles
remain, so people think there just is no data).

When I was an employee of Amazon, I had a free imdb-pro account. There's a
whole lot of data IMDB is not showing you unless you pay. And of course, IMDB
is still crowd sourcing most of its content!

This has extended to the data you can download. Back in the 1990s you could
download the entire IMDB database. No more. The page about their "offline
dataset" calls it a "Subset", but they never say what it is a subset of (1% of
the movies? Could be; it seems pretty small.) The "movie-database-faq" we are
to use as a guide to the data is the rtfm faq (even with links to rtfm.mit.edu)
from back in the day when IMDB was a free database, and apparently hasn't been
updated in 15 years.

Their offline dataset does not include Ratings. In fact the data seems really
almost deliberately useless (but I just started looking at it.) For instance,
Movies.list is just a list of movie titles. Complete-cast.list just indicates
whether they have verified the complete cast - by title. Either they are using
the movie title + supplementary information and year as the foreign key (which
seems fraught with peril) or there isn't a good way to reconstruct a
relational database from this data.
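
To illustrate the foreign-key worry: if each line of such a list starts with something like `Title (Year)`, with a Roman-numeral suffix such as `(1999/II)` telling apart same-named titles from the same year, then the only join key available is a tuple parsed out of display text. The sketch below assumes that line shape, which is an assumption and not something checked against the actual files; `parse_key` and the sample lines are invented for the example.

```python
import re

# Assumed line shape: "Title (Year)" or "Title (Year/II)", then other fields.
KEY_RE = re.compile(
    r"^(?P<title>.+?) \((?P<year>\d{4}|\?{4})(?:/(?P<dup>[IVXLC]+))?\)"
)


def parse_key(line):
    """Return a (title, year, disambiguator) tuple, or None if no key is found."""
    match = KEY_RE.match(line)
    if match is None:
        return None
    return (match.group("title"), match.group("year"), match.group("dup") or "")


# Made-up sample lines, not taken from the real dataset.
for line in ["Some Film (1999)\t\t1999", "Some Film (1999/II)\t\t1999"]:
    print(parse_key(line))
```

Any typo fix or retitling on one side of such a join silently orphans the matching rows on the other side, which is the peril of keying on title text.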

Sorry, excuse me for that rant. It just seems fraudulent in a way to crowd
source a database from the community with the promise of making it available
to the community and then sell out and have that data taken away and used for
profitable purposes.

This cuts off a lot of useful apps (like the one this topic is about
originally) that would otherwise be built easily using this data. (I think
imdbapi may be scraping; I haven't figured that out yet.)

~~~
legaloslotr
So, is this legal? As in, what imdbapi.com is doing or what my app is doing!
Does imdb have a FAQ regarding apps somewhere?

~~~
nirvana
I'm still trying to figure out what imdbapi is doing exactly. They cite
freebase and wikipedia (at the bottom of their home page) but neither of those
have imdb ratings.

What data imdb makes available is here, along with references to their ToS:
<http://www.imdb.com/interfaces/>

It seems the data IMDB makes available is nearly useless (or I'm
misunderstanding it) and doesn't include ratings.

This implies IMDB API is scraping IMDB. Whether that's legal or not, I can't
say. But I can say this- IMDB was created on the work of many users who
licensed their efforts under terms that were, in my opinion, broken when IMDB
sold to Amazon, and thus I think scraping is moral.

