I recently built an entire application with licensed theater showtime data. The data is licensed by two companies in the USA, Tribune Media and Westworld Media. One or both of these companies will approach you should you scrape data from their affiliates. Eventful.com, for example, is a subscriber to the showtime data from Tribune Media.
The application and company we built have since shut down for a variety of reasons, not the least of which was the cost of licensing.
nickadams' reply is completely spot on. Either license the data or go home.
nickadams is also spot on regarding the complexity of the interfaces with which you would need to integrate. The extraction, translation and loading of data is hugely problematic and time consuming. The providers' data comes from theater owners and is therefore, by definition, inconsistent. Theater owners like to put marketing information into data fields on an inconsistent basis. For example, under Amenities will be "50% of on Tuesdays."
If you want to talk in greater depth about dealing with movie showtime data, get back to me.
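To make the Amenities problem concrete, here is a rough Python sketch of the kind of cleanup pass this data forces on you. It is not what any provider or our application actually ran: the amenity vocabulary, the promo heuristics, and the function name `split_amenities` are all made up for illustration.

```python
import re

# Hypothetical vocabulary of genuine amenities; a real feed needs its own list.
KNOWN_AMENITIES = {
    "stadium seating",
    "imax",
    "3d",
    "wheelchair accessible",
    "reserved seating",
}

# Assumed heuristics for promotional copy: percentages, prices, promo words, weekdays.
PROMO_PATTERN = re.compile(
    r"\d+\s*%|\$\s*\d|\b(?:off|discount|special|free)\b"
    r"|\b(?:mon|tues|wednes|thurs|fri|satur|sun)days?\b",
    re.IGNORECASE,
)


def split_amenities(raw_values):
    """Sort raw Amenities entries into (amenities, promos, unknown)."""
    amenities, promos, unknown = [], [], []
    for value in raw_values:
        # Collapse runs of whitespace and trim stray quotes and periods.
        cleaned = " ".join(value.split()).strip(' ."')
        if not cleaned:
            continue
        if cleaned.lower() in KNOWN_AMENITIES:
            amenities.append(cleaned)
        elif PROMO_PATTERN.search(cleaned):
            promos.append(cleaned)
        else:
            # Neither recognized nor obviously marketing: needs a human look.
            unknown.append(cleaned)
    return amenities, promos, unknown
```

Calling `split_amenities(["Stadium Seating", "50% of on Tuesdays.", "Arcade"])` puts the first entry under amenities, the second under promos, and the third under unknown. The unknown bucket is the point: with data this inconsistent, a review queue is safer than guessing.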
As for the grey area, believe what you want. nickadams and I are in perfect agreement on this subject. We've both been in the trenches and know the territory well.
TL;DR: You might get a cease and desist or worse for doing things like this. It can be a headache. Be careful.
First, I have no idea about all the legal intricacies or accepted uses in this space, so this obviously may not apply in every case -- especially considering our implementation of this was in commercial software (but if anyone has any plans to implement this approach in commercial software, beware).
Backstory: We built a CMS for small weekly newspapers. One of the features of that CMS was automatically generated real-time movie showtimes. This was handy since previously the newspapers would manually add this info to their websites and it'd quickly be out of date (showtimes change occasionally). And while they had this info for their papers, the means to get this data on their sites was painful. They weren't tech savvy, for one, but even if they were, these APIs from TMS (Tribune Media Services -- the big guys), CinemaNow, etc. are all ridiculous. They don't charge based on API call, and they are so poorly designed that you end up creating wrappers for their APIs and it's all a rat's nest.
Anyway, our goal was to make this easier, so we did something very similar to the poster -- essentially scraping the web content and repurposing it on their sites. Admittedly it's a bit of a grey area, but we figured it wasn't egregious, especially since most of them already had the data legitimately for their printed papers. Like the poster, we also left the Fandango link in there so users could just purchase a ticket online if they wanted. We didn't see the harm. Fandango still got the purchase whether the user got there from Yahoo, Google, or sites on our CMS.
What was the result of this? Cease and desists. First from Fandango. They saw the referrals from our sites, then saw that we were displaying showtimes without being licensed to do so (how they knew this I'm not sure). They told us they had to protect their licenses (they apparently licensed the showtimes from TMS so they felt they needed to act on their behalf or something) so they sent us a cease and desist. Obviously Fandango passed this info along to TMS too, so next up was a cease and desist from them.
The final result was us just paying to use the APIs, which we probably would have done from the beginning if they had made any sense at all. It took many phone calls and emails to get them to put a "special" package in place for us. We figured we could just pay X per API call for showtimes at theaters in Y zip codes. Nope. Their packages seemingly had no concept of modern web services and instead were structured around strange market blocks, each with a daily minimum number of requests that had to be made, and so on. Not to mention the fact that we needed to do all this for multiple clients in multiple regions using our hosted CMS (these concepts of software as a service were foreign to them).
It was a while ago, so maybe I'm embellishing the ridiculousness of their packages (I'm probably not) and maybe things have changed. But overall, I think it's safe to say: if you plan on scraping these showtimes, at least don't leave in the Fandango link (they'll find you!). And more to the point, just be careful doing things like this unless you're prepared to read some cease and desists and reacquaint yourself with a fax machine.
> They saw the referrals from our sites, then saw that we were displaying showtimes without being licensed to do so (how they knew this I'm not sure)
The referrals are likely the headers you send when scraping, e.g. Referer: <your newspaper>.tld. Depending on whether you actively set the User-Agent header, that might also have contributed to them catching on (be it an omitted User-Agent, "urllib2", "<newspaper> Bot 1.0 +<newspaper>.tld; don't sue us", and so forth). If you run a content provider and try to protect your content/pageviews/API, the absence of either of these headers is also worth looking out for.
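For anyone on the content-provider side, a minimal sketch of that kind of header check might look like the Python below. The function name `suspicious_signals` and the list of library User-Agent substrings are my own illustrative assumptions, not anything Fandango or TMS is known to use.

```python
# Substrings of default User-Agent values sent by common HTTP clients.
# Illustrative and far from exhaustive.
LIBRARY_AGENTS = ("urllib", "python-requests", "curl", "wget")


def suspicious_signals(headers):
    """Return reasons a request looks automated, judged only by its headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    reasons = []

    user_agent = normalized.get("user-agent", "")
    if not user_agent:
        reasons.append("missing User-Agent")
    elif any(token in user_agent.lower() for token in LIBRARY_AGENTS):
        reasons.append("library User-Agent: " + user_agent)

    if not normalized.get("referer"):
        reasons.append("missing Referer")

    return reasons
```

For example, `suspicious_signals({"User-Agent": "Python-urllib/2.7"})` flags both the library User-Agent and the missing Referer. Treat these as weak hints only: browsers legitimately omit Referer on direct visits, and a scraper can set both headers to whatever it likes.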
Good point. I should have mentioned we weren't scraping Fandango itself, though (we were using another source). So the real question is: how did they know that we didn't have a license to display these showtimes? How did they know we were scraping and not just displaying them legitimately? Sure, they knew we didn't have a license to display their link, but I don't see how that alone would lead them to the other conclusion. If that makes sense...
I'm curious to hear the actual legal explanation. To me, it seems like showtimes are facts and my understanding is that it's fine to re-publish facts. Of course, that doesn't mean you're allowed to scrape and re-publish the facts directly. Is that the main issue here?
That was our initial justification/rationalization too... They were public facts. It'd be like repurposing the current temperature. But apparently that's not the case here. They actually own the rights to these facts and they had court cases they won to back it up.
So the scraping was an issue too, yes. That's probably always a legal grey area. But in this case, it was also just the basic reuse of those facts through any means, scraping or otherwise.
So much so that we couldn't even link to Google Movies if the URL had a zip code passed as a query. That'd be illegal since Google alone licensed the data, not us. It seems crazy, but we didn't have the means to fight it. (Though I'm assuming a programmatic link like this in a commercial app is different from someone linking on their blog.)
The closest analogy I could think of is live sports broadcasts where technically even description or dissemination without consent is illegal. That's why live blogging for example isn't possible in many situations.
This might be legal; scraping is a gray area. There is no way that reelbox is going to persist if they get a strongly worded letter, though, so the legality really doesn't matter.