I scraped all of Fandango's movie listings (readypipe.com)
118 points by chadmhorner 21 days ago | 58 comments

I can vouch for their blurb on CMX in Miami. Truly rapacious experience. $28 and the popcorn isn't free, and service was awful (food delivered in the last 15 minutes of a 2-hour movie after ordering before the movie started). /rant

how do they stay in business?

my theories are: no other super close theater (though one opened less than a mile from them at 60% of their price), or grandfathered in super cheap rent (they opened with the new building), or tourists don't know better, but never return.

This is really interesting, even as someone who is not from the US.

I find it astonishing that it's only 13 bucks for a movie ticket over there (around 11 euros). Here in Finland tickets can easily cost over 20e if you see a movie on the weekend.

Meanwhile, in France, they’ve been doing monthly subscriptions to the movie theatre since... forever for 20 EUR/month. 17 if you’re 26 or under.

The theater near me charges $9.25 for an evening show, but it's usually worth it to drive 10 miles or so to another one where the same ticket is $5. (Their concessions are considerably cheaper too, which in most places is as big a factor as the ticket price, if not bigger.)

How much do you pay for an oversized soda drink and a family-sized bag of popcorn for one? I believe the concession stand is a major source of revenue for theatres compared to the ticket price itself.

Yes. I read on here a while ago that theaters actually don’t receive much revenue from movie tickets, due to the studio negotiating a 60% or so share of ticket sales. So most of the money has to come from expensive concessions.

Hmm, if that were true, you'd think all theater chains would offer cheap subscriptions, since getting people in seats would matter far more than maximizing revenue from ticket sales.

Hamburger smuggling is half the fun of theaters

Only 6 euros in the UK, though, when I went to see Glass last week.

Wow, where and when? Easily £18 (≈ €20.5 ≈ $23) at Odeon London, "on the weekend" as per parent post.

£5 per head here in Sheffield for Saturday showings. London ain't right.

You can get that price in London too e.g. at the Peckhamplex. GP probably went to the Odeon flagship cinema in the west end.

My local Vue in Bedford, UK.

Is this the same technology that was used by Yipit to aggregate daily deals back in the Groupon/Living Social days? What languages do you support and how are you different from scrapinghub.com ?

Yes, we've been improving our scraping technology for the past 8 years as we've worked on YipitData (the #1 provider of web data to Wall Street).

Python is the only supported language right now. Scrapy is an awesome project, but we have a very different approach. We strive to be Flask instead of Django.

If you want to, you can use ReadyPipe entirely in the browser through Jupyter notebooks instead of needing to set up a local environment. This is especially helpful with more complicated systems using Selenium and Puppeteer. We discuss a lot more of the features that differentiate us on our homepage: https://readypipe.com/ and the docs https://docs.readypipe.io/

Feel free to reach out (email in profile) with additional questions.

Great piece of natural content marketing and demonstration of product. Keep it up.

Given the effort that's clearly gone into this article, it would seem appropriate to mention/link to the underlying product near the top of the article. Good job, marketing in a way that didn't annoy.

Is the author aware that the prices depend on the day of the week? Competition in Utah has driven all the big theaters to charge only $5 for all movies every Tuesday, making the movies cheaper than even Wyoming. I'm curious to learn whether the same thing has happened elsewhere.

Yep! We scraped on both weekends and weekdays, and also limited our analysis to "Adult" tickets for "Standard" class. Typically the special Tuesday tickets (they are often on Tuesdays) will have a different designation.

Possible we didn't catch everything, but in aggregate it should be a reasonable estimation.
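
A minimal sketch of the filter described above, assuming each scraped row carries hypothetical `ticket_type`, `seat_class`, and `price` fields (the thread does not show the actual schema, and the rows below are made up). Matching the designation exactly is what keeps a specially labeled Tuesday ticket out of the average:

```python
from statistics import mean

# Hypothetical scraped rows; field names and values are illustrative assumptions.
rows = [
    {"theater": "A", "ticket_type": "Adult", "seat_class": "Standard", "price": 13.00},
    {"theater": "A", "ticket_type": "Adult Tuesday Special", "seat_class": "Standard", "price": 5.00},
    {"theater": "B", "ticket_type": "Child", "seat_class": "Standard", "price": 9.50},
    {"theater": "B", "ticket_type": "Adult", "seat_class": "IMAX", "price": 19.00},
    {"theater": "B", "ticket_type": "Adult", "seat_class": "Standard", "price": 11.00},
]


def standard_adult(rows):
    """Keep only tickets labeled exactly 'Adult' in the 'Standard' class."""
    return [
        r for r in rows
        if r["ticket_type"] == "Adult" and r["seat_class"] == "Standard"
    ]


kept = standard_adult(rows)
average_price = mean(r["price"] for r in kept)
print(len(kept), average_price)
```

A discounted ticket that is still labeled plain "Adult" would slip through this filter, which is consistent with the caveat that not everything may have been caught.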

And the type of screen matters: a single-screen, 1960s-era cinema isn't going to charge as much as a flagship cinema.

I believe this is the case for AMC theaters even in San Francisco.

Same in Oregon.

very cool! I always wanted to know these things more often

I've done the SF/NYC/Miami circuit, completely skipping the heartland like everyone else, and these ticket prices are baked into my budget.

Oh you want 3D/Special Sound System/random perk? Prepare for $22 and I'm okay with that

For the actual residents of these cities, there isn't often a place for you to watch something you chose to see in a loud usually spacious environment. These are still undervalued entertainment experiences, amongst the sea of entertainment choices, for actual city residents.

I wonder what the industry's own pricing models show

Is this a violation of Fandango's TOS?

Most likely, although there isn't a practical issue unless they redistribute the raw data.

seems like they are advertising a scraping service -> "Want to get the data yourself?"

Yeah, that's a legal gray area. (also, from looking at the example code, it's unclear what the service is value-adding anyways; you still need to do most of the legwork by figuring out what/how to scrape).

It is not gray at all, it's a violation of their terms, plain and simple. The only gray part is whether or not it's enough of a bother to Fandango to actually initiate legal action.

There’s no login required and consent to TOS is not necessary to access a site, or scrape its data.

While it is a TOS violation, is scraping this data really hurting anything? I don't think so.

That’s why I said in practice nothing would happen. However, if the data is used to build a competitor or it contains PII, it could be problematic.

Plus it consumes server resources and bandwidth.

And some scrape so aggressively that it's like being DDOS'd.

Hopefully they offer him a job before taking legal action over something that petty.

Probably and who gives a shit?

Do people actually care about lengthy tos?

"Oh I'm not supposed to do that" hits enter

Generally, risking a lawsuit (or worse, criminal charges under the CFAA) has negative expected value.

OkCupid threatened exactly that when researchers published scraped data: https://www.vox.com/2016/5/12/11666116/70000-okcupid-users-d...

Tangent, but I wish someone would disrupt the movie ticket industry. Fandango's mobile app is one of the worst apps that I've ever seen: it's very unresponsive and full of advertisements, spam, and interstitials I do not care about. And yet it's the only way to buy tickets in my area.

I mean I'm already using the app to spend money, why not offer me a good experience instead of bombarding me with crappy ads?

I assume it's because they want to avoid paying fees to Apple, no?

I don't see how that transfers. Paying fees to Apple also might not apply to Fandango, as it delivers a service fulfilled physically and not digitally. I would suspect the ad bombing in the Fandango app is due to razor-thin margins in the movie ticket purchasing business.

Hmm good point.

If someone were to take on Fandango, would they have to take over its contracts with the movie theaters, or could they simply compete with Fandango?

They definitely have direct contracts that are not easily replicable.

I mean, they do have a website which is fine...ish.

You might want to clean up that messy prototype code. It's annoying to have to scroll horizontally and the nesting level is just insane.

How did you verify that you got all the theaters?

Having looked at the code, specifically the part that uses only zip codes ending in 1 to eliminate overlapping neighborhoods, it seems like you missed some.

Why not grab all of them and clean the data? Maybe it is too intensive, but insisting you have "all of Fandango's movie listings" is actually false.
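
A rough way to check the coverage concern raised above, sketched under the assumption that you can record which theaters a search for each zip code returns. The `results_by_zip` table and theater IDs are made up for illustration and are not taken from the article's code:

```python
# Hypothetical search results: zip code -> theater IDs returned for that zip.
results_by_zip = {
    "33131": {"t1", "t2"},
    "33132": {"t2", "t3"},
    "33139": {"t4"},
    "82001": {"t5"},
    "82007": {"t5", "t6"},
}


def theaters_found(results, keep_zip):
    """Union of theaters returned by the zip codes that pass keep_zip."""
    found = set()
    for zip_code, theaters in results.items():
        if keep_zip(zip_code):
            found |= theaters
    return found


all_theaters = theaters_found(results_by_zip, lambda z: True)
sampled = theaters_found(results_by_zip, lambda z: z.endswith("1"))
missed = all_theaters - sampled
print(sorted(missed))
```

If `missed` comes back non-empty on a small region where every zip code was scraped, the "ends in 1" sample is dropping theaters there, and scraping everything and de-duplicating by theater ID is the safer approach.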

This is a really cool example of ReadyPipe - It's a great demo of scraping a site that's got some intent not to be scrapeable :D

On the other hand, your data just seems to follow https://xkcd.com/1138/ - What happens to the prices if you adjust for cost of living index? What about if you adjust just for population? It looks to me just like a Cost-of-Living heatmap.
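
For what it's worth, the cost-of-living adjustment asked about above could be sketched like this. The city names, prices, and index values are invented, and the index convention (100 = national average) is an assumption rather than anything from the article:

```python
# Hypothetical average ticket prices (USD) and cost-of-living indices.
prices = {"CityA": 15.00, "CityB": 10.00}
col_index = {"CityA": 150.0, "CityB": 100.0}  # assumed: 100 = national average


def adjust_for_cost_of_living(prices, col_index):
    """Divide each price by its city's index, rescaled so 100 maps to 1.0."""
    return {
        city: round(price / (col_index[city] / 100.0), 2)
        for city, price in prices.items()
    }


adjusted = adjust_for_cost_of_living(prices, col_index)
print(adjusted)
```

In this toy example the nominally pricier city ends up no more expensive after adjustment, which is the pattern you would expect if the map mostly reflects cost of living.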

>> Only cities with 500,000 residents or more (per Wikipedia) were included in this analysis

So 35 cities? Interesting read nonetheless. Good to see independent theaters are still doing well.

Yep! Just those 35. Main reason being that it’s actually a pretty manual effort to assign theaters to cities, because there could be a bunch of different “cities”, as Fandango classifies them, that are all really the same city - like for Miami as an example, you can have Miami and Coral Gables and Miami Beach, which you’d really want to classify all as Miami.

Long story short, didn’t want to go through that process for too many more cities haha.

Maybe there's a mapping somewhere between List<Zipcode> <-> Metropolitan area.
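
A sketch of what such a mapping might look like, assuming a hand-built (or downloaded) metro-to-zip table. The table contents, theater names, and field names here are illustrative assumptions, not a real dataset:

```python
# Illustrative metro -> zip codes table; the zip assignments are assumptions.
METRO_ZIPS = {
    "Miami": ["33131", "33134", "33139"],
    "Cheyenne": ["82001"],
}

# Invert to zip -> metro so each theater lookup is a single dict access.
ZIP_TO_METRO = {z: metro for metro, zips in METRO_ZIPS.items() for z in zips}

# Hypothetical theaters as Fandango might label them.
theaters = [
    {"name": "Theater 1", "fandango_city": "Miami", "zip": "33131"},
    {"name": "Theater 2", "fandango_city": "Coral Gables", "zip": "33134"},
    {"name": "Theater 3", "fandango_city": "Miami Beach", "zip": "33139"},
]


def metro_for(theater):
    """Map a theater to its metro, falling back to Fandango's city label."""
    return ZIP_TO_METRO.get(theater["zip"], theater["fandango_city"])


metros = {metro_for(t) for t in theaters}
print(metros)
```

With a table like this, the three differently labeled "cities" collapse into one metro without assigning theaters by hand.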

So what about the data for states without a city of >500K people?

For instance, the largest city in Wyoming is Cheyenne and it's only 95K...

Oh sorry, should have been more clear: the analysis of "Most Expensive" and "Least Expensive" cities was limited to cities with populations over 500K. But for the main analysis we scraped by zip code.

No worries, thank you for clarifying!


Why don’t you summarise what you found in the title, and give me an indication of whether I should read your article or not?

Honestly planned to but was character-limited

The HN guidelines (https://news.ycombinator.com/newsguidelines.html) favor the article title w/o numbers, in this case "Insights From Analyzing Fandango Ticket Prices"

HN specifically discourages clickbaity titles. (in this case, your title "I scraped all of Fandango's movie listings, here's what I found" is also misleading; you didn't scrape all the listings)

Ah, noted. Will keep that in mind in the future
