
Using Yelp Data to Predict Restaurant Closure - yarapavan
https://towardsdatascience.com/using-yelp-data-to-predict-restaurant-closure-8aafa4f72ad6
======
dsfyu404ed
I wonder how accurate this is in areas where tourism is a large contributor
to the local economy. You don't actually have to be much good at running a
business if you've got an endless stream of new people for several months out
of the year and don't need to rely on repeat business. You just rename the
place and hire a new GM when the negative reviews start overwhelming you (this
applies more generally than restaurants btw).

I would probably try new restaurants more frequently if I could be more sure I
wasn't gonna pay $10 for a $5 burger and help buy some sleazy J1-slave-driver
(owner is too nice of a word) a new Land Rover in the process.

~~~
praneshp
What does the J1 in J1-slave-driver mean?

~~~
snerbles
J-1 is a category of exchange visa issued by the US.

[https://en.wikipedia.org/wiki/J-1_visa](https://en.wikipedia.org/wiki/J-1_visa)

------
monksy
How did he get the data? It's pretty hard to pull the reviews and the data
from Yelp. I tried to do that to do some querying, but their search isn't so
great and they pull a lot of stunts to prevent you from scraping.

\---

Oh, I see he's using the Kaggle data. That's not guaranteed to be reliable.

~~~
Tostino
Eh it's possible, the reviews are harder though.

I wrote a scraper which pulled address info / phone number / star rating /
review count for pretty much every restaurant in the US.

It was "easy" because all of that data is available within the search page,
and you just need to correctly parse it out.

The hardest part was getting around their really crazy rate limiting and IP
blocking.

I managed to get myself IP banned from Yelp before ever trying to scrape, just
by doing a bunch of searches manually pretty quickly over like 20 min. Next
thing I knew, I could no longer access anything on Yelp.

~~~
monksy
That's not surprising. You can get yourself IP blocked just by opening things
in other tabs to queue them up to read. (If you notice yourself getting random
404s, that's when you're being watched.)

------
vinayan3
As the author mentioned, changes in rent are a huge factor. Did the closure
dates coincide with a lease renewal? Lease terms can range from 1 to 10 years.
A distribution of restaurant age at closure could show whether they did.
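
A rough sketch of that age-at-closure distribution, in case it helps. The
opening and closing dates below are made up for illustration (they are not
from the Yelp or Kaggle data), and `age_in_years` / `age_histogram` are just
names I picked. The idea is that spikes near common lease lengths would hint
at closures tied to renewals:

```python
from collections import Counter
from datetime import date

# Hypothetical (opened, closed) date pairs, invented for this example only.
restaurants = [
    (date(2010, 3, 1), date(2015, 3, 15)),
    (date(2012, 6, 1), date(2013, 5, 20)),
    (date(2008, 1, 10), date(2018, 2, 1)),
    (date(2011, 9, 1), date(2016, 8, 30)),
    (date(2014, 4, 1), date(2015, 4, 10)),
]

def age_in_years(opened, closed):
    """Age at closure, rounded to the nearest whole year."""
    return round((closed - opened).days / 365.25)

def age_histogram(pairs):
    """Count how many restaurants closed at each rounded age."""
    return Counter(age_in_years(opened, closed) for opened, closed in pairs)

hist = age_histogram(restaurants)

# Crude text histogram: one '#' per restaurant that closed at that age.
for years in sorted(hist):
    print(f"{years:2d} yr: {'#' * hist[years]}")
```

With real data you would want actual lease terms too, since a spike at 5 years
could also just be coincidence.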

The other huge factor is cost of labor. Maybe looking at the minimum wage
could be another feature. The news usually has those articles about how
restaurants are struggling and the incremental minimum wage increase will hurt
their business. It'd be interesting to see how strong of a factor that is in
restaurant closures.

Also, factors that could be tough to get but are important:

* Cost of ingredients like meat, vegetables, etc.

* General economic conditions: are consumers going out to eat?

------
yarapavan
code repo:
[https://github.com/alifier/Restaurant_success_model](https://github.com/alifier/Restaurant_success_model)

------
e34093409oent
It sounds like they un-anonymized the data, which strikes me as slightly
unethical. (I mean it's not medical data or anything, but I don't think that
was the intended use of the anonymized data.)

Further, it seems like the results of this will be used to deny loans to
restaurants that are not doing so great, thus ensuring that they fail because
they can't get funding for renovations and improvements.

~~~
malifier
The original dataset already contained the names, addresses and coordinates of
each restaurant. Finding the restaurant ids does not reveal any additional
information. It just makes it easier to retrieve recent information from Yelp,
which is available through their API anyway.

------
raymondgh
Very nice! I like how you used multiple data sources to enable a study that
couldn't be done with just one.

