

Google Play Store in Numbers. Open Source Crawler for Mobile Apps Data - marcellolins
https://github.com/MarcelloLins/GooglePlayAppsCrawler

======
lern_too_spel
The robots.txt forbids any user agent from crawling the pages this crawler
crawls. It's unwise to do this in the open.

~~~
chatmasta
He's not exposing his personal server IP addresses. He's doing it in private,
and open-sourcing the code for others to do the same.

robots.txt files are laughably unenforced policies, and are disrespected every
day.

~~~
smilliken
In practice robots.txt is just a suggestion; even Google doesn't fully respect
it.

~~~
lern_too_spel
Google specifically recommended using robots.txt in Copiepresse v. Google. I
haven't noticed them disobeying robots.txt on my own sites.

~~~
smilliken
I've seen several blog posts on exceptional cases before, but the only case
I've personally witnessed is that the crawl-delay directive is ignored. Granted,
you can set a crawl rate if you're a Webmaster Tools user, but even then the
setting disables itself after a few months.
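
For readers who want to see what honoring these directives looks like in a
crawler, here is a minimal sketch using Python's standard
`urllib.robotparser`. The robots.txt rules, the `example.com` URLs, and the
`MyCrawler` user agent are all made-up assumptions for illustration; they are
not the Play Store's actual rules, and this is not how the crawler in the
linked repository works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from memory so no network is needed.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /store/apps
Crawl-delay: 10
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Return a parser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A polite crawler would skip disallowed URLs and sleep for the
# advertised crawl delay (in seconds) between requests.
blocked = is_allowed(parser, "MyCrawler", "https://example.com/store/apps/details?id=x")
open_page = is_allowed(parser, "MyCrawler", "https://example.com/about")
delay = parser.crawl_delay("MyCrawler")
```

Nothing here enforces anything: the parser only reports what the file asks
for, which is the point made above about robots.txt being advisory.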

------
smilliken
Shameless self-promotion: MixRank's API has all of this and much more
available, including App Store data. Email in profile if you're interested.

~~~
marcellolins
It's a shame to think people see this as "self-promotion" when it's not.
What I am promoting is the data that this crawler obtained, and all the
insights people can get for free just by using it. Unlike an API, which
limits either access to the data or the way you can access it, accessing the
raw data directly in a database has no limits.

Anyway, thanks for the reference. I will make sure to add it to the project's
wiki as a related link.

~~~
notatoad
I don't think he's accusing you of self-promotion; he's calling out his own
recommendation of MixRank as self-promotion. smilliken is "co-founder
of mixrank" according to his profile.

~~~
marcellolins
Whoa, now I feel bad. Sorry about that, smilliken :(

------
inthewoods
Applause Analytics (www.applause.com/mobile-analytics) provides some of this
data plus data on the iTunes app store.

~~~
marcellolins
Thanks for the reference. I know that App Annie also provides some really cool
analytics. Regarding the iTunes data, I also wrote a crawler for that, which
is on GitHub as well.

