

Help me HN: I'm stuck with my graduate internship, need app store data - Avalaxy

Hi,

I'm currently doing my graduate internship on a very interesting subject: I am comparing the Windows Store to the competition (Play Store and Apple App Store) for tablet apps. This means comparing what kinds of apps people like on the different platforms, where the opportunities for developers/entrepreneurs lie, etc.

I have some good data on the Windows Store (a list of all apps with their category, price, publisher, etc.), but I just can't find a similar data set for Android and iOS apps. I have contacted a lot of people with data sets, googled everything there is to google, searched for research reports, etc. The problem is that companies that have this kind of data (such as Distimo) don't really want to give it to me (they sell reports for €10,000), and the research reports that are available (such as Distimo's free reports) are incomplete or outdated. I could ask my company to pay €10,000 for such a report, but that's just too much; I don't want them to pay that kind of money for my graduate internship.

This is a serious problem for me. I've been stuck for 2-3 weeks now, and it's starting to give me sleepless nights, because this could mean that I will fail my graduate internship.

TL;DR: Can anyone please help me find a data set of Play Store / Apple App Store apps for tablets, or point me in a direction where I could find this kind of data? I'm stuck and I don't really know where to go from here.
======
junecpy
I recently did a research mini-project for a startup in Shanghai and I found
this site useful: <http://xyo.net>. They have app data for iOS and Android,
broken down by category and by country.

I agree with t0 that screen scraping will do in most cases. Download the free
version from screen-scraper.com and go through their tutorial. I have no tech
background but managed to scrape the information I needed.

Good luck.

~~~
Avalaxy
Wow, your xyo.net suggestion in particular is really awesome. It seems they
don't offer a complete data set, but there is a lot of good data. I wonder why
I didn't find this site with Google.

~~~
junecpy
I found that on Quora :)

------
pyvek
It would be hard to find someone who will just give away that sort of
database, since everyone collects and stores data according to their own
needs. Looking at your blog posts, it seems that you know programming. Have
you tried scraping the data from the respective stores?

~~~
Avalaxy
I know how to program a crawler (did that for the Windows Store), but the
Windows Store simply has a sitemap with links to all the apps. I couldn't
find something similar for the other stores.
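
For anyone curious, the sitemap approach boils down to pulling every `<loc>` entry out of the sitemap XML and then crawling those URLs. A minimal sketch in Python, assuming the sitemap follows the standard sitemaps.org format; the sample URLs and the `extract_urls` name are made up for illustration, not taken from the actual Windows Store sitemap:

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> List[str]:
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Made-up sample; a real crawler would download the store's sitemap instead.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/apps/one</loc></url>
  <url><loc>https://example.com/apps/two</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_urls(SAMPLE):
        print(url)
```

Each URL would then be fetched and parsed for category, price, publisher and so on, which is the part that differs per store.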

~~~
parad0x1
Here is a crawler for the Play Store (Python):
<https://github.com/ohwang/scraper>

And for the App Store, try the following:
(C#) <https://github.com/geykel/AppStoreScraper>
(JS) <https://github.com/johncch/AppStoreScraperJS>

This only scrapes the icons, though I bet you could adapt it to scrape app
info too.

~~~
Avalaxy
Thanks! I'm currently trying to write my own scraper for the Apple App Store
(C#). Had some code lingering around from the Windows Store crawler that I
wrote earlier. I can reuse some parts of it.

I'm not sure whether the Play Store exposes the full list of apps. The headers
say 'popular paid' and 'popular free', but I can't tell if these are just the
'most popular apps' or 'all apps'.

------
t0
It would be fairly easy to just scrape the data with cURL. I'll whip it up for
you this afternoon.

~~~
kaoD
Give a man a fish and you feed him for a day. Teach a man to fish...

------
1123581321
Apple provides this information in a daily data dump.

~~~
Avalaxy
You mean their RSS feeds? They only provide the newest/top apps.

~~~
1123581321
No, they have a data dump of app information for every app. I believe it's an
enormous XML file. My friend runs an app information/aggregation site and uses
it.
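
If it really is one enormous XML file, it would probably need to be read as a stream rather than loaded into memory in one go. A rough sketch in Python, with the caveat that I don't know the actual format: the `<app>` record tag and the `name`/`category` fields below are purely hypothetical placeholders.

```python
import io
import xml.etree.ElementTree as ET


def iter_apps(source):
    """Yield one dict per <app> element without building the whole tree."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "app":  # hypothetical record tag
            yield {
                "name": elem.findtext("name", default=""),
                "category": elem.findtext("category", default=""),
            }
            elem.clear()  # drop the record's children to keep memory low


# Tiny made-up sample standing in for the real dump.
SAMPLE = b"""<apps>
  <app><name>Foo</name><category>Games</category></app>
  <app><name>Bar</name><category>Productivity</category></app>
</apps>"""

if __name__ == "__main__":
    for app in iter_apps(io.BytesIO(SAMPLE)):
        print(app)
```

With a real file you would pass an open file handle (or a path) instead of the `BytesIO` wrapper.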

~~~
Avalaxy
Where do I find it? Can't find it on Google.

~~~
1123581321
Not entirely sure. It might not be something open to the general public. I've
helped troubleshoot his code but I don't recall the location. I'd ask around
on IRC.

