

Public datasets on AWS - fs111
https://aws.amazon.com/datasets?_encoding=UTF8&jiveRedirect=1

======
dalke
I see that the data sets which interest me the most aren't as useful as I
would like, because they are out of date. The Genbank release is "Last
Updated: December 9, 2009 2:49 AM GMT", and PubChem is "Last Modified: Jun 4,
2009 20:21 PM GMT". Both of these datasets are modified continuously.

On the plus side, Ensembl is up-to-date.

------
jadz0r
This is great! Public datasets can do such amazing things for people.

Do not want to derail this, but for more datasets (and a very easy way to use
them) check out Yahoo's YQL - <http://developer.yahoo.com/yql/console/>

------
hughw
Some of these are just notional and don't exist; e.g. "petroleum dataset"
<https://aws.amazon.com/datasets/2900>

------
duwease
This is great.. hope some novel new apps, or at the very least the germs of
some cross-field research, spring from this. I wonder how you can go about
encouraging data providers to update the data, however? Obviously they were
convinced once..

------
astar
Geocoding without a restrictive API, thanks to Twilio/Wigle.net street vector
data set: <https://aws.amazon.com/datasets/2408>

------
mark_l_watson
Definitely useful. In the past, I have fetched DBPedia manually and put it on
an EBS volume to process - now I can save a little money.

------
thedangler
If you use the data, how will you be billed? PostGIS data can be huge.

~~~
candre717
I did some work with the public data sets.

The data is stored (free of charge) via EBS (look at the EC2 instance), which
persists to S3 but is not visible in or directly usable from your S3
directory. If you decide to transfer the data or run computations (e.g. via
EMR), you'll then pay for the resources used.

I didn't find the documentation clear enough about how to use the public
data sets efficiently, which had financial consequences.

If anyone is adept at using the public data sets, I'd love to speak with
you.

~~~
dalke
WTF? I had assumed that it was a simple sort of file access which allowed
anyone in EC2 to read the data without having to import all of the storage.
Then again, PubChem is only about 25 GB and inbound data transfer is free, so
this is only about US$4/month.

------
rmc
Shame they haven't added OpenStreetMap data dumps.

~~~
minimax
You sure about that?

<https://aws.amazon.com/datasets/2844>

~~~
veyron
Last updated in October 2009 ...

~~~
fs111
Maybe you can contact the person that submitted it:
<https://forums.aws.amazon.com/profile.jspa?userID=89792>

------
hohoho2012
why is this advertising spamlink upvoted in the news list? can't you use a
search engine? any admins here?

