
Earth on AWS – Open geospatial data - thecodeboy
https://aws.amazon.com/earth/
======
rburhum
Looking at the comments, most people don't understand what this is.

In the geospatial industry, there are many organizations that produce free
open data.

For example, the NAIP image data comes from the USDA and has been paid for by
the US govt so the city/state can use it for agriculture. That's why the images
are not just RGB: they also include an infrared band so they can be used in
agricultural algorithms like NDVI. For that particular dataset the license is
very liberal. In case you are curious about that particular program, you can
find more info here: [https://www.fsa.usda.gov/programs-and-services/aerial-photog...](https://www.fsa.usda.gov/programs-and-services/aerial-photography/imagery-programs/naip-imagery/)
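NDVI is why that near-infrared band matters. Healthy vegetation reflects strongly in near-infrared and absorbs red light, so NDVI = (NIR - Red) / (NIR + Red) comes out high over plants and low or negative over water. Below is a minimal sketch on plain Python lists. It assumes you have already read the red and NIR bands into same-length pixel sequences with whatever raster library you use. The reflectance values are made up for illustration.

```python
from typing import List, Optional, Sequence


def ndvi(nir: Sequence[float], red: Sequence[float]) -> List[Optional[float]]:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red), which lies in [-1, 1].

    Returns None for pixels where both bands are zero (the ratio is undefined).
    """
    if len(nir) != len(red):
        raise ValueError("NIR and red bands must have the same number of pixels")
    result: List[Optional[float]] = []
    for n, r in zip(nir, red):
        total = n + r
        result.append(None if total == 0 else (n - r) / total)
    return result


# Hypothetical reflectances: a leafy pixel, an empty (zero) pixel, a water-like pixel.
print(ndvi([0.50, 0.0, 0.02], [0.08, 0.0, 0.06]))
```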

The problem with datasets of this size is that merely collecting and storing
them takes a lot of resources. This AWS link is saying that they have grabbed
all these datasets from various govt agencies and non-profits and are hosting
them in raw form so you can use them. Because the data comes from so many
different institutions, the licenses differ, but practically speaking they are
super liberal.

It is not competing with any previous commercial service from any vendor, nor
is it meant to be a solution of any kind... Just big public spatial datasets
hosted at AWS.

~~~
yellowbkpk
I want to give another example: I'm currently updating the Terrain Tiles
dataset [0] and a very significant portion of my time was spent working around
terrible government data download websites or incompatible distribution
formats.

For example, the UK Government flew some excellent LIDAR data missions
generating a very high resolution elevation model for most of the country and
then put it behind a terrible website where you have to click 3 or 4 times to
get a small piece of the data. After a couple hours I built a script to
download all the pieces and put them back together into usable sized downloads
[1]. Mexico's INEGI has a similar situation, so I had to dig through that to
build a scraper [2]. USGS's EarthExplorer uses a terrible shopping cart
metaphor for download [3].
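Here is a sketch of the "put them back together" step. It assumes the downloaded pieces are equally sized elevation grids whose (row, col) positions you already know from the download grid. That layout is hypothetical: real LIDAR tiles carry georeferencing, and a tool like GDAL would normally do this for you. Missing pieces are filled with NaN so gaps stay visible.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def mosaic(tiles: Dict[Tuple[int, int], Grid], nodata: float = float("nan")) -> Grid:
    """Stitch equally sized tiles keyed by (row, col) into a single grid.

    Cells covered by a tile that was never downloaded are filled with `nodata`.
    """
    n_rows = 1 + max(r for r, _ in tiles)
    n_cols = 1 + max(c for _, c in tiles)
    first = next(iter(tiles.values()))
    tile_h, tile_w = len(first), len(first[0])
    out = [[nodata] * (n_cols * tile_w) for _ in range(n_rows * tile_h)]
    for (r, c), tile in tiles.items():
        if len(tile) != tile_h or any(len(line) != tile_w for line in tile):
            raise ValueError(f"tile {(r, c)} has a different size")
        for i, line in enumerate(tile):
            # Copy this tile row into its slot within the output row.
            out[r * tile_h + i][c * tile_w:(c + 1) * tile_w] = line
    return out


# Three 2x2 pieces of a 2x2 tile grid; the (1, 1) piece is missing.
pieces = {(0, 0): [[1, 2], [3, 4]], (0, 1): [[5, 6], [7, 8]], (1, 0): [[9, 10], [11, 12]]}
for row in mosaic(pieces):
    print(row)
```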

All that is to say that the interesting piece with Earth on AWS is that this
is public data that smart people are putting in a more easily accessible place
for mass consumption and AWS is footing the bill. In return AWS is getting
people more interested in AWS products and a set of customers that are more
knowledgeable about how to process data "in the cloud".

[0] [https://aws.amazon.com/public-datasets/terrain/](https://aws.amazon.com/public-datasets/terrain/)

[1] [https://github.com/iandees/uk-lidar](https://github.com/iandees/uk-lidar)

[2] [https://github.com/iandees/mx-lidar](https://github.com/iandees/mx-lidar)

[3] [https://earthexplorer.usgs.gov/](https://earthexplorer.usgs.gov/)

~~~
toomuchtodo
Would you mind if I stuck your collections of data in the Internet Archive? I
appreciate AWS' efforts in this regard, but trust the Internet Archive for
access and persistence more (and the Archive serves every object as a
torrent).

~~~
yellowbkpk
It'd be great to have this data in Internet Archive. If you check out our
GitHub repo [0] you can see where we coordinate finding data sources. The data
I download gets composited into tiles for display on maps, so we're updating
~4 billion objects in S3.
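For readers unfamiliar with map tiles, this assumes the common Web Mercator "slippy map" scheme; the thread doesn't say exactly which scheme is used. In that scheme, zoom level z is a 2^z by 2^z grid of tiles, so the object count grows fourfold per level. The sketch shows the standard lon/lat-to-tile formula and the tile count for a full pyramid.

```python
import math
from typing import Tuple

MAX_LAT = 85.05112878  # Web Mercator's latitude cutoff


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) slippy-map tile that contains lon/lat at this zoom."""
    n = 2 ** zoom
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon=180 or the clamped poles still land on a valid tile index.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_in_pyramid(max_zoom: int) -> int:
    """Total tiles for zooms 0..max_zoom, where level z has 4**z tiles."""
    return sum(4 ** z for z in range(max_zoom + 1))


print(lonlat_to_tile(-122.4194, 37.7749, 10))  # San Francisco at zoom 10
print(tiles_in_pyramid(15))  # 1431655765
```

Even a single output format taken through zoom 15 is roughly 1.4 billion tiles. Zoom 15 is an arbitrary cutoff chosen here for illustration.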

The source data is probably the better thing to include in IA, and the GitHub
repo is probably the best place to find how to mirror it. If you've got time
to spend on it, you might post an issue in there and I can help point you in
the right direction.

[0]
[https://github.com/tilezen/joerd/issues?q=is%3Aissue+is%3Aop...](https://github.com/tilezen/joerd/issues?q=is%3Aissue+is%3Aopen+label%3A%22data+source%22)

------
colordrops
Looks like a lot of this data comes from free sources. It's not clear from
their site what the licensing is though.

------
tantalor
How does this compare to other offerings like Google Earth Engine[1], GCP
Landsat[2], or GCP Sentinel-2[3]?

[1] [https://earthengine.google.com/](https://earthengine.google.com/)

[2] [https://cloud.google.com/storage/docs/public-datasets/landsa...](https://cloud.google.com/storage/docs/public-datasets/landsat)

[3] [https://cloud.google.com/storage/docs/public-datasets/sentin...](https://cloud.google.com/storage/docs/public-datasets/sentinel-2)

------
cloverich
In case you don't scroll all the way down, there's a list of articles and
videos titled "Use cases" at the bottom of the page, which appears to cover (at
least) how some of this data has been used.

~~~
anigbrowl
I wish all technology announcements included that and put it up front. I'm
often surprised at the number of interesting-sounding things I follow from
HN's front page only to end up with no idea of what they're good for or why I
might want to invest time in learning about them.

Your most enthusiastic customers can sometimes be the people who didn't know
what was possible until your product came along.

------
gravypod
> and identifies the people, locations, organizations, counts, themes,
> sources, emotions, counts, quotes, images and events driving our global
> society every second of every day.

They're really serious about counting, aren't they.

~~~
notwedtm
Dodge, duck, dip, dive and dodge!

------
Moocat87
I'd like to see the National Snow and Ice Data Center's data (soil moisture,
sea ice cover/concentration, snow cover [looks like MODIS is already
available], permafrost, glacier outlines) on AWS.

I know there are people there that want to see it happen, but it's a matter of
cost. What incentives/programs does Earth on AWS offer to assist stewards of
public data to make it available on AWS?

Additionally, I think some of this data is normally behind URS/Earthdata
Login, what did the politics of making the data available on AWS without URS
look like?

~~~
brailsafe
Exactly a year ago, at a conference, I had the chance to speak with a rep from
Amazon who at the time was working to make the Landsat 8 data available on AWS.
From what I remember, AWS covers the hosting cost of the datasets in exchange
for being able to incentivize the use of AWS for working with them. The data
storage and transfer costs, as well as the logistics, are enormous. I don't
recall how the transfer was being managed, but it was certainly described more
or less as a partnership.

------
rmocnik
Exploring these datasets can be quite addictive, especially with a service
like [http://apps.sentinel-hub.com/sentinel-playground/](http://apps.sentinel-hub.com/sentinel-playground/)

------
mentos
Would love to see Amazon make an integration for Unreal Engine 4 or their
Lumberyard video game engine so a game developer can easily import detailed
swaths of the earth.

~~~
flippmoke
This already exists with Unity via Mapbox -
[https://www.mapbox.com/unity/](https://www.mapbox.com/unity/)

~~~
jmkni
Very cool, will need to check this out!

~~~
andybak
Also see
[https://www.assetstore.unity3d.com/en/#!/content/86284](https://www.assetstore.unity3d.com/en/#!/content/86284)

------
querious
What I'm psyched about is OpenStreetMap data queryable with Athena. It's
traditionally kind of a pain to convert PBFs to a queryable format.

~~~
SOLAR_FIELDS
Out of pure curiosity, how so? I deal with Protobuf regularly, and as long as
a decent library exists to dump to JSON that is domain specific to your use
case it is trivial. Is that the only thing missing here?

~~~
rmc
For starters, the OSM PBF file format is not a protobuf file! Instead it's a
collection of protobuf messages nested inside each other!

You can read more in the fileformat:
[https://wiki.openstreetmap.org/wiki/PBF_Format](https://wiki.openstreetmap.org/wiki/PBF_Format)
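This makes the "protobuf inside protobuf" point concrete. As described on that wiki page, the file is a sequence of blocks. Each block has three parts:

- a 4-byte big-endian length;
- a BlobHeader message, where field 1 is `type` and field 3 is `datasize`;
- a Blob of `datasize` bytes, which itself wraps OSMHeader or PrimitiveBlock messages, usually zlib-compressed.

Below is a stdlib-only sketch that walks just that outer framing. It decodes BlobHeader by hand and leaves each Blob undecoded. `make_fake_fileblock` is a hypothetical helper for building test input, not part of any library. For real work a library such as pyosmium is the saner choice.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a protobuf base-128 varint at buf[pos]; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_blob_header(buf: bytes) -> Tuple[str, int]:
    """Pull `type` (field 1, string) and `datasize` (field 3, varint) out of a BlobHeader."""
    pos, blob_type, datasize = 0, "", 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
            if field == 3:
                datasize = value
        elif wire_type == 2:  # length-delimited (string/bytes, e.g. indexdata)
            length, pos = read_varint(buf, pos)
            if field == 1:
                blob_type = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unexpected wire type {wire_type} in BlobHeader")
    return blob_type, datasize


def iter_fileblocks(stream: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (type, still-encoded Blob bytes) for each block in a PBF stream."""
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:
            return
        (header_len,) = struct.unpack(">I", prefix)  # 4-byte big-endian length
        blob_type, datasize = parse_blob_header(stream.read(header_len))
        yield blob_type, stream.read(datasize)


def make_fake_fileblock(blob_type: str, payload: bytes) -> bytes:
    """Hypothetical test helper; single-byte lengths, so keep inputs under 128 bytes."""
    t = blob_type.encode("utf-8")
    header = b"\x0a" + bytes([len(t)]) + t + b"\x18" + bytes([len(payload)])
    return struct.pack(">I", len(header)) + header + payload


sample = make_fake_fileblock("OSMHeader", b"hdr") + make_fake_fileblock("OSMData", b"data")
print(list(iter_fileblocks(io.BytesIO(sample))))
```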

There are other problems, specific to OSM and not PBF/protobuf, like needing
to store the locations of nodes until the end of the file because they could be
referenced anywhere in the file.
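Here is a toy illustration of that node-location problem. It uses a hypothetical, simplified element stream rather than real PBF decoding. A way only lists node IDs, so every node's coordinates have to be held until the whole stream has been seen. A Python dict is fine for a tiny extract. For the full planet it means billions of entries, which is why real tools rely on more compact caches.

```python
from typing import Dict, Iterable, List, Tuple, Union

LatLon = Tuple[float, float]
# Hypothetical simplified stream: ("node", id, (lat, lon)) or ("way", id, [node ids]).
Element = Tuple[str, int, Union[LatLon, List[int]]]


def way_geometries(elements: Iterable[Element]) -> Dict[int, List[LatLon]]:
    """Resolve each way's node references into coordinates."""
    node_locations: Dict[int, LatLon] = {}
    pending_ways: List[Tuple[int, List[int]]] = []
    for kind, element_id, payload in elements:
        if kind == "node":
            node_locations[element_id] = payload  # kept: any way may reference it
        elif kind == "way":
            pending_ways.append((element_id, payload))
    # Only after the whole stream has been read can every reference be resolved.
    return {way_id: [node_locations[ref] for ref in refs] for way_id, refs in pending_ways}


# The way shows up before the nodes it references.
stream = [("way", 10, [1, 2]), ("node", 1, (0.0, 0.0)), ("node", 2, (1.0, 1.0))]
print(way_geometries(stream))
```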

------
malux85
Wow, what great timing! We're just scaling up our imagery DL projects, so this
is cool!

------
lukejduncan
I wonder how this compares to Planet Labs dataset.

~~~
rkda
Are you referring to their Open California dataset?

[https://www.planet.com/products/open-california/](https://www.planet.com/products/open-california/)

Earth on AWS is larger, as Open California only has the datasets from Landsat
8, Sentinel, and Planet's own satellites.

------
brightball
Wow...that’s a treasure trove of useful data all in one place. Major thanks to
Amazon.

~~~
rement
Here is another source
[https://earthexplorer.usgs.gov/](https://earthexplorer.usgs.gov/). This one
is the RAW data for many of the tilesets.

------
zitterbewegung
This makes me want to take this open source weather forecasting model and run
it on AWS. [http://planetwrf.com](http://planetwrf.com)

~~~
zorm
People are doing this already. See
[https://depts.washington.edu/learnit/techconnect/cloudday/wo...](https://depts.washington.edu/learnit/techconnect/cloudday/wordpress/wp-content/uploads/Kevin-Jorissen_Amazon_HPC-on-AWS-cfnCluster-and-WRF.pdf) for
some good info on this.

------
phy07
On a somewhat related topic - can anyone recommend a geocoder available
through AWS?

There are several AWS marketplace solutions available on the link at the
bottom of the original article.[1] Only Geolytica and Forward Geocoder seem to
be available to new customers, and both have < 5 reviews.

[1]
[https://aws.amazon.com/mp/gis/#geocoding](https://aws.amazon.com/mp/gis/#geocoding)

~~~
Bedon292
Might not be what you are looking for but
[https://wiki.openstreetmap.org/wiki/Nominatim](https://wiki.openstreetmap.org/wiki/Nominatim)
is a geocoder that runs on OSM data.

~~~
tuukkah
Pelias (and Mapzen Search) is so much better:
[http://pelias.io/](http://pelias.io/)

~~~
SOLAR_FIELDS
Thanks for sharing. Geocode and RevGeo are generally considered a Hard Problem
(TM) in GIS so it is nice to see great projects such as this.
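The naive baseline shows why reverse geocoding is harder than it looks: just return the nearest known named point by great-circle (haversine) distance. The place list below is made up for illustration. Real geocoders like the ones linked above index far larger datasets and also have to deal with administrative boundaries, address ranges, and spatial indexing, none of which this sketch attempts.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

Place = Tuple[str, float, float]  # (name, lat, lon)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_reverse_geocode(lat: float, lon: float, places: Sequence[Place]) -> str:
    """Name of the closest known point: a linear scan with no polygons or addresses."""
    nearest = min(places, key=lambda p: haversine_km(lat, lon, p[1], p[2]))
    return nearest[0]


places = [("Paris", 48.8566, 2.3522), ("London", 51.5074, -0.1278), ("Berlin", 52.52, 13.405)]
print(naive_reverse_geocode(48.9, 2.4, places))
```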

------
jaipilot747
Slightly tangential, but is there a "modern" alternative to GDAL for working
with raster data?

The last time I tried, stitching together tiles and cutting them to state
boundaries took an inordinate amount of time (upwards of 15 minutes for 6
tiles from Landsat-7/8), though I'm half convinced it was because I was doing
something very suboptimal.

Also, iirc, it was single-threaded.

~~~
llccbb
No, GDAL is still the best. I also suspect you were doing something
suboptimal. As far as modern wrappers for GDAL go, `rasterio` is the most
pythonic. It's part of sgillies' suite, which includes shapely, rasterio, and
fiona.
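One common GDAL approach to the stitch-and-clip workflow in the question takes two steps. This is a sketch, not necessarily what either commenter does:

1. Build a virtual mosaic with `gdalbuildvrt`.
2. Clip it with `gdalwarp`'s `-cutline` / `-crop_to_cutline` options.

Adding `-multi` plus the `NUM_THREADS=ALL_CPUS` warp option addresses the single-threaded complaint. The boundary file name, the `NAME` attribute, and the state are hypothetical. The sketch only builds the command lines. `run_all` needs GDAL on the PATH. rasterio exposes similar operations in Python (`rasterio.merge.merge`, `rasterio.mask.mask`).

```python
import subprocess
from typing import List, Sequence


def gdal_stitch_and_clip(tiles: Sequence[str], boundaries: str, state_name: str,
                         out_path: str, vrt_path: str = "mosaic.vrt") -> List[List[str]]:
    """Build argv lists: mosaic the tiles into a VRT, then clip it to one state."""
    return [
        # A VRT is a small XML index over the tiles; no pixels are copied yet.
        ["gdalbuildvrt", vrt_path, *tiles],
        ["gdalwarp",
         "-cutline", boundaries,                   # polygon layer to clip with
         "-cwhere", f"NAME = '{state_name}'",      # hypothetical attribute name
         "-crop_to_cutline",                       # shrink the extent to the polygon
         "-multi", "-wo", "NUM_THREADS=ALL_CPUS",  # multithreaded warping
         vrt_path, out_path],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


cmds = gdal_stitch_and_clip(["tile1.tif", "tile2.tif"], "states.shp", "Colorado", "colorado.tif")
for argv in cmds:
    print(" ".join(argv))
# run_all(cmds)  # uncomment on a machine with GDAL installed
```

`gdalbuildvrt` expects its inputs to share a projection. If your scenes fall in different UTM zones, as Landsat scenes across a state can, you'd presumably need to reproject them to a common CRS first.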

------
borplk
What are some interesting things to do with this?

~~~
jefft255
In my lab there is a masters student working on monitoring deforestation for
palm fields in Indonesia using Google Earth Engine, which is similar to Earth
on AWS. There is a whole scientific field devoted to analysing this kind of
data: remote sensing. It's underrated in the hacker community honestly.

~~~
Boothroid
The same goes for geo as a whole, in my observation. There are decades of research effort that
have got us to where we are now, well developed study programmes worldwide,
advanced proprietary and open software and data available, and geo is
effectively mainstream in google maps and sat nav. Yet for some reason this
contextual background is missed by many, and so I see commenters making
statements about what a leap forward this or that is, when in fact it's just
part of an evolving history. I'm not criticising people for not knowing what
they don't know - my question is - why does it seem like geo as a whole has
trouble communicating this context? I wonder whether there are any other areas
of tech that suffer from this lack of awareness?

------
pknerd
Pardon me if the question sounds dumb: will we have real-time data for a
certain region, for instance info about clouds?

~~~
maxerickson
Not real time, but the Landsat, Sentinel-2, MODIS and GOES data are all
updated on a continuing basis.

GOES data come from geostationary satellites pointed at the US and are updated
a couple times an hour:

[http://www.ssd.noaa.gov/imagery/index.html](http://www.ssd.noaa.gov/imagery/index.html)

~~~
zorm
GOES-16 imagery can now come as often as every 30 seconds over specific
regions, every 5 minutes over CONUS, and every 15 minutes over the entire disk.
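To put those cadences in perspective, here is a back-of-the-envelope count of images per day for each scan mode quoted above. It counts one image per scan. The imager records several spectral bands on each scan, so actual file counts would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def scans_per_day(interval_seconds: int) -> int:
    """How many scans one mode produces per day at a fixed cadence."""
    return SECONDS_PER_DAY // interval_seconds


# Cadences quoted in the comment above, in seconds.
cadences = {"specific regions": 30, "CONUS": 5 * 60, "full disk": 15 * 60}
for region, interval in cadences.items():
    print(f"{region}: {scans_per_day(interval)} scans/day")
```

This works out to 2880, 288, and 96 scans per day respectively.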

------
wenbert
I'm always interested in these kinds of data.

A few months ago, I was looking at different open sources to geocode a lot of
addresses around the world.

I have tried OpenStreetMap and some VM from datasciencetoolkit; both gave
poor results.

Are there other sources aside from Google? Google appears to be the most
accurate.

~~~
arctux
Check out [https://openaddresses.io](https://openaddresses.io)

It has ~477 million freely-licensed addresses.

------
dsnuh
Will the datasets be open to contribution from members of the public, or are
these read-only mirrors? It seems like Blue Horizon and Prime Now, among many
other of their offerings, would be use cases for up-to-the-minute data.

~~~
kkmx
Looks like many of the datasets were obtained from federal organizations, in
which case they should actually be in the public domain.

------
kiproping
I am waiting for the day we can get satellite images that are so fine you can
see people or animals.

~~~
Boothroid
You can get this now if you are military or have plenty of spare cash lying
around. The problem is that to get this level of resolution your sensor needs
to be nearer the earth, and thus your platform has a shorter lifespan because
it will be subject to greater atmospheric drag, and thus its per-picture cost
will be comparatively very high. This might prompt the question, why not put
it farther away with a bigger lens? Well, there is an upper limit on the
size/weight of the lens that you can lob up to any given orbit, and thus it's
less feasible to get this level of resolution from a higher orbit. You also
have the issue of swath width to think about - generally the higher your
resolution the smaller your imaging area, which might limit the usefulness and
thus the price you can charge for your imagery.

I think drone aerial imagery holds more promise than satellite imagery. Who
knows though, perhaps with fancy new image processing algorithms and sensors
we will get the level of resolution you are talking about from satellite
imagery at reasonable cost over time.

Edit: or with bigger cheaper rockets.
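The lens-size argument above follows from the diffraction limit. For an ideal telescope, the smallest resolvable angle is about 1.22 λ / D (the Rayleigh criterion). From altitude h, that angle spans roughly 1.22 λ h / D on the ground, so doubling the altitude requires doubling the aperture to see the same detail. Below is a minimal sketch with illustrative numbers, not any real satellite. It ignores the atmosphere, detector pixel size, and platform motion.

```python
def diffraction_limited_gsd(wavelength_m: float, altitude_m: float, aperture_m: float) -> float:
    """Rayleigh-criterion ground resolution (m) for an ideal nadir-pointing telescope.

    Angular limit theta = 1.22 * wavelength / aperture; on the ground this spans
    roughly theta * altitude.
    """
    return 1.22 * wavelength_m * altitude_m / aperture_m


# Illustrative only: visible light (500 nm) and a 1 m or 2 m aperture.
low = diffraction_limited_gsd(500e-9, 500e3, 1.0)      # 500 km orbit
high = diffraction_limited_gsd(500e-9, 1000e3, 1.0)    # same optics, twice as high
bigger = diffraction_limited_gsd(500e-9, 1000e3, 2.0)  # twice as high, twice the aperture
print(f"{low:.3f} m, {high:.3f} m, {bigger:.3f} m")
```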

------
ktta
This feels like another service Google could replicate and be much better at,
considering their past experience.

~~~
finstell
Sure, they can replicate much better and then shut it down later on.

~~~
andybak
Imagine if they'd never shut anything down.

People would be constantly popping up on HN repeating a different mantra about
how bloated and unfocused they were (although arguably Google manages to be
bloated and unfocused despite the shutdowns, but that's another debate)

Most of the Google shutdowns were understandable whether you like them or
not.

In any case - it's getting tedious to hear the same comment on every Google
related post. They shut things down. We get it.

~~~
psergeant

        > People would be constantly popping
        > up on HN repeating a different mantra
        > about how bloated and unfocused they
        > were
    

I don't see that for AWS.

~~~
ktta
All the services on AWS fulfill a need. Google often starts projects without
direct profitability in their mind. Note that I'm talking about Google as a
whole and not Google Cloud.

I'd say Google is a lot more liberal in starting new projects than any other
company of its size. Look around in the news and you'll see how analysts
comment about how Google has no 'direction' and is burning money.

------
_pdp_
I know this will be cool if combined with machine learning but to do what? :)

~~~
brootstrap
lol is this a serious comment??? You have terabytes (petabytes) of data in
front of you but can't think of a single thing to do with it????

Oh i know, we'll just 'machine learning' our 'big data' and get great business
insights.

------
cerealbad
how can you keep everyone safe if you can't see where they are, what they are
doing or what is inside their head?

