In the geospatial industry, there are many organizations that produce free open data.
For example, the NAIP imagery comes from the USDA and has been paid for by the US government so that state and local agencies can use it for agriculture. That is why the images are not just RGB: they also include an infrared band, which supports agricultural algorithms like NDVI. For that particular dataset the license is very liberal. If you are curious about that particular program, you can find more info here: https://www.fsa.usda.gov/programs-and-services/aerial-photog...
The problem with datasets of this size is that the mere collection and storage of them is a resource problem in itself. This AWS link is saying that they have gathered all these datasets from various governments and non-profits and are hosting them in raw form so you can use them. Because the data comes from so many different institutions, the licenses differ - but practically speaking they are all very liberal.
It is not competing with any previous commercial service from any vendor, nor is it meant to be a solution of any kind... just big public spatial datasets hosted on AWS.
For example, the UK Government flew some excellent LIDAR missions, generating a very high-resolution elevation model for most of the country, and then put it behind a terrible website where you have to click 3 or 4 times to get a small piece of the data. After a couple of hours I built a script to download all the pieces and put them back together into usable-sized downloads. Mexico's INEGI has a similar situation, so I had to dig through that to build a scraper. USGS's EarthExplorer uses a terrible shopping-cart metaphor for download.
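The "scrape all the pieces" approach can be sketched roughly like this. Note the tile-naming scheme, grid step, and base URL below are made up for illustration - each real portal (the UK Environment Agency site, INEGI, EarthExplorer) has its own layout you have to reverse-engineer:

```python
# Hypothetical sketch of scraping a tile-based download portal.
# The naming scheme and URL here are invented; real portals differ.

def tile_ids(min_x, min_y, max_x, max_y, step=10_000):
    """Enumerate grid-square IDs covering a bounding box (projected coords)."""
    ids = []
    x = min_x
    while x < max_x:
        y = min_y
        while y < max_y:
            ids.append(f"tile_{x}_{y}")
            y += step
        x += step
    return ids

def tile_urls(ids, base="https://example.org/lidar"):  # placeholder URL
    return [f"{base}/{tid}.zip" for tid in ids]

# Downloading is then just a loop of urllib.request.urlretrieve(url, path),
# and the pieces can be mosaicked afterwards with a tool like GDAL's
# gdalbuildvrt or gdal_merge.py.
```

The real work is always figuring out the portal's tile index; the download loop itself is trivial.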
All that is to say that the interesting piece with Earth on AWS is that this is public data that smart people are putting in a more easily accessible place for mass consumption and AWS is footing the bill. In return AWS is getting people more interested in AWS products and a set of customers that are more knowledgeable about how to process data "in the cloud".
The source data is probably the better thing to include in IA, and the GitHub repo is probably the best place to find how to mirror it. If you've got time to spend on it, you might post an issue in there and I can help point you in the right direction.
The National Map, from the USGS, is another example of "ugh", at least it was when it was tile-by-tile download only.
This is far more than a glorified mirror.
My comment was more a warning not to expect this data to be any easier to manipulate than the original sources.
Your most enthusiastic customers can sometimes be the people who didn't know what was possible until your product came along.
They're really serious about counting, aren't they.
I know there are people there that want to see it happen, but it's a matter of cost. What incentives/programs does Earth on AWS offer to assist stewards of public data to make it available on AWS?
Additionally, I think some of this data is normally behind URS/Earthdata Login, what did the politics of making the data available on AWS without URS look like?
On the NOAA side, there tend not to be loginwalls so that hasn't been much of a concern.
I work with one of the partners on this project, so if you have specific datasets or use case ideas feel free to drop me an email at zflamig uchicago.edu.
(it provides direct access to OSM data using a DSL: http://wiki.openstreetmap.org/wiki/Overpass_API/Overpass_QL )
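As a taste of what that DSL looks like, here is a small sketch of building and sending an Overpass QL query from Python. The query syntax follows the wiki page linked above; the endpoint shown is the public Overpass instance, which is shared, so be gentle with it:

```python
import urllib.request
import urllib.parse

# Build a small Overpass QL query: all drinking-water nodes inside a
# bounding box given as (south, west, north, east). [out:json] selects
# JSON output; "out body;" returns the full elements.
def build_query(south, west, north, east):
    return (
        "[out:json][timeout:25];"
        f'node["amenity"="drinking_water"]({south},{west},{north},{east});'
        "out body;"
    )

# To actually run it against the public server:
# data = urllib.parse.urlencode({"data": build_query(51.4, -0.2, 51.6, 0.0)}).encode()
# with urllib.request.urlopen("https://overpass-api.de/api/interpreter", data) as r:
#     print(r.read()[:200])
```

For anything beyond small queries like this, running your own instance (as the comment below notes) is the polite option.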
For tiny purposes the public servers are sufficient and there seem to be quite a few people running private servers.
PS: I'm on the Athena team
You can read more about the file format here: https://wiki.openstreetmap.org/wiki/PBF_Format
There are other problems, specific to OSM and not to PBF/protobuf, like needing to store the locations of nodes until the end of the file, because they can be referenced anywhere in the file.
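A toy illustration of that problem (this is not real PBF parsing, which needs the protobuf schema - just the resolution logic the comment describes): ways reference node IDs, and a referenced node may appear anywhere in the stream, so a converter has to hold a node-ID-to-location table until everything has been seen.

```python
# Toy sketch of why streaming OSM conversion is awkward: way elements
# reference node IDs, and the nodes may come before OR after the way.

stream = [
    ("node", 1, (51.50, -0.12)),
    ("way", 10, [1, 2]),          # references node 2 before it appears
    ("node", 2, (51.51, -0.10)),
]

def resolve_ways(elements):
    locations = {}   # node id -> (lat, lon); grows with the whole file
    pending = []     # ways that cannot be resolved until the end
    for kind, elem_id, payload in elements:
        if kind == "node":
            locations[elem_id] = payload
        else:
            pending.append((elem_id, payload))
    # Only after the full stream is consumed can every way be resolved.
    return {wid: [locations[n] for n in refs] for wid, refs in pending}
```

In practice the node table for the full planet is far too big for RAM, which is why tools use on-disk indexes for exactly this step.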
It's larger as Open California only has the datasets from Landsat 8, Sentinel, and Planet's own satellites.
There are several AWS marketplace solutions available on the link at the bottom of the original article. Only Geolytica and Forward Geocoder seem to be available to new customers, and both have < 5 reviews.
I'm one of the makers of the OpenCage Geocoder: https://geocoder.opencagedata.com
We provide a single, simple API that, behind the scenes, aggregates numerous open geocoders, including Nominatim, DSTK, and others. Please give us a try; there is a free testing tier you can use as long as you like.
The last time I tried, stitching together tiles and cutting them to state boundaries took an inordinate amount of time (upwards of 15 minutes for 6 tiles from Landsat-7/8). Though I'm half convinced it was because I was doing something very suboptimal...
Also, iirc, it was single threaded.
I'm sure there are also more altruistic uses, such as providing better forecasting and advice for farmers in developing countries.
GOES images are from geostationary satellites pointed at the US and are updated a couple of times an hour.
A few months ago, I was looking at different open sources to geocode a lot of addresses around the world.
I have tried openstreetmap and some VM from datasciencetoolkit - both have poor results.
Are there other sources aside from Google? Google appears to be the most accurate.
It has ~477 million freely-licensed addresses.
I think drone aerial imagery holds more promise than satellite imagery. Who knows though, perhaps with fancy new image processing algorithms and sensors we will get the level of resolution you are talking about from satellite imagery at reasonable cost over time.
Edit: or with bigger cheaper rockets.
The AWS product basically consists of open remote-sensing datasets uploaded to S3. This is convenient if you deploy on AWS (transfer costs), but you still have to develop all the data processing yourself.
Bing has given OSM tracing rights over its aerial imagery for several years now - it's hard to overestimate how significant that is.
Apple (as you'd expect) has been much more private about its OSM work, but it has numerous people working on it and has been surfacing OSM data in Apple Maps in several parts of the world.
Bing was a gold sponsor of the most recent State of the Map (global conference) in Japan, Facebook silver: http://2017.stateofthemap.org
Bing has for years allowed the use of their sat images for OSM tracing.
It used to be Yahoo aerial in the early days.
People would be constantly popping up on HN repeating a different mantra about how bloated and unfocused they were (although arguably Google manages to be bloated and unfocused despite the shutdowns - but that's another debate).
Most of the Google shutdowns were understandable, whether you like them or not.
In any case - it's getting tedious to hear the same comment on every Google related post. They shut things down. We get it.
> People would be constantly popping up on HN repeating a different mantra about how bloated and unfocused they were
I'd say Google is a lot more liberal in starting new projects than any other company of its size. Look around in the news and you'll see how analysts comment about how Google has no 'direction' and is burning money.
Even if they decide to shut it down soon, the value companies and scientists get out of it will be worth it.
Maybe also worth pointing out that Google's record in geo isn't without its failures.
Haha. Good luck with that. Google doesn't give its geodata away.
Oh I know, we'll just 'machine learning' our 'big data' and get great business insights.