In the geospatial industry, there are many organizations that produce free open data.
For example, the NAIP imagery comes from the USDA and has been paid for by the US government so that cities and states can use it for agriculture. That is why the images are not just RGB: they also include an infrared band, so they can feed agricultural algorithms such as NDVI. The license for that particular dataset is very liberal. In case you are curious about it, you can find more info here: https://www.fsa.usda.gov/programs-and-services/aerial-photog...
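To make the NDVI mention concrete, here is a minimal sketch of the standard formula, NDVI = (NIR - Red) / (NIR + Red), in plain Python. The function name `ndvi` and the nested-list band layout are my own assumptions for illustration, not anything NAIP ships; in practice you would read the red and infrared bands with a raster library and use arrays.

```python
def ndvi(nir, red):
    """Return per-pixel NDVI for two equally sized 2-D bands (lists of rows).

    Hypothetical helper for illustration: inputs are plain nested lists of
    numeric pixel values, one for the near-infrared band and one for red.
    """
    if len(nir) != len(red):
        raise ValueError("bands must have the same number of rows")
    result = []
    for nir_row, red_row in zip(nir, red):
        if len(nir_row) != len(red_row):
            raise ValueError("bands must have the same number of columns")
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            # Avoid dividing by zero where both bands are 0 (e.g. nodata).
            row.append((n - r) / total if total else 0.0)
        result.append(row)
    return result


# Toy 1x2 image: the first pixel reflects far more infrared than red.
values = ndvi([[200, 50]], [[50, 50]])
```

Values near 1 suggest dense, healthy vegetation, values near 0 suggest bare ground or pavement, and negative values usually mean water.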
The problem with datasets of this size is that merely collecting and storing them takes serious resources. The AWS link says they have gathered these datasets from various government agencies and non-profits and are hosting them in raw form so you can use them. Because the data comes from so many different institutions, the licenses differ, but practically speaking they are super liberal.
It is not competing with any previous commercial service from any vendor, nor is it meant to be a solution of any kind... Just big public spatial datasets hosted at AWS.
For example, the UK Government flew some excellent LIDAR missions, generating a very high resolution elevation model for most of the country, and then put it behind a terrible website where you have to click 3 or 4 times to get a small piece of the data. After a couple of hours I built a script to download all the pieces and put them back together into usable-sized downloads. Mexico's INEGI has a similar situation, so I had to dig through that to build a scraper. USGS's EarthExplorer uses a terrible shopping cart metaphor for downloads.
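For anyone wondering what that kind of script looks like, here is a rough sketch of the general pattern, not the actual script described above. It assumes you already have a list of tile URLs (building that list is the site-specific scraping part), and it regroups the tiles into zip bundles rather than mosaicking the rasters. The names `download_tiles` and `bundle_tiles` are hypothetical.

```python
import zipfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download_tiles(urls, dest_dir, fetch=urlretrieve):
    """Download each tile URL into dest_dir, skipping files already present.

    `fetch` takes (url, local_path); it defaults to urllib's urlretrieve and
    can be swapped out, e.g. to add retries or throttling.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in urls:
        # Use the last path component of the URL as the local file name.
        target = dest / Path(urlparse(url).path).name
        if not target.exists():
            fetch(url, str(target))
        paths.append(target)
    return paths


def bundle_tiles(paths, out_dir, bundle_size=100):
    """Pack downloaded tiles into zip archives of at most bundle_size files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    bundles = []
    for start in range(0, len(paths), bundle_size):
        bundle = out / f"tiles_{start // bundle_size:04d}.zip"
        with zipfile.ZipFile(bundle, "w", zipfile.ZIP_DEFLATED) as zf:
            for p in paths[start:start + bundle_size]:
                zf.write(p, arcname=p.name)
        bundles.append(bundle)
    return bundles
```

Skipping files that already exist makes the download resumable, which matters when a run over thousands of small tiles gets interrupted partway through.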
All that is to say that the interesting piece with Earth on AWS is that this is public data that smart people are putting in a more easily accessible place for mass consumption, and AWS is footing the bill. In return, AWS is getting people more interested in AWS products and a set of customers who are more knowledgeable about how to process data "in the cloud".
The source data is probably the better thing to include in IA, and the GitHub repo is probably the best place to find how to mirror it. If you've got time to spend on it, you might post an issue in there and I can help point you in the right direction.
The National Map, from the USGS, is another example of "ugh", or at least it was when it was tile-by-tile download only.
This is far more than a glorified mirror.
My comment was more a warning not to expect this to be any easier to work with than the original data when it comes to actually manipulating it.