Start Using Landsat on AWS (amazon.com)
115 points by mxfh on Mar 20, 2015 | 14 comments

Does anyone know the resolution of the images available through AWS? This site http://landsat.usgs.gov/landsat8.php says Landsat 8 has a max resolution of 30m/pixel.

I checked in with Jed (the guest blogger) and he told me that Band 8 of the data has the highest (15m) resolution. Per the post at https://www.mapbox.com/blog/putting-landsat-8-bands-to-work/ , you can use this to sharpen the images in the other layers.

I wrote that Mapbox post, and helped with some of the processing that happens to get this data on AWS. Yep, you get multispectral resolution of 30 m, and with pansharpening (http://en.wikipedia.org/wiki/Pansharpened_image, etc.) you can get visually acceptable quality at 15 m in RGB.

Landsat is basically intended for science about seasonal/annual/decade-scale changes in Earth’s land surface. When you see an estimate of how a city’s built-up area has grown since 1980, or how the Everglades are changing, it probably has Landsat as one source. This explains a lot of design decisions that might seem weird to a layperson who wants to use it for everyday RGB imagery. Most use of Landsat imagery is basically off-label. It’s just very good data in terms of accuracy, precision, and general ease of use. And if I say so myself, it looks real pretty: https://www.mapbox.com/blog/landsat-live-live/

Which pansharpening method is used in the example image?

In the images in the blog post and the live map? Those aren’t pansharpened at all. If we do add pansharpening in a later version, it’ll likely be naïve, without spatially aware modeling of the multispectral data. (Specifically, it’ll probably be a cleaned-up, rasterio-based, null-aware, parallelized descendant of this sketch of the Brovey transform in numpy: https://gist.github.com/celoyd/2e7beed82951d22b9b90 .)

From what I’ve seen – and I haven’t tested it carefully yet, so I could be wrong – the more elaborate methods are severe overkill on Landsat 8. It has only 4 pan px per multi px (where some commercial data has 9 or 16), and the pan band is almost exactly R+G+B (without NIR). So my gut and some simple experiments suggest that doing PCA-or-whatever is overthinking it.
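For readers unfamiliar with the naïve approach mentioned above, here is a minimal sketch of a Brovey-style pansharpen in numpy. It is not the code from the linked gist or from any Mapbox pipeline. The function names (`upsample2x`, `brovey`), the nearest-neighbour upsampling, and the use of a plain R+G+B mean as the intensity estimate are all assumptions made for illustration; the mean stands in for the observation that the Landsat 8 pan band is close to R+G+B.

```python
import numpy as np


def upsample2x(band):
    """Nearest-neighbour 2x upsampling (illustrative choice).

    Each 30 m multispectral pixel becomes a 2x2 block of 15 m pixels,
    i.e. 4 pan pixels per multispectral pixel.
    """
    return np.repeat(np.repeat(band, 2, axis=0), 2, axis=1)


def brovey(red, green, blue, pan, eps=1e-9):
    """Naive Brovey-style pansharpening sketch.

    red, green, blue: 2-D arrays of shape (H, W) at multispectral resolution.
    pan: 2-D array of shape (2H, 2W) at panchromatic resolution.
    Returns a float array of shape (3, 2H, 2W).
    """
    rgb = np.stack(
        [upsample2x(np.asarray(b, dtype=np.float64)) for b in (red, green, blue)]
    )
    # Assumption: the pan band is approximated by the mean of R, G and B.
    intensity = rgb.mean(axis=0)
    # Scale every band by the ratio of observed pan to estimated intensity.
    # eps avoids division by zero where all three bands are 0.
    ratio = np.asarray(pan, dtype=np.float64) / (intensity + eps)
    return rgb * ratio


# Tiny synthetic example: one multispectral pixel, four pan pixels.
red = np.array([[0.2]])
green = np.array([[0.4]])
blue = np.array([[0.6]])
pan = np.array([[0.4, 0.8], [0.2, 0.4]])
sharpened = brovey(red, green, blue, pan)
```

In this sketch the per-pixel mean of the three output bands matches the pan band, while the ratios between bands within each pixel are preserved. Pixels where all three bands are zero stay zero, but a real pipeline would need explicit nodata handling.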

In the blog post

> Pansharpened Malibu, 15 m (50 ft) per pixel. Notice the wave texture in the water.

Ugh, Brovey. There are better options available, like MMP (really low spectral distortion, but can be slow): http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6677587 or even affinity/guided filtering (my own paper; more spectral distortion than MMP, but a lot faster, and you can sharpen hyperspectral with multispectral or RGB): http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7008094

It would be most useful if historical Landsat 7 data was also available, but of course that is a lot more data.

We will definitely consider adding more data over time.

Great! That would be an amazing resource saver and enabler for scientists as well as small companies, incentivizing the use of AWS, as opposed to in-house built systems.

Is there a specific initiative/team/person to follow for Public Data Sets on AWS? Is there anything in place to keep data sets up-to-date (especially derivatives/mashups)? Is there a way to contribute to such an initiative? AWS Public Data Sets is awesome, but it seems unnecessarily restrictive[0].

Part of what sounds interesting to me in the longer-term vision for the dat project[1] is the ability to write transformations that point to a living data source and output an up-to-date parsed/processed version of it (and only download diffs!). Processed data sets tend to be stale, or alternatively you have to start from a data dump and run a slew of scripts to process or index the data, which can take days.

One painful example of this problem was Freebase's very interesting data set, WEX (pairing Wikipedia textual content with structured data from Freebase), of which there is an outdated snapshot on AWS Public Data Sets[2] containing less than half the data of newer versions. Google acquired Freebase a little while before then, and there were only one or two updates to WEX. I was lucky enough to download what I think was the last WEX data dump before they killed the download.freebase.com[3] subdomain. I have yet to confirm if it's gone or if it was simply moved/renamed[4].

While it was amazing that Freebase/Google provided dumps of this processed data, and that Amazon provided an easy-to-access snapshot of it, we really ought to have a way of publishing and subscribing to the latest post-processed version of a data set derived from one or more regularly updated data sets, be they from NASA/Landsat, Wikipedia, or otherwise. I don't know exactly what this process would look like, but the raw data is there and all we need is a way to publish the processing software/commands (docker?) to be re-run whenever a data dependency is updated.

It seems like AWS Public Data Sets would be an ideal destination for data sets and more accessible derivatives. Is any of that in line with the intent of AWS Public Data Sets?

I apologize for letting that turn into a bit of a rant, but I wanted to provide an anecdote and context.

[0] https://forums.aws.amazon.com/thread.jspa?threadID=156996

[1] http://dat-data.com

[2] http://aws.amazon.com/datasets/2345

[3] http://wiki.freebase.com/wiki/WEX/Documentation

[4] https://developers.google.com/freebase/data#freebase-wikidat...
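One way to picture the "re-run whenever a data dependency is updated" idea from the comment above is a small staleness check: fingerprint the source files, compare with the fingerprint recorded at the last build, and re-run the processing step only when they differ. This is a rough local sketch, not how dat or AWS Public Data Sets work. The names `fingerprint` and `rebuild_if_stale`, the JSON state file, and the `build` callback are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(paths):
    """Hash the names and contents of all source files into one digest."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def rebuild_if_stale(sources, state_file, build):
    """Run build(sources) only if the sources changed since the last run.

    state_file is a hypothetical JSON file that remembers the last fingerprint.
    Returns True if a rebuild happened, False if the output was up to date.
    """
    state_file = Path(state_file)
    current = fingerprint(sources)
    previous = None
    if state_file.exists():
        previous = json.loads(state_file.read_text()).get("fingerprint")
    if current == previous:
        return False
    build(sources)
    # Record the fingerprint only after the build has completed.
    state_file.write_text(json.dumps({"fingerprint": current}))
    return True
```

A real system would fingerprint remote data (for example by version identifiers or checksums published by the source) instead of reading whole files, and would only fetch the changed parts.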

Thanks Jeff!

I don't know anything about this data, so excuse my ignorance. What's available in terms of historical data? Can I get images for a certain region for the last 10 years, say?

LS7 is 15 years, LS5 is about 30


I worked on Landsat and Spot data back in the 80s using American (Gould) and Canadian image processing systems. Both came with Fortran source code, so it was very educational.

ATI tried to recruit me for consumer products, but their understanding of image processing was so primitive that we couldn't communicate. All they understood was red-eye removal and edge detection. :)

The various Landsat resolutions are ok for earth sciences, including ground cover, cloud cover and ice studies.

But I think most of the people here would be more interested in Spot or higher resolution data.

An interesting factoid is that one of the earliest Sony CD-ROMs ever burned (1985-ish) had Landsat sample data on it. It was distributed to a few Japanese geoscientists who had an obvious need for mass storage.
