
Deep learning pipeline for orbital satellite data for detecting clouds - kartikkumar
https://github.com/BradNeuberg/cloudless
======
strebler
This is neat. We've deployed a 98+% accuracy satellite & aerial cloud
detection solution for many years, so I would have the following suggestions:

-Why use AlexNet and not VGG (or Googlenet)?

-Make sure to train on clouds vs desert. There are a lot of instances where their spectral signatures are very close, depending on the satellite.

-Make sure to train on clouds vs snow. They are even closer.

-Dark clouds. This might not show up much unless you're working with the satellite vendor, but there are cloud formations where shadows of clouds project onto other clouds. Very difficult to deal with, and a NN may be well suited to it.

I would say since it's absolutely possible to get higher accuracy using older
methods on this exact satellite constellation, there is definitely room for
improvement. Just switching to VGG might even do the trick. But this is a
great first step!
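
One way to check whether snow and desert scenes are the weak spot is to break validation accuracy out by surface type instead of reporting a single number. The following is a minimal sketch under stated assumptions: the `accuracy_by_surface` helper, the surface tags, and the sample rows are all hypothetical and made up for illustration; they are not from Cloudless or from any deployed system.

```python
from collections import defaultdict


def accuracy_by_surface(samples):
    """Return accuracy per surface type.

    samples: iterable of (surface_type, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for surface, truth, predicted in samples:
        total[surface] += 1
        if truth == predicted:
            correct[surface] += 1
    return {surface: correct[surface] / total[surface] for surface in total}


# Hypothetical validation results, invented for illustration only.
samples = [
    ("vegetation", "cloud", "cloud"),
    ("vegetation", "clear", "clear"),
    ("snow", "clear", "cloud"),    # snow mistaken for cloud
    ("snow", "cloud", "cloud"),
    ("desert", "clear", "clear"),
    ("desert", "clear", "cloud"),  # bright sand mistaken for cloud
]

report = accuracy_by_surface(samples)
for surface, accuracy in sorted(report.items()):
    print(f"{surface}: {accuracy:.2f}")
```

A breakdown like this makes it obvious when a high overall accuracy is hiding poor results on the hard surface types.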

~~~
bradneuberg
[Cloudless developer here]

Thanks for all the great suggestions! In terms of your existing 98% accuracy
solution, can you point me to more details if possible?

I grabbed AlexNet as it's a bit easier to work with and was readily available
as a fine tunable model on the Caffe Model Zoo, but you're certainly right
that VGG or Googlenet would give more accuracy.

For training on clouds vs snow and deserts, this is a bit more of a proof of
concept for now based on the data we had access to (state of California).
Someone could certainly scale this up using a larger data set and generating
more annotation data with examples of snow and deserts to handle more edge
conditions. They would probably want to build Mechanical Turk support into the
annotation tool if they did.

Thanks for all the great comments! All of this is open source so contributions
using any of these suggestions are certainly possible.

Best, Brad Neuberg

~~~
strebler
We did it before Deep Learning existed. If we had to do it again today from
scratch, I think we'd use an approach similar to yours. My comment about VGG
was based on experience with VGG vs AlexNet in other projects.

------
adamwkraft
Very cool stuff! We have a lot of internal tools for automatically detecting
clouds at Orbital Insight as clouds present a large challenge across many
types of satellite resolutions and bands. Glad to see more people are turning
towards deep learning for satellite imagery.

Curious to hear how long the pipeline takes to run on a single image? Also,
have you thought about running the pipeline in a fully convolutional manner,
reducing the need for proposal regions?

~~~
bbabenko
Tacking on a shameless recruiting plug: if you're interested in computer
vision, deep learning, and/or satellite imagery, Orbital Insight is hiring!
[http://orbitalinsight.com/jobs/machine_vision.html](http://orbitalinsight.com/jobs/machine_vision.html)

~~~
bradneuberg
[Cloudless developer here]

Hi Boris! It's Brad from Dropbox :)

Responding to the parent comment, in terms of running time, the primary
bottleneck is the RCNN localization portion; that takes about a minute and a
half on my laptop to process a single image. In the Future Work section of the
blog post I talk about collapsing the entire detection and localization
pipeline into a single deconvolution network that directly takes in raw images
and outputs image masks. The hope is that this runs much faster at inference
time.
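
To illustrate the general idea of dense prediction without region proposals, here is a toy sketch in which one small filter is scored at every window position and the result is thresholded into a mask. It is not the Cloudless pipeline and not a deconvolution network: the `dense_cloud_mask` function, the 2x2 mean filter standing in for learned weights, the bias, and the sample image are all assumptions chosen for illustration.

```python
def dense_cloud_mask(image, kernel, bias, threshold=0.0):
    """Slide a linear patch scorer over a 2D grayscale image.

    Returns a binary mask with one entry per valid window position,
    so the output is a mask rather than a list of proposal regions.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    mask = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            score = bias
            for i in range(kh):
                for j in range(kw):
                    score += kernel[i][j] * image[r + i][c + j]
            row.append(1 if score > threshold else 0)
        mask.append(row)
    return mask


# Hypothetical 4x4 brightness image in [0, 1] with a bright block top-left.
image = [
    [0.9, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
# 2x2 mean filter, a stand-in for weights a network would learn.
kernel = [[0.25, 0.25], [0.25, 0.25]]

mask = dense_cloud_mask(image, kernel, bias=-0.6)
for row in mask:
    print(row)
```

A real fully convolutional network stacks many such learned filters and runs them as batched convolutions, which is where the hoped-for speedup over per-region classification would come from.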

------
zump
Nice, but couldn't this be done more easily with heuristics?

~~~
bradneuberg
[Cloudless developer here]

It can certainly be done with hand rolled heuristics pipelines. However, one
of the trends in machine learning is what is known as end to end learning -
having your machine learning model automatically discover what these
heuristics are going from raw input to final output. This approach now
dominates in computer vision and speech understanding, replacing previously
complicated hand rolled feature pipelines. It's worth attempting it on
satellite imagery as it seems the same approach should be valid there too.
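
For contrast, a hand-rolled heuristic might look like the sketch below: a pixel counts as cloud if it is bright and nearly colorless. The rule, the `is_cloud_pixel` and `heuristic_mask` names, both thresholds, and the sample pixels are assumptions invented for illustration, not a rule taken from any production system.

```python
def is_cloud_pixel(r, g, b, min_brightness=0.7, max_spread=0.1):
    """Hand-rolled rule: clouds are bright and nearly colorless.

    Channels are floats in [0, 1]. Both thresholds are guesses.
    """
    brightness = (r + g + b) / 3.0
    spread = max(r, g, b) - min(r, g, b)
    return brightness >= min_brightness and spread <= max_spread


def heuristic_mask(pixels):
    """Apply the rule to a 2D grid of (r, g, b) tuples."""
    return [[is_cloud_pixel(*px) for px in row] for row in pixels]


# Hypothetical pixels, chosen to show where the rule breaks down.
pixels = [
    [(0.95, 0.95, 0.97), (0.2, 0.5, 0.2)],   # white cloud, vegetation
    [(0.9, 0.8, 0.6), (0.96, 0.97, 0.98)],   # bright sand, snow
]

mask = heuristic_mask(pixels)
for row in mask:
    print(row)
```

With these made-up values the rule rejects the sand pixel but flags the snow pixel as cloud, which is the kind of edge case that either needs yet another hand-written rule or has to be learned from labeled examples.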

Best, Brad Neuberg

------
julienchastang
Interesting project. Question: do the authors have any background in
meteorology? The satellite meteorology community may be interested in this. I
am at AMS this week in NOLA where there will be various talks on machine
learning applied to meteorology[1]. There have been advancements in this area
from groups who possess no domain knowledge and simply approach the problem
from the big data side.

[1]
[https://ams.confex.com/ams/96Annual/webprogram/start.html#sr...](https://ams.confex.com/ams/96Annual/webprogram/start.html#srch=words%7CMachine%20learning%7Cmethod%7Cand%7Cpge%7C1)

~~~
bradneuberg
[Cloudless developer here]

Our backgrounds are more in machine learning and computer science than
meteorology, so that aspect was new to us. As you mention we were mostly
approaching the problem from having more data and tools like neural networks
to throw at the problem.

Best, Brad Neuberg

------
bradneuberg
FYI a technical article with a full write up on Cloudless is here:
[http://codinginparadise.org/ebooks/html/blog/introducing_clo...](http://codinginparadise.org/ebooks/html/blog/introducing_cloudless.html)

------
jgalt
I'm wondering if a similar approach could be used to detect oil spills in the ocean.
What would be the challenges?

~~~
bradneuberg
The primary challenge is collecting enough training data to teach the neural
network.

------
planteen
Landsat data has a field which gives cloud cover percent for the image. Do you
know how they derive that number?

~~~
bradneuberg
[Cloudless developer here]

I believe the Landsat cover percent field is generated by humans and
historical information, but I'm not completely sure.
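
However that field is actually produced, a cloud cover percent can be computed from any per-pixel cloud mask as the share of pixels flagged as cloud. This is a minimal sketch only: the `cloud_cover_percent` helper and the sample mask are hypothetical, and nothing here describes how Landsat derives its own number.

```python
def cloud_cover_percent(mask):
    """Return the percentage of truthy entries in a 2D mask."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    cloudy = sum(1 for row in mask for value in row if value)
    return 100.0 * cloudy / total


# Hypothetical 2x4 mask: 1 marks a cloud pixel, 0 marks clear sky.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

percent = cloud_cover_percent(mask)
print(f"{percent:.1f}% cloud cover")
```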

