
Deep Belief Networks at Heart of NASA Image Classification - jonbaer
http://www.theplatform.net/2015/09/21/deep-belief-networks-at-heart-of-nasa-image-classification/
======
benanne
I have some doubts about this. Deep learning moves fast, and DBNs are pretty
much outdated models, even for unsupervised pre-training. It doesn't make much
sense to me that unsupervised pre-training would help for this problem to
begin with: pre-training mainly pays off when labeled data is scarce, and
their dataset totals around 65TB.

The paper is worth checking out:
[http://arxiv.org/abs/1509.03602](http://arxiv.org/abs/1509.03602) I haven't
read it in full, but based on a quick skim, the convnet architectures they
evaluated seem laughably tiny and shallow (at most three convolutional layers)
by today's standards -- although I appreciate that there may be other
constraints at play here (limits on training time etc.).

But to claim that DBNs are better suited for this problem than convnets based
on these results is quite far-fetched. I'm confident that a convnet could
crush these results, given enough effort and time spent on hyperparameter
tuning.

I find this part particularly misleading (section 6, page 13): "shape/edge
based features which are predominantly learned by various Deep architectures
are not very useful in learning data representations for satellite imagery.
This explains the fact why traditional Deep architectures are not able to
converge to the global optima even for reasonably large as well as Deep
architectures."

The whole point of learning features is so that they are better suited for the
task at hand. If "shape/edge based features" are not suitable to perform a
particular task, then a properly trained convnet should not learn them. I
think the conclusions drawn from this work would have been very different if
the chosen network architectures were more sensible.

~~~
karpathy
+1. There are several fishy statements throughout this paper. Another one in
the conclusion:

"For satellite datasets, with inherently high variability, traditional deep
learning approaches are unable to converge to a global optima even with
significantly big and deep architectures."

This quote points to some basic misunderstandings of how and when these models
work. "Inherently high variability" is suddenly some kind of problem? Unable
to converge to "a global optima"? The modern view of deep net optimization
landscapes, based on several recent studies, argues against these outdated
interpretations.

~~~
paulfr
I'll pile on.

I just downloaded the dataset, and color is such a powerful feature that
training a random forest on images downsampled to a _single_ pixel results in
accuracies of 95% and 98% (for the 4-category and 6-category versions,
respectively)!

And you can easily exceed 99.5% by adding more features to the forest, which
is far above their DBN accuracy.

I have no idea how they were able to get an accuracy as low as 69% when they
evaluated random forests.
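
For concreteness, here is a minimal sketch of that single-pixel baseline with
scikit-learn. The .mat filename, key names, and array shapes are assumptions
about the SAT-4 release, so treat this as illustrative:

```python
# Hypothetical single-pixel color baseline on SAT-4.
# Assumes image arrays shaped (28, 28, 4, N) and one-hot labels shaped (4, N).
import numpy as np
from scipy.io import loadmat
from sklearn.ensemble import RandomForestClassifier

data = loadmat("sat-4-full.mat")  # assumed filename and keys
X_train = data["train_x"].transpose(3, 0, 1, 2).astype(np.float32)
y_train = data["train_y"].argmax(axis=0)
X_test = data["test_x"].transpose(3, 0, 1, 2).astype(np.float32)
y_test = data["test_y"].argmax(axis=0)

# "Downsample to a single pixel": average over both spatial dimensions,
# leaving one mean value per spectral band (R, G, B, NIR).
f_train = X_train.mean(axis=(1, 2))
f_test = X_test.mean(axis=(1, 2))

clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
clf.fit(f_train, y_train)
print("test accuracy:", clf.score(f_test, y_test))
```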

~~~
mturmon
I read the paper, and I also have some reservations. The procedure they used
to extract and randomize their data seems biased towards large homogeneous
areas.

In short, in their procedure, it seems possible to rope off a large contiguous
area of Mojave desert, ground-truth it using their GUI system as "barren", and
have that area be carved up into 28x28 pixel chips and spread equally into the
training and test sets.

In such a case, the training and test sets are not really independent. And
their 6 classes, as you point out, are amenable to color features.

Having done classification of remote sensing data myself, I can say the above
is not a good test of accuracy at any useful task. You have to test accuracy
on representative data.

That means training within a few areas and testing on geographically distant
but ecologically similar areas (i.e., same class, but statistically
independent). It also means varying things like time of day, observing
geometry, and seasonality. Color features will be quite fragile in such tests.

And it means testing on a more diverse sample, to see whether "none of the
above" can be detected, because their class decomposition is nowhere near
exhaustive.
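
To make the protocol concrete, here is a toy sketch of a geographic train/test
split: tiles are assigned to sets by location before any 28x28 chips are
extracted, so no contiguous area contributes to both. Everything here (tile
sizes, longitudes, thresholds) is synthetic and illustrative:

```python
# Toy geographic split: partition tiles by longitude BEFORE chipping,
# so training and test chips never come from the same contiguous area.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: each "tile" is a large image with a longitude.
tiles = [{"image": rng.random((500, 500, 4)), "lon": lon}
         for lon in np.linspace(-120.0, -80.0, 20)]

train_tiles = [t for t in tiles if t["lon"] < -100.0]   # western tiles
test_tiles = [t for t in tiles if t["lon"] >= -100.0]   # eastern tiles

def extract_chips(img, size=28, stride=28):
    """Carve a tile into non-overlapping size x size chips."""
    h, w, _ = img.shape
    return [img[i:i + size, j:j + size]
            for i in range(0, h - size + 1, stride)
            for j in range(0, w - size + 1, stride)]

train_chips = [c for t in train_tiles for c in extract_chips(t["image"])]
test_chips = [c for t in test_tiles for c in extract_chips(t["image"])]
print(len(train_chips), len(test_chips))
```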

------
vonnik
Fwiw, a deep-learning startup that's doing some pretty cool satellite imagery
analysis is Orbital Insight
[http://orbitalinsight.com/](http://orbitalinsight.com/). So deep nets are all
over deep space...

(I'm aware of them because I work on another deep learning project,
[http://deeplearning4j.org](http://deeplearning4j.org))

~~~
jk4930
Is there a curated list of deep learning startups (or AI startups in general)
somewhere?

------
adamrights
I stopped reading when I hit this sentence: "But armed with two of these, and
a new slant on deep belief networks, they are breathing new life into an
established deep learning knowledge base and proving how a new model for deep
learning is proving itself at scale for NASA’s terabytes of near-range
satellite data—and potentially for other areas as the work expands."

------
akshayB
I wonder if this can be used to study the impacts of urban expansion and how
natural ecosystems around big cities are affected. It could also be a great
tool for natural resource conservation.

~~~
Katydid
What's interesting about this is that it appears to be the next wave of
PRACTICAL deep learning: it can start in image recognition and move the same
models into other areas. CNNs have limited (for now) use cases; this can
actually be applied to some very large-scale problems using the single-layer
approach (which is the difference) and scale to massive sets. Very cool. Very.

------
dharma1
Thanks, interesting. I have been doing the same with hyperspectral aerial
images. Did these guys open-source their trained weights or any source code?
I couldn't see any links.

~~~
Thrymr
There's a link to the paper in the article:
[http://arxiv.org/abs/1509.03602](http://arxiv.org/abs/1509.03602)

------
casperc
Tangentially related: since the deep learning field is moving so fast, what
are currently the best resources for learning it?

------
deepbasu007
Being one of the authors of this paper, I think it's my responsibility to
clarify some of the doubts raised in this thread.

Firstly, I would like to thank everyone for taking the time to go through the
paper and raise your questions. I think most of these are valid questions,
and I will try to answer them here.

First of all, regarding the dataset creation method and the chances of having
overlapping patches between training and test sets: as we mentioned in the
paper, care was taken to avoid the specific scenario where a homogeneous
landcover area is sampled and 28x28 patches extracted from it end up in both
the training and test sets. Once we selected the set of 1500 tiles from the
NAIP dataset, we separated the tiles into two spatially/geographically
distant groups and then used these two groups for extracting the training and
test patches, respectively.

Now, regarding the comparison between the raw-pixel DBN and CNN. benanne
suggested that we claimed DBNs are better suited than CNNs for this task; in
contrast, I would like to clarify that what we claim in the paper is that the
integration of Haralick features with the DBN yields better results than
either a DBN or a CNN on raw pixels. In fact, validating the popular belief,
we show that a DBN alone is indeed inferior to a CNN on this task, just as
benanne states.

Now, the question arises: why didn't we use bigger CNNs with more layers,
maybe 5 or 6, for the experiments? Here, I would like to point everyone to
the paper "ImageNet Classification with Deep Convolutional Neural Networks",
where the authors trained a deep CNN with 5 convolutional layers and 3 fully
connected layers. But the dataset used in that paper consists of 256x256
images. As the theory of CNNs suggests, a CNN models the human visual
cortical system by building a hierarchical model of the image, so that bigger
images require deeper nets to encode all the contextual dependencies between
neighboring pixels and to perform a hierarchical mapping from features to
labels. In contrast to their dataset, the images in our dataset are only
28x28, which means we need to scale down our model to avoid significant
overfitting that can't be avoided even with L2 regularization or Dropout.
Also, if we consider results on another object recognition dataset, namely
CIFAR-10, whose 32x32 images are roughly equal in size to ours, most of the
state-of-the-art results on it (like Deeply Supervised Nets and Network in
Network) use an architecture with 3 convolutional layers and 1
fully-connected layer, the same as the one we considered. Moreover, in the
ImageNet classification paper, while using significantly big and deep
networks, the authors have to use techniques like data augmentation and
dropout to avoid overfitting. So, effectively, the model would be bigger and
take longer to train, while the integration of texture-based features with
the Deep Belief Network saves us this overhead, which is significant for a 65
TB dataset. (It should be noted that our goal in this research is not just to
handle this dataset but to use it as a pilot to develop an algorithm that can
scale landcover classification to the whole of the continental US.)

Now, someone might ask why we chose 28x28 in the first place. It's because of
an important difference between satellite image classification and
object-based classification: in the latter, given an image with a bigger
context, we say whether a particular object is present in the scene or not
(e.g., CIFAR, ImageNet, etc.). For satellite imagery, on the other hand, what
we need is near-accurate per-pixel labelling. So, choosing a smaller window
removes most of the contextual information required for per-pixel scene
classification, while choosing a bigger window means losing the statistical
properties of the object of interest - e.g., a tree or bush might be much
smaller than a 64x64 window, which covers a spatial area of 64m x 64m.

To conclude, I would like to stress that, out of the traditional DBN, CNN,
and SDAE, CNNs are probably the best suited to handle these kinds of
datasets, and that is also borne out by the experiments in the paper.
However, I would like to reiterate that what we try to show in this research
is that even though Deep Neural Networks - DBN, CNN, or SDAE - are good
general-purpose learning machines, versatile enough to suit a large number of
different datasets ranging from handwritten digits to facial recognition to
object recognition, there are situations where we need to integrate these
models with domain-specific knowledge to improve the learning efficiency of
the model for the particular decision problem at hand.
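
For reference, the kind of 3-convolutional-layer network discussed above
might look like the following PyTorch sketch. The layer widths are
illustrative guesses, not the exact configuration evaluated in the paper:

```python
# A minimal 3-conv-layer CNN for 28x28, 4-band (R, G, B, NIR) patches.
# Layer sizes are illustrative, not the paper's exact architecture.
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1),   # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 7x7
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.classifier = nn.Linear(128 * 7 * 7, n_classes)  # 1 FC layer

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = SmallConvNet(n_classes=4)
out = model(torch.randn(8, 4, 28, 28))  # a batch of 8 patches
print(out.shape)                        # torch.Size([8, 4])
```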

Also, “inherently high variability” is not a problem if we are ready to use
bigger networks, which in turn require a larger number of training samples
(according to learning theory for neural nets) and more training time.
However, as I said earlier, in order to keep the learning algorithm efficient
and scalable across millions of scenes covering the whole of the US, the
integration of the DBN with handcrafted features seems very useful at this
point in time.
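
For readers unfamiliar with Haralick features, here is a rough scikit-image
sketch of the kind of GLCM texture statistics involved. The distances,
angles, and property choices are illustrative, not the exact settings used in
the paper:

```python
# Haralick-style texture features from a gray-level co-occurrence matrix.
# (graycomatrix is spelled greycomatrix in older scikit-image versions.)
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
patch = (rng.random((28, 28)) * 255).astype(np.uint8)  # one gray band

glcm = graycomatrix(
    patch,
    distances=[1, 2],
    angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
    levels=256,
    symmetric=True,
    normed=True,
)

# A few Haralick statistics; such values would be fed to the DBN
# alongside (or instead of) the raw pixels.
props = ["contrast", "homogeneity", "energy", "correlation"]
features = np.hstack([graycoprops(glcm, p).ravel() for p in props])
print(features.shape)  # (32,) = 4 props x 2 distances x 4 angles
```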

~~~
adamwkraft
I want to share some quick results I got from trying out the dataset. First
off, thanks for your work and for the dataset. I'm glad to see more people
getting excited about machine learning in this field!

I used a simple LeNet-like architecture to train on both Sat4 and Sat6. This
seemed reasonable, especially considering that this network works well on the
28x28 images of MNIST. After only 10k iterations, I achieved accuracies of
98% on Sat4 and 97% on Sat6. I haven't gone back yet, but I assume these
results could be improved by tweaking the network slightly or playing with
the learning parameters. I also spent a while looking for bugs in my
evaluation, but things seemed to check out. This suggests that CNNs may be
able to at least match, if not outperform (97% vs 94% on Sat6), the methods
in the paper. Has anyone else gotten similar results with CNNs or other
methods? And did you try CNN architectures other than those described in the
paper?

