
Improving Deep Learning Performance with AutoAugment - rusht
https://ai.googleblog.com/2018/06/improving-deep-learning-performance.html
======
randcraw
It might be fun to exercise this method across an information-theoretically
well-bounded set of shapes or object domains, to try to quantify its
limitations in generating useful, independent forms of novelty.

For example, you might use it to formulate a set of wavelets that when
combined judiciously would effectively span a well-defined distribution of
shapes generated from a small grammar. In so doing, you could quantify the
shape variance and identify which augmentation transformations added most
value for training (minimally modeling that variance) and which added least.

Maybe you could also combine this with t-SNE to gain some intuition of which
'wavelet' manifested where in the trained net, which resonated most, and in
concert with which other wavelets. You could explore this across different CNN
sizes and designs, looking for evidence of wavelet ensemble or hierarchy.

With some careful engineering, you could try to force emergent autoencoders to
reveal themselves and then explore their interactions.

------
PaulHoule
Since the 1990s at least, augmentation has been one of the most important
"tricks of the trade" in neural networks, and it may be even more important in
the deep learning era.
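For context, "augmentation" here just means label-preserving random transforms applied on the fly at training time. A minimal NumPy sketch of the classic flip-and-crop variety (the transform choices, function name, and sizes are illustrative, not AutoAugment's learned policy):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, crop=24):
    """Classic label-preserving augmentation: random horizontal flip
    followed by a random crop. Sizes here are illustrative."""
    if rng.random() < 0.5:
        img = img[:, ::-1]  # flip along the width axis
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    return img[top:top + crop, left:left + crop]

img = rng.random((32, 32, 3))  # toy 32x32 RGB image
out = augment(img)
print(out.shape)  # (24, 24, 3)
```

Each epoch sees a slightly different version of every image, which is the whole trick: more effective training data without more labels.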

~~~
w_t_payne
Yup. :-)

------
kriro
Direct link to the paper:
[https://arxiv.org/abs/1805.09501](https://arxiv.org/abs/1805.09501)

PDF:
[https://arxiv.org/pdf/1805.09501.pdf](https://arxiv.org/pdf/1805.09501.pdf)

------
paradroid
AutoOverfit is more like it.

~~~
Q6T46nT668w6i3m
Overfit to ... your augmentation policy? I think that’s a good thing. :)

------
anchpop
I wonder how large your dataset has to be for this to be useful. You can get
by with small datasets in some fields (e.g., retraining just the last layer of
MobileNet can give good results with only 200 annotations); I'd be interested
to see how useful this is there.
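For a sense of what that small-data setup looks like: freeze the backbone and fit only a linear head on its features. A toy NumPy sketch, with random fixed vectors standing in for the frozen MobileNet activations of ~200 labeled images (all names, sizes, and the synthetic labels are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen backbone features: in practice these would be
# MobileNet activations for your ~200 labeled images.
n, d, classes = 200, 64, 2
feats = rng.normal(size=(n, d))
true_w = rng.normal(size=(d, classes))
labels = (feats @ true_w).argmax(axis=1)  # synthetic linear labels

# Train only a linear "last layer" (softmax regression) by gradient descent.
w = np.zeros((d, classes))
for _ in range(500):
    logits = feats @ w
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)          # softmax probabilities
    onehot = np.eye(classes)[labels]
    w -= 0.1 * feats.T @ (p - onehot) / n      # cross-entropy gradient step

acc = ((feats @ w).argmax(axis=1) == labels).mean()
print(acc)
```

With the backbone frozen, only `d * classes` parameters are fit, which is why a couple hundred examples can be enough.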

------
mlthoughts2018
This seems like it could dramatically worsen overfitting-like effects for
algorithms like CNNs for image processing, where surface statistics of the
available data set seem to be more responsible for the learned model than any
type of “semantic” understanding.

If you prespecify what data augmentation you would do, like preregistering the
details of a clinical trial, you’ll be less susceptible to a spurious result
from this.

It seems like especially things like color distribution manipulation would
have a potentially very adverse effect that counters any gains from clamping
the supervised learning to be “robust” to that color variation.
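For concreteness, the kind of color-distribution manipulation at issue is a per-image random brightness/contrast shift, roughly like this toy NumPy sketch (the factor ranges and function name are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def color_jitter(img, brightness=0.4, contrast=0.4):
    """Randomly rescale brightness and contrast of an image in [0, 1].
    Magnitudes are illustrative. The concern above: a net forced to be
    invariant to these shifts may lean even harder on whatever surface
    statistics survive them."""
    b = 1.0 + rng.uniform(-brightness, brightness)
    c = 1.0 + rng.uniform(-contrast, contrast)
    mean = img.mean(axis=(0, 1), keepdims=True)
    out = (img - mean) * c + mean * b  # contrast about the mean, then brightness
    return np.clip(out, 0.0, 1.0)

img = rng.random((8, 8, 3))
out = color_jitter(img)
```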

I’m thinking in the spirit of:
[https://arxiv.org/abs/1711.11561](https://arxiv.org/abs/1711.11561)

~~~
nafizh
I don't remember the paper exactly, but many of the claims in the paper you
mentioned seemed to have been refuted. If anything, augmentation like this
should reduce overfitting, not worsen it.

~~~
mlthoughts2018
Could you provide a link to where the Bengio paper was “disproven”?

I would be quite surprised to learn that, especially for the experimental
result they have with the low-pass Fourier filter on the training set, and
also because the Bengio paper is quite recent.

------
XnoiVeX
Did they share any code?

