
Survey of Dropout Methods for Deep Neural Networks - Anon84
https://arxiv.org/abs/1904.13310
======
m3kw9
When in doubt, use random dropout

~~~
anthony_doan
Is dropout still empirical, or is there any proof of why it works in the
overall model?

I recall reading up on CNNs and playing around with them, and it was
interesting to add random dropout, but it was never explained why it works. I
think the general intuition is that the network is overfitting, so randomly
dropping nodes helps it generalize?

~~~
jimmy_dean
Addressing your second question. Informally, dropping nodes fights overfitting
by creating subsampled architectures, which are essentially thinned-out
versions of the network you've designed. Training on these sub-networks means
you've effectively combined the learning of several different models, and in
doing so have generalized beyond the capabilities of your original "single"
architecture.
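
To make the "thinned network" picture concrete, here is a rough sketch of
inverted dropout in NumPy (the function name, keep probability p, and the
rescaling convention are my illustration, not from the survey):

    import numpy as np

    def dropout(x, p=0.5, training=True, rng=np.random.default_rng(0)):
        # During training, sample a fresh binary mask each pass, so each
        # step trains a different "thinned" sub-network of the full model.
        if not training:
            return x  # at test time the full network runs unchanged
        mask = rng.random(x.shape) < p   # keep each unit with probability p
        return x * mask / p              # rescale so E[output] is preserved

    x = np.ones((2, 4))
    print(dropout(x, p=0.5))  # roughly half the activations are zeroed

The division by p is what makes this "inverted" dropout: the expected
activation stays the same, so nothing needs rescaling at test time, and the
single test-time pass approximates averaging over all the thinned sub-networks.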

