
Uncertainty in Deep Learning (2016) - mannigfaltig
http://mlg.eng.cam.ac.uk/yarin/blog_2248.html
======
syllogism
If you use Keras, you might have noticed the dropout_W and dropout_U arguments
on RNN layers. These apply dropout to the input and recurrent connections,
respectively, following Gal's recommendation, "variational dropout".

With other ways of applying dropout, LSTMs typically fail to converge --- and
with no dropout, they often over-fit. Gal's variational dropout therefore
brings a significant improvement to many leading models.
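
To make the difference concrete, here is a minimal NumPy sketch of the core idea as I understand it: sample one dropout mask per sequence and reuse it at every time step, instead of resampling at each step. It uses a plain tanh RNN rather than an LSTM, and `rnn_forward`, the shapes and the parameter names are my own illustration, not Keras' implementation.

```python
import numpy as np

def rnn_forward(x_seq, W, U, p_drop, rng, variational=True):
    """Run a tanh RNN over x_seq (T, input_dim) with dropout on inputs and state.

    variational=True  -> one mask pair per sequence, reused at every step.
    variational=False -> masks resampled at every step (the naive scheme).
    Returns the final hidden state and the list of (input, hidden) masks used.
    """
    T, input_dim = x_seq.shape
    hidden_dim = U.shape[0]
    keep = 1.0 - p_drop

    def sample_masks():
        # Inverted dropout: kept units are scaled by 1/keep.
        mx = rng.binomial(1, keep, size=input_dim) / keep
        mh = rng.binomial(1, keep, size=hidden_dim) / keep
        return mx, mh

    mx, mh = sample_masks()
    h = np.zeros(hidden_dim)
    masks = []
    for t in range(T):
        if not variational and t > 0:
            mx, mh = sample_masks()
        masks.append((mx, mh))
        # W: (input_dim, hidden_dim), U: (hidden_dim, hidden_dim)
        h = np.tanh((x_seq[t] * mx) @ W + (h * mh) @ U)
    return h, masks

rng = np.random.default_rng(0)
x_seq = rng.normal(size=(5, 3))   # 5 time steps, 3 input features
W = rng.normal(size=(3, 4))
U = rng.normal(size=(4, 4))
h, masks = rnn_forward(x_seq, W, U, p_drop=0.5, rng=rng, variational=True)
```

With `variational=True` every entry of `masks` is the same pair of arrays; with `variational=False` the masks generally differ from step to step.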

There are several other nice contributions in the thesis as well, including a
recommendation for applying dropout to word embedding matrices that I don't
think has been well explored yet.

------
matheweis
Yarin Gal also wrote the excellent "What My Deep Model Doesn't Know..." [0] in
2015.

If these ideas look interesting, you might also want to check out Thomas
Wiecki's blog [1] with a practical application of ADVI (a form of the
variational inference Yarin discusses) to get uncertainty out of a network.

[0]
[http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html](http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html)

[1] [http://twiecki.github.io/blog/2016/06/01/bayesian-deep-learn...](http://twiecki.github.io/blog/2016/06/01/bayesian-deep-learning/#Uncertainty-in-predicted-value)

------
plusepsilon
I don't understand the math completely but it looks like dropout can be
derived from a Gaussian prior (approximating the Bernoulli) in a Bayesian
context.

One useful tidbit is that you can get prediction intervals from a deep learning
model by running it forward N times with dropout left on and taking the mean and
variance of the resulting predictions (plus another precision term).
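
A rough sketch of that recipe in NumPy, assuming you have some `stochastic_forward` function that keeps dropout switched on at prediction time. The toy network, the helper name `mc_dropout_predict` and the value of `tau` are made up for illustration; in practice the precision term would have to come from your own model and hyperparameters.

```python
import numpy as np

def mc_dropout_predict(stochastic_forward, x, n_samples=100, tau=None):
    """Mean and variance over n_samples stochastic forward passes.

    If a model precision tau is given, 1/tau is added to the sample variance.
    """
    samples = np.stack([stochastic_forward(x) for _ in range(n_samples)])
    mean = samples.mean(axis=0)
    var = samples.var(axis=0)
    if tau is not None:
        var = var + 1.0 / tau
    return mean, var

# Toy one-hidden-layer regression net with random (untrained) weights.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(1, 50))
W2 = rng.normal(size=(50, 1))

def toy_forward(x, p_drop=0.5):
    h = np.maximum(x @ W1, 0.0)                       # ReLU hidden layer
    keep = 1.0 - p_drop
    mask = rng.binomial(1, keep, size=h.shape) / keep  # fresh mask on every call
    return (h * mask) @ W2

x = np.array([[0.5], [2.0]])
mean, var = mc_dropout_predict(toy_forward, x, n_samples=200, tau=10.0)
```

`mean` and `var` each have one row per input; `mean ± 2 * np.sqrt(var)` then gives a rough interval if you are willing to treat the predictive distribution as approximately Gaussian.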

------
garagemc2
Can anyone explain like I'm 5? Or since this isn't reddit, like I'm 21?

~~~
ericjang
Suppose you train a neural net on cat pictures to classify the breed of cat.
We desire the property that if we were to feed in a picture of a horse instead
of a cat, we could somehow measure how good the network's parameters are for
classifying this particular image. This is uncertainty estimation, and Yarin's
blog post + thesis provide an elegant way to compute this, which you get nearly
for free from the existing model.

Concretely, if you are trying to train a neural net to forecast stock prices
or drive a car safely, not only do you want to have predictions, but you want
to estimate some measure of how confident your model is of that prediction.
This is eminently useful for models that lean towards the "black-box" end of
the spectrum, such as deep neural nets.
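
For the cat-vs-horse case, one possible way to turn several stochastic forward passes into a single confidence number is the entropy of the averaged class probabilities. This is just one choice of summary, and the probabilities below are invented for illustration rather than taken from a real model.

```python
import numpy as np

def predictive_entropy(prob_samples):
    """Entropy of the mean class distribution.

    prob_samples: array of shape (n_samples, n_classes), each row a softmax
    output from one stochastic (dropout-on) forward pass for the same input.
    """
    mean_probs = prob_samples.mean(axis=0)
    # Small constant avoids log(0) for classes with zero mean probability.
    return float(-np.sum(mean_probs * np.log(mean_probs + 1e-12)))

# Hypothetical outputs for a cat picture: the passes agree on breed 0.
cat_passes = np.array([[0.97, 0.02, 0.01],
                       [0.95, 0.03, 0.02],
                       [0.98, 0.01, 0.01]])

# Hypothetical outputs for a horse picture: each pass picks a different breed.
horse_passes = np.array([[0.90, 0.05, 0.05],
                         [0.05, 0.90, 0.05],
                         [0.05, 0.05, 0.90]])

cat_entropy = predictive_entropy(cat_passes)
horse_entropy = predictive_entropy(horse_passes)
```

Here `horse_entropy` is larger than `cat_entropy` because the passes disagree, which is the signal you would want when the input is far from the training data.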

Note that parameter uncertainty and risk estimation are quite different; both
are addressed in this preliminary work:
[http://bayesiandeeplearning.org/papers/BDL_4.pdf](http://bayesiandeeplearning.org/papers/BDL_4.pdf)

~~~
backpropaganda
What's the verdict on this? Does Dropout do parameter uncertainty or risk
estimation? Gal seems to be claiming the first, while the paper you linked
claims the second.

~~~
btown
This seems to be a novel application of dropout for uncertainty. The author's
2015 post linked by matheweis [0] gives an approachable walkthrough:

> I think that's why I was so surprised that dropout – a ubiquitous technique
> that's been in use in deep learning for several years now – can give us
> principled uncertainty estimates. Principled in the sense that the
> uncertainty estimates basically approximate those of our Gaussian process.
> Take your deep learning model in which you used dropout to avoid
> over-fitting – and you can extract model uncertainty without changing a
> single thing. Intuitively, you can think about your finite model as an
> approximation to a Gaussian process. When you optimise your objective, you
> minimise some "distance" (KL divergence to be more exact) between your model
> and the Gaussian process. I'll explain this in more detail below. But before
> this, let's recall what dropout is and introduce the Gaussian process
> quickly, and look at some examples of what this uncertainty obtained from
> dropout networks looks like.

[0]
[http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html](http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html)

------
aaronjg
This sort of approach is also used in reinforcement learning
[https://arxiv.org/abs/1702.01182](https://arxiv.org/abs/1702.01182)

------
deepnotderp
Awesome!

Gal's variational dropout is one of the paths forward to Bayesian deep
learning.

------
clydethefrog
This got me excited since I was expecting a critique of the shortcomings of
current AI methods, in the spirit of Dreyfus [0]. It seems to me to be another
analytical attempt to reinvent the wheel of phenomenology. Is the divide (and
the perceived hostility) between the continental and analytical schools so
big that the two don't even share ideas anymore to improve these AI systems?

[0]
[https://en.wikipedia.org/wiki/Hubert_Dreyfus%27s_views_on_ar...](https://en.wikipedia.org/wiki/Hubert_Dreyfus%27s_views_on_artificial_intelligence#Vindicated)

~~~
btown
This is a (significant) technical advance in unifying two approaches (Bayesian
probabilistic modeling and deep learning) for well-defined machine learning
problems. It makes no claims about the philosophy and design of artificial
general intelligence.

------
terrahutte
Resolving uncertainty is a deeply human trait, though usually it just gives us
a false sense of confidence in what we think we know.

~~~
sabertoothed
Why is resolving uncertainty a human trait? Human, as in non-humans don't do
it?

Why would resolving uncertainty lead to a false sense of confidence?

Without some explanation, it's impossible to understand what you want to say.

