
Bayesian Generative Adversarial Networks - stablemap
https://github.com/andrewgordonwilson/bayesgan/
======
mendeza
I'm a student enrolled in the author's Bayesian Machine Learning class, ORIE 6741. The
formulation of adding Bayesian inference to deep learning applies to CNNs as
well. I plan on using his work on Stochastic Gradient Hamiltonian Monte Carlo
to do Bayesian inference on image segmentation. If you're interested, check
out this blog post, it's super interesting:
[http://www.cs.ox.ac.uk/people/yarin.gal/website/blog_3d801aa532c1ce.html](http://www.cs.ox.ac.uk/people/yarin.gal/website/blog_3d801aa532c1ce.html)

~~~
tMcGrath
I'd like to flag up this paper on HMC and minibatch inference:
[https://arxiv.org/abs/1502.01510v1](https://arxiv.org/abs/1502.01510v1). It
seems that SGHMC might not always behave as well as you might like.

I should say this isn't to pour cold water on the idea of Bayesian deep
learning - I think it's an extremely sensible idea and undoubtedly a good
direction for research - but I wanted to caution that this particular MCMC
method might have subtle flaws.

~~~
eref
I wish more academic papers had a "Limitations" section. Often it is
so hard to tell what these are, e.g.: Does it scale well? How much
faster/slower is it than other methods? Does it converge reliably?

~~~
pishpash
No incentive for such a thing.

~~~
eref
Neither is there for reproducible code, but now there are artificial
incentives such as the Reproduction Challenge.

------
jchook
For a super quick (non-expert) explanation of a Generative Adversarial
Network, you essentially have 2 models that are pitted against each other:

1\. A generator ANN creates synthetic examples

2\. An evaluator ANN indicates how "realistic" the examples are

The generator uses the evaluator's output to incrementally improve its
synthesis - see the sketch below.
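
To make that loop concrete, here is a minimal sketch in PyTorch on made-up 1-D
toy data; the architectures, sizes, and learning rates are all illustrative,
not taken from the repo:

    import torch
    import torch.nn as nn

    # Toy 1-D setup: the generator maps noise to samples; the
    # discriminator (the "evaluator") scores how realistic a sample looks.
    G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
    D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))

    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(1000):
        real = 0.5 * torch.randn(64, 1) + 2.0  # stand-in "real" data
        fake = G(torch.randn(64, 8))           # synthetic examples

        # Evaluator step: push scores toward 1 for real, 0 for synthetic.
        d_loss = bce(D(real), torch.ones(64, 1)) + \
                 bce(D(fake.detach()), torch.zeros(64, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator step: the evaluator's output is the training signal.
        g_loss = bce(D(fake), torch.ones(64, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()

The two losses are adversarial: as the evaluator gets better at spotting
fakes, the generator gets a sharper signal to improve against.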

~~~
xcodevn
And for Bayesian GAN, you have a model that generates those two models that
are pitted against each other.

~~~
pas
What does it do with the models? So it samples a probability distribution by
training GANs, but how does it evaluate the points? (Which GAN is better?)

Even after looking at the readme, I don't understand how it works. Could you
explain it a bit, please?

~~~
xcodevn
If you read the paper
[https://arxiv.org/abs/1705.09558](https://arxiv.org/abs/1705.09558): in
section 2.1, it defines two conditional posteriors for the parameters of the
generator and the discriminator. Then the classical GAN is just a maximum
likelihood estimate (or a MAP estimate with a uniform prior) of those
parameters.
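
From memory of that section (notation approximate - check the paper for the
exact form), the two conditional posteriors look roughly like:

    % Generator parameters, given noise samples z and discriminator weights:
    p(\theta_g \mid \mathbf{z}, \theta_d) \propto
      \Big( \prod_{i=1}^{n_g} D\big(G(z^{(i)}; \theta_g); \theta_d\big) \Big)
      \, p(\theta_g \mid \alpha_g)

    % Discriminator parameters, given data X, noise z, and generator weights:
    p(\theta_d \mid \mathbf{z}, X, \theta_g) \propto
      \prod_{i=1}^{n_d} D\big(x^{(i)}; \theta_d\big)
      \prod_{i=1}^{n_g} \Big( 1 - D\big(G(z^{(i)}; \theta_g); \theta_d\big) \Big)
      \, p(\theta_d \mid \alpha_d)

Maximizing the first expression over the generator parameters with a uniform
prior recovers the usual GAN generator objective, which is the MAP/MLE point
mentioned above.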

Next, the paper asks: how about sampling the whole posterior distribution
instead of finding only the single point of maximum likelihood, as classical
GANs do? This is where the famous Markov chain Monte Carlo (MCMC) algorithms
become useful. They use something called _stochastic gradient Hamiltonian
Monte Carlo_ (SGHMC): basically a random-walk-style algorithm where, at each
step, you follow a _noisy_ gradient; as a result, you converge to the
posterior distribution rather than to a single local minimum, as gradient
descent does.
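
Here is a toy illustration of that update on a 1-D Gaussian target (my own
sketch, not code from the paper or the repo; it drops the gradient-noise
correction term from the full SGHMC update):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy target: posterior N(0, 1), so U(theta) = theta**2 / 2 and
    # grad U(theta) = theta. Gaussian noise on the gradient stands in
    # for the noise you'd get from minibatch estimates.
    def noisy_grad_U(theta):
        return theta + rng.normal(0.0, 0.5)

    eta = 1e-2     # step size
    alpha = 0.1    # friction; counteracts the noise in the dynamics
    theta, v = 0.0, 0.0
    samples = []

    for step in range(50_000):
        # SGHMC step (with the gradient-noise estimate set to zero):
        v = v - eta * noisy_grad_U(theta) - alpha * v \
            + rng.normal(0.0, np.sqrt(2 * alpha * eta))
        theta += v
        if step >= 5_000:  # discard burn-in
            samples.append(theta)

    # Should be near 0 and 1 (approximately; the uncorrected
    # gradient noise introduces a small bias).
    print(np.mean(samples), np.var(samples))

Note how the injected noise and friction keep the chain exploring the whole
posterior instead of settling into the mode, which is the behaviour the
comment above describes.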

The paper claims that sampling the whole posterior helps to resolve problems
with classical GANs. IMHO this isn't a surprising claim; it is exactly what
is good about Bayesian statistics.

~~~
carbocation
Is it fair to presume that we'll next see a variational inference approach to
this, with faster but slightly less optimal results?

------
bitL
Very cool! Can't wait to try this with BEGAN! I am curious whether this could
also model some probabilistic decision making outside the usual image/video
applications, e.g. with complicated/unknown (inverse) reinforcement learning
policies.

------
akyu
Resistance is futile. You will be assimilated.

