
Implicit Generation and Generalization Methods for Energy-Based Models - gdb
https://openai.com/blog/energy-based-models/
======
lenticular
I'm not really clear on the difference between Bayesian methods and EBMs. EBMs
associate an "energy" to each point in the parameter space, but this energy is
just the negative log of some unnormalized density. Sampling schemes for such
models are just MCMC methods, which are already a long-established genre of
Bayesian techniques.

The article talks about sampling from EBMs using Langevin dynamics, but this
appears to be identical to Bayesian sampling with Langevin dynamics, which has
been fairly popular for a few years. Some of the other work just focuses on
minimizing the energy, but that is identical to MAP/frequentist estimation.

Also, they gloss over a lot of problems that Langevin dynamics has. Contrary
to what they claim, it is not at all good at finding modes separated by low-
probability regions, since it has to take increasingly small steps to maintain
asymptotic correctness.
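
For reference, a bare-bones sketch of unadjusted Langevin dynamics (the
quadratic energy and the 1/k decay schedule are arbitrary placeholders). The
step-size decay needed for asymptotic correctness is exactly what makes
crossing low-probability regions slow:

    import numpy as np

    def grad_energy(x):
        # Gradient of the placeholder energy E(x) = 0.5 * ||x||^2.
        return x

    def langevin_sample(x0, n_steps=1000, step0=0.1):
        x = x0.copy()
        for k in range(n_steps):
            step = step0 / (1 + k)  # shrinking steps for asymptotic correctness
            noise = np.random.randn(*x.shape)
            # Langevin update: x <- x - (step/2) * grad E(x) + sqrt(step) * noise
            x = x - 0.5 * step * grad_energy(x) + np.sqrt(step) * noise
        return x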

~~~
yilundu
EBMs actually associate an "energy" to each point of the input space, which
then defines a probability distribution through the Boltzmann distribution.
It's true that Langevin dynamics can get stuck in modes separated by low-
probability regions, and it would be worth trying an adaptive version of HMC.
However, since we initialize chains from a random prior distribution, each
individual chain is likely to reach any given mode, so all modes are likely to
be explored.
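
Roughly what I mean (a toy sketch; the chain count and uniform init range are
placeholders, and langevin_sample stands in for any MCMC transition):

    import numpy as np

    def run_chains(langevin_sample, n_chains=100, dim=2):
        # Initialize each chain independently from a broad random prior
        # (uniform here as a placeholder), so with enough chains every
        # mode has some chance of being reached.
        inits = np.random.uniform(-1.0, 1.0, size=(n_chains, dim))
        return np.stack([langevin_sample(x0) for x0 in inits])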

~~~
lenticular
>EBMs actually associate an "energy" to each point of the input space, which
then defines a probability distribution through the Boltzmann distribution.

Yes, and this is precisely the setting that MCMC methods sample from. Every
posterior distribution is a Boltzmann distribution for some energy function.
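
Concretely, writing Bayes' rule in energy form:

    E(\theta) = -\log p(\theta) - \log p(D \mid \theta)
    p(\theta \mid D) = \exp(-E(\theta)) / Z,  with  Z = p(D)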

>However, since we initialize chains from a random prior distribution, each
individual chain is likely to reach any given mode, so all modes are likely to
be explored.

This is also a pretty standard technique in MCMC. But most high-dimensional
Bayesian models have a huge number of modes that cannot be explored in a
reasonable number of samples/chains.

~~~
yilundu
Right, and that is arguably especially the case for high-dimensional image
datasets. Yet despite this, we are able to train models on these high-
dimensional datasets through MCMC (with some tricks) with good likelihoods,
indicating that standard MCMC techniques can actually scale to very high-
dimensional, multi-modal settings that were previously thought to be
computationally intractable.
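
One of those tricks, sketched very roughly below (the buffer size and
reinitialization rate are placeholders, not the exact values from the paper):
keep a replay buffer of past samples and initialize most new chains from it,
so chains don't have to re-mix from scratch.

    import numpy as np

    class ReplayBuffer:
        # Persist past MCMC samples and reuse them as chain initializations.
        def __init__(self, capacity=10000, dim=2):
            self.capacity = capacity
            self.samples = np.random.uniform(-1.0, 1.0, size=(capacity, dim))

        def sample_inits(self, n, reinit_prob=0.05):
            idx = np.random.randint(self.capacity, size=n)
            inits = self.samples[idx].copy()
            # Occasionally restart from noise so new modes can still be found.
            mask = np.random.rand(n) < reinit_prob
            k = int(mask.sum())
            inits[mask] = np.random.uniform(-1.0, 1.0, size=(k, inits.shape[1]))
            return inits

        def store(self, new_samples):
            # Overwrite random slots with the latest chain states.
            idx = np.random.randint(self.capacity, size=len(new_samples))
            self.samples[idx] = np.asarray(new_samples)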

