Implicit Generation and Generalization Methods for Energy-Based Models | 10 points by gdb 30 days ago | 4 comments

 I'm not really clear on the difference between Bayesian methods and EBMs. EBMs associate an "energy" to each point in the parameter space, but this energy is just the negative log-density, up to an additive constant, of some distribution. Sampling schemes for such models are just MCMC methods, which are already a long-established genre of Bayesian techniques. The article talks about sampling from EBMs using Langevin dynamics, but it appears to be identical to Bayesian sampling with Langevin dynamics, which has been fairly popular for a few years. Some of the other stuff is just focused on minimizing the energy, but then that's just identical to MAP/frequentist estimates. Also, they gloss over a lot of problems that Langevin dynamics has. Unlike what they claim, it is not at all good at finding modes separated by low-probability regions, since it has to take increasingly small steps to maintain asymptotic correctness.
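 For reference, the Langevin update this comment has in mind can be written as follows. This is a sketch in assumed notation (energy $E$, step size $\epsilon_k$, Gaussian noise $\xi_k$), not taken from the article; the step-size conditions are the standard ones for a decaying-step sampler without a Metropolis correction.

```latex
% Unadjusted Langevin step on an energy E (notation assumed for illustration)
x_{k+1} = x_k - \epsilon_k \nabla_x E(x_k) + \sqrt{2\epsilon_k}\,\xi_k,
\qquad \xi_k \sim \mathcal{N}(0, I)

% Decaying step sizes that keep the sampler asymptotically exact
\sum_{k} \epsilon_k = \infty,
\qquad
\sum_{k} \epsilon_k^2 < \infty
```

 The second condition is what forces the steps to shrink over time, which is why crossing a wide low-probability region becomes slower the longer a chain runs.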
 EBMs actually associate an "energy" to each point of the input distribution which then defines a probability distribution through the Boltzmann Distribution. It's true that Langevin dynamics can get stuck between modes separated by low-probability regions, and it would be worth trying an adaptive version of HMC. However, since we initialize chains with a random prior distribution, each individual chain is individually likely to hit any mode so all modes are likely to be explored.
 >EBMs actually associate an "energy" to each point of the input distribution which then defines a probability distribution through the Boltzmann Distribution.
 Yes, this is precisely what MCMC methods do as well. Every posterior distribution is a Boltzmann distribution for some energy function.
 >However, since we initialize chains with a random prior distribution, each individual chain is individually likely to hit any mode so all modes are likely to be explored.
 This is also a pretty standard technique in MCMC. But most high-dimensional Bayesian models have a huge number of modes that cannot be explored in a reasonable number of samples/chains.
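 The claim that every posterior is a Boltzmann distribution can be spelled out in one line. The symbols below ($\theta$ for parameters, $D$ for data, $E_\phi$ for a learned energy over inputs $x$) are assumed notation for illustration, not the article's.

```latex
% Bayesian posterior over parameters \theta, written in Boltzmann form
p(\theta \mid D)
  = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}
  = \frac{\exp\!\big(-E(\theta)\big)}{Z},
\qquad
E(\theta) = -\log p(D \mid \theta) - \log p(\theta),
\qquad
Z = p(D) = \int \exp\!\big(-E(\theta)\big)\, d\theta

% EBM over inputs x, with a learned energy E_\phi
p_\phi(x) = \frac{\exp\!\big(-E_\phi(x)\big)}{Z(\phi)},
\qquad
Z(\phi) = \int \exp\!\big(-E_\phi(x)\big)\, dx
```

 Both have the same form, so the same samplers apply. The difference the parent comment points to is what the energy is defined over: parameters given fixed data in the Bayesian case, inputs given learned parameters in the EBM case.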
 Right, and this is arguably especially the case for high-dimensional image datasets. Yet despite this, we are able to train models on these high-dimensional datasets through MCMC (with some tricks) with good likelihood, indicating that standard MCMC techniques can actually scale to very high-dimensional multi-modal settings, which were previously thought to be computationally intractable.
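 A toy sketch of the multi-chain idea discussed in this thread may help make it concrete. It is not the article's code: the 1D double-well energy, the uniform initialization range, the step size, and the names `energy`, `grad_energy`, and `langevin_chains` are all assumptions chosen for illustration. It runs many independent unadjusted Langevin chains from random starting points on an energy with two modes.

```python
import numpy as np


def energy(x):
    # Hypothetical double-well energy with two modes (minima) at x = -2 and x = +2,
    # separated by a barrier of height 4 at x = 0.
    return (x**2 - 4.0) ** 2 / 4.0


def grad_energy(x):
    # Analytic gradient of the double-well energy above.
    return x * (x**2 - 4.0)


def langevin_chains(n_chains=200, n_steps=500, step=0.01, seed=0):
    """Run independent unadjusted Langevin chains and return their final states.

    Each chain starts from a broad uniform distribution (an assumed stand-in for
    the "random prior" initialization mentioned in the thread).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-4.0, 4.0, size=n_chains)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_chains)
        # Gradient step on the energy plus Gaussian noise scaled by sqrt(2 * step).
        x = x - step * grad_energy(x) + np.sqrt(2.0 * step) * noise
    return x


if __name__ == "__main__":
    samples = langevin_chains()
    print("fraction of chains ending in the right-hand mode:", (samples > 0).mean())
```

 In one dimension with two modes, random initialization covers both basins easily even though individual chains rarely cross the barrier. The open question raised above is whether this still holds when the number of modes grows with dimension, which a toy like this cannot settle.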
