
My Favorite Algorithm: Metropolis-Hastings - mjfl
http://flynnmichael.com/2015/06/01/my-favorite-algorithm-metropolis-hastings/
======
Matumio
I like the build-up explanation in this chapter (Monte Carlo Methods) from
MacKay's book:
[http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/356.384.pdf](http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/356.384.pdf)

------
xioxox
In practice, there are some nicer algorithms for doing Markov Chain Monte
Carlo, such as Goodman & Weare's Affine Invariant sampler, as implemented in
[http://dan.iel.fm/emcee/current/](http://dan.iel.fm/emcee/current/) . This
algorithm has the advantage of not needing a proposal distribution.
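For contrast, here's what the proposal distribution that emcee avoids looks like in classic random-walk Metropolis-Hastings — a minimal sketch, where the Gaussian proposal scale and the toy standard-normal target are my own illustrative choices:

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, proposal_scale=1.0, seed=0):
    """Random-walk Metropolis-Hastings for an unnormalized 1-D density.

    log_target: log of the (unnormalized) target density
    proposal_scale: std-dev of the symmetric Gaussian proposal distribution
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Propose a move from a symmetric Gaussian proposal.
        x_new = x + rng.gauss(0.0, proposal_scale)
        # Accept with probability min(1, p(x_new)/p(x)); because the proposal
        # is symmetric, the Hastings correction term cancels out.
        if math.log(rng.random()) < log_target(x_new) - log_target(x):
            x = x_new
        samples.append(x)
    return samples

# Toy target: standard normal, log density up to an additive constant.
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=50000)
burned = samples[10000:]  # discard burn-in
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
```

Tuning `proposal_scale` is exactly the chore that the affine-invariant ensemble sampler sidesteps.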

------
gfodor
I always go back and forth on which of two things is more amazing: the fact
that we can accurately estimate the rendering equation in tractable time, or
the fact that the universe manages to do it in real time.

~~~
eru
The universe can take all the time it wants. You wouldn't notice.

------
mandelken
Listen to Press on the naming of the algorithm; it seems like Metropolis just
put his name on the paper:

[https://youtu.be/4gNpgSPal_8?t=938](https://youtu.be/4gNpgSPal_8?t=938)

~~~
mjfl
This is true! It should be called the Rosenbluth-Hastings algorithm.

------
stellographer
Interesting choice...

Seems like genetic algorithms, particle swarms, etc. would be more attractive
choices, since they solve the same problem and are inherently parallelizable,
while Metropolis-Hastings is more than 60 years old and was designed for a
four-function calculator.

I guess some people have meta-parallelized it... but it still seems like a
patch job compared to modern likelihood-navigation algos.

~~~
enupten
One parallelizable variant is Gibbs sampling, which is used for sampling from
Bayesian networks. In that case you don't even need a proposal distribution,
nor the accept/reject test from MH.

I think GraphLab ([https://dato.com/](https://dato.com/)) ships with
something of this kind.
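A minimal sketch of that idea, on a toy target of my own choosing (a bivariate standard normal with correlation rho, whose exact conditionals are known) — note there is no proposal and no accept/reject step:

```python
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for a bivariate standard normal with correlation rho.

    Each coordinate is drawn exactly from its full conditional, so no
    proposal distribution and no Metropolis accept/reject test is needed.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    sd = (1.0 - rho * rho) ** 0.5  # conditional std-dev sqrt(1 - rho^2)
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)  # x | y ~ N(rho*y, 1 - rho^2)
        y = rng.gauss(rho * x, sd)  # y | x ~ N(rho*x, 1 - rho^2)
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=50000)
xs = [s[0] for s in samples[5000:]]  # discard burn-in
ys = [s[1] for s in samples[5000:]]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
vx = sum((a - mx) ** 2 for a in xs) / n
vy = sum((b - my) ** 2 for b in ys) / n
corr = cov / (vx * vy) ** 0.5  # should recover rho
```

In a Bayes net, the same trick works whenever each variable's conditional given its neighbors is tractable, and non-adjacent variables can be updated in parallel.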

The trouble with particle swarms/genetic algorithms is that they aren't
guaranteed to sample from the underlying probability distribution. It also
isn't yet apparent whether you can find the mode of a distribution faster by
choosing a Markov chain whose stationary distribution differs from the
underlying one.

~~~
stellographer
Are you sure that Gibbs sampling isn't just the multivariate version of MH?

What I'm saying is that convergence speed for MH is limited by the fact that
the guesses cannot communicate with each other... which didn't matter when
you had a pencil and a four-function calculator, as when it was designed.

A genetic algorithm or a particle swarm algorithm is capable of much swifter
convergence because the guesses _can_ communicate and influence the direction
of the drunken walk.

~~~
enupten
It _is_ a multivariate (coordinate-wise) version of MH. You can parallelize
it because the Bayes net allows a decomposition of the probability
distribution.

While your point about global-optimization algorithms may well be true, I
don't yet believe the hacks they involve are that general.

MH wasn't designed for global optimization, and there is only one "particle".
I guess that's what you meant by "parallel"?

~~~
tel
Well, you can run multiple simultaneous chains and pool their results.
There's some wasted work, since each chain needs its own burn-in, but modern
algorithms can make that go quite quickly.

~~~
enupten
_Sure_ - but that's not what the OP is referring to. There is no "sync" step
between multiple parallel chains, unlike in particle swarm & genetic
algorithms.

MCMC is also used for finding MAP solutions, an operation which is strictly
less difficult than computing the partition function.

