
A Common Derivation for Markov Chain Monte Carlo Algorithms [pdf] - magoghm
https://arxiv.org/abs/1607.01985
======
joe_the_user
Oh, I'd actually found this approach searching a while back.

It frames all Markov Chain Monte Carlo as a variation of slice sample. I like
it because it makes the process fairly simple but I have tried to implement it
yet.

[https://en.wikipedia.org/wiki/Slice_sampling](https://en.wikipedia.org/wiki/Slice_sampling)

~~~
smallnamespace
Another way to look at this: slice sampling is analogous to taking the
Lebesgue integral rather than the Riemann integral (we sum small horizontal
strips rather than vertical bars).

------
timdellinger
It's interesting seeing how a community codifies the nomenclature and the
formalization of the various things that it has stolen/borrowed/reinvented
from other communities.

------
smallnamespace
Here's a very simple example of what MCMC does and how it can be useful:

\- Let's say I offer to play a game with you, where you take an unbiased coin,
flip it a thousand times, then we take the number of heads (H) and subtract it
from the number of tails (T), and I pay you $|(H-T)|. So if there are 520
heads and 480 tails, I'll owe you $40, but if it were reversed, you'd owe me
nothing. How much would you be willing to pay to play this game with me, i.e.
what’s the expected value of this game?

\- One simple approach would be to simulate the entire game many times to
estimate the payoff, which is 'vanilla' Monte Carlo simulation. We can do it
like this:

    
    
      import random
      def flip():
        # 1 or -1
        return random.randint(0, 1) * 2 - 1
    
      numDraws = 1000
      numFlips = 1000
      totalPayoff = 0;
      for i in range(numDraws):
        totalPayoff += abs(sum(flip() for _ in range(numFlips)))
    
      print('$' + str(round(totalPayoff / numDraws, 2))) # around $24
    
    

The more simulations we run, the more accurate our result will be, but we need
to run a full thousand flips for every attempt, so the time complexity is
O(flips * tries). What if the game had 10k or 100k flips instead of 1k?

This is the problem that MCMC solves: finding an efficient way to draw more
samples of a process, where each independent draw is hard to calculate. For
this problem, note that:

1\. If you start with a set of 1000 coins that are randomly initialized, and
you just keep picking up at random and flipping it from its current
configuration, after you flip it enough times, the equilibrium distribution of
the result will match the expected distribution of the coins after 1000 flips
in our original game.

2\. This means if you just keep randomly picking up a coin and flipping it,
then recording the payoff that _you would have gotten_ had that been the
thousandth flip, once you average all those payoffs, you'll also have an
estimate of payoff.

In code:

    
    
      numSteps = 1000 * 30
      numFlips = 1000
      totalPayoff = 0;
    
      # initialize our coins
      coins = [0] * numFlips
      for i in range(numFlips):
        coins[i] = flip() # 1 or -1
    
      currentTotal = sum(coins)
      totalPayoff = 0;
      for i in range(numSteps):
        coinToFlip = random.randrange(numFlips)
        coins[coinToFlip] *= -1
        currentTotal += 2 if (coins[coinToFlip] is 1) else -2
        totalPayoff += abs(currentTotal)
    
      print('$' + str(round(totalPayoff / numSteps, 2))) # Also ~$24
    

Now we've gotten the time complexity down to O(numSteps), regardless of the
number of coins that we're flipping. Now the question is: how does O(steps)
relate to O(numTries * numFlips)? It turns out with a bit of math, that we can
show that for a similar amount of accuracy, steps ~ tries * sqrt(flips), so
here we save a sqrt(numFlips) factor (one can also find tighter bounds). For
very long games, this can be a substantial speed improvement.

This is the essence of MCMC: when it's expensive to take independent draws of
a space, we can instead take a random walk around _once_ instance of the space
and still get a reasonable answer. A lot of the details of MCMC revolve around
1) how do I walk around the space and 2) how many steps must I take so I know
the answer is reasonable?

* For this toy problem, there is actually a simple closed-form solution for the probability so there’s no need to simulate, but for more complicated processes numerical simulation is often the best option available for getting a number

~~~
credit_guy
> This is the essence of MCMC: when it's expensive to take independent draws
> of a space, we can instead ...

I disagree. The essence of MCMC is that it's a clever MC scheme that defeats
the curse of dimensionality by using an adaptive importance sampling method.
In this method, you use information from the current sample to help draw a
better new sample, or at least not a much worse one. This way, even if you
start form a place far away from the region of interest, you end up in that
region; the trade-off is that the samples are obviously not independent
anymore. In very high dimensional spaces the region of interest is always very
tiny compared to the vastness of the space (that's the "curse of
dimensionality"), so even when it's very cheap to draw samples, you would be
hopeless without an importance sampling scheme.

~~~
smallnamespace
> by using an adaptive importance sampling method

Although not all MCMC schemes actually do this, e.g. naive Gibbs (re)sampling
just sweeps through the domain spaces iteratively.

You're right though MCMC enables you to do this relatively easily. Still
requires some work if to get this right, especially if your integrand has
multiple 'spikes'.

------
ranc1d
This was posted on here 2 years ago but didn't get any comments.

[https://news.ycombinator.com/item?id=12286521](https://news.ycombinator.com/item?id=12286521)

------
bob1029
Incomprehensible LaTeX soup IMO. What value does this paper afford one of us
"hackers"? It would take me weeks to digest this to only find it was a bunch
of bullshit talking in circles over exceptionally pedantic topics that have no
real world applications.

~~~
boxfire
For many researchers MCMC is described in very specific terms very differently
in many places. This paper offers a single view of how to understand the idea
and how you can use it to derive specializations for your field. Of course if
you have no interest in mathematics, just move on. Nothing about being a good
hacker means being abrupt and unwilling to dive deep in the necessary tools
for a job. I for one am glad to see a paper like this and know I can start
from here to build my own sampler for a complex system for which there is
otherwise no software to efficiently sample. The hacker spirit is curious and
open minded, but moves fast. If it's not a tool for you that doesn't mean it
can't help thousands of others gain real insight into a difficult subject.

