
Markov Chain Monte Carlo Without the Bullshit (2015) - sebg
https://jeremykun.com/2015/04/06/markov-chain-monte-carlo-without-all-the-bullshit/
======
gaur
I think it's a bit rich of the author to advertise a bullshit- or jargon-free
explanation of MCMC and then launch into a litany of definitions and theorems
about directed graphs.

In fact, when I am looking for "bullshit-free" explanations of statistics
concepts, I am emphatically _not_ looking for a resource that reads like
lecture notes from a graph theory class.

~~~
argonaut
Skimming the article, I see barely any mathematical notation relative to what
you would actually see in a lecture on this topic. This seems pretty
math-sparse as theoretical topics go.

------
elchief
How would you explain Markov Chain Monte Carlo (MCMC) to a layperson (2010)?

[http://stats.stackexchange.com/questions/165/how-would-you-explain-markov-chain-monte-carlo-mcmc-to-a-layperson](http://stats.stackexchange.com/questions/165/how-would-you-explain-markov-chain-monte-carlo-mcmc-to-a-layperson)

------
Singletoned
> There we called the matrix A “a web matrix,” noted it was column stochastic
> (as it is here), and appealed to a special case of the Perron-Frobenius
> theorem to show that there is a unique maximal eigenvalue equal to one (with
> a dimension one eigenspace) whose eigenvector we used as a sort of
> “stationary distribution” and the final ranking of web pages.

Thank god this explanation doesn't have any of that bullshit in it.

~~~
wfo
Those words, except for the reference to the Perron-Frobenius theorem, are
fairly basic mathematical literacy, and should already be well known to anyone
doing math, CS, programming, or engineering, or reading the blog in question.
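For what it's worth, the "stationary distribution as eigenvector" machinery
the quote refers to is only a few lines of code. A toy sketch (not from the
article; the matrix here is made up):

```python
import numpy as np

# A made-up column-stochastic "web matrix": every column sums to 1.
A = np.array([
    [0.1, 0.5, 0.3],
    [0.6, 0.2, 0.3],
    [0.3, 0.3, 0.4],
])

# Power iteration: by Perron-Frobenius, the maximal eigenvalue is 1,
# so repeatedly applying A drives any starting distribution toward
# the stationary distribution (the corresponding eigenvector).
v = np.ones(3) / 3
for _ in range(1000):
    v = A @ v

print(v)  # stationary distribution: A @ v == v, entries sum to 1
```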

------
j2kun
Previous discussion:
[https://news.ycombinator.com/item?id=9331808](https://news.ycombinator.com/item?id=9331808)

~~~
j2kun
And, for the record, I still don't like superfluous statistics jargon :)

------
marknadal
Hallelujah that people are trying to call stuff like this out; I've really
been trying to encourage people in this direction. We need clear explanations
of things, not jargon (however precise jargon can be, it can also be vague).
Randall of XKCD has also done some interesting work in this area. Props.

------
krick
Yeah, that wasn't exactly what was promised in the preface. In fact, unlike
many of his other posts, this one is worse than a commonplace math-heavy
explanation. Reading it, I wouldn't even have understood what problem we are
solving, since drawing x with probability p(x), given p(x) for every x in a
finite set X, sounds like a pretty trivial thing to program without knowing
anything about Markov chains and such: I know for sure I can generate a
uniform random number y somehow, so that part is a solved problem, and mapping
y to the distribution over the finite set is simple: if y <= p(x1) return x1,
if y <= p(x1) + p(x2) return x2, ..., otherwise return xn.
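Spelled out as code, that cumulative mapping is just (a throwaway sketch,
names are mine):

```python
import random

def draw(probs):
    """Draw one x with probability p(x), given the full table of
    probabilities, by scanning cumulative sums against a uniform draw."""
    y = random.random()  # uniform in [0, 1)
    total = 0.0
    for x, p in probs.items():
        total += p
        if y < total:
            return x
    return x  # guard against floating-point round-off

print(draw({"x1": 0.2, "x2": 0.5, "x3": 0.3}))
```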

The link to stackexchange in the comments is way more enlightening.

~~~
j2kun
> drawing x with probability p(x) given p(x) for every x in finite set X
> sounds pretty trivial thing to program w/o knowing anything about Markov
> chains and stuff

Then you misunderstood the complexity of the problem. It's okay, I did too
when I first learned about MCMC, and I proposed the same algorithm as you did
(in fact, your proposed algorithm is right there in the fifth paragraph of the
post. Did you read it before commenting here?). The problem is that if X is
exponentially large, and many of the probabilities are exponentially small,
your algorithm takes exponential time to draw even a single example.
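Concretely: if X is, say, all 64-bit strings, the linear scan in that
cumulative-sum sampler is hopeless (back-of-the-envelope, the throughput
number is illustrative):

```python
# The table p(x) has one entry per state, and the cumulative scan
# touches O(|X|) entries per draw. For 64-bit strings:
states = 2 ** 64
scans_per_second = 10 ** 9  # optimistic: one entry per nanosecond
years_per_draw = states / scans_per_second / (3600 * 24 * 365)
print(f"{states} states, about {years_per_draw:.0f} years per draw")
```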

~~~
krick
That's correct; that would be a misunderstanding. No offense, I'm actually
quite a fan of your blog, but this particular explanation was easy to
misunderstand, in my opinion. I didn't even quite understand what we are
trying to achieve here, and suddenly found myself reading about the
Perron-Frobenius theorem. Compare it to the explanations at stackexchange,
especially this:

> The way MCMC achieves this is to "wander around" on that distribution in
> such a way that the amount of time spent in each location is proportional to
> the height of the distribution.

In fact, yes, I read the fifth paragraph, but I didn't recognize what I had
proposed earlier as a "solution" until you pointed it out.
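For what it's worth, that "wander around" sentence is basically the Metropolis
rule, which fits in a few lines (a sketch under my own naming, not code from
the post):

```python
import random

def metropolis(p, start, steps):
    """Random walk whose time spent at each state is proportional to
    p(state): propose a neighbor, accept with prob min(1, p(new)/p(old))."""
    x = start
    visits = []
    for _ in range(steps):
        proposal = x + random.choice([-1, 1])
        if random.random() < min(1.0, p(proposal) / p(x)):
            x = proposal
        visits.append(x)
    return visits

# Toy unnormalized distribution on the integers 0..9.
p = lambda x: float(x + 1) if 0 <= x <= 9 else 0.0
visits = metropolis(p, start=5, steps=50000)
```

The walk never steps outside 0..9 (the acceptance probability is 0 there),
and it visits state 9 about ten times as often as state 0, matching the
heights 10 and 1.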

------
chombier
Many thanks to the author! I have the exact same feeling described in the
introduction when reading anything about statistics, but this article made
perfect sense to me.

------
Houshalter
I think all the existing explanations of MCMC are really bad. There is a
fantastic visual explanation I came up with that's really intuitive. It
involves the graph of the probability distribution and walking around the
area of it. Unfortunately, explaining it in text is hard; I need a whiteboard.

~~~
lolive
Youtube is your friend.

------
nshm
Any writing on this topic without a substantial section on convergence is BS.
In practice, what you need to know is that the algorithm almost never
converges, just like any complex randomized algorithm. If you want to train a
reasonable model, you should look for something convex instead.

------
jschlesser
Thank you. I agree that explanations written for people who already know the
material are close to worthless.

~~~
clifanatic
Well... how much should the author assume you already know? I think you'd
agree that this would be tedious if he started with a few hundred pages
describing calculus - preceded by another few hundred describing algebra. At
some point, it's reasonable to assume that the reader has a specific
background and that readers without that background aren't your target
audience.

