Markov Chain Monte Carlo Without the Bullshit (2015) (jeremykun.com)
125 points by sebg on Sept 20, 2016 | 20 comments

I think it's a bit rich of the author to advertise a bullshit- or jargon-free explanation of MCMC and then launch into a litany of definitions and theorems about directed graphs.

In fact, when I am looking for "bullshit-free" explanations of statistics concepts, I am emphatically not looking for a resource that reads like lecture notes from a graph theory class.

Skimming the article, I see barely any mathematical notation relative to what you would actually see in a lecture on this topic. This seems pretty math-sparse as theoretical topics go.

I'd agree... his definition of "bullshit" comes pretty close to bullshit itself. I'd imagine it to mean a layman's version of an MCMC explanation, but a standard undergraduate intro to the topic is about equally technical.

Maybe a black-box explanation would suit devs better.

I.e., use this method to work out the probability density function from some data, and here are its caveats and how to spot them.

In fact, the graph business only applies to the discrete-state case. Continuous-state problems are arguably even more common in applications, and you need different tools to prove convergence results there.

How would you explain Markov Chain Monte Carlo (MCMC) to a layperson (2010)?


> There we called the matrix A “a web matrix,” noted it was column stochastic (as it is here), and appealed to a special case of the Perron-Frobenius theorem to show that there is a unique maximal eigenvalue equal to one (with a dimension one eigenspace) whose eigenvector we used as a sort of “stationary distribution” and the final ranking of web pages.

Thank god this explanation doesn't have any of that bullshit in it.

Those words, except for the reference to the Perron-Frobenius theorem, are fairly basic mathematical literacy and should already be well known to anyone doing math, CS, programming, or engineering, or reading the blog in question.

I updated it to be simpler, but to be fair there is a preface to that paragraph saying "for long-time readers of this blog" to remind them of some of the details of a related previous post.

And, for the record, I still don't like superfluous statistics jargon :)

Hallelujah that people are calling stuff like this out; I've really been trying to encourage people in this direction. We need clear explanations of things, not jargon (I don't care how precise jargon can be; it can also be vague). Randall of XKCD has also done some interesting work in this area. Props.

Yeah, that wasn't exactly what was promised in the preface. In fact, unlike many of his other posts, this one is worse than a commonplace math-heavy explanation. From it alone I wouldn't even understand what problem we are solving: drawing x with probability p(x), given p(x) for every x in a finite set X, sounds like a trivial thing to program without knowing anything about Markov chains and stuff. I know for sure I can generate a uniform random number y somehow, so that part is a solved problem, and mapping y to the distribution over a finite set is simple: if y <= p(x1) return x1; if y <= p(x1) + p(x2) return x2; ...; otherwise return xn.
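The cumulative-sum sampler described above can be sketched in a few lines of Python (the name `sample_discrete` and the example distribution are mine, not from the thread):

```python
import random

def sample_discrete(items, probs):
    """Draw one item, where probs[i] is the probability of items[i].

    Walks the cumulative distribution until it passes a uniform draw y.
    """
    y = random.random()          # uniform in [0, 1)
    cumulative = 0.0
    for item, p in zip(items, probs):
        cumulative += p
        if y < cumulative:
            return item
    return items[-1]             # guard against floating-point round-off

# Example: a biased three-sided die.
counts = {"a": 0, "b": 0, "c": 0}
for _ in range(10000):
    counts[sample_discrete(["a", "b", "c"], [0.5, 0.3, 0.2])] += 1
```

This is exactly the algorithm the thread calls trivial, and it is, for small X; the catch raised below is that its running time is linear in |X|.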

The link to stackexchange in the comments is way more enlightening.

> drawing x with probability p(x), given p(x) for every x in a finite set X, sounds like a trivial thing to program without knowing anything about Markov chains and stuff

Then you misunderstood the complexity of the problem. That's okay; I did too when I first learned about MCMC, and I proposed the same algorithm you did (in fact, your proposed algorithm is right there in the fifth paragraph of the post; did you read it before commenting here?). The problem is that if X is exponentially large and many of the probabilities are exponentially small, your algorithm takes exponential time to draw even a single sample.

That's right, it would be a misunderstanding on my part. No offense; I'm actually quite a fan of your blog, but this particular explanation was easy to misunderstand, in my opinion. I didn't even quite understand what we were trying to achieve here, and suddenly found myself reading about the Perron-Frobenius theorem. Compare that to the explanations at stackexchange, especially this:

> The way MCMC achieves this is to "wander around" on that distribution in such a way that the amount of time spent in each location is proportional to the height of the distribution.

In fact, yes, I did read the fifth paragraph, but I didn't recognize what I proposed earlier as the "solution" there until you pointed it out.
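The "wander around" intuition from that stackexchange quote corresponds to the Metropolis algorithm. A minimal one-dimensional sketch, with a target density and step size of my own choosing for illustration:

```python
import math
import random

def metropolis(density, steps, start=0.0, step_size=1.0):
    """Random walk whose time spent near x is proportional to density(x).

    Propose a nearby point; always accept moves uphill, and accept moves
    downhill with probability density(proposal) / density(current).
    """
    samples = []
    x = start
    for _ in range(steps):
        proposal = x + random.uniform(-step_size, step_size)
        if random.random() < density(proposal) / density(x):
            x = proposal
        samples.append(x)  # repeats count: dwell time encodes the density
    return samples

# Unnormalized standard normal density. The normalization constant
# cancels in the acceptance ratio, which is the whole point: we can
# sample from a distribution we only know up to a constant.
samples = metropolis(lambda x: math.exp(-x * x / 2), steps=50000)
```

Note that the walk only ever evaluates the density at two points per step, which is why it scales to state spaces far too large for the cumulative-sum approach.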

Many thanks to the author! I have the exact same feeling described in the introduction when reading anything about statistics, but this article made perfect sense to me.

I think all the explanations of MCMC are really bad. There is a fantastic visual explanation I came up with that's really intuitive: it involves the graph of the probability distribution and walking around the area under it. Unfortunately it's hard to explain in text; I need a whiteboard.

Youtube is your friend.

Any writing on this topic without a huge section on convergence is BS. In practice, what you need to know is that the algorithm almost never converges, just like any complex randomized algorithm. If you want to train a reasonable model, you need to look for something convex instead.

Thank you, I agree that explanations for people who already know are close to worthless.

Well... how much should the author assume you already know? I think you'd agree that this would be tedious if he started with a few hundred pages describing calculus - preceded by another few hundred describing algebra. At some point, it's reasonable to assume that the reader has a specific background and that readers without that background aren't your target audience.

