
A Zero-Math Introduction to Markov Chain Monte Carlo Methods - rbanffy
https://medium.com/@benpshaver/a-zero-math-introduction-to-markov-chain-monte-carlo-methods-dcba889e0c50
======
svara
I really wanted to like this article, because I came across the term MCMC
quite a bit, but never knew what it was exactly.

I'm slightly disappointed, though. Almost all of it is dedicated to
introducing prerequisite terminology (prior, posterior, likelihood, Markov
chains) which probably a lot of readers will already be familiar with, and
then the actual explanation of MCMC is just this:

> To begin, MCMC methods pick a random parameter value to consider. The
> simulation will continue to generate random values (this is the Monte Carlo
> part), but subject to some rule for determining what makes a good parameter
> value. The trick is that, for a pair of parameter values, it is possible to
> compute which is a better parameter value, by computing how likely each
> value is to explain the data, given our prior beliefs. If a randomly
> generated parameter value is better than the last one, it is added to the
> chain of parameter values with a certain probability determined by how much
> better it is (this is the Markov chain part).

"subject to some rule", "it is possible to"... I feel like this is really
glossing over the actual explanation of how this works. Actually, I still have
no idea why there is a Markov chain. What is the structure of the chain? Where
does it come from and why can't we just sample the parameter without using a
chain?

Anyway, I appreciated most of the article and it started out really promising.
It would be great if the author could try to expand on that still-baffling
part ;)

~~~
j2kun
I humbly submit my article as an alternative. I clearly explain why there is
no alternative, and give a concrete description of the Markov chain. Though
after the initial description of what problem MCMC is solving, I do dip into
the math.

[https://jeremykun.com/2015/04/06/markov-chain-monte-carlo-wi...](https://jeremykun.com/2015/04/06/markov-chain-monte-carlo-without-all-the-bullshit/)

~~~
mathcomp7
Your article is interesting, but I see a problem in your exposition: in MCMC
we don't know the probabilities of individual events. For example, let p_n be
proportional to 1/n^3. We know that the sum of 1/n^3 converges, but we don't
know the sum. The key factor here is that we can compute the transition
probabilities without computing the sum of the series. In the continuous
variable setting, we can likewise compute the transition probabilities without
computing difficult integrals. I hope you can incorporate this key factor to
improve the quality of your article.

~~~
j2kun
I'm sorry but I can't make any sense of what you've written.

~~~
mathcomp7
Let X be a random variable that takes values in the natural numbers n>1 with
p(X=k)=A/k^3, where the constant A is such that sum p(X=k)=1. You can't apply
MCMC with your explanation, since you don't have an oracle for p(X=k). To
apply MCMC you only need an oracle for the quotient of probabilities; here the
quotient p(X=j)/p(X=k) is k^3/j^3, and you don't need to compute the unknown
constant A.
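
A minimal sketch of that idea in Python, assuming a simple Metropolis sampler with a symmetric "step left or right by one" proposal (the proposal rule, the names `MIN_K`, `unnormalized_weight` and `metropolis_chain`, and the step count are illustrative assumptions, not taken from either article). The acceptance test only ever uses the quotient of two weights, so the constant A never appears:

```python
import random
from collections import Counter

MIN_K = 2  # smallest allowed value, following "natural numbers n>1" above


def unnormalized_weight(k):
    """Weight proportional to p(X=k); the constant A is never computed."""
    return 1.0 / k**3 if k >= MIN_K else 0.0


def metropolis_chain(steps, start=MIN_K, seed=0):
    """Random walk on the integers whose long-run frequencies follow 1/k^3."""
    if start < MIN_K:
        raise ValueError("start must be a state with non-zero weight")
    rng = random.Random(seed)
    current = start
    samples = []
    for _ in range(steps):
        proposal = current + rng.choice((-1, 1))  # symmetric proposal
        # Only the quotient of weights is needed: current^3 / proposal^3.
        quotient = unnormalized_weight(proposal) / unnormalized_weight(current)
        if rng.random() < quotient:  # accept with probability min(1, quotient)
            current = proposal
        samples.append(current)  # a rejected move repeats the current state
    return samples


samples = metropolis_chain(200_000)
counts = Counter(samples)
# The observed frequency ratio should be close to p(2)/p(3) = 27/8.
print(counts[2] / counts[3], 27 / 8)
```

Consecutive samples are correlated, which is the price paid for never having to normalize.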

~~~
j2kun
I don't think this distinction is key. You're talking about a detail of the
Metropolis-Hastings algorithm, while I'm defining a problem that Metropolis-
Hastings is one of many algorithms to solve. Sure, Metropolis-Hastings might
not need an oracle, but providing an oracle doesn't make the sampling problem
any easier, and the way I wrote it makes it clearer why the sampling problem
is hard. "Avoiding a difficult integral" is an engineering detail.

~~~
mathcomp7
On one hand, mathematically speaking, the problem of sampling a finite state
system when the probabilities of each state are known is a trivial one, as you
know very well since you have shown that method in your posts (computing
cumulative probabilities ...). On the other hand, your blog is about
computation and math, and in the realm of computation the complexity of
algorithms is a key factor. The mixing time and burn-in time of Markov chains
are also computationally important.
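
For reference, the "trivial" case can be sketched as follows, assuming the probabilities of a small finite state space are known exactly (the function name `make_sampler` and the example probabilities are made up for illustration): build the cumulative probabilities once, then locate a uniform random number among them.

```python
import bisect
import itertools
import random
from collections import Counter


def make_sampler(probabilities, seed=0):
    """Return a function that draws state indices with the given probabilities."""
    cumulative = list(itertools.accumulate(probabilities))
    rng = random.Random(seed)

    def draw():
        # Scale by the total so small rounding errors in the sum do not matter.
        u = rng.random() * cumulative[-1]
        index = bisect.bisect_right(cumulative, u)
        return min(index, len(probabilities) - 1)  # guard against rounding at the top

    return draw


draw = make_sampler([0.2, 0.5, 0.3])
counts = Counter(draw() for _ in range(100_000))
print({state: counts[state] / 100_000 for state in sorted(counts)})
```

This needs every probability up front, which is exactly what is unavailable when the normalizing constant is unknown or the state space is enormous.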

The curse of dimensionality is one reason Monte Carlo methods are used
instead of other numerical methods in high-dimensional problems.

You should communicate to your audience that the advances in stochastic
sampling and its application to science have revolutionized the field not
because of a breakthrough in math but because of key insights that allow us to
explore multidimensional problems effectively with the help of computers.

Another important factor is that MCMC is used extensively in Bayesian
statistics, but to explain that main application of MCMC in your blog it is
necessary to introduce concepts like Bayes' theorem, likelihood, posterior
density function, and prior probabilities.

There are some blogs in which all those concepts are illustrated and code in R
or python is provided. Perhaps MCMC requires more than one post to be
adequately explained.

To sum up, I am a follower of your posts and I appreciate the effort you take
to show the main ingredients and the python code. My only desire is for you to
continue improving the content of your blog for us to follow and enjoy.

Merry Christmas to you and your family from a follower in Spain.

------
ironSkillet
It's silly to say something like "A non mathematical introduction to this
mathematical concept". What the author really means is that he's not using
equations in his introduction. Far too many people think equations are what
mathematics is solely about, and titles like this perpetuate that stereotype.

~~~
waynecochran
This post would have been better if it actually used equations. A few short
equations are easier to understand than a bunch of lengthy paragraphs. In fact
P(A|B) = P(B|A)P(A)/P(B) is compact and not hard to understand, but there are
entire books written about it!

[https://www.goodreads.com/book/show/10672848-the-theory-that...](https://www.goodreads.com/book/show/10672848-the-theory-that-would-not-die)
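
To make the formula concrete, here is a small worked sketch with entirely hypothetical numbers (a condition with 1% prevalence, a test that detects 90% of true cases and gives 5% false positives). The function name `bayes_posterior` is an illustrative assumption; P(B) is expanded with the law of total probability.

```python
from fractions import Fraction


def bayes_posterior(prior_a, b_given_a, b_given_not_a):
    """Return P(A|B) = P(B|A) P(A) / P(B)."""
    # P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    evidence = b_given_a * prior_a + b_given_not_a * (1 - prior_a)
    return b_given_a * prior_a / evidence


# Hypothetical inputs: P(A) = 1/100, P(B|A) = 9/10, P(B|not A) = 5/100.
posterior = bayes_posterior(Fraction(1, 100), Fraction(9, 10), Fraction(5, 100))
print(posterior, float(posterior))  # exact fraction and its decimal value
```

With these made-up inputs the posterior works out to 2/13, roughly 15%, far lower than the 90% detection rate might suggest.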

~~~
optimuspaul
I agree that equations should have been introduced, but I disagree that your
example is not hard to understand, which is the point. I'd rather read about
the concepts than look at esoteric equations.

------
sriku
Why call it a "zero math introduction" to a math topic? Kind of twists my guts
a bit to read that. It seems to suggest some sort of wannabe or love-hate
relationship with math: you don't want to know any math, but you find this
math topic interesting?

~~~
g_delgado14
One of my gripes with the current education system is that it trains the pupil
to simply regurgitate knowledge without ensuring that the pupil truly
understands the topic - in this case MCMC methods.

Articles such as this one are a great resource to help someone who would
otherwise be a mindless regurgitator actually understand what the whole point
of an MCMC method is. The 'why' to the 'what', essentially. And you don't need
math to explain that.

~~~
johnhenry
> And you don't need math to explain that.

There's an argument that you need math to explain it because it is math --
just not what we're taught to think of as "math".

One of my gripes with the current education system is that it makes it hard to
recognize when math appears in situations not explicitly involving numbers,
equations, and matrices.

~~~
thomasfortes
It reminds me of a text that Feynman wrote about when he gave classes in
Brazil: the students knew all the theory and the math behind it, but they
could not recognize the phenomenon when it happened in the real world.

So I guess it is not only in math, but in the entire corpus of knowledge:
our system prepares us for taking tests, not for applying the
knowledge.

~~~
yesenadam
He told a room full of students that a French curve (that extremely curvy
shape used in technical drawing) has the remarkable property that the tangent
at the lowest point is always horizontal... and they all believed him.

~~~
LgWoodenBadger
Well it's true, isn't it? I think the point was that they were astounded that
the curve had that feature because they didn't understand why it was always
horizontal. They thought it was magic.

~~~
yesenadam
Uh, the point seemed to be that they knew the theory (of differentiation,
tangents, trig etc) but were absolutely helpless in practically applying it.
Couldn't relate the theory they had learnt to what they knew of the world, at
all. Didn't think 'But the tangent at the lowest point of ANY curve is
horizontal!'

------
curiousgal
Who exactly is the target audience for this? A lot of developers I’ve met who
hated math had the skills for it. If you can master a new programming paradigm
in a few weeks then you can do math; it’s not the monster all these writers
make it out to be.

~~~
dsacco
_> A lot of developers I’ve met who hated math had the skills for it. If you
can master a new programming paradigm in a few weeks then you can do math;
it’s not the monster all these writers make it out to be._

Let me start off by saying that, as someone whose research is applied
mathematics, I agree with you that math does not need to be "scary" if the
pedagogy is tailored to the right level and style.

That being said...I think you're being a bit optimistic (or cavalier?), and
maybe swinging too far to the other end of the spectrum. I don't believe
programming has much overlap with mathematics at all. It's very close to
applied logic, but I think even then it's specifically much more like an
engineering discipline than a mathematical one. At the higher levels of
computer science (like complexity theory) I see a much broader overlap with
mathematics, but you use very little of that in typical programming.

I think the skills that make someone a good programmer and the skills that
make someone a good mathematician are essentially orthogonal: I've met
mathematicians who are excellent programmers (and vice versa), but I
personally think that's more because they have learned to keep one discipline
"out of the way" of the other one. I don't think most computer scientists or
mathematicians actually make very good programmers (and vice versa) because
there is little actual overlap in their skillsets.

Once you get past calculus (and maybe linear algebra), math becomes primarily
about proofs, not computation. Proving abstractions in math and programming
software do not have a significant overlap. I think most people can reasonably
learn an undergraduate math curriculum (say, up to abstract algebra and
something like elementary number theory), and I think most people can
reasonably learn programming. But I don't see any realistic skill transference
between these two, i.e. learning one won't make you learn the other any
faster.

I'm not trying to claim one subject is obviously harder than the other, I'm
just saying that it might give someone false confidence to have them dive into
complex math just because they have experienced success in learning new
programming frameworks or paradigms every year for their career. That is
setting them up for failure - it takes significant mathematical maturity to
read through a textbook and understand it without a teacher, just as it takes
some domain-specific maturity to optimize learning a new programming language
from a textbook or documentation.

People who want to learn math should be prepared to spend 5 - 10 minutes
reading each page of a math textbook to really understand the material,
followed by a few hours on each chapter's problems. _This is absolutely doable
for an autodidact_ , but I think my framing is a bit more realistic than
yours. Even if they're innately capable of learning the math, they should be
prepared for it to be essentially as difficult as learning programming for the
very first time.

~~~
kristopolous
It's more the style. Often the technical jargon is arcane and there's usually
not a clear distinction between when the author is using a very exacting
mathematical term or just a generalized English term unless you are knee-deep
and spend a large number of your waking hours in a very specific field.

Then there's the naming ... P(X) and we're in probability and X is a random
variable but if we are using just "p" then we're in physics and talking about
momentum. If it's "P" then it's power, unless it's P(x) and then we're talking
about geometry. And then you can italicize it, bold it, put a hat or a dot on
it or under it, make the braces square, straight, or curly, and you get
something else entirely; I mean completely different fields. Sometimes the
same thing is used in multiple fields and then you need substitution syntax
when using the two together. Great system!

So if you have a job where you are juggling 6 or so disciplines and you see a
jumble of stylized letters in an equation along with a description, it's a
fairly absurd system, especially when the author assumes that all the readers
know what they mean when they say i+p(j)/k, or when they use common
everyday words which, in that particular branch of mathematics, are actually
very specific technical terms.

I often read things and think "what on earth is this person saying?" and then
have to go back and decrypt this terribly designed mathematical language
everyone uses that we are all supposed to say is a glorious and perfect
interface. It's not, it's god awful and horrendous. The vast majority of
humanity run screaming from it and can't interface with it to work through
even simple concepts that they probably already know.

It's the modern version of ancient Latin and it's become equally heretical to
insist that we must create a better, more humane, more consistent, more
discoverable, more flexible interface to describe the world that isn't just
vestigial symbols from far-flung authors spanning 3000 years thrown together
in a huge dumpster fire.

~~~
yorwba
Every discipline has the problem that its jargon is impenetrable to
outsiders. Try showing someone with zero programming knowledge a text talking
about "string search" and "pointer chasing" and they will be just as confused.

Mathematics is only different because it is useful in many other fields, so
you get lots of non-mathematicians trying to make use of some result in
isolation. When they don't understand the explanation, they blame it on the
unfamiliar words and notation, but often that's just a symptom of not
understanding the concepts. Anecdote: I have been taking a course taught in
Chinese, which I don't speak very well, so all mathematical jargon is new to
me. But just seeing how they were used let me recognize the words for familiar
mathematical concepts.

Of course some mathematical writing is just bad, but usually mathematicians
will then agree that it's bad. But if some formula comes with an explanation
using words you don't understand, there is no way around looking up
definitions until you understand the prerequisites. There is no language you
could translate mathematics into to make it magically more understandable,
unless the translation always prepends an introductory textbook.

I don't think it's heretical to long for better notation, I just don't think
it's going to help much. But everyone is free to make up their own symbols, so
feel free to go ahead. If it's really better, people will adopt it, just as
they adopted superscripts for powers instead of one-off symbols, Leibniz
notation instead of Newton's, and so on.

------
happy-go-lucky
I have not read the article, but a title like _Zero-math introduction to
something_ may attract those who hate math, do not want to work on something
involving math, or are too lazy to re-familiarize themselves with it. What
they gain from reading such articles is short-lived. They cannot put it to
use. This leads to disappointment and a disturbing trend towards math anxiety.

As a kid, I tried to keep away from math not because I hated it but because my
math teacher did not know how to teach it. It was my conclusion then that they
had not learned the subject thoroughly. As an adult, I decided to take matters
into my own hands and started learning it by myself.

~~~
HiroshiSan
Though I agree with the theme of what you said (mainly that you shouldn't run
away from the math), I don't think it's as mutually exclusive as you paint it.
It's always good to get different viewpoints on the same topic, and a lot of
the time not having it be so math-heavy lets you build a nice intuition for
the topic. A good example is 3blue1brown's videos. He gives great intuition
without sacrificing rigour in his presentation, while not filling his videos
with daunting formulas and such.

------
mathcomp7
I recommend the example used to motivate MCMC in the book Doing Bayesian Data
Analysis: a politician wants to visit all of the US states, each with
frequency proportional to the total population of the state. He decides
whether to visit a new state by comparing the populations of the current and
the proposed state. He doesn't know in advance the total population of every
state, but by applying MCMC he achieves his goal. Here you may think that
computing the total population of the US is a simple detail, but in other
problems the computation involves a very difficult multivariable integral with
thousands of variables, which in practice is impossible to calculate.
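
A rough sketch of the politician's rule in Python, in the spirit of that example rather than a reproduction of the book's version. The region names and relative populations are made up, and the "propose any region uniformly at random" rule and the function name `tour` are assumptions for illustration. Each decision compares only two populations; the overall total is used solely in the final printout, to check the result.

```python
import random
from collections import Counter

# Hypothetical relative populations, made up for illustration only.
POPULATIONS = {"A": 1, "B": 2, "C": 3, "D": 4}


def tour(populations, days, seed=0):
    """Count visits per region when moving by pairwise population comparison."""
    rng = random.Random(seed)
    regions = list(populations)
    current = regions[0]
    visits = Counter()
    for _ in range(days):
        proposal = rng.choice(regions)  # uniform, hence symmetric, proposal
        # Move for sure if the proposed region is larger; otherwise move
        # with probability population(proposal) / population(current).
        if rng.random() < populations[proposal] / populations[current]:
            current = proposal
        visits[current] += 1  # staying put counts as another day here
    return visits


visits = tour(POPULATIONS, days=100_000)
total_days = sum(visits.values())
grand_total = sum(POPULATIONS.values())  # used only to check the result
for region, population in POPULATIONS.items():
    print(region, round(visits[region] / total_days, 3), population / grand_total)
```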

------
raister
It has occurred to me that the discussion has shifted towards programming vs
mathematics, instead of the actual discussion proposed by the OP to focus on
MCMC.

This happens a lot here; focus is lost very easily.

------
andrewprock
The first thing I saw when I loaded the page was a chart with numbers. If
charts are no longer math, then someone has certainly redefined the term math.

------
signa11
mcmc treatment in hopcroft, kannan is really really very good. highly
recommended. there are a large number of articles on the web e.g. at math-
intersection-programming which are based on that. but, i would really be wary
of any 'zero-math' thingy for that.

