
Show HN: Markov chains explained visually - vicapow
http://setosa.io/blog/2014/07/26/markov-chains/index.html
======
jgable
Beautiful. I had seen Markov chains mentioned before, but had not looked them
up. Skimming the wikipedia page made sense (it's a state machine with
transitions determined by probabilities instead of defined events), but I
would not have had an intuitive understanding of why they are useful. The
explanation mid-way down about modeling the distribution of sunny and rainy
days really made it click for me.

~~~
danieldk
_The explanation mid-way down about modelling the distribution of sunny and
rainy days really made it click for me._

Another very easy-to-understand example is language. Say I give you a small
text. You could create a small state machine containing the possible
transitions between words. You could also compute probabilities (estimated
from the text) of going from one word to another (e.g. the -> text vs. the ->
possible, etc.), and you'd have a Markov chain.

Of course, this is a very weak model of language, and usually such models
condition probabilities on at least the previous two states. But it turns out
to be very useful in practice, e.g. for comparing the fluency of different
formulations of the same semantics.
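
A minimal sketch of that idea in JavaScript (toy corpus, relative-frequency estimates; not a serious language model):

```javascript
// Build a first-order (bigram) Markov chain over words: transition
// probabilities are estimated as relative frequencies from the text.
function buildChain(text) {
  var words = text.split(/\s+/);
  var counts = {};
  for (var i = 0; i < words.length - 1; i++) {
    var w = words[i], next = words[i + 1];
    counts[w] = counts[w] || {};
    counts[w][next] = (counts[w][next] || 0) + 1;
  }
  // Normalize each row of counts into probabilities.
  var chain = {};
  Object.keys(counts).forEach(function (w) {
    var total = 0;
    Object.keys(counts[w]).forEach(function (n) { total += counts[w][n]; });
    chain[w] = {};
    Object.keys(counts[w]).forEach(function (n) {
      chain[w][n] = counts[w][n] / total;
    });
  });
  return chain;
}

var chain = buildChain("the cat sat on the mat");
console.log(chain.the); // { cat: 0.5, mat: 0.5 }
```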

~~~
Orangeair
Is that how things like Swiftkey or Google Now can predict words so well? If
so, how do they do it so quickly?

~~~
regularfry
Very probably with the Viterbi algorithm:
[http://en.wikipedia.org/wiki/Viterbi_algorithm](http://en.wikipedia.org/wiki/Viterbi_algorithm)

~~~
danieldk
Viterbi is used for decoding e.g. HMMs, when there are multiple possible state
sequences given the observations, and you want to find the most probable. In a
normal (non-hidden) Markov model, you can just follow the state transitions
and get out a probability (or go to a state and see what the next
possibilities are).

~~~
regularfry
An HMM is exactly what you have in both predictive typing and speech
recognition, since in both cases you've got some form of sensor noise to deal
with.

~~~
danieldk
But predictive typing usually corrects per word. Given a sequence of words,
you can give the top suggestions (like Swype and others do) using a non-hidden
Markov model. For partially typed words, it's easier to take words from the
same suggestion list and rank them by a combination of probability and some
similarity measure (e.g. edit distance), possibly complemented by candidates
found through some other method (e.g. Levenshtein automata). If you want to
correct

    
    
      predctiv tping an spech recgnition
    

Yes, you'll want to use a hidden Markov model. If you already have

    
    
      predictive typing and speech recgn
    

then a normal Markov model will serve you fine. And since such predictive
keyboards do corrections per word, they are like the latter example and not
the former. For speech recognition (e.g. Google Now), you indeed need an HMM.
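
A rough sketch of that per-word ranking, with a standard edit-distance function; the candidate list and probabilities here are invented for illustration (a real keyboard would get them from the Markov model):

```javascript
// Classic Wagner-Fischer edit distance between two strings.
function editDistance(a, b) {
  var d = [];
  for (var i = 0; i <= a.length; i++) { d[i] = [i]; }
  for (var j = 0; j <= b.length; j++) { d[0][j] = j; }
  for (var i = 1; i <= a.length; i++) {
    for (var j = 1; j <= b.length; j++) {
      var cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1,      // deletion
                         d[i][j - 1] + 1,      // insertion
                         d[i - 1][j - 1] + cost); // substitution
    }
  }
  return d[a.length][b.length];
}

// Made-up candidate probabilities; in practice these come from the model.
var candidates = { speech: 0.4, special: 0.3, species: 0.3 };

// Rank candidates: higher probability and lower edit distance both help.
function rank(typed) {
  return Object.keys(candidates).sort(function (a, b) {
    var scoreA = candidates[a] / (1 + editDistance(typed, a));
    var scoreB = candidates[b] / (1 + editDistance(typed, b));
    return scoreB - scoreA;
  });
}

console.log(rank("spech")); // "speech" comes first for this toy list
```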

------
cscheid
This is really nice.

Minor nit #1:
[https://www.dropbox.com/s/2meqa8hhen9ztba/Screenshot%202014-...](https://www.dropbox.com/s/2meqa8hhen9ztba/Screenshot%202014-07-29%2010.37.58.png)
Seems like the graph visualization is sticking to the wrong coordinates
(dragging it to the left doesn't help; it moves back to the center)

Minor nit #2. I'd love to see a visualization of the "probability mixing"
interpretation of Markov chains and stationary distributions, which is what
PageRank is really about. That is, it'd be really nice to have a visualization
of the fact that Markov chains are ultimately memoryless (eventually it
doesn't matter in which state you started for the distribution of events). I
think it could be done by showing "probabilities conditioned on the past",
which is most easily done by multiplying the entire probability vector by the
stochastic matrix and visualizing that.
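
That "probability mixing" is easy to sketch: push two different starting vectors through the same stochastic matrix and watch them agree (the sunny/rainy numbers below are illustrative, not from the article):

```javascript
// One step of a Markov chain: multiply a probability row vector by the
// stochastic matrix P.
function step(v, P) {
  return P[0].map(function (_, j) {
    return v.reduce(function (acc, vi, i) { return acc + vi * P[i][j]; }, 0);
  });
}

var P = [[0.9, 0.1],   // sunny -> sunny, sunny -> rainy
         [0.5, 0.5]];  // rainy -> sunny, rainy -> rainy

var v = [1, 0]; // start definitely sunny
var w = [0, 1]; // start definitely rainy
for (var n = 0; n < 50; n++) { v = step(v, P); w = step(w, P); }
console.log(v, w); // both ≈ [0.833..., 0.166...]: the start no longer matters
```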

~~~
mturmon
One way to get at what you're saying in #2 is to have a counter that
accumulates the number of epochs spent in state 1, state 2, state 3, for one
of these examples. Have a way to reset this counter, and compute the relative
frequency (by normalizing the count in each state by the number of epochs so
far).

Come to think of it, you could even calculate the first eigenvector of the
transition matrix, and show this eigenvector next to the relative frequency
mentioned above.

In fact, you could then plot an L1 distance of the relative frequency (above)
to the eigenvector, with one point for each epoch. This distance would
decrease (non-monotonically) towards zero as the number of epochs increases. I
believe it is (in the expectation sense) exponentially fast, but with an
exponent that depends on the transition matrix. (Typical transition matrices
would exhibit fast mixing,
[http://en.wikipedia.org/wiki/Markov_chain_mixing_time](http://en.wikipedia.org/wiki/Markov_chain_mixing_time)).
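
A quick simulation of that counter, using an illustrative two-state matrix whose stationary distribution is [5/6, 1/6]:

```javascript
var P = [[0.9, 0.1],
         [0.5, 0.5]];

// Simulate the chain, counting epochs spent in each state, then return
// relative frequencies. These approach the stationary distribution (the
// first eigenvector of P) as the number of epochs grows.
function simulate(steps) {
  var state = 0, counts = [0, 0];
  for (var n = 0; n < steps; n++) {
    counts[state]++;
    state = Math.random() < P[state][0] ? 0 : 1;
  }
  return counts.map(function (c) { return c / steps; });
}

console.log(simulate(100000)); // ≈ [0.833, 0.167] for this matrix
```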

------
murbard2
Now look up Hidden Markov models,
[http://en.wikipedia.org/wiki/Hidden_Markov_model](http://en.wikipedia.org/wiki/Hidden_Markov_model)

How they can be calibrated in the finite case
[http://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm](http://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm)

And how they can be evaluated for arbitrary models
[http://en.wikipedia.org/wiki/Particle_filter](http://en.wikipedia.org/wiki/Particle_filter)

------
itodd
This is fantastic. I have encountered Markov chains in my career and always
thought of them as a black box. This simple visualization makes it so easy to
understand what has previously been so hard for me. Thank you.

------
tel
The one thing to add to this is that usually each state doesn't emit a single
token ("I am in state 1" then "I am in state 2") but instead you assume that
each state has a range of possible actions and the likelihood of a choice of
action varies with state.

So it might not be that your model is sunny versus rainy but instead cold
front vs. warm front. Since rain is more likely during a cold front, your
observation of rain increases your belief that the system is in the "cold
front" state.

~~~
taejo
That's a _Hidden_ Markov Model.

------
sinwave
Thought I'd point out a little typo. In your table with sliders for adjusting
probabilities of state transitions, the P(B|B) probability reads "P(B|A)".

Edit 1: Also, P(A|B) reads "P(A|A)".

Edit 2: Not trying to be too nitpicky, though. It's a really nice
visualization. Really excited about the growing use of d3 to visualize
algorithms. Is this inspired by Mike Bostock's post by that title?

------
ajanuary
Presumably in the B row it should read "P(A|B)" and "P(B|B)"

------
beenpoor
Thank you! I understand what Markov chains are now. Nicely done, and in a
simple, understandable fashion.

I am also trying to understand what they call a Hidden Markov Model
(specifically, I just cannot wrap my head around how it gets used in speech;
they just look like entirely different things). Would be awesome to see an
update with the Hidden MM.

~~~
tdaltonc
To understand an HMM: Start with an MM with two states. In state "A" we always
observe "X"; in state "B" we always observe "Y". That's a basic MM. Now let's
change it to an HMM. Now, in state "A" we observe "X" 90% of the time and "Y"
10% of the time, and in state "B" we observe "Y" 90% of the time and "X" 10%.
The observations don't tell us exactly what state we're in. The state is
"hidden." We still get a clue about the state, but there is some uncertainty.
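
That belief update can be sketched with the forward algorithm; the transition probabilities below are assumptions added for illustration:

```javascript
// Two hidden states A and B; emissions as described above.
var trans = { A: { A: 0.8, B: 0.2 }, B: { A: 0.2, B: 0.8 } }; // assumed
var emit  = { A: { X: 0.9, Y: 0.1 }, B: { X: 0.1, Y: 0.9 } };

// Forward algorithm: track a belief over the hidden state.
function forward(observations) {
  var belief = { A: 0.5, B: 0.5 }; // uniform prior
  observations.forEach(function (obs) {
    // Predict: push the belief through the transition matrix...
    var predicted = {
      A: belief.A * trans.A.A + belief.B * trans.B.A,
      B: belief.A * trans.A.B + belief.B * trans.B.B
    };
    // ...then update: weight by emission probability and renormalize.
    var a = predicted.A * emit.A[obs];
    var b = predicted.B * emit.B[obs];
    belief = { A: a / (a + b), B: b / (a + b) };
  });
  return belief;
}

console.log(forward(["X", "X", "Y"])); // the trailing "Y" shifts belief toward B
```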

~~~
notduncansmith
This is an excellent explanation, thanks!

------
devindotcom
I've seen Markov chains applied to language generation - producing sentences
that make sense grammatically but not literally. Anyone know what the
connection is here? I think I have an idea but would like to see if it gets
independently verified by someone else.

~~~
ddt
There are complicated ways of doing this, but the naïve way is as follows:

First, you need a corpus of text that's grammatically correct.

Each node in the chain is a word or piece of punctuation. Each word has a
certain probability of being followed by every other word in the corpus,
including itself. There are a few different ways to start the sentence. One
approach is to start from the node for the punctuation mark "." and select
only a following node that is not a period, since sentences don't tend to
start with punctuation. From there, use a random number generator to pick a
following node based on your probability matrix; rinse, repeat.

If you'll notice, there's no guarantee that it will be grammatically correct.
There's just some statistical likelihood that it will be.
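
A naive sketch of exactly that procedure, with hypothetical bigram counts standing in for a real corpus:

```javascript
// Pick the next word at random, weighted by follower counts.
function sampleNext(followers) {
  var total = 0, words = Object.keys(followers);
  words.forEach(function (w) { total += followers[w]; });
  var r = Math.random() * total;
  for (var i = 0; i < words.length; i++) {
    r -= followers[words[i]];
    if (r <= 0) { return words[i]; }
  }
  return words[words.length - 1];
}

// Random-walk generation starting from the "." node.
function generate(counts, start, maxWords) {
  var word = start, out = [];
  for (var i = 0; i < maxWords && counts[word]; i++) {
    word = sampleNext(counts[word]);
    out.push(word);
  }
  return out.join(" ");
}

// Toy bigram counts; in practice these come from a corpus.
var counts = {
  ".": { the: 2 },
  the: { dog: 1, cat: 1 },
  dog: { barks: 1 },
  cat: { sleeps: 1 },
  barks: { ".": 1 },
  sleeps: { ".": 1 }
};

console.log(generate(counts, ".", 10)); // e.g. "the dog barks . the cat ..."
```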

~~~
scott_s
_If you'll notice, there's no guarantee that it will be grammatically correct.
There's just some statistical likelihood that it will be._

Which is also true for human speakers.

------
sitk
This is an absolutely magical and intuitive (not to mention beautiful) way to
imagine the complex mathematical concept of a Markov chain. This is the exact
sort of pedagogical tool that MOOCs and other educational software platforms
need to build and adopt to bring education into the 21st century and finally
replace traditional teaching methods. How does this compare to what even the
best teacher could draw on a whiteboard? Teachers will still play an essential
role in the emotional and social development of students, and they can then
focus their energy on those things, which software will probably struggle to
ever replace.

------
romaniv
Good lecture on Markov models:

[https://class.coursera.org/modelthinking-006/lecture/60](https://class.coursera.org/modelthinking-006/lecture/60)

------
dekhn
Back when I was in college (~20 years ago) I was struggling to understand
generative models, and I asked my CS professor.

He said, "Imagine god is sitting around emitting DNA sequences. She has
several 4-sided biased dice. She rolls one of them, BAM, emits an A! Again,
rolls the die, BAM, emits an A! Rolls again, BAM, emits a T! Now, imagine god
is a fickle person, and between rolls, decides to roll a die to decide which
of the biased dice to roll."

For some reason, that helped.

------
saganus
Very nice! I've never had to work with Markov chains but I've read about them
and they seem to pop up in lots of places.

Nice and simple and interactive explanation.

------
jesuslop
Gian-Carlo Rota is always a pleasure to quote, and he knew it. One from his
reminiscences about Jack Schwartz in his book "Indiscrete Thoughts" (TL;DR:
Markov chains seen as random maps):

The first lecture by Jack I listened to was given in the spring of 1954 in a
seminar in functional analysis. A brilliant array of lecturers had been
expounding throughout the spring term on their pet topics. Jack's lecture
dealt with stochastic processes. Probability was still a mysterious subject
cultivated by a few scattered mathematicians, and the expression "Markov
chain" conveyed more than a hint of mystery. Jack started his lecture with the
words, "A Markov chain is a generalization of a function." His perfect
motivation of the Markov property put the audience at ease. Graduate students
and instructors relaxed and followed his every word to the end.

Beautiful visualizations.

------
granttimmerman
I created a Markov chain generator:
[https://gist.github.com/grant/561834963dc526495c45](https://gist.github.com/grant/561834963dc526495c45)

    var numNodes = 10;
    var roundNum = 100;
    var a = [];
    for (var i = 0; i < numNodes; ++i) {
      var connections = [];
      var sum = 0;
      for (var j = 0; j < numNodes; ++j) {
        var randNum = Math.random() / numNodes;
        randNum = Math.round(randNum * roundNum) / roundNum;
        connections[j] = randNum;
        sum += randNum;
      }
      connections = connections.map(function (e) {
        var t = e * (1 / sum);
        t = Math.round(t * roundNum) / roundNum;
        return t;
      });
      sum = connections.reduce(function (e, t) { return e + t; });
      connections[numNodes - 1] += 1 - sum;
      connections[numNodes - 1] = Math.round(connections[numNodes - 1] * roundNum) / roundNum;
      a[i] = connections;
    }
    console.log(JSON.stringify(a));

Copy and paste the output into the side bar.

~~~
TazeTSchnitzel
Please indent each line with a double space so it's properly monospace
formatted.

~~~
chatmasta
Yeah, your comment breaks the readability of this comment thread on my mobile
(as in, everything is horribly stretched horizontally).

------
pimgeek
It really helps. I studied Markov chain theory in graduate school, but either
I was not paying attention or the tutor did not really try to explain things
well; I never truly cared about its real-life applications.

But this tutorial, both visually attractive and explained with real-life
examples, makes me want to re-learn this topic. To quote one passage:

[if you made a Markov chain model of a baby's behavior, you might include
"playing," "eating", "sleeping," and "crying" as states, which together with
other behaviors could form a 'state space': a list of all possible states.]

Thanks for sharing!

------
dnautics
This is really great, but could you put in a bit about how some transition
matrices aren't Markov (e.g. [0 1; 1 0]) and the convergence criterion where
you can take M^n as n -> infinity and get the occupancy of the states?

~~~
aet
Why isn't your example Markov?

~~~
thetwiceler
It is in fact Markov; Markov just means that the probability distribution of
the future depends only on the present, and so the past adds no additional
information in conjunction with the present. That's certainly the case here.

This is an example of a Markov chain that is not aperiodic; what that means is
that, given a starting node, at any fixed point in time in the future there is
some node it is impossible to be at. This ends up meaning that the Markov
chain never reaches a steady state; rather, its behavior is periodic!
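
A few lines make the periodicity visible:

```javascript
// The two-state flip chain [[0,1],[1,0]] is a valid Markov chain, but it's
// periodic: a walker starting in state 0 oscillates forever instead of
// settling to the stationary distribution [0.5, 0.5].
var P = [[0, 1],
         [1, 0]];

function step(v) {
  return [v[0] * P[0][0] + v[1] * P[1][0],
          v[0] * P[0][1] + v[1] * P[1][1]];
}

var v = [1, 0];
for (var n = 1; n <= 4; n++) {
  v = step(v);
  console.log("after step " + n + ":", v); // alternates [0,1], [1,0], ...
}
```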

~~~
dnautics
Fix some n in N. If my starting state is [1; 0], then the probability of the
occupancy being [1; 0] after n cycles is either 1 or 0. If the starting state
is [0; 1], then the probability of the occupancy being [1; 0] is exactly the
opposite, so for a fixed point in the future, the probability is tightly past-
dependent.

~~~
aet
Yes, but I think it is still a Markov process since the information is
contained in its current state.

------
vicapow
I just fixed a bug in the playground; you can now share your Markov chains via
the URL hash. For example:
[http://setosa.io/markov/#%7B%22tm%22%3A%5B%5B0.9%2C0.1%2C0%2...](http://setosa.io/markov/#%7B%22tm%22%3A%5B%5B0.9%2C0.1%2C0%2C0%2C0%2C0%5D%2C%5B0%2C0.9%2C0.1%2C0%2C0%2C0%5D%2C%5B0%2C0%2C0.9%2C0%2C0%2C0.1%5D%2C%5B0%2C0%2C0%2C0.8%2C0.1%2C0.1%5D%2C%5B0%2C0%2C0%2C0%2C0.9%2C0.1%5D%2C%5B0.1%2C0%2C0%2C0%2C0%2C0.9%5D%5D%7D)

~~~
girvo
Hey, I think that link is so long it's breaking HN's layout! Reckon you can
chuck it in a URL shortener or something? :)

~~~
michaelcampbell
Verboten on HN.

------
bdavisx
Great! It would be nice to be able to stop the animations, though; they are
distracting while you are trying to read the text.

The sunny/rainy probability example is perfect as a scenario.

------
nabeelahmed13
This is a bit of a tangent, but I'm a fresh CS undergrad and this simple
explanation really hooked me.

So my question is, where can I find more of this stuff? MOOCs are tough to
manage with university, but if I wanted to learn more about these mathematical
concepts presented in an interesting way, where should I start looking?

I'm a tad bit indecisive about how good I am with CS theory but I know if I
took the leap and mastered some basics I would enjoy it. Any recommendations
will help.

~~~
theoh
Dasher
[http://www.inference.phy.cam.ac.uk/dasher/](http://www.inference.phy.cam.ac.uk/dasher/)
takes some of the Markov chain ideas to the next level. If you are a visual
thinker it is a good way to get a feel for probabilistic compression
techniques, specifically PPM and arithmetic coding.

Also good is the Viterbi Workshop (EDIT: unfortunately the URL below leads to
a broken version of the applet.)
[http://www.alantro.com/viterbi/workshop.htm](http://www.alantro.com/viterbi/workshop.htm)
It used to have animated Java illustrations of the Viterbi algorithm,
mentioned in other comments on this post. There are other working applets out
there, e.g.
[http://www.wirelesscommunication.nl/reference/chaptr05/recei...](http://www.wirelesscommunication.nl/reference/chaptr05/receiver/viterbi/epfl_1/viterbi1.htm)

The general field is called "algorithm animation", but it seems to be out of
fashion. The old SRC Modula-3 distribution used to contain a cool package for
creating animations of algorithms. Here is a video made by the authors:
[http://www.youtube.com/watch?v=zIgu9q0vVc0](http://www.youtube.com/watch?v=zIgu9q0vVc0)

------
lewis500
Man the text is really well written! Am I right, everyone?

~~~
jaekwon
Lewis, is that you?

I wanted to tell Lewis that his domain (heylewis) is expiring.

Also, yes. Keep it up!

------
CharlesMerriam1
Very nice concept; an MVP.

Just on first glance:

1. First diagram and others: the ball jumps from the beginning of the B-to-A
arc to B without sliding along the arc.

2. Second diagram: there is no P(B|B) box; that is, the boxes are mislabeled.

3. Strangely, arcs are sometimes at an angle. It appears to happen if they are
scrolled to, but not if drawn on the initial screen.

4. While the R/S on the next diagram does settle to a steady state, it starts
with random Rs and Ss marching across at random rates.

Good start!

------
kevinwang
That's pretty cool. The Markov chain diagrams seem very similar (identical?)
to deterministic finite automata. Would it be correct or incorrect to say that
a Markov chain can be thought of as a DFA where the changes in state are
determined by probability?

~~~
kachnuv_ocasek
Yes, that's exactly what it is. There's nothing more to it.

------
vitd
Great explanation, but wow, those animations were awful! The movement was
great, but the seizure-like jump every time it hit a state was unwatchable. I
had to cover them up to get through the text.

------
lobotryas
Has anyone thought about or attempted to model game AI with Markov Chains
instead of decision trees? Ex: NPCs, wildlife or enemies that use Markov
Chains to react to their surroundings.

~~~
un_publishable
Not sure about individual-scale models, but Markov Chains are common in
Mathematical Ecology. You can do quite a lot in animal group modeling with
transition matrices and basic probability theory. This is a good book if you
can get it cheap (like most monograph textbooks):
[http://www.amazon.com/Introduction-Stochastic-Processes-Biol...](http://www.amazon.com/Introduction-Stochastic-Processes-Biology-Applications/dp/0130352187)

Also, props to submission author Victor Powell. Those force-directed graphs
are visually interesting, add to the article, and are even responsive (try
changing the window size).

------
granttimmerman
Markov chain circle generator:
[https://gist.github.com/grant/29afaa91349b1928b05e](https://gist.github.com/grant/29afaa91349b1928b05e)

------
_nullandnull_
Beautiful. What did you use to create the graphics?

~~~
bunkat
I'd be interested in this as well. Is it D3?

~~~
tdaltonc
From browsing the source, it looks like the graphics are rendered in D3 and
the sliders are connected to the graphics using Angular.

------
matthewcanty
I'm outta control on this page!

[ [0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],
[0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],
[0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],
[0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],
[0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],
[0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],
[0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],
[0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],
[0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],
[0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1] ]

------
tigroferoce
Great explanation! Very easy. It could be perfect if you added some easy-to-
understand real-life examples to play with.

------
Max_Horstmann
Very cool. Would love to see this generalized to an interactive visualization
of Markov Decision Processes (MDPs).

------
elwell
And where are the Temple OS creator's comments? I believe they are Markov
chains of a sort.

------
skriticos2
This totally reminds me of SpaceChem on higher levels (puzzle game for
programmers).

------
lnkmails
I would not "require" people to know Markov chains but I am usually surprised
how many programmers have no idea what it is and how it works and how it can
be used. It is a very powerful tool to model queues which is something most
distributed systems deal with :).

------
_raoulcousins
Can I use this for my class? Creative commons with attribution?

~~~
vicapow
To answer your first question, yes.

------
chenshu_ivory
Looks so cool! Still hard to understand, though.

------
lynchdt
This is really cool, nice work.

------
suchetchachra
Excellent visualization!

------
dolom
Really cool: well done!

------
mrcactu5
can I fork these?

------
richcuteguy34
Nice

Here's a model of Chutes and Ladders using Markov chains:
[http://www.datagenetics.com/blog/november12011/index.html](http://www.datagenetics.com/blog/november12011/index.html)

And another for Candyland
[http://www.datagenetics.com/blog/december12011/index.html](http://www.datagenetics.com/blog/december12011/index.html)

~~~
moron4hire
Nice. These articles are great. Very much appreciated.

------
hyperliner
_" For example, the algorithm Google uses to determine the order of search
results, called PageRank, is a type of Markov chain."_

I had to research that to understand it:
[http://en.wikipedia.org/wiki/PageRank](http://en.wikipedia.org/wiki/PageRank)

Here is some key text from Wikipedia:

Google recalculates PageRank scores each time it crawls the Web and rebuilds
its index. As Google increases the number of documents in its collection, the
initial approximation of PageRank decreases for all documents.

The formula uses a model of a random surfer who gets bored after several
clicks and switches to a random page. The PageRank value of a page reflects
the chance that the random surfer will land on that page by clicking on a
link. It can be understood as a Markov chain in which the states are pages,
and the transitions, which are all equally probable, are the links between
pages.

If a page has no links to other pages, it becomes a sink and therefore
terminates the random surfing process. If the random surfer arrives at a sink
page, it picks another URL at random and continues surfing again.
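
The random-surfer description maps directly onto power iteration over a toy link graph (the link structure below is invented; damping factor 0.85 is the commonly cited value):

```javascript
// PageRank sketch: the rank vector is the stationary distribution of a
// Markov chain over pages, with a damping factor for the "bored surfer"
// random jump. links[i] lists the pages that page i links to.
function pagerank(links, damping, iterations) {
  var n = links.length;
  var rank = [];
  for (var i = 0; i < n; i++) { rank[i] = 1 / n; }
  for (var it = 0; it < iterations; it++) {
    var next = [];
    for (var j = 0; j < n; j++) { next[j] = (1 - damping) / n; }
    for (var i = 0; i < n; i++) {
      var out = links[i];
      if (out.length === 0) {
        // Sink page: the surfer jumps to a random page.
        for (var j = 0; j < n; j++) { next[j] += damping * rank[i] / n; }
      } else {
        out.forEach(function (t) { next[t] += damping * rank[i] / out.length; });
      }
    }
    rank = next;
  }
  return rank;
}

// Toy web: page 0 links to 1 and 2; pages 1 and 2 link back to 0.
var ranks = pagerank([[1, 2], [0], [0]], 0.85, 50);
console.log(ranks); // page 0 accumulates the most rank
```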

~~~
mqsiuser
That's incredibly easy (it is!)

I can't understand why Google owns it.

Especially today ("~15 years later") and with a lot of good open source
software available.

~~~
angersock
[http://queue.acm.org/detail.cfm?id=988407](http://queue.acm.org/detail.cfm?id=988407)

------
Pinn2
The problem with Markov chains is that they are named after a person, which
makes math seem more like a "private club". For instance, why use "abelian
group", when "commutative group" will do? The reasons for wanting to be a
member of an exclusive group are psychological.

