
Ask HN: Best place to start learning about Markov Chains? - chrisherd
A progressive reading list or process to follow would be awesome
======
dcwca
Just pick a random place to start, read some stuff, and then take a guess as
to which direction to go in next, based on what's probably a good next thing
to read. Then keep repeating the process over and over again.

~~~
Scarblac
It's also important that you base your guess of what's probably good to read
next only on the previous thing you read. Forget everything that came before
that.

~~~
yonkshi
My friend Gibbs invented this really efficient way to learn.

------
gtycomb
So many there are. Starting with basic Probability, this lecture series is a
good first intro.

[https://www.dartmouth.edu/~chance/teaching_aids/books_articl...](https://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter11.pdf)

Or, starting from the basics and learning how to actually do the number
crunching, this one is unusually good (Stewart, _Introduction to the Numerical
Solution of Markov Chains_):

[https://press.princeton.edu/titles/5640.html](https://press.princeton.edu/titles/5640.html)

Robert Gallager's MIT lecture series, very well presented, titled Principles
of Digital Communications, takes you on another train based on Markov Chains
(Kalman filters, etc).

[https://ocw.mit.edu/courses/electrical-engineering-and-compu...](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-450-principles-of-digital-communications-i-fall-2006/)

------
activatedgeek
Markov chains are simple in essence. Instead of branching out and reading all
the theory, I'd recommend doing it on an as-needed basis. Learn as you go: pick
up a problem and move ahead. I don't think it's fruitful to learn everything
about Markov chains just for the sake of it.

Markov Chain Monte Carlo to sample from probability distributions is a good
start - [https://arxiv.org/abs/1206.1901](https://arxiv.org/abs/1206.1901) if
you are into sampling.

~~~
AlexCoventry
Betancourt's survey is at least as good, and more up to date.

[https://arxiv.org/pdf/1701.02434.pdf](https://arxiv.org/pdf/1701.02434.pdf)

~~~
activatedgeek
That's a great reference too for the geometric intuitions!

------
thedevindevops
Tough one, I'd have to say:

45% [http://setosa.io/ev/markov-chains/](http://setosa.io/ev/markov-chains/)

30%
[https://en.wikipedia.org/wiki/Markov_chain](https://en.wikipedia.org/wiki/Markov_chain)

25% Youtube

~~~
snakeboy
The Wikipedia page for Markov chains is really one of the best Wikipedia pages
I've ever seen for a technical topic.

Covers a ton of ground, and gives concrete examples to motivate the ideas.

------
usgroup
1\. Elementary probability theory.

2\. Poisson processes.

3\. The Markov property.

4\. Stochastic processes.

5\. Realise that you’re missing a background in analysis, therefore you don’t
know sh?t about measure theory, but you actually need it to know anything
deeper. Wonder to yourself if you really want to spend the next 3 years
getting a maths background you don’t have.

6\. Convince yourself that it’s all just engineering and muddle through by
picking a project involving a non-trivial Markov chain.

7\. Go back and spend 3 years doing foundational maths, then repeat points 1-5.

~~~
larrydag
While I agree with the progression of knowledge listed here, I don't think it
requires 3 years of foundational math. If you already have a basic
understanding of math, you should be able to pick up the theory fairly well
with a couple of months of research and application.

~~~
usgroup
I think when you get out of the basic linear algebra and calculus prerequisite
and into the analysis and measure theory prerequisite nothing takes a few
months anymore :)

------
YorkshireSeason
If you are not already intimately familiar with them, learn about FSAs (=
finite state automata), aka FSMs (finite state machines).

Most interesting facts about Markov chains (e.g. the _Stationary Distribution
Theorem_) really are probabilistic generalisations of simpler facts about
FSAs (e.g. FSAs cannot be used to "count"). In my experience, understanding
them first for FSAs and then seeing how they generalise to the probabilistic
case is a good way of approaching this subject.
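To make the generalisation concrete, here is a hypothetical sketch (the state
names and probabilities are invented for illustration): a DFA maps each
(state, symbol) pair to exactly one successor, while a Markov chain replaces
the single successor with a distribution over successors.

```python
# Hypothetical example: a DFA tracking the parity of '1's in a bit string.
# Each (state, symbol) pair maps to exactly ONE next state.
dfa = {("even", "0"): "even", ("even", "1"): "odd",
       ("odd", "0"): "odd", ("odd", "1"): "even"}

def run_dfa(start, symbols):
    """Deterministic run: the next state is a function of (state, symbol)."""
    state = start
    for s in symbols:
        state = dfa[(state, s)]
    return state

# The Markov-chain generalisation: each state now maps to a *distribution*
# over next states instead of a single successor (numbers invented).
chain = {"even": {"even": 0.5, "odd": 0.5},
         "odd":  {"even": 0.9, "odd": 0.1}}

print(run_dfa("even", "101"))  # "even": two 1s flip the parity twice
```

Many FSA facts then carry over by replacing "the next state" with "the
distribution over next states".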

------
Vaslo
Here is an excellent place to start:

[http://setosa.io/ev/markov-chains/](http://setosa.io/ev/markov-chains/)

~~~
DevX101
Highly recommended. My preferred way to learn is to grasp an intuitive
understanding before diving deep into theory, and this visual explainer is a
great first step.

------
notinventedhear
For a broad introduction to Bayesian analysis, MCMC, and PyMC, I'd suggest
Bayesian Methods for Hackers [1].

[1] [http://camdavidsonpilon.github.io/Probabilistic-Programming-...](http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/)

------
localhostdotdev
markov chains are very simple at their core (e.g. simple version could be:
take the probability of the next word given the known probabilities of words
that follow the previous word)

it can be implemented in a few lines of code, that's the beauty of it:
[https://github.com/justindomingue/markov_chains/blob/master/...](https://github.com/justindomingue/markov_chains/blob/master/lib/markov_chains/dictionary.rb)

obviously then you could take the previous n words into account, tweak the
starting word, add randomness, etc.

now replace "word" with "state" and attach "probability(next state | previous
state)" to the edges of a graph:
[https://static1.squarespace.com/static/54e50c15e4b058fc6806d...](https://static1.squarespace.com/static/54e50c15e4b058fc6806d068/t/5650d16ee4b033f56d20ae6b/1459882428797/markov+chain+graph+all.png?format=1500w)

and you got a generic markov chain :)

footnotes: p(A | B) is probability of A given B, e.g. p(rain | clouds) >
p(rain | sun) :)
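The few-lines-of-code claim is easy to check. Here's a hypothetical Python
equivalent of the idea (the toy corpus and function names are my own, not from
the linked Ruby code):

```python
import random
from collections import defaultdict

def build_chain(text):
    """For each word, record every word observed to follow it.

    Keeping repeats means a uniform draw from the list reproduces the
    empirical next-word probabilities.
    """
    words = text.split()
    chain = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        chain[prev].append(nxt)
    return chain

def generate(chain, start, length=8):
    """Walk the chain: sample the next word given only the current one."""
    out = [start]
    while len(out) < length and chain[out[-1]]:
        out.append(random.choice(chain[out[-1]]))
    return out

corpus = "the cat sat on the mat the cat ate the fish"
chain = build_chain(corpus)
print(" ".join(generate(chain, "the")))
```

An order-n chain is the same idea with tuples of the last n words as keys.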

------
crshults
I thought this recent post: 'Generating More of My Favorite Aphex Twin
Track'[1] had a good beginner-level write up on Markov Chains.
[1] [https://news.ycombinator.com/item?id=19490832](https://news.ycombinator.com/item?id=19490832)

------
nrjames
What I would do is use the Markovify python library and feed it with several
texts from Project Gutenberg... try to generate some Lovecraftian prose or
something...

[https://github.com/jsvine/markovify](https://github.com/jsvine/markovify)

------
YeGoblynQueenne
Personally, I started with Eugene Charniak's Statistical Language Learning [1]
then continued with Manning and Schütze's Foundations of Statistical Natural
Language Processing [2] and Speech and Language Processing by Jurafsky and
Martin [3].

The Charniak book is primarily about HMMs and quite short, so it's the best
introduction to the subject. Manning and Schütze and Jurafsky and Martin are
much more extensive and cover pretty much all of statistical NLP up to their
publication date (so no LSTMs if I remember correctly) but they are required
reading for an in-depth approach.

You will definitely want to go beyond HMMs at some point, so you will probably
want the other two books. But, if you really just want to know about HMMs,
then start with the Charniak.

______________

[1] [https://mitpress.mit.edu/books/statistical-language-learning](https://mitpress.mit.edu/books/statistical-language-learning)

[2] [https://nlp.stanford.edu/fsnlp/](https://nlp.stanford.edu/fsnlp/)

[3]
[https://web.stanford.edu/~jurafsky/slp3/](https://web.stanford.edu/~jurafsky/slp3/)

------
evmar
For hidden Markov models (which you should only look into after you get the
basics), I recall that this widely-cited paper (perhaps the original?) is
pretty readable. From the title it looks like it's about speech, but ignore
the speech parts and read the math:

[https://www.robots.ox.ac.uk/~vgg/rg/papers/hmm.pdf](https://www.robots.ox.ac.uk/~vgg/rg/papers/hmm.pdf)

------
danaugrs
I really like this short, relaxed video: "Information Theory part 10: What is
a Markov chain?" by Art of the Problem
[https://www.youtube.com/watch?v=o-jdJxXL_W4](https://www.youtube.com/watch?v=o-jdJxXL_W4)

If you like it I recommend watching the whole series.

------
jotaf
These are my favorite lecture notes, they assume almost no a-priori knowledge
(with an awesome review of basic probabilities) and yet they don't shy away
from explaining all the rigorous math.

If you have time to read step-by-step derivations and want to understand the
fundamentals, I think this is an excellent self-contained resource.

[https://ermongroup.github.io/cs228-notes/](https://ermongroup.github.io/cs228-notes/)

~~~
usgroup
“No prior knowledge” and “explaining all the rigorous maths” are mutually
exclusive, in my opinion. I stress this as honest advice to anyone reading.

Rigorous maths is akin to trying to explain to your non technical friends what
you do in devops: colloquialise it all you want, it’ll always be a shallow
story.

------
twiecki
If you are looking for an explanation of MCMC that focuses on intuitive
understanding to complement more mathematical introductions, I wrote a blog
post trying to explain things in simple terms here:
[https://twiecki.io/blog/2015/11/10/mcmc-sampling/](https://twiecki.io/blog/2015/11/10/mcmc-sampling/)

------
ivan_ah
If you're interested in a basic math intro (starting from linear algebra
concepts), check out Section 8.2 in this excerpt from the book "No Bullshit
guide to Linear Algebra":
[https://minireference.com/static/excerpts/probability_chapte...](https://minireference.com/static/excerpts/probability_chapter.pdf#page=12)
This excerpt contains some exercises (with answers in the back) as well as an
example application (PageRank).

Technically Linear Algebra is not "required" to understand Markov Chains, but
it's a very neat way to think about them: each "step" in the chain is
equivalent to multiplication of the state vector by the transition matrix.
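As a quick illustration of that matrix view (the two-state chain and its
numbers below are invented for the example), iterating the multiplication also
shows the chain settling into its stationary distribution:

```python
def step(v, P):
    """One Markov step: right-multiply the row state-vector v by matrix P."""
    n = len(P)
    return [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical 2-state transition matrix (rows are "from", columns are "to").
P = [[0.9, 0.1],
     [0.5, 0.5]]

v = [1.0, 0.0]          # start with all probability mass in state 0
for _ in range(100):
    v = step(v, P)

print(v)  # converges to the stationary distribution [5/6, 1/6]
```

You can check the fixed point by hand: [5/6, 1/6] times P gives
[5/6·0.9 + 1/6·0.5, 5/6·0.1 + 1/6·0.5] = [5/6, 1/6] again.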

------
maurits
My personal favorite introduction to MC(MC) is lecture 1 of "Statistical
Mechanics: Algorithms and Computations" [1].

[1]: [https://www.coursera.org/learn/statistical-mechanics](https://www.coursera.org/learn/statistical-mechanics)

------
melling
I’ve got a couple of links here:

[https://github.com/melling/MathAndScienceNotes/tree/master/s...](https://github.com/melling/MathAndScienceNotes/tree/master/statistics)

------
jerednel
I learned quite a bit by exploring attribution modeling with them. There is an
R package where you can just faceroll a model without really understanding
anything, so I tried recreating it in Python:
[https://github.com/jerednel/markov-chain-attribution](https://github.com/jerednel/markov-chain-attribution) \- it's
messy for sure, but it was a learning exercise and it helped me understand the
concept quite a bit. It currently only supports the simplest use case, a
first-order Markov chain.

------
jamesb93
Make one with a direct application. I made one to model melody from Bach in a
stupid way. It was made in Max, so I can't quantify the size of the code in any
meaningful way, but it's basically just a text file with an index and a number
of possibilities related to that index.

[https://soundcloud.com/jamesbradbury/9th-order-markov-chain-...](https://soundcloud.com/jamesbradbury/9th-order-markov-chain-of-bach)

------
sublimino
Markov Chains can be quite amusing when applied to a corpus of similar texts,
and often stunningly human-like. I maintain a list of humourous applications:
[https://github.com/sublimino/awesome-funny-markov](https://github.com/sublimino/awesome-funny-markov)

Some favourites:

\- Erowid trip reports and tech recruiter emails -
[https://twitter.com/erowidrecruiter](https://twitter.com/erowidrecruiter)

\- Calvin and Markov - Calvin and Hobbes strips reimagined
[http://joshmillard.com/markov/calvin/](http://joshmillard.com/markov/calvin/)

\- Generate your future tweets based on the DNA of your existing messages -
[http://yes.thatcan.be/my/next/tweet/](http://yes.thatcan.be/my/next/tweet/)

\- Fake headlines created by smashing up real headlines -
[https://www.headlinesmasher.com/best/all](https://www.headlinesmasher.com/best/all)

\- The most confusing subreddit (often on the front page) -
[https://www.reddit.com/r/subredditsimulator](https://www.reddit.com/r/subredditsimulator)

The original Markov-generated content prank: "I Spent an Interesting Evening
Recently with a Grain of Salt"
[https://web.archive.org/web/20011101013348/http://www.sincit...](https://web.archive.org/web/20011101013348/http://www.sincity.com/penn-n-teller/pcc/shaney.html)

And of course (un-amusingly!) - Google's PageRank algorithm is built on Markov
Chains
[https://en.wikipedia.org/wiki/PageRank#Damping_factor](https://en.wikipedia.org/wiki/PageRank#Damping_factor)

n.b. there used to be parodies of Hacker News, but both are down:
[https://news.ycombniator.com/](https://news.ycombniator.com/) and
[https://lou.wtf/phaker-news](https://lou.wtf/phaker-news)

------
maxmouchet
For an introduction to discrete and continuous-time Markov chains, as well as
an application to queuing theory, you can check the MOOC "Queuing Theory: from
Markov Chains to Multi-Server Systems" on edX [1].

[1] [https://www.classcentral.com/course/edx-queuing-theory-from-...](https://www.classcentral.com/course/edx-queuing-theory-from-markov-chains-to-multi-server-systems-10079)

------
thepill
[http://setosa.io/ev/markov-chains/](http://setosa.io/ev/markov-chains/)

------
DanBC
Not sure it's introductory, but A Mathematical Theory of Communication, page 5
onwards, is useful:
[http://www.math.harvard.edu/~ctm/home/text/others/shannon/en...](http://www.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf)

------
segmondy
The Wikipedia page is actually good, and it's how I learned about it.
[https://en.wikipedia.org/wiki/Markov_chain](https://en.wikipedia.org/wiki/Markov_chain)
Follow through with some random googling, read, then implement it. It's really
simple for something that sounds so fancy. :)

------
mindcrime
David Silver's course on Reinforcement Learning contains some good information
on Markov processes. See Lecture #2 in particular.

[https://www.youtube.com/playlist?list=PL7-jPKtc4r78-wCZcQn5I...](https://www.youtube.com/playlist?list=PL7-jPKtc4r78-wCZcQn5IqyuWhBZ8fOxT)

------
platz
\-- Markov Decision Processes

there is a lot of info out there about markov chains, but very little about
markov decision processes (MDPs).

How popular are MDPs? What are their strengths? Weaknesses?

\-- Kalman Filters vs HMM (Hidden Markov Model):

"In both models, there's an unobserved state that changes over time according
to relatively simple rules, and you get indirect information about that state
every so often. In Kalman filters, you assume the unobserved state is
Gaussian-ish and it moves continuously according to linear-ish dynamics
(depending on which flavor of Kalman filter is being used). In HMMs, you
assume the hidden state is one of a few classes, and the movement among these
states uses a discrete Markov chain. In my experience, the algorithms are
often pretty different for these two cases, but the underlying idea is very
similar." \- THISISDAVE

\-- HMM vs LSTM/RNN:

"Some state-of-the-art industrial speech recognition [0] is transitioning from
HMM-DNN systems to "CTC" (connectionist temporal classification), i.e.,
basically LSTMs. Kaldi is working on "nnet3" which moves to CTC, as well.
Speech was one of the places where HMMs were _huge_, so that's kind of a big
deal." -PRACCU

"HMMs are only a small subset of generative models that offers quite little
expressiveness in exchange for efficient learning and inference." \- NEXTOS

"IMO, anything that be done with an HMM can now be done with an RNN. The only
advantage that an HMM might have is that training it might be faster using
cheaper computational resources. But if you have the $$$ to get yourself a GPU
or two, this computational advantage disappears for HMMs." \- SHERJILOZAIR
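The "unobserved state plus indirect observations" setup described above can be
made concrete with the forward algorithm, which computes the likelihood of an
observation sequence under an HMM. The two-state weather model and all its
probabilities below are invented for illustration:

```python
# Hypothetical 2-state HMM: hidden weather, observed umbrella use.
states = ["rainy", "sunny"]
init = [0.5, 0.5]          # P(first hidden state)
trans = [[0.7, 0.3],       # P(next state | rainy)
         [0.4, 0.6]]       # P(next state | sunny)
emit = [[0.9, 0.1],        # P(umbrella, no-umbrella | rainy)
        [0.2, 0.8]]        # P(umbrella, no-umbrella | sunny)

def forward(obs):
    """Likelihood of an observation sequence (0 = umbrella, 1 = no umbrella).

    alpha[j] holds P(observations so far, hidden state = j); each step
    propagates it through the Markov chain, then weights by the emission.
    """
    n = len(states)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)

print(forward([0, 0]))  # P(umbrella seen on both days) = 0.3585
```

The hidden dynamics are exactly a discrete Markov chain; only the emission
step distinguishes this from the plain chains discussed elsewhere in the
thread.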

------
micheda
The hmm_filter project implements Viterbi-inspired algorithms and transition
matrices in Python, might be also a useful learning resource:
[https://github.com/minodes/hmm_filter](https://github.com/minodes/hmm_filter)

------
orasis
The most important thing is to realize just how damn simple they are. As you
get mired in the literature everything will seem overwhelmingly complex. Just
grok the very very basic idea of them and it will come easier.

Also, they’re just a convenient model (for some problems), not a holy truth.

------
AlexCoventry
You could try Gelman et al.'s _Bayesian Data Analysis_. It has a good overview
of MCMC.

If you want an overview of Markov chains as statistical models in their own
right, Durbin et al.'s _Biological Sequence Analysis_ is a well-motivated
overview.

------
ggggtez
There isn't really very much to learn. Just start on wikipedia, and expand out
if you think there is something more. Markov Chains are very simple in
practice.

------
i_am_proteus
If the "motivation-theorem-proof" style appeals to you, find a copy of _Finite
Markov Chains_ by Kemeny and Snell. ISBN 0442043287

------
ackbar03
How about a textbook, maybe? There aren't always easy alternatives out there;
sometimes you have to bite the bullet and do the work.

------
tnecniv
Do you have an application in mind to help guide suggestions?

As others have said, if you don't know probability, start there.

------
currymj
You can find a copy of “Markov Chains and Mixing Times” online, which is good
and relatively accessible.

------
graycat
E. Cinlar, _Introduction to Stochastic Processes_

Covers limit theorems and continuous time.

