Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: Best place to start learning about Markov Chains?
237 points by chrisherd 12 days ago | hide | past | web | favorite | 68 comments
A progressive reading list or process to follow would be awesome

Just pick a random place to start, read some stuff, and then take a guess as to which direction to go in next, based on what's probably a good next thing to read. Then keep repeating the process over and over again.

It's also important that you base your guess of what's probably good to read next only on the previous thing you read. Forget everything that came before that.

My friend Gibbs invented this really efficient way to learn.

If you don't feel ready to move on to something new, you could always read the same thing again.

Brilliant joke!

Are you describing Markov chains or how to learn about Markov chains?

More specifically Markov chain with monte carlo method (MCMC)

Did something like that: An organization with some boats, quite a lot of boats, some that might be involved in global nuclear war, maybe limited to sea, wanted to know how long some of the boats might survive. The ocean had Red and Blue boats and airplanes, and the Reds and Blues were looking for each other and trying to kill each other.

So, the state of the system was the remaining Red/Blue inventories.

Some work by Koopmans showed that the encounter rates were a Poisson process. So, the time to the next encounter had exponential distribution, depending on the current state.

At an encounter, depending on the types, could have the Red die, the Blue die, both die, or neither die. Then after the encounter, the state of the system changed. So, the state of the system was a continuous time, discrete state space Markov process subordinated to a Poisson process. That is, in part, a Markov chain.

Yes, there is a closed form solution, but the combinatorial explosion of the discrete state space size meant that a direct attack via the closed form solution was not reasonable.

But it was easy enough to do Monte-Carlo, that is, generate a few hundred sample paths and average those, get confidence intervals, etc. While in grad school working on operations research I did that. While the state space was enormous, the Monte-Carlo was really fast. On any computer of today, the code would run before could get finger off the mouse button or the Enter key. And running off 1 million sample paths would be feasible. For the random numbers I looked in Knuth's appropriate volume of The Art ... and used

X(n + 1) = X(n) * 5^15 + 1 mod 2^47

programmed in assembler.

Work passed review by famous applied probabilist J. Keilson.

Apparently the work was sold to some intelligence agency. I could guess which one, but then I'd have to ...!

I think you need to create a truly immersive experience to truly learn them.

It's a meta joke

It's hilarious because it's also a "semi"-decent method on how to learn knew topics in general.

And goes well with the article[1] we discussed yesterday[2] !

[1] https://billwadge.wordpress.com/2016/01/08/b-before-a/

[2] https://news.ycombinator.com/item?id=19608883

If there's a finite amount of literature on the subject, this advice will send the OP in circles with probability one.

You can then model the probability that you'll end up at any one given place after n steps as a markov chain.

Funniest thing I have read this week...

Well said ;)


So many there are. Starting with basic Probability, this lecture series is a good first intro.


Or starting from the basics, and learning how to actually do the number crunching, this is unusually good (Stewart, Introduction to numerical solution of Markov Chains):


Robert Gallager's MIT lecture series, very well presented, titled Principles of Digital Communications, takes you on another train based on Markov Chains (Kalman filters, etc).


Markov chains in essence are simple. Instead of diverging and reading all the theory, I'd recommend do it on a need basis. Learn as you go. So pick up a problem and move ahead. I don't think it is fruitful to just learn everything about Markov Chains just for the sake of it.

Markov Chain Monte Carlo to sample from probability distributions is a good start - https://arxiv.org/abs/1206.1901 if you are into sampling.

Betancourt's survey is at least as good, and more up to date.


That's a great reference too for the geometric intuitions!

The wikipedia page for Markov chains is really one of the best wikipedia pages I've ever seen for a technical topic.

Covers a ton of ground, and gives concrete examples to motivate the ideas.

1. Elementary probability theory.

2. Poisson processes.

3. The Markov property.

4. Stochastic processes.

5. Realise that you’re missing a background in analysis, therefore you don’t know sh?t about measure theory but you actually need it to know anything deeper . Wonder to yourself if you really want to spend the next 3 years getting a maths background you don’t have.

6. Convince yourself that it’s all just engineering and middle through by picking a project involving non trivial markov chain.

7. Go back and spend 3 years doing foundational maths then repeat point 1-5.

While I agree with the progression of knowledge listed here I don't think it requires 3 years of foundation to math. If you have a basic understanding of math already you should be able to pick up the theory fairly well in a couple of months of research and application.

I think when you get out of the basic linear algebra and calculus prerequisite and into the analysis and measure theory prerequisite nothing takes a few months anymore :)

You don't need much math to pick up the very basic theory, but after a certain point you're going to hit a hard wall unless you have a strong background in analysis.

Poisson processes are continuous time though. If you're interested in Markov chains you only need the discrete-time theory.

In discrete time and discrete space, it mostly just reduces to linear algebra.

No, can do continuous time discrete state space theory -- the jumps in the discrete state space are at the arrival times of the Poisson process -- that works out easily enough, especially if using Monte-Carlo. See my other post here on Red/Blue stuff.

Unless you’re interested in continuous markov chains, infinite state spaces, renewal theory, excessive functions, and so on.

That whole math sequence was part of my MBA program that culminated in Markov chains for synthetic options pricing after like, 9 months. And this is for business school students; not engineers :)

Sure, and for any given application it’ll be possible to explain Markov chains as they apply to it. I recently did a financial valuation course where we did an “intuitive derivation of Itos formula” so that we could skip the measure theory prerequisites. We also skipped talking about Reimann integrals and just accepted that sums are integrals at a limit ... we also glossed the separating hyperplane theorem so that we could say “no arb iff risk neutral measure exists”, and so on.

However, if you actually want a background in the theory of Markov chains, I don’t think this approach works.

Just work with discrete state spaces and otherwise be less concerned with measure theory. E.g., in stochastic control problems, don't sweat measurable selection!

Can you recommend textbook on these topic?

If you are not already intimately familiar with them learn about FSA (= finite state automata), aka FSM (finite state machines).

Most interesting facts about Markov chains (e.g. the Stationary Distribution Theorem) really are probabilistic generalisations of simpler facts about FSAs (e.g. FSAs cannot be used to "count"). In my experience, understanding them first for FSAs and then seeing how they generalise for the probabilitic case is a good way of approaching this subject.

Here is an excellent place to start:


Highly recommended. Preferred way to learn is to grasp an intuitive understanding before diving deep into theory. This visual explainer is great first step.

markov chains are very simple at their core (e.g. simple version could be: take the probability of the next word given the known probabilities of words that follow the previous word)

it can be implemented in a few lines of code, that's the beauty of it: https://github.com/justindomingue/markov_chains/blob/master/...

obviously then you could take the previous n words into account, tweak the starting word, add randomness, etc.

now replace "word" with "state" and "probability(next state | previous state)" to edges of a graph: https://static1.squarespace.com/static/54e50c15e4b058fc6806d...

and you got a generic markov chain :)

footnotes: p(A | B) is probability of A given B, e.g. p(rain | clouds) > p(rain | sun) :)

For a broad introduction to Bayesian analysis, MCMC and PyMC I'd suggest Bayesian Methods for Hackers[1]

[1] http://camdavidsonpilon.github.io/Probabilistic-Programming-...

I thought this recent post: 'Generating More of My Favorite Aphex Twin Track'[1] had a good beginner-level write up on Markov Chains. [1]https://news.ycombinator.com/item?id=19490832

What I would do is use the Markovify python library and feed it with several texts from Project Gutenberg... try to generate some Lovecraftian prose or something...


For hidden Markov models (which only look into after you get the basics), I recall that this widely-cited paper (perhaps the original?) is pretty readable. From the title it looks like it's about speech but ignore the speech parts and read the math:


Personally, I started with Eugene Charniak's Statistical Language Learning [1] then continued with Manning and Schütze's Foundations of Statistical Natural Language Processing [2] and Speech and Language Processing by Jurafsky and Martin [3].

The Charniak book is primarily about HMMs and quite short, so it's the best introduction to the subject. Manning and Schütze and Jurafsky and Martin are much more extensive and cover pretty much all of statistical NLP up to their publication date (so no LSTMs if I remember correctly) but they are required reading for an in-depth approach.

You will definitely want to go beyond HMMs at some point, so you will probably want the other two books. But, if you really just want to know about HMMs, then start with the Charniak.


[1] https://mitpress.mit.edu/books/statistical-language-learning

[2] https://nlp.stanford.edu/fsnlp/

[3] https://web.stanford.edu/~jurafsky/slp3/

I really like this short, relaxed video: "Information Theory part 10: What is a Markov chain?" by Art of the Problem https://www.youtube.com/watch?v=o-jdJxXL_W4

If you like it I recommend watching the whole series.

These are my favorite lecture notes, they assume almost no a-priori knowledge (with an awesome review of basic probabilities) and yet they don't shy away from explaining all the rigorous math.

If you have time to read step-by-step derivations and want to understand the fundamentals, I think this is an excellent self-contained resource.


“No prior knowledge” and “explain all the rigorous maths” are mutually exclusive in my opinion. I stress this as honest advise to anyone reading.

Rigorous maths is akin to trying to explain to your non technical friends what you do in devops: colloquialise it all you want, it’ll always be a shallow story.

If you are looking for an explanation of MCMC that focuses on intuitive understanding to complement more mathematical introductions, I wrote a blog post trying to explain things in simple terms here: https://twiecki.io/blog/2015/11/10/mcmc-sampling/

If you're interested in a basic math intro (starting from linear algebra concepts), check out Section 8.2 in this excerpt from the book "No Bullshit guide to Linear Algebra": https://minireference.com/static/excerpts/probability_chapte... This excerpt contains some exercises (with answers in the back) as well an examples application (PageRank).

Technically Linear Algebra is not "required" to understand Markov Chains, but it's a very neat way to think about them: each "step" in the chain is equivalent to multiplication of the state vector by the transition matrix.

My personal favorite introduction to MC(MC) is lecture 1 of statistical mechanics and computations [1]

[1]: https://www.coursera.org/learn/statistical-mechanics

I learned quite a bit by exploring attribution modeling with them. There is an R package where you can just faceroll a model without really understanding anything so I tried recreating it in Python https://github.com/jerednel/markov-chain-attribution - its messy for sure but it is a learning exercise and it helped me understand the concept quite a bit. That currently only supports the simplest use case of a first order markov chain.

Make one with a direct application. I did one to model melody from Bach in a stupid way. It was made in Max, so I can't provide the size of the code in any meaningful way, but its basically just a text file with an index and a number of possibilities related to that index.


Markov Chains can be quite amusing when applied to a corpus of similar texts, and often stunningly human-like. I maintain a list of humourous applications: https://github.com/sublimino/awesome-funny-markov

Some favourites:

- Erowid trip reports and tech recruiter emails - https://twitter.com/erowidrecruiter

- Calvin and Markov - Calvin and Hobbes strips reimagined http://joshmillard.com/markov/calvin/

- Generate your future tweets based on the DNA of your existing messages - http://yes.thatcan.be/my/next/tweet/

- Fake headlines created by smashing up real headlines - https://www.headlinesmasher.com/best/all

- The most confusing subreddit (often on the front page) - https://www.reddit.com/r/subredditsimulator

The original Markov-generated content prank: "I Spent an Interesting Evening Recently with a Grain of Salt" https://web.archive.org/web/20011101013348/http://www.sincit...

And of course (un-amusingly!) - Google's PageRank algorithm is built on Markov Chains https://en.wikipedia.org/wiki/PageRank#Damping_factor

n.b. there used to be parodies of Hacker News, but both are down: https://news.ycombniator.com/ and https://lou.wtf/phaker-news

For an introduction to discrete and continuous-time Markov chains, as well as an application to queuing theory, you can check the MOOC "Queuing Theory: from Markov Chains to Multi-Server Systems" on edX [1].

[1] https://www.classcentral.com/course/edx-queuing-theory-from-...

Not sure it's introductory, but A Mathematical Theory of Communication, page 5 onwards, is useful: http://www.math.harvard.edu/~ctm/home/text/others/shannon/en...

The wikipedia page is actually good and how I learned about it. https://en.wikipedia.org/wiki/Markov_chain follow through with some random googling, read then implement it. It's really simple for something that sounds so fancy. :)

David Silver's course on Reinforcement Learning contains some good information on Markov processes. See Lecture #2 in particular.


-- Markov Decision Processes

there is a lot of info out there about markov chains, but very little about markov decision processes (MDP).

How popular are MDP? What are their strengths? weaknesses?

-- Kalman Filters vs HMM (Hidden Markov Model):

"In both models, there's an unobserved state that changes over time according to relatively simple rules, and you get indirect information about that state every so often. In Kalman filters, you assume the unobserved state is Gaussian-ish and it moves continuously according to linear-ish dynamics (depending on which flavor of Kalman filter is being used). In HMMs, you assume the hidden state is one of a few classes, and the movement among these states uses a discrete Markov chain. In my experience, the algorithms are often pretty different for these two cases, but the underlying idea is very similar." - THISISDAVE


"Some state-of-the-art industrial speech recognition [0] is transitioning from HMM-DNN systems to "CTC" (connectionist temporal classification), i.e., basically LSTMs. Kaldi is working on "nnet3" which moves to CTC, as well. Speech was one of the places where HMMs were _huge_, so that's kind of a big deal." -PRACCU

"HMMs are only a small subset of generative models that offers quite little expressiveness in exchange for efficient learning and inference." - NEXTOS

"IMO, anything that be done with an HMM can now be done with an RNN. The only advantage that an HMM might have is that training it might be faster using cheaper computational resources. But if you have the $$$ to get yourself a GPU or two, this computational advantage disappears for HMMs." - SHERJILOZAIR

The most important thing is to realize just how damn simple they are. As you get mired in the literature everything will seem overwhelmingly complex. Just grok the very very basic idea of them and it will come easier.

Also, they’re just a convenient model (for some problems), not a holy truth.

The hmm_filter project implements Viterbi-inspired algorithms and transition matrices in Python, might be also a useful learning resource: https://github.com/minodes/hmm_filter

You could try Gelman et al.'s Bayesian Data Analysis. It has a good overview of MCMC.

If you want an overview of Markov chains as statistical models in their own right, Durbin et al.'s Biological Sequence Analysis is a well-motivated overview.

There isn't really very much to learn. Just start on wikipedia, and expand out if you think there is something more. Markov Chains are very simple in practice.

If the "motivation-theorem-proof" style appeals to you, find a copy of Finite Markov Chains by Kemeny and Snell. ISBN 0442043287

How about a textbook maybe? There aren't always easy alternatives out there, sometimes you have to bite the bullet and do the work

Do you have an application in mind to help guide suggestions?

As others have said, if you know know probability, start there.

You can find a copy of “Markov Chains and Mixing Times” online, which is good and relatively accessible.

E. Cinlar, Introduction to Stochastic Processes

Covers limit theorems and continuous time.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact