Markov Chains Explained

tlarkworthy · on July 5, 2014

Some advanced techniques applying Markov chains to load testing. I submitted to HN but no-one was interested in it as a stand alone. Maybe people already interested in them here would like to see how to make them dance in practice.

http://edinburghhacklab.com/2014/03/taming-randomized-load-t...

TravisDick · on July 6, 2014

It's nice to see an explanation of Markov chains (and some techniques for predicting their behaviour) targeted at a wide audience. My only complaint is that the provided definition of a Markov chain is somewhat unclear and imprecise (as pointed out already by lambdaphage). An alternative definition that I like a little better is given below.

A Markov chain is a sequence of random variables X_1, X_2, X_3, ... with the property that the distribution of X_t given the complete history X_1, ..., X_{t-1} is identical to the distribution of X_t given only the previous state X_{t-1}. Intuitively, this means that the state X_{t-1} sufficiently summarizes the history of the chain so that knowing the earlier states of the chain does not allow you to better guess the next state. This property is called the Markov property.

For those looking for a free and in-depth reference, I encourage you to check out Byron Schmuland's course notes: http://www.stat.ualberta.ca/~schmu/stat580/2012notes. He also links to several other free references for Markov chains on his course webpage: http://www.stat.ualberta.ca/~schmu/580.

kernel_sanders · on July 7, 2014

Thank you for the clear explanation.

In a previous comment, I sought to further understand how the Lottery possesses the Markov property. Based on your definition above, I can see that it does simply because the distribution X_t of winning numbers has the same dependence on X_{t-1} as it does on X_1, ..., X_{t-1}, that is, zero. Do I have that correct?

TravisDick · on July 7, 2014

Yes, exactly. More generally (and for the same reason), any sequence of iid random variables will form a Markov chain.

For an example without the Markov property, consider the sequence of random variables X_1, X_2, ... with X_1 being either -1 or 1 with equal probability, and X_t being normally distributed with mean X_1 and standard deviation 1.

Knowing the history X_1, X_2, ..., X_{t-1} gives you the exact distribution of X_t (since you know X_1), while only knowing X_{t-1} gives you much less information. This fails to be a Markov chain because the state X_{t-1} doesn't "remember" which of the two possible distributions is being used.

kernel_sanders · on July 7, 2014

Awesome. Thanks! Again, very clear. This is interesting stuff. It makes me curious about applications of stochastic processes in general - time to read the course notes you linked. They look like fun problems to program and model.

Edit: Wikipedia, of course, has a good overview of applications of Markov chains. http://en.wikipedia.org/wiki/Markov_chain#Applications

lambdaphage · on July 6, 2014

The Markov property does not require dependence on the previous state-- it requires that the distribution of the next state given the history of the chain be the distribution of the next state given the current state. The lottery trivially satisfies this property since tomorrow's winner doesn't depend on today's winner at all.

ltjohnson · on July 6, 2014

This is correct. Sadly, the first paragraph of the article contains some glaring errors.

"For Markov chains to be effective the current state has to be dependent on the previous state in some way;"

This is trivially untrue. A sequence of independently and identically distributed (iid) random variables is a Markov chain. An iid sequence is clearly effective at many things (e.g. Monte Carlo integration).

"Not every process has the Markov Property, such as the Lottery, this weeks winning numbers have no dependence to the previous weeks winning numbers." As lambdaphage pointed out, the Lottery does have the Markov property.

kernel_sanders · on July 7, 2014

I'm interested in the above article and the above two comments, however, I don't understand the Lottery example. Can you clarify how it does have the Markov property?

I'm not seeing how the distribution of possible winning numbers relates at all to the current state. I'm trying to phrase this in the language of the above two comments. Help me out if I've got it all wrong. =)

tlarkworthy · on July 7, 2014

The Markov property is that you can model the system as being dependent on the immediately previous state + noise.

The lottery ignores the previous state and is defined purely by the noise, so it is a (trivial) markov process.

More complex systems depend on the entire history (e.g. to model a poker player you have to consider all of their actions up to the current). Newtonian systems are markov, if you know the state of the system you can run it forward in time deterministically. Even if your knowledge of the state of the Newtonian system is not fully known, you can still run the distribution of states forward in time precisely.

current_state = previous_state + process_noise

typically expressed in matrix math but the idea is as simple as that.

pash · on July 6, 2014

Among my favorites is Claude Shannon's model of language as a Markov processes in his seminal paper on information theory [0]. His presentation of ergodicity is also remarkably clear compared to the usual muddle about ensemble and time averages.

(The book version of Shannon's paper [1] is even a bit better.)

0. http://www3.alcatel-lucent.com/bstj/vol27-1948/articles/bstj...

1. http://www.amazon.com/Mathematical-Theory-Communication-Clau...

michaelq · on July 5, 2014

This is an excellent introduction to first and second-order Markov chains. Please re-post once you've got (Python?) code snippets. Gifs are really helpful at illustrating these principles.

There's a typo: You have the word "leaving" twice in a row.

jlukecarlson · on July 5, 2014

Interesting article, would have been helpful when I was writing a paper on Google PageRank since it is a markov chain. I do think it would be valuable to talk about equilibrium distributions and how you can determine the tendency of the system

cpeterso · on July 5, 2014

How does PageRank use Markov Chains? That's an interesting idea.

pedrosorio · on July 6, 2014

It models the web as a Markov Chain where web pages are states and the transitions are the outgoing links.

jlukecarlson · on July 6, 2014

Exactly and the actual page rank of the site comes from finding the equilibrium distribution from the transition matrix, which in this case is called a Google matrix.

jimmaswell · on July 6, 2014

A good way I found to store and use markov chain data was through the filesystem. Each words gets its own file, and every word that comes after that word is put in that file. The problem of the probability of one word coming after another solves itself if you just let the word be added to that list multiple times, one for each occurrence.

wlievens · on July 6, 2014

But a file system forms a tree whereas a Markov chain is a directed graph, cycles allowed. Or are you thinking of a file system with cyclical symlinks?

e12e · on July 6, 2014

As I understand it you would have, as an example (example as a cycle):

    file: As
    contains: I an a

    file: an
    contains: example

    file: example
    contains: example as

Here "as" points to "example", and "example" points to "example" and to "as". You have your cycle: pick a random file (get example) pick a random word (get as), lookup as (open the "as"-file), pick a random word (get "as" again, then "an" then "example" … ).

No links needed, the words are the links -- for US English, and words no longer that 8 characters, you could this on FAT16.

(The files are vertices, and they contain a list of edges to other vertices, in the form of the name of the vertices).

jimmaswell · on July 6, 2014

Yeah, that's how I had it. Newlines separating the words, and the chars that can't go in the filesystem get escaped to base64 or something.

adamcanady · on July 6, 2014

Is it then possible to model 2nd order Markov chains with every two words containing their own file? Filename: this_example Contains: is, can, etc...

jimmaswell · on July 8, 2014

The way you described it sounds possible, yes

GFK_of_xmaspast · on July 6, 2014

" No more randomly changing parameters until approximately close to the mandated testing scenario! " --> "I have not developed a function that goes from desired frequencies to Markov probabilities ... So I prefer tuning the transition matrices by hand. "

So he did a bunch of work and the net result is he still has to manually search a (different) parameter space? How often are the load testing requirements changing that this is any kind of net time savings at all?

michaelfeathers · on July 5, 2014

Here's a Mark V. Chaney text generator in 16 lines of quasi-functional Ruby. Considerably simpler than the table method. if you keep all successors to a word (including duplicates) in an array, any random selection of the array works. No need to compute probabilities.

https://gist.github.com/michaelfeathers/2cf9e1599e06d0563f2e