
Markov Chains Explained - cpeterso
http://techeffigy.wordpress.com/2014/06/30/markov-chains-explained/
======
tlarkworthy
Some advanced techniques applying Markov chains to load testing. I submitted
it to HN but no one was interested in it as a standalone post. Maybe people
already interested in Markov chains here would like to see how to make them
dance in practice.

[http://edinburghhacklab.com/2014/03/taming-randomized-
load-t...](http://edinburghhacklab.com/2014/03/taming-randomized-load-
testing-2/)

------
NarcolepticFrog
It's nice to see an explanation of Markov chains (and some techniques for
predicting their behaviour) targeted at a wide audience. My only complaint is
that the provided definition of a Markov chain is somewhat unclear and
imprecise (as pointed out already by lambdaphage). An alternative definition
that I like a little better is given below.

A Markov chain is a sequence of random variables X_1, X_2, X_3, ... with the
property that the distribution of X_t given the complete history X_1, ...,
X_{t-1} is identical to the distribution of X_t given only the previous state
X_{t-1}. Intuitively, this means that the state X_{t-1} sufficiently
summarizes the history of the chain so that knowing the earlier states of the
chain does not allow you to better guess the next state. This property is
called the Markov property.
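The definition is easy to make concrete with a small simulation: sample each X_t from a distribution that depends only on X_{t-1}. The two-state "weather" chain and its transition probabilities below are illustrative, not from the article:

```python
import random

# Hypothetical two-state chain; the transition probabilities are made up.
transitions = {
    "sunny": [("sunny", 0.9), ("rainy", 0.1)],
    "rainy": [("sunny", 0.5), ("rainy", 0.5)],
}

def step(state):
    """Sample the next state given only the current state (the Markov property)."""
    nexts, probs = zip(*transitions[state])
    return random.choices(nexts, weights=probs)[0]

def simulate(start, n):
    chain = [start]
    for _ in range(n):
        chain.append(step(chain[-1]))
    return chain

print(simulate("sunny", 10))
```

Note that `step` never looks past `chain[-1]`; that restriction is exactly the Markov property.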

For those looking for a free and in-depth reference, I encourage you to check
out Byron Schmuland's course notes:
[http://www.stat.ualberta.ca/~schmu/stat580/2012notes](http://www.stat.ualberta.ca/~schmu/stat580/2012notes).
He also links to several other free references for Markov chains on his course
webpage:
[http://www.stat.ualberta.ca/~schmu/580](http://www.stat.ualberta.ca/~schmu/580).

~~~
kernel_sanders
Thank you for the clear explanation.

In a previous comment, I sought to further understand how the Lottery
possesses the Markov property. Based on your definition above, I can see that
it does simply because the distribution X_t of winning numbers has the same
dependence on X_{t-1} as it does on X_1, ..., X_{t-1}, that is, zero. Do I
have that correct?

~~~
NarcolepticFrog
Yes, exactly. More generally (and for the same reason), any sequence of iid
random variables will form a Markov chain.

For an example without the Markov property, consider the sequence of random
variables X_1, X_2, ... with X_1 being either -1 or 1 with equal probability,
and X_t being normally distributed with mean X_1 and standard deviation 1.

Knowing the history X_1, X_2, ..., X_{t-1} gives you the exact distribution of
X_t (since you know X_1), while only knowing X_{t-1} gives you much less
information. This fails to be a Markov chain because the state X_{t-1} doesn't
"remember" which of the two possible distributions is being used.

~~~
kernel_sanders
Awesome. Thanks! Again, very clear. This is interesting stuff. It makes me
curious about applications of stochastic processes in general - time to read
the course notes you linked. They look like fun problems to program and model.

Edit: Wikipedia, of course, has a good overview of applications of Markov
chains.
[http://en.wikipedia.org/wiki/Markov_chain#Applications](http://en.wikipedia.org/wiki/Markov_chain#Applications)

------
lambdaphage
The Markov property does not require dependence on the previous state-- it
requires that the distribution of the next state given the history of the
chain be the distribution of the next state given the current state. The
lottery trivially satisfies this property since tomorrow's winner doesn't
depend on today's winner at all.

~~~
ltjohnson
This is correct. Sadly, the first paragraph of the article contains some
glaring errors.

"For Markov chains to be effective the current state has to be dependent on
the previous state in some way;"

This is trivially untrue. A sequence of independently and identically
distributed (iid) random variables _is_ a Markov chain. An iid sequence is
clearly effective at many things (e.g. Monte Carlo integration).

"Not every process has the Markov Property, such as the Lottery, this weeks
winning numbers have no dependence to the previous weeks winning numbers." As
lambdaphage pointed out, the Lottery does have the Markov property.

~~~
kernel_sanders
I'm interested in the above article and the above two comments, however, I
don't understand the Lottery example. Can you clarify how it does have the
Markov property?

I'm not seeing how the distribution of possible winning numbers relates at all
to the current state. I'm trying to phrase this in the language of the above
two comments. Help me out if I've got it all wrong. =)

~~~
tlarkworthy
The Markov property is that you can model the system as being dependent on the
immediately previous state plus noise.

The lottery ignores the previous state and is defined purely by the noise, so
it is a (trivial) Markov process.

More complex systems depend on the entire history (e.g. to model a poker
player you have to consider all of their actions up to the current point).
Newtonian systems are Markov: if you know the state of the system, you can run
it forward in time deterministically. Even if the state of the Newtonian
system is not fully known, you can still run the distribution of states
forward in time precisely.

current_state = previous_state + process_noise

This is typically expressed in matrix math, but the idea is as simple as that.
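A minimal scalar sketch of that update (the function name and noise level are illustrative): each new state is the previous state plus Gaussian process noise, i.e. a random walk.

```python
import random

def run_chain(n, process_noise_std=0.1):
    """current_state = previous_state + process_noise, scalar case."""
    state = 0.0
    history = [state]
    for _ in range(n):
        state = state + random.gauss(0.0, process_noise_std)
        history.append(state)
    return history

print(run_chain(5))
```

The matrix version replaces the scalar with a state vector and the addition with a linear transition, but the structure is identical.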

------
pash
Among my favorites is Claude Shannon's model of language as a Markov process
in his seminal paper on information theory [0]. His presentation of ergodicity
is also remarkably clear compared to the usual muddle about ensemble and time
averages.

(The book version of Shannon's paper [1] is even a bit better.)

0\. [http://www3.alcatel-
lucent.com/bstj/vol27-1948/articles/bstj...](http://www3.alcatel-
lucent.com/bstj/vol27-1948/articles/bstj27-3-379.pdf)

1\. [http://www.amazon.com/Mathematical-Theory-Communication-
Clau...](http://www.amazon.com/Mathematical-Theory-Communication-Claude-
Shannon/dp/0252725484)

------
michaelq
This is an excellent introduction to first- and second-order Markov chains.
Please re-post once you've got (Python?) code snippets. GIFs are really
helpful for illustrating these principles.

There's a typo: You have the word "leaving" twice in a row.

------
jlukecarlson
Interesting article; it would have been helpful when I was writing a paper on
Google PageRank, since PageRank is a Markov chain. I do think it would be
valuable to talk about equilibrium distributions and how you can determine the
long-run tendency of the system.

~~~
cpeterso
How does PageRank use Markov Chains? That's an interesting idea.

~~~
pedrosorio
It models the web as a Markov Chain where web pages are states and the
transitions are the outgoing links.

~~~
jlukecarlson
Exactly, and the actual PageRank of a site comes from finding the equilibrium
distribution of the transition matrix, which in this case is called a Google
matrix.
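The equilibrium-distribution computation can be sketched with power iteration over a toy link graph. The pages, links, and damping factor below are illustrative, not actual PageRank data:

```python
# A hypothetical four-page web; the links are made up.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, damping=0.85, iterations=50):
    """Power iteration on the Google matrix: follow a random outgoing
    link with probability `damping`, else jump to a uniformly random page."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            share = damping * rank[p] / len(outs)
            for q in outs:
                new[q] += share
        rank = new
    return rank

ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

The iteration converges to the stationary distribution of the chain; a page's rank is the long-run fraction of time a random surfer spends there.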

------
jimmaswell
A good way I found to store and use Markov chain data was through the
filesystem. Each word gets its own file, and every word that comes after that
word is appended to that file. The problem of computing the probability of one
word following another solves itself if you just let a word be added to the
list multiple times, once for each occurrence.

~~~
wlievens
But a file system forms a tree whereas a Markov chain is a directed graph,
cycles allowed. Or are you thinking of a file system with cyclical symlinks?

~~~
e12e
As I understand it you would have, as an example (the example itself forms a
cycle):

    file: As
    contains: I an a

    file: an
    contains: example

    file: example
    contains: example as


Here "as" points to "example", and "example" points to "example" and to "as".
You have your cycle: pick a random file (get example) pick a random word (get
as), lookup as (open the "as"-file), pick a random word (get "as" again, then
"an" then "example" … ).

No links needed, the words are the links -- for US English, and words no
longer than 8 characters, you could do this on FAT16.

(The files are vertices, and they contain a list of edges to other vertices,
in the form of the name of the vertices).
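A rough sketch of that word-per-file scheme, assuming appended whitespace-separated words; the directory handling and helper names here are made up, and a temp directory is used so it runs anywhere:

```python
import os
import random
import tempfile

# Each word gets a file; the file lists every observed successor,
# duplicates included, so a uniform pick is frequency-weighted.
root = tempfile.mkdtemp()

def add_pair(word, successor):
    """Append one observed successor to the word's file."""
    with open(os.path.join(root, word), "a") as f:
        f.write(successor + "\n")

def next_word(word):
    """Uniform pick over the file's entries = frequency-weighted sample."""
    with open(os.path.join(root, word)) as f:
        return random.choice(f.read().split())

text = "as an example example as a cycle".split()
for a, b in zip(text, text[1:]):
    add_pair(a, b)

print(next_word("example"))  # "example" or "as"
```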

~~~
jimmaswell
Yeah, that's how I had it. Newlines separate the words, and characters that
can't go in the filesystem get escaped to base64 or something.

------
GFK_of_xmaspast
"No more randomly changing parameters until approximately close to the
mandated testing scenario!" --> "I have not developed a function that goes
from desired frequencies to Markov probabilities ... So I prefer tuning the
transition matrices by hand."

So he did a bunch of work and the net result is he still has to manually
search a (different) parameter space? How often are the load testing
requirements changing that this is any kind of net time savings at all?

------
michaelfeathers
Here's a Mark V. Chaney text generator in 16 lines of quasi-functional Ruby.
Considerably simpler than the table method: if you keep all successors to a
word (including duplicates) in an array, any uniformly random selection from
the array works. No need to compute probabilities.

[https://gist.github.com/michaelfeathers/2cf9e1599e06d0563f2e](https://gist.github.com/michaelfeathers/2cf9e1599e06d0563f2e)
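The successor-array trick translates directly to Python as well (function names are mine; the training sentence is a made-up example):

```python
import random
from collections import defaultdict

def build(words):
    """Keep every successor, duplicates included; a uniform choice from
    the list is then already frequency-weighted."""
    succ = defaultdict(list)
    for a, b in zip(words, words[1:]):
        succ[a].append(b)
    return succ

def generate(succ, start, n):
    """Walk the chain for up to n words, stopping at a dead end."""
    out = [start]
    for _ in range(n - 1):
        nxt = succ.get(out[-1])
        if not nxt:
            break
        out.append(random.choice(nxt))
    return " ".join(out)

succ = build("the cat sat on the mat the cat ran".split())
print(generate(succ, "the", 8))
```

Because "the" is followed by "cat" twice and "mat" once in the training text, `random.choice` picks "cat" with probability 2/3 without any explicit probability table.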

