

Using Markov Chains to Generate Test Input - signa11
http://www.electric-cloud.com/blog/2009/09/15/using-markov-chains-to-generate-test-input/

======
NathanKP
Markov chains are a fun programming project. One of my early web projects was
an independent reinvention of the Markov chain:

[http://experimentgarden.blogspot.com/2009/11/software-tool-f...](http://experimentgarden.blogspot.com/2009/11/software-tool-for-low-order.html)

This is what happens when you are a self-taught programmer who learned mostly
from basic coding primers rather than from the internet. I ended up
independently reinventing a lot of code and algorithms, including a basic
Markov chain implementation such as the one described here.

------
anonymousDan
It would be interesting to see how something like this could be integrated
into QuickCheck. However, it is not clear to me how you would use it to
generate _erroneous_ input. In the example he gives, the training relies on a
sample of correct makefiles. Would it not be useful to also have a separate
network that produces makefiles that should probably cause an error?

~~~
ezy
It really depends on what units of data your chain (in this case, an n-gram
model) is dealing with. You can increase or decrease the granularity and
modeling depth to get different kinds of erroneous data from the same training
data.

For example, given the same corpus, if you have an n-gram model over English
characters, obviously some[1] of the words will be erroneous. But if that
model is over words, none of the words will be erroneous, but the grammar will
still be broken in some cases. Extend that to an n-gram model over collections
of grammatical phrases, and the words and grammar might be correct while the
meaning is nonsensical in some cases, and so on.

[1] Obviously "some" is dependent on the amount and diversity of the training
data, model detail, etc.
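
To make the granularity point concrete, here is a minimal sketch that trains
two chains on the same text, one over characters and one over words. The toy
corpus, the function names (`build_model`, `generate`) and the chosen orders
are all made up for illustration; this is not the code from the linked
article.

```python
import random
from collections import defaultdict


def build_model(tokens, order):
    """Map each run of `order` consecutive tokens to the tokens seen after it."""
    model = defaultdict(list)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        model[state].append(tokens[i + order])
    return dict(model)


def generate(model, length, rng):
    """Random-walk the chain, stopping early if a state has no successor."""
    state = rng.choice(sorted(model))  # sorted so a seeded run is repeatable
    out = list(state)
    while len(out) < length:
        followers = model.get(tuple(out[-len(state):]))
        if not followers:
            break
        out.append(rng.choice(followers))
    return out


# Hypothetical toy corpus standing in for real training data.
corpus = "the cat sat on the mat and the dog sat on the log"
rng = random.Random(0)

# Same corpus, two granularities: tokens are characters or whole words.
char_model = build_model(list(corpus), order=2)
word_model = build_model(corpus.split(), order=1)

char_sample = "".join(generate(char_model, 40, rng))
word_sample = " ".join(generate(word_model, 10, rng))

print(char_sample)  # may contain strings that are not words in the corpus
print(word_sample)  # every word is from the corpus; the ordering may be odd
```

With the character model, the output can only be wrong at the word level and
above, since every three-character window was seen in training. With the word
model, every word and every adjacent pair of words was seen, so any errors
show up only in longer-range structure.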

