
Using Markov Chains to Generate Test Input - fogus
http://blog.electric-cloud.com/2009/09/15/using-markov-chains-to-generate-test-input/
======
kurtosis
This sounds like a bad idea to me if taken too seriously. Makefiles come from
a grammar, not a 5th-order Markov process. The strings sampled from the Markov
chain are not likely to be representative of the makefiles encountered in
practice. I suspect that the makefile language is too complex for formal
verification, so I understand the problem, but I would prefer their other
approach, where they take large numbers of real-world makefiles and compare
the output. Just spend the effort and get a larger and more diverse set of
test makefiles.

~~~
emelski
The problem with relying solely on "real world makefiles" is that there is not
a lot of variation amongst those that are publicly accessible. The vast
majority of open source projects that use make are autoconf based, which means
the makefiles are all more-or-less the same, except that the list of source
files and the project name change. Once you can successfully build one of
these, you can probably build all of them.

There are a handful of projects that don't follow this pattern, such as
Mozilla, but once you've built those they cease to be useful for flushing out
new bugs.

So, what do you do once you've built all the open source projects you can
find, and you are still finding incompatibilities in your parser when you go
to customers? Markov chain-based makefiles are a reasonable way to extend the
breadth of our testing. Obviously it's not the only thing we do, but it is
another valuable tool to add to the toolbox.
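
For readers who want to see the idea concretely, here is a minimal sketch of a character-level, order-k Markov generator of the kind discussed in this thread. It is not the code from the linked article; the function names `build_model` and `generate` and the tiny sample corpus are assumptions made up for illustration.

```python
import random
from collections import defaultdict


def build_model(text, order=5):
    """Map each `order`-character context to the characters seen after it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model


def generate(model, seed, length=200, rng=None):
    """Extend `seed` one character at a time by sampling from the model.

    The seed length is used as the order, so it should match the order
    the model was built with. Generation stops early if the current
    context never appeared in the training text.
    """
    rng = rng or random.Random()
    order = len(seed)
    out = seed
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:
            break
        out += rng.choice(followers)
    return out


if __name__ == "__main__":
    # Hypothetical sample corpus; a real run would read many makefiles.
    corpus = "all: foo.o bar.o\n\tcc -o all foo.o bar.o\nfoo.o: foo.c\n\tcc -c foo.c\n"
    model = build_model(corpus, order=5)
    print(generate(model, corpus[:5], length=80, rng=random.Random(0)))
```

Every run of order+1 characters in the output also occurs in the training text, which is why a higher order gives more plausible but less varied results.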

~~~
kurtosis
Okay, I understand where you're coming from. I'm really curious, though: what
were some of the bugs that showed up? Were they easy to fix, or did they
require adding a lot of special cases and patchwork to the code?

~~~
emelski
There were about a dozen issues uncovered using this technique. I would say
the majority of them were what I would call "real" bugs, as opposed to "sure,
that doesn't work but nobody cares". One example is that gmake allows $$ in
variable names, as in "FOO$$BAR=abc"; at the time our parser did not handle
this correctly. I did not have to create special cases to fix these bugs, at
least not any more so than any other parser feature.
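
A mismatch like the `FOO$$BAR=abc` case is the sort of thing a differential check can surface: feed the same input to a reference parser and to the parser under test, and report where they disagree. The sketch below only illustrates that comparison. Both parsers are toy stand-ins invented for this example (they are neither gmake nor our parser), and all function names are hypothetical.

```python
def reference_parse(line):
    """Toy reference: accepts any non-empty name before '='."""
    name, sep, value = line.partition("=")
    if not sep or not name.strip():
        raise ValueError("not an assignment")
    return (name.strip(), value.strip())


def candidate_parse(line):
    """Toy parser under test: wrongly rejects '$' in variable names."""
    name, value = reference_parse(line)
    if "$" in name:
        raise ValueError("unexpected '$' in variable name")
    return (name, value)


def outcome(parser, line):
    """Normalize a parser's result so success and failure can be compared."""
    try:
        return ("ok", parser(line))
    except ValueError as exc:
        return ("error", str(exc))


def find_disagreements(lines, reference, candidate):
    """Return the inputs on which the two parsers behave differently."""
    return [
        line for line in lines
        if outcome(reference, line) != outcome(candidate, line)
    ]


if __name__ == "__main__":
    inputs = ["FOO=abc", "FOO$$BAR=abc", "no assignment here"]
    print(find_disagreements(inputs, reference_parse, candidate_parse))
    # prints ['FOO$$BAR=abc']
```

In practice the inputs would come from generated makefiles and the two sides would be real builds, but the comparison step has the same shape.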

------
albemuth
With order=4 you can ditch the monkeys and still write Shakespeare!

------
skwiddor
Bentley built his Markov chain using his Bell Labs colleagues Pike &
Kernighan's Markov algorithm from The Practice of Programming:

[http://books.google.co.uk/books?id=to6M9_dbjosC&pg=PA62&...](http://books.google.co.uk/books?id=to6M9_dbjosC&pg=PA62&lpg=PA62&ots=3YL0Fgy14b&dq=markov+practice+of+programming)

Pike had used this technique earlier when coding up
<http://en.wikipedia.org/wiki/Mark_V_Shaney> with Bruce Ellis.

I'm sure they were not the first.

~~~
brown9-2
I don't understand this. Are you under the impression that they are claiming
to have invented Markov chains? Or using them?

~~~
skwiddor
No, just providing more info. The author linked to Bentley's text for further
reference, so I was filling in the backstory.

