
Who Will Test the Tests Themselves? (2017) - vp
https://blog.scottlogic.com/2017/09/25/mutation-testing.html
======
sitkack
Mutation Testing [1] is a generalized form of fuzzing. It is also analogous to
Sensitivity Analysis [2]. As part of closing the feedback loop between the
code and the tests: if one had a repeatable way to break the code and measure
how selectively the tests react, one could ensure the tests keep testing the
same thing as the code evolves.
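
A toy sketch of that loop in Python (the function, the mutants, and the
assertions are all invented for illustration): apply a set of deliberate
breakages and count how many of them the existing tests catch.

    # Deliberately break the code in several ways and measure how
    # selective the test suite is (how many "mutants" it kills).
    def clamp(x, lo, hi):
        return max(lo, min(x, hi))

    mutants = [
        lambda x, lo, hi: max(lo, min(x, lo)),  # hi replaced by lo
        lambda x, lo, hi: min(lo, min(x, hi)),  # outer max replaced by min
        lambda x, lo, hi: max(lo, max(x, hi)),  # inner min replaced by max
    ]

    def suite_passes(f):
        try:
            assert f(5, 0, 10) == 5
            assert f(-3, 0, 10) == 0
            assert f(42, 0, 10) == 10
            return True
        except AssertionError:
            return False

    assert suite_passes(clamp)  # sanity check: the original code passes
    killed = sum(not suite_passes(m) for m in mutants)
    print(f"killed {killed} of {len(mutants)} mutants")  # selectivity measure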

Automatic program repair [3] tries to find patches for broken code; maybe
goal-directed program breaking (possibly using DNNs) could be used to infer
properties of code so that better invariants could be discovered.

[1] https://en.wikipedia.org/wiki/Mutation_testing

[2] https://en.wikipedia.org/wiki/Sensitivity_analysis

[3] https://arxiv.org/abs/1807.00515

~~~
drewcoo
The Wikipedia article states "fuzzing can be considered to be a special case
of mutation testing." I'm not sure I agree with that. Regardless, that does
not mean "mutation testing is a generalized form of fuzzing."

Normally, fuzzing is done from outside an interface to verify that the
interface works and the code behind it doesn't fail horribly. Normally,
mutation testing is done from behind an interface to verify that the tests are
doing something. Compared with typical unit tests, the two may seem similar in
that both use randomness or ASTs to let a machine decide what to test, and
neither is a functional test, but they are very different things.
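
A rough sketch of that split (everything here is invented for illustration):
fuzzing throws random input at the interface and only checks that nothing
blows up, while mutation testing deliberately breaks the code behind the
interface and checks that the existing tests notice.

    import random

    def parse_age(s: str) -> int:
        n = int(s)
        if n < 0 or n > 150:
            raise ValueError("out of range")
        return n

    # Fuzzing: from outside the interface, check it fails gracefully.
    for _ in range(1000):
        junk = "".join(random.choice("0123456789-x") for _ in range(6))
        try:
            parse_age(junk)
        except ValueError:
            pass  # rejecting junk is fine; anything else would be a bug

    # Mutation testing: behind the interface, check that the tests notice
    # a deliberately broken variant (boundary flipped from > to >=).
    def parse_age_mutant(s: str) -> int:
        n = int(s)
        if n < 0 or n >= 150:
            raise ValueError("out of range")
        return n

    def tests(f):
        assert f("30") == 30
        assert f("150") == 150  # this assertion is what kills the mutant

    tests(parse_age)  # passes on the real code
    try:
        tests(parse_age_mutant)
        print("mutant survived: the tests are too weak")
    except (AssertionError, ValueError):
        print("mutant killed")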

~~~
sitkack
Both are mutation. I feel like this is splitting hairs: interior vs. exterior.
It doesn't strike me as very different. What would be some other examples?

------
hansvm
Derailing the conversation a bit: what other strategies beyond mutation
testing do you use for validating your tests? I've caught test bugs with a few
techniques, but none of them are comprehensive, and I'd love to hear more
thoughts. Here are a few examples (a quick sketch of (1) and (2) follows the
list):

(1) Validate assumptions in your tests -- if you think you've initialized a
non-empty collection and the correctness of the test depends on it being
non-empty, then add another assert to check that property (within reason;
nobody needs to check that a new int[5] has length 5).

(2) Write tests in pairs to test the testing logic -- if your test works by
verifying that the results for some optimized code match those from a simple
oracle, verify that they don't match those from a broken oracle.

(3) If you're testing that some property holds, find multiple semantically
different ways to test it. If the tests in a given group don't agree then at
least one of the tests is broken.
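
A minimal sketch of (1) and (2); the optimized function and the oracles are
invented stand-ins:

    import statistics

    def fast_mean(xs):
        return sum(xs) / len(xs)  # stand-in for cleverer optimized code

    def simple_oracle(xs):
        return statistics.mean(xs)

    def broken_oracle(xs):
        return statistics.mean(xs) + 1  # deliberately wrong

    def test_fast_mean():
        data = list(range(1, 101))
        # (1) assert the assumption the rest of the test depends on
        assert len(data) > 0, "test data is unexpectedly empty"
        # (2) the optimized result matches the simple oracle...
        assert fast_mean(data) == simple_oracle(data)
        # ...and does not match the broken one, so the comparison itself
        # is demonstrably capable of failing
        assert fast_mean(data) != broken_oracle(data)

    test_fast_mean()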

~~~
divbzero
Approaches I take to reduce errors in tests include:

– Keeping tests as short, simple, and few in number as possible. Focus on
integration tests for overall input/output, and limit unit tests to the most
critical components only.

– Getting extra eyes on the tests. In some cases, this can include product
managers who help to define the expected behavior.

– Writing code without continually running the tests. This deviates from
strict TDD but allows for a second independent effort at specifying the
desired functionality. There are times when writing the code correctly reveals
that the tests were written wrong.

