Who Will Test the Tests Themselves? (2017) (scottlogic.com)
20 points by vp on Sept 7, 2020 | 6 comments



Mutation Testing [1] is a generalized form of fuzzing. It is also analogous to Sensitivity Analysis [2]. As part of closing the feedback loop between the code and the tests, if one had a repeatable way to break the code and measure the selectivity of the test results, one could ensure that the tests keep testing the same thing as the code evolves.
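
To make that concrete, here is a minimal mutation-testing sketch in Python. The clamp function, its tests, and the hand-written mutants are all illustrative; a real tool (e.g. mutmut for Python or PIT for Java) derives the mutants automatically from the code:

    # Code under test (illustrative).
    def clamp(x, lo, hi):
        return max(lo, min(x, hi))

    # The test suite whose quality we want to measure: True if every check passes.
    def test_suite(fn):
        return fn(5, 0, 10) == 5 and fn(-1, 0, 10) == 0 and fn(99, 0, 10) == 10

    # Hand-written stand-ins for generated mutants, each breaking clamp in one small way.
    mutants = [
        lambda x, lo, hi: max(lo, min(x, lo)),  # upper bound replaced by lower
        lambda x, lo, hi: min(lo, min(x, hi)),  # max swapped for min
        lambda x, lo, hi: max(hi, min(x, hi)),  # lower bound replaced by upper
    ]

    killed = sum(1 for m in mutants if not test_suite(m))
    print(f"mutation score: {killed}/{len(mutants)}")  # 3/3: the tests notice every mutant

A surviving mutant is exactly the "repeatable way to break the code" above: it tells you the tests no longer select for that behaviour.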

Automatic program repair [3] tries to find patches for broken code; perhaps goal-directed program breaking (possibly using DNNs) could be used to infer properties of code, so that better invariants could be discovered.

[1] https://en.wikipedia.org/wiki/Mutation_testing

[2] https://en.wikipedia.org/wiki/Sensitivity_analysis

[3] https://arxiv.org/abs/1807.00515


The Wikipedia article states "fuzzing can be considered to be a special case of mutation testing." I'm not sure I agree with that. Regardless, that does not mean "mutation testing is a generalized form of fuzzing."

Normally, fuzzing is done from outside an interface to verify that the interface works and the code behind it doesn't fail horribly. Normally, mutation testing is done behind an interface to verify that the tests are actually doing something. Compared with typical unit tests they may look similar -- both use randomness or ASTs, let machines determine what to test, and are not functional tests -- but they are very different things.
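
For contrast, a rough fuzzing sketch in Python (the parser and its documented failure mode are made up for illustration). Random inputs are thrown at the interface from the outside, and the only check is that nothing fails in an undocumented way:

    import random
    import string

    # Hypothetical code behind the interface being fuzzed.
    def parse_key_value(line):
        key, _, value = line.partition("=")
        if not key:
            raise ValueError("empty key")
        return key.strip(), value.strip()

    random.seed(0)  # fixed seed so a failing run can be replayed
    for _ in range(10_000):
        line = "".join(random.choice(string.printable)
                       for _ in range(random.randint(0, 20)))
        try:
            parse_key_value(line)
        except ValueError:
            pass  # documented failure mode, not a finding
        # any other exception (or a hang) would be a fuzzing finding

Mutation testing, by contrast, would leave the interface alone and flip something inside parse_key_value (say, "if not key" to "if key") to check that at least one test fails.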


Both are mutation; interior vs. exterior feels like splitting hairs. It doesn't strike me as very different. What would be some other examples?


Derailing the conversation a bit: what other strategies beyond mutation testing do you use for validating your tests? I've caught test bugs with a few techniques, but none of them are comprehensive, and I'd love to hear more thoughts. Here are a few examples (a rough sketch of the first two follows the list):

(1) Validate assumptions in your tests -- if you think you've initialized a non-empty collection and the test being correct depends on it being non-empty, then add another assert to check that property (within reason; nobody needs to check that a new int[5] has length 5).

(2) Write tests in pairs to test the testing logic -- if your test works by verifying that the results for some optimized code match those from a simple oracle, verify that they don't match those from a broken oracle.

(3) If you're testing that some property holds, find multiple semantically different ways to test it. If the tests in a given group don't agree then at least one of the tests is broken.
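
A minimal sketch of (1) and (2) in Python -- the dedupe functions and their oracles are made up for illustration:

    # Hypothetical optimised code under test.
    def dedupe_fast(xs):
        return sorted(set(xs))

    # Simple, obviously-correct reference implementation.
    def dedupe_oracle(xs):
        out = []
        for x in sorted(xs):
            if x not in out:
                out.append(x)
        return out

    # Deliberately wrong reference, used only to test the test.
    def dedupe_broken_oracle(xs):
        return list(xs)

    def test_dedupe_matches_oracle():
        data = [3, 1, 3, 2, 2]
        assert len(data) > 0                                    # (1) check the test's own assumption
        assert dedupe_fast(data) == dedupe_oracle(data)         # the real check
        assert dedupe_fast(data) != dedupe_broken_oracle(data)  # (2) the test can tell oracles apart

    test_dedupe_matches_oracle()

If the last assert ever fails, the comparison logic (or the chosen input) is too weak to distinguish a correct oracle from a broken one.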


Approaches I take to reduce errors in tests include:

– Keeping tests as short, simple, and few in number as possible. Focus on integration tests for overall input/output, and limit unit tests to the most critical components only.

– Getting extra eyes on the tests. In some cases, this can include product managers who help to define the expected behavior.

– Writing code without continually running the tests. This deviates from strict TDD but allows a second, independent attempt at specifying the desired functionality. There are times when writing the code correctly reveals that the tests were wrong.


(1) Randomise your test inputs (a small sketch follows this list). If you think that will lead to flaky tests, then you already have flaky code.

(2) Be able to run the software locally in some meaningful way (as opposed to only seeing the software in action in prod). My most productive code/test/debug cycle always involves main(). I spend less time imagining what the software will do, and more time seeing what it actually does.
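
A small sketch of (1) in Python, with an illustrative whitespace-normalising function. The seed goes into the assertion message so a "flaky" failure can be replayed instead of shrugged off:

    import random
    import string

    # Hypothetical code under test: collapse runs of whitespace to single spaces.
    def normalize_whitespace(s):
        return " ".join(s.split())

    def test_normalize_randomised():
        seed = random.randrange(2**32)
        rng = random.Random(seed)
        alphabet = string.ascii_letters + " \t\n"
        for _ in range(1_000):
            s = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 40)))
            once = normalize_whitespace(s)
            # Properties that must hold for any input: idempotence, no stray edges.
            assert normalize_whitespace(once) == once, f"not idempotent (seed={seed})"
            assert once == once.strip(), f"untrimmed output (seed={seed})"

    test_normalize_randomised()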




