
State of Mutation Testing at Google (2017) - estsauver
https://ai.google/research/pubs/pub46584
======
modeless
This tool seems like it would encourage writing exactly the kind of tests I
hate: change detectors. Tests that break every time you change the code
regardless of whether the change introduced a bug or not.

Change detector tests are worse than useless because they double the time it
takes to make a code change while providing no useful information. I already
know I changed the code; I don't need a test to tell me that.
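
To make the distinction concrete, here's a hypothetical Go sketch (made-up
names, everything in one file for brevity):

    package greet

    import (
        "fmt"
        "testing"
    )

    const template = "Hello, %s!" // implementation detail

    func Greeting(name string) string {
        return fmt.Sprintf(template, name)
    }

    // Change detector: pinned to an internal detail, it breaks on any
    // refactor even when the observable output is identical.
    func TestTemplateUnchanged(t *testing.T) {
        if template != "Hello, %s!" {
            t.Fatal("template changed")
        }
    }

    // Behavioral test: asserts only on output a caller can observe, so
    // it survives refactors but still catches real bugs.
    func TestGreeting(t *testing.T) {
        if got := Greeting("Ada"); got != "Hello, Ada!" {
            t.Fatalf(`Greeting("Ada") = %q, want "Hello, Ada!"`, got)
        }
    }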

We should aspire to write tests that aren't just change detectors, but catch
real bugs. Perhaps a tool like this could be useful in an inverse way, to tell
you that your test is sensitive to irrelevant details and should be made more
general.

~~~
rossjudson
Effective code tends to put a lot of functionality behind a relatively small
interface, and tends to work for a long time without needing attention.

A "change detector" is exactly what you want in this situation. Does the code
still fulfill the spec (tests)?

Tests that feel like they need to change every time the code changes are
generally white box rather than black box, or else the interface to the code
under test is too large.

~~~
jjoonathan
> Effective code tends to put a lot of functionality behind a relatively small
> interface

Do you have actual strategies for reducing high surface-area:volume
requirements into more easily managed low surface-area:volume code, or is
this just wishful thinking?

Nobody likes shim layers but they (and other horizontal strata) happen for a
reason.

------
Vinnl
I feel like "mutation testing" is somewhat of a misnomer, since it sounds like
it's a form of testing and is e.g. complimentary to or a replacement for unit
testing. Rather, it's a measure of code coverage by your test suite.

As such, it's mostly useful once you've exhausted other, easier methods of
finding parts of your code you forgot to test, such as regular line and
branch coverage. I think there are few projects for which that's the case,
and that, more than anything, is what "hindered" the adoption of mutation
testing.

~~~
SloopJon
Although I'm sure I've heard the term before, I forgot what it meant. Your
comment helped me put it into context as a fault injection technique.

One point the paper makes in the introduction is that "coverage alone might be
misleading, as in many cases where statements are covered but their
consequences not asserted upon." To satisfy profile-guided coverage (e.g.,
gcov), the test doesn't have to be correct or useful, it just has to execute
the line or take the branch.
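
A hypothetical Go sketch of exactly that failure mode:

    package mathutil

    import "testing"

    func Abs(x int) int {
        if x < 0 {
            return -x
        }
        return x
    }

    // This test executes every line of Abs, so line coverage reports
    // 100%, but it asserts nothing: it passes even if Abs is wrong,
    // and every mutant of Abs survives it.
    func TestAbsExecutesButAssertsNothing(t *testing.T) {
        Abs(-5)
        Abs(5)
    }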

------
deckarep
I am not up on all the fancy A.I. tech out there... but how different is this
from something like go-fuzz, which does fuzz testing based on genetic
algorithms (if my memory serves) and was written by Dmitry Vyukov of Google?

Edit: link: [https://github.com/dvyukov/go-fuzz](https://github.com/dvyukov/go-fuzz)
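
(For reference, a go-fuzz harness is a single function like the sketch
below; ParseQuery is a made-up stand-in for whatever you want to fuzz.
go-fuzz generates and mutates the inputs fed to it, looking for crashes and
new coverage.)

    // +build gofuzz

    package parser

    // Fuzz is the entry point go-fuzz drives with generated inputs.
    // Return 1 to tell the fuzzer an input is interesting (e.g., it
    // parsed successfully), 0 otherwise.
    func Fuzz(data []byte) int {
        if _, err := ParseQuery(string(data)); err != nil {
            return 0
        }
        return 1
    }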

~~~
jjwsteele
Fuzz testing is different from the kind of mutation testing being referred to
here. Mutation testing is about creating 'mutant' versions of your source
code and determining whether your test suite detects the mutants. Like code
coverage, it is a measure of the effectiveness of your test suite.

For example, a mutation may be changing a '==' to '!='. Your test suite is
then run against the mutated source code, and the mutant is said to be
'killed' if at least one test fails. This is repeated many times, each time
with a different mutation to your source code. Your test suite is then given
a mutation score: the number of mutants killed divided by the total number of
mutants.
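
A hypothetical Go sketch of one such mutation and a test that kills it:

    package mathutil

    import "testing"

    // Original code under test.
    func IsZero(x int) bool {
        return x == 0
    }

    // A mutant the tool might generate, with '==' flipped to '!=':
    //
    //     func IsZero(x int) bool {
    //         return x != 0
    //     }

    // This test kills the mutant: it passes against the original but
    // fails against the mutant, which returns false for an input of 0.
    func TestIsZero(t *testing.T) {
        if !IsZero(0) {
            t.Fatal("IsZero(0) should be true")
        }
    }

A suite that merely called IsZero without asserting on the result would let
the mutant survive, which is exactly what the score is meant to expose.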

Of course, some mutations produce code that is functionally identical to the
original (so-called equivalent mutants). This means it isn't always possible
for your test suite to kill every mutant.
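
For instance (hypothetical Go again), flipping '<' to '!=' in the loop below
produces an equivalent mutant: i starts at 0 and increases by exactly 1, so
both conditions stop the loop at the same iteration and no test can tell the
two versions apart.

    // Original.
    func Sum(xs []int) int {
        total := 0
        for i := 0; i < len(xs); i++ {
            total += xs[i]
        }
        return total
    }

    // Equivalent mutant: 'i < len(xs)' replaced by 'i != len(xs)'.
    // Behavior is identical on every input, so the mutant is unkillable.
    // func Sum(xs []int) int {
    //     total := 0
    //     for i := 0; i != len(xs); i++ {
    //         total += xs[i]
    //     }
    //     return total
    // }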

Because many different mutations are made, each one resulting in a run of
your test suite, mutation testing can become expensive. Having only read the
abstract, this looks to be a way of determining which parts of the source
code are not worth mutating, hence reducing the number of times your test
suite needs to be run.

~~~
Puer
Thank you for the clear explanation!

------
fenollp
There's also "An Industrial Application of Mutation Testing: Lessons,
Challenges, and Research Directions (2018)" by the same authors:
[https://ai.google/research/pubs/pub46907](https://ai.google/research/pubs/pub46907)

------
rbongers
It's interesting that they mention lines and line coverage so much and not
statement coverage. I would think that statement coverage would be a much
more effective measure of what should be mutated, if the instrumenting tool
being used provides that metric; otherwise mutation testing often just ends
up revealing which covered lines contain uncovered statements. In other
words, it does the coverage tool's job.
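
A hypothetical Go example of the gap between the two metrics: the test below
marks the whole if-line as covered even though the statement inside it never
executes, so statement coverage would flag what line coverage misses, and a
mutant of that inner statement would survive this suite.

    package mathutil

    import "testing"

    // Both statements share one line, so line coverage marks the line
    // covered as soon as the condition is evaluated, even if 'x = 0'
    // never runs.
    func Clamp(x int) int {
        if x < 0 { x = 0 }
        return x
    }

    // Only exercises the happy path; the clamping statement is never
    // executed, yet its line counts as covered.
    func TestClampPositive(t *testing.T) {
        if Clamp(5) != 5 {
            t.Fatal("Clamp(5) should be 5")
        }
    }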

In any case, it seems like it could be a useful tool if developers know how
to use it. It seems ideal for catching tests that execute statements without
actually testing them. As the post below mentions, it will probably result in
tests that just detect change if developers are not trained in the tool and
in testing strategy.

------
mendelbot
Go Goran!!

