This is relatively subtle stuff, but here's an attempt at describing the general problem. I'm going to describe the deterministic case, but the probabilistic case is effectively the same.
Let's say you have a bug you suspect comes from some pair of 10 features both being "on", but you don't know which pair causes the problem. Encode each state you could set your code up in as a 10-digit binary string: 0000000000, 0000000001, 0000000010, 0000000011, etc.
We could try the 45 possible pairs in some order, and on average we'd expect to hit the bug after roughly half of them, about 23 tries. But notice how your "target set" is smaller than the universe of strings: there are only 45 pairs of features, but 1024 strings.
What happens if we instead try a random string of ones and zeros? Now, instead of testing just one candidate pair, each trial covers many pairs at once. The catch is that when a string does trigger the bug, we can't immediately tell which pair was responsible. But we can build a corpus of strings that trigger the error versus strings that don't, and random sampling quickly converges on the culprit pair: the features that are "on" in every failing string.
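Here's a minimal sketch of that idea in Python. The `triggers_bug` oracle and the hard-coded pair `(3, 7)` are made up for illustration; the sampling loop never looks at them directly, it only observes which random configurations fail.

```python
import random

N_FEATURES = 10
BUGGY_PAIR = (3, 7)   # hypothetical: in reality we wouldn't know this

def triggers_bug(config):
    """Stand-in oracle: the bug fires exactly when both features in the pair are on."""
    return config[BUGGY_PAIR[0]] == 1 and config[BUGGY_PAIR[1]] == 1

def random_config():
    return tuple(random.randint(0, 1) for _ in range(N_FEATURES))

# Build a corpus of random configurations and keep the ones that fail.
corpus = [random_config() for _ in range(200)]
failing = [c for c in corpus if triggers_bug(c)]

# The culprit features are the ones that are on in *every* failing configuration;
# unrelated features get ruled out quickly because each is off about half the time.
suspects = [i for i in range(N_FEATURES) if all(c[i] == 1 for c in failing)]
print(f"{len(failing)} of {len(corpus)} configs failed; suspects: {suspects}")
```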
If you think about why this works, it's because each of these random strings has about a 1/4 chance of triggering the bug: without loss of generality, reorder the bits so the two buggy features are the first two digits, and a uniformly random string has a 1/4 chance of reading "11" on those two digits.
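You can check that figure empirically by continuing the sketch above (reusing its `random_config` and `triggers_bug` helpers):

```python
# Continuing the sketch above: estimate how often a random config trips the bug.
trials = [random_config() for _ in range(100_000)]
print(sum(map(triggers_bug, trials)) / len(trials))   # prints something close to 0.25
```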
The problem is that as you increase the size of the subset that needs to be active, the probability that your random strings will actually catch the bug decreases exponentially. For any _fixed_ target size k (the number of features that need to be active), the overall complexity is still polynomial in n (the number of existing features). But if k is a constant fraction of n, then this technique takes exponential time in n.
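As a rough back-of-the-envelope, assuming a random string hits a k-feature bug with probability 2^-k, the expected number of strings you need before the first failure is 2^k:

```python
# Expected number of random configs before one triggers a k-feature bug is 2**k.
# It stays constant in n when k is fixed, but blows up when k grows with n.
for n in (10, 20, 40, 80):
    print(f"n={n:3d}  k=2: ~{2**2} tries   k=n/2: ~{2**(n // 2):.1e} tries")
```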