Hacker News
Avoiding Test-Case Permutation Blowout (stevenhicks.me)
36 points by todsacerdoti 12 days ago | 14 comments





This is where I fall back to property-based testing[0] and let the engine probe the high-dimensional test space for me.

The test then becomes something like

  Check.Sample(
    Gen.Bool, Gen.Bool, Gen.Bool,
    (loggedIn, hasGreenHair, likesBananas) => { 
      var anyTrue = loggedIn | hasGreenHair | likesBananas;//the property to test
      var messagePrinted = formatMessage(loggedIn, hasGreenHair, likesBananas);

      Assert.Equal(anyTrue, messagePrinted);
    }
  );
This is obviously overkill for a test space of cardinality 8. But for complex cases this can provide better coverage of unexpected interactions than just spot-checking a few positive and negative results.

Another approach for this test would be to use a metamorphic property (also possible with CsCheck). You basically come up with a rule that says something like, "if a user is logged in, then it should not matter whether they have green hair or whether they like bananas". Then just let the framework search for counterexamples.
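
A rough sketch of that metamorphic check, using Python's Hypothesis rather than CsCheck; format_message here is the hypothetical function under test, taking (logged_in, has_green_hair, likes_bananas):

    from hypothesis import given
    from hypothesis import strategies as st

    @given(st.booleans(), st.booleans())
    def test_other_flags_irrelevant_when_logged_in(green_hair, likes_bananas):
        # Metamorphic relation: once the user is logged in, varying the
        # other two inputs must not change the observable output.
        baseline = format_message(True, False, False)
        varied = format_message(True, green_hair, likes_bananas)
        assert baseline == varied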

A real-world example of this was when I designed a scheduling system that would compile a set of abstract rules down to concrete events. I used metamorphic testing to look for cases where "if I add a new rule to a calendar, the total observable complexity of the calendar should be greater than or equal to what it was before" (the equal case being when the new rule is equivalent to an existing one). This allowed me to ensure that the compilation wasn't dropping observable details.

[0] https://github.com/AnthonyLloyd/CsCheck


> var anyTrue = loggedIn | hasGreenHair | likesBananas;//the property to test

Does this do what I think it does?

This is terrible practice. Tests should not replicate the business logic of the production code.


Implementing a very cut-down and basic sketch of the business rules of a particular observable effect can allow you to cover a lot of cases in a very compact and digestible form without falling into the traps that your actual business rules do. This is fairly fundamental to property-based testing.

If you use literal expected results for all of your test cases, you're probably either not covering enough cases or making your tests so verbose and long that they will start to diverge in weird ways.


I was looking for a way to handle this some time ago, and found that the good people at NIST had already thought about it a lot more.

They made a tool called ACTS which lets you specify the variables that affect your testing, then generates a set of tests covering all combinations of just n variables at a time, where n is typically 2 or 3 but can be higher if you're automating the tests or have exceptional patience.

They have some empirical data on how the chance of finding additional bugs decreases as you increase n.
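
To get a feel for what n-way combination generation does, here is a rough greedy sketch of 2-way (all-pairs) coverage in Python; ACTS itself uses smarter algorithms, and the factors below are made up:

    from itertools import combinations, product

    # Hypothetical factors; two are boolean, one has three values.
    factors = {
        "logged_in": [True, False],
        "green_hair": [True, False],
        "locale": ["en", "de", "fr"],
    }
    names = list(factors)

    # Every (factor=value, factor=value) pair that must appear in some test row.
    required = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in factors[a]
        for vb in factors[b]
    }

    def pairs_in(row):
        return {((a, row[a]), (b, row[b])) for a, b in combinations(names, 2)}

    candidates = [dict(zip(names, vs)) for vs in product(*factors.values())]

    tests = []
    while required:
        # Greedily take the candidate row covering the most still-uncovered pairs.
        best = max(candidates, key=lambda r: len(pairs_in(r) & required))
        tests.append(best)
        required -= pairs_in(best)

    print(len(candidates), "exhaustive rows,", len(tests), "rows for all-pairs")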

More info here or by searching for NIST ACTS:

https://csrc.nist.rip/groups/SNS/acts/documents/comparison-r...


I feel like any time I've run into a situation like this, the function body is either a trivial `if` statement, or it's dependent on some complicated state.

In the former, truth-table testing doesn't really add much value, and in the latter, I'd reach for hypothesis testing/fuzzing.

That said, writing exhaustive truth-table test doesn't look so bad with table-driven tests, like:

    tests = [
        # logged_in, green_hair, likes_bananas, msg_shown
        (True, True, True, True),
        (True, True, False, False),
        ...
    ]
    for logged_in, green_hair, likes_bananas, msg_shown in tests:
        assert foo(logged_in, green_hair, likes_bananas) == msg_shown

> I'd reach for hypothesis testing/fuzzing.

These tend to be not so great at documenting intent and usage, though, which is the primary goal of your tests. Generally, I'd reach for both – offering a limited number of clearly defined tests to demonstrate common usage to future readers, while letting the fuzzing, etc. sort out the rest.
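
A rough sketch of that both-at-once approach with pytest and Hypothesis, assuming a hypothetical message_shown(logged_in, green_hair, likes_bananas) that returns whether the message appears, and the article's rule that any flag being set shows it:

    import pytest
    from hypothesis import given
    from hypothesis import strategies as st

    # A couple of named examples that document intent for future readers...
    @pytest.mark.parametrize("logged_in, green_hair, likes_bananas, expected", [
        (False, False, False, False),  # nothing set: no message
        (True, False, False, True),    # logged in alone is enough
    ])
    def test_documented_examples(logged_in, green_hair, likes_bananas, expected):
        assert message_shown(logged_in, green_hair, likes_bananas) == expected

    # ...and a property test to sweep the rest of the space.
    @given(st.booleans(), st.booleans(), st.booleans())
    def test_any_flag_shows_message(logged_in, green_hair, likes_bananas):
        expected = logged_in or green_hair or likes_bananas
        assert message_shown(logged_in, green_hair, likes_bananas) == expected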


This relates closely to a software test technique known as Equivalence class partitioning. https://en.m.wikipedia.org/wiki/Equivalence_partitioning

Good idea. Generalising: it's sampling mappings from arbitrary domains onto a binary range (the message appears, or it does not). This gets more complicated when the range isn't binary, but the underlying intuition, taking an informed statistical approach, is a necessary evil if tests are to keep pace with a program as its complexity grows.

Just like many other industries. I recently learned about inspectors who sample cacao beans at the warehouse for the many ways they can go wrong before being processed.

As someone who really appreciates strong examples to motivate learning generalised approaches, I think a variation of this article could be a great way to introduce people to parameter-based testing, or (closely related) fuzzing.


What happens when you need to make a modification, and now you have to re-learn the current set of test cases, versus just understanding the variables and adding a new test case for the matching combination?

This is generally how I first approach writing tests, however, there is a danger lurking here - sometimes these properties will be coupled in ways that are not immediately obvious and your tests will not catch it, occasionally causing weird problems elsewhere in the stack. Of course, you can always just add more tests after you uncover that, but this is always my deepest fear when approaching it in this manner. This is also a relatively binary/simple case - off/on. Business logic is frequently much more complicated than that.

I think you have to distinguish dependent variables from independent.

If all variables are independent, then the number of test cases is the max of the number of (relevant) values for each variable (i.e., you can combine cases).
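
A quick sketch of that combining idea (sometimes called each-choice coverage), with hypothetical factors:

    from itertools import zip_longest

    role = ["admin", "editor", "viewer"]
    theme = ["dark", "light"]
    locale = ["en", "de", "fr", "ja"]

    # 4 rows (the largest value count) instead of 3 * 2 * 4 = 24 combinations,
    # padding the shorter factors with their first value.
    rows = [
        (r if r is not None else role[0],
         t if t is not None else theme[0],
         loc if loc is not None else locale[0])
        for r, t, loc in zip_longest(role, theme, locale)
    ]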

When a variable depends on another, you have to combine them. But even then you can quantize the outcomes (i.e., which combinations result in different behaviors) and then partition the values to produce those combinations.

If you don't know if variables are dependent, you have to do some stochastic testing and find associations to investigate for causality, distinguishing confounders, etc. -- assuming that's cheaper than just running a full gamut of combinations.

Sometimes you can use grey-box indicators from the system itself to quantize behaviors, or define coverage as traversal through such indicated system states.

So, yeah: great start, but more to it...


Basically 'write fewer tests'

This reads like someone rationalising the slow loss of their religious faith


This is a fairly standard test coverage criterion used in industrial process control systems: test all normally expected behaviour, and all normal behaviour with any one given fault.

This sounds almost scientific, but it isn’t. It’s an arbitrary technique.

First, he speaks of permutations when he means combinations, and assumes that all variables are binary. Then he decides that eight test cases is “getting ridiculous.” Huh? It depends on how they are performed. No number is ridiculous a priori.

What he should be doing is thinking through the technical risk and his coverage goals. It may be that all pairs coverage is worth doing, or random testing, or exhaustive testing. Or testing based on sensitivity analysis. Or testing based on a decision table.

I don’t have a problem with simple heuristics, but the author does nothing to justify his personal preference.



