> "Use Fixed Data Instead of Randomized Data
> Avoid randomized data as it can lead to toggling tests which can be hard to debug and omit error messages that make tracing the error back to the code harder.
> They will create highly reproducible tests, which are easy to debug and create error messages that can be easily traced back to the relevant line of code."
Good article, but I don't get this one at all; it almost seems like an anti-pattern. Choosing fixed data instead of random because the results are more "reproducible" seems to miss the point. If random data eventually helps uncover more bugs, then it's worth using!
I have seen that people's response to randomized testing hinges on the answer to a single question: does the test log the input that led to a failure?
If the answer is no, then it's an immediate slam on the brakes. Tests aren't reproducible, they're flaky, this is impossible, etc.
If the answer is yes, then it's a revelation. The tests are finding me corner and edge cases I never would have thought of on my own, and I can enshrine and document them by creating a fixed-data unit test to cover them even before I start to work the bug (I am still doing TDD, after all). And hey, maybe I don't even have to think that much about the specific data, just its shape, and that will keep me honest and protect me from accidentally writing tautological tests. A short while later, you discover property testing, start achieving ever increasing test coverage with ever decreasing test code, have some sort of satori moment, tell your friends what you've discovered, and they tell you you're insane, randomized tests are flaky by definition, and flaky tests are Taboo. You slink away and change your legal name to Jonathan Livingston Tester.
I think the general idea is that you should be picking your input data to be as evil as possible, so that you're explicitly testing the edge cases; if you use random data you're not thinking about this aspect.
Fuzz testing is a separate thing, and should be done too. But it's (in my experience) supplemental to well-chosen hand-crafted input. It's hard to do TDD or ensure you have all your edge-cases covered if you're using randomized inputs.
For example, for datetimes, you can pick something nasty and unlikely like 1999-12-31T23:00 Pacific Time, which will read as a different day, month, and year if you have your timezone logic broken somewhere (e.g. you're parsing it as a naive UTC datetime somewhere). If you test randomly, that "dates at the end of the year parse incorrectly for 8 hours" bug is unlikely to be found.
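To make that concrete, here is a small java.time sketch of exactly this trap (the zone and timestamp match the example above; the class name is made up):

```java
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

public class TimezoneEdgeCase {
    // 1999-12-31T23:00 Pacific Time: one of the nastiest fixed inputs,
    // because the day, month, year and even century differ in UTC.
    public static ZonedDateTime pacificNewYearsEve() {
        return ZonedDateTime.of(1999, 12, 31, 23, 0, 0, 0,
                ZoneId.of("America/Los_Angeles"));
    }

    public static void main(String[] args) {
        ZonedDateTime pacific = pacificNewYearsEve();
        // Code that wrongly treats this local time as UTC (a "naive" parse)
        // would report a different date entirely:
        ZonedDateTime utc = pacific.withZoneSameInstant(ZoneOffset.UTC);
        System.out.println(pacific.toLocalDate()); // 1999-12-31
        System.out.println(utc.toLocalDate());     // 2000-01-01
    }
}
```

A fixed test pinned to this value fails loudly for most common timezone mistakes, whereas a random datetime only lands in the 8-hour danger window occasionally.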
If you're using generative testing, you'll most likely hit all those edge cases, and then some. I'm not sure what the situation is like in Java for doing these kinds of tests, but something QuickCheck-like is a godsend in general.
Actually, you can get reproducible tests even with randomized inputs, by setting the random seed (I always do that in e.g. QuickCheck). So this is really a moot point, not a good excuse not to do random testing (or not to use QuickCheck ;-P).
I don't believe any test frameworks in Java have built-in support for randomisation using a seed, so this is a foreign concept to most Java programmers. Which is a shame, because it's useful.
It would actually be really easy to package up seeded randomisation as a JUnit rule / extension. As far as I can tell, nobody has done that.
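Hand-rolling it is indeed only a few lines. Here's a minimal sketch using plain java.util.Random; the 'test.seed' property name and the helper itself are invented for illustration:

```java
import java.util.Random;

public class SeededRandomHelper {
    // Pick a fresh seed per run, log it, and allow replaying a failure
    // by passing the same seed back in via a system property.
    public static Random create() {
        String override = System.getProperty("test.seed");
        long seed = (override != null) ? Long.parseLong(override)
                                       : System.nanoTime();
        System.err.println("Random test seed: " + seed
                + " (rerun with -Dtest.seed=" + seed + ")");
        return new Random(seed);
    }

    // Demonstrates that a fixed seed reproduces the same data exactly.
    public static int sampleWith(long seed) {
        return new Random(seed).nextInt(1000);
    }
}
```

Wrapping this in a JUnit rule or extension would mostly be a matter of calling create() before each test and printing the seed when the test fails.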
Reproducibility is a subtle but important part of all parts of a build, including tests.
What happens, in my own experience, is that randomized data in tests has a few unplanned downsides.
First, it makes you think you're testing all possible values (since they all might be used, right?). But you aren't: you'll never run the test enough times to see every value. What you really should do is spend a few minutes thinking about the edge cases and ranges, and write multiple tests covering each edge.
The second downside I saw was "oh, the test failed, so I just reran it and now it passes". This is especially difficult as your team brings in new, less experienced people. Everyone is in a rush, and yeah, that test fails like 1 in 30 of its runs, but we don't have time to look into that right now. So you end up with a frustrating build that lets you continue when you have bugs, but sometimes fails.
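The first point, enumerating the edges deliberately, might look like this (clamp is a made-up system under test, chosen only because its boundaries are obvious):

```java
public class EdgeCaseTests {
    // A tiny function with boundary behaviour worth pinning down.
    static int clamp(int x, int lo, int hi) {
        return Math.max(lo, Math.min(hi, x));
    }

    // Instead of random inputs, cover each edge deliberately:
    // just below the range, both boundaries, just above, and the extremes.
    public static boolean edgesCovered() {
        return clamp(-1, 0, 10) == 0                  // just below lower bound
            && clamp(0, 0, 10) == 0                   // exactly at lower bound
            && clamp(10, 0, 10) == 10                 // exactly at upper bound
            && clamp(11, 0, 10) == 10                 // just above upper bound
            && clamp(Integer.MIN_VALUE, 0, 10) == 0   // extreme low
            && clamp(Integer.MAX_VALUE, 0, 10) == 10; // extreme high
    }
}
```

Each case documents which boundary it pins down; a uniformly random int would hit Integer.MIN_VALUE essentially never.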
Your unit tests are a spec. They specify your expectations of the system under test. If you use random data, then the only way you can make your spec accurate is to reimplement the program in your unit tests, also matching the spec. But this will incredibly tightly couple your code and your test, as changing one implementation will require you to change the other. Tests shouldn't be like that, imho.
Tests that use random input data are much more difficult to write correctly. Your test needs to know the expected result not just for one case, but for every possible case. That will vastly increase the number of bugs in your test code, leading to a seemingly endless stream of false positives.
The worst part is that the feedback is late. The test will randomly fail long after it was written. There's a lot of needless overhead in relearning the context of the failing code just so you can fix a test that is not general enough for the data it was provided.
There are ways to effectively use randomly-generated test data, but it's harder than you'd think to do it right.
Tests with random inputs can be much easier to write. For example, here's a test for a single hard-coded example:
public boolean canSetEmail() {
    User u = new User(
        "John",
        "Smith",
        LocalDate.of(2000, Month.JANUARY, 1),
        new Email("john@example.com"),
        Password.hash("password123", new Password.Salt("abc"))
    );
    Email newEmail = new Email("smith@example.com");
    u.setEmail(newEmail);
    return u.getEmail().equals(newEmail);
}
Phew! Here's a randomised alternative:
public boolean canSetEmail(User u, Email newEmail) {
    u.setEmail(newEmail);
    return u.getEmail().equals(newEmail);
}
Not only is the test logic simpler and clearer, it's much more general. As an added bonus, when we write the data generators for User, Email, etc. we can include a whole load of nasty edge cases and they'll be used by all of our tests. I've not used Java for a while, but in Scalacheck I'd write a generator that assembles an Email out of random 'user', 'domain' and 'tlds' parts.
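The Scalacheck snippet isn't reproduced here, but the shape of the idea, an Email generator assembled from random 'user', 'domain' and 'tlds' parts, can be sketched in plain Java (every name below is hypothetical):

```java
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class EmailGen {
    // nonEmpty: at least one character; characters are deliberately nasty
    // (any ASCII value, including the null byte) to flush out assumptions.
    static String nonEmptyString(Random rng, int maxLen) {
        int len = 1 + rng.nextInt(maxLen);
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append((char) rng.nextInt(128));
        }
        return sb.toString();
    }

    public static String genEmail(Random rng) {
        String user = nonEmptyString(rng, 20);
        String domain = nonEmptyString(rng, 20);
        // 'tlds' may be empty, so addresses like 'root@localhost' come up too.
        String tlds = Stream.generate(() -> nonEmptyString(rng, 5))
                .limit(rng.nextInt(3))
                .collect(Collectors.joining("."));
        return user + "@" + domain + (tlds.isEmpty() ? "" : "." + tlds);
    }
}
```

Because every test that needs an Email draws from this one generator, adding a new nasty case here upgrades the whole suite at once.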
The other advantage is that automated shrinking does a pretty good job of homing in on bugs. For example, if our test breaks when there's no top-level domain (e.g. 'root@localhost') then (a) that will be found pretty quickly, since 'tlds' will begin with a high probability of being empty, and (b) the other components will be shrunk as far as possible, e.g. 'user' and 'domain' will shrink down to a single null byte (the "smallest" String which satisfies our 'nonEmpty' test); hence we'll be told that this test fails for \0@\0 (the simplest counterexample). We'll also be given the random seeds used by the generators.
That generator function illustrates exactly the problem I'm talking about. The maximum length of a string in Java is 2^31-1 chars (UTF-16 code units). If 'user' is an arbitrary string, then it could be 2^31-1 chars long. If 'domain' is also an arbitrary string, then it can also be 2^31-1 chars long. When you concatenate them and exceed the maximum string length, you will cause a failure in the test code.
There are almost always constraints within the test data, but they're complex to properly express, so they aren't specified. Then one day, the generator violates those unstated constraints, causing the test to fail.
> one day, the generator violates those unstated constraints, causing the test to fail
Good, that's exactly the sort of assumption I'd like to have exposed. As a bonus, we only need to fix this once, in the generators, and all the tests will benefit. I've hit exactly this sort of issue with overflow before, where I made the mistaken assumption that 'n.abs' would be non-negative.
In this case Scalacheck will actually start off generating small/empty strings, and try longer and longer strings up to length 100.
This is because 'arbitrary[String]' is built on 'Gen.stringOf', which respects the generator's current size.
The "size" of a generator starts at 'minSize' and grows to 'maxSize' as tests are performed; this ensures we check "small" values first, although generators are free to ignore the size if they like.
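That growth schedule amounts to simple interpolation between the two bounds. The sketch below is illustrative arithmetic, not Scalacheck's actual code:

```java
public class GenSize {
    // Test i of n runs with a size interpolated between minSize and
    // maxSize, so early tests use small inputs and later tests large ones.
    public static int sizeFor(int testIndex, int totalTests,
                              int minSize, int maxSize) {
        if (totalTests <= 1) return maxSize;
        return minSize + (maxSize - minSize) * testIndex / (totalTests - 1);
    }
}
```

With, say, minSize 0 and maxSize 100 over 100 tests, the first test gets size 0 (empty strings) and the last gets size 100.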
> Tests that use random input data are much more difficult to write correctly.
Interestingly, I personally find them easier to write. I actually find classic unit tests hard to write, probably because I am painfully aware of the lack of coverage.
While with property-based testing, I start from the assumption I have on what the code should do. Then the test basically verifies this assumption on random inputs.
Doing unit tests with a given fixed input seems backwards to me; it's like a downgrade, because I always start from what kind of assumption I have, and based on that I choose the input. And why not encode the assumption, when you already have it in your mind anyway?
Your implementation is necessarily complex. That's why it may have bugs, and why it needs tests.
You have many more tests than implementations. In my experience, ~20x more. If your tests had bugs at the same rate as your implementation, you'd spend 95% of your time fixing test bugs and 5% fixing implementation bugs. That's why tests should be simple.
If you're going to be spending that much time on validating assumptions, I think you're better off trying to express them formally.
I think I disagree, but it really depends what you mean by "test" or "test case". I assume that a test case is: for a given input, expect a certain output; while a test verifies a certain assumption, such as: for a certain class of inputs you get a certain class of outputs.
I believe that you always test two implementations. For example, if I have a test case for a function sin(x), then I compare with the calculator implementation from which I got the expected result. So if the tests are to be comprehensive (and automatically executed), then they have to be another implementation of the same program; you can't avoid it, and you can't avoid (potentially) having bugs in it.
Now, the advantage is that the test implementation can be simpler (in certain cases); or can be less complete, which means less bugs, but also (in the latter case), less comprehensive testing.
In any case, you're validating the assumptions. The assumptions come from how the test implementation works (sometimes it is just in your head). And to express them formally, of course, that's the whole point.
For example, if you're given an implementation of sin(x) to test with, you can express formally the assumption that your function should give a similar result.
By formalizing this assumption, you can then let the computer create the individual test cases; it is a superior technique than to write test cases by hand.
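For the sin(x) example, the formalized assumption could look like this: a second, deliberately independent implementation (a short Taylor series) is compared against the library's Math.sin on many generated inputs. The tolerance and term count here are arbitrary choices:

```java
public class SinProperty {
    // Deliberately independent implementation: Taylor series of sin(x).
    static double taylorSin(double x) {
        double term = x, sum = x;
        for (int n = 1; n <= 20; n++) {
            term *= -x * x / ((2 * n) * (2 * n + 1));
            sum += term;
        }
        return sum;
    }

    // The property: both implementations agree within a tolerance
    // for many random inputs drawn from [-pi, pi].
    public static boolean agreesWithLibrary(long seed, int trials) {
        java.util.Random rng = new java.util.Random(seed);
        for (int i = 0; i < trials; i++) {
            double x = (rng.nextDouble() * 2 - 1) * Math.PI;
            if (Math.abs(taylorSin(x) - Math.sin(x)) > 1e-9) return false;
        }
        return true;
    }
}
```

Neither implementation is trusted on its own; the property only asserts that they agree, which is exactly the "two implementations" framing above.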
You should test the edge cases explicitly instead of hoping the randomization will save you. If there's some bounded set of values, like an enum, then test every value instead of randomly picking things and hoping for the best. I don't want to know eventually. I want to know now.
> I don't want to know eventually. I want to know now.
That would be nice, but it's not the choice we're facing. The choice is between knowing eventually (by randomising) or knowing never (with determinism); or alternatively, between definitely finding out at 3AM when there's a production outage, or possibly finding out during testing.
Test suites get run a lot, so even a small chance of spotting cases we hadn't thought of can be worthwhile.
Also, it's much easier to run a randomised test with specific examples than it is to run a hard-coded test with randomised inputs (this is because "randomised tests" are actually parameterised tests, which take test data as arguments). Hence we might as well write the randomised version, then also call it with a bunch of known edge-cases (in QuickCheck-style frameworks this is just a function call, in Hypothesis we can also use the '@example' decorator).
If we go down this route there are also automated approaches to make things easier, e.g. Hypothesis maintains a database of counterexamples that it's found in the past, which it mixes into its random data generators. We can ask for these values and use them as explicit examples if we like.
In my experience writing "randomised tests" (i.e. property checking) is much easier and far more powerful than writing lots of hard-coded examples. I've done this in Haskell with QuickCheck, Scala with Scalacheck, Python with Hypothesis, Javascript with JSVerify and I hand-rolled a simple framework when I wrote PHP many years ago. Occasionally I find the urge to sprinkle a few hard-coded tests into the suite, but it rarely seems worth it.
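The "randomised tests are really parameterised tests" point can be shown in miniature: one property function, driven both by hand-picked edge cases and by seeded random inputs. The property itself is a made-up example:

```java
public class ParamAndRandom {
    // An illustrative property: reversing a string twice is the identity.
    static boolean reverseTwiceIsIdentity(String s) {
        String twice = new StringBuilder(s).reverse().reverse().toString();
        return twice.equals(s);
    }

    public static boolean runAll() {
        // Known edge cases, passed in directly like any other arguments:
        String[] edges = { "", "a", "\0", "ab\0cd" };
        for (String e : edges) {
            if (!reverseTwiceIsIdentity(e)) return false;
        }
        // Plus random inputs from a fixed (loggable) seed:
        java.util.Random rng = new java.util.Random(7L);
        for (int i = 0; i < 100; i++) {
            byte[] bytes = new byte[rng.nextInt(16)];
            rng.nextBytes(bytes);
            if (!reverseTwiceIsIdentity(new String(bytes))) return false;
        }
        return true;
    }
}
```

Going the other way, retrofitting random inputs onto a test with hard-coded data, would require rewriting the test; here it's just another loop over the same function.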
I'm not sure who told you that non-random tests mean non-parameterized tests, but I don't think you'll get a lot of pushback on parameterized tests from any of the commenters here.
The point is you should write exhaustive tests such that you couldn't imagine the randomized test finding anything new, especially on bounded sets of inputs. If you're not writing exhaustive tests because of a random strategy then yes, the choice is exactly between knowing immediately or later.
I've seen novices assume random tests are exhaustive even when they could think of several edge cases.
Lucene has randomized test cases, but the framework also prints out the (random) seed used at the beginning, so if you find an interesting scenario, you can reproduce it by using the same seed again.
+1. I personally like using random data if the test is concerned with only one input, but I avoid random data when checking e.g. pairs of numbers because of the slight chance that the random numbers would be unexpectedly equal.