The Design and Use of QuickCheck (begriffs.com)
147 points by begriffs on June 12, 2017 | 36 comments



One of my favorite examples of property-based testing is in the docs for Haskell's Text.ParserCombinators.ReadP module. The semantics of the combinators are succinctly (but fully) documented in the form of QuickCheck properties.

https://hackage.haskell.org/package/base-4.9.1.0/docs/Text-P...
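
For a flavour, here's one of those laws restated as a runnable property (a sketch adapted from the documented law for `get`; the docs compare result sets, but `get` is deterministic so plain equality works):

    import Test.QuickCheck
    import Text.ParserCombinators.ReadP

    -- The documented behaviour of `get`: it consumes exactly one
    -- character and fails on empty input.
    prop_get :: String -> Bool
    prop_get s = readP_to_S get s == case s of
      []     -> []
      (c:s') -> [(c, s')]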


The first line sounds like hubris, but amazingly it's actually true. QuickCheck makes extremely good (not over-complex or 'clever') use of return type polymorphism and applicatives.

Property testing is a good idea in any language, but it's particularly elegant in Haskell.
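
A tiny sketch of both features at work (the User type here is hypothetical): `arbitrary` takes no arguments, so the instance is selected purely by the result type the caller demands, and generators compose applicatively:

    import Test.QuickCheck

    -- arbitrary :: Arbitrary a => Gen a has nothing to dispatch on but
    -- its *result* type; the annotation alone picks the Int and Bool
    -- instances here.
    pairGen :: Gen (Int, Bool)
    pairGen = (,) <$> arbitrary <*> arbitrary

    -- The same applicative style scales up to whole records:
    data User = User { name :: String, karma :: Int } deriving Show

    instance Arbitrary User where
      arbitrary = User <$> arbitrary <*> arbitrary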


Also: it's the coolest example of type classes Simon Peyton Jones knows, and he explains why here (around 34-35 min):

https://youtu.be/6COvD8oynmI


This seems to be the way to go for testing.

I've wanted to use it for my current React Native project. I found a JS implementation AND a Babel plugin that creates the tests from Flow types.

The question for me is: how long does this take compared to regular testing? I've read that people complained about AVA because of poor test performance (it runs many tests in parallel).

Also, where to put it in the chain?

Should I put it pre-push, pre-build, somewhere in the CI?


Can you please provide the link to the Babel plugin that creates the random values from Flow types?



Has anyone here converted from 'regular' unit testing to quickcheck, or other property-based unit testing libraries? I'm wondering if it's worth taking the time out to learn, but I have my reservations regarding non-deterministic testing.


Some property testing tools like Python's Hypothesis[1] allow specifying explicit example values for properties in addition to the generated ones, so you get some specific deterministic tests. Hypothesis also saves falsifying values it found previously in a database that it reads from the next time you run it.

[1] http://hypothesis.works/


I'm curious how the database works with, say, external CI systems like Travis and on multi-developer projects? Is it (or can it be, sensibly) committed to the repository or otherwise persistent with/near the code so that it transfers across machines and everyone gets the same testing environment?

Of course, with randomised testing, there's an inherent non-reproducibility, so maybe this isn't as unfortunate as it sounds?


For CI systems like Travis, people add it to the cached directories, and it's shared between runs. I know Travis, Circle and AppVeyor all have some way to cache data between runs – nominally for dependencies, but .hypothesis works too.

According to our docs (http://hypothesis.readthedocs.io/en/latest/database.html?hig...), you can check the examples DB into a VCS and it handles merges, deletes, etc. I don't know anybody who actually does this, and I've never looked at the code for handling the examples database, so I have no idea how (well) this works.

If tests do throw up a particularly interesting and unusual example, we recommend explicitly adding it to the tests with an `@example` decorator, which causes us to retest that value every time. Easier to find on a code read, and won't be lost if the database goes away.

(Disclaimer: I'm a Hypothesis maintainer)


I think the default storage format Hypothesis uses is a flat file with a diff-friendly format so it's easy for developers to check it into source control, and it's easy for patches to update the database without exploding the git repo size due to giant binary diffs. Sqlite3 might also be an option but I'm not up to date on the details. As a neat side effect of the diff-friendly format, it's easy to review new falsifying inputs added to the database in pull requests.


> a diff-friendly format ... it's easy for patches to update the database without exploding the git repo size due to giant binary diffs

Interesting - I understood that Git stores whole files, not diffs, so I'm surprised this is a significant feature.


I'm pretty sure Git doesn't store whole files every time: loose objects are full snapshots, but packfiles delta-compress similar objects.


I use QuickCheck for all of my testing whenever I'm using Haskell (along with tasty-quickcheck). Note that property testing can also be used for unit testing; just don't use any variables in the properties :)
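
For instance, a property with no quantified variables is just an ordinary unit test, and wrapping it in `once` keeps QuickCheck from pointlessly re-running it 100 times (a minimal sketch):

    import Test.QuickCheck

    -- No variables, so every "run" is identical: a unit test in
    -- property clothing.
    prop_sumExample :: Property
    prop_sumExample = once (sum [1, 2, 3] === (6 :: Int))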

The rule seems to be that about 1/3 of the problems it finds are legitimate bugs in the codebase, 1/3 are problems with the property (e.g. stating something a bit too strong), and 1/3 are problems with the data generators (e.g. not satisfying some invariant).

The latter 2/3 can either be seen as overhead, or as fixing bugs in one's understanding of the codebase. Usually it's somewhere in between.


I also find that at times the middle third, where your statement is a bit too strong, can reveal errors in the specification. I ran into this when developing a QuickCheck for ActionScript 3: trying it out on a library I thought was well tested, I found that a general rule I expected to hold didn't. We actually had an inconsistency in the specification, and had coded to it, which caused an odd user experience in that edge case.


I've done a fair bit of testing with ScalaCheck. I do think it's useful to define "generators" for your domain model(s), because without them, unit tests can end up being mostly data-creation code. With your generators out in their own (testing) package, and named appropriately, you can get quite a bit of mileage out of them, even if you don't take advantage of the property-based testing framework beyond that. Something like:

    object Generators {

      val commentsGen: Gen[Comment] = ???

      val userWithTwoCommentsGen: Gen[User] =
        Gen.listOfN(2, commentsGen).map(comments => new User(comments))

      val userWithNoCommentsGen: Gen[User] = Gen.const(new User(List.empty))

      val userGen: Gen[User] = Gen.listOf(commentsGen).map(comments => new User(comments))
    }
And then your test code can get right to the point, so to speak (using ScalaTest's generator-driven property checks for forAll):

    "Users with comments" should "karma equal to the sum of their posts" in {
      val users = Generators.userGen.take(10).foreach(u =>
        u.karma should be (u.comments.map(_.points).sum)
      )
    }


It looks like what you really need is a fixture factory.


In a lot of ways, generators are like a cousin of factories. Fixtures are usually static though, and so they don't help you explore the problem space in the same way as generators do.


sounds like https://factoryboy.readthedocs.io/en/latest/ (for Python, based on a Ruby lib)

I had never thought about the connection between that and QuickCheck.


I never really understood the arguments against "non-deterministic" testing. A property based test will check more cases, and therefore detect more bugs, than the corresponding hand-chosen unit tests (in which you probably forgot at least 2 of the important corner cases).

All determinism guarantees is that you'll either reliably detect a bug or reliably not detect a bug, and hence get a nice aesthetic line of green checkmarks followed by red crosses in your test history. Or just get a nice looking line of green and not even know the bug is there. This seems like a small thing to sacrifice to detect more bugs.

Of course, regular unit tests have their place too, when your specification can't really be expressed as a "forall" property.


> All determinism guarantees is that you'll either reliably detect a bug or reliably not detect a bug, and hence get a nice aesthetic line of green checkmarks followed by red crosses in your test history. Or just get a nice looking line of green and not even know the bug is there. This seems like a small thing to sacrifice to detect more bugs.

The situation you want to avoid is this:

Run tests, find bug.

Try and fix bug.

Rerun tests, tests pass.

Deploy fixes to production.

Later find out that the bug is not fixed, your second run of tests simply didn't hit the right combination.


We print the random seed and allow tests to take a fixed seed. This lets us use randomly generated data and still have deterministic results for reproducing failed cases.
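
In Haskell QuickCheck, for instance, the replay hook looks roughly like this (a sketch; test runners like tasty-quickcheck expose a similar replay flag):

    import Test.QuickCheck
    import Test.QuickCheck.Random (mkQCGen)

    -- Re-run a property from a known seed (and starting size 0), so a
    -- failure seen in CI can be reproduced locally.
    replayWithSeed :: Testable prop => Int -> prop -> IO ()
    replayWithSeed seed =
      quickCheckWith stdArgs { replay = Just (mkQCGen seed, 0) }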


Yep, this is a really useful approach. I highly recommend that anyone here thinking of trying this stuff out take note of the seed (or ensure it's printed): by the first time you realise you need it, it's often too late!


I've used the same technique when running randomly generated traffic when testing telecom systems. Very useful to be able to re-run the same random sequence!


It's the second case that's more important: if you find a bug in production, and the test that covers that functionality is green, then there's a problem with your test suite.

Truth is, there's no need for an either/or holy war about this. The two styles are easily mixable: property based testing for broad coverage, specific runs for specific coverage.


A few good points have already been made in the other responses, and I want to add an additional observation: To me, the core of the issue is not in choosing between "non-deterministic" testing vs. a tiny set of hand-chosen examples.

Rather, to me, the issue is exhaustive testing vs. non-exhaustive testing. Typically, critical issues in complex systems only arise extremely rarely. Consequently, randomized testing typically does not bring them to light. There may definitely be cases where it is useful, and it can certainly easily be more exhaustive than a small hand-picked set of unit tests. But the core issue is still that I typically need exhaustive tests to be more certain about relevant properties.
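
When the domain is small enough, you can also enumerate it outright instead of sampling, which turns the test into a brute-force proof (a Haskell sketch):

    -- Exhaustive over all four combinations: a pass here is a proof,
    -- not a sample.
    prop_deMorgan :: Bool
    prop_deMorgan = and [ not (a && b) == (not a || not b)
                        | a <- [False, True], b <- [False, True] ]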


I've found QuickCheck fantastic when building anything that takes user input. It will find some amazingly specific strings that break that one library you use in ways nobody thought to test, or that expose code which unexpectedly shells out.

But it isn't the answer to everything- sometimes you really want to test just one case.


It's not one or the other, IMO. They're both useful tools. Unit tests for the bulk of my testing, and then property tests to do some light fuzzing against properties I want to hold.


One thing to remember about assertion-based methods is what they can be used for:

1. Unambiguous specifications [vs human language] of exactly what the function expects, should maintain, and should output. This is the basis of the Design-by-Contract method of verification in Eiffel (the original), Ada, and SPARK; a property-style sketch of this appears after the list.

2. The ability to check the specification itself for inconsistencies, if it's written in a formal specification tool. An example is where unique types are defined for miles and kilometers that require a conversion before being used in a same-sized variable. That might have prevented a rocket from being destroyed.

3. The ability to formally prove, through automated or interactive means, that your code has those properties in all cases, rather than just the cases you tested or your fuzzer came up with. Frama-C, Java with JML, and SPARK Ada can do this, with SPARK the champion so far. It's hard and limits how you can express the problem, or which problems it can handle, but it has the highest payoff in correctness. Anything unproven can be handled by a runtime check the tool might even insert for you.

4. Academic and commercial tools exist that can automatically generate tests from those specs. So you get tests covering whatever you can specify, automatically, with no human effort, and can save your brain for the stuff the tools can't handle.

5. The ability to do equivalence checks more easily on annotated (i.e. constrained) programs, which can help show that optimization or compilation didn't break your program.

6. The annotations might also be used for synthesizing programs or optimizing them even more than usual.

So, these are some benefits that come from leaning toward formal specs of, or within, code instead of just manual unit testing.
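
To make point 1 concrete, a pre/postcondition pair maps directly onto a property (a sketch with a hypothetical withdraw function):

    import Test.QuickCheck

    -- Contract: requires 0 <= amount <= balance;
    --           ensures  result == balance - amount.
    withdraw :: Int -> Int -> Int
    withdraw balance amount = balance - amount

    prop_withdrawContract :: Int -> Int -> Property
    prop_withdrawContract balance amount =
      (amount >= 0 && amount <= balance) ==>          -- assume the precondition
        withdraw balance amount === balance - amount  -- check the postcondition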


I'm a big fan of design by contract, at least as it's implemented in Racket (blame at module boundaries and higher-order contracts). I wonder how QuickCheck-style testing slots into the picture.


I don't see it as a 'conversion' per se. I've been doing something similar to property checking for years in various tests, so I don't think this is necessarily a new 'discovery' but more a refinement and codification of an existing, long-standing (decades-old) practice.


I use ScalaCheck all the time. I love defining my tests in terms of properties, and in some cases I have a real, mathematical proof that my code does what I want (only possible when the domain of possible values is small enough, that's true).


In Go's standard library there is a basic QuickCheck facility, but apparently it has only ever been imported by 20 open-source packages (see the bottom of the page):

https://godoc.org/testing/quick


We've used it in at least one internal project. I found it to lessen readability and make maintenance harder. It could be the way things were set up. I like the basic idea, but that lib just wasn't doing it for me. We got close to something usable with test case generators where you replaced the intended bad value(s). There ended up being too much indirection for my taste for too little reward.


Actually, that 20 number probably relates to oddball uses of it outside of *_test.go files. Adoption in tests is likely decent after all.


You might also want to look into https://clojure.org/about/spec



