
Supporting Hypothesis - darwhy
https://stripe.com/blog/hypothesis
======
laughinghan
This is awesome, I'm excited to see more attention being paid to property-
based testing, and MacIver deserves to be rewarded and supported for his past
and ongoing work on Hypothesis.

When I first encountered property-based testing, I had some trouble coming up
with interesting properties to test, since I was so used to thinking in terms
of individual test cases. This blogpost surveys some great starting points for
identifying useful properties to test:
[https://fsharpforfunandprofit.com/posts/property-based-
testi...](https://fsharpforfunandprofit.com/posts/property-based-testing-2/)
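
To make "properties" concrete, here is a minimal sketch of two common kinds, a round trip and an idempotence check. It uses only the standard library rather than Hypothesis itself, and every name in it (`random_record`, `check_round_trip`, `run_checks`, ...) is made up for illustration:

```python
import json
import random

def random_record(rng):
    """Hypothetical generator: a small dict of string keys to ints."""
    keys = [rng.choice("abcde") * rng.randint(1, 3) for _ in range(rng.randint(0, 5))]
    return {key: rng.randint(-1000, 1000) for key in keys}

def check_round_trip(record):
    # Round trip: decoding an encoded value gives back the original.
    return json.loads(json.dumps(record)) == record

def check_sort_idempotent(values):
    # Idempotence: sorting an already sorted list changes nothing.
    once = sorted(values)
    return sorted(once) == once

def run_checks(seed=0, runs=200):
    """Check both properties against `runs` randomly generated inputs."""
    rng = random.Random(seed)
    for _ in range(runs):
        record = random_record(rng)
        assert check_round_trip(record), record
        values = [rng.randint(-50, 50) for _ in range(rng.randint(0, 10))]
        assert check_sort_idempotent(values), values
    return runs
```

With a property-based testing library, the hand-written loop and generator would be replaced by the library's own generators, and a failing input would be shrunk before being reported.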

~~~
m-j-fox
Sure but it's a lot of hype for porting QuickCheck to Python. So glad the rest
of you caught on 10 years later.

~~~
StavrosK
If only you had taken the initiative to do it earlier!

------
StavrosK
Hypothesis is one of the libraries I _aim_ to be using in every project. If my
application doesn't allow for that (because it's all views/HTTP/etc), I
refactor various things into their own (side-effect-free functions) so I can
then use Hypothesis.
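
A rough sketch of that kind of refactoring, assuming a hypothetical paginated list view: the parsing and clamping logic moves into a pure helper, and the view becomes a thin wrapper. The names `clamp_page_params`, `list_items_view` and `MAX_PAGE_SIZE` are invented for this example.

```python
MAX_PAGE_SIZE = 100

def clamp_page_params(raw_page, raw_size):
    """Pure helper: turn raw query-string values into safe integers."""
    def to_int(text, default):
        try:
            return int(text)
        except (TypeError, ValueError):
            return default  # missing or malformed input falls back to the default

    page = max(1, to_int(raw_page, 1))
    size = min(MAX_PAGE_SIZE, max(1, to_int(raw_size, 20)))
    return page, size

def list_items_view(query, items):
    """Thin view-like wrapper; the logic worth testing lives in the helper."""
    page, size = clamp_page_params(query.get("page"), query.get("size"))
    start = (page - 1) * size
    return items[start:start + size]
```

The helper now has an obvious property to check against arbitrary input strings: `page >= 1` and `1 <= size <= MAX_PAGE_SIZE`, whatever the caller sends.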

Plus, it's the only thing that doesn't suffer from the "you can't find the bug
you just introduced" disease, where you're (by definition) blind to the
problems you don't know are there. We usually tend to test the things we wrote
the code to guard against, but Hypothesis finds the actual bugs.

Great work.

------
fermigier
The python code in the post looks a bit funny with the `end` statement at the
end. I had to read it twice to be sure that Hypothesis hadn't been rewritten
for Ruby!

Also, by "supporting Hypothesis", you mean "giving money to DR MacIver",
right? This is not completely clear from the post.

~~~
slackoverflower
Lol. Not that surprised since Stripe's backend is basically still Ruby on
Rails. Probably a Ruby engineer that wrote the article, writing Python for the
first time in a while.

~~~
sritchie
I'm the guilty party, though not of the "Ruby engineer" title... these days
I'm a Python developer that identifies as a Scala developer :)

I botched the copy pasta of the embedded code snippet container. All fixed up!

------
hprotagonist
this is great news.

Hypothesis is a quickcheck port to python, and it is immensely useful in
finding edge case bugs that naive testing usually misses.

It's lovely to see it get more support and backing.

~~~
DRMacIver
> Hypothesis is a quickcheck port to python

If you'll pardon the pedantry (I think the distinction matters, honest),
Hypothesis isn't really a QuickCheck port. It started out life that way, and
it can be used in a way that is pretty compatible with QuickCheck, but
implementation-wise it's very different and it has a bunch of interesting non-
QuickCheck features.

I totally agree that this is great news though, and I'm very grateful to
Stripe. :-)

~~~
bennofs
> very different and it has a bunch of interesting non-QuickCheck features.

I'm only familiar with QuickCheck, but what are those differences / features?
Couldn't tell from a quick look at the website of Hypothesis.

~~~
DRMacIver
Internally, Hypothesis is built on a core engine called Conjecture, which
looks more like a byte stream fuzzer than anything else. Hypothesis data
generators are then effectively parser combinators on top of that fuzzer.

This gives Hypothesis a lot more freedom to generically manipulate the data
than QuickCheck has, because it has total control of a concrete representation
of it.

As a result:

* Specifying a data generator is a much more declarative process - Hypothesis largely handles size/distribution issues itself.

* The fuzzing nature of Hypothesis means that it can do much smarter generation than a normal QuickCheck (though this is more potential than actual right now. The current fuzzer is "pretty good". I'll be starting a PhD soon where I hope to work on making it amazing)

* Shrinking is just a built in part of the process and users never have to define their own shrinker.

* All examples (both pre and post shrinking) can be serialized because the IR is just bytes, so Hypothesis can replay failing tests automatically without rerunning the shrinking process. This matters both because shrinking tends to be slower than generation in most QuickChecks, and because in general shrinking is subject to "slippage" where the bugs you find after shrinking are different from the bugs you started with.

* Because you can just keep drawing data from the stream, Hypothesis tests can be much more interactive than normal QuickCheck (which does support mixing test execution and generation, but it doesn't work especially well).

* Because Hypothesis never needs to touch the values it generates, Hypothesis is much better for mutable data than the approach most QuickChecks take.

* I haven't actually done this yet, but Conjecture is deliberately designed to be quite C like, so "at some point" it will become possible to rewrite the core in C (more likely Rust) and then new Hypotheses will spring up everywhere because 90% of the work can be done by writing bindings to a C library. QuickCheck's approach on the other hand is inherently very tied to the semantics of the host language.

The property-based testing libraries for some dynamic languages (test.check
for Clojure, the various Erlang ones including QuickCheck) have some of these
features, but I think they're more naturally supported in the Hypothesis
model, and the Hypothesis model has a lot more room to grow.
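
A toy sketch of the byte-stream idea described above. This is not Conjecture's actual code: all names (`ByteSource`, `int_list`, `find_failure`, `shrink`, `sum_is_small`) are invented for illustration, and the byte layout is an arbitrary choice. It only shows the shape of the approach: generators parse values out of a buffer, and shrinking works on the buffer without knowing what type the test draws.

```python
import random

class Overrun(Exception):
    """The test asked for more bytes than the buffer holds."""

class ByteSource:
    """Hands out bytes from a fixed buffer; every generator reads from it."""
    def __init__(self, buffer):
        self.buffer = bytes(buffer)
        self.index = 0

    def draw_byte(self):
        if self.index >= len(self.buffer):
            raise Overrun()
        value = self.buffer[self.index]
        self.index += 1
        return value

def int_list(source):
    """Generator as a parser: a 'continue' byte, then one byte per element."""
    result = []
    while source.draw_byte() >= 64:
        result.append(source.draw_byte())
    return result

def fails(buffer, test):
    """True if `test` returns False on the data parsed from `buffer`."""
    try:
        return not test(ByteSource(buffer))
    except Overrun:
        return False  # ran out of bytes, so this is not a valid example

def find_failure(test, seed=0, attempts=1000, size=32):
    """Plain random search over buffers; returns a failing one or None."""
    rng = random.Random(seed)
    for _ in range(attempts):
        buffer = bytes(rng.randrange(256) for _ in range(size))
        if fails(buffer, test):
            return buffer
    return None

def shrink(buffer, test):
    """Type-agnostic shrinking: delete or lower single bytes until stuck."""
    buffer = bytes(buffer)
    improved = True
    while improved:
        improved = False
        for i, byte in enumerate(buffer):
            candidates = [buffer[:i] + buffer[i + 1:]]  # drop this byte
            if byte > 0:
                for lower in (0, byte // 2, byte - 1):  # or make it smaller
                    candidates.append(buffer[:i] + bytes([lower]) + buffer[i + 1:])
            winner = next((c for c in candidates if fails(c, test)), None)
            if winner is not None:
                buffer, improved = winner, True
                break
    return buffer

def sum_is_small(source):
    """Deliberately false property: lists never sum to 300 or more."""
    return sum(int_list(source)) < 300
```

Calling `shrink(find_failure(sum_is_small), sum_is_small)` yields a short buffer that still fails, and because that buffer is just bytes it can be stored (for example as `minimal.hex()`) and replayed later with `int_list(ByteSource(bytes.fromhex(...)))` without shrinking again. Note that `sum_is_small` never defines a shrinker of its own.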

~~~
feanaro
I understand what you mean on most of these, but I'd appreciate some
clarification on the following.

> The fuzzing nature of Hypothesis means that it can do much smarter
> generation than a normal QuickCheck (though this is more potential than
> actual right now. The current fuzzer is "pretty good". I'll be starting a
> PhD soon where I hope to work on making it amazing)

Smarter in what way?

> Because you can just keep drawing data from the stream, Hypothesis tests can
> be much more interactive than normal QuickCheck (which does support mixing
> test execution and generation, but it doesn't work especially well).

What makes QuickCheck's support for this less than ideal?

> Because Hypothesis never needs to touch the values it generates, Hypothesis
> is much better for mutable data than the approach most QuickCheck takes.

What do you mean by not having to "touch" the values it generates?

~~~
DRMacIver
> Smarter in what way?

Better at generating values that exhibit interesting behaviour in tests.
QuickCheck style testing mostly only works well because it turns out that
there are a lot of bugs that are relatively "dense" in the search space of
tests, and doesn't do very well at finding hard to reach bugs. This means that
some bugs are found with very low probability, which is bad both because you
want to find them reliably and because it means that even running your tests
for longer is often not enough to find interesting behaviour.

(ETA: This mostly applies to Haskell QuickCheck and derivatives. I believe the
Quviq Erlang QuickCheck has had a great deal of hand tuning of its generators
to get better behaviour, so its data generation is probably strictly better
than Hypothesis's in many cases right now. I'm trying to get a generic
mechanism for improving things without requiring this hand tuning, so
hopefully at some point I'll be able to reverse that situation)

Security-oriented fuzzers do a lot of clever things to actually determine the
shape of the search space and adapt to it so that they can take advantage of
the structure of the program under test so that running for longer gives them
more power than just repeatedly trying the same thing over and over again. I'm
hoping to incorporate some of those ideas into Hypothesis but don't currently.

> What makes QuickCheck's support for this less than ideal?

Mostly that it plays very badly with shrinking (this is better but still not
good in test.check and friends) and the API for it is on the clunky side.

> What do you mean by not having to "touch" the values it generates?

In classic QuickCheck, the shrink API is based on taking a value and replacing
it with a simpler version of itself. This means that if the value has been
mutated (which is mostly not a problem in Haskell, but can be if you're using
ioProperty, and is definitely a problem in QuickCheck ports to impure
languages) then you run into problems. e.g. a test that appends an element to
a variable sized array argument can get the shrinker into an infinite loop.

It also means that QuickCheck is limited by the type constraints on the
generated values. You can't e.g. do duplicate detection because you aren't
constrained to generate values on which that is meaningful.

In Hypothesis, by comparison, everything is based off its IR, so it can do
manipulations and comparisons on that, and it doesn't matter what type the
generated value is.

(Hypothesis also can't do perfect duplicate detection because it has the
problem that many IR values may map to the same value, but its duplicate
detection still seems to be mostly good enough in practice)
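
The mutation problem can be reproduced in a few lines. This is a simplified sketch, not how either library is implemented: `appending_test` and `shrink_steps` are hypothetical names, the shrinker only ever deletes one item, and a step cap stands in for the infinite loop.

```python
def appending_test(xs):
    """Test under check: it mutates its argument before asserting."""
    xs.append(0)
    return len(xs) < 3  # fails once the mutated list has 3 or more items

def shrink_steps(start, test, rebuild, max_steps=50):
    """Greedy delete-one-item shrinking.

    `rebuild` turns a candidate into the object handed to the test.
    Returns (steps taken, final candidate).
    """
    current, steps = start, 0
    while steps < max_steps:
        candidates = [current[:i] + current[i + 1:] for i in range(len(current))]
        smaller = next((c for c in candidates if not test(rebuild(c))), None)
        if smaller is None:
            return steps, current  # converged: nothing smaller still fails
        current, steps = smaller, steps + 1
    return steps, current          # hit the cap without converging

# Value-based: the candidate itself is passed to the test, gets mutated,
# and then becomes the next starting point, so it never gets any shorter.
value_result = shrink_steps([5, 7, 9], appending_test, rebuild=lambda c: c)

# Representation-based: candidates are immutable bytes, and a fresh list is
# rebuilt from them for every run, so mutation cannot leak into the shrinker.
bytes_result = shrink_steps(bytes([5, 7, 9]), appending_test, rebuild=list)
```

Here `value_result` is `(50, [0, 0, 0])`, meaning the value-based run was still going when it hit the cap, while `bytes_result` is `(1, bytes([7, 9]))`, meaning the byte-based run converged after a single step.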

------
tommikaikkonen
Hypothesis has been highly useful in the team I work in. It is also one of the
most robust property-based testing libraries in any language. Thanks to the
project contributors and Stripe for investing resources in making it even
better!

------
rawnlq
Is this similar to fuzz testing? (like american fuzzy lop?)

~~~
laughinghan
Fuzzing can refer to randomly generated test inputs without bothering to
minimize them when bugs are found. Property-based testing tools like
Hypothesis and QuickCheck (which pioneered this approach) also generate test
cases randomly, but when they find a test case that violates an assertion,
they then start reducing it to try to find the minimal test case that still
violates the assertion.

Also, while they're very similar in a mechanical sense since they're both
about randomly generated test inputs, in a broader sense my impression is that
fuzzers have a very different focus from property-based testing. I think fuzz
testing typically refers to testing very complex software, like compilers,
interpreters, and virtual machines, and searching for crazy edge cases,
especially ones that may expose security vulnerabilities. So a lot of effort
goes into generating test inputs that are unlikely to happen with a naive
testing strategy (automated or manual), yet still interesting and relatively
likely to trigger a bug (rather than spending too much time in the very large
space of uninteresting test inputs that a naive testing strategy wouldn't come
up with but also is unlikely to trigger a bug).

By contrast, I think property-based testing is typically used for testing
programs that are of ordinary complexity, and quickly finding bugs due to
simple but common programmer errors, which can then be quickly fixed. So a lot
of effort goes into reducing any bugs that are found to the very minimal test
case, which if the test input type is a nontrivial data structure can be
surprisingly involved. MacIver has actually written a series of blogposts on
Hypothesis' approach: [http://hypothesis.works/articles/compositional-
shrinking/](http://hypothesis.works/articles/compositional-shrinking/)

~~~
mononcqc
It depends. One of the places where property-based testing buys you the most
benefit is when testing systems that are conceptually simple, but use a
complex implementation.

In doing so, you tend to have clearly defined properties that are fairly
simple to validate, but a code implementation that may have very intricate
internal interactions that cause all sorts of bugs.

See
[http://htmlpreview.github.io/?https://raw.github.com/strange...](http://htmlpreview.github.io/?https://raw.github.com/strangeloop/lambdajam2013/master/slides/Norton-
QuickCheck.html) for example.

------
_raoulcousins
I've been looking for something like this, and I almost didn't click the link
because the title didn't sound interesting. Really great stuff.

------
dbcurtis
Great news! I love Hypothesis. I'm glad to see more support.

------
stirner
So, Prolog but in reverse?

------
bitmadness
So, Python gets Quickcheck... 20 years after Haskell

