
Proofs and Refutations Using Z3 - yminsky
https://blog.janestreet.com/proofs-and-refutations-using-z3/
======
nanolith
Model checkers are amazing tools. I've been using one for C development for
the past two years, and with the model checker, I feel more confident about
code I write in C than I do about code in many other languages.

That being said, model checkers find counter-examples. This is not the same as
a formal proof. Just because a counter-example cannot be found does not mean
that a given property has been proven. It is _extremely_ important to
understand that point. Model checking, combined with unit testing, is a
formidable tool that should be used whenever possible. But, don't assume that
model checking is the same as a proof. The subtle difference does matter, and
it can bite you.
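A classic toy example of that subtlety (my own illustration, not from the comment above): Euler's polynomial n*n + n + 41 is prime for every n from 0 through 39, so any bounded search below 40 reports no counterexample, even though the general claim is false.

```python
# Toy illustration: a bounded search finds no counterexample,
# yet the property "n*n + n + 41 is prime" is false in general.

def is_prime(k: int) -> bool:
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True

# Bounded "model check": no counterexample for n = 0..39.
assert all(is_prime(n * n + n + 41) for n in range(40))

# But the property fails at n = 40: 1600 + 40 + 41 = 1681 = 41 * 41.
assert not is_prime(40 * 40 + 40 + 41)
```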

It is possible to write formal proofs about software, using Calculus of
Constructions, Separation Logic, Hoare Logic, etc. However, this is much
harder than using a model checker. For 95% of applications, a model checker is
good enough.

~~~
ahelwer
To the extent that the model corresponds with reality, a given property _is_
proven by the lack of a counterexample.

~~~
nanolith
To the extent that the model checker is complete and error free, and to the
extent that the model corresponds with the system under instrumentation.

I've been far down that rabbit hole, and I've found cases where certain model
checkers fail to find counterexamples that other model checkers find.
Sometimes, this is due to errors in one or more of the checkers. Sometimes
this is due to differences in representation. But, ultimately, the argument is
epistemological. There is a corollary to the phrase "absence of proof isn't
proof of absence": the absence of a counterexample isn't proof of the absence
of a contradiction.

As I've said, I love model checkers. But, they are not perfect, nor are they a
panacea.

~~~
seattleeng
That's very interesting, do you happen to have any toy examples where two
model checkers differ?

~~~
OscarCunningham
The other day I found an error in the Glucose SAT solver, but only when
running with one of the options (-rcheck) changed away from its default
setting. Glucose produced an alleged satisfying assignment that didn't
actually work.

So it's possible for mainstream SAT solvers to have errors, but I imagine
their default configurations are very thoroughly tested.

~~~
schoen
It's interesting that the SAT solvers haven't put in a check at the very end
to confirm that satisfying instances really satisfy the constraints, since
that step is of course supposed to be the radically easy one! I guess their
authors have had a lot of (usually well-placed) confidence in the solvers'
logic and correctness.

~~~
OscarCunningham
When I told them about it, they said that the -rcheck code was inherited from
MiniSAT. Since it's also not enabled by default, it's understandable that it
hadn't been as thoroughly checked as usual. I wouldn't blame them if they just
dumped it instead of fixing it.

But yes, it is strange that it doesn't verify its outputs. On the other hand,
SAT solvers want to be as fast as possible, and there shouldn't be a need to
do it if the solver is operating correctly.

------
ahelwer
Interesting coincidence! I just wrote a post on checking Azure firewall
equivalence with Z3 two days ago: [https://medium.com/@ahelwer/checking-firewall-equivalence-wi...](https://medium.com/@ahelwer/checking-firewall-equivalence-with-z3-c2efe5051c8f)

This post deals with a significantly harder problem, though.

~~~
BruceM
Have you seen [http://www.margrave-tool.org/v3/](http://www.margrave-tool.org/v3/)
which builds on Kodkod (which came after Alloy...)?

------
jwilk
If you're not a fan of lispy languages, Z3 has bindings for .NET, C, C++,
Java, OCaml and Python.

For example, the Python version of the “x + +0 = x” check looks like this:

      from z3 import *
      s = SolverFor('QF_FP')
      x = FP('x', FPSort(11, 53))
      z = fpPlusZero(FPSort(11, 53))
      r = RNE()
      s.add(Not(fpAdd(r, x, z) == x))
      print(s.check())
      print(s.model())

(Except that I hardcoded the rounding mode, because I couldn't figure out how
to make an unknown one. :-/)

------
seattleeng
Does anyone know of any good resources for high-level guidelines on how to
incorporate model checking into a typical software development flow? I've only
used TLA+ in my own time for some fairly basic modeling of simplified service
interactions at work. The team I'm on is soon inheriting a fairly complex
codebase that is one of the core backend services at my company. At a high
level, it is responsible for kicking off a "state machine" (which spans a few
backend services) that ultimately updates a single "item", which is the
smallest unit of data we care about. I'd be interested in spending a weekend
or two getting a rudimentary model of it up and running to aid in tech spec
writing for new features and perhaps in documenting possible bugs in the
entire state machine flow. As it stands today, the state machine spans
multiple services and is ill-defined, so I'll be diving into that for
documentation purposes regardless, and it seems like creating a codified
model simultaneously won't be too much overhead (at the very least, it could
be fun).

~~~
nickpsecurity
You can combine the model with property-based testing of that code to
essentially try your model on the code and vice versa. You might also encode
some of your expectations in there as preconditions, invariants, or
postconditions that are checked at runtime during the tests. Throw a fuzzer
like AFL at them.
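A minimal sketch of that combination (standard library only, toy code rather than a real property-based testing framework like Hypothesis): contracts as precondition/postcondition asserts, exercised with random inputs.

```python
# Toy "property-based" loop: contracts are runtime asserts, and random
# inputs stand in for a real framework's generators and shrinking.
import random

def clamp(x: int, lo: int, hi: int) -> int:
    assert lo <= hi, "precondition: valid range"
    result = max(lo, min(hi, x))
    assert lo <= result <= hi, "postcondition: result within [lo, hi]"
    return result

rng = random.Random(0)
for _ in range(1000):
    lo = rng.randint(-100, 100)
    hi = rng.randint(lo, 200)
    x = rng.randint(-1000, 1000)
    clamp(x, lo, hi)             # any contract violation raises here
```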

Hillel Wayne (hwayne) has write-ups on both of those concepts:

learntla.com

[https://hillelwayne.com/post/pbt-contracts/](https://hillelwayne.com/post/pbt-contracts/)

Here's one on contracts that even your management might like:

[https://www.win.tue.nl/~wstomv/edu/2ip30/references/design-b...](https://www.win.tue.nl/~wstomv/edu/2ip30/references/design-by-contract/index.html)

~~~
seattleeng
Property-based testing seems to be an interesting idea and fairly easy to
trial and then recommend to my teammates. The js library I stumbled upon even
has TypeScript support and plays nicely with mocha, both of which are directly
relevant to my work
([https://github.com/jsverify/jsverify](https://github.com/jsverify/jsverify))!

