
The Case for Formal Verification (2013) - dllthomas
http://permalink.gmane.org/gmane.comp.encryption.general/14818
======
dkarapetyan
Yes on all counts. I never understood why the TDD culture was happy to write
down a complicated function and then only verify that on input 2 the output
was 4. It always seemed backwards to me, especially when you could have just
as easily verified that in the REPL and called it a day. To me, TDD on its own
is just glorified documentation and tells me nothing about the actual
properties of the software. Formal proofs and verification, on the other hand,
are definitely something everyone should be striving towards, especially for
foundational components of computing, e.g. compilers, virtual machines,
kernels, security/network protocols, etc.

~~~
Pacabel
Your description of TDD is a pretty blatant misrepresentation of the actual
philosophy and techniques.

Furthermore, I think that you're ignoring that TDD tries to be a practical
solution to certain real-world problems. This means that trade-offs are made
in the name of creating software that works sufficiently well, while keeping
expenses in check.

Sure, extensive formal verification of all software would probably bring some
benefit in terms of correctness and perhaps security. But you're not
considering the cost involved.

So far, such verification has only proven to be possible in rather limited
situations, namely where there are significant resources available, or in
academic exercises. Maybe this will change in the future, but in the present
it's generally far more cost-effective to use a statically-typed programming
language along with some form of automated testing. That gives many of the
benefits of more formal methods of verification while also avoiding many of
the costs, even if the end result doesn't conform to some theoretical ideal of
"perfection".

~~~
anaphor
The QuickCheck model is at least as good as unit tests and is more likely to
_actually find bugs and corner cases_. The resulting code is no more complex
either. The only drawback I see is that it requires you to design your code in
a very modular way that makes things amenable to automated input generation.
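
The property-based idea described here can be sketched in a few lines of plain
Haskell. To be clear, this is a hand-rolled illustration and not QuickCheck
itself: real QuickCheck generates random inputs and shrinks counterexamples,
whereas this sketch simply enumerates every small input exhaustively, and all
the names are invented for the example.

```haskell
import Data.List (sort)

-- Properties are predicates over arbitrary inputs, not single
-- hand-picked examples like "on input 2 the output is 4".
prop_sortIdempotent :: [Int] -> Bool
prop_sortIdempotent xs = sort (sort xs) == sort xs

prop_sortPreservesLength :: [Int] -> Bool
prop_sortPreservesLength xs = length (sort xs) == length xs

-- Stand-in for QuickCheck's input generation: enumerate every list
-- up to length n over a small alphabet. (QuickCheck instead generates
-- random inputs and shrinks any counterexample it finds.)
smallLists :: Int -> [[Int]]
smallLists n = concatMap go [0 .. n]
  where go k = sequence (replicate k [-1, 0, 1])

checkProp :: ([Int] -> Bool) -> Bool
checkProp p = all p (smallLists 4)

main :: IO ()
main = print ( checkProp prop_sortIdempotent
             , checkProp prop_sortPreservesLength )
```

The "modular design" requirement falls out of the shape of `checkProp`: the
function under test has to be callable on generated data, with no hidden state
in the way.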

~~~
adrianhoward
Yes, but unit tests and TDD are not synonymous. TDD is a design technique
first and a testing tool second. Its goal is to get you to a good design
place, not to produce a unit test suite - let alone one with 100% coverage.

I love declarative testing tools like QuickCheck. They're excellent. I still
test-drive my code though.

------
userbinator
I think one of the biggest barriers to formal verification is that it is, for
lack of a better term, "too formal". There's a bunch of other terminology,
language, etc. you have to learn, and the learning curve is steep. For someone
who just wants to prove a few pre/post-conditions on some functions (that's
why I looked into Coq originally - and gave up because it was too hard), it's
too much. There is a feeling that it is too theoretical. I eventually found it
easier to prove what I needed to, manually.

~~~
jnbiche
Bingo, unfortunately. I've been able to learn functional programming in
Haskell and OCaml without any formal academic background, but looking at Coq,
Agda, Idris, and F*, I despair of learning them without more accessible
tutorials or a PhD in computer science concentrating on type theory.

My first intro to Haskell was through "Learn you a Haskell". I think without
that type of introduction, I never would have progressed past basic pattern
matching and folds/maps/filters. I need something similar for dependent
typing.

~~~
gsnedders
There's Learn You An Agda at
[https://github.com/liamoc/learn-you-an-agda](https://github.com/liamoc/learn-you-an-agda),
though it never got that far. (I've done comparatively little with theorem
provers (vs. model checkers), so I can't point you anywhere that useful!)

~~~
solomatov
These tutorials alone won't teach you how to use Coq or Agda. The main problem
is that in order to use Coq or Agda, you need to learn Martin-Löf type theory
(or the calculus of inductive constructions, a similar formalism to MLTT)
first, and learn to write code later. Otherwise everything will seem like
magic. There are, however, good books on the topic:

* Type Theory and Functional Programming ([http://www.cs.kent.ac.uk/people/staff/sjt/TTFP/](http://www.cs.kent.ac.uk/people/staff/sjt/TTFP/))

* Programming in Martin-Lof type theory ([http://www.cse.chalmers.se/research/group/logic/book/](http://www.cse.chalmers.se/research/group/logic/book/))

~~~
EzraVinh
But Haskell is also based on some type system, and yet Learn You a Haskell
teaches Haskell without formally teaching this type system.

I've been learning Idris and reading the HoTT book at the same time. I'm not
sure what it would have been like learning Idris without any formal type
theory, but I believe it would be possible.

~~~
solomatov
The difference here is that in the case of Agda and Coq, the type system is
the core of the language; it plays a role very similar to the one operational
semantics plays for usual programming languages. The type system is a logic
via the Curry-Howard correspondence, with which program correctness is proved.
In the case of Haskell, it's just a software engineering tool that helps you
find errors in your program in a semi-automatic way.
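
The Curry-Howard reading mentioned here can be seen directly in Haskell: a
total function of a polymorphic type is a proof of the corresponding
proposition. A minimal sketch using only standard examples (the names are
invented for illustration):

```haskell
-- Under the Curry-Howard correspondence, a (total) program of a given
-- type is a proof of the corresponding proposition.

-- "A and B implies A" (conjunction elimination)
proj1 :: (a, b) -> a
proj1 (x, _) = x

-- "A implies (B implies A)"
axiomK :: a -> b -> a
axiomK x _ = x

-- "(A implies B) and (B implies C) imply (A implies C)"
compose :: (a -> b) -> (b -> c) -> (a -> c)
compose f g = g . f

-- The caveat behind the distinction above: Haskell is unsound as a
-- logic, because non-termination inhabits every type. Uncommenting
-- the next two lines "proves" any proposition whatsoever:
-- bogus :: a
-- bogus = bogus

main :: IO ()
main = print (proj1 (42 :: Int, 'x'), compose (+ 1) (* 2) (3 :: Int))
```

The commented-out `bogus` is what separates Haskell-as-logic from Coq or Agda,
whose termination checkers rule such terms out.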

~~~
EzraVinh
I would say it's a sliding scale; I don't think there is a sharp qualitative
difference between Idris and Haskell. Haskell programs can also be thought of
as proofs, namely proofs that the variables/functions you define have the
types you claim they have.

You mention Agda and Coq. Maybe one difference in our viewpoint is that Idris
really is designed for general purpose programming. E.g. you can write a
program with almost identical structure to a Haskell program.

~~~
solomatov
>I don't think there is a sharp qualitative difference between Idris and
Haskell.

Yes, there's no such difference. However, in order to use Idris to its full
potential, you need to use dependent types. It's just like writing procedural
programs in an object-oriented or functional language: possible, but not a
very bright idea.

------
ramidarigaz
Relevant: I attended LambdaConf yesterday at CU Boulder, and there was a
_great_ intro workshop for Idris, which is in the same domain as Coq. Idris is
similar to Haskell (it's actually written in Haskell), but with a dependent
type system, a tactic-based theorem prover like the one Coq has, and a bunch
of other fun features.

~~~
tluyben2
Idris is really worth checking out if you have any interest in this kind of
stuff. I used Coq and now am playing with Idris and F*; the latter two feel
good and practical. With an FP background they are easy to pick up.

~~~
jnbiche
I've also been playing around with these two. I have a little Haskell and some
OCaml, but very little understanding of the mathematics involved in dependent
typing, so I'm struggling but still very interested. I wish there were an
accessible guide like "Learn You a Haskell" for Idris.

By the way, for anyone interested in these two languages with dependent
typing, you can try out both Idris and F* online, without installing anything:

[http://www.tryidris.org/console](http://www.tryidris.org/console)

[http://rise4fun.com/FStar/tutorial/guide](http://rise4fun.com/FStar/tutorial/guide)

~~~
tluyben2
Cannot edit for some reason, but I thought I would mention, as this is of
vital importance to me when I'm learning something new, that both Idris and F*
are open source:

[https://github.com/nikswamy/FStar](https://github.com/nikswamy/FStar) (Apache)

[https://github.com/idris-lang/Idris-dev](https://github.com/idris-lang/Idris-dev)
(do whatever, just retain copyright)

~~~
dllthomas
Coq is LGPL, apparently.

------
a-saleh
As a QA engineer by profession, I always wonder how to get better assurances
that the software I have does what it is supposed to do.

I believe I have even accidentally reimplemented QuickCheck on several
occasions.

When we were working in Clojure for a little while, I wondered if it might be
possible to combine contracts and logic programming to verify at compile time
that contracts don't contradict each other ... and then I realized that for
more complex constraints I might need to solve the halting problem.

On the other hand, I remember how much productivity I gained after I wrapped
our JSON library in a simple macro that verified that the data I fed it
conformed to a schema (that was before Prismatic's Schema existed, or even
core.typed).

------
meowface
I had no idea sqlite put so much effort into testing. 84,300 lines of
production code with 91,452,500 lines of testing code is pretty damn crazy.

I assume a lot of that code may be auto-generated, but it's still impressive
regardless.

------
wsxcde
Coq is an interactive theorem prover, which is exactly what it sounds like:
you prove your theorems more or less by typing out the proofs, and the system
mechanically verifies that each step in your proof is sound. I've used Coq,
and I'll be honest: this is unquestionably a solid way to prove things about
your program, but it is too much of a pain to expect it to see significant
adoption in the "real" world.

In the hardware world, there's been a lot of progress in automated
verification thanks to modern model checkers [1,2] (which incidentally build
on modern SAT, and in some cases SMT, solvers [3-6]). The nice thing about
model checkers is that you just specify the property you want proven and let
the verifier crunch away and it will (hopefully) come up with a proof or a
counterexample. This has been successful enough that there are companies like
JASPER and OneSpin which make money by selling hardware companies formal
verification tools.

I worked with JASPER's tools in the recent-ish past, and one of the big things
they seem to have done is make the tool much more usable. With the JASPER
tool, it was much less of a pain than I was expecting to configure the model
checker, abstract away parts of the design, keep track of the properties
specified and proven, examine counterexample traces, and so forth. A lot of
this sort of thing doesn't get done in academic tools like ABC because it
doesn't count as research, but such improvements are extremely important if
you want to push adoption of formal tools in an industrial setting. And from
what I can see, the emphasis on usability seems to be paying off for JASPER.

Model checking in software has been less successful because the state
explosion problem is much more pronounced, but there have been notable success
stories like Microsoft Research's SLAM project [7]. And I definitely think
there is an opportunity here to build upon the algorithmic progress in
automated verification in order to build tools that are much more usable in a
software setting.

[1]
[http://ecee.colorado.edu/~bradleya/ic3/](http://ecee.colorado.edu/~bradleya/ic3/)

[2]
[http://www.eecs.berkeley.edu/~alanmi/abc/abc.htm](http://www.eecs.berkeley.edu/~alanmi/abc/abc.htm)

[3]
[https://www.princeton.edu/~chaff/zchaff.html](https://www.princeton.edu/~chaff/zchaff.html)

[4] [http://minisat.se/](http://minisat.se/)

[5] [http://fmv.jku.at/picosat/](http://fmv.jku.at/picosat/)

[6] [http://z3.codeplex.com/](http://z3.codeplex.com/)

[7] [http://research.microsoft.com/en-us/projects/slam/](http://research.microsoft.com/en-us/projects/slam/)

~~~
rossjudson
I get what you're saying about the state explosion problem, but the article
specifically calls out the idea of proving a _lack_ of negative behaviors. It
seems to me it might be quite useful to be able to prove, for example, that a
program never reads memory at random, or that it never exceeds the bounds of
any allocated buffer.

That's a different problem scale than "prove the whole thing works as
specified".

~~~
wsxcde
Model checking deals with two kinds of properties - safety and liveness.
Safety properties effectively say nothing bad ever happens while liveness
properties say that something good will eventually happen. For example, "my
program will never crash due to a null pointer dereference" is a safety
property. "My arbiter module will output a grant for every input request" is a
liveness property.

It is true that model checkers are much better at proving safety properties
than liveness properties. I think it's not too far from the truth to say that
model checkers are no good at proving liveness properties in real designs and
that only safety properties work (somewhat well) in practice.
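
To make the safety-property idea concrete, here is a toy explicit-state
checker in Haskell: it explores every reachable state and returns a
counterexample trace if a "bad" state is reachable. This is only a sketch of
what a model checker's job is; real tools avoid explicit enumeration (which is
exactly where the state explosion bites) by working symbolically over SAT/SMT,
and all names here are invented for the example.

```haskell
-- Explore all states reachable from the initial state, breadth-first.
-- A safety property holds iff no reachable state violates it; a
-- violation comes back as the shortest trace leading to the bad state.
checkSafety
  :: Eq s
  => s             -- initial state
  -> (s -> [s])    -- transition relation
  -> (s -> Bool)   -- safety predicate: True means the state is OK
  -> Maybe [s]     -- Nothing = safe; Just trace = counterexample
checkSafety start step ok = go [] [[start]]
  where
    go _ [] = Nothing
    go seen ([] : rest) = go seen rest
    go seen (path@(s : _) : rest)
      | not (ok s)    = Just (reverse path)
      | s `elem` seen = go seen rest
      | otherwise     = go (s : seen)
                           (rest ++ [s' : path | s' <- step s])

-- A counter mod 8; the safety property "the counter never reaches 5"
-- fails, and the checker reports the shortest trace to the violation.
example :: Maybe [Int]
example = checkSafety 0 (\n -> [(n + 1) `mod` 8]) (/= 5)

main :: IO ()
main = print example  -- Just [0,1,2,3,4,5]
```

A liveness property ("a request is eventually granted") cannot be phrased as a
predicate on single states like this, which is one way to see why it is the
harder case.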

An alternative here is to abandon model checking altogether and focus on a
powerful static analysis. I think the main challenge here is coming up with
effective property specification schemes. A powerful type system like
Haskell's does in fact enable you to prove quite strong statements about your
program.
But you are inherently limited in terms of what you can prove to whatever it
is that the type system can express. To me, it seems that model checkers allow
more flexibility in specifying your property, especially when you take into
account the fact that you can do your model checking on an
augmented/instrumented version of your design.

> That's a different problem scale than "prove the whole thing works as
> specified".

On a vaguely related note, equivalence checking between designs, especially in
the hardware context, is one thing that formal tools have had a lot of success
with.

~~~
lmm
> you are inherently limited in terms of what you can prove to whatever it is
> that the type system can express

Does that actually limit you? E.g. I can imagine using a monad-like structure
in Haskell to construct things like "procedure guaranteed to terminate in <k
primitive steps".
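
One way to sketch the structure imagined here is a "fuel" monad in plain
Haskell: every primitive step consumes one unit of fuel, so a computation run
with fuel k either finishes within k steps or aborts. All names are invented
for the example, and note that this enforces the bound dynamically rather than
proving it in the type, which is where a dependently typed language could go
further.

```haskell
-- A computation that threads a step budget; Nothing means the budget
-- was exceeded before the computation finished.
newtype Fuel a = Fuel { runFuel :: Int -> Maybe (a, Int) }

instance Functor Fuel where
  fmap f (Fuel g) = Fuel $ \n -> fmap (\(x, n') -> (f x, n')) (g n)

instance Applicative Fuel where
  pure x = Fuel $ \n -> Just (x, n)
  Fuel f <*> Fuel g = Fuel $ \n -> do
    (h, n')  <- f n
    (x, n'') <- g n'
    Just (h x, n'')

instance Monad Fuel where
  Fuel g >>= k = Fuel $ \n -> do
    (x, n') <- g n
    runFuel (k x) n'

-- One primitive step: fails when the fuel is exhausted.
tick :: Fuel ()
tick = Fuel $ \n -> if n <= 0 then Nothing else Just ((), n - 1)

-- A loop that performs exactly one tick per list element.
sumSteps :: [Int] -> Fuel Int
sumSteps []       = pure 0
sumSteps (x : xs) = tick >> fmap (x +) (sumSteps xs)

main :: IO ()
main = do
  print (fst <$> runFuel (sumSteps [1 .. 10]) 100)  -- Just 55
  print (fst <$> runFuel (sumSteps [1 .. 10]) 5)    -- Nothing
```

In Idris or Coq the fuel could instead appear as an index in the type, turning
the runtime check into a static guarantee.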

~~~
chriswarbo
Well, considering that the article talks about Coq, which is completely built
around a type system (plus a termination checker), it's not that limiting.

------
Igglyboo
Can someone explain why CompCert is not more popular? A formally verified
compiler seems to be much more useful than something that can produce bugs.

~~~
cwzwarich
Even ignoring the licensing issues and the limited subset of C supported, the
quality of its generated code doesn't approach that of modern optimizing
compilers, and the correctness proof doesn't cover things like concurrency or
actual machine memory models (although forks of CompCert exist for some of
these), so you're still off in the unverified world if you rely on them in
your code.

Also, for the vast majority of programs the possibility of bugs in the
compiler is not really that impactful in terms of total effect on reliability.

~~~
Igglyboo
The article said something about being as good as GCC at -O1; is that not
enough? Pardon my ignorance, I've never touched C/C++ (Python guy).

~~~
betterunix
1. O3 is way better.

2. GCC is not even the best optimizing compiler out there. Intel's C compiler
puts GCC to shame for some tasks.

------
comex
Somewhat off topic and very speculative, but I'm curious how feasible it would
be to propagate safety proofs through compilation - not just formulaic memory
safety rules but hopefully also arbitrary behavioral proofs - all the way down
from a source language to machine code, so that essential properties could be
formally verified without needing to either trust a compiler or use a provably
correct one, in the latter case with corresponding difficulty of modification
and low optimization level. The compiler would still have to be modified to do
the propagation, and a machine code model and verifier constructed, but
theoretically this would be easier than proving the whole thing works
correctly.

I guess Typed Assembly Language works along these lines:

[http://www.cs.cornell.edu/talc/](http://www.cs.cornell.edu/talc/)

but I haven't read up on the papers, and it seems outdated.

My imaginary end goal (not that I'd be able to do anything remotely as
ambitious myself, but I still like to think about it) is an operating system
where all code is run in kernel mode after being checked for safety - like
Singularity OS but without trusting a compiler.

Perhaps that trust doesn't actually matter very much, since the compiler is
unlikely to contain too many exploitable bugs (AFAIK most Java vulnerabilities
are not related to the JIT, for instance), and there are plenty of other
places in such an operating system bugs could hide anyway. But it's inelegant
to require all code to go through a single compiler. For example, it would be
cooler if the assembly verifier were not baked into the system, but simply a
program proven to correctly check whether some code is safe in the machine
code model; if you (any user) could prove a JIT never generates unsafe code,
you could submit the JIT in place of the verifier, and run wild with it
without going through any slow compilation or verification processes.

~~~
anon4
I think you're not thinking far enough: why not have a compiler that can take
your specification and just write your program for you? You'll probably need
to write some code yourself, which would then interface with the
written-from-formal-proof code, a la Quark.

My intuition is that a specification that can be checked and is good enough to
guarantee that your program is 100% correct should be enough to compile a full
program from, possibly with some hand-written lower-level code for guidance so
it doesn't fall into pathological cases like "the empty program satisfies
these constraints and is easiest to generate, so here".

~~~
chriswarbo
> Why not have a compiler that can take your specification and just write your
> program for you?

One difficulty with this is that programs (by definition) are 'computationally
relevant' whereas proofs are not. In other words, as long as we have a proof
of X it doesn't make a difference _which_ proof we happen to have. On the
other hand, different functions of type X can have a big impact on a program.

For starters, there are properties which are difficult to express using types.
For example, we only have rudimentary ways to encode space and time usage (eg.
'cost semantics'). Without this, when we ask for a sorting algorithm we may
get back bubblesort, since it's a perfectly acceptable implementation of a
sorting algorithm.

Also, our types will have to become _incredibly_ precise. Rather than just
encoding the properties we care about (eg. security guarantees), we need to
include lots of uninteresting properties to guide the computer to what we want
(compare this to guiding a genetic algorithm via a fitness function, or
getting a genie to grant you a wish in exactly the way you want). At this
point, you're basically writing your program in a very indirect way; you may
be better off just writing one or two lines 'manually' instead of trying to
steer the automated process.

------
Confusion
Formal verification and testing are not mutually exclusive and should ideally
both be used. If you read this article and conclude that formal verification
is the way to go and writing tests is unnecessary, then you are failing to
appreciate the concessions Metzger makes. Take, for example:

> but Quark's formal verification doesn't try to show that the entire Web
> browser is correct, and doesn't need to -- it shows that some insecure
> behaviors are simply impossible. *Those* are much simpler to describe.

Let's assume this is true: we can write interesting programs of relevant size
and complexity and _prove_ they are secure. Then we still need a whole bunch
of tests to show the program actually does what its users want it to do,
because formally specifying _that_ behavior is hard.

------
jarrett
Will formal verification ever be possible for systems composed of a bunch of
heterogeneous components working together? I'm thinking of a web application,
where the software's behavior depends on the interaction of client-side
scripts, stylesheets, server-side programs, databases, caching layers, and
probably other components, all operating in the request-response cycle that
fragments behavior into a bunch of separate runs of the various programs.

~~~
gsnedders
Certainly; it's just another parallel system. This is exactly the sort of
thing that formalisms like CSP (Communicating Sequential Processes) excel at;
see FDR2/FDR3 for programmatic verification.

------
jderick
Let me also recommend that anyone interested in theorem proving check out
ACL2. It is based on Lisp and might be easier to get started with than Coq.
There is a pretty cool website that lets you run it in a browser, along with a
basic tutorial:

[http://tryacl2.org/](http://tryacl2.org/)

------
shlomifr
This is very relevant in light of Heartbleed. Core security code should be
formally verified; there is no other way to guarantee the correctness of an
implementation.

~~~
jeffreyrogers
If my understanding of model checking is correct (I believe it is the primary
method of formal verification), I don't think it would have helped much in the
case of Heartbleed. The Heartbleed error had to do with an expression being in
the wrong scope, if I recall correctly. (I think someone forgot to put
brackets around an if statement that contained two expressions.)

In this case the model would be correct, but the implementation would be
wrong. So I don't think formal verification would be of much help. That said,
I think there are a number of static analysis tools that would pick up on the
error, so a combination of approaches would work.

And of course, it would be great if we could verify that our security critical
code was sound in theory, even if we can't necessarily verify that our
implementation is free of coding errors, so I agree with your main argument.
Of course, whether we're at the point where doing this verification is
feasible in practice is another matter unto itself.

~~~
jderick
It is also possible to apply model checking or theorem proving directly to the
implementation. Doing so would catch any sort of error that a static analysis
tool would. Of course, static analysis typically scales better and would be a
good place to start for catching this type of error.

~~~
jeffreyrogers
Ah okay, thanks for clearing that up, I wasn't aware that was possible.

