

Proving a Computer Program's Correctness - billswift
http://www.schneier.com/blog/archives/2009/10/proving_a_compu.html

======
jgrahamc
Yes, this stuff is hard. My doctoral thesis was about using formal methods to
prove that a system met a security policy. Specifically, I developed an
algebra for reasoning about a system and creating its code directly from a
specification.

If that sort of stuff excites you it's here: <http://www.jgc.org/blog/jgc-thesis.pdf>

~~~
billswift
Actually, Edsger W. Dijkstra advocated this stuff for many years. Almost all of
his writings are collected on his home page at the University of Texas,
<http://www.cs.utexas.edu/users/EWD/> . There have been some advances since,
but as the Schneier article shows, not much that is very useful yet.

------
makecheck
It seems to me like a big part of this would depend on a stable foundation.
And in the software world, that foundation shifts _a lot_. Last I checked, no
one in the industry had come together and agreed on a list of programs that
would _never be changed again_. And even if you _could_ stabilize it all, how
long would it be before you're forced to port it to new hardware?

The moment anything shifts, any validation work you've done is called into
question. Okay, for 30 seconds, your program was proven correct; but how about
now? Did the change cause a problem? And how, exactly, would you detect it,
without repeating the validation entirely?

To me, this does underscore the importance of keeping programs small, stable,
and well understood. If anyone's bragging about how many millions of lines of
code they have, as Microsoft put in a bullet point for Windows once, they
should not be considered serious programmers. The bigger something is, the
less it takes to introduce one of these unvalidated quirks.

------
gdp
Extrapolating the effort involved in proving huge numbers of complex safety
properties for an operating system written in C to programs in general is a
bit silly. Formal verification in more abstract languages is considerably
easier, and it becomes considerably easier to mechanise large chunks of the
work (a Hindley-Milner type system is a kind of proof of a safety property,
for example, and that check is totally mechanical).
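
To make that concrete, here is a toy Haskell sketch (my own illustration; the
function and its name are invented for the example): the type the checker
infers for safeHead is, in effect, a machine-checked guarantee about how it
can be used, and an ill-typed call is rejected before the program ever runs.

    -- A minimal sketch: Hindley-Milner inference gives safeHead the type
    -- [a] -> Maybe a, and that inferred type is a mechanically established
    -- safety property: callers can never treat the result as anything but
    -- a Maybe a.
    safeHead :: [a] -> Maybe a
    safeHead []      = Nothing
    safeHead (x : _) = Just x

    -- Uncommenting the next line makes the "proof" fail: the program is
    -- rejected at compile time, before it can ever go wrong at runtime.
    -- broken = safeHead True

    main :: IO ()
    main = print (safeHead [1 :: Int, 2, 3])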

So it might take 200,000 years to verify Windows, but considerably less to
verify an equivalent amount of non-operating system Haskell code, for example.

~~~
billswift
The language specification and the compiler would have to be proven correct
first, which is increasingly difficult with higher-level languages and
optimizing compilers. The critical aspect is that the MACHINE CODE has to be
correct. That is one of the reasons the problem is so hard.

~~~
gdp
For this project, that's assumed. Provided you have some kind of formal
semantics, you can define translations. And in fact, Adam Chlipala's work on
certified compilers shows that it's not impossible.

<http://adam.chlipala.net/papers/CtpcPLDI07/>

(there are subsequent papers too, but that's probably a good place to start).

------
gruseom
I've never gotten this. How do they know there aren't bugs in the proof?
Mathematical proofs often have bugs. They're caught by inspection by other
mathematicians. Sometimes it takes years before that happens.

The problem is that most proofs about code are likely to go without much
inspection, just as most code goes without inspection.

~~~
jerf
Well, it works exactly like mathematics. You start with a small kernel that
seems likely to be true, then you build on top of that with transforms likely
to be true, and at the end you end up with a proof statement that is fairly
likely to be true.

Then you sit and pound on the core axioms and transformations to make sure
they are correct. Being absolutely sure is impossible in the strictest sense,
but getting close enough is often practical.
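
In LCF-style provers that small kernel is literally a small piece of code. A
toy sketch in Haskell (my own illustration, not how any particular prover is
written; all the names are made up): the only way to obtain a value of type
Thm is through the exported rules, so anything of type Thm was necessarily
built by the small core you pound on.

    module Kernel (Prop (..), Thm, axiom, andIntro, concl) where

    data Prop = Atom String | And Prop Prop
      deriving (Eq, Show)

    -- The Thm constructor is deliberately not exported: clients cannot forge
    -- theorems, they can only combine the rules below.
    newtype Thm = Thm Prop

    concl :: Thm -> Prop
    concl (Thm p) = p

    -- A made-up axiom schema plus one inference rule form the entire trusted
    -- base; everything else is built by composing them.
    axiom :: String -> Thm
    axiom name = Thm (Atom name)

    andIntro :: Thm -> Thm -> Thm
    andIntro (Thm p) (Thm q) = Thm (And p q)

Trusting everything built on top then reduces to trusting those few lines
(plus the compiler and hardware underneath them).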

All programs are mathematical proofs of one sort or another, so in the end,
it's the same problem that _every_ programmer faces: How do I know this
program isn't buggy? Well, you don't, and yet, here we are, with broadly
functioning programs of great complexity. Ultimately, going through a rigidly
mechanical proof is a quantitative change: it takes your confidence in the
lack of bugs from less than 100% to a value much closer to, but still short
of, 100%. But as I like to say, a large enough quantitative change _is_ a
qualitative change. So while it is true that the proof may not be 100%, that
doesn't mean it would be useless.

Also, it should be pointed out that in the real world there is a real error
rate that real CPUs have; a percentage of computations will go wrong. Pushing
the residual uncertainty in your algorithms below that point is largely
useless. In the real world, the difference between 10 or 15 9s and actual,
factual 100% isn't actually relevant, so the fact that the proofs can't get to
100% is a much lesser concern than you might think. It would only matter if we
had 100% reliable computers in every other way. I think I can safely say that
the dominant problem in computing today is not unreliable hardware, though;
it's bugs in our algorithms, so this is the right approach. (Note I said
_dominant_; massive server farms have to worry about unreliable hardware, but
even then, you'll be handling that with algorithms and code, and your dominant
problems on that front will be bugs in the code.)
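
Back-of-the-envelope, with numbers I am making up purely for illustration:
suppose a CPU mis-executes one operation in 10^15 and a job runs 10^12
operations; the chance of at least one hardware fault already dwarfs any
10^-20 residual doubt you might assign to a mechanically checked proof.

    -- Illustrative arithmetic only; both rates below are assumptions, not
    -- measured figures.
    main :: IO ()
    main = do
      let perOpFault = 1e-15 :: Double  -- assumed per-operation hardware fault rate
          ops        = 1e12  :: Double  -- operations in a long-running job
          pAnyFault  = 1 - (1 - perOpFault) ** ops
          proofDoubt = 1e-20 :: Double  -- assumed residual doubt in the checked proof
      putStrLn ("P(at least one hardware fault) ~ " ++ show pAnyFault)
      putStrLn ("residual doubt in the proof    = " ++ show proofDoubt)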

~~~
gruseom
But you didn't answer my point: mathematical errors that escape the original
prover are caught by inspection by other mathematicians, and one thing we
already know about the software business is that inspection is an extremely
scarce resource relative to the amount of code produced.

~~~
jerf
Oh, well: "Verifying the kernel—known as the seL4 microkernel—involved
mathematically proving the correctness of about 7,500 lines of computer code
in a project taking an average of six people more than five years." Bruce
pointed out, too, that that level of effort is not practical. In this case
there _was_ a lot of examination.

(Frankly, sounds pretty dull to me.)

------
jah
If the correctness of a program requires the ability to terminate, how do
these proofs get around the Halting Problem?

~~~
MaysonL
You can't, for the general case, prove whether or not a program terminates.
For many programs, you can.
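
A concrete example (mine, not from the article): the recursion below shrinks
its argument on every call, so a simple well-founded-measure argument, the
kind a proof assistant checks mechanically, shows it always halts, even though
no checker can decide termination for arbitrary programs.

    -- The argument strictly decreases toward the base case, so termination
    -- follows from a well-founded measure on n; the undecidability of the
    -- general Halting Problem doesn't get in the way of this particular
    -- program's proof.
    sumTo :: Integer -> Integer
    sumTo n
      | n <= 0    = 0                  -- base case: the measure has bottomed out
      | otherwise = n + sumTo (n - 1)  -- recursive call on a strictly smaller n

    main :: IO ()
    main = print (sumTo 10)  -- prints 55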

------
rpcutts
Ah reminds me of when I used to get to play around with B.

------
edw519
_...said for the first time a team had been able to prove with mathematical
rigour that an operating-system kernel..._

In this case, exactly what does "mathematical rigour" mean? I imagine there
are several responses to this question, but even a conceptual overview of how
math can prove bug-freeness would have been nice.

~~~
gdp
As has been said before, it's not about being "bug free" - it's about a number
of safety properties always holding as long as the assumptions are true.

The safety properties might be things like who can write to what memory, and
when it is acceptable for bits of the kernel to do certain things. Lock
freedom in scheduling is probably also a big thing. I haven't looked at any of
this very closely, but I'm just guessing based on the sorts of properties I
would try to prove initially.
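
As a rough sketch of what I mean (a toy model I'm inventing here, not the
actual seL4 statement): you write the property as an invariant over system
states and then prove that every transition preserves it, for all states,
rather than testing a handful.

    -- Toy model: the safety property is "user-mode execution never dirties
    -- kernel memory", stated as an invariant that every step must preserve.
    data Mode  = KernelMode | UserMode                    deriving (Eq, Show)
    data Event = WriteKernelMem | WriteUserMem | Syscall  deriving (Eq, Show)

    data SysState = SysState { mode :: Mode, kernelMemClean :: Bool }
      deriving (Show)

    invariant :: SysState -> Bool
    invariant s = mode s == KernelMode || kernelMemClean s

    -- One transition of the toy system: user writes to kernel memory are refused.
    step :: Event -> SysState -> SysState
    step WriteKernelMem s@(SysState UserMode _) = s
    step WriteKernelMem s                       = s { kernelMemClean = False }
    step Syscall        s                       = s { mode = KernelMode }
    step WriteUserMem   s                       = s

    -- The shape of the proof obligation; here it is merely evaluated over a
    -- few states, whereas a prover would discharge it for all of them.
    preserves :: Event -> SysState -> Bool
    preserves e s = not (invariant s) || invariant (step e s)

    main :: IO ()
    main = print (and [ preserves e (SysState m c)
                      | e <- [WriteKernelMem, WriteUserMem, Syscall]
                      , m <- [KernelMode, UserMode]
                      , c <- [True, False] ])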

~~~
stcredzero
_it's about a number of safety properties always holding as long as the
assumptions are true_

And no project is ever without such assumptions. You can't have _abstractions_
without them.

~~~
gdp
I was using 'assumption' in the strict mathematical-logic sense. These
assumptions are stated, and it is trivial to check that the proof no longer
holds if the assumptions aren't true.

~~~
stcredzero
I was using 'assumption' to talk about the faith that abstractions and their
(sometimes mathematical) assumptions can be successfully applied to solve
real-world problems.

Some may argue that there's no guarantee that such proofs apply to real world
systems. But really, there's no guarantee that any abstract model does. We
humans have to get paid for _something_.

~~~
gdp
> _Some may argue that there's no guarantee that such proofs apply to real
> world systems._

I would argue that those who would argue that have probably not spent a lot of
time doing any real-world formal verification.

People seem to like arguing about its utility in the abstract without ever
doing it.

~~~
stcredzero
You might be replying as if I'm disagreeing. If so, please re-read ancestor
posts with the other spin.

In short: most of the objections people have to formal verification also apply
to modeling something in software to begin with.

~~~
gdp
No, I wasn't suggesting you were disagreeing; I was responding to the
hypothetical disagreement you said some might raise. My counter-argument
applies just as well to anyone who actually holds the beliefs you speculated
some may hold.

~~~
stcredzero
It's not so hypothetical. It's more the typical response of a programmer when
formal methods come up. But I meant it as an observation of typical reactions,
not as a hypothetical debate.

------
ilyak
What if your proof contains errors?

Your program is code, your proof is code, both are going to be buggy.

And what are you going to prove?

~~~
gdp
<http://news.ycombinator.com/item?id=760547>

There was some discussion on the original post to HN about this announcement.
In particular, look at this:

<http://ertos.nicta.com.au/research/l4.verified/proof.pml>

Which talks about the limits of provability.

In short, a "proof" that contains errors isn't really a proof of anything. And
you build your proof inside an automated theorem prover so that it can be
mechanically checked for internal inconsistency (i.e. contradictions), or for
a complete inability to be false (i.e. tautologies).

~~~
ilyak
Your automated theorem prover contains bugs, too.

~~~
gdp
Read the link.

If you understand a little bit about how something like Isabelle is
constructed, you'll know that only a small number of axioms are "hand coded",
and the rest is built from those. You can always build an external
proof-checking tool that checks that the proof you've constructed is correct.
And if you're not confident in that one, you can build another.
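
To give a flavour of why an external checker can be so small (a toy example of
my own, nothing like Isabelle's real kernel; the calculus and names are
invented): once the proof is an explicit object, checking it is a short,
easily audited recursion, and if you distrust one checker you can write
another.

    -- A tiny natural-deduction calculus for implication, plus an independent
    -- checker that either computes what a proof establishes or rejects it.
    data Prop  = Atom String | Imp Prop Prop  deriving (Eq, Show)

    data Proof = Hyp Prop          -- use a hypothesis from the context
               | Lam Prop Proof    -- implication introduction
               | App Proof Proof   -- implication elimination (modus ponens)
      deriving (Show)

    check :: [Prop] -> Proof -> Maybe Prop
    check ctx (Hyp p)
      | p `elem` ctx = Just p
      | otherwise    = Nothing
    check ctx (Lam p body) = Imp p <$> check (p : ctx) body
    check ctx (App f x)    = do
      Imp a b <- check ctx f   -- f must prove an implication
      a'      <- check ctx x   -- x must prove its antecedent
      if a == a' then Just b else Nothing

    main :: IO ()
    main = print (check [] (Lam (Atom "A") (Hyp (Atom "A"))))
    -- Just (Imp (Atom "A") (Atom "A")), i.e. a checked proof of A -> A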

