
Why Writing Correct Software Is Hard and Why Math (Alone) Won’t Help Us [video] - DanielRibeiro
https://www.youtube.com/watch?v=dWdy_AngDp0
======
Animats
Oh, another halting problem argument.

He's quite correct that you can write programs for which there's no faster way
to find out if they halt than running them. His example is one that tests
Goldbach's conjecture by brute force. A more practical example is trying to
break a crypto key by brute force trials. Or a Bitcoin miner.
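
For concreteness, a program in the spirit of his Goldbach example might look
like this (a toy sketch, not the speaker's actual code). It halts only if
Goldbach's conjecture is false, so predicting whether it halts means settling
the conjecture:

```python
# Toy sketch: this loop halts only if Goldbach's conjecture is false,
# so deciding whether it halts is as hard as settling the conjecture.

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

n = 4
while any(is_prime(p) and is_prime(n - p) for p in range(2, n - 1)):
    n += 2  # this even number is a sum of two primes; try the next
print(n)    # reached only at a counterexample to Goldbach
```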

His analysis applies to the worst case. Most programs aren't like that. If
your program is that close to undecidability, you probably need some kind of
loop counter to keep it under control.

Microsoft has a Static Driver Verifier for Windows kernel drivers. It's a
formal proof of correctness system using symbolic execution. It tests whether
a driver will halt (return, really), call any of the driver APIs incorrectly,
or violate the execution model by subscripting out of range, using a bad
pointer, or committing some other crash-type error. It doesn't try to verify
that the driver
will run the device correctly, just that it won't take the entire kernel down.
This is the main reason that driver-caused crashes stopped being a major
source of Windows problems in Windows 7 and later. Before the Static Driver
Verifier, more than half of Windows crashes were caused by driver bugs.
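
To give a flavor of the kind of rule such a verifier enforces, here is a toy
sketch (purely illustrative Python; the real tool analyzes C drivers via
symbolic execution): enumerate every path through an abstracted "driver" and
check a lock-usage protocol on each one:

```python
# Purely illustrative: explore all paths of an abstracted driver and
# check a lock-usage rule of the kind the Static Driver Verifier enforces.

from itertools import product

def driver(branch_taken):
    """Abstracted driver body: returns the trace of lock operations."""
    trace = ["acquire"]
    if branch_taken:
        trace.append("release")   # early-exit path releases the lock...
    trace.append("release")       # ...and the common exit releases again: bug
    return trace

def check(trace):
    held = False
    for op in trace:
        if op == "acquire":
            if held:
                return "double acquire"
            held = True
        else:  # "release"
            if not held:
                return "release while not held"
            held = False
    return "lock still held at return" if held else None

# Explore all branch outcomes (2^n paths for n branch points).
for (branch_taken,) in product([False, True], repeat=1):
    error = check(driver(branch_taken))
    if error:
        print(f"branch_taken={branch_taken}: {error}")
```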

Microsoft's experience is that 5% to 10% of the time, the Driver Verifier
can't decide if a program will get to a bad state. The Driver Verifier has
explored too many cases and gives up. This is usually an indication of a flaky
driver. If the Verifier can't tell if the driver will hang, the programmer who
wrote it probably can't either. So the driver needs to be fixed. This does not
turn out to be a serious problem in practice.

(I've worked on proof of correctness systems. There are lots of hard problems,
but this just isn't one of them.)

~~~
pron
I'm the speaker. First of all, let me say that much of what I do these days
is formal specification and verification, so I am aware of what's possible in
practice and what isn't, and the blog post does go into some detail. Also, I
am well aware of Microsoft's great accomplishments with driver verification --
SLAM was a landmark project -- but those results don't seem to translate well
to other domains. As I say in the beginning, the main purpose of the talk is
to explain why the problem is so hard, not to list the special cases that are
feasible in practice (although I do quickly go over some approaches).

Nevertheless, it is not true that all the complexity theory results I present
are worst case. Of particular interest is the result by Schnoebelen from 2005
(25:16 in the talk, but it's better to look it up in the post) that showed
that verification isn't FPT (fixed-parameter tractable), which is very strong
evidence that we cannot find a good way to restrict programs or structure them
in a certain way -- even in practice -- so that verification becomes
tractable. Another is the result Schnoebelen quotes about symbolic model
checking (28:20, but better in the post), i.e. the ability to use syntactic
constructs and abstractions to aid in verification, which, as he says, has
been shown to hold in practice and is now considered an empirical fact. Of
course, the fact that feasible verification of nonlocal properties is
unsolved, and that virtually all efforts attempt to tackle special cases (or
don't care about feasibility), shows that the problem is very hard in
practice, and I
talk about that towards the end of the talk/post.

Also, the Goldbach example has little to do with brute force. It is about the
difficulty of proving a seemingly simple reachability property of a 10-line
program. That the program itself uses brute force is irrelevant; it is only
used to keep the program simple and short.

~~~
Animats
There are lots of things you can verify if you don't choose to bang your head
against unsolved problems in number theory.

The first stage is simply verifying that the program stays within the language
- no integer overflow or underflow, no subscripts out of range, no bad
pointers. If you can't do that for a program, the program is just broken. It
should be possible to do this for much "unsafe" Rust code, such as collection
classes. To deal with collection classes, you need to be able to talk about an
array for which only some of the elements are valid. Rust lacks syntax for
that, but many verification systems have ways of talking about such things.
Simple inductive proofs can handle situations such as growing or compacting a
collection.
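
Concretely, the invariant looks something like this (an illustrative Python
model, not Rust or any particular verifier's notation); the loop at the bottom
is the brute-force analogue of the inductive step:

```python
# A minimal sketch of the invariant: a backing array in which only the
# first `length` slots are valid. Names here are illustrative only.

UNINIT = object()  # stands for a slot whose contents must never be read

class Vec:
    def __init__(self, capacity):
        self.storage = [UNINIT] * capacity
        self.length = 0

    def invariant(self):
        # the valid prefix is initialized; the rest may be garbage
        return (0 <= self.length <= len(self.storage) and
                all(self.storage[i] is not UNINIT
                    for i in range(self.length)))

    def push(self, x):
        if self.length == len(self.storage):             # grow
            self.storage += [UNINIT] * max(1, len(self.storage))
        self.storage[self.length] = x
        self.length += 1

# Inductive step, checked exhaustively on small sizes: if the invariant
# holds before a push, it holds after.
for start in range(5):
    v = Vec(4)
    for i in range(start):
        v.push(i)
    assert v.invariant()
    v.push(99)
    assert v.invariant()
```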

The next step is verifying validity semantics for data structures. Is a tree
valid before and after an insertion - that sort of thing. There are good ways
to do that. One way is to add information to the tree, such as back pointers
and depth counts, and verify that the updating operations maintain the
invariant that back and forward pointers are consistent, and that depth counts
decrease towards the root. That info doesn't have to be carried along into the
executable code if it can't affect the results. We did this in the Pascal-F
verifier, and it was easy to explain this to programmers as a way of inserting
checking code that had no overhead.
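
In code, the idea looks roughly like this (an illustrative sketch; Pascal-F
had its own notation for such checking information):

```python
# Sketch of the checking information described above: parent (back)
# pointers and depth counts added to a tree purely so the invariant can
# be stated and checked; they need not survive into production code.

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.depth = 0 if parent is None else parent.depth + 1
        self.children = []

def valid(node):
    for child in node.children:
        if child.parent is not node:          # back/forward pointers agree
            return False
        if child.depth != node.depth + 1:     # depth decreases toward root
            return False
        if not valid(child):
            return False
    return True

def insert_child(parent):
    child = Node(parent)
    parent.children.append(child)
    return child

root = Node()
a = insert_child(root)
b = insert_child(a)
assert valid(root)   # invariant holds before and after each insertion
```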

Specifying the problem formally remains tough. There are formal specification
languages. I used one of the first, SRI's SPECIAL. They're useful for some
kinds of problems. Industrial control often has well-defined constraints on
the overall system. Databases do something which can be simply defined;
consider a database as a tuple store with linear search. An efficient
implementation of a database is quite complex, but the desired end result is
defined by a simple model.
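
For instance, the entire tuple-store model fits in a few lines (a sketch; a
real engine with indexes and query planning need only be observationally
equivalent to it):

```python
# The *specification* of a database as a tuple store with linear search.
# Efficiency is not the spec's job; a real engine must merely agree with it.

class TupleStore:
    def __init__(self):
        self.tuples = []

    def insert(self, row):
        self.tuples.append(row)

    def select(self, predicate):
        # linear scan is the whole semantics
        return [row for row in self.tuples if predicate(row)]

db = TupleStore()
db.insert(("alice", 30))
db.insert(("bob", 25))
assert db.select(lambda r: r[1] > 26) == [("alice", 30)]
```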

Security properties can often be addressed formally, and DoD has put a lot of
work into that. Anyway, enough for tonight.

~~~
pron
Have you watched the talk or read the post?

~~~
Animats
Yes, unfortunately.

From the paper: "I hope it’s clear by now that it’s unlikely we’ll find a
general useful category of programs for which verification is affordable, as
verifying even the most simple computational model, the FSM, is in general
also infeasible."

The basic approach of this paper is to generalize the problem until it becomes
unsolvable. Others less inclined to claim the problem is impossible have
succeeded in verifying some useful systems.

Useful reading:

"Formal Verification of floating point RTL at AMD."[1] After the famous
Pentium floating point bug, AMD decided to use formal proofs of the
correctness of their FPU, starting from the Verilog representation used to
create the chip. They found several bugs, which were corrected before making
the chip.

L4 microkernel verification: "The binary code of the seL4 microkernel
correctly implements the behaviour described in its abstract specification and
nothing more. Furthermore, the specification and the seL4 binary satisfy the
classic security properties called integrity and confidentiality."[2]

[1] [http://www.russinoff.com/papers/paris.ps](http://www.russinoff.com/papers/paris.ps)
[2] [https://ts.data61.csiro.au/projects/TS/l4.verified/](https://ts.data61.csiro.au/projects/TS/l4.verified/)

~~~
january48
I remember that old ACL2 paper. They didn't perform verification on just any
RTL module written in Verilog; they restricted it to a subset of Verilog that
best matched their ACL2 language, a Lisp-like functional language, and their
ACL2 system was a descendant of the Boyer-Moore theorem prover Nqthm. They
translated that subset of Verilog into ACL2, and then their prover would try
to do the verification by term rewriting. To use their prover you basically
had to find a functional version of whatever program you wanted to verify. On
top of that, there was no guarantee that it would perform the verification
successfully: you would either get a result, or ACL2 would just hang or fail!
Its mode of use is very similar to Microsoft's SLAM mentioned earlier: if it
took too long, you really had no choice but to rewrite. Verifying arbitrary
Verilog modules was well outside its capabilities, and by today's standards it
wasn't a very sophisticated mechanical prover.
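
To give a flavor of the rewriting approach, here is a toy sketch (illustrative
rules only; ACL2's actual engine is far more sophisticated): orient equations
left to right and normalize both sides of a conjecture:

```python
# Minimal sketch of proof by term rewriting in the Nqthm/ACL2 style:
# repeatedly apply oriented equations until both sides of a conjecture
# normalize to the same term. Terms are nested tuples; "?v" is a variable.

RULES = [
    (("+", "?a", "0"), "?a"),   # a + 0 = a
    (("*", "?a", "1"), "?a"),   # a * 1 = a
    (("*", "?a", "0"), "0"),    # a * 0 = 0
]

def match(pattern, term, env):
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env:
            return env if env[pattern] == term else None
        return {**env, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple) \
            and len(pattern) == len(term):
        for p, t in zip(pattern, term):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return env if pattern == term else None

def subst(term, env):
    if isinstance(term, str):
        return env.get(term, term)
    return tuple(subst(t, env) for t in term)

def rewrite(term):
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            env = match(lhs, term, {})
            if env is not None:
                term, changed = subst(rhs, env), True
        if isinstance(term, tuple):           # normalize subterms too
            new = tuple(rewrite(t) for t in term)
            if new != term:
                term, changed = new, True
    return term

# "Prove" (x + 0) * 1 = x by normalizing both sides.
assert rewrite(("*", ("+", "x", "0"), "1")) == rewrite("x")
```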

Anyway, the way verification is done these days is to reduce the state space
as much as possible and check the resulting FSM (a toy example of this
explicit-state approach is sketched below); that's not "generalizing the
problem until it becomes unsolvable". There are post-1990s techniques to do
just that, for example abstraction and minimization, partial order reduction,
and, most famously, symbolic model checking. But user pron is right: most
useful programs in production are just not feasible to formally verify,
because even after applying many state-of-the-art techniques to reduce the
state space of the resulting FSM, the FSM is still too large. The option you
have today, if verification is important to you, is to structure your program
so that its verification by mechanical provers is tractable. I apologize for
any typos/grammar errors; I'm on my phone. Also, for anyone interested in the
history of model checking, check out Turing laureate Edmund Clarke's paper
"Birth of Model Checking" and, if you're interested, his textbook "Model
Checking"; a more recent text is "Principles of Model Checking" by Baier &
Katoen.
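
Here is the explicit-state idea in miniature (a toy sketch, nothing like a
production checker): breadth-first search over the reachable states of a
two-process lock, asserting mutual exclusion in every state. Real systems blow
this set up astronomically, which is why the reduction techniques above
matter:

```python
# Toy explicit-state model checker: BFS over the reachable states of a
# small transition system, checking a safety property on every state.

from collections import deque

# State: (pc0, pc1, lock) with each pc in {"idle", "trying", "critical"}
INITIAL = ("idle", "idle", False)

def successors(state):
    pcs, lock = list(state[:2]), state[2]
    for i in range(2):
        if pcs[i] == "idle":
            yield tuple(pcs[:i] + ["trying"] + pcs[i+1:]) + (lock,)
        elif pcs[i] == "trying" and not lock:   # atomic test-and-set
            yield tuple(pcs[:i] + ["critical"] + pcs[i+1:]) + (True,)
        elif pcs[i] == "critical":
            yield tuple(pcs[:i] + ["idle"] + pcs[i+1:]) + (False,)

def safe(state):  # mutual exclusion
    return not (state[0] == "critical" and state[1] == "critical")

seen, queue = {INITIAL}, deque([INITIAL])
while queue:
    state = queue.popleft()
    assert safe(state), f"violation in {state}"
    for nxt in successors(state):
        if nxt not in seen:
            seen.add(nxt)
            queue.append(nxt)
print(f"checked {len(seen)} reachable states")
```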

~~~
pron
BTW, one of the coolest modern model checking techniques is "counterexample-
guided abstraction refinement" (CEGAR), which automatically and soundly
reduces (abstracts) the state space, tries to prove the property, and, if it
fails, refines the abstraction (enlarging the state space) and tries again.
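
The loop looks roughly like this (a toy instantiation where "abstraction"
means tracking only some of the variables of an enumerated state space; real
tools instead derive new predicates from the spurious counterexample):

```python
# Toy CEGAR: existential abstraction over a tiny two-variable system.
# Tracking only x makes the bad state look reachable; the counterexample
# fails to replay concretely, so we refine and the property is proved.

from itertools import product
from collections import deque

VARS = ("x", "y")
DOMAIN = (0, 1)

def step(state):
    x, y = state
    return {(1 - x, x)}          # x flips; y copies the old x

INIT = {(0, 0)}

def bad(state):
    return state == (1, 1)       # safety property: (1, 1) is never reached

ALL = set(product(DOMAIN, repeat=len(VARS)))

def project(state, tracked):
    return tuple(state[i] for i in tracked)

def abstract_cex(tracked):
    """BFS over abstract states; return an abstract path to a possibly-bad
    abstract state, or None if the property holds in the abstraction."""
    start = {project(s, tracked) for s in INIT}
    queue = deque((a, (a,)) for a in start)
    seen = set(start)
    while queue:
        a, path = queue.popleft()
        cells = {c for c in ALL if project(c, tracked) == a}
        if any(bad(c) for c in cells):
            return path
        for c in cells:                     # existential abstraction:
            for nxt in step(c):             # any concretization may move
                na = project(nxt, tracked)
                if na not in seen:
                    seen.add(na)
                    queue.append((na, path + (na,)))
    return None

def replays(path, tracked):
    """Does some real execution follow this abstract path to a bad state?"""
    frontier = {c for c in INIT if project(c, tracked) == path[0]}
    for a in path[1:]:
        frontier = {n for c in frontier for n in step(c)
                    if project(n, tracked) == a}
    return any(bad(c) for c in frontier)

tracked = (0,)                              # start by tracking x only
while True:
    cex = abstract_cex(tracked)
    if cex is None:
        print(f"property holds, tracking {[VARS[i] for i in tracked]}")
        break
    if replays(cex, tracked):
        print(f"real counterexample: {cex}")
        break
    tracked = tuple(range(len(tracked) + 1))  # refine: track one more var
```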

------
RossBencina
From the YouTube link:

Extended transcript with more results and full references can be found here:

[http://blog.paralleluniverse.co/2016/07/23/correctness-and-complexity/](http://blog.paralleluniverse.co/2016/07/23/correctness-and-complexity/)

~~~
moyix
Thanks so much for posting this. I probably would have missed out on the
content otherwise, and it's a fantastic survey. I hadn't seen the 2005 result
that decomposing programs doesn't make them easier to prove!

------
ScottBurson
> Complexity is essential. It cannot be tamed, and there is no one big answer.
> The best we can do [...] is apply lots of ad hoc techniques

I think this is right -- and yet wrong. In fact, I would say that it's only by
accepting the correctness of this statement that we can begin to glimpse how
it might be wrong.

It is right that there is no one big answer _that will work for any program_.
There is no algorithm, no language, no logic, no formalism, no decision
procedure that will solve the problem, and there never will be.

But I'll wager that we can get much, much better at "apply[ing] lots of ad
hoc techniques" -- to the point that _in practice, for the programs we
actually care about verifying_, it will _eventually_ be possible to automate
most of the verification process even for our largest programs.

If you look at what deep learning is -- to take one example of how this could
happen -- it's essentially a method for generating ad hoc techniques by the
zillions. When we train a network we generally neither know nor care what
features it learns to detect at each layer. Each such feature detector is an
ad hoc heuristic for _something_, but we don't even need to know what.

My suspicion is that the current state of machine learning, as exciting as it
is, is still in its infancy, and the techniques are going to get much more
powerful yet.

And the thing is, as humans, I believe we do often understand why our programs
work. That means that a significant fraction of the time, we have in our heads
informal correctness proofs that are actually right. I said "a significant
fraction"! I am keenly aware how often we get them wrong, but I think that if
we were practically never right about these things, our software wouldn't work
even as well as it does. Sure, our short-term memory is minuscule, and our
long-term memory unreliable. We make logic errors, we overlook corner cases by
the thousands, and we simply forget required preconditions and postconditions.
Anyone who writes software for a living knows how bad we are at it, _and yet
we can do it to a point_.

What we have that the machine doesn't have is an amazing pattern-matching
ability. We're seeing that machines are getting a lot better at pattern
matching. What the machine has that we don't is massive reliable memory and
absolute indefatigability.

Just to be clear, I'm not suggesting that pattern matching is all that stands
between us and proving Goldbach's conjecture. (If that were true, we would
have done it.) I'm saying that the proof tasks involved in verifying real-
world programs are not difficult in that way. What makes them hard is not that
they require great leaps of mathematical insight; a lot of them aren't even
very long. (What makes them hard is, to return to the author's theme,
_complexity_. To prove any particular fact about a large program requires
figuring out what facts about the program are even relevant to one's task.
Even just at the AST level, it would take many millions of facts (as
relational tuples, say) to describe a large program. Naive brute-force search
in a space like that would be beyond hopeless.)

~~~
pron
> it's essentially a method for generating ad hoc techniques by the zillions

But in the end, the network is just a single algorithm. Complexity theory
doesn't care that from your point of view you consider it to be zillions of ad
hoc techniques. Complexity theory tells us which problems are learnable and
which aren't:
[https://en.wikipedia.org/wiki/Probably_approximately_correct_learning](https://en.wikipedia.org/wiki/Probably_approximately_correct_learning)

> Anyone who writes software for a living knows how bad we are at it, and yet
> we can do it to a point.

I agree, which is why I think that with enough empirical research there's hope
(remember, the point of the talk is that math alone won't help, but empirical
research might), but don't expect miracles. What we want is something that far
exceeds what we think we know about our programs. That gap between what we
think we know and what we actually know is where _all_ bugs lie.

> Naive brute-force search in a space like that would be beyond hopeless

What complexity theory says is that knowledge comes at a price. If you want to
know something you _must_ pay _that_ price. Your algorithm for knowing it is
irrelevant. Knowing some things about programs can get really expensive, no
matter the technique.

~~~
ScottBurson
The PAC learning framework takes as one of its starting assumptions that the
learner cannot be instructed as to the form of the answer. I think that's a
stronger restriction than we need for this purpose. It's okay if we have to
provide the learner with new concepts, as long as it can remember them and
apply them appropriately after that.

(I'm not actually sure that deep learning, at least as it exists today, is
going to be particularly useful for this problem. I was just trying to come up
with an example that people would be familiar with.)

~~~
pron
In the conclusion of the talk I say that the only way forward towards anything
resembling a sufficiently general approach to verification is the empirical
study of the programs people actually write. Any form of
statistical analysis (which is what many current learning algorithms are) is
simply an algorithmic view of that very statement. One of the attendees even
asked if machine learning could assist in that empirical study, to which the
answer is, of course! But it is important to understand that this is what the
learning algorithms would do: empirically study human-written programs. I
don't think there's currently a whole lot of research in that direction,
though.

~~~
ScottBurson
Yes, I think we agree on a lot of this. Where we seem to disagree is on the
prospects for a "silver bullet". As much respect as I have for Fred Brooks, I
have always thought he was too quick to dismiss the potential of AI (though
admittedly, when he first wrote that essay, AI was much better known for its
failures than its successes). In particular, this dismissive, oddly moralistic
quote from Turski, that you've requoted, I think is absolute horseshit. (For
one thing, I'd wager that the biotech people will actually solve the hair loss
problem within the next 10 years. I sure hope so :-)

But we agree that it's essential to understand what _has_ been proven to be
impossible. There will be no algorithm (in the sense of an effective procedure
that works on any input) for computing correctness of programs. Indeed, _all_
the familiar tools of computer science are inadequate: languages, logics,
indeed formalisms of any kind. They may all have roles to play, but none of
them can be the silver bullet in and of itself. It is exactly as you say, that
the breakthrough, if there is to be one, must come through empirical analysis
of actual programs.

But we now live in a world where, at long last, a machine has beaten the world
champion Go player. Many experts thought that wouldn't happen for decades. If
we can do that, I think that means we are finally starting to have solutions
to problems of perception and pattern-matching that have been stymieing us
this whole time. I am very excited.

I probably came out sounding a little too combative at the beginning. I think
you've written an excellent summary here (modulo the Turski quote :-), and I
think it's kicked off a very interesting conversation. (Even Animats, I think,
doesn't disagree with us as much as he/she initially appears to.)

> In my job, I formally specify and verify large, complex distributed systems.

I am actually working on a project to put some code where my mouth is: to
build a reasoning system that uses AI and ML and learns to reason about
programs. It's a long-term thing and I'm probably a couple of years from
having anything to demo, but I'd love to learn more from you about the
problems you encounter. Are you by any chance in Silicon Valley? If so, let's
get together. My email is in my profile.

~~~
pron
> I have always thought he was too quick to dismiss the potential of AI

I have no idea what AI is (in a practical context), and I think no one else
does, either. Do you mean the statistical algorithms that people call machine
learning?

> But we now live in a world where, at long last, a machine has beaten the
> world champion Go player.

It is very important to understand how that was achieved and the difference
between that and verification. It's unknown how good Google's Go program is
compared to some arbitrary opponent, as the algorithm it employs is not the
algorithm to "win" at Go (which is far too computationally expensive to be
feasible). It is "merely" able to beat other _human_ players, but that is
mostly because humans can beat human players, and the program learned from
humans. Humans, however, do _not_ efficiently verify other humans' programs,
and that makes the problem very different.

Don't get me wrong: I think that statistics are absolutely necessary, and that
machine learning could certainly help. But "help" does not necessarily mean
"solve". At the end of the day, even if we pretend machine learning does
resemble intelligence (which it does not, as any expert would tell you; we're
still in the vicinity of zero IQ), we must remember that intelligence --
unlike what some people like to believe -- is not a very good "general problem
solving" algorithm; it is a reasonable general algorithm for problems that
humans tend to confront in many situations. For example, humans are even
worse than computers at finding solutions to NP-complete problems, and are in
general terrible at solving many kinds of problems. In general, humans are
good at forming conceptually deep but undetailed theories. Is verifying
programs one of those problems that intelligence can solve? It _could_ be, but
we have no good theory that can give us intuition into whether it is or it
isn't. I do know that a model checker was far better than I was at reasoning
about programs.

But instead of hypothesizing, a good first step would be to gather statistics,
and to use machine learning in analyzing those statistics. I think it's more
likely that machine learning would uncover important patterns that we could
then exploit in more robust algorithms.

Thanks for the invitation, but I am very far away from Silicon Valley.

------
paulddraper
Saying math (alone) won't help you write correct software is like saying math
(alone) won't help us create good AI.

Of course that's true.

In either case, it's a combination of math and more ad-hoc heuristics. But you
_can_ push a lot of that into math, and you should, and we should keep doing
that.

~~~
pron
By "math" I didn't mean the use of math as a tool, as done in, say, physics,
but the view of computer science as a branch of the academic discipline of
mathematics, which normally deals with very different problems from those
computer science does. For example, much of physics relies on numerical
methods, which, while mathematically justified themselves, are not really the
tool of choice for mathematicians, because their concerns lie elsewhere. But
physics regularly tackles problems for which there are no closed-form
solutions -- this is the norm rather than the exception -- so the mathematical
tools used by physicists and mathematicians are quite different. Computer
science is more similar to physics in that regard.

