
> Whatever language is chosen, only a small subset of programs written in that language will be verifiable.

You're still focusing on arbitrary programs. Just don't write arbitrary programs. Let your intended proof tell you what program you need to write.

> Composing proofs of program correctness requires proof effort that is intractable in the complexity of each component, not in the number of components.

So keep each component small.

> However, that genetic algorithm I was referring to is conducted by all living systems as a single computational system.

This is still not verification. It's just testing on a much larger scale.




> Just don't write arbitrary programs. Let your intended proof tell you what program you need to write.

Of course, but easier said than done. In other words, we haven't figured out how to do this yet. So far, the cost of proof exceeds what we're willing to pay for what needs proving. OTOH, researchers often focus not on the programs we need to write, but on the properties they can prove (searching under the lamppost, so to speak, as that's the best way of getting results). Some of them then try to convince themselves and others that the properties they can prove, on the programs they can prove them on, are really those that need proving, and that the cost is affordable. Sadly, this wish hasn't been shown to be true just yet.

Look, I suggest that you try to write a large-scale formally verified program. Forget large; small and non-trivial will do. And I don't mean "verified" as in it type-checks in Haskell, because the Haskell type system can only prove extremely weak properties, but actually verified with respect to interesting logical properties. There is a reason why no one in the history of computing has ever come close to achieving such a feat. Interestingly, you'll find that the people who actually do formal verification (you can watch talks by Xavier Leroy, who comes to verification from the PLT direction, or read statements made by the people who worked on seL4) are usually the ones most restrained in discussing its possibilities. It can work, depending on what exactly you want to do, but it requires a great investment, and end-to-end verification is possible only on very small programs (like seL4 or CompCert) and only at a prohibitive cost that makes the process worth it for just a handful of very special projects.

Let me put it another way: so far, practice has proven to be very much in line with the difficulties predicted by the theory. Even Schnoebelen, discussing the result that language abstractions don't help, says that while it is a worst-case result, it "has been observed in many instances, ranging from reachability problems to equivalence problems, and can now be called an empirical fact".

Personally, I do formal verification extensively on large projects (much larger than CompCert or seL4), but I don't even attempt end-to-end or manual deductive proof because the cost of those becomes very clear very fast once you start trying these things on big systems. These observations have been reported, AFAIK, by every single industry or research unit that carries out significant software verification efforts.

Formal verification can and does work. But in order for it to work, you need to approach it humbly and know the limitations. Also, you need to know where best to spend your efforts, which is a bit of an art. When you do, you discover that maybe you've not found a way to consistently churn out bug-free programs, but that the quality of your programs has improved considerably.

Just to give you a taste of program proof, this is a simple exercise by Leslie Lamport:

Consider N processes numbered from 0 through N-1, in which each process i executes

    x[i] := 1;
    y[i] := x[(i-1) mod N]
and stops, where each x[i] initially equals 0. (The reads and writes of each x[i] are assumed to be atomic.) This algorithm satisfies the following property: after every process has stopped, y[i] equals 1 for at least one process i. It is easy to see that the algorithm satisfies this property; the last process i to write y[i] must set it to 1… The algorithm satisfies this property because it maintains an inductive invariant. Do you know what that invariant is?

If you can produce a machine-checked proof of this 2-line program that has no loops or recursion in under one hour, hats off to you. The difficulty of the problem grows super-linearly with the size of the code. People writing real programs, even those intended for verification, write programs that are several orders of magnitude more complex than this toy example.

> This is still not verification. It's just testing on a much larger scale.

It's verification (by an algorithm that is not common in software). The cost of verification comes from the knowledge you wish to obtain, not how you obtain it. Computational complexity doesn't care if you know something by proving it using FOL and natural deduction, dependent types, or by exhaustively trying each input. BTW, as far as the theory is concerned, a test is verification of a more specific property (e.g., the property that your program, when given input x, would, say, emit output y followed by output z).
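
(In TLA+ notation, say, a single test case is just a very narrow theorem; this is a schematic sketch, with input and output as placeholder variables, and ◇ meaning "eventually":

    THEOREM Spec ⇒ ((input = x) ⇒ ◇(output = y ∧ ◇(output = z)))

)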


> Look, I suggest that you simply try to write a large-scale formally verified program. And I don't mean "verified" as in it type-checks in Haskell, because the Haskell type system can only prove extremely weak properties, but actually verified with respect to interesting logical properties.

This is exactly what I'm doing: a library of data structures and algorithms, where the verification effort is suitably split between the language and me. The language (in this case, Standard ML) ensures that abstract types can only be used abstractly, and I prove (using paper and pen) that the implementations of these abstract types are correct. A primary concern is simplicity, so if a module has too many invariants to manually prove, it is a natural candidate for splitting into smaller modules.

I'm not saying that Standard ML is the ideal language for this kind of thing, e.g., some limited form of dependent typing would be very helpful (as long as it doesn't compromise type inference), but it's much better than anything else I have tried (including Haskell).

> Do you know what that invariant is?

The invariant to maintain is “there exists some distinguished index i such that x[i] is written by process i before it is read by process (i+1) mod N”. If N = 1, then i = 0. If N = N' + 1, consider some valid interleaving of N' processes, with distinguished index i', and then consider the effect of interleaving one more process.

> The cost of verification comes from the knowledge you wish to obtain, not how you obtain it.

In this case, what I wish to ascertain is whether a living organism is fit to survive in a range of circumstances, most of which will never actually happen.

> BTW, as far as the theory is concerned, a test is verification of a more specific property (e.g., the property that your program, when given input x, would, say emit output y followed by output z).

Tests can verify atomic propositions. They can't verify that a predicate holds for all inhabitants of an infinite set.


> a library of data structures and algorithms

That's excellent, but I didn't say "program" for nothing. Programs are usually interactive, with temporal components. But if data structures are your thing, try writing and verifying, say, a concurrent R-tree.

> The only invariant I can see is that “y[i] has been assigned to” implies “x[i] has been assigned to”. From this, we can see that “all y[i]s have been assigned to” implies “all x[i]s have been assigned to”. But this doesn't involve any induction on N so far.

Your reasoning is OK, but it shows how much you're thinking in terms of the language rather than more abstractly. It's also not enough to prove the property.

The crucial state here is the hidden state in the program counter. The induction is not on the number of processes, but on the steps of the algorithm. Each step executes one line of one process (chosen nondeterministically). The invariant, expressed in TLA+, is:

    Inv ≜ ∧ (∀ p ∈ 0..N-1 : Done(p)) ⇒ ∃ p ∈ 0..N-1 : y[p] = 1  \* (1)
          ∧ ∀ p ∈ 0..N-1 : pc[p] ≠ "L1" ⇒ x[p] = 1              \* (2)
(1): this is the simple property invariant

(2): this is the inductive bit; `pc` models the program counter
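
For reference, here is roughly the underlying state machine (my own sketch; Done(p) abbreviates pc[p] = "Done"):

    VARIABLES x, y, pc

    Init ≜ ∧ x = [p ∈ 0..N-1 ↦ 0]
           ∧ y = [p ∈ 0..N-1 ↦ 0]
           ∧ pc = [p ∈ 0..N-1 ↦ "L1"]

    L1(p) ≜ ∧ pc[p] = "L1"
            ∧ x' = [x EXCEPT ![p] = 1]
            ∧ pc' = [pc EXCEPT ![p] = "L2"]
            ∧ UNCHANGED y

    L2(p) ≜ ∧ pc[p] = "L2"
            ∧ y' = [y EXCEPT ![p] = x[(p-1) % N]]
            ∧ pc' = [pc EXCEPT ![p] = "Done"]
            ∧ UNCHANGED x

    Next ≜ ∃ p ∈ 0..N-1 : L1(p) ∨ L2(p)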

This is instantly verified with a model checker. A machine-checked proof is about 60 lines long (I try to avoid formal proofs as they are extremely laborious, but I guess that someone more experienced with the prover than me could make it shorter).

> Tests can verify atomic propositions. They can't verify that a predicate holds for all inhabitants of an infinite set.

Exactly, and each of those atomic propositions has a cost of verification. Moreover, we can prove (again, it's in my post) that we can't in general generalize from one input to another, so the cost of verification over an infinite number of inputs is infinite in the general case. Of course, it can be done in practice, but we can tell exactly how hard that would be: the cost of verification is proportional to the number of states in the algorithm -- the minimal algorithm that satisfies the full spec, not necessarily your actual program. Properties that are easily checked by simple type systems like those of Haskell and SML can always be satisfied by much simpler programs (in fact, an FSM or, if polymorphic recursion is used, a PDA).


> It's also not enough to prove the property.

Errr, yes, you're right. I noticed I messed up after I posted my comment, then edited it accordingly. I don't see why I need to perform induction on the program counter, though.

I posit that, no matter how you interleave your processes, there is some distinguished index i such that x[i] is written by process i before it is read by process `(i+1) mod N`. To prove this, I use induction on N:

(0) If N = 1, then i = 0.

(1) If N = M + 1, inductively assume that we have a collection of M processes with distinguished index i. Now consider the effect of adding one more process, respecting the interleaving of the original processes:

(1.a) If the new process is added at any index other than `i+1`, then the original processes i and `(i+1) mod M` still have contiguous indices in the new collection, and the former still writes to the common variable before the latter reads it.

(1.b) If the new process is added at index `i+1`, then at least one of the two following things must happen:

(1.b.i) The original process i writes to x[i] before the new process reads it.

(1.b.ii) The new process writes to x[i+1] before the original process `(i+1) mod M` (now `(i+2) mod N`) reads it.

No analysis of the program counter was required, and as a result we get a far more beautiful proof. You used a sledgehammer, I used a sculptor's chisel.

> Properties that are easily checked by simple type systems like those of Haskell and SML can always be satisfied by much simpler programs (in fact, a FSM or, if polymorphic recursion is used, then a PDA).

I never claimed that I could delegate the entirety of the formal verification task to a type system like that of Haskell or SML. And, to be honest, I don't even think using computers to verify everything is such a good idea. What I want is automation for the boring and repetitive parts (the vast majority), so that I can concentrate exclusively on the hard ones. (Which isn't to say that I'm happy with the little amount of work Damas-Milner does for me. It's possible to do much better, without sacrificing automation.)


> and the former still writes to the common variable before the latter reads it.

This is not true. At no point in your induction is it proven that it is always the last process that writes x before the first reads it, nor is it necessarily the case.

> then at least one of the two following things must happen:

This, too, is untrue. The events may occur in the precise opposite order. If this were true, then it would have been an invariant that either process N-1 or process 0 would necessarily write 1, but this is not the case.

> No analysis of the program counter was required, and as a result we get a far more beautiful proof. You used a sledgehammer, I used a sculptor's chisel.

Excellent, except that my "sledgehammer" is a proof technique that's been proven complete, i.e., if a program property is provable, it is provable via an inductive invariant, which is amenable to mechanical proof checking (and, indeed, I mechanically checked it in TLA+). Good luck getting your proof (which is called "behavioral") -- even if it were correct -- to verify formally.
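
(Schematically, the rule is: to prove that a spec of the form Init ∧ □[Next]_vars satisfies □P, you find an Inv such that

    Init ⇒ Inv
    Inv ∧ [Next]_vars ⇒ Inv'
    Inv ⇒ P

and each of these obligations is an ordinary, non-temporal implication that a mechanical checker can discharge.)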

Chord is a distributed algorithm that was published in 2001 and quickly became popular. It got thousands of citations and plenty of industry implementations. In 2011 it even won a "test of time" award. Its authors said, "Three features that distinguish Chord from many peer-to-peer lookup protocols are its simplicity, provable correctness, and provable performance". They, of course, provided proofs. However, in 2012, Pamela Zave, using a model checker, showed that "of the seven properties claimed invariant of the original version, not one is actually an invariant". The point is that where algorithms are concerned, "proofs" are often not proofs.

See: https://brooker.co.za/blog/2014/03/08/model-checking.html and https://www.youtube.com/watch?v=iCRqE59VXT0

> I never claimed that I could delegate the entirety of the formal verification task to a type system like that of Haskell or SML.

Just to be sure: proof with pen and paper isn't formal verification. It is pretty much the very definition of informal verification. Formal proof means using natural deduction or sequent calculus. This is too hard, so a machine-checked proof is acceptable. But any proof or any other kind of verification not carried out or verified by a computer is not formal.


> At no point in your induction is it proven that it is always the last process that writes x before the first reads it, nor is it necessarily the case.

I didn't say anything about “the last process”, though. I just took a collection of M processes, with some valid interleaving, and inserted one more process, at whatever index j, displacing all pre-existing processes j...M-1 one place to the right.

>> then at least one of the two following things must happen:

> The events may occur in the precise opposite order.

Let A and B be processes with adjacent indices (for my purposes, M-1 and 0 are considered “adjacent”), such that A's write happens before B's read. If we insert a new process C in between A and B, then A and B no longer manipulate a shared variable, but instead A writes to a variable C reads, and C writes to a variable B reads. Then at least one of the two following things must be true:

(0) A's write happens before C's read.

(1) C's write happens before B's read.

This follows from these facts:

(0) A's write must happen before B's read.

(1) C's write must happen before C's read.

Was that too hard to understand? If it was, you need to relearn how to prove things by hand.

> Just to be sure: proof with pen and paper isn't formal verification. It is pretty the very definition of informal verification.

While the proof I gave above is most certainly informal, you can prove things formally using pen and paper. The formality comes from the use of explicitly given deduction rules, not from the use of a computer.


> I didn't say anything about “the last process”, though

Of course. Sorry. For some reason my mind read i as N. It was late at night... I'm convinced the proof follows from your invariant, but the inductive step is a bit tricky, as i is not the same for every interleaving. My proof is simpler, as the invariant holds for -- and, more importantly, is maintained by -- every step made by the system, and there's no need to consider various interleavings. It also doesn't require induction on N.

> The formality comes from the use of explicitly given deduction rules, not from the use of a computer.

That's true and I said as much, but doing so is extremely laborious. Do you actually use natural deduction?


> but the inductive step is a bit tricky, as i is not the same for every interleaving

Ah, the joys of nondeterminism! It gives “there exists” an, ahem, “interesting” meaning. This is why I only do imperative and concurrent programming when it's absolutely necessary (e.g., when no functional and deterministic program could be asymptotically as efficient).

> My proof is simpler, as the invariant holds for -- and more importantly, maintained by -- every step made by the system

My invariant is prettier. It doesn't need to be maintained “throughout the computation”, because it's a static assertion about the program: “regardless of the interleaving of operations, there exists some i such that...” Almost invariably, the program is a much simpler mathematical object than the computations it gives rise to. Dijkstra noted “that our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed.” https://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/E...

> It also doesn't require induction on N.

Induction on the program counter is even harder! The inductive case of my proof doesn't even look at the entire computation: it looks at most at three processes at a time. It's literally less work than what you propose.

> Do you actually use natural deduction?

In the design of type systems, yes. If I want to prove something by hand, hell no. Natural deduction is much nicer to prove things about than it is to prove things in.


> Ah, the joys of nondeterminism!

In TLA+ nondeterminism is as natural and graceful as determinism (it's a matter of ∨ vs ∧), and the functional concepts of parameterization and function types are just special cases. Nondeterminism is what allows refinement relations. You have it whenever you have types, but the language you use just doesn't use the same words to describe it.
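
To illustrate with a toy example of mine: a deterministic next-state relation pins the successor down, and nondeterminism is just another disjunct:

    NextDet    ≜ x' = x + 1                \* exactly one successor state
    NextNondet ≜ (x' = x + 1) ∨ (x' = 0)   \* the machine may also reset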

> because it's a static assertion about the program

Mine is as static and as global as yours (and has the benefit of being easily verifiable mechanically)! You're just not trained to see it. You've been told that concurrency and imperative constructs are "dynamic" and are hard to reason about. That's just because FP is bad at reasoning about them, and prefers syntactic decomposition. There is nothing dynamic about a proper, pure description of what to you seems a messy dynamic process. The entire program is just a single, pure semantically unambiguous first-order predicate.

I had a discussion about this with a PLT person once, and he kept asking how you prove static properties in TLA+. It took me a while to understand what he meant, as "static" vs. "dynamic" are distinctions you only need when you reason about syntax. In simple logic, there's just a single formula.

> Almost invariably, the program is a much simpler mathematical object than the computations it gives rise to

You don't reason about each computation, but about what you'd call the "program", which in TLA+ is called the algorithm, so as not to unnecessarily confuse it with ideas that are related to syntax. The algorithm is just a boolean predicate, and my proof is that that predicate implies another (which implies the property).

Now when I look at how FP people reason about code I wonder why they have to complicate things so much. Sequential, concurrent, parallel, nondeterministic, deterministic, algorithms, properties -- they can all be reduced to a single first-order predicate, with the simple implication relation proving anything you want to know about any of them (that the program satisfies a property or a "type", that the deterministic program is a refinement of the nondeterministic one, that the parallel program computes the same result as the sequential one, that the sequential program has a worst-case complexity of such-and-such), all with simple, familiar logic and semantics that fit on a single page.
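
Schematically, with placeholder names:

    THEOREM Spec ⇒ □Inv    \* the program satisfies a (safety) property, or a "type"
    THEOREM Det  ⇒ Nondet  \* the deterministic program refines the nondeterministic one
    THEOREM Par  ⇒ Seq     \* the parallel program implements the sequential one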


> You've been told that concurrency and imperative constructs are "dynamic" and are hard to reason about. That's just because FP is bad at reasoning about them, and prefers syntactic decomposition.

The hard fact is that, while you can reason about purely functional programs using little more than high-school algebra, reasoning about imperative programs requires bringing in the heavy apparatus of logic right from the start, because imperative programs denote predicate transformers. Distinguishing a class of computations that can be expressed as functional programs is distinguishing a class of computations whose meaning can be established with simpler tools, and thus obviously a beneficial activity.

> There is nothing dynamic about a proper, pure description of what to you seems a messy dynamic process. The entire program is just a single, pure semantically unambiguous first-order predicate.

Last time I checked, imperative programs don't denote first-order predicates, but rather predicate transformers. Quite different things. A predicate transformer is a higher-order mathematical object that relates two (possibly first-order) predicates -- a precondition and a postcondition.

> You don't reason about each computation, but about what you'd call the "program", which in TLA+ is called the algorithm, as not to unnecessarily confuse with ideas that are related to syntax.

Your use of induction on the program counter reveals that you're just using the program as a proxy for a class of computations.

> The algorithm is just a boolean predicate,

Um, no. Predicates don't express computation.

> and my proof is that that predicate implies another (which implies the property).

Would you have been able to produce your proof entirely by yourself? That is, unaided by a computer. I produced my proof using nothing other than my brain.


> The hard fact is that, while you can reason about purely functional programs using little more than high-school algebra, reasoning about imperative programs requires bringing in the heavy apparatus of logic right from the start, because imperative programs denote predicate transformers.

1. It is immediately obvious that TLA+ is simpler than SML, yet as powerful as Coq; it is TLA+ that requires little more than high school math. You can reason with it about any kind of algorithm you like, using the same simple, familiar math (FOL, ZF plus three temporal operators, of which only one is really used for safety properties).

2. Imperative/functional is a language concept. In TLA+ there is no such thing as an imperative or a functional program. There is an algorithm that, say, computes a factorial in some sequence of steps. Is that sequence imperative or functional? Hard to say. TLA+ is pure, yet doesn't model an algorithm as a function.

3. Imperative programs do not denote predicate transformers. They denote whatever sound model you decide they do. The whole idea of "denotation" is a semantic one, i.e., tied to a specific language. E.g., in TLA+, all algorithms -- functional, imperative (although there's no difference), parallel, concurrent, quantum -- denote a set of behaviors, where each behavior is an infinite sequence of states.

Lamport's entire idea is that thinking of languages and semantics when reasoning about algorithms complicates things unnecessarily. Think of algorithms as abstract state machines, and reasoning becomes -- if not simple -- then as simple as possible (today, at least).
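
To make the factorial example concrete, here is a sketch of mine; the whole "program" is a single predicate, and there is nothing imperative or functional about it:

    VARIABLES i, acc

    Init ≜ i = N ∧ acc = 1
    Next ≜ i > 0 ∧ i' = i - 1 ∧ acc' = acc * i
    Spec ≜ Init ∧ □[Next]_⟨i, acc⟩    \* when i reaches 0, acc = N!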

> Your use of induction on the program counter reveals that you're just using the program as a proxy for a class of computations.

That is what a program reasonably "denotes" (and what it, in fact, does in TLA+). But the proof works on the representation of the program, which is just a logical predicate.

> Um, no. Predicates don't express computation.

Um, yes, they do. In 1994 Leslie Lamport wrote a paper (2500 citations) that shows how they can, and that makes everything simpler. It is one of the best known and most widely used algorithm logics. It's nothing obscure.

> Would you have been able to produce your proof entirely by yourself?

Yes. First I thought of it, then I wrote it, then I added details until it checked. But let me tell you: if a proof assistant ever made proofs easier than doing them in your head, I'd be very happy. Usually they make things harder because they need lots of detail. Sometimes they're necessary if the proof is shallow but very big, and requires remembering lots of things. But, unfortunately, proof assistants don't help you uncover a proof. Model checkers can do that sometimes.


> TLA+ is ... as powerful as Coq

This is an astonishing claim. Can you formulate measure theory in TLA+?


1. Why would that be astonishing? The "+" in TLA+ is FOL + ZFC, the same foundation used for virtually all of math (except some branches of logic).

2. I am not a logician, so I don't know the formal particulars of measure theory, of any limitation ZFC has in formulating it, or of any TLA+-specific limitations in that regard. However, TLA+ is not a general proof assistant (it can certainly be used to prove general math theorems, it just wasn't designed for that task), but a language for specifying and reasoning about algorithms and large software systems. In that, it is as powerful as Coq.

3. There are completeness proofs for TLA, i.e., anything that you can state about a computation, you can state in TLA, and anything you can prove about a computation, you can prove in TLA. Now, this isn't quite accurate: TLA is a linear temporal logic, and there are some things that you can state and prove in a branching-time temporal logic that you can't in a linear-time one; nevertheless, as Moshe Vardi argues in Branching vs. Linear Time: Final Showdown[1], it makes more sense to prefer linear-time logic because empirical evidence suggests that it's far more useful. There is one glaring downside: TLA is unable to reason about probabilistic algorithms[2]. However, it is possible to embed reasoning about computation in the + part of TLA+ (i.e., ZFC) and reason about probabilistic algorithms that way, and I assume that this is how it would be done in Coq, too. So you could do everything in TLA+ without the TLA bit, but that would be inconvenient, as the natural representation of algorithms in TLA is the main justification for the approach. So I would be very interested in seeing an extension of TLA meant to deal with probabilistic algorithms.

4. In the end, what matters isn't theoretical strength (by which Coq and TLA+ are equivalent in anything that concerns algorithms), but convenience. I am sure that there are some problems that are easier to formulate and reason about in Coq than in TLA+. But the fact that no one has ever used Coq for a large software system, and that the handful of use cases in industry were always done in cooperation with academic experts, while TLA+ is used in very large projects by "plain" developers, speaks volumes about the strength of the approach.

[1]: https://www.cs.rice.edu/~vardi/papers/etaps01-ver13.pdf

[2]: You can describe any probabilistic algorithm in TLA (i.e., state that an algorithm performs a certain step with a certain probability). What you can't do is then state a probabilistic theorem about that algorithm, such as "in 70% of the cases, the algorithm gives the right answer".


Because Coq was designed (as I understand it) to help people prove theorems about mathematics, not about algorithms. If TLA+ -- a language designed to prove theorems about algorithms, not about mathematics -- is actually better for that purpose then a lot of people have wasted a lot of time.

Sure, algorithms are mathematical and much of mathematics can be recast as algorithmic, but still they have quite a different flavour. If a tool designed to help with one is actually the best tool to help with the other, then that is astonishing.


> is actually better for that purpose then a lot of people have wasted a lot of time.

I don't think it's better, but it is as mathematically powerful, certainly within the realm of ordinary math. As it is not designed for this task, I would assume that Coq would be more convenient.

Coq is a full decade older than TLA+ (and Lamport mentions Coq when he discusses the design of TLA+ as something to avoid because of the arcane, virtually unknown math; even the TLA logic is 5 years younger than Coq), and even Coq isn't the first theorem prover, nor the most popular. It wasn't even designed to be the "best" math prover, but (I would imagine) to be a tool for type theorists to see how well their ideas work. After all, Isabelle (which preceded Coq and serves as a TLAPS backend) is very good at "general" logic (not type theory).

BTW, TLA+'s proof language looks very different from Coq's, and is declarative in nature. It is based on Lamport's "How to Write a Proof", and, as I understand it, Isabelle's Isar is also inspired by that style[1], so TLAPS looks a lot more like Isar.

Also, remember that machine-checked formal proofs of mathematical theorems are currently used mostly by the people who design those provers and by hobbyists. Few mathematicians are keen on the approach (although I think they should be). Proof of software correctness, OTOH, is something that a larger group is interested in.

But back to my main claim: TLA+ lets you express specifications about a program that are as rich as anything Coq would let you do. It was designed by people who are experts in software specification rather than by type theorists. I would guess that Coq may be better for some things (maybe proving general theorems about some classes of programs), but when it comes to actually reasoning about large software in industry, TLA+ and its older kin (Z, B-method and more) are almost the only game in town. Isabelle is also ahead of Coq in verifying real-world systems.

[1]: http://isabelle.in.tum.de/Isar/isar-thesis-Isabelle2002.pdf


> It is immediately obvious that TLA+ is simpler than SML, yet as powerful as Coq

What facilities for data abstraction does TLA+ have? How do I hide the representational choices of a module whose internal implementation I'm not currently concerned with?

> it is TLA+ that requires little more than high school math. You can reason with it about any kind of algorithm you like, using the same simple, familiar math (FOL, ZF

First-order logic is not “high-school math”. You can't expect most high schoolers to know how to correctly manipulate logical quantifiers. (Of course, a competent programmer should know, but that doesn't mean one should force themselves to use a more complicated tool when a simpler one is available.) And axiomatic set theories (such as ZF), which are built on top of first-order logic, are definitely not “high-school math”. Have you ever seen the crazy things that Harvey Friedman does with large cardinals?

> plus three temporal operators, of which only one is really used for safety properties).

That's the thing: I want to reason in a time-independent manner as much as possible. Time-invariant reasoning is simpler and mathematically more beautiful than exploring state spaces alongside your processes.

> In TLA+ there is no such thing as an imperative or a functional program. (...) E.g., in TLA+, all algorithms (...) denote a set of behaviors, where each behavior is an infinite sequence of states.

Then in TLA+ all algorithms are expressed as imperative programs.

> Think of algorithms as abstract state machines, and reasoning becomes -- if not simple -- then as simple as possible (today, at least).

By virtue of exploring large state spaces?

> But the proof works on the representation of the program, which is just a logical predicate.

Which is it?

(0) Two programs that compute the same result, but have different efficiency characteristics, are represented as logically (not just syntactically) different predicates.

(1) You don't distinguish between programs that compute the same result, but have different efficiency characteristics.

The former is unusable as a basis for reasoning about extensional equality. The latter is unusable as a basis for reasoning about complexity.

> Yes. First I thought of it, then I wrote it, then I added details until it checked.

Checked by whom?

> But let me tell you: if a proof assistant would ever make proofs easier than doing them in your head, then I'd be very happy. Usually they make things harder because they need lots of detail.

I don't disagree with this one.

> But, unfortunately, proof assistants don't help you uncover a proof. Model checkers can do that sometimes.

Type inference reconstructs type derivations all of the time. Reliably. Of course, those type derivations are all boring routine, but it means that human programmers themselves aren't burdened with it.


> What facilities for data abstraction does TLA+ have?

The full strength of set theory (sets, functions), plus sugar for sequences and records.
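
For example:

    point ≜ [x ↦ 1, y ↦ 2]         \* a record; point.x = 1
    s     ≜ ⟨"a", "b", "c"⟩         \* a sequence; Head(s) = "a" (Sequences module)
    sq    ≜ [n ∈ 1..10 ↦ n * n]    \* an ordinary function on the set 1..10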

> How do I hide the representational choices of a module I'm not currently concerned with?

TLA+ has a simple module system. Definitions can be local or public.
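
A sketch (the module and names are mine); LOCAL definitions are invisible to anyone who imports the module:

    ---------- MODULE Account ----------
    LOCAL INSTANCE Integers
    VARIABLE balance
    LOCAL Rep  ≜ balance       \* internal detail; not exported
    Init       ≜ balance = 0
    Deposit(n) ≜ n > 0 ∧ balance' = balance + n
    ====================================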

> First-order logic is not “high-school math”. You can't expect most high schoolers to know how to correctly manipulate logical quantifiers.

Depends on the country I guess. But OK, first year college.

> And axiomatic set theories (such as ZF), which are built on top of first-order logic, are definitely not “high-school math”.

None of that has any bearing on specifying algorithms, though.

> That's the thing: I want to reason in a time-independent manner as much as possible. Time-invariant reasoning is simpler and mathematically more beautiful than exploring state spaces alongside with your processes.

That's only because you're used to thinking in a way that is tied to languages. I have yet to encounter a powerful type system as mathematically elegant as TLA+.

> Then in TLA+ all algorithms are expressed as imperative programs.

No. They're expressed as nondeterministic abstract state machines, written as first-order logic predicates. Each step of the machine can be arbitrarily complex. You can specify a program in TLA+ that in each step solves a difficult SAT problem, or even the halting problem.

Imperative is an attribute of a programming language with programming language semantics; TLA+ is ordinary math and doesn't have any of those attributes. It's like saying that calculus is imperative. However, it certainly isn't functional because it doesn't identify the concept of a function with the concept of a computation.

I guess you could specify a functional program as a machine that computes its result in a single step (although I wouldn't). You could then model, I don't know, LLVM machine code, and show that the machine-code algorithm is a refinement of the functional program.

> By virtue of exploring large state spaces?

I don't understand the question. The reasoning -- as any other formal reasoning -- is done by manipulating logical formulas. The state space is a property of the model, not of the logic.

> Checked by whom?

TLAPS, the TLA+ proof system.

> Type inference reconstructs type derivations all of the time.

It's done reliably because the properties they represent are kept very computationally simple. TLAPS uses SMT solvers to reduce the effort, and where a type would be inferred, so would a proposition. The problem is with more interesting propositions.


> The full strengths of set theory (sets, functions), plus sugar for sequences and records.

Set theory doesn't have any facilities for data abstraction. If anything, it's the antithesis of data abstraction: every mathematical object (internal to your set theory of choice) is represented as a set (or urelement, but those are even weirder) somehow, even if the encoding is irrelevant for your purposes.

> TLA+ has a simple module system.

Hierarchical modules? Parameterized module families? Multiple views of the same module, showing or hiding different aspects of it? I use this stuff every day not only to keep my programs modular, but also to keep their proofs of correctness modular.

>> And axiomatic set theories (such as ZF), which are built on top of first-order logic, are definitely not “high-school math”.

> None of that has any bearing on specifying algorithms, though.

You were the one who originally brought up ZF, describing it as “high-school math”, so I was just refuting that.

> That's only because you're used to thinking in a way that is tied to languages. I have yet to encounter a powerful type system as mathematically elegant as TLA+.

If elegance is your only criterion, it's very hard to beat Martin-Löf type theory. Of course, usability matters too, and there MLTT fails miserably, at least for programming purposes. Higher-order dependent types don't come for free.

> It's done reliably because the properties they represent are kept very computationally simple.

That's a feature. It helps you delimit the routine stuff (which can be reliably verified mechanically, and should be most of your program if it's designed sensibly) from what requires creativity to prove.

That being said, of course, existing type systems don't automatically verify everything that could be considered “routine”, so this is an area where further improvement is necessary.

> The problem is with more interesting propositions.

Prove them by hand. Good old-fashioned brain usage is never going away.

---

By the way, I'd like to see your full proof, for the exercise you posted a while back. (“Prove that there exists some i such that y[i] = 1 at the end of the whole computation.”) Is it as short and sweet as my proof?


> every mathematical object (internal to your set theory of choice) is represented as a set (or urelement, but those are even weirder) somehow, even if the encoding is irrelevant for your purposes.

Yes, but it's completely opaque to a spec. There's no way of obtaining the representation of, say, a function. But let's not get into the debate of set theory pros and cons, because it really makes no difference.

> Hierarchical modules?

Yes.

> Parameterized module families?

I don't know what you mean, but you can instantiate the same module multiple times with different parameters.
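
Something like this, assuming a Counter module with a variable c:

    C1 ≜ INSTANCE Counter WITH c ← c1
    C2 ≜ INSTANCE Counter WITH c ← c2
    \* C1!Init, C2!Init, etc. now refer to the two instances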

> Multiple views of the same module, showing or hiding different aspects of it?

Not that I know of, but you can instantiate a module, import it, and then re-export it. I can't see this being too useful, though. TLA+ specs are usually small. A 3000 line spec (over multiple modules) is considered very big.

> You were the one who originally brought up ZF, describing it as “high-school math”, so I was just refuting that.

Those parts of ZF that are used in practice when writing a spec are high school -- sorry, first-year college -- math. Come on. That people write PhDs on algorithms they implement in Python doesn't mean that Python isn't a high-school-appropriate language.

> If elegance is your only criterion, it's very hard to beat Martin-Löf type theory.

Actually, it's extremely easy, especially where algorithms are concerned. In TLA, deterministic, nondeterministic, parallel, concurrent, neural and quantum algorithms are all simple special cases, and, even better, algorithms and properties of algorithms are the same kind of object. Think of it as type theory with nothing but types, only using ordinary, familiar math. That in any type theory a type is a different object from its inhabitants (even if types and programs can use the same terms) already makes it much more complicated than TLA+.

> Of course, usability matters too, and there MLTT fails miserably, at least for programming purposes. Higher-order dependent types don't come for free.

Right. Amazon, Intel, Oracle and Microsoft engineers use TLA+ all the time to specify and verify very large distributed systems.

> Prove them by hand. Good old-fashioned brain usage is never going away.

Sure. Except when you have a transactional distributed database algorithm to verify, that's kind of hard. But nobody has the time for deductive proof anyway. TLA+ comes with a model checker. You just press a button and if your assumptions are incorrect you get a counterexample.

> By the way, I'd like to see your full proof, for the exercise you posted a while back. (“Prove that there exists some i such that y[i] = 1 at the end of the whole computation.”) Is it as short and sweet as my proof?

Shorter and sweeter :)

Informally, it goes as follows.

This is our partial correctness property

    PartialCorrectness ≜ AllDone ⇒ ∃ p ∈ ProcSet : y[p] = 1
And this is the inductive invariant:

    Inv ≜ ∧ PartialCorrectness
          ∧ ∀ p ∈ ProcSet : pc[p] ≠ "Line1" ⇒ x[p] = 1
We need to show that Inv implies PartialCorrectness (trivial), that Inv holds in the initial state, and that if it holds in any state s, then it holds in any possible successor state s'. It's easy to see that it holds in the initial state. Now, let's assume it holds in s, and prove it for s'. To make this transition, some process p either executes line 1 or line 2.

If p executes line 1, then PartialCorrectness doesn't change, because no new process is done. The second conjunct holds because we've just left line 1 and x[p] has been assigned (by the definition of line 1).

If p executes line 2, the second conjunct of the invariant doesn't change, and by the definition of this action, p will be done. Here we have two cases: either we set y[p] to 1, or we set it to 0. If we set it to 1, we're done and PartialCorrectness holds. If we set it to 0, then by the assumption of the invariant, the process we depend on must not be done, hence AllDone is false, and PartialCorrectness holds. QED.

Now, the machine-checked formal proof no longer looks so short and sweet because, well, it's completely formal and verified by a checker, so there's a lot of detail that's absent from an informal proof (I'd like to see you formally explain "respecting the old interleaving", or even your inductive step, to a proof assistant). TLA+ proofs are written in Lamport's structured "modern" proof style: http://research.microsoft.com/en-us/um/people/lamport/pubs/p...

Also, note that I wanted to show this to someone else, and for pedagogical reasons I wrote the program in an imperative language called PlusCal that is then compiled to TLA+. In TLA+, the "program" is the formulas Init and Next. They are combined to form the temporal formula Spec. A non-modal subformula means that the condition holds in the first state of the behavior. A square operator ("always") means that the subformula holds in all suffixes of the behavior (and a diamond means that it eventually holds). The square brackets with the "vars" subscript have to do with stuttering, which is important for composition and refinement, but I won't explain it here.

Here is the spec with the formal proof: https://www.dropbox.com/s/7i239785jngat2b/Foo.pdf?dl=0


>> Multiple views of the same module, showing or hiding different aspects of it?

> Not that I know of, but as modules are instantiated, you can instantiate them and import, and then export. I can't see this being too useful, though. TLA+ specs are usually small.

So it doesn't scale to entire programs?

> A 3000 line spec (over multiple modules) is considered very big.

A 3000 line program is fairly small.

> Come on. That people write PhDs on algorithms they implement in Python

Presumably they don't actually prove things about Python programs directly. If they do, what formal semantics for Python are they using?

> doesn't mean that Python isn't a high-school-appropriate language.

Formally reasoning about Python programs is a huge pain. Why would anyone want to subject high-schoolers to that?

> Informally, it goes as follows. (proof)

Okay, it was much shorter and sweeter than I expected. Well done.

> TLA+ comes with a model checker. You just press a button and if your assumptions are incorrect you get a counterexample.

What I want is proofs, not just truth values. A proof is a syntactic object that can be analyzed, partially reused, combined with other proofs, etc.


> A 3000 line program is fairly small.

TLA+ is not a programming language. A 3000 line spec easily specifies a 100KLOC program.

> What I want is proofs, not just truth values. A proof is a syntactic object that can be analyzed, partially reused, combined with other proofs, etc.

Sure, but: 1. If you're wrong, a model checker gives you a counterexample; this is invaluable in understanding what's going on and fixing your bugs. It's much easier to prove something when it's true. 2. Unfortunately, the difference between the cost of a deductive proof and running a model checker is at least one if not two orders of magnitude. It is common industry experience (reported most recently by Amazon about TLA+) that a model checker provides the best bang for the buck. It actually reduces development costs overall. Deductive proofs are cost-prohibitive for all but the most well-funded, patient and rather modestly sized programs. They're worth it mostly if you're willing to pay 10x to go from, say, 97% certainty to 99.9%.


> TLA+ is not a programming language. A 3000 line spec easily specifies a 100KLOC program.

So, which is it?

(0) After you verify a TLA+ specification, you still have the rather large task of making sure your actual program satisfies the specification. In this case, why should I bother verifying something that isn't the final program?

(1) The program can be generated from the specification. In this case, the TLA+ specification is a program, although it's written in an unusually high-level language.

> Deductive proofs are cost-prohibitive for all but the most well-funded, patient and rather modestly sized programs.

It would be a lot more cost-effective if we used languages, libraries and even operating systems geared towards formal verification.


> After you verify a TLA+ specification, you still have the rather large task of making sure your actual program satisfies the specification. In this case, why should I bother verifying something that isn't the final program?

Theoretically, you can verify the program itself in TLA+, in a manner similar to how seL4 verified their code. You compile your final code to TLA+ to produce a low-level spec that captures the actual semantics of your program, and check that it is a refinement of your high-level spec. Research teams have done this for Java and C (http://tla2014.loria.fr/slides/methni.pdf). The problem is that it doesn't scale, and neither does any other form of end-to-end proof. Humanity has never been able to formally prove an actual program that is anywhere near what we would call "large".

The reason you'd bother doing it is that 1. unlike end-to-end verification it is not only affordable, but also reduces costs, and 2. it actually works in uncovering the most costly problems, as verified by years of experience in industry on large projects.

> The program can be generated from the specification.

Ah, that is an interesting research question. Obviously in some cases a program could be generated. For one, the spec would need to be deterministic (or nondeterministic in a way that makes sense for an actual program, like when representing multiple processes), computable (you can easily specify a program that takes an arbitrary program as input and says, "in the first step, let x be equal to TRUE if the input program terminates on the input 13, and FALSE otherwise"), and pretty low-level, because you can easily write things that would be extremely inefficient to run (like "let x be the smallest prime that's the sum of three squares" or something). Lamport believes that it may be possible (for a properly amenable spec) using machine learning. But it's not like compiling any high-level programming language.
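
For instance, a step like the following is one line to specify but gives no hint of an efficient implementation (IsPrime assumed suitably defined):

    Good(n) ≜ IsPrime(n) ∧ ∃ a, b, c ∈ Nat : n = a^2 + b^2 + c^2
    Next    ≜ x' = CHOOSE n ∈ Nat : Good(n) ∧ ∀ m ∈ 0..n-1 : ¬Good(m)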

> It would be a lot more cost-effective if we used languages, libraries and even operating systems geared towards formal verification.

Research doesn't seem to indicate that it would be, or rather, even if it is a lot more cost-effective, it is still too expensive for all but high-assurance software, and high-assurance software already makes use of such languages (SCADE and SPARK work very well). But even in high-assurance, the industry doesn't bother with deductive proofs, as model checkers are automatic and work where they're most needed (SCADE), and SMT solvers are also reported to be effective for very restrictive languages (SPARK, but I've never tried it). Manual deductive proofs are used sparingly, to tie together results from automatic verification processes: http://events.inf.ed.ac.uk/Milner2012/J_Harrison-html5-mp4.h...


> Humanity has never been able to formally prove an actual program that is anywhere near what we would call "large".

And we probably never will. AFAICT, our only hope is to build large software out of small components that can only be composed in obviously correct ways, so that the bulk of the verification effort is spent on each individual component, rather than on their interactions (a hopeless task). This is reflected in my ML programming style: A function exported by a module must not impose any preconditions on its argument that can't be enforced by the module itself. On the other hand, at a higher-level, a functor (parameterized module) may impose nontrivial preconditions on its argument.

> Lamport believes that it may be possible (for a properly amenable spec) using machine learning.

That is extremely unpalatable to me. I'm more than perfectly willing to use a different formalism (like TLA+, although I'd like better integration with actual programming languages), but not to resort to massive trial and error (which is what machine learning ultimately boils down to) to generate programs.

---

Anyway. What I can take from this whole discussion is:

(0) As your original blog post says, there are two ways to look at computation. One view simply pays attention to the physically realizable aspects of computation: nature certainly doesn't care that I want to use bits (or more generally, symbols from a finite alphabet) to represent values from a problem domain, or that this bit means one thing but that bit means another thing. The other view pays attention to how computation can be structured and given useful meanings: the (non-C) programmer in general doesn't care that the same bit pattern can be used to represent different values of different types.

(1) If I may use a physical analogy, writing purely functional programs and using equational reasoning on program terms as the basis for verification are akin to trying to solve every physics problem by appeal to conservation laws. The two main drawbacks of this approach are that not every physical system is conservative (we want effects, for which monads provide a solution) and that it's useless for reasoning about time (for which there is no solution).

(2) On the other hand, the verification style you advocate, by looking directly at state transitions, is akin to solving physics problems by appealing to the laws of motion, suitably expressed as differential equations. The two main drawbacks of this approach are that it may require lengthier computations (which is less of a problem if the computations can be delegated to a computer) and that it doesn't gracefully handle computations whose state is split across several modules, none of which should inspect the state of the others (for which AFAICT there is no solution).

(3) Do we have an equivalent of Lagrangian or Hamiltonian mechanics, to be able to relate both styles?


> AFAICT, our only hope is to build large software out of small components that can only be composed in obviously correct ways, so that the bulk of the verification effort is spent on each individual component, rather than on their interactions (a hopeless task).

But Schnoebelen's theorem proves this is impossible. (in my post: http://blog.paralleluniverse.co/2016/07/23/correctness-and-c...)

The post also has an example of that: the foo/bar program, AKA Goldbach's program. bar is correct (i.e., it correctly decides whether its parameter is a prime -- maybe there's a bug in my implementation, but we can assume it's correct). This leaves foo. The first two lines are trivial. This leaves the next three lines (I guess that in Haskell it would be one line). I don't see how you can get more modular and composable than that! And yet, we haven't been able to prove the behavior of those three lines for 300 years, even though they're nothing but a simple decrementing for-loop, a single subtraction operation and a single disjunction! Why? Because bar (i.e. `prime?`), in spite of being correct, has a very complex behavior; it can be reasoned about in some contexts, but not in this one.

In any event, every attempted project decomposed the problem as much as possible. The task still proved prohibitively expensive (although maybe not for high-assurance development!).

> The other view pays attention to how computation can be structured and given useful meanings

Almost, but not exactly. The values in machines matter greatly to the machine. The language models are human-centric, i.e., they compute with values that matter to humans. But the important point of the post is that the two views are incomparable, as the second requires significantly more work, i.e., expressing a computation in a way that's meaningful to humans is in itself a significant computational effort.

> and that it's useless for reasoning about time (for which there is no solution).

And that it doesn't work on real world systems. You can reason equationally about the gravity equation all you like, and compose as many instances as you like. But that wouldn't help you to decide interesting properties.

> On the other hand, the verification style you advocate, by looking directly at state transitions

I don't have an aesthetic reason to advocate this style. I'm a pragmatist, so I chose it and advocate it because it is the only style that has so far proven to work. That you can pick it up in a week or two instead of six months, and that it supports automated tools that make verification affordable are also great pragmatic advantages. If you show me another one that works as well on real-world systems, I'll be happy to use it.

> and that it doesn't gracefully handle computations whose state is split across several modules, none of which should inspect the state of the others

I'm not sure how you get that. TLA+ handles composition very nicely. There is an operator that specifically hides variables: temporal existential quantification. The formula basically says, "there exists a behavior over these (hidden) state variables, such that those other (observable) variables satisfy X".
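
Schematically (∃∃, written \EE in ASCII, is TLA's temporal existential quantifier):

    Internal ≜ Init ∧ □[Next]_⟨v, h⟩    \* a spec over a visible v and an internal h
    Spec     ≜ ∃∃ h : Internal          \* h is hidden; only v is observable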

> Do we have an equivalent of Lagrangian or Hamiltonian mechanics, to be able to relate both styles?

That's above my pay grade, but TLA can be embedded in FP (in fact, the standard Isabelle distribution contains an embedding of TLA+, as TLA+ uses Isabelle as a backend prover, and I think there are embeddings in Coq, too[1]), and FP can be embedded in TLA by considering extensional equality only. It's mostly a matter of convenience, possibly for different classes of problems.

The thing is that deductive proofs of any kind so far seem untenable in all but extreme circumstances. The problem with approaches that rely on types (and this may be orthogonal to FP, maybe not) to prove interesting logical properties is that they are more rigid, and less able to accommodate more partial forms of verification. Maybe there's work on that, though. As a result, the TLA approach (in a general sense) is simply more battle-tested, and has better real-world results.

[1]: http://ucsd-pl.github.io/veridrone/papers/memocode2015.pdf


> But any proof or any other kind of verification not carried out or verified by a computer is not formal.

Very surprised to hear this from someone who wrote this very article on the nature of computation!


It was an informal statement :) In any event I said deduction rules or a computer, by which I meant some tool that can reduce the rote effort, and that we can be sure is itself correct.



