> No, I'm not. I'm talking about math.

Good. In that case, verifying that a program conforms to a spec in the general case is of the same computational complexity no matter when it's carried out. This is proven and well known. For an overview, see my post here: http://blog.paralleluniverse.co/2016/07/23/correctness-and-c...

In short, the reason is that every computational problem with a computable upper bound on complexity can be efficiently reduced to verification. The complexity required to verify is a function of the spec, and can be made arbitrarily hard.

> The amount of mental gymnastics required to produce this statement (not to mention expect someone else to believe it) completely baffles me.

OK, so let me use another analogy. That you can wind up a wind-up toy before I enter the room doesn't mean that, just because you hid the winding from me, the toy requires any less external energy to run than if you'd wound it at any other time. Discussions of ergonomic ways to wind the toy (with a larger lever, perhaps) may be interesting, but they're separate from this basic fact.

Verification is a computational problem, and one of a certain complexity. That complexity must be paid, no matter what, no matter when. The knowledge that a certain program has a certain property has a price tag, and there are no discounts. Because all computational problems efficiently reduce to verification, if you believe you've found a loophole, I could use it to solve much bigger problems than ensuring that your recipe application is correct.

> Nature doesn't do “verification”, because there's no specification in the first place. Nature just evolves a variety of systems, and those that aren't fit enough to survive, well, just die.

You're wrong because, again, you're blind to computational complexity and to computational processes that aren't man-made programs. What you described is precisely how nature computes. That things that aren't fit enough to survive "just die" is a verification process run by a genetic algorithm (one that requires tremendous time and space complexity) to check conformance with the spec for genes -- i.e., life's "software" -- which is: reproduce.

Computational complexity results are used in the study of biological systems. Look at the work by Leslie Valiant (Turing award) or Stuart Kauffman (https://en.wikipedia.org/wiki/Stuart_Kauffman). Incidentally, TLA+ can also be used to model systems such as metabolic pathways, as those are very easily described as nondeterministic state machines: http://www.cosbi.eu/research/publications?pdf=5437

> Also, the evolution of biological systems happens way too slowly to use it as inspiration for how to design computational systems for our own purposes.

Maybe, but TOC doesn't study how to design computational systems for our own purposes. It's pure science. If its discoveries turn out to be useful, that's a happy coincidence.

For that matter, neither has most of TOP for the past couple of decades or so. If you do math to study a system designed for use by a computationally complex collaborator (the programmer, in this case), you must study the collaborator as well. TOP shows no interest in such empirical study, and has therefore been utterly unable to show that the approaches it advocates actually help write very large, complex software systems; indeed, it has been unwilling to test this hypothesis even once.
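The "metabolic pathways as nondeterministic state machines" point can be sketched in miniature: a system state is a set of metabolites, reactions fire nondeterministically, and a question like "can this metabolite ever be produced?" becomes reachability. The reaction network below is entirely made up for illustration.

```python
from collections import deque

# Hypothetical reaction network: each reaction consumes a set of
# metabolites and produces another set. (Names are invented; a real
# pathway model would be far richer.)
REACTIONS = [
    ({"glucose"}, {"g6p"}),
    ({"g6p"}, {"f6p"}),
    ({"f6p"}, {"pyruvate"}),
]

def reachable(initial, target):
    """Breadth-first search over the nondeterministic state machine:
    can the target metabolite ever appear from the initial state?"""
    seen = set()
    queue = deque([frozenset(initial)])
    while queue:
        state = queue.popleft()
        if target in state:
            return True
        if state in seen:
            continue
        seen.add(state)
        for inputs, outputs in REACTIONS:
            if inputs <= state:  # the reaction is enabled in this state
                queue.append(frozenset((state - inputs) | outputs))
    return False

print(reachable({"glucose"}, "pyruvate"))  # True
```

This is exactly the shape of model a checker like TLC explores: an initial state plus a nondeterministic next-state relation.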

> In that case, verifying that a program conforms to spec in the general case is of as much computational complexity no matter when it's carried out.

Then, as Dijkstra said, don't focus on the general case: “So, instead of trying to devise methods for establishing the correctness of arbitrary, given programs, we are now looking for the subclass of "intellectually manageable programs", which can be understood and for which we can justify without excessive amounts of reasoning our belief in their proper operation under all circumstances.” https://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/E...

> The knowledge that a certain program has a certain property has a price-tag, and there are no discounts.

So what? If you're designing programs out of reusable components and you don't throw away their proofs of correctness, then much of the cost has already been paid by someone else before you even start.

> You're wrong because, again, you're blind to computational complexity and to computational processes that aren't man-made programs. What you described is precisely how nature computes. That things that aren't fit enough to survive "just die" is a verification process by a genetic algorithm (that requires tremendous time and space complexity) to conform with the spec for genes -- i.e. life's "software" -- which is: reproduce.

This still doesn't make sense. No biological system is “correct” in any meaningful sense. Correctness is always relative to a specification, and, even if your specification is as vague as “survive”, what's happening in nature is that some life forms are lucky enough to be fit for their current circumstances, not all possible ones.
> Then, as Dijkstra said, don't focus on the general case

Exactly, except that since Dijkstra gave that talk (early seventies) we have learned some important results in computational complexity. The bottom line is that we know it is impossible to create a generally useful programming language (even one that isn't Turing-complete; even a finite-state machine) where every program is relatively cheap to verify. (In 2000 it was proven that language abstractions can't significantly reduce verification cost, even though it would have been reasonable to believe they could; that result is discussed in my linked post.) Whatever language is chosen, only a small subset of programs written in that language will be verifiable. Identifying that subset requires extensive empirical research. This is done in the software verification community; unfortunately, the PL community is reluctant to conduct empirical research.

> much of the cost has already been paid by someone else before you even start.

Not true. See the Schnoebelen results quoted in the post I linked to. Composing proofs of program correctness requires proof effort that is intractable in the complexity of each component, not in the number of components. This was proven in 2005. In technical terms, verification isn't Fixed-Parameter Tractable (FPT), a relatively modern[1] complexity class created to capture problems that are intractable in the general case but may be tractable in practice. In simple terms: even when each of your components has been proven correct, the difficulty of proving the composite's correctness is an intractable function (exponential at least) of the complexity of each component. Theoretically that's better than nothing, but not by enough.

> is that some life forms are lucky enough to be fit for their current circumstances, not all possible ones.

That is only if you view each life form as the computational system. However, the genetic algorithm I was referring to is conducted by all living systems as a single computational system.
> Whatever language is chosen, only a small subset of programs written in that language will be verifiable.

You're still focusing on arbitrary programs. Just don't write arbitrary programs. Let your intended proof tell you what program you need to write.

> Composing proofs of program correctness requires proof effort that is intractable in the complexity of each component, not in the number of components.

So keep each component small.

> However, that genetic algorithm I was referring to is conducted by all living systems as a single computational system.

This is still not verification. It's just testing on a much larger scale.
> Just don't write arbitrary programs. Let your intended proof tell you what program you need to write.

Of course, but that's easier said than done. In other words, we haven't figured out how to do this yet. So far, the cost of proof exceeds what we're willing to pay for what needs proving. OTOH, researchers often focus not on the programs we need to write, but on the properties they can prove (searching under the lamppost, so to speak); this is the best way of getting results. Some of them then try to convince themselves and others that the properties they can prove, of the programs they can prove them on, are really those that need proving, and that the cost is affordable. Sadly, this wish hasn't been shown to be true just yet.

Look, I suggest that you try to write a large-scale formally verified program. Forget large; small and non-trivial will do. And I don't mean "verified" as in it type-checks in Haskell, because the Haskell type system can only prove extremely weak properties, but actually verified with respect to interesting logical properties. There is a reason why no one in the history of computing has ever come close to achieving such a feat. Interestingly, the people who actually do formal verification (you can watch talks by Xavier Leroy, who comes to verification from the PLT direction, or read statements by the people who worked on seL4) are usually the ones most restrained in discussing its possibilities. It can work, depending on what exactly you want to do, but it requires a great investment, and end-to-end verification is possible only on very small programs (like seL4 or CompCert) and only at a prohibitive cost, which makes the process worth it only for a handful of very special projects.

Let me put it another way: so far, practice has shown itself to be very much in line with the difficulties predicted by the theory. Even Schnoebelen, when discussing the result that language abstractions don't help, says that, while it is a worst-case result, the phenomenon "has been observed in many instances, ranging from reachability problems to equivalence problems, and can now be called an empirical fact".

Personally, I do formal verification extensively on large projects (much larger than CompCert or seL4), but I don't even attempt end-to-end or manual deductive proof, because the cost of those becomes very clear very fast once you start trying these things on big systems. These observations have been reported, AFAIK, by every single industry or research unit that carries out significant software verification efforts.

Formal verification can and does work. But in order for it to work, you need to approach it humbly and know the limitations. Also, you need to know where best to spend your efforts, which is a bit of an art. When you do, you discover that maybe you haven't found a way to consistently churn out bug-free programs, but that the quality of your programs has improved considerably.

Just to give you a taste of program proof, here is a simple exercise by Leslie Lamport. Consider N processes numbered from 0 through N-1, in which each process i executes

``````
x[i] := 1;
y[i] := x[(i-1) mod N]
``````

and stops, where each x[i] initially equals 0. (The reads and writes of each x[i] are assumed to be atomic.) This algorithm satisfies the following property: after every process has stopped, y[i] equals 1 for at least one process i. It is easy to see that the algorithm satisfies this property; the last process i to write y[i] must set it to 1… The algorithm satisfies this property because it maintains an inductive invariant. Do you know what that invariant is?

If you can produce a machine-checked proof of this 2-line program, which has no loops or recursion, in under one hour, hats off to you. The difficulty of the problem grows super-linearly with the size of the code. People writing real programs, even those intended for verification, write programs that are several orders of magnitude more complex than this toy example.

> This is still not verification. It's just testing on a much larger scale.

It's verification (by an algorithm that is not common in software). The cost of verification comes from the knowledge you wish to obtain, not from how you obtain it. Computational complexity doesn't care whether you come to know something by proving it using FOL and natural deduction, using dependent types, or by exhaustively trying each input. BTW, as far as the theory is concerned, a test is verification of a more specific property (e.g., the property that your program, when given input x, would, say, emit output y followed by output z).
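Lamport's toy exercise above can at least be checked exhaustively for tiny N; a brute-force Python sketch (an illustration of "exhaustively trying each input", not a proof for general N):

```python
from itertools import permutations

def property_holds(n):
    """Brute-force Lamport's exercise: each process i executes
    x[i] := 1 and then y[i] := x[(i-1) mod n], with atomic steps
    interleaved arbitrarily. Check that some y[i] = 1 at the end
    of every interleaving."""
    steps = [(i, op) for i in range(n) for op in ("w", "r")]
    for order in permutations(steps):
        # A valid interleaving runs each process's write before its read.
        if any(order.index((i, "w")) > order.index((i, "r"))
               for i in range(n)):
            continue
        x, y = [0] * n, [0] * n
        for i, op in order:
            if op == "w":
                x[i] = 1
            else:
                y[i] = x[(i - 1) % n]
        if not any(v == 1 for v in y):
            return False  # found a violating interleaving
    return True

print(all(property_holds(n) for n in (1, 2, 3)))  # True
```

This is exactly the "tremendous time complexity" style of verification: the number of interleavings explodes factorially, which is why one wants the invariant instead.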
> Look, I suggest that you simply try to write a large-scale formally verified program. And I don't mean "verified" as in it type-checks in Haskell, because the Haskell type system can only prove extremely weak properties, but actually verified with respect to interesting logical properties.

This is exactly what I'm doing: a library of data structures and algorithms, where the verification effort is suitably split between the language and me. The language (in this case, Standard ML) ensures that abstract types can only be used abstractly, and I prove (using paper and pen) that the implementations of these abstract types are correct. A primary concern is simplicity, so if a module has too many invariants to prove manually, it is a natural candidate for splitting into smaller modules.

I'm not saying that Standard ML is the ideal language for this kind of thing -- e.g., some limited form of dependent typing would be very helpful (as long as it doesn't compromise type inference) -- but it's much better than anything else I have tried (including Haskell).

> Do you know what that invariant is?

The invariant to maintain is “there exists some distinguished index i such that x[i] is written by process i before it is read by process (i+1) mod N”. If N = 1, then i = 0. If N = N' + 1, consider some valid interleaving of N' processes, with distinguished index i', and then consider the effect of interleaving one more process.

> The cost of verification comes from the knowledge you wish to obtain, not how you obtain it.

In this case, what I wish to ascertain is whether a living organism is fit to survive in a range of circumstances, most of which will never actually happen.

> BTW, as far as the theory is concerned, a test is verification of a more specific property (e.g., the property that your program, when given input x, would, say emit output y followed by output z).

Tests can verify atomic propositions. They can't verify that a predicate holds for all inhabitants of an infinite set.
> a library of data structures and algorithms

That's excellent, but I didn't say "program" for nothing. Programs are usually interactive, with temporal components. But if data structures are your thing, try writing and verifying, say, a concurrent R-Tree.

> The only invariant I can see is that “y[i] has been assigned to” implies “x[i] has been assigned to”. From this, we can see that “all y[i]s have been assigned to” implies “all x[i]s have been assigned to”. But this doesn't involve any induction on N so far.

Your reasoning is OK, but it shows how much you're thinking in terms of the language rather than more abstractly. It's also not enough to prove the property.

The crucial state here is the hidden state in the program counter. The induction is not on the number of processes, but on the steps of the algorithm. Each step executes one line of one process (chosen nondeterministically). The invariant, expressed in TLA+, is:

``````
Inv ≜ ∧ (∀ p ∈ 0..N-1 : Done(p)) ⇒ ∃ p ∈ 0..N-1 : y[p] = 1  \* (1)
      ∧ ∀ p ∈ 0..N-1 : pc[p] ≠ "L1" ⇒ x[p] = 1              \* (2)
``````

(1) is the simple property invariant; (2) is the inductive bit; `pc` models the program counter.

This is instantly verified with a model checker. A machine-checked proof is about 60 lines long. (I try to avoid formal proofs, as they are extremely laborious, but I guess a more experienced machine prover than me could do it more concisely.)

> Tests can verify atomic propositions. They can't verify that a predicate holds for all inhabitants of an infinite set.

Exactly, and each of those atomic propositions has a cost of verification. Moreover, we can prove (again, it's in my post) that in general we can't generalize from one input to another, so the cost of verification over an infinite number of inputs is infinite in the general case. Of course, it can be done in practice, but we can tell exactly how hard that would be: the cost of verification is proportional to the number of states in the algorithm -- the minimal algorithm that satisfies the full spec, not necessarily your actual program. Properties that are easily checked by simple type systems like those of Haskell and SML can always be satisfied by much simpler programs (in fact, by an FSM or, if polymorphic recursion is used, a PDA).
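What makes an invariant inductive can itself be checked mechanically for small N: show it holds in the initial state (initiation) and is preserved by every step taken from any state satisfying it (consecution), enumerating the whole state space. A minimal Python sketch (the state encoding, with pc values "L1"/"L2"/"Done", is my own):

```python
from itertools import product

PCS = ("L1", "L2", "Done")

def successors(n, pc, x, y):
    """One atomic step of one process, chosen nondeterministically."""
    for i in range(n):
        if pc[i] == "L1":                       # x[i] := 1
            yield (pc[:i] + ("L2",) + pc[i+1:],
                   x[:i] + (1,) + x[i+1:], y)
        elif pc[i] == "L2":                     # y[i] := x[(i-1) mod n]
            yield (pc[:i] + ("Done",) + pc[i+1:],
                   x, y[:i] + (x[(i - 1) % n],) + y[i+1:])

def inv(n, pc, x, y):
    done = all(p == "Done" for p in pc)
    c1 = (not done) or any(v == 1 for v in y)               # the property
    c2 = all(x[i] == 1 for i in range(n) if pc[i] != "L1")  # inductive bit
    return c1 and c2

def is_inductive(n):
    # Initiation: the invariant holds in the initial state.
    if not inv(n, ("L1",) * n, (0,) * n, (0,) * n):
        return False
    # Consecution: every step from ANY state satisfying the invariant
    # (reachable or not) lands in a state satisfying it.
    for pc in product(PCS, repeat=n):
        for x in product((0, 1), repeat=n):
            for y in product((0, 1), repeat=n):
                if not inv(n, pc, x, y):
                    continue
                for s in successors(n, pc, x, y):
                    if not inv(n, *s):
                        return False
    return True

print(all(is_inductive(n) for n in (1, 2, 3)))  # True
```

Note that consecution quantifies over all states, not just reachable ones; that is what distinguishes an inductive invariant from a mere invariant.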
> and the former still writes to the common variable before the latter reads it.

This is not true. At no point in your induction is it proven that it is always the last process that writes x before the first reads it, nor is that necessarily the case.

> then at least one of the two following things must happen:

This, too, is untrue. The events may occur in the precise opposite order. If this were true, then it would have been an invariant that either process N-1 or process 0 necessarily writes 1, but this is not the case.

> No analysis of the program counter was required, and as a result we get a far more beautiful proof. You used a sledgehammer, I used a sculptor's chisel.

Excellent, except that my "sledgehammer" is a proof technique that has been proven complete, i.e., if a program property is provable, it is provable via an inductive invariant, which is amenable to mechanical proof checking (and, indeed, I mechanically checked it in TLA+). Good luck getting your proof (which is called "behavioral") -- even if it were correct -- to verify formally.

Chord is a distributed algorithm that was published in 2001 and quickly became popular. It got thousands of citations and plenty of industry implementations. In 2011 it even won a "test of time" award. Its authors said, "Three features that distinguish Chord from many peer-to-peer lookup protocols are its simplicity, provable correctness, and provable performance". They, of course, provided proofs. However, in 2012, Pamela Zave, using a model checker, showed that "of the seven properties claimed invariant of the original version, not one is actually an invariant". The point is that where algorithms are concerned, "proofs" are often not proofs.

> I never claimed that I could delegate the entirety of the formal verification task to a type system like that of Haskell or SML.

Just to be sure: proof with pen and paper isn't formal verification. It is pretty much the very definition of informal verification. Formal proof means using natural deduction or sequent calculus. This is too hard, so a machine-checked proof is acceptable. But a proof, or any other kind of verification, not carried out or verified by a computer is not formal.
> I didn't say anything about “the last process”, though

Of course. Sorry. For some reason my mind read i as N. It was late at night... I'm convinced the proof follows from your invariant, but the inductive step is a bit tricky, as i is not the same for every interleaving. My proof is simpler, as the invariant holds for -- and, more importantly, is maintained by -- every step made by the system, and there's no need to consider various interleavings. It also doesn't require induction on N.

> The formality comes from the use of explicitly given deduction rules, not from the use of a computer.

That's true, and I said as much, but doing so is extremely laborious. Do you actually use natural deduction?
> but the inductive step is a bit tricky, as i is not the same for every interleaving

Ah, the joys of nondeterminism! It gives “there exists” an, ahem, “interesting” meaning. This is why I only do imperative and concurrent programming when it's absolutely necessary (e.g., when no functional and deterministic program could be asymptotically as efficient).

> My proof is simpler, as the invariant holds for -- and more importantly, maintained by -- every step made by the system

My invariant is prettier. It doesn't need to be maintained “throughout the computation”, because it's a static assertion about the program: “regardless of the interleaving of operations, there exists some i such that...” Almost invariably, the program is a much simpler mathematical object than the computations it gives rise to. Dijkstra noted “that our intellectual powers are rather geared to master static relations and that our powers to visualize processes evolving in time are relatively poorly developed.” https://www.cs.utexas.edu/users/EWD/transcriptions/EWD02xx/E...

> It also doesn't require induction on N.

Induction on the program counter is even harder! The inductive case of my proof doesn't even look at the entire computation: it looks at most at three processes at a time. It's literally less work than what you propose.

> Do you actually use natural deduction?

In the design of type systems, yes. If I want to prove something by hand, hell no. Natural deduction is much nicer to prove things about than it is to prove things in.
> You've been told that concurrency and imperative constructs are "dynamic" and are hard to reason about. That's just because FP is bad at reasoning about them, and prefers syntactic decomposition.

The hard fact is that, while you can reason about purely functional programs using little more than high-school algebra, reasoning about imperative programs requires bringing in the heavy apparatus of logic right from the start, because imperative programs denote predicate transformers. Distinguishing a class of computations that can be expressed as functional programs is distinguishing a class of computations whose meaning can be established with simpler tools, and is thus obviously a beneficial activity.

> There is nothing dynamic about a proper, pure description of what to you seems a messy dynamic process. The entire program is just a single, pure semantically unambiguous first-order predicate.

Last time I checked, imperative programs don't denote first-order predicates, but rather predicate transformers. Quite different things. A predicate transformer is a higher-order mathematical object that relates two (possibly first-order) predicates: a precondition and a postcondition.

> You don't reason about each computation, but about what you'd call the "program", which in TLA+ is called the algorithm, as not to unnecessarily confuse with ideas that are related to syntax.

Your use of induction on the program counter reveals that you're just using the program as a proxy for a class of computations.

> The algorithm is just a boolean predicate,

Um, no. Predicates don't express computation.

> and my proof is that that predicate implies another (which implies the property).

Would you have been able to produce your proof entirely by yourself, that is, unaided by a computer? I produced my proof using nothing other than my brain.
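The "imperative programs denote predicate transformers" point can be made concrete with a minimal sketch (the mini-language and all names are invented for the example): predicates are Boolean functions on states, and each statement maps a postcondition back to its weakest precondition, wp(x := e, Q) = Q[e/x], with sequencing composing right to left.

```python
# Predicates: functions from a state (a dict of variables) to bool.
# Statements: predicate transformers, mapping postcondition -> precondition.

def assign(var, expr):
    """var := expr, as a predicate transformer: wp(var := e, Q) = Q[e/var],
    i.e. Q evaluated in the updated state."""
    return lambda post: (lambda s: post({**s, var: expr(s)}))

def seq(*stmts):
    """S1; S2; ...  --  wp(S1; S2, Q) = wp(S1, wp(S2, Q))."""
    def transformer(post):
        for stmt in reversed(stmts):
            post = stmt(post)
        return post
    return transformer

# Program: x := x + 1; y := 2 * x.   Postcondition: y = 2x and x > 0.
prog = seq(assign("x", lambda s: s["x"] + 1),
           assign("y", lambda s: 2 * s["x"]))
post = lambda s: s["y"] == 2 * s["x"] and s["x"] > 0
pre = prog(post)  # the weakest precondition, itself a predicate

print(pre({"x": 0, "y": 99}))   # True: from x = 0 the program establishes post
print(pre({"x": -5, "y": 0}))   # False: afterwards x = -4, so x > 0 fails
```

Note that even this toy already lives one order higher than the predicates themselves: `assign` takes a predicate and returns a predicate, which is exactly the "heavy apparatus" being pointed at.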
> TLA+ is ... as powerful as Coq

This is an astonishing claim. Can you formulate measure theory in TLA+?
1. Why would that be astonishing? The "+" in TLA+ is FOL + ZFC, the same foundation used for virtually all of math (except some branches of logic).

2. I am not a logician, so I don't know the formal particulars of measure theory, of any limitation ZFC has in formulating it, or of any TLA+-specific limitations in that regard. However, TLA+ is not a general proof assistant (it can certainly be used to prove general math theorems; it just wasn't designed for that task), but a language for specifying and reasoning about algorithms and large software systems. In that, it is as powerful as Coq.

3. There are completeness proofs for TLA, i.e., anything that you can state about a computation you can state in TLA, and anything you can prove about a computation you can prove in TLA. Now, this isn't quite accurate: TLA is a linear-time temporal logic, and there are some things that you can state and prove in a branching-time temporal logic that you can't in a linear one. Nevertheless, as Moshe Vardi argues in Branching vs. Linear Time: Final Showdown[1], it makes more sense to prefer linear-time logic, because empirical evidence suggests that it's far more useful. There is one glaring downside: TLA is unable to reason about probabilistic algorithms[2]. However, it is possible to embed reasoning about computation in the + part of TLA+ (i.e., ZFC) and reason about probabilistic algorithms that way, and I assume that this is how it would be done in Coq, too. So you could do everything in TLA+ without the TLA bit, but that would be inconvenient, as the natural representation of algorithms in TLA is especially convenient and the main justification for the approach. I would be very interested in seeing an extension of TLA meant to deal with probabilistic algorithms.

4. In the end, what matters isn't theoretical strength (by which Coq and TLA+ are equivalent in anything that concerns algorithms), but convenience. I am sure that there are some problems that are easier to formulate and reason about in Coq than in TLA+. But the fact that no one has ever used Coq for a large software system, and that the handful of use cases in industry were always done in cooperation with academic experts, while TLA+ is used in very large projects by "plain" developers, speaks volumes about the strength of the approach.

[2]: You can describe any probabilistic algorithm in TLA (i.e., state that an algorithm performs a certain step with a certain probability). What you can't do is then state a probabilistic theorem about that algorithm, such as "in 70% of the cases, the algorithm gives the right answer".
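To make the footnote concrete: here is the kind of algorithm whose correctness claim is inherently probabilistic. Freivalds' classic randomized check of a matrix product A·B = C has the guarantee "a wrong C escapes detection with probability at most 2^-k after k trials", which is exactly the sort of theorem a non-probabilistic temporal logic cannot state (the matrices below are toy values for illustration).

```python
import random

def freivalds(A, B, C, trials=10):
    """Freivalds' randomized verification of A*B == C: pick a random
    0/1 vector r and test A(Br) == Cr. Each trial takes O(n^2) time,
    and a wrong C passes all trials with probability <= 2**-trials."""
    n = len(A)
    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False  # certainly wrong
    return True  # probably right

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]   # the true product A*B
C_bad  = [[19, 22], [43, 51]]   # one entry off

print(freivalds(A, B, C_good))  # True: a correct product always passes
```

The TLA description of this algorithm would replace the random choice of r with nondeterminism, faithfully capturing what the algorithm does while losing the "with probability 2^-k" part of its correctness theorem.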
Because Coq was designed (as I understand it) to help people prove theorems about mathematics, not about algorithms. If TLA+ -- a language designed to prove theorems about algorithms, not about mathematics -- is actually better for that purpose, then a lot of people have wasted a lot of time.

Sure, algorithms are mathematical, and much of mathematics can be recast as algorithmic, but they still have quite a different flavour. If a tool designed to help with one is actually the best tool to help with the other, then that is astonishing.
> is actually better for that purpose then a lot of people have wasted a lot of time.

I don't think it's better, but it is as mathematically powerful, certainly within the realm of ordinary math. As it is not designed for this task, I would assume that Coq would be more convenient.

Coq is a full decade older than TLA+ (and Lamport mentions Coq, when he discusses the design of TLA+, as something to avoid because of its arcane, virtually unknown math; even the TLA logic is 5 years younger than Coq), and even Coq isn't the first theorem prover, nor the most popular. It wasn't even designed to be the "best" math prover, but (I would imagine) to be a tool for type theorists to see how well their ideas work. After all, Isabelle (which preceded Coq and serves as a TLAPS backend) is very good at "general" logic (not type theory).

BTW, TLA+'s proof language looks very different from Coq's, and is declarative in nature. It is based on Lamport's "How to Write a Proof", and, as I understand it, Isabelle's Isar is also inspired by that style[1], so TLAPS looks a lot more like Isar.

Also, remember that machine-checked formal proofs of mathematical theorems are currently used mostly by the people who design those provers, and by hobbyists. Few mathematicians are keen on the approach (although I think they should be). Proof of software correctness, OTOH, is something a larger group is interested in.

But back to my main claim: TLA+ lets you express specifications about a program that are as rich as anything Coq would let you express. It was designed by people who are experts in software specification rather than by type theorists. I would guess that Coq may be better for some things (maybe proving general theorems about some classes of programs), but when it comes to actually reasoning about large software in industry, TLA+ and its older kin (Z, B-method and more) are almost the only game in town. Isabelle is also ahead of Coq in verifying real-world systems.
> It is immediately obvious that TLA+ is simpler than SML, yet as powerful as Coq

What facilities for data abstraction does TLA+ have? How do I hide the representational choices of a module whose internal implementation I'm not currently concerned with?

> it is TLA+ that requires little more than high school math. You can reason with it about any kind of algorithm you like, using the same simple, familiar math (FOL, ZF

First-order logic is not “high-school math”. You can't expect most high schoolers to know how to correctly manipulate logical quantifiers. (Of course, a competent programmer should know, but that doesn't mean one should force themselves to use a more complicated tool when a simpler one is available.) And axiomatic set theories (such as ZF), which are built on top of first-order logic, are definitely not “high-school math”. Have you ever seen the crazy things that Harvey Friedman does with large cardinals?

> plus three temporal operators, of which only one is really used for safety properties).

That's the thing: I want to reason in a time-independent manner as much as possible. Time-invariant reasoning is simpler and mathematically more beautiful than exploring state spaces alongside your processes.

> In TLA+ there is no such thing as an imperative or a functional program. (...) E.g., in TLA+, all algorithms (...) denote a set of behaviors, where each behavior is an infinite sequence of states.

Then in TLA+ all algorithms are expressed as imperative programs.

> Think of algorithms as abstract state machines, and reasoning becomes -- if not simple -- then as simple as possible (today, at least).

By virtue of exploring large state spaces?

> But the proof works on the representation of the program, which is just a logical predicate.

Which is it?

(0) Two programs that compute the same result, but have different efficiency characteristics, are represented as logically (not just syntactically) different predicates.

(1) You don't distinguish between programs that compute the same result, but have different efficiency characteristics.

The former is unusable as a basis for reasoning about extensional equality. The latter is unusable as a basis for reasoning about complexity.

> Yes. First I thought of it, then I wrote it, then I added details until it checked.

Checked by whom?

> But let me tell you: if a proof assistant would ever make proofs easier than doing them in your head, then I'd be very happy. Usually they make things harder because they need lots of detail.

I don't disagree with this one.

> But, unfortunately, proof assistants don't help you uncover a proof. Model checkers can do that sometimes.

Type inference reconstructs type derivations all of the time. Reliably. Of course, those type derivations are all boring routine, but it means that human programmers themselves aren't burdened with them.
> What facilities for data abstraction does TLA+ have?

The full strengths of set theory (sets, functions), plus sugar for sequences and records.

> How do I hide the representational choices of a module I'm not currently concerned with?

TLA+ has a simple module system. Definitions can be local or public.

> First-order logic is not “high-school math”. You can't expect most high schoolers to know how to correctly manipulate logical quantifiers.

Depends on the country, I guess. But OK, first-year college.

> And axiomatic set theories (such as ZF), which are built on top of first-order logic, are definitely not “high-school math”.

None of that has any bearing on specifying algorithms, though.

> That's the thing: I want to reason in a time-independent manner as much as possible. Time-invariant reasoning is simpler and mathematically more beautiful than exploring state spaces alongside with your processes.

That's only because you're used to thinking in a way that is tied to languages. I have yet to encounter a powerful type system as mathematically elegant as TLA+.

> Then in TLA+ all algorithms are expressed as imperative programs.

No. They're expressed as nondeterministic abstract state machines, written as first-order logic predicates. Each step of the machine can be arbitrarily complex. You can specify a program in TLA+ that in each step solves a difficult SAT problem, or even the halting problem.

"Imperative" is an attribute of a programming language with programming-language semantics; TLA+ is ordinary math and doesn't have any of those attributes. It's like saying that calculus is imperative. However, it certainly isn't functional, because it doesn't identify the concept of a function with the concept of a computation.

I guess you could specify a functional program as a machine that computes its result in a single state (although I wouldn't). You could then model, I don't know, LLVM machine code, and show that the machine-code algorithm is a refinement of the functional program.

> By virtue of exploring large state spaces?

I don't understand the question. The reasoning -- as with any other formal reasoning -- is done by manipulating logical formulas. The state space is a property of the model, not of the logic.

> Checked by whom?

TLAPS, the TLA+ proof system.

> Type inference reconstructs type derivations all of the time.

It's done reliably because the properties they represent are kept very computationally simple. TLAPS uses SMT solvers to reduce the effort, and where a type would be inferred, so would a proposition. The problem is with more interesting propositions.
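The refinement claim can be made concrete for finite-state machines: under an abstraction function, every reachable concrete step must correspond to an abstract step or to a stutter. A minimal explicit-state sketch in Python (the `refines` helper and the toy counter/parity machines are invented here for illustration; this is not how TLA+ tools check refinement):

```python
def refines(c_init, c_next, a_next, abstraction):
    """Explicit-state check that a finite concrete machine refines an
    abstract one: every reachable concrete transition s -> t must map,
    under the abstraction function, to an abstract transition or to a
    stutter (the abstract state is unchanged)."""
    seen, stack = {c_init}, [c_init]
    while stack:
        s = stack.pop()
        for t in c_next(s):
            a, b = abstraction(s), abstraction(t)
            if b != a and b not in a_next(a):
                return False  # a concrete step with no abstract counterpart
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return True

# Toy example: a mod-4 counter refines a parity machine under "mod 2".
assert refines(0,
               lambda s: {(s + 1) % 4},   # concrete: s -> s+1 (mod 4)
               lambda a: {1 - a},         # abstract: parity flips each step
               lambda s: s % 2)           # abstraction function
```

Allowing stutters is the key design choice: it lets a slow, detailed machine refine a fast, abstract one.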
> The full strengths of set theory (sets, functions), plus sugar for sequences and records.

Set theory doesn't have any facilities for data abstraction. If anything, it's the antithesis of data abstraction: every mathematical object (internal to your set theory of choice) is represented as a set (or urelement, but those are even weirder) somehow, even if the encoding is irrelevant for your purposes.

> TLA+ has a simple module system.

Hierarchical modules? Parameterized module families? Multiple views of the same module, showing or hiding different aspects of it? I use this stuff every day not only to keep my programs modular, but also to keep their proofs of correctness modular.

>> And axiomatic set theories (such as ZF), which are built on top of first-order logic, are definitely not “high-school math”.

> None of that has any bearing on specifying algorithms, though.

You were the one who originally brought up ZF, describing it as “high-school math”, so I was just refuting that.

> That's only because you're used to thinking in a way that is tied to languages. I have yet to encounter a powerful type system as mathematically elegant as TLA+.

If elegance is your only criterion, it's very hard to beat Martin-Löf type theory. Of course, usability matters too, and there MLTT fails miserably, at least for programming purposes. Higher-order dependent types don't come for free.

> It's done reliably because the properties they represent are kept very computationally simple.

That's a feature. It helps you delimit the routine stuff (which can be reliably verified mechanically, and should be most of your program, if it's designed sensibly) from what requires creativity to prove.

That being said, of course, existing type systems don't automatically verify everything that could be considered “routine”, so this is an area where further improvement is necessary.

> The problem is with more interesting propositions.

Prove them by hand. Good old-fashioned brain usage is never going away.

---

By the way, I'd like to see your full proof, for the exercise you posted a while back. (“Prove that there exists some i such that y[i] = 1 at the end of the whole computation.”) Is it as short and sweet as my proof?
> every mathematical object (internal to your set theory of choice) is represented as a set (or urelement, but those are even weirder) somehow, even if the encoding is irrelevant for your purposes.

Yes, but it's completely opaque to a spec. There's no way of obtaining the representation of, say, a function. But let's not get into the debate over set theory's pros and cons, because it really makes no difference.

> Hierarchical modules?

Yes.

> Parameterized module families?

I don't know what you mean, but you can instantiate the same module multiple times with different parameters.

> Multiple views of the same module, showing or hiding different aspects of it?

Not that I know of, but as modules are instantiated, you can instantiate them and import, and then export. I can't see this being too useful, though. TLA+ specs are usually small. A 3000-line spec (over multiple modules) is considered very big.

> You were the one who originally brought up ZF, describing it as “high-school math”, so I was just refuting that.

Those parts of ZF that are used in practice when writing a spec are high-school -- sorry, first-year college -- math. Come on. That people write PhDs on algorithms they implement in Python doesn't mean that Python isn't a high-school-appropriate language.

> If elegance is your only criterion, it's very hard to beat Martin-Löf type theory.

Actually, it's extremely easy, especially where algorithms are concerned. In TLA, deterministic, nondeterministic, parallel, concurrent, neural and quantum algorithms are all simple special cases, and even better: properties of algorithms and algorithms are the same object. Think of it as type theory with nothing but types, only using ordinary familiar math. That in any type theory a type is a different object from its inhabitants (even if types and programs can use the same terms) already makes it much more complicated than TLA+.

> Of course, usability matters too, and there MLTT fails miserably, at least for programming purposes. Higher-order dependent types don't come for free.

Right. Amazon, Intel, Oracle and Microsoft engineers use TLA+ all the time to specify and verify very large distributed systems.

> Prove them by hand. Good old-fashioned brain usage is never going away.

Sure. Except when you have a transactional distributed database algorithm to verify, that's kind of hard. But nobody has the time for deductive proof anyway. TLA+ comes with a model checker. You just press a button and if your assumptions are incorrect you get a counterexample.

> By the way, I'd like to see your full proof, for the exercise you posted a while back. (“Prove that there exists some i such that y[i] = 1 at the end of the whole computation.”) Is it as short and sweet as my proof?

Shorter and sweeter :)

Informally, it goes as follows. This is our partial correctness property:

    PartialCorrectness ≜ AllDone ⇒ ∃ p ∈ ProcSet : y[p] = 1

And this is the inductive invariant:

    Inv ≜ PartialCorrectness ∧ ∀ p ∈ ProcSet : pc[p] ≠ "Line1" ⇒ x[p] = 1

We need to show that Inv implies PartialCorrectness (trivial), that Inv holds in the initial state, and that if it holds in any state s, then it holds in any possible consecutive state s'. It's easy to see that it holds in the initial state. Now, let's assume it holds in s, and prove it for s'. To make this transition, some process p either executes line 1 or executes line 2. If it executes line 1, then PartialCorrectness doesn't change because no new process is done, and the second conjunct holds because we've just left line 1 and x has been assigned (by the definition of line 1). If we are currently in line 2, the second conjunct of the invariant doesn't change. By the definition of this action, we'll be done. Here we have two cases: either we set y to 1, or we set y to 0. If we set y to 1, we're done and PartialCorrectness holds. If we set y to 0, then by the assumption of the invariant, the process we depend on must not be done, hence AllDone is false, and PartialCorrectness holds. QED.

Now, the machine-checked formal proof no longer looks so short and sweet because, well, it's completely formal and verified by a checker, so there's a lot of detail that's absent from an informal proof (I'd like to see you formally explain to a proof assistant "respecting the old interleaving", or even your inductive step). TLA+ proofs are written in Lamport's structured "modern" proof style: http://research.microsoft.com/en-us/um/people/lamport/pubs/p...

Also, note that I wanted to show this to someone else, and for pedagogical reasons I wrote the program in an imperative language called PlusCal that is then compiled to TLA+. In TLA+, the "program" is the formulas Init and Next. They are combined to form the temporal formula Spec. A non-modal subformula means that the condition holds in the first state of the behavior. A square operator ("always") means that the subformula holds in all suffixes of the behavior (and a diamond means that it eventually holds). The square brackets with the "vars" subscript have to do with stuttering, which is important for composition and refinement, but I won't explain it here.

Here is the spec with the formal proof: https://www.dropbox.com/s/7i239785jngat2b/Foo.pdf?dl=0
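Assuming the exercise is Lamport's well-known teaching example -- each of N processes runs x[i] := 1 (line 1) and then y[i] := x[(i-1) mod N] (line 2) -- the property can also be sanity-checked by brute force over every interleaving for a small N. A sketch (the program shape is my assumption; only the quoted property is from the thread):

```python
from itertools import permutations

N = 3

def run(schedule):
    """Execute one interleaving. Each process id appears twice in the
    schedule: its first occurrence runs line 1, its second runs line 2."""
    x, y, pc = [0] * N, [None] * N, [1] * N
    for p in schedule:
        if pc[p] == 1:
            x[p] = 1                 # line 1: x[p] := 1
            pc[p] = 2
        else:
            y[p] = x[(p - 1) % N]    # line 2: y[p] := x[(p-1) mod N]
            pc[p] = 3
    return y

# Every interleaving is a permutation of the multiset {0,0,1,1,2,2}.
schedules = set(permutations([p for p in range(N) for _ in range(2)]))
assert all(1 in run(s) for s in schedules)   # some y[i] = 1, always
```

This explores the same state space the model checker would, just far less efficiently, and of course only for one fixed N rather than for all N as the proof does.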
>> Multiple views of the same module, showing or hiding different aspects of it?

> Not that I know of, but as modules are instantiated, you can instantiate them and import, and then export. I can't see this being too useful, though. TLA+ specs are usually small.

So it doesn't scale to entire programs?

> A 3000 line spec (over multiple modules) is considered very big.

A 3000-line program is fairly small.

> Come on. That people write PhDs on algorithms they implement in Python

Presumably they don't actually prove things about Python programs directly. If they do, what formal semantics for Python are they using?

> doesn't mean that Python isn't a high-school-appropriate language.

Formally reasoning about Python programs is a huge pain. Why would anyone want to subject high-schoolers to that?

> Informally, it goes as follows. (proof)

Okay, it was much shorter and sweeter than I expected. Well done.

> TLA+ comes with a model checker. You just press a button and if your assumptions are incorrect you get a counterexample.

What I want is proofs, not just truth values. A proof is a syntactic object that can be analyzed, partially reused, combined with other proofs, etc.
> A 3000 line program is fairly small.

TLA+ is not a programming language. A 3000-line spec easily specifies a 100KLOC program.

> What I want is proofs, not just truth values. A proof is a syntactic object that can be analyzed, partially reused, combined with other proofs, etc.

Sure, but:

1. If you're wrong, a model checker gives you a counterexample; this is invaluable in understanding what's going on and fixing your bugs. It's much easier to prove something when it's true.

2. Unfortunately, the difference between the cost of a deductive proof and running a model checker is at least one if not two orders of magnitude. It is common industry experience (reported most recently by Amazon about TLA+) that a model checker provides the best bang for the buck. It actually reduces development costs overall. Deductive proofs are cost-prohibitive for all but the most well-funded, patient and rather modestly sized programs. They're worth it mostly if you're willing to pay 10x to go from, say, 97% certainty to 99.9%.
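The "press a button, get a counterexample" workflow is easy to picture with a toy explicit-state checker: breadth-first search of the reachable states, returning a shortest trace to any state that violates the invariant. A sketch (the `model_check` helper and the counter spec are made up for illustration; TLC, the real TLA+ model checker, is far more sophisticated):

```python
from collections import deque

def model_check(init, next_states, invariant):
    """BFS over the reachable state space. Returns a shortest trace
    (a counterexample) to an invariant violation, or None if the
    invariant holds in every reachable state."""
    parent = {init: None}
    queue = deque([init])
    while queue:
        s = queue.popleft()
        if not invariant(s):
            trace = []              # reconstruct the path back to init
            while s is not None:
                trace.append(s)
                s = parent[s]
            return trace[::-1]
        for t in next_states(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None

# Toy spec: a counter meant to stay below 4, whose next-state relation
# nevertheless lets it reach 4 -- the checker hands back the bad trace.
print(model_check(0, lambda s: {s + 1} if s < 5 else set(), lambda s: s < 4))
# -> [0, 1, 2, 3, 4]
```

Because the search is breadth-first, the counterexample trace is as short as possible, which is what makes it useful for understanding the bug.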
> TLA+ is not a programming language. A 3000 line spec easily specifies a 100KLOC program.

So, which is it?

(0) After you verify a TLA+ specification, you still have the rather large task of making sure your actual program satisfies the specification. In this case, why should I bother verifying something that isn't the final program?

(1) The program can be generated from the specification. In this case, the TLA+ specification is a program, although it's written in an unusually high-level language.

> Deductive proofs are cost-prohibitive for all but the most well-funded, patient and rather modestly sized programs.

It would be a lot more cost-effective if we used languages, libraries and even operating systems geared towards formal verification.
> After you verify a TLA+ specification, you still have the rather large task of making sure your actual program satisfies the specification. In this case, why should I bother verifying something that isn't the final program?

Theoretically, you can verify the program itself in TLA+, in a manner similar to how seL4 verified their code. You compile your final code to TLA+ to produce a low-level spec that captures the actual semantics of your program, and check that it is a refinement of your high-level spec. Research teams have done this for Java and C (http://tla2014.loria.fr/slides/methni.pdf). The problem is that it doesn't scale, and neither does any other form of end-to-end proof. Humanity has never been able to formally prove an actual program that is anywhere near what we would call "large".

The reason you'd bother doing it is that 1. unlike end-to-end verification, it is not only affordable but actually reduces costs, and 2. it works in uncovering the most costly problems, as verified by years of experience in industry on large projects.

> The program can be generated from the specification.

Ah, that is an interesting research question. Obviously, in some cases a program could be generated. The spec would need to be deterministic, for one (or nondeterministic in a way that makes sense for an actual program, like when representing multiple processes), computable (you can easily specify a program that takes an arbitrary program as input and says, "in the first step, let x be equal to TRUE if the input program terminates on the input 13, and FALSE otherwise"), and pretty low-level, because you can easily write things that would be extremely inefficient to run (like "let x be the smallest prime that's the sum of three squares" or something). Lamport believes that it may be possible (for a properly amenable spec) using machine learning. But it's not like compiling any high-level programming language.

> It would be a lot more cost-effective if we used languages, libraries and even operating systems geared towards formal verification.

Research doesn't seem to indicate that it would be. Or rather, even if it is a lot more cost-effective, it is still beyond affordable for all but high-assurance software, and high-assurance software already makes use of such languages (SCADE and SPARK work very well). But even in high-assurance, the industry doesn't bother with deductive proofs, as model checkers are automatic and work where they're most needed (SCADE), and SMT solvers are also reported to be effective for very restrictive languages (SPARK, but I've never tried it). Manual deductive proofs are used sparingly, to tie together results from automatic verification processes: http://events.inf.ed.ac.uk/Milner2012/J_Harrison-html5-mp4.h...
> But any proof or any other kind of verification not carried out or verified by a computer is not formal.

Very surprised to hear this from someone who wrote this very article on the nature of computation!
It was an informal statement :) In any event, I said "deduction rules or a computer", by which I meant some tool that can reduce the rote effort, and that we can be sure is itself correct.
There is still one glaring issue with your thesis that remains unresolved.

Verification is hard. We all agree. But that is for the worst case. Alternatively, we can say that it is hard to verify arbitrary programs. I am utterly unconvinced that it is hard to verify programs that are designed to be verified, and I am utterly unconvinced that it is hard to design programs to be verified.

How should I convince myself that it is hard either (1) to verify even programs that are specifically designed to be verified, or (2) to write such programs?

(I don't mean hard in practice, I mean in theory.)
> How should I convince myself that it is hard either (1) to verify even programs that are specifically designed to be verified, or (2) to write such programs?

Didn't we discuss that issue on Reddit? Anyway, the complexity of verification is proportional to the number of states. If you write the program from scratch, the number of states is dictated by your spec (as with every computational problem, every function has an inherent computational complexity). Your spec can be arbitrarily hard.

Now, it's not a matter of worst case. That writing and verifying arbitrary programs are the very same problem is easy to see: given a program A and a property I wish to verify, P, I create a spec S = A ∧ P, i.e., the program you write must behave like this arbitrary program and satisfy P. So there is no difference between writing from scratch and verifying. Indeed, you can see that static analysis tools can often prove on arbitrary programs the very same properties that are enforced by type systems.

Of course, not every algorithm and property is worst-case, regardless of whether the program is being written or given. There are useful classes of programs/properties that are easier to verify, and the language can make useful restrictions. So far, the examples that work well in practice are SPARK and SCADE. Each forbids memory allocation, including unrestricted stack growth.
> That writing and verifying arbitrary programs is the very same problem is easy to see: given a program A, and a property I wish to verify P, I then create a spec S = A ∧ P, i.e., the program you write must behave like this arbitrary program and satisfy P. So there is no difference between writing from scratch or verifying.

I don't follow. What has A got to do with it? I am given a spec P. You claim that it's hard to write a program that satisfies P, pretty much regardless of what P is. I don't see how A is relevant at all.
Yes, but P can be arbitrarily hard to implement. This is hard for you to see, so I'm incorporating A into P just as an example of how we can make the spec hard. If it were easy to write programs that conform to arbitrary specifications, it would be just as easy to verify arbitrary programs, because I can reduce the latter to the former by making the arbitrary program the spec of the new one.

If you want another example of how a spec can be hard, consider this one: "write a program that, for every input n, an even integer greater than two, outputs the smaller of two primes whose sum is n". Do you think writing that program and proving it correct is any easier than verifying the 10-liner in my post?

Or even this: "write a program that sorts a list of integers". You know for a fact that that program has a minimal time complexity of O(n lg n). That complexity, in turn, partly determines how hard it will be to prove the program correct. You know that it will be harder than for "write a program that returns the first element of its input list", whose time complexity is O(1). (Actually, the difficulty of verification is more closely related to the space complexity, because the space complexity is a better approximation of the number of states the algorithm must have in order to solve the problem, unless the algorithm can be a pushdown automaton.)
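That Goldbach-flavored spec is a nice illustration of the gap between writing a program and verifying it: the obvious implementation is a few lines, but proving that its loop terminates on every valid input is precisely Goldbach's conjecture. A sketch (the function names are mine):

```python
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def smaller_goldbach_prime(n):
    """For even n > 2, return the smaller of two primes whose sum is n.
    Partial correctness is easy to show; total correctness (the loop
    terminates for every such n) is equivalent to Goldbach's conjecture."""
    assert n > 2 and n % 2 == 0
    p = 2
    while not (is_prime(p) and is_prime(n - p)):
        p += 1
    return p  # the first hit satisfies p <= n - p, so it is the smaller

print(smaller_goldbach_prime(28))  # -> 5  (5 + 23 = 28)
```

The code is trivial to write; the spec's difficulty lives entirely in the proof obligation, which is the point being made above.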
 Sure, I know that there are some specifications P that are hard to implement (and some that are impossible!). What's not clear to me is whether the specifications we care about implementing in practice are hard to implement.
 In practice it's obvious that they're hard! Look at the bugs people make and look at the effort required for seL4/CompCert (both are fairly simple systems).
 Ah, but that's not my question! My question is whether the specifications we are interested in implementing in practice are hard in theory.
That's a good question, and I don't think anyone has a definitive answer to that. I certainly don't. I think we should do empirical research and try to find out. What would be your guess, and why?

I do have some ideas, though. First, I think that the effort required for end-to-end proofs -- even for heavily simplified, relatively small programs -- does indicate that the problems are inherently hard, because I don't think we're lacking some secret proof technique that would all of a sudden make things significantly easier. The difference between cases that are extremely hard (Goldbach's conjecture) and real-world code doesn't seem big.

Second, I think that empirical research might uncover something that can have a bigger impact. End-to-end proof is not a very important practical problem, because very few software projects ever need it or anything close to it. Improving correctness or reducing development costs (two sides of the same coin) for non-high-assurance programs is much more important, and there the theory leaves much more wiggle room (because once you step away from actual watertight proof, basically anything is possible). I believe that empirical research can uncover certain common program patterns and bugs that can be analyzed/prevented, and that those may give a significant boost. But without empirical research, we won't even know which patterns/bugs to tackle.

I think that the emphasis on deductive proofs is misplaced (in light of theory and practice) as it's hard to fine-tune[1], and indeed, the software verification community has been moving further and further away from that direction for a long while; since the seventies, I believe. It's the PL community that's decided this is interesting to them. Software verification is interested in many other approaches, like static analysis (which can verify some properties -- hopefully important ones -- some of the time), probabilistic methods, combinations of formal methods and testing (like so-called "concolic" testing), etc.

------

[1]: I'm referring to actual proofs with dependent types. Simpler type systems can be invaluable, if researchers only tried to figure out which properties are actually important. For example, Rust's borrow checker is extremely interesting, and tackles a class of bugs that's known to be both very harmful and easy to prevent mechanically. We should be looking for more of those. An example of what not to do if you want real-world impact is the peculiar focus on effect systems. Other than shared state (which is a whole other matter), how did some PL researchers get it into their minds that effects are an important problem to tackle?
> That's a good question and I don't think anyone has a definitive answer to that. I certainly don't. I think we should do empirical research and try to find out. What would be your guess and why?

I have absolutely no idea. The entire focus of my software development philosophy for a decade or so has been to make programs easier for humans to understand, not easier to write proofs about (although there's some overlap). I do wonder how big the weak but extremely useful class of properties (like Rust's memory safety) is.
This, I think, is a beautiful example of what empirical research can uncover: https://www.usenix.org/system/files/conference/osdi14/osdi14...

> almost all (92%) of the catastrophic system failures are the result of incorrect handling of non-fatal errors explicitly signaled in software.

The exception-handling code is there, it's just that no thought has been put into it. They say that their simple linting tool could have detected and prevented a good portion of those very costly bugs.
