Maybe I should've said TLA+ w/ model checking since that's definitely lightweight versus full, formal verification. Far as Lean, it's specifically designed for proving stuff about "complex systems."



> Maybe I should've said TLA+ w/ model checking since that's definitely lightweight versus full, formal verification.

Why do you say that? Model checking is the most common form of full formal verification. It is true that most TLA+ specifications -- unlike most software that's verified with model checkers -- have infinite state spaces, and that TLC, the model checker packaged with the TLA+ Toolbox, is an explicit-state model checker with no support for infinite state spaces, so most of the time it is run on a finite-state specification derived from the original and thus doesn't fully verify the spec. But that is, AFAIK, not what is normally meant by "lightweight formal methods" (although I am not sure there is a very precise, well-accepted definition). But people take all sorts of shortcuts when working in Coq, too, and very rarely fully verify every aspect of the program. In fact, the only programs that were ever "fully" verified were 1. very small (much smaller than the vast majority of programs used in industry; I think jQuery is several times bigger than the biggest program ever "fully" verified end-to-end), 2. heavily simplified in terms of the algorithms used, and 3. the work of experts over a long time. So if that's how you'd classify "lightweight", then "full" doesn't exist in any meaningful sense.
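To make the infinite-state point concrete, here is a minimal, purely illustrative TLA+ sketch (the module and names are hypothetical) -- a counter whose state space is infinite, so TLC cannot exhaustively check it as written:

    ---- MODULE Counter ----
    EXTENDS Naturals

    VARIABLE x

    Init == x = 0
    Next == x' = x + 1    \* unbounded increment: infinitely many reachable states

    Spec == Init /\ [][Next]_x

    TypeOK == x \in Nat   \* an invariant one might ask TLC to check
    ====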

> Far as Lean, it's specifically designed for proving stuff about "complex systems."

No, it's a Coq-style, general-purpose proof assistant. In any event, Coq, Lean and Isabelle are all designed for researchers. They are not aimed at industrial use at all. I know that the people behind Lean -- all accomplished experts in formal mathematics -- have made some claims about its suitability for verifying software, but AFAIK, none of them has any practical experience in software verification in industry, and so I find their claims to be entirely hypothetical. It's a beautiful proof assistant, but pragmatic software verification is neither its first nor its second goal. In contrast, TLA+ was designed with the help of actual "ordinary" industry engineers (at Compaq), who were using it for a large, real-world project as it was being developed. That's the main difference between TLA+ and Coq/Lean -- it was designed as a product for engineers, not as a research tool for logicians.
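To illustrate what "general-purpose" means here, a trivial Lean example (Lean 4 syntax; the theorem name is arbitrary) proving an ordinary arithmetic fact, with nothing software-specific about it:

    -- Lean: restating commutativity of addition via the standard library lemma
    theorem my_add_comm (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b

That kind of mathematics is the tool's home turf.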


" But that is, AFAIK, not what is normally meant by "lightweight formal methods" (although I am not sure there is a very precise, well-accepted definition). "

There isn't. The general usage in CompSci is notations that beginners can learn easily, apply with minimal effort, and get results with. The TLA+ model checker knocking out hard-to-test errors in protocols without a long process of formal verification qualifies it. SPIN was popular before it for a lot of similar stuff. Alloy is another one commonly called lightweight in the literature, with lots of resources for learning. Abstract State Machines (ASMs) were another one engineers picked up super easily that caught problems in all sorts of application areas. I think the consensus is that the training time has to be tiny, the method should be applicable without a paid specialist constantly helping, and the method should work on useful software without much extra cost. I say useful rather than practical because some lightweight methods have only been shown to work better on academic examples than full formal verification does, whereas others were used to actually solve practical problems. There's often overlap, though, where the small problems the former handles indicate they might shake problems out of a component in a real project, which sometimes happens.

"Model checking is the most common form of full formal verification. It is true that because most TLA+ specifications -- unlike most software that's verified with model checkers -- has an infinite state space and that TLC, the model checker that's packaged with the TLA+ toolbox is an explicit state exploration model-checker with not support for infinite state spaces, then most of the time it is used on a finite-state derived specification and so doesn't fully verify the spec."

You just contradicted your own point (and mine) there. The model checkers usually only do partial verification due to state explosion. "Full, formal verification" shows that the property holds in all executions, typically using some logical method. The model checkers are therefore not full, formal verification, because they don't cover all the states. They're partial verification that gets you some assurance with minimal effort, versus full, formal verification that aims for maximum assurance at high effort.

The combo of ease of use with finite-state model checking makes that form of TLA+ a lightweight formal method. One could certainly use it for heavyweight stuff with proving techniques. Most won't, though. That's why my generic comments on TLA+ are about model checking. I will still be more specific in the future.

"They are not aimed at industrial use at all. I know that the people behind Lean -- all accomplished experts in formal mathematics -- have made some claims about its suitability for verifying software, but AFAIK, none of them has any practical experience in software verification in industry, and so I find their claims to be entirely hypothetical."

I was saying it was designed for verifying programs/systems, which you said it was not. You contradicted the authors' own claim about its purpose. I corrected that. I don't follow Lean closely enough to have been making any broader claim past what their intention was. Coq and the HOLs are used in industry for high-assurance systems. I guess whether your claim about them is true depends on the definition of "aiming for industrial use." Others used in industry, past or present, include ACL2, Z, and the B method.

I'll be the first to agree that each was designed by experts in such tools, mainly for use by veterans of or experts in such tools. They surely aren't putting a lot of effort into making the tools easy to apply in industry the way we saw with TLA+. That's not their priority. They do get used, though, so they're not useless for that; they just need costly specialists. So they are tools with industrial applications that have delivered excellent results in quality/security, but that are hard and expensive to apply and need specialists. There are a lot of things in industry that work like that, where the average person you pull out of college can't replace the expensive specialist. In those cases, we don't usually pretend that whatever the specialist is doing isn't fit for industry. We just recommend or use it very carefully while minimizing the need for it.

That said, I'm definitely one of the people in favor of getting all that stuff or something similar in shape for better industry adoption. Alternatively or in parallel, attempting to stretch user-friendly methods like TLA+, Alloy, or Design-by-Contract to hand-prove whatever properties the others do. Also, doing one or both until there's an ecosystem of reusable, verified pieces that can be composed for future projects, like the concept of a Verified Software Repository, or how so much is being built on CompCert's Clight. One of the only ones with an industrial focus was SPARK Ada. There's a world of difference between the effort involved for a newcomer to learn and use that versus Coq for a similar program, even if they were going to run the Coq stuff through an automated prover. Like TLA+, SPARK was designed and improved by engineers for other engineers, tested regularly by use in engineering projects. I agree with you that the field needs a lot more of that thinking and investment into delivering it.


> The TLA+ model checker knocking out hard-to-test errors in protocols without a long process of formal verification qualifies it.

But that's the mainstream of formal methods. 99% of formal methods use in industry consists of model checkers or static analyzers. What you call "a long process of formal verification" -- meaning, I assume, manual formal proofs -- is almost nonexistent outside academia. So what you call "lightweight" formal methods is normally just called "formal methods". What you call "full" formal methods is, for the moment at least, a research conjecture.

> The model checkers usually only do partial verification due to state explosion.

That's quite inaccurate. To be called a model checker, a model checker must check a model (in the formal-logic sense) -- i.e., it must exhaustively search through the entire state space (some model checkers can exhaustively check infinite state spaces), as opposed to, say, fuzzers or concolic testing. What happens in practice most of the time with TLA+ is that the model checker exhaustively checks a finite-state specification (or a finite-state instance), which is derived (by the user) from an infinite-state one. Model checkers are not "partial verification". They can be used to do partial verification. But so can proofs.
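Concretely, taking the toy counter sketch from above, the user-supplied derivation is nothing more than a bound plus a TLC model that applies it; within that bound, the check is exhaustive (names are illustrative):

    \* Added to the module: a cutoff for TLC's exploration
    StateBound == x <= 10

    \* Counter.cfg -- the TLC model
    SPECIFICATION Spec
    INVARIANT TypeOK
    CONSTRAINT StateBound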

The theoretical difference between model checkers and proofs is that the latter are based on proof theory, while the former are based on model theory. The challenge for model checkers is trying to automatically cover huge or infinite state spaces; proofs have the same challenge, but they tackle it manually, with the hope that the human prover will be able to assist (this can actually be done with model checkers as well, using refinements in TLA+, but TLC specifically is not that sophisticated). This hope is far from realized. In practice, most of the time, proofs hit a limit much sooner than model checkers, which is why they don't scale as well in practice, and so are barely used (when they are used, it's usually to tie together results from model checkers and/or other automated tools).
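For the record, here is roughly what refinement looks like in TLA+ (a hedged sketch reusing the hypothetical Counter module from above): a low-level spec is shown to implement a high-level one via a substitution of the high-level variables:

    ---- MODULE LowLevel ----
    EXTENDS Naturals, Sequences

    VARIABLE queue                  \* low-level state: a growing log of items

    Init == queue = << >>
    Next == queue' = Append(queue, 1)
    Spec == Init /\ [][Next]_queue

    \* Refinement mapping: Counter's variable x is represented
    \* here by the length of the log.
    Abs == INSTANCE Counter WITH x <- Len(queue)

    \* The claim to verify: every behavior of LowLevel is,
    \* under the mapping, a behavior of Counter.
    THEOREM Spec => Abs!Spec
    ====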

> The combo of ease of use with finite-state model checking makes that form of TLA+ a lightweight formal method.

Again, virtually all formal methods are like that, except those that are mostly used in academia. Manual formal proofs go unused not only because they are prohibitively expensive, but also because they're simply not needed.

> You contradicted the authors' own claim about its purpose.

That's right. Those authors -- as accomplished as they are in their field -- have no experience in industrial software verification. Their claim is mere conjecture. I don't know how someone is supposed to design a tool to serve a certain market if they've never researched that market. I find Lean to be very interesting (I'm playing with it as a hobby), but its authors make some unsubstantiated claims that are well outside their area of expertise. I remember that their introductory paper sometimes says things like "in practice" when it is clear to anyone who has actually done this in practice that, at best, they could say "in theory".

I should, however, qualify that Lean does have some features more oriented towards software verification, like -- most importantly -- integration with SMT solvers (which pretty much amounts to a model checker). But Lean does not differ in that regard from Why3 or Boogie, and I assume that's not the part of Lean you were referring to.

> I'll be the first to agree that each was designed by experts in such tools, mainly for use by veterans of or experts in such tools.

Those are not just veterans and experts, but mostly academics working on research projects. There has not been a single piece of large or even medium-sized software verified end-to-end (let alone "fully") by anyone, ever, experts, veterans or otherwise. What you call "full" formal verification does not exist in any shape or form outside our hopes and dreams.

I think that the confusion between methods that are still at a purely research stage, nowhere near ready for practical adoption, and actual working formal methods does a lot of harm to formal methods. In general, I see a worrying trend of pure researchers, who have done little or no applied research, let alone actual industry work, presenting their pure research in a way that can be mistaken for "nearly ready for primetime"; it does a lot of harm to both research and industry. The field of formal methods in particular suffered a research winter (dried-up funding) precisely because of such over-promises. Again, no one has ever "fully" verified a complex, large or even medium-sized piece of real-world software, and even small, simplified software has virtually never been "fully" verified end-to-end outside academia. It's fascinating research, but it's no more than (pure!) research at this point.

In fact, I know a company that makes a commercial formal verification product (a very sophisticated sound static analyzer) that can also integrate manual proofs in Coq or Isabelle. One of their customers wanted to try writing Coq proofs, and the team -- all formal math academics -- actively tried to dissuade them (I assume they succeeded) from doing that. How? By showing them how it's done.

> That said, I'm definitely one of the people in favor of getting all that stuff or something similar in shape for better industry adoption. Alternatively or in parallel, attempting to stretch user-friendly methods like TLA+, Alloy, or Design-by-Contract to hand-prove whatever properties the others do.

All formal methods designed for practical use are more or less user-friendly (or at least aim to be), or they wouldn't be used by anyone. "All that stuff", as in tools that are actually able to "fully" verify large, complex software, cannot be made user-friendly, as it does not exist at all. I also don't understand what you mean by "stretching" TLA+. I've used TLA+ extensively and have been learning Lean for quite some time, and there is nothing Lean can do when it comes to software verification that TLA+ can't (although there are different relative strengths). In fact, things like refinement or the specification/verification of interactive/concurrent/distributed software -- all things that are of great importance in practice -- cannot be easily achieved in Coq/Lean/Isabelle without the serious work of embedding a logic like TLA in them (as, in fact, has been done when specifying/verifying such software). If anything, I would say that out of the box, TLA+ is more powerful than Isabelle/Coq/Lean for a big portion of real-world software (as most of the software people write today is interactive/concurrent/distributed).
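As a tiny illustration of that last point (a hypothetical toy, not a real-world spec): interleaved processes over shared state are the default model in TLA+, and TLC checks an invariant across all interleavings with no extra machinery:

    ---- MODULE TwoProcs ----
    EXTENDS Naturals

    VARIABLES x, done               \* shared counter; set of finished processes

    Procs == {"p1", "p2"}

    Init == x = 0 /\ done = {}

    Step(p) == /\ p \notin done     \* each process increments x exactly once
               /\ x' = x + 1
               /\ done' = done \union {p}

    Next == \E p \in Procs : Step(p)
    Spec == Init /\ [][Next]_<<x, done>>

    Inv == x <= 2                   \* holds in every interleaving
    ====

(One would run TLC with deadlock checking off here, since both processes terminate.)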

Which is why I don't understand in what sense TLA+ is any "lighter" than Coq or Lean. The difference in power between Coq/Lean and TLA+ has nothing to do with software, and everything to do with researching novel logics and formal mathematics.

I think that you are grossly misestimating what has actually been done in CompCert and other similar projects. Even the main author of CompCert readily acknowledges that model checkers scale better. A closer look at what they did would give you a better picture of the true state of affairs.



