
An empirical study on the correctness of formally verified distributed systems - mpweiher
https://blog.acolyer.org/2017/05/29/an-empirical-study-on-the-correctness-of-formally-verified-distributed-systems/
======
agentultra
One has to be careful and read the paper and understand the systems brought
under review.

The fine article states, _" Does all that hard work on formal verification
verify that they actually work in practice? No."_

The systems under investigation (Ironfleet[0], Chapar[1], and Verdi[2]) are
not general-purpose "formal verification" languages and tools. They allow you
to develop a high-level specification in a formal logic and extract an
implementation from it.

The conclusion is odd. The systems produce implementations that are free of
errors (so long as the specification itself is correct). The study finds that
the majority of errors happen at the interface between the verified and
unverified components of the system. That is hardly a reason to conclude that
"formal verification" doesn't actually work in practice.

The article mentions The Amazon Paper[3]. It's a fine retrospective of
Amazon's use of TLA+ and formal methods to find errors, produce correct
designs, and test ideas. Yet the paper under review isn't about TLA+ or formal
methods in general.

If you're interested in the analysis I'd read the paper first before this
summary and draw your own conclusions.

 _Update_ links:

[0] [https://www.microsoft.com/en-us/research/publication/ironfle...](https://www.microsoft.com/en-us/research/publication/ironfleet-proving-practical-distributed-systems-correct/#)

[1] [https://dl.acm.org/citation.cfm?id=2837622](https://dl.acm.org/citation.cfm?id=2837622)

[2] [http://verdi.uwplse.org/](http://verdi.uwplse.org/)

[3] [http://research.microsoft.com/en-us/um/people/lamport/tla/fo...](http://research.microsoft.com/en-us/um/people/lamport/tla/formal-methods-amazon.pdf)

~~~
mpweiher
> The conclusion is odd. [correct implementation]

> (so long as the specification itself is correct)

What is odd about the conclusion? It seems 100% obvious, as correct formal
specifications don't appear magically from heaven (though that sometimes seems
to be the assumption). Without a correct formal specification, your proof is
"worthless" (by itself in terms of correctness of the whole). And how do you
"prove" that the specification is correct?

Having a correct transformation from spec to program does not _guarantee_ a
correct program any more than a correct compiler does. In fact, you could argue
that you have made the problem harder:

    fuzzy requirements -> program

vs.

    fuzzy requirements -> formal spec -> program

For a large percentage of problems, the really hard part is the one from
"fuzzy requirements" to whatever, and there is no formal mechanism, at least
that I can imagine, that will make that step provable.

This dawned on me when I was doing formal specification at University (in Z,
IIRC): it turned out that at least for that language, the specifications
tended to be lengthier, more difficult to understand and more difficult to map
to requirements than program code. So even if I could prove the transformation
to code as 100% correct, and even if I could automate that with 100%
accuracy/reliability, I had still made the total problem _harder_ than before!

In fact, if I had an "executable specification", which was a hot thing at the
time, how was that different (in principle) from a somewhat peculiar
programming language? So then I had come back to:

    fuzzy requirements -> program (executable specification)

Except that the "programming language" was a bit weird. Now if we had
specification languages that were easier to use, more high-level and easier to
map to and from fuzzy requirements, _that_ would make the whole thing more
interesting.

And of course, there are domains where writing things down
separately/differently several times and then determining whether the
different written down versions match is a big win (which is also why unit
tests are a big win).
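That redundancy win can be sketched as a differential test: write the same behavior down twice, independently, and check that the versions agree. The two max-subarray implementations below are hypothetical examples I'm supplying for illustration, not anything from the paper:

```python
import random

# Two independent "write-downs" of the same behavior (hypothetical
# examples): a naive O(n^2) reference and a faster implementation.
def max_subarray_ref(xs):
    # Reference: check every non-empty contiguous slice.
    return max(sum(xs[i:j])
               for i in range(len(xs))
               for j in range(i + 1, len(xs) + 1))

def max_subarray_fast(xs):
    # Kadane's algorithm, O(n).
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

# The win comes from checking that the two versions match:
# any disagreement reveals a bug in at least one of them.
random.seed(0)
for _ in range(1000):
    xs = [random.randint(-10, 10) for _ in range(random.randint(1, 12))]
    assert max_subarray_ref(xs) == max_subarray_fast(xs)
```

This is the same idea that property-based testing tools automate: the reference copy plays the role of the spec, and disagreement is the only oracle needed.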

~~~
agentultra
There are examples of big wins with formal methods in industry. I linked to
the Amazon paper in my original comment. There are others from Microsoft
[0][1] and in high-assurance safety-critical systems such as the Meteor Line
14 in the Paris Metro [2].

I found the conclusion odd because it claims from the study that formal
verification methods do not work _in practice_.

One common complaint I hear, from opponents of formal methods and from
developers unfamiliar with them, is that we cannot possibly specify the entire
software system: it's too complicated.

We don't have to. We can specify the important parts formally and get big
wins: critical components that are correct by design.

> _And how do you "prove" that the specification is correct?_

It depends on the level of assurance required by the project. Often a model
and a model checker are good enough over a large enough state space. Sometimes
you might require a proof.
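The model-checking half of that can be sketched in a few lines: exhaustively enumerate the states of a small model and check an invariant in each one. The lost-update model below is a standard illustration I'm supplying, not something from the paper, and the checker is a toy in the spirit of a tool like TLC, not a real one:

```python
from collections import deque

# Toy model: two processes each do read -> write of a shared counter.
# State: (pc1, pc2, tmp1, tmp2, counter); pc in {0: read, 1: write, 2: done}.
def successors(state):
    pc1, pc2, t1, t2, c = state
    nxt = []
    if pc1 == 0:
        nxt.append((1, pc2, c, t2, c))          # proc 1 reads counter
    elif pc1 == 1:
        nxt.append((2, pc2, t1, t2, t1 + 1))    # proc 1 writes counter
    if pc2 == 0:
        nxt.append((pc1, 1, t1, c, c))          # proc 2 reads counter
    elif pc2 == 1:
        nxt.append((pc1, 2, t1, t2, t2 + 1))    # proc 2 writes counter
    return nxt

def check(init, invariant):
    # Explicit-state model checking: breadth-first search over
    # every reachable state, testing the invariant in each one.
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        if not invariant(s):
            return s  # counterexample state
        for n in successors(s):
            if n not in seen:
                seen.add(n)
                frontier.append(n)
    return None  # invariant holds over the whole state space

# Invariant: once both processes are done, the counter must be 2.
bad = check((0, 0, 0, 0, 0),
            lambda s: not (s[0] == 2 and s[1] == 2) or s[4] == 2)
print(bad)  # (2, 2, 0, 0, 1): both done, counter == 1 -- a lost update
```

The counterexample is exactly the interleaving where both processes read before either writes, which is the kind of bug that is miserable to reproduce with tests but falls out of an exhaustive search immediately.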

Regardless of which approach is taken, these are far better guarantees than a
pile of unit tests and integration tests. Tests only prove the presence of
errors, not their absence.

And for highly distributed, concurrent systems, I think I've spent enough time
trying to grok hand-waving diagrams, execution traces, and log files to
realize that I've been wasting my time. These systems are too complex to
reason about locally in source code alone. Whether the prover itself can be
proved is irrelevant: the most important part of using formal methods is that
they require you to think in precise terms, with no hand-waving.

 _Update_ links:

[0] [https://www.microsoft.com/en-us/research/wp-content/uploads/...](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/leslie_lamport.pdf)

[1] [https://techcrunch.com/2017/05/10/with-cosmos-db-microsoft-w...](https://techcrunch.com/2017/05/10/with-cosmos-db-microsoft-wants-to-build-one-database-to-rule-them-all/)

[2] [http://rodin.cs.ncl.ac.uk/Publications/fm_sc_rs_v2.pdf](http://rodin.cs.ncl.ac.uk/Publications/fm_sc_rs_v2.pdf)

~~~
mpweiher
> There are examples of big wins with formal methods in industry.

No one is disputing that, not the paper, not I. Even from the abstract:
"results are encouraging".

> it claims from the study that formal verification methods do not work in
> practice.

Again, since when is "results are encouraging" the same as "this doesn't work
in practice"? Where do they make that claim?

I think they rightfully put a limit on what "works" means in this context, and
show that just because you have formally verified _your_ system (or parts
thereof) doesn't mean the whole is bug-free. Remember Knuth: "Beware of bugs
in the above code; I have only proved it correct, not tried it."

>> And how do you "prove" that the specification is correct?

> It depends on the level of assurance required by the project.

Sorry, and again: how do you prove that the _specification_ is correct? (See
also the sibling comment with the quote from Alan Perlis). Correct with regard
to what? Itself? How would that work? Another specification?

> these are far better guarantees than a pile of unit tests and integration
> tests

In some cases that is certainly true. Distributed systems in particular are
hard to test.

> They only prove the presence of errors: not their absence.

And as the paper shows, neither does verification, for various reasons, some
of which, such as boundary conditions, can be and _were_ caught by tests. No
one is disputing that verification can be useful; it is the absolutism that's
an issue.

Absolutism is always wrong ;)

~~~
agentultra
The paper is fine, but I took the quote that stood out in the article as its
premise, which is what made the conclusion seem odd to me.

> Sorry, and again: how do you prove that the specification is correct?

How do you prove that the proof is correct? By writing another proof of the
proof?

In practice it's not terribly useful to wring our hands around this issue.
I'll leave that to the mathematicians. If you want to know whether your
specification is correct, you either use a model checker to automatically
check that some invariants hold over a universe of states, or you use a
theorem prover whose kernel you trust.

If you want to know whether you specified the correct properties, invariants,
etc., then you'll have to have someone review your work. With an interactive
theorem prover, at least some of this work can be done by the assistant, but
at the end of the day it's about thinking and testing your assumptions.

> Absolutism is always wrong ;)

Ha! Cute.

I had thought the idea that formal verification will produce software free of
all errors was well debunked these days. Maybe I'm wrong. I've only ever used
it to prove that certain properties hold. Proving the entire system free of
errors would be quite difficult and expensive indeed!

------
ottumm
Did anyone else find this disturbing: "Chapar used UDP, but did not handle
packet drops because it assumed the network layer was reliable."?

This is a serious case of missing the forest for the trees.
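What that mismatch looks like can be sketched as follows (a hypothetical illustration, not Chapar's actual code): a sender that is "correct" under a reliable-network spec still loses data once the real, UDP-like channel drops packets:

```python
import random

# Hypothetical sketch: a UDP-like channel that, like real UDP,
# gives no delivery guarantee -- some datagrams simply vanish.
class LossyChannel:
    def __init__(self, drop_rate, seed=1):
        self.drop_rate = drop_rate
        self.rng = random.Random(seed)  # deterministic for the demo
        self.delivered = []

    def send(self, msg):
        if self.rng.random() >= self.drop_rate:
            self.delivered.append(msg)

def naive_send_all(channel, msgs):
    # A sender proved correct against a "reliable network" spec:
    # no acks, no retransmission. The proof holds; the assumption doesn't.
    for m in msgs:
        channel.send(m)

ch = LossyChannel(drop_rate=0.2)
naive_send_all(ch, list(range(100)))
print(len(ch.delivered))  # fewer than 100 made it through
```

The point is that nothing in the verified sender is wrong relative to its spec; the bug lives entirely in the unstated boundary between the spec's network model and the actual transport.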

~~~
ktRolster
It demonstrates a difficulty of formal verification. You can prove the program
matches the specification, but how do you know the specification is correct?

 _Quis custodiet ipsos custodes?_

~~~
ottumm
Very true. It's also an almost unbelievably basic design error to make in a
distributed system.

------
DamonHD
Woot!

We had a crack at a formally specified networking component 30 years ago, and
it was hard in part because the tools were lacking or themselves hard to
reason about. We tried to move to simpler logics but in effect ran out of
time. Actually, our funder ran out of money and keeled over...

~~~
nickpsecurity
Things are getting better if you go the model-checking route. Check out this
TLA+ tutorial, which covers interesting use cases:

[https://learntla.com/introduction/](https://learntla.com/introduction/)

~~~
DamonHD
Thank you.

------
nickpsecurity
Here's a repost of my summary of this paper from when I posted it on Lobste.rs:

"In high-assurance systems, certification usually mandates that many forms of
verification are used since one might catch problems others missed. Sometimes,
the problems are in the verification tools themselves. The authors review
tools that seem to have only used formal verification on select aspects of
their distributed systems. The added techniques of code/doc review, observing
things in debugger, component-based testing, and network/file fuzzing caught a
bunch of problems.

Interestingly, the verified code did exactly what the spec said it would. Like
with CompCert, the formal verification made the implementation flawless in
terms of conforming to the spec. The spec errors were mostly at interfaces,
as the founders of software engineering found in the 1960s. I always say to
bulletproof those by throwing everything you can afford to at them. That
numerous components didn’t work the way the authors guessed reinforces why
high-assurance software always lists everything in the Trusted Computing Base
(TCB) along with what was verified and how. If you don’t fully understand a
3rd-party component, then there’s no way you can say your own component
correctly uses the other one. This is also why projects such as deepspec.org
are going from hypervisor code all the way down to the hardware. An example of
a lightweight method is to build monitors for 3rd-party components that watch
traces of their execution to spot inconsistencies with specifications that
reflect users' expectations. This has been used in both embedded (Copilot) and
enterprise (NonStop) systems."
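The trace-monitor idea above can be sketched as follows (a hypothetical example property I'm supplying, not Copilot or NonStop code): check a third-party component's event trace against a property the user expects to hold:

```python
# A lightweight runtime monitor (illustrative sketch): replay a trace
# of events from a third-party component and check it against a simple
# specification -- here, "acquire and release must strictly alternate,
# starting with acquire" (the lock is never taken or freed twice in a row).
def monitor(trace):
    held = False
    for i, event in enumerate(trace):
        if event == "acquire":
            if held:
                return f"violation at event {i}: acquire while held"
            held = True
        elif event == "release":
            if not held:
                return f"violation at event {i}: release while free"
            held = False
    return None  # trace is consistent with the spec

print(monitor(["acquire", "release", "acquire", "release"]))  # None
print(monitor(["acquire", "acquire"]))  # violation at event 1
```

Because the monitor only observes traces, it needs no access to the component's internals, which is what makes the technique viable for black-box 3rd-party code.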

------
alsadi
1. Where is Consul?

2. Less used means fewer reported bugs (not fewer bugs).

