
Formal Methods in Building Robust Distributed Systems - ctdean
http://perspectives.mvdirona.com/2014/07/03/ChallengesInDesigningAtScaleFormalMethodsInBuildingRobustDistributedSystems.aspx
======
ScottBurson
The linked paper is quite interesting [0]. It does indeed sound like TLA+, the
formal methods tool set they used, has worked out very well. One quote: "[W]e
have found that software engineers more readily grasp the concept and
practical value of TLA+ if we dub it: _Exhaustively-testable pseudo-code_. We
initially avoid the words ‘formal’, ‘verification’, and ‘proof’, due to the
widespread view that formal methods are impractical."

[0] [http://research.microsoft.com/en-
us/um/people/lamport/tla/fo...](http://research.microsoft.com/en-
us/um/people/lamport/tla/formal-methods-amazon.pdf)

------
nmrm
Here's an interesting blurb about how they were using TLA+. TL;DR is that,
unsurprisingly, model checking trumps proving in ROI:

We have found formal specification and model-checking to yield high return on
investment. In addition to model checkers, many formal specification methods
also support formal machine-checked proof. TLA+ has such a system. The TLA+
proof system has several compelling benefits; for example, it supports
hierarchical proof. After doing only a small number of such proofs, author
C.N. has found that hierarchical structure is an effective way to handle the
otherwise overwhelming amount of detail that arises in formal proofs of even
small systems. Another benefit of the TLA+ proof system is that it takes as
input the same specification text as the model-checker. This allows users
to find most of the errors quickly using the model-checker, and switch to the
proof system if even more confidence is required. Even though formal machine-
checked proof is the only way to achieve the highest levels of confidence in a
design, use of formal proof is rarely justified. The issue is cost; formal
proof still takes vastly more human effort than model-checking. We are
continuing to experiment with the TLA+ proof system, but currently model-
checking remains the sweet spot in return on investment for our problem
domain. We suspect that proof will only be a worthwhile return on investment
for one or two of the most critical core algorithms.

------
pja
r/programming discussion of this paper here:
[http://www.reddit.com/r/programming/comments/277fbh/use_of_f...](http://www.reddit.com/r/programming/comments/277fbh/use_of_formal_methods_at_amazon_web_services/)

------
amund
The Envisage Research Project - [http://envisage-project.eu](http://envisage-
project.eu) - is working on developing formal methods for software
engineering for the cloud, ref: [http://envisage-project.eu/wp-
content/uploads/2013/10/Envisa...](http://envisage-project.eu/wp-
content/uploads/2013/10/Envisage_factsheet.pdf) "ENVISAGE will create a
development framework based on formal methods to include resources and
resource management into the design phase in software engineering for the
cloud. This will improve the competitiveness of SMEs and profoundly influence
business ICT strategies in virtualized computing"

------
tlarkworthy
I have used Computational Tree Logic before
([https://www.firebase.com/blog/2014-02-04-firesafe-complex-
se...](https://www.firebase.com/blog/2014-02-04-firesafe-complex-security-
logic-for-firebase.html))

I wonder if anybody knows what the main differences between CTL and TLA are.
Maybe I should switch camps?

EDIT: oooh, you can read the book for free [http://research.microsoft.com/en-
us/um/people/lamport/tla/bo...](http://research.microsoft.com/en-
us/um/people/lamport/tla/book-02-08-08.pdf)

EDIT2: Ahh... TLA has sets for one thing

------
jzelinskie
Can someone give an introductory rundown on model checking vs. theorem
proving, and explain why someone wouldn't just want to write these
systems/algorithms in a dependently typed language?

~~~
oggy
I'll give it a try, briefly. For both, you first need to give a mathematical
meaning to your system, i.e. represent the system as some kind of well-defined
mathematical object.

A typical choice (especially for concurrent/distributed systems) is
transition systems. To create a transition system, you first need to describe
the state of your (entire) system. This could, for instance, be the current
values of all variables used in all processes of your distributed system and
the set of all messages currently in the network. Next you need to model
transitions: ways for the state to evolve. These would look something like
"process p_i increments its variable x from 0 to 1, and sends a message to
process p_j".
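To make this concrete, here's a small Python sketch of such a transition
system (all names and the specific transitions are made up for illustration):
the global state bundles each process's variable x with the set of in-flight
messages, and a successor function enumerates the enabled transitions.

```python
# Sketch of a transition system: global state = (per-process x values,
# set of in-flight messages). Tuples and frozensets keep states hashable,
# which matters once you want to explore the state space.

def initial_state():
    # two processes, each with x = 0; an empty network
    return ((0, 0), frozenset())

def successors(state):
    """Enumerate every transition enabled in `state`."""
    xs, msgs = state
    result = []
    # "process p_i increments its variable x from 0 to 1,
    #  and sends a message to process p_j"
    for i in range(len(xs)):
        if xs[i] == 0:
            new_xs = tuple(1 if j == i else x for j, x in enumerate(xs))
            result.append((new_xs, msgs | {("ping", i, 1 - i)}))
    # "some process receives (removes) an in-flight message"
    for m in msgs:
        result.append((xs, msgs - {m}))
    return result
```

From the initial state exactly two transitions are enabled, one per process;
everything reachable from there is the state space a model checker explores.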

Given such a transition system you can now ask different, well-defined
questions about it. The simplest kind are "safety" properties, where you ask
whether your system can reach a state which is "bad" by some criterion. In a
distributed consensus algorithm for instance, you could ask "can it be that
two replicas decide on a different value?", or "can it be that my processes
are all in the waiting state (deadlock)?". You need to phrase this question in
some kind of formal language the tool can understand.

Model checking and theorem proving then go about answering the question in
different ways. Model checking, roughly, tries to use brute force, and
requires no human interaction in doing so. You could imagine it
feeding every possible input to every process, choosing every possible
interleaving of messages and, for every state reachable in such a manner,
checking whether the bad thing happens.
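That brute-force search can be written down in a few lines. Here's a toy
explicit-state checker in Python (a sketch, nothing like a real tool such as
TLC): breadth-first search over all reachable states of a hypothetical
two-process system with no locking, so the mutual-exclusion safety property
fails and the checker returns a counterexample trace.

```python
# Minimal explicit-state model checker: BFS over reachable states,
# returning a trace to the first state violating the safety property.

from collections import deque

def successors(state):
    # each process independently toggles "idle" <-> "critical";
    # there is no lock, so mutual exclusion can be violated
    for i, loc in enumerate(state):
        nxt = "critical" if loc == "idle" else "idle"
        yield state[:i] + (nxt,) + state[i + 1:]

def is_bad(state):
    # safety property: never two processes in the critical section
    return state.count("critical") >= 2

def check(init):
    """Return a counterexample trace to a bad state, or None if safe."""
    queue = deque([(init, [init])])
    seen = {init}
    while queue:
        state, trace = queue.popleft()
        if is_bad(state):
            return trace
        for succ in successors(state):
            if succ not in seen:
                seen.add(succ)
                queue.append((succ, trace + [succ]))
    return None

trace = check(("idle", "idle"))
```

Because BFS explores shortest paths first, the returned trace is a minimal
counterexample - the same reason TLC's error traces tend to be short.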

In theorem proving, you try to provide the rationale for why things can't go
wrong in the form of theorems. This often follows the informal reasoning. E.g.
suppose we're in the "earliest" bad state s, then previously X must've
happened, which must've been caused by Y previously happening, which must've
happened in some state s' which was also bad but earlier than s, which is a
contradiction. However, you also have to convince the theorem prover that your
reasoning is sound. So first you need to understand what methods of reasoning
you are using precisely (e.g. my example essentially relied on induction,
which might not be clear to most people), and you also need to understand
somewhat how the prover "ticks" and what kinds of reasoning steps it can
perform automatically.
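Spelled out, the "earliest bad state" argument above is an instance of
induction over execution traces (a sketch; Bad is whatever safety violation
you care about):

```latex
% Suppose some reachable state is bad, and pick a shortest offending trace:
%   s_0 \to s_1 \to \dots \to s_n, \quad \mathrm{Bad}(s_n), \quad n \text{ minimal}.
% Base case: \neg\mathrm{Bad}(s_0) (the initial state is safe), hence n > 0.
% Inductive step: by minimality \neg\mathrm{Bad}(s_{n-1}); show that every
% transition preserves safety,
%   \neg\mathrm{Bad}(s) \land (s \to s') \implies \neg\mathrm{Bad}(s'),
% contradicting \mathrm{Bad}(s_n). So no reachable state is bad.
```

This is the standard inductive-invariant pattern, and it's exactly the kind
of reasoning you have to make explicit before a prover will accept it.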

So theorem proving requires more training and effort than model checking. On
the flip side, model checking typically works only for finite systems. So your
model (transition system) could only have, say, 3 or 4 processes, and the
range of the variables would be restricted (say my variable x could only range
between 0 and 4), as would be the number of messages in the network.
Obviously, this doesn't give you equally strong guarantees - maybe your system
works fine for 4 processes, but does something wrong when you have 5 of them.

Depending (hah!) on the type system used in your dependently typed language of
choice, those languages are actually a kind of theorem prover! This includes
Coq, but also Idris and Agda. They exploit the so-called Curry-Howard
correspondence, which equates mathematical propositions with (dependent)
types, and proofs of said propositions with terms (programs) of the given
type.
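The dictionary at the heart of the correspondence, in brief:

```latex
\begin{array}{ll}
\text{implication } A \Rightarrow B     & \text{function type } A \to B \\
\text{conjunction } A \land B           & \text{pair type } A \times B \\
\text{disjunction } A \lor B            & \text{sum type } A + B \\
\text{universal } \forall x.\, P(x)     & \text{dependent function type } \Pi x.\, P(x) \\
\text{a proof of } A                    & \text{a program (term) of type } A
\end{array}
```

So type-checking a program doubles as checking a proof: if you can write a
term of the type encoding your correctness property, the property holds.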

