
My experience with using TLA+ in distributed systems class - pron
http://muratbuffalo.blogspot.com/2015/01/my-experience-with-using-tla-in.html
======
pron
TLA+ is one of the most successful formal verification tools in the industry
(probably because as far as verification tools go, it is relatively simple yet
very powerful). It is used at Amazon to formally specify and/or verify their
AWS services[1], it's been used by Intel and Compaq to verify complex cache-
coherence protocols, and it was used to formally specify and prove the
distributed Raft algorithm.

[1]: [http://cacm.acm.org/magazines/2015/4/184701-how-amazon-
web-s...](http://cacm.acm.org/magazines/2015/4/184701-how-amazon-web-services-
uses-formal-methods/fulltext)

------
tomkinstinch
To save others the trouble, here's the link that introduces the TLA (three-
letter acronym), TLA+:

[http://research.microsoft.com/en-
us/um/people/lamport/tla/tl...](http://research.microsoft.com/en-
us/um/people/lamport/tla/tla-intro.html)

------
unboxed_type
Pros: PlusCal is relatively easy to use and understand. The TLA Toolbox is
self-contained: you get an editor, a model checker and other tools right out of
the box. There are two or three books written by Lamport, so the material base
is good. The theory behind the model checker is relatively lightweight; compare
it with some sophisticated process algebra, for example.

Cons: PlusCal cannot model process spawning, as far as I know. So if you want
to model a real system where clients come and go, you have to write your spec
in pure TLA+, which is unmaintainable because it is essentially a logical
assembly language.

State explosion problem.

Java is the implementation language, so you are not using your CPU cycles in
the best way.

The built-in proof system is not able to, and was not meant to, check your
proof. To do this you have to use some other tool like Coq (correct me if I am
wrong).

Nearest competitors in that niche: Spin (yeah!), Alloy, EventB. Just to let you
know that there are a lot of model checkers out there.

~~~
pron
TLA+ has its own proof system (TLAPS[1]) that, while it doesn't yet support
all TLA+ features, is getting better and is quite capable.

Java is _very_ efficient, and TLC uses parallelism and even distributed
parallelism (and can store states to disk).

This is the first I've heard of Spin, and I'll be sure to give it a try, but TLA+ is
considered easier to use and/or more expressive than many other solutions, and
has the added advantage of being used in the industry (as far as formal
verification tools go, that is).

I would very much like to hear more about Spin.

[1]: [https://tla.msr-inria.inria.fr/tlaps/content/Home.html](https://tla.msr-
inria.inria.fr/tlaps/content/Home.html)

~~~
unboxed_type
Thanks for the link. I read about TLAPS. It seems it relies on an SMT solver to
discharge proof obligations, with no way for you to manually prove your claim
using lower-level tactics, as in Coq. So you rely on the heuristic nature of
the SMT solver.

About Spin[1]: if you compare the user base of Spin with the user base of TLA+,
I bet you will be surprised how many people have been using verification tools
all these years. Lamport released his tool about 20 years after Spin started to
spread. He did a great job, but there is no innovation in my opinion.

I think that serious industry players like Intel, Google or NASA use the whole
spectrum of verification tools, including self-written ones. Intel uses its own
tool for some of its chip verification, AFAIK. A colleague of mine went to
Intel to help develop verification scripts using this tool.

TLA+ has been promoted by Amazon with their latest technical report. Yes, it
has more expressive types (records, tuples, sets), but that comes at the cost
of lower verification performance. There are many subtle tradeoffs to make.

We tried several formal verification tools in our latest distributed project:
Spin, TLA+, mCRL2 and Coq. Each tool has its own philosophy, pros and cons, but
in our case Spin did the best job: we did not have to invest much time, yet we
found many concurrency bugs in our distributed algorithm. Its cons are that it
has only very basic data types and no good parallelism support, in either
multi-threaded or multi-node form.

[1]
[http://spinroot.com/spin/whatispin.html](http://spinroot.com/spin/whatispin.html)

~~~
pron
Could you compare model-checkers like Spin and TLA+ with type-based approaches
like F* and Coq, especially in terms of ease of use? The latter group doesn't
seem to be getting much real-world usage, and I have the sense they are _much_
harder to use.

~~~
unboxed_type
In short: Spin and TLA+ are "push-button" mechanisms where the computer tries
every possible trace of your system, checking each system state against a
supplied invariant. Pros: you don't have to think much about your system's
subtle properties before verifying. Cons: state space explosion, so not every
real system is amenable to such verification.
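
The "try every trace, check each state against the invariant" idea above can
be sketched in a few lines of Python. This is only a toy illustration, not how
Spin or TLC are actually implemented; the names `check_invariant`,
`next_states` and the two-process counter example are made up for this sketch.

```python
from collections import deque


def check_invariant(initial, next_states, invariant):
    """Breadth-first search over every reachable state.

    Returns None if the invariant holds in all reachable states,
    otherwise the shortest trace (list of states) to a violation.
    """
    parent = {initial: None}          # also serves as the "seen" set
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:  # walk back to the initial state
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in next_states(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


# Hypothetical toy system: each process increments a shared counter
# non-atomically (pc 0: read into tmp, pc 1: write tmp + 1, pc 2: done).
def next_states(state):
    counter, procs = state
    for i, (pc, tmp) in enumerate(procs):
        if pc == 0:
            new_counter, new_proc = counter, (1, counter)
        elif pc == 1:
            new_counter, new_proc = tmp + 1, (2, tmp)
        else:
            continue
        yield (new_counter, procs[:i] + (new_proc,) + procs[i + 1:])


def invariant(state):
    counter, procs = state
    all_done = all(pc == 2 for pc, _ in procs)
    return not all_done or counter == len(procs)


initial = (0, ((0, 0), (0, 0)))
trace = check_invariant(initial, next_states, invariant)
for step in trace:
    print(step)
```

The checker finds the classic lost update without any hint from the user: both
processes read 0, then both write 1. The price is that `parent` has to hold
every reachable state, which is where the state space explosion bites.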

Coq, Agda: manual/semi-manual correctness proof of your system. You can reason
about even "infinite" systems with this approach using relatively few
computational resources. But to do so you have to gain a _deep_ insight into
your system's behaviour, computation semantics, network semantics (if a
distributed system is being checked) and other properties. If you are lucky you
can prove the property in question, even if it involves a very large number of
moving parts. If not, you can't tell whether the property can be proved at all.
So these are more like platforms for quasi-manual deduction.

F* is trying to occupy a niche between manual proof and quasi-automatic proof,
using an SMT solver for those theorems which can be proved that way. The
problem is that SMT solvers generally suck; there is no sound theory behind
them, so it is more like guessing in my opinion.

~~~
pron
Thank you.

------
felixgallo
In the case of the 2nd miniproject testing the naive physical-logical clock
algorithm, why isn't the simplest counterexample that the logical clock
advances twice for each message send/receive, and so if the message send rate
is greater than half the physical clock rate, it always skews the logical
clock at a rate greater than the physical clock rate?
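
A rough Python sketch of the arithmetic in the question above. The update rules
here are an assumption about what the "naive" algorithm does (send:
l = max(l + 1, pt); receive: l = max(l + 1, l_msg + 1, pt)); they are not
quoted from the post. The function name `simulate`, the perfectly synchronized
physical clock and the two-node ping-pong pattern are likewise hypothetical.

```python
def simulate(ticks, msgs_per_tick):
    """Two nodes ping-pong messages while one shared physical clock ticks.

    Returns (physical_time, largest_logical_clock) at the end.
    Assumed rules: send sets l = max(l + 1, pt);
    receive sets l = max(l + 1, l_msg + 1, pt).
    """
    pt = 0
    l = [0, 0]
    sender = 0
    for _ in range(ticks):
        pt += 1
        for _ in range(msgs_per_tick):
            receiver = 1 - sender
            l[sender] = max(l[sender] + 1, pt)                      # send
            l[receiver] = max(l[receiver] + 1, l[sender] + 1, pt)   # receive
            sender = receiver                                       # reply next
    return pt, max(l)


print(simulate(100, 1))  # (100, 200)
print(simulate(10, 3))   # (10, 60)
```

Under these assumed rules each message pushes the largest logical clock forward
by 2, so with one message per tick the logical clock already runs at twice the
physical rate and the gap grows without bound.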

------
jules
> Model checking with 2 acceptors worked, but I gave up on model checking with
> 3 acceptors since it took more than an hour.

Does anyone know how the TLA model checker works? I wasn't able to find this
information. Does it use a SAT/SMT solver? Or brute force search?

~~~
ahelwer
Exhaustive search across all states and state transitions. There's a
distributed version which helps with especially large state spaces.
Combinatorial explosion is an issue. In some cases the best/only option is to
formally prove correctness with the built-in proof language; the model checker
works well in tandem, for quickly checking assertions before putting effort
into a proof.

Generally the model checker alone works just fine, though.
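
To give a feel for the combinatorial explosion mentioned above, here is a toy
Python sketch. It is not TLC and not a Paxos model; it just counts the states
reachable when several independent processes each step a local program
counter. `count_reachable` and its parameters are hypothetical names for this
illustration.

```python
def count_reachable(num_procs, steps_per_proc):
    """Count states reachable when each process independently advances
    its own program counter from 0 up to steps_per_proc."""
    initial = (0,) * num_procs
    seen = {initial}
    frontier = [initial]
    while frontier:
        state = frontier.pop()
        for i, pc in enumerate(state):
            if pc < steps_per_proc:
                nxt = state[:i] + (pc + 1,) + state[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return len(seen)


for n in range(1, 6):
    print(n, count_reachable(n, 9))
```

With 10 local states per process the count is 10^N, so every extra process
multiplies the number of states to visit by 10. Real specs have shared state
and messages in flight on top of that, so the growth is usually worse.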

~~~
pja
Yeah, it’s exhaustive search. Fortunately storage costs have dropped like a
rock over the last few decades, so problems that were hilariously impractical
only a few years ago are now checkable on commodity hardware. A fully
interlinked (i.e. guaranteed symmetric inter-node bandwidth) cluster on Amazon
EC2 with aggregate memory (even before you start touching secondary storage)
in the terabytes can be rented for a few tens of dollars an hour at spot
prices. It may be that Formal Methods' time has finally come!

