Hacker News new | past | comments | ask | show | jobs | submit login
My experience with using TLA+ in distributed systems class (muratbuffalo.blogspot.com)
87 points by pron on Sept 15, 2015 | hide | past | favorite | 12 comments

TLA+ is one of the most successful formal verification tools in the industry (probably because as far as verification tools go, it is relatively simple yet very powerful). It is used at Amazon to formally specify and/or verify their AWS services[1], it's been used by Intel and Compaq to verify complex cache-coherence protocols, and it was used to formally specify and prove the distributed Raft algorithm.

[1]: http://cacm.acm.org/magazines/2015/4/184701-how-amazon-web-s...

To save others the trouble, here's the link that introduces the TLA (three-letter acronym), TLA+:


Pros: PlusCal is relatively easy to use and understand. TLA Toolbox is self contained. You have an editor, a model checker and other tools right out of the box. Two or three books written by Lamport. Nice material base. Theory behind this model checker is relatively lightweight. Compare it with some sophisticated process algebra, for example.

Cons: PlusCal is not able to handle process spawn operation, as far as I know. So if you are to model real system where clients come and go, you have to write your spec in pure TLA+ which is unmaintainable, because it is just a logical assembler.

State explosion problem.

Java as implementation language. So you are using your CPU cycles not the best way.

Built-in proof system is not able to and was not supposed to check your proof. To do this you have to use some other tools like Coq (fix me if I am wrong)

Nearest competitors in that nich: Spin (yeah!), Alloy, EventB. Just let you know that there are alot of model checkers out there.

TLA+ has its own proof system (TLAPS[1]) that, while it doesn't yet support all TLA+ features, is getting better and is quite capable.

Java is very efficient, and TLC uses parallelism and even distributed parallelism (and can store states to disk).

This is first I hear of Spin, and I'll be sure to give it a try, but TLA+'s is considered easier to use and/or more expressive than many other solutions, and has the added advantage of being used in the industry (as far as formal verification tools go, that is).

I would very much like to hear more about Spin.

[1]: https://tla.msr-inria.inria.fr/tlaps/content/Home.html

Thanks for the link. I read about TLAPS. It seems it relies on SMT solver to uncharge proof obligations, and no way for you to manually prove your claim using lower level tactics, like in Coq. So you rely on heuristic nature of SMT solver.

About Spin[1]. If you compare the user base of Spin with the user base of TLA+ I bet you will wonder how many users out there were using verification tools all those years. Lamport released his tool like 20 years later than SPIN started to spread. He did a great job, but there is no innovation in my opinion.

I think that serious industry players like Intel, Google or NASA are using all spectre of verification tools, including self written tools. Intel is using there own tool for some of their chip verification AFAIK. My colleague went into Intel to help develop verification scripts using this tool.

TLA+ has been promoted by Amazon with their latest technical report. Yes, it has more expressive types (records,tuples,sets), but it comes with a cost of lowering your verification performance rate. There are many subtle tradeoffs to make.

We tried to use several formal verification tools in our latest distributed project. We tried Spin, TLA+, mCRL2, Coq. There is its own philosophy, pros and cons behind each tool, but in our case Spin made the best job: we had to invest not so much time but found many concurrency bugs in our distributed algorithm. Its pros is that it has very basic data types and not very good parallelism support nor in multi thread nor in multi-node form.

[1] http://spinroot.com/spin/whatispin.html

Could you compare model-checkers like Spin and TLA+ with type-based approaches like F* and Coq, especially in terms of ease of use? The latter group doesn't seem to be getting much real-world usage, and I have the sense they are much harder to use.

In short: Spin and TLA+ are "push-button" mechanisms where computer are trying every possible trace of your system to check system state against supplied invariant. Pros: You don't have to think much about your systems thin properties before verifying. Cons: State space explosion so not every real system is a subject to such verification.

Coq,Agda: Manual/Semi-manual correctness proof of your system. You can reason about even "infinite" systems with this approach using relatively little computation resources. But to do so you have to gain a _deep_ insight about your system behaviour, computation semantics, network semantics (if distributed system is checked) and other properties. If you are lucky you can proof property under question, even it assumes very large moving parts. If not then you cant tell that this property can be proved at all. So these are more like platforms for quasi-manual deduction.

F* is trying to take up a niche between manual proof and quasi-automatic proof using SMT solver for those of theorems which can be proved this way. The problem is that SMT solvers are generally suck, there is no sound theory behind it, so it is more like guessing in my opinion.

Thank you.

In the case of the 2nd miniproject testing the naive physical-logical clock algorithm, why isn't the simplest counterexample that the logical clock advances twice for each message send/receive, and so if the message send rate is greater than half the physical clock rate, it always skews the logical clock at a rate greater than the physical clock rate?

> Model checking with 2 acceptors worked, but I gave up on model checking with 3 acceptors since it took more than an hour.

Does anyone know how the TLA model checker works? I wasn't able to find this information. Does it use a SAT/SMT solver? Or brute force search?

Exhaustive search across all states and state transitions. There's a distributed version which helps with especially large state spaces. Combinatorial explosion is an issue. In some cases the best/only option is to formally prove correctness with the built-in proof language; the model checker works well in tandem, for quickly checking assertions before putting effort into a proof.

Generally the model checker alone works just fine, though.

Yeah, it’s exhaustive search. Fortunately storage costs have dropped like a rock over the last few decades, so problems that were hilariously impractical only a few years ago are now checkable on commodity hardware. A fully interlinked (ie guaranteed symmetric inter-node bandwidth) cluster on Amazon EC2 with aggregate memory (even before you start touching secondary storage) in the Terabytes can be rented for a few 10s of $ an hour at spot prices. It may be that Formal Methods time has finally come!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact