
Principles of TLA+ - pramodbiligiri
https://pron.github.io/posts/tlaplus_part1
======
michaelmior
Leslie Lamport has a video course[0] on TLA+ that's slowly being released. I
created an RSS feed[1] that updates as new episodes are released.

[0]
[https://lamport.azurewebsites.net/video/videos.html](https://lamport.azurewebsites.net/video/videos.html)

[1]
[http://fetchrss.com/rss/58d198538a93f8ef3b8b4567741573560.at...](http://fetchrss.com/rss/58d198538a93f8ef3b8b4567741573560.atom)

------
hwayne
First of all, as always, huge respect for your work, and I'm looking forward
to reading the rest of the posts. There is something I'd like to start a
discussion on, though:

> nor will reading these posts teach you how to write good specifications; so
> this series is neither necessary nor sufficient for putting TLA+ to good
> use, but I think it is both necessary and sufficient for understanding it.

Emphasis on the 'good specifications' part: I think this is a _huge_ oversight
in the TLA+ community and the biggest barrier to its current adoption. The
problem isn't that people don't understand (or can't understand) the theory
behind it, yet the theory is what everybody focuses on teaching. The problem is
specification: _nobody_ is talking about how to actually _use_ TLA+. In
fact, some of the advice given is actively harmful!

One example: in _Specifying Systems_, Lamport defines a memory cache with the
operator `NoVal == CHOOSE v : v \notin Val`. He explicitly calls this best
practice, because adding a `NOVAL` constant would make the model more
complex. Except unbounded CHOOSE is not checkable by TLC! While `NoVal` is
nice for abstract specifications, it breaks things in the 99% of cases where
you're actually using TLA+.

I find it telling that the majority of example TLA+ specs "in the wild" are
either abstract algorithms or optimization problems. The things you don't
find? How to write good invariants. When you should be using raw TLA+ versus
the PlusCal syntax. Example models. Optimizing your model checking. The
stuff that might get people to start using TLA+ and _keep_ using it.

Again, none of these are criticisms of your work or your writing. I just
wanted to openly say that if your goal is to get people interested in the
theory of TLA+, it's great, but if it is to get people interested in use of
it, a lack of good theoretical introductions isn't the main barrier.

~~~
pron
> The problem is specification: _nobody_ is talking about how to actually use
> TLA+.

I think that is Lamport's focus; he talks about this more than he talks about
the theory. Plus, you have the Dr. TLA+ series.

> Lamport defines a memory cache with the operator `NoVal == CHOOSE v : v
> \notin Val`. He explicitly calls this best practice, because adding a
> `NOVAL` constant would be making the model more complex. Except unbounded
> CHOOSE is not checkable by TLC! While `NoVal` is nice for abstract
> specifications, it breaks things in the 99% of cases where you're actually
> using TLA+.

You definitely _should_ write `NoVal == CHOOSE v : v \notin Val` in your spec
(as that clarifies the intent) _and_ override it as a "model value" (i.e., a
constant) in TLC. This is best practice. Even `∃ x ∈ Nat` is not checkable by
TLC, yet best practice is to leave it like that (as that is the intent), and
override `Nat` in TLC. In either case, the more general specification is
verifiable in TLAPS.
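Concretely, the pattern looks something like this (a minimal sketch; the module name and constant are illustrative, not Lamport's exact text):

```tla
---- MODULE Cache ----
CONSTANT Val                        \* the set of memory values
NoVal == CHOOSE v : v \notin Val    \* "some value not in Val"; unbounded CHOOSE
====
```

In the TLC model you then declare a model value (say `NoVal`) and override the definition of `NoVal` with it, so TLC never needs to evaluate the unbounded CHOOSE, while TLAPS can still reason from the general definition.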

> Example models. Optimizing your model checking. The stuff that might get
> people to start using TLA+ and _keep_ using it.

This kind of discussion is very common on the mailing list (there are a lot of
hidden gems there), but I would be very happy to see more people writing "pro
tips" and publishing them.

> Again, none of these are criticisms of your work or your writing. I just
> wanted to openly say that if your goal is to get people interested in the
> theory of TLA+, it's great, but if it is to get people interested in use of
> it, a lack of good theoretical introductions isn't the main barrier.

My personal goal in this particular series is just to get people interested in
the theory. But TLA+ is still extremely young when it comes to "use in the
wild". It was virtually nonexistent before 2012, and started slowly gaining
traction only around 2014, when Amazon published their report. As you can see,
writing short and to-the-point tips, tricks, and guides isn't exactly my forte,
but they are extremely important. As this tiny community grows, there will
hopefully be those who will fill in the gaps. Your own tutorial is a
great start.

------
plinkplonk
Very nice. I really appreciate you placing TLA+ in context with alternate
approaches. I look forward to seeing the followup posts.

PS: How are you generating these pages? Jekyll? something else?

EDIT: This probably needs editing -- "nor will reading the posts will not
teach you how to write good specifications;"

I _think_ you probably meant to write "nor will reading the posts teach you
how to write good specifications;"

~~~
pron
Yep, Jekyll. And I've fixed the mistake. Thank you!

------
nickpsecurity
@ pron

This is a great write-up I learned quite a bit from, but it's way, way too long.
There's a lot of redundancy in here much like my first sentence. You should
try to condense this since the redundancy might make a chunk of the modern
crowd think the rest of what they see in the scrollbar will be the same and
move on to the next article. The last write-up you sent me didn't have that
problem.

I'm glad I saw this, too, as I recently watched your video citing all the
mathematical evidence that correctness may not decompose. I got one or two
counterpoints to email you later, but overall I want your data peer reviewed by
people highly experienced in abstractions in large verification efforts and in
making VCGs, given that their output will have to compose, too. Let all the pros
duke it out with arguments, proofs, and attempts at actual projects. I'll
reserve judgment until the dust settles. :)

~~~
pron
You are absolutely right about the repetition (and part 1 is my least favorite
installment in the series), but reducing it in size took too much time. If you
have any editing suggestions, please email me.

As to the "mathematical evidence", it is a published, peer-reviewed 2005
_proof_, and a rather simple one. Practical experience has little to do with
it, because we need to separate two things: the computational complexity of
the worst case -- which is _provably_ exponential not only in the number of
components but also in the size of the state space of each component -- and
cases that arise in practice. This is just like the halting theorem, which
states worst-case undecidability but does not claim that _every_ instance
is undecidable. However, the interesting question is how far commonly
encountered cases are from the worst case. When it comes to symbolic model
checking (i.e., taking advantage of syntactic structure), even though we would
have thought the problem is essentially easier (in the worst case), it is
provably just as hard, and moreover, we have years of experience (cited in my
post) that common cases are close to the worst-case. When it comes to
"decomposability of correctness", I cited no empirical evidence. Nevertheless,
it should be noted that no large or even medium-sized software has ever been
fully verified, so I don't think anyone believes the problem is easy.

~~~
nickpsecurity
" it is a published, peer reviewed, 2005 proof, and a rather simple one."

Proofs can be wrong or irrelevant in practice. An example is not being able to
prove the termination of executing arbitrary programs. Sure I can: it's called
a watchdog timer. So, I'm wanting it peer reviewed. Plus, it's critical to
know for sure whether correctness composes, to determine what to invest in
for software assurance R&D.

"Nevertheless, it should be noted that no large or even medium-sized software
has ever been fully verified, so I don't think anyone believes the problem is
easy."

Never easy. :) However, I still think we should consider counting the deep
verifications. You keep saying they haven't gone _upward_ to verify a program
of certain size as if it means something. I know only one that tried (Verisoft
for Hyper-V). Most went _downward_ toward the gate level including Verisoft
where Microsoft's was just a side item. Verisoft did app correctness to some
degree, down to C code, its compiler, the OS, the microkernel, and CPU tying
it all together. DeepSpec just tied piles of high-level and low-level code
together. So, such methods should allow us to achieve the same amount of
complexity going upward using the same abstraction and shortcut techniques. We
haven't done that much upward because they've just been working the other way.

Now, what they'll ultimately achieve may or may not be that good. Best
approach might be to throw DeepSpec's DSL's and tooling at a large, complex
application whose code is FOSS and accessible. I suggest an application server
with compression, encryption, caching, and error handling as a good way to
stress that project from multiple angles.

~~~
pron
> An example is not being able to prove the termination of executing arbitrary
> programs

The theorem states that you cannot prove a program's termination any faster
than running it. You're not violating any theorems.

> So, Im wanting it peer reviewed.

But it _has_ been peer reviewed. Here is the publication
[http://www.sciencedirect.com/science/article/pii/S0022000005...](http://www.sciencedirect.com/science/article/pii/S0022000005001297)
in this journal
[https://en.wikipedia.org/wiki/Journal_of_Computer_and_System...](https://en.wikipedia.org/wiki/Journal_of_Computer_and_System_Sciences)

> So, such methods should allow us to achieve the same amount of complexity
> going upward using the same abstraction and shortcut techniques.

First, I think you're misinterpreting (and exaggerating) what has actually
been achieved. Looking at Hyper-V[1], 30KLOC of (preexisting and tested) C
code have been verified in a year (out of a total of just 100KLOC), but only
function-by-function, rather than global correctness properties. They _wrote_
some global invariants, but didn't verify them. As I understand it, similar
verification of various layers has been done, but no global end-to-end work.
Whether low-level or high-level, there has been no composition across
abstraction layers.

Second, such methods would not allow us to go "upward", because there's a
reason why people concentrate on the lower level. The state space is smaller
there.

Finally, I think you're also misinterpreting DeepSpec's work. It is at the
forefront of verification of a certain kind, but if you look at their papers,
when they talk about end-to-end verification, they're discussing programs that
are 3KLOC C. These programs are smaller than many JavaScript libraries. Also,
Coq is not really a tool engineers can put to everyday use. This is what Simon
Peyton Jones, one of Haskell's designers, has to say about Coq and Isabelle:
[https://youtu.be/re96UgMk6GQ?t=50m56s](https://youtu.be/re96UgMk6GQ?t=50m56s)

Formal methods are starting to work, but we are roughly in the same stage as
aviation in WWI, while you are talking about "using the same methods" to get a
manned craft to Mars. We may well get there, but it will take a while, and the
methods will not be exactly the same.

[1]:
[http://d3s.mff.cuni.cz/research/seminar/download/2010-02-23-...](http://d3s.mff.cuni.cz/research/seminar/download/2010-02-23-Tobies-HypervisorVerification.pdf) and
[https://link.springer.com/chapter/10.1007/978-3-642-05089-3_...](https://link.springer.com/chapter/10.1007/978-3-642-05089-3_51)

~~~
nickpsecurity
"The theorem states that you cannot prove a program's termination any faster
than running it. You're not violating any theorems."

Someone explained it poorly to me then. Sounds like I'm still countering it,
though, since my program that terminates it can move faster than the program
it's terminating. I can have an infinite loop load and begin with a downward
counter interrupting it one or more cycles after it begins. I'm still using
two programs and context change to mitigate the artificial problem posed by
the theorem using just one program looking at another.

"But it has been peer reviewed. "

How many people looked at it? For how long? And what diversity of backgrounds?
Peer review of something critical to me doesn't mean people that look at
theorems look at it and say "Good one. Done now." It's a social process that
should involve mathematically inclined, people who do applied math in that
field, and non-mathematical people in same domain with a track record of
breaking proofs in practice. On the latter, there have been a number of breaks
in cryptography because an oversimplified, abstract, elegant proof missed key
details. Peter Gutmann also had a nice rebuttal of the reliability of formal
methods, showing all the times formal specs and proofs were incorrect, with it
taking years or so for people to spot the errors or disprove them by executing
a program.

So, I have no doubt it looks good to mathematicians. I just want more effort
on it from the aforementioned kinds of people before I rely on it. Preferably a
few man-years of people trying to solidify or counter it.

"rather than global correctness properties. They wrote some global invariants,
but didn't verify them."

See, this is where being a specialist can help. I'm a bit unclear here. Can
you get implicit, global correctness if you decomposed your app into pieces
that are each locally correct? And the interface conditions of function calls are
correct? Or is that inadequate to you? The literature was unclear to me on
this point.

Note: Three functions per day? Maybe we need to apply these tools to
deeply-embedded 8-bitters with _bytes_ of RAM or ROM. Then, a programmer doing
fully-verified work will feel really productive after "filling up the hardware" in a
month or two. First time in verification history for someone to feel that
productive. ;)

"It is at the forefront of verification of a certain kind, but if you look at
their papers, when they talk about end-to-end verification, they're discussing
programs that are 3KLOC C. "

Oh yeah, they're small. The reason I bring them up is that they're at the
forefront of composing pieces. I figured you might look at what they're doing
on abstraction and integration of proofs to get a better idea of the composing
correctness aspect of your research. Then others and I might learn from your
observations.

"This is what Simon Peyton Jones, one of Haskell's designers, has to say about
Coq and Isabelle"

You're just not the guy to have at a verification party, Ron. You stay trying
to depress us. :P I appreciate the link, though, since that guy is a
hilarious speaker. He says they start learning Coq, disappear for two years,
get all their pain receptors destroyed, and emerge (paraphrased) as zealots
bringing the Gospel of Coq to all programs. Lmao. I've seen this disappearance
or pain process. It's why I'm still ignorant of the details due to not digging
into the full learning process. Life is already rough enough right now haha.

"but we are roughly in the same stage as aviation in WWI, while you are
talking about "using the same methods" to get a manned craft to Mars."

Nah, I'm talking about doing a few key components of the craft with formal
methods, with the rest built with lightweight formal specifications,
model-checking, test generation, and especially human review. Recovery
procedures simple and verified formally if possible. That's my assurance
recommendation for your Mars mission. I do like your WWI aviation comparison.
It also fits with my style of figuring out just how much I can cheat around
the limitations to kick ass. Within your metaphor, the "Biafran Babies" is
something that could've been done with WWI aviation.

[http://www.cracked.com/article_19472_the-5-craziest-soldiers...](http://www.cracked.com/article_19472_the-5-craziest-soldiers-fortune-to-ever-cash-paycheck.html)

~~~
pron
> Sounds like I'm still countering it, though

If you think you could so easily circumvent mathematical theorems, why are you
such a believer in formal methods? I don't understand exactly what your method
is, but I can assure you you're not circumventing the halting theorem. There's
a very precise definition of what it means to be _faster_. The halting theorem
poses no artificial constraints on the problem; it's an extremely general
theorem, and it certainly doesn't say _how_ the decision must be made. You're
perfectly allowed to serve as an OS and simulate the input program if you
like, but that's not going to help.

> How many people looked at it? For how long? And what diversity of
> backgrounds?

The proof is very simple, and very similar to the proof that a system of
cellular automata is Turing-complete: it points out that any nondeterministic
Turing machine with a tape of length k cells can be simulated by k finite
state machines. If it were possible to verify each component separately and
combine the result efficiently, then any such TM could be verified, yet it is
well known that it cannot be. The paper was written by well-respected
academics, published in a respected journal, and has over twenty citations.

Now, the reason our response should be, "so verification is as hard as we
thought", rather than "this is the end of the line", is because, like all
results in computational complexity, it talks about the general case. It is
possible that a large enough class of programs that are of interest are
special cases that are easier to solve.

> Peter Gutmann also had a nice rebuttal of the reliability of formal methods
> showing all the times formal specs and proofs were incorrect with it taking
> years or so for people to spot the errors or disprove it by executing a
> program.

I don't know what you're referring to, but I'm not aware of any cases where
mechanically checked proofs or model checker results have ever been found
wrong. The spec can very well be wrong, but you're the one arguing the
infallibility of formal methods.

> Preferably a few man-years of people trying to solidify or counter it.

No one is going to try and counter a mathematical theorem, with a very simple,
well-understood proof.

> Can you get implicit, global correctness if you decomposed your app into
> pieces who are each locally correct?

I'm not sure what you mean by "implicit". A formal spec requires a formal
proposition to check. Things can be true without us proving them, but formal
verification is about verification, not about truth. Theoretically, programs
could be correct without verifying or testing them in any way. As to building
up a global correctness property end-to-end (from very low-level code and up):
it is possible but very hard, and no one has ever been able to do it for even
medium-sized programs. Obviously, working in a modular way is how people do
it; that's how both math and software work. Yet, it's very, very hard.

> Then others and I might learn from your observations.

I think you overestimate my abilities. I'm not an academic. Just a programmer
with some experience using formal methods.

> You stay trying to depress us.

This is where you're wrong. Formal methods -- like aviation -- are awesome!
But let's enjoy them for what they are, not what we imagine them to be. If you
see someone speaking of formal verification as if it were a solved problem,
and writing fully correct programs is just a matter of learning Idris, you can
know one thing for certain: they've never actually used formal methods on
non-trivial real-world programs. The FM experts are those lowering expectations
the most. Watch some talks by Xavier Leroy.

> formal specifications, model-checking, test generation

All of these _are_ formal methods (the last only if there's a formal
contract).

> [http://www.cracked.com/..](http://www.cracked.com/..).

:)

------
dgreensp
Great article, and I look forward to the next! I've been meaning to learn
about TLA+ for a while.

One thought I didn't see expressed, in all the comparisons of tools and
approaches, is that sometimes you don't care about time and state. In purely
"transformational" programs, like compilers, for example, all state is
incidental complexity, so we can analyze them as pure functions. Meanwhile,
all the questions such as whether we can just execute the spec and whether we
can prove properties of real programs remain. New web frameworks are trying to
keep the parts that deal with time and state as small and self-contained as
possible, leaving the rest to be described using functions.

Couldn't we say that the factor of "do you care about time" is behind
observations such as that TLA+ is mostly popular with concurrent and
distributed systems, or that approaches that only look at function contracts
often apply to pieces of a program but not the whole? As a more specific
statement than, for example, that distributed systems are particularly hard so
they necessitate particularly powerful tools.

~~~
pron
> all state is incidental complexity

This is an example of the "Whorfian syndrome" of functional programming.
Program state is not the same as the problem of "global mutable state" often
cited as a cause of programming bugs. When it comes to program analysis, I see
no reason why reasoning about state would be any different, let alone more
complicated, than reasoning "transformationally". It's _exactly_ like the
difference between:

    h(g(f(x)))

and

    let a = f(x)
        b = g(a)
        c = h(b)

Reasoning about the two is _exactly_ the same, and the program state in both
cases is also identical: you have three states.

I also disagree that there is no interest in time or space in purely
transformative systems. First, a transformative algorithm that is, say, doubly
exponential in time or space is just as wrong as a database that fails to give
an answer. Second -- and more importantly -- reasoning about a recursive
algorithm is exactly like reasoning about state. There is no way to show that
Quicksort works without reasoning about state. You may not mention it
explicitly in your chosen formalism or you may call it by a different name,
but it's there, and you're taking it into consideration.

Also, TLA+ is a universal formalism, and, as you'll see in part 2, there's
nothing stopping you from representing everything as functions without any
passage of time. The reasoning is identical -- there is no added complexity at
all, incidental or otherwise -- the choice may just be a matter of personal
aesthetic preference. Because TLA+ is a mathematical notation, substitution and
"equational reasoning" work just as well even though time can be expressed.

I would say that the main difference between TLA+ and the functional
approaches is that in TLA+ you reason purely mathematically, while in the
functional approaches you reason within a (complex) programming language.

It's very hard to talk about popularity, because, as far as I know, there are
no "functional" tools that are used in the industry _at all_ (certainly not
without help from academics). Part of the reason may be due to the fact that
so far, software companies that require verification usually write reactive
realtime software rather than compilers, so they use tools like B-Method,
SCADE, SPIN and TLA+. Another reason is that with the exception of the very
good, but rather limited Why3, there are simply no industrial-strength
verification tools out there aimed at industrial use rather than research.

~~~
dgreensp
I mean "state" in the sense that we can distinguish time-dependent systems
with internal state or memory from time-independent systems, whose outputs are
a pure function of their inputs. Combining pure functions to create pure
functions does not introduce any state, in this sense.

An entire program may have no observable state, like a command-line tool that
reads stdin and writes stdout with no particular timing. It does have state
inside it, of course, but only because it's running on a giant state machine.
We can fully describe its behavior without making reference to any internal
states, using functions or relations to talk about the relationship between
its input and its output.

> I would say that the main difference between TLA+ and the functional
> approaches, is that in TLA+ you reason purely mathematically, while in the
> functional approaches you reason within a (complex) programming language.

This statement puzzles me, because we have great math for reasoning about pure
functions (mappings). We have lambda calculus and category theory. You don't
need a state machine to prove QuickSort works in a purely mathematical way,
either, just statements about the equivalence of functions. The recursion can
be handled using the fixed point operator and the algebraic laws it obeys.

Edit: In fairness, existing "functional approaches" to specification do seem
complex and look like programming languages.

I came across an old comment of yours
([https://news.ycombinator.com/item?id=11835178](https://news.ycombinator.com/item?id=11835178))
that seemed to support the idea that FP and TLA+ have strengths for different
kinds of programs, depending on whether they are more transformative or more
reactive. I'm just as happy if learning more about TLA+ turns my idea of this
on its head, though!

~~~
pron
> Combining pure functions to create pure functions does not introduce any
> state, in this sense.

Yes, but that's not the same as state in the TLA+ sense. You cannot have
computation without state in the TLA+ sense, whether you call your functions
pure or not. In TLA+, an algorithm that "purely" computes quicksort and one
that outputs an interim result after each recursive step is the same
algorithm. TLA+ doesn't talk about _visible_ state, but state in the most
fundamental way, without which there is no computation whatsoever.

In part 3 I ask whether Euclid's algorithm or Quicksort are imperative or
pure-functional. There is no way to answer that question, as their
descriptions are at an abstraction level above where this distinction can be
made. TLA+ can work at that level. You just can't look at Euclid's algorithm
(or quicksort) in TLA+, and tell whether it's imperative or functional. It's
an abstract mathematical description.
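As a rough illustration of that abstraction level (a sketch of mine, not taken from the post; the constants `M`, `N` are assumed positive naturals), Euclid's algorithm in TLA+ is just a relation between states:

```tla
---- MODULE Euclid ----
EXTENDS Naturals
CONSTANTS M, N          \* the inputs; assumed to be positive naturals
VARIABLES x, y

Init == x = M /\ y = N

Next == \/ /\ x > y
           /\ x' = x - y
           /\ y' = y
        \/ /\ y > x
           /\ y' = y - x
           /\ x' = x
\* No step is enabled once x = y; at that point x = y = GCD(M, N).
====
```

Nothing in this description says whether the subtraction mutates a variable or is a recursive call with new arguments; that distinction lives below this level of abstraction.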

> We have lambda calculus

Lambda calculus _is_ a state machine (or, more precisely, it is naturally
described as a state machine).

> You don't need a state machine to prove QuickSort works in a purely
> mathematical way, either, just statements about the equivalence of
> functions.

What you're calling state and what TLA+ calls state is not the same thing.
Whichever way you prove Quicksort works _is_ the way you'd do it in TLA+ with
state. You just don't call it state. I would, however, be very much interested
to see how you prove Quicksort works using category theory. AFAIK, all
approaches to verify QS work the same way, using an inductive invariant,
namely an invariant on state.

> that seemed to support the idea that FP and TLA+ have strengths for
> different kinds of programs

I was talking about synchronous programming languages inspired by TLA+, and I
agree. I would rather write a compiler in a pure-functional language than in a
synchronous one, and an interactive program in a synchronous language than a
pure-functional one. But that is a separate discussion from that about
mathematical reasoning. I would agree that if one day there were an
FP-based verification tool aimed at engineers rather than researchers,
then for ergonomics/syntax reasons, it's possible that it would be more
convenient for reasoning about compilers. But there's nothing in the _theory_
of TLA+ that makes it less convenient.

I do, however, admit that there are tradeoffs between working in "pure math"
as in TLA+, or within a programming language. There are often patterns that
are specific to certain programming styles, and reasoning within a similar
formalism may reduce effort. In other words, sometimes you may want to reason
about a specific _expression_ of an algorithm rather than about the algorithm
in its pure abstract sense, and a formalism that shares the same expression
style may make it easier. But I don't know how much of this is purely
hypothetical. Most algorithms that are interesting or subtle, are
_essentially_ interesting or subtle, and verifying them in all formalisms is
very similar (and usually involves the same proof technique: inductive
invariants).

~~~
dgreensp
Thanks for bearing with me, and responding, even if it might be more efficient
for me to wait and read the next parts.

It seems like if the quicksort of a list L is defined as equal to the
concatenation of 1) the quicksort of the subsequence of L consisting of items
less than the pivot item, 2) (the list containing) the pivot item, and 3) the
quicksort of the subsequence of L consisting of items greater than the pivot,
then we can prove properties like "any two items of the resulting list are in
sorted order" by case analysis on whether each item is part of (1), (2), or
(3).
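That definition can be written down almost verbatim (a sketch, assuming the standard `Sequences` module, a sequence of numbers, and the TLA+2 `RECURSIVE`/`LAMBDA` constructs):

```tla
EXTENDS Sequences

RECURSIVE Quicksort(_)
Quicksort(L) ==
  IF Len(L) <= 1
  THEN L
  ELSE LET pivot == Head(L)
           rest  == Tail(L)
       IN  Quicksort(SelectSeq(rest, LAMBDA e : e < pivot))
             \o <<pivot>>
             \o Quicksort(SelectSeq(rest, LAMBDA e : e >= pivot))
```

The case analysis you describe then goes through on the three concatenated pieces.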

Maybe you would interpret "the quicksort of L is defined as..." to mean "if
you're in state quicksort(L), you can proceed to state the-concatenation-of"?
If we're working in the realm of mathematics, don't we get definitions,
equations, and inductive proofs for free, without making a state machine do
it?

Edit: Reading more, I guess I'm just doing the sorting inside the "next-state
relation" of a state machine, so it's done in one state transition, and saying
you could look at quicksort as one way to factor the relation, which by math
is equivalent to other ways to factor the relation that resemble other sorting
algorithms.

~~~
pron
> Reading more, I guess I'm just doing the sorting inside the "next-state
> relation" of a state machine, so it's done in one state transition

While you could describe all of quicksort as a function in TLA+ (taking place
instantaneously), that's not what you've done. You described a single state
transition, but one that doesn't capture all of quicksort, just one
partitioning step. To show that the result is sorted, you'd need to do an
induction over such concatenations; the thing you're doing the induction over
is called state.

------
storrgie
writeup aside, this is a really nice template... similar to tufte margins. I'd
love to see the source/style for the site...

~~~
pron
It's a combination of the tufte and monochrome Jekyll templates.

------
avodonosov
There are interesting pieces of information, but it's very long.

