
Modeling Message Queues in TLA+ - panic
https://www.hillelwayne.com/post/tla-messages/
======
sriram_sun
I'm reading Hillel's book right now and have just finished Chapter 2 (Intro to
PlusCal). I really like it. I've had some failed starts in the past with TLA+.
A few observations on the book:

a. The language is clear. I feel like I'm making progress at a fairly regular
pace. The examples so far could be replicated without any issues. I have both
my Kindle and cloud reader open when I read. The screenshots are better in the
cloud reader.

b. I believe it was a great decision to use PlusCal instead of TLA+ itself.
One of the issues coming up to speed in the past was grappling with TLA+.
PlusCal just makes it easier.

c. I hope that I can actually improve the quality of my C++ programs. I'm
currently working on some infrastructure software for embedded systems and
have implemented a Hierarchical State Machine "framework" in C++
([https://github.com/tinverse/tsm](https://github.com/tinverse/tsm)). My goal
is to add a pub/sub mechanism and create a HSM that can be distributed over a
network. Reviews welcome. Since I just came across this book, going forward,
I'll be attempting to develop my specs in PlusCal before actually writing
code!

d. I think other (to be written) promising books/blogs in this category would
be something in the flavor of "Specs to Code" for common design issues. With
its focus on concurrency and distributed systems, that implementation language
could very well be Rust. Any pointers on porting C++ code to Rust? I'm trying
to port above mentioned C++ code to Rust. Respond directly if you can. I don't
want to turn this into a discussion on Rust.

~~~
pron
> One of the issues coming up to speed in the past was grappling with TLA+.
> PlusCal just makes it easier.

I think this has both pros and cons. PlusCal, being essentially a formal
pseudo-code, is much more _familiar_ to programmers than TLA+, and familiarity
can greatly assist learning. The downside is that TLA+ is much simpler --
though far less familiar -- than PlusCal (or any programming or
programming-like formalism), because PlusCal shares the complexities of code
(although programmers are used to ignoring them). At some point, if you work
with PlusCal, you will have to learn and understand TLA+ and go through the
necessary process of "unlearning" code and thinking in simpler terms. Some may
find it easier to start with TLA+ and see why any code is more complicated than
you may think;
after that, learning PlusCal comes for free once you see the translation to
TLA+.

Personally, I find writing specifications in PlusCal less convenient and less
clear than writing them in TLA+, unless you must reason at the code level
(like in the case of low-level concurrent algorithms, where interleaving of
particular _instructions_ is crucial). PlusCal, however, has the advantage of
being more communicable to those unfamiliar with TLA+, as it reads like
pseudo-code.

> something in the flavor of "Specs to Code"

If a programmer can turn an informal specification (or even a set of
requirements) into code, then transforming formal ones, which are more precise,
shouldn't be an issue. But on a theoretical level, once you become more
comfortable and experienced with TLA+ you'll see that the notion of specs to
code is a special case of a very general and powerful concept called
refinement, which forms the very core of TLA -- the Temporal Logic of Actions
-- the logic at the heart of TLA+. In fact, in TLA, the ordinary logical
expression A ⇒ B, i.e. A implies B, means A _refines_ or _implements_ B (and
conversely, B _abstracts_ A).
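
One rough way to picture "A ⇒ B means A refines B" is as set inclusion: every behavior the concrete spec A allows must also be allowed by the abstract spec B. The sketch below is a finite Python toy, not TLA+ semantics (real behaviors are infinite sequences, and stuttering matters). The two counter specs and all helper names are made up for illustration.

```python
def behaviors(init, next_states, length):
    """All state sequences of the given length that a toy spec allows.

    init: set of initial states; next_states: state -> set of successor states.
    """
    runs = [(s,) for s in init]
    for _ in range(length - 1):
        runs = [run + (t,) for run in runs for t in next_states(run[-1])]
    return set(runs)

def abstract_next(x):
    # B (hypothetical): x never decreases and stays within 0..3
    return set(range(x, 4))

def concrete_next(x):
    # A (hypothetical): x counts up by exactly one and stops at 3
    return {min(x + 1, 3)}

A = behaviors({0}, concrete_next, 5)
B = behaviors({0}, abstract_next, 5)

print("A refines B:", A <= B)  # every behavior of A is a behavior of B
print("B refines A:", B <= A)  # B allows behaviors that A forbids
```

Here A is one particular way of satisfying B, which is the sense in which code can be seen as a refinement of its spec.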

~~~
pwnna
I find that learning raw TLA+ to the point where you're comfortable reading it
and internalizing what the spec is saying is a _critical_ skill for writing
specifications successfully. This is because PlusCal translates
directly to TLA+, which means that:

1\. To debug errors given to me by TLC, I generally find that it's easier to
look at the TLA+ spec directly.

2\. It is sometimes easier to see what your algorithm is doing directly in
TLA+. I find that I will write the algo in PlusCal, it gets translated, and I
can get a better understanding of how the system works by looking at the TLA+
spec. Example: the "process model" of PlusCal does hide some of the ideas.
Multiple processes are just P1 \/ P2 \/ P3 ...

3\. There are some oddities within PlusCal (label placement, the with
statements) that make much more sense if you understand the TLA+ foundation.

What I really wish is that I could write partly PlusCal and partly TLA+ for
maximum flexibility. However, it's kinda awkward to do within the current
framework.
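
The "P1 \/ P2 \/ P3" point in item 2 can be sketched outside TLA+: the system's next-state relation is simply the union of the steps each process could take. This is a Python toy under made-up assumptions (three hypothetical processes, each able to set its own flag once), not what the PlusCal translator emits.

```python
def make_process(i):
    """Hypothetical process i: it may flip its own flag from False to True once."""
    def step(state):
        if not state[i]:  # guard: the flag is still unset
            yield state[:i] + (True,) + state[i + 1:]
    return step

processes = [make_process(i) for i in range(3)]

def next_states(state):
    # Next == P1 \/ P2 \/ P3: a system step is a step of any one process
    return {t for proc in processes for t in proc(state)}

init = (False, False, False)
for successor in sorted(next_states(init)):
    print(successor)
```

There is no scheduler anywhere in this sketch; which process moves next is left open, which is the idea the process syntax tends to hide.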

~~~
pron
Right. In addition, you miss out on the notion of refinement (ordinary
implication in TLA+) which is the most powerful (and most foundational) idea
of the formalism. But I think many developers may be threatened by the
unfamiliarity of TLA+, and PlusCal serves as a gateway drug, as it's often
useful enough on its own (you're right that you need to know at least a bit of
TLA+ to understand some of the tooling's output). If "PlusCal first" makes
TLA+ less threatening and hopefully more approachable to many developers, then
the approach is doing its job.

(BTW, I wouldn't call it "raw TLA+" as "raw TLA" is a particular logic that
Lamport uses to introduce TLA, so the term may be confusing.)

------
mooneater
Can anyone comment on where exactly the line lies between too simple to need
formal specs and crazy to build without them?

What would be the simplest system for which you would recommend taking the
time to make a formal spec?

~~~
colanderman
I readily use TLA+ or PlusCal whenever I am reasoning about a system which may
(if buggy) experience deadlock, starvation, race conditions, etc. This
includes most systems involving two or more communicating processes. (The
exceptions are those systems, such as a simple pipeline, where communication
is trivial and acyclic.)

It's perfectly suited to this domain of problems, which also happen to be
those which are difficult for humans to reason about (owing to the sheer
number of possible interleavings of execution between two or more processes).
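
To put a rough number on that: if two processes take n and m atomic steps and never block each other (an assumption made here for simplicity), the number of distinct interleavings is the binomial coefficient C(n+m, n). A small Python sketch, with a made-up helper name:

```python
from math import comb

def interleavings(n, m):
    """Ways to interleave n steps of one process with m steps of another,
    keeping each process's own steps in order."""
    return comb(n + m, n)

for steps in (2, 5, 10, 20):
    print(f"{steps} steps each: {interleavings(steps, steps)} interleavings")
```

Even at 10 steps per process the count is already in the hundreds of thousands, and it keeps growing exponentially from there.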

And TLA+, and especially PlusCal, are simple enough that no system is really
"too small" to be modeled. E.g. a PlusCal model of a simple pipelined server
(say, reading stuff from the network and writing asynchronously to disk) is
maybe ¼ to ½ the size of the actual implementation, with half as much again
worth of invariants.

~~~
tigershark
Exactly what I was feeling in my gut... but then what is the overwhelming
advantage over Rust, which as far as I know is designed _specifically_ for this
kind of problem but is much easier to reason about, at least coming from
standard programming?

~~~
colanderman
The primary benefit of TLA+ over _any_ general-purpose language is that it –
as a modeling language – allows you to elide those parts of your system that
"don't matter". To continue my earlier server example, when modeling, I
probably don't care specifically which functions I'm using to read data from
the network, under what conditions they produce which errors, or how the OS
might decide to schedule them. I just care that such a function _exists_ ,
that it _may_ error out, and that it gets scheduled _somehow_.

Then, as a consequence of clearly expressing all the behaviors my system
_might_ express, TLC (the TLA+ model checker) is able to _exhaustively search_
these possibilities for bugs. This is at best intractable for any system where
you are unable to abstract out the things that "don't matter", like in Rust.
There are just too many variables (literally) in the search space.

That said, there's nothing _particularly_ special about TLA+, or especially
PlusCal. They just provide language features such as first-class sets and
clearly-defined atomic state transitions that make it very easy to describe a
system without too much irrelevant "stuff". The only particularly notable
feature of the modelling portion of the languages is nondeterminism, which is
necessary to express the boundaries of your model (e.g. nondeterminism is how
you model the "receive from network" function as "a function that either
returns some data or throws some error").
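
As a rough illustration of those two ideas together (nondeterminism at the model boundary, plus exhaustive search), here is a toy explicit-state checker in Python. It is not how TLC works internally. The retry limit, the state layout, and every name are assumptions made for the sketch.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry limit

def receive_outcomes():
    """The network is modeled only as: it returns data, or it errors out."""
    return ("data", "error")

def next_states(state):
    status, attempts = state
    if status != "trying":
        return  # "done" and "failed" are terminal
    for outcome in receive_outcomes():  # nondeterministic choice
        if outcome == "data":
            yield ("done", attempts + 1)
        elif attempts + 1 < MAX_ATTEMPTS:
            yield ("trying", attempts + 1)
        else:
            yield ("failed", attempts + 1)

def check(init, invariant):
    """Breadth-first search over all reachable states.

    Returns (violating_state_or_None, set_of_states_seen).
    """
    seen, queue = {init}, deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state, seen
        for t in next_states(state):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return None, seen

init = ("trying", 0)
bad, seen = check(init, lambda s: s[1] <= MAX_ATTEMPTS)
print("bounded retries violated by:", bad, "after", len(seen), "states")
bad, _ = check(init, lambda s: s[0] != "failed")
print("never-fails violated by:", bad)
```

Because the model says nothing about sockets, errno values, or scheduling, the whole reachable state space is a handful of states and can be searched completely.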

One could even imagine using Rust (to use your example) as a modeling
language. You'd need to add nondeterminism (or another means of specifying
contracts), and a way to specify temporal invariants to check. And also
annotate which functions act as mutexes and which may block waiting for I/O.
And you'd probably want to stub out container types so the model checker
doesn't have to model their implementations. But once you've done all that,
your mutant Rust is starting to look a heck of a lot like PlusCal, only more
complicated.

------
gklitt
For those who have used modeling tools like TLA+ and Alloy in industry --
could you give an example of at what point(s) in the development process you
used the tool and how long it took?

I'm intrigued, but having trouble visualizing how even lightweight formal
methods can fit into an agile development process at a startup building a web
application.

~~~
marco_salvatori
You might create a TLA model during the initial specification process to
assist you in thinking through your problem. You would then update the model
going forward as your understanding of the domain improved and as system
requirements changed. However, you could write a specification for an existing
system as well. Any processing where the complexity seems greater than one's
reasoning ability or that needs assurance guarantees, could signal the need
for a high level description to guide engineering efforts.

An initial model might take 4 hours to put in place. The time would be spent
thinking through how best to abstract the modeled process and its logical
properties, slowly building up a more and more complete set of events, and
checking the model by running it as one goes. With an initial model,
additional hours would probably be spent here and there adding enhancements
and finding ways to do things better, just like one would do with regular
code. The model would effectively exist over the lifetime of a corresponding
code artifact and guide work on the artifact.

For a standard web application that mostly does reads and writes to a
database, there usually wouldn't be any need for a tool like TLA to help you
reason. A general use case for TLA is describing systems where there are
multiple processes or threads coordinating on some sort of shared state. There
would be indeterminacy in the order in which events happen. There could be the
possibility of failure and the need to handle it gracefully.
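
A minimal sketch of that kind of ordering problem, in Python rather than TLA+: two hypothetical workers each read a shared counter and then write back the value plus one. Enumerating every interleaving shows that some orderings lose an update. The worker names and the step encoding are made up for the example.

```python
from itertools import permutations

STEPS = [("a", "read"), ("a", "write"), ("b", "read"), ("b", "write")]

def run(schedule):
    """Execute one interleaving of the two read-then-write workers."""
    counter = 0
    local = {}  # each worker's privately read value
    for worker, op in schedule:
        if op == "read":
            local[worker] = counter
        else:  # "write"
            counter = local[worker] + 1
    return counter

def schedules():
    """All orderings in which each worker reads before it writes."""
    for order in permutations(STEPS):
        if (order.index(("a", "read")) < order.index(("a", "write"))
                and order.index(("b", "read")) < order.index(("b", "write"))):
            yield order

results = {run(s) for s in schedules()}
print("possible final counter values:", sorted(results))
```

Any final value below 2 means one worker overwrote the other's increment, which is the sort of outcome a model checker is meant to surface.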

~~~
fulafel
Web applications can be surprisingly complicated distributed systems, when you
take into account:

\- the JS frontend that communicates with AJAX and/or WebSockets IPC with the
backend

\- the backend may actually be a bunch of microservices

\- the fact that there are many concurrent frontends (users)

\- messages from frontends may end up reordered/dropped/rerouted on their way
to different backend ipc channels (microservices or replicated backend)

\- the frontends are untrusted (the user or browser may tamper with your code
running there)

\- the backend server is also often replicated / sharded

\- there are caches (think Redis) that the backends share in addition to the
main DB.

\- load balancers, high-availability reverse proxies with various failover
properties, etc

\- various caching behaviours that are only partly under your control
(browser-side content caching, DNS, etc)

\- all of the above are interconnected over unreliable channels

\- there are secrets and security involved so functional assurance ("pressing
button X always makes function Y happen") doesn't tell you enough

(I have no TLA+ experience so can't say how one would use it with a web app,
though).

------
sigmonsays
I've encountered TLA before and am a complete newb. That being said, I've
wanted to use it for modeling distributed systems but felt the learning curve
was too steep. Now many years later, I think I'm finally going to take another
go.

Would love to know where the TLA community hides. I don't see much real info on
it in the wild.

~~~
BoiledCabbage
Haven't used it, but this site looks like a great resource and is in my queue
to go through.

[https://learntla.com/introduction/](https://learntla.com/introduction/)

~~~
Jtsummers
I was working on gathering my links and that was one I was going to suggest.
It's a very approachable intro, and also written by Hillel. I've added his
(now published) book to my queue to purchase and read through. I may do that
sooner rather than later so I can provide feedback here on other discussions
regarding it, beyond saying that it exists.

The talk linked at the top of that page is pretty good as well as a tutorial,
though hardly complete. As I recall he presents his examples well and has good
motivating examples for using TLA+ in it.

------
shusson
For people who have experience with TLA+ do you generate implementation code
from the TLA+? Or is the idea to only use it for some formal design
specifications?

~~~
aeneasmackenzie
TLA+ is only for specification. In one of the papers on it Lamport mentions
just putting a copy of the specification in a comment above the
implementation. If you get a bug you just need to find where your
implementation differs from your specification which he finds is usually
pretty easy.

If your code doesn't have a specification (of course usually less precise),
fixing bugs is incoherent. How can you say some behavior is a bug?

~~~
shusson
Yeah but why not generate code from your specification? Or create a model of
the specification that you can interrogate in code?

~~~
aeneasmackenzie
You would need to include irrelevant details. TLA+ is an actual declarative
language, so it's not executable, but it can be model-checked.

------
technion
While I'm reading this, I feel more and more like asking: Am I the only one
that finds TLA+ much easier to read and write than PlusCal? It's an odd thing
to say, with PlusCal positioned as the easier option.

~~~
sriram_sun
I didn't know about the PlusCal option until I saw Hillel's book and it
clicked with me instantly. Might be the damage done by programming.

Objectively, PlusCal is a higher level language. So it _should_ be easier than
TLA+ right? More specifically, look at
[https://pastebin.com/cwZaApmH](https://pastebin.com/cwZaApmH). The top half
is PlusCal and bottom half (after \\* BEGIN TRANSLATION) is the translated
TLA+ code. PlusCal reads more like pseudocode. Maybe with some more
training/effort the TLA+ would be obvious as well.

~~~
hwayne
One thing I want to caution is that that's the translation of PlusCal to "raw
TLA+". If you were not using PlusCal, you might write it a different way. For
example, you wouldn't need a `pc` variable because you could express sending
and receiving as short guarded actions.
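
A rough Python analogue of what "short guarded actions" could look like for a bounded queue. This is neither TLA+ nor the spec from the article; the bound and all names are assumptions. Each action is a guard plus an effect, and no program counter is tracked.

```python
MAX_QUEUE = 2  # hypothetical bound on queue length

def send(queue, msg):
    """Guard: the queue has room. Effect: append msg."""
    if len(queue) < MAX_QUEUE:
        yield queue + (msg,)

def receive(queue):
    """Guard: the queue is non-empty. Effect: drop the head."""
    if queue:
        yield queue[1:]

def next_states(queue):
    # Next == Send \/ Receive
    return set(send(queue, "msg")) | set(receive(queue))

for q in [(), ("msg",), ("msg", "msg")]:
    print(q, "->", sorted(next_states(q)))
```

An action whose guard is false simply contributes no successor states, which is what replaces the explicit control flow that `pc` encodes in the translated spec.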

------
anentropic
This seems as good a time as any for my mild rant about the formal modelling
toolchain...

A few things recently made me want to try out these techniques at work.
Basically some issues we'd been facing, combined with reading Hillel Wayne's
"Augmenting Agile With Formal Methods" blog post.

So I read a little bit (I have no prior experience with any of this stuff) and
decided to try out TLA+

First downer is I seem to have to download a 500MB Eclipse-based dedicated IDE
to be able to follow any of the tutorials. Seems like a bad omen.

Next I have the inevitable problem with wrong JVM runtime on my system... as
ever, the "write once, run anywhere" promise pushes these issues down to the
end-user rather than having the author compile working binaries for common
platforms.
[https://github.com/tlaplus/tlaplus/issues/194](https://github.com/tlaplus/tlaplus/issues/194)
No matter, solved easily enough.

Ok so I'm stuck with a clunky unfamiliar IDE, but that's ok for now because to
start out I will just be copy and pasting code from Hillel's beginner tutorial
[https://learntla.com/introduction/example/](https://learntla.com/introduction/example/)

Within minutes I have put the IDE in an unusable state:
[https://github.com/tlaplus/tlaplus/issues/195](https://github.com/tlaplus/tlaplus/issues/195)

At this point I'm thinking - these are just text files like any other
language, why can't I edit them in my usual IDE and then run the model checker
from the command line?

This is technically possible. But this scenario is not well documented. You
can download the tools from here
[https://lamport.azurewebsites.net/tla/tools.html](https://lamport.azurewebsites.net/tla/tools.html)
The tools have no --help output explaining what args they need.

They are not stand-alone command line tools either. They're just the source
code, which you run directly against java.

There is a .jar file available for download which might be some attempt at a
packaged version of the tools, but who knows how to run it... Googling for
instructions turns up this thread from 4 years ago, someone asking exactly
this question, which goes unanswered
[https://groups.google.com/forum/#!topic/tlaplus/R9amI13F_-M](https://groups.google.com/forum/#!topic/tlaplus/R9amI13F_-M)
...the thread ends with a dismissive post from Leslie Lamport ridiculing such
stupid ideas as having some documentation published on a website, and
recommends everyone use the Toolbox IDE instead of trying to run the cli tools
themselves.

At this point it's quite clear to me why more people don't use TLA+, and it
has nothing to do with the weird syntax or unfamiliarity with temporal logic.

Since I still wanted to experiment with modelling I switched to Alloy. This
also requires you install a dedicated (and rudimentary) IDE. At least this one
actually works and I was able to experiment with some basic models of my own
design.

I could not see any stand-alone cli tools for Alloy but they do provide some
basic instructions for using it programmatically in your own Java project
[http://alloytools.org/documentation/alloy-api-examples.html](http://alloytools.org/documentation/alloy-api-examples.html)
(I'd prefer something accessible from Python, like a C library, but that's ok)

I do understand that visualisation (Alloy) and presentation of complex results
(TLA+) are an important part of the development process, and this is
presumably what motivated both to provide their own IDE.

But clearly where the expertise on these teams lies is in the computer
science, logic, model checker areas. Tying all of this brilliant and clever
stuff to a cumbersome GUI app which is not your core proficiency to deliver
seems like a very bad idea. A dose of the 'unix philosophy' would be beneficial
I think.

Think from the user perspective - I'd much rather use my usual IDE, which has
great editing tools, a nice colour scheme, doesn't crash or become unusable,
and is totally familiar and comfortable.

Ok, it still needs a front-end for model-checking results... but imagine if
the checker tool was well documented and provided structured output in a
common format (json, graphviz .dot, whatever)... if these tools really are
useful then I'm sure pretty soon you will have multiple front-ends created by
people who are good at that.

It seems sad to me that a small and brilliant community are wasting so much
effort on making bad software, instead of making great tools which other
people can build great software with.

~~~
anentropic
I'm not the only one it seems
[https://lobste.rs/s/8rfign/modeling_message_queues_tla](https://lobste.rs/s/8rfign/modeling_message_queues_tla)

That said, I bought Hillel's book and will come back and try again with TLA+

------
monkpit
Hopefully you already know what TLA+ even is, because the article won’t tell
you.

~~~
Jtsummers
Does _every_ article need to be an introductory article to a topic including a
summary of what the thing is? Should every Python article start with:

    
    
      Python is a language developed by Guido van Rossum and
      named after the comedy troupe.
    

?

Hillel even gives you an out early in the article:

    
    
      This post assumes knowledge of TLA+
    

Followed by links to two earlier articles which offer better intros to TLA+
itself.

