I also proved that one of the nasty deadlock bugs I was fighting on Linux (but not on Windows or Mac) was actually glibc missing condition variable wakeup signals from time to time.
I'm 100% convinced it's worth learning to debug or verify any kind of distributed or multithreaded synchronization primitive. I was bug hunting for a couple of weeks before deciding to learn formal verification / model checking via TLA+.
Might you or anyone else have some resources you could recommend for learning TLA+?
Has glibc since fixed the issue?
I unfortunately didn't have time to go through:
- compiling glibc from source
- converting my code to C
- trying to reduce multithreaded synchronization code to a minimal example given how non-deterministic any of those bugs are
- doing a write-up
That said, it seems like people went through the motions 4 months after me in this bug:
- https://sourceware.org/bugzilla/show_bug.cgi?id=25847 which affects C#, OCaml, Python
Someone also fully reimplemented the glibc condition variable in TLA+ to prove that the glibc code was buggy:
- https://probablydance.com/2020/10/31/using-tla-in-the-real-w... (at the bottom)
I'd normally respond with (when looking at the very complex relevant glibc code), "complexity is a code smell", but I don't understand the problem space enough to legitimately suggest that here, other than to say that in my experience programming thus far, if you require this much complexity for any part of your system, it is in all likelihood reflective of a design flaw (possibly elsewhere).
It's really only douchy or elitist if it's strictly used in an insulting or disparaging way. Most of the time when people talk about code smells, they mean it just like a real-world odd, bad, or unexpected smell: something doesn't seem right. I could be wrong, but it warrants investigation.
Using the examples above, deep nesting and large numbers of parameters are indications of a potential problem, but are not necessarily problems. It could be that a refactoring is in order, but it may also be that everything is fine (just not pretty).
Most code "smells" are in the former category, they are not describing the code as "repulsive" but "off". If you're making that leap then you're going to need to supply (or find) another term for what is described, for now over 20 years, as a code smell. Since most people don't have the same visceral reaction you do, they won't care.
For example, it is quite possible that code that is nested 30 levels deep is provably correct. Since we are humans, is that likely? No, it is not. Or at least, it is certainly less likely than 3-level-deep code. So you look at some code and without doing any deep analysis you can make some "probably valid" assumptions which come from your experience.
If you're so bent on accurate engineering, answer me this: Why are technology choices often "fashion-driven", without any significant empirical data to support their choice over other alternatives? (My latest pet peeve in this space is widespread use of Docker.) And where data DOES exist (such as to support the statement "Functional languages produce far fewer bugs per LOC written"), why don't more programmers/engineers heed that? Is it perhaps because... you're not as "strictly engineer" as you think?
I thought I had bugs in my code for sure, I couldn't believe glibc and musl had a bug.
Then I decided to spend a couple of days learning formal verification, and fixed all the bugs it found.
It worked on my Mac, it worked on Windows, it was still deadlocking on Linux.
I replaced my code with raw futexes, no deadlock anymore. And I added logs that showed signaling happened but the thread signaled wasn't unblocked.
The futex<->condvar swap is pretty simple.
Any use of condition variables where the lock is "never used" screams fundamentally broken to me. No amount of atomics and memory orderings or manually inserted fences/memory barriers can help you if you leave a race window open between checking your predicate and starting to wait on a signal. Condvar signals are not sticky. If nobody is waiting for a signal when you signal, that signal is lost. There is nothing between your atomics and lock & wait to prevent such loss.
Also your TLA+ (or rather PlusCal) does not model a condition variable correctly, so it would obviously not detect this issue. You have a sticky boolean variable and await, which blocks the procedure until the condition becomes true. This detects lockups of the class where the condition is never signalled at all, but it does not model the possibility of losing a signal because nobody was waiting for it.
I don't believe you discovered a bug in glibc and musl.
1. Given you have to learn the language, and, at first, you don't know it - how do you even trust that you're writing a proof right, and that you're not simply writing gibberish? TDD has code execution as an "arbiter of peace": if, when calling a function, you get an unexpected result, then... you did something wrong.
If my TLA+ set of rules holds, how do I get confidence that I wrote the right set of rules?
2. Once I've got a set of rules that I've decided to trust, when I implement it and I get bugs (as demonstrated by executable code), how do I decide whether the most plausible cause is a bug in my code, or a bug in the spec?
Maybe it becomes obvious once you've done it, and I'm sorry to be the millionth to say such nonsense. Sorry.
The last TLA+ specification I wrote was only about two pages long. The actual model was a handful of lines. A handful of definitions. And then a page and a half of proofs and proof proofs.
So it's easy to look over your spec exhaustively. And if there's any part of it you aren't sure about, or think you might have written wrong, then write more proofs that show you can't have written it wrong in that way.
The IDE provides lots of feedback and ways to debug. So like learning any language, you'll spend your formative time just trying out different parts of the TLA+ semantics and introspecting the outputs to make sure the models are behaving the way you expect them to.
TLA+ is very much like doing math. If you do it wrong, you get answers that don't make sense. And it's easy to check if you're doing it wrong, by doing something you _know_ is wrong. If you get a "right" answer, you know you are making a mistake or don't understand a semantic correctly. The joy is that TLA+ allows you to write out these proof proofs and have them checked every time along with the rest of your spec.
> how do I decide if the most plausible cause is a bug in my code, or a bug in the spec?
Your implementation and your TLA+ model share the same state data structures (though the TLA+ model might have a simplified representation of those states, eliding unimportant details). If you suspect your spec, you just copy the broken state sequence your implementation is encountering into your TLA+ model. If the model displays the same behavior you know your spec is wrong. If it doesn't, you know your implementation is wrong.
I write TLA+ specs in the same way I do TDD: first write an invariant or property I know should hold, then make it fail, then modify the spec to pass the invariant.
> If my TLA+ set of rules holds, how do I get confidence that I wrote the right set of rules?
Ultimately, nothing can guarantee this for us, just as nothing can guarantee we wrote the right code. This is one reason why I'm a strong advocate of spec reviews. Getting another pair of eyes on the spec goes a long way.
> 2. Once I've got a set of rules that I decided to trust, when I implement it and I get bugs (as demonstrated by executable code), how do I decide if the most plausible cause is a bug in my code, or a bug in the spec?
It depends. This has happened to me a couple of times before. In one case, it was because I forgot to include something in the spec. Adding that caused the spec to immediately fail, which corresponded to the executable bug. In another case, I was able to identify, assuming the spec was correct, where the spec-code mismatch must have been to cause the bug. So I was able to identify the code issue much faster than without the spec.
It helps that specs are usually much smaller than the corresponding codebase, so you can sweep it for issues much faster.
As an example of the latter, I used TLA+ once (only) for work to great success. But the spec I wrote ignored nearly every detail about the larger system, and focused on a few hardware details and modeled the shared memory system as a simple sequence of, rather arbitrarily, 5-11 elements (fixed in the models, in the spec it was a variable). What was stored in the memory? It didn't matter at all, it could literally be anything as far as the model and spec were concerned.
My point being, the spec was so simple that visual inspection and a bit of analysis could tell us that it was correct, it fit in a couple pages if printed out. Model checking let us verify that it had the desired behaviors (with respect to the invariants that I put in based on the official specs for the hardware). Later, weakening the invariants allowed me to "recreate" the race condition that existed in the hardware implementation and was causing all our problems.
I've watched all of Lamport's lectures but it really leaves me wondering what's happening under the hood: what are the primitives and how are TLA+ models mapped onto them.
(Lamport likes to say that TLA+ is just mathematics and imply that the engine underneath is not relevant but c'mon.)
I've written a series of posts that covers the TLA+ theory (but it isn't a tutorial): https://pron.github.io/tlaplus
Sounds like the first thing TLAPS does is to translate the TLA+ program into SMT and try to prove it with Z3. If that times out then it translates it into a couple of other kinds of provers before giving up (...).
Maybe I should look at how TLA+ "hello world" is compiled into SMT as a first step.
(The most popular model checker for TLA+, TLC, is a brute forcer that represents the state space as a directed graph. Lamport talks about it a bit in Specifying Systems.)
No regrets. I recommend watching the video series introducing TLA+ by Lamport, what a wonder.
So I entered the key parts of the algorithm in to TLA+. It found all of the races that I'd found in the original algorithm; and then didn't find any issues in my fixed algorithm. That gave me at least some confidence that I'd gotten things mostly right.
That said, it's really quite a wonky tool in a number of ways. The almost-C dialect which doesn't have local variables is quite a footgun. And it's quite tightly bound to their particular UI tool, which makes keeping the models in git possible but somewhat quirky.
What I'd honestly like more is to be able to make some files which could be compiled either in the real software project, or as part of a formal verification testing step. That way you could verify the properties as part of your CI loop (or perhaps part of more extended testing done elsewhere, if that's too expensive), and also have more confidence that the implementation matched your model.
I couldn't find a way of tracking the TLA+ Toolbox's files nicely in git, but with a bit of manual work it's possible, using the `tlc-bin` CLI. I used the TLA+ Toolbox to write the spec, but wrote the models by hand (eg. ).
This was before I heard of the VS Code extension. I think it's quite good now, so there might be a better workflow that avoids the TLA+ Toolbox altogether.
> What I'd honestly like more is to be able to make some files which could be compiled either in the real software project, or as part of a formal verification testing step.
Do you mean you'd like to generate a TLA+ spec from your source code?
I'm not sure what you're calling "spec" vs "model"; I wrote stuff in PlusCal, which I checked into git; then used `make` to translate that into .tla automatically as needed. The .tla files are .gitignored, so don't choke up your commit history with machine-generated stuff.
Here's a snippet of my Makefile:
    %.tla: %.pcal
        cp $< $@
        java -cp $(TLATOOLS_JAR) pcal.trans -nocfg $@
        rm -f $*.old

    SOURCETARGETS += XSA287.tla
    TESTTARGETS += XSA287-A.out

    XSA287-A.out: XSA287-A.cfg XSA287.tla
        java -cp $(TLATOOLS_JAR) -XX:+UseParallelGC tlc2.TLC -config $< -workers 8 XSA287.tla | tee $@
You can see how I'm working against the way TLA wants to work: It wants to take your PlusCal code at the beginning of the file and rewrite the end of the file, "helpfully" backing up the old file with a ".old" extension; I want to take plain PlusCal code and generate an intermediate file (like an object file) which is not checked in. So I have to copy $FOO.pcal to $FOO.tla, run `pcal.trans` on $FOO.tla, and then delete the useless "$FOO.tla.old" file.
I always meant to publish these after XSA-299 went public, but haven't gotten around to it.
> Do you mean you'd like to generate a TLA+ spec from your source code?
Well obviously the whole project is so large as to be completely prohibitive. The model is written in the C-like variant of PlusCal, and uses a lot of the same variable names and such to help make sure they match the real C functions; and then translated into TLA. It would be nice if I could designate just one file (written in C) whose functions would be translated into TLA. The properties you want to verify about those functions would obviously need to be written in TLA directly.
This is from memory, but the TLA+ Toolbox distinguishes between your "specification of your algorithm" and a (bounded) "model of your specification".
So, you might have some spec for a sorting function that works for arrays of natural numbers of arbitrary length. And your model would limit it to arrays of length 5, and "define" the natural numbers to be `range(0,10)` -- ie. you are checking a finite "model".
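As a hypothetical illustration of that split (the names here are made up, not from the thread), the bounded "model" can live in a TLC config file that pins the spec's constants to finite values:

```
\* Sort.cfg -- a finite model of an unbounded sorting spec
SPECIFICATION Spec
INVARIANT Sorted
CONSTANTS
  MaxLen = 5                      \* arrays of length at most 5
  Values = {0, 1, 2, 3, 4, 5}     \* finite stand-in for the naturals
```

The spec itself stays fully general; only the config bounds the state space so TLC can exhaust it.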
I think this is consistent with the language used in the wiki article (the OP).
> You can see how I'm working against the way TLA wants to work
I'd much prefer your suggested workflow. I don't recall that you can write plain old pcal
It's amusing that you want some TLA CLI tools that work kind of how the LaTex toolkit works, another of Leslie Lamport's contributions...
> I always meant to publish these after XSA-299 went public, but haven't gotten around to it.
> It would be nice if I could designate just one file (written in C) whose functions would be translated into TLA. The properties you want to verify about those functions would obviously need to be written in TLA directly.
That seems like a really tractable problem to me, if the C files were written with this in mind. I've actually been thinking about this since giving an internal talk on my TLA+ project last week!
That's right, I'm remembering now. So yeah, the "models" are the ".cfg" files (which, as you can see from the Makefile snippet, are actually specified with the "-config" argument).
The .pcal files are actually just "normal" TLA files with the generated content replaced with:

    \* BEGIN TRANSLATION
    \* END TRANSLATION
> It's amusing that you want some TLA CLI tools that work kind of how the LaTex toolkit works, another of Leslie Lamport's contributions...
I've got a number of tools I've written for my own personal use over the years, and TLA+ has "one person spent a few decades being both sole author and user" fingerprints all over it. :-)
> Please do!
Tomorrow I'll look through things and if it's suitable I'll post my git repo somewhere.
A service team I worked for had to implement a back-end process that had zero tolerance for mistakes. The engineers involved had a principal-engineer peer-reviewed process worked out, and were encouraged to model it in TLA+ as part of that review process, given what was at stake. Two engineers learned TLA+ and PlusCal over the space of about a week, then outlined the process over the next week or so, and kept finding minor, improbable correctness bugs.
Once fixed, they found translating the code into actual Java to be very straightforward: "It's fill in the blanks. The whole logical structure is there already".
In the end, the feature landed well ahead of planned time, even accounting for the time taken to learn and prove correctness, and was able to be launched with little fanfare and high degrees of confidence.
I know at least one of those engineers has gone on to use it very regularly. It gives them a good chance to model the logic first, solving the actual problem at hand, and then start thinking about "the boring stuff" like database connections, unit tests etc.
I'm guessing this is the video series for anyone looking for a link https://lamport.azurewebsites.net/video/videos.html
Is this a manual (and therefore, subject to human error) process, or do tools exist for automating it?
There was a Racket language that actually synthesized programs from specs (e.g. go from the spec of a sorting algo to its implementation automatically), but I forget what it's called. Fun to play around with.
(Also important is that the systems in this space are different. What something like Coq is good for is completely different from what TLA+ is good for. Another idea in the same vein is property-based testing. Here the concrete code is tested by invoking random test examples, so you are testing the actual code/implementation. But this also means you can't use the typical tricks of model checkers, which can regard whole classes of inputs as the "same" in a test.)
But as far as I understood it, if you know TLA+ and the target language well, it's like writing an article from a good outline.
I am a self-taught dev, and learning TLA+ gave me the feel of "real" computer science.
Learning to think of programs exclusively as state machines gives a kind of intellectual clarity I didn't know existed.
It is also the only thing that actually improved my dev productivity by magnitudes.
My understanding is that TLA+ is for concurrent applications, but with a broad definition of "concurrent." For example, it could be...
* multiple hosts messaging each other
* multiple threads in one process sharing a data structure
* multiple API calls that all use the same Postgres DB
In other words, any sort of system where different actors can step on each others' toes.
Non-determinism might be a better thing to anchor the value to. For example in a deterministic "normal" programming environment, an if/then/else statement will execute only one out of two possible code paths. You have to run the code with a complementary condition to observe the other code path.
In a non-deterministic environment like TLA+ both possible code paths are executed. You can observe state transitions in both possibilities.
In a deterministic context the combinatorics of code paths can get out of hand quickly. Non-deterministically you have one set of assertions for all code paths, or patterns of paths.
The non-deterministic approach is much less overhead, so it's good to use bounded models (with the TLC model checker) to become confident in the spec, then construct proofs if warranted.
It seems a bit overkill for the kinds of code I write, which represents a design flexibility that makes wet spaghetti look like a load-bearing truss.
IMO the most "practical" application for those methods is in strengthening type systems to allow more expressive constraints - one small example that comes to mind is Idris allowing you to explicitly mark certain functions as total and fail type-checking if they don't return a value for every possible input.
The main thing to keep in mind with TLA+ is that it's not really meant for validating code, it's more for showing that your design is consistent and satisfies safety requirements (e.g. "no temporary node failure should result in data loss"). However, having a TLA+ specification usually makes the implementation fairly straightforward, especially in languages/environments with message-passing concurrency.
You can use TLAPS to write proofs-by-induction similar to yours for TLA+ specifications, but IMO the real power is in model-checking them with TLC, which gives you a lot of the advantages of formal methods with much less proof work.
Everything starts with two logic courses: EECS/MATH 1028, which covers basics like induction and the pigeonhole principle, and MATH 1090, which covers first principles and the axioms of logic.
Then, building on the logic courses we did by hand, we have a system verification course (EECS 3342) using tools like Rodin, whose SMT solvers can discharge proofs automatically up to the point where we need to input them manually. Beyond this, we also cover proving preconditions and postconditions through design by contract in EECS 3311 - Software Design (usually taught in Eiffel, but finally being taught in Java after a year of lobbying the faculty).
Then our final courses are EECS 4312 and 4315 for requirements engineering and mission critical systems respectively. In 4312 we learn TLA+ and 4315 is focussed on tree and path logic for states. As well, we have EECS 4313 advanced software testing which is primarily about writing bug reports, creating tests (JUnit) and touches on distributed systems tests a bit.
The point is that we learn the "practical" side (quite impractical, really, since our profs are not in industry) only once we've had a lot of exposure to proving things by hand. But even once we've learned everything, they fail to connect the material to realistic SDLCs and how to fit the tools into an agile methodology.
Viewing college as a trade school is a mistake many of us make. No, you're almost certainly never going to be manually proving things by induction in the real world. But that doesn't make the exercise valueless.