I primarily code in Elixir and Erlang, and I've used property testing to do some level of model checking when working with state machines. I suspect TLA+ could be helpful for formalizing that a bit, but it isn't a theory I've ever put into practice.
Slides 4 and 28 list actual bugs found by such modelling.
I do find TLA+ (or formal modelling in general) pretty useful for real-world cases; it's just that, unfortunately, I don't always have the time.
More often I run into folks using SCADE's model checker. Tied for a distant 2nd, I'd say I've run into more folks who work with NuSMV, CADP, and mCRL2 than TLA+. This could also be mostly sample bias, and it's worth noting that the folks who use any of these things are in the very extreme minority of architects & developers that I speak to in automotive, med. tech., industrial automation, etc.
Though, like I said, we use TLA+. Not for any particularly good reason other than that it was a sufficient tool among the variety of comparable options for what we needed to do, and I expect that if we had to go back and do it all over again we'd have selected something else. The specification language has some annoying warts, the checking performance isn't comparatively very good, and the provided IDE works in almost exactly the opposite way any of us wish it would -- to the point that one of our engineers figured out how to wire it all up through Emacs so we could stop monkeying around with the TLA Toolbox for 90% of the work. Of course, everything has problems, so TLA+ isn't notably different here. We've just used it in anger more than other things.
For whatever it's worth, I also used to be an Erlang developer and used to play around a bit with McErlang, which I think might be a dead project now. However, there's an active project called Concuerror (https://concuerror.com/) which covers roughly the same domain. I don't know how well it works or what its level of maturity is, but it is very interesting in one very important regard: it's model checking against your actual program implementation. That's an enormous and fantastic difference. Conformance of the concrete implementation to your designed (and checked) model is a huge problem, and tools that can close or remove that gap can be extremely valuable. What I don't know is the level of sophistication of Concuerror, nor whether it can check for the kinds of properties you care about in your systems.
I don't think many of those who use TLA+ use refinement between two machine formulas, but I think that's partly because they still consider TLA+ a frontend language for TLC, which is unfortunate, because if all you want (or, rather, think you want) is a model checker, there are probably better ones. In fact, the scant use of refinement is just a symptom of this bigger issue. I don't think refinement is always useful, but I do think that everyone who uses TLA+ should use its power for analysis rather than just mundane model checking. I admit that TLC is so enticing that, once you use it, TLA+ can feel secondary to it. Of course, I'd be happy if people used model checkers more, whether TLC, Spin, NuSMV, Simulink or others, but I'd be even happier if people used TLA+ for its primary purpose, perhaps after being enticed by TLC: thinking about system design. I guess the problem is exactly what Lamport pointed out: programmers (including myself) don't like to think, especially when we can use automation. To be honest, I don't think I would have come to TLA+ if it weren't for TLC. So, in a sense, TLC both pulls people to TLA+ and obscures its main purpose.
In the TLA+ workshop I imagine one day giving to my colleagues, I start with refinement. I would also start without TLC at all; just showing how TLA+ helps you think -- its main purpose and advantage. When writing specifications, I now always start with refinement, first writing how the system would behave at a very high-level, and only then refining a specific algorithm.
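To make the refinement idea concrete, here's a toy sketch in Python rather than TLA+ (all names are made up): a high-level spec says a task goes from "pending" to "done" and stays done; a lower-level "implementation" counts retries and is done after three attempts; a refinement mapping abstracts the counter, and we check that every low-level step maps to a legal high-level step (stuttering allowed).

```python
# High-level spec: a task is pending or done, and once done stays done.
# Steps include stuttering (state unchanged).
HIGH_STEPS = {("pending", "pending"), ("pending", "done"), ("done", "done")}

# Low-level "implementation": a retry counter; done after 3 attempts.
def low_next(n):
    return n if n >= 3 else n + 1

def abstract(n):
    """The refinement mapping: view a low-level state as a high-level one."""
    return "done" if n >= 3 else "pending"

def refines():
    """Check that every low-level step, seen through the mapping,
    is a legal high-level step."""
    for n in range(0, 4):
        step = (abstract(n), abstract(low_next(n)))
        if step not in HIGH_STEPS:
            return False
    return True

print(refines())  # True: the counter machine refines the pending/done spec
```

In real TLA+ you'd express this by substituting the mapping into the high-level spec and checking implication, but the shape of the obligation is the same.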
(If anyone is reading this without having read the article and doesn't know what TLC is: it's a model checker for TLA+ that's included in the TLA+ Toolbox.)
If your tool -- any tool -- links directly to the implementation, we know that the verified implementation will be very limited in size. It is TLA+'s ability to choose arbitrary levels of detail that gives it the power to help design large systems. Such designs, however, cannot be formally (i.e. mechanically) linked to the implementation. If they could, we'd know how to fully verify large systems, but we don't.
There must be at least 2 orders of magnitude more Simulink users doing model-based design & development than there are people who use TLA+ for pretty much anything. Simulink has limitations, but being comparatively niche relative to something like TLA+ isn't one of them.
As an aside, not only does TLA+ not make an effort to provide a means of mechanically linking one's model to one's program, but when one of its progenitors was asked how a developer might do this, the asker was literally scoffed at for the suggestion.
You're not describing anything I'm not already aware of insofar as TLA+ is concerned. Having also used several other similar or adjacent tools, and being in constant contact with folks who use them for avionics systems, medical devices, large-scale industrial automation, etc., I'm still inclined to say that TLA+ is fine. It's unnecessarily annoying in a bunch of dimensions. I'm skeptical that there'd be as much intrigue & interest around TLA+ if it weren't the work of one of the godfathers of distributed systems and weren't associated with AWS in the software-developer zeitgeist.
Also, I begrudge having had to come to the defense of SCADE and Simulink. I spend a non-trivial amount of my week essentially arguing against them in some respects, but they deserve their due.
With TLA+ you can also directly use the output of your efforts. But if you're talking about compilation to some executable, that, unfortunately, requires a certain level of detail above which you cannot rise due to computational complexity concerns. So if you want that, you'd have to use TLA+ at the same level as other tools that emit code.
> There must be at least 2 orders of magnitude more Simulink users doing model-based design & development than there are people who use TLA+ for pretty much anything. Simulink has limitations, but being comparatively niche relative to something like TLA+ isn't one of them.
Oh, of course! Those are excellent tools, but still niche (they target mostly embedded and/or real-time). You cannot use them to design and verify a 2MLOC system. I wasn't talking about popularity, but about target domains.
> As an aside, not only does TLA+ not make an effort to provide a means of mechanically linking one's model to one's program, but when one of its progenitors was asked how a developer might do this, the asker was literally scoffed at for the suggestion.
First of all, as I said, it does not make a particular effort, because it's not necessary. People have used TLA+ to verify actual C, Java and Go code (that I know of). The problem is that once you do that you are bound by the scaling limitation of all tools that operate at that level. TLA+ is relatively special in that it can rise beyond that limitation, and that is where most of its users find value. Using TLA+ at the code level is just not using it to its full potential, and so misses the point. Unfortunately, specifying and verifying at that level can only be done by abandoning what we call "end-to-end", i.e. verification all the way to the code.
> Also, I begrudge having had to come to the defense of SCADE and Simulink. I spend a non-trivial amount of my week essentially arguing against them in some respects, but they deserve their due.
Of course, and I'm sorry if it sounded as if I were suggesting that TLA+ is better than them at what they do. I myself used Esterel decades before first learning of TLA+. TLA+ is better at other things -- things that apply to a much wider application domain. My point is that, due to fundamental limitations -- we can only verify a specification of a certain size -- there is an inherent tradeoff between scale and application domain on the one hand, and the ability to generate an executable on the other.
Thanks for the insightful comment, it's given me a lot of things to research!
It's worth learning and using IMHO.
BTW, property-based testing and model checking are not really similar. Model checking is any automated technique that finds an invalidating assignment to some formula or declares that there is none (so SAT solvers do model checking). In the case of programs, model checkers automatically determine whether a program satisfies some property. Unlike property-based tests, they do not run the program on all inputs. In fact, the program is never run at all in the ordinary sense. Even TLC, an explicit-state model checker for TLA+ that tries every reachable state, can check uncountably many executions, each of infinite length, in a matter of seconds, where "running" even one of those would take infinite time.
It is sometimes easy to think of a model checker as conceptually running the program on every input, but it's also sometimes important to know that model checkers don't actually do that.
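To make that distinction concrete, here's a minimal Python sketch (toy system, made-up names) of what an explicit-state model checker does: it searches the reachable state graph for an invariant violation, rather than executing the program on inputs.

```python
from collections import deque

def check_invariant(init_states, next_states, invariant):
    """Breadth-first search over the reachable state graph.
    Returns a violating state, or None if the invariant holds everywhere."""
    seen = set(init_states)
    frontier = deque(init_states)
    while frontier:
        state = frontier.popleft()
        if not invariant(state):
            return state
        for succ in next_states(state):
            if succ not in seen:
                seen.add(succ)
                frontier.append(succ)
    return None

# Toy system: two processes each cycle idle -> trying -> critical,
# with no locking at all, so mutual exclusion should fail.
def next_states(state):
    step = {"idle": "trying", "trying": "critical", "critical": "idle"}
    for i in (0, 1):
        succ = list(state)
        succ[i] = step[state[i]]
        yield tuple(succ)

mutual_exclusion = lambda s: s != ("critical", "critical")
bad = check_invariant([("idle", "idle")], next_states, mutual_exclusion)
print(bad)  # -> ('critical', 'critical'): the checker finds the violation
```

Note that nothing here "runs" the processes; the checker just explores states, which is why it can cover every behavior of a finite model.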
Like you said, you won't be testing all of the possible inputs, but it can still be a very effective way to detect the kinds of bugs that would otherwise be difficult to discover with more traditional testing methods.
Here's an example of what I'm talking about: http://propertesting.com/book_stateful_properties.html
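The book's examples are in Erlang; as a rough Python approximation of the same idea (all names hypothetical), a stateful property test drives the real implementation with a random sequence of commands and compares it against a trivially simple model after each step:

```python
import random

class Counter:
    """System under test: a counter that wraps to 0 at a limit."""
    def __init__(self, limit):
        self.limit = limit
        self.value = 0
    def incr(self):
        self.value = 0 if self.value >= self.limit else self.value + 1
    def get(self):
        return self.value

def run_stateful_test(seed, steps=1000, limit=5):
    """Drive the implementation with random commands and check it
    against a trivially-correct model; any divergence is a bug."""
    rng = random.Random(seed)
    sut, model = Counter(limit), 0
    for _ in range(steps):
        if rng.random() < 0.8:           # random command: increment
            sut.incr()
            model = 0 if model >= limit else model + 1
        else:                            # random command: observe
            assert sut.get() == model, f"diverged: {sut.get()} != {model}"
    return True

print(run_stateful_test(seed=42))  # True: no divergence found
```

Real frameworks (PropEr, Hypothesis, etc.) add the important parts this sketch omits: command generation from a grammar, preconditions, and automatic shrinking of failing command sequences.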
Also, the use of the term "non-formal" in the article you linked is incorrect. If it runs on the computer it is, by definition, formal. The author meant "sound" or "exhaustive" (although, while all model checkers are exhaustive, many -- like the one in this article -- don't actually try each input; their exhaustiveness implies soundness, not a particular brute-force algorithm. How well they can actually beat brute force is, however, another matter...).
BTW, it is possible to use the TLC model checker for randomized tests of TLA+ specifications, in a way similar to randomized testing of "stateful properties", by checking random walks of the state graph.
I was using the same terminology as the article, and treating my property tests as informal forms of model checking because they're unsound, but I can see how that isn't exactly correct.
You can see the specs here; the current TLA+ spec models the consistency of data being persisted to disk (flush, snapshot). At some point there was also TLA+ describing the quorum writes/reads along with the background tick event loop, but that must live elsewhere now.
The rest of the blog is really interesting too.
So far it's more a personal interest, but I'm advocating for spending more time verifying critical pieces of our code base like MPMC containers, synchronization primitives, etc.
PS. Doing C/C++/C# in gamedev.
Right now, I'm doing a similar thing. I've got a Bluetooth LE device that has a bit of a request/response layer built on top of the standard characteristic read/write stuff (it's a BLE Iridium modem, if you're curious). I didn't make the device, I'm just writing mobile apps that interact with it. There's a lot going on here:
- apps transition through different states (foreground, background, suspended). There are different things we can do in each state (e.g. there isn't really a good way to initiate a BLE characteristic read on the iOS side while the app is suspended)
- the Bluetooth hardware transitions through multiple states (off, on, not authorized)
- the BLE connection transitions through multiple states (disconnected, initial connection, services & characteristics discovered, etc)
- the stateful protocol involves writing to a characteristic and receiving either 1 or 2 messages back, later on. Transmitting to the Iridium constellation can take a minute if the modem has to reconnect to the satellites. This transmission may fail (which we will be told about), but a failed transmission remains queued on the device and may succeed later.
- The BLE-Iridium device notifies the mobile device when something happens via BLE Notifications. Notifications are unreliable, so there's a chance we could miss a notification the device has sent us.
This has already shaken out one potential show stopper: if the app is suspended, it can be woken up by a BLE notification to let us know there's new data available; however, if we miss that notification (it's not a reliable notification), there's no good way to poll the device to see if there's data available when the app is suspended. (Using only the available BLE APIs)
The TLA+ model basically pointed out that the message buffer (finite-sized on the real device) can grow without bound if notifications are missed over and over (e.g. a super shitty RF environment). I've got a strategy that might work for polling it periodically, but I don't know that I would have realized this failure path without modelling it.
I may also end up pushing back to the vendor to see if they can implement BLE Indications (reliable) as well as BLE Notifications (unreliable). If a BLE Indication fails, the BLE device should disconnect, which will trigger CoreBluetooth stuff that allows us to reconnect and poll without having to do some hacky timer-related stuff. Can't rely on sending a silent push notification either, since the device will be used in remote areas where there's likely no data coverage.
Obviously I don't know anything about your use case, and I apologize if this comes across as unsolicited advice, but this sounds like a pretty textbook example of a situation where load shedding is your only option -- even if you manage to figure out the polling problems.
The concern is that one of the deleted messages might have been important (e.g. informing us that a transmission failed, or informing us that Search & Rescue got the message and is sending a helicopter). The polling is a backup plan that at least gives the phone feedback if it's not working (a failed read informs the phone that the read failed; a failed notification doesn't tell either side that something failed). We can still obviously lose messages if enough of them come in between polling intervals, but it's better than potentially losing messages indefinitely without knowing about it.
The other nice thing is that we can keep metrics for how often the polling process actually finds things that we weren't successfully notified about. That'll either give us leverage to encourage the vendor to switch to indications instead of notifications, or encourage us to find a new vendor :)
Edit: and yes, I 100% appreciate the feedback!
If you don't mind me asking, what are you working on / what is your background? It sounds like you're doing a ton of really interesting engineering, involving a lot of moving parts.
For undergrad, my school didn't have a Computer Engineering program, so I did EE and a special CS program they had for engineers (all of the CS and Math credits of a 4-year CS degree, none of the arts electives). In EE, I focused primarily on small-signal stuff (DSP, Digital Communications, Control Systems, HDL, etc), and in CS I basically drank from the fire hose (my 4th year electives were Security, Compiler Design, Advanced Algorithms, and OS Concepts 2). I went and worked for a few years, first at a startup doing electronic design software, and then at a "hot web company". Got a bit bored, went back and did an M.Sc. in Comp Sci, focused on Distributed Systems (my thesis was basically New Relic before that was a thing...)
About 18 months into grad school, my funding ran out. Coincidentally, a friend showed up and asked me if I wanted to do some backend programming for a side gig he'd picked up. That turned into he and I going full-time running a web/mobile company for a few years. Made decent money at that, rode the "mobile is the new hotness" wave, but wasn't feeling super fulfilled. He ended up moving across the country, and we broke up the company. With my newfound freedom, I started shifting away from web/mobile stuff as much as I could and focusing more on lower-level type work (put that EE degree to use!).
I don't really have a portfolio put together, but here are a few examples of the cool projects; remember that this is like a person's social media feed -- it's the highlights. There have been boring projects as well, and a month here and there with no income at all.
- GNSS satellite simulator for testing GNSS receivers. We simulated the orbits of entire constellations of GNSS satellites with ~1m precision, simulated the signals they were transmitting and the delays those signals would experience (due to distance and atmospheric interference), and put the combined output into a Software-Defined Radio. Good enough to convince the receivers I tested with! This had a bunch of C code for doing the simulation and signal generation, and a Python scripting interface on top.
- Live animation/Motion Capture. Took a handful of 1st-gen Kinects, put them on tripods, and turned them into an "all aspect ratio" motion capture system that didn't require special suits/rooms/tags. After putting them around the room, they would auto-calibrate themselves to figure out where they were relative to each other, and then spit out a stream of joint positions in a world frame. This got streamed into... maybe it was 3DS Max? Either way, the joint positions and angles were mapped onto a live model.
- The ill-fated Peachy Printer (if you missed the drama, I'm pretty sure it's still online somewhere). UV laser- and resin-based dirt cheap 3D printer. I ended up in the right place at the right time and ended up designing the digital version of the main PCB (first version plugged into your sound card, my version was USB).
- A fuel-monitoring system that keeps track of tank farms all over Western Canada. I didn't make this hardware, but significantly rewrote a bunch of the software for performance reasons, and added robust networking to it. From the central office, you can see how much fuel is in every tank across 3 provinces and make scheduling/routing decisions. (These are the 80,000L tanks you see on the side of the road in rural areas).
- A giant hexcopter for agriculture. This is still a work in progress but testing is suspended until spring when there's more than just a thick layer of snow on the ground. I mostly worked on the mechanical design and the flight controller, with a bit of input here and there for the payload and other electronics.
- A Bluetooth personal safety device. When I joined the project, a 3rd-party had designed and manufactured the hardware; a different 3rd-party had designed the mobile apps. Everything was pretty flaky. I did a bit of firmware work on the device itself, and a significant rewrite of the Android Bluetooth code to make it drastically more reliable. This is the same company that is now integrating with the BLE-Iridium device to be able to provide coverage out of cell range. The coolest part of this project? There are people who are alive because of it, and who likely would have died if the code was bad.
- A clean-sheet implementation of the LoRaWAN protocol. A client looked at the reference library provided by Semtech and felt they couldn't rely on the code. I built a TLA+ model based on the spec (and, uh, the deviations from the spec that we learned about in certain hardware) and iterated on the model until it seemed robust. "Ported" it from TLA+ to C, and it worked like a hot damn.
Thanks for asking about this, by the way... I'm going to save this post as a bit of a resume and iterate on it to figure out what I've missed!
p.s. I love the website.
Edit: I was convinced later in the thread that this isn't a fair comparison :)
Another 20 seconds gets you https://lamport.azurewebsites.net/tla/industrial-use.html
At least personally, working at a small and understaffed startup, I'm not always going to find the time to put together big case studies on interesting things I do, but it's a lot easier to share a sentence or two on a casual Hacker News thread. I wouldn't be surprised if it's the same for other people.