Hacker News new | past | comments | ask | show | jobs | submit login
Finding Goroutine Bugs with TLA+ (hillelwayne.com)
125 points by headalgorithm 30 days ago | hide | past | favorite | 40 comments

This program finds the problem not because TLA+ is nifty, but because it sweeps out the parameter space, which is exactly what the original Go program should have done in a unit test. What happens when `len(pss)` is just below, at, or just above, or way above `concurrencyProcesses`? That is the type of question that unit tests are supposed to address. Go has a perfectly good deadlock detector.

The real problem cannot be solved by re-modelling in a different high-level language, because the real problem with that code is it gets `pss` from a library call and `concurrencyProcesses` from a global variable, when they should both be function parameters a test case can exercise.

This is why I invest more time Fuzzing [1] than writing software models or formal specifications.

Google’s gofuzz [2] and dvyukov’s go-fuzz [3] libraries have been instrumental for my success every time I embark on a journey to write a complex system with concurrency in its heart. The list of bugs I have found in my co-workers programs was so big we decided to spend a month configuring the fuzzing tests and our continuous integration environment to do it automatically. Code quality increased by an order of magnitude and we are now more confident about the code we ship to production.

[1] https://en.wikipedia.org/wiki/Fuzzing

[2] https://github.com/google/gofuzz

[3] https://github.com/dvyukov/go-fuzz

Once you have enough variables you would not be able to write enough unit tests to ensure every safety property of your design.

TLA is math, not a typical programming language.

I think this spec tries to do two things, only one of which fits in with the purpose of TLA+. I think this is caused by some ambiguity regarding the source of the bug: is it due to a difficulty writing a subtle algorithm, or is it due to a difficulty understanding the Go language's semantics.

If it's the latter, then TLA+ isn't the right tool, and neither is SPIN. There are other tools for verifying language semantics. If it's the former -- i.e. the author of the spec knows the meaning of each line and its intent -- then TLA+ is a very good tool, but then a simpler spec could suffice, perhaps something like this: https://pron.github.io/files/Channels.pdf

The point was to do demonstrate the bug while staying close to the Go example. You can find the bug at a higher level, but then there's the objection that you're not finding the bug in the Go code, you're finding the bug in simpler toy code. That raises the further objection that we only find the bug because we were looking for it, which is why I showed we find the bug even if we do a straightforward translation of the code.

I understand, but I think this strikes at the core of our pedagogical differences regarding TLA+ -- after all, TLA+ is super-Turing-complete, and even TLC can check a subset of TLA+ that is still effectively Turing-completish. Of course you can specify at any level in TLA+, and people have even written compilers from C and Java to TLA+. But doing so suffers from the same scalability problems that all code-level verification techniques suffer from.

The question we're asking is how best to get people to use TLA+. Do you show them what model-checking can do and then show that TLA+ can model virtually anything and TLC can check many things, and do so at the code-level because that is something they already know from programming -- but then they ask, as they do in this discussion, why you need TLA+ at all if all you're doing is translating Go to some other language -- or do you first try to show them the TLA+'s real strength, which is mathematical modelling of systems, but that is a new concept that is different from programming and can be foreign to them?

It's a hard problem, and maybe the right approach is to do both, which is what you're doing if we take this post together with others you've written. But I think it's worth mentioning that in this case, this is not the spec someone who knows both TLA+ and Go well would likely write.

BTW, I'm trying to remember which of those two features sold me on TLA+, and I think it was the combination of both. I knew the limitations of model-checkers (and had worked with NASA's Java model-checker before) that I knew that a model-checker alone wouldn't give me what I need for the complex distributed system I was working on, and mathematical modelling alone of something so subtle wouldn't have sufficed either without some help checking (cheaply!) if my ideas were right or wrong. So neither aspect would have sold me in isolation, I think.

The barrier to TLA+ remains the high-cost of rewriting a piece of code in another language.

The translation process itself could be error prone. The disintegrated concepts requires considerable amount of mental effort.

I am very interested in using the host language (the language that used to write the software one wants to verify) compiler to compile into a TLA+ spec.

It's not rewriting, it's modeling. And that's a critical distinction. You're able to elide many, if not most, of the details and distill the effort to the critical algorithms and design elements. I don't care how the memory is stored, or even how big, just that it exists (as an example). If I do care, I can add that to the model.

And in the case of the example with Go as the implementation language (in the linked article), many of the details (like how channels work) can be reused between specs. This permits analysis of Go programs generally, without needing to respecify Go's concurrency mechanisms every. single. time.

This article notwithstanding, I don't think the right mental framework to approach TLA+ is to think of it as a way to verify your code. I view TLA+ as a formal design language. It is there to verify and explore the consequences of architectural and design decisions. It is also there to present a formal design document that engineers can refer to as a common artifact when discussing various design decisions.

Now architecture and design is a very vague domain. It could be anything from specific algorithms all the way up to entire distributed systems (and indeed TLA+ can be used for any of those), but the point is that it is a separate entity from code.

I also think in the current formal methods environment this is where TLA+ can make the most impact. It is a far less risky business proposition to use TLA+ to record design decisions where the maturity of its toolchain has minimal impact on your actual codebase vs using a formal methods toolchain that directly affects the code you write, where the maturity of that toolchain is an extremely important business decision.

I think that the specification in this article -- that tries to do too many things at once -- gives a somewhat wrong impression of how TLA+ is best used. It's not intended to find errors in your code. See my other comment on this matter: https://news.ycombinator.com/item?id=24593196

To apply your claim to how TLA+ is best is to say that the problem with architecture is the high cost of having two (or more) representations of the building, once in steel and concrete and once as a blueprint, and conclude it would be better to not have a blueprint at all.

The spec is supposed to focus on different things than your code and show the bigger picture. While it would be great to then verify the correctness of your actual program, humanity currently does not possess this capability. The biggest programs ever verified "end-to-end" were ~10KLOC, or 1/5 the size of jQuery, and required years of work by world experts.

While TLA+ is not intended to verify specific code implementations, we have to admit that that is the holy grail. In my view there are 2 main spheres of practical software: design and construction. Unfortunately, today's construction tools are so error-prone that simply implementing a given design doesn't guarantee that the code is error-free.

I think people see TLA+ as an escape from this construction nightmare - because its model is so simple and elegant. I know my wheels have been spinning trying to think about the best way to unify the design and construction processes. So I definitely don't fault people for trying to specify lower and lower levels.

> The biggest programs ever verified "end-to-end" were ~10KLOC, or 1/5 the size of jQuery, and required years of work by world experts.

Out of curiosity, what programs are you referring to? seL4 seems to fit the description.

seL4 and CompCert are probably the biggest and they both correspond to under 10KLOC of C, i.e. they're very small by modern software standards (and not only are they small and still took years of efforts by experts, they had to be seriously simplified in order for the verification to be feasible). There have been much bigger programs verified with model-checkers (I vaguely recall seeing Samsung mention they model checked a 1MLOC Java program), but in those cases I think the properties were very specific.

I believe the main use case of TLA+ is to first design the code as a spec and then implement it, rather than the other way around.

I'd like to learn more: how would one go about ensuring the implementation was an accurate implementation of the specification? Is there a process for doing this?

There's no perfect answer and it's always been a limitation to take into account. There are model-checking solutions which take a spec and generate code, but you don't get this with TLA+.

In a way, you could make the same argument about writing tests: the code paths you exercise in your CI environment are not quite the same as what's running in prod, and neither is the environment in which the code is running (e.g. the load your process or host is under, the memory pressure, etc).

That said, don't let this small gap and the lack of a perfect solution discourage you from using model checking, there is still a lot to gain from adding it as one more tool in your toolbox.

To the best of my knowledge there's no universal, formal method to translate the spec into code and verify the code properly implements the spec. Just like without TLA+, you're left to conduct your own testing and analysis to verify your code is correct, but with TLA+ you at least have a specification to verify it against (even if it's tedious).

That said, some languages offer more capabilities and tooling to support this. Check out Frama-C and its Mthread plugin [0] (NB: I have no experience with either) for something that could help. In Ada there's SPARK, but I'm not sure the extent that it helps with verifying concurrency properties.

In some languages, like Erlang, Elixir, and Go, the concurrency mechanism is sufficiently well integrated into the languages that it becomes very clear while reading the code what's happening (especially if you properly segment your code into functions and modules to aid in readability). These can make the analysis much easier to perform.

And regardless of language, if you have a specification and design the code tends to come out much cleaner, which makes the manual task of reviewing and verifying these critical parts less onerous. Additionally, it's not as hard to conceive of tests when you have the simpler spec to look at versus the much larger code constructed from it.

[0] https://frama-c.com/mthread.html

The way I think of it is that TLA+ is a tool for ensuring that the thing you intended to write will work. It's not a tool for checking whether you wrote the thing you intended to.

I think it's like writing an outline for an article and writing the actual article.

I mean, both tasks are writing, but the outline is on another abstraction level and makes the article a breeze to write.

I don't think you can call it "re-writing" when the implementation has to handle a lot of the details that you can and should ignore on the specification.

Trying to compile a specification from the implementation would be equivalent to trying to do "top-down" via "bottom-up".

TLA: It's the thing I wish I know but haven't got a chance to learn.

You'd now be building more robust software, except 100 times slower.

The crucial thing about TLA+ is that because it's not actually executable code your investment in it is completely up to you. You could decide to put in 100x the time if you wanted, you could decide to put in 0.1x the time if you wanted. Your investment in it is totally up to you and completely independent of the time you'd spend in writing the code.

TLA+ makes development faster, not slower (once you get a feel for how to use it).

I mean, it would be slower if you'd spend time writing mechanically-checked formal proofs, but that's working 10x for just slightly more confidence than you can get with TLC (a model-checker for TLA+) alone, so in almost all cases people just stick to specifying and model-checking.

I was working on something similar to say, stepped away, and saw your post. So I'll add to it:

TLA+'s place is as part of your design process (you have one, right?). It's the thing you do before committing any code to the project or radically restructuring an existing project (prototype code fits in the design process, but it shouldn't be considered committed most of the time). By spending a bit of time thinking about the design, and using tools like TLA+ to validate that design, you can save yourself months and years of heartache down the road.

By way of anecdote, some colleagues were working on a concurrent algorithm, needing to share data between two embedded processors talking over a databus. A bit more time spent on the design, versus the rather slapdash approach they took, would've saved them 6 months of debugging and tweaking until it worked. The issues were subtle, so it "worked" but wasn't totally correct and tracing the errors they were seeing back to that code was non-obvious (and hard, debugging embedded isn't trivial). It was quite literally a case where skipping a couple weeks of design cost them months later.

> TLA+'s place is as part of your design process ... By spending a bit of time thinking about the design, and using tools like TLA+ to validate that design

I think you dramatically underestimate the amount of effort required to produce a formal model of any nontrivial system, and overestimate the stability of system definitions in response to changing business requirements.

Eh. I used TLA+ for a few projects inside Microsoft. It probably takes a couple of weeks to write a spec for a complicated problem. Then less time to implement it, because all the heavy lifting has already been done - in many cases you're directly transcribing the spec logic without having to think very hard.

The sort of things for which you write specs are generally "core" functionality that doesn't change very often. If people are constantly mucking around in that area without writing specs, your system will be a pile of crap.

I think you dramatically underestimate the benefit of writing a formal model! I've run classes where we've managed to discover bugs in the client's actual production system with an afternoon of modeling.

How important is it to discover new bugs? Surely any large production system already has a backlog of tons of known ones. So if I want to fix bugs in my system, I can just go to the issue tracker and pick one.

Depends on what "discover" means. In the sense that Hillel is speaking of here, it probably means more like: Discover the bug exists and also discover, to a very high degree of certainty if not absolute certainty, the location of the bug in the code base.

That discovery is actionable, many other bug discoveries are not without further investigation.

At the level that you're usually writing specs, the bugs you discover are often very serious ones, such as dropping data.

I'm not underestimating anything because I'm not suggesting producing a formal model of arbitrary, nontrivial systems. My critical point is: Design your system before committing any code to the end result [0]. Formal methods and models are one tool in your designer's kit.

[0] This statement should be qualified with: Especially if it's at all novel to you, your team, your organization, or the world. And the amount of time to spend varies based on its true novelty. A team that's made hundreds of RoR sites can churn a new one out in a week with minimal upfront design effort. A team that's making the next RoR should probably spend more time thinking about what that means before making, or committing code to, it.

In practice, the effort is smaller and cheaper than finding bugs in other ways. But yes, learning where and how to best apply TLA+ takes some practice (less than learning a programming language, but more than learning a new build tool).

I think if people built software 100 times slower, they'd probably end up with more robust software, TLA+ or no.

I feel similarly, except it's the thing I wish was _easier_ to learn! <3

Going through Lamport's video lectures (https://lamport.azurewebsites.net/video/videos.html) is a great way to get the basics down. It proceeds at a fairly lesiurely pace so if you just block off some amount of time every day (say a video or two a day which usually works out to 30min - 1 hour depending on how well you want to practice the concepts in the video) you'll be able to use TLA+ by the end of a week or so.

Hillel's book is good, as is his online tutorial (https://learntla.com/introduction/).

It's so low-level, every time I try to look into it I am reminded of the 6th normal form, it's such a mess to gather all the parts of the condition into one big conjunction...

the conversation in this thread completely changed my view of formal verification tools. I used to think of them as "code" verifier. I know realize they're actually "whiteboard drawings" verifiers. Which makes it much more usable right away.

I believe my mistake was due to the fact that most articles on the subject start by showing you a faulty piece of code.

The field of formal methods covers both code correctness and higher level proofs.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact