Why Don't People Use Formal Methods? (hillelwayne.com)
420 points by pplonski86 88 days ago | 225 comments

The really simple answer to this question is: because at a high level of abstraction, it isn't useful. Most applications are being specified by people who don't understand in any detail what they want the application to do. They can't even imagine the problem well enough to discuss it, let alone specify it. A very large proportion of the failed projects I've seen are due to this breakdown. The people who have the time and inclination to investigate the details are not the ones who care what it does. You can have an incredibly low defect count and still not build the thing you need.

But if you look at a lower level of specification, then formal methods start making a lot more sense. For example, let's say I have to implement a communications protocol. This is something I can specify and that I can build against that specification. Companies sell protocol testers that test your implementation against a protocol. It might be nicer to have a formal specification than a back end test -- especially because there are often more scenarios you care about than you can feasibly test.
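For a flavour of what "build against a specification" looks like at this level, here's a toy sketch: the spec is a transition table, and conformance of a message trace is checked mechanically. The states and messages are made up (loosely TCP-flavoured), not any real protocol.

```python
# A toy wire-protocol "spec" as a transition table: (state, message) -> next state.
SPEC = {
    ("closed", "SYN"): "syn_received",
    ("syn_received", "ACK"): "established",
    ("established", "DATA"): "established",
    ("established", "FIN"): "closed",
}

def conforms(trace, start="closed"):
    """Mechanically check a message trace against the spec -- what a protocol
    tester does, except here the spec is executable rather than prose."""
    state = start
    for msg in trace:
        if (state, msg) not in SPEC:
            return False   # message not allowed in this state
        state = SPEC[(state, msg)]
    return True
```

The advantage over a battery of hand-written tests is that the table *is* the spec: any trace, including ones you didn't think to test, can be checked against it.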

At an even smaller scale, we are now realising some benefits of this mode of thinking. For example, we've had types in programming languages for a long time, but lately we're able to actually make statements about what the code does, simply by analysing the types. This is a massive step forward and I'm really looking forward to things like constrained types making it into the mainstream. Similarly things like Rust's borrow checker can allow you to make analytical statements about how much memory you are using (if you are careful), etc, etc. These are great.
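As an illustration of what a constrained (refinement) type buys you: the invariant is attached to the type, so operations that would otherwise be partial become total. A real constrained type enforces this at compile time; the Python sketch below (`NonEmptyList` is a made-up name) can only show the runtime shape of the guarantee.

```python
class NonEmptyList:
    """A list whose invariant -- at least one element -- is established once
    at construction, so head() can never fail."""

    def __init__(self, first, *rest):
        self._items = [first, *rest]   # impossible to construct empty

    def head(self):
        # Total: the invariant rules out the empty case.
        return self._items[0]

    def items(self):
        return list(self._items)
```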

I think that the success of formal methods is going to be through the backdoor and at this design/implementation level for a long time. In fact, pretty much forever.

I think you're mistaking the use of formal methods for some particular process (like confusing tests with TDD). Formal methods do not require that you have a specification in advance. They can help you come up with the specification, through interaction, before or during coding (e.g., this is a lightweight formal specification tool for designing UIs and interaction: https://sketch.systems/).

As code is itself a specification at a very low level, you must have some kind of specification in your mind by the time you write it. Is it always worthwhile writing it in some formal methods tool? No, but it certainly can be when there is some subtlety or potential complication, which can arise even in seemingly simple programs. The author of this post has some nice examples here: https://youtu.be/tfnldxWlOhM

It is true that formal methods only help; they don't solve all of your problems, and they certainly can and do result in wrong specifications (e.g. KRACK attack and Spectre/Meltdown are examples of a wrong specification), but that doesn't mean they cannot be extremely helpful, just as unit tests are helpful even if they don't find all bugs.

I have a MSc with a specialization in formal methods from a place with famous faculty in the field, so I might be biased. But I have to disagree with almost everyone here.

Formal methods still lack a bit of maturity and streamlining to get adopted. It's a bit like functional programming in the late 90s and early 2000s. Nothing was terribly different from now, yet it was not widely used. Whereas now, we are starting to see adoption by mainstream players. E.g. lots of features in Swift.

With regards to formal methods, the situation is entirely analogous. The theory is there. Already developed. And it sits at the absolute core of CS. It's all about formally describing semantics. There are several approaches: type systems, model checking, program analysis and abstract interpretation. And these offer a tremendous amount of flexibility. You can go all the way from adding some lightweight formal specifications a posteriori with e.g. a fancy type system, to developing all your code formally by starting with some axioms.

Judging by the reading list you've provided below, you may not be familiar with more "productized" formal methods, designed for use by working engineers, like TLA+ and Alloy. I don't know if I'd call them entirely "mature" yet, but they are certainly industrial-strength, learned by ordinary programmers in 2-3 weeks of part-time reading of tutorials or in a few days with guidance, after which they go on to specify and verify large, complex real-world systems.

Lamport spent years simplifying TLA+ and designing it for user-friendliness (at the cost of making it inappropriate for logicians to do research on novel logics). While extremely powerful (people have implemented TLA in Isabelle and Coq to reason about concurrent and interactive systems), it is based mostly on ordinary, high-school or first-year-college math, and entirely eschews the complications of programming language semantics.

See the following industry reports:

* Amazon: http://lamport.azurewebsites.net/tla/formal-methods-amazon.p... -- this has been so successful that managers encourage developers to use TLA+ to reduce development costs.

* Microsoft: https://youtu.be/ifFfxRCX_jw

* https://youtu.be/tfnldxWlOhM

> You can go all the way from adding some lightweight formal specifications a posteriori with e.g. a fancy type system, to developing all your code formally by starting with some axioms.

This statement particularly interested me. If I wanted to learn more about this, what resources do you recommend?

If you ask how deep I want to go, let's say we have a scale of abstraction that's from 1 to 10, where 1 is a two paragraph executive summary and 10 is an entire Ph.D's worth of training. I'd probably be looking at a 2.5 to a 5.

Some references below. You'd probably want to start with the first 3. The other 3 are classics, but much more dense (e.g. the Nielsons book requires some decent abstract algebra background):

* Software Foundations (Pierce): https://softwarefoundations.cis.upenn.edu/

* Concrete Semantics (Nipkow): http://www21.in.tum.de/~nipkow/Concrete-Semantics/

* FRAP (Chlipala): http://adam.chlipala.net/frap/

* Model checking (Katoen): https://mitpress.mit.edu/books/principles-model-checking

* Program analysis & abstract interpretation (the Nielsons): https://www.springer.com/gb/book/9783540654100

* Type systems (Pierce): https://www.cis.upenn.edu/~bcpierce/tapl/

Completely agree. This line sums it up:

>> Before we prove our code is correct, we need to know what is “correct”. This means having some form of specification

That is the problem. The people who come up with the spec don't fully know what is required. It's even worse than that actually; the requirements change constantly to account for changes in the technological and business environment within which the project exists.

I think it's worse than that.

I did a course on formal methods at university. The 'spec' ended up being a lot more complicated than the actual program.

I wasn't really confident afterwards that the "proved" software actually did work.

I think it's possibly a suitable method for building very low-level code (e.g. sorting methods) where the spec is going to be exceptionally simple, but its effectiveness drops off quickly the higher up the stack you go.

I had exactly the same experience. We did specification in Z, but the code that was supposed to be "proved" correct was a clearer reflection of what we wanted to do than the "spec".

So I actually had more confidence in the code than I did in the spec.

And I think this answers the question of the OP: people don't use formal methods because

(a) the vast majority of problems don't start out with a formal spec

(b) the gap between informal and formal is vastly more relevant than the one between different formalisms

(c) formal methods are significantly worse at bridging the formal/informal gap than most programming languages

And yes, when you have a sub-problem that is mostly formal, then it can make sense, particularly if there's a lot riding on it.

> And I think this answers the question of the OP

Do you realize that the OP is actually linking to an article explaining this (and a bunch of other aspects) in great detail?

While it's a good article, I didn't see it covering this aspect, particularly (c).

That's a problem with specs, the most concise and complete way to write them is by writing your program. If it wasn't, people would just compile the specs, and program using those.

Now, go tell this to people that want to offshore development based on a spec... But that's a digression.

You don't prove correctness of an entire program against the entire spec. That's only possible with toy problems; even entirely proved programs (like seL4) only prove some point of view, never the entire spec. Sometimes in a large program you discover a property that is very important (like how to assemble some event set into data, lack of deadlocks in a distributed system, or lack of buffer overflows), and you then go and prove those properties.

> That's a problem with specs, the most concise and complete way to write them is by writing your program.

I've never dealt with formal specs/design, certainly nothing like those described here. Your comment (however in/accurate) strikes me as interesting and leads me to wonder...

What if we took the code and processed it to generate a "spec"? (I quote "spec" here b/c I wouldn't consider it a spec, but don't know what else to call it.) Basically, a document that explains verbosely what the written code does... hopefully in a way that is reasonably, humanly digestible.

Then we might get a doc that says, "Function add(...) accepts two integers/nulls and returns the sum." And Joe-coder looks and realizes it shouldn't accept nulls, goes back to the code, fixes the issue.

This is essentially what static code analysis attempts to do. Unfortunately, there are many ways of writing programs that today's static code analysis cannot understand.
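A tiny sketch of that idea, using Python's own annotations as the analysable surface (`describe` and `_show` are made-up helpers; real static analysis goes much deeper than signatures):

```python
from typing import Optional, get_type_hints

def _show(t):
    # Plain classes print as their bare name; typing constructs print as-is.
    return t.__name__ if isinstance(t, type) else str(t)

def describe(fn):
    """Render a function's type annotations as a one-line English 'spec'."""
    hints = dict(get_type_hints(fn))
    ret = hints.pop("return", None)
    args = ", ".join(f"{name}: {_show(h)}" for name, h in hints.items())
    return f"Function {fn.__name__}({args}) returns {_show(ret)}."

def add(a: Optional[int], b: Optional[int]) -> int:
    return (a or 0) + (b or 0)

print(describe(add))
```

Reading the generated sentence is exactly the moment Joe-coder notices the function accepts nulls when it shouldn't.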

If there was a rule that functions could only return named variables (vs expressions that evaluate to a value that is returned) it would get closer to "this function accepts paramName1 and paramName2 and returns sumOfParams".

Would the code have been as clean and clear if you did not make the formal spec beforehand?


Indeed, I've never in my career (in high-level businesses, i.e. finance, advertising) seen such a thing as a proper 'specification'.

It's more the business has a high level, rough idea of a change/feature to improve the business, and we figure out together the exact details, some of which surface during implementation, and then verify it gradually in the wild.

We could have spent 10x the time instead drawing up a formal spec up front, but it would have so many assumptions baked in that it'd get invalidated in the wild anyway.

So the people designing the spec don't have enough information about the problem space. As the programmers begin working, they explore the problem space, extract the information they need to understand a solution, and then they can actually solve it. Bit of a Schrodinger's cat situation... The solution must remain undefined until observed.

> The people who come up with the spec don't fully know what is required. It's even worse than that actually; the requirements change constantly to account for changes in the technological and business environment within which the project exists.

I think this is a bit of an overstatement, because it implies an often unrealistic separation between writing specs and writing implementations. There may be several levels of each, and the middle levels may overlap.

For example, it may well be the case that the management on a project don't get heavily involved with the technical details and just specify "A needs to communicate with B" as part of the high level requirements. That doesn't mean the technical people responsible for implementing those requirement won't specify detailed protocols at a lower level and wouldn't benefit from formal methods to help ensure correct implementation.

> it implies an often unrealistic separation between writing specs and writing implementations

I think that part of the problem is that a small modification in the spec can lead to enormous changes in the proofs.

> I think this is a bit of an overstatement, because it implies an often unrealistic separation between writing specs and writing implementations

Every government IT project that is years late and billions over budget is because of this

> Every government IT project that is years late and billions over budget is because of this

Well, that used to seem to be true before governments started experimenting with (methodologies which were notionally versions of) Agile, Lean, and related ideas. Then there were a whole new set of reasons for IT projects being years late and over budget.

False. Writing a waterfall-style “spec” on paper has nothing to do with formal methods.

It absolutely does. We studied SSADM in college.

But if you write some code, you must be writing it to do something. That something is your spec. This may or may not be what the customer wanted, but writing it down may make it easier to explain what it is that you've decided to implement, and understand whether or not it could be the right thing.

Requirements do change constantly, which is all the more reason to write specs, so that you can understand the change and its implications (and present it to your customers) before writing the code. I am not saying this is always the best approach, but changing requirements are not in themselves a reason not to use formal specifications; they're an additional motivation to use them.

For example, this is a nice lightweight formal specification tool that lets you specify UI interactions quickly and play with them to see if they are satisfactory before implementing them: https://sketch.systems/ This is more "agile" than changing code between iterations.

I think the issue with the way people try to promote the use of formal methods, is that they assume that people will want to model their system formally before implementing it.

I feel like there'd be more productivity in showing people what their code could do, and helping them narrow that down to what it should. It is, of course, hard to do this.

From this point of view I like TLA+, it gives you a simple but powerful programming language and then allows you to check stuff like "this must not happen" or "this must happen" (and more).
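What a model checker does with a "this must not happen" property is conceptually simple: exhaustively explore every reachable state and test the invariant in each one. Here's a toy Python sketch for a two-process lock; the protocol and state encoding are made up, and real checkers like TLC are vastly more capable.

```python
from collections import deque

def successors(state):
    """All atomic next-states for a two-process lock protocol.
    A state is (pc0, pc1, lock_holder)."""
    pc0, pc1, lock = state
    pcs = [pc0, pc1]
    for i in (0, 1):
        nxt = list(pcs)
        if pcs[i] == "idle":
            nxt[i] = "want"
            yield (nxt[0], nxt[1], lock)
        elif pcs[i] == "want" and lock is None:
            nxt[i] = "crit"
            yield (nxt[0], nxt[1], i)      # acquire the lock atomically
        elif pcs[i] == "crit":
            nxt[i] = "idle"
            yield (nxt[0], nxt[1], None)   # release the lock

def check(init, invariant):
    """Breadth-first exploration of all reachable states; returns a
    violating state, or None if the invariant holds everywhere."""
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        if not invariant(s):
            return s
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return None

# Safety property: both processes are never in the critical section at once.
mutex = lambda s: not (s[0] == "crit" and s[1] == "crit")
violation = check(("idle", "idle", None), mutex)
```

Because the exploration is exhaustive over the (finite) state space, a `None` result is a genuine proof of the invariant for this model, not a sampled test.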

Formal proofs, as in Coq proofs, are often not practical for many dynamic projects.

There is a substantial language barrier; articles like this keep using the word 'proof', and I can't figure out what they mean.

Each discipline has its own standards of proof. In mathematics, the word means that a statement transcends the laws of time and space. In law, it means a group of 'reasonable people' can be convinced. In science it means that there is a theory with supporting data that has not been falsified. I'm pretty sure to the historians it means a tenured professor thinks it makes a good story.

Whatever the word means applied to computer science, it does not mean:

1) That the thing proved actually works (it can be proved correct, and still have bugs)

2) That the thing proved does what you want (extensively discussed here, the spec can be wrong)

3) That the thing will be useful (also extensively discussed; and I'd add that protocols like TCP, for example, seemed to get adopted because it was so simple anyone could understand it, it isn't obvious that being provable makes it more robust to network errors)

When I see people getting excited that they have proven tail has type [a] -> [a], all I can think of is that some programmer will make "a" some sort of super object at the root of an object hierarchy, and then what on earth is being gained?

It is more obvious for cryptography applications why proofs might be valuable, but I suspect any data showing formal methods outperform more testing generally is really just identifying that design and scoping studies add value, and nothing to do with the merits of whatever 'proof' means here. 'Proof' might just mean that you have someone who can't communicate ideas in simple English and is going to give you a page of esoteric notation.

> In mathematics, the word means that a statement transcends the laws of time and space. In law, it means a group of 'reasonable people' can be convinced.

This was historically and still is largely false. A mathematical proof is a written argument and most proofs are too complex to be "obviously correct", therefore mathematical proofs are usually more like your latter case: It is an argument that convinces people. Mathematics has a long history of doing things "that sound right but haven't been proven yet but it seems to work out ok", e.g. this was done all the time in the earlier days of analysis.

Proof in formal methods means the mathematical kind: with certainty the program / function implements the specification.

Mathematicians can also prove something irrelevant or useless, or something fundamental like 1+1=2.

Well, yes, but if mathematicians prove something then we know it is true. And if they can't prove the trivialities, that would be a crisis for the entire discipline, because 1+1=2 relates to everything they do.

If software people prove something we also know it is true, but not that it usefully relates to the software. In fact, 0.1+0.2 is likely != 0.3 for most software. You can prove you have integer types or fractional types or whatever, then a hardware bug comes along and your proof doesn't relate to the real world anyway. So what exactly is the standard we are working to? Is proof reasonably superior to more effort testing? That is a debatable point, and the economics suggests no.
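The 0.1 + 0.2 point is easy to demonstrate, and it actually illustrates the spec question: floats implement the IEEE 754 spec, not real arithmetic, so the proof obligation is against the former.

```python
from fractions import Fraction

# IEEE 754 binary floats cannot represent 0.1, 0.2 or 0.3 exactly,
# so the sum picks up rounding error:
total = 0.1 + 0.2
print(total)          # 0.30000000000000004
print(total == 0.3)   # False

# Against exact rationals, the "expected" equation does hold:
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))   # True
```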

Since formal methods are clearly targeted at programming, what is this link to software? We can prove trivial things like 1+1=2 but that still doesn't necessarily connect to anything that gets implemented.

Any program is a trivial proof of itself; so why are we interested in these specific proofs? Saying that they have a proof doesn't seem meaningful and is a language barrier to whatever they actually mean - which having had a good long think, is probably 'programmatic scoping', which describes what they actually seem to be doing.

> In fact, 0.1+0.2 is likely != 0.3 for most software.

This seems like a rather silly example: you're presumably talking about floating point numbers, but we have a spec for floating point, it's IEEE 754. This spec has been formalised in many proof systems, e.g. (top Google hits for various systems I know):



I'm tempted to say that using integers/rationals/reals in a proof, when the software uses 64bit words and floats, is a rookie mistake; yet I've never even seen rookies make that mistake. Maybe that's a problem for approaches which keep programs and proofs separate, but in systems based on type-checking (e.g. Agda and Coq) it wouldn't even be possible to make such a mistake (there's no way to use different types in the program and the proof, since the program is the proof).

> You can prove you have integer types or fractional types or whatever, then a hardware bug comes along and your proof doesn't relate to the real world anyway.

I'd be very happy if hardware bugs were the main cause of my software's problems ;) More seriously, this seems like moving the goalposts from "is it useful?" to "is it perfect?".

> So what exactly is the standard we are working to? Is proof reasonably superior to more effort testing?

Empirically, yes it is (see e.g. "Finding and understanding bugs in C compilers"). Testing can show the presence of bugs, proofs show the absence of (a class of) bugs. This is a worthwhile exercise, even if your model is not perfect.

This is not an all-or-nothing proposition. There are lightweight forms of formal specification and proofs which you are probably already using. One example is types. If your program is well-typed it might still crash with a division by zero or some other runtime error, but it will not crash because you tried to execute code at address 42 after you mixed up your integers and code pointers.
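And you can push the division-by-zero case into the types too, with a smart constructor. `NonZero` below is a made-up name, and Python only checks it at runtime; languages with refinement types discharge the obligation statically.

```python
class NonZero:
    """Evidence that a number is non-zero, checked once at the boundary."""

    def __init__(self, value):
        if value == 0:
            raise ValueError("NonZero requires a non-zero value")
        self.value = value

def safe_div(a, d):
    """d must be a NonZero. Division here can no longer raise
    ZeroDivisionError: the caller discharged that obligation when it
    constructed d."""
    return a / d.value
```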

> You can prove you have integer types or fractional types or whatever, then a hardware bug comes along and your proof doesn't relate to the real world anyway.

Proofs always rely on some sort of axiomatic system. Otherwise you don't have any basis for starting deductions. In most cases "the hardware conforms to its spec" can be considered a reasonable axiom (unless maybe in space flight where cosmic rays may cause appreciable amounts of random behavior).

This is fundamentally a complaint about some fairly specific language use. A lot of articles I've seen talking about formal methods contain some variant of "rigorously verified the correctness [of the program]". Anyone who is half-unsure about what is happening is going to write that off, because we all know programs can't be formally verified as correct in the common meaning of 'correct' because bugs are everywhere, at every level of the stack, and what people ask for probably isn't what they wanted anyway.

That is an abuse of language. What is happening is (and this is a good article because it is quite clear on the point) that "We’re ... prov[ing] ... conform[ance] to some spec.".

Now I'm quite excited at the prospect of automatic checking of conformance against a spec. Looking forward to the non-alpha release of spec in Clojure, in fact. But I'm going to ignore anyone who talks about "proving correctness" on the assumption that they (a) can't prove a program is correct in the abstract, (b) sound pretentious and (c) I'd be worried they are more worried about proving things than scoping and testing which is where most of the value sits.

Basically, I'm not saying formal methods are bad; quite the contrary; but the word 'proof' is not a safe word. It means very different things to different people. Most people outside software would surely assume bugs in the spec disprove 'correctness', but that isn't what the language seems to mean to formal methods people.

> the hardware conforms to its spec

Not a safe assumption. Spectre and Meltdown happened just last year; and I assume caused a lot of programs which were required to be high-certainty to become uncertain in practice.

The Software Engineering implications of a 'proof' are _different_ from a mathematician's standard of proof. If a mathematician uses a proved fact, as long as the proof is correct then they have certainty. If an engineer uses a proven program, they still don't have certainty, because they are working in the real world.

I've opened news.ycombinator.com in firefox quite a few times over the years; every time I've either gotten the web page or a network error. That is a proof of correctness to the standards of 90% of the academic disciplines.

Although it is not self evident from looking at most modern software, programming and proofs are (in some instances) directly related by the Curry-Howard correspondence. Apart from such instances one can also try to assert temporal relationships between variables in a program that have to hold under all inputs (one example would be “valid until ready”, which is a typical requirement one would have for a queue interface: the producer can not revoke the claim that the data is valid until the consumer has accepted it). Such assertions can be verified with TLA+ or in SystemVerilog with tools such as Formality.
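The "valid until ready" property can also be checked dynamically with a small trace monitor. A hedged Python sketch: the per-cycle `(valid, ready, data)` tuple encoding is an assumption of this sketch, not how any particular verification tool represents traces.

```python
def valid_until_ready(trace):
    """Monitor for the handshake property: once the producer asserts 'valid',
    it must keep 'valid' high and hold 'data' stable until the consumer
    asserts 'ready'. Each cycle is a (valid, ready, data) tuple."""
    pending = None   # data offered as valid but not yet accepted
    for valid, ready, data in trace:
        if pending is not None and (not valid or data != pending):
            return False         # producer revoked or changed the offer
        if valid and ready:
            pending = None       # accepted; producer may change data next cycle
        elif valid:
            pending = data
    return True
```

A formal tool proves this for *all* traces; a monitor like this only checks the traces you actually run, which is the essential gap between the two approaches.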

Computer science is a branch of mathematics, so proofs are of the mathematics kind.

Formal proofs will prove whatever you designed them to prove. It may be your #1, or your #2 if you actually know what you want (AKA "never"). Your #3 is mathematically ill-defined, and math cannot prove ill-defined statements.

> When I see people getting excited that they have proven tail has type [a] -> [a], all I can think of is that some programmer will make "a" some sort of super object at the root of an object hierarchy, and then what on earth is being gained?

I don't really understand that question. The type means that whatever list you pass that tail function, you'll get a list of the same type back. That's information you can decide to use to build your program, or just ignore. Without further context, there isn't any defined value for that information.
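Concretely, parametricity is what's gained: a function of type [a] -> [a] knows nothing about 'a', so it can only drop, duplicate or reorder elements of its input -- it cannot inspect them or invent new ones, even if 'a' is a root super-object type. Python's type hints don't enforce this, so the sketch below only demonstrates the behaviour the type would guarantee for free:

```python
from typing import List, TypeVar

A = TypeVar("A")

def tail(xs: List[A]) -> List[A]:
    """tail : [a] -> [a] -- every element of the result must come from xs,
    whatever the element type happens to be."""
    return xs[1:]
```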

> The really simple answer to this question is: because at a high level of abstraction, it isn't useful. Most applications are being specified by people who don't understand in any detail what they want the application to do.

The world is changing.

In high-growth-potential areas, I think we're starting to see a shift away from the style of software where "what to build is hard but how to build is easier" and toward application domains/products where exactly the opposite is true.

IoT, next-gen factory automation, robotics, ADAS, automated trading, etc. all have two extremely important things in common:

1. It's really obvious what we want the system to do (minimize energy usage; produce widgets without killing Joe the machinist; go to the store and don't kill anyone; make money; etc.)

Unlike with social apps/online communities/productivity software/games, the "what did the customer want" problem is likely not THE key failure point.

2. It's extremely non-obvious how to map those high-level desires onto software.

These application domains, which I think will drive growth in the software industry during the next cycle, are kind of dual to our current ones (web and mobile apps).

So, at the highest level of abstraction we know what the customer wants. That's not the point where we need the most derisking. Also, the hardest problems -- the ones where derisking efforts will land -- are amenable to mathematical treatment. Because of this shift, mathematical modeling tools are going to play a big role in the industries that (I think) will drive growth in the coming decades. And that means the zeitgeist of software engineering will change.

I'm not sure if those mathematical modeling tools will look anything like today's theorem provers and model checkers. Probably only in the same way that Python looks like FORTRAN.

As a counterpoint to that, another change happening in the industry now is the increasing tendency to delegate more and more to ML models, which are even less amenable to formal verification than mainstream PLs.

Absolutely, but why would that preclude formal methods?!

In fact, this is exactly what I was thinking about when writing the last paragraph of my comment.

You can prove the ANN works, but how do you check it conforms to any specification, other than by testing and accepting high probability of real world failure?

Eventually, you have a high level fuzzy requirement which is hard to even specify right. Just look at, say, latest YouTube guidelines about content they try to ML around to make tractable...

And that's a classifier, now try to do the same with a machine translation system.

There are papers that use SMT over RNNs to prove smoothness of the output functions. You can absolutely apply these techniques to ML.

So you have any links? I’m very curious, and a quick search did not turn up anything.

Clark Barrett's group at Stanford (previously NYU) is a leader here.

Thank you!

> shift "what to build is hard but how to build is easier" toward application domains/products where exactly the opposite is true.

Interesting insight, thanks for sharing. I wonder how this will affect management of teams that deal under these conditions.

> automated trading

> It's really obvious what we want the system to do


>> It's really obvious what we want the system to do


Hedge funds and trading firms are some of the top donors to/supporters of the formal methods community. They sponsor conferences, hire tons of formal methods experts, buy products and services from formal methods startups, and develop/use formal methods internally.



(I do understand there's a substantive point being made, and I think my reply to the sibling comment by temporal addresses that point. So maybe that's the right place to continue discussion.)

Buy low, sell high!

I'm not buying it; I believe the problem is more on the human side.

"Most applications are being specified by people who don't understand in any detail what they want the application to do", because most humans absolutely suck at clear communication and clear thinking. They're not aware what they're not communicating. They're not aware what scenarios they're not considering. Examples abound - remember that time where your spouse/parent asked you, "go to shop and buy milk", and you knew you now had to drill them for things like "when", "which shop", "which milk (%/brand)", and "what other %/brand is OK if the one you're looking for is sold out", because they'd be unhappy if you did the wrong thing? Most people who don't work with actual STEM, programming or law seem to be like that, almost completely unable to articulate in a clear and complete way, and completely unaware of how much they're not communicating.

> 1. It's really obvious what we want the system to do (minimize energy usage; produce widgets without killing Joe the machinist; go to the store and don't kill anyone; make money; etc.)

This is not "obvious" and is a good example of an unclear specification. The "not kill Joe" may be an implicit assumption shared by all moral people, but at some point it'll have to be quantified, e.g. for risk analysis, and the question "how low probability is low enough" will pop up. "Minimize energy usage" is limited by goals and tradeoffs that need to be stated; I can make you a sensor run a decade off a coin battery, but there's a good chance that's not what you want. Etc., etc.

Circling back to the GP's statement, does the customer who wants the energy-efficient, safe, money-making contraption understand enough about either requirement to do a formal specification on it? I believe they do not, and it's the same situation as with cat sharing apps and all the other software of today.

I've worked on software for internal use in manufacturing, and the #1 problem I had was that the customer had a bunch of unclear ideas (ideas that I, a lowly developer with zero experience in the domain, had to fix for them on the whiteboard during meetings many times, because thinking clearly is not a common skill, and they did not have it). A lot of consumer-grade software tries to get around this by giving up on clear thinking altogether, and just doing gradient descent on customer demands (as perceived through A/B tests) in order to maximize money made, regardless of whether the product ends up being useful or not. I don't feel industrial software lends itself to such methods; hell, I hope it doesn't - I'd prefer the firmware running in factories to be effective, not addictive.

You and twic make similar points, which I think are reasonable but miss a very fundamental difference between e.g., ecommerce webapps and something like a robot.

The how and what questions in robotics/manufacturing ground out in formal-methods-style mathematical modeling in a way that the sorts of questions that dominated the last couple of cycles of tech did not (because those tended to be behavioral specifications about humans -- click more ads, book more taxis, click the buy button, etc).

Formal methods didn't capture "engagement" or "click-through rate", but they do capture "safety" and "liveness".

Consider any of the examples you give. At the top level of the design process, we need to derisk things that demand mathematical modeling (battery life, car behavior). How do you go about determining the probability of a car colliding, or tease out the various things that might inform that probability? Not (only) by white boarding with business stakeholders or asking consumers, but (also) by building models of how the system behaves and analyzing those models.

Obviously, formal methods aren't a silver bullet, and yes, software engineers will always spend a lot of time at whiteboards and in meetings.

But when it comes to the future of formal methods, the question is whether the systems we expect to take center stage in the software world look like the sorts of systems where properties like "safety" and "liveness" are a) hard to specify, b) amenable to mathematical analysis, and c) critical to business success.

I think the answer to that question is yes.

> The really simple answer to this question is: because at a high level of abstraction, it isn't useful.

Okay, but if you're implementing a complicated algorithm like Paxos or Raft, then it can be useful to put some assertions in your code, and prove them in a formal language; and also to make these proofs part of the unit tests.
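A concrete (if toy) version of "make these proofs part of the unit tests", sketched in plain Python rather than a proof assistant: one invariant that Paxos- and Raft-style algorithms depend on is that any two majority quorums intersect. For small cluster sizes you can check it exhaustively in a test (the function names here are invented for illustration):

```python
from itertools import combinations

def majority_quorums(nodes):
    """All subsets of the cluster that form a strict majority."""
    need = len(nodes) // 2 + 1
    return [set(c)
            for k in range(need, len(nodes) + 1)
            for c in combinations(nodes, k)]

def quorums_intersect(n):
    """Exhaustively check that every pair of majority quorums
    in an n-node cluster shares at least one node."""
    qs = majority_quorums(list(range(n)))
    return all(q1 & q2 for q1 in qs for q2 in qs)

# checked exhaustively for small clusters; a real proof would
# cover all n, e.g. in a proof assistant or a TLA+ model
for n in range(1, 8):
    assert quorums_intersect(n)
```

This only covers small n, of course; the point is the shape of the check, not its generality.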

Strong type systems are not new.

Strong type systems with type inference are new(ish) for a guy as old as me :-D

Standard ML has existed for 35 years. Even if you started programming with Fortran, it's been around longer than half of your career.

It's even older than that. The Hindley–Milner type system was first described in 1969. In other words, we've had the foundation for this kind of type inference for 50 years.

Which is why I'm amazed it's actually used now outside academia. It lingered so long there, it's surprising that it made the jump.

Probably due to some alums of INRIA using their particular implementation for their own projects.

What's the deal?

Really could be new to him. It's a pretty big world out there.

Be honest: until 10 years ago, ML type inference was invisible to 90% of coders. Probably even rejected willingly ('where are my objects?')

I learned Haskell in 2001, was super excited by it, but there wasn’t the “day to day” stuff to get things done that you had in Java or C++. So for practical work I felt I couldn’t use it. Fast forward to now, with the rise of open source and devs volunteering to code in their free time, and we have Haskell kind of useful. Definitely for web software and backends. Although I question desktop development in Haskell.

> Probably rejected willingly even ('where are my objects?')

Local type inference was rejected in Java in 2001 because "you need to spec your types precisely"

is it published online ?

Found it here: https://stackify.com/whats-new-in-java-10/

——— start quote ———

This feature has been a long time coming. It was suggested as far back as 2001, and was closed at that time with the following comment by Gilad Bracha:

Humans benefit from the redundancy of the type declaration in two ways. First, the redundant type serves as valuable documentation – readers do not have to search for the declaration of getMap() to find out what type it returns. Second, the redundancy allows the programmer to declare the intended type, and thereby benefit from a cross-check performed by the compiler.

——— end quote ———

Link to original RFE and quote source: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4459053

Interesting. I kinda agree about the need for redundancy at times (which is not forbidden by ML typing). But Java rejected a very good idea for a decade. A bit like lambda expressions. Damn Java.

IIRC it was common knowledge, as it was an "obvious improvement" over C-style undeclared variables. Bear in mind that type inference applied only to local values, not propagated globally through the code structure as in ML.

Exactly. And releasing formally incorrect apps to unsuspecting users is the best known way to bridge that gap.

Formal methods don't make a lot of sense when the product changes from week to week, or month to month, or even year to year.

It seems like the advocates of formal methods assume a waterfall method of software development wherein you develop a spec, formalize it, prove that it works, and then deliver it. That's just not how software works in industry. A lot of it never had formal specs and never will, and whatever specs exist change frequently. By the time a spec is perfected it is out of date and we are already moving on to the next thing.

Agile development killed waterfall development and formal methods have not adapted to the agile methodology.

Formal methods make sense when you focus them on the right part of your system. Even if it's changing week-to-week, there's an underlying structure or design that's staying consistent.

You wouldn't use formal methods on, say, the GUI layout of your system. But you could use it to prove that you're accessing your database in a way that won't cause corrupted or incorrect data to appear. If you're working on anything with concurrency, this is actually crucial. Show, for instance, that your multi-user Google Docs clone won't end up with a corrupted document when two users try to edit the same sentence or word at the same time.
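As a hedged sketch of what "prove parallel edits don't corrupt the data" looks like mechanically (plain Python standing in for a real model checker like TLC; the scenario and names are invented): enumerate every interleaving of two clients doing a read-modify-write on shared state and see which final states are reachable:

```python
from itertools import permutations

def run(schedule):
    """Execute one interleaving of the clients' operations."""
    doc = {"v": 0}   # shared state, e.g. a row in the database
    local = {}       # each client's private read buffer
    for client, op in schedule:
        if op == "read":
            local[client] = doc["v"]
        else:  # "write": increment based on the earlier read
            doc["v"] = local[client] + 1
    return doc["v"]

# each client's program is read-then-write; enumerate every
# interleaving that preserves per-client program order
ops = [("a", "read"), ("a", "write"), ("b", "read"), ("b", "write")]
schedules = [p for p in permutations(ops)
             if all(p.index((c, "read")) < p.index((c, "write"))
                    for c in "ab")]

final_states = {run(s) for s in schedules}
assert final_states == {1, 2}  # {2} would mean no lost updates
```

Exploring all six interleavings exposes the classic lost-update anomaly: a naive read-modify-write can end with 1 instead of 2. That state-space argument is exactly what these tools automate at scale.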

That behavior won't change from week-to-week (both the multi-user nature and the desire for stability/reliability), though the nature of the interface or target platforms may change.

Also, Waterfall only existed because some people at NASA/DOD were idiots and couldn't read past the intro to Royce's paper. It needs to die a terrible death.

Formal methods can happily exist within an agile environment once you realize that you don't have to deliver code each week. You are able to deliver prototypes, mockups, and documents along the way. Formal models can exist in that group.

> That behavior won't change from week-to-week

Maybe "shouldn't" is a better word than "won't".

It is sadly common for a sudden whim to cause a new feature to thread a change all the way from the GUI down to the database representation, with plumbing at multiple stages in the middle.

At GP's level of specification, things shouldn't change even then, unless you're doing an actual redefinition of what the software does and why. Even if a new feature is cutting across all layers in your code, it's unlikely that it'll alter the requirements like "parallel edits don't corrupt the document".

>Even if a new feature is cutting across all layers in your code, it's unlikely that it'll alter the requirements like "parallel edits don't corrupt the document".

What if you're building MongoDB a decade ago, and the new feature is "perform faster on benchmarks"?

That is not a properly specified or useful requirement.

1) Your benchmarks have to be validated to correspond to real life usage. Especially problem sizes and specific structure. (Whether collisions are needed, what kind of queries etc.)

2) A proper performance specification is WCET or runtime histogram for given piece of code, which can be checked with a model checker.

3) It is a requirement to pass a given test "better". If that sounds good to your clients, well, you're selling a database to clueless people who somehow care about a specific benchmark... Competitors will pick a different benchmark to brag about.

4) Problematic parts of the code have to be identified, specified and fixed. And tested. If you do not match performance spec by then you have a different problem and need to redesign the software including algorithms used most likely. Very expensive.

I think the use of formal methods on these key features would allow you to argue that they should not be changed. Giving more weight to resisting key feature change during initial development, as well as when the project matures and develops a user base, will help a lot with stabilization. That means less technical debt and fewer random breaking changes that tend to occur with Agile and other dynamic development practices.

Yep. Or at least it would be a useful social experiment.

What is socially expedient depends on what is technically easy in the short run. One benefit of bondage-and-discipline coding tools is to encode long-term strategies into short-term constraints.

"You wouldn't use formal methods on, say, the GUI layout of your system."

That's actually a good use. Statecharts were one method used for formally specifying that. The research I saw would formally specify the individual components or modes, the transitions with their conditions, these would be checked formally, and then maybe some code is generated that's correct by construction. Example for prototyping:


Totally agree. Formal specs are usually used to describe the desired _goals_ of the project, so they're pretty high level and probably won't change very often.

I'm not sure that is necessarily true. I think of the modern typechecker (especially with generics, algebraic data types, and other advanced features, including sometimes dependent types) as basically formal-verification-lite, and it tends to work really well with changing specs. You can aggressively refactor and change code and just let the compiler catch errors as you go. A practical formal verification language would function much the same way.

This! I'd also argue that your formal verification language and your coding language are one and the same. This leads to the case where, if you can specify your program, it is written. Obviously there are scenarios where it's not practical to prove a part of the program, but you can simply use an unsafe dialect as per Rust or Maude (system modules).

This works well with agile development in that you can spec out the underlying communications protocols, parallel operations etc. and layer unsafe stuff on top. While we're here then, being able to publish these proven modules for other people should lead to a situation where we can build incrementally on code proofs.

Lastly, you can consider this system to be the equivalent of unit tests, but better in that there are guaranteed to be no edge cases you haven't tested (subject to coverage, but again publishing modules should help bullet-proof them). For integration testing, you can use behavioural proofs - see Maude or BOBJ.

Not True.

Agile dev (according to me) means you are trying to specify what the customer wants as you develop your code, not specify it right at the beginning.

You can make changes to the specs when you are figuring things out. Also, you do verification when you are somewhat confident that your code is doing what the customer wants, so verification should come towards the end of your development phase (or at least the end of developing parts of your code base).

I don't think it's about agile, even with waterfall "you still don't know what you are actually making" as another comment put it.

Making software is how you figure out what you are making. Formal methods are just philosophically incompatible with that. Software is how you learn the world and formal methods is how you create it, you can't have both and we don't really have much use for creating worlds.

Some things change, others stay the same.

You change the system, and now some part of the application works differently. The way for you to determine whether that's expected or not is through the spec of the system. Either it's a bug to have changed it or it isn't.

In theory, you can use stuff like formal methods while changing a system when you come across things that you think _shouldn't_ change. It's the same concept as tests, really.

In practice I have not found any good tooling for this in production systems. The best alternatives are using types to enforce some rules, and to make APIs that make expressing invalid state impossible. But this is only a small subset of the kind of guarantees you want to have.
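A minimal sketch of the "types to enforce rules / make invalid state impossible to express" idea, in Python with type hints (the names are made up for illustration; a checker like mypy would verify the case analysis):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Disconnected:
    pass

@dataclass
class Connected:
    socket_fd: int  # a socket only exists in this state

# an ADT-style sum type: there is no way to represent
# "disconnected, but still holding a socket"
ConnState = Union[Disconnected, Connected]

def describe(state: ConnState) -> str:
    if isinstance(state, Connected):
        return f"connected on fd {state.socket_fd}"
    return "disconnected"

assert describe(Connected(7)) == "connected on fd 7"
assert describe(Disconnected()) == "disconnected"
```

As the comment says, this enforces only a small subset of the guarantees you might want, but it's cheap and survives refactoring.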

Do unit tests or type systems make sense when the product changes from week to week? Unit tests and type systems are a formal method (technically; they usually do not fall under the academic discipline of formal methods). Formal methods cover a very wide range of tools that drastically differ in cost and effort. Also, if you have code, then you have a spec, because the code is itself a formal specification. It's just not a very convenient one, because it is a very low-level spec.

I wish it had killed waterfall, but the large majority of enterprise customers I have dealt with, see it differently.

That's just not how software works in industry.

If by industry you mean “webdev” but fortunately “full stack engineers” don’t work on avionics or indeed anything important.

For me, working on contracts on Ethereum, I have found formal methods to be invaluable (we use a tool by Certora [0] based on the CVC4 solver). The trade-off shifts once you enter the world of immutable code, and the time we've spent designing, reviewing and now formally testing has been invaluable to shipping our product with confidence. We tend to think of our formal specifications as the ultimate unit test (instead of testing a few cases, test every possible case). Also, the prover has found subtle bugs, which have really put a smile on my face.

An example spec might look like this:

    description "Admin (and only admin) can set new admin" {
      // e0, e1 can hold any possible environment and are ordered sequentially
      env e0; env e1;

      // free variable for possible values of new admin
      address nextAdmin;
      havoc nextAdmin;
      // call to set the new admin
      invoke setAdmin(e0, nextAdmin);

      // assert one of two conditionals must hold for all cases:
      // 1. We were the admin and the next admin is what we sent to `setAdmin()`
      // or 2. We were not the admin, and the next admin is whatever it was before our call to `setAdmin()`.
      static_assert (
        e0.sender == invoke admin(e0) &&
        invoke admin(e1) == nextAdmin
      ) || (
        e0.sender != invoke admin(e0) &&
        invoke admin(e1) == invoke admin(e0)
      );
    }

It's a few lines of code to verify a simple function, but it's incredibly powerful for us to know that our code works in all cases.

[0] http://www.certora.com/

Formal verification totally makes sense for smart contracts as they are relatively small most of the time (around 500 LOC). When your program is tied to so much money, this is probably what you should be doing.

This makes sense, but I still think it’s really hard to formally verify immutable code against a mutable VM.

Your code can be correct today but the VM can change tomorrow.

This is like the quote about getting the right answer if you put in the wrong inputs.

Of course if you change the program semantics then the program changes. This is a limitation of literally all techniques, including testing.

I did an internship at a company building aviation hardware and software. Since everything was safety critical we used formal methods and things moved at a snail's pace.

When verification becomes the most important thing, everything changes. The design specification doesn't make sense? It will take months to get it changed, so write whatever code you can convince someone verifies against the design. Compile times are longer than an hour? Not a problem, since the majority of the job isn't writing code but instead doing code reviews, verifications and writing tests. In fact, writing code was about 15% of that job for the lowest-level engineer. It was a whole different world from unregulated codebases.

are you able to share which company? I'm just curious since during my aviation software experience I only heard we were just looking into maybe adding some formal methods approaches.

Sounds like a good job.

As a person who does and who's founded a company partially on the basis of the value of them, my opinion is that, in-part, it's because the ecosystem is extremely fractured, the tools are often an esoteric projection of the inside walls of one mad scientist's head turned into software, and they often don't scale very well (mostly due to the two previously mentioned conditions).

Additionally, the tools are often designed to empower a kind of pre-existing "priesthood" of a particular kind of verification practitioner, not necessarily to generally enable a more rigorous software development life-cycle by being approachable.

Granted we're trying to change a fair chunk of that, but that's sort of the status quo at the moment.

Lastly, there are some cultural forces in the software industry which aren't all that aligned or interested in making commonplace software development as rote, boring, and predictable as steel beam construction. But, I think those can eventually be overcome when people see the value of enabling a new bar for the sophistication of software innovations (e.g. like FEA and CFD did for mechanical systems).

What's your company? I'm interested in the commercial success of these tools and bringing down the cost of using them to industry-acceptable levels so they can be used in more projects.

Auxon Corporation (https://www.auxon.io/ a placeholder site, obviously. Low priority). Founded late last year. We're working on tools and technology for enabling people to build, automatically validate, and verify life-critical automation (e.g. highly-automated driving systems, industrial automation, etc.). But the ultimate goal is broad application.

Currently we work primarily with AV companies, automotive OEMs, and partners in their respective supply-chains. Our ambition is to create for Software & AI Engineers the kinds of sophisticated, accessible CAE tools & capabilities that have been enjoyed by Mechanical & Electrical Engineers for years. Software Engineers have done a phenomenal job enabling other disciplines, but we've done a less compelling job enabling ourselves in similar ways.

If you want certain technology to be commercially successful, it needs to provide value.

For the most part, code is tolerated to be buggy in exchange for fast delivery. And for most internet applications, a single error has limited scope, so the damage is controlled. For those applications, formal methods are not something that could help them succeed; they don't even write a lot of unit tests.

However, in different scenarios, like financial trading or consistency protocol building, where a single mistake will either cost you millions or result in nonrecoverable data loss that makes customers lose confidence in your product, formal methods ARE indeed being used.

So it is simply a cost-efficiency problem. For most disposable application projects, it is not worth it, so they don't invest. If the code of interest is critical and deserves every effort to get it right, then formal methods will be favored.

I sometimes feel this is also a language or tool problem.

For example an integer. In most languages and tools this is a primitive.

But most of the time we need a specific type of integer, for example a positive natural number. You can use an unsigned integer for this, but that will include zero, which we don't want.

So maybe you can use a type/domain object for this in your code, but what about the database field or communication methods?

It can all be done, but with a lot of hassle.

I think it would help if we had more primitives that would work in the complete stack.

But then the question is: should ISO 639-2 also be a primitive?

And checks at compile time are quite insightful; Ada tends to know beforehand if you're going to run out of bounds.
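For what it's worth, the positive-natural-number case can at least be approximated in ordinary languages today. A hedged Python sketch (the class name is invented) that checks the constraint once, at construction, so downstream code can rely on the invariant:

```python
class PositiveInt(int):
    """An integer constrained to be >= 1 (a positive natural number,
    excluding zero). The check runs once, at construction."""
    def __new__(cls, value: int) -> "PositiveInt":
        if value < 1:
            raise ValueError(f"expected a positive natural, got {value}")
        return super().__new__(cls, value)

qty = PositiveInt(3)        # behaves like a normal int downstream
assert qty + 1 == 4

try:
    PositiveInt(0)          # the zero we don't want is rejected
except ValueError:
    pass
```

Unlike Ada's range types this is a runtime check, and, as the comment says, it still doesn't carry through to the database field or the wire format without extra hassle.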

Or, don't split between primitives vs objects.

I think it's a mistake to make that distinction in many languages' design.

The result of that split is not easy-to-use objects, but hard-to-use primitives.

Can you give me more examples of FM used in finance? I feel like they should be used, but in my limited experience with finance-related gigs I've been aghast at the lack of even basic testing put in place...

At Imandra (https://www.imandra.ai/) we've been collaborating with a few big names in finance, from our website: "In 2017 Aesthetic Integration partnered with Goldman Sachs to help deliver the SIGMA X MTF Auction Book, a new orderbook venue implementing periodic auctions. Aesthetic Integration used Imandra, our own automated reasoning engine, to formally model the Goldman Sachs design of the SIGMA X MTF Auction Book, verify its key properties, and automate rigorous testing. Now that the venue is live, Imandra is being used to perform ongoing validations of trading operations."

From what I've seen, most finance development seems to be "work hard not smart" – throw a lot of developers with a wide range of skill, poor development workflows, and QA at a problem, and then back them up with shifting the liability to someone else. This mostly works, it just costs a fortune and results in moving very slowly.

It depends on the stakes mainly, but also on the experience of the stakeholders of a project. Both extremes (heavily tested, proven code vs. "let's hope this works" code) are common, but usually you're not allowed to speak about anything because of non-disclosure agreements, and proven or not, all code is manually tested like crazy anyway. You commonly get insane overhead and just-working code.

To counter the exhausting number of comments about how expensive, impenetrable, and unrealistic formal methods are in practice: I have been using formal methods in my work.

To borrow the parlance of the author, I use TLA+ -- a design verification tool -- in my work. Some of my work is in lightly regulated food and drug industries but most of it is not. It hasn't cost my company a tonne of money or delayed projects. In fact I don't think anyone has really noticed it too much.

Some features in our products use an event sourcing pattern and the architecture is built using a few different components that all need to work in concert. The system can't be validated to be correct with the source code alone. The integration tests can't capture every possible path through the code to ensure we only hit valid, desired states. For those features I maintain a couple of TLA+ specifications in order to ensure our design maintains the properties we desire: we only hit correct states, never drop messages in the face of component failures, that our event store is serializable, etc. And that has informed how I direct the team to design our software implementation.

I don't go to great lengths to validate that our implementation faithfully implements the specification but I do use the TLA+ models to inform the implementation and vice versa: sometimes we discover things during the development process that I use to update my models. One informs the other and it helps me to manage the complexity of those features with confidence.

It doesn't cost us a lot of time or effort for me to do this work. The customer doesn't care at all: they only see the user interface but they do expect their data to be consistent and that failures are not catastrophic. Formal methods have been helpful in this regard for my team.

Edit I started using TLA+ a few years ago to find and resolve bugs in OpenStack; it has been an effective tool and I always recommend learning it.

I learnt and used TLA+ while working at msft, and I'm still using it today:

- anytime I have some protocol or state machine whose behaviour isn't obvious, I write a TLA+ spec.

- the process of writing it clarifies my understanding and leads to new insights. These insights directly inform the code and the tests.

- the model checker makes it very easy to check sophisticated properties of the algorithm.

I don't use it all the time, but it's a tool I'm very happy I invested in.
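The model-checking workflow described above can be sketched in a few lines of ordinary code. This is a toy explicit-state checker in the spirit of, but vastly simpler than, TLC (names invented), just to show the shape of "explore every reachable state and check an invariant on each":

```python
from collections import deque

def check_invariant(init, next_states, invariant):
    """Breadth-first search over all reachable states; returns a
    violating state as a counterexample, or None if the invariant holds."""
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        if not invariant(s):
            return s
        for t in next_states(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return None

# toy "spec": a counter that increments and wraps at 4
init = 0
step = lambda s: [(s + 1) % 4]

assert check_invariant(init, step, lambda s: s < 4) is None  # invariant holds
assert check_invariant(init, step, lambda s: s < 3) == 3     # counterexample
```

Real specs have huge (or unbounded) state spaces and richer properties, which is why tools like TLC exist, but the mental model is this loop.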

> Some features in our products use an event sourcing pattern

Ah, sounds very interesting!

Are there any examples you'd be able to share?

E.g. one example of a TLA+ assertion that your event store is serializable, or whatever you can?

The trite answer is: because it's hard.

(An entire section in the article is devoted to this point.)

Programming languages and tooling have attracted phenomenal amounts of attention and iteration in just a few decades. Folks hereabouts just about go to the damn mats over the smallest syntactic quibbles.

Meanwhile, over in mathematician land, they're limping along with a write-optimised mishmash of "the first thing that a genius in a hurry scribbled three hundred years ago and now we're stuck with it".

I have very limited attention and brainpower and remembering what ∂ means in this book vs the other twenty places it's used in my library and where it means something completely, entirely, utterly different, is just hard. So too is remembering what Q(x) and W(z) mean for two hundred densely-written pages of stuff optimistically described as accessible to anyone who has taken highschool algebra (not mentioned: and remembering it exactly).

Notational ergonomics matter, dammit.

I think this is an exaggeration/strawman. Can you give an example of a mainstream introductory math book that defines some esoteric notation like "Q(x)" and then continues using it hundreds of pages later? Note that I don't mean defining it locally within the context of one proof -- that's something different.

The closest thing I can think of is notations like U(f, P) and L(f, P) for the upper and lower Riemann sums of a function f with respect to a partition P. These do show up in introductory real analysis textbooks, but they're pretty basic/fundamental notions.

By the way, as far as I know, ∂ in introductory textbooks can only mean either "partial derivative" or "boundary". Do you have examples of it meaning other things?

Most of what I've seen lately is queueing theory, Markov processes or statistics, and yes, I have seen notation used for hundreds of pages at a time, and I've seen these symbols used to mean different things (not including the partial derivative).

One of the books I read recently[0] has a table of where symbols and formulae are first defined. It's a godsend.

But I would prefer self-describing names to a lookup table. I don't put comments at the top of each file of code with such a table.

[0] https://highered.mheducation.com/sites/0073401323/informatio...

Math is hard, and reading it is slow even for good mathematicians. However, it does get easier with practice, as one learns the conventions and norms. It is not self-explanatory; aliens would not glance at a notebook and immediately know that it is a proof that the set of prime numbers is infinite, for example.

I think it goes beyond conservatism for most devs as to why you don't formally specify your design, and it's mostly rooted in why agile is so popular.

The sooner you show a finished product, no matter how broken and incomplete, the sooner your client can realize all the things they said they wanted but actually didn't, and replace them. If you give them a formal design document it might be ironclad and foolproof, but it doesn't put the finished product in their face to scrutinize.

Money doesn't judge. It ends up in the hands of all manner of people. As programmers we want money for time, and the people with the money rarely have the education or willpower to even be able to comprehend a formal design. But they are the ones calling the shots with some esoteric vision locked up in their head we are meant to realize.

As someone who dabbles in art too there are substantial parallels between the two. You sketch your piece and constantly refer back to your commissioner to verify this is what they want. Almost every single step of a painting or animation or music production or anything creative a commissioner will change their mind on it, either because you suggest something better or because they recognize flaws in their realized vision.

That's where "move fast and break things" even comes from. Making some piece of garbage "sketch" is way more informative to your average "commissioner" than having them write down every intricate detail of their vision for the finished piece. Because, like with all other creation, those we write code for rarely have the education to understand what they are even asking for. They have this nebulous idea of a fraction of a finished product, and you need to basically teach them to see the whole picture along the way.

Until they actually have even a remotely reasonable grasp on what they want, all the testing, specs, design docs, etc. are wasted, because you still don't know what you are actually making. The more endemic problem is that we still don't have an accurate way to convey the long-term value of doing it right once we have a glued-together, rapidly iterated abomination that now needs to be redone properly. That's where I feel formal methods have the most value: a developer can produce a formal spec for a near-finished product so that, when they inevitably fail to achieve the spec because the money stops when it "looks done", at least the next person who has to pick up and fix this mess months or years later has a better idea of the intent than the person paying them would on a repeat attempt.

I had a meeting with a customer today.

During the previous meeting one of their highest priority feedbacks was "X doesn't look anything like we want it, we want it to look like A,B,C,D...."

In the meantime I put it on the back burner to work on some other customer's requests.

Before today's follow-up meeting I realized I hadn't made any changes to X; the first thing they said was "X looks great, don't change a thing."

I had done nothing, I double checked and nothing had changed in any way.

I suspect they just needed time to use it a bit and then they realized that it was in fact what they had asked for months earlier. Fortunately we were moving that customer into exactly the phase of "no changes, you use it for 6 weeks then tell us what you think".

Thorough automatic testing is valued by agile practitioners because it lets them move faster, even though it might initially seem like more testing means more work means slower iterations. I think a similar effect is possible when you use specification languages and model checkers as part of an agile way of working. The spec is at a higher level than the implementation, and hopefully radically shorter and easier to adapt. The model checker then lets you find flaws in design changes faster than you would have without it.

Unlike the case of building an airplane, when you’re developing software you sometimes don’t know when you start whether you want to build an airplane or a helicopter, or even whether you want to fly at all, or indeed, whether you’re building a vehicle of any kind, and not some kind of corn mill or a movie theater, perhaps.

The thing about software, indeed, is that most of it isn't space shuttles or life support systems. But it can go from paper airplanes to model airplanes to real airplanes more quickly than you'd think. Your hobby website could take off to become an income and then a company, with leftover code all around.

It does usually go from paper airplanes, to model airplanes, to real airplanes. But if you set out to build a real airplane, you'd probably end up with a train.

> But it can go from paper airplanes to model airplanes to real airplanes more quickly than you'd think.

It seldom does.

> Your hobby website could take-off to be an income and then a company and have leftover code all around.

When it provides you with an income, you can justify spending more time on refactoring, tests and correctness.

You are right in general where the cost of failure is not that high, but consider software on aircraft, in a pacemaker, or (the best example everyone in FM quotes) on a space mission. A small bug can cost huge amounts of money, or lives. In these cases companies have an incentive to verify their code.

Hey, you don't even have to go to these special cases. Amazon, Google, and Microsoft all do formal verification (or have started somewhat sophisticated testing) of their critical pieces of code, since a bug in them, even one causing just an hour of downtime, will cost them millions.

But most software is not on flights, pacemakers or space missions (or equivalent).

It's a cost-benefit analysis: if failure costs you more than verification does, you will verify. Since the cost of verification is currently high, if you are not losing millions to failures it doesn't make sense to verify your code.

Look at fuzzing. No one did fuzzing a decade ago. Since fuzzing became cheap (i.e., thanks to cheap compute and, somewhat, cloud computing), everyone fuzzes any piece of code that remotely appears to be critical.
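A naive version of the idea fits in a few lines of Python: throw random byte strings at a function and record anything that raises. (Real fuzzers like AFL are coverage-guided; the `parse_header` target here is a made-up toy with a deliberate short-input bug, just to show why cheap compute made the technique viable.)

```python
import random

def naive_fuzz(target, trials=10_000, max_len=64, seed=1):
    """Feed `target` random byte strings; collect inputs that raise."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(max_len)))
        try:
            target(data)
        except Exception as exc:   # any uncaught exception counts as a "crash"
            crashes.append((data, exc))
    return crashes

# Hypothetical target with a lurking bug: it assumes >= 4 bytes of input.
def parse_header(data: bytes) -> int:
    return data[0] ^ data[3]       # IndexError on inputs shorter than 4 bytes

crashes = naive_fuzz(parse_header)
```

Even this blind loop trips over the short-input crash quickly; coverage guidance is what lets real fuzzers reach much deeper bugs.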

Even testing. When the cost of running a program was high, people didn't write test cases. They hand-checked programs for bugs.

Indeed, under those circumstances, yeah formal and semi-formal methods get used.

But these circumstances aren't common [goto: very eloquent GP].

I didn't include it in the article because I wasn't 100% confident in saying so, but I'm reasonably sure that medical devices don't use FM. There's not a whole lot of oversight in that area, and most of the standards are (to my knowledge) about software process versus software methods.

Aviation also doesn't really use FM, but that's because they follow DO-178C, and that is a scary scary standard.

It would probably be way cheaper for them to verify their code than to follow these standards, but bureaucracies are bureaucracies.

Also, a little nitpick on your write-up: you should have also talked about type systems. They carry the spirit of formal methods, but rather than construct-then-check for correctness, construction is correct by default. Type systems are widely accepted and show that some of these ideas can gain traction.

I haven't been anywhere near it, but DO-178C is allegedly friendly to formal methods; DO-333 is the formal methods supplement.

AdaCore had a white paper, "Testing or Formal Verification: DO-178C Alternatives and Industrial Experience", that naturally I can't link because I'm on mobile.

I don't know how it's used in practice, but aviation companies dump a lot of money into formal methods research because getting their code certified to meet scary specs is very expensive and they don't want to do it for every bug fix.

Formal methods assume the difficult part of software development is producing code that precisely conforms to a specification. In reality the difficult part is even getting to an unambiguous specification that represents the actual business needs, which are often vague or unknown when development begins.

This led to iterative development. But iterative development does not go well with formal methods. With vague specs and iterative development, the code itself ends up being the spec. If well written, the code even reads like a spec, and during iterative verification you check whether the spec/code conforms to the product owner's actual (often unstated) requirements. Just take user interface design: it is incredibly hard to design a good GUI for a complex application, and it requires multiple rounds of user testing and refinement. But if you actually have a spec for a UI, implementing it is trivial and can be done by the intern.

More fundamentally: if you do have the spec stated completely and unambiguously in a formal language... why don't you just compile the spec? Seriously, why translate it by hand into a different language and then spend a lot of effort proving you didn't make any mistakes in the translation? DSLs and declarative languages seem more promising as ways to bridge the gap between spec and code.

That said, there are obviously domains where formal methods are worth it, because you really want the extra verification of having the logic stated twice, especially in hardware development, which doesn't lend itself to iteration. It is just a small part of software development overall.

The most promising future for formal methods is in type systems, since they are not separate from the code and so lend themselves much better to refactoring.

Absolutely love that Hillel seems to have single-handedly assumed the role of taking formal methods to the people. His Apress title (https://www.apress.com/gp/book/9781484238288) is a wonderful example of technical writing that's somehow accessible, yet deep.


You might also like _Software Abstractions_ (https://mitpress.mit.edu/books/software-abstractions-revised...), by Daniel Jackson. I read the whole thing in two days and then immediately emailed him gushing about how much I loved the book. It's absolutely fantastic.

As some other people have pointed out, you have to pick the domain and application. Airbus controllers (abstract interpretation [1]) and Microsoft software driver verification (SLAM model checker [2]) were ones where formal methods found use in the pre-2010 era. Since then, the domain has led to work that brings Excel the capability to auto-fill data [3], toy demonstrations of how to synthesize programs while verifying them correct [4], verified file systems [5], and now verification and synthesis of smart contract code (whose deployments are expected to be permanent) [6]. This is just a sampling. There are many other uses.

[1] Astree http://www.astree.ens.fr

[2] SLAM SDV https://www.microsoft.com/en-us/research/project/slam/

[3] flashfill https://www.microsoft.com/en-us/research/blog/flash-fill-giv...

[4] Invariant based synthesis https://www.microsoft.com/en-us/research/wp-content/uploads/...

[5] FS verification https://www.usenix.org/conference/atc17/technical-sessions/p...

[6] Solidity verification and synthesis https://synthetic-minds.com

Formal verification is alive and well in ASIC design land. I suppose it's especially suitable here because of the mix of late bugs being very expensive and the designs being fundamentally restricted to not too many fancy tricks (for example, hardware can't be monkey-patched at runtime).

But where I think formal really shines, and saves a lot of time, is in bring-up of new code. Using formal, it's possible to write a simple unit-test-style environment and rely on the formal tool to insert all the tricky stall conditions, etc. It then becomes possible to verify things that are normally tricky both to design correctly and to test.

The problem with formal methods is that any useful program must interact with its environment (customers, filesystems, networks, sensors, etc.) in order to make a difference, and nobody is able to capture, comprehend, or relate to the full specifications of those items and their subsystems. This results in proofs based on a lot of incorrect and incomplete assumptions.

This problem apparently even extends to the model-checking software itself, resulting in systems reported to be correct when the network message saying they are flawed was lost:


My biggest issue with TLA+ is how detached the tooling and syntax are from what developers use daily. PlusCal is only a bit better, but has other shortcomings. A lot of what it offers can be simulated by writing data-driven/parametrized tests with NUnit/JUnit (though that can be a lot of effort and error-prone).

Ideally, I'd like to have a toolkit with less friction and to have more real life examples that show not just how to do the first spec, but how to write code afterwards, update spec with time, update the code, etc etc.

I might be missing something obvious, but from my (small) reading about TLA+ (and the lamport video course), my biggest concerns were:

A: even if you understand the spec you want to write, it's possible to get the implementation of the spec in the spec language wrong (see the Die Hard exercise in the course); isn't there some kind of recursive catch-22 there?

B: even if you understand the spec you want to write, and manage to write the spec you want to write, and get its model checked, etc... Then you still have 0 lines of working code, and (unless I'm mistaken) you're going to have plenty of occasions to not implement the spec properly, and still get all the bugs that you proved would not happen if you had written the code right (you dummy.)

So it seems like the interesting part is the "game of thinking harder about what you want to do", and it is (arguably) done (at least in part) by writing automated tests beforehand, and the tests also have value after you've written the production code.

That at least could explain why the jump to formal method is not natural.

Looking forward to being wrong, though.
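For what it's worth, the Die Hard exercise mentioned in A is the classic jug puzzle (measure exactly 4 gallons using a 3-gallon and a 5-gallon jug), and its state space is tiny. A rough Python sketch of the breadth-first search a checker like TLC performs when you ask it to violate the invariant "the big jug never holds 4" (names and structure are mine, not TLC's):

```python
from collections import deque

def die_hard(small=3, big=5, goal=4):
    """BFS over (small_jug, big_jug) states; returns the shortest list of
    actions reaching `goal` gallons in the big jug."""
    start = (0, 0)
    seen = {start: []}          # state -> action trace that reaches it
    queue = deque([start])
    while queue:
        s, b = queue.popleft()
        if b == goal:           # "invariant violation" -> counterexample trace
            return seen[(s, b)]
        moves = {
            "fill small":  (small, b),
            "fill big":    (s, big),
            "empty small": (0, b),
            "empty big":   (s, 0),
            "small->big":  (s - min(s, big - b), b + min(s, big - b)),
            "big->small":  (s + min(b, small - s), b - min(b, small - s)),
        }
        for action, nxt in moves.items():
            if nxt not in seen:
                seen[nxt] = seen[(s, b)] + [action]
                queue.append(nxt)
    return None

trace = die_hard()  # shortest solution: 6 pours
```

The point of the exercise is exactly concern A: it's easy to mis-transcribe the pour rules (the `min(...)` arithmetic), and the checker will then happily "verify" the wrong system.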

There’s a huge misunderstanding that pops up in every discussion about “formal methods.”

These methods are not perfect silver bullets to guarantee bug-free software forever. They are supposed to be useful engineering tools that help you think and verify your thinking especially for tricky corner cases and interactions.

So then it’s a bit like “Why would I think about the code I’m going to write when I could later make a mistake in implementing what I thought?” Well, even so it has advantages over not thinking.

Not to suggest that “formal methods” are the only way of thinking. They’re just not really what people imagine they are when they criticize them.

Fair enough.

If you have experience, what would be the rebuttal to my fear of the "catch-22"?

What I mean is: how often does it happen that you think about a problem correctly, but fail to transcribe that into the formal language, which either gives you too much confidence in a model that does not work, or makes you feel you have wasted your time on mental self-flagellation?

How do you approach "experimenting" with formal methods to avoid this risk?

Most of the answers already given are good, but there is a simpler one: It's too expensive.

It's so expensive that even the highly regulated and safety aware aerospace industry avoids formal verification for non-essential systems when they are allowed to.

Moreover, formal verification only makes sense in combination with certifying the software on some specific hardware, because it would be a pointless waste of time to run software that is guaranteed to behave well on hardware that makes no guarantees that it will execute the software correctly. Consequently, even small changes in hardware may force you to re-evaluate the whole system. So the maintenance costs of high integrity systems are also very high.

I think the biggest reason is that most projects depend on an enormous mountain of software written in languages like C and C++, where we can't even be sure we aren't indexing past the end of an array or casting a NULL pointer to the wrong type, let alone prove higher-level correctness properties. (To be fair, it's possible to generate C from a higher-level language/theorem proving tool, but few people actually do this.) For instance, most of us use operating systems, compilers, language runtimes, databases, web servers, web browsers, command line utilities, desktop environments, and so on written in not-terribly-safe languages.

Formal verification might be worthwhile for its own sake, but for most projects you just need your own code to not be any more buggy than the rest of the technology stack you use, which you aren't going to re-create from scratch in a formally verified way because that's way too much work and the existing tools are just so useful and convenient and just an rpm/yum/git command away.

The low-hanging fruit right now is to eradicate undefined behavior at all levels of the software stacks we use. This is a daunting task, but I think we'll get there eventually. Once we've banished heap corruption, segfaults, race conditions, and similar problems to the point where developers say "hey, I remember those things used to happen in the bad old days but these days I couldn't cause them if I tried" and we can be sure that if we call a function with the same input it will return the same result every time consistently, then is the time to consider formally proving that the result returned is correct.

(That isn't to say that there aren't some domains now where formal methods are tremendously useful. Also, sometimes one can make a productivity case rather than a product quality case for at least using semi-formal methods; using a language with a strict type system can mean spending less time trying to resolve weird bugs.)

I haven't encountered formal methods in the wild since I left university in 1998. I don't know of anyone in my wider circle of friends using formal methods. Using it is by and large a niche thing in our industry. You see it pop up in places where people are desperate to commit millions in extra work just to avoid disastrous failures. I've never worked in such places. Use of formal methods is a last resort in our industry: you use it when you've exhausted all other options.

That being said, in the same period of time, people have learned to appreciate static typing and unit tests. Unit testing was not a thing until people started pushing for this in the late nineties. Around the same time, computer science students were infatuated with languages that were dynamically/weakly typed. Now 20 years later, type systems are a lot less clunky and those same people seem to be upgrading their tools to having stronger types.

E.g. Kotlin has a new feature called contracts, which you can use to give hints to the compiler, which then is able to use the extra hints to deduce that accessing a particular value is null safe. This allows for methods called isNullOrBlank() on nullable strings (String? is a separate type from String in Kotlin). So, calling this on a nulled string is not an NPE and even better, it enables a smart cast to String after you call this.

Also, annotations are widely used for input validation and other non-functional things nowadays. I don't think annotations were a thing until around 15 years ago (I might be wrong on this). I think Rust is a good example of a language with very strong guarantees around memory management.

Compilers of course have a very big need to be correct since compiler bugs are nasty for their users. So, there is a notion of some of this stuff creeping into languages. Combined with good test coverage, this goes a long way in being minimally intrusive and giving you stronger guarantees. But probably calling this formal methods would be a bit of a stretch.

A few points resonated with me:

Proving that the spec is the right spec is impossible (stealing a point from Roger Penrose). At some layer, you have a transition from an informal idea in someone's head to a formal specification. Proving that the formal specification accurately matches the informal idea is impossible by formal means, since the idea being compared to is informal. Put another way, it is impossible to formally prove that the transition across the informal-to-formal boundary was done correctly.

Formal methods burn up a lot of time, which could be spent on other things - like iterating and refining the design and (informal) spec. If you build the wrong thing, formally proving that what you built matches the formal spec doesn't actually help. You might be better off getting a prototype in front of users/customers and getting feedback, rather than formally proving that you implemented your initial spec or design.

And, one thing the article doesn't mention: Formal verification isn't bulletproof. Quoting from memory: "It's amazing how many bugs there can be in a formally verified program!" (I believe said by Don Knuth.) Worse, the verifier can also have bugs.

Note well: this does not mean that formal verification is useless. It's just not 100% perfect. 95% is still worth something, though. If formal methods are worth doing, they're worth doing even for only 95% of the benefit.

I've always wondered how I can write a specification in a language which is complex enough to cover the problem, clear enough to be verified by the customer, and not have it be as complex as writing the code.

The solution is to write partial specs.

For example, suppose your client wants a piece of code that doesn't crash and sends a message over the network. You can write two partial specs: 1) it doesn't crash; 2) it sends a message over the network.

The partial specs can be individually simple but the program which satisfies all of them will be complex.

This idea of partial specs follows general SE design practice as well, where we describe fine-grained tasks the program has to do (like "when I click this button, this should happen").

Another trick is to create domain-specific languages for certain kinds of specs: there is a DSL for memory-safety constraints and a special DSL for distributed coordination between machines. A spec written in the respective DSL is easy to understand, where a program written in a general-purpose language won't be.
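A toy illustration of the partial-spec idea in Python (all names hypothetical): each property is trivial on its own, and the implementation only counts as correct if it satisfies them all.

```python
def send_message(payload: str) -> dict:
    """Hypothetical 'send': serializes a payload into a wire frame."""
    if not isinstance(payload, str):
        raise TypeError("payload must be a string")
    return {"len": len(payload), "body": payload}

# Partial spec 1: it doesn't crash on any string input.
def spec_no_crash(payload: str) -> bool:
    try:
        send_message(payload)
        return True
    except Exception:
        return False

# Partial spec 2: it really sends the message (here: the frame
# round-trips the payload intact).
def spec_roundtrip(payload: str) -> bool:
    frame = send_message(payload)
    return frame["body"] == payload and frame["len"] == len(payload)

# Each spec is simple in isolation; the implementation must meet both.
inputs = ["", "hello", "ü" * 100, "\x00binary-ish"]
assert all(spec_no_crash(p) and spec_roundtrip(p) for p in inputs)
```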

Any message? Or a specific one?

I mean, at my work, a lot of our customers don't even know what our program needs to send to some other program until months after the change needs to be live...

If I could write a specification that satisfied all those properties I'd be happy if it was even remotely as simple as writing the code.

Then learn TLA+ :)

Which universities teach formal methods?

- q=formal+verification https://www.class-central.com/search?q=formal+verification

- q=formal-methods https://www.class-central.com/search?q=formal+methods

Is formal verification a required course or curriculum competency for any Computer Science or Software Engineering / Computer Engineering degree programs?

Is there a certification for formal methods? Something like for engineer-status in other industries?

What are some examples of tools and [OER] resources for teaching and learning formal methods?

- JsCoq

- Jupyter kernel for Coq + nbgrader

- "Inconsistencies, rolling back edits, and keeping track of the document's global state" https://github.com/jupyter/jupyter/issues/333 (jsCoq + hott [+ IJavascript Jupyter kernel], STLC: Simply-Typed Lambda Calculus)

- TDD tests that run FV tools on the spec and the implementation

What are some examples of open source tools for formal verification (that can be integrated with CI to verify the spec AND the implementation)?

What are some examples of formally-proven open source projects?

- "Quark : A Web Browser with a Formally Verified Kernel" (2012) (Coq, Haskell) http://goto.ucsd.edu/quark/

What are some examples of projects using narrow and strong AI to generate perfectly verified software from bad specs that make the customers and stakeholders happy?

From reading through the comments here, people don't use formal methods because: cost-prohibitive; inflexible; perceived as incompatible with agile/iterative methods that are more likely to keep customers (who don't know what formal methods are) happy; lack of industry-appropriate regulation; and the cognitive burden of often-incompatible shorthand notations.

Almost a University of New South Wales (Sydney, Australia) alum here, and this is the biggest difference between the Software Engineering degree and Comp Sci: there are two mandatory formal methods courses, plus additional optional ones. They are really hard, and a lot of the SENG cohort ends up graduating as COMPSCI, since they can't hack it and don't want to repeat formal methods.

> Before we prove our code is correct, we need to know what is “correct”. This means having some form of specification

I suspect that most bugs are introduced because programmers do not have a clear idea of what their code is supposed to do. There are simply too many levels to keep track of at the same time. E.g., in C you are manually keeping track of resource ownership and data representation at the low-level, while at the same time keeping the overarching architecture present in your head. The latter typically has several moving pieces (different threads, processes, machines) and keeping a clear picture of all possible interactions is a nightmare.

Formal specifications can help with this, by first specifying the full system and then deriving precise local (i.e. modular) specifications. In my experience, programming is typically easy once I have a problem broken down into isolated pieces with clear requirements. It's just that most systems are so large that getting to this point takes serious effort.

Here you run into a cultural problem. You need to convince people to put effort into something that they do not implicitly regard as valuable. It makes sense to me that people aren't using formal methods, unless there is very little friction when getting started.

The unasked question about formal methods is: who is the customer of the design? To accept a formal design, you must also be bound by the constraints and timelines it specifies, which reduces your flexibility.

The perception the OP or someone here might correct is that formal methods frustrate corner-cutting and other product-level optimizations that return margins, optionality, performance bonuses, positioning and leverage, "moving the needle," and other higher-order fudges that are realities of human organization.

One example is how producing a security design to provide assurances to exogenous stakeholders becomes a forcing function on feature commitments. e.g. "if you do this feature, you need these controls, which means if you don't have those controls, you aren't delivering the feature by date X, and now there is a paper trail that you know that. have fun at the customer meeting!" This is a crude example, but makes the point.

From a product and exec perspective, that forcing function knocks the ball out of the air / collapses the wave function in regard to "who knew what when," as it relates to making feature commitments to customers, sales people, and channel partners, etc. These are all people whose interests are not necessarily aligned. Much of enterprise tech business is keeping that equilibrium unstable in a kind of strategic fog while you build a coalition of stakeholders that gets your company paid. A methodology that forces decisions in those relationships will get mutinied by sales pretty fast.

(It's often hard to tell whether people are solving these higher order alignment problems or just being dishonest, and many of them don't even know for themselves.)

In this sense, formal methods solve engineers problems but not necessarily business problems. Playing the cynical customer, I don't care how much marginal residual risk there is, I only care who holds it. Above a certain level, detail just means "caveat emptor."

It's a critical view of org behavior, but finding a customer for the models themselves (other than compliance) would go a long way to getting them adopted.

Surprised that no one has mentioned this yet https://www.joelonsoftware.com/2005/01/02/advice-for-compute...

> After a couple of hours I found a mistake in Dr. Zuck’s original proof which I was trying to emulate. Probably I copied it down wrong, but it made me realize something: if it takes three hours of filling up blackboards to prove something trivial, allowing hundreds of opportunities for mistakes to slip in, this mechanism would never be able to prove things that are interesting.

Do people read "The Mythical Man-Month" anymore? The answer to questions like these is in the book. First, there is no "Silver Bullet" that makes software development easy. Second, adding a layer of additional effort like Formal Methods necessarily either (a) increases development time significantly or (b) exponentially increases headcount.

I'm not suggesting that you avoid Formal Methods. It might really fit your needs -- for example, if you need high-quality code and can afford the expense, Formal Methods may fit the bill. The important thing is that not all development projects have the same needs.

> there is no "Silver Bullet"

Your second paragraph kind of walks back on this, but I think it's still worth linking to this essay:


In particular: ...[he's] welcome to keep shouting that tools won’t help, better languages won’t help, better process won’t help, the only thing that’ll help is lots and lots of unit tests. Meanwhile, we’ll continue to struggle with our bugs, curse our model checkers, and make software engineering just a little bit better. We don’t believe in silver bullets.

Again, your second paragraph already kind of agrees with this point, but the brilliant rhetoric in the essay is actually true: "no silver bullets" is a reason for, not against, the use of formal methods (and any other quality assurance mechanism we can get!)

> adding a layer of additional effort like Formal Methods necessarily either (a) increases development time significantly or (b) exponentially increases headcount.

It really depends. Especially in the case of formal methods for specification (as opposed to implementation formal methods), formal methods can save substantial amounts of time and manpower. And there are already a lot of success stories, and not just in safety-critical industries.

This is a side effect of receiving a computer science degree and going into a programming job: you'll learn the skills to program, but all the CS theory you won't use should really have been more practical SE material taken from classics like The Mythical Man-Month, Design Patterns, etc.

I think you're making a good point, but your tone is unnecessarily strident.

I'll disagree, I don't think they're making a good point.

GP implies that formal methods always add time to projects, which is going to be true for:

1) Projects that are building the wrong thing and spend time verifying the wrong thing (failed validation)

2) Projects with people inexperienced in formal methods (most projects today)

The latter is just a matter of experience. Once people get used to thinking about things in a more formal way, they'll get faster. Additionally, they'll learn which parts need to be shown formally and which don't (or which parts they can show via their type system versus those which need to be modeled outside the language and type system, etc.). I don't need to formally prove that my program writes message X to memory block Y, it can be verified by: inspection, tests. I do want to prove that my two processes correctly exchange messages (since the nature of the system is such that we needed to write our own protocol rather than being able to rely on a pre-built and verified/validated one). But here I'm proving the protocol, I still have to test the implementation (unless my language permits me to encode proofs in a straightforward way).

The former is a separate issue from formal methods. It's failed validation efforts which is solved (or mitigated) by something else: lean/agile approaches to project management (versus delaying feedback from customers, and consequently delaying validation activities).

Well, if I read you correctly, he IS making a good point:

> 2) Projects with people inexperienced in formal methods (most projects today)

which is exactly what matters and what adds a lot of time to dev.

1) cannot be addressed by formal methods, and it is one of the most important things in building software.

It’s a temporary problem, not a permanent problem. People need training and practice. I’d expect most people were slower when first using unit tests or TDD than they are today. It takes time to build that experience into the industry but it’s hardly an impossible problem. Especially as the tooling (PlusCal, Alloy) for lighter weight formal methods gets better and more accessible.

Most software complexity is in interactions between systems. Container frameworks, operating system, hardware drivers, random libraries, 3rd party APIs, cloud provider APIs, client browser, etc.

Unless all of these are specified in a single language / framework, you can't really apply formal methods to the system. Instead you'll only be applying formal methods to individual library functions / features. At least for code that isn't security-critical, these are already quite robust thanks to the use of strong typing (especially with powerful typecheckers for algebraic data types) and unit testing. Formal methods are merely an improvement on advanced typecheckers and unit testing frameworks.

The real difficulty is in verifying complex interacting systems - the kind of thing for which you have integration tests, which are quite difficult to write in an exhaustive way and catch edge conditions.

Note also that even for security-critical code, formal methods don't really guarantee safety either, since many classes of attacks are based on timing attacks, especially cache-based timing attacks.

Part of the problem is that TLA+ syntax and tooling are horrendous. Creating something more modern and widely adoptable is key for mass-scale adoption.

TLA+ is an ugly language, just like PlusCal. I'll never understand how people can come to the conclusion that it's fine to have syntax like:

* \A p \in people

* acc = [p \in people |-> 5]

* (-- ;) for wrapping PlusCal

* /= for inequality

* /\ for and

* seq1 \o seq2 for append

* and so on

All of these are from the Practical TLA+ book.

Honestly, TLA+ (or PlusCal) looks very impractical. If I want to use mathematical notation I'll do it on paper, not write some Frankenstein version of it in Eclipse.

At this point the whole syntax should be written off as a failed experiment and they should kindly ask Anders Hejlsberg or Guido van Rossum for help.

That's not the syntax, that's just the ASCII representation. The syntax, which is what you get when the Toolbox presents your spec in pretty-printed form, is:

  * ∀ p ∈ people
  * acc = [p ∈ people ↦ 5]

  * ≠ for inequality

  * ∧ for and

  * seq1 ∘ seq2 for append

That's fairly standard mathematical notation. In fact, one of the things I enjoyed about TLA+ is that I didn't really need to learn the syntax, as it's so standard. Even the non-standard bits (like alignment that can replace parentheses) are quite intuitive.

And if you say, well, I still have to type the ASCII, then you're missing the point. Specifications are not internet rants where you dump a lot of characters on the screen. They are usually quite short, and you spend nearly all your time thinking or reading them.

A good ligature font solves those...

I find TLA+ syntax and tooling to be far better than that of almost all programming languages.

E.g., here's a TLA+ spec I recently wrote: https://pron.github.io/files/TicTacToe.pdf

The syntax is as close to standard informal mathematical notation as formality can afford.

> The syntax is as close to standard informal mathematical notation as formality can afford.

which is not very helpful to the average non-academic software engineer

The entire language is based on specifying using simple mathematics (little more than high-school level), so this is the appropriate notation; a more programming-language like notation would be confusing as TLA+ is not programming. Those who have not learned basic maths notation in college can learn it in an hour (and automatically learn how to read maths in the process). The language is much simpler than Python. The entire language (except the proof language, which most people won't use at all) including all of the "standard library" is described in just a few pages: https://lamport.azurewebsites.net/tla/summary-standalone.pdf

I think it's probably also a programming-language thing. When I write Haskell, the syntax of the language and the construction of the abstraction hierarchy, with all the simple and well-specified rules you have to follow, make it easier to see what to prove and why. Also, with property-based testing, you are essentially substituting proofs with anecdotal, empirical evidence. I feel that if I had the option, there are a lot of places where I would just naturally prove properties (I know about the practical difficulties of a lazy language... but I think they are a bit overstated; see Richard Eisenberg's thesis on Dependent Haskell).

I don't think this is obvious in Python, at all. You are programming without thinking much about invariants and properties.

> This means having some form of specification, or spec, for what the code should do, one where we can unambiguously say whether a specific output follows the spec.

This seems to ignore a vast majority of possible software. It's easy to confirm/test whether a function, with no state, takes a certain input and produces a certain output. You can write that in a spec. You can unit test that. But that ignores the whole concept of state; the whole state of the system. If X happened yesterday, Y happens now, and X and Q are true, and it's a Wednesday and a full moon, then the output is Z. By the time you've encoded that and every other variation into a spec, congratulations, you've written the software!

Even aircraft software doesn't require formal methods. I have a copy of DO-178C on my laptop; formal methods are mentioned once in the glossary, once in Appendix A, and once in Appendix B.

Formal methods would be great, but most people don't have extremely complex state machines where lives depend on it being right.

I had a section on "high-availability software" that I cut due to not being able to research it thoroughly. For the most part cars and medical devices and stuff don't use FM because there's pretty much no oversight. Aircraft, though, are a super interesting case here. I'll be honest, I'd probably trust code that's gone through DO-178C more than I would trust code that's been formally verified.

I _believe_ there's been a push to add formal methods to avionics, mostly because it'd potentially save money. But any FM stuff has to go through DO-330 and DO-333, which are just as rigorous and intense as DO-178C. The only stuff that has so far is, unsurprisingly, very expensive and very proprietary.

Having worked on aircraft software, including some very safety critical stuff, I wouldn't necessarily trust code that went through DO-178(A|B|C).

There's an unfortunate tendency (just like CMMI and other certifications needed/desired in that industry) to achieve the certification on paper by backfilling. DO-178 requires a formal spec document (NB: Not the same as formal methods) with traceability to the design and test procedures. Oops, we let it get stale (Read: Didn't keep up to date). I've seen people turn that into 5k pages with a good enough looking RTM that no one was ever going to read, and it got certified.

A system I recently worked on (I left the project; it was toxic, or maybe it was me) had a "design" document that consisted of running the code through Doxygen. While that did produce documentation (the code was reasonably completely annotated), it did not produce a design document, which should include rationales and things like, I don't know, state transition diagrams or similar to illustrate the design, not just the structure. And that system is flying today (has been for a while).

These things turn into games by the developers and managers, like hitting 100% code coverage in testing (which doesn't mean you've actually verified/validated anything).

Jtsummers, I don't disagree with you. The reality is the DO-178C is no different than PCI or other compliance standards, and there are ways to game it with docs.

With that said, if formal methods were really required to fly the plane then they would add it to the DO-178[D] without hesitation =]

Can’t speak for aerospace, but at least for automotive systems, the functional safety standards that dictate what you have to do for the most critical parts of the system are kind of a joke. They amount to a lot of test coverage, lint tooling, and extensive requirements management that manages to still be full of gaping holes. Formal methods are suggested on a “wouldn’t it be nice” level, but nobody thinks they can afford it. There are a lot of reasons for this; inertia, inexperience with formal methods, and the difficulty of the tools comes to mind. None of them are acceptable excuses, especially as we are asked to turn more and more control over to increasingly sophisticated ADAS (eventually autonomy) systems.

> I'll be honest, I'd probably trust code that's gone through DO-178C more than I would trust code that's been formally verified.

Well... depends on the company/team :-)

But I think the instinct is fundamentally reasonable, because even the most abusive interpretations of DO-178C tend to demand a lot more attention to quality than even very well-run software projects in other industries.

> For the most part cars and medical devices and stuff don't use FM because there's pretty much no oversight.

This is starting to change, maybe. The automotive companies have been snapping up verification people (and letting them work on verification).

> But any FM stuff has to go through DO-330 and DO-333, which are just as rigorous and intense as DO-178C. The only stuff that has so far is, unsurprisingly, very expensive and very proprietary.

So, it's been a while, but I seem to remember DO-333 being written in such a way that if Tool X went through the process, and you could produce a deep embedding of Tool Y's logic into Tool X, then you could piggyback on Tool X's certification by using the embedding. Which is still a lot of work, but hopefully substantially easier than getting a new tool approved from scratch. Also, I could just be mis-remembering or confusing a dream for reality... it's been a while.

For automotive software, ISO 26262 is the equivalent of RTCA DO-178 in avionics. ISO 26262 highly recommends that ASIL C- and D-classified systems utilize semiformal and formal verification, among other techniques, to verify software unit design and implementation.

That is because the formal methods supplement of DO-178C is RTCA DO-333. There you have about a hundred pages concerned with the application of formal methods.

Good to know. I will buy this doc and give it a read, thanks.

Bitcoin is at a point in valuation where a huge effort should be put into formal verification: a bug can be worth hundreds of billions of dollars. One problem is that it's already written in C++, a language with incredibly complex semantics for program verification (and it's now too hard to move away, so the C++ program has to be proven correct). The main blocker is what others have written as well: formal verification is very hard to do.

I agree, and there are efforts to develop formally verified implementations of Nakamoto consensus (which I'm involved in). We published a paper a year ago about our efforts and first results [1][2], and I'm currently extending that work for my Master's thesis. Others are also building on top of that work [3].

We're nowhere near having a verified Bitcoin implementation (much less verifying the existing C++ implementation), but we're making steady progress. One could reasonably expect having a verified, deployable implementation in 3-4 years if people continue working on it.

It remains to be seen whether people would switch to that from Core (very unlikely) or Core would adopt it as a component to replace its existing consensus code (interesting possibility, but also unlikely).

[1] - https://github.com/certichain/toychain

[2] - http://pirlea.net/papers/toychain-cpp18.pdf

[3] - https://popl19.sigplan.org/track/CoqPL-2019#program

First you mention Nakamoto consensus, then you mention "existing consensus code". Are you including Bitcoin script when you say consensus? It seems like a huge task to formally model the whole consensus system, including OP_CODESEPARATOR. How would you replicate something like the BIP-0050 problem?


> Are you including Bitcoin script when you say consensus?

Currently, no. Our model considers "consensus" to be agreement on the ordering of transactions. We don't yet interpret the transactions.

However, agreement on the state follows directly provided nodes have a function (in the "functional programming" sense), call it `Process`, that returns the current state given the ordered list of transactions. If all nodes have the same function, they will have the same state.

The BIP-0050 issue (in my understanding) is that different Bitcoin versions had different consensus rules, i.e. their `Process` functions were different. As far as I know, Bitcoin script is deterministic, so it should be possible to define it functionally. We haven't begun looking at this, though, and I don't know of anyone that has -- some people are working on verified implementations of the EVM, however.
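A toy sketch of the `Process` idea (my own illustration, not from their model; the transaction format is made up): if every node applies the same pure function to the same ordered transaction log, all nodes necessarily reach the same state. BIP-0050-style divergence is exactly what happens when nodes' functions differ.

```python
from functools import reduce

# Hypothetical sketch: transactions are (account, delta) pairs, and
# "state" is a mapping from account to balance. `process` is pure and
# deterministic, so agreement on the log implies agreement on state.
def apply_tx(state, tx):
    account, delta = tx
    new_state = dict(state)
    new_state[account] = new_state.get(account, 0) + delta
    return new_state

def process(txs):
    # Fold the ordered log into a state, starting from empty.
    return reduce(apply_tx, txs, {})

log = [("alice", 50), ("bob", 30), ("alice", -20)]
# Two independent "nodes" that agree on the log agree on the state.
assert process(log) == process(log) == {"alice": 30, "bob": 30}
```

Real script interpretation is vastly more complex, but as long as it is a deterministic function of the ordered transactions, it slots into this shape.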

> It seems like a huge task to formally model the whole consensus system, including OP_CODESEPARATOR.

It is a huge task, yes, but it can be done modularly such that reasoning about script interpretation can be almost entirely separated from reasoning about Nakamoto consensus. (This is a bit subtle, since the fork choice rule needs to check if blocks are valid, which relies on script interpretation.)

I don't really understand OP_CODESEPARATOR. As long as it is deterministic, it shouldn't be an issue.

It looks interesting, but I believe even verifying 100-200 lines of the real Bitcoin consensus code would be more useful. All the Bitcoin Core developers know that the Bitcoin implementation is a mess, but they understood that it's just not going away. As for Bitcoin script, there's a plan to upgrade it to Simplicity (https://blockstream.com/simplicity.pdf), but even then, the old code will stay (there's a consensus that there won't be any hard forks for a long time unless really needed).

The last bug was in the caching code (a transaction could slip into a block twice, which could lead to inflation), and it was something that only formal verification of inflation in the current code would catch.

My biggest worry is with digital signatures in the Bitcoin blockchain: if there's a bug in it, it's over, building trust again is extremely hard.

Almost 10 years ago on the bitcoin-dev IRC channel, I was asking the devs to consider having the reference specification formally specified, so that all clients are at least supposed/expected to comply with it. They were not interested...

The problem is that there's no such thing as a formal specification when running C++ code, and I believe Coq runs too slow to be practical. We have Rust now as an alternative language, but it's too late to rewrite the consensus code, so we have to work with legacy code.

This is why currencies like Cardano were created.

Cardano's proof-of-stake model has stronger assumptions than proof-of-work (if that weren't the case, and the authors could prove it, they could work on improving Bitcoin itself... but I think they preferred to make money).

New altcoins just don't solve the problem of increasing the security of Bitcoin. Schnorr signatures are a great example of improving the scalability of the system without making the cryptographic assumptions stronger (actually making them weaker).

I think all formal verification methods are built from the bottom up, and verification begins at the standard C library. The bigger your program gets, the more dependencies you have to verify.

If your tech platform does not have a long history of formal methods, you will have a hard time building the verification chain on your own.

In JavaScript, for example, you would have to formally verify the whole browser, and all its dependencies, before you can meaningfully verify the code running inside it. That's very hard.

The main reason is that formal methods are still not cost-efficient, not even close.

The problem is similar to statically typed languages. Of course, having a sound type system can help you avoid a lot of stupid mistakes. But the key point is that a modern statically typed language does not lose much velocity. Sometimes it's even faster, because there are better IDEs out there for auto-completion, importing, and much other stuff. I switched from JavaScript to TypeScript only to find I write code faster without losing anything.

Before we can make formal methods widely accepted by the industry, we probably need dependently typed languages first. An excellent example is Idris: https://www.idris-lang.org/ It's a dependently typed language, and it can also apply formal methods to prove theorems. With its IDE, it is more like pair programming with the type system: it can really fill in a lot of code based on the types. This kind of system could have huge potential to be much more powerful than, say, IntelliJ IDEA with Java.

After that you could write code much faster, and optionally write proofs for the core domain of your system if you like.

I have used static analysis and formal methods in production code. Mainly Astree: https://www.absint.com/astree/index.htm

When you have bought the tools to verify code in some safety critical project and know how to use it, it's easier to use it in less critical software too. You will soon hit the wall where formal correctness loses to time constraints and new features. You end up verifying only the code that is used repeatedly in multiple projects and does not change.

Formal verification is not a substitute for testing and code review. You can verify that the code is correct, but you don't know if you have the correct code.

The answer is: Because correctness of programs produces only limited value for normal use cases. It's more important to add more features than it is to produce solid code. If you don't have code reviews where all new code and changes are being read and analyzed by others (increasing the cost of the software 5-10 times) your interest in correct code is limited.

because (as the fine article says) "proofs are hard"

we don't do hard. we don't use APL. we don't use z or coq or isabelle or any number of things, because we don't like hard things.

proofs are hard. it's that simple.

I love the concept of formal methods. As someone who pushes for more determinism in software(yay state machines), I have long dreamed of using them to prove correctness. And then I go look at the goofy, domain-specific language I'd have to learn, roll my eyes, and get back to doing 'real work'.

Formal methods will win when I can point the tools at a mess of C/C++/whatever code and have the tool figure out the rest with minimal overhead from me. Sure, sprinkling a few hints in my code would be fine, but trying to implement my design in a formal language spec and then... convert it to my chosen language, with the chance that I'll introduce bugs, seems like a waste of time.

I found this quote a while ago while researching state machines. The author is using a moderately complex hypothetical system as a reason not to use a state machine.

“... For instance, you are asked to create an online order system. To do that you need to think about all the possible states of the system, such as how it behaves when a customer places an order, what happens if it is not found on stock or if the customer decides to change or delete the chosen item. The number of such states could reach 100 or more.”

This is why so much software is broken. The author just admitted that he can’t be bothered figuring out how the system he’s about to build should actually work. That’s a deeply strange mindset to me.

Well, state explosion is a real thing, which is why we have things like hierarchical state machines. There's almost always a trade off between capturing everything in a state variable and allowing a few flags and conditionals. The classic example is a keyboard state machine where you don't actually have states for each key pressed, you just store the key in a variable.
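The keyboard example can be sketched in a few lines (my own illustration of the trade-off, not from any particular codebase): instead of one control state per key, keep a small set of control states and store the pressed key in an "extended state" variable.

```python
# Hypothetical sketch: a keyboard state machine with two control
# states (IDLE, PRESSED) plus an extended-state variable for the key,
# rather than one explicit state per possible key (state explosion).
class Keyboard:
    def __init__(self):
        self.state = "IDLE"       # control state
        self.current_key = None   # extended state: which key, if any

    def press(self, key):
        if self.state == "IDLE":
            self.state = "PRESSED"
            self.current_key = key

    def release(self):
        if self.state == "PRESSED":
            self.state = "IDLE"
            self.current_key = None

kb = Keyboard()
kb.press("a")
assert (kb.state, kb.current_key) == ("PRESSED", "a")
kb.release()
assert (kb.state, kb.current_key) == ("IDLE", None)
```

The cost of the shortcut is that model checkers and reviewers can no longer enumerate the states by name; the benefit is that the machine stays small enough to reason about at all.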

"Before we prove our code is correct, we need to know what is 'correct'. This means having some form of specification, or spec, for what the code should do, one where we can unambiguously say whether a specific output follows the spec"

What's the difference between the specification and code at that point? If it's that detail oriented, then isn't the chance of bugs in the specification at least as likely as being in code?
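One classic way to pose this question concretely (a sketch of my own, in Python): for some problems a spec is just a checker, and the checker can be far shorter than an efficient implementation. Whether that gap exists for real business systems is exactly what's being asked.

```python
from collections import Counter

# Spec for sorting, as an unambiguous checker: the output must be
# ordered, and must be a permutation of the input. Two lines of
# "what", versus however many lines of "how" an efficient sort needs.
def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

def meets_spec(inp, out):
    return is_sorted(out) and Counter(inp) == Counter(out)

assert meets_spec([3, 1, 2], [1, 2, 3])      # correct output
assert not meets_spec([3, 1, 2], [1, 2])     # dropped an element
assert not meets_spec([3, 1, 2], [2, 1, 3])  # not ordered
```

For domains where the "what" is as intricate as the "how", the objection stands: the spec converges on the code, and bugs can live in either.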

Back when I did my CS degree, we were taught the "B-method": https://en.wikipedia.org/wiki/B-Method

The course dropped in popularity with students and the university eventually dropped it.

Looking back on it, it really should've been a mandatory course for all software engineering students.

In the 90s, z-notation was taught in many UK CS courses.

Since I started designing before coding, I am at least 2x more productive, and rework was reduced to almost zero. Anyway, it is difficult for us to understand things really abstractly, and to measure the future gain of not-so-fun activities.

Last week I took part in a formal methods workshop where one of the topics discussed was "why we don't have formal specs for most things" (which is a more modest ask than verification). Many things we came up with were what you'd expect: additional cost (which has to be taken into account when changing programs), missing incentives, lack of awareness of FM existence, lack of training, unpolished tooling.

Other, less obvious ones included the inherent difficulty of writing the specs. Namely, FM is about verifying that a system model conforms to a specification. For this to bring value, the specification has to be so much simpler than the model that we can deem it "obviously correct". But such specifications can be extremely difficult to find. On one hand, they are abstractions of the model, and finding good abstractions is one of the hardest parts of software engineering. Another way to think about it is Kolmogorov complexity; at some point you cannot "compress" the description of a problem any further, i.e., some problems are just difficult by nature.

On the incentives front, one of the interesting viewpoints brought up was that having formal specs for your product can make you more liable for defects, and that some (in particular, hardware vendors) might prefer to have wiggle room there. E.g., consider the Spectre/Meltdown fiasco and Intel's response. Another thing brought up there was that software vendors often don't directly feel the cost of the bugs in their software, at least not short-term. This is different in hardware; e.g. ARM uses FM extensively, though I don't know if they commit to specs publicly. The RISC-V architecture comes with a formal spec, though.

On the other hand, I think that many of the current trends in the computing world are making FM more aligned with the incentives of a core part of the industry. The cloud brings centralization, with few large companies (Amazon, Google, etc) providing a lot of complicated software infrastructure that's critical to their success. An outage of an AWS service directly costs them significant money, and can easily completely halt the customers' operations. Moreover, they operate in an open and potentially hostile environment; a corner case software bug in MS Office wasn't a big deal, and maybe could be "fixed" by providing the customer a workaround. But such a bug in a cloud service can become an exploitable security vulnerability. And in fact you do see AWS having a dedicated formal methods team. Also, the TLS 1.3 development included formal verification. On the other hand, AFAIK Google doesn't employ FMs that much, even though they are in the same game as Amazon with GCP; so there's definitely a cultural component to it as well. But I still believe that it's likely that we end up with a small core of companies that provides a lot of critical infrastructure that heavily uses FM, and that we see little adoption of FM outside of this circle.

As for the article, I agree with most points made, though it unsurprisingly has a bias for using TLA with TLC. I agree that TLA/TLC can often be the best solution, but there are caveats. For example, models can be awkward (e.g., recursive data structures), the specification language is limited, and "just throwing more hardware at it" doesn't really work that great for exponential-time problems.

It's simple really: people are happy to pay $ for software that kinda works most of the time as long as it (most of the time) solves the problem they want solved. Always follow the money!

Does anyone here have any experience using semi-formal methods, like 'cleanroom' techniques? The commentaries I've heard (e.g. [1]) tend to be very enthusiastic, but I don't have enough experience to judge their credibility so I'd love to hear what others have to say on the matter.

[1] http://infohost.nmt.edu/%7Eal/cseet-paper.html

At one point a while back I emailed Stavely to learn more about how it's changed over time, but he's been out of the software world for over a decade now. Of the people I've met who've been on cleanroom projects, the _tentative_ impression I got is that it's fast and it works but it's really taxing on the people involved. One of these days I want to do more careful interviews and the like.

In some areas it is almost impossible to write complete specs.

CSS was originally designed to be an end-user spec language for changing a website as they like. But layout is so complicated that it takes a lot of experience to define the position of everything dynamically, as requested.

And now people are using programs (JS) to do the job of CSS, to some extent giving up on writing specs for layout, even with years of experience with it.

The hard thing is not the language itself, it is often the domain.

The cost / quality trade off isn’t worth it for most applications. In fields where it is (aerospace, smart contracts, etc.) then formal methods are used.

Not a very exciting answer, but I think it's just that; cost vs. benefit.

It's also a lot easier to do change management on an abstract model of the software you're working on than the design of that one program itself, because the design choices made by other sister programs can be immensely helpful. It's the inversion of making decisions in a vacuum. What do you call that?


Edit for "random" keywords: model-driven engineering with domain-specific meta-modelling languages

Am i not already using it when I Ctrl-Space in IntelliJ?

It is because formal methods rarely come out ahead in a cost-benefit analysis.

At some point it is less expensive to just build the car and see if it rolls.

There is a lot of work done now with formal methods used for smart contract verification. MakerDAO formally verified correctness of their contracts last year: https://github.com/dapphub/k-dss/

> In fact, the vast majority of distributed systems outages could have been prevented by slightly-more-comprehensive testing. And that’s just more comprehensive unit testing,

Key takeaway for me as far as code verification goes.

Design verification reminded me of a few subjects I studied in college.

Take that claim with a grain of salt. More comprehensive testing still requires deep expertise in resilience/reliability to even understand what to test for. And you can't have that unless you both study the domain and experience firsthand the sheer number of problems such systems have in the wild.

For example, you might assume there is no difference if you establish TCP connection to a port 12345 or to a port 12346 of the same server. But there is, they might belong to different buckets in a network stack somewhere and one bucket might have too many packets in it and be slow and overloaded, while another isn't and be perfectly fast. This could easily cause outages. Distributed systems are strange like that.

It takes too long.

I think the easy answer is: "because life isn't formal"

There is an intersection.

The places you might need defined and formal methods to make people happy and the places you might actually want to sling some code to make people happy? That intersection point?


When I was at university we did a module on formal methods, including writing Z schemas. Out of a class of 30, only two people really understood them.

That's why they are not generally used: no one (or at least not enough people) has a clue how to create them.

I doubt they are taught on general Computer Science degrees anymore.

Does life-critical code, like NASA's, use these types of methods to validate the system?

it's hard and mostly doesn't justify the cost.

Because they're slower than brute forcing your way.

Excellent E & O insurance policies.

Because the Agile Manifesto.

> Why Don't People Use Formal Methods?

Because they're an emerging and unproven technology? Why don't people rewrite everything in Rust? That would surely be a lot more straightforward than adopting the sorts of "formal methods" where you only write a handful of (C-language equivalent) LOC per day, across the software industry! So why doesn't it happen? There's your answer.

Formal Methods are 30+ years old.


AI is 30+ years old too, and formal methods (especially things like solving SAT, or theorem proving) used to be considered AI. I'm not quite sure what your point is; just because something has a 30+ years history doesn't somehow make it a mature, usable tech.

Static type checking is abstract interpretation. If you program in Java you are using formal methods.

So we do use formal methods! Problem solved.

The proofs you produce are probably weaker than those the author wants
