
The Business Case for Formal Methods - whack
https://www.hillelwayne.com/post/business-case-formal-methods/
======
Tainnor
I've long been intrigued by pure FP, advanced type systems and proof
assistants from an academic point of view, but haven't yet been convinced that
the effort involved in learning such systems/methods pays off in most cases
(even setting aside the costs you still have once you already know the
methods well). I will still try to learn more about this, but without any
strong conviction that it needs to be immediately useful.

By contrast, I've only more recently started hearing more and more about model
checkers such as TLA+. The case studies in this blog post serve to give a good
motivation for why such systems might be useful (as well as honestly
presenting drawbacks; something that advocates of e.g. PFP rarely do). I would
love to use something like that one day.

As an aside, I still think it's a disgrace how irrelevant empirical research
is to most aspects of today's software development. Another blog post by the
author cites a paper claiming that very simple testing, especially of error
conditions, could prevent a huge chunk of production bugs, but things like
that are rarely talked about. The whole profession is unfortunately amateurish
and always obsessed with fads and trends instead of taking a more analytic and
empirical approach, which makes me sad.

~~~
keith_analog
The Redex modeling tool [1] is an interesting point in this space. Its
specialty is in making executable specifications of domain-specific languages.
Redex models can be tested by the unit-testing framework of the Racket dialect
of Scheme. There's plenty of excellent documentation on Redex, and it has some
neat visualization tools. I've gotten plenty of value from using it for my
research over the years, but I suspect it'd be useful outside of academic
research too.

[1] [https://redex.racket-lang.org/](https://redex.racket-lang.org/)

------
kats
Might want to show some examples in the "Why Not To Use Formal Methods"
section too.

Somewhat old numbers from seL4: [1]

- 2,500 lines of C code
- 100,000 lines of proofs
- ~7.6 person-years of full-time work (with a lot of involvement from PhDs)

Interactive theorem proving definitely will not scale to significantly-sized C
codebases anytime soon.

[1]:
[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5597727/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5597727/)

~~~
nine_k
Maybe C is just not the best tool for such code.

I would suspect that (subsets of) Ada, or Rust, or maybe OCaml would be more
fitting while staying on the industrial side. (Likely Haskell or Idris would
be even easier targets, but their industrial adoption is quite limited.)

~~~
gindely
How does the industrial adoption of OCaml compare to Haskell? (The relative
placement of Ada, Rust and Idris seems apparent.)

~~~
nine_k
I know two large companies using OCaml to a serious extent: Jane Street
(obviously) and Facebook (as Reason).

I can't remember any large company where Haskell is front and center, but a
number of smaller companies doing so exist. There was a post about running
Haskell in production here on HN recently.

------
millstone
The case studies are all race conditions in distributed systems. Is TLA+
useful for complex serial systems like compilers, etc.?

~~~
hwayne
It's possible to use TLA+ for complex serial systems, and it can definitely
help there. The thing is that TLA+ is designed to be good at modeling
concurrent problems, which have a lot of specific challenges you don't see in
serial systems. So if you know your system is serial and nondeterministic,
then you're probably better off with something that focuses on that part of
the design space.

To paraphrase Edmund Clarke's Turing lecture[1], the problems of verifying
software include "floats, strings, user-defined types, procedures,
concurrency, generics, storage, libraries…" If you're not worried about
verifying concurrency in your tool, you've got more resources to devote to
verifying the other things.

That's also a space I don't have as much familiarity with, so I don't know
what the best tools are there. My vague understanding is that at that stage
the problem "simplifies" enough that it's a lot easier to verify the code
itself.
Like how CompCert isn't just a model of a compiler, it's an actual correct
compiler. So you can apparently work much closer to the code when verifying
serial things.

At one point I got a bunch of people in FM to write provably correct versions
of Leftpad.[2] Maybe one of those would work best for you? Note also that one
of the examples proves Java code correct via inline JML annotations.

[1]:
[https://amturing.acm.org/vp/clarke_1167964.cfm](https://amturing.acm.org/vp/clarke_1167964.cfm)

[2]: [https://github.com/hwayne/lets-prove-leftpad](https://github.com/hwayne/lets-prove-leftpad)

~~~
zozbot234
Do you really mean "nondeterministic"? AIUI, what people call 'TLA' was
designed by adding a bunch of modalities (you can think of these as "monads"
if you like[0]) for nondeterminism and "time" (state transitions) to standard
propositional logic. The rest of it is really a matter of what's idiomatic,
such as using the nondeterminism modality to account for spec refinement as
well. So it would seem that TLA+ should be enough to deal with these cases.

[0] Or burritos. Mmm, burritos. /s

~~~
pron
I think he meant to write _deterministic_. The exact nature of the TLA logic
is not the relevant point here. TLA (and so TLA+) extensively relies on
nondeterminism for things like specifying code at arbitrary levels of detail
(e.g. you can say, after this step, the list is sorted somehow, without
specifying how), interaction with the environment (the user performs one
operation of the possible ones, or some arbitrary machine fails) or
concurrency in its programming sense (the OS will schedule one of these
several operations next).
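A toy way to see the "sorted somehow, without specifying how" idea outside TLA+ (a hypothetical Python analogy, not TLA+ machinery): a nondeterministic spec admits a *set* of acceptable outcomes, and a deterministic implementation refines it by picking one of them.

```python
from itertools import permutations

def spec_sort(xs):
    """'Nondeterministic' spec: after this step the list is sorted somehow --
    any sorted permutation of xs is an acceptable outcome."""
    return {p for p in permutations(xs)
            if all(a <= b for a, b in zip(p, p[1:]))}

def impl_sort(xs):
    """One deterministic refinement of the spec: Python's built-in sort."""
    return sorted(xs)

xs = [3, 1, 2, 1]
# Refinement: the implementation's single behavior is among those the spec allows.
assert tuple(impl_sort(xs)) in spec_sort(xs)
```

Checking that every implementation behavior lands inside the spec's allowed set is, in miniature, what TLA+ refinement checking does.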

~~~
zozbot234
I agree, mostly, but understanding "the exact nature" of a logic (or at least
a properly-defined subset of it) is pretty important for using it effectively.

~~~
pron
If you're interested, I've written a rather detailed explanation of the TLA+
logic. Part 1 is an introduction that explains the design from a UX
perspective, part 2 is about the + side of the logic, and parts 3 and 4 are
about TLA: [https://pron.github.io/tlaplus](https://pron.github.io/tlaplus)

------
MasterPI
The use of formal methods at NASA proves that they are useful and that, in my
opinion, the software industry should give them much more importance.

[https://ntrs.nasa.gov/search.jsp?N=0&Ntk=All&Ntt=formal%20me...](https://ntrs.nasa.gov/search.jsp?N=0&Ntk=All&Ntt=formal%20methods&Ntx=mode%20matchallpartial)

~~~
Nursie
It proves they are useful in the extreme circumstances that NASA projects
represent.

It doesn't prove that they provide a good ROI for a business looking to ship
CRUD apps yesterday.

~~~
coldcode
Our mobile apps drive actual revenue and customer satisfaction, with budgets
that are already hard to justify (high 7 and low to mid 8 figures), and they
always live in a highly complex service and operational environment. If you
added the cost of formal methods, nothing would ever be approved at all. The
alternative of not doing formal methods, and having stuff that mostly works,
makes us money and keeps customers reasonably happy, is good enough. (These
aren't CRUD apps; I'm not even sure wtf that is in a real business. Is
Facebook a CRUD app? Maybe a crap app, but still.)

I could see formal methods for things that might kill people, but even there,
spending all of your money to deliver perfect code vs. still being in business
with something good enough is easy math for most CEOs.

~~~
Nursie
Create Read Update Delete - often refers to apps that are largely concerned
with entering and transforming data stored in a backend DB.

------
jacques_chester
If we're going to talk about the formality of methods, let's remember that
"collection of positive anecdotes with back-of-envelope guesstimates" is not
what's meant by a "business case".

An ROI or IRR calculation might not have anything like the elegance of a
proof, but it's a method and it's formalised.

------
symplee
The next big breakthrough will be a program that can convert TLA+ directly
into production code. And create all of the necessary tests, with a formal
proof that the code is correct.

~~~
maxfan8
This is not always possible due to the halting problem [1] and Gödel's
incompleteness theorems [2]. It is not always possible to formally
determine/prove the behavior of a program.

[1]:
[https://en.wikipedia.org/wiki/Halting_problem](https://en.wikipedia.org/wiki/Halting_problem)

[2]:
[https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_...](https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems)

~~~
Groxx
Doesn't really matter if only unrealistic edge cases are unprovable.

~~~
TuringTest
The problem is, you don't know that. The software industry is this complex
because obscure yet realistic edge cases happen much more often than you'd
expect.

~~~
Groxx
So the ones that aren't edge cases can still benefit greatly, rather than not
at all.

~~~
TuringTest
What good is applying formal verification to prove that part of your input
space behaves properly while the edge cases remain unproven? That's already
what test cases do.

------
an-allen
Saw Hillel speak at YOW! Melbourne. Great speaker. Prop comedy and formal
methods. Highly recommend.

------
Nursie
That's not so much a business case as a contrived example. I absolutely would
expect system testing and QA to find such a bug; otherwise the testing isn't
up to scratch.

What you're proposing is that all engineers now effectively learn a new
language and translate requirements into it. This expects requirements to be
fully fleshed out themselves (a luxury few have these days) and relatively
static (or everything needs to be reworked).

Good luck with that...

~~~
zozbot234
> I absolutely would expect system testing and QA to find such a bug

You can't test what you don't have. With design-based formal methods you can
"fuzz" the high-level design and spot very real problems in it, before writing
a _single_ line of ordinary code.

~~~
Nursie
That's not what the article is claiming though, and in fact when the code is
written is irrelevant to the business case. We're talking about spotting
production issues before they arise.

The article makes the very strong claim that QA aren't going to catch the bug
that arises from this design omission and it's going to cost your company
money as everything goes wrong in the field.

I don't believe it would get that far IRL as any dev with a brain is going to
spot it, and even so QA should certainly have test cases for this sort of
situation if they're doing their job right.

You can argue this may be a better approach and catch potential problems
earlier; have at it. You can argue that there are classes of error this will
catch that normally you wouldn't (some of the preamble and larger-scale stuff
does seem to show this). But the illustrated example doesn't hold up to
scrutiny and is overstated. As such, I think it undermines the point the
author is trying to make, because it looks to me like something we'd catch
anyway.

~~~
ones_and_zeros
I think the argument is if you've been around distributed systems long enough
you will encounter race conditions. Sure, it's ok to say "Well, the testing
infrastructure isn't up to snuff, so we just need to fix it" but at scale this
is impractical.

Check out the fallacies of distributed computing[0]. If your testing system
can simulate all of those edge cases, it probably looks a lot like TLA+.

[0]
[https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...](https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing)

~~~
Nursie
I agree, once you've been around parallel and distributed computing for a
while you do notice this stuff, which is why I think the example given in the
article isn't a particularly good one - that's a _really_ noticeable design
omission and I would expect it to be caught pretty early in any dev/QA
process, as the developer implements the functionality and thinks "Hey, what
if... don't we need to invalidate the other offers?"

I'm sure there are good cases for using TLA+, I'm sure there are situations
where it's not only useful for catching errors before they even happen, but in
which this more than offsets the upfront costs of the exercise.

I guess I just came away from the article not feeling that such had been
demonstrated, in fact I came away with the feeling that the example was
contrived to fit the agenda and didn't actually show much.

~~~
ones_and_zeros
I think it's a great example. It boils down to simple code whose behavior on a
degraded network may not be obvious. Your dev/QA could even have the thought
and actually test it, but have it work under ideal circumstances, giving them
false confidence.

I'll go out on a limb and say it is nearly impossible to design and build a
test environment that can simulate all network conditions, so that even in
trivial cases where a dev might _know_ for a fact that there is an issue,
it'll be incredibly hard to reproduce it.

Maybe put another way, formal methods give cover to dev/QA to avoid shipping
code that is known to be buggy but hard to prove so: bugs they will ultimately
be held responsible for.
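To make the false-confidence point concrete, here's a hypothetical toy sketch in Python (my own example, not TLA+'s actual machinery): it exhaustively enumerates the interleavings of two clients redeeming a one-use offer, the way a model checker would, and finds a schedule that a wall-clock test on a healthy network would almost never hit.

```python
from itertools import permutations

def run(schedule):
    """Execute one fixed interleaving of two clients' read-then-redeem steps."""
    state = {"offer_valid": True, "redeemed": 0}
    local = {0: None, 1: None}
    for client, step in schedule:
        if step == "read":
            local[client] = state["offer_valid"]       # stale read is possible
        elif local[client]:                            # redeem if the read saw valid
            state["redeemed"] += 1
            state["offer_valid"] = False
    return state["redeemed"]

steps = [(0, "read"), (0, "redeem"), (1, "read"), (1, "redeem")]
# Explore every interleaving that keeps each client's own steps in order.
bad = [s for s in set(permutations(steps))
       if [x for x in s if x[0] == 0] == steps[:2]
       and [x for x in s if x[0] == 1] == steps[2:]
       and run(s) > 1]
assert bad  # some interleaving lets the one-use offer be redeemed twice
```

Under normal timing each client's read-redeem pair runs back to back and the test passes; the double-redeem only appears when both reads land before either redeem, which is exactly the kind of schedule exhaustive exploration surfaces and ad-hoc testing rarely does.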

------
bordercases
I like how it emphasized the "debuggable" aspects of FMs.

Most people believe FMs are not resilient due to their coupling with the spec.
What if the spec changes?

Saying that FMs are a _debuggable_ form of specification, i.e. very close to
debugging the design itself, turns that criticism on its head and shows that
it is trivially the case that you can iterate on or prototype with FMs.

~~~
galaxyLogic
So what is a "FM"? Isn't it just any very high-level, preferably declarative
programming language?

~~~
bordercases
I asked a dev who worked on the AWS application of TLA+ for a comparison
between TLA+ and Scala/Haskell; this is what they responded:
[https://news.ycombinator.com/item?id=22100536](https://news.ycombinator.com/item?id=22100536)

You're right in a limited sense, except that it's not _any_ language, it's
_particular_ languages that can count as a Formal Method in the sense of the
OP (i.e. that can do model checking in some kind of temporal logic).

------
tluyben2
The problem is that explaining the upfront cost to 'the business people' is an
issue. In many cases, budgets for dev are yearly, and that is the horizon for
most people (even some who should know better). So even if I say that your TCO
is probably (the "probably" is a problem in itself, though statistically
defensible) going to be lower with formal methods than without, that saving is
spread over many years. There are enough systems that require vast refactoring
every year because they were badly done in the first place, but that's the
concern of each year's budget/manager, not of the one sitting in the current
year.

That goes for verifying code as well, but that is even harder to sell, as it's
so much more expensive; and even though I 'feel' it is a good idea, I cannot
really estimate whether the TCO would be lower than just patching with
bandaids over the years.

