Hacker News new | comments | show | ask | jobs | submit login
My first experience with formal methods (zipcpu.com)
96 points by matt_d 12 months ago | hide | past | web | favorite | 19 comments

I think this is a very typical first response to formal methods. But eventually it becomes clear that ultimately it's not really solving any problem, but simply kicking the can to another problem. And in my opinion it's a step backwards as the can you're kicking it to is arguably more complex than where you came from.

Depending on the system you're using formal methods will rely on some set of preconditions (state before a function), postconditions (state after a function), and general state. There's nothing magical about formal methods that prevents errors, either logical/technical or by omission, here. And suddenly very small changes in program flow/logic can result in enormous amounts of work.

They're really fun and interesting, and also help you look at your code in a different way even when not using them. I think everybody has this sort of 'epiphany' moment the first time they actually play with them. But they don't work out in practice anywhere near as well as they do in theory or very limited scale. Something you quickly realize as the effort involved increases exponentially against linear increases in program size/complexity.

I disagree. Good formal systems allow you to specify global correctness conditions (e.g., "the database is always consistent", "data is never lost if a machine fails") in a concise and clear way (a couple of lines). The correctness of that specification is then relatively easy to verify by inspection. You then check that the system as a whole satisfies those requirements.

It's true that to date we don't have tools that can efficiently verify a program all the way from code to the high-level global correctness conditions (and, TBH, I don't think we'll have them in the foreseeable future, if ever), but there are tools that help narrow the gap: code-level tools that work at the code unit level (function preconditions), and high-level tools that work at the large-scale system description. Using both (which is exactly what companies that use formal methods for software, like Altran UK, do), let you work from both directions -- top and bottom -- to minimize the "uncertainty" gap, and, most importantly, they do it affordably.

It's also true that formal methods are a tool that we've not yet learned to effectively use in practice, and don't yet know the best ways to deploy it. But we're learning, and we're already at the point that at least some are seen considerable payoffs.

All of the above is true, but can also be said of software testing, with the difference that testing scales better by giving you less guarantees.

It is really just a matter of choosing the right tool for the job. Sometimes you need to be really really sure everything goes as you expect, and testing alone won't cut it.

I'm not even sure testing always scales better -- it depends on your formal tool -- but it's definitely different. For example, some properties -- especially of concurrent/distributed systems -- are nearly impossible to test.

Also, the level of confidence offered by formal methods is often configurable to lower costs. Some formal methods can affordably scale to very large systems, with the tradeoff that the confidence is lowered, as it only covers the general algorithm design rather than actual code.

Without going to full functional proof in Spark (for example) you can go easily as a first step for proof of data flow peoperties ('ooops this buffer size variable is not initialized in this strange corner case'), and then without too much effort (compared to exhaustively testing) proof of absence of runtime error (aka this code won't crash). Of course as GP remarks, you'll have explicited all your preconditions and invariants (types or contracts). But that's a choice : either treat the damned corner cases in your code or expose them to your user ("this can't work on input buffers that aren't power-of-2"). If you don't treat the corner case, instead of putting it in the docs, or in the header comments, or asserting right when you start, add it as a precondition or type invariant/predicate. You then get all the power of automated proof tools (please use all my cycles and RAM but prove this) to prove that without the corner case you won't crash and your users get to check (by proof, static analysis, testing, fuzzing, 'careful code reading'...) that they never call your API the wrong way - if they want.

I think Frama-C & Spark should sell a bit more the AoRTE stuff. If at least your software won't crash or use uninitialized variables, you'll sleep better. The world would probably be a better place if things like x509, asn1 decoding code and all kinds of file format parsers were proved for 'just' the absence of runtime error...

The thing that is nice with most modern formal verification tools is that they're not all-or-nothing propositions. Suppose you already have some Spark code that you've checked (proof) for AoRTE, but didn't go for full proof of all the functional properties of your software (you went for pure-testing with a pinch of static analysis and some code review on critique stuff). You're already high on the correctness scale... You get a report from a customer or security researcher or code auditor that there is a strange behaviour in your code. They can't nail the reason but sometimes you send inconsistent data. No crash or memory corruption, just an output that seems impossible to you (even after full code review and exhaustive test) and out of spec anyways. No reproducer, no idea what was sent to your system, just 'please stop sending us inconsistent data'. One way to find the bug is to try to prove it's not there. Add a post-condition 'fields a and b and c should never be 0 at the same time' and use the tools. Maybe start with structured testing following the code yourself... Then go for unsound static analysis or fuzz testing. Then once you've ran 2 weeks of afl and hongfuzz and didn't trigger this post- and once you've exhausted all the codepeer (or PVS studio or Coverity) false positives and still they didn't find the code path to your bug, and you still need to find this bug, you can use the proof tools and be sure you'll get an answer (after some work) on this specific question.

2 big caveats (for Spark at least) : you'll have to restrict yourself to the Spark subset of Ada (mostly, no pointers, no aliasing - but this subset is deemed to grow once the proof tools improve...) and second : automated proof with floating point numbers is not there yet and it is a hard research topic.

It's worth mentioning that a mature code-level specification language similar to Frama-C's ACSL and SPARK also exists for Java -- JML (see http://www.openjml.org/). In fact, ACSL is based on JML.

True, and there was a question on this topic during the Frama-C and Spark day this year in may in Paris, about similar tools and environments in java. People in the room cited JML and it being the inspiration for ACSL but the conclusion seemed to be that the proof technology wasn't so advanced and supported ? I'm very interested if you have more info on this, I'd be happy to be mistaken.

If you look around there's also EVE (for Eiffel) that B. Meyer mentionned some time ago on his blog. They didn't have to add a specification language or to supplement their original language, it was built in Eiffel from the start :-D.

One funny thing about specification languages is that they 'need' to be expressive to be useful. The Ada 2012 specs added lots of sugar to help writing contracts : quantifiers on collections/arrays (for some, for one), if- and case- expressions (pattern matching isn't far) and expression functions... and in Spark 2014 contract-cases. I expect some more in Ada2020. Since contracts must be executable you get all the sugar for your regular Ada too.

> but the conclusion seemed to be that the proof technology wasn't so advanced and supported

That depends on the tool. OpenJML[1] just uses SMT solvers, but if you need bigger guns, Krakatoa[2] lets you use either SMT solvers or a proof assistant (either Isabelle or Coq).

[1]: http://www.openjml.org/

[2]: http://krakatoa.lri.fr/

SPARK has been used for a microkernel (Muen), crypto primitives (Skein) that caught error in C version, and DNS server (Ironsides). Might be worth mentioning in these so people can see practical work. Far as floating point, I might take your word for it on how hard it is but what's your take on work like Gappa?


Think such methods can be expanded to handle more of it like SPARK does runtime errors? Or it will stay really hard?

I'd read about gappa in here : https://hal.inria.fr/hal-01534533/document and then https://hal.archives-ouvertes.fr/hal-01522770/document . Seems they're building on the work done with gappa.

My guess (bystander and user of the tech) is that it'll stay hard for 5-10 years (slow progress of research projects, lots of lead bullets, and far more cores on our dev machines...). If you're motivated and are ready for semi-automated proof you can probably use lemmas (pre-proven properties) in Spark but if you can't find your lemmas in the Spark toolset, then good luck with Coq or Isabelle.

Some encouraging stuff is going on in the FP static analysis world though, with projects like herbie (based on sampling only IIRC), the Tocatta lab, and I've heard some good things about numalis (start-up).

Edit: seems Gappa can be integrated through why3, and seems they did some work on collabaration with alt-ergo so it seems it should be usable through Spark. Something to check in the next release :-D.

Formal methods are particularly effective for verifying hardware, because a) the state space is much smaller than for software (although it can still be prohibitively large for complex designs and brute force formal methods) b) people are actually ready to shell out some decent money for getting the verification done.

Not only is formal methods more effective for HW, the incentives are also more tilted towards HW. If you fuck up the HW design, well..you're fucked. With SW, "you can always push out an update" mentality exists. I'm not saying this to criticize SW, but this is more an example of how you conform to the ecosystem in which you exist.

Which is why people are keen on developing formal methods for smart contracts, which generally can't be updated. Plus they're a lot smaller than most software.

> b) people are actually ready to shell out some decent money for getting the verification done.

That's true, but may be changing.

> a) the state space is much smaller than for software (although it can still be prohibitively large for complex designs and brute force formal methods)

That's true as well, but that is why some formal methods -- like my favorite, TLA+, and all the sound static analysis tools (they work by abstract interpretation) -- try to tackle the problem of software verification by abstraction (in a precise mathematical sense, namely a description of the system using a sound approximation of one with fewer states). In TLA+ you can even describe your system in different levels of detail and show that they form a sound abstraction/refinement relation. This is not to say that it's a silver bullet -- it takes work and thought, and isn't appropriate for everything, but a system like TLA+ lets you at least choose the level of detail you wish to work in and so the cost you'll have to pay. The lower the level, the more confidence you get, but more work you need to do. The early signs of TLA+'s success in industry show the practice to be worthwhile in some/many very important -- and quite common -- circumstances.

>people are actually ready to shell out some decent money for getting the verification done.

True, "trial runs" for asic production cost impossibly huge amounts of money as mask sets cost so much to make even for a 3 generations old process.

Only top players can afford them, yet some of them do switch do limit themselves to extensive verification.

Another big thing is time, doing a trial run in actual silicon takes many times more time than the most extensive verification run you can afford for an equivalent amount of money on current mainstream litho process.

d) specifications are more explicit and they can be frozen. Once the module is verified it can be reused, justifying the cost.

There are only few software products that are formally verified. Some data diodes and real time operating systems (at least partially).

> There are only few software products that are formally verified.

That depends what you mean by "formally verified". If you mean verification all the way from global, high-level properties and down to program code or even machine code -- what's known as end-to-end verification -- and that the verification method is deductive proof, then you're right. But there is plenty of embedded software that's verified end-to-end with model checkers, and a growing list of popular software that's verified by various means, but not end-to-end, for example most of Amazon's web services.

Even in hardware, formal verification is only used for really small designs (like FIFOs, for example) and for very specific and simplistic designs (control logic or datapaths).

For everything else, people use the good old functional verification because more the state complexity of more complex designs simply explodes.

Heck, even a cache coherence protocol can't be formally verified. Imagine trying to verify an entire CPU...

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact