I have been using CBMC to formally verify C for six, make that seven years now (edit: time flies). The latest release is actually pretty good.
The secret to model checking is to understand that you're building and verifying a specification in code. It works best if specifications are built function by function. For each function, I'll build "shadow functions" that simplify any functions it calls with a range of potential non-deterministic behavior that matches that function's contract. In this way, formal specifications can be built from the ground up by first verifying a lower-level function and then replacing it with a shadow function that mimics its behavior in a non-deterministic but SMT solver friendly way.
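A minimal sketch of the idea, assuming a hypothetical `read_byte` with a two-outcome contract (undeclared `nondet_*` prototypes and `__CPROVER_assume` are standard CBMC conventions):

```c
/* Hypothetical contract: returns 0 and writes one byte on success,
   returns -1 on failure and leaves *out untouched. */
int read_byte(unsigned char *out);

/* CBMC treats calls to undefined nondet_* prototypes as
   non-deterministic values of the given type. */
int nondet_int(void);
unsigned char nondet_uchar(void);

/* Shadow: stands in for the already-verified read_byte in its callers'
   proofs, covering every behavior the contract allows while staying
   cheap for the SMT solver. */
int shadow_read_byte(unsigned char *out)
{
    int rc = nondet_int();
    __CPROVER_assume(rc == 0 || rc == -1); /* only contract outcomes */
    if (rc == 0)
        *out = nondet_uchar();             /* any byte on success */
    return rc;
}
```

Callers are then verified against `shadow_read_byte` instead of the real body, so each proof stays small.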
There are things that model checking doesn't do well. Anything with recursion or looping that can't be easily broken down will not perform well with a model checker. While algorithms like balanced binary trees or sorting can be broken down in a way that can be model checked, other algorithms may require the sort of induction one gets with constructive proofs to sufficiently verify. There is an art to this which will lead to style changes in C and a slightly different way of building up functions from pieces that can be trivially model checked. Honestly, that's not a bad thing.
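Concretely, the style change often means giving loops explicit constant bounds so CBMC can unwind them completely. A sketch under that assumption (the buffer size and names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

#define BUF_SIZE 16 /* compile-time bound so the loop unwinds fully */

size_t nondet_size_t(void);

void clear_buf(unsigned char buf[BUF_SIZE])
{
    for (size_t i = 0; i < BUF_SIZE; i++)
        buf[i] = 0;
}

/* cbmc clear.c --function harness --unwind 17 --unwinding-assertions
   (iterations + 1) proves the assertion for every index, not just
   sampled ones. */
void harness(void)
{
    unsigned char buf[BUF_SIZE];
    clear_buf(buf);

    size_t i = nondet_size_t();
    __CPROVER_assume(i < BUF_SIZE);
    assert(buf[i] == 0);
}
```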
For the parts that can't be easily model checked, I'm working on a formally verifiable C parser library that will import C directly into Coq or Lean. I consider that the "last mile" before formal methods for firmware and systems software reach the point of being generally available.
> [...] other algorithms may require the sort of induction one gets with constructive proofs to sufficiently verify. There is an art to this which will lead to style changes in C and a slightly different way of building up functions from pieces that can be trivially model checked.
Is there a book or article that talks more about this? I.e. how to write code in a way that is more amenable to model-checking (bounded or otherwise)?
If the tool exists and has minimal overhead, I don't think it is a matter of permission but a matter of necessity. CBMC adds about 30% overhead over just unit testing, in my experience. It does require a different idiomatic programming style to use correctly, and there is a learning curve. Some intuition is required to learn how to use the tool effectively and efficiently, but this can be taught.
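To make the style difference concrete, here is a hedged sketch (the `clamp` function is an invented example): a unit test checks one input, while a CBMC proof harness looks almost the same but quantifies over all inputs.

```c
#include <assert.h>

int clamp(int v, int lo, int hi)
{
    if (v < lo) return lo;
    if (v > hi) return hi;
    return v;
}

/* Unit test: one concrete case. */
void test_clamp(void)
{
    assert(clamp(5, 0, 3) == 3);
}

/* Proof harness: all cases. CBMC explores every int for v, lo, hi. */
int nondet_int(void);

void proof_clamp(void)
{
    int v = nondet_int(), lo = nondet_int(), hi = nondet_int();
    __CPROVER_assume(lo <= hi);  /* precondition from the contract */
    int r = clamp(v, lo, hi);
    assert(lo <= r && r <= hi);  /* postcondition */
}
```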
For system programming or firmware, it's incredibly useful. I've used this tool in conjunction with threat modeling and attack surface minimization to significantly improve the safety and security of software written in C. At this point, I would not consider working on any C project that did not make use of this tool.
Not fun, a serious tool, and extremely easy to use, unlike other formal methods. It works with your source code, not on some rewritten abstraction of it.
The embedded, automotive, and space industries use it regularly.
I've set up CBMC and goto-analyzer for the string-searching algorithms at https://github.com/rurban/smart and indeed found bugs in old published algorithms.
CBMC is pretty cool; I used it in my bachelor's thesis to verify fairness properties in proportional voting. Another use case I've seen is CBMC in multiparty computation: there you have methods that work on boolean circuits. Bolt CBMC onto the front of that and you get multiparty computation of C code.
Wouldn't this project be more usefully done at the LLVM IR level so we could plug in any existing LLVM frontend and gain full support for language features like C++ templates?
Another problem with LLVM I've heard about is that its intermediate language (or API, or something) is a moving, informally specified target. People who know LLVM internals might weigh in on that claim. If true, it's actually easier to target C or a subset of Rust simply because they are static and well understood.
Two projects sought to mitigate these issues by going in different directions. One was a compiler backend that aimed to be easy to learn, with a well-specified IL. The other aimed to formalize LLVM's IL.
There have also been typed assembly languages to support verification, from groups like FLINT. One can also pair language-specific analysis with a certified compiler targeting LLVM IL. Integrating pieces from different languages can have risks; that (IIRC) is being mitigated by people working on secure, abstract compilation.
Yes, the textual IR format is not an API. It's an internal implementation detail that changes somewhat rapidly. It's well specified, but there are no guarantees that IR which works in LLVM 16 will work as expected in LLVM 17, and so on. Inside LLVM itself, the IR and its related manipulation and analysis functions are "API" insofar as there's an expectation that if you change how the IR works, you won't break existing in-tree passes, and that you'll work with maintainers to fix them.
I'm curious what relevant details get thrown away in the translation to LLVM IR that we can't add back with annotations. It's a shame that it's not possible to use these analysis tools on modern C++ codebases.
This is achieved by latching onto rustc and compiling to gotoc, which is then fed to CBMC.
Removed due to my misunderstanding:
~~I don't think Kani counts as a "formal" verification tool, emphasis on "formal", but it is a verification tool for rust code and is powered by CBMC.~~
I'm not sure what you're implying. Kani absolutely is formal verification. CBMC stands for C Bounded Model Checker and model checking is one of the main forms of formal verification.
Those are different kinds of tools. Clazy is pretty much a linter. Scan-build is a traditional static analyser: it tries to find bugs or possible bugs, but it's not trying to prove the absence of bugs.
CBMC is a formal verification tool which tries to prove that there aren't any bugs (where "bugs" means things like UB).
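For instance, with the built-in checks enabled, CBMC either proves the corresponding class of UB absent or returns a counterexample trace. A minimal sketch (the file name and sizes are illustrative):

```c
#include <stddef.h>

size_t nondet_size_t(void);

int sum(const int *a, size_t len)
{
    int s = 0;
    for (size_t i = 0; i < len; i++)
        s += a[i];
    return s;
}

/* cbmc sum.c --function harness --bounds-check --unwind 9 proves there
   is no out-of-bounds access; delete the assume and it produces a
   concrete counterexample trace instead. */
void harness(void)
{
    int a[8] = {0};
    size_t len = nondet_size_t();
    __CPROVER_assume(len <= 8);
    (void)sum(a, len);
}
```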