But you can automatically derive a lot of information, e.g. after if (!x) return...

DannyBee · on Feb 20, 2023

Sound (key word here) analysis of this form, which is known flow-sensitive, is remarkably slow. Soundness is required if you want "provable" rather than "99.99%" correct.

So much so that we have even had scalable context-sensitive interprocedural analysis for at least a decade longer than we have ever scalable flow-sensitive analysis in most contexts.

For example, pointer analysis, which matters here, is ni that class.

It is easy to find context sensitive flow-insensitive pointer analysis that scales very well, and has been easy for >1 decade.

flow-sensitive -- still much harder in practice. A few papers here and there, but ...

You can do halfway flow sensitive analysis with SSA and get it in a reasonable time bound but it's not the same for something like this.

This is more of a problem than something like the standard library handling, because the standard library is a library and can be pre-computed into summaries, and the relevant summaries change very infrequently

(IE to the original example, how much of the buffer read overwrites does not change every glibc release)

Theses sorts of annotation based schemes are also not new - we've also had summary based algorithms that can do this sort of thing for decades.

The algorithmic capabilities here have not really changed, and the improvement in computing power has not helped enough because lots of the sound algorithms are, worst case, cubic or exponential.

In the end just remember if it was cheap, easy, and usable, it wouldn't be a problem, because it would already be done.

aatd86 · on Feb 20, 2023

I don't know if sound analysis (as in no false negatives or also precise as in no false positives) is just slow or if it depends on the language constructs and the amount of flow sensitive mutable state that is being tracked.

Do you have any references on why that would be slow? That's of interest to me :)

Also, perhaps is there a distinction to be made between data flow and control flow sensitivity?

Seems like these kinds of analyses are increasingly added to languages for nullability tracking so it might not necessarily be that slow. Especially if the type system is of help so that the analysis can be incremental, per compilation unit.

xavxav · on Feb 20, 2023

My personal experience has shown me that static analysis of this kind does not scale for C, the complexity of showing just the memory safety of even simple programs grows astonishingly fast.

Proof overhead for memory safety can frequently be on the order of 5-10 lines of annotation per line of code.

zozbot234 · on Feb 20, 2023

This is exactly why the Rust folks chose the kind of analysis that they did for the borrow checker. It's quite co-extensive with the subset of memory-safety proofs that happen to scale up easily and compositionally to larger programs.

That said, it would still be very helpful to have true first-class support for the GhostCell pattern in Rust, so as to be able to prove safety for somewhat more complex designs (including safe doubly-linked lists) albeit contained within a single module.

alkonaut · on Feb 20, 2023

Yeah, ask yourself, if you see

    if (!x) 
      return;

    // statment here

    do_something(x);

Which statements on the line saying "statement here" would invalidate the zero check for x?

pjmlp · on Feb 20, 2023

Now add threading, hardware traps and signals into the mix.

pron · on Feb 20, 2023

That greatly depends on how many memory-safe programs you want to prove are memory safe. If it's the same subset of programs that a language like Rust can prove to be memory-safe, the annotation overload need not be higher. If you want to prove that an arbitrary memory-safe C program is memory-safe, then, indeed, you'll likely need many more annotations. Even in those situations, adding more annotations or restructuring the code may be cheaper than a rewrite in another language. Sound static analysis for C is already not uncommon in some important domains.

uecker · on Feb 22, 2023

Thanks. This is exactly my point. Instead of Rust, we could use a subset of C with suitable annotations and will not require any more annotations than Rust.