Formal proof and analysis of an incremental cycle detection algorithm (inria.fr)
72 points by lelf on Feb 20, 2020 | 5 comments



I remember reading somewhere about programs being turned into generating functions, from which the asymptotic complexity follows: not just the worst case, but also the best and average cases.

It did not require a formal proof. It just turned code into generating functions.

It was probably work by Philippe Flajolet, also at Inria.

I think I also remember it being used to analyze memory reads and writes in addition to comparison operations.
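
For a rough sense of the flavor, here is quicksort's average-case comparison count done generating-function style (my own toy sketch, not taken from any particular paper):

    % average comparison count of quicksort with a uniformly random
    % pivot satisfies the recurrence
    %   C_n = (n - 1) + (2/n) \sum_{k=0}^{n-1} C_k
    % multiplying through by n z^{n-1} and summing over n yields an
    % ODE for the generating function C(z) = \sum_n C_n z^n:
    C'(z) = \frac{2z}{(1-z)^3} + \frac{2}{1-z} C(z)
    % solving (integrating factor (1-z)^2) and reading off the
    % singularity at z = 1:
    C(z) = \frac{2}{(1-z)^2} \ln\frac{1}{1-z} - \frac{2z}{(1-z)^2}
    \quad\Longrightarrow\quad C_n \sim 2 n \ln n

The asymptotics fall out mechanically from the singularity of C(z) at z = 1, which is what lets this style of analysis cover best, worst, and average cases uniformly.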

Really great stuff.


(author here)

There are indeed several interesting lines of work on automated approaches to complexity analysis, using various techniques. To name a few, these include the very cool work of Jan Hoffmann and others on RAML (http://raml.co), relying on a type-based analysis, or the work of Dan Licata, Norman Danner and others on "recurrence extraction", which is closer to what you mention (a program is turned into a recurrence equation which can in turn be analyzed further to deduce its complexity).
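
To make "recurrence extraction" concrete, here is a toy illustration (my own sketch, not the formalism from those papers): the comparison cost of a program like mergesort reads off directly as a recurrence, which can then be solved by standard techniques.

    def merge(a, b):
        # each loop iteration performs exactly one comparison
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                out.append(a[i]); i += 1
            else:
                out.append(b[j]); j += 1
        return out + a[i:] + b[j:]

    def merge_sort(xs):
        # extracted comparison-count recurrence:
        #   T(n) = 0                    if n <= 1
        #   T(n) <= 2 T(n/2) + (n - 1)  otherwise
        # which solves to T(n) = O(n log n)
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        return merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))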

The strength of the approach (based on a program logic and formal proofs) that we demonstrate in the work linked here is basically that it scales up to very complicated programs and/or very subtle complexity analyses.

Automated approaches are intrinsically limited in the class of programs they can handle (often, they either handle purely functional programs, or first-order integer-manipulating C-like programs), and in the class of complexity bounds that they can infer without human assistance (most of the time, only polynomial bounds, no log/polylog/...).

We do require a (very qualified!) human to carry out the formal proof, but in return we get a system that can handle higher-order code with mutable data structures, and subtle complexity analyses that do depend on the functional correctness of the overall algorithm. This is something that an automated tool cannot hope to achieve in a push-button fashion.
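
For readers wondering what such a proof actually manipulates: the program logic extends separation logic with "time credits", so a specification carries an explicit budget of computation steps alongside the usual ownership assertions. Schematically (my notation and names, not the actual statement from the paper):

    % schematic only -- not the paper's actual Coq/CFML statement.
    % $c denotes c "time credits"; each computation step consumes
    % one, so proving the triple bounds the (amortized) running time.
    \{\; \$\,c(m, n) \;\star\; \mathrm{IsGraph}\; g\; G \;\}
        \mathtt{add\_edge}\; g\; u\; v
    \{\; \mathrm{IsGraph}\; g\; (G + (u, v)) \;\star\; \dots \;\}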

In case some people would like to know more about the various approaches to (formal) time-complexity analysis, I should mention that the related work section of my PhD thesis contains such an overview :-) (http://gallium.inria.fr/~agueneau/phd/manuscript.pdf)


Nice work.

Formal methods, automated theorem provers, and safer languages have come a long way since I started in the era of staring at glowing CRTs and fiddling with keys. To me, if we can't prove we're building things on a solid foundation, we're just one order of magnitude removed from writing code without tests.

0) Prove the compiler is correct.

1) Prove the OS is correct. (seL4, for example)

2) Prove programs are correct.

I think we're rapidly approaching barriers in effort, error-detection, and cognitive abilities that even the best handful of highly-skilled homo sapiens sapiens have difficulty proving/doing/scaling. At what point (or tapering era) do we more seriously consider systems for autonomously generating code# with automated correctness, worst-case and timing analysis? We have massive, massive code-bases of Linux, LLVM, OpenBSD, etc. without formal proofs but with unknown-unknown numbers of bugs... just the other day, someone working in Rust ran into an LLVM bug involving bounds checking. (Yikes!)

# Let's first make it easy by constraining the solution-space to simplistic, imperative, SSA w/ liveness (lifetimes) logic.


Not sure how related this is, but one thing I found interesting is that the recent QuickJS by Bellard uses ref counting + cycle collection.

https://bellard.org/quickjs/quickjs.html#Garbage-collection

That is, it's the same strategy that CPython uses (although CPython's detector is generational). People seem to say that CPython's design is old or suboptimal, but a brand new interpreter made the same choices.

And as the manual says, a major reason is that the interpreter can hold pointers to values without any special consideration. In contrast, V8 has to carefully wrap everything in Handle<>, and there are a bunch of elaborations on that which seem tricky to me. Then again, doing ref counting with 100% accuracy by hand also seems tricky to me.
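
For anyone who hasn't watched the cycle-collection half do its job, a minimal CPython demonstration (standard-library gc and weakref only):

    import gc
    import weakref

    class Node:
        def __init__(self):
            self.other = None

    a, b = Node(), Node()
    a.other, b.other = b, a      # a -> b -> a: a reference cycle
    probe = weakref.ref(a)       # lets us observe collection

    del a, b                     # each refcount stays at 1, so pure
                                 # refcounting alone would leak the pair
    print(probe() is None)       # False: the cycle is still alive
    gc.collect()                 # the cycle detector reclaims it
    print(probe() is None)       # True

The tracing pass only needs to visit container objects (the ones that can participate in cycles), which is why plain refcounting can remain the common fast path.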


AIUI part of the trade-off between reference counting and tracing is ease of implementation vs. performance. I also remember reading somewhere (unfortunately the where is lost to the mists of my memory) that, CPython being the reference implementation, simplicity was an explicit goal, to the extent that there's little optimization done to the bytecode.



