
:-)

Well, gladly!

I worked on combining Interactive Debugging with Runtime Verification (we called the combination Interactive Runtime Verification). Interactive debugging is the usual step-by-step debugging you know, offered by your standard IDE or a debugger like GDB.

Runtime Verification evaluates properties over a stream of events, the program execution usually being seen as a sequence of events (there are ways to represent concurrent / parallel executions too, a bit more complicated than a mere sequence).

GDB offers a comprehensive Python interface [1] that lets you add breakpoints, watchpoints and catchpoints to inspect what's going on during the execution, and lets you decide whether to suspend the execution based on conditions you evaluate. In our case, the conditions are the runtime verification properties. Think of it as conditional breakpoints on steroids.
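
To give a flavour, here is a minimal, hypothetical sketch of that mechanism (not our tool; the function name pop and the expression q->size are made up): you subclass gdb.Breakpoint, and stop() decides whether the execution is actually suspended.

    import gdb

    class EmptyPopBreakpoint(gdb.Breakpoint):
        def stop(self):
            # Evaluate an expression in the debuggee at the breakpoint;
            # returning True suspends the execution, False lets it run on.
            return int(gdb.parse_and_eval("q->size")) == 0

    EmptyPopBreakpoint("pop")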

For instance, take a program that manipulates a queue. One property you don't want to break is "the program never removes an element from an empty list". Another one is "the program never pushes elements to an already full list". To evaluate these properties, you describe them in terms of parameterized calls to the functions new_list(n), push() and pop(). In a C program, such calls could "succeed": the execution continues, but with a memory corruption, which can be hard to detect. So instead, you have the runtime verification stop the execution for you at the point where the bug happens, instead of at the likely crash later.
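
Building on the stop() idea above, a rough sketch of that property check could look like this (again hypothetical, not our actual implementation [3]; in particular, reading the parameter n at the entry of new_list is an assumption). The monitor keeps an abstract queue state and suspends the execution only when a call would violate a property.

    import gdb

    class QueueMonitor:
        # Abstract state of the queue, maintained from observed calls.
        def __init__(self):
            self.capacity = None
            self.size = 0

    class CallBreakpoint(gdb.Breakpoint):
        def __init__(self, location, handler):
            super().__init__(location)
            self.handler = handler
        def stop(self):
            # The handler returns True iff the property is violated,
            # which suspends the execution right where it happens.
            return self.handler()

    mon = QueueMonitor()

    def on_new_list():
        mon.capacity = int(gdb.parse_and_eval("n"))
        mon.size = 0
        return False

    def on_push():
        if mon.capacity is not None and mon.size >= mon.capacity:
            print("Violation: push on a full list")
            return True
        mon.size += 1
        return False

    def on_pop():
        if mon.size == 0:
            print("Violation: pop on an empty list")
            return True
        mon.size -= 1
        return False

    CallBreakpoint("new_list", on_new_list)
    CallBreakpoint("push", on_push)
    CallBreakpoint("pop", on_pop)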

So we implemented this idea as a Python extension to GDB, with features like checkpointing, using either GDB's (limited) experimental native checkpointing or CRIU (Checkpoint and Restore in Userspace, the thing behind Docker container checkpointing and migration) [2], which does this very well. This implementation theoretically supported any language supported by GDB, including C and C++.
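
For the GDB-native side, a minimal sketch of what driving checkpoints from Python looks like (the CRIU path is more involved and not shown here):

    import gdb

    def save_checkpoint():
        # GDB's experimental, Linux-only checkpointing (hence "limited"):
        # it snapshots the running process so it can be resumed later.
        gdb.execute("checkpoint")

    def restore_checkpoint(checkpoint_id):
        # Roll back to a previously saved checkpoint, e.g. to replay
        # the events that led to a property violation.
        gdb.execute("restart %d" % checkpoint_id)

    # `info checkpoints` lists the ids to pass to restore_checkpoint().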

Building on this implementation, we made a second version that supported distributed programs as well as Java programs. The debugging capabilities of the JVM are awesome / endless.

These implementations allowed benchmarking. Unfortunately, we focused on benchmarking the performance of the instrumentation, whereas what would have been really useful is validating the method itself with human studies. This PhD should have been at the intersection of computer science and human studies. It's not. Instead, it's very theoretical, with some formalism trying to define an interactively verified execution, and proofs that the instrumentation does not affect the execution.

Sounds amazing? What's the catch? Finding actual properties that relate to day-to-day bugs is hard. When developing a program, specifications are not given as LTL properties by the client; it's more like "I need this, be creative", and it's teamwork, and more often than not bugs are created by misunderstandings, moving targets, not having the full context when making changes, unforeseen interactions between more or less complex, supposedly unrelated pieces, etc. Those can probably be seen as formal property violations, but that's a bit far-fetched. I guess a bunch of researchers who love formalism or performance evaluation but have no clue about software development were not going to get very far with this. Still, it's probably a topic worth researching nonetheless.
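
(For context, an LTL-style property for the empty-pop example above would read roughly like

    G (pop_called -> queue_size > 0)

with G meaning "always"; quite a distance from "I need this, be creative".)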

See:

- the git repository of the implementation (it's free software) [3]

- the first "serious" paper presenting the idea [4]

- the PhD manuscript, if you like verbosity. It's also the only place where we discuss the distributed version; I could not manage to publish that part [5]

[1] https://sourceware.org/gdb/current/onlinedocs/gdb.html/Pytho...

[2] https://criu.org/

[3] https://gitlab.inria.fr/monitoring/verde

[4] https://hal.inria.fr/hal-01592671v1/file/ieee.pdf

[5] https://theses.hal.science/tel-02460734



