More on my website: https://dependenttyp.es/
I hope more people work on this problem. It is a big and promising space. And I'll need students soon :)
I am a regular software engineer with a background in math and CS. The most relevant experience I have is some tinkering with Haskell.
Do you have any suggestions for how a full time employee can engage in this world and possibly start doing research?
If you're curious, you can get a sense of what we do here: https://galois.com/blog/2021/02/2020-year-in-review/
The tool we use most often for proofs is called SAW: https://saw.galois.com/
That said, I think the final point is the most important. Making proof repair more magical will just intensify the black-box nature of proofs to the programmer - and the more divorced specification and proof are from the people who actually write the code, the more likely that errors will slip in, not from bad code, but from bad specs instead.
IMO, the most viable path forward for "proofs which repair themselves" is proofs that the average programmer can understand - at which point, their repair will be as easy as writing the code, and you wouldn't need to worry about complex self-repair techniques at all.
I think there's a tight connection between 'proofs the programmer can understand' and self-repair / automation. The aim should really be to clear away a lot of the scaffolding that's currently needed and hand it off to solvers, and leave programmers to do what they are best at, i.e. understand the meaning of the code. Type checking is a great example of a lightweight formal method that disguises most of the sophisticated automation hiding behind the scenes.
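To make that concrete, here's a toy sketch in Lean 4 (self-contained, not any particular library's API) of the kind of spec a type checker silently enforces: a length-indexed vector where "append returns a vector of the combined length" is verified by the compiler, with the index bookkeeping handled behind the scenes.

```lean
-- Toy length-indexed vectors: the length lives in the type.
inductive Vec (α : Type) : Nat → Type where
  | nil : Vec α 0
  | cons {n : Nat} : α → Vec α n → Vec α (n + 1)

-- The type *is* the spec: append must return a vector of length n + m,
-- and the checker confirms this silently via definitional equality.
def Vec.append {α : Type} {m n : Nat} : Vec α m → Vec α n → Vec α (n + m)
  | .nil, ys => ys
  | .cons x xs, ys => .cons x (Vec.append xs ys)
```

Note the result index is written `n + m` rather than `m + n` so both cases check definitionally; the friction you hit when you get that wrong is exactly the kind of automation types normally hide from us.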
With good proof repair techniques you can fix the proof almost fully automatically in a way that mimics the change in the program, and then just ask the programmer a few questions to supply the creativity needed to finish the fix. As far as being easy to understand goes, that "just" comes down to good UX for asking those questions. (Still not solved!)
Here is the state of the art right now (my most recent paper): https://arxiv.org/abs/2010.00774
Lots left to do but the dream is quite realistic.
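As a toy illustration of the idea (hedged: this is not the paper's actual tooling, just a hand-written Lean example), notice how a spec proof shadows the structure of the program it describes, so a change to the program suggests a matching change to the proof:

```lean
def sum : List Nat → Nat
  | [] => 0
  | x :: xs => x + sum xs

-- The proof's case split mirrors sum's case split; if the program's
-- nil case changes, only the proof's nil branch needs repair.
theorem sum_append (xs ys : List Nat) :
    sum (xs ++ ys) = sum xs + sum ys := by
  induction xs with
  | nil => simp [sum]
  | cons x xs ih => simp [sum, ih, Nat.add_assoc]
```

A repair tool can exploit that structural correspondence; the programmer's "creativity" is only needed where the correspondence breaks down.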
Do you believe that every computer-generated human-illegible proof has an analogous human-legible proof?
It's my impression that much of the point of computational provers is finding solutions to questions that humans would never otherwise answer in a million years, because the only proof of the conjecture (of a practical length/complexity†) is one structured in ways ill-suited to human mental processes.
† There might be a human-legible proof, but at 100x or 1000x the length of the human-illegible proof. It might be beyond human capacity to ever deduce such a proof; and it would also be beyond the capacity of the algorithms used in current provers to generate it. It would be "out of reach."
While there are examples of mechanised proofs that reach the "limits" of human proof, they do so in domains where computers are good and humans are less good, like the exhaustive case checks in e.g. the four-colour theorem.
It should be pointed out that automatic provers are surprisingly stupid. Exhaustive searches do not scale well to large and complicated proofs, and computers obviously lack the "intuition" that mathematicians lean on when proving. Much of the work that goes into computer-assisted proofs is just bookwork for properties that humans would consider trivial.
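For flavor, here's a micro-example in Lean of that bookwork: a rearrangement any mathematician would wave through still takes explicit lemma applications (or an arithmetic tactic that performs the same steps internally).

```lean
-- "Obviously equal" to a human, but the prover needs each
-- commutativity step pointed out.
example (a b c : Nat) : a + (b + c) = (c + b) + a := by
  rw [Nat.add_comm b c, Nat.add_comm]
```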
Even the OP's article is not concerned with computers generating proofs automatically (though lots of work has gone into that, especially w.r.t. code), but instead with computers repairing human-written proofs to persist them across small changes.
The thing that most certainly argues that the proofs involved in the formal verification of software are within human reach is simply that humans make such ad-hoc proofs all the time - because they are very similar to the non-formal specifications that humans use to write code in the first place. Any programmer who can argue why their code runs correctly is most of the way towards a proof of its correctness - the real obstacle is not the complexity of the resulting proof, but the tediousness of its expression.
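A small Lean sketch of that gap: the informal argument "appending lists adds their lengths, by walking down the first list" is one sentence, but the formal version must name every step (this fact is already in the standard library; the restatement here is purely illustrative):

```lean
-- One informal sentence becomes an induction with explicit steps.
theorem my_length_append {α : Type} (xs ys : List α) :
    (xs ++ ys).length = xs.length + ys.length := by
  induction xs with
  | nil => simp
  | cons x xs ih =>
    simp only [List.cons_append, List.length_cons, ih]
    omega
```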
I'm not all that familiar with generated proofs, but from those I've seen, they tend to point out pedantic linkages that other proofs might avoid. I suspect that could be overcome, though, by building aesthetic conventions into the proof presentation layer, much as we do today with ASTs and compiler optimizations. Wild speculation: automated proofs use assembly-level constructs and gotos where they could be using higher-order constructs - but this isn't always clear because there isn't a separation of abstraction layers (or at least a clearly articulated one).
I could imagine a project that regenerated many well-known proofs (these exist in some form in the solvers already, but would be intentionally redone) without some of the heuristic abridgments the solvers use. Then pair them with the human-curated proofs widely considered elegant and train a system to convert from one to the other.
I think there’s a possibility that this is cultural. Generations of chess players grew up learning certain heuristics.
Now, computers show those heuristics aren’t as good as we thought they were. It’s possible that new generations, trained more by computers than by older generations of chess players, will develop new heuristics.
You see something similar in math. Sometimes, when a human discovers a new way to look at a problem, whole avenues of proofs open up, and what was incomprehensible becomes much easier not only for that human, but also for those learning about his work.
On the other hand, I think computers sometimes make moves humans won’t make because they don’t have any notion of fear.
When humans get into a minefield position where almost every move is clearly losing but there is one winning path through it, they may not choose that path even if they see it, because they fear they’re overlooking something.
Although it also depends to some extent on your definition of "suitable to human mental processes"... for instance, the famous proof that produced Graham's number is not "unsuitable to human mental processes" in the meta sense, but if you try to directly apprehend Graham's number with human mental processes, you fail. You can comprehend it and manipulate it mathematically, but you can not apprehend it.
One thing I will say is that proof is usually an expensive technique, and not just for the reasons in my post. Even setting up a proof can be demanding. So it's worth asking where in a project this kind of assurance is worth applying. For example, Galois often applies proofs to cryptography, which is security-critical and self-contained in small pieces of code. Proof and formal methods aren't one tool, but rather a toolbox that can solve different assurance problems. There's a big difference between a scalable static analysis like Infer and a fine-grained proof tool like Coq or Galois' SAW tool.
One easy place that formal methods can be useful is in modeling features of a project design for consistency. E.g. this is useful for protocols, state machines and similar. This means you can 'debug your designs' before building them into software. If that sounds like it might be useful, I would suggest taking a look at Hillel Wayne's website: https://www.hillelwayne.com
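As a toy illustration of 'debugging a design' (hypothetical, and far smaller than what dedicated modeling tools handle), here's a three-state machine and a design property checked by case analysis in Lean:

```lean
-- A tiny state machine standing in for a protocol design.
inductive Light where
  | red | green | yellow
  deriving DecidableEq

def step : Light → Light
  | .red => .green
  | .green => .yellow
  | .yellow => .red

-- Design property: yellow is only ever entered from green.
theorem yellow_from_green (s : Light) (h : step s = .yellow) :
    s = .green := by
  cases s <;> simp_all [step]
```

Real designs have far more states, which is exactly why checking them mechanically before implementation pays off.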
Sadly most of us don't get to rewrite or greenfield stuff, and although SPARK is designed to be very progressive and modular, proving code that wasn't "written with proof in mind" is still quite the effort.
In time, I hope and expect that the RustBelt project will become a practical tool to prove Rust programs correct. It's already found some bugs in the standard library, and the main focus is currently to firm up the semantics of the language. Its separation logic engine (Iris) builds on work from Infer.
Lastly, there are less mainstream languages that are better adapted to formal proving. To pick one, I'd probably go with Idris, as it is probably best at being a general purpose programming language in addition to a mathematical tool. But there are others. Lean is exciting because it is being used as the basis of an ambitious project to formalize mathematics, and its latest version (Lean 4) is trying to be a better programming language. Coq is mature and has been used for a lot of formal proving work (it underlies Iris, mentioned above), but is not a great executable programming language. ATS 2 generates extremely efficient code (it often wins benchmarks) but is inscrutable - probably only a handful of people in the world can write nontrivial programs in it.
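To give a small taste of that 'programming plus proving' style (shown here in Lean 4, though Idris reads similarly), a function's type can demand evidence from the caller:

```lean
-- The type obliges callers to prove the list is nonempty.
def safeHead {α : Type} : (l : List α) → l ≠ [] → α
  | x :: _, _ => x
  | [], h => absurd rfl h

#eval safeHead [1, 2, 3] (by decide)  -- 1
```

The impossible empty case is discharged inside the definition, and `by decide` discharges the obligation at the call site, so there is no runtime check and no panic path.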
The related work section of the summary paper also has a rather extensive list of projects using separation logic, though I'll note I didn't see Infer listed.
Doing proofs is so expensive that it is better to optimise your language for your proofs than the other way around.