I wonder if it is a problem of the research being fairly new, or a lack of software developers inside Microsoft willing to replace their existing code and learn new tools and languages.
Also, in general most papers are more useful for writing further papers that improve on their limitations (until something great emerges) than for being adopted right away. This one is pretty cool, but both the conclusion and the benchmarks set out significant challenges for further research. After all, this is the first OS using automated verification, not the last.
> a lack of software developers inside Microsoft willing to replace their existing code and learn new tools and languages.
I'm sure research on the topic hasn't ended, so should they switch now or wait for more research? Or wait for research on verifying existing code? And if a new paper comes out, should they switch again? Also, do universities train enough developers to use such technology?
- Windows 7 kernel reorganization
- MDIL .NET AOT compiler for Windows 8 and .NET Native for UWP
- async/await for .NET
- generics for .NET
- UWP based on COM (original idea was called ExtVOS before .NET existed)
- Task Programming Library for .NET
- Pico-processes on Windows 10
- Secure kernel on Windows 10
- Hyper-V based containers
- Driver verification framework built on a theorem prover
- The P language used for writing Windows 8 USB drivers
I guess they just don't adopt more due to management, and due to how the WinDev and DevTools divisions relate to each other when it comes to technology adoption.
The most surprised I've been was when they started using the P language for drivers (which is also used for verification, so your last two points fall under a single one if I'm not wrong). That was the first time I'd seen them adopt a new language/tool rather than make additions like the rest of the points you made. (Also FPGAs in their data centers, but that's a different topic.)
I'm not deriding their decision to do so, just making an observation. Google's X does research that can plausibly become an actual product, and I guess I expected Microsoft to do the same. Including such cutting-edge research in their products might earn them cool points from HN, but I'd say management is perfectly content with the numbers in front of them. OTOH, I doubt any manager is willing to take the blame if their initiative fails in a released product. Microsoft has enough bugs that people complain about as it is (refer to the front-page submission about performance).
Research on ExtVOS:
- UWP based on COM (.NET was originally going to be based on COM; the CLR came to be due to Java's influence)
Byproducts of the Singularity and Midori projects:
Byproducts of the Drawbridge and Ironclad projects:
Byproducts of Dafny, Z3, and theorem-proving research:
Any links on the driver verification framework? I've only seen people talk about using Z3 for hardware verification, but didn't know about MS doing it and talking about it publicly (could only find this: https://pdfs.semanticscholar.org/b100/b1562ba0e75191d29d4c15...)
"SMT in Verification, Modeling, and Testing at Microsoft"
"Efficient evaluation of pointer predicates with Z3 SMT Solver in SLAM2"
"Tools for Verifying Drivers"
Whatever happened to that? Several papers in 2010, then nothing. Wikipedia says there was a 2013 release.
Microsoft had a later project, Ironclad (2014) which built on that to build some simple applications. That's an active project at Microsoft.
Isn't the simple answer "Microsoft had a later project, Ironclad (2014) which built on that to build some simple applications."?
Chris Hawblitzel worked on both projects.
However, I wonder about your claim that "That's an active project at Microsoft". The last publication on that page is from 2015, and the last git commit is from a year ago.
We got .NET Native, async/await, W10 pico-processes, the secure kernel, Windows internal refactoring... out of those projects, but I still feel they could have been much more with the right amount of management support.
"I know my operating system kernel will never be proven correct."
1. They're not proving anything relevant to current software development.
That's a joke to anyone following formal verification. Key protocols, microkernels, hypervisors, safe concurrency, CompCert compiler, part of NaCl, Microsoft's static verification for device drivers... the list gets bigger every year. Quite a few are already practical.
2. I know my operating system will never be proven correct.
That's probably true if he uses one not designed for verification. The high-assurance field gave up on UNIX after prototypes like UCLA Secure UNIX, and even the Mach work, showed it was inherently too complex and leak-prone to secure in a high-assurance way. They moved to security kernels, separation kernels, language runtimes, trustworthy hardware, and so on. Each time, they tried to make the trusted part as correct and secure as possible. If they couldn't, a recovery option was added.
So, we have numerous secure and high-uptime systems deployed whose methods were published even if the source wasn't. He or others can always build on these if they want to replicate the results. His OS will at least be secure against most attacks, quite reliable in practice, and most errors recoverable, with the rest not leading to data loss given a good backup/restore plan. Good enough for me. :)
I know that's technically true, but not helpful, and that's why this line of thinking is becoming unfashionable.
As a researcher in Programming Languages, who publishes and has reviewed code:
By that argument, there's no need to publish CompCert or Coq, which seems ridiculous. Publishing code can help adoption even if the code gets rewritten.
Publications without code often fail replication for many reasons (not necessarily involving foul play or even errors). I can't document all tricks needed to get the performance results in 20 pages. And replication in another scenario might show the results don't generalize easily (and nobody knew). So having sources provides the extra confidence you need before bothering.
And compared to other publications without code, the needed expertise seems much less common.
OTOH, papers do get used by the research community, possibly steering it in some direction (including the wrong one). And a few papers matter enough that people will attempt replication anyway (and learn whether they work).
When the results of a paper are "here's what we tried and got, new ideas are needed" (honestly, many papers on hard problems) then maybe sources don't matter.
Actually, let's look at not publishing CompCert's source as an example. Under your theory, I can't learn anything at all from just the paper or binary. Here's what I think I can learn from it:
1. They claim they verified most of the C language in a compiler down to the assembly in Coq. John Regehr's testing confirms it has the low rate of detected defects one would expect in a verified compiler. So, now I know (a) they did do a verified compiler for C, (b) the methods they describe might help other formal developers in their verifying-compiler work (give ideas at minimum), (c) a successful verification of a C compiler clears the way for more projects/funding on C or other compilers, and (d) anyone needing a verified C compiler has a supplier, AbsInt, whose work was rigorously evaluated.
2. Building on the former, the fact that the compiler will produce correct output more often than most compilers means it can be used to check the correctness of highly-optimizing compilers. Automated testing can be directed at an optimizing compiler and CompCert, with the output of programs run through each checked for equivalence. If not equal, it's probably a bug in the unverified one, but maybe in CompCert itself. Likewise, I considered re-running a pile of test cases for a C program that were showing bugs through the CompCert version of that same program. Any bugs disappearing might mean those were compiler bugs. Helps narrow things down.
3. A key method for both CompCert and CakeML was using a series of intermediate languages working from the highest to the lowest level, with equivalence checks on each pass. People doing formal or informal methods for developing compilers can then apply that technique, maybe even those DSLs, to see if it improves their own work. They might start creating their own DSLs. If Racket and Red are any hint, just giving the ability to do DSLs with worked examples can lead to productivity or correctness improvements in many types of software.
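The differential-testing idea in point 2 can be sketched as a toy in Python. A real setup would run C programs through CompCert and an optimizing compiler and diff the program outputs; here two tiny expression evaluators stand in for the compilers, with a deliberately seeded bug in the "optimized" one (all names and the bug are made up for illustration):

```python
import random

def reference_eval(a, b, op):
    """Trusted reference semantics (stands in for the verified compiler)."""
    return {"+": a + b, "-": a - b, "*": a * b}[op]

def optimized_eval(a, b, op):
    """'Optimizing' implementation with a seeded bug (stands in for the
    unverified compiler under test)."""
    if op == "*" and a == 0:
        return b  # bug: should be 0
    return {"+": a + b, "-": a - b, "*": a * b}[op]

def differential_test(trials=1000, seed=42):
    """Feed random inputs to both implementations; any disagreement
    points at a bug in one of them (probably the unverified one)."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        a, b = rng.randint(-5, 5), rng.randint(-5, 5)
        op = rng.choice("+-*")
        r, o = reference_eval(a, b, op), optimized_eval(a, b, op)
        if r != o:
            mismatches.append((a, b, op, r, o))
    return mismatches

bugs = differential_test()
print(f"found {len(bugs)} mismatches, e.g. {bugs[0]}")
```

The point is that the reference need not be fast or clever, just trustworthy: its verified semantics turn random inputs into an oracle for the other implementation.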
Let's look at another example from high-assurance security. One of the demonstrators was closed-source GEMSOS. Its design and assurance methods are described here:
I used such papers to develop solutions that were highly resistant to penetration, because the techniques supporting them naturally lead to that: especially little code in the TCB, using FSMs to brute-force the analysis, input validation, avoiding pointers as much as possible, a safer language, and so on. The difference between popular/mainstream INFOSEC and what I learned from INFOSEC's founders in high-assurance was mind-blowing. A common example: covert-channel analysis of OS's is almost non-existent, but Kemmerer's paper tells you how (Shared Resource Matrix). Likewise, highly-secure repos are uncommon, but Wheeler's survey (SCM Security) tells you how to fix it. Heck, I even learned to make sure my stuff could be formally verified later just by reading what was easy, moderately difficult, or hard in formal methods papers. I called it Design for Verification, which I later found was similar to how hardware engineers do things (i.e. DFT, DFM). I learned that from reading on Cleanroom methodology, where they verified simplified code in box structures by eye before building for test runs.
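To give a feel for the Shared Resource Matrix idea mentioned above, here's a toy sketch: enumerate shared resource attributes against the operations that reference (R) or modify (M) them, then flag any attribute that is both modifiable and observable as a candidate covert channel. The operations and attributes below are invented for illustration, not taken from Kemmerer's paper:

```python
# Matrix rows: operations; entries: attributes they Reference or Modify.
OPS = {
    "create_file": {"file_exists": "M", "disk_quota": "M"},
    "stat_file":   {"file_exists": "R", "file_size": "R"},
    "write_file":  {"file_size": "M", "disk_quota": "M"},
    "query_quota": {"disk_quota": "R"},
}

def candidate_channels(ops):
    """Attributes with at least one M and one R entry. Each is a
    candidate covert channel needing manual analysis (bandwidth
    estimation, mitigation, or auditing)."""
    attrs = {}
    for op, effects in ops.items():
        for attr, kind in effects.items():
            attrs.setdefault(attr, {"R": set(), "M": set()})[kind].add(op)
    return {a: v for a, v in attrs.items() if v["R"] and v["M"]}

for attr, v in candidate_channels(OPS).items():
    print(attr, "modified by", sorted(v["M"]), "read by", sorted(v["R"]))
```

Here `file_exists` surfaces as the classic channel: a high-clearance process signals bits by creating/deleting a file while a low one polls for its existence. The real method is more involved (it covers indirect references and transitive flows), but the brute-force matrix scan is the core of it.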
So, based on worked examples, I reject the claim that it's "not helpful" to have papers w/out source code that describe worked examples in enough detail to replicate for critical review or to use in a new project. It's been very helpful to all kinds of people. I also agree with you that there are many reasons to prefer the source and/or data used in research over black boxes. It's good they published the source of CompCert. Even better that CakeML is fully open versus CompCert's restrictions. As I hinted at in 1, other teams started using CompCert and CakeML as backend infrastructure for their own work on other languages. Them being commercial w/ free use for researchers would probably have reduced adoption, but I bet there still would have been some if their DSLs were published. People could still do useful things, push the state of the art, and publish the work with much fanfare.
So, still benefit even if not maximal benefit to have detailed papers with no source.
(I know Python3 has some optional gradual typing. And I know that Python ain't very functional---especially if you go by how Guido wants Python to be written. But I still don't see the connection---apart from that functional programming and academic theory on typing go hand-in-hand.)
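For what it's worth, that optional gradual typing looks like the following: annotations are plain metadata at runtime, and only an external checker such as mypy enforces them, so typed and untyped code mix freely (function names here are just for illustration):

```python
from typing import List

def total(xs: List[int]) -> int:
    # Annotated: a static checker like mypy verifies call sites and the body.
    return sum(xs)

def legacy(xs):
    # Unannotated: stays dynamically typed; mypy treats arguments as Any.
    return sum(xs)

print(total([1, 2, 3]))        # 6
print(legacy([1.5, 2.5]))      # 4.0 -- nothing is enforced at runtime
print(total.__annotations__)   # annotations are just attribute metadata
```

So the "gradual" part is exactly that annotated and unannotated functions coexist in one codebase, and checking is opt-in rather than part of execution.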