
FSCQ: A formally verified crash-proof filesystem [pdf] - anishathalye
http://adam.chlipala.net/papers/FscqSOSP15/FscqSOSP15.pdf
======
anishathalye
FSCQ is a certified filesystem written and proven correct in Coq. It's the
first filesystem with a machine-checkable proof that its implementation meets
its specification, and whose specification includes crashes.

The full source code of FSCQ is available online: [https://github.com/mit-
pdos/fscq-impl](https://github.com/mit-pdos/fscq-impl)

~~~
oggy
That's pretty awesome that you could get this verified down to the
implementation. We had a related project [1], where we verified a persistent
storage system under restarts. Our case study was much simpler than a
full-fledged file system, and we didn't verify an implementation, but we did also
consider hardware failures (partial writes, bitrot, etc). Our proof rules are
different though, and I think pretty interesting. But some other choices that
we made were in hindsight suboptimal, and some points in your paper resonated
with me in a "yup, would do it this way now" way. Really nice work!

[1]:
[http://mynosefroze.com/pubs/fm2014.pdf](http://mynosefroze.com/pubs/fm2014.pdf)

------
Natsu
I can't help but think of a couple things:

"Beware of bugs in the above code; I have only proved it correct, not tried
it." -Donald Knuth

And this bug: [http://googleresearch.blogspot.com/2006/06/extra-extra-
read-...](http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-
about-it-nearly.html)

EDIT: I should have elaborated further. I did not intend to sound like I'm
dismissing this work, because it's a cool step forward. It's just that this
reminded me of those things that happened before.

~~~
throwaway999888
Pretty middlebrow dismissal. How about educating us about something pertinent
to the paper _in particular_ , instead of seemingly just scoffing at the idea
of formal verification of software _in general_?

~~~
tokenrove
It's pretty cool, but it's a little disappointing that they handwave away the
possibility that block writes might not happen atomically. Formal verification
is only as good as the assumptions you start from.

~~~
anishathalye
I think it's more work to reason about partial block writes.

The paper does have a paragraph on this:

> The `disk_write` specification states that blocks are written atomically;
> that is, after a crash, a block must contain either the last-written value
> or one of the previous values, and partial block writes are not allowed.
> This is a common assumption made by file systems, as long as each block is
> exactly one sector, and we believe it matches the behavior of many disks in
> practice (modern disks often have 4 KB sectors). Using CHL, we could capture
> the notion of partial sector writes by specifying a more complicated crash
> condition, but the specification shown here matches the common assumption
> about atomic sector writes. We leave to future work the question of how to
> build a certified file system without that assumption.
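To make that assumption concrete, here is a toy crash model with atomic block writes, as a sketch (hypothetical code, not from the paper; the class name and crash behavior are illustrative assumptions):

```python
import random

class AtomicBlockDisk:
    """Toy disk model matching FSCQ's disk_write assumption: on a
    crash, each block holds either its last-written value or one of
    its previously written values -- never a torn (partial) write."""

    def __init__(self, nblocks, block_size=4096):
        self.block_size = block_size
        # history[b] lists every value block b has held, oldest first
        self.history = [[b"\x00" * block_size] for _ in range(nblocks)]

    def write(self, b, data):
        assert len(data) == self.block_size
        self.history[b].append(data)

    def read(self, b):
        return self.history[b][-1]

    def crash(self):
        """Simulate a crash: each block nondeterministically keeps
        its last write or rolls back to some earlier whole value."""
        for hist in self.history:
            survivor = random.choice(hist)  # some prior whole value
            hist[:] = [survivor]            # never a mix of two values
```

A model of partial sector writes would instead let `crash` splice together pieces of two values for the same block, which is exactly the more complicated crash condition the paper leaves to future work.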

------
justinsaccount
That sounds nice to have, but isn't it less useful without formally verified
crash-proof firmware running on the disk?

~~~
jimrandomh
True, but the same argument could've been made about the value of verifying
the firmware. Eventually there will be enough formally verified pieces to put
them together into a full software stack.

~~~
justinsaccount
Yeah... but all the correctness in the world won't protect you from bitrot.

I think it's interesting, and would be great if something like this were
applied to ZFS.

~~~
unboxed_type
If by 'bitrot' you mean random bit flips, here is a quote from that paper:

<< Project Zap [53, 60] and Rely [8] explore using type systems to mitigate
transient faults (e.g., due to charged particles randomly flipping bits in a
CPU) in cases when the system keeps executing after a fault occurs. These
models consider possible faults at each step of a computation. The type
system enables the programmer to reason about the results of running the
program in the presence of faults, or to ensure that a program will correctly
repeat its computation enough times to detect or mask faults. >>
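The "repeat its computation enough times to detect or mask faults" idea in that quote is essentially redundant execution with majority voting; a toy sketch (illustrative only, not how Zap or Rely actually work):

```python
from collections import Counter

def vote(f, *args, copies=3):
    """Run f several times and return the majority result, masking a
    transient fault that corrupts a minority of the runs."""
    results = [f(*args) for _ in range(copies)]
    value, count = Counter(results).most_common(1)[0]
    if count <= copies // 2:
        raise RuntimeError("no majority: fault not maskable")
    return value
```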

~~~
justinsaccount
That's a start, but not quite what I had in mind. I was more thinking of the
case that you write 10001000, and then later, you read 10011000 back from the
disk and the fs panics.

The words "checksum", "digest", or "integrity" don't appear in the paper at
all.

It's probably a case of "in theory the bits you write to the disk are the bits
you later read back, in practice they are not"
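ZFS-style end-to-end integrity handles exactly this: store a digest of each block separately from the block and verify it on every read. A minimal sketch (hypothetical; the function names are made up, and real systems keep the checksum in block-pointer metadata with redundant copies to fall back on):

```python
import hashlib

def store_block(data: bytes) -> tuple[bytes, bytes]:
    """Return (data, checksum) to be written; the checksum is stored
    separately from the data block itself."""
    return data, hashlib.sha256(data).digest()

def read_block(data: bytes, checksum: bytes) -> bytes:
    """Verify the stored checksum before trusting the data; raise if
    the bits read back differ from the bits that were written."""
    if hashlib.sha256(data).digest() != checksum:
        raise IOError("checksum mismatch: block is corrupt (bitrot?)")
    return data
```

With this in place, writing 10001000 and later reading back 10011000 is detected rather than silently trusted, so the fs can try a redundant copy instead of panicking on garbage.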

