Hacker News new | past | comments | ask | show | jobs | submit login
FSCQ: A formally verified crash-proof filesystem [pdf] (chlipala.net)
39 points by anishathalye on Nov 20, 2015 | hide | past | favorite | 18 comments

FSCQ is a certified filesystem written and proven correct in Coq. It's the first filesystem with a machine-checkable proof that it's implementation meets its specification and whose specification includes crashes.

The full source code of FSCQ is available online: https://github.com/mit-pdos/fscq-impl

That's pretty awesome that you could get this verified down to the implementation. We had a related project [1], where we verified a persistent storage system under restarts. Our case study was much simpler than a full fledged file system, and we didn't verify an implementation, but we did also consider hardware failures (partial writes, bitrot, etc). Our proof rules are different though, and I think pretty interesting. But some other choices that we made were in hindsight suboptimal, and some points in your paper resonated with me in a "yup, would do it this way now" way. Really nice work!

[1]: http://mynosefroze.com/pubs/fm2014.pdf

I can't help but think of a couple things:

"Beware of bugs in the above code; I have only proved it correct, not tried it." -Donald Knuth

And this bug: http://googleresearch.blogspot.com/2006/06/extra-extra-read-...

EDIT: I should have elaborated further. I did not intend to sound like I'm dismissing this work, because it's a cool step forward. It's just that this reminded me of those things that happened before.

That's what is really cool about this work -- it's not just an algorithm that's proven correct on paper, it's a formally verified _implementation_, so there's no formality gap.

There is a gap between the formal specification and reality though. We published some work (coincidentally presented right after this paper at SOSP 2015) on building a real-world mathematical model of POSIX and using it to test existing filesystems; http://sibylfs.io

Running SibylFS on Fscq showed up some limitations:

- https://github.com/mit-pdos/fscq-impl/issues (mostly basic missing POSIXisms)

- https://github.com/mit-pdos/fscq-impl/issues/2 an interesting one where the proof about truncate was incorrect in the specification and lead to very weird observable filesystem behaviour.

Copy of the SibylFS paper here for the interested; http://anil.recoil.org/papers/2015-sosp-sibylfs.pdf

Having said that, these are edges to be polished -- Fscq is an absolutely fantastic advancement in filesystems, and it shouldn't be too many years before we can have a formally verified and specified filesystem in production.

There is a gap between human understanding and reality, so any abstraction including mathematics may have a flaw, in my opinion. But refinement of that understanding is crucial to get better at building mostly-reliable systems.

I believe that if a formal proof system had knowledge of fixed width integers, it could detect the potential overflow in your googleresearch link.

Pretty middlebrow dismissal. How about educating us about something pertinent to the paper in particular, instead of seemingly just scoffing at the idea of formal verification of software in general?

It's pretty cool, but it's a little disappointing that they handwave away the possibility that block writes might not happen atomically. Formal verification is only as good as the assumptions you start from.

I think it's more work to reason about partial block writes.

The paper does have a paragraph on this:

> The `disk_write` specification states that blocks are written atomically; that is, after a crash, a block must contain either the last-written value or one of the previous values, and partial block writes are not allowed. This is a common assumption made by file systems, as long as each block is exactly one sector, and we believe it matches the behavior of many disks in practice (modern disks often have 4 KB sectors). Using CHL, we could capture the notion of partial sector writes by specifying a more complicated crash condition, but the specification shown here matches the common assumption about atomic sector writes. We leave to future work the question of how to build a certified file system without that assumption.

My comment was in no way intended as a dismissal of this work.

Those types of comments that lead off with a "by the way" of some witty quote has the effect of being dismissive in a pretty non-informative way. It's not that I think that's your intent at all; I have no doubts about your sincerity. Just telling you what effect it has, at least to my eyes. :)

Well it's not that I can't see how someone could read it that way, that's why I edited in the disclaimer.

That sounds nice to have, but isn't it not as useful without a formally verified crash-proof firmware running on the disk?

True, but the same argument could've been made about the value of verifying the firmware. Eventually there will be enough formally verified pieces to put them together into a full software stack.

Yeah.. but all the correctness in the world wont protect you from bitrot.

I think it's interesting, and would be great if something like this were applied to ZFS.

If under 'bitrot' you assume random bit flip here is the quote from that paper: << Project Zap [53,60] and Rely [8] explore using type sys- tems to mitigate transient faults (e.g., due to charged particles randomly flipping bits in a CPU) in cases when the system keeps executing after a fault occurs. These models consider possible faults at each step of a computation. The type system enables the programmer to reason about the results of running the program in the presence of faults, or to ensure that a program will correctly repeat its computation enough times to detect or mask faults. >>

That's a start, but not quite what I had in mind. I was more thinking of the case that you write 10001000, and then later, you read 10011000 back from the disk and the fs panics.

The words "checksum", "digest", or "integrity" don't appear in the paper at all.

It's probably a case of "in theory the bits you write to the disk are the bits you later read back, in practice they are not"

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact