Writing a formally-verified image browser in Coq and Haskell

throwaway2016a · on Aug 26, 2017

Non click-bait title:

"Using Coq to write formal proofs of an image browsing app written in Haskell"

While this seems interesting, in this type of application what benefit does this have over, say... unit and integration testing?

MichaelBurge · on Aug 26, 2017

My original motivation for exploring Coq was that I was working on a compiler that repeatedly applied passes until the syntax tree stopped changing. I added some mutable state(like a dirty bit and other things) for performance, and ended up with a few bugs that were hard to track down.

One of them was that the compiler would enter an infinite loop because the syntax tree wasn't getting any less complex, but did keep changing. I didn't need any tests, because just running it demonstrated the problem. And it depended on a fairly complex interplay between many passes, so trying to individually test all the permutations would be fragile.

I ended up writing a heavy-handed consistency checks that were run after every tiny update, and a tool for searching a verbose reduction log. But I made a note to later see if it was feasible to integrate termination proofs, which seemed like a useful debugging technique that I hadn't tried.

I didn't want an article on compilers, because everyone already uses Haskell and Coq to write compilers, so that topic has been done-to-death already. Plus, I figured this would make it seem more approachable. In practice, you should probably just use Rails or something for your webapps.

tjingrant · on Aug 26, 2017

Hi, have you ever tried assigning a total order to your syntax tree? Very much like problems occurring in Buchberger's algorithm and its generalized form Knuth-Bendix completion; they have a very sophisticated way of recursively assigning weights to each term (which there can be infinitely many).

tluyben2 · on Aug 26, 2017

It is a question of completeness; the proof your provide must be complete for your case while your unit tests do not have to be. What you want is that your code does what you intend to do and in languages like Coq, Idris and f* you have to provide a proof for it that covers not some subset fuzzing test (or physics) proof but a full covering (or math) proof. Simplified and this is a subject of a lot of CS research as we know it is needed.

Before I insult anyone who studied physics; the above idea that physics just tries something 1000 times and then considers it true while math makes it all covering is a thing Dijkstra used to say when teaching at TUE in Eindhoven NL and I always remembered it as it is empirical vs math.

I use the much lighter TLA+ myself as indeed only(?) French rail pays for actual formal verification in software.

mjburgess · on Aug 26, 2017

A total aside here, but this Humean view of science isn't really accurate, though often repeated.

It isnt that something happens 1000x. It's that we arrive at a causal explanation of how something happens and then this explanation is confirmed across 1000 instances, so on the basis of this causal explanation being accurate enough, it is reasonable to predict the 1001.

The difference is that in the Humean "1000x therefore 1001" case there is no reason to suppose 1001. If the world were really just these sequences it would be inexplicable. Rather, the world has a causal structure and it this that we better understand by making inferences.

cf. induction vs. abduction

logicchains · on Aug 26, 2017

>While this seems interesting, in this type of application what benefit does this have over, say... unit and integration testing?

It provides infinitely more assurance than any amount of unit/integration testing: it guarantees that the application behaves as specified, which guarantees correctness if the spec is correct (and if the spec is not correct, then testing won't help either). It also turns what's possibly the least enjoyable part of software development into the most enjoyable part.

yodsanklai · on Aug 26, 2017

> It provides infinitely more assurance than any amount of unit/integration testing

I would say it's complementary. There are sources of error that you don't capture in the formal model so you can't skip the test. The question is whether it's worth spending this verification effort for such an application. How much more confidence do you get, how much did you spend for it, and did you really need it?

> if the spec is not correct, then testing won't help either

I think that you can detect bugs in your spec when testing.

johncolanduoni · on Aug 26, 2017

> I think that you can detect bugs in your spec when testing.

It's worth noting that when you construct and prove hypotheses in mathematics you will almost always go through an initial phase of spot testing good examples/counter-examples before you attempt a proof. So these methods are still very complementary even when you turn the abstractness up to 11.

WarDaft · on Aug 28, 2017

Actually that's not true. Now that we have Phantheck, all QuickCheck tests (which are a very pleasant superclass of unit tests) can just be lifted to the type level with TH magic. This makes them dramatically more useful. This is because you cannot traditionally write a function that requires that its arguments pass some test, but you can write a function that requires they have a certain type. But now you can require arbitrary QuickCheck-able properties via types.

Note that you can still capture any property you want via non-metaprogramming types (though you may require a trusted kernel to keep things performant) but that can be trickier to work with than a test for people who are much more accustomed to tests than types.

Also note that writing a function that should work, but won't typecheck can also reveal a flaw in the model (or in what you thought the model should be, or in what kind of operations you thought should work)

noblethrasher · on Aug 26, 2017

A computer, being approximately a Universal Turing Machine, can virtualize anything, including a "better" computer.

So, a better question to ask is: "If unit and integration testing are so great, why not create a virtual system and corresponding programming language that will do nearly all of the unit and integration testing automatically."

mdpopescu · on Aug 26, 2017

I believe Microsoft's Pex [1] is an attempt to generate all the unit tests automatically. You can't do that with integration tests, because their number grows exponentially with the number of paths in the code [2].

[1] https://www.microsoft.com/en-us/research/project/pex-and-mol...

[2] http://blog.thecodewhisperer.com/permalink/integrated-tests-...

lostmsu · on Aug 26, 2017

> While this seems interesting, in this type of application what benefit does this have over, say... unit and integration testing?

There were numerous remote code execution exploits caused by improper image handling libraries. So I can see quite a value in a verified image manipulation tool.

Karrot_Kream · on Aug 26, 2017

Thanks for finally writing an article about Coq that doesn't involve making a compiler.

Question: How performant is the generated Haskell code?

Also I'd love to see an Idris article next!

MichaelBurge · on Aug 26, 2017

Oh, the performance is the worst. Just initializing the database with 6 images ran my 16GB desktop out of memory. I was able to get it running on a 128GB server I have.

The root of it seems to be that something somewhere is using Peano naturals rather than binary ones, and the image timestamps are in the billions. So I worked around it by setting all the timestamps to 1:

      timestamp <- modificationTimeHiRes <$> getFileStatus filepath
      return $ DB.MkImage DB.N0 filepath (int_to_n 1)

If I banned Peano naturals from all Set-level code and used a better extraction library, I don't see why it couldn't be as fast as Haskell. It might also be worth setting the Strict flag on Haskell for the generated code, since Coq seems designed to target ML, which is strict.

Anyone have an idea for an Idris article? Maybe "How to make a Bitcoin exchange or Ethereum contract that's 100% guaranteed not to rob you of your money?"

iso-8859-1 · on Aug 26, 2017

It's a great idea! Bitcoin needs more formal verification. I wonder how much of e.g. Lightning Network can be proven, assuming SHA256 collisions can't be found, miners include transactions and signatures are unfalsifiable and not malleable. Roasbeef wanted to do it in TLA+, but I am not sure he will find the time.

sddfd · on Aug 26, 2017

Why not use a binary number representation?

They are in the library: https://coq.inria.fr/distrib/current/stdlib/Coq.NArith.BinNa...

If you use Z, even the Omega tactic (solver for presburger arithmetic) should work.

MichaelBurge · on Aug 26, 2017

(* edit: profquail below points out that the Integer variant of the Z extraction module has a less strongly-worded disclaimer. Just importing it would probably fix the described memory issue. *)

https://github.com/coq/coq/blob/307f08d2ad2aca5d48441394342a...

profquail · on Aug 26, 2017

Can it extract [Z] to some bigint implementation based on gmp? That should still be safe but provide better performance than using Peano arithmetic.

pron · on Aug 28, 2017

> Maybe "How to make a Bitcoin exchange or Ethereum contract that's 100% guaranteed not to rob you of your money?"

I think you'll find that specifying that and proving it is orders of magnitude more complicated and costly than the example in this post. I think that at this point, the most optimistic estimates are that producing software like that (meaning, verifying global correctness properties) would require about 1 day per line of code (and we're not there yet, but 2-5x that), and that is provided that the overall program is no more than a few thousand lines and uses just the simplest algorithms possible.

There are other formal methods that allow cheaper verification for the cost of weaker evidence of correctness, like test generation from specifications, model checking of small, finite instances etc.. It is an open question how much confidence formal proofs actually add. For example, while model checking has found errors in informal (although peer-reviewed) proofs several times, I am not aware of cases where a program verified with a model checker on small instances turned out to be incorrect.

oh_sigh · on Aug 26, 2017

Is there a reason coq can't fake peano naturals with a much more performant substitute?

sddfd · on Aug 26, 2017

It can. He didn't bother to do it.

Karrot_Kream · on Aug 26, 2017

Can you direct me to docs that show how to do this?

logicchains · on Aug 26, 2017

Did you consider using dependent types to define the image database in such a way that it's impossible to create one that's not InternallyConsistent? This approach can often make proofs shorter/simpler.

reuben364 · on Aug 26, 2017

Coq isn't very good for developing with properties encoded in the types. Such properties are defined seperately from the data structures involved.

reificator · on Aug 26, 2017

Look, HN is something that I can justify reading at work, because it's often a great source of both technical articles as well as industry trends.

Yesterday I saw this article and the name was exactly what the article was about. I scrolled past and had a laugh, no issues.

Today with the short sighted name change it's now an article I would probably click on out of interest, and before you know it I've got a porn tutorial on my screen for the world to see.

IMO this article belongs here because it's an interesting approach to an interesting problem. Could the author have chosen a better name/gimmick? Absolutely. But as long as the link reflects the content within, no harm no foul.

But changing the name to obfuscate the content within is a reckless choice that I have no respect for. Leave it or delete it, don't make it a gamble to click a link by changing the title.

mcguire · on Aug 27, 2017

* Fixpoint find_categories_ids: Uses "find_category_ids"; this is missing.

* newDb version 2 needs a period.

* untag_image needs a period.

* Theorem preserves_consistency_1: The module MF containing empty_in_iff seems to be defined in the discussion of Theorem preserves_consistency_2.

* Require Import Coq.FSets.FMapFacts requires period.

MichaelBurge · on Aug 27, 2017

Thanks for reporting these; I've corrected them on the article.

INTPenis · on Aug 26, 2017

Interesting of course but the clickbait title "Porn Browser" made me instantly think of Firefox Focus.

I have no idea what build techniques they use but it's definitely a good porn browser in the sense that you can use the family tablet safely and easily with it.

nickpsecurity · on Aug 26, 2017

Alright, what I like about this article is it shows the process of specing out the application in a way people new to verification will understand. It's also a useful kind of app instead of a toy language or something. A few critiques follow:

" It’s an unspoken truth that most large image databases people have are actually porn collections"

Flicker, Tumblr, Imgur, and Facebook Images aren't mostly porn even though one or two are popular for it. So, I doubt that highly. The title will just turn off a significant amount of readers, even thick-skin types whose company monitors web surfing. The new one is better.

"We’re going to make an image browser."

"Hopefully this example shows that there’s nothing really stopping anyone from using Coq in their Haskell programs today."

I hate saying this since the article is well-done but this isn't a demonstration of a verified, image browser in Coq. A verified app is usually considered successful when its extracted code at least runs well enough to be a reference implementation to check others against. The comment in this thread says 6 images kills it on a 16GB RAM machine (?!). It doesn't meet the functional requirement of storing an image collection. It's just an attempt at a verified, image browser that's taken steps toward one that works.

Another thing for readers to consider is that making things efficient is a big part of Coq work aimed at real-world applications. The developers usually have to refine the abstract specifications into concrete forms that run efficiently on computers. That takes additional work that looks quite tedious as a non-specialist in formal methods. So, there's two correctness criteria after the thing passes testing (this didn't): it works fast enough to be acceptable in some use case; it works fast enough to be competitive with popular implementations in (insert language).

The author might want to build on this example to get it to full correctness with performance at least meeting the first criteria. Then, potentially doing the second to show refinement proof toward efficient, concrete representation. I know Isabelle/HOL is currently really strong in the low-level stuff with some tools synthesizing imperative implementations from functional specs and others verifying assembly. It would be interesting to see what Coq ecosystem can do in automating or reducing effort on similar things.

"Using GHCJS on the resulting Haskell to get formally-verified Javascript"

Since GHCJS isn't verified, you can't get formally-verified JavaScript by compiling a verified Coq program with it. If anything, you break the verification chain since complex compilers screw things up a lot. Add abstraction mis-matches from cross-compilation to increase chance of a mistake further. I think the accurate statement would just be that using GHCJS would make the program run in a browser or on a JS engine. No longer formally-verified, though.

tempodox · on Aug 26, 2017

So there is a way to formally verify porn? Theresa May might be interested in that.

Simon_says · on Aug 26, 2017

Probably not, since the exact same photo can be classified as porn or not based on the collection of other photos it's grouped with.

KekDemaga · on Aug 26, 2017

”I'll know it when I see it, or when the breast coefficient multiplied by the tumescence index is greater than 9."

sweep4r · on Aug 26, 2017

Writing a porn browser in Coq seems appropriate.

dominotw · on Aug 26, 2017

do americans call their penis cock/dick and vagina pussy when they are having 'sexytime' ? I find saying 'suck my cock' too immature and comical but at the same time 'suck my penis' sounds too clinical and a mood killer.

wtf_is_up · on Aug 26, 2017

I prefer to not speak English at all, but to just grunt and make gestures like a chimpanzee.

on Aug 26, 2017

[dead]

damnfine · on Aug 26, 2017

You knew these would be downvoted and even made an account name specifically refrencing why. And after all that, a super low effort comment, even for a coq joke. You didn't even try to be humorous or clever...

Why bother?

mcguire · on Aug 26, 2017

Sometimes, the effort is its own reward.

lsh · on Aug 26, 2017

+1 - hooray for porn and Haskell

dang · on Aug 27, 2017

Would you please stop posting unsubstantive comments to HN? We're trying for higher quality than that and I'm afraid you've been setting a bad example.

https://news.ycombinator.com/newsguidelines.html