Hacker News new | past | comments | ask | show | jobs | submit login
Advent of Code 2018, in Coq (github.com)
149 points by szemet 64 days ago | hide | past | web | favorite | 43 comments



Hello there. I'm the author of this repo.

Not as much has been going on in that project as I would have liked. It certainly doesn't help that:

- The standard library is actually quite poor. A lot of basic functions must be implemented from scratch, and then you'll also want to write proofs about them.

- I had to reinvent IO in Coq.

- Most AOC problems are actually not that well-suited for formal verification. (It's certainly no fault of the AOC organizer!)

In particular, a lot of them basically ask you to simulate some system which doesn't have particularly interesting properties, so the implementation itself is often the best specification of the problem, and the main point of using a language like Coq is lost.

I did find some interesting problems though. For example, day 5 was a simple rewriting system and the question was to find the normal form of some starting string. Well, who's to say that the answer is unique? In this case it's easy enough to convince oneself that the given system actually is confluent, but I set out to prove it formally (and (re)learned a few things about rewriting systems on the way). (c.f., Confluent_react_steps, react_steps_injective, in day_05_common.v)


I never liked Coq's name because I'm immature, but it really is magic software in my eyes. The level of downright objectivity that you get with it makes it feel so bulletproof.

I know functional languages are slowly starting to adopt dependent types, and you can get reasonable certainty of correctness with property-based testing, but Coq just feels kind of universal. If the proof passes, so will your code.

I really wish I could get my employer to sign off on me using it for a real project.


Good enough, with bugs is what makes the real money unfortunately.


I don’t really like this trope.

The goal of a programming in Coq would, I suppose, be to shift debugging time from after compilation succeeds to before, with the hope that the compiler will direct one towards the bugs and they will be easier to find. I don’t think this is always a win. I think it is a win if what you write must be very reliable, but it seems less obvious that the advantage would still exist as the required reliability decreases.

Remember that bug Coq had when if you used more than 256 parameters in a constructor you could prove anything?

I suppose one could say Coq is good enough, with bugs.


What I would be terrified of in Coq is some tiny change in business requirements ends up pulling the linchpin out of some fundamental premise way down at the bottom of your stack of proofs, this forcing you to go and fix everything. Sort of the mother of all brittle testing scenarios.

Maybe that's FUD on my part; I've only read Coq code and never written any of my own.

In any case, at least the way I'm thinking of it, it's not that good enough with bugs is what makes the money. It's more a question of proportionality: It doesn't make sense to spend $100 to guard against a $5 bug. I don't think Coq leaves much room for being pragmatic about test coverage like that.


"What I would be terrified of in Coq is some tiny change in business requirements ends up pulling the linchpin out of some fundamental premise way down at the bottom of your stack of proofs, this forcing you to go and fix everything."

I'm still wondering how often that will be a problem if high-assurance methods get more adoption. I'll note it's mitigated a lot by the advice we give about where to apply formal methods:

1. Highly critical stuff.

2. That doesn't change at a fast pace.

3. That's known to be verifiable with existing methods in the time frame needed. As in, you want pre-verified components or it to be really similar to past projects.

One thing to support No 3 is to have straight-forward, sequential code that's composed in a hierarchical way with clearly-specified interfaces. You'll mostly be just be verifying behavior of individual components. A change at the bottom will be isolated into that module. Then, you prove the integration of whatever calls it. Then what calls that. With the proper structure, you dramatically reduce the work required with the reverification.

" I don't think Coq leaves much room for being pragmatic about test coverage like that."

People that use it often try to use it for everything because formal verification is their job. In industry, you use it where it makes sense. That might be most critical component which is also pretty simple. Actually, most in industry use model-checkers (i.e. SPIN, TLA+) and languages with solvers (i.e. Frama-C, SPARK Ada) for verification. That minimizes their work. The most-adopted methods, though, for the $5 bugs are software inspections (i.e. checklists), static analyzers, test generation from source, and fuzzing. I strongly recommend using them in order of lowest-cost and fastest results to highest-cost, hardest results to avoid wasting time.


> What I would be terrified of in Coq is some tiny change in business requirements ends up pulling the linchpin out of some fundamental premise way down at the bottom of your stack of proofs, this forcing you to go and fix everything. Sort of the mother of all brittle testing scenarios.

Isn't the same true for static typing? One small business requirement changes, you change a foundational type, and suddenly all your code needs to adapt.

Of course this is true with dynamic typing too, but at least you don't know.


> Isn't the same true for static typing? One small business requirement changes, you change a foundational type

Business changes are usually changes in logic, not in types. So it's mostly "our cashback is now calculated in percentage of the sum, not as a fixed sum" etc. Of course, this may still have repercussions through the entire system. I don't know how it would work in Coq (I don't know Coq :) ).


> Business changes are usually changes in logic, not in types.

Types _are_ logical propositions. In languages like Coq they are more expressive, and it's easier to leverage them as a way to constrain your implementation. Of course it's just a model, and there can be bugs in the specification, so it's not a silver bullet.


> types are propositions

ML and Coq types, yes. Java, C types? Not so much.


Call them "illogical propositions".


I guess. My experience has been that it's more a case with, for lack of a better term, "old school" OOP.

SOLID helps a lot with that sort of thing. Avoiding inheritance like the plague helps a lot. I slightly hate that I said old school, because Smalltalk-style tell-don't-ask helps a lot.

I think that the idiomatic ways of structuring code in the ML family of languages help the most, because they tend to be pretty radical about limiting scope of visibility. But they also tend to make more of a muddle of things than they're worth in algol-style languages.


> What I would be terrified of in Coq is some tiny change in business requirements ends up pulling the linchpin out of some fundamental premise way down at the bottom of your stack of proofs, this forcing you to go and fix everything.

To be a bit facetious: What I would be terrified of in software development is some tiny change in business requirements ends up pulling the linchpin out of some fundamental premise way down at the bottom of your stack of functions, this forcing you to go and fix everything.

There is good proof engineering and bad proof engineering, just like there is good software engineering and bad software engineering. (If only all of us could agree on what exactly is good and bad in either ;-)). Coq has all sorts of modularity and abstraction facilities that you can use to build an architecture that is hopefully resilient to change.

But yeah, it's not perfect. Particularly because proofs are not always like unit tests: At the lowest level you prove things about actual implementations of functions, not just their behavior. In tricky cases, changing a low-level function does mean that you may have to change a lot of proofs. How to avoid this is still an open problem.


This really depends on what you're doing. I suppose most of the code I've written in the last two decades wouldn't have profited from using coq. BUT then there's CompCert, an optimizing compiler written in coq [I'm affiliated, but not working on it directly]. If you're building safety critical stuff (like in failure == people die) something like this might be interesting.

Of course, for a C compiler the requirements are less prone to change than e.g. some SAP-style business application. As a dev, coq is just another tool in your belt and being a good dev means knowing when to use what tool.


I think that’s to a certain extent a matter of approach: we’ve discovered over time that certain approaches e.g. huge business logic inheritance trees are susceptible to this problem, whereas others e.g. list comprehensions, grouping constructs, functors &c aren’t.

So I’m sure you _can_ build something in Coq that is fragile as you describe, but I feel sure that there are alternative approaches that won’t be as brittle.


" with the hope that the compiler will direct one towards the bugs and they will be easier to find."

Also, less expensive to fix given bugs are cheaper to fix earlier in the lifecycle. You can also reduce breakage during changes to existing code. Finally, you can warranty the code at specific defect rates or for certain kinds of defects where the fixes are free within the warranty. Companies using Cleanroom and Correct-by-Construction did that with the above benefits.

http://infohost.nmt.edu/~al/cseet-paper.html

http://www.anthonyhall.org/c_by_c_secure_system.pdf


I’m referring to production code being good enough rather than the tooling.

Let’s take HN for example. Is it coq proven? I doubt it. Is that a barrier to its success?


I should rephrase; obviously nothing is perfect, I'll admit that a few things will slip through the cracks. I just feel that it's night and day, compared to nearly any mainstream language.


> I suppose one could say Coq is good enough, with bugs.

Wait, isn't Coq verified by itself?


The Coq compiler is unverified, it is written in OCaml.

It's not possible for Coq's consistency to be verified in itself, thanks to Gödel's second incompleteness theorem. However there are still various other aspects of the compiler that could be verified in principle. The CertiCoq project is actively going in that direction.


I love this comment, because there is this pie in the sky mathematical idea, with direct consequences for practical coding.


good enough without bugs makes you a lot of money and also lets you build something with a much smaller team.

I don't think Coq is the way forward (I don't think "code that makes money" has many interesting provable properties in itself along those axes) but being able to ship stuff that's pretty close to bug-free will get you far


It's pronounced closer to "coke".


No. The software Coq is pronounced like the French word "coq", which has a short o like the English "cock" and unlike the long o in the English "coke". https://forvo.com/word/coq/#fr

This is certainly the only pronounciation in the French formal methods community, where Coq comes from. It is also the standard pronounciation in the international formal methods community. I have never heard anyone pronounce it like "coke".

See for example Adam Chlipala at about 5:40 here: https://www.youtube.com/watch?v=4DYVJdHMV5k Benjamin Pierce has a slightly longer o here, but it doesn't sound to me like it's deliberately pronounced differently from "cock": https://www.youtube.com/watch?v=KKrD4JcfW90 And here is Andrej Bauer: https://www.youtube.com/watch?v=COe0VTNF2EA

I know from rare comments like yours that some prudish Americans pronounce it "coke", some deliberately, others apparently because they have been fooled by the first group. But this seems to be a very small minority.


> The software Coq is pronounced like the French word "coq",

Yes, but...

> which has a short o like the English "cock"

No, it doesn't. The sound it has isn't a common English vowel sound at all; it is closer to the vowel sounds in “cuck” and “cook” than the one in “cock”, but not exactly either of those, either. In fact of the four (those two, plus coke and cock), the “o” in “cock” is the one most Americans would find it most distinct from, I would think.

> https://forvo.com/word/coq/#fr

Yeah, none of those pronounce it with the vowel sound in English “cock”.

> I know from rare comments like yours that some prudish Americans pronounce it "coke",

It's not prudish, its a common way Americans that haven't mastered French pronunciation approximate the unaccented French “o” irrespective of context.


> Yeah, none of those pronounce it with the vowel sound in English “cock”.

I was mainly talking about the vowel length. However:

> the “o” in “cock” is the one most Americans would find it most distinct from, I would think.

Really? More distinct than from the "oʊ" sound in American "coke"? (IPA from https://en.wiktionary.org/wiki/coke)


Trying to have this discussion without IPA for multiple dialects is pretty tough with "long o" and "short o." I've only ever heard Coq pronounced like /kɑk/, but I'm from the Western US. I know Brits and Americans from the NE have the round low back vowel for /kɔk/. Coke sounds like /kouk/.


I think mostly because to speakers of American English, roundedness is a very important vowel distinction, as “cock” has an unrounded vowel in American English, whereas the others are rounded, except “cuck”, which while also unrounded matches “coq” in position (open-mid vs. open, on the IPA chart)


As a French Canadian, I pronounce "coq" and "cock" the same way, while "cook" and "coke" are extremely different.

It's clearly because I don't pronounce them well, but it's certainly how I do it.


No, it really is pronounced a lot more like 'cock'. Moreover, I have it on pretty good authority that this wasn't any sort of unhappy coincidence; quite the opposite. I think they may have been unhappy about 'bit'.


You may have just ruined all the fun for me.

I'm pronouncing it like "coque" from now on.


Good news for you, he's wrong. It's pronounced the way "coq" is in French, which isn't quite "cock", but certainly closer to "cock" than "coke".


Looking at the code, and having done AOC2018 and having looked at a lot of solutions in various languages, I don't really like the code, the solutions are really not obvious and it all seems like a syntactic mess.


I don't know that that's necessarily a fair characterisation. I've never seen any Coq syntax before a moment ago but it doesn't look wildly dissimilar to Haskell or Idris.


Hard to make any kind of fair comparison, but given the context, AOC2018, and the many many solutions in many languages, I think its reasonably fair to compare it to all of those and make a judgement on how it stacks up. It seems like it's being offered up for that specific comparison.

Just for reference, another HNer offered up the following in python for inspection

https://www.michaelfogleman.com/aoc18

There are other posted solution sets on HN if you search... Now of course, lots of very different languages with different goals and philosophies. Hard to make a fair comparison, but, still I'm making a subjective comparison. The Coq code just does not seem compelling.


How can you make a comparison from a glance when 1) python was explicitly made to be easy to read (English like) and 2) it’s absurdly common to see python code as opposed to Coq?

Of course it’s possible it won’t make sense if you’re unfamiliar with the syntax or style. That’s like saying Haskell or Erlang are uncompelling from a brief glance, which ignores the benefits of using those languages.


well, I gave the python as an example, there are more examples in most languages.... including Haskell. I have been programming ~40 years and have tried a lot of different languages in that time (not coq, but I have read about it and am curious about practical uses of it). There is no article with this repo, no explanations, I'm just saying I see nothing compelling... I'm completely willing to listen to a compelling argument for why I'd write Coq code and how theorem proving capabilities are worthwhile and how this code demonstrate it.


Functional programming languages, Gallina included, are far simpler and more regular in their grammars than imperative and object-oriented languages. It seems a bit off-the-wall to suggest that the Coq implementation isn't "compelling" without knowledge of how the language works or its design goals. To be clear, these design goals are not identical to Python's, which could be stated crudely as clarity and readability.

As a brief aside, I've often encountered the claim that these goals make Python "English-like." For the grand failures achieved by pursuing such a mistake, one need look no further than COBOL and AppleScript. Python is clear and readable because it is relatively consistent, concise, and reads like how we are taught to expect pseudo-code to read, not because it reads like an English sentence.

It could be argued that Coq's goal in the syntax and semantics department is mathematical clarity, which is somewhat different than the kind of clarity Python pursues. Mathematical clarity favors terseness, simplicity, and consistency above all else, because these features are necessary to express complicated ideas succinctly and unambiguously. As an argument in favor of this definition of clarity over Python's, I submit that programming correctly in a formal sense is quite difficult to do without tools that encourage it as Coq and other languages operating at the level of generality of the calculus of constructions do. Most statically typed languages do not have type systems powerful enough to express the properties that Coq can, and among those that do, I'm not sure any are syntactically simpler.

Coq indeed sacrifices what is for a lot of traditionally trained programmers (myself included) the immediate familiarity of imperative pseudocode that Python expresses so well. What is gained is a highly general yet simple set of tools which can prove things about programs that are simply impossible with other tools. I would also argue that once the syntax is learned and the terseness adjusted to that Coq code is easy to read as well, and that the tradeoff for me is worth the learning period, but you may have different needs and priorities. I am intrinsically interested in rigorous formulations of software correctness, and I realize that makes me a bit more odd than the average programmer.


the solutions appears to take an order of magnitude more code than Haskell

but Coq is also proving correctness so that seems expected


A fairer comparison would be Haskell code plus a complete suite of tests.


No suite of tests is complete enough to replace a proof. Unless the domain is finite and the tests exhaust all values in it.


Out of curiosity, are there any studies that compare error rates among languages? How much is gained by going from say, Java to Haskell, or from Haskell to Coq?


There's this, according to which Clojure does nearly as well as Haskell or Scala: https://jaxenter.com/programming-languages-defect-prone-repo...




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: