Hacker News new | comments | show | ask | jobs | submit login
Coq: The world’s best macro assembler? (2013) [pdf] (microsoft.com)
187 points by dezgeg on Dec 15, 2014 | hide | past | web | favorite | 61 comments



Coq is an amazing language. Learning to use it, and understanding the theories that underpin it, give one a far richer sense of understanding of computer science. In that sense it's akin to Haskell, but with an even more (much more!) powerful type system. I'm still a beginner but even with my limited expertise, it's a real breath of fresh air. I would love to see a world in which logic, type theory and functional programming formed the foundation of computer science education, rather than the comparatively arcane and clunky paradigms of object-oriented and imperative programming. I'm convinced that the quality of software would improve drastically.

(And yes, its name is truly unfortunate. It's hard to have a conversation about the language, especially in person, without giggling over the name. Pronouncing it "coke" helps, but it's still the elephant in the room.)

(Also, I'm not sure if the fact that this pdf is entitled "coqasm" is a pun or an amusing coincidence.)


I would love to see a world in which logic, type theory and functional programming formed the foundation of computer science education, rather than the comparatively arcane and clunky paradigms of object-oriented and imperative programming.

It truly is a shame what horrid reputation object-oriented programming has received, even amongst a lot of people who should supposedly know better.

OO and functional programming are not opposites in any way, nor should they be.

I don't know through what circumstances exactly did we ultimately get the quasi-procedural Nygaard-style OO of today, which is indeed clunky and arcane, as opposed to the style of OO rooted in message passing espoused by Alan Kay and the Smalltalk family.

Huge advances in JIT and language VMs can be traced to the work of the Self language designers (itself descended from Smalltalk), the sheer elegance and minimalism of Smalltalk syntax is likely unmatched, with a plausible exception of Forth, the power and dynamic introspection capabilities of the image-based Smalltalk environments are to this day the subject of a lot of poor reimplementation of various subsets, and one could even make the case that languages like Erlang are object-oriented to a degree.

Yet it seems OO research has stagnated and instead FP has become the master paradigm everyone wants to focus on. There's a lot of talk about "programming language as operating system" as well in these circles, but no one has come closer to replicating it than the Smalltalk environments yet.


OO--in Haskell, at least--is most succinctly represented as existential and higher-order types with constraints, which break some important assumptions used for proving things in FP. If you are message passing s.t. the call site is opaque to the caller, this breaks more assumptions. The assumptions in question happen to be important assumptions for the currently cool and trendy research in FP, which is important when you're an academic with a career.

Furthermore, the types required are a bit more general than OO, so once you've introduced the former it doesn't make sense to constrain your landscape of thought to the latter.


I'm not sure it really breaks assumptions: it just requires coinduction and bisimulation instead of induction and equality. Coinduction and Bisimulation aren't as well understood today and are harder to use, so it's a bit of a rough project to move forward with.

What assumptions are you referring to?


>OO and functional programming are not opposites in any way, nor should they be.

Depends what you mean by "OO".

If you mean mutable + stateful programming with objects as an abstraction mechanism (which is what people usually mean by "OO"), then yes, it is in some ways opposite to pure FP.

OO to FP is kind of like Turing Machines to the Lambda Calculus. They both get the job done, but one is kind of a hack and one is very elegant.


Which one is a hack and which one is very elegant, may depend on who you ask...


For learning this sort of thing I always recommend Pierce's Software Foundations [0] and Chlipala's Certified Programming with Dependent Types [1] which are both available online for free, but I'm also reading through Coq d'Art as well and like it a lot... unfortunately it's harder to find.

    [0] http://www.cis.upenn.edu/~bcpierce/sf/current/index.html
    [1] http://adam.chlipala.net/cpdt/


Seconded! I've been working my way through both of those for a while. For those who like watching lectures (as I do), I would also recommend Pierce's lectures on the same material as Software Foundations, Harper's lectures on type theory, and the other associated lecture serieses, all coming from a session of the Oregon Programming Languages Summer School, and available on youtube. Unfortunately there are some gaps in the software foundations lectures, but considering that it's a complement to an already detailed set of literature, it's not a major issue. Also unfortunately, the quality of the audio and video aren't always the greatest. But, beggars can't be choosers! It's still great material.


I'd recommend GOING to the Oregon Programming Languages Summer School. It's amazing.


I plan/hope to next summer. I only found out about it a few months ago.


> And yes, its name is truly unfortunate.

IIRC I've heard from knowledgeable sources that it's quite intentional, a joke at the expense of les anglo-saxons.


That's a great way to make sure your software gets used.


>And yes, its name is truly unfortunate.

Judging by the logo, it's quite intentional.


In case someone else is interested in this kind of work, there's an interesting line of research in certified programs that's similar to this. Basically, you specify the semantics of some language in Coq. Then, write a program in the desired language. This allows us to analyze the program in Coq and prove certain properties about it. Essentially, it's a process to develop certified programs by using the process:

language -> Coq -> prove properties

As another example of this, Sandrine Blazy and Xavier Leroy have a paper that formalize a subset of C in Coq:

http://pauillac.inria.fr/~xleroy/publi/Clight.pdf

There's also some discussion on StackOverflow about the definition of what a certified program is:

https://stackoverflow.com/questions/21320138/definition-of-a...

In any case, formalizing a subset of the x86 architecture in Coq means that we can prove properties about a program written in assembler using Coq, which is both interesting and impressive.


Compcert C, also written in Coq, is another interesting C compiler: http://compcert.inria.fr/


Oh, wow, this is absolutely amazing. My complements to the authors! They're really pushing the bounds of what we can do with Coq further and further.

Hell, it makes me want to open the IDE and prove something right now.


> Hell, it makes me want to open the IDE and prove something right now.

It makes me want to do a little bootloader development too... in Coq!

It might be fun to make a tiny 16-bit real mode OS with this also. You could head towards your own miniature version of SeL4 os something.

Edit:

Oops! I forgot that it couldn't produce 16-bit machine code. Still, build a tiny bootloader that grabs the next sectors from the boot medium, then jump into protected mode (pretty easy really) and you are good to go!


Quoting my comment on this paper from 8 months ago on https://lobste.rs/s/pc1v4g/coq_the_world_s_best_macro_assemb..., where it is full of links to the things I'm referring to:

This is a really interesting and thought-provoking idea. In writing httpdito, and in particular in putting a sort of tiny peephole optimizer into it as gas macros, it occurred to me that (some) Forth systems are awfully close to being macro assemblers, and that really the only reason you’d use a higher-level language rather than a macro assembler is that error handling in macro assemblers, and in Forth, is terrible verging on nonexistent. Dynamically-typed languages give you high assurance that your program won’t crash (writing the Ur-Scheme compiler very rarely required me to debug generated machine code), while strongly-statically-typed languages give you similarly weak assurances with compile-time checks; but it seems clear that the programmer needs to be able to verify that their program is also free of bugs like memory leaks, infinite loops, SQL injection, XSS, and CSRF, or for that matter “billion-laughs”-like denial-of-service vulnerabilities, let alone more prosaic problems like the Flexcoin bankruptcy due to using a non-transactional data store.

Against this background, we find that a huge fraction of our day-to-day software (X11, Firefox, Emacs, Apache, the kernel, and notoriously OpenSSL) is written in languages like C and C++ that lack even these rudimentary memory-safety properties, let alone safety against the trickier bogeymen mentioned above. The benefit of C++ is that it allows us to factor considerations like XSS and loops into finite, checkable modules; the benefit of C is that we can tell pretty much what the machine is going to do. But, as compiler optimizations exploit increasingly recondite properties of the programming language definition, we find ourselves having to program as if the compiler were our ex-wife’s or ex-husband’s divorce lawyer, lest it introduce security bugs into our kernels, as happened with FreeBSD a couple of years back with a function erroneously annotated as noreturn, and as is happening now with bounds checks depending on signed overflow behavior.

We could characterize the first of these two approaches as “performing on the trapeze with a net”: successful execution is pleasant, if unexpected, but we expect that even a failure will not crash our program with a segfault — bankrupt Flexcoin, perhaps, and permit CSRF and XSS, but at least we don’t have to debug a core dump! The second approach involves performing without a net, and so the tricks attempted are necessarily less ambitious; we rely on compiler warnings and errors, careful program inspection, and testing to uncover any defects, with results such as Heartbleed, the Toyota accelerator failure, the THERAC-25 fatalities, the destruction of Ariane 5, and the Mars Pathfinder priority-inversion lockups.

So onto this dismal scene sweep Kennedy, Benton, Jensen, and Dagand, with a novel new approach: rather than merely factoring our loops and HTML-escaping into algorithm templates and type conversion operators, thus giving us the opportunity to get them right once and for all (but no way to tell if we have done so), we can instead program in a language rich enough to express our correctness constraints, our compiler optimizations (§4.1), and the proofs of correctness of those optimizations. Instead of merely packaging up algorithms (which we believe to be correct after careful inspection) into libraries, we can package up correctness criteria and proof tactics into libraries.

This is a really interesting and novel alternative to the with-a-net and the without-a-net approaches described earlier; it goes far beyond previous efforts to prove programs correct; rather than attempting to prove your program correct before you compile it, or looking for bugs in the object code emitted, it attempts, Forth-like, to replace the ecosystem of compilers and interpreters with a set of libraries for your proof assistant — libraries which could eventually allow you to program at as high a level as you wish, but grounded in machine code and machine-checkable proofs. Will it turn out to be practical? I don’t know.

(End quote from lobste.rs comment.)

Also, WHAT IN THE LIVING FUCK IS WRONG WITH YOU PEOPLE THAT THE ONLY THING YOU HAVE TO SAY ABOUT THIS PAPER IS ABOUT WHETHER YOU LIKE COCK. HAVE YOU EVEN READ THE FUCKING PAPER. JESUS FUCKING CHRIST WITH A SHOTGUN IN A SOMBRERO, PEOPLE. FUCK YOU, YOU DROOLING MONSTERS. MAKE YOUR COCK JOKES BUT SAY SOMETHING SUBSTANTIVE.


In some ways, I feel like all we tend to do with our higher-level abstractions is make macroassemblers out of them again. Tiny amount of algorithmic content, then massive amount of business logic. As models for business logic grow more complex, they change from code, to data, and then into code again(this time as a separate language). Provisions from the old environment are lost in the new environment, requiring a second set of abstractions. Repeat...


I would like to point out that things like SQL injection are pretty easy to avoid by just typing(in the type theory sense) things out in a DSL properly, and not having straight string injection. You don't need crazy dependent types to employ the "make sure strings are escaped" strategies.

Infinite loops, on the other hand...


I'm pretty sure you can build a total language and outlaw infinite loops even without dependent types. It also turns out that you can basically do whatever you want in a total language with a good runtime so long as it does coinduction well.

I just think the only real motivation people have taken up on for total languages is their proof theoretic properties which drives you quickly to dependent types.


Sure, I'm pretty sure Turner's original Total Functional Programming proposal https://uf-ias-2012.wikispaces.com/file/view/turner.pdf doesdn't use dependent types, but I'm not sure it has what you mean by "a good runtime [that] does coinduction well".


All I mean by that is that you probably want to be able to evaluate a partiality monad in the runtime if you want to write an interpreter or webserver perhaps.


it occurred to me that (some) Forth systems are awfully close to being macro assemblers, and that really the only reason you’d use a higher-level language rather than a macro assembler is that error handling in macro assemblers, and in Forth, is terrible verging on nonexistent.

Another reason to use a high-level language is that macro assembly code and Forth programs aren't very portable across machines. I guess you could model a virtual machine in the same way the authors model x86, target the virtual machine instead, then have a verified translation from VM operations to x86 instructions, ARM instructions, etc. I wonder if you could get similar performance to a native compiler with this approach.


Forth programs are fairly portable across machines — I would guess at least as portable as C — although often not across Forth implementations. That's why Open Firmware drivers are written in Forth.

Macro assembly code can be portable across machines; that's how e.g. SNOBOL4 was ported to 50 different machines starting in the late 1960s, including even the IBM PC; see http://www.snobol4.org/history.html and https://news.ycombinator.com/item?id=3108946 (a thread here that I posted in, before HN was apparently taken over by drooling cock-obsessed chimpanzees who can't take the time to read EVEN THE ABSTRACT of linked papers, by which I do not mean you, panic) for more details.

In effect, the coqasm approach moves portability, just like safety, from the language into a library. Normal macro assemblers (and, for that matter, C) make it difficult to tell if your application code has avoided processor dependencies. It seems like Coq might actually make it possible to prove that you've done that.


I would really love to see the x86 multi-core architecture spec'd like this. To this day I don't really understand all the details behind x86 concurrency: memory, fences, atomics, concurrent operations ordering, mutex implementations. Such knowledge is crucial to writing fast lock-free structures, and to proving their properties either the whiteboard way or through model checking. Perhaps I should just get through one of these enormous Intel low-level manuals.


Here are some attempts at axiomatic validation of memory barriers and atomics on several architectures: http://lwn.net/Articles/608550/


The x86tso file mentioned there is from this project: http://www.cl.cam.ac.uk/~pes20/weakmemory/

They have some interesting material there on building models of processors' memory-concurrency behavior. Unfortunately, reading the vendors' own manuals is not a good way to find out about that, even if you excuse the verbosity. Memory-consistency behavior is often very vaguely or indirectly specified in the architecture manuals, and worse, behavior of the actual parts does not always reliably agree with the manuals. A good paper: http://www.cl.cam.ac.uk/~pes20/weakmemory/cacm.pdf


Thanks for the lwn and cambridge links, they both look like a good read. Considering the big picture I feel like we need another Tony Hoare to show us good ways to reason about shared memory concurrent programs. CSP is cool, but at some point Erlang/Go message passing style is not enough, and one must dive deep into the crazy world of parallel processor architectures. It is my opinion that before somebody sorts this out well we should not expect processor manufacturers to play along - after all hardly anybody understands shared memory concurrency well.



What kind of work do you do?


I'm still a student. But I do know this kind of knowledge is used in certain CS areas, such as computer graphics or algorithmic trading.


This looks really cool. How does this work compare to http://compcert.inria.fr/ (a verified C compiler)?

For most applications C would be low level enough, and possibly a better option.


This would be really out-of-this-world amazing if it had QuickAssembler's UI


Maybe they should rename it?


Honi soit qui mal y pense ;) A coq is a rooster in French, as can be seen from the coq logo. One of the author is called Coquand, the rooster is the emblem of France (where it comes from) and the coq language is called gallina (rooster in latin). So it's a multi levels pun, and likely here to stay. There's a grand tradition of goofy names in open source, but in this case I feel it's safe to say it won't be the main barrier to mass adoption.


It's also, IIRC, based on a logical model called the Calculus of Constructions - or CoC, for short.


> It's also, IIRC, based on a logical model called the Calculus of Constructions

"Calculus of Inductive Constructions" (https://coq.inria.fr/about-coq)


From your link:

> Coq is the result of more than 20 years of research. It started in 1984 from an implementation of the Calculus of Constructions at INRIA-Rocquencourt by Thierry Coquand and Gérard Huet. In 1991, Christine Paulin extended it to the Calculus of Inductive Constructions.


The CoC provides the basic framework for Coq, while the Co(I)C or even the Calculus of (Co)-Inductive Constructions provides a dramatic extension in power and expressivity necessary for "real" programming in Coq.


Is it really an issue though? Don't restaurants have coq au vin on the menu where you're from?


Sometimes a cringe-inducing name is great for visibility and marketing. See "Wii", even "iPad"...


This issue is addressed in the FAQ: https://coq.inria.fr/faq?q=node/16&som=2#htoc4


Actually this is funny, because "bit" means in French the same thing as "Coq" in English. Probably to be fair we should debate about renaming "bit" also :) (or not).


Agreed. Is this pronounced "coke"? Regardless, this is a very interesting paper.


It's cock. Took a course with Andrew Appel; the heroic man did not flinch once when he said it over and over again during the semester. My favorite line of the course: "You're going to do your homework in COCK."


Was the naming intentional? Did they know the common English meaning of it?


They give an explanation here.

https://coq.inria.fr/faq?q=node/16&som=2#htoc4


Reasonable people might disagree, but I think that what they say there doesn't answer the question at all. Cock means rooster in English too, after all, but somehow the name doesn't get used.


It's actually pronounced more like "cock". Coq is the French word for rooster.


In a french accent it's more like "Cuck", that's how I say it in my head. Luckily, I don't have to say it out loud.


I believe it's /ɔ/ in French but usually /ɑ/ in English.


Of course, cock is an English word for rooster.

Does coq have the same connotation in French, though?


Not at all, but I was told that the pun was totally intentional [citation needed].


Is the filename of "coqasm.pdf" also trying to get at something?


Hrm.... Really makes me uncomfortable to say "I love Coq" conversationally.


Why are people getting downvoted for saying coq is not exactly a great name. It's not, no matter it's definition it sounds like cock, not a great name to advertise the language.

I mean if someone created a programming language called vagina, labia, penis or sphincter it would be ridiculed, cause it's not an appropriate name for a language.


"Cock" in English means exactly the same thing as "coq" in French. So, yes, the language is called "cock". Do you have anything against this beautiful bird?


I personally don't care about the bird, I am just into coq.


I prefer this beautiful bird:

http://en.wikipedia.org/wiki/Great_tit


Because they are concern trolling. However, if they were really offended by something so minor, then they deserve ridicule.




Applications are open for YC Winter 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: