
Coq: The world’s best macro assembler? (2013) [pdf] - dezgeg
http://research.microsoft.com/en-us/um/people/nick/coqasm.pdf
======
thinkpad20
Coq is an amazing language. Learning to use it, and understanding the theories
that underpin it, give one a far richer sense of understanding of computer
science. In that sense it's akin to Haskell, but with an even more (much
more!) powerful type system. I'm still a beginner but even with my limited
expertise, it's a real breath of fresh air. I would love to see a world in
which logic, type theory and functional programming formed the foundation of
computer science education, rather than the comparatively arcane and clunky
paradigms of object-oriented and imperative programming. I'm convinced that
the quality of software would improve drastically.

(And yes, its name is truly unfortunate. It's hard to have a conversation
about the language, especially in person, without giggling over the name.
Pronouncing it "coke" helps, but it's still the elephant in the room.)

(Also, I'm not sure if the fact that this pdf is entitled "coqasm" is a pun or
an amusing coincidence.)

~~~
vezzy-fnord
_I would love to see a world in which logic, type theory and functional
programming formed the foundation of computer science education, rather than
the comparatively arcane and clunky paradigms of object-oriented and
imperative programming._

It truly is a shame what a horrid reputation object-oriented programming has
received, even amongst a lot of people who should supposedly know better.

OO and functional programming are not opposites in any way, nor should they
be.

I don't know through what circumstances exactly we ultimately ended up with the
quasi-procedural Nygaard-style OO of today, which is indeed clunky and arcane,
as opposed to the style of OO rooted in message passing espoused by Alan Kay
and the Smalltalk family.

Huge advances in JIT compilation and language VMs can be traced to the work of
the designers of Self (itself descended from Smalltalk). The sheer elegance and
minimalism of Smalltalk syntax is likely unmatched, with the plausible exception
of Forth. The power and dynamic introspection capabilities of the image-based
Smalltalk environments are to this day the subject of many poor
reimplementations of various subsets. One could even make the case that
languages like Erlang are object-oriented to a degree.

Yet it seems OO research has stagnated and instead FP has become the master
paradigm everyone wants to focus on. There's a lot of talk about "programming
language as operating system" as well in these circles, but no one has come
closer to replicating it than the Smalltalk environments yet.

~~~
cynicalkane
OO--in Haskell, at least--is most succinctly represented as existential and
higher-order types with constraints, which break some important assumptions
used for proving things in FP. If you are message passing s.t. the call site
is opaque to the caller, this breaks more assumptions. The assumptions in
question happen to be important assumptions for the currently cool and trendy
research in FP, which is important when you're an academic with a career.

Furthermore, the types required are a bit more general than OO, so once you've
introduced the former it doesn't make sense to constrain your landscape of
thought to the latter.

~~~
tel
I'm not sure it really breaks assumptions: it just requires coinduction and
bisimulation instead of induction and equality. Coinduction and Bisimulation
aren't as well understood today and are harder to use, so it's a bit of a
rough project to move forward with.

What assumptions are you referring to?

------
kxyvr
In case someone else is interested in this kind of work, there's an
interesting line of research in certified programs that's similar to this.
Basically, you specify the semantics of some language in Coq. Then, write a
program in the desired language. This allows us to analyze the program in Coq
and prove certain properties about it. Essentially, it's a process to develop
certified programs by using the process:

language -> Coq -> prove properties

As another example of this, Sandrine Blazy and Xavier Leroy have a paper that
formalizes a subset of C in Coq:

[http://pauillac.inria.fr/~xleroy/publi/Clight.pdf](http://pauillac.inria.fr/~xleroy/publi/Clight.pdf)

There's also some discussion on StackOverflow about the definition of what a
certified program is:

[https://stackoverflow.com/questions/21320138/definition-of-a...](https://stackoverflow.com/questions/21320138/definition-of-a-certified-program)

In any case, formalizing a subset of the x86 architecture in Coq means that we
can prove properties about a program written in assembler using Coq, which is
both interesting and impressive.
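
To make the "language -> Coq -> prove properties" pipeline concrete, here is a minimal sketch of the same shape. It is written in Lean 4 rather than Coq, and it uses a toy expression language rather than x86 or Clight; the names `Expr`, `eval`, `swap` and `eval_swap` are made up for illustration. The semantics is the `eval` function, the "program transformation" is `swap`, and the property proved is that the transformation preserves meaning.

```lean
-- A toy language: natural-number literals and addition.
inductive Expr where
  | num : Nat → Expr
  | add : Expr → Expr → Expr

-- The semantics of the language, given as an evaluator.
def eval : Expr → Nat
  | .num n => n
  | .add a b => eval a + eval b

-- A transformation on programs: swap the operands of every addition.
def swap : Expr → Expr
  | .num n => .num n
  | .add a b => .add (swap b) (swap a)

-- The property: swapping operands never changes the result.
theorem eval_swap (e : Expr) : eval (swap e) = eval e := by
  induction e with
  | num n => simp [swap]
  | add a b iha ihb =>
    simp only [swap, eval, iha, ihb]
    exact Nat.add_comm _ _
```

The real developments replace `Expr` with an instruction set or a C subset and `eval` with a machine model, but the division of labour is the same.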

~~~
takemikazuchi
Compcert C, also written in Coq, is another interesting C compiler:
[http://compcert.inria.fr/](http://compcert.inria.fr/)

------
eli_gottlieb
Oh, _wow_, this is absolutely _amazing_. My compliments to the authors!
They're really pushing the bounds of what we can do with Coq further and
further.

Hell, it makes me want to open the IDE and prove something right now.

~~~
samwilliams
> Hell, it makes me want to open the IDE and prove something right now.

It makes me want to do a little bootloader development too... in Coq!

It might be fun to make a tiny 16-bit real mode OS with this also. You could
head towards your own miniature version of seL4 or something.

Edit:

Oops! I forgot that it couldn't produce 16-bit machine code. Still, build a
tiny bootloader that grabs the next sectors from the boot medium, then jump
into protected mode (pretty easy really) and you are good to go!

------
kragen
Quoting my comment on this paper from 8 months ago on
[https://lobste.rs/s/pc1v4g/coq_the_world_s_best_macro_assemb...](https://lobste.rs/s/pc1v4g/coq_the_world_s_best_macro_assembler_microsoft_research),
where it is full of links to the things I'm referring to:

This is a really interesting and thought-provoking idea. In writing httpdito,
and in particular in putting a sort of tiny peephole optimizer into it as gas
macros, it occurred to me that (some) Forth systems are awfully close to being
macro assemblers, and that really the only reason you’d use a higher-level
language rather than a macro assembler is that error handling in macro
assemblers, and in Forth, is terrible verging on nonexistent. Dynamically-
typed languages give you high assurance that your program won’t crash (writing
the Ur-Scheme compiler very rarely required me to debug generated machine
code), while strongly-statically-typed languages give you similarly weak
assurances with compile-time checks; but it seems clear that the programmer
needs to be able to verify that their program is also free of bugs like memory
leaks, infinite loops, SQL injection, XSS, and CSRF, or for that matter
“billion-laughs”-like denial-of-service vulnerabilities, let alone more
prosaic problems like the Flexcoin bankruptcy due to using a non-transactional
data store.

Against this background, we find that a huge fraction of our day-to-day
software (X11, Firefox, Emacs, Apache, the kernel, and notoriously OpenSSL) is
written in languages like C and C++ that lack even these rudimentary memory-
safety properties, let alone safety against the trickier bogeymen mentioned
above. The benefit of C++ is that it allows us to factor considerations like
XSS and loops into finite, checkable modules; the benefit of C is that we can
tell pretty much what the machine is going to do. But, as compiler
optimizations exploit increasingly recondite properties of the programming
language definition, we find ourselves having to program as if the compiler
were our ex-wife’s or ex-husband’s divorce lawyer, lest it introduce security
bugs into our kernels, as happened with FreeBSD a couple of years back with a
function erroneously annotated as noreturn, and as is happening now with
bounds checks depending on signed overflow behavior.

We could characterize the first of these two approaches as “performing on the
trapeze with a net”: successful execution is pleasant, if unexpected, but we
expect that even a failure will not crash our program with a segfault —
bankrupt Flexcoin, perhaps, and permit CSRF and XSS, but at least we don’t
have to debug a core dump! The second approach involves performing without a
net, and so the tricks attempted are necessarily less ambitious; we rely on
compiler warnings and errors, careful program inspection, and testing to
uncover any defects, with results such as Heartbleed, the Toyota accelerator
failure, the THERAC-25 fatalities, the destruction of Ariane 5, and the Mars
Pathfinder priority-inversion lockups.

So onto this dismal scene sweep Kennedy, Benton, Jensen, and Dagand, with a
novel new approach: rather than merely factoring our loops and HTML-escaping
into algorithm templates and type conversion operators, thus giving us the
opportunity to get them right once and for all (but no way to tell if we have
done so), we can instead program in a language rich enough to express our
correctness constraints, our compiler optimizations (§4.1), and the proofs of
correctness of those optimizations. Instead of merely packaging up algorithms
(which we believe to be correct after careful inspection) into libraries, we
can package up correctness criteria and proof tactics into libraries.

This is a really interesting and novel alternative to the with-a-net and the
without-a-net approaches described earlier; it goes far beyond previous
efforts to prove programs correct; rather than attempting to prove your
program correct before you compile it, or looking for bugs in the object code
emitted, it attempts, Forth-like, to replace the ecosystem of compilers and
interpreters with a set of libraries for your proof assistant — libraries
which could eventually allow you to program at as high a level as you wish,
but grounded in machine code and machine-checkable proofs. Will it turn out to
be practical? I don’t know.

(End quote from lobste.rs comment.)

Also, WHAT IN THE LIVING FUCK IS WRONG WITH YOU PEOPLE THAT THE ONLY THING YOU
HAVE TO SAY ABOUT THIS PAPER IS ABOUT WHETHER YOU LIKE COCK. HAVE YOU EVEN
READ THE FUCKING PAPER. JESUS FUCKING CHRIST WITH A SHOTGUN IN A SOMBRERO,
PEOPLE. FUCK YOU, YOU DROOLING MONSTERS. MAKE YOUR COCK JOKES BUT SAY
SOMETHING SUBSTANTIVE.

~~~
rtpg
I would like to point out that things like SQL injection are pretty easy to
avoid by just typing (in the type theory sense) things out in a DSL properly,
and not having straight string injection. You don't need crazy dependent types
to employ the "make sure strings are escaped" strategies.
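
A rough sketch of what such a DSL could look like, in Python with the standard `sqlite3` module. The `Sql` type and the `lit`, `param`, `run` and `find_user` helpers are hypothetical names for illustration, not any library's API. The idea is that query text and untrusted values live in different types, so a raw string can never be spliced into a query; Python only enforces this at runtime (or via a type checker), whereas a statically typed language would reject it at compile time.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Sql:
    """A query fragment: trusted template text plus separately carried values."""
    text: str
    params: tuple = ()

    def __add__(self, other: "Sql") -> "Sql":
        # Refuse to concatenate anything that is not already an Sql fragment.
        if not isinstance(other, Sql):
            raise TypeError("wrap untrusted values with param()")
        return Sql(self.text + other.text, self.params + other.params)


def lit(text: str) -> Sql:
    """Trusted SQL text written by the programmer."""
    return Sql(text)


def param(value: object) -> Sql:
    """An untrusted value: becomes a placeholder, never query text."""
    return Sql("?", (value,))


def run(conn: sqlite3.Connection, query: Sql) -> list:
    """Execute the fragment using the driver's parameter binding."""
    return conn.execute(query.text, query.params).fetchall()


def find_user(conn: sqlite3.Connection, name: str) -> list:
    return run(conn, lit("SELECT name FROM users WHERE name = ") + param(name))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))
print(find_user(conn, "x' OR '1'='1"))
```

The second lookup returns no rows, because the injection attempt is bound as a plain value, and `lit("...") + "some string"` raises `TypeError` instead of building a query.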

Infinite loops, on the other hand...

~~~
tel
I'm pretty sure you can build a total language and outlaw infinite loops even
without dependent types. It also turns out that you can basically do whatever
you want in a total language with a good runtime so long as it does
coinduction well.

I just think the only real motivation people have picked up on for total
languages is their proof-theoretic properties, which drive you quickly to
dependent types.

~~~
kragen
Sure, I'm pretty sure Turner's original Total Functional Programming proposal
[https://uf-ias-2012.wikispaces.com/file/view/turner.pdf](https://uf-ias-2012.wikispaces.com/file/view/turner.pdf) doesn't use dependent types,
but I'm not sure it has what you mean by "a good runtime [that] does
coinduction well".

~~~
tel
All I mean by that is that you probably want to be able to evaluate a
partiality monad in the runtime if you want to write an interpreter or
webserver perhaps.
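
A minimal sketch of what "evaluating a partiality monad in the runtime" might look like, in Python. Python is not a total language, so this only shows the shape of the idea; `Now`, `Later`, `bind`, `run` and `collatz_steps` are names chosen for this example. A possibly non-terminating computation is a value that is either finished (`Now`) or one step away from another such value (`Later`). Building it always terminates; only the runtime driver `run` actually loops, and it takes an explicit step budget.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class Now:
    """A computation that has already produced its value."""
    value: object


@dataclass(frozen=True)
class Later:
    """A computation that needs at least one more step."""
    thunk: Callable[[], "Delay"]


Delay = Union[Now, Later]


def bind(d: Delay, f: Callable[[object], Delay]) -> Delay:
    """Monadic bind: run d, then feed its result to f, one step at a time."""
    if isinstance(d, Now):
        return f(d.value)
    return Later(lambda: bind(d.thunk(), f))


def run(d: Delay, fuel: int) -> Optional[Now]:
    """The runtime driver: force at most `fuel` steps, or give up with None."""
    while isinstance(d, Later):
        if fuel == 0:
            return None
        d = d.thunk()
        fuel -= 1
    return d


def collatz_steps(n: int, steps: int = 0) -> Delay:
    """Count Collatz steps to reach 1; termination for all n is not known."""
    if n == 1:
        return Now(steps)
    nxt = n // 2 if n % 2 == 0 else 3 * n + 1
    return Later(lambda: collatz_steps(nxt, steps + 1))


print(run(collatz_steps(6), fuel=100))
print(run(collatz_steps(6), fuel=7))
```

A webserver or interpreter would be the same thing with `run` called without a budget: the productive, step-at-a-time structure is what the language checks, and the runtime is what keeps forcing it.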

------
xxxyy
I would really love to see the x86 multi-core architecture spec'd like this.
To this day I don't really understand all the details behind x86 concurrency:
memory, fences, atomics, concurrent operations ordering, mutex
implementations. Such knowledge is crucial to writing fast lock-free
structures, and to proving their properties either the whiteboard way or
through model checking. Perhaps I should just get through one of these
enormous Intel low-level manuals.

~~~
dezgeg
Here are some attempts at axiomatic validation of memory barriers and atomics
on several architectures:
[http://lwn.net/Articles/608550/](http://lwn.net/Articles/608550/)

~~~
mjn
The x86tso file mentioned there is from this project:
[http://www.cl.cam.ac.uk/~pes20/weakmemory/](http://www.cl.cam.ac.uk/~pes20/weakmemory/)

They have some interesting material there on building models of processors'
memory-concurrency behavior. Unfortunately, reading the vendors' own manuals
is not a good way to find out about that, even if you excuse the verbosity.
Memory-consistency behavior is often very vaguely or indirectly specified in
the architecture manuals, and worse, behavior of the actual parts does not
always reliably agree with the manuals. A good paper:
[http://www.cl.cam.ac.uk/~pes20/weakmemory/cacm.pdf](http://www.cl.cam.ac.uk/~pes20/weakmemory/cacm.pdf)
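
For a feel of what such a model looks like, here is a toy Python sketch of the central idea in x86-TSO: each hardware thread has a FIFO store buffer, a load checks the thread's own buffer before main memory, and buffered stores drain to memory at arbitrary times. This is not the project's actual model (it has no fences or locked instructions, and `SB_TEST`, `write` and `outcomes` are names made up here); it just enumerates every interleaving of the classic store-buffering litmus test.

```python
# Store-buffering (SB) litmus test: each thread stores to one location,
# then loads the other. All locations start at 0.
SB_TEST = (
    (("store", "x", 1), ("load", "y", "r0")),
    (("store", "y", 1), ("load", "x", "r1")),
)


def write(mem, addr, val):
    """Return a new immutable memory with addr set to val."""
    return tuple(sorted({**dict(mem), addr: val}.items()))


def outcomes(program, store_buffers):
    """Exhaustively explore interleavings; return the set of final (r0, r1)."""
    n = len(program)
    start = ((0,) * n, ((),) * n, (("x", 0), ("y", 0)), ())
    seen, stack, results = {start}, [start], set()
    while stack:
        pcs, bufs, mem, regs = stack.pop()
        if all(pcs[t] == len(program[t]) for t in range(n)):
            # Registers are final once every thread has finished.
            results.add(tuple(val for _, val in regs))
            continue
        successors = []
        for t in range(n):
            if bufs[t]:
                # The memory system drains the oldest buffered store (FIFO).
                addr, val = bufs[t][0]
                drained = bufs[:t] + (bufs[t][1:],) + bufs[t + 1:]
                successors.append((pcs, drained, write(mem, addr, val), regs))
            if pcs[t] == len(program[t]):
                continue
            op, addr, arg = program[t][pcs[t]]
            stepped = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if op == "store" and store_buffers:
                queued = bufs[:t] + (bufs[t] + ((addr, arg),),) + bufs[t + 1:]
                successors.append((stepped, queued, mem, regs))
            elif op == "store":
                successors.append((stepped, bufs, write(mem, addr, arg), regs))
            else:
                # A load sees the thread's own newest buffered store first.
                pending = [v for a, v in bufs[t] if a == addr]
                val = pending[-1] if pending else dict(mem)[addr]
                new_regs = tuple(sorted(regs + ((arg, val),)))
                successors.append((stepped, bufs, mem, new_regs))
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return results


print(sorted(outcomes(SB_TEST, store_buffers=False)))
print(sorted(outcomes(SB_TEST, store_buffers=True)))
```

Without store buffers (sequential consistency) the outcome `r0 = r1 = 0` is impossible; with them it shows up, which is the behavior that surprises people writing lock-free code on x86.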

~~~
xxxyy
Thanks for the LWN and Cambridge links, they both look like a good read.
Considering the big picture I feel like we need another Tony Hoare to show us
good ways to reason about shared memory concurrent programs. CSP is cool, but
at some point Erlang/Go message passing style is not enough, and one must dive
deep into the crazy world of parallel processor architectures. It is my
opinion that before somebody sorts this out well we should not expect
processor manufacturers to play along - after all _hardly anybody_ understands
shared memory concurrency well.

------
emmanueloga_
This looks really cool. How does this work compare to
[http://compcert.inria.fr/](http://compcert.inria.fr/) (a verified C
compiler)?

For most applications C would be low level enough, and possibly a better
option.

------
efutch
This would be really out-of-this-world amazing if it had QuickAssembler's UI

------
jpmonette
Maybe they should rename it?

~~~
yaantc
Honi soit qui mal y pense ;) A coq is a rooster in French, as can be seen from
the Coq logo. One of the authors is called Coquand, the rooster is the emblem
of France (where it comes from), and the Coq language is called Gallina
(hen in Latin). So it's a multi-level pun, and likely here to stay.
There's a grand tradition of goofy names in open source, but in this case I
feel it's safe to say it won't be the main barrier to mass adoption.

~~~
wk_end
It's also, IIRC, based on a logical model called the Calculus of Constructions
- or CoC, for short.

~~~
bch
> It's also, IIRC, based on a logical model called the Calculus of
> Constructions

"Calculus of Inductive Constructions" ([https://coq.inria.fr/about-coq](https://coq.inria.fr/about-coq))

~~~
Cederfjard
From your link:

> Coq is the result of more than 20 years of research. It started in 1984 from
> an implementation of the Calculus of Constructions at INRIA-Rocquencourt by
> Thierry Coquand and Gérard Huet. In 1991, Christine Paulin extended it to
> the Calculus of Inductive Constructions.

------
specular
Agreed. Is this pronounced "coke"? Regardless, this is a very interesting
paper.

~~~
VeejayRampay
It's actually pronounced more like "cock". Coq is the French word for rooster.

~~~
tunesmith
In a French accent it's more like "Cuck", that's how I say it in my head.
Luckily, I don't have to say it out loud.

~~~
psychometry
I believe it's /ɔ/ in French but usually /ɑ/ in English.

------
Lrigikithumer
Why are people getting downvoted for saying coq is not exactly a great name?
It's not. No matter its definition, it sounds like cock, which is not a great
name to advertise the language.

I mean if someone created a programming language called vagina, labia, penis
or sphincter it would be ridiculed, cause it's not an appropriate name for a
language.

~~~
sklogic
"Cock" in English means exactly the same thing as "coq" in French. So, yes,
the language is called "cock". Do you have anything against this beautiful
bird?

~~~
Pyret
I personally don't care about the bird, I am just into coq.

