
CakeML – A Verified Implementation of ML - setori88
https://cakeml.org/
======
vog
I find it interesting that CakeML, like many other developments in this area,
is based on SML (Standard ML) and not OCaml (Objective Caml). Moreover,
whenever I read something about ML languages, it seems most people in the
academic field talk about SML.

Yet it seems that OCaml is more popular among programmers and real-world
projects, even though these programmers mostly come from the academic field,
given the niche language that OCaml still is. For example, the astonishing
MirageOS project chose OCaml instead of SML.

So my question is:

How is that? Why is OCaml so much more popular, despite having just one
implementation and no real spec? Why is SML, with its real spec and multiple
implementations, not at least equally popular?

EDIT: Here are two possible answers that I don't think apply:

1\. OCaml may be "good enough", which, combined with network effects, makes
choosing OCaml over SML a self-fulfilling prophecy. I don't think it is that
simple, because OCaml users and projects come mostly from the academic field.
They are deeply concerned with the correctness of code, which would mean they
should all have favored SML over OCaml. In fact, sometimes correctness seems
to be the sole motivation. For example, the author(s) of OCaml-TLS didn't just
want to create yet another TLS library in a hip language. They were concerned
with the state of OpenSSL and similar libraries, and wanted to create a
100% correct, bullet-proof alternative.

2\. Although one could attribute this to the "O" in Objective Caml, I don't
think it is that simple either, because the object-oriented extensions seem
almost unused, and where I have seen them used (e.g. LablGTK, an OCaml
wrapper for the GTK UI library) I don't see much value; sticking to plain
OCaml modules and functors would have led to a better interface.
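
The objects-vs-modules contrast can be sketched in a few lines of OCaml
(illustrative names only, not LablGTK's actual API):

```ocaml
(* Object style, roughly the flavor of what object-based wrappers expose: *)
class widget (name : string) = object
  method name = name
  method render = print_endline ("<" ^ name ^ ">")
end

(* Module style: the interface is a signature, and implementations
   stay ordinary values and functions behind an abstract type. *)
module type WIDGET = sig
  type t
  val make : string -> t
  val render : t -> unit
end

module Button : WIDGET = struct
  type t = string
  let make name = name
  let render name = print_endline ("[" ^ name ^ "]")
end
```

Both give abstraction; the module version does it without any of the object
machinery, which is what I mean by modules and functors sufficing.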

~~~
DanielBMarkham
It may be similar to the reason for F#'s adoption -- it does objects.

Doesn't matter if you actually code with objects. The important point for
adoption is that it does them.

So for a noob .NET programmer you say, hey, look at F#! You can code it just
like C#. Kinda.

But as soon as they start coding, you tell them nope, all of these types of
things are actually antipatterns.

~~~
jetti
I disagree with your premise on why F# was adopted. It isn't because it does
objects; it's that it was the first functional language pushed by Microsoft
that ran on the .NET platform. Had F# not run on the CLR, I don't think it
would have nearly as many users as it does now.

~~~
WorldMaker
That goes into why Microsoft (Research) chose to base F# on OCaml rather than
SML (or Haskell, a pie Microsoft Research also has some fingers in):
it's not that useful targeting the CLR if you can't interact with other things
running on the CLR (including the Base Class Library [BCL]). Presumably
Microsoft could have tried for something wild like building a "CLR
Object monad" for Haskell, but the F# team instead went with the existing,
known-commodity language family that already straddled the border between
object systems and functional programming (OCaml).

------
mafribe
Summary: CakeML is the first verified optimising compiler that bootstraps.

Side note: Cake stands for CAmbridge KEnt, which is where (most of) CakeML's
verification was carried out.

The pioneering project in this space was X. Leroy's CompCert, the
first verified optimising compiler. More precisely, a realistic, moderately
optimising compiler for a large subset of the C language down to PowerPC and
ARM assembly code.

~~~
aisofteng
Could you expand on the significance of this first, for those of us not
familiar with formal verification?

Is this a first because it is theoretically difficult to do, or because
it requires a lot of implementation time? What are some key points to read up
on and understand in order to properly appreciate this result, beyond the
Wikipedia article on formal verification [1]?

Thank you in advance for any elaboration.

[1]
[https://en.wikipedia.org/wiki/Formal_verification](https://en.wikipedia.org/wiki/Formal_verification)

~~~
mafribe
The problem with verifying realistic compilers is scale. We have known how to
do it in principle since forever, and verification of toy compilers is part of
textbooks on verification, such as [1], see also [2]. Realistic compilers are
very complicated and Leroy's verification of CompCert took several man years
for one of the world's leading compiler and verification guys. The purpose of
research like CompCert and CakeML is twofold:

\- Provide a verified software toolchain for programmers with a minimal
trusted computing base.

\- Investigate how the cost (in a general sense) of formal verification in
general and compiler verification in particular can be lowered, ideally to the
point that normal programmers can routinely use formal verification.

The advance that CakeML makes over CompCert is bootstrapping: CakeML can
compile itself, while CompCert (being a C compiler written in OCaml) can't.
Simplifying a bit, bootstrapping lowers the trusted computing base.
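
To make "verified" concrete: schematically (simplifying a lot; this is not
either project's exact statement), the correctness theorem for a compiler
like CompCert or CakeML says

```latex
\forall p.\ \mathrm{safe}(p) \;\Rightarrow\;
  \mathrm{behaviours}\big(\mathrm{compile}(p)\big) \subseteq \mathrm{behaviours}(p)
```

i.e. every observable behaviour of the generated machine code is one the
source semantics permits. Bootstrapping lets you instantiate this theorem
with the compiler's own source, so the compiler binary you actually run is
itself covered by the theorem.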

Maybe Leroy's [3, 4] are good starting points for learning about this field.

[1] T. Nipkow, G. Klein, Concrete Semantics. [http://www.concrete-
semantics.org/](http://www.concrete-semantics.org/)

[2] A. Chlipala, A verified compiler for an impure functional language.
[http://adam.chlipala.net/papers/ImpurePOPL10](http://adam.chlipala.net/papers/ImpurePOPL10)

[3] X. Leroy, Verifying a compiler: Why? How? How far?
[http://www.cgo.org/cgo2011/Xavier_Leroy.pdf](http://www.cgo.org/cgo2011/Xavier_Leroy.pdf)

[4] X. Leroy, Formal verification of a realistic compiler.
[http://gallium.inria.fr/~xleroy/publi/compcert-
CACM.pdf](http://gallium.inria.fr/~xleroy/publi/compcert-CACM.pdf)

~~~
nickpsecurity
No, it goes further than that. They embedded assembly languages and many
aspects of computation in HOL, then built their compiler. The thing goes
straight from logic to assembly, with the theorem prover being the TCB outside
the specs themselves. CompCert, by contrast, was specified in Coq, would
probably be extracted to an ML, and then that whole pile of code (hopefully
verified to assembly) would do the job. Unless they're doing all the compiles
in Coq itself with its checker. This is the part I could be really wrong on.

The TCB reduction is huge. Also, the seL4 organization built the Simpl
embedding of C in HOL to do "translation validation" (due to Jared Davis) of
it straight to, or matched against, assembly. That skips the need for a
CompCert-style verified compiler altogether. Myreen et al's techniques were
also used to verify theorem provers and now hardware.

So, the CakeML effort and its effects are _huge_. Maybe more so than CompCert,
given the flexibility and the fact that CompCert is a proprietary product now
whereas Myreen et al's stuff is open. That's what I said back when I saw it.
The prediction was confirmed as COGENT was built on the same technology, with
amazing results so far in the cost of verification:

[https://ts.data61.csiro.au/projects/TS/cogent.pml](https://ts.data61.csiro.au/projects/TS/cogent.pml)

------
nickpsecurity
Everyone with a formal methods background interested in this work should
consider taking on one of their posted projects that would improve it,
especially the OCaml to CakeML translator.

[https://cakeml.org/projects](https://cakeml.org/projects)

Just email them first in case someone has done the work already. Academics are
sometimes slow to update web sites because they're digging deep into their
research. ;) The best uses I can think of for CakeML are:

A reference implementation to do equivalence checks against, with the main
language (an ML or not) being something optimized.

Something to build other tools in that need high assurance of correctness.
Prototype the tool to get the algorithm right using any amount of brains and
tooling that already exists, with an equivalent CakeML program coming out.
Then, that turns into vetted object code.

A nice language for writing low-level interpreters, assemblers, or compilers
that bootstrap others in a high-confidence way. The idea being that in
verifiable or reproducible builds you want a starting point that can be
verified by eye. People can look at the CakeML and assembly output with some
extra assurance on top of hand-checking it. One might even use the incremental
compilation paper to build up a Scheme, ending up with a powerful starting
language plus assurance that the binary matches the code.

------
gravypod
Go into the compiler explorer [0] and type the following:

    val num = 10

Then take a look at the x86 generation.

What is all of that? It doesn't look like executable code that should be
needed. Is that just implicit functions, or something baked into the language?
If it is, why isn't it being tree-shaken?

~~~
clarus
It seems that there are some examples here:
[https://github.com/CakeML/cakeml/blob/master/explorer/exampl...](https://github.com/CakeML/cakeml/blob/master/explorer/examples.sml)
Would be cool to have these examples accessible from the web interface of the
compiler explorer.

~~~
gravypod
Sadly, it doesn't seem like you can link to compiler output from their
compiler explorer.

------
Twirrim
What is ML in this context? Neither the CakeML nor the Standard ML site
appears to actually define it, and it's an acronym with a few meanings in tech
(e.g. Machine Learning).

~~~
more_original
It's Meta Language. The ML language was developed in the early 1970s as the
meta-language of the LCF theorem prover, i.e. a language for programming proof
tactics. The strong type system and type-soundness guarantees of ML were
important to ensure that such tactics could only prove correct theorems.

[https://en.wikipedia.org/wiki/ML_(programming_language)](https://en.wikipedia.org/wiki/ML_\(programming_language\))
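
The "tactics can only prove correct theorems" guarantee comes from the LCF
kernel design: theorems are an abstract type whose only constructors are the
inference rules. A minimal sketch in OCaml-style ML (a hypothetical kernel,
not LCF's actual signature):

```ocaml
(* thm is abstract: outside the kernel there is no way to forge a
   theorem, so arbitrary tactic code can only combine these rules
   and hence can only produce derivable theorems. *)
module type KERNEL = sig
  type form                    (* formulas *)
  type thm                     (* theorems, abstract *)
  val assume : form -> thm     (* A |- A *)
  val mp : thm -> thm -> thm   (* modus ponens *)
  val concl : thm -> form      (* inspect a theorem's conclusion *)
end
```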

~~~
z1mm32m4n
For a bit of history about ML, see Appendix F: The Development of ML (on page
89) from here:

[http://sml-family.org/sml97-defn.pdf](http://sml-family.org/sml97-defn.pdf)

------
fithisux
What is the difference between CakeML and Standard ML in terms of syntax and
semantics?

