
Vellvm: Verifying the LLVM (2015) - rfreytag
http://www.cis.upenn.edu/~stevez/vellvm/#
======
piemonkey
I think it's important to understand what's meant when someone says some piece
of software is "verified".

In this case, when they say LLVM is verified, they mean that they designed a
formal correspondence between the (1) semantics of the LLVM virtual machine
language and (2) the actual execution of the underlying CPU (in this case,
modeled using actual assembly language). Essentially, this amounts to writing
two interpreters (one for LLVM assembly and one for low-level assembly),
writing a compiler to generate low-level assembly from LLVM assembly, and
proving that the interpreter for LLVM assembly does the same thing as the
assembly language interpreter running on the generated code.

The tricky part here is "proving that they do the same thing"; this means
that, for any program written in LLVM, there is a corresponding program in
low-level assembly that has equivalent semantics. Reasoning about all possible
programs requires reasoning over an infinite family of objects; there are a
few tools in mathematics for doing this, but by far the most useful for
reasoning about programming languages is induction. Coq, the tool used in the
linked page, is a very popular tool for writing inductive proofs on
heterogeneous tree structures (aka, an abstract syntax tree).

Ultimately, they use Coq to define the semantics of LLVM by writing (1)
execution rules (for example, adding two integers results in a new int in a
certain register) and, (2) in the case of LLVM (but not low-level assembly),
typing rules which provide certain guarantees themselves. Coq is a proving
environment which provides a functional dependently-typed programming language
which can be used to construct and verify constructive proof arguments about
programs. For example, you can prove existence (that there always exists an
assembly language program for every LLVM program), or type soundness (that an
LLVM program that is well-typed never performs an illegal operation according
to its type semantics).

Ultimately, the value of proving this correspondence is that we know that LLVM
has (1) equivalent expressive power to low-level assembly and (2) that,
barring any bugs in the way that a CPU executes its assembly, the generated
LLVM code will perform the same operations as the assembly language that it
generates.

Edit: As some follow-up posts point out, this particular project is more
concerned with verifying intra-LLVM transformations (i.e., LLVM to LLVM). This
is quite different from what I described; my post describes a verified
compiler, similar to CompCERT.

~~~
CalChris
> In this case, when they say LLVM is verified, they mean that they designed a
> formal correspondence between the (1) semantics of the LLVM virtual machine
> language and (2) the actual execution of the underlying CPU (in this case,
> modeled using actual assembly language).

I don't think that's accurate. Vellvm is verifying _transformations_ of IR
rather than _executions_ on any given CPU.

~~~
piemonkey
You are right, perhaps my original post was unclear. The goal is to verify the
transformation between LLVM and low-level assembly. I embellished slightly by
equating the semantics of low-level assembly with CPU executions.

~~~
CalChris
I'm pretty sure they're talking about IR to IR transformations (e.g. high
level IR optimizations) rather than IR to low level assembly transformations
(code generator backends). The former is grad school stuff; the latter is
hard.

------
klinskyc
Zdancewic's an amazing teacher and lecturer who I was lucky enough to have for
an intro CIS course. Cool to see him on here

