

Z3: An LLVM-backed runtime for OCaml - Raphael_Amiard
http://raph-amiard.github.com/Z3/

======
ulber
Bad name: Z3 is an SMT solver from Microsoft and a very good one at that. I'm
assuming this LLVM OCaml thing is newer.

~~~
Raphael_Amiard
I wasn't aware of that. The Z3 name was imposed as part of the university
project this started with.

It was supposed to be part of the VMKit suite of compilers, J3 being the Java
bytecode compiler and N3 being the .NET compiler.

I'd be happy to change the name of this project if it ever gains enough
traction to cause confusion in any way :)

~~~
chwahoo
To back up the original poster: one of OCaml's stronger communities focuses on
program analysis for C programs (using the CIL library
<http://www.cs.berkeley.edu/~necula/cil/>), and SMT solvers like Z3 are also
widely used in that community, so there's definitely some potential for
confusion if your OCaml backend gains some traction.

Still, an LLVM backend for OCaml seems like a great idea. Keep up the good
work.

------
pjmlp
C++ instead of OCaml?!

Especially considering how much better suited OCaml is for compiler
development, and that LLVM already has OCaml bindings.

~~~
Raphael_Amiard
Well, using C++ instead of OCaml gave us a few things:

\- Most notably, reading OCaml's header files, and hence using the correct
macro-declared data types in LLVM code generation. This could have been done
with a hybrid of C and OCaml, but was much simpler in C++.

\- Also, using the OCaml runtime to read the bytecode. This could have been
done in OCaml, but would have been more tedious.

Also, you have to consider that this project isn't processing an AST per se
(the thing at which ML languages excel, thanks to sum types) but a bytecode
array, which is flat in structure. We put back some structure in it, but it is
predictable and not as complex as an AST.
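For readers wondering what "putting back some structure" into a flat bytecode array involves: a standard first step is to split the array into basic blocks by finding *leaders*. The following is a minimal sketch under stated assumptions: the four-kind instruction format (`Kind`, `Instr`) and the function names are hypothetical, not the real ZAM encoding, and this is not necessarily how Z3 does it.

```cpp
#include <cstddef>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// Hypothetical instruction kinds for a toy bytecode (not the real ZAM encoding).
enum class Kind { Plain, Jump, CondJump, Return };

// One instruction; `target` is an index into the code array, used only by jumps.
struct Instr {
  Kind kind;
  std::size_t target;
};

// Leaders: the first instruction, every in-range jump target, and every
// instruction that directly follows a jump or a return.
std::set<std::size_t> find_leaders(const std::vector<Instr>& code) {
  std::set<std::size_t> leaders;
  if (code.empty()) return leaders;
  leaders.insert(0);
  for (std::size_t pc = 0; pc < code.size(); ++pc) {
    const Instr& ins = code[pc];
    bool is_jump = ins.kind == Kind::Jump || ins.kind == Kind::CondJump;
    if (is_jump && ins.target < code.size()) leaders.insert(ins.target);
    if (ins.kind != Kind::Plain && pc + 1 < code.size()) leaders.insert(pc + 1);
  }
  return leaders;
}

// Each basic block is the half-open range [leader, next leader).
std::vector<std::pair<std::size_t, std::size_t>> split_blocks(
    const std::vector<Instr>& code) {
  std::vector<std::pair<std::size_t, std::size_t>> blocks;
  const std::set<std::size_t> leaders = find_leaders(code);
  for (auto it = leaders.begin(); it != leaders.end(); ++it) {
    auto next = std::next(it);
    std::size_t end = (next == leaders.end()) ? code.size() : *next;
    blocks.emplace_back(*it, end);
  }
  return blocks;
}
```

For the array `Plain, CondJump->4, Plain, Jump->5, Plain, Return`, the leaders are 0, 2, 4 and 5, giving four blocks. Each block can then map onto one LLVM basic block, with the jumps becoming edges of the control-flow graph.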

~~~
pjmlp
Don't take it personally.

I am all for using the right tool for the job, and I like C++ (except for a
few things, like the lack of modules :) ).

For an outsider who loves ML languages, OCaml still looks like it could have
been a better solution.

It is like writing bootstrapping compilers: sometimes the effort of such tasks
leads to improvements in the language ecosystem.

Then again, this is about using what you thought was the best tool at hand, so
don't take my negative comment badly, and good luck with the project.

~~~
Raphael_Amiard
I'm not :) I'm just providing some context.

This project has more fundamental issues, in my opinion, if you want to
consider it as a real replacement for ocamlopt, as I explained here
<http://news.ycombinator.com/item?id=4798320>. It is not one; it is just a
replacement for OCaml's interpreter, and as such it is a very simple project
and will probably remain so. I don't plan to evolve it beyond bug fixing and
maintenance.

As such, if I had to re-code it in another language, it would be in C, to make
it easy to integrate into OCaml's runtime. Probably not OCaml, because its
expressive power is not needed.

If I had to start from scratch, I would write an LLVM backend for the ocamlopt
compiler. And I would do it in OCaml, of course :)

------
pascal_cuoq
Just checking that you are aware of OCamlCC. If you aren't, good news! You
will have plenty of notes you can exchange with Benoît.

<http://oud.ocaml.org/2012/abstracts/oud2012-final10.pdf>

------
andrewcooke
how close to ocamlopt do you think you can get? (i.e. do you have any handle on
what the returns would be for further work on this? what are the main limiting
factors?)

~~~
Raphael_Amiard
OK, brace yourself, this is gonna be a long answer :) Skip to the TLDR if you
don't want to read everything.

It really depends on your approach. This project isn't well suited for
performance, because it works on OCaml's _bytecode_, which is very hard to
optimize because it was designed to be interpreted by an abstract machine, not
compiled.

For example, it is stack-based: arguments are passed on the stack explicitly
when a function call happens. LLVM, on the other hand, is register-based, with
real function arguments. You then have two choices:

\- Either you translate the bytecode very directly, by using an explicit stack
(e.g. an array of memory). This is the easiest approach, but it produces code
that is hard or impossible to optimize properly.

\- Or you translate the execution model itself, from stack-based to
register-based, and map every semantic onto the LLVM model (for example, pass
function arguments as LLVM function arguments instead of pushing them on a
stack manually). This approach is _much_ more difficult, but has a lot more
potential for optimization.
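To make the two choices concrete, here is a minimal sketch of what each approach would produce for one tiny function, `(x * y) + 1`. The opcode names in the comments and the global `vm_stack` are hypothetical stand-ins, not real ZAM instructions or Z3's runtime; hand-written C++ stands in for the generated LLVM code.

```cpp
#include <cstdint>

// Toy stack bytecode for (x * y) + 1. The opcode names are hypothetical,
// not real ZAM instructions:
//   PUSH_ARG 0; PUSH_ARG 1; MUL; PUSH_CONST 1; ADD; RETURN

// Stands in for the interpreter's stack: memory the runtime (and its GC) can see.
std::int64_t vm_stack[64];

// Approach 1, direct translation: each instruction becomes explicit stores and
// loads through a hand-maintained stack pointer.
std::int64_t direct_translation(std::int64_t x, std::int64_t y) {
  std::int64_t* sp = vm_stack;
  *sp++ = x;  // PUSH_ARG 0
  *sp++ = y;  // PUSH_ARG 1
  {           // MUL: pop two operands, push the product
    std::int64_t b = *--sp;
    std::int64_t a = *--sp;
    *sp++ = a * b;
  }
  *sp++ = 1;  // PUSH_CONST 1
  {           // ADD: pop two operands, push the sum
    std::int64_t b = *--sp;
    std::int64_t a = *--sp;
    *sp++ = a + b;
  }
  return *--sp;  // RETURN: pop the result
}

// Approach 2, model translation: arguments are real function parameters and
// each stack slot becomes a named temporary, which LLVM can keep in registers.
std::int64_t model_translation(std::int64_t x, std::int64_t y) {
  std::int64_t t0 = x * y;   // MUL
  std::int64_t t1 = t0 + 1;  // PUSH_CONST 1; ADD
  return t1;                 // RETURN
}
```

Both return 43 for `(6, 7)`. The difference is that in the first version every intermediate result is traffic through memory the rest of the runtime can observe, which is what makes it hard to optimize away, while in the second the intermediates are plain values an optimizer can reason about.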

Another problem is that OCaml's bytecode is untyped, so you lose all type
information.
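What "untyped" means in practice: every OCaml value is a single machine word, and OCaml's runtime headers (`caml/mlvalues.h`) define macros such as `Val_long`, `Long_val` and `Is_long` to tell immediate integers from heap pointers by the low bit. The helpers below are hypothetical re-implementations of that idea for illustration, not the runtime's actual code. A bytecode stack slot is just such a word, so nothing records whether it holds an int or a pointer; a compiler that wants to, for instance, keep a slot untagged in a register has to reconstruct that knowledge first.

```cpp
#include <cstdint>

// Every OCaml value is one machine word. Hypothetical helpers mirroring the
// idea behind the Val_long / Long_val / Is_long macros in caml/mlvalues.h.
using value = std::intptr_t;

// Integer n is stored as the word 2n + 1, so its low bit is always 1.
inline value val_long(std::intptr_t n) {
  return static_cast<value>((static_cast<std::uintptr_t>(n) << 1) + 1);
}

// Recover n with an arithmetic right shift.
inline std::intptr_t long_val(value v) { return v >> 1; }

// Heap pointers are word-aligned (low bit 0), so the low bit is the only
// run-time distinction between "int" and "pointer".
inline bool is_long(value v) { return (v & 1) != 0; }

// Adding two tagged ints without untagging: (2a+1) + (2b+1) - 1 = 2(a+b) + 1.
// Valid only when both words are ints. (Overflow is ignored in this toy.)
inline value tagged_add(value a, value b) { return a + b - 1; }
```

For example, `val_long(5)` is the word 11, and `tagged_add(val_long(2), val_long(3))` equals `val_long(5)` with no shifts at all.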

We tried both approaches in the Z3 project. The main branch is based on the
direct-translation/explicit-stack approach. It is not very fast, and doesn't
hold much promise of getting faster.

There are experimental branches based on the model translation. We were able
to get much better performance with them on some code. There is a lot of
potential for optimization because, this way, you write idiomatic LLVM code,
and you are able to reconstruct some type information, which enables further
optimizations. But there are drawbacks:

\- It is much more complicated. The ZAM (the ZINC abstract machine that runs
OCaml bytecode) isn't formally specified, so you pretty much have to read the
code of the VM to understand what's going on. Debugging is horrible.

\- Once you're there, you have to provide your own garbage collector. In the
direct-translation approach, we were able to use OCaml's garbage collector
directly, because we re-used the interpreter stack. But in this approach, you
at least have to provide a way to scan the roots. And once you write
idiomatic LLVM IR, you realize how ill-suited it is for relocating garbage
collection.
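To illustrate what "a way to scan the roots" means for a relocating collector: the collector must find every slot that holds a heap pointer and rewrite it when the object moves. One common technique is a shadow stack that records the *addresses* of pointer-holding locals; LLVM ships a "shadow-stack" GC strategy, and OCaml's C interface registers roots with `CAMLparam`/`CAMLlocal`. The code below is a self-contained toy of that idea with hypothetical names (`Obj`, `RootScope`, `relocate_roots`), not either real API.

```cpp
#include <cstddef>
#include <vector>

// A toy heap object (hypothetical; real OCaml blocks carry a header word).
struct Obj {
  long payload;
};

// Shadow stack of root *slots*: the addresses of local variables that hold
// heap pointers. Registering the slot, not the pointer, lets the collector
// rewrite the variable when it moves the object.
std::vector<Obj**> shadow_stack;

// RAII helper: roots registered inside a scope are popped when it exits,
// mimicking a function frame.
class RootScope {
 public:
  RootScope() : saved_(shadow_stack.size()) {}
  ~RootScope() { shadow_stack.resize(saved_); }
  RootScope(const RootScope&) = delete;
  RootScope& operator=(const RootScope&) = delete;
  void add(Obj** slot) { shadow_stack.push_back(slot); }

 private:
  std::size_t saved_;
};

// Toy "relocating collection": copy every rooted object into to_space and
// update the root slot to point at the copy. A pointer kept only in a machine
// register, never written to a registered slot, would not be updated and would
// dangle. (Simplification: an object reachable from two roots is copied twice;
// a real collector would leave a forwarding pointer.)
void relocate_roots(std::vector<Obj>& to_space) {
  // Reserve up front so push_back does not reallocate during the loop and
  // invalidate addresses already written back into root slots.
  to_space.reserve(to_space.size() + shadow_stack.size());
  for (Obj** slot : shadow_stack) {
    to_space.push_back(**slot);
    *slot = &to_space.back();
  }
}
```

The design point this shows is the one the comment makes: with an explicit, GC-visible stack every root already lives in memory, whereas idiomatic register-allocated code has to spill pointers to known slots (and reload them after a collection) before a moving collector can work.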

TLDR: In the end, it is just not worth optimizing this project for
performance. A better approach would be to start from scratch and write a real
OCaml -> LLVM backend for ocamlopt, one that could use the full AST with type
information.

But even if you did that, you'd still have to tackle the garbage collection
issue, which is not an easy one :)

EDIT: For more context, here is a very good post by Xavier Leroy on the
feasibility of an OCaml->LLVM compiler and on whether it is worthwhile (keep in
mind that Leroy is conservative about it, but that's probably a good thing).

<http://caml.inria.fr/pub/ml-archives/caml-list/2009/03/3a77bfcca0f90b763d127d1581d6a2f1.en.html>

~~~
andrewcooke
Thanks. Stuff like this is what still makes HN worthwhile.

