It really depends on your approach. This project isn't well suited for performance, because it works from OCaml's bytecode, which is very hard to optimize: it was designed to be interpreted by an abstract machine, not compiled.
For example, it is stack-based: arguments are pushed onto the stack explicitly when a function is called. LLVM, on the other hand, is register-based, and functions take their arguments as parameters. You then have two choices:
- Either you translate the bytecode very directly, using an explicit stack (e.g. an array in memory). This is the easiest approach, but it produces code that is hard or impossible to optimize properly.
- Or you translate between the two models, from stack-based to register-based, and map each construct onto its LLVM equivalent (for example, pass function arguments as LLVM function arguments instead of pushing them onto a stack manually). This approach is much more difficult, but offers a lot more potential for optimization.
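To make the two choices above concrete, here is a minimal sketch on a toy stack bytecode. The opcodes (`arg`, `const`, `add`, `mul`) and the function names are made up for illustration; they are not real ZAM instructions, and the emitted text is untyped pseudo-IR rather than valid LLVM IR. The first function keeps the stack as a runtime array, the second simulates the stack at translation time so that only register-style instructions remain.

```python
# Toy stack bytecode (hypothetical opcodes, not the real ZAM instruction set).
# Computes (arg0 + arg1) * 2.
PROGRAM = [("arg", 0), ("arg", 1), ("add",), ("const", 2), ("mul",)]


def run_with_explicit_stack(program, args):
    """Choice 1: keep the stack as a runtime array and step through it."""
    stack = []
    for op, *operands in program:
        if op == "arg":
            stack.append(args[operands[0]])
        elif op == "const":
            stack.append(operands[0])
        elif op == "add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif op == "mul":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


def translate_to_registers(program):
    """Choice 2: simulate the stack while translating, emit register code."""
    symbolic_stack = []  # holds *names* of values, never the values themselves
    code = []
    next_temp = 0
    for op, *operands in program:
        if op == "arg":
            symbolic_stack.append(f"%arg{operands[0]}")
        elif op == "const":
            symbolic_stack.append(str(operands[0]))
        elif op in ("add", "mul"):
            rhs, lhs = symbolic_stack.pop(), symbolic_stack.pop()
            reg = f"%t{next_temp}"
            next_temp += 1
            code.append(f"{reg} = {op} {lhs}, {rhs}")
            symbolic_stack.append(reg)
        else:
            raise ValueError(f"unknown opcode: {op}")
    code.append(f"ret {symbolic_stack.pop()}")
    return code


print(run_with_explicit_stack(PROGRAM, [3, 4]))
print("\n".join(translate_to_registers(PROGRAM)))
```

In the second version the stack has disappeared from the output entirely, which is what gives an optimizer something to work with. Real bytecode is harder than this toy: branches, closures and exceptions all make the stack contents at a given point less obvious.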
Another problem is that OCaml's bytecode is untyped. You lose all type information.
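As a rough illustration of what the missing types cost: the OCaml runtime represents an integer `n` as the machine word `2n + 1`, so that the low bit tells immediates apart from pointers. Without type information, a translator cannot assume an addition works on plain machine integers and has to emit the tagged version. The helper names below are hypothetical, just a sketch of the encoding.

```python
def tag_int(n):
    """Encode an integer the way the OCaml runtime does: 2n + 1."""
    return (n << 1) | 1


def untag_int(v):
    """Recover the integer from its tagged representation."""
    return v >> 1


def is_immediate(v):
    """Low bit set means an immediate integer, low bit clear means a pointer."""
    return v & 1 == 1


def add_tagged(a, b):
    """Add two tagged integers without untagging: (2x+1) + (2y+1) - 1 = 2(x+y) + 1."""
    return a + b - 1


print(untag_int(add_tagged(tag_int(3), tag_int(4))))
```

If the translator can prove that a value is always an integer, it can keep it untagged in a register and drop the extra arithmetic, which is the kind of gain that reconstructing type information buys.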
We tried both approaches in the Z3 project. The main branch uses the direct-translation/explicit-stack approach. It is not very fast, and there is not much prospect of making it faster.
There are experimental branches based on the model-translation approach. We were able to get much better performance with them on some code. There is a lot of potential for optimization, because this way you write idiomatic LLVM code and can reconstruct some type information, which enables further optimizations. But there are drawbacks:
- It is much more complicated. The ZAM isn't formally specified, so you pretty much have to read the code of the VM to understand what's going on. Debugging is horrible.
- Once you're there, you have to provide your own garbage collector. In the direct-translation approach, we were able to use OCaml's garbage collector directly, because we reused the interpreter stack. With this approach, you have to at least provide a way to scan the roots. And once you write idiomatic LLVM code, you realize how ill-suited it is to relocating garbage collection.
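To show what "a way to scan the roots" means in practice, here is a toy sketch of one common technique, a shadow stack: the compiled code registers its live heap references in a side structure the collector can walk. This is an assumption for illustration, not a description of what the experimental branches did, and `Block`, `shadow_stack` and `mark_from_roots` are made-up names.

```python
class Block:
    """A toy heap block: each field is either an int or another Block."""

    def __init__(self, *fields):
        self.fields = list(fields)
        self.marked = False


# Hypothetical side stack: compiled code pushes its live references here,
# because values held in registers are otherwise invisible to the collector.
shadow_stack = []


def mark_from_roots(roots):
    """Mark every block reachable from the roots; return how many were marked."""
    pending = [r for r in roots if isinstance(r, Block)]
    marked_count = 0
    while pending:
        block = pending.pop()
        if block.marked:
            continue
        block.marked = True
        marked_count += 1
        pending.extend(f for f in block.fields if isinstance(f, Block))
    return marked_count


leaf = Block(1, 2)
live = Block(leaf, 3)
garbage = Block(leaf)       # never registered as a root

shadow_stack.append(live)   # what a compiled function would do on entry
marked = mark_from_roots(shadow_stack)
shadow_stack.pop()          # ... and on exit

print(marked, garbage.marked)
```

This only covers marking. A relocating collector also has to update every root after moving an object, which means values cached in registers must be reloaded after any call that might trigger a collection, and that is where it starts to fight the optimizer.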
TLDR: In the end, it is just not worth optimizing this project for performance. A better approach would be to start from scratch and write a real OCaml -> LLVM compiler for ocamlopt, one that could use the full AST with type information.
But even if you did that, you'd still have to tackle the garbage collection issue, which is not an easy one :)
EDIT: To provide more context, this is a very good post by Xavier Leroy on the feasibility of an OCaml -> LLVM compiler and on whether it is worth doing (keep in mind Leroy is conservative about that, but it's probably a good thing).