There's also Yeti ( http://mth.github.com/yeti/ ), an ML for the JVM, though I'm not sure how mature it is yet.
It could use more libraries, but it's also extremely interoperable with Java and the JVM: calling out to Java libraries is simple, and it compiles to ordinary JVM classes, so it can just as easily be called from Java.
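To give a flavor of that interop, here's a minimal sketch in Yeti as I remember its syntax (construction with `new`, method calls with `#`); treat the exact syntax as an assumption and check the Yeti docs rather than taking this as authoritative:

```yeti
// Hypothetical sketch: calling java.util.Date from Yeti.
// `new` constructs a Java object; `#` invokes a Java method on it.
now = new java.util.Date();
println (now#toString());
```

Since the compiler emits regular JVM class files, the reverse direction is just an ordinary Java method call on the generated class.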
For anyone interested, I managed to write a Mustache implementation last weekend:
However, experience shows that, at least in the presence of sufficient RAM and for code that runs long enough, the combination of VM + JIT beats statically compiled machine code. For sources, see the Computer Language Benchmarks Game, formerly the Great Computer Language Shootout (nowadays, JIT-compiled Java ranks above OCaml and even C in many benchmarks), or the Dynamo papers (JIT-recompiled native C beating optimized native C by about 5% on reference benchmarks).
In addition, having a VM + JIT allows for very nice tricks. For instance, recent versions of Mac OS X use LLVM bitcode that is dynamically targeted at either the CPU (generally predictable when you're building the application) or the machine's GPU (much harder to predict), depending on available resources. If one thinks in terms of distributed/cloud resources rather than a single computer, it also makes sense to target a specific architecture at the latest possible instant (e.g. during deployment, possibly even after work-stealing), so as to take advantage of heterogeneous architectures. The VM + JIT combination also paves the way for extremely powerful debugging primitives, and for late optimizations that simply can't be performed at compile time for lack of information.
So, in the long run, it's quite possible that statically compiled machine code will be relegated to niches (e.g. systems development), while JIT-ed code will [continue to] take over the world.