Also, tracing JITs currently don't do multi-CPU optimizations, which static compilers have been doing for a while. They also don't do SIMD or MIMD optimizations, which compilers like gcc/icc/etc. do.
However, runtime assemblers - like Orc and CorePy, and older ones like SoftWire - are proving to be much faster in multimedia applications than traditional compilers or interpreters. These runtime assemblers give program authors direct access to the compiler/assembler at runtime. Just as letting people specify types can help, letting people construct programs at runtime can give massive speed-ups.
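To make "construct programs at runtime" concrete, here's a minimal sketch of the idea (x86-64 Linux only, with error handling mostly omitted): emit a few machine-code bytes into an executable buffer and call them as a function. Real runtime assemblers like Orc wrap this up in a portable API.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void) {
        /* mov eax, edi ; add eax, esi ; ret  -- returns a + b (SysV AMD64) */
        unsigned char code[] = { 0x89, 0xf8, 0x01, 0xf0, 0xc3 };
        void *buf = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) return 1;
        memcpy(buf, code, sizeof code);       /* copy code into the buffer */
        int (*add)(int, int) = (int (*)(int, int))buf;  /* call the fresh code */
        printf("%d\n", add(2, 40));           /* prints 42 */
        return 0;
    }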
C has even had fast interpreters for a while - see tinycc as an example. It doesn't use a tracing compiler, but a very straightforward direct-translation compiler which happens to be faster than many interpreters.
It's quite easy to argue 'No' to this question I think.
How is tinycc an interpreter? It's a full-blown compiler, albeit a very simple one with very little optimization. If it weren't faster than an interpreter, I'd be shocked. For well-written code, it reaches 50%-70% of modern gcc speed in tests I did a couple of years ago. I think that's on par with, or even slightly better than, gcc 1.x.
For example, the pre- and post-increment operators are expressions with side effects; they can be confusing and can easily provoke undefined behaviour. Having two variants also seems redundant. But the two variants may have different optimal translations to the underlying CPU's addressing modes, and leaving the pre- or post-modified value in a register is often a free by-product of that translation. If the language only permitted ++ as a statement, the usefulness of the side effect would be lost.
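The canonical illustration (just a sketch): in the classic copy loop, post-increment lines up exactly with post-indexed addressing modes on CPUs that have them (e.g. ARM's ldrb r3, [r1], #1), so the side effect comes for free.

    void copy(char *dst, const char *src, unsigned n) {
        /* each *p++ can compile to a single load/store with
           post-increment addressing on CPUs that support it */
        while (n--)
            *dst++ = *src++;
    }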
It's a cliché that C is little more than portable assembler, but at the design level there's a lot of truth to it. The length of the assembly translation of C code is usually pretty directly proportional to the token count of the source. In more expressive languages with more powerful abstractions, that's frequently not the case.
[edit: really, it's two sides of the same coin. It's useful, so both C and CISC chips have it, so the two match up...]
Of course this only applies to ephemeral variants of those data structures.
The tracing JIT is such an obvious tactic that it should be part of any dynamic language: measure which parts are slow, then optimize those parts using the parameter type information gathered at runtime, and fall back on the original code on a type mismatch. The implementation is hairy, but conceptually it's a no-brainer.
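In code, the guard-and-fall-back idea looks roughly like this (a toy sketch with invented names, not any particular VM's internals):

    #include <stdio.h>

    /* Tagged value in a toy dynamic-language runtime (all names invented). */
    typedef enum { TAG_INT, TAG_OTHER } Tag;
    typedef struct { Tag tag; long i; } Value;

    /* Stands in for the original, generic interpreter path. */
    static Value interpret_add(Value a, Value b) {
        return (Value){ TAG_INT, a.i + b.i };
    }

    /* What the JIT might emit after observing int+int at this call site:
       guard on the type assumption, take the fast path, or bail out. */
    static Value jitted_add(Value a, Value b) {
        if (a.tag != TAG_INT || b.tag != TAG_INT)
            return interpret_add(a, b);        /* guard failed: fall back */
        return (Value){ TAG_INT, a.i + b.i };  /* common case: cheap and direct */
    }

    int main(void) {
        Value r = jitted_add((Value){ TAG_INT, 2 }, (Value){ TAG_INT, 40 });
        printf("%ld\n", r.i);  /* prints 42 */
        return 0;
    }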
You are probably thinking of LuaJIT 1.x; the recent LuaJIT 2 is a tracing JIT, and it beats the hell out of every other dynamic language compiler, bar none.
Mike Pall for president!
* LuaJIT is a tracing JIT
* What counts as a massive performance improvement? I'd say the jump from a pure interpreter to static/tracing JIT compilation has already been a massive performance improvement.
Perhaps the Go language is the start of such a sidestep. We should be able to offload a lot of the work of type annotation to the IDE and compiler, yielding a compiled language with the dynamic feel of an interpreted one. In fact, for languages with clean runtime models, like Lisp, we should be able to allow intermediate states of partial annotation, and allow some interpreted execution for fast prototyping. (But demand full annotation before deployment.)
(EDIT: Yes, I was aware of Haskell when I posted this. Haskell. There, I said it!)
Is there a tool that uses a Bayesian technique to suggest type annotations that cannot be inferred? That would be tremendously useful. (And very dangerous in the hands of the incompetent.)
Example: you have two threads in a producer/consumer setup. The first thread puts "g x" into a queue. The second thread removes this, applies "f", and puts the result into a result queue. Then a result-processing thread prints those results to a file.
You run this, and notice that it doesn't really take advantage of all cores. It seems to run slower on 2 or 3 cores than it does on 1 core, even though "f" and "g" are of roughly equal complexity and the application of each happens in a different thread! WTF?!?
The problem is, of course, that lazy evaluation "helped you". You never needed "g x" in the first thread, so it wasn't evaluated; the thunk was put in the queue. When you applied "f (g x)" in the second thread, you still didn't need the value of either computation, so it was never evaluated. You just put a thunk in the result queue. Finally, in the result-printing thread, you did need the result -- so all the work to evaluate the result was done there, in a single thread.
You realize this, switch to strict queues, and now your app is 2x faster.
(BTW, I simplified the producer/consumer model a bit. Normally you have one thread feeding work into a queue, n threads reading from that queue, doing work, and putting results into a result queue, and then a final result-aggregator thread. The end result would be the same; all the work would be done in that last thread. Slow!)
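For intuition about what a thunk actually is, here's a crude C caricature of the situation above (invented representation; GHC's real thunks are nothing this simple): each queue entry is a deferred computation, and no arithmetic happens until someone finally forces the outermost one -- which is exactly why all the work lands in the printing thread.

    #include <stdio.h>

    /* A "thunk": a computation that runs only when forced. */
    typedef struct Thunk {
        long (*run)(struct Thunk *);
        long arg;
        struct Thunk *inner;
    } Thunk;

    static long force(Thunk *t) { return t->run(t); }

    static long run_g(Thunk *t) { return t->arg * 2; }           /* g x = 2x  */
    static long run_f(Thunk *t) { return force(t->inner) + 1; }  /* f y = y+1 */

    int main(void) {
        Thunk gx  = { run_g, 21, NULL };  /* producer "enqueues" g x, unevaluated */
        Thunk fgx = { run_f, 0, &gx };    /* worker wraps it in yet another thunk */
        /* only here -- the "printing thread" -- does any arithmetic happen */
        printf("%ld\n", force(&fgx));     /* prints 43 */
        return 0;
    }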
OCaml isn't lazy-by-default, so reasoning about time/space performance is generally easier. It's worth considering as an alternative to Haskell.
JIT in general means you generate native code at runtime instead of at compile time.
I think the static/dynamic type debate is just a vestige of the state of interpreter/compiler technology in the '80s. I wish everyone would get over it, introspect and debug their knee-jerk reactions, so we can get on with the next stage of language implementation.
If you statically type your function as Integer -> Integer because one invocation in a million uses an integer bigger than 2^63, you are taking a speed hit for the other 999,999 calls. But a tracing JIT can just use a machine integer those times, and only fall back to the arbitrary-size Integer when it needs to.
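As a sketch of what that emitted fast path might look like (using the GCC/Clang __builtin_add_overflow builtin; the bignum path is a stub standing in for a real arbitrary-precision library):

    #include <limits.h>
    #include <stdio.h>

    /* Stub standing in for the slow arbitrary-precision path
       (a real runtime would call into its bignum library here). */
    static long bignum_add(long a, long b) {
        (void)a; (void)b;
        fprintf(stderr, "side exit: bignum path\n");
        return 0;
    }

    /* What the JIT might emit for the hot path: machine-word addition
       with an overflow guard, instead of paying for bignums every call. */
    static long traced_add(long a, long b) {
        long r;
        if (__builtin_add_overflow(a, b, &r))  /* guard: did we overflow? */
            return bignum_add(a, b);           /* rare case: fall back */
        return r;                              /* 999,999 times: one add */
    }

    int main(void) {
        printf("%ld\n", traced_add(2, 40));  /* fast path: prints 42 */
        traced_add(LONG_MAX, 1);             /* trips the guard */
        return 0;
    }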
Edit: We appear to be in fierce agreement.
Static/Dynamic typing are no longer mutually exclusive either. We need to get past this idea already.
(I mean we as a field, not present company necessarily.)
The paper "An Efficient Implementation of Self" (http://selflanguage.org/documentation/published/implementati...) is a good overview.
Lua is another language with a great JIT compiler, LuaJIT. There are several papers about Lua (http://www.lua.org/doc/jucs05.pdf, http://www.lua.org/papers.html) and a bunch of good mailing list posts about LuaJIT (e.g. http://article.gmane.org/gmane.comp.lang.lua.general/58908).
There is always the risk of unbounded compilation time, as you say... or at least compilation time that runs long enough that the compiled code isn't ready in time to be useful. But the empirical evidence has shown tracing to be a net win (over interpretation) in every implementation I have seen, so this risk must be small, or at least manageable.
* I'm delighted that the LuaJIT port to amd64 is almost done, BTW. I've submitted two threads about it that haven't made it to the front page.
Most systems start off by interpreting some form of bytecode, observe which parts are executed most frequently, and then compile those parts to machine code.
There are two main techniques for choosing what to compile. The first: count how often each function/method is executed, and compile a method to machine code when its counter reaches a certain threshold. You can even compile several versions of the function, e.g., one for integers and one for floating-point values.
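A minimal sketch of that counter-based dispatch (invented names, and a stub where a real VM would emit machine code):

    #include <stdio.h>

    #define HOT_THRESHOLD 3   /* real VMs use much larger thresholds */

    typedef long (*native_fn)(long);

    static long interpret_body(long x) { return x * 2; }  /* "interpreted" body */

    /* Stub: a real VM would generate machine code here. */
    static native_fn compile_to_native(void) {
        puts("compiling to native...");
        return interpret_body;
    }

    static long call_f(long x) {
        static unsigned counter = 0;
        static native_fn compiled = NULL;
        if (compiled) return compiled(x);    /* hot: run the compiled version */
        if (++counter >= HOT_THRESHOLD)
            compiled = compile_to_native();  /* threshold crossed: compile */
        return interpret_body(x);            /* warm-up: keep interpreting */
    }

    int main(void) {
        for (int i = 0; i < 5; i++)
            printf("%ld\n", call_f(i));
        return 0;
    }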
A tracing JIT, on the other hand, does not compile single methods, but whole execution paths (traces) through the program, typically some kind of loop. A trace can span several methods and thus may allow better inter-procedural optimisations. For example, if one arm of an if-branch is taken much more frequently, the trace will include only the common case and fall back to the interpreter in the uncommon one.
Tracing JITs are probably more complicated than method-based JITs, because there are issues with code duplication (e.g., several traces may include the same code), and trace exits can be expensive, so you have to link traces together dynamically and efficiently. However, trace-based JITs allow more aggressive optimisations and, since traces tend to be longer than methods, allow more optimisation in general.
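To picture what a compiled trace with a guard looks like, here's a hand-written C approximation (assume tracing observed that a[i] >= 0 almost always holds in a hot loop; all names are invented):

    #include <stdio.h>

    /* Generic slow path the guard exits back into, resuming mid-loop. */
    static long interpreter_resume(const long *a, long n, long i, long sum) {
        for (; i < n; i++)
            if (a[i] >= 0) sum += a[i];
        return sum;
    }

    /* The trace covers only the common path; the rare case is a guard
       that side-exits back to the interpreter with the current state. */
    static long compiled_trace(const long *a, long n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            if (a[i] < 0)                               /* guard fails: side exit */
                return interpreter_resume(a, n, i, sum);
            sum += a[i];                                /* straight-line fast path */
        }
        return sum;
    }

    int main(void) {
        long a[] = { 1, 2, 3, -4, 5 };
        printf("%ld\n", compiled_trace(a, 5));  /* prints 11 */
        return 0;
    }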
For a good paper on how powerful a tracing JIT can be, see http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.36.7...
These are tangentially related to the article topic, but they give you some sense of what tracing does.
Perhaps it's different in the Lua world, but it seems to me like this post is claiming the exact opposite of what the evidence suggests?
Don't get me wrong, here. I think trace trees are a very cool optimization technique and surely have a place in the future of JIT compilers. I just don't think they've “Won” at this point.
2) Tracing JITs might win for loop-centric languages. They miss some serious opportunities for parallelization in languages with rich collection-oriented operators (functional and array languages).