Hacker News new | comments | ask | show | jobs | submit login
Have Tracing JIT Compilers Won? (lambda-the-ultimate.org)
76 points by nex3 on Mar 10, 2010 | hide | past | web | favorite | 39 comments

Ahead of time compilers do work quite well for a subset of dynamic languages. Just like tracing JIT works well for a subset. This has been proven by projects like shedskin, cython, and tinypyC++. Given typing information, either explicitly or through type inference - massive speed-ups are possible.

Also, tracing JIT currently does not do multiple CPU optimizations - which static compilers have been doing. They also don't do SIMD or MIMD optimizations, which compilers like gcc/icc/etc are doing.

However, runtime assemblers - like 'Orc' 'corepy' and older ones like 'softwire' are proving to be lots faster in multimedia applications compared to traditional compilers or interpreters. These runtime assemblers give authors of programs access to the compilers/assemblers. Just like allowing people to specify typing, letting people construct programs at runtime can give massive speed-ups.

C has even had fast interpreters for a while - see tinycc as an example. It doesn't use a tracing compiler, but a very straight forward direct translation compiler which happens to be faster than many interpreters.

It's quite easy to argue 'No' to this question I think.

> C has even had fast interpreters for a while - see tinycc as an example. It doesn't use a tracing compiler, but a very straight forward direct translation compiler which happens to be faster than many interpreters.

How is tinycc an interpreter? It's a full-blown compiler, alas a very simple one with very little optimization. If it weren't faster than an interpreter, I'd be shocked. For well written code, it reaches 50%-70% of modern gcc speed from tests I did a couple of years ago. I think that's on par, or even slightly better, than gcc 1.x

It helps that C is designed to have an almost one to one mapping between its native operators and machine operations and addressing modes.

For example, the pre and post increment operators are expressions with side effects, can be confusing and can provoke undefined behaviour with ease. Having two variants also seems redundant. But the two variants may have different optimal translations to underlying CPU addressing modes, while leaving the pre or post modified value in a register is often a side-effect of the CPU translation. If the language only permitted ++ as a statement, the usefulness of side-effect would be lost.

It's a cliche that C is little more than portable assembler, but at a design level there's a lot of truth to it. The length of the assembly translation of C code is usually pretty directly proportional to the source code token length. In more expressive languages with more powerful abstractions, that's frequently not the case.

not arguing with your general point, but the difference between pre and post increment is very useful when writing things like stacks and queues with pointers. these days maybe people use libraries or templates, but i suspect they were introduced for ease of programming, not ease of translation.

[edit: really, it's two sides of the same coin. it's useful, so both c and cisc chips have it, so the two match up...]

> not arguing with your general point, but the difference between pre and post increment is very useful when writing things like stacks and queues with pointers.

Of course this only applies to ephemeral variants of those data structures.

"Will a more static style JIT like Google's V8 be able to keep up?"

V8 compiles everything to assembler and does not do JIT. V8 seems to beat TraceMonkey on both CPU and memory benchmarks ( http://shootout.alioth.debian.org/u64/benchmark.php?test=all... ), so I doubt their approach has "lost". They probably do this because most JavaScript programs are very small and it's relatively cheap to compile everything to assembler from the start. They do their optimizations on assembler level as well - instead of doing it at bytecode level as it's done in most other VMs. This is at least the information I gathered when I looked at V8 last time - it was a year ago, so things might have changed.

Just because V8 doesn't have an IR doesn't mean it's not a JIT. V8 still has to generate assembly throughout the execution of a javascript program (to create new hidden classes, patch the inline cache, etc...). See: http://code.google.com/apis/v8/design.html

I'd say that for languages such as Python or Javascript tracing JIT compilers are the only viable approach. Because the languages allow anything you can't make any sensible predictions about types and call graphs at compile time (people tried; didn't work well). The incremental-JIT approach (LuaJIT) has less information to work with, and it's not aware which parts of the program are bottlenecks, so it can't intelligently decide which parts parts need optimization most.

The tracing JIT is such an obvious tactic that it should be part of any dynamic language. Measure which parts are slow, then optimize those parts using the parameter type information gathered at runtime. Fall back on original code on type mismatch. The implementation is hairy, but it's a no-brainer conceptually speaking.

I suspect that dynamic languages will evolve over the next 10 years to allow for really efficient compilers, but for the current languages such as Javascript I doubt we're ever going to see massive performance improvements.

> The incremental-JIT approach (LuaJIT) has less information to work with, and it's not aware which parts of the program are bottlenecks, so it can't intelligently decide which parts parts need optimization most.

You are probably thinking of LuaJIT 1.x; The recent LuaJIT 2 is tracing JIT, and it beats to hell out of every other dynamic language compiler, bar none.

Mike Pall for president!

* the only viable approach? v8 is pretty fast; in my programs, i've seen firefox being faster only once yet (in an optimal case for tracing)

* LuaJIT is a tracing JIT

* what are massive performance improvements? i'd say the jump from pure interpreter to static/tracing JIT compilation has already been a massive performance improvement

With modern IDEs, tracing JIT, and type inference, I'm surprised that someone hasn't made the same lateral thinking sidestep that Hennessey & Patterson made for RISC. They offloaded much of the decoding work from the chip to the compiler, yielding faster chips in a smaller die size.

Perhaps the Go language is the start of such a sidestep. We should be able to offload a lot of the work of type annotation to the IDE and compiler, yielding a compiled language with the dynamic feel of an interpreted one. In fact, for langs with clean runtime models like Lisp, we should be able to allow intermediate states of partial annotation, and allow some interpreted execution for fast prototyping. (But demand full annotation before deployment.)

(EDIT: Yes, I was aware of Haskell when I posted this. Haskell. There I said it!)

Is there a tool that uses a Bayesian technique for suggesting type annotation that cannot be inferred? This would be tremendously useful. (And very dangerous in the hands of the incompetent.)

You have just described Haskell. Annotate types if you want, let the compiler infer otherwise. Very dynamic feeling, but much safer. And you're right, it can make for some very fast code.

I've always got the impression that Haskell's lazy properties makes it hard to reason about code performance. You can force high-performance code, but that requires circumventing the beauty of the language. Has this problem been solved?

The "reasoning about performance" is more of a trap for the unwary than a fundamental problem.

Example; you have two threads in a producer/consumer setup. The first thread puts "g x" into the queue. The second thread removes this, and applies "f", and puts the result into a result queue. Then a result-processing thread prints those to the file.

You run this, and notice that it doesn't really take advantage of all cores. It seems to run slower on 2 or 3 cores than it does on 1 core, even though "f" and "g" are of roughly equal complexity and the application of each happens in a different thread! WTF?!?

The problem is, of course, that lazy evaluation "helped you". You never needed "g x" in the first thread, so it wasn't evaluated; the thunk was put in the queue. When you applied "f (g x)" in the second thread, you still didn't need the value of either computation, so it was never evaluated. You just put a thunk in the result queue. Finally, in the result-printing thread, you did need the result -- so all the work to evaluate the result was done there; in a single thread.

You realize this, switch to strict queues, and now your app is 2x faster.

(BTW, I simplified the producer/consumer model a bit. Normally you have one thread feeding work into a queue, n threads reading from that queue, doing work, and putting results into a result queue, and then a final result-aggregator thread. The end result would be the same; all the work would be done in that last thread. Slow!)

OCaml has type inference too, though it doesn't have Haskell's typeclasses, which are very nice. (You can get much of the same functionality with functors (parametric modules), but it's not as straightforward.)

OCaml isn't lazy-by-default, so reasoning about time/space performance is generally easier. It's worth considering as an alternative to Haskell.

Not Haskell. Something more permissive.

I'd like this for a Smalltalk style OO environment. Strongtalk got partway there.

Can somebody please explain what a tracing JIT is?

The runtime watches the code run, and if it notices that a specific type is in use in a certain code section, it rewrites the generated code to use that type when possible.

For example, if you want to add all the integers from 1 to 2^31, the JIT will eventually realize that the accumulator can be a register (with a machine int in it) instead of a JavaScript big integer. (If that overflows, it will switch back.) Obviously addition is faster when it's one CPU instruction with no memory loads, so this is a big speed win.

JIT in general means you generate native code at runtime instead of at compile time.

Often, there's more information at runtime than compile time. This may even be true at times for statically typed languages.

I think the static/dynamic type debate is just a vestige of the state of interpreter/compiler technology in the 80's. I wish everyone would get over it and introspect and debug their knee-jerk reactions so we can get on with the next stage of language implementation.

Sure, because static typing doesn't mean you picked the best type, it just means that you picked a type.

If you statically type your function as Integer -> Integer because one invocation in a million uses an integer bigger than 2^63, you are taking a speed hit for the other 999,999 calls. But a tracing JIT can just use a machine integer those times, and only fall back to the arbitrary-size Integer when it needs to.

I think you're only considering static typing as an efficiency benefit. I'm equally as interested in its program-correctness benefits.

Static typing and JIT compiling aren't mutually exclusive. You can do all the static analysis, checking, whole-program optimization, etc. that's feasible at compile time, compile to bytecode, and then do JIT compilation at runtime.

Edit: We appear to be in fierce agreement.

Static typing and JIT compiling aren't mutually exclusive, though

Static/Dynamic typing are no longer mutually exclusive either. We need to get past this idea already.

(I mean we as a field, not present company necessarily.)

Exactly! I thought that was implicit in what I said. Maybe in the future static-typing's benefit would only be for program correctness.

Strongtalk. Google. Read. Been done.

Such as, for example, Java. Java probably had the first tracing JIT-based VM outside academia.

That depends on whether you consider the Smalltalk and Self research at Sun and Xeroc PARC to be "academia". Much of the JVM optimization technology came directly from those.


The paper "An Efficient Implementation of Self" (http://selflanguage.org/documentation/published/implementati...) is a good overview.

Lua is another language with a great JIT compiler, LuaJIT. There are several papers about Lua (http://www.lua.org/doc/jucs05.pdf, http://www.lua.org/papers.html) and a bunch of good mailing list posts about LuaJIT (e.g. http://article.gmane.org/gmane.comp.lang.lua.general/58908).

I was thinking Strongtalk + Tracing + Type inference.

Sure. (I was replying more directly to, "Java probably had the first tracing JIT-based VM outside academia.".)

One important issue is that compilation often needs to happen in bounded time. A JIT-compiler that does overly elaborate optimizations at runtime may lead to excessive pauses, and sometimes that's a dealbreaker. (Some of the same trade-offs appear with non-incremental garbage collection.)

Many tracing JIT implementations do the compilation in a separate thread from the running VM, only providing the compiled code when it's ready. This reduces the overhead considerably, and should pretty much prevent any pauses in execution (assuming a CPU that can reasonably handle multiple threads).

There is always the risk of unbounded compilation time as you say... or at least compilation time that runs long enough that the compiled code isn't ready in time to be useful, but empirical evidence of tracing has shown it to be a net win (over interpretation) in every implementation I have seen, so this risk must be small or at least manageable.

I'm not in any way saying that it rules out JIT compilation* , just that it's a a constraint that may not have occurred to people, and one that gives it a different (often complementary) focus than static analysis.

* I'm delighted that the LuaJIT port to amd64 is almost done, BTW. I've submitted two threads about it that haven't made it to the front page.

JIT simply stands for Just-in-time compilation or optimisation, that is you delay optimising parts of your program until the program executes.

Most systems start of by interpreting the some form of bytecode and observe which parts are executed most frequently and then compile these to machine code.

Now there are two main techniques for choosing what to compile: You can count how often each function/method is executed and then compile a method to machine code when the counter reaches a certain threshold. You can even compile several versions of the function, e.g., one for integers, one for floating point values.

A tracing JIT, on the other hand, does not compile single methods at once, but whole execution paths (traces) throughout the whole program (typically some kind of loop). A trace can span several methods and thus may allow better inter-procedural optimisations. For example if a certain if-branch is taken more frequently the trace will only include the common case and will fall back to the interpreter in the uncommon case.

Tracing JIT is probably more complicated than method-based JIT, because there are issues with code duplication (e.g., several traces may include the same code) and trace exits can be expensive, so you have to efficiently link traces together dynamically. However, trace-based JIT allows more agressive optimisations and, since traces tend to be longer than methods allow more optimisations in general.

For a good paper on how powerful a tracing-JIT can be, see http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

Here are some articles from Mozilla explaining their Javascript tracing:




These are tangentially related to the article topic, but gives you some sense of what tracing does.

I'm confused. In the javascript space, it seems like V8 and squirrelfish (or is it Nitro? I prefer SF) have soundly trounced tracemonkey. Last time I checked, Squirrelfish isn't a tracing JIT centric implementation (nor is V8). I've yet to see a ttJIT javascript engine that hasn't been soundly trounced by the more conventional approaches.

Perhaps it's different in the Lua world, but it seems to me like this post is claiming the exact opposite of what the evidence suggests?

Don't get me wrong, here. I think trace trees are a very cool optimization technique and surely have a place in the future of JIT compilers. I just don't think they've “Won” at this point.

Not to mention I don't think any pure approach will ever "win". To date, we've used a lot of different compilation techniques for wholly different use cases, as well as together for different parts of compilation where one makes more sense than another. I don't see any reason why that won't happen here. There's no reason this is an all-or-none proposition.

Smalltalk images are a clear demonstration that it doesn't have to be "all or none." Given a clean runtime model, one should be able to switch the execution engine on the fly, just as Smalltalk lets a Smalltalk app (debugger) act as a bytecode interpreter for the process being debugged, then switches back to JIT machine code when you hit "run".

1) No. See V8.

2) Tracing JIT's might win for loop-centric languages. They miss some serious opportunities for parallelization in languages with rich collection-oriented operators (functional and array languages).

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact