
Can JIT be Faster? - aespinoza
http://tirania.org/blog/archive/2012/Apr-04.html
======
ubernostrum
Both Herb and Miguel seem to be writing on a flawed assumption, which is that
JIT will only ever apply optimizations that can be proven safe. As I
understand it, this is true for CLR implementations. And if that were the case
generally, there would be no significant difference between the set of
optimizations available with AOT and the set available with JIT, and JIT
would never be "as fast as C".

But this disregards one of the major selling points of profiling + optimizing
JIT: you can apply optimizations that cannot be proven safe _and you can get
away with it_. This means you can not only be "as fast as C", you can
implement a JIT that will mop the floor with C.

Good JITs are already taking advantage of this, and the very first comment on
Miguel's article points this out. This (four-year-old!) article by Charles
Nutter explains it a bit more in the context of the JVM:

<http://headius.blogspot.com/2008/05/power-of-jvm.html>

~~~
vilya
I keep on hearing that it's _possible_ to write a JIT-based vm that's "faster
than C". I have yet to actually see one where that is true for anything other
than one or two very specific micro-benchmarks.

Edit: the old C2 wiki actually has something useful to say about this.

<http://c2.com/cgi/wiki?AsFastAsCee>

<http://c2.com/cgi/wiki?SufficientlySmartVirtualMachine>

~~~
DannyBee
It's actually not that difficult to find programs where a JIT is better.
Things that benefit _significantly_ from profile-directed feedback often run
faster under a JIT than under a static compiler, because JITs tend to gather
the profile dynamically, whereas static compilers require training runs.

For example, performance-critical code that has a lot of conditional
branches, of which only a few are taken at runtime, will often run faster
under a JIT.

(Of course, the typical solution is not to write performance-critical code
with a large number of branches; I'm just giving you examples.)

~~~
vilya
Sure, but even in those cases the savings have to win back the overall cost
of the runtime profiling and dynamic compilation. The crossover point seems to be
high enough in the graph that it's not reached very often in practice -
especially since, as you say, programmers tend to learn early on that hoisting
conditions out of inner loops Makes Code Faster.

~~~
stcredzero
_The crossover point seems to be high enough in the graph that it's not
reached very often in practice_

I remember a study posted here a few months back that supported the notion
that most web apps have a certain "steady state" that they reach after they
start up, where almost all the types of variables are known, and most of the
funkier "dynamic" changes don't happen. If one can get a JIT to remember the
tracing information from session to session, it should be able to run as fast
as or faster than C.

~~~
vidarh
But if they reach a steady state, then profile guided optimization can be used
to make an AOT C compiler generate faster code too.

~~~
stcredzero
True, but let's say the language in question has optional types.
Consider the unfilled type information to be technical debt. If the tracing
information can be exploited to automatically fill in the type info, then this
is technical debt that magically repays itself. That's something startups
should be interested in.

------
fleitz
Yes and No.

If you bundle JIT with garbage collection, bounds checking, and all the usual
things that come with the JVM/CLR and execute on current hardware, probably
not.

If you were to JIT a language like C it would probably be faster, as you
could optimize for the specific hardware. OpenCL, for example, uses a JIT,
and it can be many times faster than static compilation because it can take
advantage of extra hardware at runtime.

The issue to contend with is that about 30 years of industry effort have
gone into making C fast, everything from compilers to the way CPUs are
designed. There's likely no inherent reason any style of compilation and
execution couldn't beat static compilation; we just need the same level of
resources and hardware support for those execution models.

For instance with garbage collection and bounds checking you no longer need an
MMU.

~~~
Someone
Is that "If you were to JIT a language like C it would probably be faster"
true? I think a JIT C compiler would run into trouble with alias detection.
Let's say it removes an if statement because the variable a in the "if (a)"
that starts it is always false. In C, code like "x->y = 1;" could affect the
value of a, as could _any_ pointer dereference. If the JIT cannot prove that
this never happens, it has to insert a check, "does this modify a?", for
every such statement. I think there is quite a bit of code out there where
even a very smart JIT would not be able to conclusively prove that such
aliasing does not happen.

Also, I do not accept that OpenCL argument. IMO, OpenCL is more similar to
static compilation than to a JIT. It is as if you recompiled your kernels
whenever you changed your video card (perhaps also when you change its
configuration); it does not do things that a JIT would do, such as "hm, most
of the pixels are black; let's optimize for that case".

~~~
DannyBee
We (compiler optimization folks) have gotten very good at making pointer and
alias analysis fast, particularly the simple pointer analyses you might find
in a JIT. For a JIT, you would likely write out conservative but correct
static results in whatever the equivalent of "javac" is for your C JIT, and
then refine them in the JIT. You'd invalidate them if you later saw accesses
you couldn't account for.

Even if you didn't want to do that, on demand CFL reachability formulations of
pointer analysis can calculate reasonable pointer results for individual
pointers fast enough if it became important.

Realistically however, no JIT is going to do advanced pointer analysis, unless
you have CPU to burn. As for "conclusively prove that aliasing does not
happen", you don't actually have to, because if it is truly going to improve
performance, you can insert runtime checks.

if (&a == &b) <do super fast thing> else <slower fallback code>

------
DannyBee
The very first comment misunderstands reality. It talks about how you can make
java programs that outrun C++ ones using hotspot. Ignoring, of course, the
fact that nothing really stops you from making a good C++ JIT except
engineering time, and it would do just as well in those cases.

Nobody does it because the cases in which it would help are small, and most of
those folks are willing to do profiling/training runs, and get about as good
results, even if it requires more pain.

As Herb said, the languages were made for different tradeoffs. Speaking as a
compiler guy, yes, you can make almost any of them as fast as each other given
enough time and effort.

But putting time and effort into a JIT may not be as effective as choosing a
different language.

~~~
wisty
> Speaking as a compiler guy, yes, you can make almost any of them as fast as
> each other given enough time and effort. But putting time and effort into a
> JIT may not be as effective as choosing a different language.

So, the most "bang for your buck" may be using a JIT for a language like
Python. Sure, it won't be as fast (at least, not until a huge amount of work
has been done on it), but it can get faster. Trying to beat C + a good C
compiler is ... ambitious.

~~~
jules
But possible: <http://shootout.alioth.debian.org/u64q/fortran.php>

Doing it for Python will be impossibly hard, yes.

~~~
wisty
Yes, fortran is actually faster than C for some things. OK, C++ with the right
templates, or a later C standard can theoretically do what Fortran does
(remind me again - pass by reference, no recursion, and bounded arrays rather
than pointers?). And Intel tends to be faster than gcc.

There's no way you can make broad statements (like "nothing will beat C" or
"fortran is actually faster than C for numerical stuff") without having a few
exceptions. Actually, PyPy is faster than C, in _extremely_ specific
circumstances (i.e. ones which were set up by the PyPy people to show off; see
"PyPy faster than C on a carefully crafted example" ... though they point out
there are things you could do in C to make it win).

The most bang for your buck in JITs is with dynamic languages, because while
things like type inference can explode exponentially, many sites will only
ever see one type in practice (e.g. a float), and the JIT can pick that up.
And getting Python within an order of magnitude of C would be a massive win.

------
rogerbinns
One thing I would like to see is speculative execution. I have an 8 core
machine. When the JIT compiler runs there is no reason why it can't generate 7
alternatives to any particular code sequence, run them all in parallel and
keep whichever was the fastest. Or run both sides of an if statement
simultaneously.

This helps solve the problem that compiler writers often have multiple
alternatives they can generate, but have to make hard decisions as to which to
pick.

I'm not saying that implementing this will be easy, but then again all of the
easy things have already been done.

~~~
wmf
Describing this as not easy is a bit of an understatement; there has been a
lot of research into speculative threading, and communication overhead
between cores almost always kills it.

~~~
rogerbinns
The main hurdle is getting a block of execution large enough to make it
worthwhile trying alternatives and absorbing the cost of setup and finish. I
suspect some form of STM would help a lot since you could then run multiple
threads of code and only "commit" the winner. That allows a larger execution
block.

I also think software people still have an innate fear of "wasting"
CPU/memory/disk which is hard to overcome.

~~~
maxs
One example of where this is used in practice is the FFTW library (the
"Fastest Fourier Transform in the West"). At runtime you can run a trial of
FFT algorithms for your data size and architecture, and it will pick the
fastest.

~~~
rogerbinns
It is a pain with FFTW since they expose all this in the API, let you
save/load the details, etc. A better example is MongoDB. When you make a
query, it tries multiple query plans concurrently, notes which is fastest and
uses that for future queries, with continued performance monitoring. If
performance diverges from expectations, it runs the contest again.

Compare with other databases that put an inordinate amount of effort and
tuning into coming up with the one true query plan per query.

------
SeanLuke
> Java HotSpot takes a fascinating approach: they do a quick compilation on
> the first pass, but if the VM detects that a piece of code is being used a
> lot, the VM recompiles the code with all the optimization turned on and then
> they hot-swap the code.

Wait, what? Hotspot does a "quick compile" on the first pass? When did this
happen?

~~~
lucian1900
Some time ago. It has a "pasting" JIT as a first optimisation stage, which
just pastes together machine code for each opcode, to remove the overhead of
interpreting.

Note that it just interprets by default.

~~~
SeanLuke
> It has a "pasting" JIT for as a first optimisation stage

Could you point to somewhere on the interweb which discusses this in detail?

------
CurtHagenlocher
One of the performance advantages of C is the sheer amount of detail that's
left to the implementation. By contrast, languages like C# and Java are
typically much more tightly specified, which prevents optimizations that are
not provably correct. This is not a matter of JIT vs non-JIT.

~~~
ubernostrum
The JVM is quite happy to speculatively apply optimizations that are not
provably correct, and has done so for _years_. Have a look at the link in this
comment:

<http://news.ycombinator.com/item?id=3799975>

~~~
CurtHagenlocher
Sure, but if not provably correct then they need to be detectably incorrect.
That's still a higher bar than a valid C implementation needs to pass.

------
yxhuvud
Sigh. Why the hell do people use the Candara font?

Do they want people to have a hard time reading what they write?

~~~
ukreator
There's no accounting for taste. This font looks good to me.

~~~
yxhuvud
It looks nice to me too, if I press ctrl-+ twice. Maybe it is just that the
font doesn't scale well enough for high resolutions.

