
Invokedynamic in JRuby: Constant Lookup - mattyb
http://blog.headius.com/2011/08/invokedynamic-in-jruby-constant-lookup.html
======
a-priori
I sort of understand how deoptimization works in general but I'm not sure how
the JVM can use those methods to detect when a constant changes. Say you have
code like the following:

    
    
      GRAVITY = 9.81
      def g_force(mass)
        GRAVITY * mass
      end
    

At this point the JVM will aggressively optimize that method to inline the
constant. If you later do:

    
    
      GRAVITY /= 6 # we're on the moon now!
    

This change will affect the values of any computations that depend on the
return value of g_force but should not affect which code paths get executed
down the line. I'm probably missing something about how this mechanism works,
but how would deoptimizing be triggered here?

~~~
headius
I don't know the granularity on which the deoptimization guards get inserted,
but the basic idea would be that if the constant changes, any code running
that depended on the value of that constant will have to deoptimize.

In your example, we would never enter the optimized code path without knowing
that the constant's still the same as last time. Since no loop is emitted,
there's no deoptimization mid-loop required. So we either run the optimized
code, or we never get to the optimized code.

If a deopt is required mid-loop, Hotspot inserts a "trap" that would see the
deopt command and immediately branch to the interpreter with the current
state.

Of course there are limits to how far optimization can go, but having a way to
tell Hotspot that there's a known-immutable constant object reference at this
point in the code opens up a lot of opportunities.
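
To make this concrete, here is a minimal, self-contained sketch of the
SwitchPoint mechanism that JRuby's constant cache is built on (plain
java.lang.invoke, not JRuby's actual code; the class and names are made up
for illustration):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;
    import java.lang.invoke.SwitchPoint;

    public class ConstantGuard {
        // One SwitchPoint guards all cached constant values.
        static final SwitchPoint VALID = new SwitchPoint();
        static double GRAVITY = 9.81;

        // Slow path: look the constant up the hard way on every call.
        static double lookupGravity() { return GRAVITY; }

        public static void main(String[] args) throws Throwable {
            MethodHandle slow = MethodHandles.lookup().findStatic(
                    ConstantGuard.class, "lookupGravity",
                    MethodType.methodType(double.class));

            // Fast path: the current value baked in as a constant,
            // which the JIT can fold into the caller as a literal.
            MethodHandle fast =
                    MethodHandles.constant(double.class, GRAVITY);

            // Run the fast path until the SwitchPoint is invalidated,
            // then fall through to the slow path.
            MethodHandle guarded = VALID.guardWithTest(fast, slow);

            System.out.println((double) guarded.invokeExact()); // 9.81

            // Redefining the constant invalidates every guarded site.
            GRAVITY /= 6; // we're on the moon now!
            SwitchPoint.invalidateAll(new SwitchPoint[]{VALID});

            System.out.println((double) guarded.invokeExact()); // 1.635
        }
    }

Once invalidated, a SwitchPoint stays invalid, so a real implementation would
install a fresh one and re-cache; this sketch just falls back to the slow path
from then on.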

------
equark
Can somebody explain to me why the JRuby crowd thinks invokedynamic will lead
to massive performance gains, even though IronPython/IronRuby have had similar
(to my understanding) capabilities for a long time without seeing massive
gains?

~~~
headius
I assume you're referring to the Dynamic Language Runtime (DLR). DLR does
nothing even close to invokedynamic for optimization.

The best you can do with DLR is regenerate little dispatch stubs at each
dynamic call site, to at least avoid the overhead of re-searching the class
hierarchy. These stubs are then reinserted at the dispatch point and used to
make the call.

Unfortunately, since all current CLR implementations do _not_ optimize code at
runtime, these newly-created stubs can't be optimized with the rest of the
code. So you get a call from the code to the stub, a few guards, and another
dispatch from the stub to the target. Those three pieces never optimize
together.
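
For illustration, the shape of such a stub is roughly the classic inline
cache, sketched here in Java rather than C# (all names are made up):

    // A per-call-site dispatch stub: one guard plus one extra hop.
    interface Target { Object invoke(Object self, Object arg); }

    final class DispatchStub {
        private final Class<?> cachedType; // guard: receiver type seen last
        private final Target cachedTarget; // method resolved for that type

        DispatchStub(Class<?> cachedType, Target cachedTarget) {
            this.cachedType = cachedType;
            this.cachedTarget = cachedTarget;
        }

        Object call(Object self, Object arg) {
            if (self.getClass() == cachedType) {       // the guard
                return cachedTarget.invoke(self, arg); // second dispatch
            }
            // Miss: re-search the hierarchy, generate a new stub, and
            // reinsert it at the call site (stand-in only).
            throw new UnsupportedOperationException("sketch only");
        }
    }

With invokedynamic, the equivalent guard and target are method handles the JVM
can inline straight into the calling code, so all the pieces optimize as one
unit.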

invokedynamic is also very useful for optimizing things other than method
dispatch. In JRuby master, we're using invokedynamic for constant lookup (as
illustrated in this post) and for lazily creating literal values. In both
cases, it reduces the access to a single memory hit, much faster than what
JRuby 1.6 or any DLR languages can do.
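
As a rough sketch of how one of those constant-lookup call sites gets wired up
(not JRuby's actual code; the names and the ()Object site type are assumptions
for illustration):

    import java.lang.invoke.CallSite;
    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;
    import java.lang.invoke.MutableCallSite;
    import java.lang.invoke.SwitchPoint;

    public class ConstantSite {
        static final SwitchPoint CONSTANTS_VALID = new SwitchPoint();

        // Bootstrap: called once, the first time the invokedynamic
        // instruction for a given constant lookup executes.
        public static CallSite bootstrap(MethodHandles.Lookup caller,
                String constantName, MethodType type) throws Exception {
            MutableCallSite site = new MutableCallSite(type);
            MethodHandle slow = MethodHandles.lookup()
                    .findStatic(ConstantSite.class, "lookupAndCache",
                            MethodType.methodType(Object.class,
                                    MutableCallSite.class, String.class))
                    .bindTo(site).bindTo(constantName);
            site.setTarget(slow);
            return site;
        }

        static Object lookupAndCache(MutableCallSite site, String name) {
            Object value = slowConstantLookup(name);
            // Re-bind the site to the resolved value, guarded by the
            // SwitchPoint: a single memory hit until a constant changes,
            // at which point the site falls back here and re-caches.
            MethodHandle fast = MethodHandles.constant(Object.class, value);
            site.setTarget(
                    CONSTANTS_VALID.guardWithTest(fast, site.getTarget()));
            return value;
        }

        static Object slowConstantLookup(String name) {
            return 9.81; // stand-in for a real constant-table search
        }
    }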

tl;dr: DLR is the best you can do for dynamic dispatch optimization at an API
level atop current CLR impls and subject to limitations therein. invokedynamic
brings true end-to-end dynamic dispatch support to the JVM.

~~~
equark
Thanks, this is exactly what I was looking for. DLR != invokedynamic, contrary
to my impression of how the DLR operated. I will be interested to see how
JRuby performance improves as a result.

~~~
headius
Don't get me wrong, I think the DLR is a great piece of work. It's just
limited in how much it can optimize dynamic dispatch since CLR itself can't
dynamically optimize.

As far as tooling for building languages, DLR is pretty epic. It's too bad
Microsoft decided to bail on dynamic language work.

------
brimpa
The first commenter on the blog got it right:

> Seems unhelpful to compare benchmarks where one shows the work completely
> optimized away.

The author doesn't offer any benchmarks of (something resembling) real code.

~~~
headius
It's mostly lucky (or unlucky) that the constant benchmark now optimizes away
to nothing. As I mentioned in the comments (replying to that commenter), my
point was to show that the overhead of constant lookup is now nearly zero, so
if the values aren't used they won't be accessed. The actual work done for
constants that don't disappear would be roughly equivalent to a simple memory
access...very fast indeed.

I will probably do future posts with less synthetic benchmarks, but it's
ultimately just hard to benchmark a specific language feature in the face of
the optimizations the JVM performs.

