
Zero-Overhead Metaprogramming - adamnemecek
http://stefan-marr.de/papers/pldi-marr-et-al-zero-overhead-metaprogramming/
======
pavpanchekha
I saw this talk at FCRC/PLDI. It's both very impressive and very well-
presented. What follows is what I gathered from the talk; I haven't read the
paper but it's on my reading list. The basic idea is that the old notion of a
polymorphic inline cache can be nicely generalized to erase away
metaprogramming like method_missing or descriptors.

PIC is a way to speed up method dispatch. Normally, if you call a method foo
in your program, but you don't know the class of the object you're calling foo
on, you need to go look up the appropriate code to run in a virtual table or
similar data structure. This is slow, because you might do this lookup very
often. PIC means caching this lookup in the code that calls foo—basically
adding an instruction like

    if (obj.class == CACHED_CLASS) CACHED_CODE()

The reason this helps is that we can do the test quickly, and the branch
predictor will predict it very well; then our JIT can inline CACHED_CODE,
perhaps optimize it against the other code nearby, and now we have nearly-
optimal machine code. After "warm-up", in other words, we've dynamically
inferred the type of obj and have optimized our machine code to take that into
account.
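
In Python-ish terms, a call site with such a cache looks roughly like this (a sketch with made-up names, not code from the paper; a real JIT bakes the cached entries into machine code rather than a list):

```python
# Minimal sketch of a polymorphic inline cache (PIC) at one call site.
# Illustrative only: real VMs store these entries in generated code.

class CallSite:
    """Per-call-site cache mapping a receiver's class to the resolved method."""
    def __init__(self, selector):
        self.selector = selector
        self.entries = []  # list of (class, resolved method) pairs

    def call(self, obj, *args):
        klass = type(obj)
        # Fast path: check the cached classes first (the JIT inlines these tests).
        for cached_class, cached_method in self.entries:
            if klass is cached_class:
                return cached_method(obj, *args)
        # Slow path: full lookup, then extend the cache for next time.
        method = getattr(klass, self.selector)
        self.entries.append((klass, method))
        return method(obj, *args)

class Dog:
    def speak(self): return "woof"

class Cat:
    def speak(self): return "meow"

site = CallSite("speak")
print(site.call(Dog()))  # slow path, then cached
print(site.call(Dog()))  # fast path: class matches the cache entry
print(site.call(Cat()))  # cache grows polymorphically
```

The guard (`klass is cached_class`) is the cheap, well-predicted test mentioned above; once it holds, the JIT can inline `cached_method` at this site.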

Now on to meta-programming. Suppose that instead of calling a method on an
object, we are instantiating a class object which is dynamically determined. A
PIC isn't directly applicable, because the type of the class object isn't in
question (the class object's class is probably Class or something like it).
Instead, this paper proposes caching the _identity_ of the class object, then
caching the initialization code and optimizing around it. You could do the
same for things that override method_missing and similar: cache all the
decisions those overrides make and then allow the JIT to inline the code.
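
For the method_missing flavor, a sketch using Python's __getattr__ as the analogue might look like this (all names are mine and purely illustrative, not code from the paper):

```python
# Sketch: cache the decision a method_missing-style handler makes, keyed on
# the identities involved, so the meta-level lookup runs only once per shape.
# Illustrative names only; not code from the paper.

class Proxy:
    """Forwards unknown attribute lookups to a wrapped target object,
    Python's rough analogue of Ruby's method_missing."""
    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # The slow, meta-level path: runs on every normal attribute miss.
        return getattr(self._target, name)

class MetaCallSite:
    """Caches what the meta-level handler resolved to, keyed on the
    receiver's class and the class of its delegation target."""
    def __init__(self, selector):
        self.selector = selector
        self.cache = {}

    def call(self, obj, *args):
        key = (type(obj), type(obj._target))
        fn = self.cache.get(key)
        if fn is None:
            # Slow path: do the meta-level lookup once, cache the outcome.
            fn = getattr(type(obj._target), self.selector)
            self.cache[key] = fn
        # Fast path: call the cached target method directly, bypassing
        # __getattr__ entirely (this is what the JIT can then inline).
        return fn(obj._target, *args)

class Greeter:
    def hello(self):
        return "hi"

site = MetaCallSite("hello")
p = Proxy(Greeter())
print(p.hello())      # the generic meta-level path
print(site.call(p))   # the cached, direct path
```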

Their evaluation suggests that this method works _really_ well, eliminating
almost all of the overhead from meta-programming. This isn't too
surprising: we know that even dynamic features are usually used in very
"static" ways, that JITs are really good at finding these ways, and that after
warming up JITs can usually optimize away all that dynamism to get very fast
code.

I am kicking myself for never having thought of this, despite having read
several of the PIC papers. I am not a big believer in meta-programming, but
this work is simple, extremely useful in practice, and also deepens our
understanding of a technique both theoretically and practically.

~~~
ezy
I understand the PIC technique, but I'm having a hard time understanding how
this is different from what a trace might do. That is, inline the impl of
method_missing and then optimize the most common path with guards included. Or
is it that I'm being too optimistic about the tracer -- and with a generalized
PIC one isn't relying upon the tracer to get it right, but forcing a cache on
meta-object protocol boundaries?

~~~
smarr
You are right; with a tracer, you get most of that for free. For a meta-object
protocol, however, you'll still need a little bit of help to avoid creating
too many guards. And the PICs, or 'dispatch chains', are very useful in the
interpreter, where they avoid enormous warmup penalties.
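
Roughly, a dispatch chain in a self-optimizing interpreter looks like this (a Python sketch with made-up names, not actual VM code; the point is that it works while still interpreting, with no JIT involved):

```python
# Sketch of a 'dispatch chain': a linked list of cache nodes hanging off one
# AST call node, usable during plain interpretation. Illustrative names only.

class UninitializedNode:
    """Tail of the chain: does the full lookup and prepends a cached node."""
    def __init__(self, call_node, selector):
        self.call_node, self.selector = call_node, selector

    def execute(self, obj, *args):
        method = getattr(type(obj), self.selector)
        # Specialize: insert a new cached node at the head of the chain.
        self.call_node.head = CachedNode(type(obj), method, self.call_node.head)
        return method(obj, *args)

class CachedNode:
    """One guard plus a cached target; falls through to the next node on a miss."""
    def __init__(self, klass, method, next_node):
        self.klass, self.method, self.next_node = klass, method, next_node

    def execute(self, obj, *args):
        if type(obj) is self.klass:          # the guard
            return self.method(obj, *args)   # cached target, no full lookup
        return self.next_node.execute(obj, *args)

class CallNode:
    """An AST call site holding the head of its dispatch chain."""
    def __init__(self, selector):
        self.head = UninitializedNode(self, selector)

    def execute(self, obj, *args):
        return self.head.execute(obj, *args)
```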

------
hoverbear
I'm very curious what your workflow was for putting this paper together! I put
together [https://github.com/Hoverbear/acm-pandoc-
paper](https://github.com/Hoverbear/acm-pandoc-paper) for my own purposes but
there is some work to be done.

------
anonymousDan
Meta question - anyone know if this was autogenerated from LaTeX, and if so,
how?

~~~
smarr
Not 100%. I used a combination of tex4ht, a custom tex4ht configuration,
several hacky post-processing scripts, and manual editing. Converting a LaTeX
document to a nice HTML site is easily the worst part of the whole writing
process :-/

~~~
anonymousDan
Looks great. Out of interest, what were the main things that tex4ht doesn't
give you out of the box?

~~~
smarr
Clean HTML (no superfluous tags); proper links to references (the default
rendering splits the link between author and year); ligatures that aren't
broken in browsers; footnotes that make sense in HTML; and the standard HTML
it emits for headers, figures, and other elements wasn't to my liking.

~~~
jacobolus
One recommendation: use ragged-right rather than justified text in the HTML
version. Because browsers use a line-by-line text composer instead of full-
paragraph composition, and don’t do any hyphenation, fully justified text
looks terrible on web pages, especially in relatively narrow columns.

In the case of this specific document, there’s all kinds of nasty gappy
whitespace in the webpage version.

It’s made worse by the spaces in “et al. 19XX” getting turned into non-
breaking spaces. You really want to fix that if you can. For example, look at
the 3rd and 9th line in this screenshot.
[http://i.imgur.com/H6ROnyQ.png](http://i.imgur.com/H6ROnyQ.png)

But even disregarding that problem, many lines end up looking pretty bad.

[Note: don’t read this as personal criticism. The fault lies about 90% with
the browser vendors who care more about saving a few microseconds of compute
time than making legible documents.]

~~~
wakaflockafliz
I didn't notice this ever before, then I read your comment. Now I see it all
over the place...oh man.

------
atilaneves
What about compile-time metaprogramming such as that found in D and C++? That's
zero-overhead (discounting possible template bloat) and requires no special
techniques at all. Well, besides knowing how to write compile-time code, that
is.

------
pron
It is my understanding that Graal -- one of the two JITs used in the paper --
will be available for use with "stock OpenJDK HotSpot" as a plugin (as opposed
to a special HotSpot build), starting in Java 9. At least, that's the plan[1].
Is that correct?

[1]: [http://openjdk.java.net/jeps/243](http://openjdk.java.net/jeps/243)

------
cec
Off topic, but does anyone know how the author generated this web version of
the paper? It looks as though they've automatically generated it from the
LaTeX source of the PDF. It looks brilliant, and much easier to read in the
browser.

------
malkia
Haven't read it fully, but the first thing that popped into my head: wouldn't
Apple benefit from this? There was a very detailed article on how Objective-C's
method lookup changed over the years, but I can't find it right now.

~~~
chrisseaton
I'm not an expert in Objective C but I believe that all dispatch is done by a
library call that accepts the receiver, the message name, and the arguments.
This means that there isn't any place to store per-call-site data, such as the
kind of caches used in this paper.
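
To illustrate the contrast (a hedged Python sketch of the shape of the problem, not actual Objective-C internals): when every send funnels through one shared function, the cache can only be keyed globally, roughly analogous to objc_msgSend's per-class method cache, rather than specialized per call location.

```python
# Sketch: a single global dispatch function can only cache per receiver
# class and selector, shared by every call site, whereas a PIC records
# what *this particular* call location has seen. Illustrative only.

GLOBAL_CACHE = {}  # (class, selector) -> method, shared by all call sites

def msg_send(receiver, selector, *args):
    """One shared entry point for all dispatch, Objective-C style."""
    key = (type(receiver), selector)
    method = GLOBAL_CACHE.get(key)
    if method is None:
        method = getattr(type(receiver), selector)
        GLOBAL_CACHE[key] = method
    return method(receiver, *args)
```

Because every call goes through `msg_send`, there is no per-site slot to record "this site has only ever seen class X", which is exactly the fact a PIC (and this paper's dispatch chains) exploit.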

------
pron
Why does the partial evaluation approach result in larger performance variance
than meta-tracing in the reported results?

~~~
smarr
Yes, that's indeed a non-obvious issue, and it looks rather strange on the
graph.

It's not the 'partial evaluation' per se. Instead, it is the difference
between RPython and HotSpot that surfaces here. RPython currently generates
single-threaded VMs: neither the GC nor compilation runs in a separate
thread. The HotSpot JVM, however, can do those things in parallel and
additionally has other infrastructure threads running. In the end, this
increases the likelihood that the OS reschedules the application thread,
which becomes visible as jitter.

~~~
pron
Oh, makes sense :) Is that mentioned in the paper and I missed it? If not, it
might deserve a footnote, as the difference is glaring.

BTW, would Graal really become available as a stock-HotSpot plugin in Java 9
thanks to JEP 243? I see things are ready on Graal's end[1], but are they on
HotSpot's?

[1]:
[https://bugs.openjdk.java.net/browse/GRAAL-49](https://bugs.openjdk.java.net/browse/GRAAL-49)

