
Two paths, one peak: a view on high-performance language implementations - mr_tyzic
https://wingolog.org/archives/2015/11/03/two-paths-one-peak-a-view-from-below-on-high-performance-language-implementations
======
vvanders
AoT, JIT, bytecode, none of this matters if you can't control your data layout
and access patterns. All this talk of performance and still not a mention of
cache misses anywhere.

There seems to be a desire for a magic inliner and compiler that fixes all
your performance problems, but it just doesn't work like that.

Until you understand your data and have complete control over it none of this
other stuff matters.
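To make the data-layout point concrete, here is a minimal sketch (names and numbers invented for illustration) of the same field-summing loop against an array-of-structs layout versus a struct-of-arrays layout. The AoS loop drags every unused field into cache alongside the one it needs; the SoA loop streams through one contiguous array.

```c
#include <stddef.h>

#define N 1024

/* AoS: each particle's fields are adjacent in memory, so a loop
   that only reads x also pulls y, vx, vy and mass into cache. */
struct particle { double x, y, vx, vy, mass; };

double sum_x_aos(const struct particle *p, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += p[i].x;            /* strided access: 1 useful double per 40 bytes */
    return s;
}

/* SoA: one array per field; the same loop now reads memory with
   unit stride and every byte of each cache line is useful. */
struct particles { double x[N], y[N], vx[N], vy[N], mass[N]; };

double sum_x_soa(const struct particles *p, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += p->x[i];           /* contiguous, prefetcher-friendly */
    return s;
}
```

Both functions compute the same result; only the memory traffic differs, which is exactly the kind of control no inliner can give you after the fact.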

~~~
_delirium
You have it backwards; cache effects only start to matter once you have the
other things right. For example, being able to transform boxed to unboxed
arithmetic, or to optimize out function calls from tight loops, will, at least
in most cases, give you a bigger performance boost than rearranging data for
cache-friendliness will.
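A rough sketch of the boxed-versus-unboxed distinction, with an invented `Box` type standing in for a naive dynamic-language runtime's heap-allocated numbers: the boxed sum pays a pointer chase per element, while the unboxed sum reads the values inline.

```c
#include <stddef.h>

typedef struct { double value; } Box;   /* a heap-boxed number, for illustration */

double sum_boxed(Box **xs, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += xs[i]->value;              /* one indirection per element */
    return s;
}

double sum_unboxed(const double *xs, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += xs[i];                     /* data is inline; no pointer chase */
    return s;
}
```

Note the two concerns meet here: unboxing is a compiler transformation, but its payoff is largely a data-layout payoff, since the boxed version scatters the values across the heap.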

~~~
vvanders
I dunno; depending on your architecture you're looking at 200-400 cycles per
cache miss, and that's a lot of function calls.

~~~
Someone
Many function calls are themselves cache misses.

That doesn't matter much if each function call accesses thousands of memory
locations, but if you have many small functions, such as calls to (un)box
objects, property accessors, or small closures you pass to a function,
getting rid of those calls by inlining makes a huge difference.
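A sketch of what inlining removes in the small-accessor case (the types and names are made up): the first loop calls a tiny getter through a function pointer, so each access is an opaque call the compiler cannot inline; the second is what the loop becomes once the call is gone.

```c
#include <stddef.h>

struct point { double x, y; };

/* A tiny accessor of the kind described above. */
double get_x(const struct point *p) { return p->x; }

/* Through a function pointer the call is opaque: call/return
   overhead plus a possible instruction-cache miss per element. */
double sum_via_calls(const struct point *pts, size_t n,
                     double (*get)(const struct point *)) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += get(&pts[i]);
    return s;
}

/* After inlining, the "call" collapses into a plain field load. */
double sum_inlined(const struct point *pts, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += pts[i].x;
    return s;
}
```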

------
Veedrac
If "template JIT" means what I think it does, this is basically what you get
from Cython[1] or, later, Nuitka[2], only they're "template AOT"s.

Sadly, it buys you precious little. CPython's bytecodes are "big", so very
little time is spent between them relative to that spent inside them. The
motivation to do anything smarter here is lacking.

[1] [http://cython.org/](http://cython.org/)

[2] [http://nuitka.net/](http://nuitka.net/)
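To see why "big" bytecodes blunt a template JIT, consider a toy switch-dispatch interpreter (opcodes invented for illustration). A template JIT essentially pastes each `case` body into straight-line code, eliminating only the dispatch `switch`; when the real runtime's opcode bodies each do a lot of work, as CPython's do, that dispatch overhead is a small slice of the total.

```c
#include <stddef.h>

enum op { OP_PUSH, OP_ADD, OP_HALT };

/* Toy stack machine: the switch is the only cost a template JIT
   would remove. In CPython the case bodies dwarf the dispatch. */
double run(const int *code, const double *consts) {
    double stack[16];
    int sp = 0;
    for (size_t pc = 0;; pc++) {
        switch (code[pc]) {             /* dispatch overhead lives here */
        case OP_PUSH:
            stack[sp++] = consts[code[++pc]];
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:
            return stack[sp - 1];
        }
    }
}
```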

------
nickpsecurity
The Leroy paper at the bottom of the comments was worth it.

