
“TurboFan” – Experimental new optimizing compiler for Google's V8 JS engine - jonrimmer
https://groups.google.com/forum/#!msg/v8-dev/ab8V5Z58_70/5-05DvysCt8J
======
cpeterso
> We've reached the point where it's better for our development velocity to
> work in the open on the bleeding edge branch of V8. That's also better for
> our collaborators who are working on ports to more platforms.

I wonder who the collaborators with access to Google's private V8 repo are, and
what platforms they're porting to. If merging TurboFan into the open repo doesn't
reveal their partners' proprietary plans now, why not develop in the open
sooner?

~~~
azakai
I interpreted that differently - that Google's private, secret v8 repo is
hidden from their partners as well. Therefore they decided to unveil the work
publicly so that their partners can access it, start porting turbofan to other
platforms, and so forth.

As to who the partners are, you can see commits from Intel adding x87 support
and Imagination doing MIPS, for example.

As to why not develop in the open sooner, good question - this is the second
time v8 has done this, after Crankshaft (the third time if you count the initial
unveiling of Chrome and v8). Maybe it's just how they work.

~~~
admax88q
x87 isn't a platform, it's just the floating-point part of x86

~~~
mraleph
In the context of V8, "x87 platform" means "x87-only for floating point", as
opposed to the normal ia32 platform, where V8 assumes SSE2 is present. It's
implemented as a separate platform port, not as a bunch of ifs inside the
ia32 port.

------
bhouston
It would be nice to understand the goals of this project, both in terms of
approach and of the expected overall performance improvements, and how it would
compare with the new LLVM-based approach taken by Safari/WebKit.

~~~
azakai
Yes, very curious about this too.

At first glance, I'm not sure what the v8 strategy is here. The new compiler
seems to use the "sea of nodes" approach as opposed to SSA form. A comparison
of the two is here

[http://static.squarespace.com/static/50030e0ac4aaab8fd03f41b...](http://static.squarespace.com/static/50030e0ac4aaab8fd03f41b7/50030ec0e4b0c0ebbd07b0e0/50030ec0e4b0c0ebbd07b268/1281379125883/)

The "sea of nodes" approach can give some speedups, but they don't appear huge
- 10-20% in that link. Not sure how representative that data is. But it is
interesting that modern compilers, like gcc and LLVM, typically use SSA form
and not the approach v8 is taking, as further evidence that the "sea of nodes"
is not clearly superior.

Perhaps the v8 designers believe there is some special advantage for JS that
the new model provides? Otherwise this seems surprising, but it's hard to guess
about such a thing. If anything, JS has possible surprises everywhere, which
makes control flow complex (this can throw, that can cause a deopt or bailout,
etc.), and that doesn't seem like the best setting for the new approach.

Furthermore, the "sea of nodes" approach tends to take longer to compile, even
as it emits somewhat better code. Compilation time is already a big concern in
JS engines, perhaps more so than in any other kind of compiler.

Perhaps v8 intends to keep crankshaft and have turbofan as a third tier
(baseline-crankshaft-turbofan)? That would let it run the slower turbofan only
when justified. But that seems like a path that is hard to maintain - two
register allocators, etc. - and turbofan looks, in part, like a cleanup of the
crankshaft codebase (no large code duplications anymore, etc.), not a parallel
addition.

Overall the Safari and Firefox strategies make sense to me: Safari pushes the
limits by using LLVM as the final compiler backend, and Firefox, aside from
general improvements, has also focused effort on particular aspects of code or
code styles, like float32 and asm.js. Both of those strategies have proven
very successful. I don't see, at first glance, what Chrome is planning here.
However, the codebase has some intriguing TODOs, so maybe the cool stuff is
yet to appear.

~~~
rayiner
The "sea of nodes" approach is just a data structure for representing a
program in SSA form. It's orthogonal to anything that has an impact on speed.
E.g. GCC uses a tree representation, LLVM uses a CFG, and HotSpot (C2 and
Graal) uses a "sea of nodes" representation, but they all represent code in
SSA form and that representation is orthogonal to the quality of particular
optimizations implemented within the framework.

The speedup reported in that paper is from running constant propagation and
dead code elimination at the same time instead of doing them separately, which
finds more constants and dead code because the two problems are coupled. The
same process can be implemented in a more traditional CFG representation (and
generally is--sparse conditional constant propagation).
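
To make the coupling concrete, here's a tiny made-up example (TypeScript just
as notation; `someCondition` and `use` are stand-ins):

    // Stand-ins so the snippet type-checks; only the control and data flow matter.
    const someCondition = (): boolean => Math.random() < 0.5;
    const use = (n: number): void => { console.log(n); };

    let x: number = 0;
    while (someCondition()) {
      if (x !== 0) {   // never true at runtime, but constant propagation run on
        x = 1;         // its own must pessimistically assume this assignment can
      }                // flow around the loop, so x stops looking like a constant
    }
    use(x);

Run separately, constant propagation can't prove the branch dead (it has to
assume x might be 1 from an earlier iteration), and dead-code elimination can't
remove `x = 1` (it doesn't know the condition is always false). The combined,
optimistic analysis assumes x is 0, evaluates the condition to false, never
marks the assignment reachable, and thereby proves both facts at once.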

~~~
rayiner
Too late to edit this, but I should clarify: "data structure" is probably not
the right word. To be more precise, "SSA form" is a property of variables in a
program IR. It means that variables are assigned only once, that defs dominate
uses, and that value flow is merged at control-flow merge points with phi nodes.
You can have different program representations that all represent values in
SSA form, but differ in how they represent other things. Where the "sea of
nodes" representation differs is that it explicitly represents control
dependencies. In LLVM, you always have a control flow graph, with basic blocks
and edges between them. Control dependencies between instructions are implicit
from their placement in particular basic blocks. In a "sea of nodes" IR, there
are no basic blocks.[1] Control dependencies are represented explicitly with
control inputs to nodes, just as data dependencies are represented explicitly
with data inputs.

This makes certain things easier in a "sea of nodes" IR. Normally, during
optimization you don't have to worry about maintaining a legal schedule of
instructions within and between the basic blocks. You just have to respect the
control dependencies. However, in order to get executable code you have to
impose a schedule on the nodes, whereas with a more conventional CFG IR, you
already have a schedule in the form of the basic blocks and the ordering
within them.

[1] See Section 2.2-2.4 of Click's paper:
[http://paperhub.s3.amazonaws.com/24842c95fb1bc5d7c5da2ec735e...](http://paperhub.s3.amazonaws.com/24842c95fb1bc5d7c5da2ec735e106f0.pdf).
His IR replaces basic blocks with Region nodes. The only instructions that
must be linked to Region nodes are ones that inherently have a control
dependency. E.g. an "If" node takes a data input and produces two control
outputs, which can be consumed by region nodes. "Phi" nodes must also have
control inputs, so they can properly associate different control flow paths
with the data values that are merged along those paths.
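
To make that concrete, here's a rough sketch of what such an IR might look like
(TypeScript just as notation; the node names are illustrative rather than
TurboFan's or Click's exact classes, and I'm modeling If's two control outputs
as separate projection nodes):

    // Every node lists its control inputs and data inputs explicitly;
    // there are no basic blocks.
    type NodeKind =
      | 'Start' | 'Region' | 'If' | 'IfTrue' | 'IfFalse'   // control
      | 'Phi' | 'Parameter' | 'Constant' | 'Return';       // value / mixed

    interface IRNode {
      kind: NodeKind;
      control: IRNode[];  // explicit control dependencies
      inputs: IRNode[];   // explicit data dependencies
      value?: number;     // payload for Constant nodes
    }

    const node = (kind: NodeKind, control: IRNode[] = [],
                  inputs: IRNode[] = [], value?: number): IRNode =>
      ({ kind, control, inputs, value });

    // if (p) { x = 1 } else { x = 2 }; return x;
    const start   = node('Start');
    const p       = node('Parameter', [start]);
    const branch  = node('If', [start], [p]);          // control in + data in
    const onTrue  = node('IfTrue',  [branch]);         // the two control outputs
    const onFalse = node('IfFalse', [branch]);
    const one     = node('Constant', [], [], 1);       // pure nodes have no
    const two     = node('Constant', [], [], 2);       // control input at all
    const merge   = node('Region', [onTrue, onFalse]); // replaces a basic block
    const x       = node('Phi', [merge], [one, two]);  // control input tells the
                                                       // phi which path is which
    const ret     = node('Return', [merge], [x]);

    // Optimizations only have to respect these edges; a later scheduling pass
    // decides where the floating pure nodes (`one`, `two`) actually go.

The scheduling point is exactly the trade-off above: nothing pins `one` and
`two` to a position until code generation, which is convenient for optimization
but means a global scheduling pass has to run before you can emit code.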

------
dictum
Here's something that's been bothering me since Chrome dropped the
experimental support for `position: sticky` (and CSS Regions before) and I
didn't find the right place to ask (nor is a submission about the JS engine an
appropriate venue for this question, I know), so I'm going to hijack this
thread:

We know some properties are _expensive_, and when you use them a few times (or
with certain values) you get sub-60fps scrolling — but why are they
_expensive_? Are they inherently hard to optimize (e.g. because of different
GPUs across mobile devices), or has nobody gotten around to optimizing them yet?

~~~
pcwalton
Because, in general, they can trigger layout/reflow, so they can't run on the
compositor. For example, changing things like "top" on an absolutely
positioned box can trigger reflow of the contents inside, because of the way
CSS works (for example, if "bottom" is set, then the height changes, which can
affect the size of things with percentage heights, which can cause floats to
be repositioned, etc. etc.)

In traditional browser engines it's even worse because layout runs on the main
thread, which is also shared with your JavaScript, so painting ends up
transitively blocked waiting for your scripts to finish. That is not the case
in Servo (disclaimer: I work on Servo), but making layout run off the main
thread is hard for many reasons—some inherent to the problem, some historical
in the design of current engines—so all current browsers run JS and layout
together.
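
A trivial way to see that main-thread coupling (the button element and the
300 ms figure are just illustrative):

    const button = document.querySelector('button')!;

    button.addEventListener('click', () => {
      button.textContent = 'clicked';   // invalidates style/layout...
      const deadline = performance.now() + 300;
      while (performance.now() < deadline) {
        // ...but the reflow and repaint can't run until this synchronous
        // script finishes, so the visual update (and, in engines without
        // compositor scrolling, scrolling too) stalls for ~300 ms.
      }
    });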

~~~
bgirard
Exactly.

To elaborate on how this works in Gecko: the rendering pipeline has several
optional stages:

requestAnimationFrame (Scripts) -> Style flush -> Reflow flush -> display list
construction -> Layer construction (recycling) -> invalidation ->
Paint/Rasterization -> Compositing (on its own thread).

Gecko tries to run each stage of the pipeline only when it is needed. Fast
operations like a CSS transition on opacity or transform will only activate the
Compositing phase. WebGL-only canvas drawing will only activate
rAF + Compositing. Meanwhile, a JS animation on "top" will run all of these
phases.
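
For example (the element selector is assumed, the stage names refer to the
pipeline above, and the `top` animation assumes an absolutely positioned
element):

    const el = document.querySelector<HTMLElement>('.animated')!;

    // Compositor-only: once the layer exists, a transition on transform or
    // opacity can be driven entirely by the Compositing stage.
    el.style.transition = 'transform 0.5s';
    el.style.transform = 'translateX(200px)';

    // Full pipeline: a JS animation on `top` re-enters the pipeline every
    // frame: rAF script -> Style flush -> Reflow flush -> display list ->
    // Layers -> Paint -> Compositing.
    let y = 0;
    function step() {
      y += 2;
      el.style.top = `${y}px`;
      if (y < 200) requestAnimationFrame(step);
    }
    requestAnimationFrame(step);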

