
V8: A Tale of TurboFan - JoshTriplett
http://benediktmeurer.de/2017/03/01/v8-behind-the-scenes-february-edition/
======
mike_hearn
It's interesting to watch the swings and roundabouts in the VM engineering
space. I remember many years ago being in a Google engineering all hands where
Android was first announced (to the firm) and the technical architecture was
explained. I and quite a few others were very surprised to hear that they
planned to take slow, limited mobile devices and run a Java bytecode
interpreter on them. The stated rationale was also quite surprising: it was
done to save memory. I remember being very dubious about the idea of running
a GC'd, interpreted language on a mobile phone (I had J2ME experience!).

In the years since I've seen Android go from interpreter, to interpreter+JIT,
to AOT, to JIT+AOT, to interpreted + JIT then AOT at night. V8 has gone from
only JIT (compile on first use) to multiple JITs to now, interpreter+single
JIT. MS CLR still doesn't have any interpreter and is fully AOT or JIT
depending on mode. HotSpot started interpreted, then gained a parallel fast
JIT, then gained a parallel optimising JIT, then went to a tiered mechanism
where code can be interpreted, compiled and recompiled multiple times before
the system stabilises at peak performance.

Looking back, it's apparent that Android's and indeed Java's initial design
was really quite insightful. A tight bytecode interpreter isn't quite as awful
as it sounds, especially given how far CPU core execution speed has raced
ahead of memory and cache availability. If you can fit an interpreter almost
entirely in icache, and pack a ton of logic into dense bytecode, and if you
can use spare cores that would otherwise be idle (in desktop/mobile scenarios)
to do profile guided optimisation, you can end up utilising machine resources
more effectively than it might otherwise appear.
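
The "tight interpreter in icache" idea can be made concrete with a minimal
sketch (illustrative only, not Dalvik's or Ignition's actual design): the
whole dispatch loop is a handful of hot instructions, and the program itself
is just a dense array of numbers.

```javascript
// Minimal stack-based bytecode interpreter: one small dispatch loop
// that fits comfortably in instruction cache, operating on dense bytecode.
const PUSH = 0, ADD = 1, MUL = 2, RET = 3;

function run(code) {
  const stack = [];
  let pc = 0;
  for (;;) {
    switch (code[pc++]) {
      case PUSH: stack.push(code[pc++]); break;
      case ADD:  stack.push(stack.pop() + stack.pop()); break;
      case MUL:  stack.push(stack.pop() * stack.pop()); break;
      case RET:  return stack.pop();
    }
  }
}

// (2 + 3) * 4
run([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, RET]); // → 20
```

The program is ~10 small integers here versus dozens of machine instructions
if compiled, which is the memory trade-off described above.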

~~~
mwcampbell
Meanwhile, iOS has used strictly AOT compilation from the beginning for native
applications. And doesn't iOS have a reputation for buttery smooth UIs?
Perhaps the original AOT compilation approach of ART wouldn't be so bad if it
didn't have to be done on the user's device, taking up the user's time while
installing apps or upgrading the OS.

~~~
Twirrim
iOS's advantage in this is that the hardware range is small. It's feasible to
do full AOT away from the end user's device.

With Android there are hundreds or thousands of combinations of various bits
of hardware and drivers, with various levels of compatibility. Device
manufacturers have a terrific advantage in being able to produce hardware that
is more specialised for the purpose, but there are trade-offs. Compiling all
possible variations AOT away from the end device is going to be difficult.

~~~
mike_hearn
The Play Store could AOT compile things on the server side. I don't know why
it doesn't, it seems like such an obvious win.

~~~
esprehn
The binaries produced from the AOT process are too large to download over the
network for typical users and data plans.

~~~
pjmlp
Yet Apple and Microsoft managed to do it.

~~~
mike_hearn
In fairness, they managed to do it only for the highest of high end users.
Apple institutionally doesn't care about anyone in places like China, India or
Africa where data is expensive. That's why Android completely dominates the
global smartphone market share.

I guess nothing would stop the Play Store from AOT compiling binaries just for
users with fast connections though.

~~~
pjmlp
Windows Phones were widely available in India before everything kind of fell
apart, though.

------
mwcampbell
I think there's a lesson for application developers buried in here: If you
have the freedom to choose the language and language features of your
application, choose your language and language features so as to minimize the
risk of performance cliffs such as those that the article discusses at the
beginning. This probably means static typing, and static dispatch (i.e. not
virtual functions) by default. And of course, a language like that can be
compiled to JavaScript, e.g. Kotlin, C# via Bridge.net, or F# via Fable.
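
To make the "performance cliff" concrete, here is an illustrative JavaScript
sketch (not taken from the article): JITs like V8's specialize call sites on
the object shapes they observe, so dispatching over many shapes can fall off
the fast path that shape-stable code enjoys.

```javascript
// Monomorphic: every object reaching getX has the same shape {x},
// so the property load at this call site can stay on the fast path.
function getX(p) { return p.x; }

for (let i = 0; i < 1000; i++) getX({ x: i });

// Megamorphic: the same call site now sees many different shapes,
// defeating the inline cache -- one flavor of "performance cliff":
const shapes = [{ x: 1 }, { x: 1, y: 2 }, { x: 1, z: 3 },
                { x: 1, w: 4 }, { x: 1, v: 5 }];
for (let i = 0; i < 1000; i++) getX(shapes[i % shapes.length]);
```

A statically typed source language compiled to JS tends to emit the first
pattern by construction, which is the point above.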

By the way, I know this is a tenuous association, but to me, the name TurboFan
makes me think of CPU-hungry code that cranks up the fan(s) on a user's
laptop.

------
franciscop
This is a huge deal for me. I am creating _server_ [1] for Node.js: " _npm
install server_ ". The slowest part I noticed in some initial benchmarks was
the promise-based code, which is used heavily instead of express' callback-based
middleware.

While performance is not one of the main concerns for the library (those are
simplicity, "batteries-on" and user experience in general), it is nice to see
that my hunch was right; V8 will provide a 500% performance boost to
Promises.

[1] [http://serverjs.io/](http://serverjs.io/)

~~~
davnicwil
I've never seen this before and it looks interesting - from a brief scan of
the source & about section it looks to me like a wrapper over express to
provide sensible defaults to round off some of the sharp corners when setting
up a project from scratch.

This could be extremely useful. I've been using express in production for
years, and though setting up a new server is not fundamentally difficult, it
does involve a lot of common boilerplate (cookie-parser _et al_ ) that I think
a lot of people already solve with a boilerplate template. This could be a
cleaner, more fully-featured approach.

Is this more or less it, or is there more to it / future aims to do more? I
also wonder - did you talk to the express developers about making direct
contributions to provide solutions to these problems from within the express
library itself, rather than externally via a wrapper library?

I'll keep an eye on the development of this project :-)

~~~
franciscop
Thanks! I got started a few months back. I haven't really talked to them
beyond asking a question or two. The thing is, express _used to be like this_
but it was then split up; from what I read it was mainly due to instability in
the subpackages, something which I think/hope will have been solved by now.
Another point is that it's apparently easier to maintain, since there have
been some problems internally.

The main differences (for the initial or a later release) are:

* A lot of functionality out-of-the-box. Just install it and get to work.

* Websockets as first-class citizens: since they are one of the major advantages of Node.js in itself it makes sense they are trivial to use.

* Error handling: intercept some messages from Node.js and provide a more human-readable version.

* Promise-based.

There's also a single parameter for middleware instead of the _[err], req,
res, next_ parameters, since Promises work really well with a single
parameter. You might think that this comes from
[http://koajs.com/](http://koajs.com/), but only the name comes from there; I
used to call it _inst_ (for instance) until I found a better name in Koa's
_ctx_.

Oh, I talk about all of this in here:
[https://serverjs.io/about](https://serverjs.io/about)
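
As a hypothetical sketch (not the library's actual API -- `ctx` and
`runMiddleware` are made-up names), single-parameter, promise-based middleware
composes with a plain await loop:

```javascript
// Each middleware receives one context object and may return a promise.
const middleware = [
  async ctx => { ctx.startedAt = Date.now(); },
  async ctx => { ctx.body = `Hello ${ctx.query.name || "world"}`; },
];

// Because there is a single parameter and promises carry the errors,
// the runner is a sequential await loop -- no next(err) plumbing.
async function runMiddleware(ctx) {
  for (const mw of middleware) await mw(ctx);
  return ctx;
}

runMiddleware({ query: { name: "HN" } }).then(ctx => console.log(ctx.body));
// logs "Hello HN"
```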

------
rurban
Seeing all the problems and wrong assumptions they had makes me appreciate
luajit even more.

100x smaller, multiple times faster, and it did it right from the beginning.

~~~
scriptproof
That is irrelevant. For a typed and mostly static language, everything is
100x easier to compile, too.

~~~
NiLSPACE
Lua is a dynamic language though.

~~~
scriptproof
You are right. What is the magic formula to make the JIT so fast?

~~~
pjako
JavaScript has some features that make it hard to JIT. For example, it has no
fixed object layout. If you are interested in how language design impacts
performance, this is worth reading:
[http://wren.io/performance.html](http://wren.io/performance.html)
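
A short illustration of what "no fixed object layout" means in practice (the
hidden-class behavior described in comments is the commonly documented V8
heuristic, not an API):

```javascript
// These two objects end up with different hidden classes, because
// their properties were added in different orders:
function makeA() { const o = {}; o.x = 1; o.y = 2; return o; }
function makeB() { const o = {}; o.y = 2; o.x = 1; return o; }

// A constructor that always initializes fields in the same order
// gives every instance the same shape -- the JIT-friendly pattern:
function Point(x, y) { this.x = x; this.y = y; }

// Adding properties later changes the shape again at runtime:
const p = new Point(1, 2);
p.z = 3; // shape transition
```

A language with classes fixed at compile time never pays for this bookkeeping,
which is the point being made above.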

------
jules
> From my very personal point of view, the TurboFan optimizing compiler at
> that time was probably the most beautiful version we ever had, and the only
> version (of a JavaScript compiler) where I could imagine that a “sea of
> nodes” approach might make sense (although it was already showing its
> weakness at that time).

What were the weaknesses of the Sea of Nodes? Backwards data flow analysis and
control flow sensitive analysis being hard?

~~~
qznc
I worked on a Sea of Nodes compiler [0] myself and "backwards data flow
analysis and control flow sensitive analysis" has never been a problem.

Changes to the CFG are hard because of the Phi nodes, which have to be adapted
in lockstep with CFG changes. However, this is probably inherent to SSA form,
but not sea of nodes.
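
A tiny example of the phi problem (SSA shown in comments; this is a generic
illustration, not libFirm's IR):

```javascript
// Source-level branch:
function f(c) {
  let x = 1;
  if (c) x = 2;
  return x + 1;
}

// In SSA form the merge point needs a phi node:
//   x1 = 1
//   if (c) then x2 = 2
//   x3 = phi(x1, x2)   // operand order matches the incoming CFG edges
//   return x3 + 1
//
// Removing, splitting, or reordering an incoming edge means every phi
// in the merge block must have its operand list updated in lockstep.
```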

You want a good graph visualization instead of text-based output, because you
should actually see the "non-order" of the sea. Text output with an implicit
order can hide issues.

Personally, I believe sea of nodes is better than the alternatives, like SSA
form is better than non-SSA. There is nothing which makes it inherently more
powerful, but it feels more elegant. Unfortunately, there is no objective
comparison, and anecdotes are apples vs oranges. Similarly, how would you
compare object-oriented with functional programming?

[0] [http://libfirm.org](http://libfirm.org)

~~~
jules
LibFIRM's sea of nodes representation is a bit different in that nodes are
tied to a basic block, if I understand it correctly. That would indeed make
control flow sensitive analysis easy. V8's sea of nodes representation does
not have this property, so it is not immediately apparent under what control
conditions a node may be executed. Its scheduler only places nodes in basic
blocks at the end of the pipeline.

I loved your paper about PBQP register allocation, by the way!

------
Klathmon
This is a fantastic write-up!

I've often felt many of the pain points in the article, and they were never
explained anywhere (that I saw, anyway).

I'd often profile/benchmark something, make sure it's fast enough to use in
our performance critical section of the code, only to find that once in the
application I'd only get a fraction of the speed I was expecting.

I would then hop over to using --trace-opt, only to find that functions were
getting deoptimized or never optimized in the first place, and I'd start
playing the game of trying things here and there to get it to cooperate. And
in some cases --trace-opt wouldn't tell me anything that I could usefully
understand, yet my code would still be slow.
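
For anyone wanting to play the same game, here is a minimal way to reproduce
that workflow (the flag names are the long-standing V8 ones; the exact log
format varies by V8 version):

```shell
# Write a tiny script with a hot function worth optimizing.
cat > hot.js <<'EOF'
function add(a, b) { return a + b; }
let s = 0;
for (let i = 0; i < 1e6; i++) s = add(s, 1);
console.log(s);
EOF

# --trace-opt logs when a function is marked for optimized recompilation;
# --trace-deopt logs when optimized code gets thrown away.
node --trace-opt --trace-deopt hot.js
```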

Here's to hoping that turbofan clears up a lot of these weird cases!

And slightly off topic, but what are the plans for the dart VM? Is it going to
end up using TurboFan or will it stay with Crankshaft?

~~~
mraleph
> And slightly off topic, but what are the plans for the dart VM? Is it going
> to end up using TurboFan or will it stay with Crankshaft?

The Dart VM is a code base independent of V8, so V8's plans to adopt one or
another compiler have no implications for the Dart VM.

Dart VM's IR is closer to Crankshaft, but it supports full language unlike
Crankshaft (so in this sense it is closer to TurboFan).

We currently have no plans to radically rework the compilation pipeline,
because we see no need for that.

------
wisebit
> Looking at it naively, it seems to follow the rules for the arguments
> object in Crankshaft

I've never managed to find a good source on the V8 internals and how to target
these optimizations (the "rules for the arguments object" the author alludes
to).

Any recommendations?

~~~
cwmma
this [1] is probably what you want

1\. [https://github.com/petkaantonov/bluebird/wiki/Optimization-k...](https://github.com/petkaantonov/bluebird/wiki/Optimization-killers#3-managing-arguments)
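
The classic example from that page: in Crankshaft-era V8, letting the
`arguments` object escape a function disqualified it from optimization, while
length/index access was fine. An illustrative sketch:

```javascript
// Crankshaft-era pitfall: leaking `arguments` out of a function
// prevented that function from being optimized.
function leaky() {
  return arguments; // the arguments object escapes -> bailout
}

// Safe pattern: only touch arguments.length and arguments[i],
// never pass the object itself around.
function safeSum() {
  let total = 0;
  for (let i = 0; i < arguments.length; i++) total += arguments[i];
  return total;
}

safeSum(1, 2, 3); // → 6
```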

~~~
wisebit
Thanks! The linked repo [1] is also great:

[1] [https://github.com/vhf/v8-bailout-reasons](https://github.com/vhf/v8-bailout-reasons)

~~~
vhf
They are complementary resources, "Optimization killers" helps you avoid
pitfalls in practice, "v8-bailout-reasons" tries to document and explain the
various Crankshaft bailouts.

------
dvdplm
The latest node (7.7.1) uses V8 5.5, I believe: does anyone know what the
roadmap for updating V8 to the latest version containing all this new stuff
looks like?

~~~
curveship
There's a branch of node, vee-eight-lkgr, running the new v8 5.8 compiler
chain. You can read about how to try it out (and help test it) here:
[https://medium.com/@bmeurer/help-us-test-the-future-of-node-...](https://medium.com/@bmeurer/help-us-test-the-future-of-node-js-6079900566f#.d2iq1i2zq)

~~~
dvdplm
Works great! On my project full builds went down from ~20s to ~17.5s and dev
builds from ~10s to ~7.8s! Good stuff. :)

------
bhouston
Very interesting.

