Chromium Blog: A New Crankshaft for V8 (chromium.org)
242 points by twapi on Dec 7, 2010 | 90 comments



The Chrome javascript engine team is simply a beast.


I'm curious to see if they broke any code, like the IE engine did recently. For example, with loop-invariant code motion, what is legal in a language like C may not be in JavaScript (for the same reason the IE DCE optimization was invalid).

I'd find it hard to believe that Goog would make the same mistake after all the hullaballoo, but I'd love to see it validated.


I'd like to have a guide for writing code that is easily optimized by V8 and similar engines. Having local variables is good, as far as I can tell, but it would be nice to have a full overview of the dos and don'ts.


"Having local variables is good"

Reading sentences like this scares me, because it reminds me that some people don't know what the 'var' keyword means, or think it's acceptable to shove everything into window or global.
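
A minimal sketch of the difference (names are arbitrary):

    function f() {
      x = 1;      // no var: creates (or overwrites) a global, window.x in browsers
      var y = 2;  // with var: local to f
    }
    f();
    console.log(typeof x); // "number": x leaked into the global scope
    console.log(typeof y); // "undefined": y stayed local to f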


Last I read, JavaScript has 2 scopes: function scope, and global scope. Unlike many languages, there is no block scope (e.g. a counter in a for loop).



It's definitely not available in the current JavaScript interpreters in the various browsers.


On the other hand, JavaScript allows you to nest functions, unlike many languages.
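
For example, a minimal sketch of a nested function closing over its enclosing function's variables:

    function makeCounter() {
      var count = 0;           // local to makeCounter
      return function () {     // nested function closing over count
        count++;
        return count;
      };
    }

    var next = makeCounter();
    next(); // 1
    next(); // 2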


I feel dumb now because I fear I don't understand what you mean when you say block scope...

    for(var i=0; i<3; i++) { console.log(i); }
appears to be valid... (or just in Chrome?)


Valid, just doesn't do what you think.

    function foo() {
      var x = 1;
      if (true) {
        var x = 2;
        var y = 3;
      }
      console.log(x);
      console.log(y);
    }

    foo(); // 2, 3 (with block scope it would print 1 and then throw a ReferenceError for y)


Thanks for the explanation!


JS performs "variable hoisting", so your example is interpreted as:

    function(){ var i; … for(i=0; …
It has surprising side effects:

    alert(i);
    var i;
works and outputs "undefined", which is the value of the declared-but-not-yet-assigned variable i.

    alert(k);
    var i;
This is an error, and will fail with a ReferenceError, because k is never declared.


The IE team didn’t break anything in shipping software, right? This is just an IE9 beta problem, presumably fixable before they ship a stable IE9?


Absolutely. It wasn't even a beta, it was just a tech preview. If they didn't have bugs of that sort, they wouldn't be pushing things hard enough.


Loop-invariant code motion is still legal in JS and most dynamic languages, as is DCE. You just need to take into account the weird semantics of JS.

What the IE team got wrong was the assumptions they made about the semantics of the code that was optimized. If they had checked for the presence of a valueOf property, they could still have done the optimization.
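
A sketch of why that check matters (illustrative only, not the actual Chakra or V8 logic): an expression that looks loop-invariant can carry an observable side effect through valueOf, so hoisting it changes behavior.

    var calls = 0;
    var obj = {
      valueOf: function () { calls++; return 42; }  // side effect on every coercion
    };

    function loop() {
      var sum = 0;
      for (var i = 0; i < 3; i++) {
        sum += obj + 0;  // looks invariant, but coerces obj (calling valueOf) each time
      }
      return sum;
    }

    loop();
    // calls === 3; hoisting "obj + 0" out of the loop would leave calls === 1,
    // an observable difference, so the motion is only legal once the compiler
    // has proven valueOf to be side-effect free.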


Exactly. I'm not saying those opts aren't legal, but cases that look legal in C may not be legal in JS. And note, you can still do it in the presence of a custom valueOf method, as long as that method has no side effects (and is thus itself loop-invariant).

In essence I'm asking whether Google does in fact do this check, and does the analysis to ensure that the valueOf method doesn't get redefined dynamically within the loop itself.


"...performance of JavaScript property accesses, arithmetic operations, tight loops..."

Does this mean Crankshaft includes a tracing JIT like Firefox's? The layman-speak confuses me.


Yes, look at the list of the four main components.


I don't think that says it uses a tracing compiler (naturally the terms are vague in this field, so I'm not certain). Their architecture looks much more like HotSpot than TraceMonkey.


Especially considering that HotSpot and V8 were designed by the same person.


I don't think this is true. Do you have a reference for that?


This may not be 100% literally true, but it is definitely true in spirit. (The previous poster is referring to Lars Bak, but both Hotspot and V8 are/were team efforts.) The family tree here is:

  self->hotspot->V8
But, yes, Lars is the man.


    self->hotspot->Resilient Smalltalk Embedded Platform->V8
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.84.7...


OK, you got me on that one. I've been a professional Smalltalk programmer for almost 20 years and I've never heard of the Resilient Smalltalk Embedded Platform.


a.k.a. OOVM. It took a novel and interesting approach: use Eclipse as the IDE to edit text files (there was a syntax for class definitions) and sync bytecode with an image on a remote device over TCP/IP.



It will be interesting to see how this optimization affects V8's memory profile (and how that in turn affects the currently slim memory profile of Node.js).


Looks like this Node patch includes Crankshaft: https://github.com/ry/node/commit/c30f1137121315b0d3641af6dc...


It will also be interesting to see if this improves performance of long running Node.js scripts.


When Chrome was released two years ago, I noticed a significant difference in speed. Nowadays I think the announcements of JavaScript performance improvements are a bit overenthusiastic. The only real-world benchmark mentioned in the article is that Gmail loads 12% faster. What JavaScript apps are constrained by performance, and what are Crankshaft's effects on them?


I think the point of these optimizations is that they allow javascript-intensive applications to be developed that would otherwise have just been too slow.

In other words, they're paving the way for the future more so than trying to squeeze every last ounce of speed from current applications (which just happens to be a great side effect of their work).


Yeah, you won't see a lot of applications that benefit from these optimizations immediately, because if there were any, that would mean they had been written before there was any device capable of running them.


I'm happy that the various javascript teams are developing towards where the web is going rather than where it is.

"A good hockey player plays where the puck is. A great hockey player plays where the puck is going to be." — Wayne Gretzky


I don't think the GP would disagree with you. I believe he was asking what specific apps benefit today (besides GMail which was mentioned in the blog).


The coming WebGL games desperately need all the performance they can get.


I'm guessing you're right. I was responding to the tone of "Nowadays...overenthusiastic" — in support of enthusiasm — and to the notion of naming specific current apps — which I'm imagining as the puck's current location.

We may not even be equipped to answer that question. There are a lot of web-based but private & internal corporate applications that we'll never know about, and can't hope to name.


Much web-client JS is constrained by DOM speed, so this will only have an incremental effect. However, for Node apps this will be pretty significant.


> What JavaScript apps are constrained by performance and what are Crankshaft's effects on them?

Canvas and WebGL games.


Unreal. How much theoretical headroom is left to optimize JS compiler performance?

I had assumed we were reaching some theoretical upper bound, because all the major engines were on par in terms of performance.


To a first approximation, my answer would be something like http://shootout.alioth.debian.org/u32/benchmark.php?test=all... .

That's not necessarily the whole answer and I imagine JS can't ever quite go that fast. But still....


> and I imagine JS can't ever quite go that fast

I'm not the most knowledgeable person on the subject, but from what I understand, there is no theoretical reason that JS couldn't go that fast. The two languages are more similar than they are different, even if JavaScript is quite a bit more complex.

I remember Mike Pall saying something similar in an LTU thread some time ago.


> I remember Mike Pall saying something similar in an LTU thread some time ago.

He did, and the Mozilla guys pointed out how this wasn't the case. The languages are very similar, but JS has some weird semantics due to how things are scoped. (ref. the Chakra optimization brouhaha a month ago)


Going the other way: what small changes in languages can we make to make them much more optimize-able while at the same time keeping (most of) the expressiveness?


It's a tradeoff between compatibility with existing JS versus perf/capability enhancements. The ECMAScript committee goes back and forth on this topic. The biggest stride in that direction was strict mode, which is intended to catch most of the low-hanging fruit. I know Brendan Eich has mentioned a number of things that could change to make the language faster. I don't, however, know if I read it all in one place or if I'm combining multiple one-off examples.

I'm not confident enough in my memory to write out a probably incorrect list of things I remember. Here's a list of the various things I've read from Brendan in case you're interested in tracking it down:

http://lambda-the-ultimate.org/node/3851#comment-57671

This is the LtU thread my previous comment referred to. It's a large (and fantastic!) thread, but I believe that comment and children are the most direct comments back and forth between Mike and Brendan. Andreas Gal is also on the thread and from Mozilla.

http://www.aminutewithbrendan.com/

Brendan's weekly JS podcast. The episodes are relatively accessible and generally cover a lot of ground. A good way to get into the language zeitgeist.

http://brendaneich.com/

Brendan's blog, mostly focuses on ES Harmony stuff and Mozilla specific topics.


Would this imply that Lua has reached some pinnacle of speed and can't go any faster? That seems to be a side effect of your statement.

I'm not familiar with Lua, beyond reading an article or two about it, but does its simplicity imply some sort of maximal efficiency? Are the developers behind Lua simply the best programmers in the world and already have everything figured out with regard to optimizing a JIT? I'm not arguing with you...it does seem like that's a reasonable goal for JavaScript JITs to strive for in the near future. But, it doesn't really answer the question of how much better performance can get (in JavaScript or Lua or any other language). Past performance is not necessarily indicative of future performance when so many people are working on the problem from so many angles.


Browsing the alternatives in the shootout dataset, LuaJIT appears to be the fastest of the dynamic languages and feature-wise matches Javascript well enough to be a fair benchmark.

You can gain another factor of 2 or so in speed by going to a static language like C or Ada, but that isn't really a fair comparison and you can see the price paid in code size.

The good news for the web is that there may be another factor of 2 to 3 available for Javascript speedup.


You can also go to a static language like OCaml, which doesn't blow up your code size but is still fast.


LuaJIT is a major outlier, probably the most surprising result in the entire shootout. In point of fact there isn't much more LuaJIT can do without flat-out exceeding C, which isn't going to happen on the shootout to any significant degree any time soon.

(The conventional "JIT can be faster than compiled code" argument doesn't apply because the problems are accurately known by the author of the shootout benchmark code in advance, so, for instance, if there's a speed advantage to sticking with 'char' where you might have been tempted to write 'int', the C shootout code already does that.)


That argument is often made for JITs, but I have never seen a real-world example where the extra runtime knowledge a JIT has achieves something a static compiler couldn't do better, except in cases where runtime code loading is used.


Alias analysis is a good example. A JIT compiler may speculatively add dynamic disambiguation guards (p1 != p2 ==> p1[i] cannot alias p2[i]). If the assumption turns out to be wrong, the JIT compiler dynamically attaches a side branch to the guard using the new assumption (p1 == p2 ==> p1[i] == p2[i], which is an even more powerful result).

Doing this in a static compiler is hard, because it would have to compile both paths for every such disambiguation possibility. This quickly leads to a code explosion. You'd need very, very smart static PGO to cover this case: there are no branch probabilities to measure, since the compiler doesn't know that inserting such a branch might be beneficial. It may only derive this by running PGO on code which has these branches, which leads to the code explosion again.
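
A source-level sketch of the effect (the function and both branches are hypothetical, written out in JS just to make the idea concrete; the real transformation happens on the compiler's IR):

    function addScaled(dst, src, k, n) {
      if (dst !== src) {
        // Guard passed: dst and src are distinct arrays, so dst[i] and
        // src[i] cannot alias and loads of src[i] can be cached freely.
        for (var i = 0; i < n; i++) dst[i] = dst[i] + k * src[i];
      } else {
        // Side branch attached when the guard fails: dst === src implies
        // dst[i] === src[i], an even stronger fact to optimize with.
        for (var i = 0; i < n; i++) dst[i] = (1 + k) * dst[i];
      }
    }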

Auto-vectorization is another example: a static compiler may have to cover all possible alignments for N input vectors and M output vectors. This can get very expensive, so most static compilers simply don't do it and generate slower, generic code. A JIT compiler can specialize to the runtime alignment and even compile secondary branches in case the alignment changes later on (e.g. a filter fed with different kernel lengths at runtime).


I agree in general, although I will point out that virtually no C developers use PGO while it's on by default in HotSpot and now V8. (Of course, it looks like Java needs PGO just to try to catch up with gcc -O3.)


Not strictly the same but take a look at the CPU world:

VLIW (compilers try to optimize processing based on static knowledge, e.g. Intel's Itanium) vs. the current Intel CPUs (based on the P3/P4 architecture), which dynamically allocate resources depending on runtime knowledge.

Runtime information can help compilers. Just look at profile guided optimizations in current static compilers.

The real trouble in JIT compilers is usually that the target language's semantics are very high-level. For example, an integer in C is machine-sized and is not expanded in size to fit its value, unlike in some dynamic languages.


http://weblogs.java.net/blog/2008/03/30/deep-dive-assembly-c...

This link doesn't quite give what you are after (it's mostly about static compilation in the Java HotSpot compiler), but I believe the lock elision features (http://www.ibm.com/developerworks/java/library/j-jtp10185/in...) have to be done at runtime in the JVM (because of late binding).

Obviously this doesn't totally invalidate your argument ("except in the cases where runtime code loading is used"), but it is worth noting that in many languages late binding is normal, and so this is the general case.

Also, HP's research Dynamo project "inadvertently" became practical. Programs "interpreted" by Dynamo are often faster than if they were run natively. Sometimes by 20% or more. http://arstechnica.com/reviews/1q00/dynamo/dynamo-1.html


This is a common misinterpretation of the Dynamo paper: they compiled their C code at the _lowest_ optimization level and then ran the (suboptimal) machine code through Dynamo. So there was actually something left to optimize.

Think about it this way: a 20% difference isn't unrealistic if you compare -O1 vs. -O3.

But it's completely unrealistic to expect a 20% improvement if you'd try this with the machine code generated by a modern C compiler at the highest optimization level.


I think http://shootout.alioth.debian.org/u32/benchmark.php?test=all... is probably a better ceiling. LuaJIT is fantastic and already generates better code than GCC in some cases, but in many others it does not.

There's no theoretical limit to how close a compiler can come to a programmer when it comes to generating machine code to do a particular well-defined task.


Javascript execution is very rarely the bottleneck on webpages. Perhaps 90% of the time, the bottleneck is the render speed of DOM updates.


So - how long does it take for features to make their way into production? Am I reading the release calendar correctly in that it'll take 12 weeks from start of development to beta, and another 12 weeks from beta to stable?

It looks like Chrome 8 went stable on 12/2. So we'll see Chrome 10 in 4 months?


Chrome stable releases are every 6 weeks.

The releases are overlapped though, so we are testing v n+1 in beta while v n is in stable, and we are starting new feature development for v n+2 while v n is in stable.

Chrome 8 just went stable, and we've just started testing Chrome 9.

So Crankshaft will ship either 6 weeks from today (if it's in 9) or 12 weeks from now (if it's in 10).

I don't think it's been announced which it's targeted for.

HTH


Ah, ok. I guess I misread the release calendar. Thanks for the clarification :).

> I don't think it's been announced which it's targeted for.

According to the perf comparison chart on the blog, it's in Chrome 10. Also, it's mentioned that Crankshaft is available in the canary build, which is currently at 10 too.

12 weeks then. I sometimes revert to FF for the plugins, but I always come back to Chrome for the perf :).


Gmail really does seem to load in about half the time now in the Canary build.


OTOH this does not bode well regarding the potential for future bloat in Gmail.


I've benchmarked Google Chrome 9.0.597.10 and 10.0.603.3 (with Crankshaft) and the latter is 30% faster. See the detailed results: http://dromaeo.com/?id=124912,124913


Has anyone got any experience using JavaScript/V8 as a scripting language for a C++ app? We currently use Python with boost::python bindings, but are finding we have to limit the amount of Python code, as it is too slow.


Have you looked into Lua? It's a good embedded scripting language.

http://www.lua.org/


Especially for x86 and x64, where you can use LuaJIT.

LuaJIT2 is still in beta, but it is already very stable. Performance-wise, it is comparable to Haskell and Java.

http://luajit.org/


Are there any plans to port LuaJIT to ARM or LLVM? I see a couple of posts mentioning slow FP performance on ARM, but that could be solved with a technique like that used by LNUM.


The LLVM IR is too low level. It loses some context necessary to get the performance of the tailored JIT written by Mike Pall.

There is a separate effort to write a JIT compiler for Lua on top of LLVM, but the performance is not as good as LuaJIT's, and reaching such a level will be very complex (assuming it's possible).


There is sponsorship for a PPC LuaJIT port (targeting embedded systems, I believe) and Mike Pall has expressed interest in an ARM port in the past, but I don't know what the status of that is.


PPC LuaJIT sounds like it would be useful in console games (Why did the big three consoles switch to PPC just as Macs switched to Intel, anyway?).


Also, the language itself is pleasant and straightforward. If you know other modern programming languages, Lua shouldn't be too surprising.


Ever looked into Cython? http://www.cython.org/

Sage (an open source replacement for Matlab) uses it quite successfully for speeding up critical paths.


That's essentially what Node.js is, although with a heavy focus on asynchronous operation.

Check out: https://www.cloudkick.com/blog/2010/aug/23/writing-nodejs-na...


Syntensity moved from Python to JavaScript as its embedded scripting language,

http://www.syntensity.com/toplevel/intensityengine/

and it worked out very well there.

Of course the real examples are... web browsers, which are C++ apps scripted by a JavaScript engine. Seems to work well there too ;)


At Anybots, we built all the real-time robot code in Python with performance-critical bits in C++ wrapped using boost::python. Server code is in pure Javascript on Node.JS. I haven't tried to integrate C++ into Node, but it looks easy.


Next up, remove the 1.9GB max memory limit of V8 processes: http://code.google.com/p/v8/issues/detail?id=847


Will this have any effect on NodeJS?


I doubt it.

From what I understand, the significant improvements in speed come from Crankshaft's tradeoff of compilation optimisation against startup speed. If your app is a for loop with 2 iterations, that code path won't be heavily optimised, because the engine would potentially spend more time compiling the code than executing the unoptimised version; it will therefore start up faster. However, hotspots (say, loops with 1,000 iterations) will be heavily optimised.

This is great for websites, as perceived speed and responsiveness largely come down to startup time. You'll certainly notice a difference when using Node as a scripting tool. However, most Node applications are long-running servers executing the same code paths over and over. It's unlikely that Crankshaft is performing any extra optimisations; it is just changing when it performs them. However, if Crankshaft _is_ doing significantly more advanced optimisations (I don't know), then yes, Node will benefit. Please correct me if I am wrong; I would love to be.


Node-based servers will definitely benefit from this. The advantage of a two-stage compilation scheme is that the "base" compiler generates non-optimized code that is self-profiling: it collects type information as it runs. That information is then used by the second-stage compiler to produce code that is more optimized than a single-stage compiler (like pre-Crankshaft V8) can produce.
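
A tiny sketch of the kind of code that benefits (the warm-up count is made up; the real thresholds and heuristics are V8 internals):

    function sumArray(a) {
      var total = 0;
      for (var i = 0; i < a.length; i++) {
        total += a[i];  // base code's inline cache records the observed element type
      }
      return total;
    }

    // Repeated calls with integer elements let the optimizing compiler
    // specialize the loop to untagged small-integer arithmetic.
    for (var j = 0; j < 10000; j++) sumArray([1, 2, 3, 4]);

    // A later call with doubles violates the recorded assumption and
    // triggers a bailout back to the non-optimized code.
    sumArray([1.5, 2.5]);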


From the article:

> In addition to improving peak performance as measured by the V8 benchmark suite, Crankshaft also improves the start-up time of web applications such as GMail.

As I understand it, there is more to Crankshaft than just startup time improvement.


Sure, I would expect node.js to be an ideal environment for this type of hot code analysis, since servers written with it are typically long running vs. more transient web pages. Of course whether there are significant gains depends a lot on whether your server is CPU-bound or IO-bound.


This seems like it would be huge for the Node.js people. The next big milestone for V8 for node has to be the GC issues that have been wreaking havoc on node apps under high load.


Seems like it should.


Sounds like they borrowed the tracing idea from mozilla.


Not really. It's a well-known VM optimization technique.


First featured in Self many years ago, done by the same folks (Lars Bak and crew) who brought you V8. Then Sun bought their Smalltalk/Self-based company, and they built HotSpot for the JVM. Then Google hired them to do the same for Javascript, and now we have V8.

It generally takes about 10-20 years to get truly new ideas from the labs to consumer-level products.


Neither Self nor HotSpot uses tracing. It's a relatively new compilation technique; the implementation in the original tracing paper used Java bytecode as the source language.

I think you are confusing tracing with adaptive compilation.


Actually, there's even older work using tracing for re-optimization of assembly :)


They don't mention that they're tracing. I think they're just optimistically optimizing functions in hot loops.


I thought someone was actually developing a new kind of crankshaft for a real V8 engine....


Don't "Chromium Blog" and the URL give it away?



