Speculation in JavaScriptCore (webkit.org)
138 points by pizlonator on July 29, 2020 | 67 comments

If you were ever curious how modern JavaScript VMs (or VMs for other dynamic languages) achieve high performance, this is an awesome resource. It explains tiers, the goals and design of the different tiers, on-stack replacement, profiling, speculation and more!

JavaScript engines are the most advanced at this (only LuaJIT is even comparable). It would be awesome if Python, Perl, Ruby, PHP or the like aimed for the same level of performance tech.

I'd say that the JVMs (particularly Azul's) are more advanced at this. Even with types, they still speculate in order to inline across virtual method calls.

But agreed that the amount of perf JS engines can achieve is truly impressive.

JSC speculates in order to inline across virtual method calls while also inferring types.

Also, inlining across virtual method calls is just old hat. See the post’s related work section to learn some of the history. Most of the post is about techniques that are more involved than inlining and devirtualization.

HotSpot is a fork of Strongtalk, which did the same thing and in fact invented the techniques you're talking about (i.e. creating fake backing types for untyped code and optimistically inlining those with bounces out to the interpreter on failures, perhaps keeping several copies around and swapping out the entry in the inline cache).

Additionally, that functionality has been added to over time with the invokedynamic bytecode and its optimizations.
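
For readers who have not seen the technique, here is a minimal sketch of "optimistically inline, and bounce out on failure". It is plain Python standing in for compiled code, not how Strongtalk, HotSpot or JSC actually represent it. The classes, the function names and the `stats` counter are all made up for illustration.

```python
class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return 3 * self.r * self.r  # integer "pi" keeps the demo exact


class Square:
    def __init__(self, s):
        self.s = s

    def area(self):
        return self.s * self.s


stats = {"bailouts": 0}  # hypothetical counter, only here to observe the slow path


def total_area_generic(shapes):
    """Slow path: a real dynamic dispatch on every element."""
    return sum(shape.area() for shape in shapes)


def total_area_speculative(shapes):
    """Fast path "compiled" under the assumption that every receiver is a Circle."""
    total = 0
    for i, shape in enumerate(shapes):
        if type(shape) is not Circle:  # guard on the speculated receiver type
            stats["bailouts"] += 1
            # Speculation failed: finish the remaining work in the generic code.
            return total + total_area_generic(shapes[i:])
        total += 3 * shape.r * shape.r  # body of Circle.area, inlined
    return total
```

As long as the guard holds, no call to `area` happens at all. The first `Square` sends the rest of the loop to the generic version, which is a rough stand-in for an exit to a lower tier.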

Invented some of the techniques in the post. Most of the post is about what’s new. I cite the work that led up to strongtalk throughout the post.

I read the article and don't see any examples of what hotspot doesn't do? What am I missing?

HotSpot has four tiers huh?

That’s just one snarky example. There are lots of others. I’m sure you could identify them easily, if you are familiar with HotSpot and you read the post.

Last time I checked it had five tiers.

Like, really, one non-snarky example is all I'm looking for, as someone who does know the internals of HotSpot fairly well and read the post.

JSC has four in the sense that hot enough functions go through LLInt, then Baseline, then DFG, and then FTL. They go through three lower optimization levels before the max one.

HotSpot has three in the sense that you get interpreter, client, then server.

Both VMs have additional execution engines available behind flags. JSC has a different interpreter (CLoop, based on LLInt but different) for some systems, so it’s like a “fifth tier” if you think of “tier” as just an available execution engine. I think of tier as a stage of optimization that you get adaptively, without having to specifically configure for it. I get that HotSpot can alternatively AOT or Graal, but those aren’t tiered with the rest to my knowledge (like there is no interpreter->client->server->graal config but if there was then that would be four tiers).
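
The adaptive tier-up described above (LLInt, then Baseline, then DFG, then FTL) can be sketched roughly as follows. The tier names come from the comment. The thresholds and the `FunctionProfile` class are invented for illustration, and a real engine's heuristics are certainly more involved than a simple call counter.

```python
# Tier names are from the comment above; the thresholds are hypothetical.
TIERS = [("LLInt", 0), ("Baseline", 10), ("DFG", 100), ("FTL", 1000)]


class FunctionProfile:
    """Tracks how hot one function is and which tier it currently runs in."""

    def __init__(self, name):
        self.name = name
        self.calls = 0
        self.tier = "LLInt"
        self.history = ["LLInt"]

    def record_call(self):
        self.calls += 1
        # Pick the highest tier whose threshold has been reached.
        best = self.tier
        for tier, threshold in TIERS:
            if self.calls >= threshold:
                best = tier
        if best != self.tier:
            self.tier = best
            self.history.append(best)
```

A function that is called only a handful of times never leaves the interpreter, which is the sense of "tier" used in the comment: a stage you reach adaptively, without configuring for it.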

HotSpot definitely uses five tiers (including the interpreter, so same as JSC).


I'll give you that levels 1-3 use the same IR, but that says more about the generality of the C1 IR than JSC being more advanced for using different IRs IMO.

I don't think it's reasonable to describe those as five independent tiers. Two of them just add (slower) code to collect profiling information.

According to that, it skips level 1 and goes all the way to 3 in some cases.

Again, not the same as what JSC does, and not nearly as aggressive. Most notably, there is no baseline jit. Also, C1 in any config compiles slower than DFG.

Did you edit out where you had said "JSC > Java" in your initial reply?

That really colors the conversation in a different way retroactively that I don't really appreciate.

It now looks like I'm berating you out of nowhere, when really you originally made an assertion and I'm just trying to get you to back it up.

You started by claiming that Azul is more advanced. You got snark.

Maybe it’s more advanced at collecting garbage or supporting threads, but it is not comparable in the field of type inference, because Azul’s VM and other JVMs do not do any of the kind of type inference described in this post. And no, invokedynamic is nothing like inline caching for JS - not even close. Most of this post is about what type inference for dynamic languages really looks like when you invest HotSpot-like efforts to that specific problem. Saying that Azul is more advanced at type inference is far off from the truth at a very fundamental level.

So I gave you snark. Being snarky is fun sometimes!

Perhaps, but not for Hacker News: https://news.ycombinator.com/newsguidelines.html

Anyway, this is silly. The post cites HotSpot and its predecessors. It’s just not the case that we repeat their technique, and number of tiers is a snarky difference and not the real one. I’m not going to give you a TL;DR that enumerates the meaningful differences just because you claimed that there aren’t any.

They were asking for one good example of an innovation over HotSpot, not an enumeration of all the differences. That seems like a reasonable request!

The “number of tiers” difference doesn’t seem quite substantive enough, I would say, although I’m definitely no expert.

If you want a good example then read the post. You’ll find many.

It’s not the purpose of this post to enumerate differences to HotSpot, but it does show some cases where the two approaches are alike.

HotSpot does not need to infer object shapes, so it doesn't need inline caches for field accesses. HotSpot doesn't need to infer whether (and where) methods lie in the prototype chain of an object. HotSpot does not need to speculate that arithmetic fits in integer range. HotSpot does not need to speculate that fields are (not) deleted from objects. HotSpot does not need to speculate that a function has a .toString() called on it. HotSpot does not need to speculate that a local variable can be modified by a sloppy direct eval. HotSpot does not need to do scope analysis and closure conversion. HotSpot does not need to speculate that the arguments object does not escape.

All of these things are extremely typical for a JS VM to profile and speculate on.

I do not work on JSC, but I did work on V8 and it does all of these things.
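
To make the first item in that list concrete, here is a rough sketch of object shapes plus a monomorphic inline cache for one field access. Every name in it (`Shape`, `Obj`, `GetByIdCache`) is hypothetical, and it leaves out deletion, prototypes and polymorphic caches, which is where most of the real complexity lives.

```python
class Shape:
    """Maps property names to slot indices; shared by objects built the same way."""

    def __init__(self, slots=None):
        self.slots = dict(slots or {})
        self.transitions = {}  # property name -> Shape reached by adding it

    def add(self, name):
        if name not in self.transitions:
            new_slots = dict(self.slots)
            new_slots[name] = len(new_slots)
            self.transitions[name] = Shape(new_slots)
        return self.transitions[name]


EMPTY = Shape()


class Obj:
    def __init__(self):
        self.shape = EMPTY
        self.storage = []

    def set(self, name, value):
        if name in self.shape.slots:
            self.storage[self.shape.slots[name]] = value
        else:
            self.shape = self.shape.add(name)  # shape transition
            self.storage.append(value)


class GetByIdCache:
    """Monomorphic inline cache for a single `obj.name` access site."""

    def __init__(self, name):
        self.name = name
        self.cached_shape = None
        self.cached_slot = None
        self.hits = 0
        self.misses = 0

    def get(self, obj):
        if obj.shape is self.cached_shape:  # fast path: one identity check
            self.hits += 1
            return obj.storage[self.cached_slot]
        self.misses += 1  # slow path: full lookup, then re-cache
        slot = obj.shape.slots[self.name]
        self.cached_shape, self.cached_slot = obj.shape, slot
        return obj.storage[slot]
```

Two objects that add `x` and then `y` share a shape, so the second access hits the cache. An object that adds them in the other order gets a different shape and misses.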

> HotSpot does not need to infer object shapes, so it doesn't need inline caches for field accesses.

Those are just modeled as invokedynamic getters and setters for dynamically laid out objects. It's handled great.

> HotSpot doesn't need to infer whether (and where) methods lie in the prototype chain of an object.

How is that different than vtable lookups?

> HotSpot does not need to speculate that arithmetic fits in integer range.

Here's where they added that to the JVM. https://bugs.openjdk.java.net/browse/JDK-8042946

> HotSpot does not need to speculate that fields are (not) deleted from objects.

Once again, handled as invokedynamiced getters and setters.

> HotSpot does not need to do scope analysis and closure conversion. HotSpot does not need to speculate that the arguments object does not escape.

They absolutely do since lambdas have been supported.

>> HotSpot doesn't need to infer whether (and where) methods lie in the prototype chain of an object.

> How is that different than vtable lookups?

Sigh. Maybe to you this seems fun, but from where I am sitting, you are starting to get annoying. It is really not necessary for you to try to dominate this whole post with some completely unnecessary over-the-top HotSpot-is-the-best-ever chest thumping. This wasn't an isolated comment; you are all over the thread and in people's faces. It's odious when you want to make confident claims and yet don't know the difference between vtable lookup and JavaScript prototype access. It's exactly this kind of exchange that we need less of.

I wish you would step back and appreciate more of the shared context here to recognize that you don't need to explain (and exaggerate) the inner workings of JVMs to people who have worked on them for quite some years in the past. We'd probably be friends and have fun if you'd drop the HotSpot schtick. I went through that phase too, about 10 years ago.

At this point your comments here and elsewhere show a pattern of aggression that just makes people want to disengage, but I am forced to offer you some advice to try to salvage this. It's no fun for anyone to be part of a community with such an adversarial and hostile conversation ongoing. I hope I can resist the urge to be drawn into correcting the rest of your misunderstandings about JavaScript here in a way that doesn't make you feel slighted, because you're not receptive to it anyway and that's an obviously unproductive discussion. But listen, invokedynamic isn't a panacea, it's a nice addition to the JVM to accomplish some dynamism, but it's not mission accomplished by any means. Invokedynamic is a mechanism that requires the existence of dynamic optimization at a higher level, i.e. a higher-level language runtime. Your claims again overstate what it does. You didn't understand what I meant about scope analysis or the arguments object because those don't exist at the JVM level and you must have pattern-matched them to something else completely different.

And don't even get me started on HotSpot's startup time.

Many, if not most, of the people working on advanced JavaScript VMs come from a Java background. I would say JS VMs are orders of magnitude more sophisticated in the type of behavioral profiling they do in order to recover types, understand object shapes, and shortcut prototype lookups. While HotSpot and Azul (a progeny of HotSpot) are highly tuned, a lot more investment has gone into bread-and-butter JIT and GC work, whereas JS runtimes have to have a very advanced object model and type profiling infrastructure.

PHP has a JIT coming in 8.0, using the same underlying tech that LuaJIT does. Unfortunately, most of what people do with PHP isn't CPU bound, so it doesn't help much.

This could be a chicken and egg thing. As soon as JavaScript started to get faster, we started using it for more CPU intensive tasks, which in turn led to more investment in optimizations, and so on, in a virtuous circle.

Large entities are competing on browser performance, so the "which in turn led to more investment in optimizations" part does not really ring true. We are just benefiting from browser market dynamics.

Browser competition helped a lot, but if you have a large body of CPU bound code, you can bet people will work hard to optimize it.


And let’s not forget HHVM. That was (is?) quite the beast.

LuaJIT 2.x only uses Dynasm to generate the bytecode VM and not for actual JIT compilation. Runtime JIT is done differently.

Dynasm is actually written in Lua. There's a good guide by Peter Cawley on how to run it here: https://corsix.github.io/dynasm-doc/tutorial.html#compiling

Yes, I suppose "in the same way earlier versions of LuaJIT worked" would be more accurate.

LuaJIT 1.x was also an enormous disappointment so it’s a bit perplexing that the PHP team would try something which probably sucks.

Don't believe him. The PHP JIT is entirely different from LuaJIT, other than using dynasm. Even Raku is using dynasm.

The PHP JIT is actually much easier to understand. It uses SSA for all tiers and a CFG, but not much type speculation at all. The advantage of PHP, Perl and Ruby over JS is that objects don't vary that much, methods are not overridden that much, and arrays and hashes are much easier. The PHP JIT is at least 10x easier and smaller than JSC. Also C, not C++.

Underappreciated comment and upvoted.

> most of what people do with PHP isn't CPU bound

... cause it's I/O bound? Just curious.

I feel like I can almost visualize the killshot slides that the HipHop and HHVM folks used to show the extent to which PHP can be CPU bound and the extent to which server idle time can be increased by using speculative compilation tricks on PHP.

So, I think it is cpu bound for at least some people.

Versus PHP5.x that's true. Mostly because the PHP internals were inefficient. The performance differences vs HHVM vanished with PHP 7.x.

HHVM's 2nd generation region based JIT performs enormously better than PHP7.x in places where dynamic languages don't perform so well but PHP7.x has the clear lead when benchmarking large apps like WordPress.

I get where the "rooting for the underdog" feeling comes from, but it still feels good that the relatively small and underfunded PHP team mostly beat Facebook here. I like to imagine there is some internal debate at FB on whether to just go back to mainline PHP and kill HHVM. Especially with a credible JIT coming.

Yes, or just waiting on MySQL, for example.

The JIT does dramatically speed up microbenchmarks like Mandelbrot, etc., by about 5x.

LuaJIT isn't really comparable. Filip touched on why the tracing compilers struggle: tail duplication. Avoiding tail duplication means having heuristics which keep traces short which means limiting your optimization scope.
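
A back-of-the-envelope way to see the tail duplication problem: every if/else "diamond" inside a hot region doubles the number of distinct paths, and a tracing compiler that followed every one of them would duplicate the code after each merge point once per path. The sketch below only counts paths versus control-flow-graph blocks. It is a worst case under the assumption that every path gets hot, not a model of LuaJIT's actual heuristics, and both function names are made up.

```python
from itertools import product


def count_paths(num_diamonds):
    """Distinct straight-line paths through a chain of if/else diamonds."""
    return sum(1 for _ in product((True, False), repeat=num_diamonds))


def cfg_blocks(num_diamonds):
    """Basic blocks a method compiler sees for the same code.

    Per diamond: one condition block and two arms; plus one final exit block
    (each merge point doubles as the next diamond's condition block).
    """
    return 3 * num_diamonds + 1
```

Ten diamonds give 31 blocks in the CFG but 1024 possible paths, which is why a tracer needs heuristics that keep traces short.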

You can see an example of working around the limited optimization scope by templating Lua here: https://github.com/LuaJIT/LuaJIT-test-cleanup/blob/master/be...

This makes some variables which would otherwise have to be loaded at the start of the trace into constants in the recorded trace. A more complex compiler can just do the same optimizations without the fuckaround.

Is that really the point of that code? It seems like the point is to generate code to map real numbers from math.random to letters in a probability distribution in the smallest number of branches.

Maybe not in architecture and thus in perf ceiling, but they actually get good perf, which is generally not the case for the main implementations of other popular scripting languages IMO. Thanks for highlighting the tracing JIT issue.

Actually that's available for the Python language as well with PyPy.

The post gives PyPy a shout out. But it’s subtle. PyPy is similar but not the same. I think that JSC’s exact technique could be tried for Python and I don’t believe it has.

Maybe, I wish I had the occasion to deploy PyPy in production, but as a daily Python user since 2008: I never had to switch to PyPy to fix any performance problem that mattered for my users or me, but I keep an eye on PyPy and admire it as well as the developers behind it.

PyPy is faster than vanilla Python but it's not really built the same way as modern JavaScript engines and likely can't hit the same perf level.

What is the current development status of PyPy? I haven't heard much about it in the past couple years.

Python actually uses prototypal inheritance behind the scenes.

PyPy is a tracing JIT though, which is quite different. JSC and V8 compile a method at a time. SpiderMonkey used to use tracing, but switched to methods too (though I think it still does limited tracing in some situations).

TraceMonkey was removed almost a decade ago https://bugzilla.mozilla.org/show_bug.cgi?id=698201

JaegerMonkey used both approaches. I don't know if IonMonkey still does though.


EDIT: It looks like a couple trace-like techniques are still used in IonMonkey (maybe the devs have some input there though).


There’s no tracing in either JägerMonkey or IonMonkey. That’s just a confusing statement about the similarity of speculation guards and OSR exits to trace guards and side exits. For a while TraceMonkey was used as a higher tier above JägerMonkey which might be what you’re thinking of?

I don’t work for Mozilla but spent a ton of time studying TraceMonkey for building my own tracing JIT.

There is also Julia, which JITs very performant code.

I think that Julia uses the LLVM JIT. When I used it last, Julia was somewhat prone to long startup delays since it would fully compile functions on first execution.

That can work great for long running applications such as its main niche of scientific computing but would be terrible for JS since you want the page to be interactive ASAP.

This is why talking about JIT performance is so complicated. Not only do you need to worry about compilation speed and speed of the generated code, you also have to worry a lot about impact on memory and impact on concurrently running code. Plus most JITs also need to have some sort of profiling system running all the time as part of those constraints to only spend compilation resources on hot paths.

The difference between Julia and speculative compilers is that Julia is much more likely to have to compile things to run them. JSC compiles only a fraction of what it runs, the rest gets interpreted.

Modern JS engines have a multi-tier structure and profiling info that lets them choose what to JIT and which point on compile speed vs runtime speed tradeoff space to take for any given chunk of code. The post covers a lot of this.

I always love reading more about JavaScriptCore internals although I have to confess that much of the time one of the main lessons I get from it is that life would be much easier if we had types and didn't need to speculate so much in the first place.

Not having types is a virtue for the web, where interfaces can change in weird ways. Dynamism leads the engine and the JS code to be less tightly coupled. So, there’s more wiggle room for evolution in the engine and more wiggle room for portability defenses in JS.

So, it just depends on how important the benefits of dynamic types are versus the benefits of static types. I don’t think static types or dynamic types are better; they are just good at different things.

It wouldn't be the JS we know and love if it had been burdened with a type system designed by a committee sometime in the 90s. That said, one thing we can say for sure is that the dynamic typing doesn't make your job any easier :)

You may want to speculate even when you have precise concrete types.

For example, your type system may tell you that you have an int32, but you can speculate that only the lowest bit is ever set, with a kind of synthetic type you could call int32&0x1 which isn't expressible in the type system the user uses.
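
A small sketch of what that could buy you, assuming a multiply whose left operand has only ever been observed as 0 or 1. The function names are hypothetical and Python integers stand in for int32 values.

```python
def scale_generic(x, y):
    """What the source program says: an ordinary multiply."""
    return x * y


def scale_speculative(x, y):
    """Version specialized for the synthetic type int32&0x1 on x."""
    if x & ~0x1:  # guard: fails if any bit other than bit 0 is set
        return scale_generic(x, y)  # speculation failed, take the slow path
    # x is 0 or 1, so -x is 0 or all ones and the multiply becomes a mask.
    return y & -x
```

The guard is a single mask test, and the declared type (int32) would never have justified replacing the multiply on its own.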

> dynamic typing doesn't make your job any easier

Yeah, it makes millions of application programmers' jobs easier at the expense of a small group of experts - sounds like the right tradeoff?

> Yeah, it makes millions of application programmers' jobs easier

I don't think it's that simple. Large programs get unwieldy, no matter what language you write it in, and a large body of evidence suggests that having static types for both safety and documentation is a big win, because it makes programs more robust and ironically makes programmers more productive in the long run. As you and I both know, this is a long discussion that stretches back decades, so it probably isn't going to be productive to hash it out here.

A more important discussion which is not being had is the question of the size of the trusted computing base. Framed this way, it makes sense to minimize the size of the trusted computing base and not have a complicated dynamic language implementation on the bottom. Instead we should have layers with a very strict statically-typed target that is easy to make go fast at the bottom. This is why I want to put WebAssembly under everything. Yes, even JS. (Fil would probably not agree here :-))

> For example, your type system may tell you that you have an int32, but you can speculate that only the lowest bit is ever set,

Range analysis is really important for JavaScript code because everything is a double and it is generally a win to avoid doing double math if possible, but I am doubtful that it makes much difference for statically-typed integer code outside of array bounds checking optimizations. In my mind, range analysis on integers really only feeds branch optimizations. Maybe it's the case that optimizing integer code that is full of overflow checking benefits from range analysis (the kind of stuff you find inside the implementation of a dynamic language), but I can't really think of much else.
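
The "avoid doing double math if possible" point can be sketched as an add that speculates on int32 operands and falls back to doubles when the check fails. This is only an illustration of the overflow-check pattern, with invented names (`Deopt`, `add_int32_speculative`); in a real engine, range analysis would be what lets the compiler prove some of these checks away.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


class Deopt(Exception):
    """Raised when a speculation fails and execution must fall back."""


def is_int32(x):
    return type(x) is int and INT32_MIN <= x <= INT32_MAX


def add_int32_speculative(a, b):
    """Fast path: assumes both operands and the result fit in int32."""
    if not (is_int32(a) and is_int32(b)):
        raise Deopt("operand is not an int32")
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:  # the overflow check
        raise Deopt("int32 overflow")
    return result


def add_generic(a, b):
    """Slow path: treat both numbers as doubles, as JS semantics require."""
    return float(a) + float(b)


def add(a, b):
    try:
        return add_int32_speculative(a, b)
    except Deopt:
        return add_generic(a, b)
```

If the compiler can show that both operands are, say, loop indices below some small bound, the overflow check in the fast path is dead and can be dropped.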

> a large body of evidence suggests that having static types for both safety and documentation is a big win

Citation needed. A review of studies on static vs. dynamic languages concluded "most studies find very small effects, if any". https://danluu.com/empirical-pl/

I read through all those studies and didn't think they shed any light on the subject, as this is something very hard to get numbers on:

"The summary of the summary is that most studies find very small effects, if any. However, the studies probably don't cover contexts you're actually interested in."

> Large programs get unwieldy, no matter what language you write it in, and a large body of evidence suggests that having static types for both safety and documentation is a big win

I am beginning to think that cutting your large untyped program into pieces and typing it only at the boundaries will get you all the benefits. That probably means most, if not all, types inside those pieces can be inferred.

How about optional typing, even in some limited form (primitive types, primitive-only structs), gradually adding more over time?

I think there was some proposal already but it's probably dead now because I haven't heard about it for a while.

There are still ways to program to unstable interfaces in static languages though, and they tend to be safer overall because they are isolated from the rest of the language.

I'd love to see a post like this on instruction selection.

Wow, much longer and more complete than I would expect, thank you!
