Maybe you don't need Rust and WASM to speed up your JS (2018) (mrale.ph)
126 points by zbentley on June 24, 2021 | 85 comments



There was a back-and-forth on optimizing this (see the link Steve found). The WASM version got faster too by adopting algorithmic changes proposed here, and the conclusion of the Rust authors was that you do need WASM if you want predictable performance.

High-perf JS relies on staying on the JIT happy path, not upsetting GC, and all of this is informal trickery that's not easy to do for all JS engines. There's never any guarantee that JS that optimizes perfectly today won't hit a performance cliff in a new JIT implementation tomorrow.


> There's never any guarantee that JS that optimizes perfectly today won't hit a performance cliff in a new JIT implementation tomorrow.

Yep. As someone who played this game a few years ago, JIT implementations do change (e.g. function inlining based on function size was removed from V8, delete performance changed, number of allowed slots in hidden classes changed, etc).
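
For example, here's a minimal sketch of the `delete` case (V8 folklore, hedged; exactly what happens has changed between versions):

  // `delete` can push an object into a slow "dictionary mode" via a
  // hidden-class transition, while assigning undefined keeps the shape stable.
  function makePoint(x, y) { return { x, y }; }

  const a = makePoint(1, 2);
  delete a.y;      // hidden-class transition; may deoptimize property access

  const b = makePoint(1, 2);
  b.y = undefined; // shape preserved; access stays on the fast path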

Also worth noting that just because an optimization works in V8, there's no guarantee that it will also work in another JS engine.


Though I wonder... Some decades ago people made similar arguments with regards to C/C++ versus native assembly. That the only way compiled languages could approach assembly languages on performance was if the code was written in a way that the compiler could optimize. If it couldn't, performance would often tank.

But as compilers got better this became less and less of an issue. Today it is very rare to need to write in assembly. So shouldn't history repeat itself? That JIT compilers will become so good that it's extremely unlikely you'll hit the performance cliff?


I don't think so. Compilers have gotten smart, but so have people. See for example stuff like this[0]. Hot path stuff still sees hand-written assembly even in this day and age because even modern compilers aren't able to provide enough overhead elision guarantees required for the really hardcore optimizations.

What really happened is that thanks to the advances in computing power over the decades, we can now afford to skimp on performance most of the time - putting effort in optimizing only the most critical code paths - and still be able to get away with it.

But when we're talking about squeezing every drop of performance, one of the big challenges with fighting against the JIT engine is figuring out when things deoptimize. This is because there are patterns (e.g. polymorphic code paths) that are inherently impossible to optimize perfectly (because halting problem). As soon as the runtime runs into a "weird" data type, deoptimization is forced to kick in and that can destroy performance. As a developer you have to be extremely mindful of this. Type systems like Typescript help, but you can still be effectively polymorphic (e.g. large union types, `any`, etc) if you're not being mindful of the underlying machine code.
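
A minimal sketch of what "being mindful" means here, assuming V8-style inline caches (none of this is guaranteed behavior):

  function getX(o) { return o.x; }

  // Monomorphic: every object at this call site has the same hidden class.
  for (let i = 0; i < 1e6; i++) getX({ x: i, y: i });

  // Megamorphic: each literal has a different key set, so the inline cache
  // at `o.x` sees many shapes and gives up on specialization.
  getX({ x: 1 });
  getX({ x: 1, a: 2 });
  getX({ x: 1, b: 2 });
  getX({ x: 1, c: 2 });
  getX({ x: 1, d: 2 });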

And JITs have one major downside that these benchmarks often intentionally omit in the name of "reducing noise": they usually take a while to warm up to native-like speed. I'm talking on the order of thousands of iterations running a couple of orders of magnitude slower before JIT speed fully kicks in.
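
You can watch this yourself with a crude harness like the following (numbers are illustrative only; batch times typically drop as the engine tiers the function up):

  function hot(n) { let s = 0; for (let i = 0; i < n; i++) s += i % 7; return s; }

  for (let batch = 0; batch < 10; batch++) {
    const t0 = performance.now();
    for (let i = 0; i < 1000; i++) hot(10000);
    console.log("batch " + batch + ": " + (performance.now() - t0).toFixed(1) + "ms");
  }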

[0] https://github.com/jart/cosmopolitan/blob/master/libc/nexgen...


Compiler optimizers also only optimize the generated code, not the data structure layout, which is usually what matters most.

Also, compilers often can't optimize stuff which is obvious to humans. E.g. here is an example of GCC not being able to delete useless code: <https://godbolt.org/z/655KeM33Y> (it's fixed in later versions of GCC, but this version of GCC also isn't that old). Another example is Clang emitting useless code out of nowhere: <https://youtu.be/R5tBY9Zyw6o?t=1620>. So, it's not out of the ordinary for compilers to be really stupid about the assembly they generate.


JIT compilers like V8 are able to optimize Object hashmaps into structs, but the problem is that they have trouble figuring out how to group arbitrary object literals into the same bucket in order to get monomorphism guarantees (this is a very common pattern among projects that emit large data structures, such as JS parser projects like Babel).

This is partly why virtual DOM projects don't expose the ability to replace a virtual DOM node constructor with inline literals, despite it being a pure deterministic function.
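
A sketch of the grouping problem (assuming V8-style hidden classes):

  // The engine can only share a hidden class between literals whose keys
  // are added in the same order, so structurally "equal" AST-node literals
  // emitted from different code paths may never become monomorphic.
  const n1 = { type: "Identifier", name: "a" };
  const n2 = { name: "b", type: "Identifier" }; // same keys, different order:
                                                // a different hidden class
  function visit(node) { return node.type; }    // this site goes polymorphic
  visit(n1);
  visit(n2);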


That is why top JITs do cache their findings between runs.

In Android's and UWP's case, PGO data is even shared across devices.

And IBM has a JVM commercial feature where a dedicated JIT server gets used and serves the whole cluster nodes with heavily optimized native code.

Naturally JITs coded on long nights tend to lack such engineering.


Your comment was fine until it leveled a needless attack at other JITs.


> Also worth noting that just because an optimization works in V8, there's no guarantee that it will also work in another JS engine.

The essay even covers that, when it mentions that matching arities had a significant impact on V8 (14% improvement) but no perceivable effect on SpiderMonkey.


Likewise just because an inline size was optimal on one architecture doesn't mean it is optimal going forward with larger average cache sizes, etc. A JIT can adapt, while wasm is stuck.



I know we are not supposed to talk about downvoting, but in this case I would really like to know why. The article Steve links to is great and even the GP (top comment) approves. Yet Steve's comment is now dimmed.

I thought HN was supposed to be better.


It isn't dimmed. It contains only a link and visited links are styled to be lighter gray than normal text. You can check this with the inspector: normal comments have the class c00 (color: #000000); dimmed comments have classes like c5a (color: #5a5a5a), c73, etc.


ok, my bad. too late to delete.


Have an upvote from me; sorry that my lack of anything but a link contributed to the confusion here!


thanks Steve!


The dancing required to stay within the happy paths, and the necessity of avoiding certain common abstractions in order to do so, leaves me convinced it must be much easier to write performant high-level code in C, C++, Rust etc than in something like JavaScript or Python. While it's possible to write fast code in the latter, it doesn't seem like much fun.


Browser WASM implementations are also JIT based, and GC will eventually be part of the picture for WASM runtimes as well.


> There's never any guarantee that JS that optimizes perfectly today won't hit a performance cliff in a new JIT implementation tomorrow.

I would say the opposite: there kind of are such guarantees, to JIT development’s detriment. Because JIT developers use the performance of known workloads under current state-of-the-art JITs as a regression-testing baseline. If their JIT underperforms the one it’s aiming to supersede on a known workload, their job isn’t done.

This means that there's little "slack" (https://slatestarcodex.com/2020/05/12/studies-on-slack/) in JIT development: it's unlikely that we'll see a JIT released (from a major team) that's an order of magnitude better at certain tasks, at the expense of being less-than-an-order-of-magnitude worse at others. (Maybe from an individual as code ancillary to a research paper, but never commercialized/operationalized.)


Browsers have replaced their optimizing backends entirely a few times already. They all optimize similar kinds of code, but the details of when and how change all the time. Vendors have deprecated and regressed on whole benchmark suites when they've decided the tests no longer represent what they want to optimize for.

The biggest problem is that JIT is guided by heuristics. Observations from interpreters decide when code is advanced to higher optimization tiers, and that varies between implementations and changes relatively often. Your hottest code is also subject to JIT cache pressure and eviction policy, so you can't even rely on it staying hot across different machines using the exact same browser.


People are downvoting this comment but I think it's a valuable assertion, and the core fact (JS engine developers use current/real-world empirics to make optimization decisions) is indisputably true: https://v8.dev/blog/sparkplug


Well, it's a bit silly to quote SSC (a blog that rewrites basic philosophy so STEMheads can think they invented it) for a definition of "slack".


I'll need someone to step in with a link to the actual post, because I can't think of the right search terms, but he has (of course) written about this in the past. His self-described schtick is to write about existing things in such a way that they reach a new audience, by dint of acting as a translator between some of the mental models that various people have. I'm not sure how much is gained by describing such an approach in the way that you do.

If you'd simply omitted the bracketed clause, perhaps your comment might have been useful!


The only things philosophers do are think and write, and somehow they are ass bad at writing.

SSC wouldn't have a niche if they could get concepts across without a bunch of obscurantist bullshit.


If you're referring to the continentals, I think French people and academics are just like that because their readers aren't impressed by books they can understand.


They're the most egregious but academics in general suffer from the same flaw.

People criticize SSC because they're like "uh leave it to the REAL experts", but A) it's just opinions and B) the 'real experts' are unreadable, so I'll continue to ignore them while occasionally checking out astralcodexten.


I've spent the last decade working on and thinking about JITs, runtime hidden type modeling, and other optimization problems relating to reactive dynamic systems dealing with heavy polymorphism (specifically Javascript as a programming language).

Now, different people in the industry will have different perspectives, so I'll preface this by saying that this is _my_ view on where the future of JIT compilation leads.

The order of magnitude gains were mostly plumbed with the techniques JIT engineers have brought into play over the last decade or so.

One aspect that remains relevant now is responsiveness to varying workloads, nimble reaction to polymorphic code, and finding the right balance between time spent compiling and the productivity of the optimized code that comes out the other end. There is significant work yet to be done in finding the right runtime type models that quickly and effectively distinguish between monomorphism, polymorphism, and megamorphism, and are able to respond appropriately.

The other major avenue for development, and the one that I have yet to see a lot of talk about, is the potential to move many of these techniques and insights out of the domain of language runtime optimization, and into _libraries_, and allowing developers direct API-level access to the optimization strategies that have been developed.

If you work on this stuff, you find very quickly that a huge amount of a JIT VM's backend infrastructure has nothing to do with _compilation_ per se, and much more to do with the support structures that allow for the discovery of fastpaths for operations over "stable" data structures.

In Javascript, the data structure that is of most relevance is a linked list of property-value-maps (javascript objects linked together by proto chains). We use well-known heuristics about the nature of those structures (e.g. "many objects will share the _keyset_ component of the hashtable", and "the linked list of hashtables will organize itself into a tree structure"). Using that information, we factor out common bits of the data-structure "behind the scenes" to optimize object representation and capture shared structural information in hidden types, and then apply techniques (such as inline caches) to optimize on the back of that shared structure.

There's no reason that this sort of approach cannot be applied to _user specified structures_ of different sorts. Different "skeletal shapes" that are highly conserved in programs.

For me, the big promise that these technologies have brought to the fore is the possibility of effectively doing partial specialization of data structures at runtime. Reorganizing data structures under the hood of an implementation to deliver, transparently to the client developer, optimization opportunities that we simply don't even consider as possible today.


Is there anything explicit in JavaScript the language to help with this? New private attributes? Something similar to Python's __slots__?

Basically, why guess structures when the programmer could easily tell you the static bits and ask for them to be frozen?


On the language side you'd need a way to tell the runtime which properties were relevant to extracting a backbone structure. Maybe a "backbone" field on the property descriptor - e.g.:

`Object.defineProperty(obj, "prop", { writable: true, value: ..., backbone: true });`

Then you'd need to generalize the hidden type modeler in the VM to interpret the backbone field and lift "prop" (and the value associated with it) into the hidden type representation. The important thing to remember here would be to ensure that these foldings of things into the hidden type are _transitive_. If the value of `obj.prop` itself has one or more "backbone" fields defined on it, then those too would need to be lifted into the hidden type.

Assignments to these fields would need to generate a new hidden type for the underlying object, much as new shapes are created and installed when JS objects are mutated to add new properties or change the prototype.

You'd then need to implement some mechanism of collapsing the object representations down to some inline format.

Lastly, you'd need to teach the code-generator to use the hidden type to optimize accesses through "obj.prop" (or longer property access sequences).

At the end of all of that, you then get a really beautiful optimization behaviour where:

"obj.prop.anotherBackboneProp.someRegularProp", in suitably monomorphic or lightly-polymorphic locations, becomes not 3 separately-checked property accesses, but a single type-check on "obj", followed by a access into the top-level object.

That would give a rudimentary ability for library and framework developers to define common shared, stable backbone structures for their own use. And that's fun to really think about.

There's a bunch of work to be done, and it has to be motivated by real use cases, but I'm sure they are there. The idea that this powerful technique is only applicable to the _specific_ set of data structures involved in property lookup in object inheritance chains seems an unimaginative perspective.


Javascript JITs are really good, you (likely) aren't going to see major performance improvements by dropping into WASM.

That said, one major benefit of WASM that Javascript jits will have a hard time competing with is GC pressure. So long as your WASM lib focuses on stack allocations, it'll be real tough for a Javascript native algorithm doing the same thing to compete (particularly if there's a bunch of object/state management).

For a hot math loop doing floating point calcs (mandelbrot calc), however, I've seen javascript end up with identical performance compared to WASM. It's really pretty nuts.
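
To illustrate the GC-pressure point in plain JS (a hedged sketch, not a benchmark):

  // Allocating a fresh object per element churns the nursery...
  function lengthsAllocating(points) {
    return points.map(p => ({ len: Math.hypot(p.x, p.y) }));
  }

  // ...while writing into a caller-owned, preallocated Float64Array keeps
  // the loop allocation-free - the closest JS gets to WASM's stack discipline.
  function lengthsPreallocated(xs, ys, out) {
    for (let i = 0; i < xs.length; i++) out[i] = Math.hypot(xs[i], ys[i]);
    return out;
  }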


>aren't going to see major performance improvements by dropping into WASM

I think that might also change as they add proposal and roadmap items like "direct access to the DOM"[1] to WASM.

Proposals: https://github.com/WebAssembly/proposals

Roadmap: https://webassembly.org/roadmap/

[1] https://github.com/WebAssembly/interface-types/blob/master/p...

Edit: Overall, the proposals seem to be pushing WASM closer to being a general purpose VM (VM as in JVM, not KVM).


> Javascript JITs are really good, you (likely) aren't going to see major performance improvements by dropping into WASM.

A javascript JIT is never going to be able to compete with codegen from a low-level statically typed language running through an optimizing compiler. I mean, this very article contains the perfect example: by manually inlining the comparison function they got huge performance gains from a sorting function. That is child’s play for GCC or LLVM (it’s the whole point of std::sort in C++).


I don’t know about the “low-level” part. I have a feeling you’d get just as much of a win from a HLL statically-typed language, like Haskell.

It’s the static-typing, not the low-level-ness, doing most of the heavy lifting in making code JITable/WPOable. You don’t need to manually inline a comparison function, if the JIT knows how to inline the larger class of thing that a comparison function happens to fall into, and if the code is amenable to that particular WPO transformation.

I would compare this to SQL: you don’t optimize a SQL query plan by dropping to some lower level where you directly do the query planner’s job for it. Instead, you just add information to the shape of your query such that it becomes amenable to the optimization the query planner knows how to do. That will almost always get you 100% of the wins it’s possible to get on the given query engine anyway, such that there’d be nothing to gain by writing a query plan yourself.


So, you'd think static typing was a major win but it actually isn't (surprisingly) for a JITed language. Most of the benefits of statically typed languages comes from memory layout optimizations. However, that sort of layout optimization is something that every fast JITed language ends up doing.

This is why, for example, javascript in Graal ends up running nearly as fast as Java in the same VM. The reason it isn't just as fast is the VM has to insert constraint checks to deopt when an assumption about the type shape is violated.

https://www.graalvm.org/javascript/


Not really; Haskell is slower than C++/Rust. You get faster performance by being mechanically sympathetic, i.e. you care about how to lay out your data in memory efficiently for the CPU to process, when and how much to allocate, and which code paths to fold together (inlining) to create the smallest possible set of instructions.

It's theoretically possible for a JIT to figure out all that and transform your code into an optimized form, but practically? We don't have a Sufficiently Smart Compiler[1] yet. Usually, a JIT is worse at restructuring your data layout than at figuring out which parts to inline.

[1] https://wiki.c2.com/?SufficientlySmartCompiler


There are certainly nasty edges to javascript programming and I'm not trying to say that it will always be near the same performance as a WASM implementation or a statically compiled binary.

What I'm saying is that you'll find more often than not that it gets close enough to not matter. You'll see javascript at or below 2->3x the runtime of a C++ or Rust statically compiled implementation in most cases. That is pretty close as far as languages go. Java is right in the same range (maybe a little lower).


Isn't Java's JIT better than C++'s static optimizations because runtime data is also very important for optimizations? Also, inlining of comparison functions is usually already done in V8...


Depends. Java's JIT can handle cases of dynamic dispatch better than C++ because of runtime information. However, C++ has the advantage of time. It can do more heavy optimizations because the time budget to do such opts is much longer than what the JIT has available to it. C++ compilers can be much more aggressive about things like inlining.

That said, yes, the runtime information provides really valuable information. That's why PGO is a thing for C++.


"Javascript JITs are really good, you (likely) aren't going to see major performance improvements by dropping into WASM."

If that were true, nobody would be working on WASM because there would be no point.

"For a hot math loop doing floating point calcs (mandelbrot calc), however, I've seen javascript end up with identical performance compared to WASM. It's really pretty nuts. "

That's easy mode for a JIT. If it can't do that it's not even worth being called a JIT. That's not a criticism or a snark, it's a description of the problem space.

The problem is people then assume that "tight loops of numeric computation" performance can be translated to "general purpose computing performance", and it can't. I have not seen performance numbers that suggest that Javascript on general computation is at anything like C speed or anything similar. I see performance being more rule-of-thumb'd at 10x slower than C or comparable compiled languages. Now, that's pretty good for a dynamic scripting language, which with a naive-but-optimized interpreter tends to clock in at around 40-50x slower than C. The competition from the other JITs for dynamic scripting languages mostly haven't done as well (with the constant exception of LuaJIT). But JIT performance across the board, including Javascript, seems to have plateaued at much, much less than "same as compiled" performance and I see no reason to believe that's going to change.


> If that were true, nobody would be working on WASM because there would be no point.

Wasm has much more potential than just optimization. It opens up capabilities like:

* Using other languages besides Javascript, or languages that just compile to JS

* Passing programs around vs passing data around

* Providing a VM isolate abstraction that's embedable in your existing language/ runtime


If WASM wasn't faster than Javascript, nobody would care. They would just compile things to the "as fast as C" Javascript, and it would have been done years ago.

We need to redo all this work for WASM precisely because there is no such thing in general as "as fast as C Javascript". Javascript itself is not a suitable target for all those things. A 10x performance haircut off the top, in addition to what it costs to get to even performance that good (JITs tend to eat RAM like candy to get their performance increases on dynamic languages), just wasn't an acceptable base for the things you're talking about.


I'm using wasm and I'm not even targeting the web. JS is irrelevant to me.


> * Using other languages besides Javascript, or languages that just compile to JS

This might be the most underappreciated comment I have seen all day. While Typescript did some wrangling to bring a bit of "normalcy" to Javascript, the consistency that C++/Rust bring from a coding-language point of view is (for me) enough to have gotten me interested in the use of WASM. Maybe I'm just lazy or have a strong dislike of the headache I get from learning JS compared to switching over to other languages.


WASM however is quite high level, restricted bytecode and stack based. The baseline performance gain over JS is maybe 30% to 10x. Not the 50-100x you'd expect.

It’s also new. Maybe there is potential here for WASM to get faster.


WASM couldn't be a 50-100x improvement over (optimized) JS because that would make it something like 10x faster than C. C isn't necessarily the absolute upper limit, it does have a couple of systemic flaws in terms of going absolutely the fastest (aliasing issues), but there certainly isn't that much room for improvement in it.

WASM will probably get faster, but I don't necessarily expect Rust-generated WASM or other static-language-generated WASM to get that much faster. I think most of it will come from JIT'ing stuff that will mostly affect dynamic-language-generated WASM. That said, I do think Javascript's JIT is going to be fairly close to the limit of what you can get from WASM, because it is implausible that you could "just" compile the JS to WASM and then use the WASM optimizer to do better than the Javascript JIT. The Javascript JIT has everything the WASM optimizer would have, and more, since there would inevitably be loss in the translation.


> Javascript JITs are really good, you (likely) aren't going to see major performance improvements by dropping into WASM.

The original essay shows exactly that: the one here needed pretty epic optimisations (and an excellent knowledge of the platform) to get to "naive" WASM performance.


For math calculations amenable to SIMD, browser vendors have already designed in a huge advantage for WebAssembly: plans for SIMD support in JS were dropped, but kept for WebAssembly. At the start this gives up to a 4x advantage to WebAssembly with the basic 128-bit SIMD, but possibly 16x later if/when WebAssembly gets support for vector widths corresponding to today's hardware.

(anyone know why they started with narrow SIMD? surely it's easier to emulate 512-bit simd with 128-bit simd instructions than try to compile 128-bit simd to take advantage of 512-bit simd?)


https://blog.feather.systems/jekyll/update/2021/06/21/WasmPe...

We did that! Yeah, Chrome basically keeps right up, of course not accounting for SIMD.


"Javascript JITs are really good, you (likely) aren't going to see major performance improvements by dropping into WASM."

WASM may not significantly outperform the JIT of one particular browser on a given scenario but you are more likely to get homogeneous performance across different browsers.


This nicely shows why Rust/WASM speeds things up.

All that detective work? It is UNDOING what JavaScript does, idiomatically!

It can be argued that JS is "fast" after all that, but instead, it shows that Rust gives you this for free, because it is idiomatic there.

The problem with JS and other mis-designed languages is that easy things are easy, but adding complexity is even easier. And the language does NOT HELP if you want to reduce it.

In contrast, in Rust (thanks mostly to the ML-like type system), the code YELLS the more complex it becomes. And Rust helps you simplify things.

That is IMHO the major advantage of a static type system, enriched in the ML family: the types are not there, per se, as a performance trick, but to guide how you model the app; then you can model performance!

P.S: In other words? If JS were like Rust (i.e. ML-like, plus more use of vectors!) it would be far easier to a) not get into a bad performance spot in the first place, and b) simplify things with type modeling!


I wouldn’t call JS mis-designed. Most use of JS in the world is still for simple things, e.g. validating form inputs to put an X or checkmark beside each field. It’s important for those cases that in JS “easy things are easy”, even at the expense of developer productivity for complex apps. JavaScript (the syntax+semantics) was never intended for complex apps.

It would be better if we had JS and another language that were both usable in browsers to interact with the DOM, where this other language was more low-level. And this was the original plan — the “other language” was originally going to be Java. (Java applets were originally capable of manipulating the DOM, through the same API surface as JavaScript!) Then people stopped liking Java applets, so it moved to thinking about “pluggable” language runtimes, loadable through either NPAPI (Netscape/Firefox) or ActiveX (IE), enabling <script type=“foo”> for arbitrary values of foo. This effort died too, both because those plugin systems are security nightmares (PPAPI came too late), and because browsers just didn’t seem willing to standardize on a plugin ecosystem in a way that would allow websites to declare a plugin once that enables the same functionality on all past, present, and future browsers, the way JS does.

Eventually, we acknowledged that all browsers had already implemented JavaScript engines (incl. legacy browsers on feature-phones) and so it would be basically impossible to achieve the same reach with a latecomer. So we switched to the strategy of making browsers’ JavaScript engines work well when you use in-band signalling to (effectively) program them in another language; and we called the result WASM.

This isn’t the cleanest strategy. What’s great about it, though, is that (the text format of) WASM will load and run in any browser that supports JavaScript itself.


> And this was the original plan — the “other language” was ...

In short? Mis-designed. JS was for one use case, and has become used more and more for other use cases (even backend!). This is also the problem with HTML, CSS, and the DOM.

Plus, some WATs are part of rushing it, and others come from the mismatch in use cases.


I think the fact that it features, very prominently, an OO model (prototypal) that's pretty firmly in "please, never actually use any of the notable features that differentiate this model from others" territory is enough to fairly label it mis-designed, however far it's come. It's no accident that prototypal-heavy JS is damn near at the bottom of paradigms for approaching JS programming, beneath a bunch of others, OO or otherwise, that brush right past it, pretending it's not there.

I'd point to not making async calls synchronous-to-the-caller by default as another pretty bad design mistake. The way so many JS files start nearly every line (correctly! This isn't even counting mis-use of the feature, which is also widespread!) with `await` is evidence of this, and so's all the earlier thrashing before we had `await` to deal semi-sanely with this bad design decision.

The original scoping system was just bad. We have better tools to make it suck less now, but it was designed wrong originally.


I’ve thought about what JS would look like if instead of await, the language would await by default and expressions would be explicitly deferable via some other keyword (like “defer”).


Yes, Javascript can be made surprisingly fast, but it requires very detailed knowledge of the JS engine internals and may require giving up most high-level features that define the language (as demonstrated by asm.js). And the resulting carefully optimized JS code is often less readable / maintainable than relatively straightforward C code doing the same thing compiled to WASM.


When I started developing my JS 2D canvas library (back in 2013) I worked on a desktop PC and used Firefox for most of my code testing work. The library always ran faster on Firefox than on IE or Chrome, which at the time I assumed was probably something to do with Firefox putting extra effort into the canvas part of their engine.

Then in 2019 I rewrote the whole library from scratch (again), this time working on a MacBook using Chrome for most of my develop/test stuff. The library is now much faster on Chrome! I assume what happened was that I was constantly looking for ways to improve the code for speed (in canvas-world, if everything is not completing in under 16ms then you must own the humiliation) which meant that prior to 2019 I was subconsciously optimising for FF; after 2019 for Chrome.

> it requires very detailed knowledge of the JS engine internals and may require giving up most high-level features that define the language

I can't claim that I have that knowledge. Most of my optimisations have been basic JS 101 stuff - object pools (eg for canvas, vectors, etc) to minimise the stuff sent to garbage, doing the bulk of the work in non-DOM canvases, minimising prototype chain lookups as much as possible, etc. When I tried to add web workers to the mix I ended up slowing everything down!
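
For the curious, a minimal object-pool sketch of the kind I mean (names made up for illustration):

  const vecPool = [];
  const acquireVec = () => vecPool.pop() || { x: 0, y: 0 };
  const releaseVec = (v) => { v.x = 0; v.y = 0; vecPool.push(v); };

  // Per-frame use: once the pool is warm, nothing new is allocated,
  // so nothing gets sent to garbage.
  const v = acquireVec();
  v.x = 10; v.y = 20;
  // ... use v ...
  releaseVec(v);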

There is stuff in my library that I do want to port over to WASM, but my experiments in that direction so far have been less than successful - I need to learn a lot more WASM/WebAssembly/Rust before I make progress, but the learning is not as much fun as I had hoped it would be.

> Javascript can be made surprisingly fast

The speeds that can be achieved by browser JS engines still astonish me! For instance, shape and animate an image between two curved paths in real time in a 2D (not WebGL!) canvas:

- CodePen - https://codepen.io/kaliedarik/pen/ExyZKbY

- Demo with additional controls - https://scrawl-v8.rikweb.org.uk/demo/canvas-024.html


Yeah, it strikes me as an odd choice that V8 doesn't just use C code for something as central and important as sorting, but maybe there's some technical reason.


There's probably some overhead for switching between a C++ implementation and user-defined JS, and the sorter would need to do that a lot since most sort calls come with the compare-callback.
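
A sketch of the two shapes of code being compared (whether the sort loop is really native C++ varies by engine and era, so treat this as illustrative):

  const arr = [{ key: 3 }, { key: 1 }, { key: 2 }];

  // Callback form: the runtime's sort loop calls back into user JS for
  // every single comparison.
  arr.sort((a, b) => a.key - b.key);

  // Comparison inlined into a pure-JS sort: everything stays in
  // JIT-compiled code (essentially the manual inlining the article does).
  function insertionSortByKey(a) {
    for (let i = 1; i < a.length; i++) {
      const v = a[i];
      let j = i - 1;
      while (j >= 0 && a[j].key > v.key) { a[j + 1] = a[j]; j--; }
      a[j + 1] = v;
    }
    return a;
  }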


My understanding comes from Erlang’s HiPE optimizer (which was AoT rather than a JIT), but I think you’ve nailed it. Most JITs can’t optimize across a native↔interpreted boundary, any more than compilers can optimize across a dynamic-linkage (dlopen(2)) boundary. When all the code is interpreted, you get something that’s a lot more WPO-like.

IIRC, some very clever JITs do a waltz here:

1. replace the user’s “syscall” into the runtime, with emitted bytecode;

2. WPO the user’s bytecode together with the generated bytecode;

3. Find any pattern that still resembles the post-optimization form of part of the bytecode equivalent of the call into the runtime, and replace it back with a “syscall” to the runtime, for doing that partial step.

Maybe clearer with an example. Imagine there’s a platform-native function sha256(byte[]). The JIT would:

1. Replace the call to sha256(byte[]) with a bytecode loop over the buffer, plus a bunch of logic to actually do a sha256 to a fixed-size register;

2. WPO the code with that bytecode loop embedded;

3. Replace the core of the remnants of the sha256-to-a-fixed-size-register code, with a call to a platform-native sha256(register) function.


> maybe there's some technical reason

Native code is opaque to the JIT, which means no inlining through native code (inlining being the driver for lots of optimisations) and no specialisation. This means if you have a JIT native code is fine for "leaf" functions but not great for intermediate ones, as it hampers everything.

When ES6 compatibility was first released, the built-in array methods were orders of magnitude slower than hand-rolling pure JS versions.

This issue is one of the things Graal/Truffle attempts to fix, by moving the native code inside the JIT.
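
A hedged sketch of the "leaf vs intermediate" point from that era:

  // A hand-rolled loop gives the JIT one JS-only function to inline and
  // specialize end to end...
  function mapSquare(xs) {
    const out = new Array(xs.length);
    for (let i = 0; i < xs.length; i++) out[i] = xs[i] * xs[i];
    return out;
  }
  // ...whereas xs.map(x => x * x) historically crossed into the engine's
  // built-in implementation, which acted as an inlining barrier.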


For some functions, the speed-up from inlining and run-time specialization via JIT outweighs the benefit of native code speed.

There's similarly a cost in crossing JS<>WASM boundary, so WASM doesn't help for speeding up small functions and can't make DOM-heavy code faster.


I disagree.

JS optimizations are implementation defined. They are also largely undocumented. You will have to read source code if you want to know the truth, or write microbenchmarks to probe the VM behavior.

Your optimization from today can be deoptimized tomorrow. There are no documented guarantees that certain code will remain fast.

Knowing about inline caching, shapes, smis, deoptimizations and trampolines... does help. But the internals are unintuitive.

Instead, I can save myself that time and use WASM instead.

There are so many things that can trigger a deoptimization that I would rather ignore them all and do it in WASM instead.


Is there a 2021 update to this topic? How has the state of WASM changed? How have JS engines changed? Is it still a good idea to make these changes to your JS code?


https://blog.feather.systems/jekyll/update/2021/06/21/WasmPe...

Here's some benchmarks. But in summary, for small simple hot loops, chrome is just as fast using js, but for firefox wasm gives big performance improvements.

Of course, if you can use SIMD instructions, then wasm will win, but that's less fair.


It is shocking how fast javascript is.

Hyperscript is an interpreted programming language written on top of javascript with an insane runtime that resolves promises at the expression level to implement async transparency.

Head to the playground and select the "Drag" example:

https://hyperscript.org/playground/

And drag that div around. That whole thing is happening in an interpreted event-driven loop:

  repeat until event pointerup from document
    wait for pointermove(pageX, pageY) or
             pointerup(pageX, pageY) from document
    add { left: `${pageX - xoff}`, top: `${pageY - yoff}` }
  end
The fact that the performance of this demo isn't absolutely CPU-meltingly bad is dramatic testament to how smoking fast javascript is.


I cannot tell if you're being ironic or not. Being able to drag a static text box around is not a very impressive feat. And more importantly, it is CPU-meltingly bad: when I drag the box around, my Firefox process jumps from 5% to a steady 35% (on a Ryzen 2600).


You're... dragging an empty div around. You know what else doesn't choke? Me dragging absolutely any window on my PC. It will happily do so at 120Hz without chugging. It even does crazy things, such as playing games with complex event loops at 120FPS.

The fact that you're excited at how dragging a single div around actually works as expected is a testament to how horrifyingly low your expectations are.


For JS it is the same - moving an empty div or moving a whole window - it is still the same amount of code, and the actual moving is always on different threads. It works for native apps too.


Doesn't work on desktop Safari. Div "rises" on-click and the cursor changes as (evidently) intended, but the div cannot be dragged.


Discussed at the time:

Maybe you don't need Rust and WASM to speed up your JS - https://news.ycombinator.com/item?id=16413917 - Feb 2018 (181 comments)


(https://blog.feather.systems/jekyll/update/2021/06/21/WasmPe...) OP was very useful for us in optimizing our js vs wasm benchmarks. We were wondering what a very simple parallel problem like mandelbrot rendering would show about browser jits vs their wasm compilers.

Our conclusion was that wasm was consistent across browsers whereas js wasn't. Further, if you can use SIMD, wasm is faster. Also, v8 is way faster than spidermonkey.


Your link gives a 404, I think there is a problem with the path in the URL.

And since you are talking about the fastest mandelbrot on the web, here is my contribution https://mandelbrot.ophir.dev

It renders in real time and is fully interactive, all in pure js.


Does it still 404? Sorry, I made some mistake with it. There's a link in the post to an interactive demo.

The reason I said fastest is cause I wrote the hot loop with SIMD instructions. I'll be adding in period checking and stuff in a couple months' time, and will be sure to refer to your work then, thanks.



Regarding caching and memoisation: isn't the main benefit memory usage savings? I wonder if it's possible to do parsing for speed and then background deduplication for memory savings. (I don't know what the status of multithreading is in js or wasm.)


Huh? Caching is explicitly about extracting time savings at the cost of increased memory usage. Did I read your comment wrong?


> Maybe you don’t need Rust and WASM to speed up your JS

Perhaps, but you do need it in order to have an excuse to use Rust and WASM. True story, this is what I did last weekend.


That's the problem. Unless you're doing something which does a very large amount of work on the client, you should not need WASM. Game engine, yes. Blog, no. Shopping cart, no. Scrollbar, no. Snooping on the user's mouse cursor, NO.


Absolutely. Incidentally, I’m very new to game programming (and performance-critical code in general). Are there high-level GC’d languages which are well suited for (pick some flavor of) game programming? I always read about C++ or (increasingly) Rust when it comes to game programming. Even with all its modern features, Rust still errs on the zero cost abstraction side when it comes to abstracting away the underlying memory and execution model. In 2021, are compilers, JITs, and GC algorithms still worse than humans at optimizing performance critical code? I thought we’d have moved past that by now.


Unity is mostly programmed in C#. It's a good way to start developing games. Unreal Engine and C++ are used for many AAA titles, but it's a harder world than Unity.

There are some game engines in Rust for simpler 3D games. I haven't tried them. 2D games seem to be well supported in Rust. I'm working in Rust, writing a client for a virtual world, and I have to say that the 3D graphics ecosystem for that kind of thing in Rust is not quite there yet. I'm using bleeding edge packages (Rend3->WGPU->Vulkan) where I'm in regular contact with the developers. I'm doing it this way mostly because I want to see if I can get better performance than the existing single-thread C++ client, which is compute-bound in the main thread.


Depends on what game you are looking to develop. There will always be an area where only low-level code will do, but for a great chunk of games, a modern high level language will be more than adequate. While it is not used as often as it should in my opinion, Java also has LWJGL, and with the newish low-latency GC, there won’t be any GC freeze happening.


You need wasm to get any good performance on data intensive applications, because SIMD.


Why would someone think that the purpose of WASM is to speed up JavaScript?


I read that sentence as replace the word "JavaScript" with "Frontend" and it made more sense.

Generally though, I think wasm represents an opportunity to break the strangle-hold JavaScript has had on the frontend ecosystem for nearly 3 decades. Typed, compiled languages typically have huge advantages in terms of safety and performance over dynamically typed interpreted languages. And now a lot of high-level features and syntactic sugar that used to be only available in dynamic languages is becoming available in systems languages.


SIMD is locked away behind WebAssembly, for many tasks you'll be giving up many multiples of available performance if you stick to JS.



