Firefox Nightly: WebAssembly.instantiate took 227.6 ms (54.4 MB/s)
Chrome Canary: WebAssembly.instantiate took 8576 ms (1.4 MB/s)
(Edit: And I believe that's not even using the streaming compilation mentioned in the article; it's just the new baseline compiler in action.)
That's correct. Streaming compilation would finish earlier, but might actually benchmark more slowly because you'd be adding in the time that the compiler is idle and waiting for the network to catch up.
Preloading the .wasm file in the test lets us measure just the speed of the compiler, independent of the network.
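For anyone who wants to reproduce that measurement, the harness boils down to something like this (a minimal sketch inside an async function; the file name is made up, and I'm assuming a module with no imports):

  // Fetch the whole .wasm into memory first, so the network is out of the picture.
  const bytes = await (await fetch('tanks.wasm')).arrayBuffer();

  // Time compilation + instantiation alone.
  const t0 = performance.now();
  await WebAssembly.instantiate(bytes);
  const ms = performance.now() - t0;

  const mbps = (bytes.byteLength / (1024 * 1024)) / (ms / 1000);
  console.log(`WebAssembly.instantiate took ${ms.toFixed(1)} ms (${mbps.toFixed(1)} MB/s)`);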
Even though I'd prefer to use Firefox, I tend to stick with Safari due to the battery-life advantage, which really shows when you open a lot of tabs.
I believe I could summarize things by saying the only way you can really save energy* doing the same work+ is by using a different semiconductor process (either power/leakage-reduction-focused or smaller).
* For serious values of "energy"
+ Where "the same work" does not always hold for a given task, if one optimizes the algorithm
P = V * I
I = V / R
P = V * (V / R)
= V^2 / R
That's not the primary cause of the power = frequency^2 rule, but it actually adds a factor on top of it.
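For reference, the standard first-order model for CMOS switching power is

  P ≈ C * V^2 * f

and the supply voltage you need rises roughly with the target frequency, so power scales superlinearly with frequency; the V^2 / R above is the same quadratic voltage dependence, seen through a resistive load.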
Chrome: WebAssembly.instantiate took 12935.5 ms (1 MB/s)
Firefox Nightly: WebAssembly.instantiate took 1223.1 ms (10.1 MB/s)
Yikes, an order of magnitude of difference.
FF: WebAssembly.instantiate took 280.3 ms (44.2 MB/s)
Chrome: WebAssembly.instantiate took 3022.4 ms (4.1 MB/s)
Chrome: WebAssembly.instantiate took 13692.8 ms (0.9 MB/s)
Firefox: WebAssembly.instantiate took 330.8 ms (37.4 MB/s)
This is on macOS 10.13.2. I'd love to run Firefox, but the battery savings and reduced heat from using Safari make it too hard to pass up in this regard.
Firefox Nightly: WebAssembly.instantiate took 158.2 ms (78.3 MB/s)
Edge 41.16299.15.0: WebAssembly.instantiate took 99.2 ms (124.8 MB/s)
I did not expect Edge to be even faster.
Nightly: WebAssembly.instantiate took 1825.3 ms (6.8 MB/s)
Mine was a laptop in low power mode. Yours was much faster all around.
Firefox 57: 5053.8 ms (2.4 MB/s)
Firefox Nightly 59: 454.6 ms (27.2 MB/s)
Chrome 63: 9034.9 ms (1.4 MB/s)
Wow, it's over 10x faster...
The repo itself is at https://github.com/lukewagner/test-tanks-compile-time
In Safari 11.0.2 I get: WebAssembly.instantiate took 2885.9 ms (4.3 MB/s)
In Vivaldi 1.13.1008.40 I get: WebAssembly.instantiate took 7719 ms (1.6 MB/s)
How far we've come. A whirlwind tour of today's JITs (apologies for the million links):
.NET Core seems not to use tiered compilation. It never interprets the IR; everything is run through the same JIT compiler. https://github.com/dotnet/coreclr/issues/4331
HotSpot uses three tiers these days (counting direct interpretation as a tier) - https://docs.oracle.com/javase/8/docs/technotes/guides/vm/pe...
Edge's Chakra engine has two - https://blogs.msdn.microsoft.com/ie/2014/10/09/announcing-ke...
V8 seems to use two - https://v8project.blogspot.co.uk/2017/05/launching-ignition-...
Firefox's SpiderMonkey JS engine uses two - https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Sp...
The only downside I'm aware of is that it increases the pressure on the code cache. If your code cache is not large enough, it will thrash as methods are discarded and then recompiled. We had significant performance problems with a server, and it took quite a while until we realized that was the cause. A cache of 256 MB was more than enough for us running a 2-million-LOC monolith under Tomcat, so the absolute memory use isn't that significant. (Reference we found while researching: http://engineering.indeedblog.com/blog/2016/09/job-search-we...).
Once you know this is an issue, it's easy to monitor, but it is one more thing that can go wrong in the JVM.
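For reference, the HotSpot knob is -XX:ReservedCodeCacheSize (e.g. -XX:ReservedCodeCacheSize=256m), and -XX:+PrintCodeCache will dump usage at VM exit; the code cache also shows up as a memory pool in JMX/jconsole, which makes it easy to graph once you know to look.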
Only a couple years later did they re-attempt it and stick with it.
I could be mistaken here, and I wasn't able to find anything online to support me.
The only variants of .NET with interpreter support were from 3rd party implementations, and the .NET Micro Framework, used in NETduino.
And now their focus seems to be to improve their AOT story.
Another interesting evolution was Android's: from Dalvik and its basic JIT, to ART with AOT at installation time, to the ART reboot with an interpreter written in assembly, followed by a JIT and an AOT code cache with PGO.
Android optimizes for battery life, but it's also worth noting that Dalvik's JIT was really rudimentary, getting none of the benefits of JIT compilation and all of the drawbacks, so ART with AOT was a good upgrade.
But tiered compilation is in a different league, being about speculating what's going to happen depending on what the process has witnessed thus far. The point of tiered compilation is to profile and guard things at runtime and recompile pieces of code based on changing conditions; that's how you can optimize virtual call sites and other dynamic constructs. You can't do that ahead of time, because the missing piece is the deoptimizer, which can revert optimizations whose guarding conditions have been invalidated.
It's really interesting, actually, because you can profile a C++ app and use that profile to optimize your AOT compilation, but the compiler is still limited to the things it can prove ahead of time, since otherwise it would be memory-unsafe.
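To make the speculation point concrete, a toy JS example (illustrative only):

  function totalArea(shapes) {
    let sum = 0;
    for (const s of shapes) {
      sum += s.area(); // polymorphic call site
    }
    return sum;
  }
  // While only Circles have flowed through, a tiered JIT can inline
  // Circle.prototype.area here behind a cheap hidden-class check.
  // The first time a Square arrives, the guard fails, the code
  // deoptimizes back to a generic call, and a more general version
  // gets compiled. An AOT compiler has no deoptimizer to fall back
  // on, so it has to stay with whatever it could prove up front.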
Should have been more explicit, as I was referring to CoreRT and .NET Native.
> But tiered compilation is in a different league, being about speculating what's going to happen depending on what the process has witnessed thus far.
Just as ART was refactored on Android 7 and 8. ART with pure AOT is only for Android 5 and 6.
Tiered is specifically that you have a fast compiler and a slow compiler (or further tiers). Speculative is as you describe.
Still counts as JIT in my book, but you're right that it's a bit subtle.
Unix-style configure/build/install isn't considered JIT.
Installing a .NET application is pretty similar, but we don't consider it JIT.
In the usual .NET model, what's distributed is IR rather than source code. Compilation to native code happens at install time. The build-and-install process is less explicit than the Unix way, and it's less error-prone (fewer dependency issues and fewer issues with the compiler not liking your source code).
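(On the desktop .NET Framework this is literally what NGen does: something like `ngen.exe install MyAssembly.dll`, with an illustrative assembly name, precompiles the distributed IL to native code on the target machine.)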
Really it's a very similar model to the Unix one, but we call one JIT and not the other.
Oracle Java, of course, only ever compiles to native code at runtime, and never caches native code. 'Proper' JIT. (This may be set to change in the near future though.)
Interestingly, .NET seems to be moving in the direction of full static compilation, or they wouldn't be asking devs to rebuild UWP apps to incorporate framework fixes - https://aka.ms/sqfj4h/
Various research OSs are JIT-based, of course. It looks like JX (a Java operating system) caches its native code, so it's not 'pure JIT' https://github.com/mczero80/jx/blob/5fbeae79/libs/compiler_e...
It looks like Cosmos (a C# operating system) does the same https://en.wikipedia.org/wiki/IL2CPU
I think it would need to be integrated into the package management system pretty tightly (or have one of its own) to get all of the shared library dependencies.
In that case the only real JIT I know of is basic-block-versioning. I think almost all JITs will compile branches or methods to some extent before they are actually needed.
Yours is probably not a reasonable definition, then. I think a JIT is just a compiler that can compile as the program is running.
I mean, that's more or less what the name "just-in-time compiler" implies. I'm aware that the name is not necessarily a precise definition, but I'm not sure how far the definition stretches. Does JIT have a precise agreed-upon definition, or is it somewhat more vaguely defined?
Whether that lazy-compilation strategy is fine-grained or not isn't clearcut, I believe. I think if you distribute a C program with a bash bootstrapper calling plain old gcc to compile and run the C code only when needed, even gcc might be considered a (coarse-grained, rather rudimentary) JIT in that context.
If it means compile on start, it still requires the compiler to be used at load time.
Non-JIT would mean you can distribute the code without the compiler. If you can't do that, it's JITted (or interpreted, if instead of requiring a compiler to be present you require an interpreter).
- libraries written in another language, such as SQL.js
- hot spots of an application that can benefit from fast number crunching (e.g., gaming, visualization)
- truly cross-platform at native performance
I don't think anyone serious enough to use WASM in their application is assuming that wasm will make all their stuff faster. It won't. It's just another performance tool, with its benefits subject to the usual performance methodologies.
Sidebar: Google Docs is an interesting application from this perspective, given that they render the entire application in a canvas, and the application itself is probably not written in JS. I'm excited to see what the future holds for tools like Google Docs.
What I profoundly dislike is that such good articles about wasm, written by excellent technical people, all silently ignore that. I am absolutely certain that the authors know all about it, but they don't mention that to their audience, which, for the majority, doesn't know. Therefore, that very silence (not just glossing over, but actual silence) brings misinformation to the masses.
Why would we possibly return to a manual-memory-management, raw-pointer-oriented, assembly-language level of abstraction from the much richer and safer abstraction that JS already has? Wasm doesn't even have any notion of characters or strings! Do you really want to return to the days of each project having its own string library, because it's all built on top of raw asm?
Webasm is not for JS-style code! You can't have "a little pre-compilation of JS to WebASM"; that makes zero sense. We already have the incredibly complex JIT compilation of JS to x64/ARM/etc., which necessarily interacts with the garbage collector, type system, permissions and security, and browser debugging/profiling tools, none of which wasm has any notion of.
Just wait until the JVM and Flash Runtime are ported to wasm. Downloaded and compiled on every page load :).
Why that, instead of caching common components like any other web asset, or even having the browser act as a dependency manager?
There are bound to better solutions than "download and compile on every page load."
If C++-to-WASM via clang catches on, I think this is exactly what will happen.
The DOM will die as soon as the industry moves to one or two good GUI toolkits that run under WebAssembly and are way faster to use than the cumbersome present combination of HTML + CSS + CSS preprocessor + JS libs.
Mark my words.
Everyone thinks that the rendering engines in browsers are easy to beat in terms of performance. I thought that too, until I implemented one. They are definitely beatable, but not easily, and certainly not with an architecture like that of Qt or GTK.
E.g., I don't think any sane design of a UI toolkit would include the ability to read and modify the string representation of the UI code at runtime - yet it's a critical feature for the DOM.
Likewise, you wouldn't necessarily need the ability to access and mutate arbitrary nodes of the document tree at any time. (including mutations that might change which CSS selectors apply to a node)
E.g., you could only expose higher-level widgets instead or only expose variables that feed into a template. That would allow optimisations which aren't possible with CSS and DOM.
Finally, a WASM toolkit would be shipped with a particular website anyway, so it wouldn't need to be general-purpose.
On the other hand, there is a great incentive for website operators to make their site into a single unparseable blob: ad-blockers. If every site had its own internal data representation and internal rendering engine, that would make it almost impossible for ad-blockers to modify certain parts of the site while leaving others intact.
Those can largely be avoided, and they typically don't cause global performance impacts.
> E.g., I don't think any sane design of a UI toolkit would include the ability to read and modify the string representation of the UI code at runtime - yet it's a critical feature for the DOM.
That isn't a problem. innerHTML is lazily computed from the tree structure: if you don't use it, you don't pay for it.
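Concretely, for some element el:

  el.appendChild(document.createElement('span')); // mutates the tree; nothing is serialized
  const html = el.innerHTML; // only this getter walks the subtree and builds the string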
> Likewise, you wouldn't necessarily need the ability to access and mutate arbitrary nodes of the document tree at any time. (including mutations that might change which CSS selectors apply to a node) E.g., you could only expose higher-level widgets instead or only expose variables that feed into a template.
The main benefit of this would be to eliminate restyling, but cascading is really useful from a design point of view. That's why we've seen native frameworks such as Qt and GTK+ move to style sheets. And if you reinvent restyling, it'll be a ton of work to do better—remember that Servo and Firefox Quantum have a parallel work-stealing implementation of it. I've never seen any native toolkit that even comes close to that amount of performance effort.
I'm not paying for it; the DOM implementation is, with increased complexity. (E.g., HTML parsing suddenly becomes a time-critical operation because some wiseguy decided to implement animations for his website using setTimeout and innerHTML.)
And they can't drop it because a lot of sites rely on it - however, if you wrote a new, limited-purpose renderer on top of WASM, you could decide to drop it and simplify the implementation without losing much utility.
> And if you reinvent restyling, it'll be a ton of work to do better
But that's kind of my point - if you can control which parts of the tree are exposed and which mutations are valid, you might not need to implement restyling at all. (Or in reduced scope)
I'm not talking about cascading in general, but about how you can make arbitrary changes to the DOM after initial load, which the restyler has to fully support.
We're talking about performance here, not implementation complexity. Besides, it's not a win in terms of complexity if sites ship a limited subset of the Web stack to run on top of the full implementation of the Web stack that's already there.
> But that's kind of my point - if you can control which parts of the tree are exposed and which mutations are valid, you might not need to implement restyling at all. (Or in reduced scope)
Sure, you can improve performance by removing useful features. But I think it'll be a hard sell to front-end developers. Qt and GTK+ didn't add style sheets and restyling for no reason. They added those features because developers demanded them.
My point is that writing custom UI renderers using canvas and WASM might become a reasonable thing to do. For that you don't need to stick to the web stack at all, you can invent whatever language, API and data model fits your needs. Those can be a lot simpler than the DOM and therefore easier to implement with good performance.
Please correct me (you know a lot, and I'm betting some of my assumptions are wrong).
I'm looking into building an extension to build Quantum Display Lists from a WASM vdom.
Last time I checked C/C++-based UI libraries, even text selection was a problem. If there were a cross-platform way to build UIs as good and feature-rich as a modern browser offers now, then it would slowly die.
That's the reason we have so many Electron based apps, because it makes UI building really simple.
What I think of GUI toolkits, I think of lots of imperative code to build out an interface, e.g., "Create a window. Add a vertical box layout. Create button1. Change button1.font to xxx. Change button1.style to bold. Set the minimum height of button1 to 20px. Add button1 to the box. Create button2. Add button2 to the box. Tell the box to grow button2 when it is resized. Create button 3..."
The declarative style of HTML/CSS seems so much better. The grouping of elements becomes apparent just by looking at how they are nested, with no need to keep track of what gets added to what. And CSS gives you a really rich ability to select groups of elements, style them, try out new styles, reuse styles across pages, and so on.
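In DOM terms the contrast looks roughly like this (a rough sketch, not a real toolkit API):

  // Imperative, toolkit-style construction:
  const box = document.createElement('div');
  box.className = 'box';
  for (const label of ['One', 'Two', 'Three']) {
    const button = document.createElement('button');
    button.textContent = label;
    box.appendChild(button);
  }
  document.body.appendChild(box);

  // Declarative, HTML-style: the nesting is visible at a glance,
  // and styling is left to a reusable CSS rule for `.box button`.
  document.body.insertAdjacentHTML('beforeend',
    '<div class="box"><button>One</button><button>Two</button><button>Three</button></div>');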
CSS has definitely gotten really complicated. But then, I could never build anything in a GUI toolkit without constantly referencing the API docs to figure out how to do this or that, either...
I'm not a frontend whiz by any means, but I've always found the widget-centric (GUI) approach fit my mental model better than the HTML centric one.
SPA architectures help, but I find most HTML designers tend to prefer raw HTML to any composed/widget approaches.
CSS (as a concept) is actually quite great, which is why you've seen the older GUI approaches adopt it. Qt itself is also leaning more towards a reactive approach where you interact with an abstract data model, and the UI reflects the updates.
Qt always had a nice UI designer, with layout managers for responsive UIs, no need for imperative code to build out an interface.
Usually imperative UI code tends to be a thing only among developers that dislike RAD tooling, or game devs using immediate mode UIs.
Actually I think that most game UIs (the menus, settings, inventories - not the actual game) are done quite well - maybe immediate mode GUIs have a place outside of gamedev?
- abstracts lots of file handling
- JS is much easier than most languages to pick up and learn
- JS has a large dev base because of the web
- people are well versed in coding for the web, and adding a “native” layer on top of that is actually quite easy
- cross platform c/c++ in general is not so simple even without the GUI
If you’re talking about rendering everything on a canvas, well, there’s been the occasional discussion about making it a11y-friendly, exposing content in it to screen readers and so forth, but nothing has really happened with it.
Your WebAssembly GUI toolkit is going to be completely invisible to screen readers.
This means compiling the lib to wasm, right? At first I was thinking of running JS clientside that imports some C lib somehow, which confused me.
Note that wasm's main objective is to run non-JS code in browsers, not to make JS faster.
Here is the best overview and tutorial I've ever seen on HN, if you are interested:
Is this that absurd? Given that you can compile WASM whilst streaming, you should really precompile your JS into WASM if you can.
Yes, it is absurd, because your wasm that was compiled from JS needs to embed an entire implementation of the dynamic nature of JS. And to make all those dynamic features remotely fast, you cannot just compile them as is. You need a JIT to be able to perform speculative optimizations. But then, where's the JIT? Oh, it's built inside your wasm code. So basically you end up shipping a JS interpreter + compiler + JIT as part of your wasm, instead of just the .js code. Parsing and compiling all of that will be much, much worse than parsing the .js code and feeding it to the already existing JS interpreter + compiler + JIT that is in the browser.
Whether that's useful in practice is an open question, but it's plausible.
There is virtually no human-written JS code that is amenable to compilation to wasm in a meaningful way. At the very least, you need a (mostly) sound type system to be able to compile to wasm with a positive expected ROI.
The speed of wasm comes in a large part from the fact that it is entirely statically typed, which means we don't need the speculative optimizations (and their deoptimization guards) all over the place.
Modern CPUs also don't have hardware GC support. The Intel iAPX 432 was the last attempt at it.
Interaction with the DOM can be achieved with a few imported functions.
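A minimal sketch of that shape (the import name, element ID, and memory layout are all made up, and wasmBytes is assumed to have been fetched already inside an async function):

  // JS side: hand the module a couple of DOM capabilities as imports.
  const memory = new WebAssembly.Memory({ initial: 1 });
  const importObject = {
    env: {
      memory,
      // Hypothetical import: set an element's text from a UTF-8
      // string living in the module's linear memory.
      set_text: (ptr, len) => {
        const bytes = new Uint8Array(memory.buffer, ptr, len);
        document.getElementById('out').textContent =
          new TextDecoder('utf-8').decode(bytes);
      },
    },
  };
  const { instance } = await WebAssembly.instantiate(wasmBytes, importObject);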
The Intel iAPX 432 was far from the last attempt. Besides all of the Lisp hardware developed after it, Azul made CPUs in the 2000s with hardware support for GC. Acceleration of concurrent copying collection requires a surprisingly low amount of CPU support.
You are right, I also forgot about Azul, but eventually they dropped it because it wasn't worthwhile anymore, just as happened with all the other specialized hardware implementations.
Firefox 57: WebAssembly.instantiate took 2990.2 ms (4.1 MB/s)
Chrome 63: WebAssembly.instantiate took 8736.9 ms (1.4 MB/s)
Safari 11.0.2: WebAssembly.instantiate took 10341 ms (1.2 MB/s)
If more speed is about to arrive, wow.
I'm curious what optimisations are needed / valuable for wasm files to improve streaming performance. I'm assuming if, e.g.:
baz = bar() + 1
Then compilation would start and get stuck until it had a definition for bar? If so, presumably the next build-time optimisations for a website will be to shuffle the code around into as optimal an order as possible so as to improve streaming compilation speed?
The list goes on, but the idea is that certain hot spots in an application will be able to benefit from having a fast number crunching engine.
All I was able to find is this issue: https://github.com/WebAssembly/design/issues/1079, with no activity for a long time.
1) Write a Windows app
2) Run it in the browser with wasm
3) Stuff that into Electron and distribute to Mac/Linux/Windows
Why distribute the electron wrapped wasm on Windows instead of using the real native Windows app? It's more consistent this way! Single codebase! Developer efficiencies!
Webassembly.org's own docs mention that it's intended to be agnostic about its runtime environment. Electron is for packaging HTML, CSS and JS into a "native" application, but WASM doesn't actually need that if it's running outside the web.
Why not a native runtime on top of a cross-platform library like SDL? Just because it's "Web Assembly" doesn't mean it has to be limited to webdev paradigms.
C, C++, Rust and others already have their own DOM/JS support.
WASM is exciting for statically typed languages especially. JS is not the target. It might eventually benefit from faster parsing but that's not the motive now.
What does that mean? Could you expand on that?
Can Rust code compiled to wasm manipulate the DOM?
C++ -- https://github.com/mbasso/asm-dom
I don't know exactly how these work but Emscripten allows interop both ways (embedding JS in native code and calling native code from JS) -- https://kripken.github.io/emscripten-site/docs/porting/conne...
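The calling-native-from-JS direction looks roughly like this (assuming a C function int add(int, int) compiled with emcc and kept alive in the export set, e.g. via EMSCRIPTEN_KEEPALIVE; Module is the object set up by Emscripten's generated JS glue):

  // cwrap gives you a plain JS function wrapping the native one.
  const add = Module.cwrap('add', 'number', ['number', 'number']);
  console.log(add(2, 3)); // 5

  // One-off calls work too:
  Module.ccall('add', 'number', ['number', 'number'], [2, 3]);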
I'm not sure what your parent means.
Chrome 63: 3143.7 ms (3.9 MB/s)
Firefox 57: 1499 ms (8.3 MB/s)
Edge 41: 97.3 ms (127.2 MB/s) !!!
Firefox 59: 474 ms
Edge 41: 164 ms
The difference is smaller with newer FF, but that is amazing from Edge!
Chromium: 1835 ms (6.7 MB/s)
No offense to Yehuda in general (he is doing great work), but Ember.js is so ignorant of any JS-size recommendations that it seems weird to quote Yehuda in that context.
Funny enough, on my workstation it seems to compile something more like 60-80 MiB/s, to keep up with my network which was recently upgraded to gigabit.
Very impressive stuff, I hope workstation CPUs can keep pace with networks.
Sooner or later, that’s an avenue people will want to explore, I assume?
As a side note, it is interesting to see that multithreaded compilation of a single page provides significant performance benefits here...this is usually not done with C/C++ code compilation from what I understand about it
> As a side note, it is interesting to see that multithreaded compilation of a single page provides significant performance benefits here...this is usually not done with C/C++ code compilation from what I understand about it
It's slightly different, but native code is typically compiled concurrently, too. The meat of it is often handled by the build system rather than the compiler itself, but that's not so different.
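(A `make -j8` build is the familiar version of this: one compiler process per translation unit. What's notable about the wasm case is the parallelism inside a single module.)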
Assembly was actually bytecode, with a micro-coded CPU doing the actual execution.
All Xerox computers were like that. The first boot step was to load the right kind of micro-code for the environment being started.
The AS/400 native environment (nowadays known as IBM i) is based on the TIMI bytecode, which gets AOT-compiled via a kernel-level JIT.
PHP 7 was a phenomenal release; I saw 50% reductions in processing time across the board, and on old array-heavy systems a 5-10x memory reduction.
To be fair, the benchmarks usually take a WordPress or Drupal installation and do a requests-per-second measurement, which IMO is a real-world benchmark.
No hate, I just don't get why HHVM doesn't get any love for what they did. Maybe because, going from HPHPc to HHVM, they gave the PHP core serious competition and people kind of got mad.
To be fair, that's not because they were sleeping, but because they attempted to do something that proved too hard (Unicode support) and they had to abandon it. That's why PHP skipped version 6.
I don't know - I expected to see a ton of Hack projects show up here but it's like no one cared about the language except as a wake-up call to PHP. Maybe the involvement of Facebook put people off.
Well, that's because typically all cores are maxed out during a parallel build of large-scale C++ software, so there's no need to go any further.
With link-time optimization it's a different story…hence the work some compilers (like rustc for Rust) are doing to parallelize builds of single compilation units.
I wrote up a short article and video demonstrating it last year at https://hacks.mozilla.org/2017/03/previewing-the-webassembly...
It doesn't need to be. This is a choice they've made. Other implementations of WASM could interpret it if they wanted.
The Church-Turing thesis tells us that any program you can compile to machine code can also be interpreted, so no language strictly needs to be compiled into machine code.
A second reason is that this approach matches how the underlying theory of languages and automata works. One can view a modern AST-producing compiler frontend as a compiler that compiles its input into a program that builds the resulting AST.
On the other hand, many modern optimization passes simply cannot be done in a streaming manner, or even by any pushdown automaton.
Additionally, it wouldn't be portable (an executable compiled for desktop wouldn't run on mobile).
See this comment: https://news.ycombinator.com/item?id=16171133
And if you compare the size of the binary output with the size of the source code, the binary is bigger in many cases because of optimizations (and runtime size, for small programs). Additionally, the source code can be gzipped with a good compression ratio, whereas the binary compresses poorly. So 99% of the time, the source code is lighter to send over the internet than the compiled binary.
I’d imagine that nobody does speculative compilation since the benefit is too low given how fast the network is. Also, yes, there would be security concerns.
Didn't Google's NaCl implement verification of sandboxed machine code?
Maybe they can optimize further by speculating what the next line will be...
It runs in existing browser VMs, which have been pretty battle tested.
Another interesting note is that threads are now on hold for WebAssembly due to Spectre; that is, SharedArrayBuffer has been disabled. Hopefully it can be re-enabled in the future.
* the filesystem is cloud storage (Drive/Dropbox/what have you -- the Unhosted (https://unhosted.org/) architecture)
* the apps are insecure but open-source by requirement (interpreted JS)
* ... running in a controlled sandbox (the browser)
* ... using a standard UI language (HTML/CSS)
* with functionality modifiable/overridable by user preference (extensions)
It's pretty much the ecosystem you would want if you were building this from scratch! Except you'd want HTML/CSS/JS to be much more intelligently designed from the start (I'm waiting so eagerly for the day that browsers natively run more scripting languages than just JS...)
It never could be done in the 90s because everything ran too slowly, but it's feasible now.
It used to be called Lisp Machines, Smalltalk, Oberon Juice, Java Jini, Inferno.
In theory this could really be the universal VM for the web everyone needed, but it's still lacking real sockets and DOM support.
Or, going the other way, could HotSpot be replaced with a wasm JIT by compiling Java to wasm? I know they have slightly different memory models, but I don't understand why they seem to be treated so separately.
But that is kind of my point - there are advanced vms out there. I don't see why the web needs its own vm apart from them. All the differences I see are fairly minimal.
I don't know all the JVM implementations out there, but it wouldn't surprise me if there was one with it implemented.
By the way, RMI and Jini worked by streaming code across the network.
It looks like LiveConnect was a much bigger thing that MS pushed.
Sure hotspot itself couldn't be used straight up, but the changes are certainly much less than creating a whole new vm.
Web Assembly targets C and C++ as source languages, unlike the JVM.
> Why not just compile js et al to java bytecode?
Because JS and Java semantics are different, and emulating JS semantics on top of the JVM is slow.
> Sure hotspot itself couldn't be used straight up, but the changes are certainly much less than creating a whole new vm.
The Web Assembly VM shares as much code as possible with the engine's JS VM. This is obviously better than using HotSpot, as the relevant code is already shipping in browsers.
Nobody is going around rewriting code for no reason.
> Web Assembly targets C and C++ as source languages, unlike the JVM.
> Because JS and Java semantics are different, and emulating JS semantics on top of the JVM is slow.
Once you add GC into wasm, it is almost guaranteed to be closer to Java than to C/C++.
Yes. For example, Web Assembly has unsigned integer arithmetic and explicit memory allocation/deallocation, neither of which the JVM has.
> Considering all the UB in C, this cannot possibly be true.
Undefined behavior is a concern of the compiler of the source language. Web Assembly doesn't compile C or C++. It simply specifies a VM, the semantics of which are designed to be relatively free of undefined behavior.
I would highly doubt it's performance-competitive at the level that browsers are at now. It strikes me as likely impossible to get performance-competitive on, say, SunSpider if you aren't highly tuned for it.
> Web Assembly has unsigned integer arithmetic
This is your idea of a major difference that affects implementation so much that a separate VM needs to be written? Many Java programs essentially do manual memory management already, too. Those aren't the big differences; things like safe memory access are bigger issues. And once wasm gets GC, it will be even closer to Java and JS.
I hope Graal outperforms everything else, so we can stop pretending that WASM is something different from what Java has been trying to do.
Nashorn seems to be pretty close to node.js.