Something you could send over the wire to a remote application, which could then execute it with high performance and reasonable security. Cryptographic signatures could be used in more security-sensitive contexts.
So one thing I'm really hoping, no matter what happens to WebAssembly, is that the external interface and requirements are kept simple.
It'd be amazing if it were possible to run small ~10 kB WASM modules remotely on a microcontroller, like a Cortex-M4 or better, with say 64 kB of RAM. That would add a lot of flexibility in some applications.
It actually looks very interesting for cross-platform "scripting", once there's a good cross-platform optimizing compiler library for it.
From where I stand, these are currently the most interesting of the listed proposals:
0) High-performance non-web "embeddability" with low memory requirements. (Not actually listed, but very much desirable.)
1) Threading. Would greatly help performance and portability in some cases. I'm a bit sceptical whether there's still a way to add this now that CPU speculative-execution attacks (Spectre, etc.) are known.
2) SIMD. Lots of filtering and data processing algorithms can greatly benefit from this.
3) Tail calls / bulk memory ops.
4) Export / import mutable globals (and good module support in general).
5) Exceptions. (Minor, portability).
A nanokernel capable of running WASM binaries with a simple GC could have highly portable kernel modules and, thanks to the properties of WASM, run userspace code in ring 0 without much risk to hardware or software.
Additionally, paging and memory management are much simpler to begin with, since WASM's memory access pattern is much simpler (with some dynamic relocation tricks that should be possible, you could even identity-map memory and skip paging entirely).
I've also thought about doing microcontrollers as you mention; however, I was thinking about Arduinos (ATmega). They would need a much simpler, more reduced WASM, since you only have about 2 kB of working memory.
 - http://retro-b5500.blogspot.com/
 - http://worrydream.com/refs/Teitelman%20-%20The%20Cedar%20Pro...
 - http://pascal.hansotten.com/category/ucsd-p-system/
 - http://pascal.hansotten.com/niklaus-wirth/lilith/
 - ftp://ftp.cis.upenn.edu/pub/cis700/public_html/papers/Franz97b.pdf
The subset of native (for example, x86) assembly that is equivalent to WebAssembly is already perfectly secure: you just limit the native assembly to pure computations without any system calls. It will also always be significantly faster than WebAssembly. When it comes to the security of system calls, both WebAssembly and native assembly need to make sure that whatever platform calls are available are secure. WebAssembly has no benefit over native assembly when it comes to this.
You need portability on the Web. NaCl was abandoned in favor of PNaCl for a reason.
The Web as a distribution method has supported downloading native applications since forever. Instead of trying so hard to make it so that users don't need to install native applications, I think development effort would be much better spent on streamlining the native installation process. For example, browsers could have some kind of functionality that makes it so that end-users do not need to choose the correct type of executable themselves (such as choosing between x86-windows7, x86-windows10, Linux, etc) and instead just click a single installation button.
How would we have made the 32-bit ARM-to-AArch64 transition if the Web were not portable? Remember that Apple cut 32-bit support very quickly and no longer ships it at all. For that matter, how could Apple have gotten away with shipping a usable browser at all on the ARM architecture in 2007, in the native world?
Remember that the x86 architecture is proprietary to Intel and is covered by a patent thicket. You'd be handing Intel a monopoly over the Web forever.
> For example, browsers could have some kind of functionality that makes it so that end-users do not need to choose the correct type of executable themselves (such as choosing between x86-windows7, x86-windows10, Linux, etc) and instead just click a single installation button
They've had this for decades.
While the mentioned browser functionality has existed for decades, that does not change the fact that for the past decade, almost every time I have had to download a native application, I have had to select the correct type of executable myself. So the argument still stands. Developers should implement the mentioned functionality on the download pages for their native applications, instead of trying to make everything a web application.
BTW, how do you propose to safely run an untrusted application with no built-in support for parallelization in parallel, maybe even on multiple machines? How do you safely and directly reference an object that is owned by another (again, untrusted) module, again, maybe even on a different machine (e.g. using RDMA through InfiniBand)? These are just some of the possibilities that WASM is offering.
Think about why that is. It's because Web site authors frequently don't "do the right thing", and they instead do what is most convenient. Following that line of reasoning, it is naive to think that Web site authors would have offered ARM executables at all back in 2007. That is the entire problem.
Defining a subset of x86 that is well defined, secure, and doesn't subject your vendors to the AMD/Intel licensing duopoly is a worse option than starting over with almost anything else. There was even a bug quite recently from everyone having misunderstood the normal boot-sequence instructions since 1980 (or so?). Then the other question is: why would you make it run so poorly on the newer mobile architectures?
It certainly seems like one could have chosen from among the existing LLVM-IR RISC hardware targets and then made everyone implement an emulator. But making a JS-compatible target means tapping into the pre-existing "emulators" already available on anything meant for an end user.
As a JS programmer, I don't really look forward to encountering wasm from others on sites. But I do like that I can try out any llvm language that I might be experimenting with in the web environment that I'm already familiar with.
I would never use wasm on a normal website, but it could make a lot of sense in an Electron-like setup. That might be a bit insane, but no more than using the Java VM was 10 years back, and without as much risk of lock-in and extortion from a company like Oracle in another 10 years.
Even without other languages, it is a matter of time before my transpiler chooses wasm for me, as it is now a subset prioritized for optimizations across browsers.
(That bits of your favorite assembly are floating around the web is, in my world, a problem that drags me down dead ends, not a solution to anything I want to do. Existing jar files, on the other hand... they are not great, but often good enough if I have the patience for Java tweaking.
AFAIK wasm will develop toward an open replacement for the JVM, one that uses browser UX that has been through some critical processes, instead of Sun's random-idea UXes followed by attempts at alternatives.)
Can you expand on what you mean when quoting those words in this context? I can't put my finger on what point you're trying to make here.
Why the condescending quotes? Are you somehow above a programming language?
Outside the browser it adds very little over Java, MSCLI, LLVM, P-Code, M-Code, Python Bytecodes, TIMI, DEX and a myriad of other formats.
Cortex M4 is already targeted by MicroEJ and MicroPython.
No doubt there have been plenty of those.
> Outside the browser it adds very little over Java, MSCLI, LLVM, P-Code, M-Code, Python Bytecodes, TIMI, DEX and a myriad of other formats.
It does add one very important thing: a large ecosystem. One where security and performance matters: the web.
Regarding libraries, graphical debuggers and reverse code generation.
Security and performance were part of the Java and .NET designs, which is why they have validation as part of their execution workflow.
As for security on the Web, cross-site scripting and WebGL exploits prove there is still room for improvement.
Java and CLR bytecode verification is crap because those formats weren't designed for efficient verification from the start; their verification is expensive, slow, and imprecise. WASM was designed for it: WASM verification gives you much stronger properties, and it's significantly cheaper. It's a much better universal bytecode than any other offerings currently available.
Also, WASM has yet to be battle-tested against exploits in the wild.
The difficulty of full safety verification on the JVM is well studied. The security-focused Joe-E language actually decompiles JVM bytecodes back to Java to avoid the many full abstraction failures that have been documented.
The CLR is a little better on the safety record, but the verification costs can be even higher because the CLR supports various pointer types. Proper verification requires control-flow analysis, but instead they partition the bytecode into safe/unsafe variants and only the safe variant is "verifiable". IIRC, WASM doesn't have this limitation.
As for the links, I will check them later. Thanks for providing them.
WASM still needs to prove itself in similar scenarios.
"Fine-tuning security" is typically symptomatic of bad security design. Security is not a separable concern.
I will read the paper later on.
Modern JVM bytecode is cheap to verify and gives you very strong properties like memory and type safety, i.e. the app will not have buffer overflows or type confusion attacks in it.
WASM gives you virtually no guarantees about the software running in it, beyond that it (maybe) can't escape its sandbox.
> It's a much better universal bytecode than any other offerings currently available.
I think that's a very strong statement for something so debatable.
WASM appears, to my eyes, to have numerous serious flaws that make me wonder why people are so excited about it. JVM bytecode appears to beat it in every aspect.
1. No GC support, no real workable plan to get there because there's also no type system worth a damn. They've been looking at it for years and this article says all they've got is a tiny stepping stone - no actual GC, just a way to mark pointers to things that came from JS in the type system. This wouldn't matter if the future of the web was software written in C, but if it is, god help us all.
2. No threading. JVM has been thread safe from the start, and has had a huge amount of work put into its memory model, so you can reason about the nature of bytecode when run on different CPUs and in the presence of multi-threading. Moreover making a runtime like V8 fast is much harder if you try to make it thread safe. JVMs have a long history of being fast in the presence of large scale threading but no WASM supporting VMs do.
3. No support for exceptions, apparently at least one failed attempt to add it. Even LLVM has support for exceptions. Again, no big deal if the future of the web is C99 but what a bad joke if it is.
4. No FFI of any use.
5. No dynamic code loading.
6. Tooling is poor or non-existent.
7. Ignoring these differences, WASM bytecode looks a lot like JVM bytecode e.g. is a stack based language that requires compilation client side via JITC to approach good performance.
As far as I can tell it's a significant regression from what Java could do even 20 years ago. And don't start talking about security; WASM integrations have already opened up critical security bugs in browsers:
If you think WASM is magically immune to sandbox bugs, you're wrong.
The reason to be excited about it is to have a universal sandbox to portably run untrusted code regardless of the source. That's unprecedented flexibility for something so widely deployed.
Neither the JVM nor the CLR provides such a sandbox for code that uses pointers. LLVM IR is not portable, is not sandboxed, and is always in flux.
As for your list of "flaws", they aren't flaws at all. A flaw is a feature that cannot be supported, even in principle. Everything you list is possible as an extension to the core type system.
WASM was designed with efficient verification in mind, and with mechanized formal proofs, as I pointed out elsewhere. Java's verification was always a hack, the mechanized proofs of Java's verification procedure never encompassed all JVM bytecodes, and full verification was always too costly to perform at runtime. Java has numerous security problems, not just with verification, but also vulnerabilities due to full abstraction failures.
As I mentioned elsewhere, the security-focused Joe-E language explicitly chose to decompile JVM bytecode to plain Java to avoid all of the numerous documented full abstraction failures of the JVM.
The similarities between the stack-oriented bytecodes are entirely superficial.
Actually, the CLR supports pointers and the JVM can run arbitrary LLVM bitcode (so C, C++, Rust etc) in a memory safe way with bounds checking and garbage collection these days. Check out Sulong and Safe Sulong:
JVMs can also manually allocate memory using the Unsafe class; I suppose if you wanted WASM-like protection semantics, that API could be constrained to a particular memory region and become "SortaUnsafe", for example. You could then compile C to such a dialect. The LLVM-bitcode-on-Graal approach is likely to work better, though.
However, realistically most new software is not being written in C or even C++ for that matter. Most developers use managed languages. And the JVM can do a much better job of that than WASM can.
> A flaw is a feature that cannot be supported, even in principle.
This is a fascinating definition of flaw that I haven't previously encountered.
In what sense does the JVM not do "full verification at runtime"? Also, if you have some more material on which bits of the JVM bytecode set aren't formally verified, I'd like to read that. I can imagine that the latest features may not have been done, but the older JVM bytecode sets were formally verified at least.
I said it doesn't provide a sandbox for pointers. CLR pointers can corrupt your whole VM instance. WASM's lightweight bytecode with no runtime can support in-process heap isolation.
> JVM can run arbitrary LLVM bitcode (so C, C++, Rust etc) in a memory safe way with bounds checking and garbage collection these days. Check out Sulong and Safe Sulong:
Nice find. Still, it looks to be interpreted, though.
> In what sense does the JVM not do "full verification at runtime"?
The last report of a formalized semantics for verification that I saw was over 10 years old:
The JVM bytecode has significant verification challenges as discussed in this paper and by Leroy in the other papers I linked. The Joe-E papers I linked also discuss the problems of targeting the JVM at length.
As I linked in my other comment, WASM was built with mechanized proofs nearly from the outset. They learned from the mistakes made in the CLR and JVM.
It gets JIT compiled to native code that runs (for some code shapes yadda yadda usual story) about 10% slower than gcc, if I recall correctly. Without the safety aspect it can run as fast as GCC for some benchmarks.
But so did the CLR and JVM designers.
The paper you link to says the hard part of JVM bytecode verification is the jsr "subroutine" control flow instruction. This instruction has been phased out, bytecode version 51+ doesn't allow it anymore. Whilst a JVM may well support verification of older bytecode formats, and that verification is still safe and sound, you can implement the latest version of the spec and avoid that entire design error.
Given that, and given that the JVM type system has been proven sound despite being significantly more powerful than WASM's, I'm not really certain how this is meant to prove that WASM is better.
No they weren't. Formalization and verification of the IR came much later in both cases.
> Given that, and given that the JVM type system has been proven sound despite being significantly more powerful than WASM's, I'm not really certain how this is meant to prove that WASM is better.
"Significantly more powerful" is way overstating the case. In fact, I'd hazard that it's flat out wrong. You can easily express some patterns in JVM bytecode, but other patterns not at all.
Secondly, WASM's primitive types are much more flexible than those available on the JVM. For instance, unsigned types.
Finally, the JVM carries a lot of baggage: a) backwards compatibility, as you briefly mention; b) unnecessary control flow instructions which needlessly complicate verification (like exceptions); c) an unconfigurable runtime that's poorly suited to some programs; d) an overly complicated security model that's an impediment more than a help; etc. I could probably come up with a dozen more reasons, but that's just off the top of my head.
AFAIK there is no Java bytecode involved for this, and to optimize the AST interpreter into something faster you need the Graal compiler; most JVMs can't do this. There is no widely used C++/LLVM/Rust-to-Java-bytecode compiler, while this exists for WASM. Even a huge codebase like AutoCAD was compiled for the Web. You could theoretically do the same with Java bytecode, but it would probably be much slower, simply because Java bytecode wasn't designed for this purpose.
Java bytecode is notoriously hard to verify; WASM is stricter, and therefore verification is easier. (By verification I mean the process in the browser/JVM that checks whether the bytecode is actually valid.) WASM only allows structured control flow, no arbitrary gotos like Java. This makes the compiler's job much easier; many JVMs actually just bail out on irreducible loops, so such code might never get optimized. OTOH, WASM bytecode wasn't designed for interpretation.
I don't know why this discussion is so heated. WASM had the chance to learn from the mistakes in Java's bytecode, and the two bytecodes were designed for different purposes after all. Java bytecode was designed for Java, WASM for being a language-agnostic compilation target. Supporting GC in Java bytecode, for example, is much easier than having a similar thing in WASM.
> No threading
> No exceptions
Along with not forcing any object model, these look like very reasonable design choices to me. (I'd add "no global mutable state", but it's being added.)
Reason 1: this is very similar to what a (RISC) CPU would offer. If anyone needs these features, they can be implemented on top of WASM, just as they can be implemented on top of a CPU instruction set. A small specification not loaded with complex issues like GC or an object model is much easier to keep correct and fast. Also, any advances in e.g. GC can be pushed to older WASM implementations, because they don't replace the VM; they are libraries.
Reason 2: Using message-passing between processes ("workers") instead of threading, and Option/Maybe instead of exceptions, is a known way to avoid a large number of problems. The current WASM design makes this approach simple and reasonably natural. There's a reason why Erlang is designed as it is; WASM can take a page from its book.
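To make that concrete, here's a minimal sketch of that style in Rust, using std threads and channels as a stand-in for wasm workers (the function names are made up for illustration):

    use std::sync::mpsc;
    use std::thread;

    // Option instead of an exception: the failure case is an ordinary value.
    fn parse_num(s: &str) -> Option<i32> {
        s.parse().ok()
    }

    fn main() {
        let (tx, rx) = mpsc::channel();
        // The "worker" owns its data and communicates only via messages,
        // much like an Erlang process or a wasm worker.
        thread::spawn(move || {
            tx.send(parse_num("42")).unwrap();
        });
        match rx.recv().unwrap() {
            Some(n) => println!("got {}", n),
            None => println!("parse failed"),
        }
    }

No shared mutable state, no unwinding across the boundary; errors travel as ordinary values in messages.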
There is a reason why every "C on the JVM" project, of which there have been many, used MIPS GCC and then interpreted the MIPS machine code at runtime. (Yes, really.)
I don't think JVMs were without security bugs either, and since WASM is mainly used in browsers right now, security is a MAJOR concern. Java bytecode was designed for a different purpose. Try to compile a language with multiple inheritance to Java bytecode; it's a pain. WASM is designed in a more language-agnostic way, which is why GC and exception handling are much harder to specify than in the JVM.
BTW: GC'ed languages can already be compiled to WASM: they more or less "just" need to compile their GC to WASM too. The GC proposal would only allow reusing the embedder's (most likely the browser's) GC.
But then you admit that compiling Java to WASM is hard because there's no GC: you'd have to ship your entire GC with the app, and it wouldn't be fast, because GCs like to integrate with the JIT compiler, and here WASM is the JIT compiler. Also, WASM JITs don't really understand managed-code patterns well.
So how is WASM more language agnostic? Seems like (if we ignore Graal) WASM has trouble with languages JVM bytecode is good at, and vice-versa. Quite comparable, no?
BTW even saying C++ is easy to compile to WASM is tricky because WASM apparently has no exception support, and exceptions are a part of the C++ language. Likewise for vendor extensions like vector intrinsics, inline assembly etc. At best you can handle a subset of the language.
So in the end I don't buy that it's really more generic. As munificent says, it seems more like people aren't really sinking their teeth into the tradeoffs involved.
Even if we agreed that the JVM is as good as WASM as a language-agnostic bytecode, WASM still makes sense, since it doesn't come with all the baggage of the JVM: class files, and many bytecodes that exactly match Java semantics but can't be used by other languages. Browser vendors would still have had to add new bytecodes for common operations, for both size and speed reasons. So it made sense to design a new bytecode format. WASM even allows streaming compilation: the browser can start compiling bytecode before it has downloaded the whole file.
Yes, there are a few features missing from WASM. But just look how many applications have already been compiled for the web. The missing features are not that relevant for many large, performance-sensitive native applications written in C/C++. We don't need to rewrite these applications in JS; JS wouldn't even be fast enough for that anyway. That's what WASM was designed for, and even according to you, WASM is better suited for this than Java bytecode.
Inline assembly and vector extensions would also be problematic in Java Bytecode.
Does WASM have in place any mechanisms to prevent Oracle or other malicious destructive companies from co-opting it, copyrighting its APIs, and running it into the ground for profit?
I wonder if the real difference between WebAssembly and the other bytecodes you mentioned is just the number of languages that will end up targeting it.
Though, one advantage over LLVM is that it's simpler.
That's not very important though, because those had a 20+ year head start.
WebAssembly is going to have much, much bigger momentum than them (except Java and .NET), to the point that in 5-10 years it will eclipse the ecosystems of all of them put together.
Using LLVM is exactly what Google tried with PNaCl. It had a lot of drawbacks; WebAssembly is a much improved iteration of this idea.
Microsoft's original plan for the CLR was to replace the complete Windows stack with .NET. They failed to do so, not only because of technical issues but also because of the internal wars from WinDev making sure that would never happen; hence the Longhorn failure, followed by the same ideas reborn as COM on Vista and rebooted as WinRT/UAP/UWP a couple of years later.
Microsoft even had something like LLVM, done on top of the CLR, called Phoenix. It just never came out of MSR into production.
Also, GraalVM will happily consume LLVM bitcode.
Standard C++ compiles just fine.
It is no different than using language extensions on GCC, clang or any other C++ compiler.
You are not forced to use GC beyond the interop to other .NET languages.
I have integrated quite a few C++ libraries this way, given that my C++ experience makes it easier than having to deal with P/Invoke attributes and possible linking issues.
WASM exists mostly for political reasons as far as I can tell: it's something people who work on browsers can "own", vs outsourcing to other, older teams with more historical baggage and different backing corporations. I mean, for WASM to be technically compelling you have to buy the idea that the best way to move the web forward is to enable applets-written-in-C without any useful form of DOM interop or GUI. That seems like a rather implausible claim.
There are very good technical reasons for WASM's existence. JVM bytecode, for example, is very Java-focused; WASM doesn't need class files or many bytecodes like invokevirtual. For supporting C++/Rust it doesn't even need GC.
Bringing large applications like PSPDFKit, AutoCAD or games into the browser certainly moves the web forward. Sure, WASM isn't intended for manipulating the DOM. That is still JS's job (at least for the foreseeable future).
As for embedding a JVM, sure, why not? Good JVMs are open source these days and HotSpot starts in 50 msec or less. V8 is very comparable, tech wise in terms of its approach. Lots of code out there targets JVM bytecode.
I would bet strongly against that happening long term. Everybody developing WebAssembly is interested in making the web better and pretty much only that. One of the biggest issues preventing WebAssembly usage right now is that you have to ship a large run-time along with your app. So there's going to be a lot of pressure to move that run-time into the browser. See the linked article: GC is the proposal the author highlights the strongest.
The biggest pressure in your favour is the diversity of languages supported. They'll probably want to ensure that any built-in library functions they add are useful to a wide range of WebAssembly communities.
The GC is one of the most complicated parts of the browser engine, and serious bugs and exploits are regularly discovered in all of them. Safer languages like Rust won't help at all in the case of WebAssembly because it's not sharing the same type system.
Are finalizers even supported? Can they run arbitrary code? Can they resurrect the object? If they can, will they run again? Can they throw an error? What happens if they throw an error? In a language like Lua these rules are very specific because they're important to the semantics of the C API, especially wrt bindings. For example, within a collection cycle Lua finalizers are guaranteed to be run in reverse order of construction.
This is demonstrably untrue. Please peruse the design repo, I think you'll find that there are very frequently discussions of non-web applications, something folks are actually quite keen to support.
Unfortunately, what's likely to happen is that everyone just ships their WASM apps in Electron since that's already the de facto standard.
Standard where? A couple of well-known apps beloved by startups with high-end desktop systems?
Also VS code runs on Electron, so it's hardly uncommon.
Electron apps are as native as PWAs, with the caveat that they bring a 300 MB runtime along with them.
I bet there are still more Swing applications in production than VSCode users.
That doesn't make JAR a de facto packaging format.
Electron apps (or, fine, PWAs) will become the standard model for WASM development because that model already exists, and is what developers will most readily use... and because most of the interest in WASM comes from the web development community. By the time something better comes along, it will have to fight against the network effects.
I don't need a 300 MB pro application.
There is already a perfectly fine browser on my system.
Actually four of them.
Already done (at least at the spec level), ref: https://github.com/WebAssembly/spec/pull/814. I'm currently implementing it in my non-web WASM backend :-)
The plan is to rely on site isolation, where every origin runs in a separate process. Spectre only exposes memory in the same process so this should make it benign.
SharedArrayBuffer was disabled due to Spectre but Chrome plans to re-enable it soon: https://bugs.chromium.org/p/chromium/issues/detail?id=821270
I don't think this is a bad thing, BTW. You may be disappointed, but they are just looking to solve a different problem than the one you are concerned with. It's an imminent, practical problem, so there's nothing wrong with someone addressing it, even if it isn't your problem.
Aside from that, an app/program/whatever delivered as webassembly will typically have strong dependencies on the target host environment, which means it's not portable or cross-platform.
It's not useless, since you still get to choose your own language for any target that supports webassembly (assuming things work).
But code, whatever language it's written in, is very strongly influenced by the frameworks & patterns provided/imposed by the host environment and by environmental constraints, not to mention the problem domain itself.
E.g., assume there are bindings for any language you want... if you write a To-Do app for iOS using native controls, it's going to look pretty similar whatever the language.
> ...and are always being careful to separate the JS/Web ifaces/impls
I'll just point out that this is quite important if your goal is to cleanly extend browsers. You specifically don't want to duplicate or impinge anything the browsers already provide.
I'm not trying to be a downer, but it seems like people are hoping/expecting webassembly to do more than it can or will (maybe I'm wrong, but that's the impression I'm getting).
I don't think Java fulfills this requirement. Or the one about tail calls. Java bytecode is too high-level, too. It forces you into a very limited execution framework.
WebAssembly is a lot less complicated, and has a lot of potential.
With WebGL it was ~0.5s on a "good" machine and ~6s or "did not work at all" on a "bad" machine. We never figured out why it worked well on one machine, but not so on others. Of course, I am talking only about machines with WebGL enabled ;) This would have been a support nightmare.
With WebAssembly we are now in the 0.1-0.5s range per image search and it just works. So far, we have had exactly 0 WA-related support issues, for both Chrome and Firefox.
Like if the GC proposal is accepted, does every WebAssembly implementation now need to support GC all the time? Or can a slimmed-down version of WebAssembly still load modules that don't use GC?
I am looking for the Lua of WebAssembly implementations.
0 - https://github.com/kanaka/wac
1 - https://github.com/paritytech/wasmi
2 - https://github.com/sunfishcode/wasmtime
3 - https://github.com/WebAssembly/spec/tree/master/interpreter
I'm interested in an interpreter that is available as a library.
I'm also interested in a small, standalone JIT-compiled implementation, but that seems less likely to actually exist.
Basically I'm looking for the Lua and LuaJIT of WebAssembly.
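For the library use case, wasmi (link 1 above) can be embedded directly from Rust. A rough sketch of what that looks like; the exact embedding API differs between wasmi versions, so treat the names here as approximate:

    use wasmi::{Engine, Linker, Module, Store};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // A trivial module in the text format, exporting `add`.
        let wasm = wat::parse_str(
            r#"(module
                 (func (export "add") (param i32 i32) (result i32)
                   local.get 0
                   local.get 1
                   i32.add))"#,
        )?;
        let engine = Engine::default();
        let module = Module::new(&engine, &wasm[..])?;
        let mut store = Store::new(&engine, ());
        let linker: Linker<()> = Linker::new(&engine);
        let instance = linker
            .instantiate(&mut store, &module)?
            .start(&mut store)?;
        let add = instance.get_typed_func::<(i32, i32), i32>(&store, "add")?;
        println!("2 + 3 = {}", add.call(&mut store, (2, 3))?);
        Ok(())
    }

That's roughly the "Lua of WebAssembly" shape: an interpreter you link in as an ordinary crate, no JIT, no external runtime.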
Basically, for most FFI-based use cases, anything that C can do Rust can do.
0 - https://github.com/neon-bindings/neon
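On the narrow "anything C can do" point: Rust can export a plain C ABI, which is what most of these embedding stories bottom out in. A minimal sketch:

    // Build with `crate-type = ["cdylib"]` in Cargo.toml; the resulting
    // shared library exposes an unmangled C symbol that any host with a
    // C FFI (Node, Ruby, Python, ...) can load like a C library.
    #[no_mangle]
    pub extern "C" fn add(a: i32, b: i32) -> i32 {
        a + b
    }

The pain points discussed below are about packaging and toolchains, not the FFI itself.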
It's more than that. These languages often come with build systems for building extensions. For example, Ruby's 'mkmf': you just list your .o files and mkmf creates a Makefile that builds against the correct headers with the correct flags.
This tool knows about C/C++. It sadly doesn't know about Rust.
0 - https://github.com/tildeio/helix
Writing my own Node modules in Rust works great. But I can't write a node module in Rust and distribute it via npm the way that C++ authors can. When someone adds their module as a dependency and does an npm install, npm will duly run node-gyp to compile their C++. There's no Rust equivalent of that and, until there is, Rust will always be a second-class citizen when it comes to embedding into those languages.
As you point out, there's nothing technically stopping the ecosystem tools for those languages from supporting Rust, but they just don't, likely because they'd prefer not to have to bundle an entire Rust toolchain with their distribution. But whether the reason is technical or not is immaterial: I still can't push a Rust-implemented module to npm.
If the goal is to create a module that the larger Node community can use to embed wasm in Node, Rust simply isn't an option because distribution via npm is a given.
> As you point out, there's nothing technically stopping the ecosystem tools for those languages from supporting Rust
There of course would be a problem if it was technically impossible, but that's not true, it just might be impractical right now.
If it goes higher-level:
* Compiled applications get smaller because each instruction corresponds to a higher-level semantic behavior and contains more information. Think how a virtual method call is a single instruction in the JVM but several instructions in x64.
* Compiled applications get smaller because runtime facilities can be baked into the browser and shared across all apps. GC is the big one but things like object representation, strings, "standard library" functions, etc.
* More host-level optimization opportunities open up. The JVM JIT can do lots of optimizations because it understands directly what things like interface and virtual calls are and doesn't have to try to reconstitute that information by pattern matching on lower-level instruction sequences.
If it goes lower-level:
* It's a viable target for more diverse languages and paradigms. You can compile other languages to JVM bytecode, but the less like Java your language is, the harder that becomes. The VM has a grain to it. WebAssembly, by virtue of having fewer things baked in, is more open to varied languages. C/C++ are the big ones, because any memory-safe instruction set makes it really hard to support those.
* More application-level optimization opportunities open up. Optimizing compilers can output code that's closer to the metal. They can take advantages of constraints in the source language that the VM may not know about and generate code specific to that.
* Peak execution speed is higher. If your instruction set lets you directly map to something close to what the CPU executes, you can take maximum advantage of it. Despite decades of JIT engineering, C/C++ are still the fastest for that reason.
* The instruction set is simpler. That makes it easier to implement, target, optimize, security audit, and build tooling for.
There is no Goldilocks instruction set that gives you all of these. Adding instructions for GC is going to make it much more complex and is deadweight for languages like Rust and C that won't use it. Defining how the GC works is going to be really difficult given languages like Python (ref-counting), C# (finalizers), etc. where specific GC policy decisions are user-visible.
Making the instruction set statically typed is what you want for low level typed languages. But it makes it harder for dynamically-typed languages to be implemented efficiently. Every dynamically-typed language that compiles to JS gets to reuse the incredible JITs inside JS implementations because JS is itself dynamically-typed. But if you compile, say, Python, to WebAssembly, do you ship a Python bytecode -> WebAssembly JIT inside the application? Does every language have to do that?
Having strings means you need to pick a string format, which is basically impossible since every language out there makes different, incompatible choices for string representation (null-terminated? length-prefixed? null-clean byte-arrays? UTF-16? UTF-8? multi-encoding?). But not having strings means every app has to include a string library in its runtime and means you can't reliably pass strings around in interop.
No amount of hard work is going to magically fix these because they are directly opposed. I think WebAssembly is really interesting, but one of the things that has always turned me off about it is that its proponents don't often acknowledge the hard trade-offs that have to be made.
I'm not sure they are directly opposed. I think that they can coexist reasonably. Specifically, keep the instruction set reasonably minimal, ask compilers to do the optimization while trying to avoid impedance mismatch between the insn set and CPU archs, add the rare high-level feature that is hard to implement without exposing lots of context (e.g. GC), don't act like features not used 100% of the time are "deadweight", and leave complications like strings to libraries/ecosystem. I.e. WASM abstracts the host and CPU, libraries abstract the logic on top.
I think, looking at the proposals, they are doing a good job with the tradeoffs. For all of your bullet points, I think they are reasonably addressed now or in the future. I am glad they are not tackling things that can be done just as well in libs (e.g. strings) while properly tackling things that cannot (e.g. GC, threads, etc).
What I really hope is that Rossberg et al remain true to their convictions, relish their BDFL-ish role, and don't get too beat down by people saying the higher level things they have on the road map are too high and/or the lack of other ones make the level too low.
Here's the critical difference: Compared to other VMs, WebAssembly is developed in an open and transparent manner. Hopefully everyone in the process is experienced enough to make the right decisions.
Probably the most annoying thing about VMs is that the different memory models make it hard to write portable libraries that can take advantage of things like garbage collection. Want to write the next SQLite in Java? Good luck trying to use it from Python, C#, C, etc.
Perhaps this time around, WebAssembly can get enough traction that we can write portable, high-performance libraries with garbage collection?
Or maybe that's just wishful thinking?
There's always a tradeoff between how much intelligence you put into a runtime vs how much you have to ship with the app. In the limit, WASM would turn into "download and run an entire language runtime with every web page", which is clearly suboptimal. Most likely, compilation to JS will continue for the foreseeable future when possible, for that reason.
One idea that could probably work very well is to write a WASM interpreter for GraalVM. GraalVM has very high level constructs but can also support low level manually memory managed code. They have one for LLVM bitcode already, but WASM would probably be simpler to implement and with a more stable bytecode format. Then C/Rust/exceptionless C++ could ship artifacts as WASM files, scripting languages can ship source code, static managed languages like Java or Haskell (eta) can ship JARs, and they can all interoperate and be compiled together.
I don't see a route to getting there with current WASM runtimes or just the existing featureset of WASM though. As a portable ISA for distribution of limited, OS neutral C-ish libraries it seems reasonable enough. As a future VM for everything I can't see them beating GraalVM anytime soon. GraalVM has the shared high level constructs, but it is able to also (eventually) compile low level WASM/LLVM style code down to what a typical GCC like compiler would create.
I suppose the same will be true for WASM "runtimes" required by commonly used languages: you will have 2-3 major versions of them (think Python 2 vs Python 3), and an automatic update when a new version is released.
Much like common JS libraries or fonts, these will be on CDNs, so they'll be fast more-or-less local downloads.
Naturally it is a good way to turn the browser into a general-purpose VM.
Transforming it into a general-purpose OS is another matter.
I assume there are complexities in compiling C/C++ to the JVM but wouldn't wasm have many of the same issues?
Bytecode has its flaws, but the trend is away from using bytecode for all languages and towards partial evaluation of AST interpreters. That works well for managed OO languages and other managed scripting languages, and it can also run C/C++/Rust/FORTRAN etc., with interop between the languages.
WASM has no real roadmap to get to running anything other than small, manually memory managed programs. It's being billed as "generic" but I don't see it. GraalVM is generic. WASM struggles to run anything that requires a more sophisticated runtime than malloc.
You mean small software like AutoCAD? https://web.autocad.com/
> GraalVM is generic.
How well does GraalVM perform on a beefy 200 MHz 32-bit microcontroller with 256 kB RAM? How do you use GraalVM to run existing C++ code on a website?
Or does your definition of "generic" exclude some platforms?
Don't get me wrong, I'm sure GraalVM has its uses in the server side, and perhaps in desktop (or even mobile) application context. It just doesn't seem generic in a way that is useful for embedding small/medium size libraries/executable code on top of systems written in C/C++/Rust.
And for the microcontroller, likewise - why would I want the overhead of a WASM VM on such a device? You won't fit V8 on it. Anyway, Graal can AOT compile programs down to small native binaries, but they're not that small at this time: give it a few megabytes of RAM and then yes it's possible.
Well, not only browser developers. Often the corporate higher ups just suddenly want things like being able to run C++ based product in a web browser. I'd also rather use a native application, but others don't seem to always agree.
> why would I want the overhead of a WASM VM on such a device?
For example, for rapid prototyping, or as a way to safely provide plugins at runtime on the device without requiring a firmware flash (or a more expensive, larger flash chip). Anyway, there'd be no VM (or GC) overhead during execution other than bounds checks, because you can just compile WebAssembly to native in one go.
> You won't fit V8 on it.
Why would you need V8? You'd just need something that can read WebAssembly bytecode, allocate registers and emit platform-native assembly. I think direct 1-1 stack-machine-mapping codegen could be done in a few tens of kB. A bit more for basic register allocation.
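As a toy illustration of that 1-1 mapping (invented opcode subset, pseudo-assembly strings instead of real encodings), the compile-time stack depth can double as a trivial register allocator:

    enum Op {
        I32Const(i32),
        I32Add,
    }

    // Single pass: each stack slot is pinned to a register, so no real
    // register allocation is needed for straight-line code.
    fn lower(ops: &[Op]) -> Vec<String> {
        let mut depth = 0usize; // virtual stack depth == next free register
        let mut out = Vec::new();
        for op in ops {
            match op {
                Op::I32Const(v) => {
                    out.push(format!("mov r{}, {}", depth, v));
                    depth += 1;
                }
                Op::I32Add => {
                    // two operands on top of the stack; the sum lands
                    // in the lower of their two registers
                    depth -= 1;
                    out.push(format!("add r{}, r{}", depth - 1, depth));
                }
            }
        }
        out
    }

    fn main() {
        // (i32.const 2) (i32.const 3) i32.add
        for line in lower(&[Op::I32Const(2), Op::I32Const(3), Op::I32Add]) {
            println!("{}", line); // mov r0, 2 / mov r1, 3 / add r0, r1
        }
    }

The validation mentioned below is what guarantees the stack depth never underflows, which is exactly why the bytecode gets checked before codegen.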
People are currently working on them. Also 100% performance is not always necessary, getting to 50%ish shouldn't be that hard. That's plenty for a scripting engine replacement scenario, for example. Or for bit banging some GPIOs on a microcontroller.
> why not just compile straight to native
To be able to run validation checks on the bytecode, so that you can still execute programs that are trusted only to a certain point.
And to support multiple platforms.
There are also a lot of platforms that aren't supported by LLVM or are not first tier targets.
...and that's great. You can implement them properly to fit your language, instead of building Rube Goldberg devices around JVM idiosyncrasies. Just as you would do with any HW instruction set.
In the future it may work, because a built in GC is on the roadmap for WASM. Unfortunately I'm not sure that it will work great for most scenarios. For example, Go(lang) ships with its own GC which is very tailored to its use case. Defaulting to a WASM GC could work, in theory, but would it impose difficulties such as unexpected performance characteristics with longer GC pauses or whatever?
There are a lot of edge cases that would have to be covered for that to work instead of shipping your own GC. Perhaps shared WASM binaries would be the best solution, allowing runtimes to be offloaded and cached independently of the userland code running.
That runs into the problems that have generally prevented browsers from cross-site caching common scripts already.
I am really excited at the prospect of writing my frontend and backend in a sane language like C# or Kotlin.
The proposals look promising and might reduce this overhead somewhat. However, I don't think they will manage to design a virtual machine that is able to run all major programming languages efficiently. What about features such as longjmp and goto in C?
We are usually significantly smaller than emscripten for the same code.
But yeah, as you and others said, while both Rust and C/C++ can do that, there's a lot more going on in the general case. Once you add a malloc/free implementation, or string operations, or anything else, the size increases.
The fundamental issue is that Rust and C/C++ were not designed for this use case. In principle, a new language could do a lot better. If such a language were GC-based, it would avoid shipping a malloc/free (once wasm gets GC, that will be "free" for that language). And if such a language had the same string types as the Web and a "standard library" compatible with it, so that it would just call directly into existing Web APIs, then it would avoid shipping a lot of other code that Rust and C/C++ currently need.
TypeScript, for example, could become such a language. That would probably be the optimal path for creating tiny wasm binaries.
I’m not as convinced as you regarding such a language, for example, once host bindings lands, you’ll be able to call into every API with no overhead, and the bindgen stuff we’re doing will transparently just shrink overnight. But we’ll see! It’s exciting times.
If you really want to go down in size, you either need to compile without the standard library and/or use wasm-gc and wasm-opt to remove all the useless code that has been added to your wasm binary.
With this, you can generate a wasm binary that weighs only a few hundred bytes.
See here: https://rustwasm.github.io/book/game-of-life/code-size.html
In the future, it will do more of the right things by default, but for now, you have to do some configuration if you want the tiniest possible binary. Early days!
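For reference, this is the kind of configuration that chapter walks through (a sketch; the exact flags evolve over time):

    # Cargo.toml -- typical size-focused release settings
    [profile.release]
    opt-level = "z"   # optimize for size rather than speed
    lto = true        # cross-crate inlining and dead-code removal

followed by a wasm-opt -Oz pass (and wasm-gc on older toolchains) over the emitted .wasm file.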
char *m = NULL;
*m = 0xFF;
For this write through m, the compiler exchanges the code for a call to the abort function :-)
I wonder if something similar is possible in Rust.
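For what it's worth, a sketch of the closest Rust equivalent: safe Rust simply has no null pointer to write through, so you have to opt into unsafe to even express it.

    fn main() {
        let m: *mut u8 = std::ptr::null_mut();
        unsafe {
            // Undefined behavior, just like the C version; the compiler
            // is likewise free to replace this with a trap.
            *m = 0xFF;
        }
    }

So the C example's trap-on-null-write isn't something safe Rust needs; the write is unrepresentable without the unsafe block.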