
Wasm3 – A high performance WebAssembly interpreter in C - sound_and_form
https://github.com/wasm3/wasm3
======
ridiculous_fish
This is pretty exciting if real:

> Bytecode/opcodes are translated into more efficient "operations" during a
> compilation pass, generating pages of meta-machine code

WASM compiled to a novel bytecode format aimed at efficient interpretation.

> Commonly occurring sequences of operations can also be optimized into a
> "fused" operation.

Peephole optimizations producing fused opcodes, makes sense.

> In M3/Wasm, the stack machine model is translated into a more direct and
> efficient "register file" approach

WASM translated to register-based bytecode. That's awesome!

> Since operations all have a standardized signature and arguments are tail-
> call passed through to the next, the M3 "virtual" machine registers end up
> mapping directly to real CPU registers.

This is some black magic, if it works!
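
To make the quoted bits concrete, here is a minimal sketch of the style being described, as I read it (every name and signature below is my own invention, not Wasm3's actual API):

```c
#include <stdint.h>
#include <stdio.h>

typedef struct Op Op;

/* The "virtual registers" (r0, fp0) are plain parameters, so with tail-call
   optimization they can live in real CPU registers across operations. */
typedef int64_t (*OpFn)(const Op *pc, int64_t *sp, int64_t r0, double fp0);

struct Op {
    OpFn    fn;   /* pre-decoded handler for one translated operation */
    int64_t imm;  /* immediate operand, if the operation needs one */
};

/* Every handler ends by tail-calling the next pre-decoded operation. */
static int64_t op_const(const Op *pc, int64_t *sp, int64_t r0, double fp0) {
    *sp++ = r0;                          /* spill previous top-of-stack */
    return pc[1].fn(pc + 1, sp, pc->imm, fp0);
}

static int64_t op_add(const Op *pc, int64_t *sp, int64_t r0, double fp0) {
    r0 += *--sp;                         /* pop one slot; result stays in r0 */
    return pc[1].fn(pc + 1, sp, r0, fp0);
}

/* A "fused" operation: the pair {const k; add} collapsed into one handler,
   saving a dispatch (the kind of peephole fusion described above). */
static int64_t op_add_const(const Op *pc, int64_t *sp, int64_t r0, double fp0) {
    return pc[1].fn(pc + 1, sp, r0 + pc->imm, fp0);
}

static int64_t op_end(const Op *pc, int64_t *sp, int64_t r0, double fp0) {
    (void)pc; (void)sp; (void)fp0;
    return r0;                           /* the result is already in a register */
}

int main(void) {
    int64_t stack[64];
    const Op plain[] = { {op_const, 2}, {op_const, 40}, {op_add, 0}, {op_end, 0} };
    const Op fused[] = { {op_const, 2}, {op_add_const, 40}, {op_end, 0} };
    printf("%lld %lld\n", (long long)plain[0].fn(plain, stack, 0, 0.0),
                          (long long)fused[0].fn(fused, stack, 0, 0.0));
    return 0;
}
```

With TCO on, each `return` here compiles to a plain jump and `r0` stays pinned in the same machine register from operation to operation, which is the "virtual registers map to real CPU registers" claim.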

~~~
uasm
> "WASM translated to register-based bytecode. That's awesome!"

If the hardware executing this code is "stack-based" (or does not offer
enough general-purpose registers to accommodate the function call), this will
need to be converted back to a stack-based function call (either at runtime,
or beforehand). Wouldn't this intermediate WASM-to-register-based-bytecode
translation be redundant then?

~~~
tyingq
I don't know of any current physical stack machine CPUs.

~~~
uasm
Stacks are used extensively across the x86 family [0]

[0] -
[https://en.wikipedia.org/wiki/X86_calling_conventions](https://en.wikipedia.org/wiki/X86_calling_conventions)

~~~
tyingq
"Has a stack" isn't the same as "Has a stack based ISA".

~~~
colejohnson66
To expand: a Forth machine or similar would be a stack-based ISA. Using a
stack is a different matter; pretty much every ISA uses stacks.

------
CharlesW
Why is an interpreter desirable when JIT compilers create significantly faster
code? Is this primarily about embedded use?

~~~
kbumsik
iOS prohibits JIT for example.

~~~
saagarjha
More specifically, apps submitted to the App Store may not utilize JIT.

~~~
vips7L
Is there any specific reason for this?

~~~
kbumsik
Executing instructions dynamically from memory (which JIT does) exposes
potential security issues. A good example is the famous Spectre vulnerability.

Spectre itself is not JIT-specific, but it is so hard to reproduce that the
only currently viable target environment is JIT. That's why JavaScript was
affected by Spectre.

------
setheron
The neater article seems to be the one about the M3 interpreter:
[https://github.com/soundandform/m3#m3-massey-meta-machine](https://github.com/soundandform/m3#m3-massey-meta-machine)

Tbh, I couldn't get the eureka moment though. Might try to read in the AM ;)

~~~
thermals
Yeah, this is a good way to design a fast interpreter! It's traditionally
called a "threaded interpreter", or (somewhat confusingly) "threaded code":

[https://en.wikipedia.org/wiki/Threaded_code](https://en.wikipedia.org/wiki/Threaded_code)

[http://www.complang.tuwien.ac.at/forth/threaded-code.html](http://www.complang.tuwien.ac.at/forth/threaded-code.html)

You can see an example of this particular implementation style (where each
operation is a tail call to a C function, passing the registers as arguments)
at the second link above, under "continuation-passing style".

One of the big advantages of a threaded interpreter is relatively good branch
prediction. A simple switch-based dispatch loop has a single indirect jump at
its core, which is almost entirely unpredictable -- whereas threaded dispatch
puts a copy of that indirect jump at the end of each opcode's implementation,
giving the branch predictor way more data to work with. Effectively, you're
letting it use the current opcode to help predict the next opcode!
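
A minimal sketch of the two dispatch styles side by side (the opcodes are made up, and the computed goto is a GCC/Clang extension):

```c
#include <stdint.h>
#include <stdio.h>

enum { OP_INC, OP_DBL, OP_END };

/* Switch dispatch: one shared indirect jump for every opcode, which the
   branch predictor sees as a single, nearly random target. */
static int64_t run_switch(const uint8_t *code, int64_t acc) {
    for (;;) {
        switch (*code++) {
        case OP_INC: acc += 1; break;
        case OP_DBL: acc *= 2; break;
        case OP_END: return acc;
        }
    }
}

/* Threaded dispatch: each handler ends with its own copy of the indirect
   jump, so the predictor can use "which opcode am I in" to guess the next. */
static int64_t run_threaded(const uint8_t *code, int64_t acc) {
    static void *labels[] = { &&inc, &&dbl, &&end };
    goto *labels[*code++];
inc: acc += 1; goto *labels[*code++];
dbl: acc *= 2; goto *labels[*code++];
end: return acc;
}

int main(void) {
    const uint8_t prog[] = { OP_INC, OP_DBL, OP_DBL, OP_END };  /* (0+1)*4 */
    printf("%lld %lld\n", (long long)run_switch(prog, 0),
                          (long long)run_threaded(prog, 0));    /* 4 4 */
    return 0;
}
```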

~~~
vshymanskyy
Yeah, but... it's not only the "threaded code" approach that makes Wasm3 so
fast. In fact, Intel's WAMR also utilizes this method, yet is 30 times
slower.

------
29athrowaway
The motivation for WebAssembly rather than plain ARM or x86 assembly is
portability and security.

It would be interesting to see how this is designed with security in mind.

~~~
DarthGhandi
Pardon my wasm illiteracy here, what exactly makes it more secure?

Struggling to see it.

~~~
ridiculous_fish
WASM has a sandboxing model. The idea is:

1. Control flow is always checked. You can't jump to an arbitrary address,
you jump to index N in a control flow table.

2. Calls out of the sandbox are also table-based.

3. Indexed accesses are bounds checked. On 64-bit platforms, this is achieved
by demoting the wasm to 32 bit and using big guard pages. On 32-bit platforms,
it's explicit compares.

The result is something which may become internally inconsistent (can
Heartbleed) but cannot perform arbitrary accesses to host memory.
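
A minimal sketch of what point 3's "explicit compares" could look like (the names are mine, not from any particular runtime):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

static void trap(void) { abort(); }    /* stand-in for a wasm trap */

/* The 32-bit-host case: every linear-memory access is compared against the
   memory size before it happens. On 64-bit hosts the same guarantee comes
   without the compare: a 32-bit wasm address can reach at most 4 GiB past
   the base, and that whole span is reserved with inaccessible guard pages. */
static uint32_t load_u32(const uint8_t *base, uint32_t mem_size, uint32_t addr) {
    if (mem_size < 4 || addr > mem_size - 4)
        trap();                        /* outside linear memory: trap, don't read */
    uint32_t v;
    memcpy(&v, base + addr, 4);        /* unaligned-safe 4-byte load */
    return v;
}

int main(void) {
    static uint8_t mem[65536];
    mem[100] = 42;
    printf("%u\n", load_u32(mem, sizeof mem, 100));  /* 42 on little-endian;
                                              load_u32(mem, sizeof mem, 65535)
                                              would trap */
    return 0;
}
```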

~~~
tasogare
How is that different from CLR or JVM?

~~~
ridiculous_fish
I'll speak to JVM; I'm less familiar with CLR but I believe it's the same.

JVM and WASM both have statically-verifiable control flow. No wild jumps, no
executable stacks, etc. Phew.

Arrays and pointer arithmetic are a big difference. WASM has a big linear
memory block, and instructions may access arbitrary locations within it - the
runtime performs bounds checking only at the edges. So your `sprintf` can
still overflow the buffer, and smash the stack's data, but can't affect the
host, or the control flow.

JVM goes further: it prohibits pointer arithmetic _and_ pushes array accesses
down into the instruction stream. To access a JVM array, you must provide the
array reference itself, and the runtime will perform bounds checking using the
length.

The JVM approach gives you better runtime safety - no Heartbleed! The WASM
approach is lower-level and is more easily adapted to existing system
languages: C++, Rust, other LLVMites.
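
In C terms, the JVM-style per-object check amounts to roughly this (a sketch; real JVMs emit it in compiled code and elide checks they can prove redundant), versus the single edge check against linear memory sketched upthread:

```c
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

static void trap(void) { abort(); }    /* stand-in for a runtime exception */

/* The array is an object that knows its own length, and every element
   access is checked against that object's bounds, so a buggy index can't
   even reach the neighboring allocation. */
typedef struct {
    uint32_t len;
    int32_t  data[];                   /* elements follow the header */
} IntArray;

static int32_t aload(const IntArray *a, uint32_t i) {
    if (i >= a->len)
        trap();                        /* per-object bounds check */
    return a->data[i];
}

int main(void) {
    IntArray *a = malloc(sizeof *a + 3 * sizeof(int32_t));
    a->len = 3;
    a->data[0] = 7; a->data[1] = 8; a->data[2] = 9;
    printf("%d\n", aload(a, 2));       /* 9; aload(a, 3) would trap */
    free(a);
    return 0;
}
```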

~~~
pjmlp
CLR was designed to support C++ since day one.

And while WASM sounds the security trumpet without actually supporting proper
bounds checking, the CLR will taint C++ pointer arithmetic as unsafe, thus
making the whole module unsafe.

So I as consumer can decide if I am willing to trust an unsafe module or not.

~~~
johncolanduoni
The CLR doesn’t guarantee control flow integrity (modulo type confusion) or
any form of isolation when linear memory accesses are used. So here WASM
offers another option in the middle between trust and don’t trust: “trust
unsafe module only to not compromise its own functionality; no attacking the
rest of the process or kernel” (modulo runtime bugs).

~~~
pjmlp
Well, for that, execution of unsafe assemblies was already enabled anyway, so
there isn't much the verifier can do.

Which is something WASM isn't being honest about: corruption of internal
data structures is allowed.

If I can control what goes into memory just by calling a module's public
functions with the right data set and access patterns, CFI won't help a thing.

Suddenly the authorization module that would authenticate me as a regular
user might give me another set of capabilities, and off to the races.

~~~
my123
Note for C++ on the CLR that you can use /clr:safe as an MSVC compilation
argument. This errors out at compile time when trying to access arbitrary
pointers.

/clr:pure uses unsafe and supports those cases though.

And yeah, WebAssembly only doing bounds checking within a _single_ memory
block and not actually offering true bounds checking is a big downgrade, and a
pretty much unjustified one (+ it's rare among JITted languages...).

~~~
zozbot234
If you care about "true" bounds checking, just compile to Wasm from a safe
source language. Besides, Wasm does support multiple memory blocks so a
potentially-unsafe module need not "taint" anything else.

~~~
pjmlp
Security is as strong as the weakest link.

------
MuffinFlavored
> Node v13.0.1 (interpreter) 28 59.5x

[https://github.com/wasm3/wasm3/blob/master/test/benchmark/coremark/README.md](https://github.com/wasm3/wasm3/blob/master/test/benchmark/coremark/README.md)

59.5x faster than node.js at what? Executing WebAssembly?

~~~
vshymanskyy
V8 has a built-in (pure, no JIT) interpreter for WASM, which is quite slow
according to this test.

------
haberman
These are impressive performance numbers.

> Because operations end with a call to the next function, the C compiler will
> tail-call optimize most operations.

It appears that this relies on tail-call optimization to avoid overflowing the
stack. Unfortunately this means you probably can't run it in debug mode.

~~~
vshymanskyy
It's not that bad even in debug mode (or without TCO), just not optimal. Also,
there is a way to rework this part so it does not rely on compiler TCO.
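
For anyone curious what such a rework could look like: the usual trick is a trampoline, where handlers return the next operation instead of calling it, and a flat loop dispatches in constant stack space. A sketch with invented names; I don't know what Wasm3 actually has in mind:

```c
#include <stdint.h>
#include <stdio.h>

typedef struct Op Op;
typedef const Op *(*OpFn)(const Op *pc, int64_t *acc);
struct Op { OpFn fn; int64_t imm; };

static const Op *op_add(const Op *pc, int64_t *acc) {
    *acc += pc->imm;
    return pc + 1;                /* hand the next op back to the loop */
}

static const Op *op_end(const Op *pc, int64_t *acc) {
    (void)pc; (void)acc;
    return NULL;                  /* tell the trampoline to stop */
}

static int64_t run(const Op *pc) {
    int64_t acc = 0;
    while (pc)                    /* constant stack depth, TCO or not */
        pc = pc->fn(pc, &acc);
    return acc;
}

int main(void) {
    const Op prog[] = { {op_add, 40}, {op_add, 2}, {op_end, 0} };
    printf("%lld\n", (long long)run(prog));   /* 42 */
    return 0;
}
```

The trade-off is an extra return-and-loop iteration per operation instead of a straight jump, which is presumably why the tail-call version is preferred when the compiler cooperates.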

~~~
haberman
If the jump to the next opcode is a tail call, wouldn't an arbitrarily long
sequence of instructions take arbitrarily much stack space?

------
jononor
Impressive list of constrained targets for embedded. The ATmega1284
microcontroller for example has only 16 KB of RAM. Which is a lot for an
8-bit micro, but tiny compared to a modern application processor.

~~~
vshymanskyy
Yup. TinyBLE is an nRF51 SoC with 16 KB of SRAM as well.

