Makepad-stitch: WASM interpreter in Rust, 15kloc, 0 deps, faster than wasm3 (github.com/makepad)
49 points by rikarendsmp 25 days ago | 17 comments



For use in Makepad (our Rust IDE/UI framework) as our extension system, Eddy Bruël built a new experimental, spec-compliant Wasm interpreter that relies on sibling-call optimisation to get within 4x of a Wasm JIT. It has no dependencies and compiles in 2 seconds. It can ship on iOS and is very useful for lightweight app extension systems.


I'm the author of Stitch. Rik is somewhat overpromising: Stitch is only faster than Wasm3 on Windows (by about 15%). On Mac and Linux, it is about as fast. It does get within 4x of Wasmtime (which is the JIT I've compared it against), but only on Mac. On Linux and Windows, it gets within 8x.


You should test it on a fast CPU on Linux though; it's likely faster there. As you said, it's likely an instruction cache issue.


> The reason Stitch is slightly faster than Wasm3 on Mac, but slightly slower on Linux, is likely because Stitch has more variants per instruction compared to Wasm3, which puts pressure on the instruction cache. I suspect this gives Stitch the edge on the Apple M2 Pro, with its large instruction cache, but Wasm3 the edge on the Intel Xeon E312xx, with its smaller instruction cache.

What does "more variants per instruction" mean here?


Stitch lazily compiles each Wasm function to a form of threaded code in which each instruction has multiple variants, depending on where it loads its operands from.

Each operand is either stored on the stack (s), a virtual register (r), or as an immediate value (i). The add instruction, for instance, has an add_ss variant, an add_rs, and an add_ri variant, among others.

Most instructions store their result in a register, so that subsequent instructions operating on the result can avoid a stack load.

i32/i64/f32/f64.const instructions are completely elided: instead of being stored on the stack, constant values are stored as an immediate operand of the next instruction. (This is one area where Stitch differs from Wasm3, which preloads all constants for a function onto the stack every time the function is called.)
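To make the s/r/i idea concrete, here is a hypothetical Rust sketch (names and data layout invented, not Stitch's actual code): each operand of a binary op comes from the stack (s), the virtual result register (r), or an immediate (i), and each source combination gets its own specialized handler so the hot path never re-dispatches on operand kind.

```rust
struct State {
    stack: Vec<i32>,
    reg: i32, // most instructions leave their result here
}

// add_ss: both operands popped from the stack.
fn add_ss(state: &mut State) {
    let rhs = state.stack.pop().unwrap();
    let lhs = state.stack.pop().unwrap();
    state.reg = lhs.wrapping_add(rhs);
}

// add_rs: left operand read from the register, right popped from the stack.
fn add_rs(state: &mut State) {
    let rhs = state.stack.pop().unwrap();
    state.reg = state.reg.wrapping_add(rhs);
}

// add_ri: left operand in the register, right folded in as an immediate,
// e.g. the value of an elided i32.const.
fn add_ri(state: &mut State, imm: i32) {
    state.reg = state.reg.wrapping_add(imm);
}
```

The compiler picks the variant once, at translation time, based on where each operand actually lives.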

I hope that clarifies things :-)


Are the requirements for fast rust compilation written down somewhere? Is that something you'd consider sharing, possibly with all of the tricks you've learned about how to get around various situations that would otherwise lead to slow compilations?


In this specific case the fast compilation is down to having 0 dependencies.

Rust is usually rivalled only by npm in terms of the number of dependencies a typical project pulls in. This project is a refreshing take!


People using deps for trivial things, and those trivial things pulling in hundreds more deps for even more trivial things, is not really a Rust or npm issue. It's an issue of people becoming really lazy in light of amazing package management tools. It's weird how good package management has this totally shit side effect for compilation, but it sure makes building things faster. (Which in Rust's case I guess can go a long way, as coding stuff can also take a lot of time; not familiar with JS land there.)


Yeah, most dependencies in Rust land are entirely avoidable. Most even easily (like left-pad); sometimes it's hard, like a Wasm engine. This Wasm engine is the result of us trying to avoid a dep on a Wasm engine from the ecosystem. Just including one bloated the build time of our IDE by 300% (our own code was just 25% of the build time if I include Wasmtime). Now, was that a good idea? I don't know, but we're getting very close now to a full product whilst still compiling in release build, with all deps, in <10s on a new Mac. I'm personally really opposed to using small convenience libraries. Just write the code you need in a 'slightly less convenient way'. Use the standard library if you can. No need to pile on, say, entire networking stacks if you can talk to the platform networking APIs instead.
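Taking the left-pad case mentioned above: as a sketch of the "slightly less convenient way", a few lines against the standard library cover what that crate does (illustrative code, not Makepad's):

```rust
// Pad `s` on the left with `fill` until it is at least `width` chars wide.
fn left_pad(s: &str, width: usize, fill: char) -> String {
    let len = s.chars().count();
    if len >= width {
        return s.to_string();
    }
    let mut out = String::with_capacity(width);
    out.extend(std::iter::repeat(fill).take(width - len));
    out.push_str(s);
    out
}
```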


> It's weird how good package management has this totally shit side effect for compilation

That's because the package management tools aren't actually amazing. Amazing ones would distribute prebuilt artifacts pluggable into your project with close to zero build-time overhead, vs. today, where everyone has to constantly rebuild all the deps.


Essentially it is: avoid all dependencies, and if you do take them on, like platform libs, strip them down to only what you use.


How well does it perform when evaluating itself? Can it bootstrap?


I haven’t tried this, but Stitch should technically be able to run itself if we gave it a C-like API that you could export from a Wasm module. Would be fun to try at some point :-)


Wouldn't you need to add support for the tail call extensions, return_call (opcode 0x12) and return_call_indirect (opcode 0x13)? Stitch doesn't appear to implement those, even though Stitch itself relies on TCO for instruction dispatch, for which (IIUC) LLVM would emit return_call or return_call_indirect.


That’s a good point! I didn’t even think about that. For Stitch to become self-hosting, we’d either have to implement those instructions, or implement a fallback mode using an interpreter loop with a trampoline. The latter would negate most of the speed benefits of Stitch, though.
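For the curious, the trampoline fallback works roughly like this (a hypothetical Rust sketch, not Stitch code): instead of each handler tail-calling the next, it returns the next handler to a plain driver loop, which keeps the native stack flat even without guaranteed tail calls, at the cost of an extra return plus indirect call per instruction.

```rust
struct Vm {
    pc: usize,
    acc: i64,
    code: Vec<i64>, // toy bytecode: positive = add the operand, 0 = halt
}

// Newtype so the fn-pointer type can refer to itself.
struct Next(Option<fn(&mut Vm) -> Next>);

// One interpreter step: execute an instruction, return the next handler
// (here always `step` again) instead of tail-calling it.
fn step(vm: &mut Vm) -> Next {
    let insn = vm.code[vm.pc];
    vm.pc += 1;
    if insn == 0 {
        Next(None) // halt
    } else {
        vm.acc += insn;
        Next(Some(step))
    }
}

// The trampoline: a flat loop, so the native stack never grows.
fn run(vm: &mut Vm) {
    let mut next = step(vm);
    while let Next(Some(f)) = next {
        next = f(vm);
    }
}
```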


Interesting, thank you for explaining so clearly.


Can you add a comparison with wasmer as well?



