Wasmi v0.32: WebAssembly interpreter is now faster than ever (wasmi-labs.github.io)
138 points by herobird 35 days ago | 39 comments



Very interesting! They say they went from a stack-based IR, which provided faster translation but slower execution, to a register-based one, which has slightly slower translation but faster execution. This is in contrast to other runtimes, which provide much slower translation times but very fast execution times via JIT.

I assume that for very short-lived programs, a stack-based interpreter could be faster. And for long-lived programs, a JIT would be better.

This new IR seems to target a sweet spot of short-lived, but not super-short-lived, programs. It's also a great alternative for environments where JIT is not possible.
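To make the trade-off concrete, here is a toy sketch (nothing like Wasmi's actual instruction encoding) of the same expression `(a + b) * 2` in both a stack-based and a register-based IR, each with a minimal evaluator. The register form addresses operands directly and avoids the push/pop traffic, at the cost of a translation pass that must assign registers:

```rust
// Toy stack-based IR: operands live implicitly on a value stack.
#[derive(Clone, Copy)]
enum StackOp {
    Push(i64),
    Add,
    Mul,
}

fn eval_stack(code: &[StackOp]) -> i64 {
    let mut stack = Vec::new();
    for op in code {
        match *op {
            StackOp::Push(v) => stack.push(v),
            StackOp::Add => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a + b);
            }
            StackOp::Mul => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a * b);
            }
        }
    }
    stack.pop().unwrap()
}

// Toy register-based IR: every instruction names its operands explicitly.
#[derive(Clone, Copy)]
enum RegOp {
    Const { dst: usize, value: i64 },
    Add { dst: usize, lhs: usize, rhs: usize },
    Mul { dst: usize, lhs: usize, rhs: usize },
}

fn eval_reg(code: &[RegOp], num_regs: usize) -> i64 {
    let mut regs = vec![0i64; num_regs];
    for op in code {
        match *op {
            RegOp::Const { dst, value } => regs[dst] = value,
            RegOp::Add { dst, lhs, rhs } => regs[dst] = regs[lhs] + regs[rhs],
            RegOp::Mul { dst, lhs, rhs } => regs[dst] = regs[lhs] * regs[rhs],
        }
    }
    regs[0]
}

fn main() {
    // (3 + 4) * 2 == 14 in both encodings.
    let stack_code = [
        StackOp::Push(3),
        StackOp::Push(4),
        StackOp::Add,
        StackOp::Push(2),
        StackOp::Mul,
    ];
    let reg_code = [
        RegOp::Const { dst: 1, value: 3 },
        RegOp::Const { dst: 2, value: 4 },
        RegOp::Add { dst: 0, lhs: 1, rhs: 2 },
        RegOp::Const { dst: 1, value: 2 },
        RegOp::Mul { dst: 0, lhs: 0, rhs: 1 },
    ];
    assert_eq!(eval_stack(&stack_code), 14);
    assert_eq!(eval_reg(&reg_code, 3), 14);
    println!("both IRs agree: 14");
}
```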

I'm happy to see all these alternatives in the Wasm world! It's really cool. And thanks for sharing!


Thank you, that's a good summary of the article!

Even faster startup times can be achieved by so-called in-place interpreters, which do not translate the Wasm binary at all and instead execute it directly, without adjustments. Obviously this is slower at execution compared to re-writing interpreters.
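The idea can be sketched in a few lines: decode the raw bytes as you execute, with no translation pass at all. This toy uses real Wasm opcodes (0x41 = i32.const, 0x6a = i32.add, 0x0b = end) but only handles tiny single-byte LEB128 immediates; a real in-place interpreter must handle the full binary format:

```rust
// Toy in-place execution of a raw Wasm-style instruction sequence:
// opcodes and immediates are decoded on the fly, never rewritten.
fn execute_in_place(bytes: &[u8]) -> i32 {
    let mut stack: Vec<i32> = Vec::new();
    let mut pc = 0;
    loop {
        let op = bytes[pc];
        pc += 1;
        match op {
            0x41 => {
                // i32.const: decode the immediate (sketch: single LEB128 byte).
                let imm = bytes[pc];
                assert!(imm < 0x40, "sketch only handles immediates 0..=63");
                stack.push(imm as i32);
                pc += 1;
            }
            0x6a => {
                // i32.add
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a.wrapping_add(b));
            }
            0x0b => return stack.pop().unwrap(), // end
            other => panic!("unsupported opcode 0x{:02x}", other),
        }
    }
}

fn main() {
    // (i32.const 5) (i32.const 2) i32.add end  ==>  7
    let code = [0x41, 0x05, 0x41, 0x02, 0x6a, 0x0b];
    assert_eq!(execute_in_place(&code), 7);
    println!("in-place result: 7");
}
```

The decode-per-instruction cost in the loop is exactly what re-writing interpreters pay once, up front, at translation time.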

Examples of Wasm in-place interpreters are toywasm (https://github.com/yamt/toywasm) and WAMR's classic interpreter.


There is also Silverfir (https://github.com/mbbill/Silverfir) and Wizard (https://github.com/titzer/wizard-engine).

I wrote a paper about Wizard's in-place interpreter and benchmarked lots of runtimes in it (https://dl.acm.org/doi/abs/10.1145/3563311).

As there seem to be even more runtimes popping up (great job with wasmi, btw), it seems like a fun, maybe even full-time, job to keep up with them all.


Also great job on Wizard btw! Your article about Wizard was a really interesting read to me.

The abundance of Wasm runtimes is a testament to how great the WebAssembly standard really is!


I was just told that I should state here that I am the author of the article, please ask me anything. :)


This looks to have come out of a blockchain company: https://github.com/wasmi-labs/wasmi/blob/master/NEWS.md#anno...

I think smart contract execution is a good application of WebAssembly. It seems promising!


Wasmi is an independent project nowadays. And you are right that it was originally designed for efficient smart contract execution but with a scope for more general use.


I see it was included in the README. The other uses are also very interesting.

> It is an excellent choice for plugin systems, cloud hosts and as smart contract execution engine.

It would also be nice for running more dynamic sandboxed code on mini home servers, something like https://sandstorm.io/


Most of the funded and innovative work on WebAssembly & co seems to come from various cryptocurrency/blockchain groups, for better or worse.


It's well suited for permissionless compute.

The only issue is that not all languages that compile to Wasm are deterministic.


Yeah, typically you cannot use garbage collectors or multithreading in smart contract development.


Interestingly, Lua has GC and is deterministic if you replace the random hash seed.


> Wasmi intentionally mirrors the Wasmtime API on a best-effort basis

How does that correspond to implementing the WASI API? I think wasmtime is a JavaScript project for Wasm on the web; is that a distinct thing from WASI?

(this file existing suggests wasmi contains an implementation of wasi https://github.com/wasmi-labs/wasmi/blob/master/crates/wasi/...)


I see that this sentence was a bit vague. What was meant was that Wasmi, as a Rust library, mirrors the Wasmtime API that can be found here: https://docs.rs/wasmtime/21.0.1/wasmtime/

But yes, Wasmi also supports WASI preview1 and can execute Wasm applications that have been compiled in compliance with WASI preview1.


This is the sort of iterative process other interpreters have gone through as well. The balance between startup speed and memory usage leads to bytecode, and the stack-based interpreter is replaced by a register-based one. Later (but not yet in this project), the bytecode is replaced with a list of addresses, and then these are converted to machine code and inlined, in the simplest possible form of compilation. Next, code is generated for branches to make loops and flow control fast, and we eventually get back to a JIT. That gets saved to a binary image, and we have a compiler.


Yes, this iterative process is indeed very visible. Wasmi started out as a mostly-safe Rust interpreter and over time moved more and more in a performance-oriented direction.

Though I have to say that the "list of addresses" approach is not optimal in Rust today, since Rust is missing explicit tail calls. Stitch applies some tricks to achieve tail calls in Rust, but this has some drawbacks that are discussed in detail in Stitch's README.

Furthermore, the "list of addresses" approach (also known as threaded-code dispatch) has some variance. From what I know, both Wasm3 and Stitch use direct-threaded code, which stores a list of function pointers to instruction handlers and uses tail calls or computed goto to fetch the next instruction. The downside compared to bytecode is that direct-threaded code uses more memory, and it is only faster when coupled with computed goto or tail calls. Otherwise, compilers nowadays are pretty solid in their optimizations of loop-switch constructs and could technically even generate computed-goto-like code.

Thus, due to the lower memory usage, the downsides of using tail calls in Rust, and the potential of compiler optimizations for loop-switch constructs, we went with the bytecode approach in Wasmi.
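The loop-switch dispatch mentioned above can be sketched as follows (a toy instruction set, not Wasmi's real one): a single `loop` containing one `match` over compact bytecode, which optimizing compilers can typically lower into a jump table competitive with computed goto:

```rust
// Minimal loop-switch dispatch over compact bytecode (illustrative only).
#[derive(Clone, Copy)]
enum Inst {
    Const(i64),
    Add,
    Dup,
    JumpIfNonZero(usize), // branch target as a bytecode index
    Return,
}

fn run(code: &[Inst]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0usize;
    loop {
        // The single hot match below is the "loop-switch construct".
        match code[pc] {
            Inst::Const(v) => {
                stack.push(v);
                pc += 1;
            }
            Inst::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
                pc += 1;
            }
            Inst::Dup => {
                let top = *stack.last().unwrap();
                stack.push(top);
                pc += 1;
            }
            Inst::JumpIfNonZero(target) => {
                let cond = stack.pop().unwrap();
                pc = if cond != 0 { target } else { pc + 1 };
            }
            Inst::Return => return stack.pop().unwrap(),
        }
    }
}

fn main() {
    // Count down from 3 to 0; the loop body spans pc 1..=4.
    let code = [
        Inst::Const(3),         // 0: start value
        Inst::Const(-1),        // 1: loop head
        Inst::Add,              // 2: decrement counter
        Inst::Dup,              // 3: keep a copy for the branch test
        Inst::JumpIfNonZero(1), // 4: loop while counter != 0
        Inst::Return,           // 5: returns 0
    ];
    assert_eq!(run(&code), 0);
    println!("loop terminated with {}", run(&code));
}
```

A direct-threaded variant would store handler addresses instead of the compact `Inst` enum, trading memory for dispatch speed when tail calls or computed goto are available.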


Great analysis on all the runtimes.

I love all the improvements Wasmi has been making lately. Being so close to the highly optimized interpreter Stitch (a new interpreter similar to Wasm3, but written in Rust) is quite impressive.

As a side note, I wish most of the runtimes in Rust stopped adopting the "linker" paradigm for imports, as it is a completely unnecessary abstraction when setting up imports has close-to-zero cost.


Thank you! :)

When using lazy-unchecked translation with relatively small programs, setting up the Linker can sometimes take up the majority of the overall execution time with ~50 host functions (which is a common average number). We are talking about microseconds, but microseconds start to matter at these scales. This is why we implemented the LinkerBuilder in Wasmi, for a 120x speed-up. :)
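For illustration only (hypothetical types and names, not Wasmi's actual LinkerBuilder API): the core idea is to pay the per-name registration cost once in a reusable template, so each short-lived instantiation only does cheap lookups instead of re-registering ~50 host functions:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical sketch of the "build once, instantiate many" idea.
type HostFn = Arc<dyn Fn(&[i64]) -> i64 + Send + Sync>;

struct LinkerTemplateBuilder {
    funcs: HashMap<(String, String), HostFn>,
}

struct LinkerTemplate {
    funcs: HashMap<(String, String), HostFn>, // (module, name) -> handler
}

impl LinkerTemplateBuilder {
    fn new() -> Self {
        Self { funcs: HashMap::new() }
    }

    // Paid once, up front: string allocation and hashing for every import.
    fn define(&mut self, module: &str, name: &str, f: HostFn) -> &mut Self {
        self.funcs.insert((module.to_string(), name.to_string()), f);
        self
    }

    fn build(&self) -> LinkerTemplate {
        // Cloning Arc handles is cheap; the handlers are shared, not copied.
        LinkerTemplate { funcs: self.funcs.clone() }
    }
}

impl LinkerTemplate {
    // Per-instance resolution is just a map probe. (A real implementation
    // would avoid the per-lookup String allocations shown here.)
    fn resolve(&self, module: &str, name: &str) -> Option<&HostFn> {
        self.funcs.get(&(module.to_string(), name.to_string()))
    }
}

fn main() {
    let mut builder = LinkerTemplateBuilder::new();
    builder.define("env", "add1", Arc::new(|args| args[0] + 1));
    let template = builder.build();

    // A short-lived "instantiation" now only resolves, never re-registers.
    let f = template.resolve("env", "add1").unwrap();
    assert_eq!((**f)(&[41]), 42);
    println!("resolved host function: 41 + 1 = {}", (**f)(&[41]));
}
```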


I see [1], thanks for sharing. I'll need to dig a bit deeper in your implementation!

[1] https://github.com/wasmi-labs/wasmi/blob/master/crates/wasmi...


It reminds me of typst, which uses Wasmi as its Wasm plugin executor.

- https://typst.app/docs/reference/foundations/plugin/


Would you say their use case is closer to the "Translation-Intense" type?


Not really. I think the wasm file is only translated once and most time is spent on executing it.


I'm curious if you tried Wizard. A cursory look at some of the benchmarks suggests at least some have dependencies other than WASI. How did you run them on the other engines?


I am aware of Wizard and I think it is a pretty interesting Wasm runtime. It would be really cool if it was part of Wasmi's benchmark testsuite (https://github.com/wasmi-labs/wasmi-benchmarks). Contributions to add more Wasm runtimes and more test cases are very welcome.

The non-WASI test cases are only for testing translation performance, thus their imports do not need to be satisfied. They would have needed to be satisfied if the benchmarks tested instantiation performance instead. Usually, though, instantiation is pretty fast for most Wasm runtimes compared to translation time.


FWIW there are a bunch of benchmarks we've put up here:

https://github.com/composablesys/wish-you-were-fast/tree/mas...

They run on nearly all engines.


Oh this is very interesting! I wish I knew about this before I wrote my own benchmarking suite. :D


I am confused as to why this is noteworthy; their engine benchmarks as much slower than the competitors.


For startup performance Wasmi and Wasm3 are both the fastest engines according to the benchmarks. For execution performance you are right that generally JIT engines are expected to be faster than interpreter based Wasm engines.

Also, as stated in the article, Wasmi currently performs kinda poorly on Apple silicon, but this will be improved in the future. On AMD server chips, Wasmi is the fastest Wasm interpreter.


Do you have a good metric for where the break even point is? Some X-million instructions?


No, I do not, but it is a very interesting question, and probably not even answerable in practice, because not every instruction takes the same amount of time to execute to completion. Outliers in this regard are, for example, host function calls, which can do arbitrary things on the host side, or bulk-memory operations, which scale linearly with their inputs, etc.


Thanks for the detail!


There are a few benchmarks where Wasmi is the fastest interpreter, for example:

https://wasmi-labs.github.io/blog/posts/wasmi-v0.32/benches/...


I appreciate you pointing these out—thanks!


Should change to: WebAssembly interpreter faster than ever.


yep, the title somehow got cut at the end :S


Someone should make a programming language called ‘ever’ with a reeeally slow interpreter, so that everyone else can now legit claim that their language is faster than ever.


I know it's a joke, but it makes me wonder: is there such a constant-speed type of language? Maybe its speed would be enforced by an emulator, but it would be interesting to have a known speed to measure against.


PICO-8 limits the speed of the virtual CPU ("4M vm insts/sec").

https://www.lexaloffle.com/pico-8.php


real-time os?



