Oh wow, really chuffed to see a blog post I wrote 4 years ago back on the front page of HN once again. It was a real rollercoaster: spending ages developing a talk that I wasn't able to give due to a flight cancellation, then turning it into a blog post that lots of people seemed to like.
Honestly, WASM's "br"/"br_if" semantics are quite wacky: for a block, "br"/"br_if" mean "break", and for a loop they mean "continue". I understand that they didn't want two separate opcodes which would only make sense in certain contexts, but could we at least have gotten two mnemonics for them? Having "break"/"break_if" for targeting a block and "continue"/"continue_if" for targeting a loop would reduce some confusion, and I don't think there are any difficulties in either assembling or disassembling that.
I've seen only about three toy-ish WASM interpreters, and they all translate those unorthodox control flow instructions into something like gotos during the parsing stage. Why the WASM authors did it this way, I don't know: it's just as possible to give types to traditional asm-like GOTOs and labels.
Structured control flow is more efficient to verify, more compact, and disallows irreducible loops by design.
Also note that "br", "br_if" and "br_table" are all just text format mnemonics. You can think of "br" as "branch" instead of "break". We didn't have different instructions for branching depending on the type of the label because it's not necessary, just a waste of opcode space.
> You can think of the "br" as "branch" instead of break
Yes, but the branch target depends on the targeted block's kind: it's the block-exit label for ifs and blocks, and the block-start label for loops. Just a bit unusual.
> We didn't have different instructions for branching depending on the type of the label because it's not necessary, just a waste of opcode space.
Yeah, I understand that, but it still could be two different mnemonics: 0x0C/0x0D could disassemble as "continue/continue_if" if the label targets a loop, and as "break/break_if" if it targets a non-loop; similarly for assembling. I imagine the added overhead would be minuscule.
Anyway, thanks for answering. I suppose forcing the structured control flow in the IR beats e.g. simplicity of compiling a for-loop in one pass.
block              ;; break target: "br 1" below exits the for-loop
  ;; code for INITIALIZATION
  loop             ;; continue target: branching here re-checks the condition
    ;; code for CONDITION
    i32.eqz
    br_if 1        ;; condition is false: break out of the outer block
    i32.const 0    ;; flag = 0: first pass, skip INCREMENT
    loop
      if           ;; flag set? we came from BODY, so increment first
        ;; code for INCREMENT
        br 2       ;; continue: back to the condition loop
      end
      ;; code for BODY
      i32.const 1  ;; flag = 1: next pass through the inner loop increments
      br 0         ;; back to the inner loop's "if" dispatch
    end
  end
end
Well, more efficient to verify because it is more restrictive. Meaning some control flows are unrepresentable, and those are certainly less compact once encoded, since they must be 'interpreted'.
I think that's probably the best decision for quick webpage loading, but given that WASM is generally used for code-heavy webpages, not quick jQuery replacements, I'm not convinced it was the best decision overall.
-- I love a lot of wasm, I just wish it were a better target for open-ended computation, which is something gotos and coroutines are better suited for.
That said, WASM is a clever trade-off, but the trade-off is hidden within compilers (ref Go's wasm backend, for instance).
Irreducible loops (loops with more than one way to enter) are not representable with structured control flow. It's possible to model them with state variables (e.g. a loop over a switch). Incidentally, if irreducible control flow is modeled this way (set the control variable to a constant and then branch back to the loop header, which immediately switches on the variable), and the compiler performs jump threading with constant propagation (i.e. duplicates the loop header up to the switch and folds the switch), the resulting control flow graph will again be irreducible. In fact, it will be the same as the original control flow graph. So it's possible to undo the inefficiency of the state variable with some transformations in the Wasm consumer.
It's not just that it's more restrictive, it's that it forces the producer to do work for the verifier. Structured control flow follows a stack discipline and also pre-declares all labels, so the consumer (verifier) can use the minimum data structures necessary, immediately reusing metadata for each control structure when it is exited.
What's the problem with multiple entry points anyway? If we're not using computed goto, or dynamically verifying it when we do, aren't all of our jump targets known, and so inherently safe?
What am I missing?
EDIT: is it just a question of JIT speed for register allocation / dominance frontiers? But the verifier is providing safety, right?
Of all the Wasm JITs I know of, none of the Web engines' optimizing compilers can handle irreducible control flow in their backends. Their register allocators and codegen algorithms would need to be modified. I worked on TurboFan, and several parts of it would choke on irreducible control flow. It's likely these engines would fall back to baseline compilation for such functions, which is a bigger perf impact than using state variables.
As far as I can tell (5 minutes of internet research) this was to allow easier compilation to JavaScript as a fallback in the days when WASM wasn't widely supported.
> The upcoming Go 1.11 release will have experimental support for WebAssembly. This will include full support for all of Go's features, including goroutines, channels, etc. However, the performance of the generated WebAssembly is currently not that good.
> This is mainly because of the missing goto instruction. Without the goto instruction we had to resort to using a toplevel loop and jump table in every function. Using the relooper algorithm is not an option for us, because when switching between goroutines we need to be able to resume execution at different points of a function. The relooper can not help with this, only a goto instruction can.
> It is awesome that WebAssembly got to the point where it can support a language like Go. But to be truly the assembly of the web, WebAssembly should be equally powerful as other assembly languages. Go has an advanced compiler which is able to emit very efficient assembly for a number of other platforms. This is why I would like to argue that it is mainly a limitation of WebAssembly and not of the Go compiler that it is not possible to also use this compiler to emit efficient assembly for the web.
[2019] Author builds a compiler that _targets_ WebAssembly, not a compiler that _takes in_ WebAssembly. Interesting but I was hoping for the other side, a minimal AOT or JIT wasm --> machine code compiler.
Works very well. I compiled the same C++ code to wasm and native (without optimization); the native version was slower than the wasm one running in this and JIT'd, including startup time. Planning on using it for scripting in a game engine I'm slowly working on, instead of being locked into one language.
At least in Chrome, WASM is actually JITted and "tiered" at the function level. Not sure how relevant this post still is, but AFAIK if anything has changed, it's that the number of tiers has increased rather than decreased:
This may actually introduce visible "warmup stutter" in rendering applications, which is very unfortunate, but AFAIK the tiering was introduced to accommodate massive WASM blobs such as those created by Unity, which might otherwise take tens of seconds to AOT-compile.
You could use zero-knowledge proofs to prove that a given native binary was created from a specific compilation process and WASM binary. No privileged trusted authority necessary.
Though any pre-compiled binaries would be browser version specific and CPU architecture specific, so it might be too much of a pain for browsers and websites to bother with yet for small gains.
For 99% of us that's just an implementation detail anyway. Although I remember reading that Firefox can compile wasm faster than normal network transfer rates so maybe AOT makes more sense there.
We do streaming baseline compilation as fast as the network can hand us bytes, but we also kick off a more optimized compilation in a background thread.
A lot of optimizations can be baked into the generated wasm, but you still need to spend some time doing e.g. register allocation.
> A lot of optimizations can be baked into the generated wasm, but you still need to spend some time doing e.g. register allocation.
Such a shame WASM is a stack machine. If it weren't, we could have had fast, single-pass compilation with near-native performance, without any complex optimizing recompilers or multi-level JITs.
(This is not speculation. I actually wrote a VM which executes code as fast as wasmtime but compiles 160 times faster and guarantees O(n) compilation.)
Hard work pays off in the end :-)