Hacker News new | past | comments | ask | show | jobs | submit login
Build your own WebAssembly Compiler (2019) (scottlogic.com)
161 points by davikr on Dec 3, 2023 | hide | past | favorite | 32 comments



Oh wow, really chuffed to see a blog post I wrote 4 years ago back on the front page of HN once again. It was a real rollercoaster, spending ages developing a talk, that I wasn't able to give due to a flight cancellation - then turning it into a blog post that lots of people seemed to like.

Hard work pays off in the end :-)


Just letting you know: love the high-quality blogs you guys always write, especially the few on webgl!


    (block
     (loop
       [loop condition]
       i32.eqz
       [nested statements]
       br_if 1
       br 0)
     )
That's... not really the while loop, is it? That should've been something like this IMO

    (block
     (loop
       [loop condition]
       i32.eqz
       br_if 1
       [nested statements]
       br 0)
     )
Honestly, WASM's "br"/"br_if" semantics is quite wacky: for a block "br"/"br_if" mean "break", and for a loop they mean "continue". I understand that they didn't want to have two separate opcodes which would only make sense in certain contexts but could we at least have gotten two mnemonics for them? Having "break/break_if" for targeting a block and "continue/continue_if" for targeting a loop would reduce some confusion and I don't think there are any difficulties in either assembling or disaseembling that.

I've seen only about three toy-ish WASM interpreters and they all translate those unorthodox control flow instructions into something like goto's during the parsing stage. Why did the WASM authors did it this way, I don't know: it's just as well possible to give types to traditional asm-like GOTOs and labels.


Structured control flow is more efficient to verify, more compact, and disallows irreducible loops by design.

Also note that "br" "br_if" and "br_table" are all just text format mnemonics. You can think of the "br" as "branch" instead of break. We didn't have different instructions for branching depending on the type of the label because it's not necessary, just a waste of opcode space.


> You can think of the "br" as "branch" instead of break

Yes, but the branch target depends on the targeted block's kind: it's the block-exit label for ifs and blocks and the block-start for loops. Just a bit unusual.

> We didn't have different instructions for branching depending on the type of the label because it's not necessary, just a waste of opcode space.

Yeah, I understand that, but it still could be two different mnemonics: 0x0C/0x0D could disassemble as "continue/continue_if" if the label targets a loop, and as "break/break_if" if it targets a non-loop; similarly for assembling. I imagine the added overhead would be minuscule.

Anyway, thanks for answering. I suppose forcing the structured control flow in the IR beats e.g. simplicity of compiling a for-loop in one pass.

    block
        ;; code for INITIALIZATION
        loop
            ;; code for CONDITION
            i32.eqz
            br_if 1
        
            i32.const 0
            loop
                if
                    ;; code for INCREMENT
                    br 2
                end
        
                ;; code for BODY
                i32.const 1
                br 0
            end
        end
    end


Well, more efficient to verify, because it is more restrictive

Meaning some control flows are unrepresentable and certainly less compact; since they must be 'interpreted'

I think that's probably the best decision for quick webpage loading, but given that WASM is generally used for code heavy webpages -- not quick jquery replacements, I'm not convinced it was the best decision overall

-- I love a lot of wasm, I just wish it was a better target for open ended computation; which is something gotos and coroutines are better suited for

That said; WASM is a clever trade off; but the trade-off is hidden within compilers (ref Go's wasm for instance)


Nothing is unrepresentable, if I’m not mistaken.

https://en.m.wikipedia.org/wiki/Structured_program_theorem

Rather, I think the trade off was more in the realm of “aggressively optimize for what the web does well”.

With that being said, I do wish it was better suited for general purpose computation.


Irreducible loops (loops with more than one way to enter) are not representable by structured control flow. It's possible to model with with state variables (e.g. loop over switch). Incidentally if irreducible control flow is modeled this way (set the control variable to a constant and then branch back to the loop header which immediately then switches on the variable), and the compiler performs jump threading with constant propagation (i.e. duplicate the loop header up to the switch, and fold the switch), the resulting control graph will again have irreducible control flow. In fact it will be the same as the original control flow graph. So it's possible to undo the inefficiency of the state variable with some transformations in the Wasm consumer.


It's not just that it's more restrictive, it's that it forces the producer to do work for the verifier. Structured control flow follows a stack discipline and also pre-declares all labels, so the consumer (verifier) can use the minimum data structures necessary, immediately reusing metadata for each control structure when it is exited.


Since you're giving great answers ;)

What's the problem with multiple entry points anyway? If we're not using computed goto, or dynamically verifying it when we do, aren't all of our jump targets known, and so inherently safe?

What am i missing?

EDIT: is it just a question of speed of jit for register allocation / dominance frontiers?? But the verifier is providing safety right??


Sure, you can still have safe code with irreducible loops. Safety is orthogonal. There is a proposal to add "multi-loop" (https://gist.github.com/conrad-watt/6a620cb8b7d8f0191296e3eb...) to Wasm, but it hasn't been a high priority.

Of all the Wasm JITs I know of, none of the Web engine's optimizing compilers can handle irreducible control flow in their backends. Their register allocators and codegen algorithms would need to be modified. I worked on TurboFan, and several parts of it would choke on irreducible control flow. It's likely these engines would fall back to baseline compilation for such functions, which is a bigger perf impact than using state variables.


I was hoping we’d have someone more qualified to answer this question, but whatever. Holy shit!

https://news.ycombinator.com/user?id=titzer

I love this site


As far as I can tell (5 minutes of internet research) this was to allow easier compilation to JavaScript as a fallback in the days when WASM wasn't widely supported.

"Please add goto" issue has been open since 2016:

https://github.com/WebAssembly/design/issues/796

Most interesting comment:

> The upcoming Go 1.11 release will have experimental support for WebAssembly. This will include full support for all of Go's features, including goroutines, channels, etc. However, the performance of the generated WebAssembly is currently not that good.

> This is mainly because of the missing goto instruction. Without the goto instruction we had to resort to using a toplevel loop and jump table in every function. Using the relooper algorithm is not an option for us, because when switching between goroutines we need to be able to resume execution at different points of a function. The relooper can not help with this, only a goto instruction can.

> It is awesome that WebAssembly got to the point where it can support a language like Go. But to be truly the assembly of the web, WebAssembly should be equally powerful as other assembly languages. Go has an advanced compiler which is able to emit very efficient assembly for a number of other platforms. This is why I would like to argue that it is mainly a limitation of WebAssembly and not of the Go compiler that it is not possible to also use this compiler to emit efficient assembly for the web.

^ https://github.com/WebAssembly/design/issues/796#issuecommen...


Unlike most other JITs V8 keeps the block structure of JS internally and they did not want to add GOTOs to their internal representation.


If you're interested in this, there's also https://wasmgroundup.com/ currently in early access.

I've worked through the first couple of chapters and found it pretty well structured and helpful.


Thanks! I was looking for a wasm book a few weeks ago, and most looked (understandably) out of date. Very helpful!

The frontend masters course by Jem Young was really good as well. https://frontendmasters.com/courses/web-assembly/


Discussed at the time:

Build Your Own WebAssembly Compiler - https://news.ycombinator.com/item?id=21889614 - Dec 2019 (19 comments)


[2019] Author builds a compiler that _targets_ WebAssembly, not a compiler that _takes in_ WebAssembly. Interesting but I was hoping for the other side, a minimal AOT or JIT wasm --> machine code compiler.



Works very well. I compiled the same c++ code to wasm and native (without optimization), the native version was slower than the wasm one running in this and jit'd, including startup time. Planning on using it for scripting for a game engine I'm slowly working on, instead of being locked into one language.


Did I understand correctly that the native code was compiled without optimization? Meaning “-O0”?


Which benchmark did you use?


For AOT, the simplest approach, that actually produces the fastest native code, is to naively translate WASM opcodes to C.

This is for example what W2C2 does: https://github.com/turbolent/w2c2


Isn’t one of the main selling points of WASM it’s runtime performance? I’d be surprised if there was interest in a JIT compiler for it.


At least in Chrome, WASM is actually jitted and "tiered" on function level. Not sure how relevant this post still is, but AFAIK if anything changed then that the number of tiers has increased rather than decreased:

https://v8.dev/docs/wasm-compilation-pipeline

This may actually introduce visible "warmup-stutter" in rendering applications which is very unfortunate, but AFAIK the tiering was introduced to accommodate massive WASM blobs such as created by Unity which otherwise might take tens of seconds to AOT-compile.


Maybe something like signed compilations is an answer -- a trusted source signs and verifies pre-compiled code

Then browsers can skip verification+jit etc if they trust the signature


You could use zero-knowledge proofs to prove that a given native binary was created from a specific compilation process and WASM binary. No privileged trusted authority necessary.

Though any pre-compiled binaries would be browser version specific and CPU architecture specific, so it might be too much of a pain for browsers and websites to bother with yet for small gains.


For 99% of us that's just an implementation detail anyway. Although I remember reading that Firefox can compile wasm faster than normal network transfer rates so maybe AOT makes more sense there.


We do streaming baseline compilation as fast as the network can hand us bytes, but we also kick off a more optimized compilation in a background thread.

A lot of optimizations can be baked into the generated wasm, but you still need to spend some time doing eg register allocation.


> A lot of optimizations can be baked into the generated wasm, but you still need to spend some time doing eg register allocation.

Such a shame WASM is a stack machine. If it wasn't we could have had fast, singlepass compilation with near native performance without any complex optimizing recompilers or multi level JITs.

(This is not speculation. I actually wrote a VM which executes code as fast as wasmtime but compiles 160 times faster and guarantees O(n) compilation.)


Interesting. Would the differences in binary sizes be prohibitive for use on the internet?


> Would the differences in binary sizes be prohibitive for use on the internet?

No. They are competitive. For one of my benchmarks I get a WASM blob that is 90KB, and for my VM I get 76KB. (Both are stripped of any debug info.)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: