
WebAssembly architecture for Go - diakritikal
https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0_Nv3OUwjEY5qVCxCup4/preview#heading=h.mjo1bish3xni
======
pcwalton
The main benefit of using the Web Assembly GC is not improved performance
within the language, but rather having a single GC that handles both native
DOM objects and language objects. Having two GCs interoperate is a big pain,
and it's very difficult to pull off without leaks or performance problems.
This is especially true for DOM objects, which have no user-visible
finalizers. (The closest thing is WeakMaps, but I don't believe these are
enough to implement proper cross-language GC, because you can't query whether
a potentially-dead object is actually dead.)

Web Assembly GC is not a performance optimization; it's necessary for
correctness.

~~~
sseth
In Chrome today, as an example, the C++ based DOM and V8 (JavaScript) objects
are using different GCs. Obviously it is a pain point, but the situation of
having different GCs for DOM and language objects is not new with Web
Assembly.

~~~
slimsag
It's also worth mentioning WASM has no GC today.

~~~
pjmlp
My amd64 and ARM CPUs also don't have it.

On languages where the algorithm is an implementation detail, one can make use
of reference counting with a cycle collector, which are just a few hundred
lines.

Implementing one is pretty simple, making it perform well is another matter.

~~~
hajile
Unlike your real processor though, you can't directly access memory or ensure
its absolute location. You can make an advanced GC that is slow or a basic one
that is not.

------
mutagen
What kind of WASM file sizes can we expect? I know some work has gone into
shrinking Go executables, especially in 1.7, but will we be able to produce
something like the 15KB Rust Hello World WASMs? Go has a fantastic stdlib but
it hasn’t prioritized web optimized file sizes (yet).

I’m looking forward to the results of this work!

~~~
steveklabnik
A "hello world" in Rust is ~100 bytes; the more stuff you use, the bigger it
gets, as less of the stdlib can be removed. The biggie is an allocator, which
adds some size, but you can use things like
[https://github.com/fitzgen/wee_alloc](https://github.com/fitzgen/wee_alloc)
and it's less than a kilobyte...

------
camdenlock
This is huge. Looking forward to seeing what sort of front-end toolkits will pop
up for this new platform (Go for the client-side web).

~~~
zenhack
Worth noting, gopherjs has been around forever, and is mature:

[https://github.com/gopherjs/gopherjs](https://github.com/gopherjs/gopherjs)

...and already has a reasonable js interop story, whereas my understanding is
that calling the dom api from wasm is not the simplest thing.

------
johnhenry
I'm trying to learn more about this topic and I'm curious if anyone could
clarify for me -- The reason Go would need a specific architecture for
WebAssembly is because Go supports features, like garbage collection, that
WebAssembly does not.

Is that right? Close? An oversimplification? Way off?

~~~
goalieca
Follow-up question, why can't LLVM -> Web Assembly solve the problem?

~~~
sitkack
x86 elf -> wasm could solve this problem.

~~~
slrz
No, it wouldn't. At that point you already lost lots of information and your
only option is to faithfully reproduce the behaviour of the x86 machine.

You can still do it, of course and performance isn't even that bad. See
Fabrice Bellard's x86 emulator using asm.js.

[https://bellard.org/jslinux/tech.html](https://bellard.org/jslinux/tech.html)

~~~
sitkack
Ok, pick a better ISA, like RISC-V.

------
kodablah
> Go’s garbage collection is fully supported. WebAssembly is planning to add
> its own garbage collection, but it is hard to imagine that it would yield a
> better performance than Go’s own GC which is specifically tailored to Go’s
> needs

I don't think it's hard to imagine after reading the GC proposal. The JS collector
that might be reused could be off thread, something WASM can't (yet) do.

> Most file system operations are mapped to Node.js’ “fs” module. In the
> browser, file system operations are currently not available.

Please please abstract this. As a maintainer of a non-JS WASM backend, I'd
love to use Go too.

> Especially a “goto” operation in WebAssembly would be very helpful.

I didn't look into the Go use case enough, but curious how much better this
would be than the current labeled block and labeled break approach in WASM.
WASM has fairly strict stack/frame rules/types, so arbitrary gotos wouldn't
work.

~~~
ramenmeal
> WASM has fairly strict stack/frame rules/types, so arbitrary gotos wouldn't
> work.

kinda funny because golang follows the same idea.

~~~
kodablah
So I am curious how the presence of them in WASM would help

~~~
dualogy
Jump-to-arbitrary-address / labels-as-values isn't made available to Go devs
by Go-the-language, but seems required by Go-the-runtime+compiler, such as for
compiling Goroutines.

~~~
skybrian
It sort of does via channels, though. Sending a value to a bufferless channel
causes the next goroutine to run (eventually). This looks a bit like a goto if
you squint.

Go needs multiple stacks and a way to switch to executing a different stack.
Web assembly could provide this in a high-level way.
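
A minimal Go sketch of the hand-off described above, assuming two goroutines and two unbuffered channels (the function name `pingPong` and the trace labels are made up for illustration). Each send blocks until the other side receives, so control alternates strictly between the two:

```go
package main

import "fmt"

// pingPong hands control back and forth n times over unbuffered channels
// and returns the order in which the two sides ran.
func pingPong(n int) []string {
	ping := make(chan int) // unbuffered: a send blocks until it is received
	pong := make(chan int)
	done := make(chan struct{})
	var trace []string

	go func() {
		defer close(done)
		for v := range ping {
			trace = append(trace, fmt.Sprintf("B%d", v))
			pong <- v // hand control back to the sender
		}
	}()

	for i := 0; i < n; i++ {
		trace = append(trace, fmt.Sprintf("A%d", i))
		ping <- i // hand control to the other goroutine
		<-pong    // wait until it hands control back
	}
	close(ping)
	<-done
	return trace
}

func main() {
	fmt.Println(pingPong(2)) // prints [A0 B0 A1 B1]
}
```

Each goroutine here has its own stack, which is the part a WebAssembly target has to emulate when it cannot switch stacks natively.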

------
amelius
Strange that wasm doesn't support go-to instructions. Didn't they see the need
for it coming? Most compiler backends need it, I suppose.

~~~
obl
I'm assuming it's because they wanted existing js/asm.js JITs to easily accept
wasm, and those only see structured control flow.

------
truncate
So when they say it generates a big switch statement, do they generate this
big switch statement for the whole program, or for individual functions, each
of which keeps enough context to continue at the point where it yielded?

~~~
skybrian
Speculating, but it sounds to me like it's within individual functions. Go's
program counter is "[split] into 2 parts: PC_F and PC_B. PC_F is the index of
the function to be executed. PC_B is the index of the basic block to be
executed."
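
To make the per-function idea concrete, here is a hand-written Go sketch of a function lowered to a loop around a switch over basic-block indices. It is not the compiler's actual output: the `frame` type, the block numbering and the choice of where to yield are all assumptions for illustration. Only the block index plays the role of PC_B.

```go
package main

import "fmt"

// frame is a hypothetical saved activation: the next basic block to run
// plus the locals that must survive a yield.
type frame struct {
	pcB int // index of the next basic block (the "PC_B" part)
	i   int
	sum int
}

// sumTo adds 1..n into f.sum. It returns false when it yields and true
// when it has finished; calling it again resumes at the saved block.
func sumTo(f *frame, n int) (done bool) {
	for {
		switch f.pcB {
		case 0: // entry block: initialise locals
			f.i, f.sum = 1, 0
			f.pcB = 1
		case 1: // loop header: pick the next block
			if f.i > n {
				f.pcB = 3
			} else {
				f.pcB = 2
			}
		case 2: // loop body, then yield to the caller
			f.sum += f.i
			f.i++
			f.pcB = 1
			return false
		case 3: // exit block
			return true
		}
	}
}

func main() {
	f := &frame{}
	yields := 0
	for !sumTo(f, 4) {
		yields++
	}
	fmt.Println(f.sum, yields) // prints 10 4
}
```

The jump between blocks becomes an assignment to `pcB` followed by another trip around the loop, which is how a goto can be emulated with only structured control flow.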

------
MrBuddyCasino
Afaik in theory stack machines are simpler to implement, but slower than
register machines. In practice, the JVM is still pretty quick. Why was WASM
designed as a stack machine?

~~~
comex
What you’re saying applies to interpreters, which execute the program opcode
by opcode, and (in the case of stack machines) keep an actual stack at
runtime. WebAssembly was designed to be the input for (at least mildly)
optimizing JIT compilers. Since the stack is required to be at a fixed depth
at each instruction, when the compiler reads the instruction, it can map each
of the inputs it pops to a single earlier instruction that pushed it[1], and
record a reference directly to that in its internal IR. After that it
completely forgets about the stack. Later on it’ll do register allocation for
the native architecture, so the generated native code makes efficient use of
registers.

Or in other words, the “stack machine” is basically just a compact encoding of
an AST.

(Note that the native “stack” is an entirely separate thing which does exist
at runtime.)

[1] Actually, there can be multiple possible instructions from, e.g., inside
the ‘if’ and ‘else’ sides of a prior if-block (which can push values that stay
on the stack after the end of the block). But since both sides have to push
the same number of values, this is still tractable; it turns into a phi node
in SSA representation.
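
A small Go sketch of the point above, assuming a toy instruction set (integer literals, `add`, `mul`) rather than real WebAssembly opcodes. The `Node` type and `build` function are hypothetical. The stack exists only while the compiler reads the code; what comes out is a tree with direct references between values:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Node is a hypothetical IR value: a constant or a binary operation
// that points directly at the values it consumes.
type Node struct {
	Op          string // "const", "add" or "mul"
	Val         int
	Left, Right *Node
}

func (n *Node) String() string {
	if n.Op == "const" {
		return strconv.Itoa(n.Val)
	}
	return "(" + n.Op + " " + n.Left.String() + " " + n.Right.String() + ")"
}

// build reads the instructions once, keeping a compile-time stack of IR
// nodes. Each pop resolves to the earlier instruction that pushed the value.
func build(code string) (*Node, error) {
	var stack []*Node
	for _, tok := range strings.Fields(code) {
		switch tok {
		case "add", "mul":
			if len(stack) < 2 {
				return nil, fmt.Errorf("stack underflow at %q", tok)
			}
			l, r := stack[len(stack)-2], stack[len(stack)-1]
			stack = append(stack[:len(stack)-2], &Node{Op: tok, Left: l, Right: r})
		default:
			v, err := strconv.Atoi(tok)
			if err != nil {
				return nil, fmt.Errorf("unknown instruction %q", tok)
			}
			stack = append(stack, &Node{Op: "const", Val: v})
		}
	}
	if len(stack) != 1 {
		return nil, fmt.Errorf("expected one result, got %d", len(stack))
	}
	return stack[0], nil
}

func main() {
	n, err := build("2 3 add 4 mul")
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // prints (mul (add 2 3) 4)
}
```

This toy has no branches, so it never needs the phi-node case from the footnote. A real compiler would also check operand types at each step.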

~~~
MrBuddyCasino
Makes sense, thanks for the detailed response!

------
FullyFunctional
Interestingly, the issues sound a lot like what a high performance Haskell
implementation would run into. One additional note though on GC: a highly tuned
PL implementation requires complete control over object layout, pointer
representation, and GC. (!)

No off-the-shelf GC will ever be perfect for all languages, thus I think WASM
would be far better off doing whatever it could to facilitate runtimes
implementing their _own_.

(!) Example from the lazy world: lazy closures, when evaluated and updated,
often result in an indirection. We rely on GC to shortcut these. Sometimes we
even have GC perform trivial constant-time evaluations when we know the result
is smaller than the suspended evaluation.

Example from the Lisp world: cdr-coded cons cells can cut size in half, but no
generic GC would be able to do this.

------
api
Why is WASM so weird? Why not just thinly model modern CPUs so it can be
almost transliterated?

~~~
zenhack
One big advantage of stack machines is that they tend to be better on code
size, which very much matters when you're transferring code over the network.
I expect other parts of the design make it easier to JIT; the lack of
arbitrary gotos probably makes analyzing control flow easier, for example.

------
tbodt
If you tried to look at the document and found that the entire internet has
been screwing with the formatting, change "Suggesting" to "Viewing" in the top
right. Seems like someone forgot to check permissions on the document.

~~~
LyndsySimon
Honestly, the defacement is at least as interesting as the original document.

~~~
hashkb
That's not the reason this was posted?

