
WebAssembly Troubles Part 1: WebAssembly Is Not a Stack Machine - panic
http://troubles.md/posts/wasm-is-not-a-stack-machine/
======
titzer
[one of the original Wasm designers here]

Responding to the OP, since there is no comment section on the site.

First off, this rant gets the history of Wasm wrong and the facts of Wasm
wrong. I wouldn't unload on a random person on the internet generally, but I
would like to point out a sentence like:

> Not only that, but for the most part the WebAssembly specification team were
> flying blind.

It's an ad hominem. This really just impugns people and invites an argument.
It might be cathartic, but generally it doesn't advance the conversation to
cast aspersions like this.

And it's not true. I can tell you from first hand experience that a baseline
compiler was absolutely on our minds, and Mozilla already had a baseline
compiler in development throughout design. The Liftoff design that V8 shipped
didn't look too different from the picture in our collective heads at the
time. And all of us had considerable experience with JIT designs of all kinds.

As for the history. The history is wrong. The first iteration of Wasm was in
fact a pre-order encoded AST. No stack. The second iteration was a post-order
encoded AST, which, we found through microbenchmarks, actually decoded
considerably faster. The rub was how to support multi-value returns of
function calls, since multi-value local constructs can be flattened by a
producer. We considered a number of alternatives that preserved the AST-like
structure before concluding that a structured stack machine was actually the
best design, since it allows the straightforward extension to multi-values
that is there now (and will ship by default when we reach two-engine
implementation status).

As for the present. Wasm blocks and loops absolutely can take parameters; it's
part of the multi-value extension, which V8 implemented a year ago.
Block and loop parameters subsume SSA form and make locals wholly unnecessary
(if that's your thing). Locals make no difference to an optimizing compiler
like TurboFan or IonMonkey. And SSA form as an intermediate representation is
not as compact as the stack machine with block and loop parameters which is
the current design, as those extra moves take space and add an additional
verification burden.

A final point. Calling Wasm "not a stack machine" is just a misunderstanding.
All operators that work on values operate on the implicit operand stack. This
is the very definition of a stack machine. The fact that there is
additional mutable local storage doesn't make it not a stack machine.
Similarly, the JVM has mutable typed locals and yet is a stack machine as
well. The JVM (prior to 6) allowed completely unstructured control flow and
use of the stack, leading to a number of problems, including a potentially
cubic verification time. We fixed that.
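That definition can be made concrete with a toy interpreter (a hypothetical
sketch in the spirit of Wasm/JVM bytecode, not either one's real instruction
set): operators consume and produce values on an implicit operand stack, while
locals are a separate mutable store on the side.

```python
# Toy stack machine with mutable locals (illustrative encoding only).
def run(code, nlocals):
    stack, locals_ = [], [0] * nlocals
    for op, *args in code:
        if op == "const":
            stack.append(args[0])
        elif op == "local.get":          # read the mutable local storage
            stack.append(locals_[args[0]])
        elif op == "local.set":          # write the mutable local storage
            locals_[args[0]] = stack.pop()
        elif op == "add":                # operators use the implicit operand stack
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack

# (2 + 3) * 4, with the intermediate sum round-tripped through a local --
# the extra local storage doesn't stop this being a stack machine:
program = [
    ("const", 2), ("const", 3), ("add",),
    ("local.set", 0), ("local.get", 0),
    ("const", 4), ("mul",),
]
print(run(program, 1))  # [20]
```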

All that said, there might be a design mistake in Wasm bytecode. Personally, I
think we should have implicitly loaded arguments to a function onto the
operand stack, which would have made inlining even more like syntactic
substitution and further shortened the bodies of very tiny functions. But this
is a small thing and we didn't think about it at the time.

[edit: Perhaps "ad hominem" is a bit strong. It feels different to be on the
receiving end of a comment like "flying blind" -- it doesn't mean the same
thing to the sender and the receiver -- especially when this was really not
the case, as I state here.]

~~~
tomxor
> It's an ad hominem. This really just impugns people and invites an argument.
> It might be cathartic, but generally it doesn't advance the conversation to
> cast aspersion like this.

Ignoring any factual incorrectness, I cannot see how the author could have
made his point in a more respectful way. He clearly has great enthusiasm for
WASM and respect for its authors; I am struggling to see how anyone could have
interpreted it as cathartic...

The paragraph in which your excerpt originated makes this pretty clear:

> The developers of the WebAssembly spec aren’t dumb. For the most part it’s
> an extremely well-designed specification [...] I considered WebAssembly’s
> design to be utterly rock-solid, and in fact I still strongly believe that
> most of the decisions made were the right ones. Although it has problems,
> it’s incredible how much the WebAssembly working group got right considering
> it was such relatively unknown territory at the time of the specification’s
> writing.

~~~
hinkley
In an engineering discipline, asserting that someone is 'flying blind' could
very, very easily be taken as offensive. Knowing what's going on and why is so
fundamental to 'good engineering' practice that you basically are calling the
people ethical failures. 'Impugn' is a perfectly reasonable word for how
someone might react to such an aspersion.

Maybe in the future don't accuse engineers of 'flying blind' if you aren't
inviting return fire.

From context there was a lot of conjecture going on, but the big challenge
with building something new is what order to build the bits in to give you the
most useful information fastest. As the number of people goes above 2 the odds
that everyone agrees or that 'everyone' is right drop rapidly toward zero. You
do the best you can, and hope it's good enough that you still have time to
react to the worst of the decisions you made earlier. But it's not 'flying
blind'.

~~~
jungler
Wow, not the kind of engineering room I'd want to be in. You have to be able
to make claims that the other party does not have a complete picture of the
situation, and an external critic is indeed going to be vulnerable to the same
criticism.

Maybe you would have a point if it were a Linus-style "only a fucking idiot
would" rant. But responding to a sincere attempt to defend a design decision
as if it were an insult is some prima donna behavior.

------
pizlonator
Recomputing liveness is not really a big deal. Can be quite cheap, especially
over a register based IR.

I think that this article overstates the impact of all of this.

~~~
tom_mellior
Yes. The article is obsessed with the code quality generated by streaming
compilers, which is probably the wrong thing to focus on. A real high-
performance backend has no trouble reconstructing SSA form and using it for
optimizations. But forcing frontends to emit SSA would be a burden on them.
(LLVM bitcode formally requires SSA form as well, but this can be worked
around by using allocas.)

It might, however, make sense to have another standard "SSA WebAssembly"
program representation. There could then be standard tooling to compile
vanilla WebAssembly to the SSA form, frontends could choose which variant they
want to emit, and backends preferring SSA as input could still be made happy.

~~~
nonsince
Author here:

I'm obsessed with the quality of streaming compiler-emitted code for a few
reasons. Firstly, I'm working on an optimising streaming compiler. Secondly, I
work for a blockchain company and we can realistically only allow linear-time
compilation; this doesn't necessarily mean streaming compilation, but we might
as well make it both (I explain why we need linear-time compilation in a
different article:
[http://troubles.md/posts/why-wasm/](http://troubles.md/posts/why-wasm/)).
Thirdly, anything that gives streaming compilers more information also means
that non-streaming compilers have to reconstruct less information. And lastly,
in this particular case there is no reason (except for backwards-compatibility
constraints) why we can't preserve more of the information from the front-end
and have streaming compilers emit better code.

~~~
pizlonator
A streaming compiler can emit really great code even without liveness. It’s
not clear to me what optimizations you’re hoping to get from this. To do most
SSA optimizations you need a backend that can lower from SSA, which is not
linear afaik. Register allocation might be helped a bit by liveness, but you
can get block-local liveness information in linear time already - so for your
thing to be better you’d have to prove that there is something sweet about
having a non-SSA compiler that does register allocation using imperfect
liveness information, which was provided by an adversary. Then you’d have to
prove that this is ok - that an adversary can’t force you to do more work than
you want by lying about liveness. It’s probably not ok; for worst case perf
you’re almost certainly better off not trusting provided liveness info and
reconstructing it yourself on a block-local basis.

Anyway. I could tell you a lot more about how to design compilers but I have
to take my kid to school.

~~~
nonsince
A statically-typed stack machine like Wasm is homomorphic to SSA form with
liveness, and it's impossible to lie about liveness in this format. Most of
the complexity in the streaming compiler that I'm working on is around
producing good code for locals when we have no liveness information for them.
I explain why this is in the article.

~~~
pizlonator
It’s not guaranteed that using the liveness implicit in the SSA that falls out
of a stack language is going to give you better code in less time than a
block-local register allocation with locals live at block boundaries spilled
to the stack.

~~~
titzer
Indeed, and for good spilling decisions, you'll want to have next-use-distance
information for values. While a stack machine gives you an approximation of
this (deeper in the stack is farther in the future), for best results I
imagine you'll want to do two passes anyway, so locals are no worse for this
-- other than at block boundaries, where you have to spill them if you lack
liveness.
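As a rough illustration, next-use information of this kind can be gathered in
a single backward sweep over straight-line code (a generic sketch with a
made-up instruction encoding, not tied to any engine):

```python
# Sketch: next-use information via one backward sweep. `uses[i]` is the set
# of variables read by instruction i (an illustrative encoding). Belady-style
# spill heuristics evict the value whose next use is farthest away.
def next_use(uses):
    """Return, for each instruction i, a map var -> index of its next use
    at or after i (variables not used at or after i are absent)."""
    nxt = {}
    out = [None] * len(uses)
    for i in reversed(range(len(uses))):
        for v in uses[i]:
            nxt[v] = i
        out[i] = dict(nxt)
    return out

uses = [{"a"}, {"b"}, {"a", "b"}]
assert next_use(uses) == [
    {"a": 0, "b": 1},  # at instr 0, b's next use is farther away than a's
    {"a": 2, "b": 1},
    {"a": 2, "b": 2},
]
```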

------
devbug
As someone in the midst of building a game in C for 7 platforms, with
WebAssembly being one of them, my main disappointment is with the lack of
coroutines (or lack of control over the stack to implement them.) It hinders
how wide my engine can go since I'm limited to a fork-and-join model for
splitting work across threads. Poor code generation is also another pain
point, but I fully expect that to improve drastically over the coming year.

Overall, I'm pretty excited for WASM and the implications of it, but it does
feel like the web has regressed in the ability to deliver games.

~~~
ljackman
Most common coroutine implementations, such as JavaScript's and Python's, are
delimited or "symmetric". This means the most obvious implementation is in
terms of compiler transformations in the source language. It seems out of
WASM's scope to do this.

Undelimited "asymmetric" coroutines, like Lua's, could be an interesting
addition. That still seems to me to be too high level a feature for a
"portable assembly language" specification though.

~~~
wahern
I think you might be conflating characteristics regarding delimited and
symmetric coroutines. But let's step back.

JavaScript's and Python's choice of coroutine styles was constrained and
effectively dictated by runtime limitations. CPython, V8, and similar
implementations mix their C and assembly callstacks with their logical
language callstacks. Because the host runtimes didn't readily support multiple
stacks without a complete rewrite, this bled into the language runtime. There
was a path dependency whereby early _implementation choices_ directed the
evolution of the _language semantics_.

WASM is recapitulating the same cycle. Which is understandable because time is
limited and you can't make the perfect the enemy of the good, but you still
have to recognize it for what it is--a vicious cycle of short-sightedness. If
WASM doesn't provide multiple stacks as a primitive resource, then things like
stackful coroutines, fibers, etc, will have to be emulated (at incredible
cost, given WASM's other constraints regarding control flow). And if they have
to be emulated they'll be slow, which means languages will continue avoiding
them.

~~~
ljackman
I agree that CPython and V8 omitting the ability to juggle multiple stacks is
a mistake. For higher-level languages, undelimited coroutines or continuations
allow for very useful abstractions like Go's and Erlang's transparently non-
blocking IO.

However, it doesn't seem to be _entirely_ an implementation detail. Some
developers just don't seem to like the semantics of called functions being
able to cooperatively yield without the caller explicitly opting into it with
a keyword like `await`. I disagree with them, but it's a legitimate complaint
I've heard a few times.

It reminds me of arguments in the Lisp community about delimited continuations
and undelimited, i.e. Common Lisp and Scheme. A lot of the arguments there are
really about semantics and not implementation details, and come to the same
point: should cooperative scheduling require explicit notation at each level
of the call stack?
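Python itself makes the "explicit notation at each level" point concrete: a
nested call cannot suspend its caller unless every intermediate frame opts in
with `yield from` (or `await`):

```python
# Delimited suspension in Python: every frame between the driver and the
# suspension point must opt in explicitly (here with `yield from`); a plain,
# unmarked call cannot suspend its caller.
def leaf():
    yield "suspended in leaf"

def middle():
    yield from leaf()    # explicit opt-in at this level...

def top():
    yield from middle()  # ...and again at this level

gen = top()
print(next(gen))  # suspended in leaf
```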

My view on this is that systems languages like C and Rust should require
explicit notation for it whereas application languages should not. This seems
to be a point in favour of Go and Erlang over Java and C#.

However WASM, similar to C or Rust, seems to target a level in the tech stack
at which it should concern itself only with abstractions that have relatively
direct translations to the instruction set of the underlying hardware. Support
for multiple stacks doesn't fit into this from what I can see. (A similar
argument can be made for WASM not supporting garbage collection too, although
it looks like that'll be added at some point to make interoperability with JS
smoother.)

With the JVM supposedly adding fibres soon, it poses a question for WASM: is
it trying to be a portable assembly language, a portable high-level language
runtime, or something in between?

------
bogomipz
The author states:

>"This means that you have overhead associated with compilation - knowing the
liveness of variables is extremely important for generating efficient
assembly, but instead of the liveness being calculated when creating the IR
and stored as a part of it you have to recalculate this data every time."

Can someone say what is involved in calculating "liveliness"? What is the
procedure for doing so?

~~~
tom_mellior
You iterate over the program (every individual function, really) backwards. A
use of a variable means that it is "live" before that point; a definition
(i.e., a write into) a variable means that it is "dead" before that point.
That is, at any point, a variable being "live" means that its value at that
point may be used in the future. Liveness is especially important for register
allocation: If two variables are both live at some program point (and cannot
be proved to have the same value), the compiler must place them in different
registers or stack slots.
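The backward pass described above can be sketched for straight-line code (a
minimal illustration with a made-up instruction encoding; real analyses also
iterate to a fixed point over the control-flow graph):

```python
# Backward liveness over straight-line code: walk instructions in reverse,
# removing each instruction's definitions from the live set, then adding
# its uses. (Illustrative encoding: each instruction is a (defs, uses) pair.)
def liveness(instrs):
    """Return the set of variables live immediately before each instruction."""
    live = set()
    before = [None] * len(instrs)
    for i in reversed(range(len(instrs))):
        defs, uses = instrs[i]
        live = (live - defs) | uses  # dead before its definition, live before a use
        before[i] = set(live)
    return before

# c = a + b; d = c * c; return d
code = [({"c"}, {"a", "b"}), ({"d"}, {"c"}), (set(), {"d"})]
assert liveness(code) == [{"a", "b"}, {"c"}, {"d"}]
```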

As an aside, liveness is also useful for some other things. For example, a
variable that is live at the start of a function is one that may be used
without being initialized, and the compiler can emit a warning for it.

[https://en.wikipedia.org/wiki/Live_variable_analysis](https://en.wikipedia.org/wiki/Live_variable_analysis)

Edit: BTW, it really is "liveness", not "liveliness".

------
afiori
One thing I dislike about how criticisms of WebAssembly are formulated is that
they often refer to the MVP as a final product. Tail calls are important but
not essential; many functional languages have a C runtime. The point is
whether they (or an equivalent alternative) can be added properly, or whether
the standard is not flexible enough.

~~~
wahern
The issue is that if the VM doesn't support tail calls or alternatives like
unstructured goto then you're going to end up with two layers of emulation for
such languages, rather than one. That's incredibly slow. Which is all fine and
dandy, but people need to realize that WASM will _not_ be nearly as performant
as claimed, particularly when hosting other language runtimes.
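One standard shape that such emulation takes is a trampoline: tail calls
become returned thunks plus a driver loop, paying a closure allocation and an
indirect call per call (a generic sketch, not what any particular Wasm
toolchain emits):

```python
# Trampoline: emulate tail calls on a platform without them. Instead of
# calling, a function returns a zero-argument thunk; a driver loop invokes
# thunks until a non-callable result appears. Each "tail call" now costs a
# closure allocation plus an indirect call -- the emulation overhead at issue.
def trampoline(thunk):
    while callable(thunk):
        thunk = thunk()
    return thunk

def countdown(n, acc=0):
    if n == 0:
        return acc
    # a real tail call would be: return countdown(n - 1, acc + n)
    return lambda: countdown(n - 1, acc + n)

# 100000 levels of "recursion" without growing the native stack:
print(trampoline(lambda: countdown(100000)))  # 5000050000
```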

------
Osiris
In Part 1, the author claims that WebAssembly is not a stack machine.

In Part 2, when discussing `goto`, he says, "WebAssembly is a stack machine."

Part 2 contains no explanation about the contradiction with Part 1.

------
garganzol
Huh? WebAssembly is a stack machine, and locals do not pose a problem
whatsoever.

Yes, the author just needs to do more work. But it's perfectly doable,
although it has some complexity. Like everything in the world of compilers.
There are no free lunches.

~~~
nonsince
Yes, compilers are hard, but why make them harder? There is literally no
reason to require rebuilding this information. The compiler emitting Wasm has
that information and already uses it; having locals and disallowing blocks
from taking/returning values actually means more complexity in both the
compilers generating Wasm and the runtimes generating native code from Wasm.
That's the entire premise of the article.

~~~
pizlonator
Even if carrying that information was the thing that you needed (I don’t think
it is but there’s a separate thread about that), it’s definitely not the thing
that other implementations need.

------
benj111
"For the most part it’s an extremely well-designed specification. However,
they are weighed down by WebAssembly’s legacy"

Really?

Wikipedia: "In March 2017, the design of the minimum viable product was
declared to be finished and the preview phase ended."
[https://en.m.wikipedia.org/wiki/WebAssembly](https://en.m.wikipedia.org/wiki/WebAssembly)

It was only announced in 2015.

I'm not taking sides here, but either it's well designed _or_ it's getting
weighed down by legacy after 2 (or 4) years.

~~~
rkangel
Immediately following that 'legacy' sentence is an explanation of what they
mean by it:

> WebAssembly started out not as a bytecode, but more like a simplified binary
> representation for asm.js. Essentially it was originally designed to be
> source code, like JavaScript.

WebAssembly itself is relatively new, but it wasn't a completely blank sheet
of paper that they were starting with when they designed it.

~~~
benj111
Yes and the next sentence finishes:

"and only at the last minute did it switch to stack-based encoding for the
operators"

Which kind of counts against it being well designed.

To me, "weighed down" by legacy suggests some deep problem that shouldn't be
manifesting in something so young. You could argue that 2 years is a long time
in tech; I wouldn't say it's a long time in language development, though.

Maybe I'm just arguing semantics here? Is a library for a particular language
weighed down by legacy because it's designed to run on one particular
language?

