
What’s in that .wasm? Introducing wasm-decompile - slow-typer
https://v8.dev/blog/wasm-decompile
======
snazz
This looks much nicer than the wasm2c output for that binary. I compiled it
with `clang wasm.c -c -target wasm32 -O2` just like in the instructions (I'm
on LLVM 10), and used the latest wasm2wat with `wasm2wat -f wasm.o` and got
this instead:

    
    
      (module
        (type (;0;) (func (param i32 i32) (result f32)))
        (import "env" "__linear_memory" (memory (;0;) 0))
        (import "env" "__indirect_function_table" (table (;0;) 0 funcref))
        (func (;0;) (type 0) (param i32 i32) (result f32)
          (f32.add
            (f32.add
              (f32.mul
                (f32.load
                  (local.get 0))
                (f32.load
                  (local.get 1)))
              (f32.mul
                (f32.load offset=4
                  (local.get 0))
                (f32.load offset=4
                  (local.get 1))))
            (f32.mul
              (f32.load offset=8
                (local.get 0))
              (f32.load offset=8
                (local.get 1))))))
    

wasm2c (also from WABT) returns this thing:
[https://paste.linux.community/view/7877995f](https://paste.linux.community/view/7877995f)
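For readers less used to WAT, here is a rough C rendering of what function 0 above computes. It treats its two i32 params as addresses into linear memory and returns the dot product of the three consecutive f32 values stored at each. This is a sketch, not wasm2c or wasm-decompile output. `linear_memory`, `f32_load`, and `func0` are hypothetical names, the array stands in for the imported `__linear_memory`, and the memcpy-based load assumes a little-endian host (Wasm memory is little-endian).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the imported "env" "__linear_memory":
   a flat byte array indexed by 32-bit addresses. */
static uint8_t linear_memory[64];

/* Emulates (f32.load offset=N (addr)): read 4 bytes at addr + N.
   memcpy sidesteps alignment and aliasing issues; reinterpreting the
   bytes as a host float matches Wasm only on a little-endian host.
   No bounds check here, unlike Wasm, which traps out of bounds. */
static float f32_load(uint32_t addr, uint32_t offset) {
    float value;
    memcpy(&value, &linear_memory[addr + offset], sizeof value);
    return value;
}

/* Same tree as the WAT: (x0*y0 + x1*y1) + x2*y2, where local 0 and
   local 1 hold the base addresses of the two 3-float vectors. */
float func0(uint32_t local0, uint32_t local1) {
    return (f32_load(local0, 0) * f32_load(local1, 0) +
            f32_load(local0, 4) * f32_load(local1, 4)) +
           f32_load(local0, 8) * f32_load(local1, 8);
}
```

The `offset=N` immediate is simply added to the dynamic address, so `(f32.load offset=4 (local.get 0))` corresponds to `f32_load(local0, 4)`.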

~~~
Aardappel
wasm2c has a different objective though: to be recompile-able again while
preserving semantics. wasm-decompile was designed for readability first.

~~~
snazz
Fair enough. I’m still surprised at just how unreadable (for me) the wasm2c
output was, though. The compiler must have done quite a bit of optimizing that
wasm2c was unable to undo.

~~~
Aardappel
It doesn't actually try to undo anything; it just translates Wasm instructions
1:1 (they're in your link at lines 205-221). wasm-decompile does try to "undo"
some things, but fully undoing them is generally impossible given LLVM's
optimized output and how low-level Wasm is (see also the article).

~~~
snazz
Okay, that makes sense.

------
klodolph
This is fascinating. For various reasons, WASM is less like a target bytecode
format and more like a peculiar IR for compilers. I’m sure this has all sorts
of effects on the tooling.

~~~
k__
What's the difference?

~~~
klodolph
If you were designing a bytecode as a compilation target, you would provide an
easy correspondence in the bytecode to basic blocks.

See:
[https://en.wikipedia.org/wiki/Basic_block](https://en.wikipedia.org/wiki/Basic_block)

WASM instead provides traditional control structures. So the compiler either
has to preserve control structures through to the IR, or has to work backwards
from basic blocks to control structures. Both options are undesirable, from
the perspective of compiler writers, and would be unnecessary if the VM were a
greenfield project.
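To make the distinction concrete, here is the same loop in C written two ways: with structured control flow, closer in spirit to Wasm's nested `block`/`loop`/`br_if`, and as explicit basic blocks joined by `goto`, closer to the control-flow graph a compiler's IR works with. The function names are hypothetical and this is an illustration only, not how any particular compiler lowers code.

```c
/* Structured form: nested loop syntax, closer in spirit to Wasm's
   block/loop/br_if nesting. */
int sum_structured(const int *xs, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += xs[i];
    }
    return total;
}

/* Basic-block form: each label starts a block that is entered only at
   the top and ends in a jump, the shape a compiler's CFG works with. */
int sum_basic_blocks(const int *xs, int n) {
    int total = 0;
    int i = 0;
    goto header;            /* end of the entry block */
header:
    if (i < n) goto body;   /* conditional branch to one of two blocks */
    goto done;
body:
    total += xs[i];
    i++;
    goto header;            /* back edge: this is the loop */
done:
    return total;
}
```

A compiler targeting Wasm has to get from the second shape back to something like the first for arbitrary control-flow graphs; Emscripten's Relooper is one well-known algorithm for that step.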

~~~
MaxBarraclough
I get the impression WASM is really a very clunky representation, far from
what any greenfield project would ever have arrived at. That is to say, its
decisions aren't just tradeoffs that people disagree about, they're simply
inappropriate for what it's trying to do. More than once I've encountered
comments and blog-posts lamenting its fundamentals, e.g. [0].

[0] [http://troubles.md/wasm-is-not-a-stack-machine/](http://troubles.md/wasm-is-not-a-stack-machine/)

~~~
klodolph
The criticisms in the linked article are simply not grounded in fact, and the
article was obviously written by someone without any expertise in how modern
compilers work. There are also a number of basic errors in the article.

Liveness information simply doesn’t belong in the bytecode. SSA is trivial to
recreate from local mutable variables (it would make a good homework
assignment for someone in an undergrad “intro to compilers” class).
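As a rough illustration of the SSA point, here is a minimal C sketch that renames straight-line code over mutable locals into SSA form. It deliberately covers only code without branches; once control flow merges, the standard construction also has to place phi nodes (for example, at dominance frontiers), which is where most of the real work lies. The `Instr`/`SsaInstr` types and the add-only instruction set are hypothetical simplifications.

```c
#define NUM_LOCALS 8

/* Hypothetical three-address instruction over mutable locals:
   local[dst] = local[lhs] + local[rhs]. */
typedef struct {
    int dst, lhs, rhs;
} Instr;

/* SSA version: every operand refers to one specific version of a local,
   and every assignment creates a new version, so nothing is overwritten. */
typedef struct {
    int dst, dst_ver;
    int lhs, lhs_ver;
    int rhs, rhs_ver;
} SsaInstr;

/* Renames straight-line code into SSA. With no branches there are no
   merge points, so a use simply refers to the latest version of its
   local; version 0 means "the value the local had on entry". */
void to_ssa(const Instr *in, SsaInstr *out, int n) {
    int latest[NUM_LOCALS] = {0};
    for (int k = 0; k < n; k++) {
        /* Read operand versions before bumping the destination, so
           that x = x + y reads the old x. */
        out[k].lhs = in[k].lhs;
        out[k].lhs_ver = latest[in[k].lhs];
        out[k].rhs = in[k].rhs;
        out[k].rhs_ver = latest[in[k].rhs];
        out[k].dst = in[k].dst;
        out[k].dst_ver = ++latest[in[k].dst];
    }
}
```

For `l0 = l0 + l1; l0 = l0 + l1; l2 = l0 + l0`, this yields `l0_1 = l0_0 + l1_0; l0_2 = l0_1 + l1_0; l2_1 = l0_2 + l0_2`.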
WebAssembly is obviously a register machine.

> With this, it becomes possible to get rid of locals entirely.

Both factually incorrect and pointless. There is no tangible benefit in
getting rid of locals entirely. You are merely changing out one representation
(register machine) for a different one (stack machine).

There are valid criticisms of WASM, but the linked article doesn’t have any.

The weird part of WASM is the control structures. The rest of it is a fairly
sensible, actually rather nice register machine. You can see that older
bytecode systems like the JVM are stack machines, but newer ones tend to be
register machines. This isn’t because people are getting stupid, it’s because
there are legitimate reasons to prefer register machines, and on the balance
of things, my observations are that people with experience in the field tend
to prefer register machines.

~~~
readittwice
I would still consider WASM a stack machine and not a register machine. Yes,
there are mutable local variables in WASM but Java bytecode has them as well -
which you consider a stack machine. BTW the designers of WASM explicitly call
WASM a stack machine here:
[https://github.com/WebAssembly/design/blob/master/Rationale....](https://github.com/WebAssembly/design/blob/master/Rationale.md).
With WASM's MVP it was necessary to store e.g. loop state in local variables;
thanks to recent changes this doesn't seem to be necessary anymore. I think
this was the main reason that blog post considered WASM to be a register
machine. javac also makes heavy use of local variables in bytecode, but
somehow no one considers the JVM a register machine.

> my observations are that people with experience in the field tend to prefer
> register machines

That's actually the opposite of my observation, they seem to prefer stack
machines.

~~~
kazinator
The clang disassembly given in the article sure makes it look like WASM is a
nested expression tree, which leaves the choice of stack versus register to
the implementation.

    
    
          (f32.add
            (f32.add
              (f32.mul
                (f32.load
                  (local.get 0))
                (f32.load
                  (local.get 1)))
              (f32.mul
                (f32.load offset=4
                  (local.get 0))
                (f32.load offset=4
                  (local.get 1))))
            (f32.mul
              (f32.load offset=8
                (local.get 0))
              (f32.load offset=8
                (local.get 1))))
    

The outer f32.add could translate into a byte code instruction that finds its
two operands on a stack, or to one which gets them from registers.

The code only says that there is an f32.add whose two operands are the
results of an f32.add and an f32.mul, and so on.

The implementations will agree in their treatment of locals: there are two
locals, 0 and 1, holding the base addresses that the loads read from at
various offsets.

Both stack and register machines can support locals.

~~~
klodolph
Yes, that’s exactly true. The “stack machine” here can be seen as nothing more
than a way of encoding the expression tree.
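A minimal C sketch of that point: the same expression tree (the dot product's shape, with constant leaves standing in for the loads) lowered once to stack code, where operands are implicit, and once to register-style three-address code, where every operand is named. All names are hypothetical and the register lowering is deliberately naive (one fresh virtual register per node, no register allocation).

```c
/* Hypothetical expression tree: constant leaves stand in for the loads. */
typedef enum { K_CONST, K_ADD, K_MUL } Kind;

typedef struct Node {
    Kind kind;
    float value;                   /* used by K_CONST only */
    const struct Node *lhs, *rhs;  /* used by K_ADD / K_MUL only */
} Node;

/* One instruction record for both encodings: stack code leaves dst/src
   unused (operands are implicit), register code names every operand. */
typedef struct {
    Kind kind;
    float value;
    int dst, src1, src2;
} Instr;

/* Stack lowering: a post-order walk is all it takes. */
int lower_stack(const Node *n, Instr *out, int len) {
    if (n->kind != K_CONST) {
        len = lower_stack(n->lhs, out, len);
        len = lower_stack(n->rhs, out, len);
    }
    out[len] = (Instr){n->kind, n->value, 0, 0, 0};
    return len + 1;
}

float run_stack(const Instr *code, int len) {
    float stack[32];
    int sp = 0;
    for (int i = 0; i < len; i++) {
        if (code[i].kind == K_CONST) {
            stack[sp++] = code[i].value;
            continue;
        }
        float b = stack[--sp];  /* right operand is on top */
        float a = stack[--sp];
        stack[sp++] = code[i].kind == K_ADD ? a + b : a * b;
    }
    return stack[0];
}

/* Register lowering: same walk, but each node gets a fresh virtual
   register and its operands are named explicitly. */
int lower_regs(const Node *n, Instr *out, int *len, int *next_reg) {
    int src1 = 0, src2 = 0;
    if (n->kind != K_CONST) {
        src1 = lower_regs(n->lhs, out, len, next_reg);
        src2 = lower_regs(n->rhs, out, len, next_reg);
    }
    int dst = (*next_reg)++;
    out[(*len)++] = (Instr){n->kind, n->value, dst, src1, src2};
    return dst;
}

float run_regs(const Instr *code, int len, int result_reg) {
    float reg[32];
    for (int i = 0; i < len; i++) {
        const Instr *in = &code[i];
        if (in->kind == K_CONST)
            reg[in->dst] = in->value;
        else if (in->kind == K_ADD)
            reg[in->dst] = reg[in->src1] + reg[in->src2];
        else
            reg[in->dst] = reg[in->src1] * reg[in->src2];
    }
    return reg[result_reg];
}
```

Both encodings come from the same post-order walk; the stack version just leaves operand names implicit, which is why an engine is free to turn the stack form back into registers when it compiles.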

~~~
afiori
The stack machine is a model for the semantics of wasm, in the sense that the
safety properties of wasm are defined in terms of stack types rather than SSA
or register properties (for those unfamiliar with stack machines, this stack
is a different concept from the call stack, which in wasm is used to store the
local variables). Whether during execution it is better implemented as a
register machine or a stack machine is an implementation detail.
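To make the "stack types" idea concrete, here is a minimal C sketch of validation by abstract interpretation: the checker simulates the operand stack holding types rather than values and rejects any instruction whose operand types don't match. The opcode subset (`LOCAL_GET_I32` in particular, which bakes the local's type into the opcode) is hypothetical, and real Wasm validation also has to handle control-flow labels, unreachable code, and the full type system.

```c
#include <stdbool.h>

/* Hypothetical subset of Wasm value types and opcodes. */
typedef enum { T_I32, T_F32 } ValType;
typedef enum { LOCAL_GET_I32, F32_LOAD, F32_MUL, F32_ADD } Opcode;

#define MAX_STACK 64

/* Validates straight-line code by tracking the type of each stack slot.
   `result` is the function's declared result type. */
bool validate(const Opcode *code, int n, ValType result) {
    ValType stack[MAX_STACK];
    int sp = 0;
    for (int i = 0; i < n; i++) {
        switch (code[i]) {
        case LOCAL_GET_I32:             /* [] -> [i32] */
            if (sp == MAX_STACK) return false;
            stack[sp++] = T_I32;
            break;
        case F32_LOAD:                  /* [i32 address] -> [f32] */
            if (sp < 1 || stack[sp - 1] != T_I32) return false;
            stack[sp - 1] = T_F32;
            break;
        case F32_MUL:
        case F32_ADD:                   /* [f32 f32] -> [f32] */
            if (sp < 2 || stack[sp - 1] != T_F32 || stack[sp - 2] != T_F32)
                return false;
            sp--;
            stack[sp - 1] = T_F32;
            break;
        default:
            return false;               /* unknown opcode */
        }
    }
    /* Exactly the declared result must remain on the stack. */
    return sp == 1 && stack[0] == result;
}
```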

------
mmastrac
This is super handy. Pseudocode is very useful for understanding flow - so
much more than actual assembly. I've always found bad asm-to-C decompilation
from IDA or Ghidra an order of magnitude easier to understand than perfect
disassembly.

------
dlojudice
> Decompile to what?

> `wasm-decompile` produces output that tries to look like a "very average
> programming language" while still staying close to the Wasm it represents.

> #1 goal is readability

> #2 goal is to still represent Wasm as 1:1 as possible

It seems AssemblyScript [1] would do the job.

[1] [https://assemblyscript.org/](https://assemblyscript.org/)

~~~
Aardappel
AssemblyScript would certainly do worse at #2, and possibly also at #1.
Translating to Wasm and translating from Wasm lead to different optimal
designs; see for example how these two systems deal with loads and stores.

------
3pt14159
It would be nice if the decompiled output were runnable through an interpreter
so you could step through it with a debugger of some kind and rename or
annotate the variables and functions as you reverse engineer what is going on.

------
Aardappel
I'm the author, if anyone has specific questions :)

~~~
6nf
I notice that your code supports the 'name' custom section as expected, and
furthermore you support a few other custom sections too - 'dylink' for
example. Where did you find the documentation for these sections? The reason I
ask is that I don't believe the official WebAssembly specs talk about those
sections, so I guess they are somewhat compiler-specific?

~~~
Aardappel
They are indeed not part of the spec since they are somewhat tool specific,
for example the linking symbol names so far are only consumed by LLD. Docs
here: [https://github.com/WebAssembly/tool-conventions/blob/master/...](https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md)

~~~
6nf
Thank you!

One more question please - does this tool support naming of global variables?
The officially documented wasm 'name' section only supports function and
local variable names, I think?

~~~
Aardappel
It can pick names from the name section, linking section, or import/export
name, in that order of preference (see
[https://github.com/WebAssembly/wabt/blob/master/docs/decompi...](https://github.com/WebAssembly/wabt/blob/master/docs/decompiler.md)).
In the case of globals, the only way to name a global is thus if it's
imported or exported.
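The preference order described here amounts to a simple fallback chain. A minimal C sketch, assuming a hypothetical `NameSources` record that has already been filled in by parsing the relevant sections (the parsing itself, and the format of the names wasm-decompile synthesizes when nothing is found, are not shown):

```c
#include <stddef.h>

/* Hypothetical view of where a name for one global might come from;
   NULL means that source has nothing for it. */
typedef struct {
    const char *name_section;   /* the "name" custom section */
    const char *linking;        /* the "linking" custom section */
    const char *import_export;  /* the field name of an import or export */
} NameSources;

/* Returns the first name found in order of preference, or `fallback`
   (a synthesized placeholder whose format is up to the caller). */
const char *pick_name(const NameSources *src, const char *fallback) {
    if (src->name_section != NULL) return src->name_section;
    if (src->linking != NULL) return src->linking;
    if (src->import_export != NULL) return src->import_export;
    return fallback;
}
```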

------
_hardwaregeek
Loving the tooling around wasm getting better. I've been debugging my compiler
output with hexl-mode and reading the binary format and while it's not _that_
bad, it'd be nice to do more advanced debugging with a text format.
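When reading the binary by hand, much of the friction is that sizes, counts, and indices are stored as LEB128 variable-length integers rather than fixed-width fields. A minimal C sketch of an unsigned 32-bit LEB128 decoder; `read_uleb128_u32` is a hypothetical name, and real tools also need the signed variant, which Wasm uses for integer constants.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes an unsigned LEB128 value that must fit in 32 bits.
   Each byte carries 7 payload bits, low bits first; a set high bit
   means another byte follows. Returns the number of bytes consumed,
   or 0 if the input is truncated, too long, or overflows 32 bits. */
size_t read_uleb128_u32(const uint8_t *buf, size_t len, uint32_t *out) {
    uint32_t result = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < len && i < 5; i++) {
        uint8_t byte = buf[i];
        /* The 5th byte may only supply bits 28..31. */
        if (shift == 28 && (byte & 0x70) != 0) return 0;
        result |= (uint32_t)(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {   /* high bit clear: last byte */
            *out = result;
            return i + 1;
        }
        shift += 7;
    }
    return 0;
}
```

In a section header, for example, the first byte is the section id and the LEB128 that follows is the section's size in bytes.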

There was also a project I saw that intended to visualize WebAssembly's
execution. That'd be extremely helpful too.

~~~
cfallin
> reading the binary format ... it'd be nice to do more advanced debugging
> with a text format.

Do you know about `wasm2wat` (from the WebAssembly binary toolkit, "WABT")? It
produces a 1-to-1 text representation of the bytecode and is meant to always
roundtrip via `wat2wasm` back to the same bytecode.

~~~
_hardwaregeek
Yeah... I should probably use that. But does it work on mangled WASM? Part of
the issue was that my compiler wasn't producing valid WASM.

~~~
cfallin
Ah, no, probably doesn't do parsing recovery. But `wasm-validate` from the
same toolkit will at least tell you the offset at which your wasm file has an
error (I just flipped some bits in a wasm file to test this), which may be
helpful!
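For a flavor of what reporting an error offset means, here is a minimal C sketch that checks only the 8-byte preamble every .wasm file starts with (the magic bytes `\0asm` followed by version 1 as a little-endian u32) and returns the offset of the first bad byte. `check_wasm_header` is a hypothetical name; `wasm-validate` itself checks the whole module, not just this header.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns -1 if the preamble is valid, otherwise the byte offset of
   the first mismatching (or missing) byte. */
long check_wasm_header(const uint8_t *buf, size_t len) {
    static const uint8_t expected[8] = {
        0x00, 0x61, 0x73, 0x6D,  /* magic: "\0asm" */
        0x01, 0x00, 0x00, 0x00   /* version 1, little-endian */
    };
    for (size_t i = 0; i < sizeof expected; i++) {
        if (i >= len || buf[i] != expected[i]) return (long)i;
    }
    return -1;
}
```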

------
irrational
When I first started learning JavaScript in the late 90s, the primary way I
learned new things was from reading other people's code in my browser. Nowadays
this isn't as easy since you often have to run obfuscated code through a
prettifier to get it back into a human-readable format, but it is still
possible with some effort. I was concerned that WASM would make this
impossible (despite the stated goal of "Be readable and debuggable —
WebAssembly is a low-level assembly language, but it does have a
human-readable text format (the specification for which is still being
finalized) that allows code to be written, viewed, and debugged by hand." [1]),
but wasm-decompile gives me hope.

[1] [https://developer.mozilla.org/en-US/docs/WebAssembly/Concept...](https://developer.mozilla.org/en-US/docs/WebAssembly/Concepts)

------
fowl2
Can we compile it back to wasm again? ;P

~~~
frosted-flakes
No.

> Its #1 goal is readability: help guide readers understand what is in a .wasm
> with as easy to follow code as possible. Its #2 goal is to still represent
> Wasm as 1:1 as possible, to not lose its utility as a disassembler.
> Obviously these two goals are not always unifiable.

> This output is not meant to be an actual programming language and there is
> currently no way to compile it back into Wasm.

~~~
vbezhenar
Actually I thought about implementing a programming language which is a bit
more pleasant to work with than the raw wat format, but which still translates
roughly 1:1 to wasm. Something like what's in this link, actually. But that
seems outside of my capabilities and I'm not sure if it's really useful to anyone.

~~~
rhencke
You might enjoy AssemblyScript.

[http://assemblyscript.org](http://assemblyscript.org)

~~~
DonHopkins
The great thing about AssemblyScript is it makes it possible to share some of
the same code and interfaces and data and tools between JavaScript and
WebAssembly.

If you're already developing in TypeScript, AssemblyScript is a good way to
generate WASM code that interoperates nicely with it, which you can't do with
plain old JavaScript.

------
saagarjha
Ooh, this is nice! No more having to read wasm2wat’s mildly annoying format.

~~~
cjbprime
FWIW there's also wasm2c and wasm2js out there :)

~~~
DonHopkins
I'd love to have wasm2assemblyscript!

AssemblyScript: A Subset of TypeScript That Compiles to WebAssembly

[https://news.ycombinator.com/item?id=15187961](https://news.ycombinator.com/item?id=15187961)

[https://github.com/AssemblyScript](https://github.com/AssemblyScript)

[https://github.com/AssemblyScript/assemblyscript](https://github.com/AssemblyScript/assemblyscript)

[https://docs.assemblyscript.org/](https://docs.assemblyscript.org/)

