
Reverse-Engineering WebAssembly [pdf] - ingve
https://www.pnfsoftware.com/reversing-wasm.pdf
======
KenanSulayman
WebAssembly is actually simple to work with.

If you want C-like pseudocode, you can feed a wasm file to wasm2c [1].

You can recover the WebAssembly text format using wasm2wat [1]; its
--fold-exprs option emits the folded-expression form.

You can obtain a call-graph from a WebAssembly module by generating the wat
representation using wasm2wat and pasting it into main.wat on
[https://webassembly.studio/](https://webassembly.studio/) (-> Empty Wat
Project). Then save and build; right click the new main.wasm and select
"Generate Call Graph."

That said, check out this encrypted and anonymous "pastebin" I built [2], with
the crypto written in Rust and bindings generated using wasm-bindgen [3]. It
is surprisingly hard to debug when optimized using wasm-opt [4].
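
On the call-graph step above: if you'd rather not paste into a web IDE, a rough approximation can be scripted. This is only a sketch, not how webassembly.studio does it. It assumes the wat uses symbolic `$names` (for example, output of wasm2wat with `--generate-names`) and that each `(func $name` definition starts on its own line. `call_graph` and `sample_wat` are names made up for illustration.

```python
import re
from collections import defaultdict


def call_graph(wat_text):
    """Map each defined function name to the set of functions it calls directly."""
    graph = defaultdict(set)
    current = None
    for line in wat_text.splitlines():
        # Assumption: a definition begins a line as "(func $name ...".
        m = re.match(r"\s*\(func\s+(\$[^\s()]+)", line)
        if m:
            current = m.group(1)
            graph[current]  # register functions that make no calls
        if current is not None:
            # Direct calls only; call_indirect goes through a table
            # and is not resolved by this sketch.
            for callee in re.findall(r"\bcall\s+(\$[^\s()]+)", line):
                graph[current].add(callee)
    return dict(graph)


sample_wat = """(module
  (func $helper (param i32) (result i32)
    local.get 0)
  (func $main (result i32)
    i32.const 1
    call $helper
    call $helper)
)"""

graph = call_graph(sample_wat)
for caller, callees in graph.items():
    print(caller, "->", sorted(callees))
```

Indirect calls and imports are ignored here, so treat the result as a lower bound on the real graph.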

[1] Part of WebAssembly Binary Toolkit:
[https://github.com/WebAssembly/wabt](https://github.com/WebAssembly/wabt)

[2] Source code on Github:
[https://github.com/psychonautwiki/impis/blob/master/core/src...](https://github.com/psychonautwiki/impis/blob/master/core/src/lib.rs)
— Demo paste:
[https://imp.is/n/7NFsfEiCjkFBVgC6A4JS6GyqN7puN5Sg7ed11m8VrtT...](https://imp.is/n/7NFsfEiCjkFBVgC6A4JS6GyqN7puN5Sg7ed11m8VrtTv#DSP6yVL9Hg8QKLeeUTfJEv8SVRyx6uNFHzCNbEm5wF2f)

[3] [https://github.com/rustwasm/wasm-bindgen](https://github.com/rustwasm/wasm-bindgen)

[4] Part of Binaryen:
[https://github.com/WebAssembly/binaryen](https://github.com/WebAssembly/binaryen)

~~~
piphf
WebAssembly is not "simple to work with", especially when it comes to
analyzing non-trivial, large, optimized programs. wasm2c [1] generates a
one-to-one translation of wasm instructions into C code. I guess you could
qualify that as a "decompiler", but real decompilers - the ones used for
malware analysis such as JEB or IDA - are optimizing decompilers that provide
an output of higher level (i.e. more legible) than the input disassembly/binary.

------
ttoinou
The future looks like everyone's going to use their favorite language to
compile to WebAssembly.

Am I the only one who feels like it's the end of the web as we knew it in the
90s and 00s, where you could open any web page, understand how it works and
learn from it?

~~~
trgv
I thought wasm was going to have a human-readable equivalent, called wast.
See: [https://webassembly.org/getting-started/advanced-tools/](https://webassembly.org/getting-started/advanced-tools/)

My understanding (maybe wrong) was that this was going to be available in the
browser.

~~~
err4nt
I'm not an expert, but my understanding is that WASM has two formats: a
text-based format called WAT, and a binary format called WASM.

In order to run the code in the browser, the code will have to be compiled to
the binary format.

So where WAT comes in: your options for producing WASM files now become one
of the following:

Source in <otherlang> -> WAT -> WASM

Source in <otherlang> -> WASM

WAT -> WASM

So the human-readable WAT can either be used as a compile target for another
language, which can easily be compiled into WASM, or you can write the WAT
manually and compile it. Alternatively other languages might be able to
compile directly to the binary format, skipping WAT representation entirely.

~~~
steveklabnik
Generally, WAT is produced from the binary format. Compilers don't go through
WAT; they produce the binary output directly.

The translation between WAT and the binary format is lossless, so there's no
advantage of producing WAT as an intermediate step.
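
To make the correspondence concrete, here is a sketch that hand-assembles the binary for a tiny module whose text form would be roughly `(module (func (export "add") (param i32 i32) (result i32) local.get 0 local.get 1 i32.add))`. The helper names (`uleb128`, `vec`, `section`) are made up for illustration; the byte values are the ones the binary format assigns to these sections and opcodes.

```python
def uleb128(n):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def vec(items):
    """A vector is an element count followed by the encoded elements."""
    return uleb128(len(items)) + b"".join(items)


def section(section_id, payload):
    """A section is an id byte, the payload size, then the payload."""
    return bytes([section_id]) + uleb128(len(payload)) + payload


I32 = b"\x7f"
func_type = b"\x60" + vec([I32, I32]) + vec([I32])  # (i32, i32) -> i32

body = (
    vec([])        # no locals beyond the parameters
    + b"\x20\x00"  # local.get 0 (spelled get_local in older tools)
    + b"\x20\x01"  # local.get 1
    + b"\x6a"      # i32.add
    + b"\x0b"      # end
)

module = (
    b"\x00asm" + b"\x01\x00\x00\x00"                                 # magic, version 1
    + section(1, vec([func_type]))                                   # type section
    + section(3, vec([uleb128(0)]))                                  # function 0 uses type 0
    + section(7, vec([uleb128(3) + b"add" + b"\x00" + uleb128(0)]))  # export "add" = func 0
    + section(10, vec([uleb128(len(body)) + body]))                  # code section
)

print(module.hex(" "))
```

Each text instruction maps to one opcode plus its immediates, which is why a compiler gains nothing by emitting text first.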

~~~
err4nt
Since WAT -> WASM is already easy to do, compiling <otherlanguage> to WAT
makes it really easy for people to create their own abstractions for writing
WebAssembly in nearly _any_ other programming language, not just those that
can compile directly to the binary format.

~~~
steveklabnik
I don't understand why that would be true.

It's also just as easy to get WASM from WAT as it is WAT from WASM. I don't
know of any languages that compile to WAT and then compile to WASM; as far as
I know 100% of languages compile directly to WASM.

------
kodablah
WASM instructions are fairly straightforward so an obfuscator can be written
quite easily. I could easily create a proxy tool that introduces
randomization/non-determinism on a per download basis if it were worth it.
There is no execution of arbitrary memory so there are limits. JS can create
new WASM mods and link them at runtime, but invocations across import/export
might have a performance hit. But moving around functions, subdividing
functions, etc is really easy.

Also, the paper has Emscripten-specific reverse engineering details (such as
locations in the mem for where stack starts vs where heap starts) that don't
apply to many other WASM compilers.

------
DonHopkins
My reading of the disassembled code on page 7 is that the "end" opcode
actually takes one byte, and isn't just a syntactic structure that the
assembler removes. As if a Lisp VM had a close-paren instruction.

    
    
        +003Eh:    i32.eqz
        +003Fh:    if $3
        +0041h:      br $2 (---> break out of $2 (BLOCK))
        +0043h:      end
        +0044h:    get_local $12
    

What's the purpose of having an "end" opcode? Is there no overhead at runtime
because it evaporates when the code is compiled? Is it to avoid having forward
referencing offsets in the code? Is it just in there for verification
purposes?

It's kind of like a "comefrom" opcode, a target that other opcodes jump to (or
after)!
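
For what it's worth, in the binary format `end` is the single byte 0x0B, and a branch names a relative nesting depth rather than a byte offset, so a decoder relies on `end` to know where each block closes. A minimal sketch of that bookkeeping follows. It only knows the handful of opcodes in the table and assumes every immediate fits in one byte (true only for values below 128); `OPCODES`, `disassemble` and `sample` are names made up for illustration.

```python
# opcode byte -> (mnemonic, number of immediate bytes, assumed single-byte)
OPCODES = {
    0x02: ("block", 1),      # immediate: block type (0x40 = no result)
    0x03: ("loop", 1),
    0x04: ("if", 1),
    0x0B: ("end", 0),
    0x0C: ("br", 1),         # immediate: relative label depth
    0x1A: ("drop", 0),
    0x20: ("local.get", 1),  # get_local in older disassemblers
    0x45: ("i32.eqz", 0),
}


def disassemble(code):
    """Return (offset, nesting depth, text) for each instruction."""
    rows = []
    depth = 0
    pc = 0
    while pc < len(code):
        name, n_imm = OPCODES[code[pc]]
        imms = code[pc + 1 : pc + 1 + n_imm]
        if name == "end":
            depth -= 1  # 'end' closes the innermost open block
        text = name
        if name in ("br", "local.get"):
            text += " " + str(imms[0])
        rows.append((pc, depth, text))
        if name in ("block", "loop", "if"):
            depth += 1  # everything up to the matching 'end' is nested
        pc += 1 + n_imm
    return rows


sample = bytes([
    0x02, 0x40,  # block
    0x20, 0x0C,  # local.get 12
    0x45,        # i32.eqz
    0x04, 0x40,  # if
    0x0C, 0x01,  # br 1 (leave the enclosing block)
    0x0B,        # end (closes the if)
    0x20, 0x0C,  # local.get 12
    0x1A,        # drop
    0x0B,        # end (closes the block)
])

for offset, depth, text in disassemble(sample):
    print(f"+{offset:04X}h: {'  ' * depth}{text}")
```

The `if` and `br` each take two bytes and `end` takes one, which matches the offset gaps in the listing above.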

~~~
kazinator
I have an _end_ opcode in the TXR Lisp virtual machine. It delimits blocks of
code that have some sort of context attached.

The opcode allows the virtual machine interpreter to recurse on itself; when
it hits the _end_ , the dispatch loop executes return to bail out to the
higher level of recursion. Thus _end_ is also useful for exiting the top-level
invocation of the VM. It is required, in fact; if the end instruction is not
present, the interpreter will keep marching through memory past the end of the
routine. No wasteful check is needed whether the instruction pointer is past
the code block.

My _end_ instruction also specifies a result value (because the machine is
register based; there is no top-of-stack implicit value). This becomes the
return value of a procedure when the final _end_ is executed. The _block_
instruction also uses it. When a (block ...) is compiled, the return value of
the ordinary block termination is specified in the _end_ instruction at the
end of the block. Control returns to the _block_ instruction which receives
that value.

 _end_ has something in common with the x86 _ret_ instruction and its ilk.
It's not so much an exotic "come from" as an ordinary "return".
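
A toy sketch of that scheme, to show the shape of it. This is not TXR's actual VM: the instruction set (`mov`, `add`, `block`, `end`) and the tuple encoding are invented for illustration. The dispatch loop recurses on `block` and returns on `end`, and `end` names the register holding the result.

```python
def run(code, pc, regs):
    """Execute from pc until an 'end'; return (result, pc just past that end)."""
    while True:  # no bounds check: the final 'end' is what stops the loop
        op, *args = code[pc]
        pc += 1
        if op == "mov":      # mov dst, literal
            regs[args[0]] = args[1]
        elif op == "add":    # add dst, src1, src2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "block":  # block dst: recurse, then store the block's result
            result, pc = run(code, pc, regs)
            regs[args[0]] = result
        elif op == "end":    # end src: hand regs[src] back to the caller
            return regs[args[0]], pc
        else:
            raise ValueError(f"unknown opcode {op!r}")


program = [
    ("mov", 0, 2),
    ("block", 1),      # result of the nested code lands in r1
    ("mov", 2, 40),
    ("add", 3, 0, 2),
    ("end", 3),        # block yields r3
    ("end", 1),        # top level yields r1
]

result, _ = run(program, 0, [0] * 4)
print(result)
```

In Python a missing final `end` raises an IndexError instead of marching through memory, but the control structure is the same: `end` behaves like an ordinary return.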

------
Maijin212
Radare2 can also disassemble, analyze, assemble and even decompile wasm via
r2dec. Support for wasm was added in March 2017.

