
Wasabi: A Framework for Dynamically Analyzing WebAssembly - lainon
http://wasabi.software-lab.org/
======
daniellehmann
Daniel here, PhD student working on this. Ask away if you have any questions
:)

For now, this is at an early stage, so it's probably mostly interesting to
researchers or as inspiration. But I am working on making it more usable
(adding documentation and examples).

~~~
kannanvijayan
Hey there! I work on Spidermonkey (not specifically the wasm part, but I've
touched it a few times due to other work).

First off, nice work - this is really cool. One of my big hopes for WASM is
that it provides a real low-level semantic model to start talking about
program analysis. It takes away a whole lot of the platform-specific
skullduggery you need to do to perform these sorts of analyses on machine
code, and the whole model is a lot simpler than something like CLR or JVM
since it's so low-level.

I've been keeping my eye out for program analysis tools on top of wasm for a
while now. Very happy to see your work.

Have you gotten in touch with any of the wasm folks at Moz? I'm sure they'd be
happy to talk with you about it (I know I would definitely not mind picking
your brain a bit on some high-bandwidth comms chan - learn a bit about the
implementation challenges and issues you ran into).

Cheers :)

~~~
daniellehmann
Thanks for the warm words, happy to hear that others are interested in this
project and WebAssembly in general!

Regarding your question: No, I have not yet contacted the WebAssembly people
at Mozilla. But it's definitely a good idea to talk to the implementation
experts. Before I do that, I just wanted to collect more "concrete"
questions/problems to ask about.

One of those questions is about the performance overhead of the WebAssembly
<-> JavaScript interop. In Wasabi, we have a lot of this, because the
"analysis hooks" are written in JavaScript and we insert roughly one hook call
per original instruction into the wasm binary. Even without any analysis code,
just adding these calls can have a runtime overhead >30x. I would like to
optimize this, but before that I need to find out where the overhead is coming
from. Possible reasons are (just guesswork, input from people working on this
is greatly appreciated):

\- that many calls are just inherently expensive, be it cross-language or not
(possible solution: be more selective about when to insert calls to analysis
hooks)

\- Wasm <-> JavaScript calls are more expensive than Wasm <-> Wasm ones
(possible solution: compile analyses to Wasm, or: hope that this gets
optimized better by engines in the future)

\- the added instructions inhibit some wasm compiler optimization(s) (e.g.,
inlining is no longer performed because the function bodies are larger than
some threshold)

\- ...
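To make the cost concrete, here is a sketch of what a per-instruction JS analysis hook looks like (hook name and signature are illustrative, not necessarily Wasabi's exact API): every single wasm instruction becomes one cross-language call into a function like this, even when the hook body does almost nothing.

```javascript
// Illustrative per-instruction analysis hook (names are hypothetical).
const counts = {};

const analysis = {
  // Called once per executed binary operation, e.g. i32.add:
  // location identifies the instruction, op names it, and the
  // operands/result are passed as JS numbers.
  binary(location, op, lhs, rhs, result) {
    counts[op] = (counts[op] || 0) + 1;
  },
};

// Instrumented wasm would call out once per original instruction;
// here we simulate three executions of i32.add.
for (const [a, b] of [[1, 2], [3, 4], [5, 6]]) {
  analysis.binary({ func: 0, instr: 0 }, "i32.add", a, b, a + b);
}
```

Even this counting hook pays the full call and number-marshalling cost on every instruction, which is why the overhead shows up before any real analysis logic is added.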

So far, I have found working with WebAssembly very pleasant. The spec is
compact but still easy to follow. I wrote my own de-/encoder and "high-level
representation" of the binary format, which was straightforward and is able
to round-trip all test files from the spec repo. The most surprising bit was
about validation of dead code (i.e., code after an unconditional br is type
checked, but the br is assumed to "produce any possible value").

As for personal communication: I am happy to answer any in-depth question via
email or so (see [http://software-
lab.org/people/Daniel_Lehmann.html](http://software-
lab.org/people/Daniel_Lehmann.html)).

~~~
kannanvijayan
> Even without any analysis code, just adding these calls can have a runtime
> overhead >30x. I would like to optimize this, but before that I need to find
> out where the overhead is coming from.

That thought entered my head immediately as soon as I noticed you were
instrumenting every instruction.

> \- that many calls are just inherently expensive, be it cross-language or
> not (possible solution: be more selective about when to insert calls to
> analysis hooks)

This is definitely true, and hard to get around. The wasm instructions will be
compiled to machine instructions, and the calls will still be calls, and calls
are expensive.

One possible approach to mitigate this cost might be to collect and batch
calls into the hook functions. Basically your instrumentation would be a
trace-dump of execution and data to some in-wasm memory, and periodically you
call out to JS for analysis once the buffer fills up.

This should reduce the call overhead and replace it with a single write to a
well known location.

Now, if your analysis functions expect to be able to peek at memory and get a
consistent view of memory at the time of the instruction being analyzed,
you'll need to do some special magic to re-compute the memory state at that
time from the recorded trace, but that can be done on-demand when analysis
requires, so 0 cost if the hooks are not present.

Please note that I'm not sure how well this would work exactly, but it seems
promising.
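The batching idea above can be sketched in a few lines. This is my own illustration of the scheme, not Wasabi code: the instrumentation appends fixed-size trace records into a region of linear memory (a cheap write), and only calls out to JS once per full buffer.

```javascript
// Sketch of batched trace collection (illustrative, not Wasabi code).
const RECORD_WORDS = 2;   // e.g. [opcode, operand] per trace record
const CAPACITY = 4;       // records per batch (tiny, for the demo)

const memory = new WebAssembly.Memory({ initial: 1 });
const buf = new Int32Array(memory.buffer, 0, CAPACITY * RECORD_WORDS);

let nRecords = 0;
const flushed = [];

// JS import that the wasm module would call once per full buffer:
// this is where the (amortized) analysis work happens.
function flush() {
  for (let i = 0; i < nRecords; i++) {
    flushed.push([buf[i * RECORD_WORDS], buf[i * RECORD_WORDS + 1]]);
  }
  nRecords = 0;
}

// Stands in for the inlined wasm instrumentation: a memory write per
// instruction, plus a call-out only when the batch is full.
function trace(opcode, operand) {
  buf[nRecords * RECORD_WORDS] = opcode;
  buf[nRecords * RECORD_WORDS + 1] = operand;
  if (++nRecords === CAPACITY) flush();
}

for (let i = 0; i < 10; i++) trace(0x6a /* i32.add */, i);
flush(); // drain the partial final batch
```

With a realistic buffer size, ten instrumented instructions would cost ten writes and a fraction of one JS call instead of ten calls; the trade-off is that the hooks see the trace after the fact rather than live.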

> \- Wasm <-> JavaScript calls are more expensive than Wasm <-> Wasm ones
> (possible solution: compile analyses to Wasm, or: hope that this gets
> optimized better by engines in the future)

It's getting optimized now. My impression is that the big cost here is
marshalling wasm numbers into JS values. I don't know of a good way to avoid
this aside from not calling into JS when you can avoid it (i.e. you know there
are no analysis hooks attached to something).

I wonder if a simple runtime flag check within wasm, guarding the call-out,
would significantly reduce the overhead cost.
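That flag-guard idea can be sketched as follows (again an illustration under assumed names, not real Wasabi or SpiderMonkey code): the instrumentation checks a cheap in-wasm flag and only pays for the cross-language call when a hook is actually attached.

```javascript
// Sketch of guarding the JS call-out behind a runtime flag (illustrative).
let hookEnabled = 0;  // would live at a well-known address in linear memory
let hookCalls = 0;

// The expensive cross-language call we want to avoid when possible.
function jsHook(op) {
  hookCalls++;
}

// Stands in for the instrumentation emitted around each instruction:
// a branch on a flag instead of an unconditional call.
function maybeHook(op) {
  if (hookEnabled) jsHook(op);
}

// With the hook disabled, 1000 instructions cost 1000 cheap branches.
for (let i = 0; i < 1000; i++) maybeHook("i32.add");

// Enable analysis: only these executions pay the call-out cost.
hookEnabled = 1;
for (let i = 0; i < 10; i++) maybeHook("i32.add");
```

Whether the branch is actually cheap enough in practice depends on how well the engine's compiler handles the extra control flow, which is exactly the kind of question the implementation folks could answer.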

> \- the added instructions inhibit some wasm compiler optimization(s) (e.g.,
> inlining is no longer performed because the function bodies are larger than
> some threshold)

This shouldn't be the case too much. Most of the heavyweight compiler opts
happen before emission to wasm, including a good chunk of inlining. I'm not
even sure if Odinmonkey (our Wasm impl) does any extra inlining on top of that
- it might just expect the compiler to take care of that.

I'll get in touch. I think you'd get more confident answers on these from the
direct WASM crowd. My answers are a bit speculative, and lack concrete details
about the latest implementation status.

------
stefs
not to be confused with Fog Creek's legacy compiler, which is also called
Wasabi:
[https://www.joelonsoftware.com/2006/09/01/wasabi/](https://www.joelonsoftware.com/2006/09/01/wasabi/)

------
trollied
Just an aside... One thing I've been wondering for ages - How is WebAssembly
any different to/better than the clusterfuck that Flash became? The security
model/ sandboxed isolation?

~~~
jedisct1
It's an open standard, designed with security in mind. It can reuse a lot of
what was already built for JavaScript, is intentionally small and simple, and
doesn't try to reinvent things such as UI (web browsers already have
everything we need).

------
bryanrasmussen
for some reason I was reading it as a framework for dynamically analyzing
webpack.

