
How do they prevent people from embedding malicious payloads in the WebAssembly?




By sandboxing:

> We first discuss the implementation considerations of the input to the Wasm-side Init() API call. The isolated linear memory space of Wasm instance is referred to as guest, while the program’s address space running the Wasm instance is referred to as host. The input to a Wasm instance consists of the contiguous bytes of an EncUnit copied from the host’s memory into the guest’s memory, plus any additional runtime options.

> Although research has shown the importance of minimizing the number of memory copies in analytical workloads, we consider the memory copy while passing input to Wasm decoders hard to avoid for several reasons. First, the sandboxed linear memory restricts the guest to accessing only its own memory. Prior work has modified Wasm runtimes to allow access to host memory for reduced copying, but such changes compromise Wasm’s security guarantees
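For the curious, here's roughly what that host-to-guest copy looks like with the wasmtime embedding API. This is only a sketch, not the paper's code: the `alloc`/`init` export names and the (ptr, len) calling convention are assumptions for illustration.

    // Sketch (not the paper's code) of the host -> guest copy described above,
    // using the wasmtime crate. The `alloc`/`init` exports and the i32 (ptr, len)
    // convention are assumptions for illustration.
    use wasmtime::{Engine, Instance, Linker, Module, Store};

    fn run_decoder(wasm_bytes: &[u8], enc_unit: &[u8]) -> anyhow::Result<()> {
        let engine = Engine::default();
        let module = Module::new(&engine, wasm_bytes)?;
        let mut store = Store::new(&engine, ());
        let linker = Linker::new(&engine);
        let instance: Instance = linker.instantiate(&mut store, &module)?;

        // Ask the guest to reserve space inside its own linear memory.
        let alloc = instance.get_typed_func::<i32, i32>(&mut store, "alloc")?;
        let guest_ptr = alloc.call(&mut store, enc_unit.len() as i32)?;

        // The copy the paper calls "hard to avoid": host bytes -> guest linear memory.
        let memory = instance
            .get_memory(&mut store, "memory")
            .ok_or_else(|| anyhow::anyhow!("guest exports no memory"))?;
        memory.write(&mut store, guest_ptr as usize, enc_unit)?;

        // Hand the guest a (ptr, len) pair pointing into its own sandboxed memory.
        let init = instance.get_typed_func::<(i32, i32), ()>(&mut store, "init")?;
        init.call(&mut store, (guest_ptr, enc_unit.len() as i32))?;
        Ok(())
    }

The point is that the guest never sees a host pointer; it only ever gets offsets into its own linear memory, which is what makes the copy hard to avoid without weakening the sandbox.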


It's left as future work. But possibly, allowlists might help:

> Security concerns. Despite the sandbox design of Wasm, it still has vulnerabilities, especially with the discovery of new attack techniques. [...] We believe there are opportunities for future work to improve Wasm security. One approach is for creators of Wasm decoding kernels to register their Wasm modules in a central repository to get the Wasm modules verified and tamper-resistant.
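To illustrate the allowlist idea: before instantiating a decoder, the host could check the module's digest against a set of registered, verified modules. A toy sketch using the sha2 and hex crates (the registry itself is hypothetical):

    use sha2::{Digest, Sha256};
    use std::collections::HashSet;

    // Toy allowlist check: only accept Wasm decoders whose SHA-256 digest
    // appears in a set of registered/verified modules (registry is illustrative).
    fn is_registered(wasm_bytes: &[u8], registry: &HashSet<String>) -> bool {
        let digest = hex::encode(Sha256::digest(wasm_bytes));
        registry.contains(&digest)
    }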


The only issue the article seems to raise is that their solution isn't optimal because of the redundant copy of the input data into the sandbox, but that copy is what keeps the sandbox secure, since the Wasm code can't modify data outside of it. I'd assume the memory is protected at the CPU level, something akin to virtualisation. But then maybe there's some side-channel attack it could use to extract outside data somehow?

Because sandboxing has worked so well in preventing security issues?

For one, sandboxing can't solve the halting problem.

So you're suggesting a denial of service? I doubt the sandbox is running on cooperative scheduling. Ultimately the VM can be run for some number of cycles: enough to get work done, but few enough to let other programs run, and if the decoder doesn't seem to be making any progress, the user or an automated process can terminate it. Something like this already happens when JS code takes up too much time in a browser and a little pop-down appears to let you stop it.

You can if you bound the program in time and space. In particular, if you put a timeout on a computation, you know for a fact it will halt at some point before the timeout.
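Wasm runtimes expose exactly this kind of bound. A rough sketch using wasmtime's fuel metering (assuming a recent wasmtime, since the fuel API has been renamed across versions; `decode` is a hypothetical guest export and the fuel budget is arbitrary):

    // Rough sketch of bounding a Wasm computation with wasmtime's fuel metering.
    use wasmtime::{Config, Engine, Linker, Module, Store};

    fn run_bounded(wasm_bytes: &[u8]) -> anyhow::Result<()> {
        let mut config = Config::new();
        config.consume_fuel(true);              // executed instructions burn fuel
        let engine = Engine::new(&config)?;
        let module = Module::new(&engine, wasm_bytes)?;
        let mut store = Store::new(&engine, ());
        store.set_fuel(50_000_000)?;            // hard upper bound on work

        let instance = Linker::new(&engine).instantiate(&mut store, &module)?;
        let decode = instance.get_typed_func::<(), ()>(&mut store, "decode")?;

        // If the guest loops forever, it runs out of fuel and the call traps,
        // so the host gets control back instead of hanging.
        match decode.call(&mut store, ()) {
            Ok(()) => println!("decoded within budget"),
            Err(trap) => println!("terminated: {trap}"),
        }
        Ok(())
    }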

On top of security concerns, you're now shipping potentially buggy code with every dataset. And you're relying on adherence to semver to match local binary unpacker implementations to particular Wasm unpackers.

I see why you're doing it, but it also opens up a whole avenue of new types of bugs. Now the dataset itself could have consistency issues if, say, the unpacker produces different data based on the order it's called, and there will be a concept of bugfixes to the unpackers - how will we roll those out?

Fascinating.



