WasmBoxC: Simple, Fast, and VM-Less Sandboxing (kripken.github.io)
224 points by syrusakbary on July 28, 2020 | 106 comments

> Wasm sandboxing is even safe to run in the same process as other code (at least modulo Spectre-type vulnerabilities)...

If you want strong security with this big a modulus you may have been looking for RSA. (ugh, sorry)

Spectre V1 (speculative bounds check bypass / type confusion) is basically game over for intra-process memory isolation without introducing expensive or complicated mitigations: speculation barriers at every branch, or a more optimized pass like Speculative Load Hardening (https://llvm.org/docs/SpeculativeLoadHardening.html) that can bring the overhead _down_ to 20-50%.

From what I've seen, the best understanding right now is that the process is the smallest defensible unit of isolation.

That's not to say that using WASM as an intermediate compilation target won't eliminate some threats! The runtime can definitely help to improve memory safety, for example. But if you care about confidentiality of the data in your process, any code that is untrusted enough to require sandboxing should also be run out-of-process, e.g. across an IPC boundary.

Spectre makes me wish we'd give something like IBM's project DAISY / Transmeta's Crusoe another chance. If we have one mode with a simple and high-density instruction set similar to Hitachi's SH4 / ARM's Thumb2, along with a VLIW mode (similar to switching between Thumb2 and regular ARM), and some hardware support for tracing and dynamic recompilation (including reservoir sampling of instructions causing pipeline stalls), we might get decent performance and code density, while being able to move all of the speculation into the dynamic recompilation layer.

Having the processor natively support the dense non-VLIW instruction set means you don't pay much in the way of startup latency, and once you're warmed up, your hot spots have all of their cross-DLL calls inlined / no longer indirect, and your virtual function calls devirtualized / speculative inlined. Spots where the dynamic recompiler was wrong about pipeline stalls eventually use dynamic information to recompile and shuffle instructions around to avoid pipeline stalls. You don't pay the transistor and power budget for out-of-order execution or speculative execution hardware. Hardware speculative execution gets replaced with predicated instructions and shadow register save/restores.

I think IBM's DAISY was much closer to being on the right track vs. Intel's EPIC / Itanium. If your system is built for dynamic recompilation and re-optimization of native code on the fly, your compiler doesn't have to be as good at statically predicting execution paths and pipeline stalls.

Transmeta's main problem was that they were emulating all x86 instructions before they warmed up, and after warm-up they were still emulating all x86 instructions outside their hot code path. With a simpler instruction set like SH4 / Thumb2, they could hopefully have hardware support for the code outside of the hot spots / pre-warmup.

Given the amount of time spent running JavaScript, hopefully one would also look at the bytecodes for V8, JavaScriptCore, and SpiderMonkey for inspiration as far as making the non-VLIW instruction set an efficient and compact JavaScript JIT target.

> ...while being able to move all of the speculation into the dynamic recompilation layer.

Or move it all into static analysis. You might be interested in the Mill CPU architecture:


Very Very Very Long Instruction Word system, without a conventional register file. All of the speculation that is done in hardware on a conventional CPU is pushed back to the compiler.

More about Spectre, Meltdown and the Mill here:


Yea, that sort of dataflow analysis (at static compile or just-in-time compile time) is what I'm getting at for avoiding Spectre/Meltdown, but with the added ability of being able to use profiling information to re-generate the code paths where you're getting pipeline stalls due to flaws in the assumptions used in static instruction scheduling.

> If we have one mode with a simple and high-density instruction set similar to Hitachi's SH4 / ARM's Thumb2, along with a VLIW mode (similar to switching between Thumb2 and regular ARM), and some hardware support for tracing and dynamic recompilation (including reservoir sampling of instructions causing pipeline stalls), we might get decent performance and code density, while being able to move all of the speculation into the dynamic recompilation layer.

Trading hardware complexity for software complexity is not the right strategy to combat sidechannels. They just reappear up the stack.

Any speculative optimization that depends on program values can potentially leak information through timing.

With speculation moved to a (JIT) compiler, this becomes way harder to trigger.

You can no longer deterministically trigger a speculative fetch and detect whether it was slow or served from cache. Whether the speculative fetch ever occurs is determined by the JIT, and once the JIT realizes it should not occur, it will never occur again. Quite likely you won't be able to collect enough bits even if speculative execution is supported by the hardware. If it's not, you will just not have anything to time, AFAICT.

It's true that it is much, much harder to trigger, and less predictable, but the information leak is still there. Both the bitrate and the signal-to-noise ratio are far worse.

It's worth pointing out that JITs can and do get caught in deopt loops, and it doesn't even necessarily need to be a deopt loop in a JIT that is the information leak. Something as simple as interned strings can leak information about what strings a program is using. The same goes for hashtables: if you have control over some of the keys that go into a hashtable and some knowledge of its implementation, timing information can reveal information about hash collisions, and thus about the other keys in the table.

Fundamentally, side channels are unwanted information flows in a system, and the more complicated the system, the more potential there is for side channels. Moving complexity around might make a difference on the reliability and bandwidth, but it doesn't eliminate them. If side channels are a serious concern, the best defense is simplicity, not massive rearchitecting to a new and different kind of complexity.

(even speaking as a person who spent decades working on JITs--wrong hammer here)

My understanding is that Spectre/Meltdown attacks are all the result of either conditional memory operations conditioned upon speculatively read data, or else speculative data operations using addresses calculated from data that was speculatively loaded.

Is that correct? Doesn't moving speculation from hardware control to software control allow you to perform better dataflow analysis than you can reasonably do in hardware, which allows you to perform more provably safe speculations (or less often pay the I/O overhead of converting a speculative conditional load into an unconditional load and a speculative conditional register-to-register move) than you get if the speculation decisions need to be performed in hardware?

Using a JIT doesn't automatically prevent Spectre/Meltdown attacks, but as far as I'm aware, they all involve either a conditional memory operation in a speculative operation or a speculative indirect load using an address generated via speculative execution.

I think some data flow analysis in the JIT would allow you to replace speculative conditional loads with unconditional loads followed by speculative conditional register-to-register moves (including shadow register commit/rollback). (Unconditional prefetches would be weaker protection if the hardware is free to drop prefetches when the memory bus is busy.) Data flow analysis should also allow you to keep track of which addresses are calculated from data retrieved via speculative operations, and refuse to emit those loads where they'd execute before it's known if the predicating speculation was correct.

Any time you reduce the JIT / processor's instruction scheduling flexibility, you're going to reduce its performance, but I don't see a way to mitigate better than outlined above.

I think any hardware-based mitigations are along the same lines as outlined above, but are much less capable of performing dataflow analysis in hardware, so I think they'd have to be much more conservative than a software-controlled speculation that's much more capable of dataflow analysis.

Though, my understanding of the subject is cursory and there could very well be variants of the attacks not covered by the outline above and there may be hardware mitigations that work differently than unconditional prefetches or a basic hardware "speculation tainted" bit to keep track of data loaded by speculative operations and prevent indirect memory operations via "speculation tainted" addresses.

I can think of all kinds of schemes that instead of using a single "tainted" bit, assign each in-flight stream of speculation a "speculation domain" number (presumably reserving speculative domain zero for non-speculative execution), and then perform some data flow analysis, along with forwarding networks to re-mark all uops for a given domain to the non-speculative domain or no-op out all uops marked with a given domain. From there, you can start doing a lot of data flow analysis in hardware, keeping track of which domain loaded which data and preventing any sort of conditional or indirect memory uops from operating on speculative data, but it seems like it would very quickly eat up a lot of transistor budget, getting more complicated than the out-of-order scheduling logic.

> Spectre V1 (speculative bounds check bypass / type confusion) is basically game over for intra-process memory isolation without introducing expensive or complicated mitigations

Code which is executed deterministically cannot receive timing channels (or, indeed, learn anything about its environment) and hence cannot exploit Spectre. This seems potentially practical for this sort of library sandboxing problem.

Exactly, by default code sandboxed by wasm is fully deterministic and can't do any timing measurements, not unless you explicitly give it access to an import that does such a measurement.

On the web, a website might run arbitrary wasm + JS which means it might let wasm time things. But if you use wasm to sandbox a specific library then the situation is different and you control the wasm imports.

> by default code sandboxed by wasm is fully deterministic

Sorry? I'm not sure what you mean by "fully deterministic" here, because as far as I was aware you can do basically anything including choosing to not terminate inside of WASM.

You can have an infinite loop, sure, but aside from that wasm semantics are precisely defined in a deterministic way (well, except for minor issues with float NaN bits). That is, if a computation terminates, it will always terminate and with the same results.

That's the case because wasm itself has no way to tell the time, generate a random number, etc., and each operation's semantics are well-defined.

(If infinite loops are a concern, you can do what wasm VMs do on the web which is to show a "stop script?" dialog after too much time passes.)

WASM is not deterministic from a timing perspective.

Correct (but the wasm itself cannot observe that).

It can when there are multiple threads.

True, threads make things more complicated here. (In the context of this link, only wasm MVP is supported, which does not include threads.)

>From what I've seen, the best understanding right now is that the process is the smallest defensible unit of isolation.

In what scenario would you be more comfortable running a potentially malicious process versus potentially malicious WASM code behind an API sandbox where you control the exposed APIs? Sure, there are process-level sandboxes, but those seem like a much larger bug surface area, requiring expertise outside the domain of the people building the apps, compared to a WASM sandbox with a host-app-defined bridge API.

> Sure there are process level sandboxes but that seems like a much larger bug surface area

A process sandbox allows you to pass through syscalls you deem harmless without reimplementing them and only building the bridge API for things that require complex policies. With wasm you need to implement wrappers for everything.

Let's say your wasm module is pure compute and does IPC via shared memory locks. You now at least have to implement the locking API. In the process sandbox case you only whitelist the futex syscall and that's it.

And in practice you'll have to use the process sandbox anyway due to CPU vulnerabilities or bugs in the bridge API.

>A process sandbox allows you to pass through syscalls you deem harmless without reimplementing them and only building the bridge API for things that require complex policies. With wasm you need to implement wrappers for everything.

Yeah, except you have to know what OS your application is going to run on and which kernel version, set up the sandbox correctly, and trust that the kernel doesn't have any privilege-escalation bugs - likely none of which is your area of expertise as an application developer.

A sandbox runtime with exposed APIs is much closer to your domain and you can model it to be much more domain specific so the surface you need to wrap should be smaller.

I think the surface mostly depends on how much you want to give the sandboxed application, not on the sandboxing technology you choose. If you want to build something emulating the capabilities of a desktop operating system, including gfx acceleration (e.g. webgpu), then it will be much larger than some isolated compute function in the fashion of network edge workers.

I am not aware of a single CVE against openbsd's pledge. On the other hand there are plenty of javascript engine escapes.

Part of that is certainly because nobody is actively attacking OpenBSD's pledge, while JavaScript engines are under constant attack. (FWIW, every major JavaScript engine utilizes a platform sandbox that provides pledge-like functionality.)

There are some interesting approaches for running Wasm code confidentially. I'd recommend taking a look at Enarx, as they are pioneering the space with SEV/SGX integration for Wasm workloads.


SGX is a tool in the toolbox, but it solves a different problem: isolating a small section of especially privileged code from the rest of a larger, less-trusted application.

The sandbox described in the article is trying to do roughly the opposite: protect the main application from an isolated section of untrusted code.

Also, SGX requires extreme care in deployment due to side-channel attacks, see e.g. https://software.intel.com/security-software-guidance/insigh...

SEV is also interesting, but requires code to run in a separate VM -- which satisfies my requirement above that it at least be in a different process.

If you had control over the ISA, and were jitting code as you would with WASM, couldn't you defeat Spectre 1 with an instruction that essentially introduces a masked region that the processor will not read or speculate outside of until the mode is disabled?

If the attacker controls the instruction stream it's game over (since they can disable the mask), but if they're just writing wasm that generates malicious speculative in process loads, it would prevent that.

Then an attack would have to trick library calls into doing the same job, which is certainly possible, but much harder.

Of course this would negate many of the benefits of staying in the same process, so I'm not necessarily saying it's a good idea.

When you say "control over the ISA", I'll assume you mean "precise control over the emitted instructions".

In which case: yes! That's Speculative Load Hardening (https://llvm.org/docs/SpeculativeLoadHardening.html). SLH tries to squash side-channels by preventing any speculatively-loaded data from being forwarded to dependent instructions until proving that branch prediction followed the right path.

But this undoes a lot of the performance that microarchitectures have added through branch prediction, since dependent memory loads (think linked list entries, or C++ vtables) are stalled behind full resolution of the branch condition.

If you're doing nontrivial compute, you can end up ahead performance-wise by splitting the computation into a separate process and invoking it via IPC. Now you don't need SLH because the untrusted process doesn't have long-lived secrets in its address space.

No, I mean specifically if you're building your own CPUs and can add instructions. You add the ability to set a hardware mask that all values are passed through before they're used as addresses for loads and stores including speculation. Loads and stores that fall outside the masked region will simply wrap around.

In your JIT, you enter and leave this mode before and after running user code to ensure it can't escape its region.

This would be a lot of work to pull off and would require custom hardware and software, but (at least as far as I can tell) it should work.

This is basically segments. It would work as long as your implementation doesn't have Meltdown-like vulnerabilities, i.e. speculation past hardware-enforced protection. We know it can be done because there are high-performance CPUs which are not vulnerable to Meltdown.

The answer is that there are too many different kinds of speculative leaks to defeat them all via masking. For example, the supervisor code (i.e. the code of the Wasm engine) has access to the broader address space, by design. This code can be tricked in various ways (in speculation) into performing out-of-bounds reads and disclosing that information in ways that are detectable by user programs.

Right, you have to trick the outside world to perform the speculation for you. In the case of a web browser, you're not going to plug all those holes.

But in some cases, you probably could make it absolutely bulletproof. The closer to data->data transforms you get, the better.

For example, if you wanted a router that could run WASM blobs that make routing decisions. You pass in a header byte array and receive back route information.

Designed correctly, it's not a given that there is any surface area for an attacker to read data from a neighbour's address space.

> But in some cases, you probably could make it absolutely bulletproof.

I don't believe this. Not on modern hardware. I worked on Spectre for almost two years while at Google. We wrote this:


Well, your years of study are worth much more than my idle speculation, so after browsing your paper I'll happily accept that you're right.

But my intuition is that the instructions themselves executed under such a masking system are no more able to perform timing attacks on the rest of the process than arbitrary code from one process can perform timing attacks on another.

If there's something specific I'm missing there I'd love to know what it is so I can update my mental model.

Spectre vulnerabilities only matter wrt. information disclosure. Really we need OS's that are explicitly aware of information domains, and only flush address space mappings when switching from a more privileged to a less privileged domain.

so, some things:

1. Information disclosure is pretty important, especially if your process has AWS credentials in the environment block or it's doing mTLS with a long-lived key.

2. Those operating systems already exist, see e.g. https://twitter.com/aionescu/status/948818841747955713

3. Spectre V1 is within the same process, so this isn't a question of address mappings across differently-privileged domains. It's the same domain (i.e. address space).

4. Flushing address spaces across privilege domains isn't a concern on modern processors thanks to tagged TLBs and process-context or address space identifiers (PCID, ASID)

This is interesting. If data confidentiality requires process isolation, then how can CloudFlare workers be safe?

There are really two branches of Spectre defense research, and each side sneers at the other and declares their approach doesn't work.

One side says "we can block speculation at the trust boundary, especially the process boundary by having the kernel flush all caches, etc." The argument against this is that every new attack has to be explicitly mitigated. New attacks are coming at a rather fast rate and it's almost certain that some bad guys are aware of attacks that the good guys haven't uncovered yet. (It's also ridiculously expensive to use these defenses at a fine-grained level.)

The other side says "we can make it basically infeasible to extract side channels by limiting the non-determinism -- such as timers -- that allow malicious programs to observe microarchitectural side effects." The first side says this is wrong, you can use repeated attempts and statistics to get over any amount of noise. It just takes longer. Maybe you have to run the attack for weeks to leak anything of value but it's still possible.

The reality is that no one has actually solved Spectre. However, both sides have done things that raise the barrier to attack. The best anyone can do right now is try to raise that barrier as high as they can. It seems to be working -- we don't really see Spectre attacks in the wild.

Workers uses a mix of ideas in a pragmatic defense. I'll have an extended post about it on the Cloudflare blog tomorrow.

(disclosure: I work at Google on side-channel stuff and https://github.com/google/safeside)

I'm really excited to read that post!

I agree that the best we can claim right now is that we've made Spectre and other speculative attacks "expensive enough" that they're unlikely to be the most profitable area for attack.

That said, I'd be a bit worried about the assertion we haven't seen Spectre attacks "in the wild". It is incredibly difficult to put together a set of metrics that would convincingly detect attempts at even a straightforward speculative information disclosure.

(haha, two branches, I get it)

I definitely agree that it's possible that attacks have happened but weren't detected. But if attacks were widespread, I'd expect we'd hear about at least some of them. The fact that we don't hear about any suggests to me that there are some significant barriers to real attacks that the theory isn't explaining. Indeed, our own attempts to build attacks seem to suggest that building a PoC in a lab is one thing but making something that actually runs in production and exfiltrates useful data is another entirely, and ridiculously hard even when it is "theoretically" possible. Not that that stops us from wanting to throw all the defenses we can at it, of course.

Hey here's Kenton's promised blog post! I hope it gets separate attention on HN!


HN Discussion https://news.ycombinator.com/item?id=23989270

I’m open to a slow but secure processor. Certainly for my personal computing.

Try underclocking your CPU to like 30% of its max speed and see if your programs still run comfortably.

This would've made sense in a world where programs didn't become more bloated and slower over time, but it would have visible repercussions given our software today.

It’s really just the browser that becomes difficult to use. For everything else you can find a decent lightweight option. I've been doing most of my computing on a raspberry pi 2 recently and it's just as productive as my main laptop outside of the browser, which is just an absolute hog for memory even when rendering basic pages.

CPU I could definitely take a hit (say, remove speculative execution, which seems like it would "fix" spectre) as I don't need to do my taxes or write or listen to music at maximum speed. My video game console doesn't need security at all and can continue to speculatively execute freely.

IMHO the days of general purpose compute at high performance are firmly over, different computation shapes have different cpu(/gpu) characteristics to optimize for.

That would be positive though, maybe then not so many people would be running Python sites on Django or Electron apps.

It sounds like you want to actively discourage the use of Python and Django irrespective of performance concerns?

Care to elaborate or was it just flippant snide?

More like discourage the use of pure scripting without any regard for JIT/AOT toolchains or coding without performance considerations, in opposition to what we used to care about.

The example with Python and Django was what came quickest to mind, but I can gladly expand it to include Ruby and Rails, or any other stack that falls under the same Web sites/desktop applications with scripting languages umbrella.

The usual retort is "the scripting language is rarely the bottleneck" as well as "developer time is more expensive than hardware".

Between these two get-out clauses, aren't you talking about a very small minority of use-cases? Those where the scripting language is the bottleneck and the problem can't simply be solved by spending a few $ more on your VPS?

I thought we were all discussing going green and using less hardware.

Plenty of languages offer JIT/AOT toolchains while offering scripting language tooling as well.

Besides, my remark also applies to compiled languages when care is not taken for proper algorithms and data structures.

> Plenty of languages offer JIT/AOT toolchains

Like, for example, Python?

Nah, if you mean the black swan ignored by the Python community, thus forcing everyone that cares to go to Julia instead.

Can you be specific about this, for those of us not especially aware of the Python ecosystem?

PyPy is a Python implementation with JIT support, which has been at it for the last 15 years or so, and nowadays is mostly compatible with the reference implementation.


However, for most Python users it is more common just to stick with CPython and rewrite stuff in C than to have a go at using PyPy.

I feel the same about bounds checking in programming languages, apparently not everyone does.

I vaguely remember a story about a hardware manufacturer (Burroughs? Symbolics?) that had a processor with instructions that could bounds-check array accesses with zero latency overhead. It was a great feature for Algol/Lisp, but some customers asked for an option in their Fortran compiler to disable the bounds check.

The sales engineer replied that the bounds checks were zero-cost, but the customers replied that the bounds checks broke their programs... their programs had silent (or at least unnoticed) array-bounds bugs, and the customers preferred to keep those bugs, thank you very much!

I wish I had kept the link. I've tried a couple of times to find the story. Does this ring any bells for anyone?

Burroughs definitely had bounds checking.


Its system programming language (initially ESPOL, then NEWP) also has support for explicit unsafe code blocks, and there is zero Assembly support. All CPU low-level operations are exposed via intrinsics. All of this in 1961, almost 10 years before C was invented.

Still being sold nowadays, and naturally Unisys uses security as one of the selling features.


Regarding bounds checking, what I keep around is Hoare's Turing award speech.

"Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to--they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."

I used to agree heavily with this, but nowadays we have effective techniques of memory safety on the software level and I find myself ambivalent on which level of abstraction this occurs.

I do think it means C is probably eventually doomed as a system language, though. I’m curious if we’ll ever see e.g. rust in the *bsd codebases.

Except we don't; that is why ARM, Apple, Microsoft, Google, and Oracle are all pursuing variants of hardware memory tagging for taming C, as those software solutions have proven not to work.

Sure, but hardware memory tagging is also not proven to work.

Anyway, it's unclear which criteria you're judging "proven not to work" by, given that we're both typing via that software right now, unlike a hardware solution.

Sure it is, thanks to CVE database entries.

Here are Google's rationale for enabling it on Android,

> Platform hardening - We’ve expanded use of compiler-based sanitizers in security-critical components, including BoundSan, IntSan, CFI, and Shadow-Call Stack. We’re also enabling heap pointer tagging for apps targeting Android 11 or higher, to help apps catch memory issues in production. These hardening improvements may surface more repeatable/reproducible app crashes in your code, so please test your apps. We've used HWAsan to find and fix many memory errors in the system, and we now offer HWAsan-enabled system images to help you find such issues in your apps.


> Starting in Android R, for 64-bit processes, all heap allocations have an implementation defined tag set in the top byte of the pointer on devices with kernel support for ARM Top-byte Ignore (TBI). Any application that modifies this tag is terminated when the tag is checked during deallocation. This is necessary for future hardware with ARM Memory Tagging Extension (MTE) support.


"Adopting the Arm Memory Tagging Extension in Android"


"Detecting Memory Corruption Bugs With HWASan"

> Native code in memory-unsafe languages like C and C++ is often vulnerable to memory corruption bugs. Our data shows that issues like use-after-free, double-free, and heap buffer overflows generally constitute more than 65% of High & Critical security bugs in Chrome and Android.

> HWASan is based on memory tagging and depends on the Top Byte Ignore feature present in all 64-bit ARM CPUs and the associated kernel support. Every memory allocation is assigned a random 8-bit tag that is stored in the most significant byte (MSB) of the address, but ignored by the CPU. As a result, this tagged pointer can be used in place of a regular pointer without any code changes.


The post is already too long just with Android, otherwise I would provide similar references for iOS, Solaris on SPARC, Azure Sphere / Phonon.

Why does this point to hardware-level protection rather than simply removing the dependency on C? I mean, wouldn't moving to Rust mitigate most of these vulnerabilities? I can't help but think that if hardware protection from vulnerabilities were useful, we would have enabled it 30-40 years ago.

I just don't see any value in this to the end-user: the result (a crash) is the same.

If they can get it to work it might be beneficial, but it would also validate the idea that C is broken as a system language because it can't know at compile-time whether or not a fault will occur on memory access, which has been proven to be the correct way to reason about memory.

Because it still won't protect against unsafe code in Rust.

    // very contrived example
    fn main() {
        let mut data = vec![1, 3, 4];
        unsafe {
            let ptr = data.as_mut_ptr();
            // out-of-bounds write: undefined behavior no borrow checker catches
            *(ptr.offset(1024)) = 1;
        }
    }
At some level there is some unsafe code, even if in pure Assembly.

The problem with C is that it taints everything due to strings, arrays, and UB.

By the way, Android is also taking steps to introduce Rust on its codebase, hence the talks started by Google at Linux Plumbers conference.

As much as I love to rant on C, as long as POSIX-based software is relevant and WG14 isn't willing to improve its security, hardware solutions are the only way.

Alright I guess I see what you’re saying, and this does seem like natural progress from address sanitization.

> By compiling to wasm we sandbox the code, preventing it from accessing anything on the outside.

Operating systems are in a sad state, as virtual address spaces already offer exactly that in hardware at full speed. It's only through the operating system APIs that processes gain the ability to affect anything outside of the process.

Current operating systems weren't made with untrusted code in mind, but have so much inertia that new operating systems repairing old misfeatures can never succeed. All code is already written against the bad operating system APIs, like those for writing files. This project just handwaves the actual problem away:

> the sandboxed code can’t do anything but pure computation, unless you give it a function to call to do things like read from a file, tell the time, etc.

> Operating systems are in a sad state, as virtual address spaces already offer exactly that in hardware at full speed

Ever since reading about Microsoft Singularity all that time ago, I reached the opposite conclusion: software is in such a sad state that it must rely on hardware to provide isolation. From this perspective, WasmBoxC and many projects like it are IMO a huge step in a desirable direction.

The main lesson from Singularity for me (aside from requirements for memory safety) was that cross-component safety can be achieved by formalizing the protocols those components use to communicate. Sing# had a dedicated type to capture the state machine for every cross-component transaction, with strongly typed inputs and outputs. This problem is not unique to software isolation -- it is the basis for a huge variety of security problems everywhere across the ecosystem, not least network services.

Wouldn't it be a wonderful world if we knew our application was fully safe when exposed to a network for the same reason we know it is fully safe to run in the same address space as another untrusted application? That is the path Singularity took us down.

As an example of this sad state of affairs, Android 11 is adding support for hardware memory tagging, as static analysis alone is not enough to tame the C and C++ components.


iOS, Solaris on SPARC are on this path as well.

Regarding Singularity, we are slowly moving away from C on non pure UNIX clones, but still it will take generations.

100% agreement. With a kernel design like SeL4 there is no need for crazy sandboxing environments. It prevents you from accessing external resources by default. And it is probably as close as we can feasibly get to an operating system that is impossible to maliciously root.

CloudABI is an attempt to solve this the right way: http://cloudabi.org/

> as virtual address spaces already offer exactly that in hardware at full speed

Operating system processes are pretty heavyweight, and there are lots of situations where smaller-granularity protection domains are useful.

I'd also quibble with "at full speed." There is a metric shitton of complexity needed to implement virtual memory: multi-level TLBs, an extremely careful dance with the operating system, IPIs for TLB shootdown, etc. The dynamic cost of the TLB is measurable and, of course, completely depends on your application behavior, kernel, availability of huge pages, etc.

The reality is that we don't really have a control group for hardware without virtual memory because the only such chips are for small embedded (niche) systems, and everything else runs on kernels designed to offer virtual memory to software expecting virtual memory.


"Everything Old is New Again: Binary Security of WebAssembly" - USENIX 2020


That's a good article! Some notes on it:


Perhaps wasm isn't (yet) a good replacement for native binaries for the reasons they mention. In particular such an application usually has access to files and timing etc., and you're (currently) missing some safety techniques native binaries use, which is a risky combination.

But as mentioned in the post here, if you're sandboxing a specific library that does pure computation (say, a codec or a compression library) then using wasm you can make sure it cannot escape the sandbox and that it has no timing or other OS capabilities. Those are powerful guarantees!

Pure computation is still subject to UB and memory corruption due to the lack of bounds checking inside linear memory blocks, leading to outputs that cannot be trusted, even though they are sandboxed.

Definitely, yes, and you do need to be careful about those outputs. Still, the sandboxing guarantee here is very useful!

This isn't theoretical: Firefox uses this approach in production (using RLBox, which is mentioned in the post).


Thanks for the link, I missed that post.

> The OS-based implementation uses the “signal handler trick” that wasm VMs use. This technique reserves lots of memory around the valid range and relies on CPU hardware to give us a signal if an access is out of bounds (for more background see section 3.1.4 in Tan, 2017).

On Linux you can also make use of userfaultfd(2) rather than handling SIGSEGV.

This is a really cool concept. I'd love a mature, supported technology that could replace .NET's AppDomain.

WebAssembly is anything but mature, and given that .NET also supports C++ from the get-go, I would be curious to see a security analysis of how "secure" WebAssembly actually is.

Thankfully researchers have finally started assessing it, with the first wave of papers reaching USENIX 2020.

That feels a little unfair.

WebAssembly has been shipping in the majority of production browsers for over three years now. That's one of the most security-sensitive attack targets that exists. Those browsers would not have done so if it weren't reasonably safe.

Of course there are vulnerabilities that are discovered, just like in every part of the web platform. Nothing is perfect. But a huge amount of attention has been put on wasm's security and the major implementations are very robust.

It's true that wasm is reaching into other areas besides the web, which does raise new questions (like in that recent USENIX 2020 paper). Perhaps it's fair to say wasm is immature as a replacement for a native executable, or other new ideas that are coming out. But it's very mature as a sandboxing solution for pure computational code, and it's used successfully on the web all the time.

It's not unfair when plenty of WebAssembly supporters try to sell it as the ultimate bytecode, free of the flaws of all the ones that came into existence since the '60s.

Also, it is still catching up to features that those bytecodes have had for the last 20 years.

Those browsers are only shipping MVP 1.0, unless you want to equate Web with Chrome.

Only now are hackers and security researchers actually starting to care about WebAssembly security, hence the first wave of security papers at USENIX 2020.

I expect those that drove WebAssembly to the detriment of what we already had to be eventually surprised.

For what I care, I will just take advantage of it to get back my plugins.

The biggest problem is that .NET/dotnet core cannot easily be AOT-compiled to wasm. Currently you compile the runtime to wasm and use the DLL with the wasm runtime. This is the approach that Blazor WebAssembly uses, and it is awful.


Mostly because Microsoft currently doesn't know what they want; check the feedback regarding the missing roadmap for CoreRT/.NET Native.

With Reunion, MAUI, Blazor (WebAssembly and mobile), WPF team transfer, they seem to have got back into another reboot the eco-system phase.


That said, Unity's .NET flavour can be compiled via IL2CPP => WebAssembly, or with Mono AOT as in Uno.

I tried this. AOT compiling with the .NET runtime just isn't very practical. I know Mono does this, with some guided info for reflection-based classes. But the way the class library is structured, string pulls in globalization. And enumeration. And comparison and equality, which both work via reflection to pick the right default implementation. Reflection is expected to work, which pulls in all methods and their dependencies. By the time you have a working executable for hello world, you're a few MB ahead. The class library just isn't set up for this kind of use.

I was thinking about this just the other day, when Microsoft announced the latest dotnet 5 preview, and yet again have postponed AOT - Microsoft have been teasing dotnet devs with the promise of production-ready AOT for something like a decade.

Frankly, I wish they'd put up or shut up - either prioritise it and make it happen, or just admit defeat and say it's not going to happen.

It's not really a matter of prioritization. The current WASM platform and toolchain are missing features necessary for AOT-compiling things like .NET executables. For one example, exception filters ... and unfortunately the standard library uses them, not to mention end-user software.

To provide more detail: Exception filters require the ability to stackwalk and search for filters and run them before actually handling the exception. WASM has no stack-walking functionality whatsoever (this also means getting and introspecting stack traces is currently impossible), so searching for filters is already a non-starter. You can try and emulate this through a normal unwind-only exception flow, but you have to rewrite all your application code to insert lots of checks and flow control changes in order to do it, and it's still observably different.

I've spent months just working on fixing this problem, and it's one of the things that wasm AOT is going to need before it can ship.

WASM is a generally weak target platform. A great 1.0, to be sure, but unless your goal is to run posix C code you're going to hit snags. When we were designing WebAssembly to begin with it was an intentional decision for 1.0 to be limited in feature set with the goal of improving it later - a Minimum Viable Product.

Sorry for any confusion caused on my part - my gripe wasn't with compilation to WASM, but about AOT to native bytecode.

Exception filters are also a problem for AOT to non-WASM LLVM targets, for similar reasons - expressing complex exception handling and flow control in LLVM is very difficult. But the situation is better there, at least.

Production-ready AOT for .NET has existed since Singularity, whose MIDL and Bartok compilers were the basis of .NET for WinRT on Windows 8/8.x.

Then some of the Midori tech eventually made its way into .NET Native, which from my point of view UWP + .NET Native is what .NET should have been all about back in 2001.

Apparently after the timid attempt with XAML Islands and MSIX, they seem to be getting the house in order and driving the platform into a way to pretend that Windows 8 and 8.1 never happened, but it seems to be lacking a lot of coordination and long term planning.

Windows 10X apparently is also no longer getting the Win32 sandbox, that is, assuming it ever gets released.

Still, in the middle of the chaos, it feels much better than if I had to deal with Android on a daily basis, where one I/O's best practices are the next year's legacy.

I guess you know this, but what I really meant was AOT for any dotnet apps - not specifically for UWP or whatever sandboxed, designed for touchscreen thing Microsoft is trying to push.

What many have been waiting for is the ability to AOT for both Windows and Linux.

I recall Mono had something like this around 10 years back, but it was a bit flaky. Not sure if that still exists in some form.

Mono keeps it around, given that is how Xamarin iOS works, and it is also available as an option for Xamarin Android (which uses the JIT by default).

I think when someone says "production ready AOT" in any context, it's never quite obvious what they mean. As ckok implied, even when AOT is "working" it may generate a 100mb executable. For some end users that is production ready (like Facebook, who reportedly were shipping 1gb+ AOT'd php executables to their server cluster) and for other end users it is not (because a 100mb browser app is a closed tab.)

WASM now and emscripten before it both were designed and optimized for POSIX C apps and for games, and they're pretty good for those scenarios. Larger-scale stuff is pretty tricky and the tooling ecosystem will have to continue to grow to support more real-world applications. JIT, stackwalking, GC, etc are all still not there on WASM - thankfully threading is finally crossing the finish line but even that has taken years.

Some day WebAssembly will match what Flash CrossBridge and PNacl already had 10 years ago.

Alon did a really great job with the article, hats off.

I think WasmBoxC might be useful to try to benchmark how fast we can get Wasm to run server-side. We also hope to eventually beat native execution in server-side specific Wasm runtimes (such as Wasmer using the LLVM compiler and Profile Guided Optimizations) once the runtime ecosystem matures a bit.

Keep up the good work!

Wasm is amazing, but it's not intended to replace JavaScript. If you just want to add a form to a website, animate a drop-down menu, or the like, JS (or TS) will probably always be preferable to writing in another language and compiling to wasm. The web platform moves fast and maybe in a couple of years I'll eat these words. But nothing I've seen so far points towards the demise of JS.

Gary's talk is not exactly about the "demise of JS". Watch the talk, it's a great one :)

WASM will never replace JavaScript. Something else might, and it might have support for compiling to WASM, but we'll see.

Any time these come up, I don't find a comprehensive security analysis. Even the best, supposedly most secure libraries in the world get significant vulnerabilities.

This does not make any software magically secure. Its only purpose is to _sandbox_ a library to prevent it from doing anything malicious. As wasm can't do any more than compute, you simply contain the library: it can no longer open files, make network calls, etc.

> As wasm can't do any more than compute, you simply contain the library: it can no longer open files, make network calls, etc.

That presumes the WASM implementation is bug free, which is not a great presumption, especially for the more sophisticated implementations with JIT engines, and even more so for those adding multi-threading, GC, etc.

It doesn't prevent Heartbleed-style attacks.

It also doesn't prevent internal memory-corruption attacks that expose unintended behaviors through the public module interface.

> As wasm can't do any more than compute, you simply contain the library: it can no longer open files, make network calls, etc.

I wish the world was that simple.

See this paper for some recent research on the subject: http://www.software-lab.org/publications/usenixSec2020-WebAs...

You can probably use wasm to build a very comprehensive secure sandbox but right now it's actually somewhat of a regression when you care about making an application secure. Your OS is safe, at least, as long as you don't let the app make any syscalls.

So, compiling via WasmBoxC gives a 14% - 42% performance overhead compared to native compilation.

Is there also a significant overhead in terms of binary size?

wasm binaries are typically size-competitive with native x86 once you compress them (i.e. foo.so.zip vs foo.wasm.zip), but the jitcode you get out of them is usually much bigger. In my testing the size overhead for stuff like the ICU unicode library was maybe 20-40% on-disk, depending on compiler settings. It's gonna depend on your workload.

Note that even if the actual generated jitcode is computationally efficient, it might be larger and as a result waste more space in the instruction cache, which would hinder performance.

So, if you compiled a language shell or interpreter this way, you would end up with a sandboxed shell/interpreter?

One that couldn't do anything by itself. The essence of the method is providing the sandbox specific, safe means to interact with the surrounding process.

