The thing I want to achieve with WebAssembly is still proving a lot harder than I had anticipated.
I want to be able to take strings of untrusted code provided by users and execute them in a safe sandbox.
I have all sorts of things I want this for - think custom templates for a web application, custom workflow automation scripts (Zapier-style), running transformations against JSON data.
When you're dealing with untrusted code you need a really robust sandbox. WebAssembly really should be that sandbox.
I'd like to support Python, JavaScript and maybe other languages too. I want to take a user-provided string of code in one of those languages and execute that in a sandbox with a strict limit on both memory usage and time taken (so I can't be crashed by a "while True" loop). If memory or time limit are exceeded, I want to get an exception which I can catch and return an error message to the user.
Surprisingly I've not found a good pattern for running a JavaScript interpreter in a WASM sandbox yet. https://github.com/justjake/quickjs-emscripten looks promising but I've not found the right recipe to call it from server-side Python or Deno yet.
Can Extism help with this? I'm confident I'm not the only person who's looking for a solution here!
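For what it's worth, the calling pattern described above (run a string of code, cap time and memory, get a catchable exception) can be sketched with nothing but the Python standard library. This is only an illustration of the interface, not a sandbox: it runs the code in an ordinary child process on a POSIX system, and the names `run_limited`, `TimeLimitExceeded` and `MemoryLimitExceeded` are made up for this sketch. A Wasm runtime would take the place of the subprocess.

```python
import resource
import subprocess
import sys


class SandboxError(Exception):
    """Base class for failures while running user code (hypothetical name)."""


class TimeLimitExceeded(SandboxError):
    pass


class MemoryLimitExceeded(SandboxError):
    pass


def run_limited(code: str, timeout_s: float = 2.0,
                memory_bytes: int = 512 * 1024 * 1024) -> str:
    """Run `code` in a child Python process with time and memory caps.

    NOTE: an OS process with rlimits is NOT a security boundary against
    malicious code. This only shows the shape of the API.
    """

    def apply_limits() -> None:
        # Runs in the child just before exec (POSIX only).
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        limit = memory_bytes
        if hard != resource.RLIM_INFINITY:
            limit = min(memory_bytes, hard)  # never exceed the existing hard cap
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,       # child is killed when this expires
            preexec_fn=apply_limits,
        )
    except subprocess.TimeoutExpired as exc:
        raise TimeLimitExceeded(f"exceeded {timeout_s}s") from exc

    if proc.returncode != 0:
        if "MemoryError" in proc.stderr:
            raise MemoryLimitExceeded(f"exceeded {memory_bytes} bytes")
        raise SandboxError(proc.stderr.strip() or f"exit code {proc.returncode}")
    return proc.stdout
```

A caller would wrap `run_limited(user_code)` in `try`/`except SandboxError` and turn the exception into an error message for the user.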
The problem you want solved, perfect sandboxing for untrusted code, is only just THE single most important problem in operating system security. If you can solve that then you have the basis of a perfectly secure, unhackable operating system. Anybody claiming to solve that problem at speed in any other software domain can trivially use those same techniques to create a perfectly secure operating system runtime.
So, you have to wonder to yourself, if they can do that why do they not just go and write an unhackable operating system. It is only like one of the single greatest problems of all the commonly used commercial operating systems in what is viewed as one of the most hardcore of software disciplines where solving it would instantly establish you as a supreme software guru. Basically, if you can solve that problem you should make and advertise an unhackable operating system; anything else is selling gold bricks as ballast.
To channel Theo de Raadt of OpenBSD: You are absolutely deluded, if not stupid, if you think that a worldwide collection of software engineers who can't write operating systems or applications without security holes can then turn around and suddenly write browser sandboxes (originally virtualization layers) without security holes.
Browser sandbox escapes from untrusted JavaScript are discovered and exploited regularly. JavaScript is much more constrained than the full force of a low level language like WebAssembly, and they can not even get the JavaScript sandbox safe to run truly untrusted or malicious code. Why would something harder to do work when they can not even do the easier thing?
Unless you are just talking about something meant to handle accidentally, not intentionally, malicious code. Then sure, it is probably okay for that. But if you are actually worried about malicious code then, no, browsers (and commercial operating systems) do not provide that. And anybody suggesting they can do that is almost certainly lying unless they also claim to have developed an unhackable operating system/virtual machine as well.
I know that it's hard, but I'm not ready to agree that this isn't worth seeking answers to.
AWS run untrusted code on Lambda all the time.
Browsers seem to be handling this pretty well in the face of the most untrustworthy computing environment our species has yet developed. Zero days in browsers are big news, and don't happen very often.
If you can sandbox arbitrary malicious code, then you can make an unhackable operating system/runtime. Such a feat is frequently viewed as literally impossible in many software circles and would constitute an extraordinary claim that demands impeccable, extraordinary evidence to support it such as, minimally, mathematical proofs of the entire code base. Nothing less should overcome the sheer ideological inertia behind the common-sense view that everything is easily hacked as has been continuously demonstrated on basically everybody all the time.
So, unless you want to claim Amazon has invented an unhackable operating system to run AWS, has the mathematical proofs of correctness to support such an extraordinary claim, and has just not bothered to tell anyone, claiming AWS can actually securely run untrusted code is pure unsupported bluster. In fact, I bet exactly zero people at Amazon would back up such a claim if pressed, and if even the people doing it think it is impossible then there is no way they are actually doing it. The same goes for browsers.
As to zero days in browsers being big news, they are really not. Zerodium only pays 500 K$ for a Chrome RCE+LPE [1]. That is pocket change. Ransomware attacks ask for millions of dollars per attack these days. They can literally afford to burn multiple Chrome RCEs per attack (if needed) and still come out profitable. The cost of sandbox escape needs to be somewhere around 20-100x higher for it to be viewed as "secure" against the common threats seen every day.
AWS uses virtualization (Firecracker) to provide isolation for Lambda.
WebAssembly vs browser/JavaScript isolation is a little like virtualization vs operating system level isolation. WebAssembly and virtualization offer far smaller attack surfaces, which means they are far more likely to remain secure in the long term.
Browsers and operating systems are highly complex abstractions and they only remain secure (if you keep them patched) through the large ongoing investment in them.
WebAssembly is far more constrained in the browser than JavaScript. Exploits are flaws in the implementation that can be fixed, but what is being asked for is an environment that has fewer privileges by design.
Yeah, you are going to need to support a claim of solving the biggest problem in operating system security with more than a random assertion.
How about you start by finding a quote by any member of the development team who is willing to support the claim that Fuchsia establishes unhackable separation/sandboxes? If nobody making the software is willing to say that then we can be sure that they did not achieve it.
You can then follow it up by pointing to a mathematical proof that the code enforces separation kernel security properties where the proof has been verified by a competent third party. I will not demand you present or link the proof, all you need to show is that one exists and a credible party assents.
Only then do you have the bare minimum of evidence needed to support an assertion like that.
To run a JavaScript interpreter (SpiderMonkey, in this case) in Wasm, as well as running that same Wasm in a JS engine, you want to look at `jco` https://github.com/bytecodealliance/jco
The component model tooling is getting very close to maturity and will solve many of these problems.
Hey Simon, I've done some similar experimentation using Extism to sandbox LLM generated code. This uses our JavaScript PDK (which uses quickjs) and can be embedded in any host language we support https://extism.org/blog/sandboxing-llm-generated-code
I'm also working on a Python PDK right now and hope to have a beta ready in the next month. That would allow us to do something similar with Python.
As a robust sandbox, have you considered using a micro-VM? Firecracker [1] comes to mind, the VM behind AWS Lambda. It's designed to be lightweight to launch, suitable for running ephemeral code.
While I agree that it'd be nice to be able to use WASM for this purpose, it seems like a microVM might provide a more convenient interface: you can "just" run any existing programming language inside it (without needing any specific support for e.g. WASM). Indeed, you could run multiple processes built with different programming languages together and allow them to communicate in standard ways.
Additionally, VMs offer a number of advantages from a security perspective. Hypervisor VMs take advantage of hardware support, and their surface area is arguably well-hardened and smaller than alternatives (hence why VMs are used for cloud computing).
> I've not found a good pattern for running a JavaScript interpreter in a WASM sandbox yet
Is there a good reason to do this? I thought WASM typically used the V8 JavaScript interpreter as its sandbox and to execute code. If you could launch WASM, couldn't you equivalently launch an instance of V8 with the JavaScript code running inside directly? I do think this is a good question, and it raises further questions like: what if I want to run JavaScript and WASM side-by-side, so that they can communicate with each other and/or with native code.
Firecracker is a fine technology, but serverless companies have started taking advantage of Wasm's faster start-up and invocation times for use cases of running Wasm on the server (https://www.youtube.com/watch?v=yqgCxhPAao0). The deny-by-default security policy makes Wasm a popular choice to run code in isolation, particularly for maximizing hardware resources in the multi-tenant environments these serverless companies operate.
> Is there a good reason to do this?
One use case to run JS inside a Wasm VM is Shopify Functions. Shopify allows their customers to customize things like checkout flow by writing code compiled to Wasm which gets executed during the checkout process. They want their customers to be able to write JS as well as other languages. https://github.com/Shopify/function-runner
> I thought WASM typically used the V8 JavaScript interpreter as its sandbox and to execute code.
V8 is popular for running Wasm on the web and for some serverless companies, but there are a bunch of serverless, blockchain, and IoT projects that use other Wasm runtimes (Wasmtime, WAMR, WasmEdge, and Wasmer to name a few) - https://github.com/appcypher/awesome-wasm-runtimes
I'm working on this problem as well and would be happy to sling you some thoughts and notes. Check my website https://runno.dev and send an email to the address on that website!
I'm building something that solves this exact problem, and I'll have an alpha version ready soon. Drop me a mail at marc@ my username .net, if you're interested. I'd also really like to have a chat about your use cases if you're up for it.
As another reply says, there's no such thing as perfect sandboxing unfortunately, especially not with the string of CPU side channels that have been uncovered, and continue to be revealed.
The idea of Wasm as a universal plugin system is very promising. But string passing is maybe not the best example to highlight, considering that Wasm is introducing stringref to enable zero-copy string sharing between the Wasm runtime and host language.
I'm working on a WASM project[0] with Andy, the stringref champion, and unfortunately the future for stringref is uncertain. There is now a new proposed alternative called "JS string builtins"[1]. The way we intend to ship support for strings currently is to just use JS string builtins, because it is just a set of imported functions, not new instructions. Those imports can either be provided by efficient, native compiled functions (currently behind a V8 feature flag, not sure if other JS engines support it) or from a JS polyfill. It's not an ideal situation but this will allow for shipping strings that work in all browsers with default settings.
While it'd be a nice addition, I wouldn't expect it any time soon.
It's currently still a stage 1 proposal, while we've been waiting for years for other proposals to be merged. The last time a proposal was actually finished was over 2 years ago.
For your second one, it looks like it is already implemented in Chromium and Firefox but not Safari. Sadly, it's not new for Apple to be dragging their feet on moving web standards forward.
It took what seems like a decade to get proper WebRTC support in Safari.
> WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications.
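To make "stack-based virtual machine" in that description concrete, here is a toy sketch: instructions take their operands from a stack and put their results back on it. It borrows three real Wasm instruction names (`i32.const`, `i32.add`, `i32.mul`) but is purely illustrative. It does not decode the Wasm binary format, and the function name `run_stack_program` is made up.

```python
MASK_32 = 0xFFFFFFFF  # i32 arithmetic wraps around at 32 bits


def run_stack_program(program):
    """Evaluate a list of (opcode, operand) pairs on an operand stack."""
    stack = []
    for opcode, operand in program:
        if opcode == "i32.const":
            # Push an immediate value.
            stack.append(operand & MASK_32)
        elif opcode == "i32.add":
            # Pop two operands, push their sum.
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK_32)
        elif opcode == "i32.mul":
            # Pop two operands, push their product.
            b, a = stack.pop(), stack.pop()
            stack.append((a * b) & MASK_32)
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return stack


# (2 + 3) * 4 written in stack order
program = [
    ("i32.const", 2),
    ("i32.const", 3),
    ("i32.add", None),
    ("i32.const", 4),
    ("i32.mul", None),
]
result = run_stack_program(program)
```

After running, `result` holds a single value, the product left on top of the stack.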
I was going to say that everything around interoperability in Wasm has been stalled for years, but hey, looks like garbage collection has reached phase 4! That's a pretty big one!
The component model (aka interface types aka snowman types aka...) is still stuck in phase 1, though. After almost four years, that's not encouraging.
This is the first time I've seen a mention of WTF-8 [1] and WTF-16. From the spec description, it seems strange to use this interoperability "hack" (the spec's own word) as the foundation of the string proposal. I wonder if they could use UTF-8 instead and keep WTF-16 for interoperability with JavaScript.
> WTF-8 [...] is a superset of UTF-8 that encodes surrogate code points if they are not in a pair. It represents, in a way compatible with UTF-8, text from systems such as JavaScript and Windows that use UTF-16 internally but don’t enforce the well-formedness invariant that surrogates must be paired. WTF-8 is a hack intended to be used internally in self-contained systems with components that need to support potentially ill-formed UTF-16 for legacy reasons.
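A small sketch of what that quote means in practice, using only Python's built-in codecs. A lone surrogate such as U+D800 cannot be encoded as strict UTF-8, but the `surrogatepass` error handler emits the three-byte generalized form, which is the same byte sequence WTF-8 uses for an unpaired surrogate. The helper names are made up for illustration. One caveat: `surrogatepass` is not a WTF-8 codec, because it would also encode a high/low surrogate pair as two three-byte sequences, which WTF-8 does not allow.

```python
def encode_lone_surrogate(code_point: int) -> bytes:
    """Encode a single surrogate code point the way WTF-8 does when it is unpaired."""
    if not 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a surrogate code point")
    # 'surrogatepass' lets the UTF-8 encoder emit the 3-byte form for surrogates.
    return chr(code_point).encode("utf-8", "surrogatepass")


def is_strict_utf8_encodable(text: str) -> bool:
    """True if `text` contains no surrogates and so is valid for strict UTF-8."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


lone = encode_lone_surrogate(0xD800)  # b'\xed\xa0\x80'
```

This is roughly the situation a JavaScript string with a stray surrogate puts a strict UTF-8 consumer in.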
This and its relatives feel like such a weird positioning choice for a company. If you have resolved to use some client library to do your heavy lifting, you want one focused and hardened on the specific language you are using, not one library of a dozen libraries spuriously maintained by a startup which will most likely eventually deprecate the library if it's not used with their Edge hosting service or whatever.
The huge list of languages does not inspire confidence. Like how going to a restaurant where the menu says they have amazing pizza and pho and ice cream makes one think that all three of those items are pretty bad.
This blog article is great, I had a hell of a time integrating Wasm into Ruby, so much so to the point that I gave up.
I was absolutely going to call a function, pass it some HTML to parse (as a string), and return a number, but what I wound up doing is pass a directory with some HTML file in it, return a number as a string, and parse it. After struggling with wasmtime and wasmer until I could see it was not going to perform acceptably with a Ruby interpreter packed into the Wasm module, I found that, further, it was seemingly impossible to call a function from Ruby with any meaningful parameters without going through the system interface to the Wasm.
The article does a much better job of explaining these issues concretely, but if you're totally lost, you can hear my fever-dream version of the same ideas here. (I am a Wasm beginner.)
But if you don't have time to watch, tl;dr: I gave up, my Wasm in the end was called as a WASI, which I passed a filesystem context into rather than attempt to call any function at all, and I parsed the output from the system interface. Not too different than what it looks like the Extism example is doing. I will definitely be going through these docs as it seems likely to help me understand better what I've missed, and moreover that it has Ruby support right on the front page *chef's kiss*
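Part of why "pass a string, get a number back" is awkward: core Wasm function parameters and results are numeric types only, so a host has to copy the string's bytes into the module's linear memory and pass a pointer and a length instead. The sketch below simulates that dance in plain Python with a `bytearray` standing in for linear memory. `FakeGuest`, `alloc`, `count_tags` and `call_with_string` are all hypothetical names, not any runtime's API.

```python
class FakeGuest:
    """Stand-in for a Wasm instance: one flat byte buffer plus numeric-only exports."""

    def __init__(self, size: int = 65536):
        self.memory = bytearray(size)  # plays the role of linear memory
        self._next_free = 0

    def alloc(self, n: int) -> int:
        """Bump allocator: reserve n bytes and return their offset (the 'pointer')."""
        ptr = self._next_free
        if ptr + n > len(self.memory):
            raise MemoryError("guest memory exhausted")
        self._next_free += n
        return ptr

    def count_tags(self, ptr: int, length: int) -> int:
        """Guest export: receives only integers and reads the bytes from its own memory."""
        data = bytes(self.memory[ptr:ptr + length])
        return data.count(b"<")


def call_with_string(guest: FakeGuest, text: str) -> int:
    """Host side: encode, copy into guest memory, then call with (ptr, len)."""
    data = text.encode("utf-8")
    ptr = guest.alloc(len(data))
    guest.memory[ptr:ptr + len(data)] = data
    return guest.count_tags(ptr, len(data))


tag_count = call_with_string(FakeGuest(), "<p>hi</p>")
```

Real runtimes expose memory and exports differently, but some version of this glue has to exist on both sides of the boundary.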
Great talk, thanks for sharing! Would love to hear if Extism allows you to accomplish this, and if not, what prevented you. We're always down to chat on Discord, but an issue on the Ruby SDK repo will do just fine as well - https://github.com/extism/ruby-sdk/
There's a lightning talk version from GitOpsCon (at GitOpsCon/CDCon Vancouver - co hosted with OSS Summit later in the week)
But they are pretty much the same talk, except at CDCon, I hadn't written the Kubernetes operator so that it actually ran the Wasm module yet. At the end of the talk at OSS Summit, I show it running and I'm so glad you enjoy it! I will definitely check out your new Ruby SDK :tada:
Come join our Discord if you get the chance and find me there in the ruby-sdk channel. I've been doing some experiments with Wasm and Ruby (particularly around Rails and higher level application abstractions) and would love to have another Rubyist to bounce ideas off of. Also I had trouble tracking down your email, but if you'd prefer you can reach me at ben at dylibso dot com
Another open source project locking their community discussions away on Discord where it can't be easily discovered or searched.
At least consider mirroring to the web - or at this stage inform people you will one day make posts available on the open web. Acquiring that consent afterwards is painful.
Most forums were awful phpbb nonsense where it was impossible to find the information you wanted. Search always required logging in and the only interface for very long threads was paging through them 10 badly laid out posts at a time.
No thank you.
Of course modern forums are better. Disqus is ok, and D's forum software is arguably the best thing to come out of the D project.
In any case Discord is not a replacement for forums; it's a replacement for IRC.
I recall running into the odd one or two like that - I have vague memories that it was because searching was _slow_ and it didn't take more than a few people using it to bring the site to its knees. Restricting it to logged in users helped, and I think I saw some sites restrict searching to only post bodies for a period too, for similar reasons.
I haven't run into it recently, but I have at least vague memories of it happening in the past.
Just have a browse. It's so fast! Also very clean, not stupidly sparse layout. Proper markdown support. I believe it also acts as a mailing list/newsgroup somehow but I'm not sure on the details of that.
hear ya loud and clear -- and we use multiple forms of communication.. GitHub issues and discussions are actively suggested when we find that a conversation is turning into something that would benefit others and move from Discord.
Discord is really advantageous for us to have real-time conversations with people though.
we also like Discord _for_ the “chit chat” though. getting to know the community that’s growing around your project is one of the most enjoyable aspects of open source IMO!
My point is that people WILL use your Discord as a forum and it WILL accumulate valuable knowledge.
This knowledge should be mirrored to the web so it's discoverable and archivable. And to do that you need consent - you should notify users now even if you don't implement it now.
2. Discord will go away one day and so will all the accumulated knowledge
3. People who aren't directly looking for your project won't find it. It will have no search footprint. You have to actively seek out the Discord to discover that there's activity there.
"Most people" don't already have it. Only a fairly specific handful of demographics.
> "Most people" don't already have it. Only a fairly specific handful of demographics.
This just shows how in-a-bubble people can be. I'm trying to think of a single person outside of my hardcore gamer friends that I know who would have Discord - drawing a blank.
The problem is that information in Discord is locked away and is completely inaccessible.
As an example. An Elixir library I'm using, Ash, had its support forum on Discord. Just recently they moved it to a proper forum. Here's the immediate result [1]:
--- start quote ---
Moving to ElixirForum just benefited me. I googled elixir ash registry to understand what it is and if I still need it (I ended up with it from an example somewhere). The top result was this excellent question and answer
--- end quote ---
A forum is always nicer, and is accessible to more people.
"Just" use WASI is not useful advice. I've been trying to do that for my own purposes for over a year. The learning curve on that (as a Python programmer who wants to use WebAssembly for sandboxing) is practically a vertical wall.
Thanks. I'm not sure how this maps to the original example, though. This is about compiling a standalone program which translates POSIX apis, it seems? How about calling it from Rust with string args etc?
Exactly. That's what Extism is trying to solve. That WASI post doesn't show at all how to use the WASM code from C, or vice versa, because it just compiles the entire C program to WASM which uses the POSIX-based WASI API. If you want an alternative to Extism, you need something like wasmer.io, not just wasmtime (Extism actually uses wasmtime as mentioned in the post).
Right. The point being that you can take your pick of WASI compliant runtimes which have those features. Extism isn't something special compared to any other open source project in this domain, yet it smells of a commercial venture. Count me out.
WASI is great and obviously Extism supports it as a superset of functionality - but you don’t always want to give your guest code access to system resources even in a limited environment.
Extism also offers a bunch of other features that you don’t get with WASI. But use what’s best suited for your needs!
I've been exploring options for this for quite a while now. The furthest I've got was running Python in wasmtime: https://til.simonwillison.net/webassembly/python-in-a-wasm-s... and running Pyodide inside of Deno: https://til.simonwillison.net/deno/pyodide-sandbox