1. Hardware design flaws are not irremediable and will be fixed in time.
2. It's still very useful to run software that is trusted not to be malicious but not trusted to be free of memory bugs.
There's no particular reason to believe this. Neither Intel nor AMD has committed to any form of side-effect-free speculative execution or similar. That'd require a rather large chunk of transistors and die space, and if nobody is footing the bill for it, it's not going to happen.
Preventing side-effects from leaking between processes (and ring levels) entirely at the hardware level is definitely going to happen. Within a process, though? That's not going to happen without multiple major players demanding it. Since nearly all the major CPU consumers are currently happy with process boundaries being the security enforcement zones, there's no particular reason to believe that in-process sandboxing will ever have hardware fixes to prevent Spectre attacks.
Once you have mitigations for colocating trusted and untrusted code in ring 3, doing the same in ring 0 almost certainly isn't a big deal.
In-process sandboxing by all appearances is simply dead.
As for "doing the same in ring 0 almost certainly isn't a big deal" no, very extremely no. If you let code run in ring 0 it has everything ring 0 can do, period. You cannot put restrictions on it, that's what spectre proved. Give code access to a process and it has entire access to that process. Similarly give something ring 0, and it has the entirety of ring 0.
Untrusted code goes in ring 3 in an isolated process. That's the security model of x86, and it's the only model that CPU vendors have any pressure to fix.
Chip vendors have & will fix process boundaries. But nobody is talking about any sort of protection of any kind that would let in-process sandboxing work again. It's just not even on the table at this point.
I do not think there is the degree of distinction between "processes" and "threads" that you think there is. If tools exist to isolate speculation state between processes, those tools can probably also isolate threads in a process.
The first attack isn't relevant to designs that don't use hardware isolation, but the second one absolutely is. If your virtual bytecode (wasm, JVM, Lua, whatever) is allowed access to a portion of memory, and inside the same hardware address space is other memory it shouldn't read (e.g., because there are two software-isolated processes in the same hardware address space), and a supervisor or JIT is guarding its memory accesses with branches, the second attack will let the software-isolated process execute cache timing attacks against the data on the wrong side of the branch.
(I believe the names are more-or-less that Meltdown is the first bug class and Spectre is the second, but the Spectre versions are rather different in characteristics - in particular I believe that Spectre v1 affects software-isolation-only systems and Spectre v2 less so. But the names confuse me.)
Now I'm wondering how browsers presumably already cope with this for JS, or how CloudFlare workers cope with it, or .. etc.
Browsers just resort to process sandboxing entirely. They assume JS can escape its sandbox, but since it can only read contents that were produced from its origin anyway it doesn't really matter.
The CPU speculates past the bounds check and loads values you shouldn't be able to read into the cache. You then try to find out which value was loaded into cache using precise timing, but the timer is imprecise, so you can't tell what the value was that you weren't supposed to be able to read.
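To make that concrete, here's a minimal sketch of the classic bounds-check gadget; the array names and sizes are illustrative, not taken from any real codebase:

    #include <stddef.h>
    #include <stdint.h>

    uint8_t small_array[16];
    uint8_t probe[256 * 4096];   /* one page per possible byte value */
    volatile uint8_t sink;       /* keeps the probe read from being optimized out */

    void victim(size_t untrusted_index) {
        if (untrusted_index < 16) {                        /* the branch the attacker mistrains */
            uint8_t value = small_array[untrusted_index];  /* speculative out-of-bounds load */
            sink = probe[value * 4096];                    /* secret-dependent cache line touched */
        }
    }

The attacker then times loads of probe[i * 4096] for each i; the index that comes back fast reveals the leaked byte -- which is why coarse, jittery timers blunt the attack.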
Xerox Parc workstations, IBM and Unisys mainframes, UCSD Pascal, Oberon, Inferno, Java, .NET, Flash.
Or, for more concrete examples: Garmin apps, watchOS bitcode, DEX on Android.
Dalvik on Android is not a software isolation mechanism. Each app has its own Dalvik VM using traditional UNIX processes and user accounts for isolation, which are in turn powered by hardware isolation. (I'm betting they did this because they knew that JVM software isolation had been a disaster in practice, although in part this also means apps can link native code libraries without changing the security model at all.)
The promise of the JVM failed not because of design, but because of buggy implementations. The drive for features and performance brought risk. And WASM is heading down the same path. WASM can add GC with an unimpeachable proof of design safety, but it would be irrelevant.
The issue isn't design or architecture. The issue is that the motivation for adding GC and other features to WASM is principally for performance. Version 1.0 of WASM will be the last specification that made security paramount. Everything after that will be a managed retreat as more and more features and code are placed outside the sandbox in the endless pursuit of performance.
To reiterate: Java applets weren't insecure because the JVM was insecure. Java applets were insecure because the vast majority of the implementation of the environment--and particularly the most complex aspects--existed outside the confines of the sandbox. The more feature-rich and performant the sandboxed environment, the more code and complexity must necessarily exist outside the sandbox.
ActiveX, Silverlight, and (worst of all) Flash were far more insecure, both in design and implementation. But that's a distinction without a difference; their relative inferiority didn't make Java applets more viable.
WASM will be insecure for all the same reasons: JITs and GCs are tremendously complex beasts, with implementations least amenable to verification methods relative to most other software projects. Rust's borrow checker is worthless for ensuring memory ordering and life cycle invariants of machine objects. DOM implementations have similarly become tremendously complex, yet the focus is on how to expose these implementations and their interfaces in entirely novel and brittle ways, prioritizing performance above all else. The safest alternative, and likely sufficient for 80% of use cases, would be a message-passing interface, after all. Instead, design proposals are focused on finding the thinnest possible abstraction over direct addressing of DOM object references from within the VM. Thus the prioritization of GC.
I don't claim to be an expert in this, but I think there's a coherent reason why Java's design failed: Java attempted to do isolation at the language level (and inside a language-specific bytecode), which is a richer interface. It should be entirely possible to develop a high-performance interface for software fault isolation as long as it's a small enough interface to successfully secure, and my sense is that that's where wasm is going.
One important advantage of wasm over Java, Flash, and Silverlight is that it can learn from their failures.
Message-passing doesn't need to be slow either - just write your messages in a high-performance but securely parseable format like Cap'n Proto. We know how to do such things now; we didn't 20 years ago. The state of the world keeps advancing.
Well said. That happened everywhere from desktop to server to embedded. However, safety-critical side of embedded further supported your point by building real-time, safe implementations of JVM (or subset) designed for certification. There were also companies using Ada to get systematic protections against errors with one, Praxis, doing semi-automated proofs with their SPARK language. A JVM implemented in Ada, SPARK, and (where necessary) C/C++ might have been much safer.
"JITs and GCs are tremendously complex beasts, with implementations least amenable to verification methods relative to most other software projects. "
There are actually verified JITs and GCs they can draw on. I doubt they will, though. History shows they'll go with a non-verified design followed by penetrate and patch.
"Rust's borrow checker is worthless for ensuring memory ordering and life cycle invariants of machine objects."
I've long pushed Abstract State Machines (or languages based on them) to do this kind of stuff better. The work on memory models can be ported to something like Asmeta. The algorithms can be checked against them by solvers. Then, equivalent software and/or hardware comes out the code generator with more analysis/tests in case it messed up. I got excited seeing Galois was using ASMs for hardware/software verification recently. They'd be great for checking security of interpreters against software- and hardware-level issues.
"The most safe alternative and likely sufficient for 80% of use cases would be a message passing interface, afterall."
I haven't updated myself yet on advances in typing for that stuff. Pony's method for type checking might help here since it uses a capability-secure, actor model. Wallaroo also uses it for a high-performance database. So, it's not a slouch either.
ART on Android 5 and 6 is as much a runtime as any other programming language with an AOT compilation model.
And as of Android 7, there are multiple execution modes: a hand-optimized interpreter written in assembly, a JIT compiler with PGO feedback, and an AOT compiler that takes the JIT PGO data to generate a proper executable while the device is charging and idle.
Traditional native code is also pretty much clamped down in recent versions of Android via SELinux, seccomp, and a whitelist of shared objects.
Google doesn't want you to do more than just implementing native methods, high performance 3D graphics, real time audio or importing "legacy" libraries.
I don't consider WASM that sound because it still allows for internal data corruption of modules written in unsafe languages, instead of supporting proper memory tagging like SPARC and the upcoming ARM architecture.
I think the parent comment's point about Android application isolation being enforced by hardware rather than software still stands.
Going through your list:
Xerox PARC workstations: these didn't really run Smalltalk in microcode, but ran an interpreter written in Data General Nova asm, with the microcode "emulator task" dispatching and executing Nova machine code. There were a few new instructions added for Smalltalk, but that was stuff like bit-blit instructions. All that being said, even if the Smalltalk VM interpreter was pushed down into microcode, it'd still be (a component of) a VM.
The big iron JITing environments are quintessential hardware/software codesigned VMs.
For Oberon the language you might have a point, but the secure loader/verifier/compiler in its only implementation is absolutely a VM.
Java with ART is absolutely a VM, unless you're going to make the argument that HotSpot isn't a VM.
.Net UWP is absolutely a VM too. Yes, it's partially compiled before it reaches end users, but it also includes the entirety of .Net Core linked in for cases where you're dynamically adding new code to your running process.
Essentially, I think you've come up with some weird definition of VM that doesn't match industry or academia, and are then berating people who don't follow that non-standard definition.
System/360 had (and used) hardware privilege levels.
Seems like few of your examples involve not using hardware for process isolation. Java: no, except a few research OSes. Flash: no.
IBM System/38, which evolved into AS/400 and IBM i, is one of the architectures described in this book:
Far as language-based security, the first mainframe for businesses used a high-level language combined with a CPU that dynamically checked the programs. Still sold by Unisys but I doubt hardware checks still exist.
The Flex Machine implemented capabilities and trusted procedures in the microcode:
ASOS supported a mix of methods where each app was written in Ada for its safety features, but an MLS kernel modeled in Gypsy separated the various security levels:
SAFE explored tagging at CPU level which got commercialized as CoreGuard or Inherently Secure Processor:
In embedded, there are Java processors that run bytecode natively with some support for separation. They blur the line between VMs and native apps:
AS/400 though does use hardware heavily in its isolation model, going so far as to currently have a custom PowerPC variant that adds tagged memory.
And yes, I call them mainframes, because that is how I always heard people referring to them during my summer job back in the day, so the name stuck with me even if it isn't correct.
Once upon a time switching page tables was slow, but now we have features like PCID that allow preserving buffers.
Soon, if not already, the principal cost of context switching will be the necessity to flush prediction and data buffers. In-kernel solutions like Wasmjit must incur the same costs. Quite possibly they may turn out to be slower overall: 1) they won't be able to take advantage of the same hardware-optimized privilege management facilities (existing and future ones--imagine tagged prediction buffers much like PCID), and 2) they still incur the extra runtime overhead of running in a VM which, JIT-optimized or not, eats into limited resources like those prediction and data buffers that have become so critical to maximizing performance.
Granted, if it's going to work well at all then Nginx seems like a good bet, especially because of I/O. But there are many other solutions to that problem. Obsession with DPDK may be waning, but zero-copy AIO is still a thing and there are more ergonomic userspace alternatives (existing and in the pipeline) that let you leverage the in-kernel network stack without having to incur copying costs. And then there are solutions like QUIC that redefine the problem and which should work extremely well with existing zero-copy interfaces.
CPUs are incredibly complex precisely because so much of the security heavy-lifting once performed in the OS is being accomplished in the CPU or dedicated controllers. And these newer optimizations were designed to be integrated within the context of the traditional userspace/kernel split.
Wasmjit looks like an extremely cool project and I don't doubt its utility. There's plenty of room for alternative approaches, I just don't think the value-add is all that obvious. Probably less to do with performance and more to do with providing a clear, stable, well-supported environment for solving (and subsequently maintaining!) difficult integration problems.
 I just want to reiterate that by saying the value-add isn't obvious I'm not implying anything about the potential magnitude of that value-add. I've been around long enough to understand that most pain points are invisible and just because I can't see them or people can't articulate them doesn't mean they don't exist or that the potential for serious disruption isn't there.
Netmap - DPDK-like packet munging performance but with interfaces and semantics that behave more like traditional APIs. Signaling occurs through a pollable descriptor, meaning you can handle synchronization and work queueing problems much more like you would normally.
vmsplice - IIRC it recently became possible to reliably detect when a page loan can be reclaimed, which is (or hopefully was) the biggest impediment to convenient use of vmsplice.
peeking - Until recently Linux poll/epoll didn't obey SO_RCVLOWAT, which made it problematic to peek at data before using splice() to shuttle data or dequeue a connection request (see the sketch after this list). I have a strong suspicion that before this fix many apps like SSL sniffers simply burnt CPU cycles without anybody realizing. Though in the Cloud age we seem much more tolerant of spurious, unreproducible latency and connectivity "glitches".
AIO - There's always activity around Linux's AIO interfaces. I don't keep track but there may have been a ring-buffer patch merged which allows dequeueing newly arrived events or data without having to poll for readiness first.
Device Passthru - CPU VM monitor extensions make it easier to work with devices directly. Not quite the same thing as traditional userspace/kernel interfaces, but it seems like people are increasingly running what otherwise look like (and implemented like) regular userpace apps within VM monitor frameworks. Like with Netmap all you really need is a singular notification primitive (possibly synthesized yourself) that allows you apply whatever model of concurrency you want--asynchronous, synchronous, or some combination--and in a way that is composable and friendly to regular userspace frameworks. VM monitor APIs and device pass thru permit arranging the burdens between userspace/VM and the kernel more optimally.
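Here's the sketch promised above for the SO_RCVLOWAT point -- a hedged example that assumes a kernel recent enough for poll/epoll to honor the option; the helper name and threshold are illustrative:

    #include <sys/epoll.h>
    #include <sys/socket.h>

    /* Only report the socket readable once at least `lowat` bytes are queued,
     * so a peek-then-splice pass doesn't wake up and spin on partial data. */
    int watch_with_lowat(int epfd, int fd, int lowat) {
        if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat)) < 0)
            return -1;

        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
        return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
        /* epoll_wait() now signals EPOLLIN only when >= lowat bytes are
         * available, after which recv(..., MSG_PEEK) or splice() can proceed
         * without a short read. */
    }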
You're going to have to show me what CPU this is true on. A syscall is nowhere near as fast as a function call.
The entry and exit cost of a syscall is ~150 cycles. (Source: many Google hits--blogs, papers--show people reciting 150 cycles exactly, so I assume there's a singular, primary source for this. Maybe I'll track down the paper later.)
I'd say that's comparable. Many syscalls take much longer, but that's just because syscalls tend to be very abstract interfaces where each call performs costly operations or bookkeeping, especially on shared data structures requiring costly memory barriers. That doesn't mean the syscall interface itself is expensive. Microkernel skeptics stopped arguing syscall overhead a long time ago, and proponents are no longer defensive about it.
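For anyone who wants to sanity-check the figure, here's a rough sketch (assumes x86-64 Linux; getpid is nearly a no-op in the kernel, so the loop mostly measures the entry/exit transition, and results will vary with CPU and mitigation settings):

    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <x86intrin.h>

    int main(void) {
        const long iters = 1000000;
        uint64_t start = __rdtsc();
        for (long i = 0; i < iters; i++)
            syscall(SYS_getpid);   /* raw syscall, bypassing any libc caching */
        uint64_t elapsed = __rdtsc() - start;
        printf("~%llu cycles per getpid syscall\n",
               (unsigned long long)(elapsed / iters));
        return 0;
    }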
Lwan's actually small enough that mathematical verification for correctness against a spec is feasible, even though costly. Unlike Lwan, I could never have any hope of proving the correctness of Nginx. Even its safety would be difficult just because of all the potential code interactions on malicious input. Leak-free for secrets it contains? Forget about it. Best bet is to shove that thing either in a partition on a separation kernel/VMM or on a dedicated machine. The automated tooling for large programs does get better every year, though. One can use whatever tools are compatible with Nginx. And still shove that humongous server into a deprivileged partition just in case. ;)
So I don’t see the point.
: https://www.destroyallsoftware.com/talks/the-birth-and-death... (at 18:46)
Yeah, I know the JVM supports several languages these days but most require non-superficial similarities to Java (garbage collected, etc.)
There were plenty of well-funded efforts to have "write once, run everywhere" in the past that were just VMs and formats (ANDF, etc).
In practice, a lot of things have changed since Java that have made this kind of approach feasible. As a simple example: good compiler infrastructure to build on top of is much more available than it was then. These days you pretty much just have to write a frontend.
Even though GCC existed then, it was still compiling a statement at a time!
1. it already comes with platforms via a browser or Node.js
2. like you said, the tools are here now (really LLVM made most of this possible)
Someone may attempt to add a batteries-included system that uses WASM with a bunch of platform-abstraction libraries, but WASM itself does not provide that. And isn't going to provide it.
You can make a portable library with WASM, assuming you have zero dependencies on anything, but that's about it.
It should be noted that doing this requires non-superficial similarities to *nix/POSIX (signals, files, threads, etc). It's not like you could run Nginx on this without its POSIX impl or in the browser w/out Emscripten's POSIX impl or on any other WASM runtime w/out a POSIX impl.
You can (obviously) implement any language without garbage collection semantics using garbage collection, so this requirement is false.
See for example languages like C and C++ running on the JVM.
0 - http://www.graalvm.org/docs/getting-started/#running-llvm-in...
1 - https://github.com/cretz/asmble
I miss the days of pronounceable acronyms
I suppose calling it METAL would have been too on-the-nose.
The talk is great, but I'd suggest that's the reason for the downvotes.
Does it? It seems that the talk has two main points:
2. Ring 0 JIT can be 4% faster than normal binaries.
WASM is primarily a target for other languages, and qualifies as a language that can theoretically be JITted 4% faster than native code can be run.
The execution inside the kernel is related, but nobody replies to a Lua-in-the-kernel post with a link to the talk.
But they couldn't be more different technically, and if wasm does indeed become the lingua franca of future computing it will be much more boring than the craziness of js doing the same.
The talk was great because it was about an insane yet plausible future. We now have a boring and probable future.
It will just be yet another VM platform.
I'm Syrus, from the Wasmer team.
We have been working on something similar, but with a special focus on maintainability and with bigger goals in mind:
Here is the article about our journey on Running Nginx (which funnily enough we actually accomplished just before wasmjit):
> we actually accomplished just before wasmjit
> Wasmer is the first native WebAssembly runtime [...]
Whoa there. I like both projects and respect competition as much as the next guy, but maintainability is subjective and being first is of little importance.
In the article I've linked there is a better analysis of why:
1. in wasmjit, the machine instructions are hardcoded into the runtime (this is like creating your own LLVM, by hand... and only available for x86)
2. it doesn't have a single test
I was talking about my own experience here, because I tried to contribute to wasmjit before creating Wasmer... and it was quite challenging!
It might be useful to check how many people have interacted with the code in each of these projects! ;)
Also note how long it took each of these projects to accomplish the same thing: Wasmer (<2 months), wasmjit (6 months).
But I agree that if you think your solution is better, there's really nowhere better to put it out there than in front of eyes that are looking at something similar.
Personally, I would challenge the piggybacker to show me something.
Talk is cheap.
That was not the intention, but rather to showcase and make sure everyone understands the tradeoffs of each of these projects :)
Because of that, we prefer to leverage existing open-source projects (for example, for parsing or for the IR) that are already working, rather than create everything from scratch.
Also, your two projects must be collaborating somewhat, because it really looks like the nginx.wasm file you're distributing is the one wasmjit compiled! Correct me if I'm wrong, but I don't think there's any other way they'd end up being byte-for-byte identical.
Running in Ring 0 might open bigger risks regarding security, and we want to make sure everything is under control (with external security audits) before approaching that space.
Here's a more detailed answer about its risks: https://news.ycombinator.com/item?id=18587353
Does anyone know how to link in native OpenGL system libraries? I'm looking to link to native graphics libraries so that I don't have to pass through Emscripten's OpenGL -> WebGL emulation layer. I'd like to drop the browser render layer altogether and just have GLFW or SDL take care of rendering native client windows.