The idea of a multi-user time-sharing virtual memory operating system is seriously outdated.
Today, a server OS could be based on a one-user, multi-computer abstraction, with multiple gigabytes of memory, solid-state storage, and a reliable GigE+ network as assumed essentials.
Entire sub-systems of the current Linux kernel can be omitted.
Most of the VFS layer, with so many optimizations for spinning rust. Right out.
Paging, swapping, shared objects, address randomization and PIC. Right out.
User access controls, file access controls, network access controls / firewalls. Right out.
Of the 50,000 device drivers in the Linux kernel, probably 50 deserve to be supported.
For a workstation with graphics / GPU and console support, plus support for every pluggable external device, maybe some sort of legacy emulation layer would work. Basically run a Linux VM for backwards compatibility with those 50,000 devices. It would be less work than implementing those drivers...
Remember, Linus was once a 19-year-old with a dream and a small repo of prototype kernel demo code, 25 years ago.
FYI, there have been 6 code-execution CVEs in V8 in the last year alone. Fortunately, Chrome has great sandboxing and mitigation mechanisms to limit the impact of these, the very mechanisms the parent explicitly recommends doing away with.
The point being, when everything is ring 0, you bet on both the hardware and software being perfect. And if there's anything that years of vulnerabilities in cryptographic software has taught me, it's that perfection is REALLY DAMN HARD.
You already aren't running 'em! Also, that's the block device layer, not the VFS layer. The VFS is the bit between the syscall interface and the concrete FS (which then in turn works with the block layer, if you are using a block device).
Don't go signing the death warrant for something you clearly don't understand.
We then layer back in all that other stuff that was chopped away, bringing us back to a system that has the features we want, and, most importantly, that supports the wide range of applications we want to deploy.
The big problem with stripping back to the absolute bare essentials is that you optimize towards a local maximum and severely limit your flexibility. This is certainly the way you want to go if you have deep pockets coupled with a need for bare-bones speed that can't be sharded in an effective manner. But that's not the majority of workloads.
If you want to squeeze more performance per watt than you can get from a modern server, the only way forward is to code your application in Verilog and to run it on an FPGA or ASIC.
This is just another step in that direction, where the OS is the one running that code.
The good news is that WebAssembly was created with sandboxing in mind. I hope they have learned some lessons (https://en.wikipedia.org/wiki/Java_security).
I'm staring at this sentence hoping the author is being supremely sarcastic...
WebAssembly isn’t assembly. It can only refer to memory offsets within its allocated block (so it always does *(baseAddr + offset)), so to generate assembly for it, you already need to add checks. One way to prevent some spectre-like attacks is to mask the offset after those checks. Another way is to use virtual memory to keep program spaces very far apart and only use int32 displacements in memory loads (since they’re always relative to the base memory address).
This is the mitigation we implemented in V8, but it doesn't remove all possible side channels.
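To make the masking idea concrete, here's a minimal sketch in Rust (not V8's actual generated code; a real engine emits this pattern in machine code, and MEM_SIZE here is an assumed power-of-two linear memory size):

    // Bounds-check-then-mask, the pattern described above.
    const MEM_SIZE: usize = 1 << 16; // hypothetical 64 KiB linear memory

    fn load_u8(memory: &[u8; MEM_SIZE], offset: usize) -> Option<u8> {
        if offset >= MEM_SIZE {
            return None; // the architectural bounds check
        }
        // Even if the branch above is mispredicted, the mask clamps the
        // index into [0, MEM_SIZE), so a speculative load cannot reach
        // outside the sandboxed block. Requires a power-of-two size.
        let masked = offset & (MEM_SIZE - 1);
        Some(memory[masked])
    }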
This isn’t an unproven space. Singularity proved you can use a single global address space (given 64-bit) and software to isolate processes, something MS Research called Software Isolated Processes.
This requires a verifiable bytecode/VM system so the kernel can verify the instruction stream at load time. In a way, WebAssembly is even easier to verify than C#.
It’s obviously a research toy but that isn’t a bad thing.
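To make the load-time verification idea concrete, here's a toy sketch in Rust (the Instr encoding is hypothetical, not Singularity's or WebAssembly's actual format): the kernel statically rejects any instruction stream whose control flow could escape the program, before it ever runs.

    // Toy load-time verifier: check every branch target statically, so the
    // kernel never has to trust the code's control flow after admission.
    enum Instr {
        Nop,
        Branch(usize), // jump to an instruction index
        Return,
    }

    fn verify(program: &[Instr]) -> Result<(), String> {
        for (i, instr) in program.iter().enumerate() {
            if let Instr::Branch(target) = instr {
                if *target >= program.len() {
                    return Err(format!(
                        "instruction {}: branch target {} out of range",
                        i, target
                    ));
                }
            }
        }
        Ok(()) // only verified streams run in the shared address space
    }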
I don't think that's the case at all. To my understanding, software has always been capable of doing this just fine, but at a performance cost. So the mechanism was ported to hardware. Optimized implementations have proven to contain security issues. It's not that the security problems are unsolvable, it's that you can't just patch a broken CPU in the wild.
So we fall back to software implementations again, resulting in an overall performance decrease, but with more correct security.
So much so that Chrome is giving up on software-isolation entirely and is rolling out even more process-based sandboxing https://www.chromium.org/Home/chromium-security/site-isolati...
On Arm, the conditional speculation barrier (CSDB) couldn’t be more lightweight.
Threads and plugins have proven not to be the best ideas regarding implementing safe systems.
They actually started with full process isolation, then realized that it had an outsized impact on performance, and now they're going back because the consequences for security are now known. Spectre-related patches have been going into Chromium and V8 for the last few months, and they just keep coming. It's hard to overstate how much work must be done to have a decent probability of not being exposed to the known variants of these bugs in a software-isolated system.
The hardware based protections are the only boundary where you can "fix" speculative execution without slowing down your system by 100x.
The hardware boundaries effectively act as hints that say to the hardware "slowing down things by 100x here won't kill performance too badly -- we should turn off speculation for safety". Without those hints, everything needs to get that much slower.
The problem is that with pure software isolation, as being discussed here, every access is observable within the "same program", so every speedup would need to be rolled back.
With hardware boundaries, you know which speculations to roll back (or just avoid doing), so you can put a fix in silicon.
Yes, you could, but the noise floor won’t be great.
As Joe Duffy complained about Midori, the problem is getting management to be willing to push it no matter what.
Eventually Microsoft took some of those ideas into Windows 8.x and 10, but they are still implemented in the context of a COM-based world, and thus only partially.
This seems like a pretty strong claim. I hope that it's true, but I'm not going to be running WASM modules in ring 0 any time soon.
What probably has the most bearing on the securability of a platform is its simplicity, which is on WebAssembly's side.
It will have its fair share of "typeof null" style bugs to come.
Who knows what was in their heads, but stack-level attacks are as easy as exploiting unsafe type casting on anything that amounts to a stack pointer.
My guess as to why they chose to do it that way is simply that there is more literature available for mid-tier coders in the style of "VMs for Dummies", and they wanted to always have the option of not doing extensive research on every small matter and just copying the JVM's behaviour.
The security problems of Java are not related to it being a stack-based VM at all. The problems are that the API lets applets do things they shouldn’t be able to, and that serialisation allows arbitrary code execution.
You're looking at at least a couple of orders of magnitude more work to get to the same level of correctness for a WebAssembly runtime.
I'm not sure guaranteeing that a program halts really matters, either; really what you want is the ability to limit the amount of time a filter can run, which is simple to do directly. (In fact, it's simpler to add a timeout than to perform control flow analysis.)
I mean, if you're doing something so crazy as pushing user controlled code into interrupt context, you care about performance. And the BPF scheme is within spitting distance of natively compiled code.
> I'm not sure guaranteeing that a program halts really matters, either; really what you want is the ability to limit the amount of time a filter can run, which is simple to do directly. (In fact, it's simpler to add a timeout than to perform control flow analysis.)
Right now, I don't think that there's a way for a BPF filter to 'fail' once it's been verified. It's sort of like a graphics shader in that regard.
And timeouts can't be implemented with a timer, since the filters already run in interrupt context, and manual bookkeeping comes with a perf cost (at least a lost register, and some basic-block epilogue code). And that's in addition to the "well, the filter failed, now how do we handle that" question hinted at above.
That's the kicker. BPF is simpler, _and_ in spitting distance of native code perf. And if you're doing something crazy like injecting user code into interrupts, you care about perf.
That being said, you're totally right that it's possible to get parity, just with orders of magnitude more work.
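For what it's worth, the "manual bookkeeping" alternative mentioned above looks roughly like this fuel-counting sketch in Rust (Op and run_filter are illustrative names, not anything from BPF); the counter is exactly the lost register and per-block epilogue code being described:

    // Fuel-based timeout bookkeeping for an interpreter loop. A JIT would
    // emit the decrement-and-check in each basic block's epilogue instead.
    enum Op {
        Add(i64),
        Jump(usize),
        Halt,
    }

    fn run_filter(ops: &[Op], fuel_limit: u64) -> Result<i64, &'static str> {
        let mut acc = 0i64;
        let mut pc = 0usize;
        let mut fuel = fuel_limit; // occupies a register for the whole run

        while let Some(op) = ops.get(pc) {
            if fuel == 0 {
                // The open question from above: the filter "failed",
                // and the caller now has to decide what that means.
                return Err("filter exceeded its fuel budget");
            }
            fuel -= 1;
            match op {
                Op::Add(n) => {
                    acc = acc.wrapping_add(*n);
                    pc += 1;
                }
                Op::Jump(target) => pc = *target,
                Op::Halt => return Ok(acc),
            }
        }
        Ok(acc)
    }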
That's it. I hope he gets the support he needs.
That's what you say when someone checks into rehab, not releases a software project :)
This throws away the very important security property of defense in depth. A system design should include interlocking levels of security, so even if there is a vulnerability in one place, extra work may be required to exploit it.
I suspect you are being downvoted for being overly emphatic. I can certainly think of scenarios where having this extra security is more costly than helpful.
An interesting point of note is that the Mill architecture has been designed to have much cheaper hardware protection than other architectures.
This started back in 2011 with https://lwn.net/Articles/437981/
It's since been extended quite a bit from even that, and can be used for a variety of things, including dynamic tracing (BCC is an excellent frontend here), processing beyond just filtering (XDP), and more.
It might be less complex than the WASM VM, but it's quite a bit beyond just a packet filter these days.
You are actually making my point.
Re: your followup about defense in depth, this is a common and frankly boring fallback argument. At some point computers had much less reliable internals, and, for example, even the result of strlen() could vary across runs. Should we also perpetually account for the presence of unreliable registers or memory?
Which we still don't. Rowhammer, Spectre/Meltdown, etc. proved that even if code doesn't violate any sandbox constraints, that doesn't mean it didn't violate everything the sandbox was attempting to protect.
Hardware isolation is still very important and very necessary, now more than ever.
And a few of those so far have no known software-protection-domain fix, relying instead on hardware domains (e.g. Spectre, which is why Chrome is pushing site isolation hard: they can't fix the software protection and are relying on hardware ones instead).
My position about defense in depth is not a fallback argument. It was my original point.
Is your point that because eBPF exists in a different OS that there is no security impact to using Nebulet?
Access to syscalls would not be unusual in any case. This is the normal attack surface for an OS. The new attack surface is the WebAssembly compiler and checker.
To be honest though, given the processor vulnerabilities that have come about, I don't know if I really feel so bad about software protections like this anymore. Nothing is a panacea, even magical processor protection rings.
It may be memory safe, but honestly, memory-safety is no great trick anymore... pretty much everything except the languages currently used to implement kernels is memory safe. Memory safe is still not "generally safe".
Personally, I say more power to this high schooler. It's just a research project, and it'll be interesting to see where it goes. Nobody is suggesting you replace Linux/OSX/Windows with this thing anytime soon.
You will still have native applications for a lot of situations where performance is critical enough to justify paying the higher cost of creating, testing, and distributing for different OSes. That is a skill that should never die, like creating the hardware itself.
HTML5 is a good attempt at this, but it has problems with ill-defined behaviours and the fact that not all applications, e.g. games, fit the hypertext approach.
WebAssembly doesn't have this history, so it can be used in places where Java has been discarded for historical reasons.
congrats to that kid, he's got a damn good head on his shoulders.
Getting married and having children will make you reassess your statements. Heck, just getting married will. Of course, I am assuming that you are single.
Ideals are great to strive for BTW, just don't lie to yourself about already having gotten there.
It might be interesting to try running the WebAssembly in a VM at ring 0, where ring -1 could do some security routines and checks on it, but that might be beyond the scope and intent of this project.
Face it. Linux had its time; its conception of the world reflects a developer's wet dream from the mainframe era, with many users, few resources, and the soon-to-be-extinct dictator, err, universally hated system administrator.
Who needs elaborate permissions when you're the only user on the system? Which user still shares data without sending it, manipulating permissions? Who likes managing installs and incompatible dependencies? Hell, just copy the data, that's what we all do. The file-system dedups it anyways if necessary.
The list of unused and unwanted features goes on, but developers just keep reincarnating this same old fantasy of an anachronistic OS. It seems to me nobody but Linus can make them see again...
Linus, there's so much pain. What's the use case these days? Do you see Linux going over its own horizon, how? I suspect your answer involves a server OS.
PS: this idea is of course also reminiscent of Microsoft's Singularity OS.
You may be right that the traditional permissions model is outdated, but it represents only a very small part of what Linux does, and indeed it seems totally unrelated to this project, so I'm not sure what point you're making.
To use the Android example again, it's proof that you can layer a very different permissions model (single-human-user app sandboxing) on top of the Linux kernel.
So how is it "very different" (in implementation, not surface appearance)?
I'd just like to interject for a moment. What you’re referring to as Linux, is in fact, GNU/Linux, or as I’ve recently taken to calling it, GNU plus Linux. Linux is not an operating system unto itself, but rather another free component of a fully functioning GNU system made useful by the GNU corelibs, shell utilities and vital system components comprising a full OS as defined by POSIX.
Many computer users run a modified version of the GNU system every day, without realizing it. Through a peculiar turn of events, the version of GNU which is widely used today is often called “Linux”, and many of its users are not aware that it is basically the GNU system, developed by the GNU Project. There really is a Linux, and these people are using it, but it is just a part of the system they use.
Linux is the kernel: the program in the system that allocates the machine’s resources to the other programs that you run. The kernel is an essential part of an operating system, but useless by itself; it can only function in the context of a complete operating system. Linux is normally used in combination with the GNU operating system: the whole system is basically GNU with Linux added, or GNU/Linux. All the so-called “Linux” distributions are really distributions of GNU/Linux.