Whatever the ultimate "isolation unit" ends up being, it needs to be recursive. That means being able to run processes within your process (essentially as libraries), create users within your user account (for true first-class multi-tenancy), or VMs within your VM (without compounding overhead).
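To make "recursive" concrete, here is a toy sketch. The `Domain` type and the rights names are hypothetical, made up for illustration, and Python does nothing to actually enforce this boundary. The only point is the shape: the operation that creates an isolation unit is available inside every unit, and a child can never hold more than its parent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """Hypothetical isolation unit: a name plus the set of rights it holds."""
    name: str
    rights: frozenset

    def spawn(self, name, rights):
        """Create a child domain holding a subset of this domain's rights."""
        rights = frozenset(rights)
        if not rights <= self.rights:
            extra = sorted(rights - self.rights)
            raise PermissionError(f"{self.name} cannot grant {extra}")
        return Domain(f"{self.name}/{name}", rights)


# The same spawn() call works at every depth; nothing is special about the root.
root = Domain("root", frozenset({"net", "disk", "gpu"}))
tenant = root.spawn("tenant", {"net", "disk"})
plugin = tenant.spawn("plugin", {"net"})
```

Asking `plugin` to spawn a child with `"disk"` raises `PermissionError`, since `plugin` never held that right itself.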
It turns out that this author also wrote "Docker: Not Even a Linker" which was also deeply insightful about unconscious/accidental architecture decisions. I'm impressed by his insight and disturbed that most people don't seem to understand it.
>processes, users, containers and virtualization are all essentially the same thing.
...and so are modules/objects/whatever your language of choice calls them. Abstraction boundaries, to be precise. Abstraction, security, and type-safety are all very closely related.
These language-specific mechanisms for isolation are recursive - trivially so. And language runtimes and compilers make security cheap - so cheap that it's ubiquitous.
Processes, users, containers and virtualization all rely on an operating system for security, which in turn relies on hardware features. Specifically, virtual memory and privileged instructions. And those hardware features are slow, and more importantly: they're not recursive!
But hardware-based isolation does have one key advantage over language-based isolation: It works for arbitrary languages, and indeed, arbitrary code.
I completely agree that recursive isolation is necessary. We need to figure out rich enough hardware primitives and get them implemented; or we need to migrate everything to a single language runtime, like the JVM.
I think moving isolation out of hardware is really important (both to make it recursive and portable). NaCl is an interesting step in that direction. If you could use something like it to protect kernelspace (instead of ring 0), syscalls could be much, much faster.
There's another problem with language-based isolation: it makes your language/compiler/runtime security-critical. Conversely, NaCl has a tiny, formally proven verifier that works regardless of how the code was actually generated, which seems like a much saner approach.
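A rough sketch of the "verify the output, not the compiler" idea, on a made-up instruction set. This is not NaCl's actual rule set or encoding. The toy rule is that every `store` must be directly preceded by a `mask` of its address register, and no jump may land on a `store` (which would skip the mask).

```python
# Toy instruction set (hypothetical, for illustration only):
#   ("mask", reg)        force reg into the sandbox's address range
#   ("store", reg, src)  write src to the address held in reg
#   ("add", reg, imm)    plain arithmetic
#   ("jmp", target)      jump to the instruction at index `target`

def verify(program):
    """Return True only if every store is guarded by a mask that cannot be skipped."""
    for i, ins in enumerate(program):
        op = ins[0]
        if op == "store":
            # The instruction directly before must mask the same register.
            if i == 0 or program[i - 1] != ("mask", ins[1]):
                return False
        elif op == "jmp":
            target = ins[1]
            if not (0 <= target < len(program)):
                return False
            # Jumping straight to a store would bypass its mask.
            if program[target][0] == "store":
                return False
        elif op not in ("mask", "add"):
            return False  # unknown instructions are rejected outright
    return True
```

The verifier never asks which compiler or language produced `program`. It only inspects the instructions it is handed, which is why the trusted part can stay small.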
I'll also say that I don't think it's reasonable to expect every object/module/whatever within a complex program to be fully isolated (in mainstream languages at least). There's no need for it, and it will have too much overhead (in a world where objects in many languages already have too much overhead). Better to start relatively coarse-grained (today the state of the art is basically QubesOS), and gradually improve.
It's actually partly inspired by how old security kernels work mixed with SFI. The first secure kernels used a combination of rings, segments, tiny stuff in kernel space, limited manipulation of pointers, and a ton of verification. Here are the original ones:
A Burroughs guy who worked with Schell et al on GEMSOS and other projects was the Intel guy who added the hardware isolation mechanisms. They were originally uninterested in that. Imagine the world if we were stuck on legacy code doing the tricks no isolation allows. Glad it didn't happen. :)
Eventually, that crowd went with separation kernels to run the VMs and such that the market was demanding. They run security-critical components directly on the tiny kernel.
The SFI people continued doing their thing. The brighter ones realized it wasn't working. They started trying to make compiler or hardware assisted safety checking cost less with clever designs. One, like NaCl and older kernels, used segments to augment SFI. Others started looking at data flow more. So, here's some good work from that crowd:
So, have fun with those. :)
...this is extremely relevant to what you're saying. A talk worth watching.
They mark the point at which I stopped worrying about UNIX deployments.
An application server has all the features I care about from a container, including fine-grained control over which APIs are accessible to the hosted applications.
It doesn't matter if the application server is running on the OS, a hypervisor, a container, or even bare metal.
Back in 2011 we were already using AWS Beanstalk for production deployments.
Also, OS/400 is like that: user space is bytecode-based. For writing kernel-space native code or privileged binaries, you need the appropriately named Metal C compiler.
The latter runs FreeBSD on an FPGA. There are others that use crypto to do similar things with sealed RAM. What we need isn't tech to be invented so much as solutions to be funded and/or bought. Neither most suppliers nor the demand side want to make the sacrifices necessary to get the ball rolling. Most prototypes are done at a loss by CompSci people. There are a few niche companies doing it commercially. High-assurance security w/ hardware additions is popular in the smartcard market, for example. Rockwell-Collins does it in defense with the AAMP7G processor, but you can bet on really low volume in orders. Price goes up as a result.
If we had a way to implement a capability-secure runtime on Linux on Intel CPUs, in a way that improved performance rather than making it worse, and was straightforwardly usable with existing programs, then we could make a ton of progress and easily be commercially successful.
But that is technologically difficult. Maybe we can still figure it out, though. Or maybe we just need to figure out the right trade-offs to get something viable out the door and widely used.
This part is not realistic if it's apples to apples. Performance will always drop because the high performance came specifically by doing unsafe things that enable attackers. Adding checks or restrictions slows it down. The question is whether it can be done without slowing things down too much. I'm hopeful about this given many people are comfortably using their PCs and the net on 8-year-old PCs w/ Core 2 Duos that still run pretty well. That was 65nm tech. Matching it or at least a smartphone is what I'd aim at for a first run.
No, this is not true. Strong static typing and other compile-time information can (at least theoretically) allow improving both security and performance.
For example, look at single address space operating systems. Switching between security domains is cheaper when you don't have to switch address spaces. That's an improvement to security that allows increased efficiency.
Every time I read about it I wonder how things would have turned out, if they managed to do a proper job with the CPU.
I've come to the conclusion that the only realistic approach to security is to begin a slow, methodical replacement of every line of C code in the Linux kernel with Rust. No VC is going to pay for that, so some other funding mechanism will be required.
Hence my schizophrenic posts regarding Go: although I dislike some of the design decisions, every userspace application written in Go is one application less written in C. And if bare-metal approaches like GERT take off, even better.
 - https://github.com/ycoroneos/G.E.R.T
In an ethereal sense, every Turing-complete machine is recursive, because by definition you can implement another Turing-complete VM on it. CPUs with kernel/user separation took it further by allowing vanilla instructions to run on bare metal within a virtual world defined by the kernel. Modern CPU virtualisation features extend this to the control instructions that an OS would use.
What else can you ask for?
One issue with being "recursive" is efficiency. Ideally, running 500 levels deep would be just as fast as running at the top level. In language-based systems this can be achieved with inlining and optimization, but it's difficult in hardware, where there is much less semantic information available about the running code.
If port 22 is not privileged, then what is to prevent my daemon from listening on that port and collecting all the credentials of other users trying to log into the machine? Nothing. This is why users don't get to bind to privileged ports -- it's why privileged ports exist. The workaround is that every user gets their own ssh daemon under their control and for every user to request that their ssh daemon handle their own login by specifying their own virtual network address: firstname.lastname@example.org and email@example.com -- instead of the current solution of using shared hosts with system services: alice@sharedhost and bob@sharedhost
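For anyone who wants to poke at the rule being argued about, a small probe. It assumes the traditional Unix boundary of 1024; on Linux the actual threshold is a tunable (`net.ipv4.ip_unprivileged_port_start`), so results depend on the machine. The helper names are my own.

```python
import socket

PRIVILEGED_PORT_LIMIT = 1024  # traditional boundary; assumed, may be tuned on a given host


def is_privileged(port):
    """True if the port falls in the traditionally root-only range."""
    return 0 < port < PRIVILEGED_PORT_LIMIT


def can_bind(port, host="127.0.0.1"):
    """Try to bind a TCP socket and report whether the OS allowed it.

    Note: False can also mean the port is already in use, not only
    that permission was denied.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind((host, port))
        return True
    except OSError:  # PermissionError (EACCES) is a subclass of OSError
        return False
```

Run as an ordinary user on a default configuration, `can_bind(22)` would be expected to fail while `can_bind(8022)` succeeds, which is exactly the asymmetry the comment above defends.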
What you cannot do is have system services (a shared host) with user control over daemons that fulfill those services. It has to be system control over shared services and user control over user services.
But every user having their own ssh daemon and their own hostname/IP is certainly looking a lot like the virtualization/containerization solution, no? The opposite of the virtualization option is not "get rid of privileged ports", but "have privileged ports" -- e.g. have resources controlled by the system and not any particular user.
The real complaint here is that using custom port numbers is unwieldy and we need more robust mappings from custom domains to shared domains with custom ports. For example, make it easier for users to set up their own virtual hostnames to map to a shared host with a custom port. Getting rid of privileged ports doesn't solve this problem at all.
For the same reason, users can't bind to port 80, because then my webserver could steal credentials to your site, as both our sites use the same common webserver. So either none of us controls the webserver or we each have our webserver, and with our webserver we'll need our own copies of other system libs, which again puts us back on the containerization path.
Again, the choice is of using system libs versus duplicating and then isolating user libs.
It seems to me that this is the fundamental trade off, and focusing on privileged ports as a problem, when they are one side of a fundamental trade off, is not really insightful at all.
I think the systemd socket activation with declarative configuration files, and the Serverless cloud computing fad, are hints of how it is possible to control exactly what program is running, and even have some custom code, without having to duplicate and maintain all the binaries. Too bad they’re doing it on Linux, so they still have all those accidental limitations.
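For reference, the service side of socket activation is tiny. Under the systemd protocol the manager opens the listening socket (declared in a `.socket` unit, e.g. with `ListenStream=`), then passes it to the service starting at file descriptor 3 and announces it via the `LISTEN_PID` and `LISTEN_FDS` environment variables. A minimal sketch of the receiving end; the function names are mine, not part of any library:

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the systemd protocol


def inherited_fds(environ=os.environ, pid=None):
    """Return the descriptor numbers handed to this process, if any."""
    pid = os.getpid() if pid is None else pid
    # The variables only apply to the process they were addressed to.
    if environ.get("LISTEN_PID") != str(pid):
        return []
    count = int(environ.get("LISTEN_FDS", "0"))
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def listening_sockets():
    """Wrap each inherited descriptor in a socket object (Python 3.7+ detects the type)."""
    return [socket.socket(fileno=fd) for fd in inherited_fds()]
```

The service never calls `bind()` itself, so it never needs the privilege to bind port 22 or 80. Which program gets which port is decided by system-owned configuration, while the code behind it can belong to the user.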
Perhaps I should phrase it this way: there is no fundamental reason why IP addresses are associated with hosts rather than users, or even services. There is no fundamental reason why you need to be root to listen on privileged ports, which include many of the most useful ports.
Once you have a runtime, dynamic linking suffers. Once you have a VM, you lose vast swathes of control over I/O and resources. And once you target a different language you end up with lowest-common-denominator semantics and debugging capabilities.
In some respects, the JVM-style sandboxed language runtime is an "original mistake" because it's an OS-like environment that doesn't have access to OS-like features, leading to a lot of friction at the edges and internal bloating as more and more features are requested. If we had similar access to memory, devices etc. everywhere the friction wouldn't be experienced as such, even if there were protections enforced that hurt performance in certain cases. You'd design to a protocol, and either the device and OS would support it or it wouldn't. That's how the Internet itself managed to scale.
But as it is, the stuff we have to work with in practical systems continues to assert that certain line of coder machismo: unsafe stuff will always be unsafe and You Should Know What You Are Doing, and anyone who wants safety is a Newbie Who Should Trust A Runtime.
"12. Trust the programmer, as a goal, is outdated in respect to the security and safety programming communities. While it should not be totally disregarded as a facet of the spirit of C, the C11 version of the C Standard should take into account that programmers need the ability to check their work."
This is not true. They each are defined by different security boundaries, have vastly differing properties of isolation and communication, contain different data, and are contained by different components.
From a security perspective they are not the same, though from a functional perspective they each solve similar problems.
The parent made a specific statement about the security properties of various isolation mechanisms, equating them all from a security perspective.
Consider that in Linux, processes and threads are implemented via the same abstraction (tasks). This abstraction actually leaks in some unfortunate cases, but it's generally considered "good enough."
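You can see the shared abstraction from userspace. A small sketch, assuming Python 3.8+ for `threading.get_native_id()`: every thread reports the same process ID but has its own kernel-level ID, and on Linux the main thread's kernel-level ID is the same number as the PID.

```python
import os
import threading


def task_ids():
    """Return (process id, kernel-level thread id) for the calling thread."""
    return os.getpid(), threading.get_native_id()


def ids_from_new_thread():
    """Collect the same pair from inside a freshly started thread."""
    result = []
    t = threading.Thread(target=lambda: result.append(task_ids()))
    t.start()
    t.join()
    return result[0]


main_pid, main_tid = task_ids()
thread_pid, thread_tid = ids_from_new_thread()
print(f"main:   pid={main_pid} tid={main_tid}")
print(f"thread: pid={thread_pid} tid={thread_tid}")
```

What userspace calls the "PID" is really the thread group ID shared by all of a process's tasks, which is one of the places where the abstraction shows through.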
In the case you mention, your choice of abstraction may affect your threat model, depending on if there is shared state and what data may require isolation.
I was not talking about ideal security but that certain pre-existing mechanisms do not have equivalent security postures, as the parent had mentioned. The point isn't that with enough work isolation can be achieved but that that work has not in fact been done and the various mechanisms are distinct and their security values should not be conflated.
As an example of this in action, the test runner creates a new application environment for each test.
Thank you. I've always thought about this and figured I must be crazy since nobody else seems to care about it.