Fascinating article—a really good read. I definitely want to check out the work at rumpkernel.org. I do have one quibble with this line, though:
> Even on the desktop, the square peg is not the correct shape: we know that the system will be used by a single person and that the system does not need to protect the user from non-existent other users.
I don't know if this is really true. Many desktops are shared (e.g. by members of a family). And of course data centre machines are shared by multiple users, although often those users all run their processes under the same OS credentials.
Wouldn't it be interesting if processes running on my behalf within Facebook or HN couldn't access other users' private data, rather than relying on the programmers at FB or HN to get it right?
I think desktop-style devices are going more in the direction of being personal. And where they aren't, in the majority of cases you probably don't want the OS mediating access anyway, given how poorly it does it. I'm not saying that there aren't any counterexamples, though.
What you say about processes running "on your behalf" is really quite interesting. There is no reason you should trust the application programmer to get it right, yet that's the best the OS currently offers -- you can run your db as user "db" and your httpd as user "httpd", but that does little for the actual human user. So, some radical thinking is required. The editor of ;login: actually tried to point me in the direction you mention when we were working on the article, but I couldn't formulate clear enough thoughts on the subject to include in the article. Maybe someone else here has already thought about it and can put it into writing?
> I think desktop-style devices are going more in the direction of being personal.
For many people you're probably right. I do think that there's tremendous value in segmenting out one's various personæ. There's no particularly good reason why I should give a binary game blob access to the same user data that contains my financial data, passwords &c.
A finer-grained system would be nice, no doubt, but OS users are pretty time-tested.
> Maybe someone else here has already thought about it and can put it into writing?
Well, in principle capabilities systems can do a lot of this already. As myself, I can give a capability to a server, and it can use that capability to execute work on my behalf; once I've received the result, I can (presumably) revoke that capability. Capabilities can even be used to implement filesystems: my process might have a filesystem root capability, which permits it to see a single directory, which is itself a list of capabilities to directories and files, &c. Pretty neat stuff.
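A toy sketch of the delegate-then-revoke idea, in Python (the Revoker/Proxy names are invented for illustration and aren't any particular capability system's API):

```python
class Revoker:
    """Caretaker pattern: hand out a forwarding proxy for a real object,
    and keep the power to cut that proxy off later."""

    def __init__(self, target):
        self._target = target

    def revoke(self):
        # After this, every call through the proxy fails.
        self._target = None

    def capability(self):
        revoker = self

        class Proxy:
            def __getattr__(self, name):
                if revoker._target is None:
                    raise PermissionError("capability revoked")
                return getattr(revoker._target, name)

        return Proxy()


# Give the proxy (not the real object) to code doing work on my behalf;
# once the result is back, cut the cord.
revoker = Revoker(open("/tmp/scratch.txt", "w"))
cap = revoker.capability()
cap.write("work done with a delegated capability\n")
revoker.revoke()
# cap.write(...)  # would now raise PermissionError
```

Kernel-level capability systems make the references unforgeable and revocation enforceable across processes, but the shape of the interaction is the same.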
There's been some interesting work with capabilities in EROS, its successor Coyotos, and in Tahoe-LAFS.
> A finer-grained system would be nice, no doubt, but OS users are pretty time-tested.
Mmmhmm. From what I understand, Android runs each application as its own system user and handles application permissions by making each permission its own group. Linux's user isolation is pretty good.
Other than lack of manpower and lack of interest, there's no reason why a Linux distro couldn't put in the medium-to-large amount of work it would take to build wrappers that do something similar for its most popular applications. :)
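A rough sketch of what such a wrapper could look like, in Python (every number and name here -- the dedicated UID/GID and the "permission" groups -- is made up purely for illustration):

```python
#!/usr/bin/env python3
"""Hypothetical launcher: run one desktop application as its own dedicated
system user, with "permissions" granted as supplementary group memberships,
roughly the way Android isolates apps."""
import os
import sys

APP_UID = 12001            # made-up UID reserved for this one application
APP_GID = 12001            # its primary group
PERMISSION_GROUPS = [
    20001,                 # e.g. a hypothetical 'perm_network' group
    20002,                 # e.g. a hypothetical 'perm_audio' group
]

def main():
    # Needs to start as root (or with CAP_SETUID/CAP_SETGID) to switch users.
    os.setgroups(PERMISSION_GROUPS)
    os.setgid(APP_GID)
    os.setuid(APP_UID)     # irreversible: no way back to root after this
    os.execvp(sys.argv[1], sys.argv[1:])   # replace ourselves with the app

if __name__ == "__main__":
    main()
```

Invoked as something like `sudo ./app-launcher.py /usr/bin/somegame`; the real work for a distro would be deciding which groups mean which permissions and which files each app-user may touch.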
Anyone who's ever nervously handed their kid their phone so the kid could play Angry Birds for a bit can tell you why Android's method of handling users isn't perfect.
That's why it's great that Android supports multiple users on a phone. Setting up a guest user is easy, and setting up a dedicated user is only slightly less so.
The folks at genode.org are also building their system around capabilities. Maybe they have some ideas on how to relate it to application/service programming, and more importantly, how to get from current reality to "good enough"?
I think the last two paragraphs of "Conclusions" are the most interesting part of the thesis.
I think the confusing thing is that a typical unix system uses the same user abstraction both for the owners of processes and for the people sitting at the physical device. Maybe the abstraction for physical users should itself be something physical: the OS wouldn't handle physical users at all; instead every physical user brings their own storage area protected by symmetric encryption (either a USB/memory stick or a separate partition on a hard drive). That would more closely resemble how we think of personal belongings in the real world -- just like I can share an electric toothbrush but bring my own brush head.
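The storage half of that idea can be sketched in user space today. A toy example in Python using the third-party cryptography package (the file contents and passphrase handling are placeholders, not a real design):

```python
import base64
import os
from getpass import getpass

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def key_from_passphrase(passphrase: bytes, salt: bytes) -> bytes:
    # Derive the symmetric key from the passphrase the physical user "brings".
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32,
                     salt=salt, iterations=600_000)
    return base64.urlsafe_b64encode(kdf.derive(passphrase))

# The salt lives alongside the personal storage (USB stick, partition, ...);
# only the passphrase stays with the person.
salt = os.urandom(16)
box = Fernet(key_from_passphrase(getpass("passphrase: ").encode(), salt))

ciphertext = box.encrypt(b"my personal files, opaque to the shared machine")
plaintext = box.decrypt(ciphertext)
```

In practice this is roughly what dm-crypt/LUKS already does per partition or per stick; the missing piece is an OS that treats such a container, rather than a UID, as "the user".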
> Wouldn't it be interesting if processes running on my behalf within Facebook or HN couldn't access other users' private data
No, because HN comments are public and the only reason to share data with Facebook is to make it available to (some) other people. It might be interesting for Dropbox or Google Drive, but even for Google Drive, the killer app and value-add over Microsoft Office is realtime collaboration with (specific) other users.
The best criticism of the concept of an "Operating System" is implicit in Joe Armstrong's thesis about Erlang:
> Such techniques are common in hardware platforms for building fault-tolerant systems but are not commonly used in software solutions. This is mainly because conventional languages do not permit different software modules to co-exist in such a way that there is no interference between modules. The commonly used threads model of programming, where resources are shared, makes it extremely difficult to isolate components from each other — errors in one component can propagate to another component and damage the internal consistency of the system.
> ...In our system "processes" and "concurrency" are part of the programming language and are not provided by the host operating system. This has a number of advantages over using operating system processes:
> Concurrent programs run identically on different OSs—we are not limited by how processes are implemented on any particular operating system. The only observable difference when moving between OSs and processors should be due to different CPU speeds and memory sizes etc. All issues of synchronization and inter-process interaction should be the same irrespective of the properties of the host operating system.
> Our language based processes are much lighter-weight than conventional OS processes. Creating a new process in our language is a highly efficient operation, some orders of magnitude faster than process creation in most operating systems, and orders of magnitude faster than thread creation in most programming languages.
> Our system has very little need of an operating system. We make use of very few operating system services, thus it is relatively easy to port our system to specialised environments such as embedded systems.
Note: I'm very, very far from being an Erlang expert, but I'm having a lot of fun working on my little Erlang project.
I don't read Armstrong's commentary as a criticism of the concept of an OS. I read it as the assertion that if you have specialized needs, the general-purpose code in most OS's is not infrequently a poor fit for your application.
Erlang still heavily relies on the hardware abstraction and device drivers provided by modern OS's, and still makes use of OS services; it just uses fewer of those services than some other software projects do.
> Would smart but non-open hardware be a disaster?
We can draw some inspiration from the automobile industry.
Over the past 30 years, we lost the ability to fix our cars and tinker with them. People like to complain about the loss of that ability. Nobody remembers to complain about how much better modern cars perform when they are working as expected.
It's quite funny to read this while the VW 'software cheat' scandal is unfolding...
"It is no longer a catastrophe if an unprivileged process binds to transport layer ports less than 1024. Everyone should consider reading and writing the network medium as unlimited due to hardware no longer costing a million dollars, regardless of what an operating system does."
This isn't why unprivileged users are unable to bind to ports lower than 1024. Software binding to those ports is assumed to be trusted system software. Keeping unprivileged users off those ports means that a malicious user who finds a way to crash one of these trusted daemons can't stand up his own, malicious copy in its place.
You could accomplish the same "don't allow Joe Random UserID to bind to port X" by specifically mapping designated UIDs to ports. This doesn't require that accounts be root, avoids the security issues of running network services as root, and prevents arbitrary processes from binding to ports.
Sure, from a security perspective, there's no requirement that 0 be the only UID that can bind to privileged ports. But by doing what you suggest, you've created a privileged user that can create privileged processes. The author states:
"It is no longer a catastrophe if an unprivileged process binds to transport layer ports less than 1024."
except that -from a security perspective- it kind of is. :)
Edit: To put a finer point on this and make it more explicit: Permitting unprivileged users to bind to "privileged" ports changes a service crash bug from a DoS into a complete service takeover.
If one and only one UID can bind to a given port (much the way other service access is limited by UID or GID), then you're reducing overall attack surface:
1. You're eliminating entire classes of root-level processes.
2. You're restricting all other non-root processes from accessing those port(s).
I mostly agree. I guess you misunderstood what I wrote.
I was double-checking to ensure that you understood that my original comment was addressing only the author's assertion, [0] and that I did not intend to assert that the only way to achieve the same security properties of the current way of doing things was the way things were currently done. :)
> 1. You're eliminating entire classes of root-level processes.
Eh... Given that the way this sort of thing is currently handled is to bind as root, then drop privs and fork, handing the FD of the bound socket to the forked child, nothing competently written ends up running as root unless it needs root for other reasons. (For example: sshd needs to switch to any user on the system, so it runs as root. Apache only needs to access things that httpd can access, so it runs as httpd.)
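For concreteness, a minimal Python sketch of that dance (without the fork step, and assuming Debian's www-data UID/GID of 33 purely for illustration):

```python
import os
import socket

# While still root: claim the privileged port.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 80))
srv.listen(128)

# Then permanently drop privileges before touching any untrusted input.
os.setgroups([])
os.setgid(33)              # e.g. www-data
os.setuid(33)              # the already-bound listening socket stays usable

conn, addr = srv.accept()  # now serving as an unprivileged user
```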
Edit:
> 2. You're restricting all other non-root processes from accessing those port(s).
Linux already does this, but in a more general way. If you don't pass SO_REUSEPORT, you can't bind to a port that's already in use. If you do use SO_REUSEPORT, only code with the same EUID as the first code to bind to that port can bind to that port. See [1] for details.
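Concretely (a Linux-only Python sketch; the port number is arbitrary):

```python
import socket

def reuseport_listener(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); every socket sharing the port needs it,
    # and the kernel only admits binders with the same EUID as the first.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("0.0.0.0", port))
    s.listen(128)
    return s

# Two listeners on the same port in same-EUID processes; the kernel
# load-balances incoming connections between them.
a = reuseport_listener(8080)
b = reuseport_listener(8080)
```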
Root-and-drop was a nuance I considered mentioning. There's a lot of incompetent code though. I may have written some of it.
The user-specificity of ports I'm speaking of would require that all access to a port go through a specified UID. Port reuse only prevents attaching to already-active ports; it does nothing to keep an otherwise unused port -- say, SMTP port 25 -- from getting snaked, no?
> Root-and-drop was a nuance I considered mentioning. There's a lot of incompetent code though.
And there are many incompetent sysadmins out there. ;) Any system that has admin-configurable rules is bound to be misconfigured by some portion of its userbase.
Not that SELinux's or GRSecurity's MAC systems are easy to configure, but I think that they do a better job of what you're trying to do here than either of us would be likely to do with our first couple of iterations.
> The user-specificity of ports I'm speaking of would require that all access to a port be through a specified UID.
Right. That's obvious. I hope you didn't think that I thought otherwise.
> Nothing about keeping an otherwise, say, unused SMTP port 25 from getting snaked, no?
Nothing except the fact that -on almost every Linux system, and OS X system, and modern Windows system- 25 is in the privileged range, which requires that someone with root privs run the code that binds to the port. [0] :)
Generally, if you have root, you get to do whatever you want. So, if an untrusted user is running code as root, they're likely going to be able to either
* reconfigure whatever system either you or I cook up to prevent them from binding their code to a particular port
or
* run their malicious code with whatever EUID is required to bind to the port that they want
[0] Or for the system admin to have marked the binary with the right cap bits (CAP_NET_BIND_SERVICE) to override the privileged port restriction.
> ...services which aren't running as root cannot be compromised at root level [remotely, barring sploit-an-error-on-the-local-machine-to-gain-root-sploits].
This is so obvious that I seriously don't understand why you're bringing it up.
> SELinux is a marvelous pain in the ass, though, isn't it?
Flexible MAC systems are necessarily complex. The configuration for a complex system is bound to also be complex. This is rather unavoidable.
One thing age and experience have taught me is how often stating the bleeding obvious is in fact necessary. If the point is obvious to you, simply noting it is sufficient.
And, yes, mapping of complex realities onto interfaces for mediating those realities typically results in complex interfaces. Those which aren't sufficiently complex have simply squeezed the actual complexity elsewhere.
"The solution for hardware device drivers is to push the complexity where it belongs in 2015, not where it belonged in 1965. Some say they would not trust hardware vendors to get complex software right, and therefore the complexity should remain in software running on the CPU. As long as systems software authors cannot get software right either, there is no huge difference in correctness."
It's not a matter of whether hardware companies can get their firmware right; it's a matter of
* Who keeps developing the firmware after the company abandons the hardware?
* Who bothers to realize that the code for $HARDWARE_COMPANY's 200+ devices can be refactored down to a single module with a few very minor tweaks for each device?
History tells us (and the explosion of model-specific Windows drivers for devices that Linux covers with a single driver plus minor model-specific tweaks demonstrates) that hardware manufacturers generally have little to no desire to do either thing.
"Even on the desktop, the square peg is not the correct shape: we know that the system will be used by a single person and that the system does not need to protect the user from non-existent other users."
As I've intimated elsewhere, an (ostensibly single-user) modern PC has a bunch of unrelated daemons running to serve the logged-in user. User isolation, along with running each daemon as a different system user, is what keeps those daemons from messing with the actual computer user's files and processes. This isolation is a pretty important property. Extending that isolation further, perhaps by writing wrappers that run each application as a different system user, might be quite desirable. :)
"The current cloud trend is gearing towards unikernels, a term coined and popularized by the MirageOS project..."
Maybe. As Linux's ability to sandbox a given process from the rest of the system gets better and better, I suspect that this sandboxing will be the way that a lot of people choose to secure their systems. The sandstorm.io folks report that they have had great success just by using libseccomp on the software that they deploy.
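To give a flavour of how small that surface can be, here's a sketch assuming the libseccomp Python bindings are installed (a deny-list for brevity; a tighter sandbox would instead whitelist the handful of syscalls it actually needs):

```python
import seccomp  # Python bindings shipped with libseccomp (assumed installed)

# Keep the default action at ALLOW, but kill the process if it ever tries
# to open a network socket or exec another program.
f = seccomp.SyscallFilter(defaction=seccomp.ALLOW)
for call in ("socket", "connect", "execve", "fork"):
    f.add_rule(seccomp.KILL, call)
f.load()   # from here on, the kernel enforces the filter

print("ordinary work still runs")
# socket.socket() in this process would now be killed by the kernel.
```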
Obviating the OS was a natural consequence of one of the original goals of OOP as conceived by Kay et al.:
First, recall that the advantage of universal computers is that they can simulate anything, including better computers. Objects themselves were supposed to be fully universal computers that got work done through message passing.
Second, the people at PARC had a "no centers" philosophy. They recognized that the key to building scalable systems is to keep responsibility widely distributed among the components.
Thus it's easy to see why you wouldn't need or want an OS:
* An OS is just a simulation of a nicer computer running atop a not-so-nice computer (i.e. the hardware). But, real objects can give you the same thing, and it just so happens that modern hardware components are real objects (i.e. universal computers that get work done by passing messages).
* Having centers in your system makes it hard to scale. And by "scale", we're not necessarily talking about things like "number of simultaneous users/processes", but rather about things like making sure that the Nth modification to the system is just as painless as the first one. The OS is clearly an unnecessary center because the hardware components are now better at handling the responsibility for their respective functionality, but the OS is also an undesired center because its opaqueness and rigidity make it hard to modify big systems. Fortunately, since an OS is a simulation of a universal computer, it can also simulate other universal computers, so we're able to abstract away the OS by using it to run simulations of better computers with names like "Erlang", "Java", "Python", etc.
Finally, while I enjoyed the article and found it interesting, I do disagree with the author's tacit assumption that getting rid of the OS implies more opaque and locked-down systems, or that locking down the system implies better reliability and security. Firstly, message passing is already a secure medium; stupid or malicious parser implementations are the biggest cause of the Internet's insecurity [1]. Secondly, it's an empirical fact that the best way to achieve systemic reliability is with redundancy. This means having components that perform the same function, but that are produced independently by isolated teams that use different technologies and techniques. So reliability means having communication standards of some kind (i.e. protocols). Indeed, as long as companies continue to sell computers (i.e. a thing designed to let me simulate what I think is a better computer), we'll end up with more freedom to tinker. What we really have to worry about is companies selling things that they claim are computers but that lack that crucial facility (which is why the trend of calling an iPhone a computer, or OS X's SIP, can be a little disquieting).
"The author's" assumption is that getting rid of unnecessary moving parts is an improvement. Having many slightly different copies of the same functionality in the stack is not redundancy, it's just silly -- "all problems with operating systems can be solved by removing layers of indirection"
I'm not too worried about companies selling tools instead of computers, just like I'm not upset that I didn't get a metallurgy plant when I bought a hammer. Yes, the philosophy of computation is a different thing (as Dr. Kay often points out), but sometimes you just need a tool instead of poetry. (nb. I'm not making a statement about whether or not everyone should understand computers)
Maybe if you think about a driver as a unit which performs a computation, the article will make more sense. I like to call drivers "protocol translators", but translating a protocol from one representation to another is really just a computation. The idea is to slowly liberate those units of computation from the clutches of the "centers", and then improve them, while still keeping the world functioning.
I truly don't understand how someone could write this and not see the answer. I've long been bothered by this problem, but here the author manages to completely miss it: The very things we're trying to get out of hypervisors today are identical, or nearly so, to the things that we wanted to get out of operating systems. That is, what we call a "hypervisor" today is really just an OS.
The author of this paper had all of the facts necessary to come to this realization:
"The early time-sharing systems isolated users from other users. [. . .] The time-sharing system also isolates the system and hardware components from the unprivileged user."
"The hypervisor provides the necessary isolation and controls guest resource use. Since the hypervisor exposes only a simple hardware-like interface to the guest, it is much easier to reason about what can and should happen than it is to do so with containers."
Those two paragraphs are just the same things written in different ways: isolation of users from each other, hardware abstraction, and resource-sharing[1]. Those are the fundamental tasks of an operating system. If we call the operating system a "hypervisor"[2] and the user an "application", it makes no difference.
Hypervisors have a place in allowing non-communicating users to run (different) existing kernels (and thus operating systems) on the same hardware. Great for VPS's.
However, using them for multiple potentially cooperating (but not necessarily mutually-trusting[4]) applications is missing the point. Operating systems have all of the necessary tools (hardware hooks) to properly isolate different users ("applications"); if they currently don't do a proper job of it then that's just a sign we need redesigned operating systems, not an additional layer which has real hardware costs![3]
The author of this paper laments that "when you virtualize, it is more difficult to optimize resource usage, since applications do not know how to play along in the grand ecosystem", but this is one problem which operating systems have been solving fairly well for decades—or at least a lot of effort has been put into solving these hard problems, and there's no sense in splitting that effort into hypervisor solutions.
[1] which is really just the intersection of the other two; each provides an abstraction to the user that it is the only consumer of the hardware resources—equivalently, an abstraction to the application that its virtual machine is the only consumer of the hardware resources.
[2] Yes, hypervisors do some different tasks than OS, but I think that's largely for two reasons, which are accidental as opposed to fundamental: 1) In order to make existing operating systems work without (much) modification, the hypervisor has to do hardware abstraction in a less abstract way, so to speak, than the OS can offer to processes. 2) At the CPU level, designers of the virtualization interface had both hindsight and freedom to improve the interfaces for hypervisor–OS interaction as compared to OS–application interaction.
[3] Examples of hardware costs are more memory usage by separate kernels and additional context switch cost between kernels.
With regard to doing a better job of it (and thus removing the current need for hypervisors for this task), I think moving away from POSIX/UNIX could help; many of the points the author raises against operating systems are really only valid criticisms of *NIX.
[4] Any claim that hypervisors offer greater isolation between users/applications is either 1) a small result of the improvements in technology discussed in [2], or 2) a misunderstanding of the situation, where less (visible) work has been put into compromising hypervisors instead of OSes, but we have seen hypervisor exploits nonetheless.