
Single address spaces: design flaw or feature? - ingve
https://matildah.github.io/posts/2016-01-30-unikernel-security.html
======
willvarfar
(Mill team)

The Mill CPU is Single Address Space (SAS). It has separate Protection (PLB)
and Translation (TLB), with the PLB being in parallel with the L1 and the TLB
being between cache and main memory.

Unlike previous SAS machines, the Mill supports fork()
[https://millcomputing.com/topic/fork-2/](https://millcomputing.com/topic/fork-2/)

PS sorry to everyone suffering Mill fatigue :(; we love bragging about the
baby ;)

~~~
reitzensteinm
The opposite. I haven't heard anything in a while. Please do mention it when
it's relevant.

------
nickpsecurity
This writeup talks like MMUs always kill performance. It ignores over a
decade of work by microkernels to reduce that problem to single-digit
percentages. L4 even fits in an L1 cache with plenty of room to spare. QNX
also has a clever, low-overhead scheduler & message passing. LynxSecure leaves
the CPU 97% or so idle at 100,000 context switches a second.

And all this even assumes you have to do the switch. That's not necessary if
you split your system between kernel and user threads that run side by side on
different cores with message passing, memory, or CPU interrupts to do
notification.
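The kernel-and-user-threads-on-separate-cores split above can be sketched as a toy model. This is purely illustrative (the names and the use of Python threads standing in for cores are my own, not any real OS's API): the "application" side never traps into a kernel, it just enqueues a request message and waits on a reply.

```python
import queue
import threading

# Toy model of the "no context switch needed" pattern: a dedicated
# service thread (standing in for a kernel running on its own core)
# drains requests delivered over a message queue; the "application"
# side enqueues a request and blocks on a private reply queue instead
# of trapping into a kernel. All names here are illustrative.

requests = queue.Queue()

def kernel_core():
    """Service loop: drain request messages, post replies."""
    while True:
        op, arg, reply = requests.get()
        if op == "add_one":
            reply.put(arg + 1)

def app_call(op, arg):
    """Client side: message passing stands in for a syscall trap."""
    reply = queue.Queue(maxsize=1)
    requests.put((op, arg, reply))
    return reply.get()

threading.Thread(target=kernel_core, daemon=True).start()
print(app_call("add_one", 41))   # prints 42
```

On real hardware the queue would live in shared memory polled by the service core (or paired with an inter-processor interrupt for notification), but the structure is the same.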

So, not only are VM's and Unikernels old, their advocates are ignoring
improvements to the other side of CompSci for MMU systems. Interestingly, that
was the side making self-healing, live-updating, and NSA-resisting systems all
these years on COTS hardware. A little weird that such architectures get the
least attention. ;)

~~~
JoachimSchipper
Good design does help a lot, but communication still isn't free. Commercial
microkernels are usually sold on low, predictable latency ("real time"), not
on raw throughput.

Do you happen to have any relevant references on modern anti-exploit
technology like ASLR (which creates tons of MMU entries to describe the
fragmented address space) and microkernels (which tend to rely on fast context
switching)? I can imagine a few partial solutions, but... And no, "switch to
OCaml" doesn't solve the problem. ;-)

~~~
nickpsecurity
ASLR is a tactic that was always likely to be bypassed. High security doesn't
rely on that. Microkernels are just a foundation to build on: components and
interactions must be done right.

Far as modern tech, I'll try to get you a few when I'm back on my PC at home. I
have _tons_ of them actually, so I need to look again to apply the mental
filter. Two interesting ones for you to Google for now are SVA-OS by Criswell
and Code Pointer Integrity. Certain aspects of those have minimal overhead
with strong prevention.

Best solutions are CPU mods that give better protections with lower costs.
Especially memory safety. Architecture is main problem. All this other stuff
is how we BAND-AID it. ;)

~~~
JoachimSchipper
You're not wrong about band-aids, but some of us have to ship. ;-)

~~~
nickpsecurity
re shipping

Oh, no you didn't! You saying Green Hills ain't been shipping? Secure64?
Infineon? Better security != not shipping. Although, if I read that as a
confession, then it might make a bit more sense. :P

re security mitigations

Ok. Thirty-plus minutes into the collection shows, aside from no organization,
that I need to narrow this down. Most of the great stuff is CPU modifications
or compiler transformations that make safety/security easy. The HW has
relatively low overhead in varying degrees of features supported, while the SW
approaches have significant overhead but support monoliths like Unikernels (or
say Dom0). There are VM-style papers in my collection with clever, low-overhead
stuff, but bound to be breached like other clever stuff was. Nizza Security
Architecture & MILS kernels are still the best of that breed.

So, need to know if you're interested in the HW mods and/or stronger SW safety
tricks. Honestly, they're most likely to pay off. Worst case: throw extra HW
at either to cover the performance hit. Will get cheaper in volume. Plus, a few
are simple enough to apply to your domain if they do custom CPUs for RedFox,
etc.

Want me to send a few?

~~~
JoachimSchipper
Thanks for the offer. I need a little time; expect an email within 18 hours.

------
lispm
The MIT Lisp Machine and its commercial versions from Symbolics, LMI and TI
did this.

The memory was not bits and bytes, though. For example, a Symbolics 3600 used
36-bit words with all data being tagged: various number types, characters,
strings, vectors, bitmaps, arrays, hash tables, lists made of cons cells, OOP
objects, and so on. That was used and checked at the processor level. That way
software would not manipulate raw memory in some region, but actual data
objects, knowing which space they use and what structure they have.

TI used 32-bit Lisp processors in their later machines, while Symbolics went to
40 bits to support larger address spaces.

~~~
justincormack
RISC-V is going to support tagged memory.

~~~
nickpsecurity
That's actually meaningless without context. The reason is that tags are used
to do all kinds of things in tagged CPUs. Some have almost no safety while some
support arbitrary security policies. So, we have to wait and see for each
individual implementation.

Far as safe/secure, look at crash-safe.org for the best in class of those. The
SAFE architecture isn't just tagged: it's holistic in addressing security & SW
dev issues at each layer. Actually, they might be trying to do too much haha.
Told people they should've just ported Oberon or Java to the CPU to give us an
interim solution.

------
amluto
> While there are hardware mechanisms [2] to optimize context switches, they do
> not eliminate all the performance degradation. Context switches have
> indirect costs outside of the time used for the switch itself – all the
> previously-mentioned caches get utterly trashed.

This is an odd statement. That footnote (2) directly refutes part of it.

Concretely, lots of CPUs have "address-space identifiers" that enable MMU
context switches without TLB flushes. Intel Sandy Bridge and up has a limited
form of this capability (Intel calls it PCID for incomprehensible reasons),
and I'm working to enable it on Linux 4.6 or 4.7.

With ASID available, MMU context switches are a single instruction and have
negligible cache footprint. The extra bookkeeping needed will be one or two
cachelines.
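The mechanism can be illustrated with a toy model (hypothetical class and method names, not a real MMU interface): each cached translation is keyed by (ASID, virtual page), so switching address spaces only means changing the current ASID, and nothing gets flushed.

```python
# Toy model of ASID-tagged TLB entries. Tagging each entry with the
# address-space identifier means a context switch is just a register
# write; entries from other address spaces stay cached and become
# valid again when that ASID is restored.

class TaggedTLB:
    def __init__(self):
        self.entries = {}      # (asid, virtual page) -> physical page
        self.current_asid = 0

    def switch(self, asid):
        """The 'single instruction' context switch: no flush."""
        self.current_asid = asid

    def fill(self, vpage, ppage):
        """Cache a translation for the current address space."""
        self.entries[(self.current_asid, vpage)] = ppage

    def lookup(self, vpage):
        """Return the cached translation, or None on a TLB miss."""
        return self.entries.get((self.current_asid, vpage))

tlb = TaggedTLB()
tlb.switch(1)
tlb.fill(0x10, 0xAA)       # process 1 maps virtual page 0x10
tlb.switch(2)
tlb.fill(0x10, 0xBB)       # process 2 maps the same page differently
tlb.switch(1)              # switch back: no flush happened
print(hex(tlb.lookup(0x10)))   # prints 0xaa
```

Without the ASID in the key, the switch to process 2 would have had to discard process 1's entries to avoid returning the wrong translation, which is exactly the TLB trashing the article describes.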

~~~
pslam
On the non-x86 side, ARM has had this for a VERY long time. Certainly at least
since ARMv7-A onward, and I think even before that.

Pretty much any core in the "Cortex-A" series (e.g. the popular Cortex-A8, A15,
A57) has support for fast, flush-less address space switching using ASIDs.
Address space / context switch overhead has NOT been a good reason to avoid
this kind of compartmentalization, for at least the last decade or so.

------
pjc50
Also prior art: the Nemesis operating system.
[http://www.cl.cam.ac.uk/research/srg/netos/projects/archive/...](http://www.cl.cam.ac.uk/research/srg/netos/projects/archive/nemesis/)

Nemesis has a single address space _and_ memory protection: you can twiddle
the permission bits in MMUs without necessarily incurring the cost of a TLB
flush.

A unikernel system is usually a single-application system with no local
multiuser capability; security is applied at the virtualisation layer and
within the application. This is quite different from the traditional
timesharing model.

~~~
cpeterso
Opal was an operating system research project at the University of Washington
in the mid-1990s that had a single, 64-bit address space but also had memory
protection between processes. The single address space avoided TLB flushes
during context switches, as you point out, and made shared memory RPC much
easier (pointers could be passed between processes because virtual addresses
meant the same thing everywhere).

I recall some discussion about leaking virtual addresses as an optimization to
avoid needing to manage or GC large page tables. This was the mid-1990s so a
64-bit address space was more than anyone could possibly need. ;)

[http://homes.cs.washington.edu/~levy/opal/opal.html](http://homes.cs.washington.edu/~levy/opal/opal.html)

------
fizixer
My motto in this regard is 'simplify the hardware as much as possible and do
your partitioning and structures and abstractions in the software'.

So single address space or no? I ask 'which way gives the smaller chip
footprint'?

I think we spend (waste?) too much time trying to do things in hardware when
we could offer an extremely simple core to the s/w layer. That makes h/w
designer's job easy and leaves a ton of footprint where you can put more
cores.

Make the core RRISC (reduced-reduced-instruction-set-computer) and offer it in
64-core or 256-core versions and let the programmers have fun with it.

Looking at the brainiac-vs-speed-demon debate (from this [1]), I'm squarely in
the speed-demon camp.

[1]
[http://www.lighterra.com/papers/modernmicroprocessors/](http://www.lighterra.com/papers/modernmicroprocessors/)

~~~
pjc50
256-core units are already available:
[http://www.mellanox.com/page/products_dyn?product_family=241](http://www.mellanox.com/page/products_dyn?product_family=241)
. They're kinda hard to use, and most programmers would much rather have a
single fast processor than a zillion cores.

~~~
fizixer
It depends what you mean by hard to use. If it's an obscure architecture for
which the whole toolchain (assembler, compiler, linker) is not easily
available, then the difficulty has nothing to do with the high core count.

Also I'm not sure why a simpler core (note: a simpler architecture, not a
slower core) would be so slow as to be unusable. I mean, such cores are called
"speed demons" for a reason, I assume.

I personally would love to see a 256-core 64-bit ARM-compatible becoming
mainstream (by compatible I mean take the ARM instruction set and cut it in
half or to a quarter).

~~~
pjc50
The Tilera architecture required assembler for effective usage due to its
idiosyncrasies.

The main problem is access to DRAM is a terrible bottleneck, and adding cores
makes this _worse_ if they all need to access it. You can fill a die with ALUs
but unless you can keep them fed this doesn't help you at all. That's why
Tilera focused on the network stream use case; there's enough storage on-die
for a few frames per core, and the data to operate on can "flow" through the
system from one 10G Ethernet MAC to another.

------
kabdib
The Apple Newton PDA was single address space. The ARM 610 MMU was designed
specifically for the Newton (the Domain protection system and sub-page
protections, in particular).

This was circa 1992, if it matters.

~~~
tyfon
I thought all the Mac OS versions up until they started using *BSD were single
address space.

They might have changed it in the very latest versions, but from what I
remember applications used to crash the OS all the time.

~~~
compiler-guy
The Apple Newton didn't run the Mac OS, but rather the Newton OS, which was
almost completely unrelated. It was comparatively modern for its time,
although quite quirky.

------
toolslive
Microsoft's Singularity also ran everything in the same address space (IIRC)
[https://en.wikipedia.org/wiki/Singularity_%28operating_syste...](https://en.wikipedia.org/wiki/Singularity_%28operating_system%29)

~~~
JoachimSchipper
If you're interested in Singularity, do read Joe Duffy's blog on Microsoft
Research's later Midori project. Basically a C#-ish OS based on OO principles,
with interesting ideas about memory safety, concurrency and errors in a system
that compiles down to native code.

~~~
tbrownaw
_do read Joe Duffy's blog on Microsoft Research's later Midori project._

Link: [http://joeduffyblog.com/2015/11/03/blogging-about-midori/](http://joeduffyblog.com/2015/11/03/blogging-about-midori/)

------
the_mitsuhiko
If unikernels and single address space is what's going to kill fork() then I'm
all for it.

~~~
mwcampbell
Just curious, why do you think killing fork() is a good thing?

------
dap
We (the computing industry) did the single-address thing for many years. It
was terrible. Virtual memory was a major development in the robustness of
computer systems because it allowed unrelated components to be isolated into
separate fault domains. We (as an industry) adopted it universally, despite
performance costs that were much greater than they are today.

You could argue that unikernels are by definition one component, so it's fine
to share one fault domain. That's easy when they're missing all the facilities
that even the OP admits still need to be built. If you're going to have
something like a network, a filesystem, and so on, you need tools for
understanding them. It seems like you'll want an interactive environment for
using those tools, along with common facilities for filtering and otherwise
processing that output. And we're back where we started -- with several
discrete components that are better off in separate fault domains.

You can argue that the isolation is better provided by the language, and the
article claims that "it’s easier to create quality tooling for something
written in a single language with a decent type system that lives in a single
address space". That's only true if you allow these components to be tightly
coupled. But neither removing direct access to memory nor providing a rich
type system magically eliminates the possibility for a bug in one part of a
program to affect a different part of the program. And why should all these
components be tightly coupled anyway? And besides all that, this is an
argument for monoculture -- everything must use the same language and runtime
environment. But different languages are better suited to different tasks.

The author also claims a false dichotomy between code reuse and multiple
address spaces. But it's completely possible to build common facilities for
instrumentation and reporting and still have them be loosely coupled.

All of this typifies a lot of my issues with unikernels: they represent a
complete rejection of major advances in modern systems and software design
without addressing the underlying problems that those advances were built to
solve. There's some baggage in modern operating systems, but many (most?) of
the major architectural decisions are the results of thoughtful incremental
improvement by engineers looking at concrete problems. Let's not throw all of
that away.

~~~
Ericson2314
> We (the computing industry) did the single-address thing for many years.

While using shitty languages, so not that relevant IMO.

> But different languages are better suited to different tasks.

I think the industry can support multiple competing unikernels in different
languages.

> ...tightly coupled...

Rigorously define one's interfaces while hiding implementation details :).
Yeah, it takes discipline, and OCaml/Rust/Haskell/etc. are not able to codify
all the invariants one might want to enforce. But as more powerful languages
are polished, I believe the situation will improve. I dream of a computing
service where one submits software with a proof that it will "play nice" with
other tenants, no hardware sandboxing needed.

~~~
dap
> I think the industry can support multiple competing unikernels in different
> languages.

That way we can reimplement existing, common facilities not once, but N times
-- and still not support what I was alluding to (namely, allowing specific
subcomponents written in a language appropriate for that component).

> Rigorously define ones interfaces while hiding implementation details

Fine, but then there's little advantage to mandating a single address space
and language.

~~~
Ericson2314
> That way we can reimplement existing, common facilities not once, but N
> times -- and still not support what I was alluding to (namely, allowing
> specific subcomponents written in a language appropriate for that
> component).

Ah sorry did not realize you meant that. Well the other answer is that more
powerful languages can support more expressive and diverse embedded languages.

> Fine, but then there's little advantage to mandating a single address space
> and language.

Rigorous interfaces != primitive interfaces, but Unix forces both of those on
us. For example, how feasible is it to share a tree between two processes?
Powerful languages allow us to specify the end goals of per-process address
spaces etc., while leaving the means much more open-ended.
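The tree-sharing point can be made concrete with a small sketch (Python and pickle used purely as an illustration of the general pattern): across a Unix process boundary, a pointer-rich structure can't travel as-is, it has to be flattened to bytes and rebuilt as a copy on the other side.

```python
import pickle

# A nested structure: in a single address space, both parties could
# just hold a reference to it. Across a pipe or socket between Unix
# processes, it must be serialized and reconstructed -- the receiver
# gets an equal but separate copy, and mutations are not shared.

tree = {"root": {"left": {"value": 1}, "right": {"value": 2}}}

wire = pickle.dumps(tree)      # what a pipe/socket actually carries
copy = pickle.loads(wire)      # the receiver rebuilds a *copy*

print(copy == tree)            # prints True  (structurally equal)
print(copy is tree)            # prints False (not the same object)
```

That serialize/copy/rebuild cycle is the "primitive interface" being complained about; systems like Opal avoided it by making virtual addresses mean the same thing in every process.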

------
fulafel
This confuses system calls and context switches. A syscall does not need to
switch address spaces and does not have the associated costs.

~~~
axelfontaine
Here is a more detailed explanation of how and when they are related and when
not: [http://stackoverflow.com/questions/9238326/system-call-and-context-switch](http://stackoverflow.com/questions/9238326/system-call-and-context-switch)

------
popee
> To reduce the performance impact of syscalls without modifying application
> software, exceptionless/asynchronous syscalls have been demonstrated. With
> the regular syscall interface, the userspace process requests a syscall by
> executing a special software interrupt instruction to cause a context switch
> to the kernel. The arguments for the syscall are put in the general-purpose
> registers. The exceptionless syscall model requires small modifications to
> the libc and the kernel: when a syscall is requested from the application
> program, the libc places the syscall’s arguments in a special page of memory
> (“syscall page”) and switches to another user-level thread.

It would be great to have a standard API for this, say a fake mux syscall
that would queue other syscalls and trigger them all at once after some period
of time.
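The proposed mux interface might look something like this toy sketch (a hypothetical user-space API, not a real kernel facility; plain Python callables stand in for syscalls): calls accumulate in a batch and are only submitted together on flush, modeling one trap that dispatches many queued requests from a shared "syscall page".

```python
import os

# Toy sketch of a batched "mux syscall" interface. queue() records a
# request without performing it; flush() stands in for the single
# trap that makes the kernel run everything queued so far and return
# the results in order.

class SyscallBatch:
    def __init__(self):
        self.pending = []      # the "syscall page": queued requests

    def queue(self, fn, *args):
        """Record a request; nothing is executed yet."""
        self.pending.append((fn, args))

    def flush(self):
        """The single 'mux' trap: run everything queued, in order."""
        results = [fn(*args) for fn, args in self.pending]
        self.pending.clear()
        return results

batch = SyscallBatch()
batch.queue(os.getpid)
batch.queue(len, b"hello")     # stand-in for a write() of 5 bytes
pid, nbytes = batch.flush()
print(nbytes)                  # prints 5
```

A real implementation would place the queued arguments in a page shared with the kernel and let kernel threads drain it asynchronously, so the application could keep running between flushes rather than blocking.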

~~~
vmorgulis
They try to do that in IncludeOS.

------
cbd1984
The best argument is this: We've done it. We did it on hardware which was
cutting-edge in the 1960s. It worked acceptably then.

This stuff is literally a half-century old now. IBM did it with CP/CMS back in
the mid-1960s on the original System/360 hardware. Thumbnail sketch: CP
(Control Program) is now called VM (Virtual Machine). It is a hypervisor. CMS
is the Conversational Monitor System, once the Cambridge Monitor System. It's
about as complex as slime mold and/or MS-DOS: Single address space, no
hardware protection. CP allowed people to run multiple instances of CP and CMS
as guests. CMS provided a command line and an API, CP provided separate
address spaces and multiplexed the hardware.

As for "write everything in OCaml", we did that with Lisp and Smalltalk back
in the 1970s and 1980s. Hell, we're writing IBM PC emulators in Javascript
that run acceptably fast most of the time, and that's heavyweight full
hardware emulation, which is a lot more complex than a unikernel would (or
should) be.

~~~
lolo_
How secure were these devices? How much work was put onto application
developers to maintain isolation? I mean, it wasn't feasible back then since we
didn't have MMUs (AFAIK).

This argument is pretty poor, since we've done a lot of things - cooperative
multitasking, windows versions which had little to no isolation between the
root user and other users, etc. etc. it doesn't mean they are the best
approach now, or were even a good idea at the time. Yes we got them to work.
Microsoft Bob and Windows ME worked too.

I don't think the JavaScript PC emulator runs that acceptably fast either.
That was more of a toy example; I don't think you'll find many people using it
for anything real.

At the kernel level, performance/resource usage really does matter. I don't
know enough about OCaml to comment on how it performs, but especially when you
make your argument about how (allegedly) terribly Linux performs in certain
circumstances, and that's a lot of the selling point of unikernels, it doesn't
really follow to talk about how 'acceptable' performance is possible from
higher-level languages.

~~~
cbd1984
> How secure were these devices? How much work was put onto application
> developers to maintain isolation? I mean it wasn't feasible back then since
> we didn't have MMUs (ifaik)

I'm almost willing to bet money VM has been used in production for longer than
you've been alive. And of course machines had MMUs back then: They built a
special one for the IBM System/360 model 40, to support the CP-40 research
project, and the IBM System/360 model 67, which supported CP-67, came with one
standard. IBM, being IBM, called them DAT boxes, because heaven and all the
freaking angels forfend that IBM should ever use the same terminology as
anyone else...

> This argument is pretty poor, since we've done a lot of things - cooperative
> multitasking, windows versions which had little to no isolation between the
> root user and other users, etc. etc. it doesn't mean they are the best
> approach now, or were even a good idea at the time. Yes we got them to work.
> Microsoft Bob and Windows ME worked too.

The difference between this and Windows Me is that we knew Windows Me was a
bodge from day one. Windows 2000 was supposed to kill the Windows 95 lineage.
(What's a "Windows 2000"? Exactly.)

Anyway, the hypervisor design concept came from people who'd seen what we'd
now call a modern OS; in this case, CTSS, the Compatible Time-Sharing System
(Compatible with a FORTRAN batch system which ran in the background...). They
weren't coming from ignorance, but from the idea that CTSS didn't go far
enough: CTSS was a pun, in that it mixed the idea of providing abstractions
and the idea of providing isolation and security into the same binary. The
hypervisor concept is conceptually cleaner, and the article gives evidence
it's more efficient as well.

> I don't think the javascript PC emulator runs that acceptably fast either.
> That was more of a toy example, I don't think you'll find many people using
> that for anything real.

You missed my point. My point was that it works acceptably fast (yes, it does,
I've used it, and you won't convince me my perceptions are wrong) _even
though_ it's operating in the _worst possible_ context: In a userspace process
on an OS kernel, where everything it does involves multiple layers of function
call indirection and probably a few context switches. Compared to that,
getting a stripped-down unikernel written in OCaml to be performant has got to
be relatively easy.

> At the kernel level performance/resource usage really does matter. I don't
> know enough about OCaml to comment on how it performs but especially when
> you make your argument about how (allegedly) terribly linux performs in
> certain circumstances, and that's a lot of what the selling point of
> unikernels are, it doesn't really follow to talk about how 'acceptable'
> performance is possible from higher level languages.

First: Only the unikernel would be written in OCaml. The hypervisor would have
to be written in C and assembly.

Second: I never said Linux performs terribly. Linux performs quite well for
what it is. It's just that what it is imposes inherent performance penalties.

Third: Although the article focused on performance, the main reason I support
hypervisors is security. Security means simplicity. Security means
invisibility. Security means comprehensibility, which means separation of
concerns. Hypervisors provide all of those to a greater extent than modern
OSes do.

~~~
lolo_
>I'm almost willing to bet money VM has been used in production for longer
than you've been alive. And of course machines had MMUs back then: They built
a special one for the IBM System/360 model 40, to support the CP-40 research
project, and the IBM System/360 model 67, which supported CP-67, came with one
standard. IBM, being IBM, called them DAT boxes, because heaven and all the
freaking angels forfend that IBM should ever use the same terminology as
anyone else...

You clearly know more about the details :) however my point is that modern
requirements aren't the same as those of the past, particularly with regard to
security but also workloads, use cases, etc. are generally very different.

>The difference between this and Windows Me is that we knew Windows Me was a
bodge from day one. Windows 2000 was supposed to kill the Windows 95 lineage.
(What's a "Windows 2000"? Exactly.)

You said "The best argument is this: We've done it" - the point of these
examples is that, yes, we've done many things, so it's not a very good
argument. If you want to skip over ME, then 3.1 - it used cooperative
multitasking. Arguably this might be more efficient than pre-emptive
multitasking (I'm not saying it is, rather that maybe somebody _could_ argue
this), and it was good enough for the time, but the fact we've done it doesn't
mean we should do it.

This applies even more to security - for many years there was little to no
effort made towards hardening software. We live in a world where this just
cannot happen any longer.

I'm not saying by the way that the past use doesn't have value or demonstrate
the usefulness of the approach, it might do, just that the fact it was done
before doesn't _necessarily_ mean it's a good idea now.

>Anyway, the hypervisor design concept came from people who'd seen what we'd
now call a modern OS; in this case, CTSS, the Compatible Time-Sharing System
(Compatible with a FORTRAN batch system which ran in the background...). They
weren't coming from ignorance, but from the idea that CTSS didn't go far
enough: CTSS was a pun, in that it mixed the ideas of providing abstractions
and the idea of providing isolation and security into the same binary. The
hypervisor concept is conceptually cleaner, and the article gives evidence
it's more efficient as well.

Interesting. Not sure the article does demonstrate that, though; it does
suggest performance penalties, some serious, due to the abstractions of a
modern OS. I'd want to look more closely at these before I believe the
penalties are THAT severe, other than in the case of networking where it seems
more obvious the problem would arise.

>You missed my point. My point was that it works acceptably fast (yes, it
does, I've used it, and you won't convince me my perceptions are wrong) even
though it's operating in the worst possible context: In a userspace process on
an OS kernel, where everything it does involves multiple layers of function
call indirection and probably a few context switches. Compared to that,
getting a stripped-down unikernel written in OCaml to be performant has got to
be relatively easy.

I raised the performance issue because this seems to be the main selling point
of a unikernel, but now we're losing performance because it's acceptable? Ok
fine, but I think a 'normal' kernel in most cases has acceptable performance
penalties too. This is something that really requires lots of data, and maybe
OCaml is nearly as fast anyway (I hear great things about it), but I just
wanted to point out the contradiction.

>First: Only the unikernel would be written in OCaml. The hypervisor would
have to be written in C and assembly.

>Second: I never said Linux performs terribly. Linux performs quite well for
what it is. It's just that what it is imposes inherent performance penalties.

Absolutely, and agreed there are inevitable perf penalties (as the article
describes well.)

>Third: Although the article focused on performance, the main reason I support
hypervisors is security. Security means simplicity. Security means
invisibility. Security means comprehensibility, which means separation of
concerns. Hypervisors provide all of those to a greater extent than modern
OSes do.

I really find it hard to believe that security is really wonderfully provided
for in a unikernel - you have a hypervisor yes, but if you get code execution
in the application running in the unikernel you have access to the whole
virtual system without restriction. I'd bet on CPU-enforced isolation over
software any day of the week; even memory-safe languages have bugs, and so do
hypervisors.

I may have made incorrect assumptions here so feel free to correct me. I'm
certainly not hostile to unikernels, either!

~~~
richard_todd
> I'd bet on CPU-enforced isolation over software any day of the week; even
> memory-safe languages have bugs, and so do hypervisors.

... and so do CPUs! I do like CPU protections as long as they are dirt-simple,
but it really scares me sometimes how complicated CPUs and chipsets are
getting with their "advanced" security features. When an exploitable flaw is
found, and malware survives OS/firmware reinstalls, it will be a mess.

~~~
lolo_
Yeah this frightens me too :) and things like rowhammer [0] are surprising in
this regard. Nothing can be trusted, but you can have a little more faith in
some things.

[0]:[https://en.wikipedia.org/wiki/Row_hammer](https://en.wikipedia.org/wiki/Row_hammer)

