

How retiring segmentation in AMD64 long mode broke VMware (2006) - userbinator
http://www.pagetable.com/?p=25

======
userbinator
I think this makes for an interesting cautionary tale: when removing features,
no matter how little-used they may appear to be, there's always the
possibility that it will break something very important and useful that
depends on them. The path to the solution also shows how this attempt to
remove complexity actually resulted in _even more_ complexity later on, in the
form of diverging virtualisation extensions and non-uniform support for
segment limits.

Segmentation isn't the only casualty, however; several of the existing
seldom-used one-byte instructions also inexplicably went missing (they became
completely invalid - they weren't reused for prefixes or anything like that), and
among them were SAHF/LAHF which also turned out to be important for
virtualisation. In this case the solution was that both AMD and Intel put them
back sometime later, and had to add an extra "feature bit" to indicate this.
It's quite absurd considering that these instructions were present in 32-bit
mode, and leaving them unchanged in 64-bit mode from the beginning would've
avoided this issue completely.

As successful as x86-64 is, I definitely think the 64-bit transition could've
been much better, similar to how the 32-bit extensions that came with the 386
fitted nicely into the existing instruction set and could even be used from
16-bit mode. In contrast, there is no way to use 64-bit registers in 32-bit
code despite there being reservations in the existing instruction set that
would've made it possible. V86-mode, which has been present since the 386,
could've been extended in a relatively straightforward manner to make
virtualisation easier.

~~~
nnx
> In contrast, there is no way to use 64-bit registers in 32-bit code despite
> there being reservations in the existing instruction set that would've made
> it possible.

Isn't that what the X32 ABI does?

[https://en.wikipedia.org/wiki/X32_ABI](https://en.wikipedia.org/wiki/X32_ABI)

~~~
ANTSANTS
No, x32 is still long mode (64-bit); it's just an ABI that restricts all
memory allocations to 4 gigabytes of address space so that 4-byte pointers can
safely be used instead of 8-byte ones. You can get a similar effect by
manually mmap()ing memory into the lower part of the address space even in a
regular 64-bit program (luajit relies on this), but you can't stop malloc, the
kernel, other libraries, etc. from using the rest of the address space.
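That low-address trick can be sketched with Linux's MAP_32BIT mmap flag
(Linux/x86-64 specific; the flag value 0x40 and the 4 KiB size are
illustrative, and this is not a description of how luajit itself does it):

```python
import ctypes
import mmap

# Linux/x86-64 only: MAP_32BIT asks the kernel to place the mapping in
# the low 2 GiB, so its address fits in a 32-bit pointer. 0x40 is the
# Linux constant; Python's mmap module doesn't export it by name.
MAP_32BIT = 0x40

m = mmap.mmap(-1, 4096,
              flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | MAP_32BIT)
addr = ctypes.addressof(ctypes.c_char.from_buffer(m))
print(hex(addr))  # a low address, representable in 4 bytes
assert addr < 2**32
```

Everything else in the process (malloc, shared libraries, the stack) is still
free to land above 4 GiB, which is exactly the limitation described above.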

On x86 CPUs since the 386, you don't need a hack like that to reach the wider
registers from a narrower mode: you can literally access the full 32-bit
registers from 16-bit modes with an operand-size prefix.
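The prefix mechanism is visible right in the instruction encoding; a small
illustrative check of the raw bytes (the immediate values are arbitrary):

```python
# In 16-bit code on a 386+, a 0x66 operand-size prefix flips an
# instruction to a 32-bit operand, so the full EAX is reachable even
# from real mode or 16-bit protected mode. Raw encodings (not executed):
mov_ax = bytes([0xB8, 0x34, 0x12])                     # mov ax, 0x1234
mov_eax = bytes([0x66, 0xB8, 0x78, 0x56, 0x34, 0x12])  # mov eax, 0x12345678

# Same opcode byte (0xB8 = mov ax/eax, imm); only the prefix differs.
assert mov_eax[0] == 0x66
assert mov_eax[1] == mov_ax[0] == 0xB8
```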

------
metacorrector
yeah but... the real problem lies elsewhere...

VMware (and that style of virtualization) is so popular because of the utter
failure of OSes to do what they are supposed to do.

Protected multiuser multitasking operating systems are supposed to be able to
protect processes from impinging on one another, and they can't do it. The
result is that on a very large scale we have datacenters that spawn VMware
virtual machine after VMware virtual machine, a "make everything look like a
nail so we can use this hammer" approach that is wasting huge amounts of
resources.

There would still be a use for VMware if OSes worked, but nothing like we see
today. People should get back to work on fixing OSes so a cloud host could
actually run processes from different customers in a lightweight way and have
them not screw each other up.

~~~
Karellen
"Protected multiuser multitasking operating systems are supposed to be able to
protect processes from impinging on one another, and they can't do it."

Can you elaborate? In what way are they failing here?

~~~
lmm
Say you have three unreliable programs that occasionally leak memory, spin the
CPU, etc. (yes, it would be nice if all our programs were perfect, but they're
not). It should be possible to run these three programs on the same server in
such a way that the failure of one won't affect the others. So e.g. you have 3
servers, each runs one instance of each program, you load balance between
them, and you have some system that eventually detects when one of the
instances fails; as far as the outside world is concerned, all your programs
are running reliably. At the moment, the most practical way to do this is to
run 3 different VMs on each server, one for each program. Which is insane.
There are some encouraging recent developments (e.g. docker/lxc), but it
should be easy to do that kind of isolation purely at the OS level.

~~~
Karellen
In what way does the shell builtin "ulimit" and a "while (true); do command;
done" loop not suffice for this case?
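For the leak-and-spin failure mode, a rough Python sketch of what that
`ulimit`-plus-restart-loop combination amounts to (the resource cap and the
child command here are illustrative assumptions, not a recommendation):

```python
import resource
import subprocess
import sys

# Rough analogue of "ulimit -v ...; while true; do cmd; done": cap the
# child's address space, run it, and restart it each time it exits.
LIMIT_BYTES = 1 << 30  # 1 GiB virtual-memory cap (illustrative)

def apply_limits():
    # Runs in the child between fork and exec (POSIX only).
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT_BYTES, LIMIT_BYTES))

def supervise(cmd, restarts=3):
    codes = []
    for _ in range(restarts):
        codes.append(subprocess.call(cmd, preexec_fn=apply_limits))
    return codes

# A well-behaved child exits 0 each time; a leaking one would hit the
# limit, die, and be restarted by the loop.
print(supervise([sys.executable, "-c", "print('ok')"], restarts=2))
```

The shell builtin applies the same setrlimit() caps (RLIMIT_AS, RLIMIT_CPU,
etc.); what it doesn't give you is the filesystem/namespace isolation the
parent comment is asking for.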

------
godzillabrennus
Great share!

