
XSA-156: x86: CPU lockup during exception delivery - yuhong
http://xenbits.xen.org/xsa/advisory-156.html
======
userbinator
In other words, the triple-fault[1] is broken? This looks like a bad hardware
bug, a _really bad_ one. AFAIK on real hardware it does what it should, i.e.
causes the CPU to reset. The fact that OSs in the past have relied on
triple-faulting to cause a reset[2] makes this all the more unusual. Then again, I
suppose no one has really tried to run MS-DOS and related software in Xen...

 _the vulnerability can be avoided altogether if the guest kernel is
controlled by the host rather than guest administrator_

That sort of defeats the point of using a VM, doesn't it?

[1]
[https://en.wikipedia.org/wiki/Triple_fault](https://en.wikipedia.org/wiki/Triple_fault)

[2]
[http://www.rcollins.org/Productivity/TripleFault.html](http://www.rcollins.org/Productivity/TripleFault.html)

~~~
zvrba
Triple fault? It seems that double fault exception (abort) should be triggered
first.

~~~
userbinator
I believe the phrase "it is architecturally specified that these would be
delivered sequentially" means that #DF doesn't always occur, depending on what
the two exception types were; this goes back to the 80386:

[http://intel80386.com/386htm/s09_08.htm](http://intel80386.com/386htm/s09_08.htm)

That has always been there, but I guess the wording is a bit unclear/the edge
case where a "benign exception" occurs while handling another one was never
really considered. If I had time I'd try these scenarios on real hardware to
see if double or triple-fault happens, or if the CPU does get stuck in a loop.

The real problem might not be this edge-case itself, if real hardware can also
get into an infinite loop (after all, some process running in a VM can easily
execute one of those); it's the fact that the host loses control of the
virtualised CPU.
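
The "delivered sequentially" versus #DF distinction can be sketched as a small lookup. This is only an illustration, assuming the benign / contributory / page-fault classes from the double-fault table in the Intel manuals; the function names are hypothetical, and a fault raised while delivering #DF itself (vector 8) is deliberately left out.

```python
# Exception classes by vector number, as assumed from the Intel SDM
# double-fault table. Vector 8 (#DF) is intentionally not modelled.
BENIGN = {1, 2, 3, 4, 5, 6, 7, 9, 16, 17, 18, 19}  # includes #DB (1) and #AC (17)
CONTRIBUTORY = {0, 10, 11, 12, 13}                  # #DE, #TS, #NP, #SS, #GP
PAGE_FAULT = {14}                                   # #PF


def classify(vector):
    """Return the class name for an exception vector (hypothetical helper)."""
    if vector in BENIGN:
        return "benign"
    if vector in CONTRIBUTORY:
        return "contributory"
    if vector in PAGE_FAULT:
        return "page fault"
    raise ValueError(f"vector {vector} is not modelled in this sketch")


def second_fault_outcome(first, second):
    """What happens when `second` is raised while delivering `first`."""
    a, b = classify(first), classify(second)
    if a == "contributory" and b == "contributory":
        return "double fault"
    if a == "page fault" and b in ("contributory", "page fault"):
        return "double fault"
    # Every other combination, including benign-on-benign, is handled serially.
    return "sequential"


if __name__ == "__main__":
    print(second_fault_outcome(17, 17))  # #AC while delivering #AC
    print(second_fault_outcome(13, 13))  # #GP while delivering #GP
```

Under these assumptions #AC and #DB are both benign, so a repeat of either one during its own delivery never escalates to #DF, which matches the loop described in the advisory.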

~~~
yuhong
Yep, I think they said the problem is the CPU hanging in an infinite loop in
microcode with not even SMIs being delivered.

------
comex
Two things I'm wondering:

- What kind of performance impact does the workaround have?

- Will Intel or AMD be able to fix this in microcode (by making it do the
right thing if an external interrupt or NMI arrives)?

~~~
kogepathic
> Will Intel or AMD be able to fix this in microcode (by making it do the
> right thing if an external interrupt or NMI arrives)?

What I want to know is: does this affect other hypervisors as well? If this is
a bug related to the CPU, why haven't we heard from KVM, VMware, etc about it?

I can't believe Xen basically just said "run PVM or get pwned"

~~~
yuhong
If the hypervisor already intercepts #DB and #AC, it is not affected.

MS has also released a fix:
[https://technet.microsoft.com/en-us/library/security/3108638](https://technet.microsoft.com/en-us/library/security/3108638)

KVM:
[https://lkml.org/lkml/2015/11/10/214](https://lkml.org/lkml/2015/11/10/214)
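
For anyone unfamiliar with what "intercepts #DB and #AC" means in practice: on Intel VT-x the hypervisor sets bits in a 32-bit exception bitmap, one bit per vector, and a set bit makes that exception cause a VM exit instead of being delivered inside the guest. A toy model of that bitmap, with hypothetical helper names and no real hypervisor API involved:

```python
DB_VECTOR = 1   # #DB, debug exception
AC_VECTOR = 17  # #AC, alignment check


def intercept(bitmap, *vectors):
    """Return the bitmap with the bit for each given vector set (toy model)."""
    for v in vectors:
        bitmap |= 1 << v
    return bitmap & 0xFFFFFFFF  # the bitmap is modelled as a 32-bit field


def is_intercepted(bitmap, vector):
    """True if an exception with this vector would exit to the hypervisor."""
    return bool((bitmap >> vector) & 1)


if __name__ == "__main__":
    bitmap = intercept(0, DB_VECTOR, AC_VECTOR)
    print(hex(bitmap))
    print(is_intercepted(bitmap, AC_VECTOR))
```

With both bits set, the hypervisor regains control on every #DB or #AC and can reinject it itself, so the guest cannot keep the CPU stuck in back-to-back deliveries.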

------
nnx
This looks bad.

Did AWS comment on this yet?

To my limited understanding of the advisory, Xen's recommended mitigation
would be for AWS to "convert" all EC2 HVM instances to PVM?

Is that even possible?

~~~
yuhong
They most likely already patched it.

