
XSA-108 Advisory - ukandy
http://xenbits.xen.org/xsa/advisory-108.html
======
sudowhodoido
Well looks like Mr De Raadt was right again:

 _' x86 virtualization is about basically placing another nearly full kernel,
full of new bugs, on top of a nasty x86 architecture which barely has correct
page protection. Then running your operating system on the other side of this
brand new pile of shit.

You are absolutely deluded, if not stupid, if you think that a worldwide
collection of software engineers who can't write operating systems or
applications without security holes, can then turn around and suddenly write
virtualization layers without security holes.'_

Source: [http://marc.info/?l=openbsd-misc&m=119318909016582](http://marc.info/?l=openbsd-misc&m=119318909016582)

Personally, I have hope for things like cgroups/jails and MAC/SELinux over
virtualization.

~~~
joosters
This is just FUD and in the usual insulting De Raadt style of communicating.

'barely has correct page protection' is just a way of saying 'has correct page
protection, but I want to be really snotty about it'. So, highlighting a non-
problem.

No-one is claiming that virtualization makes a system magically completely
secure, but do people actually believe that it makes it _less_ secure?
(Compared to, running the same software on the same hardware using a single
OS). I don't think so.

~~~
sudowhodoido
Not really. He's right. The memory, paging and protection model for X86 is
fugly at best. It's very easy to hang yourself as demonstrated.

You can be as insulting as you like if you compare X86 (and X86-64) to SPARC
and POWER which is what I assume he is doing here considering he provides
operating systems for multiple architectures. We're talking about an
architecture that started with the 8086 and despite changes to the underlying
microcode architecture, the front end ISA and system interface are still
plagued with poorly designed extensions hacked on.

Regarding virtualization, any sharing of resources, particularly at a hardware
level is an attack vector if not implemented correctly. Whether or not it is
implemented correctly or is exploitable is merely a matter of time and effort
as demonstrated here. That is, unless it is mathematically verified, which it
isn't and, given the evolved x86 architecture, probably can't be. So it can't
be more secure and is unlikely to be as secure. That leaves only less secure.

~~~
trekkin
I implement high-performance software systems in C++ as my day job. The
software has to compile and run on Linux, Solaris, and AIX. The same code is
2x slower on AIX (Power) and 3x-5x slower on Solaris (Sparc) than on Linux
(x86). So say whatever you want about theoretical differences in
architectures, but in the real world Sparc and Power systems are absolutely
not competitive, both on price (absolute $$, and per CPU) and performance (per
CPU - they do have more cpu cores, usually).

~~~
Someone1234
That's just a circular way of saying "x86 is more popular, therefore better."
That doesn't address the point above that x86 is inferior in terms of its
design.

Of course x86 is going to be faster per dollar spent. One is mass market
(x86-64) and the other two are hugely niche (Sparc and Power). Plus the Linux
kernel has by far the most human-hours spent on its development relative to
every other operating system in the world.

There's also a reason why some of x86's market share has been eaten up by ARM.
Moving from x86 to ARM was hugely expensive by all measures, but it was
worthwhile because x86 was so wasteful.

~~~
mbreese
It's not just "x86 is more popular, therefore better." It's that the
performance of x86 was better than SPARC or Power. Regardless of the cost of
the chip, performance is what is really important here. In some instances,
performance per watt is more important, but either way... it's performance
that's key, not market forces driving cost savings.

I haven't had much experience with SPARC, but I've done some work on Power
systems (long ago). Back then (10-ish years ago), Power chips were more
powerful than their x86 contemporaries. But at some point, that relationship
switched.

However, I wonder how much of this is the chip, and how much is the tooling.
It's been a while since I've needed to think about C/C++ compiling, but from
what I remember, the Intel compiler produced (slightly) faster binaries than
gcc. Now this is where popularity could prove to be decisive... if the
compiler that the OP uses works for x86, SPARC, and Power, how much do you
suspect each of those architectures has been optimized? Even if the non-x86
chip itself is capable of running faster than x86, if the toolchain isn't
similarly optimized, they could end up having worse performance.

------
walterbell
Thanks to Jan Beulich, the SUSE Xen maintainer in Germany who is credited with
finding this x86 HVM vulnerability.

It would be helpful if errata announcements included documentation of the
static analysis tools, code review process or automated testing techniques
which identified the weakness, along with a postmortem of previous audits of
relevant code paths.

What made it possible for this issue to be identified now, when the issue
escaped previous analysis, audits and tests? Such process improvement
knowledge is possibly more valuable to the worldwide technical community than
any point fix.

Heartbleed was discovered by an external party, but this issue which affects
the data of millions of users was found by the originating open-source
project. Kudos to Jan for finding this cross-domain escalation.

~~~
seanp2k2
Was [http://www.brendangregg.com/blog/2014-09-15/the-msrs-of-
ec2....](http://www.brendangregg.com/blog/2014-09-15/the-msrs-of-ec2.html) the
reason they were looking?

~~~
walterbell
Don't know. A downthread post
([https://news.ycombinator.com/item?id=8393911](https://news.ycombinator.com/item?id=8393911))
links to the Dec 2010 Intel patch which introduced the bug. It may have
originated with a Red Hat KVM patch in June 2009,
[https://lkml.org/lkml/2009/6/29/205](https://lkml.org/lkml/2009/6/29/205)

------
brendangregg
I have to wonder if my recent blog post, "The MSRs of EC2", on September 15th
has prompted this discovery: [http://www.brendangregg.com/blog/2014-09-15/the-
msrs-of-ec2....](http://www.brendangregg.com/blog/2014-09-15/the-msrs-of-
ec2.html) . When I posted that, I could find no examples of MSR usage in EC2
or Xen.

I haven't checked after the reboot, but I hope the MSRs I'm using can still be
accessed: IA32_MPERF and IA32_APERF (to calculate real CPU MHz);
IA32_THERM_STATUS and MSR_TEMPERATURE_TARGET (to calculate CPU temperatures);
and MSR_TURBO_RATIO_LIMIT and MSR_TURBO_RATIO_LIMIT1 (to see turbo ratios).

I use them in the showboost, cputemp, and cpuhot MSR-based tools:
[https://github.com/brendangregg/msr-cloud-tools](https://github.com/brendangregg/msr-cloud-tools)

~~~
mzs
You should email Jan and ask, I'm curious too.

------
alexduros
interesting link ... from 2010 [http://xen.1045712.n5.nabble.com/x2APIC-
emulation-for-HVM-gu...](http://xen.1045712.n5.nabble.com/x2APIC-emulation-
for-HVM-guest-td3288786.html)

~~~
mzs
good find:

    
    
        @@ -2189,6 +2190,11 @@
                 *msr_content = vcpu_vlapic(v)->hw.apic_base_msr;
                 break;
         
        +    case MSR_IA32_APICBASE_MSR ... MSR_IA32_APICBASE_MSR + 0x3ff:
        +        if ( hvm_x2apic_msr_read(v, msr, msr_content) )
        +            goto gp_fault;
        +        break;
        +
             case MSR_IA32_CR_PAT:
                 *msr_content = v->arch.hvm_vcpu.pat_cr;
                 break;
        @@ -2296,6 +2302,11 @@
                 vlapic_msr_set(vcpu_vlapic(v), msr_content);
                 break;
         
        +    case MSR_IA32_APICBASE_MSR ... MSR_IA32_APICBASE_MSR + 0x3ff:
        +        if ( hvm_x2apic_msr_write(v, msr, msr_content) )
        +            goto gp_fault;
        +        break;
        +
             case MSR_IA32_CR_PAT:
                 if ( !pat_msr_set(&v->arch.hvm_vcpu.pat_cr, msr_content) )
                    goto gp_fault;

~~~
diydsp
What is the "..." operator? I have never seen that before. I can't find any
references to it. Is that a macro specific to this project? [I checked the
post above, but it doesn't match this source code exactly (and doesn't have
... as an operator).]

~~~
_delirium
It's a GNU extension that lets you define ranges in switch statements' cases:
[https://gcc.gnu.org/onlinedocs/gcc/Case-Ranges.html](https://gcc.gnu.org/onlinedocs/gcc/Case-Ranges.html)

------
asb
Can somebody please confirm that it is impossible to boot a HVM system on
Linode? The hypervisor on my Linode host certainly supports HVM (according to
cat /sys/hypervisor/properties/capabilities). The host Xen is 4.1 and
therefore vulnerable if another user could be running HVM guests.

~~~
akerl_
There is no such thing as an HVM Linode server, all Linodes are PV.

~~~
asb
Thanks for the confirmation. Hoping to see PVH support in the future!

------
eik3_de
It will be interesting to see which providers didn't get the embargoed
release.

So if you get a reboot announcement from your Xen VPS provider after today
12:00Z, you should list them here.

~~~
ukandy
Pre-disclosure list is at the bottom of this page:
[http://www.xenproject.org/security-policy.html](http://www.xenproject.org/security-policy.html)

~~~
danielweber
Huh, I have a tiny machine at one of those smaller places, and they are on the
list. Good to know the smaller players can build up a reputation for
embargoing, too.

~~~
timoth
Interesting that over half of the companies on the list were added within the
last week, if the dates in the page changelog accurately reflect when they
were added. If so, perhaps they all suddenly bundled in so they could find out
what the embargoed vuln. was.

------
spydum
Ouch, isn't one of the cardinal sins of hypervisors reading memory outside of
your allocation? Certainly explains the embargo

------
aus_
I assume this is why AWS forcefully rebooted many of their VMs recently?

~~~
mukyu
There is no need to assume anything.

""" Yesterday we started notifying some of our customers of a timely security
and operational update we need to perform on a small percentage (less than
10%) of our EC2 fleet globally.

AWS customers know that security and operational excellence are our top two
priorities. These updates must be completed by October 1st before the issue is
made public as part of an upcoming Xen Security Announcement (XSA). Following
security best practices, the details of this update are embargoed until then.
The issue in that notice affects many Xen environments, and is not specific to
AWS. """

[0] [http://aws.amazon.com/blogs/aws/ec2-maintenance-update/](http://aws.amazon.com/blogs/aws/ec2-maintenance-update/)

------
comex
I wonder if Amazon or someone will take the time to make a ksplice-like system
for Xen so that future security upgrades probably won't have to go through
such disruptive reboot events.

(Or, for that matter, whether they considered making an ad-hoc machine code
patch - based on the source patch, it looks like it would probably be doable
just by changing a few bytes. I guess it's a bit risky...)

~~~
semenko
Seems a little unlikely, given Ksplice's patents (now Oracle's patents)
covering the area.

From an older post @
[https://news.ycombinator.com/item?id=2791756](https://news.ycombinator.com/item?id=2791756)

The first is "Method of finding a safe time to modify code of a running
computer program": [http://bit.ly/ksplice-1](http://bit.ly/ksplice-1)

The second is "Method of determining which computer program functions are
changed by an arbitrary source code modification":
[http://bit.ly/ksplice-2](http://bit.ly/ksplice-2)

~~~
comex
Sigh. That's obnoxious - yet another example of software patents confusing
proving that an idea is commercially valuable with inventing it in the first
place. Anyone with the requisite skills in reverse engineering, compilers,
etc. could have told you that hot patching functions in memory is possible and
would take at most a few minutes to notice that this may be unsafe if some
suspended thread is sitting in a function prolog. Yet "identifying a portion
of executable code to be updated in a running computer program; and
determining whether it is safe to modify the executable code of the running
computer program without having to restart the running computer program" (an
actual claim, not the abstract or title quotes that people tend to
misconstrue) is now locked out for the next decade or so.

------
mappu
The advisory says PV is safe and HVM is dangerous. What about PV-HVM?

Got a reasonably long list of VPS providers to submit tickets to.

~~~
liuw
PVHVM is HVM with PV drivers, so it's still HVM. Don't hesitate to submit your
tickets.

------
pjungwir
I had "dedicated" AWS instances that were rebooted. A dedicated instance means
there is only one guest per box, right? So I'm curious why those had to be
rebooted if there is no network-facing vector to this vulnerability. I guess
because we could have read from the hypervisor's memory?

~~~
hoov
I believe that "dedicated" means that it's dedicated to your account. So,
there can be more than one guest, but those guests are your EC2 instances.

~~~
pjungwir
Ah, thanks for the clarification! I guess that still means they were rebooting
our machines in case the malicious actor was _us_. :-)

------
walterbell
_Desktop_ risk analysis from Qubes, [http://qubes-os.org](http://qubes-os.org)
via [https://groups.google.com/forum/m/#!topic/qubes-
devel/HgQ_aW...](https://groups.google.com/forum/m/#!topic/qubes-
devel/HgQ_aWt-EBU)

\---

This seemingly looks like a serious problem, but if we think a little bit
about the practical impact the conclusion might be quite different.

First, there are really no secrets or keys in the hypervisor memory that might
make a good target for an exploit here. Xen hypervisor does not do encryption,
neither it deals with any storage subsystems. Also there is no explicit guest
memory content intermixed with the hypervisor code and data.

But one place to see pieces of potentially sensitive data are the Xen internal
structures where the guest _registers_ are stored whenever the guest execution
is interrupted (e.g. because of a trap). These registers might contain e.g.
(parts of) keys or other secrets, if the guest was executing some sensitive
crypto operation just before it got interrupted.

The vulnerability allows to read only a few kB of the hypervisor memory, with
only relative addressing from the emulated APIC registers page, whose address
is not known to the attacker. Still, for the exactly same systems (same
binaries running, same ACPI tables, etc) it's likely that the attacker would
be able to guess the address of the APIC page. However, it is much less
probable she would be able to predict what Xen structures are located in the
adjacent memory. Much less the attacker would be able to control what
structures are located there, as there doesn't seem to be many ways of how a
malicious HVM might be significantly affecting the layout of the hypervisor
heap (e.g. force arch_vcpu structures of interesting domains to appear
nearby).

Nevertheless, it might happen, by pure coincidence, that an arch_vcpu
structure with a content of an interesting VM will just happen to be located
adjacently to the emulated APIC page.

In that case, the next problem for the attacker would be lack of control and
knowledge over the target VM execution: even if the attacker were somehow
lucky to find the other VM's register-holding-structure adjacent to the APIC
page, it would still be unclear what the target VM was executing at the time
it was suspended and so, whether the registers stored in the structure are
worthwhile or not.

It is thinkable that the attacker might attempt to use some form of a
heuristic, such as e.g. "if RIP == X, then RAX likely contains (parts of) the
important key", hoping that this specific RIP would signify a specific
interesting instruction (e.g. part of some crypto library) being executed
while the VM was interrupted, and so the key is to be found in one of the
registers.

But the attacker's memory reading exploit doesn't offer a comfort of
synchronization, so even though the attacker might be so extremely lucky as to
find out that

    
    
      *(apic_page + guessed_offset_to_rip) == X 
    

(the attacker here assumes the 'guessed_offset_to_rip' is the distance between
the APIC page and the address where RIP is stored in the presumable arch_vcpu
structure, that presumably is located adjacently), still there are no
guarantees that the next read to

    
    
      *(apic_page + guessed_offset_to_rax) 
    

will return the content of RAX from the same moment that RIP was snapshot (and
which the attacker considered interesting).

Arguably the attacker might try to fire up the attack continuously, thus
increasing chances of success. Assuming this won't cause system to crash due
to accessing non-mapped memory, this might sound like a somehow good strategy.

However, in case of a desktop system like Qubes OS, the attacker has very
limited control over other domains. Unlike as in case of attacking a VM
playing a role of a Web server for instance, the attacker probably won't be
able to force the target VMs to do lots of repeated crypto operations, neither
choose moments when the target VM traps.

It seems like exploiting this bug in an IaaS scenario might be more practical,
though, as the attacker also has some control of domain creation/termination,
so can affect Xen heap to some extent. But on a system like Qubes OS, it seems
unlikely.

So, are we doomed? We likely are, but probably not because of this bug.

\---

------
ivank
Some impact analysis from the Qubes OS project:
[https://groups.google.com/forum/#!msg/qubes-devel/HgQ_aWt-
EB...](https://groups.google.com/forum/#!msg/qubes-devel/HgQ_aWt-
EBU/8VWzu2IrQdQJ)

------
otterley
Patches to XenServer (releases 6.0 through 6.2) are now available from Citrix:
[https://support.citrix.com/article/CTX200218](https://support.citrix.com/article/CTX200218)

------
regularfry
> A buggy or malicious HVM guest can crash the host

Given that writes are no-ops, I don't understand this mechanism. Can someone
explain it, please?

------
chippy
Interesting that the patch is just a one-character difference (applied in two
locations)

0x3ff to 0xff

~~~
boardwaalk
Seems to me like they should be using constants. Especially since this implies
the 0xff/256 is elsewhere as well:

 _elsewhere a 256 rather than 1024 wide window is being used too_

~~~
tarpherder
*0xff/255

Simple but potentially dangerous right? :P

------
fndrplayer13
Heh, it's been a rough couple of weeks.

