
New cloud attack takes full control of virtual machines with little effort - mikecarlton
http://arstechnica.com/security/2016/08/new-attack-steals-private-crypto-keys-by-corrupting-data-in-computer-memory/
======
userbinator
The rowhammer "attack" is successful only because the hardware is just plain
_broken_ , and I consider it in the same category as things like a CPU which
will calculate 1+1=3 if the computation of 1+1 is done enough times ---
nothing software should even try to fix, because the problem is at a lower
level. The solution is to demand that the hardware manufacturers make memory
which actually works like memory should; and it should be possible, since
apparently previous generations of RAM don't have this problem at all. In the
mid-90s Intel recalled and replaced, free of charge, CPUs which didn't
divide correctly. Perhaps the memory manufacturers today should do the same
for rowhammer-affected modules and chips.

Memory errors are particularly disturbing because they are often highly
dependent on data and access patterns, and can be _extremely_ difficult to
pinpoint without special testing tools. I've personally experienced a
situation where a system which otherwise appears to work perfectly well would
_always_ corrupt one specific bit of a file when extracting one particular
archive.

As a testing tool, MemTest86+ has always worked well for me, and the newer
versions can detect rowhammer, although there is this interesting discussion
about whether it is actually a problem (to which I say a resounding _YES!!!_ )
or if there's some sort of cover-up by the memory industry:

[http://www.passmark.com/forum/memtest86/5903-rowhammer-probl...](http://www.passmark.com/forum/memtest86/5903-rowhammer-problem-not-found-by-memtest86-6-3-0)

[http://www.passmark.com/forum/memtest86/5475-memtest86-v6-2-...](http://www.passmark.com/forum/memtest86/5475-memtest86-v6-2-0-released-8-sept-2015)

Run it on your hardware and if it fails, I think you should definitely
complain and get it fixed.

~~~
niftich
> The rowhammer "attack" is successful only because the hardware is just plain
> _broken_

I too am of this opinion and am surprised this view isn't widely shared. With
DDR4, we should be asking for a refund and/or starting a class-action suit,
yet we're putting up with software 'mitigations' instead.

This isn't like the 2008 Phenom TLB bug [1] where the CPU was locking up so
AMD released a workaround that kept it from freezing at the expense of a 14%
performance penalty. This is like the floating point division bug [2] where
the device no longer meets basic operational and accuracy guarantees. RAM
cells bleeding into each other ought to be considered a fatal flaw, not some
intellectual curiosity.

[1] [http://techreport.com/review/13741/phenom-tlb-patch-benchmar...](http://techreport.com/review/13741/phenom-tlb-patch-benchmarked/4)

[2]
[https://en.wikipedia.org/wiki/Pentium_FDIV_bug](https://en.wikipedia.org/wiki/Pentium_FDIV_bug)

~~~
userbinator
_I too am of this opinion and am surprised this view isn't widely shared.
With DDR4, we should be asking for a refund and/or starting a class-action
suit, yet we're putting up with software 'mitigations' instead._

I extensively test all the hardware I buy (CPU: LINPACK, RAM: MemTest86+) and
if it fails any of those tests, it gets returned as "not fit for purpose".
I've done this successfully a few times. A lot of other enthusiasts/power
users do the same too, especially if they're overclocking, and searches on
other forums show plenty of users testing and finding (mostly other, not
rowhammer) errors in newly-bought RAM even when not overclocking. But as noted
in the threads I linked to, manufacturers may be trying to cover this up and
downplay its severity. Even in the original paper on rowhammer, the authors
didn't disclose which manufacturers and which modules were affected, although
I think this should really be treated like the FDIV bug: name and shame. I
blame political correctness...

~~~
pjmlp
How do you do it, regarding LINPACK?

I assume just compiling it and execute some tests that are part of it?

~~~
userbinator
The Intel LINPACK distribution contains, besides the library, a sample
benchmarking application using it, and that happens to be a very intense and
"real" workload (solving systems of equations, i.e. scientific computation.)
There are plenty of posts on various PC enthusiast forums about how to run it
correctly. (And plenty arguing that it's irrelevant, mostly because their
insane overclock _seems_ fine but instantly fails this test. There's a good
reason most doing "real" scientific computing don't overclock; a lot of CPUs
just barely pass this absolutely realistic test with stock speeds and
voltages.)

------
andrewstuart2
It seems the HN title and original title are _both_ pretty wrong, at least
according to the article content. The attack vector is really the ability to,
if you have a known public key and a server using it, perform a pre-calculated
bit flip such that the new public key is much easier to factor, and thus
obtain a corresponding private key.

So you're not obtaining original private keys, you're altering original public
keys so that you can more quickly factor a private key that will be accepted.
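
A rough sketch of why a single flipped bit matters, assuming a toy modulus built from two known Mersenne primes rather than a real 2048-bit key. The helper names (`flip_bit`, `small_prime_factors`, `weak_flips`) are made up for illustration and this is not the paper's code. It only shows that a flipped modulus is usually no longer a product of two large primes, so even plain trial division often finds a factor.

```python
# Toy stand-in for an RSA modulus: the product of two Mersenne primes.
# Real moduli are far larger; this size just keeps the demo fast.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q


def flip_bit(n, i):
    """Return n with bit i inverted, as a single memory bit flip would do."""
    return n ^ (1 << i)


def small_prime_factors(n, bound=10_000):
    """Trial-divide n by every d < bound.

    Returns (list of prime factors found, remaining cofactor).
    Composite d never divide here because their prime parts were
    already divided out.
    """
    factors = []
    d = 2
    while d < bound and n > 1:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    return factors, n


def weak_flips(n, bound=10_000):
    """Bit positions whose flip yields a modulus with a factor below bound."""
    return [i for i in range(n.bit_length())
            if small_prime_factors(flip_bit(n, i), bound)[0]]


if __name__ == "__main__":
    hits = weak_flips(N)
    print(f"{len(hits)} of {N.bit_length()} single-bit flips expose a small factor")
```

Flipping bit 0 is the trivial case, since the modulus becomes even. A real attacker still has to factor the corrupted modulus completely, which takes more than trial division, but that is a much easier problem than factoring the original.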

If this is an SSH public key, then you can obtain SSH access. If it's a PGP
key trusted by the package manager, then you can craft signatures on packages
that _would_ be accepted as valid, assuming you can also get the target
machine to download said package.

I think SSH is probably the most interesting attack vector assuming you can
get network access to the host once you've jumped through the myriad hoops to
perform this attack.

It's a serious issue that should be addressed (probably via forced from-disk
reads or at minimum integrity checks), but I think the authors are perhaps a
little too eager on the practical implications of corrupting in-memory public
keys.

~~~
DigitalJack
It's the sort of thing for which the NSA would spend their resources to
develop an exploit tool.

~~~
userbinator
Rowhammer is such a subtle effect and very easily blamed on many other things
that it's not hard for the more paranoid among us to imagine the NSA
deliberately sabotaging memories with it to use as a backdoor. When it was
first discovered I wrote my thoughts on it here:

[https://news.ycombinator.com/item?id=8716977](https://news.ycombinator.com/item?id=8716977)

------
xorgar831
Here's the crux of the memory issue from one of the links in the article:

DDR memory is laid out in an array of rows and columns, which are assigned in
large blocks to various applications and operating system resources. To
protect the integrity and security of the entire system, each large chunk of
memory is contained in a "sandbox" that can be accessed only by a given app or
OS process. Bit flipping works when a hacker-developed app or process accesses
two carefully selected rows of memory hundreds of thousands of times in a tiny
fraction of a second. By hammering the two "aggressor" memory regions, the
exploit can reverse one or more bits in a third "victim" location. In other
words, selected zeros in the victim region will turn into ones or vice versa.
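
To make the access pattern concrete, here is a toy model in Python. It is only a sketch: real DRAM disturbance is analog and probabilistic, while this model flips one fixed bit deterministically. The `ToyDram` class, the `hammer` helper and the `threshold` value are all invented for illustration. It covers only the two-aggressor case described in the quote.

```python
class ToyDram:
    """Toy DRAM: a victim row flips a bit once BOTH neighbours were
    activated `threshold` times since the last refresh (invented rule)."""

    def __init__(self, rows, row_bytes, threshold):
        self.rows = [bytearray(row_bytes) for _ in range(rows)]
        self.threshold = threshold
        self.activations = [0] * rows

    def refresh(self):
        # A refresh recharges the cells, so accumulated disturbance is lost.
        self.activations = [0] * len(self.rows)

    def read(self, row):
        self.activations[row] += 1
        for victim in (row - 1, row + 1):
            self._maybe_flip(victim)
        return bytes(self.rows[row])

    def _maybe_flip(self, victim):
        above, below = victim - 1, victim + 1
        if above < 0 or below >= len(self.rows):
            return  # this model needs an aggressor row on both sides
        if (self.activations[above] >= self.threshold
                and self.activations[below] >= self.threshold):
            self.rows[victim][0] ^= 0x01  # flip one bit in the victim row
            self.activations[above] = 0
            self.activations[below] = 0


def hammer(dram, row_a, row_b, times):
    """Alternate reads of the two aggressor rows."""
    for _ in range(times):
        dram.read(row_a)
        dram.read(row_b)


if __name__ == "__main__":
    dram = ToyDram(rows=3, row_bytes=8, threshold=1000)
    hammer(dram, 0, 2, 1000)
    print("victim row after hammering:", dram.rows[1].hex())
```

In this model the attacker never writes to row 1, yet its contents change. Calling `refresh()` often enough keeps the counters below the threshold and no flip occurs.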

~~~
runeks
So it doesn't allow reading any data? I'm most nervous about leaking private
keys.

~~~
jfoutz
In and of itself, no. But it could alter a permission bit, for example, and
then reading would be allowed.

------
frostmatthew
This attack wouldn't work with [current versions] of ESXi since VMs now share
pages only if the salt value and contents of the pages are identical (each VM
uses a unique salt by default).
[https://kb.vmware.com/selfservice/microsites/search.do?langu...](https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2097593)
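
A minimal sketch of the salting idea, assuming a dedup table keyed on the salt plus a hash of the page contents. The `PageSharer` class and its method are hypothetical and are not VMware's implementation; a real hypervisor would also compare the pages byte for byte before sharing.

```python
import hashlib


class PageSharer:
    """Hypothetical dedup table: pages share a frame only when both the
    salt and the contents match."""

    def __init__(self):
        self._frames = {}  # (salt, content digest) -> frame number

    def map_page(self, salt, content):
        """Return the physical frame number backing this page."""
        key = (salt, hashlib.sha256(content).digest())
        if key not in self._frames:
            self._frames[key] = len(self._frames)  # allocate a new frame
        return self._frames[key]


if __name__ == "__main__":
    sharer = PageSharer()
    page = b"\x00" * 4096
    print(sharer.map_page(b"vm-a", page))  # first copy for VM A
    print(sharer.map_page(b"vm-b", page))  # same bytes, different salt
    print(sharer.map_page(b"vm-a", page))  # same salt, same bytes
```

With a unique salt per VM, an attacker VM can no longer get a page of its own merged with an identical page in the victim, which is the step the attack relies on.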

~~~
andrewstuart
Sharing pages seems a big risk to take for saving a little memory.

Why not turn it off entirely?

~~~
ams6110
I would guess if you're a big VM hosting provider and you have thousands of
VMs all running the same version of Windows or Linux distro, that it could add
up to some real savings to have them share common pages.

~~~
andrewstuart
I guess so.

Seems the savings would be somewhat offset by having your whole business
destroyed because it's easy to crack.

~~~
jerf
Conceptually, it's safe. UNIX distributions routinely do the equivalent
operation within single machines, it's a fundamental part of their operating
model.

It's just that in the face of defective hardware, it's not safe. But this is
not surprising, because nothing is safe, so it isn't particularly a criticism
of page sharing. This specific attack may have used it, but Rowhammer is a
powerful tool. This is not the only way it can be used; it is merely an
exemplar.

------
ars
People are focusing too much on the exact specific attack shown here:
Deduplication, modifying a public key, etc. (And proposing solutions like
turning off deduplication, checksums, etc.)

But that's just _this_ attack - the fact that they have that much control over
memory means there are FAR FAR FAR more possible attacks.

If you can control memory to that level then you are limited only by your
imagination.

The only mitigation I can think of at the moment is ECC memory. And shame on
Intel for only supporting that on Xeon.

~~~
andrewstuart
What can you do to attack other VMs if you don't have shared memory with them?

~~~
rasz_pl
depends, do they run node? because there were successful javascript rowhammer
implementations demonstrated

------
rotrux
For those of you worried about your AWS workloads, this may help make ya feel
a (slight) bit better.

[https://forums.aws.amazon.com/thread.jspa?messageID=739485&t...](https://forums.aws.amazon.com/thread.jspa?messageID=739485&tstart=0)

~~~
saltyhiker
Why only slightly better? The response to that forum post, as far as I can
tell, means EC2 is not vulnerable to this attack.

------
walrus01
It is more costly, but this is a good reason to use a dedicated chunk of
memory for every Xen PV domU. No oversubscription!

Allowing multiple domU VMs on the same dom0 (or the equivalent in other
hypervisor platforms) to re-use memory and balloon/contract memory on the fly
is what enables this.

~~~
runeks
Can you point me to some services that provide, specifically, Xen PV VMs with
non-oversubscribed memory?

I'm considering deploying a custom unikernel for protecting the private key
data for my app[1], until I have enough money for a Hardware Security Module.

[1]
[http://security.stackexchange.com/questions/135457/penetrati...](http://security.stackexchange.com/questions/135457/penetration-resistance-of-a-halvm-unikernel)

~~~
walrus01
Sorry, I can't. We use Debian stable + Xen on our own bare metal hardware
machines with from 256 GB to 1 TB of RAM. Never tried to buy a rental VM using
the same dom0+PV setup. All of my off-site VMs are for testing, some cheap
$4/mo type OpenVZ that are basically glorified jails.

------
Animats
Will rowhammer attacks work against ECC RAM? Multibit memory errors should be
detected, even if they can't be corrected.

~~~
strstr
ECC is one of the mitigations, as well as increased refresh rate.

The 'best' solution is better ram: some vendors are more vulnerable than
others.
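
For a feel of what "detected, even if they can't be corrected" means, here is a sketch of a single-error-correcting, double-error-detecting (SECDED) code on 4 data bits, an extended Hamming(8,4) code. This is an assumption-laden miniature: real ECC DIMMs typically protect 64 data bits with 8 check bits, and the function names here are made up.

```python
def encode(data):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 are the classic
    Hamming(7,4) positions with parity bits at 1, 2 and 4.
    """
    d1, d2, d3, d4 = data
    word = [0] * 8
    word[3], word[5], word[6], word[7] = d1, d2, d3, d4
    word[1] = d1 ^ d2 ^ d4
    word[2] = d1 ^ d3 ^ d4
    word[4] = d2 ^ d3 ^ d4
    word[0] = sum(word[1:]) % 2
    return word


def decode(word):
    """Return (status, data); data is None when the error is uncorrectable."""
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    overall = sum(word) % 2
    fixed = list(word)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        fixed[syndrome] ^= 1  # syndrome 0 means the parity bit itself flipped
        status = "corrected"
    else:
        return "uncorrectable", None  # two flips: detected, not fixable
    return status, [fixed[3], fixed[5], fixed[6], fixed[7]]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    word = encode(data)
    one = list(word); one[6] ^= 1
    two = list(word); two[2] ^= 1; two[6] ^= 1
    print(decode(word), decode(one), decode(two))
```

One flipped bit is silently repaired and two are flagged. Three or more flips in the same word can be miscorrected or missed entirely by a code like this, which is why ECC raises the bar rather than closing the hole.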

------
trendia
Would this be a threat to services running on AWS?

~~~
wmf
No, because AFAIK EC2 does not dedupe RAM.

~~~
SBArbeit
What about Hyper-V / Microsoft Azure? Anyone know if they de-dup memory like
this?

~~~
Wakko1
No. Hyper-V has no memory De-dup function. Azure runs on Hyper-V so it's not
vulnerable either.

~~~
saltyhiker
What popular cloud providers _are_ vulnerable?

------
micro_softy
"For the attacks to work, the cloud hosting the VM must have deduplication
enabled so that physical pages are shared between customers."

But the vendor's cloud will not disable sharing pages of physical memory
because ____.

This is a great counterpoint to the salesman trying to sell you on "cloud"
anything.

Why is it less expensive to use the "cloud"?

One reason is because you do not get your own physical server, including your
own RAM.

When the "cloud" buzz began to gain momentum years ago I raised the issue of
not knowing who your "neighbors" were on these physical servers that customers
are sharing with other customers in datacenters.

As usual, these concerns will just fade into the background... again.

------
runeks
Ouch. Before reading this article I was seriously considering deploying a
signing service as a HaLVM (Haskell) Xen PV unikernel running on EC2. The
service would receive its private key after startup, such that the key never
touches disk. Now I'm a lot less inclined to pretend that the Xen interface
actually protects me...

~~~
ploxiln
Xen has had page-table and interrupt vector related security vulnerabilities.
But I don't think EC2 would use non-ECC RAM, so I don't think it's vulnerable
to this "rowhammer" technique. (I also don't think EC2 would do cross-VM page
deduplication, another necessary condition.)

~~~
willvarfar
Perhaps we need more certainty than just "think"?

That AWS don't boast that they are not susceptible to this suggests that
perhaps at least some of their setup is?

~~~
Rezo
The EC2 FAQ [0] states:

"In our experience, ECC memory is necessary for server infrastructure, and all
the hardware underlying Amazon EC2 uses ECC memory."

While ECC does apparently not completely mitigate Rowhammer, it helps.

[0] [https://aws.amazon.com/ec2/faqs/](https://aws.amazon.com/ec2/faqs/)

------
Annatar
_For the attacks to work, the cloud hosting the VM must have deduplication
enabled so that physical pages are shared between customers._

This "Flip Feng Shui" wouldn't work in SmartOS simply because the hypervisor
does not implement memory deduplication.

Good luck with VMware though.

------
tmaly
It seems like a dedicated server would solve this issue in some sense. If
you're not on a shared VM, then an attacker could not affect the memory.

For those that cannot be on a dedicated server, what changes could be made to
the shared VM memory setup to reduce this attack surface?

------
lifeisstillgood
some thoughts:

    
    
      For the attacks to work, the cloud hosting the VMs must have deduplication enabled so that physical pages are shared between customers.
    

This seemingly is an attack where two VMs on the same host can read each
other's memory, if a deduplication flag is set on the VM controller. This
seems to offer cloud hosters some easy (paid-for) upgrades, to be honest.

It's not (afaik) Heartbleed time. It's bad but the effort required is high and
afaik the attacker will replace your key with their key - making it clear you
are compromised.

~~~
acobster
The abstract says the attack allows "flips over arbitrary physical memory in a
fully controlled way." If I'm understanding that correctly, it would be
trivial to then restore the old key alongside it, leaving the victim none the
wiser.

Also, as others have pointed out, this is a hardware issue and the clear
solution is to swap out the vulnerable RAM. Yeah, paying more is an "easy" way
to have peace of mind (if that's even an option for you as a "cloud hoster"),
but that's just backwards IMHO: a security vulnerability on the host's side
should not translate into an upsell.

------
caf
I wonder if it would be worth checksumming public keys and re-checking the
checksum each time it's used?

~~~
runeks
That's security by obscurity (which may work to delay the attacker). If an
attacker can modify your public keys he can modify your checksums as well.

Seems to me that a public key should be identified by a cryptographic hash of
it, rather than the public key itself. Then the attacker would need to
replace the entire hash, rather than just a few bits, because the hash changes
completely just by flipping a single bit in the input.

~~~
caf
The attacker isn't making targeted modifications to your public keys, though:
they're randomly glitching it, and using the page sharing implemented by the
hypervisor to read out and factor the glitched version.

Even with say a 64 bit checksum then there's only a 1 in 2^64 chance of the
randomly modified key/checksum pair matching. But you could use a
cryptographic hash as your checksum if you wanted.

I suggest this not because I think it would be a complete defence against
all Rowhammer attacks - it wouldn't - but because the general fragility of the
RSA construction means that doing it with any potentially corrupted input
gives me the willies. There are sources of bitflips other than Rowhammer
and it just strikes me as a generally good idea not to leak the results of RSA
operations performed on potentially bitflipped inputs.
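
A minimal sketch of that check, assuming the key is held as raw bytes and using SHA-256 as the checksum. The `GuardedKey` class is hypothetical. As noted above in the thread, the stored digest sits in the same kind of memory as the key, so this only helps against flips that are not targeted at both.

```python
import hashlib
import hmac


class GuardedKey:
    """Hypothetical wrapper: record a digest at load time and re-check it
    on every use, refusing to hand out a key that no longer matches."""

    def __init__(self, key_bytes):
        self._key = bytearray(key_bytes)  # mutable so a flip can be simulated
        self._digest = hashlib.sha256(self._key).digest()

    def get(self):
        current = hashlib.sha256(self._key).digest()
        if not hmac.compare_digest(current, self._digest):
            raise ValueError("public key failed integrity check; refusing to use it")
        return bytes(self._key)


if __name__ == "__main__":
    guarded = GuardedKey(b"ssh-rsa AAAA...example-key-bytes")
    guarded.get()            # intact key: returned as usual
    guarded._key[10] ^= 0x04  # simulate one flipped bit in memory
    try:
        guarded.get()
    except ValueError as err:
        print("caught:", err)
```

The point is to fail closed: no RSA operation is performed on a key whose bytes have changed since it was loaded.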

------
arrty88
Is Linode safe?

