
Project Zero: Exploiting the DRAM rowhammer bug to gain kernel privileges - j_baker
http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html
======
ChuckMcM
Once again, I pine for ECC memory on my laptop. I know you can get ECC
SO-DIMMs; I got 16GB worth for a Supermicro ITX motherboard. And while the
paper talks about multi-bit errors getting through ECC (which is certainly
possible with enough flips), single flips causing alerts and double flips
causing halts would really get your attention that something bad was
happening, as opposed to silently sitting there while my memory is shredded.

~~~
ScottBurson
If I understand correctly, Intel doesn't even ship a consumer CPU (i.e., a
non-Xeon) that supports ECC. (Don't know about AMD.)

~~~
rodgerd
> Intel doesn't even ship a consumer CPU (i.e., a non-Xeon) that supports ECC.

Not true - there are some Atoms that do, but they're targeted at NAS-type
uses. It is true, though, that you can't get Core-series processors with ECC.

AMD used to offer very broad support for ECC, but data integrity clearly
didn't win market share.

~~~
sliverstorm
_data integrity clearly didn't win market share_

It requires more expensive, compatible DRAM, right? Knowing that, it shouldn't
really be surprising. Enabling it on die is just one piece of the equation.

~~~
rodgerd
The problem is that if you say "fast, cheap, reliable, pick two" people will
pick "fast" and "cheap". Even people who might ordinarily worry about whether
their workstation is silently corrupting their data.

------
sharkbot
There was an older paper discussing using various methods of fault injection
(heat, voltage changes, etc.) to attack Java smart cards, essentially
destroying the type system guarantees and thus opening up an attack surface:
"The Sorcerer’s Apprentice Guide to Fault Attacks",
[https://eprint.iacr.org/2004/100.pdf](https://eprint.iacr.org/2004/100.pdf)

~~~
bri3d
Fault injection is also how older Dish Network and DirecTV smart cards were
hacked - there used to be a cottage industry selling "voltage glitchers" to
reprogram Dish Network smart cards with the keys for additional programming
tiers.

~~~
makomk
I believe some pay TV smartcard hacks also made use of clock glitching,
basically sending a shorter-than-usual clock pulse that means some of the
internal signals don't make it to their destinations on time. The pay TV
hacking industry had some pretty clever tricks a decade or two ago.

~~~
Scoundreller
They were quite cool.

From memory, I think one card had some internal startup check that checked to
see if its EPROM got marked by the "Black Sunday" countermeasure and then hung
itself.

The hackers, having a ROM dump and knowing how many clock cycles each
instruction took on the CPU, knew that this internal check happened at
around clock cycle 525.

Knowing that the instruction was a "branch if equal" (I think), and that the
instruction took 12 cycles, they figured out which of those 12 caused the
branch to happen, worked out the precise time to glitch (whether via voltage
or a single rapid clock cycle), and caused the CPU to skip changing the
instruction pointer and continue through its ROM code as if the check had
passed.
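The cycle-counting step can be sketched as a running sum over a disassembled, straight-line instruction trace, with cycle 0 being the first instruction of the trace. This is only an illustration: the mnemonics and per-instruction cycle counts below are made up and do not come from any real card's ROM, and a real attack would also have to account for timing jitter and anything that makes the path non-linear.

```python
# Hypothetical sketch: find which clock cycles a given instruction occupies,
# assuming each instruction's cycle count is known from a ROM dump/datasheet.

def cycle_window(trace, index):
    """Return (first_cycle, last_cycle), counting from cycle 0, for trace[index]."""
    start = sum(cycles for _, cycles in trace[:index])
    _, length = trace[index]
    return start, start + length - 1

# Made-up trace: mnemonics and cycle counts are illustrative only.
trace = [
    ("LDA  flag", 4),      # load the "card was marked" flag
    ("CMP  #MARKED", 2),   # compare against the countermeasure's marker
    ("BEQ  hang", 12),     # the branch the glitch has to defeat
    ("JMP  main", 3),
]

first, last = cycle_window(trace, 2)
candidates = list(range(first, last + 1))  # one glitch attempt per cycle of the branch
print(first, last, len(candidates))  # 6 17 12
```

An attacker would then try a glitch at each candidate cycle until the branch is skipped, which is the "which of those 12" search described above.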

Within a month or two, hundreds of thousands of receivers had a man-in-the-
middle device just to glitch reprogrammed cards every time they were started
up.

Apparently the North American provider had tested the same countermeasure in
their South American division, so the North Americans had advance notice of
what they had to do to get back in action.

I recall that, for another system, a small memory chip was required for a
pre-existing man-in-the-middle card, and every electronics supplier went out
of stock overnight. Digi-Key alone sold out of 50k units.

~~~
rasz_pl
Coincidentally, hardware for playing with these types of attacks just got
commoditized:

[https://www.kickstarter.com/projects/coflynn/chipwhisperer-lite-a-new-era-of-hardware-security](https://www.kickstarter.com/projects/coflynn/chipwhisperer-lite-a-new-era-of-hardware-security)

[https://www.assembla.com/spaces/chipwhisperer/wiki](https://www.assembla.com/spaces/chipwhisperer/wiki)

~~~
Scoundreller
Same with the JTAGulator units. 10+ years ago, countermeasures would reprogram
the very-difficult-to-desolder TSOP EEPROM on the receiver.

The manufacturers seemed to use an externally accessible JTAG port to
program the receivers in the factory, which was a convenient boon to hackers,
who didn't even need a screwdriver to reprogram the units through their
parallel ports.

------
kmowery
The research that laid the groundwork for this security work appeared last
year at ISCA, but didn't fully explore the security implications:

[https://www.ece.cmu.edu/~safari/pubs/kim-isca14.pdf](https://www.ece.cmu.edu/~safari/pubs/kim-isca14.pdf)

~~~
userbinator
I noticed the security implications of "memory that doesn't always behave like
memory" when that paper came out a few months ago and was discussed briefly on
HN:

[https://news.ycombinator.com/item?id=8713411](https://news.ycombinator.com/item?id=8713411)

------
j_baker
You know, this makes me wonder. If a car manufacturer or a toy company made a
product that was found to be unsafe, there would be a recall. If hardware
manufacturers make a product that is insecure, will there be a recall?
Unfortunately, I suspect that this is a case where the law hasn't caught up
with technology.

~~~
pdpi
Can you get killed as a result of privilege escalation? The law hasn't caught
up in part because the potential consequences aren't nearly as dire.

~~~
datenwolf
Modern medical technology relies heavily on computers and software. Take an
infusion pump, for example: it is controlled by a microcontroller running
software. Or insulin pumps: some vendors are actually considering adding
Bluetooth to insulin pumps, so that patients can check the pump's status on
their smartphone (or on the upcoming smartwatches). You can also adjust the
infusion rate of an insulin pump to account for ingested sugar. Overdosing
on insulin can send a person into shock and kill.

~~~
tedunangst
If somebody is running their rowhammer exploit on your insulin pump, it's
probably a bit late.

------
jacquesm
Laptops are particularly at risk for stuff like this: components are more
densely packed, may use smaller process sizes, and have less powerful power
supplies, which may be a factor in keeping bits in adjacent rows stable.

That may be why the desktops mentioned are less susceptible: they use
full-size memory modules and have beefy power supplies.

It'd be interesting to repeat the experiments with the laptops running off
their internal battery.

~~~
Aissen
Also, lower DRAM refresh rates mean less power consumption (an easy BIOS-level
change, independent of the OS, and clearly attractive to laptop makers), but
also more exposure to this issue.

------
Aissen
Very little information on time scales. In one case they speak about 5 minutes
vs. 40 minutes (both might be acceptable for an exploit). There's also no
information about how long it took to get a bit flip on each machine in their
hardware table.

And why not name any hardware vendors? I'm guessing they expect people to use
the tool they provided and draw their own conclusions, but I don't understand
why they'd treat them differently from software vendors.

~~~
jacquesm
At a guess, to avoid labeling laptop manufacturers and getting sued if it turns
out that something else was at fault? The DRAM itself might be the culprit
(it probably is), and laptops of a certain brand might come with RAM from
different manufacturers.

~~~
Aissen
I understood the litigation risk. In an integrated system it's always someone
else's fault (DRAM, BIOS, CPU, laptop vendor). IMHO the last integrator (the
one selling you the goods) is always the culprit.

Why would they fear litigation from hardware manufacturers more than from
software vendors? Especially at a company as big as Google?

~~~
acveilleux
They also don't want to say "DellappLenoHP" laptops could not be attacked and
turn out to be wrong. Or maybe they're right but only with factory 2GB modules
used between May '11 and July '13.

Way too many variables to make any claims that are ethically defensible.

~~~
userbinator
They could specify the detailed system configuration with the CPU, chipset,
and DRAM part numbers (including date codes) so others can compare. It's much
better than leaving things in the dark completely.

~~~
thomasdullien
Remember that there is a GitHub repo with code you can use to test your
specific hardware. Why not run it and post results?

------
yuhong
memtest86 etc. should add tests for this if they haven't already, as that is
the best place for such tests.

~~~
otakucode
If they did so... the fallout would be interesting. Does anyone know what
proportion of modern memory has this flaw? Would it result in tens of
thousands of customers returning stick after stick of DRAM until they were
able to get a reliable one?

~~~
tenfingers
memtest86 has this feature in beta, and it's already generating some heat.

I would be personally more interested in this test on memtest86+ though.

~~~
AceJohnny2
What's the difference between memtest86 and memtest86+?

OK, from WP [1]: "Memtest86 was developed by Chris Brady. After Memtest86
remained at v3.0 (2002 release) for two years, the Memtest86+ fork was created
by Samuel Demeulemeester to add support for newer CPUs and chipsets. As of
November 2013 the latest version of Memtest86+ is 5.01."

And the original has become a commercial program by PassMark. So I think at
this point if anyone is talking about memtest86, they're likely referring to
the still open-source '+' version.

[1] [http://en.wikipedia.org/wiki/Memtest86](http://en.wikipedia.org/wiki/Memtest86)

~~~
hannob
There is a GitHub repo with a rowhammer test based on memtest86+:
[https://github.com/CMU-SAFARI/rowhammer](https://github.com/CMU-SAFARI/rowhammer)

------
p1mrx
On my desktop (DH87RL / i7-4770 / 2x8GB Crucial DDR3L-1600), rowhammer_test
reported errors after ~20 iterations (less than a minute).

I went into the BIOS and tried lowering the tREFI value from 6300 to 3150 (not
sure what the units are). So far, it's gone 1000 iterations with no problems
detected.

Edit: Actually, the units are probably multiples of the clock cycle time, just
like CAS latency. So, for DDR3-1600 (an 800 MHz clock, i.e. 1.25 ns per
cycle), that would mean 6300 × 1.25 ns ≈ 7.9 μs and 3150 × 1.25 ns ≈ 3.9 μs.
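The same arithmetic extends to how long any single row goes between refreshes, which is what matters for rowhammer: an attacker has to accumulate enough activations of a neighboring row within that window. A minimal sketch, assuming the BIOS tREFI value is counted in 1.25 ns memory-clock cycles (as guessed above) and the DDR3 convention of 8192 REF commands per full refresh of all rows:

```python
# Sketch: convert a BIOS tREFI setting (in memory-clock cycles) to time and
# estimate how long each row waits between refreshes.
# Assumptions: DDR3-1600 (800 MHz clock) and 8192 REF commands per refresh cycle.

CLOCK_MHZ = 800.0       # DDR3-1600 moves 1600 MT/s on an 800 MHz clock
REFS_PER_CYCLE = 8192   # all rows are refreshed over 8K REF commands

def trefi_us(cycles, clock_mhz=CLOCK_MHZ):
    """Interval between REF commands, in microseconds."""
    cycle_ns = 1000.0 / clock_mhz
    return cycles * cycle_ns / 1000.0

def row_refresh_period_ms(cycles, clock_mhz=CLOCK_MHZ):
    """Approximate time between two refreshes of the same row, in milliseconds."""
    return REFS_PER_CYCLE * trefi_us(cycles, clock_mhz) / 1000.0

for setting in (6300, 3150):
    print(f"tREFI={setting}: {trefi_us(setting):.2f} us between REFs, "
          f"each row refreshed every ~{row_refresh_period_ms(setting):.1f} ms")
```

Under these assumptions, halving tREFI cuts the per-row window from about 64.5 ms to about 32.3 ms, so an attacker has half as long to hammer a victim's neighbors, at the cost of twice as many refresh operations.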

[http://en.wikipedia.org/wiki/CAS_latency](http://en.wikipedia.org/wiki/CAS_latency)

~~~
thomasdullien
Single-sided or double-sided hammering?

~~~
p1mrx
I used rowhammer_test.cc, which I think is single-sided.

------
zokier
Surprised that the mitigations section did not mention ECC RAM. Wouldn't it be
an effective mitigation?

~~~
lelf
Not necessarily; see the original paper.

 _For example, SECDED (single error-correction, double error-detection) can
correct only a single-bit error within a 64-bit word. If a word contains two
victims, however, SECDED cannot correct the resulting double-bit error. And
for three or more victims, SECDED cannot even detect the multi-bit error,
leading to silent data corruption._

Edit: link [http://users.ece.cmu.edu/~yoonguk/papers/kim-isca14.pdf](http://users.ece.cmu.edu/~yoonguk/papers/kim-isca14.pdf)

~~~
acveilleux
Technically, SECDED cannot _reliably_ detect errors involving more than 3 bits,
since they might produce a valid codeword. They might not, however, and in
that case they might be detected as a single- or double-bit error, or possibly
something else.
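To see the regimes described in this subthread, here is a toy SECDED code: an extended Hamming code sized like a common 72-bit ECC word (64 data bits plus 8 check bits). It is an illustrative sketch, not the exact code any particular memory controller implements (real controllers use their own parity-check matrices), and the flipped bit positions are picked by hand to hit each case.

```python
# Toy extended-Hamming SECDED code. Position 0 holds overall parity,
# positions 1, 2, 4, 8, ... hold Hamming check bits, the rest hold data.

def encode(data_bits):
    k = len(data_bits)
    r = 0
    while (1 << r) < k + r + 1:      # number of Hamming check bits needed
        r += 1
    n = k + r
    code = [0] * (n + 1)
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two -> data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p and pos != p) % 2
    code[0] = sum(code[1:]) % 2      # overall parity enables double-error detection
    return code

def decode(code):
    """Return (status, data_bits); status is 'ok', 'corrected' or 'uncorrectable'."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos          # for one flipped bit, this equals its position
    overall = sum(code) % 2
    if overall == 0:                 # even number of flips (or none)
        status = "ok" if syndrome == 0 else "uncorrectable"
    elif syndrome <= n:              # odd number of flips: assume exactly one
        status = "corrected"
        code = code[:]
        code[syndrome] ^= 1
    else:                            # syndrome points outside the shortened code
        status = "uncorrectable"
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data

def flip(code, positions):
    flipped = code[:]
    for pos in positions:
        flipped[pos] ^= 1
    return flipped

data = [(0xDEADBEEFCAFEF00D >> i) & 1 for i in range(64)]
word = encode(data)                  # 72 bits: 64 data + 7 Hamming + 1 overall
for positions in ([], [6], [3, 5], [1, 2, 4], [1, 2, 4, 7]):
    status, out = decode(flip(word, positions))
    print(len(positions), status, out == data)
# 0 ok True
# 1 corrected True
# 2 uncorrectable False
# 3 corrected False   <- "corrected" to the wrong value
# 4 ok False          <- looks like a clean word
```

The three-bit case is the dangerous one for rowhammer: the decoder reports a routine single-bit correction while actually introducing a fourth wrong bit.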

~~~
yuhong
Also, the typical reaction to an uncorrectable ECC error is to halt the system
with an NMI.

~~~
makomk
Yeah, ECC is going to make exploiting this reliably a lot harder - you'd need
to flip three or more bits in the right combination, without first hitting a
combination of bits that'd be detected as an uncorrectable error. Google's
report suggests they haven't even been able to cause uncorrectable two-bit
errors yet, let alone undetectable three-bit ones.

------
Kenji
Now someone has to come up with a JavaScript version of this exploit and the
disaster is complete.

~~~
yuhong
More difficult since you can't execute CLFLUSH from there.

~~~
makomk
Maybe, though as they say, it might be possible to force cache evictions and
attack it that way. A few days back I was looking at the associativity of
various CPU caches with a vague eye to trying this in JavaScript, and in
theory it shouldn't take many reads to evict a cache line, as long as they're
from the right addresses.
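The "right addresses" idea can be sketched with a toy cache model. It assumes a physically indexed, set-associative cache with true LRU replacement and no address hashing. Real last-level caches use undocumented slice hashes and replacement policies that are not true LRU, and JavaScript cannot see physical addresses at all, so this only shows the principle. Any address a multiple of `num_sets * line_size` away from the target maps to the same set, so touching `ways` of them evicts the target. The cache dimensions below are hypothetical.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Toy model: set index = (addr // line_size) % num_sets, true-LRU replacement."""

    def __init__(self, num_sets, ways, line_size=64):
        self.num_sets, self.ways, self.line_size = num_sets, ways, line_size
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _locate(self, addr):
        line = addr // self.line_size
        return self.sets[line % self.num_sets], line

    def access(self, addr):
        """Touch addr; return True on a hit, False on a miss (which fills the line)."""
        cache_set, line = self._locate(addr)
        if line in cache_set:
            cache_set.move_to_end(line)      # most recently used goes last
            return True
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)    # evict the least recently used line
        cache_set[line] = True
        return False

    def contains(self, addr):
        cache_set, line = self._locate(addr)
        return line in cache_set

def eviction_set(target, num_sets, ways, line_size=64):
    """`ways` distinct addresses that map to the same set as `target`."""
    stride = num_sets * line_size
    return [target + k * stride for k in range(1, ways + 1)]

# Hypothetical 2 MiB, 16-way cache with 64-byte lines.
cache = SetAssociativeCache(num_sets=2048, ways=16)
target = 0x12340
cache.access(target)                         # aggressor's line is now cached
for addr in eviction_set(target, 2048, 16):
    cache.access(addr)
print(cache.contains(target))                # False: next read of target goes to DRAM
```

In a CLFLUSH-free hammering loop, an attacker would interleave reads of each aggressor address with reads of its eviction set, so that every aggressor read misses the cache and activates its DRAM row.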

------
ymra
Would reducing the memory clock speed prevent this?

~~~
thrownaway2424
Yes, it would, as would overvolting and reducing the refresh interval. The
latter reduces memory subsystem performance, however.
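A rough way to see that performance cost: each REF command keeps a rank busy for tRFC, so the fraction of time spent refreshing is about tRFC / tREFI. The sketch below assumes tRFC = 350 ns (a plausible figure for a high-density DDR3 chip, used as an assumption rather than a measured value) and the 7.8 μs and 3.9 μs intervals mentioned earlier in the thread.

```python
# Sketch: fraction of time a DRAM rank is unavailable because it is refreshing.
# Assumption: tRFC = 350 ns (hypothetical value for illustration).

def refresh_overhead(trfc_ns, trefi_ns):
    """Approximate fraction of time spent executing REF commands."""
    return trfc_ns / trefi_ns

TRFC_NS = 350.0
for trefi_ns in (7800.0, 3900.0):
    share = refresh_overhead(TRFC_NS, trefi_ns)
    print(f"tREFI = {trefi_ns / 1000:.1f} us -> ~{share:.1%} of time refreshing")
```

Halving tREFI doubles this overhead (roughly 4.5% to 9.0% under these assumptions); the actual throughput loss depends on how well the memory controller schedules traffic around refreshes.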

~~~
rasz_pl
Row access counters in the memory controller would solve this problem: too
many accesses between refresh cycles -> force a refresh of that particular
row or the potentially affected rows.
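A minimal simulation of that counter idea, with every parameter hypothetical: each activation increments a per-row counter, and when a row reaches `threshold` activations the controller refreshes its two physical neighbors and resets the counter. A victim's "disturbance" here is simply the number of neighboring activations since it was last refreshed. (The ISCA paper linked above instead proposes a probabilistic scheme, PARA, that avoids keeping counters.)

```python
class RowCounterController:
    """Toy per-row activation counters (hypothetical design, not any vendor's)."""

    def __init__(self, num_rows, threshold):
        self.num_rows = num_rows
        self.threshold = threshold
        self.activations = [0] * num_rows   # ACTs since this row last triggered a refresh
        self.disturbance = [0] * num_rows   # neighbor ACTs absorbed since last refresh

    def _neighbors(self, row):
        return [n for n in (row - 1, row + 1) if 0 <= n < self.num_rows]

    def activate(self, row):
        self.activations[row] += 1
        for n in self._neighbors(row):
            self.disturbance[n] += 1
        if self.activations[row] >= self.threshold:
            for n in self._neighbors(row):
                self.disturbance[n] = 0     # targeted refresh restores the victims
            self.activations[row] = 0

def hammer_double_sided(ctrl, victim, pairs):
    """Alternate ACTs on victim-1 and victim+1; return the victim's peak disturbance."""
    peak = 0
    for _ in range(pairs):
        for aggressor in (victim - 1, victim + 1):
            ctrl.activate(aggressor)
            peak = max(peak, ctrl.disturbance[victim])
    return peak

no_mitigation = RowCounterController(num_rows=64, threshold=10**9)
with_counters = RowCounterController(num_rows=64, threshold=1000)
print(hammer_double_sided(no_mitigation, victim=10, pairs=50_000))  # 100000
print(hammer_double_sided(with_counters, victim=10, pairs=50_000))  # 1998
```

With counters, the victim's peak stays below twice the threshold no matter how long the hammering runs, because each aggressor refreshes it at least once per `threshold` of its own activations. The practical catch is the cost of keeping a counter for every row, which is one argument for probabilistic schemes.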

------
randomdevlpr
My first gen Toshiba Chromebook ran the test 130 minutes without an error.

------
0x0
Does anyone know whether MacBooks are affected?

~~~
aselzer
Seems like my MacBook Air (2014) is not affected (with high probability).

Here's the test: [https://github.com/google/rowhammer-test](https://github.com/google/rowhammer-test)

~~~
tux3
Thanks for the link.

I haven't seen anything after 375 iterations (600 s). So I may still be
exploitable, but you'd have to keep something running at 100% CPU for more
than 600 s and somehow keep me from noticing the laptop fans going crazy.

~~~
Dylan16807
An exploit tool could always run slower and hide from that.

Also consider that it might work better when your laptop is in a lower-power
mode, because of reduced voltages.

------
upofadown
Is a memory error actually an exploit? If so then are the unwanted changes
that occur with no deliberate action an example of the computer cracking
itself?

Philosophical...

~~~
Dylan16807
Everything is a memory error on some level.

Back to grounding in reality, a way to reliably[1] break security measures is
an exploit. Cosmic ray bit flips are anything but reliable.

[1] The threshold of reliability being somewhere below "instant and always" and
somewhere above "one in a million if you give it a day to try".

