
Researchers have significantly increased the scope of the Rowhammer threat - rayascott
https://www.wired.com/story/rowhammer-ecc-memory-data-hack/
======
userbinator
Attack or otherwise, this is ultimately a hardware reliability problem. Any
access pattern that can cause bit errors is indicative of faulty memory. If I
remember correctly, the original Rowhammer paper shows that RAM from ~2009 and
before was completely unaffected. Yet in the continuing quest for higher
densities and lower costs (is RAM not cheap enough already?) the manufacturers
are sacrificing reliability and correctness, and IMHO that is not acceptable,
nor is their insistence that this is not a problem (it seems they were
powerful enough to convince one well-known memory testing application to make
the RH test optional(!) and spread FUD that it wasn't really a concern if that
test failed, because a lot of RAM would fail it.) NO access pattern should
ever cause errors to occur on correctly functioning hardware.

~~~
firethief
Sometimes it's worth dealing with complex abstractions in a higher layer
rather than making the sacrifices necessary to implement a neat abstraction
natively.

The problem of corruption at the physical layer when certain types of bit
pattern occur is also encountered when transmitting data over a wire;
constraining the physical parameters to remain suitable for the naive
representation of binary works well for getting a signal across the PCB, but
would be extremely limiting at intranet scales. The usual approach is to
modulate the data in a way that avoids encoding the problematic bit patterns.

Are the tradeoffs necessary to maintain the simple abstraction worth it in
this case? I don't know, but considering how much of a bottleneck RAM has
become for modern hardware, I think it's worth considering the alternatives.

~~~
clhodapp
I think the problem here is that nothing at a higher layer is mitigating the
effect of the attack. It's not so much a choice to put the complexity where it
is cheapest as it is a total ball-drop on overall composite system correctness
(and thus security).

From a different angle: I think your point is fair but I also think that for
it to apply to this situation, the memory vendors would have needed to loudly
and openly say that they were invoking that tradeoff so the OS vendors could
adjust. Presumably that would also have prompted a lot of benchmarking to see
whether the combined effect of a physical-layer vulnerability plus a
software-layer mitigation was actually a net positive.

------
kens
Surprisingly, Rowhammer-like memory problems go back to the early 1950s. Early
computers (such as Manchester Baby and the IBM 701) used electrostatic
Williams tubes as their main memory, storing data as dots and dashes on CRT
tubes. One problem with Williams tubes was that if you accessed a location on
the screen multiple times, the charge on a neighboring spot could be affected,
flipping the bit. (Of course back then this was a correctness issue, not a
security issue.) The quality of the tube was measured by the read-around
ratio, the number of times you could read a bit without corrupting the
neighbors. A good tube might have a read-around ratio of 50. Nobody missed
Williams tubes when they were replaced by core memory.

~~~
CamperBob2
_Of course back then this was a correctness issue, not a security issue_

It's still a correctness issue today, too. I don't understand why
manufacturers (and their customers) consider it OK to ship broken DRAM chips
that do not conform to their stated specifications.

Rowhammer isn't (just) a security issue to be worked around, it's a hardware
bug that needs to be fixed. As far as I can tell, it hasn't been.

~~~
TeMPOraL
> _I don't understand why manufacturers (and their customers) consider it OK
> to ship broken DRAM chips that do not conform to their stated
> specifications._

Because they can, and sucks to be you. This is how things are everywhere. For
competitive markets, the only real quality pressure is regulatory and
contractual (and maybe reputational, sometimes). There needs to be a direct
feedback loop between the value end-customers care about and the profit of
producers/sellers for that value to matter.

As a random and interesting example of this phenomenon (really seen
everywhere), here's something I learned yesterday: according to Derek Lowe[0],
there's no graphene supplier anywhere that actually supplies you graphene, and
they all tend to lie about it. Apparently this is one of the big things that
holds graphene research back (and probably invalidates a bunch of papers).

--

[0] - [http://blogs.sciencemag.org/pipeline/archives/2018/10/11/graphene-you-dont-get-what-you-pay-for](http://blogs.sciencemag.org/pipeline/archives/2018/10/11/graphene-you-dont-get-what-you-pay-for)

------
femto
Couldn't the error rate of the ECC system be monitored, to detect an attack in
progress and raise an alarm?

Even if the attacker were able to make the bit flipping completely reliable,
there would presumably be a learning/probing phase with elevated ECC error
rates. Either that probing could be detected, or the attacker would be forced
to stay below the threshold of detectability, slowing the attack down enough
to make it impractical?

~~~
jacob019
In Linux you can generally check the number of ECC errors that have occurred
since boot.

/sys/devices/system/edac/mc/mc0/ce_count
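A minimal sketch of a monitor built on that counter (the sysfs path is the
EDAC file above; the polling interval and alert threshold are arbitrary
assumptions, not tuned values):

```python
import time

CE_COUNT = "/sys/devices/system/edac/mc/mc0/ce_count"

def read_ce_count(path=CE_COUNT):
    # Corrected-error count since boot, as reported by the EDAC subsystem.
    with open(path) as f:
        return int(f.read())

def elevated(prev, curr, threshold=5):
    # Flag if more than `threshold` corrected errors appeared since last poll.
    return (curr - prev) > threshold

def monitor(interval=60, threshold=5):
    prev = read_ce_count()
    while True:
        time.sleep(interval)
        curr = read_ce_count()
        if elevated(prev, curr, threshold):
            print(f"ALERT: {curr - prev} corrected ECC errors in {interval}s")
        prev = curr
```

Whether a real attack would stand out over a machine's normal background rate
of corrected errors is an open question, per the grandparent comment.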

------
carbocation
In brief, the authors show that ECC is also affected, not just non-ECC RAM.

~~~
crankylinuxuser
Can a software defense mechanism be implemented; say, a check bit per 7 bits
that emulates ECC?

Sure, that would reduce usable RAM by 1/8... But that would be a design
choice to implement. Is ECC RAM only 12.5% more expensive than non-ECC? If
it's higher, it may indeed be more advantageous to use non-ECC -if- a
software compensation can be implemented.
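The "check bit per 7 bits" idea is essentially a Hamming code. A minimal
sketch of Hamming(7,4) single-error correction in Python (purely illustrative:
real ECC DIMMs use wider SECDED codes over 64-bit words, and a software
version would itself be stored in flippable DRAM):

```python
def hamming74_encode(d):
    # d: four data bits [d1, d2, d3, d4]
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over codeword positions 4, 5, 6, 7
    # codeword positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of a flipped bit, 0 if clean
    if syndrome:
        c[syndrome - 1] ^= 1  # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], syndrome
```

As the article shows, though, schemes like this only move the bar: an
attacker who can flip multiple bits per word can still slip past a
single-error-correcting code.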

~~~
dboreham
Databases often do this already (I'm more familiar with databases but I
suspect filesystems probably do too). The original motivation was to provide
some defense against bug reports along the lines of "your database ate my
data", that turned out to be due to 3rd party code inside the same process
crapping on memory, hardware errors etc.

These checksums are typically done on blocks of payload data of course, not
all memory content.
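As an illustration of the block-checksum idea (a generic sketch using CRC32,
not any particular database's actual page format):

```python
import zlib

def store(block: bytes) -> bytes:
    # Prepend a CRC32 of the payload, as databases do per page/block.
    return zlib.crc32(block).to_bytes(4, "big") + block

def load(record: bytes) -> bytes:
    # Verify the checksum on read; a mismatch means the block was
    # corrupted in memory or on disk after it was checksummed.
    crc, payload = record[:4], record[4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        raise ValueError("checksum mismatch: possible memory corruption")
    return payload
```

Note this only detects corruption between checksum and verify; it can't
correct it, and it does nothing for memory outside the checksummed blocks.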

~~~
blattimwind
> I suspect filesystems probably do too

Most don't :)

------
kibwen
I wonder if having a separate stick of RAM exclusively dedicated to
kernelspace would provide any mitigation against privilege escalation via
rowhammer. Are we considering a future where every "ring" is literally a
separate set of CPU, RAM, etc in order to stymie side channels, or is that
just too crazy?

~~~
paulryanrogers
If kernel space were relatively small this might be practical as a motherboard
feature, possibly soldered in place. Though I doubt it'll become standard
unless there are no other alternatives since it seems like a very specialized
solution.

~~~
Insequent
The kernel's memory usage is typically pretty small, unless you're considering
the page cache to be part of it.

Although: I once investigated a soft freeze on a realtime-patched Linux system
that turned out to be caused by a vendor's software somehow managing to
indefinitely stall an RCU grace period, eventually consuming all available
memory on the system. The kernel core dump being over 4GB in size was a bit of
a give-away.

------
mettamage
Haven't read the full article, but if I remember correctly, for ECCploit to
work you first need to reverse engineer the ECC function of the memory
controller.

Also, for people who just want the link to the academic paper (including
abstract):

[https://cs.vu.nl/~lcr220/ecc/ecc-rh-paper-eccploit-press-preprint.pdf](https://cs.vu.nl/~lcr220/ecc/ecc-rh-paper-eccploit-press-preprint.pdf)

------
mirimir
This is certainly a serious threat.

However, it's my understanding that exploits depend on running code (including
JavaScript) on the target system (or in a sandbox or VM). Is that true?

~~~
Eridrus
There has been one paper recently that showed a rowhammer exploit over the
network against a key-value server:
[https://www.usenix.org/conference/atc18/presentation/tatar](https://www.usenix.org/conference/atc18/presentation/tatar)

I haven't read the paper, so I don't know how reliably they can do it in a
real world setting where they are not the only people interacting with the
server, but they demonstrate that it's possible.

~~~
mirimir
Impressive.

But isn't a key-value server perilously close to a database prompt? And this
exploit depends on having authenticated access, right? Otherwise something
like fail2ban would prevent hammering, I'd think.

~~~
rocqua
Unless your ban works either on the network, or at the firmware of your
network card, the packet data is going to reach kernel memory.

It then seems somewhat plausible that such packet data could affect faulty
RAM.

~~~
mirimir
Good point. Thanks.

------
justaj
If I'm not mistaken this attack is negated by DDR4 RAM, is that correct?

~~~
saati
No. It's negated by DDR2, but good luck getting any new DDR2 hardware.

~~~
userbinator
Early DDR3 is unaffected too.

~~~
kingosticks
Why is that?

------
ccnafr
Leave it to Wired to blow a theoretical attack out of proportion.

~~~
itsnotlupus
What did they exaggerate?

They mentioned the attack can work with roughly a week worth of unprivileged
runtime, as long as the ECC mode of the ram chips in the targeted system has
been previously sufficiently reverse engineered.

Is that too alarmist? To me, it sounds like something perhaps too cumbersome
for casual drive-by attacks, but it seems right up the alley of so-called
"persistent threats", or whatever it is we call those guys nowadays.

~~~
ngcc_hk
Agreed, very much so, especially for cloud and IoT. With the latter you have
much less protection already, and physical access is (in some cases) easy to
get.

