Google says Rowhammer attacks are gaining range as RAM is getting denser (therecord.media)
275 points by valprop1 on May 28, 2021 | 120 comments



> But while there are no known cases where Rowhammer attacks have been used in the real world

Not on purpose, but I'm sure that either it or phenomena like it are the causes of a lot of odd "glitchy" behaviour that people encounter, because systems are run so close to their limits that there are bound to be cases where they surpass them. I remember many years ago discovering a system that passed the CPU and memory stress tests of the time 100% (they ran for many days), but would very reliably corrupt a particular .zip file's contents upon extraction, and no other that I could see. Turning down the FSB by 1MHz(!) was enough to make it stable again.

But I remain convinced that Rowhammer is a fundamental defect, and all RAM which is susceptible to it should be recalled and replaced. It really says something about the industry that, when the discovery first came to light, they managed to convince the authors of memory testing tools to treat RH tests as "optional" and "not a real concern, most RAM will show errors". 26 years ago, Intel offered to recall and replace processors that couldn't divide, after an initial period of reluctance, but only once the bad publicity started. Can RAM manufacturers be coerced into doing the same, and perhaps even go back to pre-Rowhammer process sizes? Sacrificing correctness should never be an option.

(This post made from a 10+-year-old machine containing RAM that is perfectly free of Rowhammer.)


The problem isn't RowHammer, the problem is lack of ECC. This is mostly Intel's fault, as they cripple their consumer CPU lines to disable ECC for market segmentation purposes.

The entire premise that we can store tens of gigabytes of information reliably in tiny silicon capacitors without any error detection or correction whatsoever is ludicrous. Every other storage technology uses advanced error correction - NAND Flash, HDDs, optical media, etc. We've been doing this for decades. Even floppies at least had error detection (checksums). All high speed transmission protocols also do error detection and re-transmission, except HDMI (because I guess nobody cares about the odd corrupted pixel). Same for anything going over radio.

Once you have ECC, your error rate margin goes up by many orders of magnitude. Worst case these attacks turn into a DoS as ECC fails to correct a badly corrupted word, and most of the time it will just work. Plus you can detect attack attempts as the memory controller will report increased error rates.

It's not possible to design modern RAM that is perfectly reliable. That's just physics. We know how to solve this problem. We just aren't doing it because Intel thinks consumers don't deserve reliable RAM. The only other option is doing ECC on-chip in the RAM itself, and that has other efficiency trade-offs which probably make it not worth it, at least not with DDR style interfaces (it might've made sense in an FB-DIMM style world where the memory controller is in the RAM)


Not here to defend Intel (their decision to remove ECC support from X299 chipset pretty much tanked their own HEDT platform), but I often wonder how much of the current disregard of ECC is the fault of the system integrators.

As of now, they will gladly ship configurations with single-channel memory and under-par cooling to save a few bucks, even though these cut corners can hurt performance by as much as 30% due to insufficient memory bandwidth and thermal throttling.

Even if ECC is available everywhere, I doubt any OEM will use it without charging their users a premium. This is kind of what has been going on with AMD MSDT platforms: ECC support is in the CPU, but many motherboards do not even implement the necessary memory traces to enable it, and software support is even messier, to the point that most people can't be bothered with it.


This is a self-fulfilling issue; since Intel does not support ECC, ECC is not mainstream, so mainstream platforms don't care about ECC. It would cost motherboard manufacturers ~nothing to support this properly, so if it were accepted as a thing they would, and it would eventually be a requirement for newer RAM technologies. ECC should've been mandatory since about the DDR2 era, practically speaking. That it isn't is a collective failure of the computer industry, and Intel takes a big part of the blame.

At least I had no issues getting ECC to work on my Threadripper box, but sure, that's HEDT.


Maybe while both ECC and non-ECC are being sold, but that would be a temporary situation. The goal would be to have non-ECC memory be effectively deprecated, so the next revision of the spec enforces ECC as a requirement.


Doing ECC on-die (as in DDR5) does solve issues related to the storage itself, and the access timing of the array, but still leaves data vulnerable to corruption on the memory bus itself. Memory is indeed the only place where it is commonplace to move and store data with zero layers of protection; everywhere else we have at least one, often multiple, checksums, and storage has had forward error correction for a very long time, regardless of medium. But of course, generally speaking, all data processed by a computer moves through the main memory.


> The problem isn't RowHammer, the problem is lack of ECC. This is mostly Intel's fault, as they cripple their consumer CPU lines to disable ECC for market segmentation purposes.

Whilst Intel did play a part, when they offered it as standard the uptake in the consumer market was low; the price was a factor, and the need at the consumer level was mostly not there.

Today the need is there, though consumers are largely unaware of it, and I foresee a time when some in-the-wild exploit gains media traction. Then there will be a sudden panic rush, with media rhetoric akin to the run-up to Y2K and dramatic headlines about how everybody needs to buy a new computer. If that happened, Intel on the desktop would surely take a whack, as things currently stand.

Then there is the whole supply-and-demand aspect that factors into the cost; alas, little will change until there is some mass consumer awareness.

Well, until then, for the majority of consumers, computer memory is just computer memory, and most of them don't even look into things like memory speed or dual-channel configurations. Which is why you see your Dells et al. selling systems with one large stick of RAM even today; the average consumer just doesn't know any better. So the move to ECC for consumers will need an event, or a large player marketing it, or maybe a balance of the two.

Given the sheer amount of memory in computers today, let alone its density and speed, bit errors become far more likely, and that is before the security aspects come into play. For stability alone, ECC is becoming more and more justifiable.


Please. The ONLY time Intel offered ECC as standard on its products was during a brief 486 period, when all the chipsets they offered were considered high-end and their chipset market share was in the single digits.

ECC was standard on the first IBM PC 5150, on the PS/2 line, on pretty much all 286 clones, etc.

Intel's market segmentation shenanigans started around 1995 with the Pentium line, with ECC absent from all consumer-oriented product lines.


True, the whole dip into RAMBUS kinda saw them shift things, and when they came back to less niche RAM, they split things between business and consumer.

However, there has been the odd glimmer of hope for ECC support from Intel at the cheaper end: https://www.truenas.com/community/threads/surprise-surprise-...

What really needs to happen is for all these AMD Ryzen users to buy into ECC, to help get the price down and, more so, push Intel into another tickbox catch-up in that market.


Intel cut ECC from its consumer chipsets almost 10 years before Rambus.


I thought ECC wasn’t effective protection against Rowhammer? Agreed that it is incredibly useful otherwise. https://www.vusec.net/projects/eccploit/


ECC helps a lot, and combined with memory encryption (which is another thing we should be doing) it effectively mitigates the issue, as then you can't target specific bits any more, so your chances of getting past ECC without causing hard errors are negligible.


ECC is at best a temporary workaround, and at worst an implicit approval of the creation of defective products. Its purpose is to prevent transient bitflips due to cosmic rays and such, not consistently reproducible errors from what is otherwise perfectly normal operation.

> Plus you can detect attack attempts as the memory controller will report increased error rates.

Now you've turned specific access patterns into "attack attempts", and by discriminating against them as such, entirely destroyed a fundamental part of general-purpose computing. What's next, CPUs that are only guaranteed to do certain operations correctly, and everything else is considered an "attack"? That is NOT a direction that we should be heading. Hell fucking no!!!

RAM should always hold what was last written to it, under all conditions of software accesses. To expect or imply that anything less is acceptable, is to destroy one of the foundations on which general-purpose computing is built.

> NAND Flash

Where manufacturers are trying to sell 2-4x the capacity with 1/4 to 1/16th of the endurance? That's a rant for another thread...

> We know how to solve this problem.

This problem didn't exist 15 years ago. It only got created because the RAM companies have forsaken correctness, and somehow managed to convince everyone else that it's not their own fault.


> Now you've turned specific access patterns into "attack attempts",

We already have error detection and correction available to us, and it's in use today.

> RAM should always hold what was last written to it, under all conditions of software accesses.

OK? That's what anyone who is advocating for ECC is already advocating for. ECC is error correcting up to 1 bit and error detecting up to 2 bits. The goal is to ensure that the property you've described holds.
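
To make that concrete, here is a toy C sketch of the SECDED construction: a Hamming(7,4) code plus an overall parity bit. Real ECC DIMMs use the same idea scaled up to 72 bits per 64-bit word; this is an illustration, not any particular controller's implementation (__builtin_parity is a GCC/Clang builtin).

    #include <stdio.h>
    #include <stdint.h>

    #define BIT(w, i) (((w) >> (i)) & 1u)

    /* Encode 4 data bits into 8: Hamming check bits at positions 1, 2, 4,
       data at 3, 5, 6, 7, and an overall (even) parity bit at position 0. */
    static uint8_t secded_encode(uint8_t d) {
        uint8_t d0 = BIT(d, 0), d1 = BIT(d, 1), d2 = BIT(d, 2), d3 = BIT(d, 3);
        uint8_t w = (uint8_t)(((d0 ^ d1 ^ d3) << 1)   /* p1 covers 3,5,7 */
                            | ((d0 ^ d2 ^ d3) << 2)   /* p2 covers 3,6,7 */
                            | (d0 << 3)
                            | ((d1 ^ d2 ^ d3) << 4)   /* p4 covers 5,6,7 */
                            | (d1 << 5) | (d2 << 6) | (d3 << 7));
        return w | (uint8_t)__builtin_parity(w);      /* overall parity */
    }

    /* 0 = clean, 1 = single-bit error corrected in place,
       2 = double-bit error detected but uncorrectable. */
    static int secded_decode(uint8_t *word) {
        uint8_t w = *word;
        unsigned syn = (BIT(w,1) ^ BIT(w,3) ^ BIT(w,5) ^ BIT(w,7))
                     | ((BIT(w,2) ^ BIT(w,3) ^ BIT(w,6) ^ BIT(w,7)) << 1)
                     | ((BIT(w,4) ^ BIT(w,5) ^ BIT(w,6) ^ BIT(w,7)) << 2);
        int parity_bad = __builtin_parity(w);         /* should be even */
        if (!syn && !parity_bad) return 0;
        if (parity_bad) { *word = w ^ (uint8_t)(1u << syn); return 1; }
        return 2;
    }

    int main(void) {
        uint8_t w = secded_encode(0xB);
        w ^= 1u << 5;                                 /* one flip  */
        printf("one flip:  %d\n", secded_decode(&w)); /* prints 1  */
        w ^= (1u << 3) | (1u << 6);                   /* two flips */
        printf("two flips: %d\n", secded_decode(&w)); /* prints 2  */
        return 0;
    }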

> It only got created because the RAM companies have forsaken correctness,

This problem exists because... physics. Do you actually have a design for RAM that can match today's cell density while somehow providing perfect isolation? Saying this problem didn't exist 15 years ago is like saying that malware didn't impact the abacus - RAM 15 years ago was designed completely differently, and had considerably lower capabilities in terms of storage and latency.

And for reference, Hamming codes (which back ECC) were created in the 1950s for error correction in punch cards. This is not a new problem.

ECC is the existing, obvious solution to this problem; it just wasn't built for this threat. But it's an excellent, proven method for correcting bit flips. Even without being designed to prevent rowhammer, it already makes the attack much slower, less likely to succeed, and easier to detect (just monitor for a massive spike in ECC corrections).
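
For what it's worth, on Linux the EDAC subsystem already exposes those counters through sysfs; a minimal sketch that reads the corrected-error count of the first memory controller (assumes an EDAC driver is loaded and mc0 exists):

    #include <stdio.h>

    int main(void) {
        unsigned long ce;
        FILE *f = fopen("/sys/devices/system/edac/mc/mc0/ce_count", "r");
        if (!f || fscanf(f, "%lu", &ce) != 1) {
            fprintf(stderr, "no EDAC memory controller exposed\n");
            return 1;
        }
        fclose(f);
        /* a sudden spike in this counter is the "attack attempt" signal */
        printf("corrected ECC errors: %lu\n", ce);
        return 0;
    }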

Trying to design high density RAM that maintains perfect isolation seems pointless. Improving ECC or ECC-like approaches is likely going to be much more effective.


> RAM 15 years ago was designed completely differently, and had considerably lower capabilities in terms of storage and latency.

You got that backwards, chief. RAM latencies have increased tremendously over the last 15 years.


Actual latency has improved, but clock-relative latency hasn't. So what?


No, actual latencies have been climbing since DDR2. Thanks for the downvote though.


This article from Crucial probably explains the two arguments: https://www.crucial.com/articles/about-memory/difference-bet... But their conclusion is that practical latency has indeed decreased over the last 15 years.


After looking at the table, I am ready to swallow my pride. Looks like practical latencies have changed a tiny bit. So me and the person I replied to originally were both wrong


You were wrong about the memory latency¹ increasing, but the memory latency² has increased (substantially) for most systems, while the memory latency³ has indeed decreased. In general, the memory latency¹ has not improved much. Of course, the memory latency⁴ has greatly increased, due to the clock frequency being increased so much to enable the higher bandwidth, while the memory latency¹ stayed mostly the same.

¹ as in: access latency of the DRAM
² as in: how long a CPU memory read which is _not_ cached takes
³ as in: how long a CPU memory read takes on average
⁴ as in: CL


Also note that practical non-high-capacity DDR4 setups reach 3600 CL16. That's way faster than the 3200 CL22 they gave at the bottom.


No, I was right. Not that it matters, feel free to replace latency with bandwidth. Or replace it with nothing and focus on capacity. It's a blip on my point, which is that the architecture has changed considerably to allow for progress in other areas.


> I'm sure that either it or phenomena like it are the causes of a lot of odd "glitchy" behaviour that people encounter, because systems are run so close to their limits that there are bound to be cases where they surpass them.

I don't know about that. Rowhammer really emphasizes the hammer. A normal read pattern will basically never hit the lines in a small area that much, it will hit cache instead.
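
The published PoCs have to explicitly flush their targets out of the cache between reads to get DRAM row activations at all. A minimal sketch of the classic hammer loop on x86 (finding two addresses that map to rows adjacent to a victim row is the hard part, and omitted here):

    #include <stdint.h>
    #include <emmintrin.h>   /* _mm_clflush (SSE2) */

    /* Repeatedly activate two DRAM rows; clflush forces every read to
       miss the cache and hit DRAM. Normal code never does this, which
       is why accidental rowhammer is so unlikely. */
    static void hammer(uint8_t *a, uint8_t *b, long n) {
        for (long i = 0; i < n; i++) {
            *(volatile uint8_t *)a;   /* row activation for row A */
            *(volatile uint8_t *)b;   /* row activation for row B */
            _mm_clflush(a);           /* evict from all cache levels */
            _mm_clflush(b);
        }
    }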

> Now you've turned specific access patterns into "attack attempts", and by discriminating against them as such, entirely destroyed a fundamental part of general-purpose computing.

Can you show me any code that causes this kind of access pattern that wasn't deliberately designed to implement rowhammer?

> Where manufacturers are trying to sell 2-4x the capacity with 1/4 to 1/16th of the endurance?

We could go the other way too and glue together multiple SLC cells to increase endurance and cost. Are you claiming there is a specific correct amount of endurance? Because "as much as possible" isn't feasible and "as much as SLC used to have" could only be correct from an extreme coincidence or from conflating nostalgia with correctness. Is there a different number you have in mind?

> This problem didn't exist 15 years ago. It only got created because the RAM companies have forsaken correctness, and somehow managed to convince everyone else that it's not their own fault.

Old RAM had errors too. RAM companies can and should put in mechanisms to prevent rowhammer, but if I were buying a stick of RAM and had a choice between ECC and perfect rowhammer prevention, I'd choose ECC.


What if the choice was between a normal stick of RAM and a rowhammer-mitigated stick with double the CAS latency that cost 50% more? I guess my point is that, considering how fundamental the problem is to the design and physics of RAM, it is unlikely any product designed to mitigate it would be equally performant or economical, at least in the medium term.


I'm much more optimistic. Use target row refresh but do it right. Very low silicon cost, low performance cost under malicious loads, zero performance cost under non-malicious loads.


> ECC is at best a temporary workaround, and at worst an implicit approval of the creation of defective products. Its purpose is to prevent transient bitflips due to cosmic rays and such, not consistently reproducible errors from what is otherwise perfectly normal operation.

No, ECC is there to correct any errors, including normal errors that occur during operation. It allows us to build technologies that have a much higher raw channel error rate, and decrease the system error rate to better than before. In return, we get much higher density, performance, or whatever other metric you want to optimize for.

This is basic information theory. It is extremely inefficient to attempt to build a channel with a raw bit error rate low enough to be usable directly. Instead we let the bit error rate rise, then correct the errors with error correction codes. This reduces the bandwidth by an amount proportional to the error rate (for advanced error correction), which is still negligible compared to the total channel bandwidth; the performance improvements from allowing the raw error rate to rise more than make up for it, and result in orders of magnitude better performance.
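
To put rough numbers on that (my own back-of-the-envelope illustration, not figures from any spec): a 72-bit SECDED word only becomes uncorrectable with two or more raw bit errors, so the residual error rate falls roughly with the square of the raw bit error rate:

    #include <stdio.h>
    #include <math.h>   /* link with -lm */

    int main(void) {
        const int n = 72;                /* 64 data bits + 8 check bits */
        const double bers[] = { 1e-4, 1e-6, 1e-8 };
        for (int i = 0; i < 3; i++) {
            double p  = bers[i];
            double p0 = pow(1.0 - p, n);               /* zero raw errors */
            double p1 = n * p * pow(1.0 - p, n - 1);   /* exactly one     */
            printf("raw BER %.0e -> uncorrectable word rate %.2e\n",
                   p, 1.0 - p0 - p1);                  /* ~C(72,2) * p^2  */
        }
        return 0;
    }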

> Now you've turned specific access patterns into "attack attempts"

Ever heard of files you can't burn on CD-R, or transmit through certain variants of Ethernet? We use technologies that are probabilistic all the time; they assume data is not correlated with a pseudorandom internal characteristic of some sort. These files only exist because someone explicitly targeted them; the chances of someone stumbling onto such data randomly are basically nonexistent (for any well designed system).

That said, this can be mitigated without throwing ECC out the window.

> RAM should always hold what was last written to it, under all conditions of software accesses.

And it will, after ECC. Why are you drawing an arbitrary line for RAM and saying it can't use ECC? Are you also against HDDs (LDPC error correction), Wi-Fi (convolutional codes), NAND storage of all kinds (increasingly advanced ECC), USB (error detection and re-transmission), TCP (error detection and re-transmission), the internet as a whole (flow control by dropping packets), CDs (three layers of error correction for data!), Bluetooth (FEC and/or ARQ), digital TV (ATSC, DVB, ISDB all use FEC of some kind), QR codes (Reed-Solomon), and even things like RAID6? The people who designed these systems aren't dumb; they know the only way to get performance and reliability is to use error correction.

It's a damn miracle we get away with non-ECC RAM, today. That stuff should've died 20 years ago, and ECC should be standard. Even the caches inside CPUs often have ECC these days.

> Where manufacturers are trying to sell 2-4x the capacity with 1/4 to 1/16th of the endurance? That's a rant for another thread...

Yes, manufacturers having a race for the bottom to achieve the highest capacity at the expense of all reliability and performance metrics is a completely different problem that doesn't mean we need to throw ECC out the window, which would also hurt all of those metrics. Optimizing for endurance and performance still requires ECC.

The solution is simple: don't buy QLC flash. That stuff's garbage, at least today.

Or would you prefer to go back to the era of 16MB CompactFlash cards (that's MB) using SLC NOR Flash? That stuff doesn't need ECC, certainly, and the endurance is amazing. Let me know how the cost per gigabyte and write performance metrics work out in your NOR world, though :-)

Or perhaps you'd rather have us store all of our data on 16KB EEPROM chips, with even better endurance! I'm sure nobody will mind the 1KB/s write performance (if you're lucky). Maybe if you RAID a few thousand of those it'll work! Oh wait, then random errors will kill your data anyway, without something like RAID-6. Oops!

> This problem didn't exist 15 years ago.

15 years ago the average computer had 256MB of RAM. I had one of those. And I had a bad RAM bit back then, which I had to mask out with a kernel patch (that I had to port from x86 to amd64, as I was an early adopter of 64-bit), which would've been a non-issue with ECC RAM.


> Ever heard of files you can't burn on CD-R

I know about weak sectors. Only shitty (i.e. defective by design) drives wouldn't be able to handle them, while others would do it just fine.

> Why are you drawing an arbitrary line for RAM and saying it can't use ECC?

I'm not saying that ECC is not a good idea. I'm saying that ECC errors in RAM occurring simply because of specific access patterns is far from "normal operation". In fact, if you use ECC RAM and see a lot of errors, it should be considered a sign of failing hardware. In other words, advocating for ECC in this scenario is deflecting the blame and compensating for a defective product.

> The solution is simple: don't buy QLC flash. That stuff's garbage, at least today

"QLC" and "TLC" are complete misnomers developed specifically to hide the truth. It's not 4 or 3 "levels", but 16 and 8, respectively. 16LC and 8LC are the technically correct terms.

Corporate greed has somehow resulted in everyone parroting the dominant industry position and using Intel and ECC as a scapegoat. That just shows how entrenched their propaganda is.


> "QLC" and "TLC" are complete misnomers developed specifically to hide the truth.

I know what those terms mean. I find it amusing that you'd think I don't, given I just rattled off the ECC types used by a dozen technologies and mentioned 3 different kinds of nonvolatile charge trap memory.


Thanks for posting this. I think there isn't much knowledge about ECC among typical customers. I wish that at some point ECC would become the norm. The cost doesn't seem high, and for any professional PC usage it seems necessary.


Chrome gave me a segfault crash message I've never seen before by simply visiting youtube this morning.

I thought for a moment what the most likely explanation was, given Chrome has surely seen a boatload of stability testing. It's a sandbox viewing a website. A segfault?!

I wonder if it's cosmic rays on my non-ECC RAM or some accidental rowhammer moment.


Rowhammer does not really happen in real-life scenarios, because if you access the same block of memory multiple times it usually gets cached in L1, L2 or L3 cache. You really have to try to induce that kind of disturbance in RAM. The things you're referring to are probably from other sources of random bit flips, such as cosmic radiation.


ECC RAM would mitigate some of these impacts at least. But yes, it's ridiculous.

Some memory sticks I buy get corrupted bits at rated speeds for my memory heavy workloads.

Crucial (Micron) would always accept the warranty claim, but Corsair has tried rejecting multiple claims, saying their memory is designed for gaming.

Now I decided to finally pay the Xeon tax and go with ECC memory. Yes, I know AM4 had "ECC support", but even on ASRock motherboards it doesn't necessarily work.


> Not on purpose, but I'm sure that either it or phenomena like it are the causes of a lot of odd "glitchy" behaviour that people encounter,

Transient bugs are some of the worst bugs out there. Erlang's approach of crashing and restarting seems to make sense after all. (I used to think that such cases would not exist at all, but I have been proven wrong time and time again. It also makes sense for cases such as dropped connections.)


> no known cases

Is there any way to get a sense of how meaningful this is? Like, is it a pretty sure bet that no attack has happened? Or, if an attack has happened, is it likely to have been targeted and state-sponsored? Or, is it that every time you deploy on a multi-tenant host (e.g., EC2), you're playing Russian roulette?


Cloud hosting (EC2) will use ECC RAM on the bare-metal hosts, which makes this already impractical exploit basically impossible.



Is there any evidence of Rowhammer being used in a successful attack in the wild?


According to the article, no.

What I want to know is if this works on ECC memory. I'm guessing not, which makes the "vulnerability" even more of a non-issue in mission-critical applications that likely moved to ECC a while ago.


Yes, Rowhammer can bypass ECC. I forgot to include this in the article, mainly because there's so much Rowhammer research.

See here: https://www.vusec.net/projects/eccploit/


It's really worth noting that ECC does impact Rowhammer effectiveness, even if it is not enough to prevent the attack 100% of the time.


But as part of this it'll also have a high chance of triggering a system shutdown due to ECC mismatch, right? So in most cases it can't be exploited for things other than DoS.


ECC won't necessarily shut the system down as it can actually repair single bit errors, and mismatches can be monitored for as well. But your point stands - for an attacker to do damage they'll likely end up flipping bits in unintended ways first.


Can Rowhammer bypass ECC and not be detected by an hw_event_mc_err_type? I don't think so. Why would someone have ECC without a sufficiently sophisticated driver?


> Can Rowhammer bypass ECC and not be detected by an hw_event_mc_err_type?

Afaik, yes it can (unless you're counting HW_EVENT_ERR_CORRECTED). They specifically try to get 1 or 3 bit flips, never 2.

See here: https://www.vusec.net/projects/eccploit/

(yes, that's the same link)


> Can Rowhammer bypass ECC and not be detected by an hw_event_mc_err_type?

It's definitely possible in theory. You'd need four bit flips rather than three, so you'd probably need more time between accesses to the victim row, but that's a quantitative improvement at best. This can be mitigated by using different ECC bit encodings per memory location[0], so hammered data, with correct ECC for its row, always has wrong ECC values for the adjacent rows, but I don't think anyone does that.

0: This is important in order to make fake ECC memory, which uses a (cheap) combinatoric circuit in place of a (more expensive) ninth DRAM chip, not work, so it should be happening even without Rowhammer, but AFAIK it isn't.
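
A sketch of what that per-location encoding could look like; everything here is hypothetical (the real logic would live in the memory controller), but the idea is just to XOR a tweak derived from the row address and a per-boot secret into the stored check bits, so check bits valid for one row never verify against a neighbouring row:

    #include <stdint.h>

    /* Assumed to exist: an 8-bit SECDED check-bit generator for a
       64-bit word, as on a conventional 72-bit ECC bus. */
    extern uint8_t secded72_check_bits(uint64_t data);

    /* Mix the row address with a per-boot secret (Fibonacci hashing). */
    static uint8_t row_tweak(uint64_t row, uint64_t boot_key) {
        return (uint8_t)(((row ^ boot_key) * 0x9E3779B97F4A7C15ull) >> 56);
    }

    /* Check bits are stored tweaked, so a word+ECC pattern crafted for
       an aggressor row can never validate in the victim row. */
    static uint8_t checkbits_for_store(uint64_t data, uint64_t row,
                                       uint64_t key) {
        return secded72_check_bits(data) ^ row_tweak(row, key);
    }

    static int checkbits_valid(uint64_t data, uint8_t stored,
                               uint64_t row, uint64_t key) {
        return (uint8_t)(stored ^ row_tweak(row, key))
               == secded72_check_bits(data);
    }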


We should be using memory encryption with random per-boot keys to prevent cold boot attacks, which would also solve the issue. Then the software doesn't know how its data maps to the encrypted data at the RAM. Rows may map 1:1, but you wouldn't know which bits you're targeting.


Good point. Although memory encryption is much more useful if applied at the CPU rather than the memory controller (which would need a way to turn encryption off for DMA regions, since the memory controller doesn't know the key), there's no actual reason why you couldn't do it in the memory controller, if you trust all the hardware with DMA access, or (more plausibly) trust the memory controller to limit that access properly.


> Why would someone have ECC without a sufficiently sophisticated driver?

Have you, personally, tested your servers to make sure the driver correctly handles bit errors? Can't say I have.


Last time I tested such things, they handled ECC correctly (x86 and arm).


Apparently it does but I haven't tested it myself


Wasn't there a Rowhammer-based website for rooting Android phones in the past?


Good question. What would the evidence be? Memory errors? I wonder how easy these would be to detect.


Is there any evidence it's not just a matter of time before there is?


Rowhammer, Spectre, etc. are all very high-information attacks which strike me as not worth the effort for run of the mill adversaries. Three-letter agencies, however, I suspect might have played around with them - if a cloud vendor is secure, and they need a way to un-secure it, they have the resources to get microarchitectural researchers sworn to secrecy to make these attacks work.


TBH I think right now it's more of an incentive thing. Consider that:

a) Attackers would have to develop the POC. This will require a somewhat different skillset than typical exploitation, so there's effort involved.

b) Rowhammer is best for privilege escalation. For desktop users (where the most money is) privesc is already pretty trivial - no need for fancy exploits, you can pretty much just ask for root or use a much more straightforward exploit.

c) Exploit devs don't typically like a lot of attention. You don't want to be the first person selling rowhammer exploits unless you can charge out the ass for it, because you're gonna get way more attention for it.

These incentives aren't super technical, they're mostly market driven, and it's why we don't see fancier attacks in practice, even if they're practical. As one attacker I know was saying to me, they wish they had an excuse to do the fancy thing, but it's never worth the investment when you can own boxes way more easily. Another hacker said "I don't want a Krebs article on me because I did something clever".

The NSA's incentives are not super different, except they're not as directly financial. They also have crazy resources to fuck around with this stuff, so it would make sense that they'd demo it in a lab. Would they actually deploy it? Eh, doubtful. Against another advanced defender, rowhammer should be pretty detectable by monitoring ECC metrics, and again, why would you do something fancy when you can just buy 0days? The USG purchases hundreds a year from private companies and develops more themselves.


Rowhammer has been known of for over 5 years.


You don't need to have a practical attack for something to be a credible threat that needs to be addressed in a multi-tenant system (like say, cloud providers).


Just roll out ECC everywhere. It's not like the circuit density is going to go down again.


ECC is a must in the DDR5 spec, though. We still need a physical version of ASLR to mitigate this. Ideally, we could make Rowhammer impractical even for denial of service; for example, running a full-speed attack for an hour would only have a 0.01% chance of flipping a bit.


That's on-die error correction, DDR5 does not require ECC on the bus.


What is the typical range of mean time to attack? Seconds?

Is it fair to model the attack as random coin flips (i.e., following a geometric distribution)?


So it sounds like security-related flags need to be more than a single bit, and should probably be stored in different variables.


I imagine the silent downvotes are because that’s trying to solve the problem at the wrong layer.

For your suggestion to work, you’d need to also duplicate the logic that checks the redundant bits, etc, etc.

If you really want to do this, you end up doing what satellites do to deal with cosmic rays:

Ship three identical computers. If state diverges, majority rules. See page 9 (Radiation-Effects Mitigation and Hardness) for an FPGA example:

https://www.xilinx.com/support/documentation/white_papers/wp...


In other words, blame Intel for trying to pass off ECC as an "Enterprise Feature" instead of the basic necessity that it is.


ECC is vulnerable to Rowhammer.


I saw this article linked later. https://www.vusec.net/projects/eccploit/

It's interesting! It seems fixable, based on information later on in the article ("Can I get DDR3 DIMMs that are Rowhammer-free?"), and ECC only seems to be part of a solution, and not a complete solution.

But you're still right, and I was still wrong: ECC alone isn't good enough.


I mean the punchline is: if you heat something up enough then you can get electrons to wiggle. At some level of cell wall thickness, the cost/time/annoyance of triggering a rowhammer exceeds the value of the attack, and other methods become cheaper or more practical.

Ultimately, nothing that an attacker has physical access to can be completely secured; we can only raise the cost in terms of time and money to attempt to breach the system. Even systems with tamper-destructive enclosures have seen attacks (they're just more expensive and difficult than other attacks).

In short, the more annoying/expensive you can make it to attack your system, the smaller the set of attackers becomes.


We are not talking about physical attacks. Neither are we talking about attacks based purely on heating up the DRAM chips. The topic here is a remote attacker who can target specific words or bits to be flipped.


When bits are flipped in a row hammer attack, what is happening to the dram? Most people would argue the energetic potential is increasing causing a failure of the boundary between cells...


Much less so.


Row hammer is already super impractical. ECC does make the exploit harder.


Equally vulnerable?


The same way that masks don't prevent COVID.


Right, as in statistically it mostly does / they do.


No, statistically it’s almost useless to prevent yourself from getting COVID. It’s mainly about reducing your ability to spread it.

That’s why people get mad when you don’t wear a mask even if you “don’t care about getting covid”.


Yes, that was a prominent theory at one time and helped a lot of the public adopt masks, but it is actually both: https://www.npr.org/sections/health-shots/2020/11/11/9339038...


Do you think many Ryzen PCs use ECC? I doubt that. It's accepted truth among gamers and power users that ECC is a waste of budget. I don't share this position, but if you asked on some computer forums, that's what you'd hear. If ECC were enabled on all Intel CPUs, nothing would fundamentally change; most users would prefer to save 10% on their RAM.


> It's accepted truth among gamers and power users that ECC is a waste of budget.

A waste of budget at what price?

Are they actually looking at $80 of RAM becoming $90, or are they looking at it becoming $160 and slower? Which is about what I see when I go check Newegg.

I don't think you can extrapolate current decisions to a world where adding ECC is 12% of the RAM budget and 1-2% of the total budget.


The price of ECC is artificially inflated because it's not available to consumers.


At minimum ECC will cost you an extra 12.5% for the extra bit of RAM. That part is not inflated.


Yeah I wasn't trying to imply it would cost nothing to add ECC.


I don't use it on mine, but my Ryzen PC is my gaming box, where the impact from such things is very limited.

But ECC RAM is a lot more than 10% more expensive. This is part of the problem. Intel pushes it into a high-end niche which puts it in a much more expensive category, and it also loses economy of scale.


Not sure about PCs, but my Ryzen laptop (HP EliteBook 835 G6) seems to use the AMD technology for memory encryption, with per-boot keys. This is not ECC, but also good enough to counteract RowHammer, because the attacker cannot target specific bits.


I use it on my Ryzen PC, but I am also running ZFS, and I'm a little paranoid about losing my data.

This is one anecdote, though, so it's probably not useful. But people using ECC on Ryzen do exist.


Agreed. Rowhammer is just one more example of this, but it's frustrating that ECC is not widely deployed.


DRAM manufacturers continue (knowingly at this point) to manufacture faulty products, and we should blame Intel?


Yes.

Intel is an industry leader. EFI, Thunderbolt, and the "ultrabook" product category are all their ideas. By adding a feature to their CPU products, they induce demand for anything that complements it.

By putting ECC support into their highest-end mobile CPUs only, they made them into high-end luxuries instead of industry standard. https://ark.intel.com/content/www/us/en/ark/search/featurefi...


As others have mentioned, ECC is nothing more than a bandaid for what is ultimately a defective product.


I don't think that's true at all. ECC wasn't designed specifically with this attack in mind, and it's still a pretty effective mitigation.

There's nothing defective about RAM, it's never going to be the case that errors won't happen, it's just that before we were worried about heat and radiation and now we have to worry about attackers too.


The Ultrabook brand was Intel's idea but I'm pretty sure it was created to make sure that the Wintel ecosystem could stay competitive with the MacBook Air.


That doesn’t explain Apple and the M1 though.


I am not familiar with DRAM spec sheets, but are manufacturers specifying that there will be zero errors?

Without a specification that says so, I don't think it's necessarily the fault of the manufacturer if they cannot build perfect RAM!

Suppose someone builds a car with one of these computers in a safety-critical role, and then someone gets injured because of an error that "originated" with the RAM.


They specify timing, and when it is followed, RAM should work without fault.

But if there are corner cases like this, they should be added to the specs. Most likely it would require the memory controller to remember recent addresses and insert delays if a rowhammer attempt is detected, and/or make the CPU micro-operation scheduler avoid it. No idea how expensive that would be; surely nontrivial.


There are already measures like that where the memory controller attempts to identify "victim rows" and refresh them, but attacks can fool the victim row detection and succeed anyway.


If my car's spec doesn't say "the wheels stay on" and then the wheels fall off, the car is still defective.


The true safety-critical stuff is far from ordinary and would never use commodity DRAM.


They could include ECC but it wouldn't work on most systems so why would they bother?

At least on AMD it works these days.


How much more expensive would DRAM that is immune to rowhammer be?


Not an expert, but I feel like the answer is not much. ECC RAM is not, to my knowledge, considerably more expensive to produce - the reason it costs more is because it only works with specific devices, so it's more of a market cost.

ECC RAM is pretty effective as-is against rowhammer, and imo you could very likely tweak ECC's approach to get even further.

The 'worst case' solution wouldn't be more than 2x, right? Since you'd just have to keep a clone of RAM, but obviously shuffled so that the same errors wouldn't occur in the same places. So 2x seems like the sort of absurd upper bound, but I feel like we could do a lot better.


My understanding is that rows in memory chips have thousands of bits in them. If you add 10-20 more to act as a refresh counter, then you can trigger extra refreshes if it gets high enough. Done right, that should give you immunity to rowhammer, while only costing a fraction of a percent in extra die space. And you won't be able to trick it like you can hash-table-based refresh counters.
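
Something like this, written as C pseudocode for what would really be DRAM-internal logic; the threshold and counter width are assumptions on my part:

    #include <stdint.h>

    #define ACT_THRESHOLD 50000u  /* assumed safe activations per refresh window */

    extern void refresh_row(uint64_t row);   /* hypothetical: recharge one row */

    /* One counter per row, conceptually stored in the extra bits of the
       row itself and reset by the normal periodic refresh. */
    void on_row_activate(uint32_t *act_count, uint64_t row, uint64_t num_rows) {
        if (++act_count[row] >= ACT_THRESHOLD) {
            if (row > 0)            refresh_row(row - 1);  /* refresh both   */
            if (row + 1 < num_rows) refresh_row(row + 1);  /* neighbour rows */
            act_count[row] = 0;
        }
    }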


Mid 2008 or so I think was when rowhammer-prone DRAM started appearing, so consider the density of that DRAM compared to today.




I'm just amused that the article ends with the quote: "the challenge is substantial and the ramifications are industry-wide." Heheh. RAMifications.


Article title is slightly misleading: by smaller they mean process size; the range of Rowhammer attacks increases logically (in number of rows) due to the decreased distance between memory cells, even though the physically limited distance is the same.


"Rowhammer attacks are gaining range as RAM is getting denser"?


Ok, let's try that above. Thanks!


I'm the author of the article.

With all due respect, I have to push back on your characterization of it as "slightly misleading" here. Your explanation effectively restates the headline, and is also what the Google researchers said. How is the headline misleading?


I don't think it is entirely misleading; however, my first reaction was that you meant smaller in storage capacity, or in the physical size of the stick. This interpretation is not uncommon and can cause some confusion.


When I initially read the headline I thought for a second that it meant storage capacity is getting smaller. But then I realized that that doesn't make sense and it's referring to process size.


Not sure how you could have read it that way... storage capacity is clearly not getting smaller.


I had similar thoughts to the parent's; then your point came to mind, and the next thing I thought was that perhaps it meant not storage capacity but a smaller form factor of the stick itself.

Clearly I was wrong, but confusion can happen with just saying "smaller"; that word has many meanings.


Not intentionally misleading, of course. But I too first misinterpreted "RAM is getting smaller" as RAM that has smaller storage, which is counter to experience. But that's the size dimension I confront in everyday life, not the physical dimensions of the chip. I knew I must not be getting it, but I didn't think of the physical size until I read the article.


I also read it that way, and the paper does not use that phrase. I'm not saying it was intentionally misleading, but clearly some people were misled.


Misleading may have been the wrong word, though I will say I qualified it with "slightly".

The changed title is much more informative.


"Slightly confusing" is better verbiage. I had to think for a sec (I have had cache memory on the mind, which is "smaller" than main memory).


Definitely not misleading. Possibly easy to misunderstand? It did take me a second to realize what was meant.


As opposed to what other interpretation of the word small?


Small RAM: memory with (relatively) few bytes of storage.


[flagged]


In chrome you can open up the developer tools and select the element in the source code and then just delete it.


While hovering over the comment, press Alt+F4 on your keyboard.



