ECCploit is the first Rowhammer attack to defeat error-correcting code (arstechnica.com)
129 points by rbanffy on Nov 22, 2018 | 43 comments

Rowhammer is simply defective RAM marketed as usable. If your RAM will corrupt itself with certain access patterns, it is defective and should be replaced.

The manufacturers obviously don't want to, but the only way to stop this stupidity is to reject/return/refuse the product as defective if it shows this vulnerability. There have been a lot of efforts to downplay it, to the point that even some memory testing tools have made the tests for RH optional. This is ridiculous not just from a security standpoint, but for overall correctness. Memory that just doesn't behave like memory should is not fit for purpose.

Unfortunately RAM defects are often very subtle --- I remember a particularly irritating one which happened only when extracting a certain ZIP file; all the memory testing tools said the RAM was fine even with a few days of continuous running, and the file extracted correctly on a handful of other systems, but on this one it would always end up corrupt. Attaching a debugger or otherwise attempting to trace the cause naturally made it disappear. It was only swapping the RAM with a new module that fixed it.

As a follow-on from this, I recently went to a talk by the author and asked whether certain brands of RAM were more vulnerable than others. They said that the newer the brand (and so the newer the production process), the more bits are likely to flip, but even more established brands would still have some vulnerable bits.

So it seems more of an issue with a change in acceptable tolerances - what is fine for normal usage might not be secure. Also, I might have been mistaken but the author's response implied that no brand of RAM was not vulnerable.

Hmmm . . . some bright spark in marketing will realize there's an opportunity in "secure DRAM" now . . . :-/

I get the sentiment around this, especially as there's no way to reasonably get ram that isn't susceptible to these failure patterns.

But, with that said, memory is extremely low margin (dram is a commodity market) and is priced in part based on yield. Adding rounds of QA that are likely to fail many chips, pushing yield down and prompting process changes, is extremely expensive for manufacturers. I also want to pressure manufacturers to make more reliable, correct products, but that has a cost. Even if a law were passed that all commodity dram had to be RH-hard, the price would jump. In any other scenario, RH-hard dram would be a speciality need and priced accordingly.

And even ECC dram has tolerances and potential for silent corrupting failure. Most aren't perfect.

> especially as there's no way to reasonably get ram that isn't susceptible to these failure patterns.

If I remember correctly from the original Rowhammer paper, anything from ~2009 or earlier is not affected. Price for RAM was not particularly high in those times either, and I'd be willing to pay the same (inflation-adjusted) amount today if it meant I didn't have to worry about these problems.

> In any other scenario, RH-hard dram would be a speciality need and priced accordingly.

It shouldn't be, because the access pattern could show up incidentally in other ways and cause problems like silent corruption.

Is it simple? From what I understand it is still not clear how Rowhammer is physically possible. If you don't know how it's possible, then current manufacturers may not know how to create good memory. We don't get everything in physics yet and it's affecting our ability to make good memory chips that are secure.

One possible explanation of how it is possible, the parasitic coupling effect: https://github.com/google/rowhammer-test/blob/master/docs/re...

It is entirely clear how and why it happens.

It is a trade-off of density and coupling: the closer you get, the higher the parasitics. But if you stay far apart, you have low density (expensive). You also likely need different caching, error correction, and refresh settings.

The concept of parasitic write induced loss is really old in memory. Happens in NAND flash too. It is perhaps surprising that attacks like rowhammer never happened before. Or maybe we never knew?

> Rowhammer is simply defective RAM marketed as usable

Huh? Rowhammer is an access pattern that may reveal certain defects in RAM.

I wonder to what extent the encrypted memory enclaves in modern CPUs can mitigate these kinds of attacks.

If the data stored in the physical memory has been encrypted using a randomly generated secret key by the memory controller on the CPU, it should be impossible to generate an exact target data pattern in the victim even when the aggressor and victim are in the same enclave.

That in itself isn't sufficient to guard against all attacks, because the memory enclaves don't provide data integrity, only encryption. So if all you're trying to do is to change a victim's boolean that controls whether you have e.g. root access, then changing that boolean from false (0) to true (any non-zero value) is going to succeed with high probability despite the encryption.
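To make the point above concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the cipher is a toy 16-byte Feistel construction built from SHA-256 (a stand-in, not whatever a real memory controller uses), and `is_root` is a hypothetical flag stored in the first 4 bytes of a block. The attacker flips one ciphertext bit without knowing the key; since nothing checks integrity, the decrypted block is effectively random and the flag is almost certainly non-zero.

```python
import hashlib

BLOCK = 16          # toy block size in bytes (assumption)
HALF = BLOCK // 2


def _round(key, i, half):
    # Round function: keyed hash of one half (toy construction).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:HALF]


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key, block):
    left, right = block[:HALF], block[HALF:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def decrypt_block(key, block):
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def flip_bit(data, bit):
    # Models a single Rowhammer-style bit flip in stored ciphertext.
    out = bytearray(data)
    out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


def is_root(block):
    # Hypothetical layout: a 4-byte flag, 0 = false, anything else = true.
    return int.from_bytes(block[:4], "little") != 0


KEY = b"secret key held in the CPU"
plaintext = bytes(BLOCK)                       # flag is 0: not root
ciphertext = encrypt_block(KEY, plaintext)     # what sits in DRAM
tampered = decrypt_block(KEY, flip_bit(ciphertext, 5))
print(is_root(plaintext), is_root(tampered))
```

The attacker can't choose what the flag becomes, only that it changes, which is enough when any non-zero value counts as true.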

Still, maybe there's an angle here that can help put an end to Rowhammer once and for all in a few hardware generations?

On the subject of RAM encryption, there's even a kernel modification that (ab)uses the x86 debug registers to store an AES key and transparently encrypts / decrypts RAM: Tresor [1].

> That [encrypted memory] in itself isn't sufficient to guard against all attacks

You might want to read about authenticated encryption [2]. The authenticity of the decrypted data often lies in the block cipher mode (for example OCB) rather than the block primitive, where Tresor uses the AES-NI instruction. I would imagine SGX does something along the same lines.

[1] https://www.usenix.org/legacy/event/sec11/tech/full_papers/M...

[2] https://en.wikipedia.org/wiki/Authenticated_encryption

TRESOR doesn't have anything to do with encrypting RAM - it's just a tweak to existing disk encryption methods which stores the encryption key in a debug register, rather than RAM. This gives resistance against attackers trying to break your disk encryption by trying to read your key directly from RAM.

Yes, indeed. Thanks for correcting that.

On SGX, I believe the general model is that the CPU doesn’t trust the physical memory backing the enclave at all. But I suspect that the CPU will detect the attack and disable SGX entirely, systemwide, until a reboot.

Enclaves do have integrity protection (at least SGX does).

Do you have details on that? The only references I see to integrity on Intel's own overview site are for data at rest (i.e. stored on disk or in network transit etc.). This is easy to achieve and plausible with cryptographic signatures, and both AMD and Intel have some form of this.

The reason I'm doubting that there's integrity protection for data while in RAM is that it's information-theoretically impossible to do that without memory overhead.

This doubt seems to be confirmed by the descriptions of the ELDB/ELDU/EWB instructions: they perform cryptographic authentication while copying pages to and from the enclave. There don't seem to be any integrity checks while data is live in the enclave -- and that's what matters for Rowhammer protection.

There is a memory encryption engine on each CPU that transparently encrypts and integrity protects each cache line as it flows to and from main memory. The integrity protection info is stored in a tree-like structure in part of one of the on-CPU caches. The storage overhead for this data structure is one of the reasons the maximum enclave size is currently limited to about 100MB, although rumour has it that limitation will be removed in future generations. In saying all that, I would be very interested to hear ideas you might have on how Rowhammer could bypass SGX :)
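As a rough illustration of per-cache-line integrity and the storage overhead it implies, here is a sketch using a truncated HMAC. This is not SGX's actual construction (which this thread doesn't spell out); the 64-byte line size, the 8-byte tag, and the `seal`/`verify` names are all assumptions, and a real design would also need something like the tree mentioned above to stop replay of old lines.

```python
import hashlib
import hmac

LINE_BYTES = 64   # assumed cache-line size
TAG_BYTES = 8     # assumed truncated tag size


def seal(key, address, line):
    # Bind the tag to the address so a line can't be moved elsewhere.
    msg = address.to_bytes(8, "little") + line
    return hmac.new(key, msg, hashlib.sha256).digest()[:TAG_BYTES]


def verify(key, address, line, tag):
    return hmac.compare_digest(seal(key, address, line), tag)


key = b"on-die secret"
line = bytes(range(LINE_BYTES))
tag = seal(key, 0x1000, line)

# A single flipped bit in the stored line no longer matches its tag.
flipped = bytearray(line)
flipped[10] ^= 0x04
flipped = bytes(flipped)

print(verify(key, 0x1000, line, tag))      # line as written
print(verify(key, 0x1000, flipped, tag))   # line after a bit flip
print(TAG_BYTES / LINE_BYTES)              # extra storage per line
```

With these assumed sizes the tags alone cost 12.5% extra storage, which is the kind of overhead the parent comment says integrity protection can't avoid.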

There's a ton of overhead. SGX uses encrypted ram, if that helps your mental state

(previous post: https://news.ycombinator.com/item?id=18503795)

> Fortunately, while the attack would be extremely difficult to prevent, it also looks to be very difficult to actually pull off in the wild. (...) the VU Amsterdam team said a successful attack in a noisy system can take as long as a week. (from https://www.theregister.co.uk/2018/11/21/rowhammer_ecc_serve...)

Well, if you don't know that you are under attack, taking a week isn't exactly an advantage. And if the attack can be divided among many agents, even if not in parallel, that can make it harder to find out that you're under attack.

This attack depends on using many 1-bit errors to characterize the physical memory address. In other words, it defeats ECC if your system ignores errors. A lot of systems do ignore errors, but if you don't you can easily detect this attack before it's ready to strike.
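A minimal sketch of "not ignoring errors", assuming a Linux box where the EDAC driver exposes corrected-error counters under `/sys/devices/system/edac/mc/mc*/ce_count`. The helper names, the sample values and the threshold are made up for illustration; a real deployment would tune the threshold to its own baseline error rate.

```python
from pathlib import Path


def read_ce_counts(root="/sys/devices/system/edac/mc"):
    # Corrected-error count per memory controller; empty dict if EDAC
    # is not available on this machine.
    counts = {}
    for f in Path(root).glob("mc*/ce_count"):
        counts[f.parent.name] = int(f.read_text())
    return counts


def ce_alarm(samples, max_new_errors):
    # Indices of polling intervals where the cumulative counter jumped
    # by more than max_new_errors since the previous sample.
    return [
        i
        for i in range(1, len(samples))
        if samples[i] - samples[i - 1] > max_new_errors
    ]


# Hypothetical cumulative counter values sampled once per interval.
samples = [0, 0, 1, 1, 40, 95]
print(ce_alarm(samples, max_new_errors=5))
```

A burst of corrected single-bit errors like the last two intervals is exactly the signal the comment above says you'd see while the attacker is still probing.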

If someone has gotten this far with relatively ordinary tools and published the results, someone else probably has an exploit based on it but is not talking about it. You have to assume that the PLA's Third Department and NSA are working hard on it. Maybe the guys in St. Petersburg, although they haven't demonstrated that level of capability so far.

How does this fare vs 2-bit ECC (e.g. Chipkill)? Article only mentions 1-bit ECC.

They talk about it in the paper. It doesn't really make things more difficult, it just affects which bits need to be flipped.

I always assumed that ECC on these high-end systems was done in hardware by the memory controller. And thus there would be no timing side-channel. Is that really not the case?

A hardware implementation does not mean that there is no timing side-channel. Computing ECC takes a few cycles. To avoid a timing side-channel, you must design the hardware for this purpose.

Yeh I realised afterwards that at these high clock speeds maybe they do need some extra cycles to do the correction. I don't see why you must do it in hardware to avoid timing side-channels. You just need to provide a constant latency, i.e. the CPU ucode does some nops when correction is not necessary.

I mean, as long as your ECC is built directly into the silicon

In that case you need to do the exact opposite i.e. avoid the obvious optimisation of returning data early if no correction is required.

If you can flip one bit, then it isn't a big leap to flip multiple bits. No one who knew what they were talking about should have believed that ECC was a protection against rowhammer-style attacks.
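For anyone who wants to see why multiple flips matter, here is a toy SECDED code: an extended Hamming(8,4) code over one nibble. Real ECC DIMMs use wider codes whose exact layout this thread doesn't give, so treat the bit positions and the `encode`/`decode` names as assumptions. One flip is corrected, two are reported as uncorrectable, and three are "corrected" into the wrong data without any error being raised.

```python
def encode(nibble):
    # Positions 1, 2, 4 hold Hamming parity; 3, 5, 6, 7 hold data;
    # position 0 holds an overall parity bit.
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    for p in (1, 2, 4):
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i
    overall = sum(code) % 2
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assumed to be a single-bit error.
        # Syndrome 0 here means the overall parity bit itself flipped.
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome.
        return "uncorrectable", None
    nibble = fixed[3] | fixed[5] << 1 | fixed[6] << 2 | fixed[7] << 3
    return status, nibble


def flip(code, positions):
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out


word = encode(0b1011)
print(decode(flip(word, [5])))         # one flip
print(decode(flip(word, [3, 5])))      # two flips
print(decode(flip(word, [3, 5, 6])))   # three flips
```

In the three-flip case the decoder reports a routine correction while handing back 0b1100 instead of 0b1011.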

Leave it to Goodin to overhype a non-practical attack. I wish this guy would retire. He's brought that overhyping clickbait style of reporting from the register to ars technica and I can't stand it anymore.

Rowhammer has been around for four years, yet it has never been seen in practice. Ever. Can we stop with reporters passing off basic academic research as a current threat? It's just research pr0n, sci-fi hacking. It's not even remotely close to being a threat.

FPGA developer here.

In a former project (with a custom board and FPGA, no processor), we encountered random bugs. We put many checks and finally came to the conclusion that our DDR modules were flipping bits randomly, but only on normal load. All test benches were running fine. Putting the modules in a PC did not show any problem.

How can you tell your boss "all DDR modules are faulty but run fine in a PC" and not seem crazy?

It was when I read about rowhammer attacks that I made the link. Changing the addressing schema completely solved the issue.

All these hardware related failures/attacks may not be a threat (yet), but for me, the underlying sand castle we are building things on is very worrying.

>Changing the addressing schema completely solved the issue.

Can you elaborate at all? Do you mean you changed the FPGA to not write to exactly the same spots over and over? And just set something up similar to wear leveling?

It was very simple, I kept all the "logical" addresses the same, but swapped some low address bits with high address bits at the memory controller interface.

With this, the slowly moving indexes became big jumps. This avoids reading the same row over and over again (mixed with other requests).
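A sketch of that kind of remapping, in Python rather than HDL for readability. The field positions (4 bits at bit 0 swapped with 4 bits at bit 20 of a 24-bit address) are invented for illustration; the comment above doesn't say which bits were actually swapped, and `swap_bit_fields` is a hypothetical helper name.

```python
def swap_bit_fields(addr, lo, hi, width):
    # Swap the `width`-bit field starting at bit `lo` with the one
    # starting at bit `hi`. The two fields must not overlap.
    mask = (1 << width) - 1
    lo_bits = (addr >> lo) & mask
    hi_bits = (addr >> hi) & mask
    cleared = addr & ~((mask << lo) | (mask << hi))
    return cleared | (lo_bits << hi) | (hi_bits << lo)


# Logical addresses that creep forward one step at a time...
logical = [0, 1, 2, 3]
# ...turn into large jumps at the memory controller interface.
physical = [swap_bit_fields(a, lo=0, hi=20, width=4) for a in logical]
print([hex(p) for p in physical])
```

The mapping is its own inverse, so the logical view stays unchanged for everything upstream of the memory controller.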

> This avoids reading the same row over and over again

Didn't you notice a huge drop in performance from producing row-buffer conflicts over and over?

Good question. Jumping around the memory is clearly not good for throughput. Before the change, the mixing of requests between processes was already detrimental for performance anyway. All I can say is that we needed a fixed bandwidth and it was enough, before or after the change.

Sure you didn't just have a timing problem, such as the all-too-common "Hmm, that wasn't a false path after all" variety?

When I read about rowhammer, I did analyse carefully the memory request pattern and found many similarities to a rowhammer attack on PC. It was several processes issuing small requests on alternating rows, with slowly moving indexes.

After I implemented the change and had a stable system, I went as far as implementing a switch of addressing pattern so that, all other things being equal, I could trigger the bug or not. And it worked as expected.

So yes, I am very confident in the analysis.

There was a practical attack to gain root on Android via it, wasn’t there?


There was at least a TA dumper for Sony Xperia models based on Dirtycow that allegedly worked for earlier Android N builds...

You mean the attack that required you to have malware already on your phone (RAMpage)? Or the one that was mitigated via Chrome years back (Drammer)?

Cause both are impractical. Using a minutes-long RAMpage attack to root an already compromised device is impractical. There are easier ways to do that. And it's easier to update Chrome than the Android OS, for Drammer, so that's been mitigated ages ago. Nobody uses a three-year-old Chrome on Android. Get real ... and spare me the comment about how some users don't update apps, cause they do, Google makes sure to spam users about it daily.

To be fair, it's as easy to turn off the daily notices as it is to turn off the automatic updates. And, as updates seem to break various things at least as often as they improve things, there's nothing unusual about users turning off one or both of the features.

(Note that I'm not commenting on the attack vector at all, only about the Android update settings)

It's not just easy; I have some friends who don't update their (low end) phones because they're afraid that the updates will slow the system down.

If you had read the article you would have known that Dan took a very measured approach and didn't over-hype anything. The headline might have led you into reaching your conclusion, and I don't blame you.
