Blacksmith – Rowhammer bit flips on all DRAM devices today despite mitigations (ethz.ch)
662 points by buran77 69 days ago | 309 comments



OK, for starters, ECC has to become standard.

Then the rate of ECC errors has to be monitored. If something is trying a rowhammer attack, it's going to cause unwanted bit flips which the ECC will correct. Normally, the ECC error rate is very low. Under attack, it should go up. So an attack should be noticeable. You might get some false alarms, but that just means it's time to replace memory.
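A minimal sketch of that monitoring idea (on Linux, corrected-error counts are typically exposed through EDAC counters such as /sys/devices/system/edac/mc/mc0/ce_count; the sampling source, window, and thresholds here are illustrative, not a hardened detector):

```python
# Sketch: watch the cumulative corrected-error counter and alarm when the
# per-interval rate jumps far above the historical baseline. The counter
# source is abstracted away so the detector itself is testable.

def rate_spike(samples, baseline_window=10, factor=10.0, floor=2):
    """Given cumulative corrected-error counts sampled at a fixed interval,
    return True if the latest per-interval delta exceeds `factor` times the
    baseline average and at least `floor` absolute errors."""
    if len(samples) < baseline_window + 2:
        return False                      # not enough history yet
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    baseline = deltas[:-1][-baseline_window:]
    avg = sum(baseline) / len(baseline)
    latest = deltas[-1]
    return latest >= floor and latest > factor * max(avg, 0.1)
```

What to do on an alarm (log, replace the DIMM, kill a suspect process) is a policy decision layered on top of this.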


While ECC memory is probably important and probably better than nothing, if there's one thing we've learned about Rowhammer it's that the obvious mitigations that "should" stop or detect it often fail to a clever attacker.

Just the first thing that popped into my head, but: say you watch the ECC correctable error rate over time, and somehow (not so easy!) determine which process is causing those errors. You forcibly kill the process and log a message about it, and also terminate/notify processes potentially affected (say, send them a SIGBUS or something and unmap the pages containing the affected data).

I, a "clever" attacker, use this to leak out your secrets- I do my hammering juuust right so that, if some secret bit is 0, your hammering flips ECC bits, while if it's 1 your hammering doesn't affect things. Lovely little side-channel.

Universal memory encryption and authentication seems to be the only sure way out of the cycle of "attack, mitigation, attack the mitigation", and it's already starting to roll out.


> I, a "clever" attacker, use this to leak out your secrets- I do my hammering juuust right so that, if some secret bit is 0, your hammering flips ECC bits, while if it's 1 your hammering doesn't affect things. Lovely little side-channel.

Would ASLR make this harder? I assume it would because it'd be a lot harder to get to the correct memory location (?).


ASLR adds nothing against attacks that rely on the physical location of pages, since physical pages are allocated independently of the virtual address space.


Noob question: With DDR5's option for in-chip ECC, if Animats' suggestion of monitoring ECC anomalies is implemented on a control unit on the DIMM module, then will that make the attack impossible?


When I was a CS student, a little more than a decade ago, we learnt what ECC was and how it worked. Interestingly, I remember that we were taught that ECC was and should be everywhere.

From physical storage to data transfer protocols, we were told you really shouldn’t skip ECC anywhere if you wanted things to work reliably against the hard physical world. It was like a basic CS rule.

And still, here we are, still arguing over whether some reliability is needed in components as critical as DRAM chips.


Maybe this is common knowledge, but can ECC be done in software? I don’t necessarily see why not, despite the common-sense wisdom that it has to be done in hardware. With dram so cheap these days, why not just go full RAID 1 (where ECC is analogous to a RAID-5 implementation), and store each critical-path memory byte twice to detect bit flips or three times to fix? Let’s see row hammer attacks flip the same bit in two or three places at the same time. Sure it would impact performance, but maybe for something like encryption or banking it’s an acceptable trade off.
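A toy sketch of the replicate-and-vote idea described above: every value is written three times and reads are resolved by majority vote, so a single flipped copy is both detected and corrected. The class and method names are purely illustrative, not a real library:

```python
# Software triple-modular redundancy: three copies of every value,
# majority vote on read. An attacker would have to flip the same
# value in two of the three copies to win.

class TripleStore:
    def __init__(self):
        self._copies = [{}, {}, {}]

    def write(self, key, value):
        for copy in self._copies:
            copy[key] = value

    def read(self, key):
        a, b, c = (copy[key] for copy in self._copies)
        if a == b or a == c:
            return a          # at most one copy disagrees
        if b == c:
            return b          # copy 0 was the corrupted one
        raise ValueError("uncorrectable: all three copies disagree")
```

As the comment notes, the cost is 3x memory and extra work on every access, which is why this only makes sense on a narrow critical path.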


In some safety-related contexts you do have bit-flip protection at the software level. E.g. you write every variable together with its complement, to see if a random bit flip occurred.

It's quite a lot of work, but if you have a long running embedded system, or a lot of the same, e.g. in cars, the incidence of bit flips increases.

This is of course in addition to other protection mechanisms.
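A minimal sketch of the variable-plus-complement pattern, as used in safety-critical embedded code: each 32-bit value is stored alongside its bitwise complement, and every read checks that the pair still XORs to all-ones. The dict-based "cell" stands in for two real memory locations:

```python
# Store a value together with its bitwise complement; a read that
# finds the pair inconsistent signals a bit flip. Detection only --
# unlike ECC, this scheme cannot tell which copy is wrong.

MASK = 0xFFFFFFFF  # 32-bit variables assumed

def protected_write(cell, value):
    cell["value"] = value & MASK
    cell["shadow"] = ~value & MASK

def protected_read(cell):
    if (cell["value"] ^ cell["shadow"]) != MASK:
        raise RuntimeError("bit flip detected")
    return cell["value"]
```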


In enterprise server memory controller firmware we use soft and hard row repair algorithms.


I'm a CS student as well, curious what course taught you about ECC? Would be interested in such a course as well, the only course I can imagine teaching this is an information theory course (which my university doesn't offer).


Luckily all DDR5 DIMMs will have on-chip ECC. My understanding is it's not a complete mitigation but does make exploitation harder.


> My understanding is it's not a complete mitigation but does make exploitation harder.

It won't. It's designed to counter silicon limitations to increased density, i.e. it's made to correct the errors that result from packing cells beyond the limit of error-free operation.

The extra redundancy from on-chip ECC is intended to be "consumed" by the chip itself, and since it allows optimizing chip manufacturing for denser and cheaper parts, there's no question that it will get pushed to the very limit.

There's still "classic" ECC for DDR5. 8 bits mapped to 9, terminated at the CPU which can look at things. That's what I want, need, and will buy.

P.S.: Shame on Intel for still walling off desktop CPUs from ECC. https://ark.intel.com/content/www/us/en/ark/search/featurefi...


> It won't. It's designed to counter silicon limitations to increased density, i.e. it's made to correct the errors that result from packing cells beyond the limit of error-free operation.

I'd love to see actual parameters for the error correction codes, but DDR5 could pretty easily be a lot more robust than DDR4.

When you have no error correction at all, you need ridiculously high reliability. Even if these new memory cells have a much higher error rate, if they're designed to seamlessly handle a few bits in the same row flipping, then the overall reliability could skyrocket.

Edit: Oh, there's a paper from micron talking about DDR5 only having single bit correction internally. That's not as useful as it could be against attacks...

> There's still "classic" ECC for DDR5. 8 bits mapped to 9

But Single Error Correction, Dual Error Detection is not enough to prevent attacks.

Also because DDR5 uses a smaller width you actually need to map 8 bits to 10.


> I'd love to see actual parameters for the error correction codes

Well, the spec is $369, but a PDF with some early discussion (rev 0.1…) says:

> DDR5 devices will implement internal Single Error Correction (SEC) ECC to improve the data integrity within the DRAM. The DRAM will use 128 data bits to compute the ECC code of 8 ECC Check Bits.

However, I have no idea whether this is what got specified in the end. 8 ECC bits on 128 data bits would be half the amount of added redundancy compared to "classical" ECC. Note also it's not SECDED, just SEC, confirming the Micron paper you mentioned.

> But Single Error Correction, Dual Error Detection is not enough to prevent attacks.

The OP suggested monitoring for a sudden increase in ECC events, which hopefully this would work for. It's not a perfect countermeasure, just a statistical one, but I'll take what I can get...

> Also because DDR5 uses a smaller width you actually need to map 8 bits to 10.

Now that kinda makes it better, doesn't it? :)


> Note also it's not SECDED, just SEC, confirming the Micron paper you mentioned.

Which is a shame because it would only cost 1 more bit to do SECDED.

> Now that kinda makes it better, doesn't it? :)

Very slightly, but 32->40 isn't much better than 64->72.

SECDED requires 39. I don't know if the 40th bit is used for anything? It would certainly be possible to use the 16 extra bits per burst to add a second layer of SECDED. Or an extra layer of triple error detection, if I'm remembering right. That would be really good against rowhammer.
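The check-bit counts quoted in this subthread follow from the Hamming bound: for single-error correction over m data bits you need the smallest r with 2^r >= m + r + 1 (the syndrome must be able to name every bit position plus "no error"), and SECDED adds one overall-parity bit on top. A quick check:

```python
# Minimum check bits for single-error correction (SEC) over m data bits,
# and for SECDED (SEC plus one overall-parity bit for double detection).

def sec_bits(m):
    r = 1
    while 2 ** r < m + r + 1:
        r += 1
    return r

def secded_bits(m):
    return sec_bits(m) + 1

# 32 data bits -> 7 SECDED bits, i.e. 39 total, so 32->40 leaves one spare;
# 64 -> 8, the classic 72-bit ECC word;
# 128 -> 8 SEC bits, matching the DDR5 on-die figure quoted above.
```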


> SECDED requires 39. I don't know if the 40th bit is used for anything?

FWIW this is completely up to the CPU. "Classical" ECC is completely transparent to the DIMM, it just remembers and returns what the CPU gave to it.

32 → 40 has already been around for quite some time (particularly when the DDR controller only has 64 data lines but you still want ECC; you can sometimes do 32+8 at half performance & max capacity), but I don't see anything beyond SECDED in the datasheets... (looked through some Freescale/NXP PowerPC bits)


> FWIW this is completely up to the CPU. "Classical" ECC is completely transparent to the DIMM, it just remembers and returns what the CPU gave to it.

True, but I guess I assumed normal ECC was standardized. Is it?

> 32 → 40 has already been around for quite some time (particularly when the DDR controller only has 64 data lines but you still want ECC

Is that keeping a burst length of 8 on DDR3/DDR4? One thing to consider is that a full memory transfer plus 16 extra bits is much more useful than a smaller transfer with 8 extra bits. That could easily be a tipping point from "not worth bothering" to "they really should do something here".


> True, but I guess I assumed normal ECC was standardized. Is it?

This is outside my area of expertise, but I believe everyone just does the same thing without it being a standard. (Especially since even CPU vendors frequently just buy a done-and-tested DDR controller design.)

> Is that keeping a burst length of 8 on DDR3/DDR4?

All the "classical" ECC just uses additional data lines, and I assumed DDR5 would do the same thing. I hadn't checked until a minute ago, but yes, DDR5 (like previous versions) just extends the width of the data bus by the ECC bits.


> All the "classical" ECC just uses additional data lines

I know that. But DDR5, at least under normal use, increases the number of bits sent across each data line in a single memory access. So now you have 16 spare bits across the entire thing, instead of just 8. This means that even if you had 32->40 ECC before, the value you could extract from the extra bits goes up a lot with DDR5.


Oh, sorry, I misunderstood. I don't think any implementation does that, since it would require holding back the entire burst in a buffer in the DDR controller. Which would significantly impact performance through the added latency.

But honestly I don't know — and also this seems like if some vendor wanted to take (or not take) the performance hit for extra safety, they could well deviate from everyone else.


To be 100% safe, it would add latency to part of the reads. Though not much compared to the normal speed of a memory access.

To be 99% safe you could send a signal a couple nanoseconds late that says "wait, this data is bad, you should abort".

It also might make writes a bit more awkward.


In theory 8 check bits on 128 data bits is way better than 2 per 16, as it allows a stronger error correction code. This is all due to Shannon's theorem on noisy channels.

The problem is that this assumes the probability of flipping an individual bit is independent of the others. That may not be the case, and if flips are correlated, rowhammer is possible.
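That failure mode shows up even in a toy Hamming(7,4) code: one flipped bit is corrected cleanly, but two flipped bits in the same codeword produce a syndrome that "corrects" the wrong position and silently corrupts the data. (Hand-rolled illustration; not DDR5's actual code, which operates on 128-bit words.)

```python
def hamming74_encode(d):
    """Encode 4 data bits as a 7-bit codeword (parity at positions 1, 2, 4)."""
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def hamming74_decode(c):
    """SEC decode: the syndrome names the (assumed single) flipped position."""
    c = c[:]
    s = ((c[0] ^ c[2] ^ c[4] ^ c[6])
         + 2 * (c[1] ^ c[2] ^ c[5] ^ c[6])
         + 4 * (c[3] ^ c[4] ^ c[5] ^ c[6]))
    if s:
        c[s - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
cw = hamming74_encode(data)

one = cw[:]; one[2] ^= 1                 # single flip: corrected
two = cw[:]; two[0] ^= 1; two[2] ^= 1    # double flip: miscorrected
```

Decoding `one` recovers the original data; decoding `two` yields the wrong data with no error indication, which is exactly why correlated rowhammer flips defeat SEC-only schemes.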


That’s not exactly correct: while it is there mainly to allow for higher densities and frequencies, it’s also designed to prevent bit flips from happening on-chip.

It’s not end-to-end ECC, as in it doesn’t prevent flips that happen on the bus or in the CPU cache, but it does correct single-bit errors within the DRAM.


> it’s designed to prevent bit flips from happening on chip.

Bit flips that are getting more and more common as the RAM cells are getting tinier and tinier, the stored charges ever smaller and smaller, and thus susceptible to flipping…


I think on-chip ECC would mitigate this problem just as well as off-chip ECC. Off-chip ECC is meant to catch errors during transmission (i.e. 72 bits transmitted for 64 bit words), not necessarily just the ones that occur internal to the package.

I agree it's meant to counter limitations due to increased density, but it should catch this to an extent also as this error is induced on-package right, not during transmission. Or am I mistaken?


I think the point the parent is making is that this ECC is already fixing errors. There's no redundancy because the redundancy is already consumed by the defective cells in the chip. Any additional flips such as through rowhammer have no extra redundancy to fall back on in the general case with DDR5's built-in ECC.


Ah i see. I didn’t view the errors as additive. Thanks for the clarification!


Yes, the article mentions it:

> What if I have ECC-capable DIMMs?

> Previous work showed that due to the large number of bit flips in current DDR4 devices, ECC cannot provide complete protection against Rowhammer but makes exploitation harder.


It sounds to me like ECC isn't being included in the DDR5 spec due to magnanimity so much as because it doesn't function without it. That ECC has become 'load-bearing'.

Does that mean we need an extended ECC to deal with critical systems that require additional robustness?


Who error checks the error checkers?


It's just a matter of time before someone finds a way to exploit the ECC part, calls it Hammerrow and brings us back to square one...


Rowhamming would be a better pun, as DDR5 uses a Hamming code for error correction.


> Luckily all DDR5 DIMMs will have on-chip ECC

Until someone starts building chips with fake ECC to boost profit.


From the article:

> What if I have ECC-capable DIMMs?

> Previous work[1] showed that due to the large number of bit flips in current DDR4 devices, ECC cannot provide complete protection against Rowhammer but makes exploitation harder.

1. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8835222


This attack is great news for ECC. Now someone like Apple could launch a consumer product with “High Security Memory” and the rest of the industry can follow.


They already tried with the G5. Maybe they can try again?


From the article:

> We demonstrate that it is possible to trigger Rowhammer bit flips on all DRAM devices today despite deployed mitigations on commodity off-the-shelf systems with little effort.

The fact that user space code can cause bit flips in your RAM is a hardware bug. I'd love to see this code in memory testers like memtest86 so I could send the RAM back if it ever caught a problem like this.

I guess it shows just how close to the edge of not working our modern computing environment is.


The problem is that right now, DDR4 devices that work correctly in that regard do not exist. Likely the best mitigation for any even slightly sensitive application is DDR4 with ECC (even if it may not be enough, it is vastly better than nothing, and not just because of rowhammer).

And I have no idea whether the internal "ECC" of standard DDR5 helps or not. It is not intended to provide the regular ECC level of reliability anyway. (And I have seen discussion of likely bit flips detected in crash dumps of M1 Pro devices.)

So, as much as I would like to return defective devices, I would probably be left with no computer, no smartphone, etc.


According to an older paper [0], ECC can also be bypassed after reverse-engineering the mitigation in DDR3 DIMMs. Also:

> “DDR4 systems with ECC will likely be more exploitable, after reverse-engineering the ECC functions,” researchers Razavi and Jattke said

> What if I have ECC-capable DIMMs?

> Previous work [1] showed that due to the large number of bit flips in current DDR4 devices, ECC cannot provide complete protection against Rowhammer but makes exploitation harder.

[0] https://cs.vu.nl/~lcr220/ecc/ecc-rh-paper-eccploit-press-pre...

[1] https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8835222


While it cannot protect from bit flips, it can most likely warn about an attack in progress due to the large number of bit flips.


> The problem is right now DDR4 devices that work correctly in that regard, do not exist.

Yes, and this is actually the problem. Without safe hardware, it is almost impossible to write safe software.


Meanwhile you can still buy DDR3 --- used --- from various shops in China, and it'll be 100% perfect and Rowhammer-free because it was made before the insane process shrinkages that caused it.

(Have done this a few times. Was hesitant initially because of the low price and used nature, but when the first stick passed MemTest86+ including RH perfectly, I was convinced. Half a dozen more sticks later and still good. But maybe they've started to run out of the good old stuff now...)


> you can still buy DDR3 --- used --- from various shops in China, and it'll be 100% perfect and Rowhammer-free because it was made before the insane process shrinkages that caused it

That's demonstrably false. I remember rowhammer being thought of and demonstrated on DDR3 modules.

[0] https://hammertux.github.io/rowhammer-ffs-ddr3


Rowhammer started appearing in newer DDR3; see the original paper (figure 3): https://news.ycombinator.com/item?id=8713411

The tl;dr is that everything 2009 and older is good, 2010-2011 is possibly bad, 2012 and newer is almost certainly bad.


> Yes, and this is actually the problem. Without safe hardware, it is almost impossible to write safe software.

Software that attempts a hostile rowhammer attack aims to be unsafe by definition. It's not a matter of trying and failing to be safe. Rather, we're seeing that it's very difficult to build hardware and software that prevents hostile code from running amuck: if the attacker can run hostile code on your computer at all, you have already lost. That's bad news for cloud hosts and confirms what we already knew about web scripting, but doesn't seem all THAT bad for the rest of us.

Of course it's possible to build computers with no DRAM at all (i.e. only SRAM, which costs more). That is not economically sane for big servers, but might be the right thing to do for security sensitive components.


The sizes of current CPU caches is enough to run a lot of code…


Maybe a paranoid app could try to allocate the memory adjacent to any sensitive bits as a buffer? That would be pretty difficult to do but might be possible if you are tremendously paranoid and have a good way to examine the hardware.


You'd have to have the operating system's cooperation, because it may be mapping individual 4k pages all over the place, and it's the only thing that has a chance of knowing how the memory is laid out on the actual chips.


And the CPU and the chipset's cooperation. In many cases it is even a trade secret how data is striped across the different slots/banks.


> it is even a trade secret how data is striped across the different slots/banks

If that is true, vs. just the usual "you don't need to know" trash common these days, it's crazy. A pretty good picture can be built up with just software timing analysis, but it's hardly rocket science to hook up a logic analyzer/scope and determine bit swizzling and page/bank interleave. Plus, many of the more open BIOS vendors have tended to provide page/row/controller/socket interleave options for quite a while, partially because it can mean a 5-10% uplift for some applications to have it set one way or the other. It's been one of those how-do-you-tune-your-memory-system options for a couple of decades now.


Doesn't the same apply to exploiting rowhammer, though? If you're after specific data, how would you know which physical address to hammer, even if you knew the physical address of the target data?


Yes, but there is a difference between an attack being run by a low level hardware hacker vs. the software being run by average users. This is why I said it would be hard, you would need to encode a lot of low level hardware knowledge into your application.

This might only be useful in very specialized circumstances, like with kernel support on only a handful of carefully chosen hardware platforms. However, someone like Apple could do this on their systems, as they control both the kernel and the hardware. Sensitive memory locations could be cordoned off in special zones with buffering to prevent Rowhammer type attacks.


Or use different banks of DRAM for different security contexts, managed via NUMA?


I wonder how easy that would be to reason about, though. I really don't know much about modern DRAM circuitry; I feel like it might be abstracted away to some level. There are also BIOS-tweaky settings that might make a difference (i.e. things like channel configuration and/or bank interleaving, if that's still a thing).

IDK though. Maybe there's a way. I've worked on code that uses padding around a data structure to ensure it has its own cache line. Maybe if you allocate a large enough contiguous block you'll be OK?


If your app is particularly paranoid you could just double your memory usage, and verify errors against the clone.


Yep, though I agree with the parent: if RAM returns data from a location that differs from what was stored in that location, assuming all recommended timings have been followed, then that RAM is defective. If that means they should be recommending a refresh after every single operation, then that's what it means.

In other words, the whole industry sits on a bed of lies at this point and it's only because government is technically incompetent that we haven't seen the world's biggest class-action.


> In other words, the whole industry sits on a bed of lies at this point and it's only because government is technically incompetent that we haven't seen the world's biggest class-action.

That's a pretty bold claim - if it's accurate, can you point at the government regulation that precludes a class action lawsuit? I would assume it's something in the T&C or EULA for the hardware rather than a regulation?


Bold but sadly correct. The industry runs on misdirection and deception.

If users knew better, we'd have the equivalent of another Pentium FDIV bug.


The whole industry built this bed of lies by their own hands, I don’t see how it’s government’s fault.


I think this illustrates the cloud is unacceptable for anything more than storage and retrieval.

All computed results from data science must include steps and code to verify locally.

It calls into question privacy on federated networks and crypto networks; any node can be manipulated locally to change payload outputs on delivery, reveal secrets, disrupt workloads.

This makes sense to a lot of folks in computer engineering and physics, versus abstract software. No physical theory I know of offers any guarantee our arbitrary computing machines will ever be securable. We put fart pipes on Hondas.

Science proves it’s titillating smoke and mirrors once again. Still waiting for nuclear rocket cars.

I think this proves further as well why general computing chips need to be replaced with workload specific designs, where the anticipated inputs are well known and no vague logic paths to intentionally allow software monkey patching ever ship.


Security doesn’t scale at a price point that private sector companies could typically afford.

Perhaps we fail at pricing security into the value of a company, or maybe that’s what risk appetite is about.


The problem is that you can get away with minimal security for a long time. Sure, if you get hit, shit hits the fan. But by then, it's quite likely that all competitors that spend money and time on security and good infrastructure are long gone.

This is worsened by the fact that it's very hard for laypeople to assess the security of a specific application and that, by now, "cyberattack" has become common enough that it's easily accepted as an excuse.


Which is why certifications, audits, and minimum mandated standards are critically important.

The market just yawns at this stuff, until it gets fragged. Then it forgets and the cycle repeats.


> Which is why certifications, audits, and minimum mandated standards are critically important.

Not sure about that. All the security standards want me to run software written in an unsafe language as root on every device, intentionally parsing malicious inputs continuously.

That’s not making anything safer.


Pretty clearly, the standards have to be effective and well-designed. And yes, there are problems with that.

But the point remains that markets do very poorly at rare and/or cumulative risks. And that's the comparison I'm making. The market of and by itself will give you a race to the bottom in standards.

A longer-term view, whether through government regulation and oversight, social suasion, religious morality and ethics, or (possibly) insurance-oriented risk management (yes, a market mechanism, though something of an exception to the rule), will typically operate by the mechanisms I've described above. That there may be poor implementations doesn't obviate the fact that there can also be good ones, and that that's the goal we're aiming for.


Probably, at least some part of the modern financial sector, including startups, has many things in common with pyramid schemes.


> I think this illustrates the cloud is unacceptable for anything more than storage and retrieval.

You can run in dedicated tenancy where you have the whole machine or a metal configuration where you also have the whole machine.


That would ruin providers' user:hardware ratios, one of the foundational principles of cloud computing.


On the contrary, certain CSPs did this already (quietly) and further, had already developed hardware mitigations for things like meltdown, spectre and rowhammer.


But did they manage to keep their prices low?

(And those who employed hardware mitigations wouldn't have a problem with user:hardware ratios, would they?)


> I'd love to see this code in memory testers like memtest86 so I could send the RAM back if it ever caught a problem like this.

Is this an individual stick bug, or a design bug akin to Spectre? Given that they tested 40 different sticks and seemed to find effective patterns for all of them make me think it's the latter. In which case, just return all your computers?


Yeah, this is a problem with high density DRAM. Fixes are possible but difficult and expensive. In the cutthroat DRAM market nobody seems willing to put in the engineering effort to develop a solution and then manufacture it. It would have to sell at a premium, in a market where it's difficult to even find ECC memory at times.

The article states that ECC isn't sufficient to solve Rowhammer, but it does make the attack harder. If you are looking for a hardware solution it is your only option currently, even if it is imperfect.


I imagine you are right that consumers don't care or know enough to put pressure on manufacturers, but what about big companies like the FANGs? Or Government agencies like NSA etc? What are they doing about it? I imagine intelligence agencies are well aware of the issue and have or are trying to get solutions to protect their own infra.


Off topic but shouldn’t FAANG really be referred to as MAAAN now?

Google -> Alphabet

Facebook -> Meta

Meta, Apple, Amazon, Alphabet, Netflix. MAAAN.


Google still exists. Facebook still exists. Meta/Alphabet are mainly holding companies. For example, I received reach outs from recruiters for Google and Facebook today. Not from Meta and Alphabet. Maybe once Meta and Alphabet deliver on things and build brand recognition, it should change. Until then, I'd vote "no".


Now I can add "I have never worked for the MAAAN" with pride on my business cards!

Every now and again I worry I've grown up. Then I realize there is no grown-up, just a Matryoshka doll of more and more eclectic subtexts.


MANGA

M - Meta

A - Apple

N - Netflix

G - Google

A - Amazon


I'm now adopting this, thanks


AMANAM

Apple Microsoft Amazon Netflix Alphabet Meta


If you're doing that: M3AN.


I like MAGMA more


Right, the idea was to swap Netflix for Microsoft, giving MAGMA or GAMMA:

https://news.ycombinator.com/item?id=29042753


If you detect flips you can send it back, it's violating the advertised interface.


Sure, but if it's possible to cause bit flips in basically all currently available components, then what's the point? Return all of your RAM and have a non-functional computer?


Can't get bit flips on my RAM if I have no RAM.


So you just.. Send back all of your RAM all the time because they are all susceptible to bitflips?

This is not a tenable solution.


It happened because at some point in the past (around 10 years ago, I believe), the memory vendors decided that it was OK to design products with known "pattern sensitivity", because... I suppose they made more money that way and no customer complained loudly. Pattern sensitivity in memory chips is a very old problem, and it was previously treated as a fatal design error, fixed, and the affected chips used for parking-lot infill. But today every chip sold is affected. Basically the same as Boeing and the MAX, just fewer people killed.


> Pattern sensitivity in memory chips is a very old problem, and previously was treated as a fatal design error

You seem to know more about this than I do. Where might one go to read more about this? Are there whitepapers, design docs, other resources which could be used to assert positively that the industry used to consider this a defect?


The problem is that apparently the authors of memory testing programs were somehow convinced to hide the severity of this issue when it first appeared, claiming things like it being a rare edge case and that too much RAM would fail.

I believe newer versions of MemTest86+ have Rowhammer tests but it's disabled by default.

See my previous comment on this matter:

https://news.ycombinator.com/item?id=12410274


MemTest86 already has a rowhammer test. Not sure how it compares to Blacksmith, but some sticks I have do fail that test, and the failures can only be mitigated by setting tREFI extremely low (while also taking a large performance hit).

Most of the higher end hynix/micron/samsung sticks I've tried do not fail at JEDEC or XMP after 7+ passes.


Pretty sure the PassMark version does: https://www.memtest86.com/compare.html

Except that the distros are still shipping an older version due to license issues (or something like that, AFAIK).


You'll be sending back all of your RAM then.


Google is working on platforms to make it easier to explore this problem; see https://opensource.googleblog.com/2021/11/Open%20source%20DD...


> I'd love to see this code in memory testers like memtest86 so I could send the RAM back if it ever caught a problem like this.

How do you run a memory tester after you have returned all CPUs after Meltdown and all the subsequent microarchitecture bugs? I'd call all modern CPUs faulty products. Some more than others.


If it's a problem with chips becoming higher density, I wonder if older hardware will become popular in circles that care more about security than performance per rack unit.


Eventually, it will be obvious that running shared workloads on a single piece of physical hardware has fundamentally unremediable security implications. This slow recognition is deeply at odds with how the current landscape of x86 chips is both manufactured and priced, as well as with how cloud providers have structured billions of dollars in DC investment; in other words, they'll downplay it. This will be a massive opportunity for ARM & SoC manufacturers in the coming years, as they are far better positioned to offer, for example, a single-rack appliance with 64 individual 2-core mini-computers at a price point competitive with a 128-core x86 rack sold as one computer.

Computing moves in cycles:

- 2000s: gigahertz race on each core

- 2010s: increase core counts, multicore everything

- 2020s: back to core-efficiency and increasing per-core performance. M1 is already leading this charge, but is obviously a mismatch for a DC environment.

AMD and Intel need to adjust, or face extinction. It's not just about pushing ultra-high per-core performance (they're both good at this); it's about pushing for more efficiency, so per-blade density in a DC can be pushed higher in the face of more, smaller individual computers. If they don't evolve, AWS will for them [1].

[1] https://aws.amazon.com/ec2/graviton/


"We have never shared two threads on a core between EC2 instances" - https://www.youtube.com/watch?v=kQ4H6XO-iao&t=2485s

Interesting that AWS has been mitigating for side channel attacks since before they became a big news item. Curious about Azure and GCP's stance on this


Maybe they were super smart and foresaw side-channel being such a big problem.

Or, maybe, they just thought the lack of deterministic performance created billing/accounting/customer service problems. (One hyperthread can just about completely starve in many circumstances).


Microsoft Research looked into it - Paper is from 2020 and is reference 24 in the document mentioned in the main post here.

"Are We Susceptible to Rowhammer? An End-to-End Methodology for Cloud Providers"

https://arxiv.org/pdf/2003.04498.pdf

Although their answer in this paper was diplomatic, my interpretation is that they confirm it as a problem. Their conclusion was that it would not be as bad as it was considered at the time. To be revisited in the context of this more recent work.

Edit: Adding main reference

"BLACKSMITH: Scalable Rowhammering in the Frequency Domain" https://comsec.ethz.ch/wp-content/files/blacksmith_sp22.pdf


Well, you pick EC2 because you want dedicated cores.

You pick a VPS because you want to save costs and share cores.

So it’s not so much AWS choosing as you the customer choosing.


I was under the impression that AWS provides vCPUs, not dedicated CPUs unless you go bare-metal?


You can get dedicated tenancy instances in AWS, for an upfront monthly cost of ~$1.2k (per account) and about +10% on top of your normal EC2 instance costs.

I ran the numbers back in 2015 and persuaded the previous employer to go for dedicated tenancy with all performance-critical and privacy-sensitive workloads. I was effectively hedging against unknown but practically guaranteed cross-VM attacks to pacify a paranoid regulator.

Then Rowhammer happened. Less than a day later, our contact with the regulator comes to me asking how it affects us. Being able to answer - with absolute confidence - that it did not, was one of the proudest moments of my career. And the turnaround from "regulator comes asking awkward questions" to "regulator is happy and sees no reason to ask again" of less than 48 hours must be some kind of record too.


You get dedicated time on a core while your task is running. For instance, t3.medium is 2 vCPUs (because of hyperthreading) but, as you can see in [0], it's only one physical core.

[0]: https://aws.amazon.com/ec2/physicalcores/


> Maybe they were super smart and foresaw side-channel being such a big problem.

The looming threat of side-channel attacks on SMT systems has been known since well before we had SMT systems (because it also applies to co-processor and non-SMT multi-core systems).

The difference between back then and today is "we believe it's possible but haven't found a way yet" versus "there are multiple known ways", as well as it being widely known instead of just in some communities.

The reason we still shipped the problematic CPUs is that short-term improvements in performance, and with them competitiveness and revenue, were more important.

There also was a shift in what people expect from security and which attack vectors are relevant. For example, user applications a user installed were generally trusted as much as the user, while today we increasingly move toward not trusting any applications, even if they are installed by a trusted user and produced by a trusted third party. Similarly, running arbitrary untrusted native code from multiple untrusted users while "upholding" side-channel protection wasn't often an important requirement in the past.


> The difference between back then and today is "we believe it's possible but haven't found a way yet" versus "there are multiple known ways", as well as it being widely known instead of just in some communities.

I think the big difference is that we now have attacks with high bandwidths of exfiltrated information, instead of very slow, difficult-to-control leaks.


They offer vCPUs in multiples of 2 so it makes logical sense to divide the resources that way; performance would be a lot more unpredictable if you could be sharing a single core with another EC2 user/instance.


Viable side-channel attacks on SMT/hyperthreaded designs have been known about since 2005, only a few years after Intel brought their first SMT designs to market.


That's ok unless you are running something virtual like kubernetes on top of the EC2 instance but want to ensure isolation between containers/pods.


That’s insecure today anyway. Containers are not a security boundary. (If you don’t use a VMM like gVisor, Firecracker or go the Drawbridge way)


> Eventually, it will be obvious that running shared workloads on a single piece of physical hardware has fundamentally unremediatable security implications.

Sure, but this is worse than that. This is "your online poker game client gets access to your web browser's bank account session info."

We need process isolation within a single machine, or else we are kinda screwed as a field.


IMO this is a perspective from software engineering.

But this is an electrical problem. Interference is a huge issue in any engineering that involves physical things, and these kinds of attacks are just interference problems. This issue is no different from a microwave knocking out your Wi-Fi. These attacks have become possible because the acceptable interference threshold that chip makers have been using has turned out to be too permissive.

How do you fix interference problems? First, you choose a new threshold of acceptable interference and then you engineer better isolation, you lower density, and/or you switch technology.

We could make shared computing completely safe tomorrow if we wanted to, so I think calling this the end of shared computing is quite alarmist. The issue is that we collectively want to both have the cake and eat it too: we currently have a certain cost-to-compute ratio that we have become accustomed to and we don't want to compromise it. We're basically buying time until we can invent a new technology that can achieve the same density without the same level of interference.


In a world where insecure high density exists, secure low density is at odds with cloud computing: the cloud makes sense only if you can efficiently utilize your computing power round the clock and make it cheaper than shipping mostly idle terminals. If securing the cloud is expensive, then there's a cutoff point where it's better to ship highly dense, cheap terminals.

So maybe "screwed as a field" is not an exaggeration if the field is butt computing.


"Lower density" is a really hard sell at the moment...


Right, we may be entering an era in which secure network computing is impractical and the impact could be very far reaching.


I've been experimenting with isolating work activity from personal activity. It's amazing how difficult it is to prevent information from leaking between networks and applications. It's hard to find alternative solutions for entrenched convergence/convenience features like copy/paste, messaging and entertainment. Working remotely reveals too much about your personal and work relationships to corporations and VPNs only help to connect more dots. I can't curl up in a hammock with 3 laptops and a phone on a nice day, so I keep returning to a single device that does it all.


Sorry, entering? Just like climate change some of us have been warning that a lack of focus on and willingness to challenge the fundamentals of building software and systems would lead to non-securable computing in the general case. That warning has been sounding since the 1990s. Nobody cares. It took a meat plant and pipeline paying a ransom in cryptocurrency for everyone to notice that we are completely and irredeemably fucked as a computing species. We are already there, my guy, and it’s only a matter of time now.

Think about someone trying to do basic ETL. Like having a tabular file and summing it or something. Don’t use Excel, we say, stand up a $4 million Spark and AWS architecture with seven hundred pitfalls that can let bored Russians take over your whole network as if they were going to the dry cleaners because remember, you just might be Google someday. That’s where we are. It’s been a complete industry failure for a decade and it’s only getting worse. Accelerating, even: now you need some operationally-terrifying Kubernetes to even be at the table, and then as an industry we (rightly) say running this stuff ourselves is too hard, so pay Amazon to do it rather than even ponder if we have settled on the right approach.

Tada: Humanity just lost computing to three companies. We very likely aren’t getting it back.

There are probably 5,000 people doing this work who can adequately secure such a system and make it mostly impermeable. Where middle computing is royally screwed is that nearly all of them work in San Francisco or its clones abroad. So then you get “best practice” blog posts and industry think pieces and the lowest bidder ties them together into something resembling a competent computing system. That’s been the state of the art since 2004 everywhere except Santa Clara county.

With the exception of some areas in the IC and DoD, I just described the entirety of US government IT. That ETL example? It’s actually real and underpins a small part of Medicare across several government contracts. Because the tools the valley exports are all they’ve got, and we sure love building systems with massive footguns, and then shaming organizations publicly for missing item #543 on the tribal “secure your computing system” checklist and shooting themselves.

The entire industry must change, top to bottom, but just like climate change, again, that’s a nonstarter. Posix and the Web are not the path forward and I hope I live to see the industry figure that out. I’m increasingly skeptical. The good news is my hometown might flood into the sea first, sparing me from considering in my last moments that every argument I’ve ever made in this profession has fallen on deaf ears and that everyone has to derive our industry’s peril from first principles for themselves.


Right, on average industry has been failing forever. The difference now is it might not be practical for anyone to actually secure an internet server or web browser, full stop. I think that is a fundamentally different situation.


I think we have been in that situation since everybody started mimicking how Google does things


I like your writing style.


I don't see how Graviton/custom ARM chips are evidence of this predicted trend. ARM chips tend to have even higher thread counts and poorer per-core performance. The biggest security difference is the absence of SMT/hyper-threading.

I think it will come down to what you are willing to call 'individual' processors. But actually having physically distinct memory seems like a lot of overhead for attacks that won't matter for 90% of users. Also, I would think that the on-die ECC of DDR5 would protect it against these types of attacks.


Graviton is not a prediction of the trend; it's a signal that Amazon is willing to make very deep investments in custom, customer-facing hardware if Intel/AMD can't deliver what they need.

The trend is yet to come. My statement is that, if AMD/Intel don't adapt, Amazon has the hardware investment to leave them behind, just like Apple did.

But to be clear on two points: They will probably adapt. And Amazon/etc will probably never leave them behind fully. DCs, especially public cloud, are not all-or-nothing like Apple's Mac Lineup is.

Then the question follows: why would they want something Intel/AMD aren't offering right now? The trend is system-on-chip. Beyond security (this isn't the last electrical-interference or speculative-execution-like attack we'll see), SoCs are easier to service (easier != cheaper: holistic replacement versus per-component debugging; servers are cattle, not pets), denser, more vertically integrated, and capable of far higher I/O performance. Lots of benefits.

Mega-servers with 256 cores and 4 terabytes of memory still have a huge place in all DCs; but not when multiple untrusted workloads are running simultaneously. They're not for EC2/Fargate/Lambda/etc; they're for S3. Highly managed, trusted workloads.


IBM Z series silicon (z/Architecture and its predecessors, S/390, ...) is designed with multi-tenancy in mind from the get-go. Finding a way to escape virtualization, let alone partitioning, to access confidential competitor data was a no-go.

And indeed, to my understanding, Spectre, Meltdown, Rowhammer and similar attacks are not an issue there.

https://en.wikipedia.org/wiki/Z/Architecture

I wonder when more features from the mainframe will cross-pollinate the Intel/AMD/ARM CPU architectures.


Z Series is very much susceptible to meltdown, spectre, and rowhammer. IBM says that it should be fine because Spectre et al need to be running untrusted workloads to work, but they haven't updated their advisories since NetSpectre. : /

A lot of the talk of mainframe levels of security is specious at best.


Wow, you're right: for Spectre, patches were needed on System z.

https://www.suse.com/de-de/support/kb/doc/?id=000019105

Meltdown didn't affect Z or AMD.


Series Z and POWER up to and including POWER9 are susceptible to meltdown as well.

https://www.zdnet.com/article/meltdown-spectre-ibm-preps-fir...


OK, Debian is even more explicit than SUSE above regarding "no Meltdown on Z":

https://wiki.debian.org/DebianSecurity/SpectreMeltdown#Syste...



Which aren't POWER cores. POWER is an IBM trademark; the 7400 is a Motorola PowerPC core.


A fair point, though early POWER cores probably aren't for the same reasons these aren't (can't speculate through indirect branches using SPRs), and IBM was involved in the development of both.


> Eventually, it will be obvious that running shared workloads on a single piece of physical hardware has fundamentally unremediatable security implications.

If I understand correctly, Homomorphic Encryption aims to solve for these kinds of attacks (although presumably the computations are more expensive and programs must be restructured to use HE primitives?). https://en.wikipedia.org/wiki/Homomorphic_encryption

EDIT: Why the downvotes? Am I mistaken?


HE is incredibly slow, and unlikely to ever be fast enough for common workflows. As in, an HE implementation of an algorithm that runs in seconds in Python on a regular machine might run in minutes or tens of minutes.

Not to mention, to avoid side-channel attacks, an HE scheme still needs to always run the longest possible sequence of operations regardless of input (otherwise, information about the input data is leaked through timing). So an HE version of a quick-sort scheme would always run in O(n^2), otherwise it would leak details about the contents of the list. In some cases it would even have to run in the same amount of time regardless of the size of the list, to avoid leaking information about that.
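To make that concrete, here's a toy sketch (plain Python ints standing in for ciphertexts; a real HE scheme would operate on encrypted values with far costlier primitives) of a data-oblivious sort whose schedule of operations depends only on the list length, never on its contents:

```python
def oblivious_sort(xs):
    """Sort with a fixed O(n^2) schedule of compare-and-swaps.

    An observer timing the execution, or counting operations, learns
    nothing about the data: every input of length n triggers exactly
    the same sequence of operations.
    """
    xs = list(xs)
    n = len(xs)
    # Always execute all n full bubble passes, even if the list is
    # already sorted after the first one.
    for _ in range(n):
        for j in range(n - 1):
            a, b = xs[j], xs[j + 1]
            swap = 1 if a > b else 0  # in real HE this bit stays encrypted
            # Branch-free select: identical arithmetic runs on every pair.
            xs[j] = swap * b + (1 - swap) * a
            xs[j + 1] = swap * a + (1 - swap) * b
    return xs

print(oblivious_sort([3, 1, 2]))  # [1, 2, 3]
```

Every run over a list of length n performs exactly n*(n-1) compare-and-swaps, which is why the quadratic worst case can never be shortcut without leaking information.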


To be fair, the performance problem you're talking about should affect latency rather than throughput. If you can batch lots of operations (not all controlled by the same user) then you can do things as fast as you can without leaking (much) information.

This is still phenomenally slow, of course.


HE hides the execution complexity from the system doing the computation, so no, it can't do someone else's computation and just wait in between to avoid leaking information. It's designed so that the order and quantity of computing operations is simply independent of the input data, i.e. always the worst-case complexity, and a valid HE scheme would have a mathematical proof that it's impossible for the system to find a way to do it faster than the worst case.


Is HE any different from taking the encrypted input data, sticking the program you want to run on the end, and returning that? If anything, it sounds less efficient than that.


HE is designed for running a secret algorithm with secret input data and obtaining secret output data, all on an untrusted computer.

It works a little bit like this: you write your algorithm using only HE computing primitives, then you encrypt it and the input data and send those to the HE runtime, who you don't trust, and who DOESN'T have any way to decrypt your message. The HE runtime executes your encrypted program and sends back the still encrypted output. Once you receive your output, you decrypt it using the key you have kept secret.


That's not how it works, that's what it does. I provided an example of how it could work, but it had better be more efficient than that or it's a silly idea.


One note, first of all: I made one mistake, the algorithm is not secret in HE (the HE runtime has to know it). Only the input data is guaranteed to be secret.

I'm not sure what you mean. This is how HE works, at a high level. The whole point is that you're doing public operations on secret data on a machine that can never know what the data was, nor what response it gave you. Furthermore, the point of HE is also that you know that your data remained secret regardless of how the HE runtime was implemented, as you just send it encrypted information.

Making it more efficient by giving it access to the decrypted data would defeat the whole point. Making it more efficient by optimizing based on the actual data would allow your data to leak through timing side-channels, again defeating the whole point.

Basically, HE refers to computing schemes where DECRYPT(ALGORITHM(ENCRYPT(X))) == ALGORITHM(X). You send [ALGORITHM, ENCRYPT(X)] to the runtime, and it gives you Y. DECRYPT(Y) is then the result of your computation.

As far as we know, this is indeed hopelessly slow, and there is no guarantee that we will ever find fast secure HE primitives.
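As a purely illustrative toy of that DECRYPT(ALGORITHM(ENCRYPT(X))) == ALGORITHM(X) property: textbook RSA happens to be multiplicatively homomorphic. The key below is a throwaway classroom example, in no way secure, and real HE schemes are built very differently, but it shows the shape of the idea:

```python
# Textbook RSA is multiplicatively homomorphic:
# ENC(a) * ENC(b) mod n decrypts to a * b (mod n).
# Toy parameters for illustration only.

n, e, d = 3233, 17, 2753  # n = 61 * 53; e * d == 1 (mod phi(n))

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

a, b = 6, 7
# The untrusted runtime multiplies ciphertexts without ever learning a or b...
c = (enc(a) * enc(b)) % n
# ...and only the holder of the private key can decrypt the result.
print(dec(c))  # 42
```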


Seems like it still might be useful for small bits of data like session tokens or other encryption keys


Yeah HE is a totally different security model. It moves all the trust out of the hardware. You ship encrypted workloads to an untrusted party, who computes on the encrypted data and returns an encrypted result. Then you decrypt the result to use it.

And yeah, this is just as inefficient as you'd think.


I'd say it's much more inefficient than you'd think.

Firstly, the computations on encrypted data just take a LOT longer, especially for fully homomorphic encryption, with a single 64-bit addition taking microseconds.

Secondly, the FHE code cannot branch based on data. If it could, it would know something about the data, and it wouldn't be proper encryption. This means an if statement becomes "calculate both branches, and throw away the result you don't need". Similarly, a for loop becomes "give me an upper bound of how long this will run", then "loop for exactly the upper bound number of times, if the loop is 'done' early, just throw away the result of the remaining operations".

FHE is really cool, and has its uses in situations where you want to cooperate without needing to trust, but it is stupidly inefficient. (Things get really cool if two parties want to cooperate and the computing party can e.g. branch based on their local unencrypted data)
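A toy sketch of that transformation (plain Python ints modeling the encrypted bits; in real FHE both the condition bits and the data would stay encrypted throughout, and the operation counts would be fixed by public bounds only):

```python
# Branch-free "if": evaluate both branches, arithmetically keep one.
def select(cond, if_true, if_false):
    # cond is 0 or 1; in FHE it would be a ciphertext the runtime can't read.
    return cond * if_true + (1 - cond) * if_false

# Data-dependent loop rewritten against a public upper bound:
# "sum elements until the total exceeds 100" becomes "iterate exactly
# `bound` times and arithmetically discard iterations past the stop".
def capped_sum(xs, bound):
    total = 0
    for i in range(bound):
        in_range = 1 if i < len(xs) else 0       # public: depends on length only
        still_under = 1 if total <= 100 else 0   # would be an encrypted bit in FHE
        x = xs[i] if i < len(xs) else 0
        total = select(in_range * still_under, total + x, total)
    return total

print(capped_sum([60, 50, 40], bound=8))  # 110: the 40 is discarded,
                                          # yet all 8 iterations execute
```

The point of the sketch: the number of iterations and arithmetic operations is fixed by `bound`, so nothing about where the loop "really" stopped leaks through timing or operation counts.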


HE is stupidly inefficient, and is entirely an academic oddity, and there is every indication that it will always remain that way. Bringing up HE in a context of real problems needing real solutions is unproductive.


If solving a problem in a stupidly inefficient way is the only way of mitigating said problem, you may not have much choice, at least for some use cases. Saying something can't be the answer because it isn't an efficient answer (today) is also unproductive.


There's a simple way of mitigating said problem: don't use shared hardware, and run each thing on a separate, isolated chip. That's also inefficient, but compared to the immense overhead of HE it's quite reasonable.


AMD and Intel deliver CPUs tailor made for large cloud customers already; those SKUs are not available via normal channels.


Can you elaborate or provide a reference on how those CPUs are different?


Anyone who would know the fine grained differences is almost certainly NDA'd up quite tightly.

But it's likely things like different core counts, frequency performance curves, perhaps more or different memory/IO controller counts, perhaps instruction extensions of some particular interest (I'd wager BF16 was available in data center SKUs long before consumer SKUs), etc.

It's nothing radical, but if you're going to buy an awful lot of them and want something slightly custom, Intel will definitely do that for you.


It's unlikely a different die (silicon chip), so the usual number of cores and usual size of caches, and it's unlikely that AWS gets a secret x86 instruction unlockable via msr or something like that. It's probably just constants for the frequency scaling (e.g. all cores and single core turbo).

Cloudflare mentioned they use a custom Intel SKU on their blog, and they're not as big as AWS.


I’m familiar with Google’s custom SKUs, and I assure you they’re unique enough to experience microcode bugs nobody else on the planet ever hits. There was a weird FDIV one back in the day that wasn’t the same FDIV bug that hit us all a long time ago. Cloudflare isn’t really at that customization table.

Your take sounds accurate for non-FAANGs (and a couple of those). For Google, not so much. There is a lot of custom acceleration and such that isn’t available to any other customer, and a nearby commenter is right, every detail about them is locked behind a very strong NDA. I’ve heard thirdhand AWS does indeed have custom instructions in the virtualization extensions stuff, but don’t know if that’s true.


My understanding is that Google's SKUs are the same die. Non Google versions just have the Google specific silicon fused off, or require a MSR knock sequence to turn on.


> I'd wager BF16 was available in data center SKUs long before consumer SKUs

Right you are, Ken.


What kind of levers do big ISPs/cloud providers want over their CPUs, other than the basic efficiency gains we all want? Isn't it risky doing any customization since you don't get commodity pricing?


> Isn't it risky doing any customization since you don't get commodity pricing?

Depends on how many you order and how reasonable the customization is. Like, what if I wanted 10,000 38-core Xeons with a 2.5GHz base clock and 3.5GHz turbo clock? Such a part doesn't exist, yet sits smack dab between two existing models, the Xeon Platinum 8368 and 8368Q. Assuming yields are already good enough, that might be the sort of thing a CPU maker will do. (Might need more than 10k units though, IDK.)


I'd wager Pat doesn't get out of bed for 10k units and neither does Lisa.


At AWS scale you can feasibly just buy up the entire run of chips. What's a few million CPUs more or less between friends?


Additionally, Amazon is absolutely the type of company to create their own x86 processor if Intel/AMD are unwilling to make some customization.

Intel does not want to be competing with an Amazon Basics processor.


They already have pretty significant investment in their own ARM-based chips for servers. Given Apple's recent success in x86 emulation on ARM, I doubt it's worth the trouble of directly competing in the x86 arena.


You basically need an x86 license from Intel for this. Do you think x86 is open source?


The usual approach is to bring VIA into the operation.


A recent HN posting points to a news article about AMDs customized processors for Facebook although it provides little detail: https://news.ycombinator.com/item?id=29204257

"The custom Epyc 7003 part that Facebook has commissioned from AMD has 36 cores out of its potential 64 cores activated, and with a bunch of optimizations for Facebook applications at the web, database, and inference tiers of its application stack."


Not really, just heard about those. But it wouldn't really be that much different than a customized SKU for a PlayStation or an Xbox.


But what model number would they show up as in the OS facing the users? From what I see most CPU names are the ones available in the market.


Their next-gen "normal" server CPUs also come in "cloud optimized" variants.

While this seems to be mainly about optimizing power/heat/performance for typical "cloud" workloads, it might also involve design decisions related to this problem. We will see, but they look interesting anyway.


> This will be a massive opportunity for ARM & SoC manufacturers in the coming years, as its far better positioned to offer, for example, a single rack appliance with 64 individual 2-core mini-computers at a price-point competitive with a 128 core x86 rack, as one computer.

I'm curious about the eventual end-game of security in this space. Take the 64 individual processors in your example, give each one their own independent memory bus to their own ram chip, isolate them from each other as much as possible. What else can be done, if a malicious process on processor Z has to go all the way to disk to try to get back at data working on processor J, is that as maximally secure as it can be without being in a completely separate chassis with only network access to the other device?


I think what you describe is a likely end state, and what large cloud providers will move to. The problem right now is that a setup like this isn't cheap; but it's only expensive because economies of scale aren't behind it, and that will change if demand for more isolated VMs increases.

Essentially, a board with an array of "mini computers", 1-4 cores bussed to their own physically and logically isolated memory pool. Storage is oftentimes already networked in public cloud, and seemingly less susceptible to these kinds of attacks, so probably little change is necessary there.

It's also likely we'll see vendors produce more "card computers", like the Raspberry Pi, but oriented at a server environment. At the massive scale AWS runs at, these are less compelling, but they do have advantages; the biggest being that maintenance is a cinch, and the blast radius of one failing is isolated to just the workload on that card. I imagine these will be more interesting for current VPS providers like DigitalOcean, who also want to offer dedicated hosts with smaller core/memory counts. Even RPis today are super compelling; a $50 multicore computer that's easy to manufacture and replace, with density high enough to fit 100+ inside a 4U? Just need a bit more core perf and ARM adoption (though it's easy to see Intel also hopping on this train with their long-term investment in pro/consumer NUCs).

Writing is on the wall. The tipping point will be an industry thought leader saying, in effect, "you should be using dedicated hosts, because at this point physics is our enemy and we're not winning; by the way, here's a new dedicated host instance type that's perfectly suited for horizontally scaled workloads, it's 20% more expensive than virtual hosts of the same size, byeeee".


>"as its far better positioned to offer, for example, a single rack appliance with 64 individual 2-core mini-computers at a price-point competitive with a 128 core x86 rack,"

I have a server application capable of utilizing many threads and thousands of requests/s. You think I will deploy it on a tiny 2-core CPU? No thank you, it currently runs on a powerful dedicated server from Hetzner where I control everything.

>"AMD and Intel need to adjust, or face extinction ....

If they don't evolve, AWS will for them..."

Sounds like pontification / FUD.


> Sounds like pontification / FUD.

Sounds like you didn't fully grasp what the grandparent was writing about.

>> Eventually, it will be obvious that running shared workloads on a single piece of physical hardware has fundamentally unremediatable security implications

> I have a server application capable of utilizing many threads and thousands of requests/s. You think I will deploy it on a tiny 2-core CPU? No thank you, it currently runs on a powerful dedicated server from Hetzner where I control everything.

Your use case isn't covered here, you're neither a cloud nor VPS customer.

There will always be individuals and companies looking for dedicated hosting solutions, or just hosting it themselves.

However, for the rest of the world that wants to run their stuff in the cloud, that's where Amazon's and other companies' investments in ARM come into play. That's where we'll see disruption, the extinction the grandparent was talking about.


But you still need data exchange between parallel/concurrent workloads. The security focus will shift to the data nodes which exchange data between the CPUs. And then the focus will probably shift towards latency and performance and moving these data modules closer to each other.. kinda like RAM :P


I am not at all convinced that this is not a solvable problem. It may require significant changes in how schedulers work, such as resurrecting the idea of processor affinity.

Unfortunately it will likely have negative performance implications for multi-tenant work loads.


AMD seems to be doing just fine. With a roadmap for the future. They will be just fine


The M1 family certainly doesn't go in the direction of fewer cores, and even if you had a single core you could probably rowhammer during your timeslice and then patiently wait for the target process to execute. Since even individual computers execute random garbage code straight from the Internet (e.g. JS), there is still a need for internal security.

That being said, and even if I consider that a quite different subject, I agree that the current efficiency story of Intel is not very good, but I hope they will improve in the not so far future. The dev lifecycle of CPUs is quite long and it seems an obvious target. I suspect they will be forced to improve their efficiency, because that's actually where the performance potential is today (the current dissipation level of their latest desktop CPUs is not reasonable, and prevents scalability). And trying to lower the core count can also lead to high consumption, e.g. if you want your performance back by increasing the frequency. Wide and "slow" is needed, and it is harder to increase the internal number of execution units per core and have them actually used than to increase the core count; plus, ironically, one way to do that is through HT, which goes against your wish to share less hardware. (Now if you compare their P-core and E-core in Alder Lake the story is more complicated, but their marketing figures seem very strange, so I won't conclude anything for now. The current instances of the P-core we have are for that weird desktop market with unreasonably high TDP anyway.)

Now if you really want miniaturized individual computers that would not be shared at all, I'm not sure the market will actually go in that direction, because big systems will continue to be needed (and clusters are a niche, mostly for HPC), and I'm not sure a "slightly more secure on smaller systems" market would be interesting enough, especially in an era of chip shortage. And also because it would still be bigger than an equivalent shared system made with the same tech... But if that's really a niche that has to be addressed, I suspect it would mostly be a matter for Intel of creating new small and slower SKUs ("slower" compared to their desktop insanities); they even kind of have that already, but yes, the physical miniaturization aspect is not handled yet, and that does not really depend that much on the cores. And even in those computers, I'm not sure there would be much demand for very low core counts. The threading of pretty much all workloads tends to increase nowadays.

One last point: after the Pentium 4 fiasco, nobody really left the IPC race. AMD had some difficulties when trying "weird" ideas (partly because of their marketing communication), and then again, a completely new design from scratch takes time to reach market. In general there was a pause in performance growth around 2016 for a few years, and that was mostly Intel having process problems while the rest of the industry caught up (and then overtook them).


Intel gets it and is adjusting to be a foundry that builds chips to application spec.

For me cloud computing is just where the best pay is. I do not at all see it as the future of computing.

One reason is ML will help us realize we write code we don't need; so much of it is syntax sugar for business-specific needs; infra, security… it'll be realized cloud software is solving unemployment, not technical problems of value. Many issues with software back in the day came down to lower-quality networks and consumer hardware. I mean, any phone can abstract metadata from any one user's behavior; we do it in a DC because that's where the jobs are. Chip manufacturing will include ML-normalized logic for specific applications.

LAN IOT will improve and we'll realize the Metaverse can be implemented with a local client and AI-generated art, on a mobile GPU's power, in a few years. Middle men like Zuckerberg face the most uncertain future. He failed to diversify as well as Bezos, Newell, and others.

IMO, Valve is a serious threat with the Steam Deck; an open IOT brain in a kid-friendly form factor could be the new cigarette. Even Apple may have to take them seriously. My kids' iPads need replacing soon; a flat glass slab with no interactive controls, requiring another $800+ machine to develop on, bloated development tools, fees, and a bunch of cloud logins, is not going to motivate kids to feel creative.


This seems bad, but as a practical matter, what does this mean?

If you have a process running on your machine can it use this to get root? Read Keys?

It looks like they ran their process for 12 hours to do the flipping.

And if you're flipping your process's memory for that long, what are the chances you are next to sensitive memory for another process? It seems bad, but it seems like if you're randomly flipping bits in memory the system will likely crash.


> If you have a process running on your machine can it use this to get root? Read Keys?

Yes. There's a BlackHat talk on that[1]. DoS is a big issue on shared hardware too. You can crash other processes, the kernel and the hypervisor.

> And if your flipping your process's memory for that long, what are the chances you are next to sensitive memory for another process?

You don't need to leave it up to chance. There are apparently ways you can control it and of course you can just spray everything.

1. https://www.blackhat.com/docs/us-15/materials/us-15-Seaborn-...

2. https://github.com/google/rowhammer-test/blob/master/rowhamm...


Interesting. I can see taking the system down, especially on shared hardware.

My (probably slightly naive) understanding is the OS is allocating memory, acting like a memory cop so processes don't overlap, swapping when needed, and so forth. It seems like these hardware errors might be mitigated by the OS.

I'm in a bit over my head in the pdf, but it explains: "Our kernel privilege escalation works by using row hammering to induce a bit flip in a page table entry (PTE) that causes the PTE to point to a physical page containing a page table of the attacking process. This gives the attacking process read-write access to one of its own page tables, and hence to all of physical memory."

I'm still a little fuzzy on how they get the right location in the page table without hosing the system, but this gives me enough of a gist. Thanks!


> My (probably slightly naive) understanding is the OS is allocating memory, acting like a memory cop so processes don't overlap and swapping when needed and so forth. It seems like these hardware errors might be mitigated by the OS.

You don't need to have access to the memory you want to attack, only the ability to cause something to read from memory that physically neighbors the memory you want to attack.

Think of page tables like a big array in kernel memory (I'm being imprecise.) The entries in this table are mappings from your virtual addresses to physical addresses. You can cause the kernel to read at least part (or all?) of your PTEs, which means you can cause it to flip bits in neighboring physical cells, which means you can modify the PTEs. TL;DR: If you can cause the kernel to read some specific memory then you can modify memory near it. In this case, you can modify the page table entries, allocating memory that isn't yours to you (or doing other funny things.)
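A toy model of the PTE trick described above (purely illustrative: real page tables are hardware-walked arrays of packed bits, not Python dicts, and frame numbers here are made up):

```python
# Toy model: a one-level page table mapping virtual pages to physical
# frames, where one frame happens to hold the attacker's own page table.
# A single bit flip in a PTE's frame number aliases a virtual page onto
# the page-table frame itself.

PAGE_TABLE_FRAME = 5        # frame holding the attacker's page table
DATA_FRAME = 4              # harmless frame the attacker legitimately owns

# Page table: virtual page number -> physical frame number.
page_table = {0: DATA_FRAME}

def translate(vpage: int) -> int:
    """Walk the (one-level) page table."""
    return page_table[vpage]

assert translate(0) == DATA_FRAME

# Rowhammer flips the low bit of the stored frame number: 4 -> 5, which
# happens to be the frame containing the page table itself.
page_table[0] ^= 0b1

# The process can now read and write its own page table through vpage 0,
# and from there map in any physical frame it likes.
assert translate(0) == PAGE_TABLE_FRAME
print("vpage 0 now resolves to frame", translate(0))
```

Spraying many copies of the page table across memory raises the odds that whatever bit happens to flip lands in a PTE and produces exactly this aliasing.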

A simpler-to-explain but probably not realistic example: imagine if your UID were stored right next to your GID in some kernel structure; if you could cause the kernel to repeatedly read your GID then you could cause bit flips in your UID, which would change which user you are...


That's my question as well. I'm not a security expert, but this doesn't seem all that concerning for anything other than the highest-security applications. It seems to me that executing this attack not only requires running code on the target machine (which admittedly isn't that big of a hurdle), but requires basically complete knowledge of memory allocation at the time of the attack, something that is fairly opaque and ever-changing on most hardware.

Is there something I'm missing here? What are the realistic attacks that this vulnerability allows?


> What are the realistic attacks that this vulnerability allows?

https://github.com/IAIK/rowhammerjs


There's no reason the code couldn't just blindly try every attack until it finds one that succeeds. If there's a malicious app that runs a lot (lookin' at you, random firefox tab that eats 99% of my CPU for minutes at a time and then goes idle again), it has all the time in the world.


It might be hard to do a targeted attack. On the other hand, flipping random bits is likely to just start crashing stuff and / or producing incorrect results pretty reliably.


See my reply above: in-memory data analysis…


See my reply, sibling to yours.


> If you have a process running on your machine can it use this to get root? Read Keys?

Which keys? OpenBSD / OpenSSH for example does now (since 2019?) encrypt SSH keys in RAM to prevent rowhammer like attacks. They're mixing the key with a huge "pre key" and an attacker would need to side-channel attack the entire pre key to be able to read the real SSH keys.

So it's not as if there was nothing that could be done to guard against this.

It may require changing lots of software/libraries but at least the security-conscious ones already started, without waiting for this "blacksmith" attack.
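The shielding idea can be sketched like this (a toy sketch: the real OpenSSH code derives a symmetric cipher key by hashing a roughly 16 KB random prekey; the XOR stream and names below are illustrative stand-ins):

```python
# Sketch of in-memory key shielding: the secret is encrypted under a
# digest of a large random "prekey". An attacker leaking memory bit by
# bit must recover ALL of the prekey perfectly, because one unknown or
# flipped bit changes the derived keystream completely.
import hashlib
import os

PREKEY_BYTES = 16 * 1024

def shield(secret: bytes, prekey: bytes) -> bytes:
    # Toy stream cipher: XOR with a SHAKE-256 stream derived from the prekey.
    stream = hashlib.shake_256(prekey).digest(len(secret))
    return bytes(a ^ b for a, b in zip(secret, stream))

prekey = os.urandom(PREKEY_BYTES)
secret = b"-----BEGIN PRIVATE KEY----- ..."
shielded = shield(secret, prekey)

# Unshielding with the exact prekey recovers the secret...
assert shield(shielded, prekey) == secret

# ...but a prekey with even one bit wrong yields garbage.
damaged = bytearray(prekey)
damaged[1234] ^= 0x01
assert shield(shielded, bytes(damaged)) != secret
```

The point is not the cipher but the size asymmetry: a side channel that leaks a few bits per hour is useless against a secret that is only usable after recovering 131,072 bits exactly.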


Cookie theft is the most important attack. Think about it. You could have a web page running who knows what scripts on your laptop while you're doing banking. Oops.


Complex modeling that runs for days could have completely messed up results, especially since the old standard of using ECC is not proof of infallibility. Often those results are computed in series, so early errors would compound, and it may be very hard to determine if and why results are wrong.


This may be very bad for data analysts. Imagine trying to study a huge in-memory database and getting different values each time…


Assuming you run your database in a public cloud, what's the likelihood of someone throwing money at the cloud provider to run the attack on a host shared with your database, with no added benefits to them?


The implication of the post was that your own data analysis workloads could cause rowhammer flips unintentionally, I think.


That would be a wrong implication, your data analysis workloads (or any other "reasonable" reading of memory) are exceptionally unlikely to trigger these events unintentionally.


Is this possibly a route to jailbreaking for iOS, via the temporary provisioning profile for apps? It seems like you could run a Rowhammer memory corruption app on your personal device until getting escalated privileges. Newer OS releases may not patch the hole since this is a hardware flaw. But I admittedly have only the vaguest idea of what defenses need overcoming on a modern iOS device.


Rowhammer is still pretty hard to exploit because typically you can't reliably flip most bits, and you can normally only flip bits that are very close in physical memory address to those you control.

Combine that with a lack of knowledge of physical memory addresses and inability to have much control over memory layouts, and it really gets tricky to gain privileges outside a lab environment in a reasonable short time.

Remember that flipping bits at random will almost certainly kernel panic the machine before it gives you root access.

I'm sure a determined attacker could do it though.


This seems like a case where targeting an iPhone might make life easier. The hardware is quite uniform for a particular model.


Which gets you some architecture knowledge, but doesn’t promise or indicate that your userland ram space is adjacent to anything important. It’s a better start at playing the game though.


> very close in physical memory address to those you control.

Not quite. It doesn't require the ability to write to the neighboring cells, just read from them.


Hopefully, the OpenBSD extreme implementation of ASLR makes it even safer.

On every boot, there is a brand new kernel and C library:

   reordering libraries: done
   reorder_kernel: kernel relinking done

ASLR has been compromised in the past, so this likely isn't completely secure.


Maybe OS's or hypervisors should support a mode where different processes/security domains are forced into contiguous blocks of physical memory with buffer zones between. Especially for cloud computing, pre-allocating memory to different VMs should be reasonable. I could even see browsers taking advantage of it, given that they already force javascript threads into per-domain processes for Spectre/Meltdown protection.
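The proposed placement policy could be sketched as follows (hypothetical: real DRAM address scrambling means OS-visible physical contiguity does not map one-to-one onto row adjacency, so a real implementation would need the vendor's address mapping):

```python
# Sketch: hand out physical memory to security domains in contiguous
# blocks, with unallocated guard rows between them, so that no domain's
# rows are physically adjacent to another domain's rows.

GUARD_ROWS = 2  # rows deliberately left empty between domains

def layout(domains, rows_per_domain):
    """Return {domain: (first_row, last_row)} with guard gaps between."""
    placement, next_row = {}, 0
    for d in domains:
        placement[d] = (next_row, next_row + rows_per_domain - 1)
        next_row += rows_per_domain + GUARD_ROWS
    return placement

p = layout(["vm-a", "vm-b", "vm-c"], rows_per_domain=8)

# Any two consecutive domains are separated by at least GUARD_ROWS empty rows.
names = list(p)
for a, b in zip(names, names[1:]):
    assert p[b][0] - p[a][1] - 1 >= GUARD_ROWS
print(p)
```

The cost is the wasted guard rows and the loss of fine-grained page sharing, which is probably why it is easier to justify for pre-allocated VMs than for ordinary processes.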


Funny thing about that.

At Alpha Microsystems, about 1982, I was in charge of the diagnostic programming group. At that time failing memory boards were expensive and customers would not be happy with such components.

I wrote a memory testing diagnostic that was based on knowing exactly how addresses were mapped to cell locations so I could try to provoke such failures.

Chip manufacturers were aware of this problem which is why they scrambled the addresses.

Potential vendors, Motorola et al, were required to provide mapping information before we would consider their chips.

Now I'm curious to know what such mapping looks like with modern memory chips.


Not related to article: would love to hear about your time at Alpha Microsystems. See https://ampm.floodgap.com/ (hosted on an Alpha Micro Eagle 300).


Here's a bit of history ... I think the lead up helps to understand what I did first at AM. I'll send you an email with more info later.

1976-1985 My first job was at Basic Four Corporation. I got in as a test technician, assembling small refrigerator-sized mini-computers, giving them their first tests and swapping components until they passed. I soon learned how to use the machine-language assembler and started writing small programs to help me determine which components were failing without swapping and hoping. Within a few months I was testing more than 2x the number of machines the other techs were testing. Management noticed and soon the other techs were getting up to speed.

At this point management pretty-much turned me loose. I moved up to diagnosing and repairing failed components (8"x11" pcbs). My understanding of programming and digital circuits allowed me to write small programs that could be used to "light up" specific circuits on the board making it easy to poke around with a scope to see where things went wrong. Again this was a huge productivity boost and the technique was propagated to other techs. A couple years later I went back for a while part-time as a consultant and wrote my first DSL for techs to use.

Next I talked my way into the firmware development group and worked on firmware for tape, disk and other devices. This is the period of time when microprocessors were being incorporated into everything and my experience with the Micro-68 put me in a good position to participate. I also got to write microcode for a 2901-based cpu that was in development.

And then, somehow, perhaps at a user's group, I learned about Alpha Microsystems.

When I first visited Alpha Microsystems their idea of "burning in" pcbs consisted of putting them in a powered backplane, in a wooden box, with a lightbulb, where they sat for some period of time.

Basic Four had serious testing which included putting entire computers into temperature-cycling ovens where they ran tests for 24 hours. That knocked out a pretty high percentage of boards. After I told them what they were missing Alpha Microsystems hired me to improve their process.

For the next several years I participated in creating the flow of production and testing. A department evolved to handle the hardware side of things and I became head of a diagnostic programming department which grew to, variously, between 6 and 10 programmers. After that department was functioning and had someone who could step up I transferred into the operating systems group. I was one of only three people allowed to work on the operating system code, let alone even see it since it was held as a trade secret.

During my last year at Alpha Microsystems a brilliant programmer I had hired introduced me to the then just-released Structure and Interpretation of Computer Programs, a new textbook for students at MIT. That book's use of the language Scheme introduced me to first-class functions, closures, and many other concepts which found their way into popular programming languages decades later. SICP had all the information one needed to create a Scheme interpreter. I wrote one using 68000 assembler so I could run the sample code.


Wow, amazing! Thank you so much for sharing your story!


ha, thanks!

i love programming so much i've never quit

taught fullstack at ucla BC (before covid)

did a large frontend project with vue the past year

i'll die with a keyboard under my fingers!


"There are no subdirectories in AMOS."

Yowch. Like PC-DOS 1.0 all over again.


Technically that's correct, but a bit misleading.

You need to bear in mind how limited disk storage was in the 70s and early 80s. When I got to AM systems were booting from 8" floppy disks. That gave you about 500kb, iirc.

The file system had "directories" in this form:

[nnn,nnn]

Where, again iirc, each nnn was 8 bits -- 11 111 111 in binary, 377 in octal, for example. Only 16 bits were required to specify the location of a file.

Filenames, btw, were six characters.
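For illustration, packing the [nnn,nnn] pair into 16 bits might look like this (hypothetical encoding, just to show the arithmetic; the real AMOS on-disk format may differ):

```python
# Hypothetical illustration of why "[nnn,nnn]" needs only 16 bits: each
# nnn is an 8-bit number (octal 000..377), so the pair packs into a
# single 16-bit word.

def pack_ppn(proj: int, prog: int) -> int:
    assert 0 <= proj <= 0o377 and 0 <= prog <= 0o377
    return (proj << 8) | prog

def unpack_ppn(word: int) -> str:
    return f"[{word >> 8:o},{word & 0xFF:o}]"

w = pack_ppn(0o100, 0o2)
assert w < 1 << 16                 # fits in 16 bits
assert unpack_ppn(w) == "[100,2]"
print(unpack_ppn(w))
```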


Your description ([nnn,nnn]) reminded me of TOPS-10, and indeed, that AMOS link mentions this: "The similarities with DEC's operating systems, particularly TOPS-10, were not lost on Digital Equipment Corporation". So that file hierarchy model is even earlier than the 70s and 80s. Wikipedia says that TOPS-10 was released in 1967 (back then under a different name, renamed in 1970). The DECsystem-10 was anyway pretty different hardware from a modern computer, running from tapes and so on.

Sorry for the offtopic rambling, I get a little bit enthusiastic every time I see something TOPS-10 related.


iirc (so many years) AMOS was based on DEC's RSTS hence a lawsuit which AM eventually won

the western digital chipset that powered the first am systems was a five-chip 40-pin set connected by a private cmos bus ... probing with a scope was amusing

the instruction set was basically identical to the pdp-11


That's a good point. I don't really remember bothering with directories on any of my 360kb 5.25" floppies in the 80s. When you can only fit 10 files on a disc a hierarchy is usually unwarranted. Your hierarchy is those little plastic dividers in your floppy disc storage box.


All these new hardware-level physics exploits in software fascinate me. At the same time they make me wonder whether hardware will ever truly be able to be secure, and whether we need to just move on to new methods and concepts in hardware design and maybe trade size for security.

It also brings to mind that Simpsons scene: "Stop stop! He's already dead!"


How curious are you? Here's a whole class on exactly that from the authors of the OP [1] at ETH Zürich.

1. https://www.youtube.com/watch?v=AJBmIaUneB0&list=PL5Q2soXY2Z...


Onur Mutlu is a really good lecturer, to my eyes at least.


Some of his TAs are really funny too :)


It also makes me think of the phonograph in Gödel, Escher, Bach. One character made a record player and the other made a record that always caused the record player to break. The crux was that a record player cannot be built that can play all records; it is impossible.


I think moving such complexity to hardware was a mistake (like branch predictors, etc.). Perhaps exposing a very low-level API to CPU functionality (at something like the microcode level) and JIT compiling existing x86 to that could work? We have enough trouble managing complexity in software, but at least there it is fixable.

(As for the possibility of such, there is already an x86 to x86 JIT compiler that increases performance)


What does any of this stuff have to do with Rowhammer, an attack that doesn't even target the CPU, which just so happens to have ECC on all its registers and cache lines?


The parent was talking about modern hardware in general, and it's not like CPUs don't get their own fair share of similar vulnerabilities. Also, a JIT compiler can probably realize that the same memory region gets rowhammered and may throttle execution of such a thread.


I fully expect Apple's M1 chip to get a Spectre-type vulnerability at some point, and then any improvements it had will disappear.


M1 is vulnerable to Spectre (some variants at least) as far as I'm aware.

Any speculation around memory accesses will yield this.


That's why the solution is to only allow speculation when it's "ok". The M1 is believed to have a number of Spectre-esque security mechanisms built into it to determine just that. For example: https://patents.google.com/patent/US20200192673A1

Also, in dougallj's code [1] the zeroing of registers should be superfluous, so it is assumed the function below is needed to make the experiments run stably by claiming ownership of the registers as part of a general anti-speculation security mechanism.

static int add_prep(uint32_t *ibuf, int instr_type);

The M1 explainer [2] has a lot of interesting ideas like this contained inside it.

[1] https://gist.github.com/dougallj/5bafb113492047c865c0c8cfbc9...

[2] https://news.ycombinator.com/item?id=28549954


So it's another Target Row Refresh bypass.

Which is only possible because the DRAM has limited memory for recently-accessed rows.

When is a company going to put out chips that have the access count stored inside the row? It's the most obvious way to do it and makes this entire class of attack impossible.

Edit: Okay, reading the paper more, apparently LPDDR5 has something similar to this. Why is LPDDR so divergent from normal DDR?


Um, how concerned should I be about this? Is it time to turn off javascript in the browser? And if it is, is this not the End Times for browser distributed software? I ask because sometimes you need to ask if it's really the end of the world or just Monday.


It depends on how concerned you are about regular security issues. Which ideally depends on the things you are responsible for. If it's your family photos in iCloud? Nah. If it's the nuclear launch codes for the whole world? Then yeah, you should already be isolating, separating, compartmentalizing...


You don't have to be in charge of launch codes to worry about this one. I mean, if anyone can 'npm install rowhammer' and reliably attack anyone that visits their site, even in a modern browser, then it's time to turn js off. I'm okay if there are some minor or even moderate XSS exploits out there; but a hardware vuln accessible to javascript means javascript goes bye bye for now, at least from my local machine's browser.


If it comes to that you'll hear about it on HN first. If folks start to exploit it you'll hear about it.

If it does come, you'll hear about "apps" that freeze every other browser process while you do online banking.

That said, proactively blocking JS is great for many reasons, and you can whitelist sites you sort of trust.


>If it comes to that you'll hear about it on HN first. If folks start to exploit it you'll hear about it.

This is a pretty laissez-faire attitude with regard to security-critical exploits. You might be one of the first ones affected. After all, someone has to be the first one before they can post it on HN.

In a more general sense, it's a bad idea to rely on someone else as your primary alert. Whether it is a HN post, news article, whatever.

(I know that you aren't quite saying to use it as a primary alert, but a cursory read of your comment implies it, so it's worth discussing.)


> You might be the one of the first ones affected

If you are regularly the first target of zero-days, your threat model should be stricter, yes. The other 7.9 billion people in the world can afford to be a little less strict.


I didn't consider "don't wait for someone else to get hacked and tell you about it" to be a particularly strict method of security, but sure, I guess so.


> If it comes to that you'll hear about it on HN first. If folks start to explot it you'll hear about it.

It did come to that, years ago: https://github.com/IAIK/rowhammerjs


> photos in iCloud?

I mean, the last fappening was done by phishing, as I recall. Are you saying another one, done by rowhammer, wouldn't be a problem?

I suspect most folks have stuff they'd rather not make public. Credit card numbers, at the very least. This affects us all.


While there are clearly bad (and some interesting) ethical aspects of the Fappening, it was not a big problem for society. Every day many people experience the same violation of their privacy (revenge porn sites, accidentally sending nudes to the wrong address, etc.), and life goes on. (And as far as I know none of the affected celebs have taken drastic measures, while unfortunately this cannot be said for non-celebs, as cyberbullying regularly gets people to attempt suicide with "success".)

Similarly the day-to-day ransomwares probably cost more than securing browsers will.


JavaScript, despite compiler hardening, is still susceptible to Spectre-like attacks. It's in both the compilers' mitigation effort and the CPU memory controllers.

Something about CPU firmware and its inability to fix some aspect of memory bank controllers.


> Is it time to turn off javascript in the browser?

I've been browsing with Javascript off-by-default for years and recommend it regardless.

Sure, perhaps a site I trusted and whitelisted will try to hack me, but I feel it's far less likely than some random typo-squatting site or advertising-broker logic.


> Um, how concerned should I be about this?

Not at all. It is another piece of Chicken Little security theater that sounds sexy and sophisticated but in practice means nothing.

If you are concerned about this issue you should first ensure that you never type "npm install" or copy something off Stack Overflow into your code.


> Not at all.

https://github.com/IAIK/rowhammerjs

This has existed for years and its use has been observed in the wild. RowHammer isn't new.


Does. Not. Work. Against. A. Random. Machine. That. Fetches. It.

Pulling 15 GB of Node.js modules into your app that you have no idea about, because you think someone, somewhere ensured that they are "correct", imports supply chain attacks that work every single time.


I think the scary part is this one:

> “DDR4 systems with ECC will likely be more exploitable, after reverse-engineering the ECC functions,” researchers Razavi and Jattke said.


Where are you reading this quote? I'm seeing essentially the opposite in the linked post.

> ECC cannot provide complete protection against Rowhammer but makes exploitation harder.

edit: I looked at some linked papers and I see similar quotes, though not that one.

edit2:

https://arstechnica.com/gadgets/2021/11/ddr4-memory-is-even-...

> “DDR4 systems with ECC will likely be more exploitable, after reverse-engineering the ECC functions,” researchers Razavi and Jattke said.

OK, so not more exploitable relative to non-ECC RAM, just relative to ECC RAM pre-RE.


I believe that is saying that understanding the ECC function details makes it easier to exploit ECC devices, not that it will be easier than exploiting non-ECC devices.

The ECC code word is bigger, so it is a larger target, but you have to flip multiple bits to cause pain. If you have 2-bit detection, you need to flip three bits to get something that corrects to a different value.
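The miscorrection case can be demonstrated with a toy SECDED code, an extended Hamming(8,4). Real DRAM ECC uses much wider codes such as (72,64), but the failure mode is the same: with distance 4 you correct 1 flip and detect 2, while 3 flips can land next to a different valid codeword and get silently "corrected" to the wrong data.

```python
# Toy SECDED: Hamming(7,4) in positions 1..7 plus an overall parity bit
# at position 0, giving minimum distance 4.
from itertools import combinations

def encode(d):
    c = [0] * 8
    c[3], c[5], c[6], c[7] = d          # data bits
    c[1] = c[3] ^ c[5] ^ c[7]           # parity checks
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = sum(c) % 2                   # overall (SECDED) parity
    return c

def decode(c):
    syn = ((c[1] ^ c[3] ^ c[5] ^ c[7])
           | (c[2] ^ c[3] ^ c[6] ^ c[7]) << 1
           | (c[4] ^ c[5] ^ c[6] ^ c[7]) << 2)
    parity_ok = sum(c) % 2 == 0
    c = c[:]
    if syn == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:                 # odd number of flips: "correct" one bit
        c[syn] ^= 1                     # syn == 0 means the parity bit itself
        status = "corrected"
    else:                               # syn != 0, parity ok: double error
        return None, "detected"
    return [c[3], c[5], c[6], c[7]], status

data = [1, 0, 1, 1]
cw = encode(data)

# Single flips are always corrected back to the right data.
for i in range(8):
    w = cw[:]; w[i] ^= 1
    assert decode(w) == (data, "corrected")

# Double flips are detected, not silently miscorrected.
for i, j in combinations(range(8), 2):
    w = cw[:]; w[i] ^= 1; w[j] ^= 1
    assert decode(w)[1] == "detected"

# But triple flips can "correct" to a different valid word: silent corruption.
silent = 0
for idxs in combinations(range(8), 3):
    w = [b ^ (k in idxs) for k, b in enumerate(cw)]
    decoded, status = decode(w)
    if status == "corrected" and decoded != data:
        silent += 1
assert silent > 0
print(silent, "of 56 triple flips silently decode to wrong data")
```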


I would imagine that SME/TME (AMD/Intel memory encryption) would mitigate Rowhammer-style attacks quite effectively because attackers would not be able to control the physical bit patterns anymore?


Nope. It only requires being able to read neighboring cells. It would make a privilege escalation attack harder but not impossible. DoS attacks would still be relatively easy.


>DoS attacks would still be relatively easy.

Local DoS? Can you elaborate on this?


They are saying that privilege escalation is harder because it's really challenging to target specific bits to flip, whereas flipping random bits will eventually lead to a crash of some kind, causing the service to fail, which is effectively a DoS.


Rowhammer is a memory corruption technique, so if you corrupt the right memory of a process in the right way then you can crash it; same for a kernel or hypervisor.


While they were able to get bit flips with all the modules, the difference between 100,000 and 15 bit flips during 12 hours seems significant to me. Whatever mitigation manufacturer B has seems to work a lot better than the others'. That's a potential reason for choosing to buy it instead of the others. If that were improved further, decreasing the probability another 10,000 times, it might reach the point where it's comparable to random bit flips from cosmic radiation.


Great work. Fascinating and depressing at the same time. Like watching your house on fire, but not being able to avoid getting mesmerized by the beautiful flames and tones as your designer furniture burns away :-)

"...Are there any DIMMs that are safe?

We did not find any DIMMs that are completely safe. According to our data, some DIMMs are more vulnerable to our new Rowhammer patterns than others.

Which implications do these new results have for me?

Triggering bit flips has become easier on current DDR4 devices, which facilitates attacks. As DRAM devices in the wild cannot be updated, they will remain vulnerable for many years.

How can I check whether my DRAM is vulnerable?

The code of our Blacksmith Rowhammer fuzzer, which you can use to assess your DRAM device for bit flips, is available on GitHub. We also have an early FPGA version of Blacksmith, and we are working with Google to fully integrate it into an open-source FPGA Rowhammer-testing platform.

Why hasn’t JEDEC fixed this issue yet?

A very good question! By now we know, thanks to a better understanding, that solving Rowhammer is hard but not impossible. We believe that there is a lot of bureaucracy involved inside JEDEC that makes it very difficult.

What if I have ECC-capable DIMMs?

Previous work showed that due to the large number of bit flips in current DDR4 devices, ECC cannot provide complete protection against Rowhammer but makes exploitation harder.

What if my system runs with a double refresh rate?

Besides an increased performance overhead and power consumption, previous work (e.g., Mutlu et al. and Frigo et al.) showed that doubling the refresh rate is a weak solution not providing complete protection.

Why did you anonymize the name of the memory vendors?

We were forced to anonymize the DRAM vendors of our evaluation. If you are a researcher, please get in touch with us to receive more information. ..."


why paste the FAQ into your comment?


It's only a partial quote of the whole FAQ. I know it's usual to get a gist for an article from the comments before getting to the full details...


Thanks! I wasn't going to read the article -- and this answers my questions:)


Is this a threat to servers only, or to any network attached computer with ddr4?

By coincidence, I’ve been bluescreening with ram related error codes for the last 2 days haha


Threat to your phone and your routers:

"Drive-by Rowhammer attack uses GPU to compromise an Android phone" [2018]

https://news.ycombinator.com/item?id=16984663

"Inducing Rowhammer Faults through Network Requests"

https://arxiv.org/pdf/1805.04956.pdf


Scary, thanks


This is probably a naive question, but: could this sort of attack be prevented by having the physical values be a trivially encrypted version of the logical values? I'm thinking something as simple as:

    $value XOR f(memory address, random number etched onto chip)


That could certainly make it more difficult to exploit, sure, but keep in mind that being able to force a specific value change is not a hard requirement for this to be a security bug.

Even then, memory vendors tend to want to compete on frequency and access timings, which means doing any additional work not strictly required by the JEDEC standard will make their product appear worse than competing products, so I doubt they will want to do that.

Plus a similar technique could actually be done by the CPU's memory controller to similar effect, and historically DRAM design has favored pushing things to the controller when possible.


Flipping a bit flips the output of xor.
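A quick sketch of why (the keystream function f and the chip key here are illustrative stand-ins for whatever per-address scrambling the chip might use):

```python
# XOR scrambling hides WHICH value is stored, but a physical bit flip
# passes straight through to the same logical bit after descrambling.
import hashlib

CHIP_KEY = b"etched-random-secret"   # illustrative, not a real mechanism

def f(addr: int) -> int:
    # 8-bit mask derived from the address and the per-chip secret.
    return hashlib.sha256(CHIP_KEY + addr.to_bytes(8, "little")).digest()[0]

addr, value = 0x1000, 0b1010_0101
stored = value ^ f(addr)             # what actually sits in the cells

stored ^= 0b0000_0100                # rowhammer flips physical bit 2

recovered = stored ^ f(addr)
# The flip survives descrambling: logical bit 2 is flipped too.
assert recovered == value ^ 0b0000_0100
print(f"{value:08b} -> {recovered:08b}")
```

So scrambling only takes away the attacker's control over which *direction* a cell's charge leans, not the ability to corrupt it, which is why it raises the bar without closing the bug class.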


AMD EPYC processors already support AES encryption of memory (https://developer.amd.com/sev/) where VMs themselves cannot know the key. Interesting that I didn't see that mentioned in the paper as a possible mitigation.


Why do you believe this would help? Have you seen the attack from the BlackHat?[1] It doesn't require being able to read any plain text and it doesn't matter how the data is stored, only that it's near. You don't even have to have any access to the target memory or even know where it is, only the ability to cause something else to read it predictably.

1. https://www.blackhat.com/docs/us-15/materials/us-15-Seaborn-...

