Flipping bits in memory without accessing them [pdf] (cmu.edu)
209 points by ColinWright on Dec 23, 2014 | 81 comments



What goes on at the chip level is terrifying. "You have to understand," a hardware engineer once said to me, when we shipped a consumer computer that was clocking its memory system 15% faster than the chips were rated for, "that DRAMs are essentially analog devices." He was pushing them to the limit, but he knew exactly where that limit was, and we never had a problem with the memory system.

There was a great technical report from IBM describing memory system design for one of their PowerPC chips. Summary: do your board layout, follow all these rules [a big list], then plan to spend six months in a lab twiddling transmission line parameters and re-doing layout until you're sure it works . . .


True horror story: The first board I was involved with when working at Lucent in 2001 was a monster modem card that provided something like 300 modems, plus or minus. For a long time in the development process we had a weird reliability problem that just could not be tracked down, until in desperation a hardware engineer started putting his scope probe everywhere ... and found a line (or set of them) where the signal was messed up in the middle but fine at the end!

Analog is a black art.


There's a lovely book on high-speed signaling subtitled "a handbook of black magic". I always found it very fitting. It explains the rules and heuristics and methods of working around and exploiting the analog nature of digital signals.


I highly recommend that book as well; I learned more about signal integrity from reading it twice than I did during all of EE undergrad.

Link: http://www.amazon.com/High-Speed-Digital-Design-Handbook/dp/...


My company designs & sells high-perf memories. My lead AE has a copy of this within reach of his desk.


I currently work in the embedded field, and we had a similar issue where my software team spent days trying to track down a weird problem that looked like hardware.

Long story short: Spansion changed from gold to copper feeder wires inside their memory chips, and in every so many chips this was causing bit-flip problems.

Our end product is in the automotive space, averaging 300,000 units per year. It's a non-safety-related component.


Black arts are just arts with more rules than we ourselves understand. I don't think we should let that scare us from trying to understand. It's all just physics.


It's also about manufacturing tolerance, making room for error, suffering environmental stuff and understanding that the universe is out to get you. Sure, in reductio ad absurdum it's all just physics, but most of shipping a product is making engineering tradeoffs, dealing with complex stuff you may not fully understand, and making plans for when vendor B's product isn't exactly the same as vendor A's chips, but vendor A was just bought by Apple and won't talk to you any more, at any price. :-)


Of course. But the actual physical models that underpin these things are vastly different from the mental models we use to reason about them - even for people who understand them. Frequently the "lower level" model only needs to be pulled out at troubleshooting time.

Hiding complexity behind abstractions is what allows us to build complex things.


It's all well modeled by transmission line theory and electromagnetic compatibility, as far as I know; although I'm sure modern processors have some additional problematic quantum behaviors too.


Yup. As clocks increased, traces began to more or less resemble waveguide designs à la microwave engineering. And if you've ever done microwave RF engineering, it's all black magic.


Or don't use the auto-route feature that many electronic design programs have...


When was the last time you heard of a signal in a PC trace degrading and then getting much better further along? So much better that, when previously tested, that trace wasn't deemed suspicious?


Look at load-line impedances. The effective input impedance of a transmission line depends on what fraction of a wavelength of line the wave travels over. At any half wavelength, the output and input have equal impedances. If the output were matched to the input but not to the line, you would get the exact effect described.

http://en.wikipedia.org/wiki/Transmission_line#Half_wave_len...
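For reference, the textbook lossless-line relation behind that statement (standard transmission-line theory, not specific to the linked article) is:

    Z_in = Z_0 * (Z_L + j*Z_0*tan(βl)) / (Z_0 + j*Z_L*tan(βl))

At l = λ/2, βl = π and tan(βl) = 0, so Z_in = Z_L: half a wavelength back from the load, the line presents the load impedance again regardless of Z_0. A mismatch along the line therefore sets up standing waves, which is how a probe point mid-trace can look far worse than the endpoints.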


Not a PC, but a good story about a PDP-10 back-in-the-day:

http://www.catb.org/jargon/html/magic-story.html


Serial low-voltage differential (LVDS) buses make clock skew and interference mostly irrelevant. That's a reason besides cost that parallel ports, PATA, and parallel SCSI are dead while USB and SATA/SAS dominate.


This is a good example of analogue-ness causing a very subtle and intermittent bug which remained unsolved for over 30 years: http://www.linusakesson.net/scene/safevsp/ (previous discussion here: https://news.ycombinator.com/item?id=5314959 )


Most circuits work probabilistically and come up in a random state because of timing and thermal noise, so it's basically impossible to get the exact same running state twice.

For a small example, look at the simplest SR latch circuit: it's metastable because it feeds back into itself.


Note: the regular memtest+ doesn't have this test. Use the researcher fork:

https://github.com/CMU-SAFARI/rowhammer

On Ubuntu 14.04, run this to pull in all the dependencies for building:

    sudo apt-get build-dep memtest86+

Update: just finished running the test on my cheap Lenovo laptop. Not affected. phew! :)



The success of this approach to corrupting memory depends upon knowing the geometry of the memory chip. Naive calculations of which addresses correspond to an adjacent row may be incorrect.

It's interesting to see this issue addressed in 2015. In 1980 I worked at Alpha Microsystems and designed a memory chip test program which used translation tables based upon information we required chip manufacturers to give us in order for their chips to be used in the systems we sold.

That approach required us to put only one type of memory chip on a memory board. But back in the day microsystems were expensive and customers expected them to be well-tested.


My family thanks you and your colleagues.

Back in mid-1979, just before I went to college, I looked at the alternatives available at the time and picked Alpha Micro as the company to go with for a system to computerize a bunch of doctors' offices. It worked very well, as did another one a few years later, when one was used to help systematize a company providing satellite TV gear.


Actually, triggering rowhammer-induced bit flips does not require knowing the memory geometry. It is possible to trigger bit flips just by picking random pairs of addresses to hammer. See the README file at https://github.com/mseaborn/rowhammer-test.


I work in the chip industry. This was a good paper.

1. Note that chip-kill/Extended ECC/Advanced ECC/Chipspare, which are all similar server-vendor methods for 4-bit correction, will prevent this problem. These methods are enabled on the higher-reliability server systems.

2. This failure mode has been known by the DRAM industry for a couple of years now, and the newest DRAM parts being produced have this problem solved. The exact solution varies by DRAM vendor. I wish I could go into specifics, but I am unaware of any vendor that has stated its fix publicly.


> the newest DRAM parts being produced have this problem solved

How new exactly? The newest tested in the paper is from July 2014 and that still has the problem.


You're more than right. In fact, the paper explicitly mentions that 4-correct/5-detect (but fail) doesn't solve the issue, because each victim row on either side of the aggressor row accumulates varying levels of multi-bit (5+) errors.

It doesn't fix systems deployed right now, and could be used for attacking hypervisors and other multitenant systems. It might also make for an interesting class of local privilege-escalation attacks to try probabilistically on otherwise correctly-secured systems.


Excellent article! The fact that they can reliably produce errors in most RAM chips is worrying. They also propose a solution (probabilistic refresh of neighboring rows).


The scary thing is that such a "solution" could be silently disabled, with effectively no signs of any problem (and in fact it would probably increase performance a little!) - I mentioned this in the previous discussion here: https://news.ycombinator.com/item?id=8716977

I know there are other parameters of the memory controller that could be changed to cause corruption, e.g. reducing the refresh rate or tweaking the timings, but that is likely to yield random corruptions in normal use instead of this precise one.

To me, the real solution seems to be to stop making DRAM on such high-density processes until design changes can be made that restore the old level of reliability. At some point it stops behaving like real memory and turns into a crude approximation of it; memory should reliably store the data it holds, without any corruption, regardless of access pattern.


How long until someone uses this as the basis of an exploit? Maybe not root access, but if you can figure out an OS call that replicates the access pattern, you can corrupt machines just by interacting with them.


I'd start looking at this for ideas. It's a paper on using memory errors to break out of a Java virtual machine. They didn't have a good way of generating the errors, so they resorted to waiting for them.

https://www.cs.princeton.edu/~appel/papers/memerr.pdf


There's no need for an OS call. Ordinary userspace access to the same mapped memory is going to stay in the same physical page for far, far longer than a few hundred thousand DRAM cycles. Obviously the hard part of an exploit would be locating those corrupt bits elsewhere in the system. That's going to depend entirely on the hardware layout of the DRAM chip.


And I guess ASLR would make it even harder.


ASLR only affects the virtual address space. The physical memory allocations are all probably unaffected by ASLR.


Not really; if the blocks of memory you allocate are not the desired distance apart, just try again... or allocate a block big enough to guarantee it, then start the alternating read sequence to trigger corruption. Of course this assumes you can already run your code on the machine, e.g. in a VM.


One wonders if this has already been used in an exploit.

A good first check for security companies - examine all known attacks for fence instructions, which are rare. (Without a fence instruction, hammering on the same addresses will just cycle the caches, and not go out to DRAM.) Look at the code near them for a hammering loop.

This is a promising attack, because it might be able to break through a virtual machine boundary.

A test for this should be shipped with major Linux distros, and run during install. When someone like Amazon, Rackspace, or Google sends back a few thousand machines as rejects, this will get fixed.


Fences neither guarantee nor require that an access hits RAM. You are thinking of flush (for writes) and invalidate (for reads). Alternatively, just hit N+1 addresses that map to the same cache set (where N is the way-ness of your cache).

(Fences guarantee only memory ordering, and are typically implemented by flushing to cache, not to RAM.)
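A minimal sketch of that eviction approach in C, assuming an 8-way cache and an 8 KiB stride that happens to map to the same set; real set and slice mapping varies by CPU, so an actual tool would build its eviction sets empirically:

    #include <stdint.h>
    #include <stdlib.h>

    enum { WAYS = 8, STRIDE = 8192, ITERS = 500000 };  /* assumed cache geometry */

    int main(void)
    {
        /* Large enough that target + WAYS*STRIDE stays in bounds. */
        volatile uint8_t *buf = malloc((size_t)(WAYS + 2) * STRIDE);
        if (!buf) return 1;

        volatile uint8_t *target = buf;
        for (long n = 0; n < ITERS; n++) {
            (void)*target;                      /* the access we want to reach DRAM  */
            for (int i = 1; i <= WAYS; i++)     /* touch WAYS conflicting lines, so  */
                (void)*(target + i * STRIDE);   /* WAYS+1 lines share the set and    */
        }                                       /* target gets evicted each round    */
        free((void *)buf);
        return 0;
    }

Whether this actually drives the target line out to DRAM depends on the replacement policy and on the physically indexed last-level cache, which is why CLFLUSH (what the paper uses) is so much more convenient.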


I'm curious if it's possible to use this to execute a Bellcore-type fault injection attack against RSA signatures.


This article actually contains one of the better-written fundamental explanations of DRAM operation that I've read. Thanks for the post.


Here is a program for testing for the DRAM rowhammer problem which runs as a normal userland process: https://github.com/mseaborn/rowhammer-test

Note that for the test to do row hammering effectively, it must pick two addresses that are in different rows but in the same bank. A good way of doing that is just to pick random pairs of addresses. If your machine has 16 banks of DRAM, for example (as various machines I've tested do), there should be a 1/16 chance that the two addresses are in the same bank. This is what the test above does. (Actually, it picks >2 addresses to hammer per iteration.)

Be careful about running the test, because on machines that are susceptible to rowhammer, it could cause bit flips that crash the machine (or worse, bit flips in data that gets written back to disc).
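For reference, the core of such a test is a tight loop roughly like the following - a sketch, not the actual rowhammer-test code, assuming addr1 and addr2 fall in different rows of the same bank (which the random-pair selection above approximates):

    #include <stdint.h>
    #include <emmintrin.h>   /* _mm_clflush, _mm_mfence (SSE2) */

    /* Each CLFLUSH evicts the line, so the next read must go out to DRAM
     * and re-activate its row; alternating two rows in one bank forces a
     * row-buffer conflict (and thus an activation) on every access. */
    static void hammer(volatile uint32_t *addr1, volatile uint32_t *addr2, long reads)
    {
        while (reads-- > 0) {
            (void)*addr1;
            (void)*addr2;
            _mm_clflush((const void *)addr1);
            _mm_clflush((const void *)addr2);
            _mm_mfence();   /* keep reads and flushes from reordering across iterations */
        }
    }

The paper reports bit flips with as few as ~139K activations within one 64 ms refresh interval, which is why such tests hammer each address pair hundreds of thousands of times or more.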


Hopefully this doesn't affect ECC DRAM? Also does the problem get worse with increased density - i.e. 16GB modules are more vulnerable than say the 8GB ones?


It helps, but they point out you can get two-bit errors ECC can merely detect (or, much more rarely, 3+-bit errors ECC isn't guaranteed to even detect). Mighty tricky to work out that exploit that flips just the right two security-relevant bits, though.


From the abstract:

High-speed DRAM reads influence nearby cells. Reproduced on 110 of 129 memory modules, with as few as 139K reads, on Intel and AMD systems. Up to 1 in 1.7K cells affected.


Problem: they propose a solution and calculate its reliability. Why not test it with their FPGA-based memory controller and demonstrate an improvement?

Second: while the problem looks real enough, the tests used to demonstrate it are not realistic. Hammering the same rows with consecutive reads does not happen in the real world because of caches, which they get around via flushes. I'd like to see more data on how bad the abuse needs to be to cause the problem. Will 2 reads in a row cause errors? 5? 10? 100? They never address how likely this is to be a real-world problem. I don't doubt that it is, but how often?


They address all of these in the paper. It takes ~180,000 "hammered" reads to trigger. The problem is that even the least-privileged code can do it, because it's just reading while bypassing the cache - a perfectly valid thing that must be allowed for multi-threaded code to even work correctly.

Secondly, the DRAM makers don't currently provide enough information to know reliably which neighbors to refresh. I suppose they could have used their guesses to test on the FPGA rig, but given the rest of the paper I'm reasonably satisfied that they have correctly identified the problem and that their solution would work.


Sometimes an exploit simply needs to make more trusted code take an exception (often after "paving" memory with known values). Since an exception can theoretically be induced by user-mode code flipping bits in structures it doesn't own . . .

I can see "exploit resistant memory" being a selling point, maybe soon.


Multi-threaded code does not need cache flushes / invalidating reads to work correctly, on any halfway modern architecture that includes a hardware cache coherence protocol.


Reading the same address in an infinite loop is quite common in multi-die/multi-core real-time, low-latency systems. In fact this is exactly what you are doing when polling FIFO queue pointers, etc. And rather than relying on QPI/cache coherency, you may even want to forcefully flush the cache every time you read, to reduce the latency.
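A sketch of that kind of polling loop (hypothetical names; whether the explicit flush actually lowers latency is very platform-dependent):

    #include <stdint.h>
    #include <emmintrin.h>   /* _mm_clflush */

    /* Spin until the producer advances the shared FIFO head index.
     * The flush evicts the local copy so each iteration's load refetches
     * the line instead of spinning on a cached value. */
    static uint32_t wait_for_new_head(volatile uint32_t *head, uint32_t last_seen)
    {
        uint32_t h;
        do {
            _mm_clflush((const void *)head);
            h = *head;
        } while (h == last_seen);
        return h;
    }

By itself this only re-reads a single row; the rowhammer pattern needs two conflicting rows in the same bank, but the read-then-flush building block is the same one the paper relies on.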


This kind of thing may not happen often but when it does it can cause errors which are very intermittent and impossible to diagnose. I have seen memory chips that will fail consistently, but only when running a specific program out of dozens of others tried. So it kind of looked like the program had a problem, except that it worked on other hardware.


It's quite easy to bypass the cache without an explicit flush. Most caches are 4-way or 8-way: just hit 4 or 8 addresses that map to the same cache set, and your original data gets evicted from the cache.


I've completed many multi-gigahertz product designs during my career. If you take the time to study and understand the physics involved, and bother to do a bit of math, none of it is particularly difficult. I reject the characterization of this as some kind of black art. It's not magic. Yes, of course, experience helps, but it isn't magic. One problem is that some in the industry still use people who do layout based on how things look rather than through a scientific process. Yes, it's analog electronics. When was it anything else?

Want to wrap your head around another challenging aspect of high-speed design? Power distribution system (PDS) design. You can design perfect boards based on solid transmission-line and RF theory and have them fail to work due to issues such as frequency-dependent impedances and resonance in the PDS.


So basically you just repeat a couple of memory reads a few hundred thousand times and this will alter some nearby cell? Why didn't manufacturers test for this? It looks like a pretty obvious thing to test when working at these scales.


Because the applications that care the most about these types of errors (big-iron networking and mil/aero) implement software error correction in the processor. It's not worth spending the extra money in final test when commodity DRAM is, well, a commodity.


Yes, ECC memory, although it's mostly implemented by the chipset and DRAM modules, with only a small part of it being in the processor.


Thanks for the clarification John.


> commodity DRAM is, well, a commodity.

You're essentially saying that producing DRAM that doesn't work like memory should, regardless of access pattern, is excusable because it's "commodity"?


Interestingly, the researchers used a Xilinx FPGA, not just an off-the-shelf AMD or Intel PC.

Why not?

If the attack can only be reproduced by custom hardware, why should anyone care?

Also, precise patterns of access to DRAM would require disabling the L1 and L2 caches. Doesn't that sort of thing require privileged instructions?

With caching in place, memory accesses are indirect. You have to be able to reproduce the attack using only patterns of cache line loads and spills.


No, they reproduced on "Intel (Sandy Bridge, Ivy Bridge, and Haswell) and AMD (Piledriver) systems using a 2GB DDR3 module." (see Section 4)

They evict cache lines using the CLFLUSH x86 instruction, which I believe is unprivileged.


CLFLUSH is definitely unprivileged - I made use of it on a recent project (evicting outbound messages from a core's cache cut cache misses meaningfully).


I think row hammer is basically a DRAM design defect and wish it were fixed in the DRAM instead of on the controller side. At the very least the DRAM vendors should document this access-pattern limitation in their datasheets.


Am I wrong (asking anyone who happens to work on processor microcode), or could per-processor microcode patches insert a minimum delay where needed, based on RAM parameters and organization, to prevent this?


So, busywaiting on spinlocks considered dangerous?


That would probably read from cache most of the time.


Why doesn't it mention the manufacturers' names?


The work may have been funded by the manufacturers in return for the analysis and discretion.

There aren't that many DRAM manufacturers in the world; Micron, Samsung, Elpida, and Hynix are all excellent bets.


Discussion from a couple of weeks ago:

https://news.ycombinator.com/item?id=8713411


Missed that - thanks - although to be fair there doesn't actually seem to be that much discussion. Maybe most people missed it.

Edit: Just checking - it was at position 120, looks like it got artificially promoted to position 25, then sank quickly.

http://hnrankings.info/8713411/


> looks like it got artificially promoted to 25

How does that happen?


The mods have in place mechanisms to find submissions they think have slipped through the net and provide a boost for them. It's experimental and undocumented, but I saw it mentioned a while ago[0].

I think it's a good thing, even if it's somewhat arbitrary and has multiple false-positives and false-negatives. I've always been a fan of stochastic processes - they exhibit lots of good behavior, and comparatively few pathological flaws.

[0] https://news.ycombinator.com/item?id=8157698


Honestly I'm bummed to see a lot of the recent aggressive moderation. I know the community is far from perfect, but seeing arbitrary posts moved to the front page, existing front-page items changing not only in title but also in the article they link to after discussion has already happened, etc., all strikes me as obviously worse than just letting the chaos of the community sort itself out.


I'm sure you are not alone, but with respect to the vast majority of what the mods are doing, I substantially agree with them. I disagree over some of the changes in titles, but I'm sure I only see a small percentage of what they do, so logic dictates that most of what they do is good, otherwise it would be more noticeable.

And with regard to finding and promoting the sort of submissions they think deserve attention but have been missed, I think that's a good thing.

But I've seen other "communities" where the owners have taken a policy of "... just letting the chaos of the community sort itself out," and in my experience they have quickly descended into cesspits. HN is doing significantly better, and I believe that to be due, at least in part, to the "aggressive moderation."

So with respect, overall, I disagree.


Fair points - it may be I'm just not noticing the mods/moderation when it does its job well. I have watched numerous changes happen in policy, like removing upvote counts etc., and have never noticed positive results from them, but I may be in the minority, and it may be a plugging-holes-as-the-ship-sinks phenomenon (i.e. not the fault of the hole-pluggers).

I guess my feeling is that with a site structured around karma and upvotes, it feels messy and complicated to add the asterisk: "*but sometimes completely changed by moderators."

I use sites like this to get something closer to the firehose, not something hyper curated. Give it to me warts and all, at least I'll know what I'm getting then.


Count me as someone whose experience of HN has dramatically improved this past year.

I don't spend that much time here and I don't have much idea what the staff is doing or why. But I don't think your sentiment makes any sense. That's like saying "Let a city of 100k+ people just sort it out, without governance." It wouldn't be a city if that happened. You can argue about what kind of governance is best, but there has to be some kind of governance. Even "Burning Man" began instituting a kind of city planning once it got big enough.


I'm bummed that you're bummed. But we may not disagree as much as it seems, so let me try clarifying.

Our plan is to turn as much moderation over to the community as possible; getting there is going to require additional mechanisms.

HN has one axiom: it's a site for stories that gratify intellectual curiosity. All the rest is details. The existing mechanisms—upvoting and /newest—are insufficient for the community to sort out which stories are best by that standard. The trouble with upvoting is that stories that gratify intellectual curiosity regularly attract fewer upvotes than stories that are interesting for other reasons. The trouble with /newest is that not enough users will sift through it to find the best submissions. That's easy to understand: it's tedious. Wading through hundreds of posts to find perhaps 3% that gratify intellectual curiosity, does not gratify intellectual curiosity! The very reason people come to HN is a reason not to want to do that, and so we have a tragedy of the commons.

So if upvoting and /newest are suboptimal for HN, what should we do? In the long run, the answer hopefully is to have a new system that lets the community take care of it. But in the short run, before we know what the new system should be, our answer is (a) to experiment, and (b) to do things manually before trying to systematize anything.

Over the last several months, we've tried various things. For example, we tried randomly placing a story from /newest on the front page. We didn't roll that out to everybody, but we did for enough users to make it clear that it wasn't going anywhere. The median randomly selected story is too poor in quality for this to work.

Of our experiments, the one that has produced by far the best results (to judge by receipt of upvotes, non-receipt of flags, and comments about the article being good) is reviewing the story stream for high-quality submissions and occasionally lobbing one or two of them onto the bottom of the front page. That's the experiment ColinWright was referring to. The idea is, from the pool of stories that would otherwise fall through the cracks, for humans to pick candidates for second exposure. These get a randomized shot at the front page long enough for the community to decide their fate. Most fall away quickly, but some get taken up, and so the HN front page sees more high-quality stories. It's important to realize that this is a supplement to the ordinary mechanism of users upvoting stories from /newest. That works the same as always.

This is not a permanent system—just an experiment to gain information—and one lesson we've drawn from it is that moderators should not be doing all the reviewing. We'd prefer it to be community-driven, and anyway there are too many stories and too few of us to look at them all. At the same time, the results have been so salutary that I feel obliged to continue doing it until we can replace it with something better.

What system might we build so the community can do this work and we don't have to? If upvoting and /newest can't do it, what could?

HN already has a mechanism for mitigating the problems with upvoting: flagging. Where upvoting works worse than one would expect, flagging works better. (We do have to compensate for bad flagging, but surprisingly little.) But flagging only helps weed out inappropriate stories; it does nothing to help the best stories surface. So one thought is that we need a mechanism similar to flagging, but positive rather than negative.

As for /newest, if the problem is that there's no incentive to do it, we need to make it rewarding. HN already has a reward mechanism: karma. Perhaps users who put in the work of sifting through new stories could be rewarded in karma [1].

Put these two thoughts together and the idea that emerges is of a story-reviewing mechanism, similar to flagging but focused on identifying good stories, where any user who puts in the effort and does a good job is rewarded in karma for service to the community.

The challenge is in defining a good job. It can't be something you could write a computer program to do—short of writing a program to identify all the best stories, of course, in which case you deserve all the karma. If a story eventually gets lots of upvotes (and few flags), that would be one way of scoring a good review. But there need to be other ways, because there are many more good stories than slots on the front page.

And with that you pretty much have a core dump of our current thinking on story curation—subject to change as new ideas emerge.

1. HN could use a new way of earning karma anyhow. A common criticism, which I think has merit, is that the current system is a rich-get-richer affair where most gains accrue to a clique of old-timers and there's little chance for anyone new to catch up.


Thanks for replying, dang, I appreciate it.

I'm on this site a fair amount and don't feel super informed about the ongoing plans (I was surprised to hear about the artificial movement of stories, for instance). Any details you could provide there would be really helpful, and I'm sure the community would appreciate knowing what's in store. Maybe even a dedicated page for news about the HN platform/algorithm and experiments.

> The existing mechanisms—upvoting and /newest—are insufficient for the community to sort out which stories are best by that standard. The trouble with upvoting is that stories that gratify intellectual curiosity regularly attract fewer upvotes than stories that are interesting for other reasons.

I think this fact is undebatable, but the question is which scales better. Large communities with heavy moderation have frequently and quickly descended into corruption. Again, I think communicating your plans/experiments would be really helpful (and also gratifying of intellectual curiosity!).


I'm in the process of expanding on my comment above; "I'll add more in a bit" means in a few minutes when I have time to write more. That's the only reason I don't post more about this, by the way—it's time-consuming, and takes already-scarce time away from actually doing any of the things being written about. So I mostly just reply to questions when people have them. Anyhow, please check back in a little while and I'll try to explain more.


That's cool, but with respect, it might be better to turn it into a brief summary and repost it as a top level item. This child thread is deep and buried now so few users are going to see it.


That's a natural suggestion, but my experience is that it doesn't work so well. If I make a top-level post out of this, it will lose the feeling of "conversation with a user", become something official, and other factors will kick in. There's a place for that, of course, but it's a different thing and one mustn't overdo it. I will formally ask for feedback when we get closer to knowing what we're asking about, but for now it's just thinking out loud.

Also, you'd be surprised at how much the information in discussions like this makes its way into circulation. People do find this stuff, and I think it's more fun to run across it this way.


> One common criticism, which I think has merit, is that the current system is a rich-get-richer affair where most gains accrue to a clique of old-timers and there's little chance for anyone new to catch up.

This is undoubtedly true, but perhaps mitigated by the fact that the marginal value of a single karma point is far greater for a user with a current score of 30 as compared to a user with a current score of 30,000.


I can't edit my old comment, so just wanted to thank you again for providing such detail into what you are doing. I still stand by my statement that I think the rest of the community would be interested in it in a more formalized way, but I appreciate the effort you are putting in and it's clear that you are being thoughtful about what you are doing.


Since what I wrote came out more like a blog post than a conversation (sorry), I think you have a point.


Sounds like a great technique for a DDoS.



