Speculative Dereferencing of Registers: Reviving Foreshadow [pdf] (arxiv.org)
168 points by beefhash 50 days ago | 53 comments



Linux has flags that allow disabling mitigations to get performance back. macOS and Windows probably have the same. Pretty much the only things running unauthenticated workloads on my dev machine are my browser and mail client. Is there a way to disable the mitigations globally but enable them per process? I don't mind a 10-30% performance hit when browsing, but I do mind one when compiling/testing things.


I'm going to say no, or at least, not easily. Some of the mitigations, like retpolines, involve changing how the host kernel is compiled, and you can obviously only do that once per machine. Separate kernel/user page tables is one where there's a big overhead, and it seems as if it could be possible to do this per process (i.e. have some processes do it the "old way" and share the kernel page table, while your hardened browser uses the slower separate page tables). But the devil will be in the details and I'm no Andrea Arcangeli.
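One narrow per-process knob that Linux does expose is the speculation-control prctl, though it only covers speculative store bypass (and, on newer kernels, indirect branch speculation), not KPTI, retpolines, or the L1TF/microcode mitigations. A minimal sketch, assuming Linux 4.17+ headers:

    /* Sketch: opt this process into the store-bypass mitigation even if the
     * system-wide default leaves it off. Needs PR_SPEC_* from Linux >= 4.17
     * headers; PR_SPEC_INDIRECT_BRANCH works similarly on newer kernels. */
    #include <stdio.h>
    #include <sys/prctl.h>

    int main(void)
    {
        if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
                  PR_SPEC_DISABLE, 0, 0) != 0)
            perror("PR_SET_SPECULATION_CTRL");

        /* Query the resulting state: a small bitmask of PR_SPEC_* flags. */
        int state = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, 0, 0, 0);
        printf("store-bypass speculation state: 0x%x\n", state);
        return 0;
    }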

In my experience the biggest problem is something that cannot be solved by the kernel anyway, which is the microcode updates. AFAIK it's impossible to get that performance back by any means, even the mitigations=off flag cannot do that.

Maybe look at QubesOS (it runs many desktop processes in separate virtual machines), which will give you defence in depth and mitigates many other browser attacks as well as architectural ones. You could run your trusted stuff outside Qubes (in the dom0), where I guess it ought to run at full speed.

BTW I solve this problem by having a separate development machine for compiling and testing my own code which never runs anything untrusted. And that other machine is an AMD Zen 2 so it's not quite so vulnerable and also much faster for the price!


> In my experience the biggest problem is something that cannot be solved by the kernel anyway, which is the microcode updates. AFAIK it's impossible to get that performance back by any means, even the mitigations=off flag cannot do that.

It is my understanding that these microcode updates get applied either by the operating system at boot time (reapplying the update every time you boot), or by the motherboard if the BIOS was updated. Only the latter is "permanent" (or at least, undoing it requires downgrading your BIOS, which isn't always feasible), so if you never upgraded your BIOS, I guess it should be possible to prevent the OS from applying the updates.


From my understanding, the microcode is the firmware of the CPU.

It's usually upgraded by a BIOS update or by Windows Update. Windows Update sends driver/firmware updates, security patches, or general updates if the vendor wants to support that distribution channel.

I think whether the microcode can be upgraded and downgraded the same way is quite device- and method-specific.


The other poster is correct... microcode can be upgraded by BIOS update or Windows update, but the update doesn't persist on the CPU itself after a reboot (it has to be reapplied every boot, by either BIOS or Windows)
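If you want to check which microcode revision is actually loaded at the moment, the Linux kernel reports it per logical CPU in /proc/cpuinfo; a quick sketch, assuming an x86 Linux box:

    /* Sketch: print the currently loaded microcode revision as reported by
     * the kernel. Assumes /proc/cpuinfo has a "microcode" field (x86 Linux). */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("/proc/cpuinfo", "r");
        if (!f) { perror("/proc/cpuinfo"); return 1; }

        char line[256];
        while (fgets(line, sizeof line, f)) {
            if (strncmp(line, "microcode", 9) == 0) {
                fputs(line, stdout);   /* one entry per logical CPU */
                break;                 /* normally identical across CPUs */
            }
        }
        fclose(f);
        return 0;
    }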


I'd happily reboot (though preferably not boot a separate image entirely, nor use a separate machine) to get a 5-10% speed increase when gaming.

If I can't do that and I have to choose between fast/insecure or slow/secure, I'll take my chances with fast/insecure. I'd rather give up doing anything sensitive on my machine than give up 5% perf.


This is a good idea, but it opens you up to the attack: if a browser running in 'safe mitigated mode' were compromised, it could itself try to disable the mitigations that were enabled for it.


They're also pointless when running number-crunching instances in the cloud.


I thought a big problem with these side-channel leaks was that other tenants on the same physical host could view your data.


When you're number crunching you usually borrow a full machine ("exclusive VM").


Run a browser in a VM with a hardened kernel?

Or can a properly designed attack break out of the VM and hardened kernel?


The original Spectre paper[0] describes an attack on KVM where software with guest ring 0 access can read host memory. I'm not sure if it's possible without guest ring 0 access.

https://spectreattack.com/spectre.pdf


A thought just came to my mind. Let's say 30 years ago, I said to a colleague with whom I shared accesses to some Unix systems, "You know, I can use 'ps' to see what processes you are running. If I know the details about certain flaws of those binaries, I may be able to run a custom binary simultaneously on the system and figure out some of your data!" Would he/she be surprised or alarmed?


30 years ago it was not uncommon for everybody to know the root password on Unix systems. Even if you didn't, there were so many setuid holes that it didn't matter.

A lot has changed since then.


This is a bit before my time but doesn't sound right to me - my ISP in 1994 certainly didn't give everyone the root password with our shell accounts, nor did my high school in 1996. Security holes were gaping by today's standards, but attackers were equally unsophisticated. Most setuid programs still required wheel access.


30 years ago timing attacks were unknown (at least publicly). The core idea of exfiltrating some secret based purely on specific input data to a program that was memory-safe and perfectly validating its inputs would have been a surprise on its own. Doing so entirely outside the process's direct view doubly so.


The paper "A Retrospective on the VAX VMM Security Kernel" (1991) presents a summary of analysing timing sidechannels in things like hard drive arm movements as prior art from 1977 in section VI.E, while explicitly addressing more usual timing side channels with various references to 1991 papers. One of their solutions to address timing side channels was fuzzy time (seem familiar?).


Interesting! I have always seen timing attacks cited as originating in Kocher (1994), but I suppose this is some bias towards the current Unix / PC / crypto world away from mainframe development. The 1991 paper is quite clear, though the 1977 paper it cites just kind of punts on the issue (end of section 3.6):

> Hence, it will not be possible to transmit information over a covert communication channel at a high enough bandwidth to make such attempts worthwhile.

Skimming the citations, though, I'm not 100% sure it's the same thing as Kocher (1994), which has a more direct line to Meltdown/Spectre. The idea that intervals of high-precision clocks could carry information is the same, but the KVM/370 paper especially seems concerned with it being used for unmonitored communication between two malicious actors, or as a tool to learn something general about what other users of the system are doing, not exfiltration of the stored data itself across a security boundary with an oblivious user. The 1991 paper seems to sit somewhere in the middle.


Importantly, fuzzy time was referenced in the Light Pink Book [1] from the DoD Rainbow Series.

The problems of timing attacks in shared resource systems were well understood well before the Spectre mess.

1. https://fas.org/irp/nsa/rainbow/tg030.htm#5.0


I guess it was not only unknown to the public. Processors were so slow that you most often used one server for one application; those machines did not have enough compute power to run multiple workloads simultaneously.


Timesharing was dominant from the 70s up to the start of the PC era, and never really went away on the server side. Performance gains vs. tool cost have been sublinear for decades now; servers used to do only a little less with a lot less.


Thirty years ago was 1990 so I'd guess probably not that surprised. Superscalar speculative processors were in their infancy but still understood, although I don't know if anyone had seriously considered the problems that have been exposed recently.


30 years ago the global economy wasn't running on shared servers. But even back then, if you also mentioned that that "data" was their bank transactions and the password to their account, I'm pretty sure they would be alarmed.


Can we change the link to the actual paper?

Speculative Dereferencing of Registers: Reviving Foreshadow

Martin Schwarzl, Thomas Schuster, Michael Schwarz, Daniel Gruss

https://arxiv.org/abs/2008.02307


Is there a better article than this? Maybe I'm just tired but this doesn't seem to be very coherently structured.


Direct link to research: https://arxiv.org/pdf/2008.02307.pdf


Has anyone used these techniques in the wild to steal certificates from another customer on AWS, or used JavaScript to start probing memory on my machine from the browser? Are these attacks really severe, or is it all theoretical?


Side-channel attacks are an inherent property of computing hardware. You can never fully "disguise" a computation as a different one or as no computation.

You can only lower the signal-to-noise ratio by various means, so you gain some time until better statistical methods or clever tricks filter out the signal again.
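To make "lowering the signal" concrete with the textbook software example (generic, not specific to the speculative-execution attacks discussed here): a comparison that bails out at the first mismatching byte leaks how many bytes matched through its running time, whereas a constant-time variant removes that particular signal:

    /* Generic sketch: compare two buffers without an early exit, so the
     * running time does not depend on where (or whether) they differ. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    static int ct_equal(const uint8_t *a, const uint8_t *b, size_t len)
    {
        uint8_t diff = 0;
        for (size_t i = 0; i < len; i++)
            diff |= a[i] ^ b[i];   /* accumulate differences, never branch on data */
        return diff == 0;
    }

    int main(void)
    {
        uint8_t x[4] = {1, 2, 3, 4}, y[4] = {1, 2, 3, 5};
        printf("%d\n", ct_equal(x, y, sizeof x));   /* prints 0 */
        return 0;
    }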


There are side channels and there are side channels. A computer fan's noise is a side channel that indicates a computer is running and may reveal something about the workload, but if you are close enough to hear it, you are close enough to do a lot more than listen to the fan, so there is no real point in suppressing it.

Isolated processes being able to determine the memory or computation of another process is something else altogether, and surely could be mitigated, even if the mitigation comes at a cost.


You can run the fan at a fixed RPM, so the only side channel is that the machine has power.


There's more. Vibration and electromagnetic emissions (TEMPEST) are also viable side channels. Or ultrasound in speakers to cross airgaps.


But that would be wasteful and needlessly noisy. The point was that there is no need.


DJB wrote a paper about how "Speculative execution is much less important for performance than commonly believed." https://arxiv.org/abs/2007.15919


The premise of the paper is not that branch prediction isn't important for performance, but that it is possible to design an entirely different ISA where it matters much less.

The idea here is essentially to create a branch delay slot instruction, which can then be used to set the latency of a branch so that prediction isn't required to avoid stalling the pipeline. Like so:

    dec r1
    basicblock 5  # next five instructions WILL execute, branches after that, if any
    jz exit_loop
    this
    will
    always
    run
    -- branch takes effect here
Then the compiler can commonly reorganize the code so that the instructions in the variable branch delay perform useful work.


The problem with techniques like this is that they're microarchitecture-dependent. The optimal number of delay slots will depend on how long the CPU's pipeline is, for instance. This means you'll always be leaving performance on the table, because you have to cater to the lowest common denominator, or change the ISA and recompile everything.


Yes; the Mill architecture's answer to that (I'm beginning to think it'll never be released) is to make compiling the code from an intermediate representation, either just before runtime or at install time, a normal part of the process.

https://en.wikipedia.org/wiki/Mill_architecture


Maybe desktop applications will see something like what Android does, wrt shipping applications as bytecode and doing the last stages of compilation on the device, to get uarch-specific optimizations. CPU upgrades potentially being traumatic would be a downside, of course...


We already have that, at least on Windows with .NET. The only problem is that there are still native applications around for whatever reason. Even Android APKs can contain native libraries.


Does that do AoT-compiling-on-install, or is it still just JIT? I thought the JVM ecosystem was ahead on purely static binaries with Graal, but I might be mistaken.



I think the idea here is that the INT/FP pipeline length of CPU designs has stayed relatively fixed over ~15 years, so if you compile something today for "BBRISC-V", the branch delay lengths chosen by the compiler will likely remain adequate for a pretty long time. I suspect the variable-length delay makes it much easier for a compiler to actually perform useful work here, so if a uarch does need a bubble, the performance loss is fractional (e.g. if the delay length in the code is 3 instructions but the core would rather have 4, only one bubble has to be inserted).


Note the important cleverness: the delay is set by the compiler, so the delay is only as big as is useful, and there's no need to change the ISA.

Architectures with an ISA-specified delay have significant issues, as you mention.


Wasn't SPARC like this? Lots of no-op instructions had to be inserted by the compiler.


MIPS was. It had a single branch delay slot, which was often unused and also not enough for long pipelines. DJB’s variable-length system could work better.


I'm very skeptical about the paper's conclusions.

The measurements show that they can generate basic blocks of about 10-20 instructions (optimistic numbers, since they measure an average of 5), which allows them to move branches up by about 10-20 instructions. Since in this ISA the basic block bounds the instruction window, the instruction window is limited to around 20-40 instructions (the current basic block plus the next one). But modern superscalar processors have an instruction window of more than 100 instructions to provide high performance.

The low performance loss of their model compared to the branch-prediction model may perhaps be explained by the fact that they use an in-order CPU that makes very little use of ILP (instruction-level parallelism).

Moreover, it only addresses the problem of speculative execution, but there are other types of transient execution.


Thanks for sharing, very interesting paper!


Some might enjoy a teaser video of related work by some of the same authors: https://www.youtube.com/watch?v=baKHSXeIIaI. For context, the teaser was produced due to the conference going virtual because of the pandemic.


The article's presentation of the research is completely wrong. There aren't any new side channels here. All the paper purports to show (albeit in a somewhat self-aggrandizing manner) is that the mechanisms behind Meltdown and Foreshadow were incompletely understood when originally presented. But there's nothing new in the notion that speculative execution optimizations are responsible for most side channels, nor that they played a role in Meltdown and Foreshadow. "Spectre" is a play on words: it alludes to speculative execution.

There aren't any major new exploits detailed in this paper. They introduce a slightly new gadget for exposing data, but it can be and is mitigated by existing techniques (e.g. retpolines). The only noteworthy aspect is that, as it regards SGX, the mitigations haven't yet been generally applied. But new ways to break SGX are a dime a dozen these days.

Interesting and rigorous work, but there don't seem to be any real implications here. It's more of a concise restatement of researchers' present understanding, using the benefit of hindsight and some additional footwork to fill in some small gaps.


Yeah, if I'm understanding this correctly, they show that the part of the Foreshadow attack that loaded kernel memory contents into the cache prior to leaking them, rather than working in some obscure Intel-specific way as previously thought, is actually just a known Spectre variant that works on a whole bunch of processors, including AMD and ARM ones. It's not massively useful on those processors, because the stage that actually leaks the memory contents, and known alternatives like Meltdown, are Intel-specific, so it takes a certain amount of cleverness even to detect that it's happening, but it does happen.
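For anyone who hasn't seen what such a cache-loading stage looks like in code, the classic bounds-check-bypass example from the original Spectre paper conveys the general pattern; this is the textbook v1 gadget, not the register-dereferencing gadget from this paper:

    /* Textbook Spectre v1 pattern (illustrative only): if the branch is
     * mispredicted, array1[x] is read speculatively even for out-of-bounds x,
     * and the dependent load from array2 leaves a cache footprint that a
     * later timing measurement can recover. */
    #include <stddef.h>
    #include <stdint.h>

    uint8_t array1[16];
    uint8_t array2[256 * 4096];
    size_t  array1_size = 16;
    volatile uint8_t temp;

    void victim_function(size_t x)
    {
        if (x < array1_size)                    /* branch the attacker mistrains */
            temp &= array2[array1[x] * 4096];   /* secret-dependent cache access */
    }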


> Beyond our thorough analysis of these previous works, we also demonstrate new attacks enabled by understanding the root cause, namely an address-translation attack in more restricted contexts, direct leakage of register values in certain scenarios, and the first end-to-end Foreshadow (L1TF) exploit targeting non-L1 data. The latter is effective even with the recommended Foreshadow mitigations enabled and thus revives the Foreshadow attack.

> We demonstrate that these dereferencing effects exist even on the most recent Intel CPUs with the latest hardware mitigations, and on CPUs previously believed to be unaffected, i.e., ARM, IBM, and AMD CPUs


> The latter is effective even with the recommended Foreshadow mitigations

That's different from saying it's effective even with the mitigations required for other Spectre-related exploits. When larger problems arise and more general mitigations are applied that also happen to better mitigate previous, narrower exploits, people don't usually make much effort to review and revise old papers.

AFAIU, discovery of Meltdown slightly predates Spectre, or at least the point at which the implications began to blow up. The Meltdown and Spectre papers were published the same month Foreshadow was discovered and privately reported. It seems two researchers were involved with both Foreshadow and the earlier Spectre work, but that doesn't mean they would have or should have fully grasped the deeper relationships. And all of this happened over 2 1/2 years ago. Since then researchers' understanding of the underlying issues has improved greatly.

The tone of the paper is, I think, problematic. Just read footnote #1, which self-defensively says: "Various authors of papers exploiting the prefetching effect confirmed that the explanation put forward in this paper indeed explains the observed phenomena more accurately than their original explanations. We believe it is in the nature of empirical science that theories explaining empirical observations improve over time and root-cause attributions become more accurate."

So their presentation is "more accurate". And the writers of the 2+ year-old papers readily admit it. All of which is another way of saying these were already accepted, if not yet concretely expressed, beliefs in the research community. There's much value in putting pen to paper and running confirmatory experiments. But that doesn't make it groundbreaking.

EDIT: Regarding "We demonstrate that these dereferencing effects exist even on the most recent Intel CPUs with the latest hardware mitigations, and on CPUs previously believed to be unaffected, i.e., ARM, IBM, and AMD CPUs": if you read closely, it's clear that the context is Foreshadow, its mechanism and its mitigations. They're saying that when Foreshadow was published those architectures were believed immune. But applying the principles of their "more accurate" understanding, you can in fact achieve Foreshadow (or Foreshadow-like) side channels on those architectures even with Foreshadow-specific mitigations. But, again, only subsequent to the initial discovery of both Foreshadow and Spectre did it become known that those architectures were more susceptible to speculative execution attacks than originally understood. Thus elsewhere in the paper it's admitted that more general and modern Spectre mitigations also prevent these "new" Foreshadow exploits.


> AFAIU, discovery of Meltdown slightly predates Spectre, or at least the point at which the implications began to blow up. The Meltdown and Spectre papers were published the same month Foreshadow was discovered and privately reported. It seems two researchers were involved with both Foreshadow and the earlier Spectre work, but that doesn't mean they would have or should have fully grasped the deeper relationships. And all of this happened over 2 1/2 years ago. Since then researchers' understanding of the underlying issues has improved greatly.

I'm under the impression that the side channels created by caches and speculative execution have been known publicly (but with limited reach/impact) as far back as the 90s. E.g. the 1991 paper "A Retrospective on the VAX VMM Security Kernel" mentions data security problems and the creation of side channels when processor caches are used.


This website sucks; it attacked me with two newsletter popups. Definitely shouldn't be on Hacker News.



