LLVM patch to fix half of Spectre attack (llvm.org)
433 points by Kristine1975 on Jan 4, 2018 | 242 comments

Page was down when I tried to read it, but it's archived here: http://archive.is/s831k.

It's hard to get your head around how big a deal this is. This vulnerability is so bad they killed x86 indirect jump instructions. It's so bad compilers --- all of them --- have to know about this bug, and use an incantation that hacks ret like an exploit developer would. It's so bad that to restore the original performance of a predictable indirect jump you might have to change the way you write high-level language code.

It's glorious.

>It's hard to get your head around how big a deal this is.

It truly is difficult to predict all the ripple effects from this. I can't think of a single computer bug in the last 30 years that's similar in reach to this Intel Meltdown.

[EDITED following text to replace "Intel bug" with "Spectre bug" based on ars and jcranmer clarification. The Intel Meltdown can be fixed with operating system update patches for kpti instead of a complete recompile.]

Journalists like to overuse the bombastic metaphor "shaken the very foundations" but this Spectre bug actually seems very fitting of it. Off the top of my head:

- browsers like Chrome & Firefox have to compile with new defensive compilation flags because they run untrusted JavaScript

- cloud providers have to recompile and patch their code to protect themselves from hostile customer vms

- operating systems like Linux/Windows/MacOS have to recompile and patch code to protect users from malware

Imagine the economics of all these mitigations. Also imagine that each of the cloud vendors AWS/Google/Azure/Rackspace had very detailed Excel spreadsheets extrapolating cpu usage for the next few years to plan for millions of $$$ of capital expenditures. Because of the severe performance implications of the bugfix (5% to 50% slowdown?), the cpu utilization assumptions in those spreadsheets are now wrong. They will have to spend more than they planned to meet their workload-throughput goals.

There are dozens of other scenarios that we can't immediately think of.

> to this Intel Meltdown.

Wrong bug. Intel meltdown is bad, but not anywhere near as bad as Spectre which affects everything! No AMD immunity here.

Meltdown is far worse in practice than Spectre.

Spectre needs a more perfect storm of factors to lead to exploitation. No hardware is immune to it, but not all software is vulnerable, either. You need code execution and you need a vulnerable target and you need to somehow trigger the vulnerable targets path and that vulnerable target needs data you want.

Meltdown just needs code execution and you have full read access to all memory.
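The "vulnerable target" Spectre needs is typically a bounds-check gadget. A minimal sketch, modeled on the variant-1 example in the Spectre paper (the array names follow the paper; the sizes here are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Probe array: one 512-byte stride per possible byte value, so the
// speculatively read secret selects which cache line gets loaded.
uint8_t array1[16];
size_t  array1_size = 16;
uint8_t array2[256 * 512];

uint8_t temp = 0xFF;  // keeps the compiler from discarding the read

void victim_function(size_t x) {
    if (x < array1_size) {
        // Architecturally this read is bounds-checked. But if the branch
        // is mispredicted, the CPU may speculatively read array1[x] out
        // of bounds and touch a line of array2 indexed by the secret
        // byte, leaving a measurable cache footprint even after rollback.
        temp &= array2[array1[x] * 512];
    }
}
```

An attacker first trains the predictor with in-bounds values of x, then passes an out-of-bounds x pointing at a secret, and finally times accesses to array2 to see which line became cached.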

> Meltdown is far worse in practice than Spectre.

Far worse for an unpatched system, yes.

But in terms of fixing the problem, Spectre is much worse, with a larger impact.

It's so bad that I suspect some people will deliberately run without Spectre protection.

Other than javascript in a browser there does not appear to be much in the way of actual spectre attack vectors, though. So once browsers are patched spectre appears to be basically "fixed" for most practical purposes.

Worth noting that many of the claims around Spectre are wholly un-demonstrated. The PoC only involves reading memory from within the same process (i.e., the PoC read memory through a side channel that it had full ability to read anyway). Trying to exploit this from a different process is entirely undemonstrated, and there's no real discussion in the paper of how it would work. In theory it's doable, but the issues around how you do this once process switching and IPC enter the picture seem substantial, yet the paper makes no attempt to tackle any of that.

>Worth noting that many of the claims around Spectre are wholly un-demonstrated.

This is untrue.


Variant 2 is Spectre.

"This section describes the theory behind our PoC for variant 2 that, when running with root privileges inside a KVM guest created using virt-manager on the Intel Haswell Xeon CPU, with a specific version of Debian's distro kernel running on the host, can read host kernel memory at a rate of around 1500 bytes/second."

>>Worth noting that many of the claims around Spectre are wholly un-demonstrated.

>This is untrue.

Anything that is not demonstrated in a reproducible way (that is, some downloadable PoC code) is wholly un-demonstrated. To date, afaik, that goes for Spectre as a whole.

Note that variant 2 of spectre has not been demonstrated on AMD and AMD claims to be unaffected ( http://www.amd.com/en/corporate/speculative-execution )

However, the description of spectre from spectreattack.com is this:

"Spectre breaks the isolation between different applications. It allows an attacker to trick error-free programs, which follow best practices, into leaking their secrets."

That, however, is not demonstrated by any of these PoC, not that I can find.

> Intel meltdown is bad, but not anywhere near as bad as Spectre

I mentioned Meltdown because multiple entities (gcc, llvm, Google Cloud, Azure, Linux, Windows, etc) have already converged on concrete solutions such as new compiler flags and patches which gives us a glimpse into the costs and severity. The Spectre bug may be "bigger" but it doesn't have complete consensus mitigation yet and in the meantime, we really can't tell people to "just keep your laptop unplugged from internet and don't run any apps to avoid the Spectre bug." The Spectre hole seems like it will be an open problem for many years and the new gcc/llvm is an incomplete fix.

Meltdown is fixed by page-table isolation (unmap the kernel's pagetables after syscalls are done). All the other bugs are Spectre, and these are the ones that require fixing every compiler.

This document has performance impact estimates from Red Hat Performance Engineering: https://access.redhat.com/node/3307751

As far as I can tell this only measures the impact of the Meltdown bugfix/kernel patch for Intel CPUs. Would be interesting to see the accumulated impact of the mitigations for both Meltdown (KPTI) and Spectre (using retpolines).

"The Red Hat Performance Engineering team characterized application workloads to help guide partners and customers on the potential impact of the fixes supplied to correct CVE-2017-5754, CVE-2017-5753 and CVE-2017-5715, including OEM microcode, Red Hat Enterprise Linux kernel, and virtualization patches."

The numbers seem to be for patches on several Intel CPUs for all three of the disclosed vulnerabilities. [reworded]

They'll be releasing a fix for RHEL 5. Hats off to these gentlemen, as the patches probably aren't anywhere close to applying cleanly.

> - browsers like Chrome & Firefox have to compile with new defensive compilation flags because they run untrusted JavaScript

Not meaning to be rude, but this itself summarises (and the issue will perhaps shed more light on) how stupid an idea it is to let everybody run untrusted code from other people, let alone third-party stuff like "privacy-intrusion-as-a-service" startups et al.

That won’t really be a problem for the cloud providers. They'll simply charge more, because the customers will use more compute.

But it will cost them on I/O, which cannot be passed on to customers since the price is contractual. Either the cloud providers are on the hook, or they have to pass it on to Intel somehow.

Which in some cases might make it cheaper for customers to use their own hardware, resulting in cloud providers losing business.

Maybe. But their own hardware will also be slower, no?

Maybe. When it is your own hardware you can do a different risk analysis. If you control all the code that runs on the system you don't need any mitigation. Most servers don't run arbitrary code from the internet - at least not intentionally. (A security hole that allows remote code execution is a real issue, but that risk can be managed.)

My company doesn't need to mitigate the risk of me using these tricks. I'm not going to, but even if I was I have many other ways to get at sensitive data if I tried. If I get caught I'm fired and put in prison which is enough mitigation.

Is this a 5% to 50% performance hit on all workloads or specific workloads?

The 50% was for a microbenchmark of C++ code making heavy use of virtual functions. V-tables and jump tables get much more expensive. Any execution path that is known at compile time is not affected.
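A hedged illustration of the kind of code that benchmark describes (class names invented for the example): every call through the base pointer below is an indirect branch through the vtable, which a retpoline build rewrites into a slower return-trampoline sequence.

```cpp
#include <cassert>

struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;  // resolved through the vtable at runtime
};

struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int area() const override { return side * side; }
};

// Each iteration performs an indirect call; with retpolines the CPU can no
// longer speculate through it, which is where the quoted 10%-50%
// microbenchmark overhead comes from.
int total_area(const Shape* const* shapes, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += shapes[i]->area();
    return sum;
}
```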

I wonder where that leaves Java, C#, Node.js etc. Do the VMs generate code that suffers from this?

Hmm, interesting question. On the one hand, JITs do devirtualization, so on lots of code paths the indirect calls are replaced with direct ones, which should mitigate the performance issues. However, I think the JITs also need to insert logic and branches for re-jitting and speculative execution, and these might be additional indirect calls.

Would be interesting how it balances out in total.
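The devirtualization idea above can be sketched as a type guard plus a direct call, a hand-written analogue of what a JIT emits (class names invented for illustration):

```cpp
#include <cassert>

struct Animal {
    virtual ~Animal() = default;
    virtual int legs() const { return 4; }
};

struct Bird : Animal {
    int legs() const override { return 2; }
};

// Speculative devirtualization: guard on the expected dynamic type and
// make a direct (inlinable, well-predicted) call; fall back to the
// indirect vtable dispatch only when the guard fails.
int legs_devirt(const Animal* a) {
    if (const Bird* b = dynamic_cast<const Bird*>(a))
        return b->Bird::legs();  // qualified call: direct, no indirect branch
    return a->legs();            // slow path: indirect call through the vtable
}
```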


And I fear there's little reason to think that the "three variants" from project zero's announcement are the full scope of the problem. They were just the variants that the few people in on this found time to develop exploits for. There can now be security bugs in things your program doesn't do; it seems like there is room for nearly unlimited creativity in finding them.

From the spectre paper:

"A minor variant of this could be to instead use an out-of-bounds read to a function pointer to gain control of execution in the mis-speculated path. We did not investigate this variant further."

I think the main takeaway should be "speculative execution creates exploitable side-channels, and you should assume your hardware is exploitable until proven otherwise." AMD and ARM are probably still exploitable with unknown exploits, possibly even at Meltdown-levels of exploitability, but people haven't taken the time to reverse-engineer the microarchitecture enough to find the exploits.

If I were developing processors, I'd be having emergency meetings on trying to craft exploits to figure out where our processors' weaknesses are. While being happy that Intel is getting all the bad PR for this and I'm not.

AIUI the fundamental difference between Meltdown and Spectre is Meltdown involves speculating execution that loads memory across privilege domains and Spectre doesn't. If both AMD and ARM won't speculate memory loads across privilege domains, then it sounds like they're strictly immune to Meltdown.

> main takeaway should be "speculative execution creates exploitable side-channels, and you should assume your hardware is exploitable until proven otherwise."

Speculative execution does not create side-channels in and of itself; the side effects of speculative execution do that. In this case, the side effect is cache state. Just don't change the cache during speculative execution and there's no problem.

Why can't the processor isolate those cache lines during speculative execution?

And roll them back? It can, but it doesn't for performance reasons. What the performance impact would be is unknown but this requires a silicon change so unless you work at Intel you'll probably never know.

This needs to find its way into the hands of every manager of companies that make processors.

> And I fear there's little reason to think that the "three variants" from project zero's announcement are the full scope of the problem.

Agreed. This is an entirely new class of vulnerabilities, and we're just at the beginning.

As evidenced by the Mozilla announcement.

ARM’s white paper details a variant 3a that affects some of their cores that are unaffected by var3 (and vice versa)

Is glorious the right word for it? We’re going back to the stone ages where processors couldn’t predict the targets of indirect jumps. More generally, this seems to me like an attempt to patch out of what is really a class of attacks leveraging fundamental assumptions about high-performance CPU design. Before, OOO just had to preserve correctness and (some of) the order of exceptions and memory operations. Now, it has to preserve (some of) the timing of in-order execution too? Where does this path end?

Legitimate question: on any non-shared non-virtualized system is there any reason to enable these workarounds besides running sandboxed applications such as javascript in a web browser (or flash/java applets/Active X, but those are not really super popular nowadays)?

For any other non-sandboxed application you pretty much have to trust the code anyway. Privilege escalation is always a bad thing of course, but for single-user desktop machines, getting user shell access as an attacker means that you can do pretty much anything you want.

As far as I can see the only surface of attack for my current machine would be a website running untrusted JS. For all other applications running on my machine, if one of them is actually hostile then I'm already screwed.

Frankly I'm more annoyed at the ridiculous over-engineering of the Web than at CPU vendors. Because in 2017 you need to enable a turing complete language interpreter in your browser in order to display text and pictures on many (most?) websites.

Gopher should've won.

This unfortunately also affects almost all mobile apps and modern Windows installations, as they all run Javascript-enabled ads. Maybe this might cause Microsoft to reconsider what it allows to run on Windows but I don't see mobile ads going away any time soon.

>javascript-enabled ads

Good opportunity to get rid of them.

Or, as a compromise, no third-party javascript. Google can easily code up the 100 most common javascript ad formats and let advertisers pick from a menu.

Why would you need a Turing-complete language to describe advertisements? Why can't they be static images? Or static HTMLs?

From the advertiser's perspective, it wouldn't be a Turing-complete language, since they would only have access to standardized templates. Such a system would probably have to be implemented in Javascript at the browser level though, unless you could do it all with CSS animations.

I'd rather just be rid of them altogether. No compromise.

Do they really need 100 formats?

1. Video ad that autoplays.

2. Punch the monkey.

3. ???

> on any non-shared non-virtualized system is there any reason to enable these workarounds

Does the non-shared non-virtualized system have any encryption keys in memory that you want to protect?

Do you use full-disk encryption or ssh to other machines or use a cryptocurrency wallet?

If one hostile application running on my machine isn't sandboxed then SSH local keys are pwned anyway. Might as well install a keylogger or just hijack ssh-agent directly. Full disk encryption keys might not be but the app will have access to any mounted and unlocked safe. Ditto for cryptocurrency wallets without a hardware token (and even with a hardware token if the app can get it to sign a bogus transaction).

I don't think this particular vulnerability significantly increases the surface of attack for any non-sandboxed application running on my computer. There are much easier and straightforward ways to get access to anything an attacker with shell access may want that don't involve dumping the kernel VM. So in my situation the only vector of attack I'm worried about is JS running in the browser since I gave up on javascript whitelisting long ago when I realized that most of the web is unusable when you don't allow heaps of untrusted scripts to run all over the place. I don't have time to audit the source code of every random website I visit.

These questions are only relevant if you're not controlling and trusting all the code you're running on that system. For a consumer system this is true if (and basically only if) you're running a web browser on that system.

If you're confident in the software you're running on non-shared hardware, both Meltdown and Spectre are non-issues requiring no mitigation. This is a narrow class of systems, but it exists.

> For a consumer system this is true if (and basically only if) you're running a web browser on that system.

… which is pretty close to universally true, especially when you consider how many people use apps which are based on something like Electron. If those apps load code or, especially, display ads there's JavaScript running in places people aren't used to thinking about.

> If you're confident in the software you're running on non-shared hardware...

and that you won't be hit by a remote execution vulnerability.

That's what being confident in the software means.

As the owner of gopher://jdebp.info/1/ and the author of gopher://jdebp.info/h/Softwares/djbwares/guide/gopherd.html , I disagree. GOPHER needs a lot of improvement merely to learn the lessons that people learned with FidoNet in the 1980s.

Followup legitimate question: the only way to read data is to control the results of a speculative execution or fetch right?

For JavaScript won't it be sufficient to check all the calls out of it so that they can't pass data that controls an exploitable speculative execution, and also generate JIT code so the JS itself can't create exploitable instructions. The API will have to be heavily scrutinized and the JS will run somewhat slower.

If the rest of the browser code is vulnerable, but the JS code can't control the speculative execution then it should be safe to run any JS.

Javascript is theoretically fixable -- what's needed is less fine-grained timing capabilities. I think it would be very hard to completely eliminate the possibility of controlling speculative execution as long as you can predictably invalidate the CPU branch predictor, which can be done even at a high level with if statements. Unless you get rid of the notion of contiguous arrays (making predictable word-aligned cache manipulation nearly impossible) and potentially remove the JIT completely in favor of an interpreter, it's hard to completely be rid of this class of an attack when executing any sort of code on the processor. Those steps are probably possible, but the JS performance hit that ensues would be the death knell of any browser.

That said, without some way of extracting timing at the granularity of 10s of instructions, this attack is moot. So that's likely going to be the mitigation. Unfortunately, the web frames used in some apps are infrequently if ever updated, so JS engine updates there are gonna be hard.
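That timer-granularity mitigation can be modeled very simply: clamp timestamps to coarse buckets so two events less than one bucket apart become indistinguishable. A minimal sketch (the 100 µs bucket below is illustrative; browsers picked their own values):

```cpp
#include <cassert>
#include <cstdint>

// Round a nanosecond timestamp down to a coarse bucket. With buckets much
// larger than a cache hit/miss difference (~100 ns), the timing side
// channel can no longer distinguish cached from uncached loads directly.
uint64_t coarsen_ns(uint64_t t_ns, uint64_t bucket_ns) {
    return t_ns - (t_ns % bucket_ns);
}
```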

Why does it matter if they can cause the CPU's predictors to guess incorrectly if they can't control the target address of the branch or memory access?

Example: "if (a < length) return data[a]". If "a" comes directly from JavaScript then they trick the CPU into fetching data[a] even if it's invalid speculation and thrown out. But if there's a safe barrier between "if (a < length) { prevent_speculative_execution; return data[a]}" then they cannot learn anything.

I concede that safely checking all data coming from JS code to the browser would be a huge task, but pretty sure it would work to fix the problem for JavaScript although not in general, between processes with shared IO pages and such.
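One hedged sketch of that "prevent_speculative_execution" step uses branchless index masking rather than an explicit fence (the same idea as Linux's array_index_nospec; this simplified version assumes indices and sizes below SIZE_MAX/2):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Branchless bounds clamp: mask is all-ones when x < size and all-zeros
// otherwise, so even a mispredicted path can only read index 0 rather
// than an attacker-chosen out-of-bounds address. Assumes x and size are
// both below SIZE_MAX / 2 so the subtraction's sign bit is meaningful.
size_t clamp_index(size_t x, size_t size) {
    size_t mask = (size_t)((intptr_t)(x - size) >> (sizeof(size_t) * 8 - 1));
    return x & mask;
}

// Hardened variant of "if (a < length) return data[a]" from the example
// in the comment above.
int safe_read(const int* data, size_t length, size_t a) {
    if (a < length)
        return data[clamp_index(a, length)];
    return 0;
}
```

Because the clamp is data-dependent arithmetic rather than a branch, there is nothing for the predictor to mispredict.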

Honestly, you probably don't even need the barrier in your example. Getting data[a] into the cache is no information leak if the attacker already knows a. That's why the example in the Spectre paper uses an additional level of indirection.

Yes thanks to HN being so quick to freeze comments I was unable to fix the example.

Point is, a JavaScript program in isolation cannot read anything, it has to interact with the other target code somehow. If that interaction (the data passed over the API call) can't fail after a certain point and can't be used to read data before that point, then the JS can't read anything.

> Where does this path end?

It ends with the performance advantages of OOO execution being effectively negated by the workarounds to address the security issues it causes.

The following parable is edifying: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD05xx/E...

Seems like the ultimate end-game here is to have mini-vms for every process using CPU-level ring protection. If you can't speculate across privilege levels, only inside them, it isn't a security problem anymore.

Or time to have Kernel live on dedicated cache not ever accessed/shared with anything else. Let the CPU speculate all it wants, just not when playing in the kernel's cache. It may even be time for dedicated kernel cpus/cores.

Reading Kernel memory (Meltdown attack) is extra bad but regular user processes being able to read each other's memory (Spectre attack) is also very bad and not solvable by isolating the kernel.

I'm less worried about my Steam client reading my chat cache than something inside my web browser reading the keys that encrypt my home directory. Short of abandoning all sharing, the least we can do is isolate kernel cache.

That depends on what you are chatting about. My chatlog would be very interesting to our competitors. The key that encrypts my home directory isn't useful because the firewall blocks your access to my home directory (that's a different layer of security).

Which is why one solution is to put secrets in the kernel and use the Meltdown mitigation to protect them.

> It may even be time for dedicated kernel cpus/cores.

Oh yes, I agree! One needs to be able to physically (un)lock the "kernel fpga" like a door without remote capabilities, except for server CPUs. Or whatever chip designers believe is a good "physical kernel embodiment" other than an FPGA.

EDIT: I know it's not really clever, but I would really enjoy hearing any solutions that doesn't try to fix it at the hardware level.

Qubes OS [1] does something like that

[1]: https://www.qubes-os.org/

Yet Meltdown nuked exactly that.

> mini VMs for every process using CPU ring protection

Yes. We should really start to learn from history: the MULTICS operating system already had 16-ring CPU support back in the early 1970s. MULTICS is the mother of UNIX, its smaller child. MULTICS had so many advanced features that barely got implemented (and often got reinvented) in newer OSes. It's time to read old docs and ask the old devs who are still alive. (Another such often-overlooked gem is Plan 9, but it's better known thanks to the Go language devs.)

Older Intel CPUs only supported 2 rings. Modern Intel CPUs support only 4 rings. Windows and Linux use ring 0 for kernel mode and ring 3 for user mode. And Intel introduced a ring -1 for VT.

  "To assist virtualization, VT and Pacifica insert a new
  privilege level beneath Ring 0. Both add nine new machine
  code instructions that only work at "Ring -1," intended to
  be used by the hypervisor."
It's time for modern operating systems to use more rings, and modern CPUs to correctly protect between different rings.



We have different utility functions, you and I.

tptacek exploits computers for a living, so it's glorious for him :)

It's not that it makes the practice of breaking into computers that much more interesting so much as it makes the underlying field much more interesting to work in. The engineering problems just got a lot more complex. We're all taking an attack vector seriously --- microarchitectural side channels --- that we weren't taking as seriously before, except as an abstract threat to crypto and a way of defeating a mitigation --- KASLR --- that nobody believed in anyways.

What's glorious is that serious software security people now have to start being literate about what it means to reverse engineer and dump the branch history buffers on different CPUs. Getting dragged through this kind of minutiae is the reason I'm still in this field after 22 years.

And I'm just a bystander here. Imagine what it must have been like for Jann Horn over the last several months!

This subsection describes how we reverse-engineered the internals of the Haswell branch predictor. Some of this is written down from memory, since we didn't keep a detailed record of what we were doing.

... because shit was so crazy while they were working this out that they didn't have the cycles to write everything down!

I'd be surprised if other Googlers like Christian Ludloff (also of sandpile.org fame) and Dean G were not involved. They know x86 better than many engineers at Intel/AMD, having been in charge of the most performance/cost critical code (e.g. tuning search serving down to the last cycle), as well as qualifying new platforms and identifying a steady stream of CPU bugs.

As someone specializing on a different field, this comment reads to me as if you were an astronomer being excited about discovering a gigantic meteor heading towards the Earth :)

Sure, if your subfield of astronomy was all about practical ways to adapt to living on meteor fragments!

Some of us would be terrified, but also excited about the prospect of studying a supermassive black hole from the inside. You know what I mean? Some things are just so singular and incredible, and so to the heart of a given field of interest, that “glorious” is a perfect word to describe it. An alien invasion of Earth would be a nightmare, but for some it would be the moment they could finally move past the realm of the hypothetical.

Passion is passion, even when it’s terminal.

> they didn't have the cycles to write everything down!

hahaha that was a good pun! Do you have a link to Jann Horn's personal blog or GitHub? I've never heard of him before.

Isn't that a bit like a firefighter saying your house burning down is "glorious"?

How many firefighters do you know? I guarantee you every one of them gets excited at the prospect of a "good" structure fire.

Maybe more like a meteorologist excited by a really bad storm.

The result might not be glorious but the fire is amazing to watch.

To the extent that he exploits computers for a living (e.g. pentesting), rather than stopping exploits for a living, it seems more like a (presumably law-abiding) arsonist calling burning houses glorious.

Arsonists set fires. Vendors create vulnerabilities, not pentesters.

Ooooh, I dunno, I might argue that vendors have their role in creating pentesters. ;-)

I like to think of a lot of vulnerability discovery and research as solving a puzzle. In the sense that this puzzle has so many far reaching implications makes it totally compelling to me. tqbf says "glorious", and I couldn't disagree.

[Edit] Or, how far down does the rabbit hole go?

Additionally, it is quite fascinating to me to compare the complexity of modern CPUs with, say, a compiler.

> leveraging fundamental assumptions about high-performance CPU design.

I believe the generalized fix is to restore the entire CPU state after a mispredict. You’d either need to add an extra copy of the entire processor state (tens of megabits) for every simultaneous predict you support ($$$) or keep track of how to revert all changes and revert them one at a time ($, slow).

This is harder than it seems, because once cache is deleted you can't just un-delete it, you'd have to go back to memory and pull it again.

Only the "extra copy of processor state" thing is really viable. You have to have a speculative cache and buffer in reads that only get flushed to the main cache once they're confirmed to be valid, which is enormously complicated. This facility already exists for writes, but now it needs to exist for reads too.

GP is absolutely correct that this is a fundamental assault on processor design as we know it, the speculative execution concept is going back to the drawing board for a major re-think.

I can’t help wondering what igodard’s day is like so far...

"I'm walking on sunshine..."

Wasn't Intel's transactional CPU memory the solution, but it also failed due to bugs?

Sorry for quoting wikipedia, but I'm not at school, hah! [1]

"TSX provides two software interfaces for designating code regions for transactional execution. Hardware Lock Elision (HLE) is an instruction prefix-based interface designed to be backward compatible with processors without TSX support. Restricted Transactional Memory (RTM) is a new instruction set interface that provides greater flexibility for programmers.

TSX enables optimistic execution of transactional code regions. The hardware monitors multiple threads for conflicting memory accesses, while aborting and rolling back transactions that cannot be successfully completed. Mechanisms are provided for software to detect and handle failed transactions.

In other words, lock elision through transactional execution uses memory transactions as a fast path where possible, while the slow (fallback) path is still a normal lock."

[1] https://en.wikipedia.org/wiki/Transactional_Synchronization_...


CPUs have been vulnerable to this attack since 1995. How did it collectively take us 22 years to figure this out? I know it's a highly esoteric complex attack, but there's no shortage of clever hackers in the world.

- we didn't have browsers compiling JavaScript into machine code

- we didn't have hyperconverged cloud infrastructures running arbitrary entities' code next to each other

Sounds like it's time for me to give up on the so-called "modern web" and install noscript.

You can have both "modern web" and block most JavaScript. You just need to keep adding scripts to the whitelist until sites you trust work again. It's a bit arduous at first, but possible to get used to once everything you visit daily has been added.

Unless, of course, the site you trust is hosted in a shared hosting VM which is also vulnerable to spectre or meltdown. In which case, you can’t trust the scripts.

spectre can read, not write.

If I can read arbitrary data, what’s stopping me from reading the credentials I need to write data?

What if I read the site's TLS/SSL keys? I could MITM the connection and inject JS to do more malicious things.

Or even easier, get the SSH key for the VM. Then do whatever I want.

If it can read the right data (private keys, etc.), then it can write whatever it wants.

Great answer.

The web issue is easier to mitigate if not fix completely since there is already a massive infrastructure for widespread, rapid browser updates, and crippling Javascript to eliminate attack vectors such as high-resolution timers is completely acceptable.

The cloud/vm infrastructure is a massive problem though. It is 100% required that VMs be fully isolated. The entire infrastructure breaks down if they aren't.

It's sort of been well-known that speculative execution opens up the possibility of side-channel attacks for quite some time. Hell, it's long-known that SMT (e.g., HyperThreading) can leak keys in a not-really-fixable way.

What's new and surprising is the power of these side-channel attacks--you can use these, reliably, to exfiltrate arbitrary memory, including across privilege modes in some cases (apparently, some ARM cores are affected by the latter vulnerability, in addition to Intel).

Honestly we knew about this in the 70s. Mainframe/time-share systems had lots of protections against attacks like this. The problem is mainstream computing went cheap/single-user and then attempted to build a multi-user/untrusted-code execution environment on top of it. Now it's come back to bite us in the ass.

>there's no shortage of clever hackers in the world.

Are you sure?

Well, these are workarounds because fixing the problem at the source is hard.

The right fix is to prevent speculatively executed code from leaking information.

Here that perhaps means associating cache lines with a speculative branch somehow so that they aren't accessible until/unless the speculative branch becomes the real branch. (I have no idea exactly how that would be done or what the performance cost might be... I'd really need to know the details of how speculative execution is implemented in a particular CPU to even be able to guess.)

Agreed. I haven't had this much fun thinking through the implications of a new exploit technique in a long time. It is truly beautiful.

Prediction: This will be just like any vulnerability disclosure. The infosec people and media will scream hysterically about how game changingly bad it is. The OS vendors will patch, and business will go on as usual.

i know this came out as a leak, but makes one wonder how "responsible" even a Jan 9 official announcement would have been. the scope is absolutely terrifying. this bug will be exploitable for a very long time.

They had like 6 months or so... how is more time going to make things less painful?

Jan 9, 2019? 2050? How much longer is long _enough_?

i guess at minimum it's worth asking how many major hosting providers have been fully patched at the time of disclosure. in addition to browsers and OSes.

You don't "think infosec". If I'm an attacker and I notice both Amazon and Azure rebooting all their systems, I know something is up. When I see that both Microsoft and Red Hat employees are working overtime, it gives away more information. All I have to do is crack one of their patched systems and I can bin-diff it and figure out what is up.

Then I sell it off to blackhats before the rest of the world is aware.

When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%.

Ouch! This is independent of other performance hurts, like from the kernel syscall overhead that was the hot topic yesterday. This is pretty crazy.

That's bad. A single 5% hit might not be the end of the world, but 5% here and 10% there and another 5% over there in the common case adds up badly enough. Doubly-pathological cases (indirect-call-heavy code making lots of syscalls)... a 50% slowdown and a 30% slowdown combine to a 65% total slowdown (0.5 × 0.7 = 0.35 of the original throughput). Yeowch.

Will be intrigued to see how processor manufacturers respond to this. If they were even slightly relaxed about it prior to disclosure I expect there's going to be some very hurried attempts to engineer some solutions pronto. This is the sort of thing where it might even be worth throwing away all of your future roadmap plans and just getting a revision of the current chips out there ASAP, whatever that may do to the rest of your roadmap.

Sounds like it could be great for processor manufacturers. In the age where CPUs don't get faster, there's finally a reason for customers to buy new CPUs again!

Not really, once a program is compiled with -retpoline, new hardware won't bring back reliable branch prediction.

I'd hope maybe, just maybe, this would be enough to put a focus on compilers producing code that ends up using processor-optimized paths chosen at runtime, to avoid "overheads ranging from 10% to 50%".

Though, in this case, that would essentially mean making the entire executable region writable for some window of time, which is clearly too dangerous, so I guess the 0.1% speedups from compiling undefined behavior in new and interesting ways, will continue taking priority.

I mean, it's a compiler flag right, obviously whoever's going to run a program on an unaffected platform will take the effort to recompile everything with the flag removed.

Just the same way every serious application currently provides different executables for running on systems where SSE2, SSE4.1, or AVX2 is present.

Horizontal scaling though. If every individual processor is slower, more are needed.

Not quite - lots of "serious" applications these days are written to target JIT compilers, which would be capable of switching retpoline on and off depending on need.

Funnily enough, I ended up not including a PS starting with "A sufficiently smart JIT, however..." ;)

I'd rather have linkers go down a similar road that the Linux kernel went down over a decade ago: provide binary patches in a table (essentially alternative machine code) and have the linker patch in the correct alternative depending on the CPU and its bugs. The Linux kernel already contains an "alternatives" segment which is exactly this kind of list of patches. It would be trivial to add such a table to the ELF and PE formats and have the runtime linker process it while it's plowing through the code anyway.

Something like this exists with function multi-versioning: https://lwn.net/Articles/691932/

For example, glibc chooses optimised machine code for memcpy depending on the CPU it runs on.
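As an illustration of the mechanism (a sketch using GCC's `target_clones` attribute; the function and its body are made up, and real resolvers like glibc's are hand-written):

```c
#include <stddef.h>

/* Function multi-versioning, sketched: GCC emits one clone of the
 * function per listed target plus an ifunc resolver that picks the
 * best variant once, at load time, based on the running CPU. */
__attribute__((target_clones("avx2", "sse4.2", "default")))
long sum(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];          /* the avx2/sse4.2 clones can vectorize this */
    return s;
}
```

Callers then dispatch through the resolved pointer with no per-call CPU check.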

New CPUs could just convert the retpoline back to the original jump in microcode, and enable the now timing-attack safe branch predictor.

But even then a performance hit remains due to the increased code size of the instruction sequence.

There's a lot of space left in code already to insert trampolines later. And at the end of the day, most memory is data, not code.

And eventually, this code will get replaced anyway (just like today there are often multiple code paths in binaries, and a lot of code is compiled for host anyway).

In any case, the performance impact of a couple extra bytes per indirect call is small compared to disabling branch target prediction.

Makes me wonder if that's Intel's PR plan (hence their spin on the story). Assuage the general consumer and mainstream media, point the finger at the OS vendors if necessary, and fix the bug in their next-gen chips.

When this all shakes out, the general story is going to be "upgrade sooner, current-gen Intel chips are x% faster", where x is going to be a larger number than it was a week ago.

It's more-or-less the Apple battery story all over again. Current devices are going to be slower, newer ones are going to be faster. Even if you know the why and the how, you're still in the same place as everybody else (at best, you could upgrade just the chip if your MB is new enough, but you're still buying Intel). Unless there's some clear way of imposing the external cost of this bug on Intel, it's a win-win for them.

Here's another potential PR spin: the major cloud vendors got out ahead of this so that they could point the finger at Intel, making customers not ask "Why did you sell me this fundamentally insecure server for so many years?"

Because this isn't really that new nor is it really a bug. Meltdown could be a bug because the asynchronous access of memory in other protection rings is unsafe, but the rest of it is just a normal side-channel attack, an abuse of otherwise-innocent data.

How much moral culpability should rest on the proprietors of software virtualization technologies that don't really safely encapsulate anything due to hardware incongruencies with the modern "sandboxed computing" model?

Surely there have been engineers over the years who've questioned the propriety of misleading users into seeing VMs as fully-encapsulated systems when the hardware just fundamentally doesn't support that type of native encapsulation. Such persons have probably spent the last several years being shunned for being old fogies overly attached to their rust buckets. It'd be interesting to hear some of their stories.

> Even if you know the why and the how, you're still in the same place as everybody else

If you know then you at least have the option to turn off PTI and compile your performance-critical binaries without retpoline.

And maybe disable microcode updates if intel's updates eat another few percent.

Multicore performance is getting faster at a very high pace.

Ryzen 1700X has 3.3 times the multicore performance of my i5 3450 from 2012 for almost the same price. With 7nm we will see at least another 50% increase in multicore performance on top of that.

Will linux distributions automatically use this compilation option (or its analog in GCC) for packages from now until forever, even if a faster mitigation is added to CPUs?

I use a binary distro and definitely don't want to be running massively slowed-down software mitigations on a corrected CPU.

Although actually, we already are - binary distros already don't take into account per-microarchitecture scheduling, nor any ISA extensions above a common baseline (e.g. just SSE2, no autovectorising to AVX2 etc).

This might provide enough impetus to restructure how binary distros work and get the whole distro compiled with some newer CPU flags (march={first corrected architecture}?) but in the short term i assume every package will take the hit.

Great time to learn about source-based distros!

Not to worry, it’s “just” 5–10% for “well tuned servers using all of [performance-saving] techniques”.

The sentence that follows the line you quoted is

> However, real-world workloads exhibit substantially lower performance impact.

I feel like you could have mentioned this.

I thought it would be the end of Moore's law that forces people to care about their code's performance. I was wrong, but am nevertheless happy about the recent developments. Programming will become an art once again :)

Agreed. dlopen() should wipe branch prediction caches by default; we'd need additional flags to turn this off.

It's ok. The 9th generation of Intel will be 50% faster and the most secure CPU ever made! /s

This is brutal for all interpreted/JITed languages and all statically compiled languages with dynamic dispatch. I can hardly imagine worse news for performance oriented engineers. And what's worse is that dynamic libraries will probably need to be rebuilt with these mitigations in mind, so nearly everyone will pay the cost even if they don't need it.

I feel bad for all of the engineers currently working on performance sensitive applications in these languages. There's a whole lot of Java, .NET, and JavaScript that's about to get slower[1]. Enterprise-y, abstract class heavy (i.e.: vtable using) C++ will get slower. Rust trait objects get slower. Haskell type classes that don't optimize out get slower.

What a mess.

[1] These mitigations will need to be implemented for interpreters, and JITs will want to switch to emitting "retpoline" code for dynamic dispatch. There's no world in which I don't expect the JVM, V8, and others to switch to these by default soon.

This mitigates spectre variant #2, branch target injection. We also have a mitigation for meltdown, namely KPTI. Is there a known mitigation for spectre variant #1, bounds check bypass?

Maybe I'm being naive, but would a simple modulo instruction work? Consider the example code from https://googleprojectzero.blogspot.com/2018/01/reading-privi...:

    unsigned long untrusted_offset_from_caller = ...;
    if (untrusted_offset_from_caller < arr1->length) {
        unsigned char value = arr1->data[untrusted_offset_from_caller];
    }
If instead we did:

    unsigned char value = arr1->data[untrusted_offset_from_caller % arr1->length];
Would this produce a data dependency that prevents speculative execution from reading an out-of-bounds memory address? (Ignore for the moment that a sufficiently smart compiler might "optimize" out the modulo here.)
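For what it's worth, a branchless variant of the same idea (this is roughly the approach Linux later shipped as array_index_nospec(); the helper name here is made up, and it assumes size <= SIZE_MAX/2 and GCC/Clang's arithmetic right shift of negative values):

```c
#include <stddef.h>
#include <limits.h>

/* Clamp an untrusted index without a branch: (index - size) wraps
 * negative (top bit set) exactly when index < size, and an arithmetic
 * right shift smears that bit into an all-ones or all-zero mask. An
 * out-of-bounds index becomes 0, so even a mispredicted speculative
 * path stays inside the array. */
static inline size_t index_nospec(size_t index, size_t size)
{
    size_t mask = (size_t)((ptrdiff_t)(index - size)
                           >> (sizeof(size_t) * CHAR_BIT - 1));
    return index & mask;
}
```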

A new thing that's going to become a standard part of systems engineering: deciding whether any given system needs to run with or without these kinds of protections. Do you want the speed of speculative execution or do you want Meltdown/Spectre protection? In some cases lack of protection is fine. But figuring out the answer for any given system is often going to take expert-level security knowledge. Security is all about multiple layers of protection, and even a non-public facing machine might benefit from these layers depending on the context.

Spectre relies on tricking the CPU into branch predicting its way into accessing protected memory, no? Is it not possible that we can keep most of the performance benefits of speculative execution by somehow having a built in "Hey, never ever speculate that I'll want to access this region of memory" sort of thing?

I read an ars technica article that this would be a possible solution but isn’t right now because the hardware to check access rights isn’t fast enough yet

Uh, isn't this what AMD does?

CPUs should have a single instruction that wipes branch prediction caches. I would have it off by default, and add to the C/C++ spec this as a standard library macro or pragma. Easy peasy.

You only need to wipe between syscalls that have side effects. Number crunching AVX heavy subroutines should never have to deal with safety once entered.

This is what KPTI does, wipe caches, and if you did this often in user code, performance degradation would be all over the place. Also, heavy AVX routines that use encryption keys... would be great to attack.

More likely, this is a shift back to in-order processors, if the solutions aren't workable. If you're in an embedded scenario, sure you can make more trade-offs and have more control, but it's not going to look great when it happens to get hacked.

It has an interesting performance impact on calls to dynamic libraries. One alternative approach would be to avoid the indirect calls through not using '-fPIC --shared' when building shared libraries but '-mcmodel=large --shared'. This causes the relocations to happen at the direct calls and not through a GOT.

The obvious drawback is that it effectively disables sharing code in memory; it would still allow sharing code on disk, though. So it would be a middle ground between the current state of dynamic and static linking.


This patch apparently implements this mitigation: https://support.google.com/faqs/answer/7625886

And once one knows the technical background, one is better positioned to consider the response of Linus Torvalds to the idea that the entire Linux kernel be recompiled for all x86 CPUs with a compiler that implements this.

* https://lkml.org/lkml/2018/1/3/797 (https://news.ycombinator.com/item?id=16066968)

This would be more interesting if the attack that the compiler mitigations was designed for wasn't cross-vendor, cross-architecture.

0 usages of word "fuck"

1 usage of word "shit"

Not bad for Linus.

This is a really good writeup, thanks. I'm curious -- how often are google support faq articles deeply technical like this?

I, for one, am eternally grateful for the incredibly bright people who take the time to patch this sort of stuff.

And the people who invented computers, programming languages, the internet, and all the learning resources, that allow me to get a paycheck writing extremely high level application code that feels like a coloring book in comparison. Truly the shoulders of giants.

Also to Google for finding and documenting it so well. Google security team really should be given an award.

I have a hunch that the era of side-channel attacks is only now dawning, and that we should expect many more painful exploits and cumbersome mitigations in the coming years.

What do people more knowledgeable in the field think about this?

What about users who only execute trusted code?

All of these attacks assume you are running something you don't trust on your CPU, whether it is another user's program, a non-root executable, or a JavaScript program from a website.

When do we stop hacking processors, kernels, and compilers and revisit our assumptions of what we can and can't do securely.

Define "trusted"? Who do you trust to do your verification, and how much does it cost?

Well, critical applications, like flight systems, run on a different ecosystem and are verified. (And it costs a lot.)

But my usecase might be a physical computer that isn't networked which does data science with some programs and prints out results.

These patches are focused at Amazon and cloud providers that are in the business of running separate individual's applications on the same machine.

In the consumer world, the slope would be browser scripts and user applications that aren't running as super. But even then, do you download and run software that you expect might steal information or damage your computer?

These are fundamental security questions. Creating rings and sandboxes are what create the assumptions of privacy and security.

Oops I basically wrote an identical reply before reading yours.

I'm not more knowledgeable than you, but I think I agree.

side channels have always been some of the most insidious exploits. Many are basically un-solvable (timing attacks are always going to leak some information, and compression is basically completely at odds with secure information storage), many more are easily enough overlooked that it would be easy to maliciously include them without raising any eyebrows, and the "fixes" for them almost always murder performance.

I think the only real fully-encompassing solution to this is a redesign in how we use computers. Either a massive step backwards on performance and turning off most automatic "optimizations" until they can be proven through a much more rigorous process (both in compilers, and in hardware), or a significant change in how computers are architected adding more hardware level isolation for processes and systems running on the machine (just daydreaming now, but something like a cluster of isolated micro-CPUs that run one application only).

How about not running untrusted code? That seems to be much easier to do and won't kill performance. Kill js on the web. Run only apps signed by Microsoft. Develop ML based malware fingerprinting that can recognise timing attack patterns. Throwing OO execution away shouldn't be an option in the long term.

It's not just "untrusted code" any more.

The "true secure" way is not running any untrusted code, not connecting to any untrusted networks, and not accepting or storing any untrusted data.

At that point you can't run a computer. It's just not an answer to say "don't let bad things in" because bad things are always going to get in.

And with side channel exploits getting more and more common, and with them being worse and worse, running any code on your machine is basically giving that code any and all of your data on the machine...

Telling users (even highly technical users) to "never ever run any untrusted code ever, and if you mess up once and run untrusted code you have completely ruined the trust of the whole machine and need to start over from scratch while assuming all of your data has been compromised" is not only infeasible, it's impossible. If this is the case, we have lost the "security" game.

It's easy to say "throwing OO execution away" is an overreaction, but if it's necessary to allow multiple programs to run on one machine without them all having what amounts to full access to one another's information, then it might be necessary. Already we know that we can't use compression with encryption, we can't use any kind of "exit early" with most kinds of encryption. It might just be that OO execution is fundamentally opposed to secure computing. At its core, it's letting the processor do different things depending on what it can see is coming; it's almost the definition of an oracle! That's always going to be a very dangerous game to play.

It could even just be that CPUs need a "secure computing" mode, or maybe even a secure co-processor that disables all of these optimizations. But at the very least, I think changes are going to be necessary, and a 15% perf reduction might be the least of our worries.

Eliminating out of order execution completely would result in much more than a 15% performance reduction.

That's what I was trying to say a bit more tongue in cheek...

"Kill JS on the web" is easy and only mildly inconvenient using NoScript, but that doesn't mean most laypeople are going to do it.

Chrome and FF need "execute JS" to be an explicit per-site permission, similar to the permissions model of native smartphone apps.

Google will never do this because they're an ad company and care more about targeting ads than protecting Chrome users.

Surprisingly, this is how Chromium (Chrome?) works. You can set it to block JS by default; then you get a <> symbol in the address bar. If you want to enable JS on a domain, right-click it and enable. Now all the JS on that domain loads normally. It isn't as fine-grained as NoScript, and there's no easy way to temporarily allow. However, it's probably more normie-friendly than NoScript that way.

> need "execute JS" to be an explicit per-site permission

I would personally like that, but it's too optimistic to think that the average user would gain much security from it. Most of them will discover that things work fine when they click OK/Accept, and sometimes break when they don't, and so they become conditioned to just click OK/Accept all the time to avoid potential hassle. So in the end it just reduces productivity without increasing security much.

There is no chance JavaScript/WebAssembly will get disabled in browsers. However, it is possible things like Zero-rating, Certification Authorities, Google AMP and Facebook Instant Articles could eventually help to lockdown the web to only trusted domains.

However this wouldn't completely solve the issue (see Google Play Store).

> I think the only real fully-encompassing solution to this is a redesign in how we use computers.

You can do a lot by separating these speedup mechanisms across security boundaries. The biggest factor that makes this hard to mitigate is in-process security boundaries. Total isolation between processes is neither necessary nor sufficient.

"What do people more knowledgeable in the field think about this?"


(from 2007)

RISC-V impact? With all the reports of these attacks, I have not seen mention of risc-v. Since they are in the process of finalizing a lot of specs including memory model and privileged instructions, I wonder if there will be last minute changes to mitigate these vulnerabilities.

The problem (in my understanding) is not with the specification of the x86 ISA, but with the implementation of the speculative execution micro-architecture and probably the memory sub-system as well. That is why Intel is so badly affected by the problem, but not AMD, despite them both implementing the same instruction set.

RISC-V has already had to fix its memory consistency model, so it is not without problems. But that is a spec bug, not an implementation bug. Whether there is an out-of-order, speculative-execution RISC-V core in the wild which suffers from this is, as far as I know, very unlikely. If there is, no doubt its designers have had a busy time lately.

My understanding is that meltdown is due to the implementation, but spectre occurs due to issues with the specification.

At the risk of being a HN self-parody, I’ve also been wondering what this means for the Mill...


The only speculation done on the mill currently is on whether it will ever be released, so I think they'll be safe.

Don't worry, I was wondering this myself. There doesn't seem to be anything official or even any discussion on the forums yet.

The details that this attack depends on are outside the architecture of the system, in the microarchitecture. A cpu of almost any architecture can be vulnerable or not depending on how it was implemented, thus Ryzen is immune to the worst variant while both Intel and the fastest Arm cpus are vulnerable.

I'd presume that the slowest RISC-V designs are immune due to not speculating enough, while any high-performance implementation is vulnerable.

As of right now, every single CPU that does speculative execution. (I.e., predicts which way a branch will go, runs ahead down the predicted path, and throws the results away if the prediction turns out to be wrong.)

RISC-V is an ISA, so it depends on the implementation.

From one of the papers:

>> While makeshift processor-specific countermeasures are possible in some cases, sound solutions will require fixes to processor designs as well as updates to instruction set architectures (ISAs) to give hardware architects and software developers a common understanding as to what computation state CPU implementations are (and are not) permitted to leak.

I imagine BOOM and BOOM v2 may be vulnerable as they support OoO execution.

I remember doing tricks like this in 6502 assembly and in other early processors. Amazing that to stop these attacks you have to come up with clever tricks again. Back in the 80's I would have never imagined this type of attack being something to worry about.

>early processors

Early processors had speculative execution? I thought this had been added to Intel/AMD/ARM about 20 years ago?

I guess he means the retpoline. On the 6502 there is no indirect jump instruction, so you need such tricks just to achieve an indirect jump at all.

There's an indirect jump instruction. It's not very good though, and has a notorious bug with addresses ending in 0xFF.
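For anyone curious, the bug being referred to (illustrative 6502 assembly; the data values are made up): the indirect JMP fetches its two-byte target through a pointer, but the high-byte fetch doesn't carry into the next page.

```asm
; suppose $10FF holds $34 and $1100 holds $12, intending a jump to $1234
    JMP ($10FF)   ; low byte read from $10FF ($34) as expected, but the
                  ; high byte is read from $1000, not $1100 -- the address
                  ; wraps within the page, so the NMOS 6502 jumps to the
                  ; wrong place (fixed on the later 65C02)
```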

Gah, you're right. Guess my memory is fading. There is indirect JMP, but no indirect JSR or indirect branches. And the indirect JMP as you say is not very useful.

I think it means they're tricks for better performance when you _don't_ have speculative execution.

Speculative execution is as old as branch prediction, which is very, very old.

This brings to mind Ken Thompson's "Reflections on Trusting Trust"[1] -- after all, all I have to do to write code with the exploit is be able to remove the patch and rebuild the compiler and build some executables.

Trusting in a compiler you hope was used to build all the executables on your system isn't trustworthy enough to be the final solution.

[1] https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html

Every modern compiler usually has extensions that allow for bits of assembly to be inserted alongside the usual C or C++ code.

Unless the compiler is also patched to either disallow inserted assembly, or to modify the inserted assembly (this being both hard and dangerous), someone who wants to exploit the bug will just add their own inserted assembly code that exploits the bug, and a patched compiler won't help one bit in that case.

Just as a FYI, according to:

* https://lkml.org/lkml/2018/1/4/432

* http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a...

It appears that Skylake and later can actually predict retpolines? Some hardware features called IBRS, IBPB, STIBP (not a lot of details on this are out there) are supposedly coming in a microcode update.

The problem I see with this concept is ROP mitigations like Intel’s control flow enforcement don’t seem compatible with intentionally using tweaked addresses with ret. The address they inject won’t match the shadow stack and the program will be terminated.

This is true, and so far nobody has a better idea. (I.e., I would expect that unless someone comes up with one, hardware CFE in its current form dies, and won't happen for Intel until the processors are changed in a way that the mitigation is not needed.)

Isn't it the case that the Itanium architecture would not be vulnerable to Spectre because it moves the onus of branch prediction from the CPU to the compiler?

Assuming the compiler knows what it's doing :)

That was always the problem with the Itanium compilers. They were crap because they couldn't benefit from the years of tuning traditional architectures enjoyed.

Also the compiler had to be absolutely brilliant to rewrite the serial branching code most programmers wrote to work with the EPIC model. They had some good results on optimized math-heavy code, but general-purpose code ended up with too many nops waiting on results.

I can't help thinking of how the early-ITS approach to security (not only was there none, but looking at other users' work was a deliberate feature) was embraced by its users. I'm way too young to remember, but it rings a bell somewhere down my heart.

There's a lot of prominence being given to all kinds of damage malicious users might inflict, and ways to prevent or mitigate, but little to the malice itself. Whence does it arise? What emotions drive those users? What unmet needs?

Meanwhile, when these slowing-down patches for Spectre and Meltdown arrive, I intend not to run them, to the extent possible. I intend to keep aside a VM with patches for critical stuff, like banking or others' data entrusted to me. But I don't want my machine to be slowed down just because someone, sometime, might invest effort in targeting these attacks at it. Given how transparent I want to be with my life, that's a risk I'm willing to take.

Most attacks aren't targeted at specific people. Hackers don't want to read your emails, they want your credit-card information, digital account passwords, or to compromise your computer to use in their botnet.

Sure, you might not have anything you want to hide in your life, but the drive-by javascript doesn't care about your secrets - it'll hack you anyway. Best-case scenario, you lose access to a bunch of accounts you used to use and need to create new identities from scratch. Worst-case, they clean you out financially, steal your identity, etc.

retpoline seems to be a novel concept. Can anyone ELI5?

Also, any insight about performance impact here?

An indirect jump is when your program asks the CPU to transfer control to a location that your code itself computes: "jmp %register". Compare to a direct jump, where the destination of the jump is hardcoded into the jump instruction itself: "jmp $0x100".

Most programs have indirect jumps somewhere. Higher-level languages with virtual function calls have lots of indirect jumps, because they parameterize functions: to get the "length" of the variable "foo", the function "bar" has to call one of 30 different functions, depending on the type of "foo"; the function to call is read out of a table at some offset from the base address of "foo". Or, another example is switch statements, which can compile down to jump tables.

What we want, to mitigate Spectre, is to be able to disable speculative execution for indirect jumps. The CPU doesn't provide a clean way to do that directly.

So we just stop using the indirect jump instructions. Instead, we abuse the fact that "ret" is an indirect jump.

"Call" and "ret" are how CPUs support function calls. When you "call" a function, the CPU pushes the return address --- the next instruction address after the "call" --- to the stack. When you return from a function, you pop the return address and jump to it. There's a sort of "jmp %register" hidden in "ret".

You abuse "ret" by replacing indirect jumps with a sequence of call/mov/jump, where the mov does a switcheroo on the saved return address.

The obvious next question to ask here is, "why don't CPUs predict and speculatively execute rets?" And, they do. So the retpoline mitigates this: instead of just "call/pop/jump", it does "call/...pause/jmp.../mov/jmp", where the middle sequence of instructions set off in "..." is jumped over and not executed, but captures the speculative execution that the CPU does --- the CPU expects the "ret" to return to the original "call", and does not know how to predict around the fact that we did the switcheroo on the return address.
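In x86-64 assembly, the sequence described above looks roughly like this (a sketch with made-up labels; the actual LLVM thunk differs in detail):

```asm
    # instead of "jmp *%rax" / "call *%rax":
    call set_up_target      # pushes the address of capture_spec
capture_spec:
    pause                   # speculative execution of the ret lands
    lfence                  # here and spins harmlessly, never reaching
    jmp capture_spec        # an attacker-steerable predicted target
set_up_target:
    mov %rax, (%rsp)        # the switcheroo: overwrite the saved
    ret                     # return address, then "return" into *%rax
```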

How'd I do?

> Or, another example is switch statements, which can compile down to jump tables.

Is the overhead of the retpoline such that it's no longer a benefit to compile switches to jump tables?

Pretty well, thanks. What I'm wondering is: The attack is using the data fetched into the cache from a speculative indirect jump to do a timing attack and discover what's in the former, correct? Why can't the CPU mark the cache area it fetched in the speculative jump as "stale" and discard it? Why wouldn't that fix the problem?

I don't know any way to leave enough breadcrumbs to do that in four clock cycles, do you?

Intel is going to release a microcode update for BTB control apparently.

retpoline is just a convoluted way of doing an indirect jump/call designed to make branch prediction entirely useless. It's a novel concept because doing this is completely opposite to making a program run faster.

Here is an example of the most common programming patterns that end up causing indirect jumps/calls:
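The example seems to have been eaten by the formatting; a minimal C sketch of such patterns (names are made up) might be:

```c
/* Two everyday constructs that compile to indirect jumps or calls --
 * exactly what retpoline has to rewrite. */
typedef int (*op_fn)(int, int);

static int op_add(int a, int b) { return a + b; }
static int op_mul(int a, int b) { return a * b; }

int apply(op_fn f, int a, int b)
{
    return f(a, b);          /* indirect call: e.g. "call *%rax"        */
}

int lookup(int tag)
{
    switch (tag) {           /* a dense switch can become a jump table: */
    case 0: return 10;       /* an indirect jump into a table of        */
    case 1: return 20;       /* code addresses                          */
    case 2: return 30;
    case 3: return 40;
    default: return -1;
    }
}
```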


Imagine every virtual function call in a C++ program being mispredicted and taking twice as long.
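The pattern in question is just the plain virtual call through a base pointer or reference --- a minimal C++ sketch:

```cpp
#include <cassert>

struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;   // dispatched through the vtable:
};                                  // load the vtable pointer, then "call *%reg"

struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int area() const override { return side * side; }
};

// The compiler cannot know the dynamic type here, so this compiles to an
// indirect call --- exactly the instruction retpoline pessimizes.
int total_area(const Shape& s) { return s.area(); }
```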

(Instead of forcing us to recompile the world, maybe Intel should just disable branch prediction in microcode.)

> Imagine every virtual function call in a C++ program being mispredicted and taking twice as long.

> (Instead of forcing us to recompile the world, maybe Intel should just disable branch prediction in microcode.)

Wouldn't the performance impact be dramatic? In this example [1] there's a 6x slowdown between the situations with and without correct branch prediction.

[1]: https://stackoverflow.com/questions/11227809/why-is-it-faste...

I don’t think this is about all branch prediction --- just about indirect jump prediction. Like, “jump to %rax, but don’t try to guess what %rax is before you’re 100% certain.” Not the same as “jump to a known location if you think this here register is true/false”. As far as I can piece together, the exploit relies on making the branch predictor think the branch target will be somewhere you stored malicious code, which will then be executed by another process, e.g. a kernel. If it does harm before the branch predictor catches that it was wrong, you’re home free.

I’m not sure, but that’s what it looks like, so far.

This doesn't affect branches, only indirect jumps (and calls). The performance impact will still be considerable. It will make PGO more crucial (or smarter JITting, for VM languages) since the penalty can be avoided by prefixing an indirect call with a direct call to the most likely target - this is a well-known technique, useful on machines with weaker jump predictors than branch predictors.
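The guarded-direct-call trick mentioned above (often called speculative devirtualization) can be sketched like this --- `Square` standing in for whatever type profiling shows is hottest:

```cpp
#include <cassert>
#include <typeinfo>

struct Shape {
    virtual ~Shape() = default;
    virtual int area() const { return 0; }
};

struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int area() const override { return side * side; }
};

// PGO-style guarded dispatch: test for the hot type and make a direct
// (well-predicted) call; fall back to the indirect call otherwise.
int area_of(const Shape& s) {
    if (typeid(s) == typeid(Square))                           // cheap type check
        return static_cast<const Square&>(s).Square::area();   // direct call
    return s.area();                                           // indirect (slow path)
}
```

The qualified call `Square::area()` bypasses the vtable entirely, so on the hot path there is no indirect jump left for retpoline to slow down.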

Quite possibly, the worst affected code will be OO code that is dynamically (open) polymorphic.

FWIW, all branches will need to be followed by a memory load fence which seems to trip up speculative execution on Intel CPUs.

There is still branch prediction for normal calls and jumps, which are the majority and should be what you find in performance-conscious code.

It's just that some language features such as virtual functions in C++ often require indirect invocation when the compiler can't devirtualize a call, and there is lots of it in the kernel in performance-critical paths (think interrupts, syscalls).

There are two paragraphs dedicated to performance impact in the linked PR.

By design, with retpoline indirect branches won't be able to take advantage of branch prediction. This is nontrivial, but can't be helped. Performance impact should be negligible otherwise.

Considering that every single function call into a dynamically loaded library will be affected, that negligible "otherwise" won't be so negligible in the real world.

Note for a true fix to the BTB poisoning attack you would additionally have to disable SMT/HT.

See here: https://news.ycombinator.com/item?id=16070304

Maybe some future architecture will allow software to tell CPU which regions it considers to be secret from the point of view of each other region.

Something like that could allow the CPU to speculate aggressively while preventing information leak exploits.

The CPU hardware already has that feature. It is the VM paging system and the permissions assigned thereto.

The bug here is that the CPU does not abort speculation when a fetch hits an address marked "access denied". Instead the fetch happens, and a line of normally inaccessible memory ends up in the cache, pulled in by code that should never have been able to read it.

One hardware fix would be to plug that hole. Speculative reads get blocked when they encounter permission denied errors from the paging system and do not change the cache state. That blocks the Meltdown attack, but not the Spectre attack.

I thought about that too... AFAIK the paging system is not currently accessible to userland programs like browsers. They would need some way to set up different contexts for untrusted javascript code and the internal services that the javascript can call.

Also maybe the context switching would need to be made faster, because you would need to do that whenever eg javascript calls browser interfaces.

Something like the portal calls and "turfs" described in there could help.

This is horrible, really really horrible. And I'm not talking about the bug itself, but the mitigation --- which is basically "stop using indirect jump and call instructions and recompile all your software". The latter is beyond unrealistic.

It also sets a very bad precedent: I understand people want to mitigate/fix as much as possible, but this is basically giving an implicit message to the hardware designers: "it doesn't matter if our instructions are broken, regardless of how widespread in use they already are --- they'll just fix it in the software."

> it doesn't matter if our instructions are broken, regardless of how widespread in use they already are --- they'll just fix it in the software.

What other options are there? It's hardware; it cannot be patched. Of course they will change chip designs going forward, but what else do you suggest folks do with the billions of chips that exhibit this problem?

Go ahead, smash your computer, wait a few months, and buy a new one.

It's noted in the patch that one would have to recompile linked libraries, which seems impractical, unless a distro decides to build everything with this flag.

And since this patch is opt in it isn't enough to secure cloud providers.

Not just linked binaries, also the whole underlying OS, and, critically, the compiler itself. Otherwise you could replace the 'proofed' construct with one that is not proofed against the bug.

Why would you need to recompile the compiler? Both variants only provide read access.

Ah right, of course. Sorry --- I'm in the midst of a pile of stuff and shouldn't be commenting on this without studying it further. I figured that first-level read access would let you dig up the secrets required to gain write access, which would then give you free run of the whole system. But if you are still on the other side of a virtual machine, that won't do any good unless that virtual machine can be escaped as well.

As Alex Ionescu has put it:

> We built multi-tenant cloud computing on top of processors and chipsets that were designed and hyper-optimized for

> single-tenant use. We crossed our fingers that it would be OK and it would all turn out great and we would all profit.

> In 2018, reality has come back to bite us.

This is the root of all the problems.

This was the fix I was going to suggest. Especially with AVX leakage.

Right now many function calls don't safely wipe registers or the new side-channel state (caches) identified by Spectre. There really need to be two kinds of function calls. Maybe a C pragma?

The compiler has caller register wiping as a flag; the code has pragmas that override the flag.

The site is down for me. HN hug of death?

Yes. Give it about 5 minutes. It will load without images.

It was a bit slow but eventually loaded for me.

Yeah, same thing happened to me... slow as hell, but I guess that's to be expected given the severity of the issue. Everyone wants to see this at the same time.

What about the performance impact once new CPU architectures arrive? How is that going to work?

Mill can't come soon enough.

What makes you think the Mill would be immune to these issues?

Mill has no speculative execution.

Uh, yes it does. It has no /out of order/ execution but it certainly has speculative execution. They mention quite often in talks how they predict EBB exits and not each branch. They even go so far as to follow that EBB exit chain to speculatively load code from DRAM several calls ahead, which is much more speculation than current CPUs are capable of.

You basically can't make a deeply-pipelined processor fast without speculative execution.

A simple model of access permissions that fits in front of the L1 cache and can return a fault before anything is loaded.

If this were 15 years ago, I'd say the site was SlashDotted.

In other news, Intel has found that by not using a computer at all --- though the performance overhead increases 100% --- this counter-measure does close off every previously available attack vector.
