Intel Analysis of Speculative Execution Side Channels [pdf] (intel.com)
380 points by bcantrill on Jan 5, 2018 | 34 comments

I was surprised that this wasn't already submitted -- and then surprised again that it's not being upvoted. To give some color: this doc contains the first public disclosure of some new MSRs being added to allow system software to help mitigate Spectre. In particular, these are the controls to limit speculation: Indirect Branch Restricted Speculation (IBRS), to restrict speculation of indirect branches; Single Thread Indirect Branch Predictors (STIBP), to prevent indirect branch predictions from being controlled by the sibling hyperthread (!!); and Indirect Branch Predictor Barrier (IBPB), to limit influence on later indirect branch predictions. For those who need to implement system software support for the new microcode, this is very important information!
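
For anyone wiring this up, here is a minimal Python sketch of the enumeration and register layout as I understand Intel's documentation: CPUID.(EAX=7,ECX=0):EDX bit 26 enumerates IBRS and IBPB, bit 27 enumerates STIBP, IA32_SPEC_CTRL is MSR 0x48 (bit 0 IBRS, bit 1 STIBP), and IA32_PRED_CMD is MSR 0x49 (bit 0 IBPB). The helper names are made up for illustration, and the code only decodes and composes values; it does not read CPUID or touch any MSR.

```python
# Addresses and bit positions as documented by Intel for the
# speculation-control interface. Helper names below are hypothetical.
IA32_SPEC_CTRL = 0x48   # read/write MSR holding the IBRS and STIBP bits
IA32_PRED_CMD = 0x49    # write-only command MSR; bit 0 issues an IBPB

SPEC_CTRL_IBRS = 1 << 0
SPEC_CTRL_STIBP = 1 << 1
PRED_CMD_IBPB = 1 << 0

CPUID7_EDX_IBRS_IBPB = 1 << 26   # CPUID.(EAX=7,ECX=0):EDX[26]
CPUID7_EDX_STIBP = 1 << 27       # CPUID.(EAX=7,ECX=0):EDX[27]


def decode_cpuid7_edx(edx):
    """Report which controls a raw EDX value from CPUID leaf 7 enumerates."""
    return {
        "ibrs_ibpb": bool(edx & CPUID7_EDX_IBRS_IBPB),
        "stibp": bool(edx & CPUID7_EDX_STIBP),
    }


def spec_ctrl_value(ibrs, stibp):
    """Compose the value that would be written to IA32_SPEC_CTRL."""
    value = 0
    if ibrs:
        value |= SPEC_CTRL_IBRS
    if stibp:
        value |= SPEC_CTRL_STIBP
    return value
```

System software would check the CPUID bits first and only then write the MSRs; on a CPU without the microcode update the bits are clear and the MSRs do not exist.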

Actually, thanks to the messy uncoördinated premature disclosure, the MSRs became publicly known yesterday through code comments, commit messages, Google Docs, and linux-kernel mailing list discussions.

* https://news.ycombinator.com/item?id=16072009

* https://lkml.org/lkml/2018/1/4/432

* https://news.ycombinator.com/item?id=16072775

* https://lkml.org/lkml/2018/1/4/615

* https://news.ycombinator.com/item?id=16072806

* https://news.ycombinator.com/item?id=16075082

A more pertinent observation is that this is the sort of technical paper that Intel should have published before the press releases that yielded such a backlash.

* https://news.ycombinator.com/item?id=16076601

* https://medium.com/@frankycaron/this-week-in-words-the-langu... (https://news.ycombinator.com/item?id=16075588)

It's not due to the premature disclosure.

As far as Linux is concerned, the disclosure of Spectre and the hardware mitigations was uncoordinated by design. During the embargo period, distros were kept siloed and each more or less left on its own, assuming they were part of the early disclosure at all. This sucks, but until the embargo was lifted we were pretty much forced to play along. We did not even know who knew what, so we couldn't do anything about it.

As a result, distros all differ in the number of tunables that they provide, in the exact behavior, and in the performance hit that you can expect from fixing CVE-2017-5715. Assuming it's fixed at all (it's not in either Debian or Fedora, for example).

The silver lining is that all discussions on the design choices for the fixes are going to be public. Anyway, based on some tweets from Alex Ionescu, it seems that these MSRs are what Windows uses (and RHEL as well, which is what I worked on).

You are talking about the phase where stuff was not publicly known. I talked about the point of it becoming publicly known.

The way that it actually happened, rather than the way that Intel expected it to happen, with this paper all nicely ready on the first day (as Google's happened to be, ahead of time), was pretty clearly, and unfortunately, very much down to the premature disclosure. Witness what Paul Turner said (hyperlinked from one of the aforegiven discussions) about Google's and Intel's original goals for the week, for example.

One can speculate about an alternative universe where Linus Torvalds had this to read before the press releases. There's a whole chapter that addresses the points that he raised.

* https://lkml.org/lkml/2018/1/3/797 (https://news.ycombinator.com/item?id=16066968)

It is interesting that you bring up M. Ionescu. I'll see M. Cantrill's surprise at no-one commenting on Intel's paper, albeit that this is becoming less and less true by the minute by dint of posts like yours and mine, and I'll raise him no-one on Hacker News commenting at all on Microsoft; even though it was a more prominent topic of discussion in my workplace today (when the office systems administrator found out that updates were not happening) than any Linux machine or BSD. (-:

* https://news.ycombinator.com/item?id=16076660

I am sure Linus knew about this before, just as most of the things in the paper were disclosed to me (in .pptx format).

Based on how the early disclosure was handled, I would have expected the same mess even without the emergency premature lifting of the embargo. Maybe a little less scrambling, but the same confusion, plenty of discussions on 0-day and no fixes for noncommercial distros. Everything else is wishful thinking.

I don't know what you mean by "first public disclosure", but these mitigations were already disclosed yesterday when a Linux patch with their support was submitted: https://lkml.org/lkml/2018/1/4/615

People had been waiting on more details of the Intel microcode update, for example from the kernel mailing list yesterday mid-day:


Are programs running on SmartOS zones vulnerable to any of the latest discoveries?


Any comments on Niagara T1 & T2 related to these methods?

SPARC didn't get out of order until T4 unless I'm mistaken. (Assuming speculative execution was added about the same time.)

ldxa with a non-faulting load ASI, UltraSPARC-I, circa '96

Intel seems committed to write everywhere that their current processors work as intended and according to their specification. I'm not mad at them for Spectre, but Meltdown is ridiculous. If this is the quality of specification they want to reach, I'll use processors from another company that seems to have saner and less buggy specifications.

> These methods rely on common properties of both high-performance microprocessors and modern operating systems and susceptibility is not limited to Intel processors, nor does it imply the processor is working outside its intended functional specification.

Given a class-action was filed after the Project Zero reveal [1], I suspect Intel's lawyers are doing everything they can to avoid any indication there is a defect in their chips.

[1]: https://www.courthousenews.com/wp-content/uploads/2018/01/In...

One somewhat self-serving justification Intel might make for this claim is that the functional specification does not say anything about the specific behavior that is being used by the exploits.

In addition, functional specifications are sometimes modified to conform to as-built behavior. When used responsibly, this is a reasonable response, and in practice unavoidable for something as complex as a modern processor.

On the other hand, Intel's statements are silent about what can be inferred from the effort they and others are putting into mitigating this "feature".

Tangentially, the Titanic sailed with the number of lifeboats it was designed to carry.

As far as I can tell, one bit of news to me at least in this Intel whitepaper from today is that the microcode update to mitigate “variant #2” would be needed for Broadwell+, rather than the Skylake+ that had been stated yesterday on LKML. From Intel's PDF today:

"For Intel® Core™ processors of the Broadwell generation and later, this retpoline mitigation strategy also requires a microcode update to be applied for the mitigation to be fully effective."

vs. what I had seen on LKML yesterday, which at least seemed to indicate Skylake+.

Sample snippet from LKML[1]:

"The x86 IBRS feature requires corresponding microcode support. It mitigates the variant 2 vulnerability..."

and related sample snippet from LKML[2]:

"On Skylake the target for a 'ret' instruction may also come from the BTB. So if you ever let the RSB (which remembers where the 'call's came from get empty, you end up vulnerable.

Other than the obvious call stack of more than 16 calls in depth, there's also a big list of other things which can empty the RSB, including an SMI.

Which basically makes retpoline on Skylake+ very hard to use reliably. The plan is to use IBRS there and not retpoline."

I'll confess I'm not 100% following all the ins and outs of this, but can anyone comment on any additional details regarding the Skylake+ vs. Broadwell+, and/or confirm if there was seemingly a change?

[1] https://lkml.org/lkml/2018/1/4/615

[2] https://lkml.org/lkml/2018/1/4/708
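
To make the RSB point in the second snippet concrete, here is a toy Python model of a return stack buffer. The depth of 16 and the idea that an event such as an SMI can empty it come from the quoted text; the class, the "fallback" label, and the addresses are invented for illustration, and real hardware is considerably more involved.

```python
from collections import deque


class ReturnStackBuffer:
    """Toy model: a fixed-depth stack of predicted return addresses."""

    def __init__(self, depth=16):
        # A bounded deque silently drops the oldest entry when full,
        # which is how a call chain deeper than the RSB loses entries.
        self.entries = deque(maxlen=depth)

    def call(self, return_address):
        self.entries.append(return_address)

    def ret(self):
        if self.entries:
            return "rsb", self.entries.pop()
        # Underflow: per the quoted mail, Skylake may take the target
        # from the BTB here, which is what retpoline cannot tolerate.
        return "fallback", None

    def flush(self):
        """Models any event (for example an SMI) that empties the RSB."""
        self.entries.clear()


def count_fallback_returns(call_depth, rsb_depth=16, flush_before_returns=False):
    """Nest call_depth calls, unwind them, and count RSB underflows."""
    rsb = ReturnStackBuffer(rsb_depth)
    for level in range(call_depth):
        rsb.call(0x1000 + level)
    if flush_before_returns:
        rsb.flush()
    return sum(1 for _ in range(call_depth) if rsb.ret()[0] == "fallback")
```

In this model a 20-deep call chain leaves the last 4 returns without an RSB prediction, and a flush leaves every pending return without one, which is the situation the LKML mail describes as making retpoline hard to use reliably on Skylake+.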

Presumably they've found a way to make retpoline work on Broadwell using a microcode update, which is probably better than the alternative of adding a very expensive kludged way of clearing the indirect branch cache in a microcode update.

Control Flow Enforcement (ENDBRANCH requirement at branch targets) looks like a nice feature, looking forward to it.

Agreed, which is why I’m worried about retpoline: it’s a rather hacky (but obviously pragmatic) solution that isn’t compatible with shadow stacks, so the two mitigation strategies won’t be compatible with each other.

It almost seems like Intel is saying that this is the new normal. That future Intel CPUs will have the same weaknesses, and it's up to software (compilers, OS authors) to deal with it. Or am I speculating incorrectly?

I would guess they are fairly focused on it at this point.

They are releasing microcode update mitigations for the CPUs of today, and at least state they will be improving things in the CPUs of the future, which is more-or-less what one might guess they would do with billions of dollars at stake.

That's not to say that they are going to magically get rid of all speculative execution, and I wouldn't try defending their PR approach, but one would guess they would at a bare minimum whittle away at the cost of mitigations.

Some related snippets about at least declared future intent. This obviously isn't a comprehensive list, but I think it suggests they realize the current state of affairs is not good for them:

From LKML[1] related to approach taken with the new microcode update for variant #2 being better/less costly in future CPUs:

"Later CPUs are intended to have an 'IBRS all the time' feature which is set-and-forget, and will perform much better, I believe. If we find we're running on a CPU with that, we'll turn off the retpoline..."

And from today's Intel PDF regarding variant #2:

"There are three new capabilities that will now be supported for this mitigation strategy. These capabilities will be available on modern existing products if the appropriate microcode update is applied, as well as on future products, where the performance cost of these mitigations will be improved."

And from today's Intel PDF regarding variant #3:

"Future Intel processors will also have hardware support for mitigating Rogue Data Cache Load."

And a related comment from the always reputable source of "some security guy on the internet"[2]:

"Whatever mitigations CPU vendors come up with will be in concert with software changes. "Page table isolation" is an overnight redesign of all operating systems. It's here to stay. The next step is for Intel CPUs to fix its performance cost"

[1] https://lkml.org/lkml/2018/1/4/432

[2] https://twitter.com/ErrataRob/status/949194584399237120

It's almost like Intel is taking the position that side-channels are nearly impossible to prevent if you're running adversarial code on the same hardware, which makes sense to me; there are certainly those who don't need to resist such attacks, but need the performance benefits of speculative execution.

I'm sure you could do speculative execution without causing side effects that software can observe. Whether it can be done cost-effectively, I can't say.

Section 2.2.1 states:

>"An attacker discovers or causes the creation of ‘confused deputy’ code which allows the attacker to cause speculative operations to reveal information not normally accessible to the attacker."

Can someone say what ‘confused deputy’ code means here? This is not a term I have ever come across before.

The authors then go on to state:

>"If the attacker can identify an appropriate ‘confused deputy’ in a more privileged level, the attacker may be able to exploit that deputy in order to deduce the contents of memory accessible to that deputy but not to the attacker."

Here the reference is to an "appropriate 'confused deputy'". Are these just weasel words? Can someone shed some light on "confused deputies" and what makes one "appropriate"?


That said, I don't think it's a good label for what's going on.

It's sort of victim blaming: "the code is wrong for having instructions corresponding to a[*b] anywhere in it", rather than "our processor is wrong for speculatively executing them with visible effects".

Those visible effects (caches being primed by speculative execution) are desirable in many cases though, so it's misleading to say the processor is wrong.
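
For readers who have not seen the pattern, the following is a toy Python simulation of the "deputy" being discussed: a bounds-checked read whose check can be speculated past, leaving a cache footprint that depends on out-of-bounds data. Everything here is an assumption made for illustration (the function names, the flat `memory` layout, and modelling the cache as a set of warm line numbers); Python itself does no speculation, so the misprediction is passed in as a flag.

```python
def victim_read(memory, array1_len, x, cached_lines, mispredicted):
    """Toy deputy: a bounds-checked read the 'CPU' may speculate past.

    memory       -- bytes; the first array1_len bytes are the legal array,
                    anything after them stands in for secret data
    cached_lines -- set modelling which probe-array lines are warm
    mispredicted -- True if the branch predictor guessed 'in bounds'
    """
    if x < array1_len:
        value = memory[x]
        cached_lines.add(value)   # models a load from probe[value * stride]
        return value
    if mispredicted:
        # Transient execution: the architectural result is discarded,
        # but the line selected by the out-of-bounds byte stays warm.
        cached_lines.add(memory[x])
    return None


def attacker_probe(cached_lines):
    """Models timing every probe line and reporting the warm ones."""
    return sorted(cached_lines)
```

With `memory = bytes([1, 2, 3, 4]) + b"S"` and `array1_len = 4`, calling `victim_read(memory, 4, 4, cached, True)` returns `None` architecturally, yet `attacker_probe(cached)` reports line 83, the byte value of "S". The code did nothing wrong by its own rules, which is why the "confused deputy" label reads as blaming the victim.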

Thanks, yes agreed, this sounds like a willfully misplaced label.

This seems complicated, but my impression is it would mean:

a) a super-smart compiler that can anticipate lots of flaws and fence them or turn on restrictive modes (maybe V8 is this smart?)

b) you have to turn off more speculation than you wanted to, and it hits performance

SMAP looks cool, seems like same origin policy for pages, but this is more Meltdown than Spectre right?
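
On option (a): besides inserting fences, one thing a compiler or JIT can do is clamp the index with arithmetic instead of relying only on the branch, so that even a mispredicted path cannot index outside the table. A minimal sketch of that arithmetic in Python, assuming a power-of-two table size; the function name is made up, and Python of course has no speculation to defend against, so this shows only the masking itself.

```python
def masked_index(index, table_size):
    """Clamp index into [0, table_size) with a data dependency, not a branch.

    Assumes table_size is a power of two so that (table_size - 1) is an
    all-ones mask; an out-of-range index wraps to an in-range one.
    """
    if table_size <= 0 or table_size & (table_size - 1):
        raise ValueError("table_size must be a positive power of two")
    return index & (table_size - 1)
```

For example, `masked_index(5, 8)` stays 5 while `masked_index(13, 8)` wraps to 5. The cost is an extra AND on every access rather than a serializing fence, which is the performance trade-off option (b) is worried about.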

The problem is that, as far as I understand, compiler "fixes" can only fix the attacked application. So if JavaScript code is exploiting Spectre to read passwords from your keepassx, the fixes need to be applied to keepassx and not to the V8 engine.

(You can probably also patch the V8 interpreter to mitigate this issue, but that is a different story.)

No, it's V8 that has to make sure the privilege check variable is in L1d cache when the if happens.

KeePass uses HTTP, and by the time it sees the request, it cannot do much if it's valid.

You would not send requests to keepass in order to leak passwords. Instead, you would try to set up the branch prediction cache in such a way that, during ordinary execution, keepass will cause cache accesses dependent on secret data because of the code that is speculatively executed due to indirect branch prediction (you set up the branch prediction cache in such a way that it executes your "gadget" to leak things). So yes, assuming you manage to get enough control over the addresses of things in memory via javascript (may be hard, but there are known ways to defeat ASLR via javascript as well), I think you should be able to attack keepass even if V8 fixed it.
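
A toy Python model of the branch-predictor setup described above may help. It assumes a branch target buffer indexed only by the low bits of the branch address, so that an attacker branch can alias a victim branch; the index width, the class, and all addresses are invented for illustration and do not describe any particular CPU.

```python
class BranchTargetBuffer:
    """Toy BTB shared between attacker and victim code."""

    def __init__(self, index_bits=12):
        # Only the low index_bits of the branch address select an entry,
        # so different addresses with equal low bits share a prediction.
        self.mask = (1 << index_bits) - 1
        self.targets = {}

    def train(self, branch_address, target):
        """Executing an indirect branch records its target."""
        self.targets[branch_address & self.mask] = target

    def predict(self, branch_address):
        """Return the speculative target, or None if nothing is recorded."""
        return self.targets.get(branch_address & self.mask)

    def barrier(self):
        """Models an IBPB: earlier training no longer affects predictions."""
        self.targets.clear()
```

Training with an attacker branch at 0x5500000F1234 and a target of 0x7F0000409000 makes `predict(0x7F0000401234)` return the gadget address, because both branch addresses share the low 12 bits. After `barrier()` the same call returns `None`, which is the effect IBPB is meant to have between mutually distrusting contexts.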

The issues were reported to Intel on 2017-06-01. In November, CEO Brian Krzanich sold roughly $11 million of stock in the company, keeping just the bare minimum. https://gizmodo.com/intel-says-ceo-dumping-tons-of-stock-las...

Part of me is outraged that Intel thinks it gets to have an opinion on the issue at all, after the absolutely chicken shit press release they issued. https://newsroom.intel.com/news/intel-responds-to-security-r...

The really tragic tale is that more businesses aren't running AMD. Dan Luu sounded the alarm well before Meltdown, but people buy brands, not technology.


I'd wager you're getting downvoted because this stock discussion has been beaten to death AND is completely irrelevant to the technical details relevant to this topic. Just in case you were curious.
