
The new microcode from Intel and AMD adds three new features - DoreenMichele
https://lkml.org/lkml/2018/1/22/598
======
ploxiln
That link doesn't seem to show the continuation of the thread, but it
continues to be rather interesting:

[https://lkml.org/lkml/2018/1/23/25](https://lkml.org/lkml/2018/1/23/25)

It looks like Ingo Molnar, Andi Kleen, and others are coming up with a pure-
software alternative to IBRS, which could be much faster (rather similar in
spirit to the retpoline trick)

~~~
newman314
TL;DR:

Leverage CONFIG_DYNAMIC_FTRACE. This is an OS-only fix resulting in
effectively zero overhead for chips other than Skylake.

[https://lkml.org/lkml/2018/1/23/25](https://lkml.org/lkml/2018/1/23/25)

    
    
      There's another possible method to avoid deep stacks on Skylake, without 
      compiler support:
    
      - Use the existing mcount based function tracing live patching machinery
        (CONFIG_FUNCTION_TRACER=y) to install a _very_ fast and simple stack depth 
        tracking tracer which would issue a retpoline when stack depth crosses 
        boundaries of ~16 entries.
    
      The overhead of that would _still_ very likely be much cheaper than a hundreds 
      (thousands) of cycle expensive MSR write at every kernel entry 
      (syscall entry, IRQ entry, etc.).
    

Pretty damn cool if it works.
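
A rough way to picture the depth-tracking idea quoted above, as a toy Python model rather than kernel code. The names `DepthTracker`, `on_call`, `on_return` and `simulate` are made up for illustration, the boundary of 16 is taken from the "~16 entries" in the quote, and the `mitigations` counter is only a stand-in for whatever the real tracer would do when the boundary is crossed:

```python
from dataclasses import dataclass

# Assumed boundary, taken from the "~16 entries" in the quoted proposal.
DEPTH_BOUNDARY = 16


@dataclass
class DepthTracker:
    """Toy model of a stack-depth tracking tracer (hypothetical, not kernel code)."""

    depth: int = 0
    mitigations: int = 0

    def on_call(self) -> None:
        # In the proposal this would run from the function-entry tracing hook.
        self.depth += 1
        if self.depth % DEPTH_BOUNDARY == 0:
            # Stand-in for the mitigation issued when a boundary is crossed.
            self.mitigations += 1

    def on_return(self) -> None:
        # Never let the modelled depth go negative.
        self.depth = max(0, self.depth - 1)


def simulate(call_depth: int) -> int:
    """Nest `call_depth` calls, unwind them, and report how often the hook fired."""
    tracker = DepthTracker()
    for _ in range(call_depth):
        tracker.on_call()
    for _ in range(call_depth):
        tracker.on_return()
    return tracker.mitigations
```

In this model a call chain shallower than 16 never triggers the hook (`simulate(10)` returns 0), which is the case the quote expects to be common. A chain of 40 nested calls triggers it twice, at depths 16 and 32.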

~~~
gtirloni
I couldn't find much about this 16-entry stack depth issue. Does anyone have
more information?

~~~
bonzini
It's explained in the thread. Basically, the RET instruction has its own tiny
branch predictor, the return address stack (also known as RSB) which is only
16 or 32 entries deep. If your call stack is deeper than that, sooner or later
the stack becomes empty.

In this case, older processors simply do not predict the jump and stall. But
Skylake instead uses the generic indirect branch predictor, since after all
RET _is_ an indirect branch! Except the idea behind retpolines was to avoid
the generic branch predictor, and now every single RET can use it! So, if you
don't want to use IBRS you need to find a different way to protect the kernel
from someone forcing undesired speculative execution after the RET.

~~~
gtirloni
RSB! Thank you, now I know what to focus on.

------
shakna
> But wait, why did I say "mostly"? Well, not everyone has a retpoline
> compiler yet... but OK, screw them; they need to update.

That got me interested: who has retpoline support, and since when?

GCC added support in 7.2 [0], which means a lot of people don't have access
yet.

LLVM added it in D41723 [1], but I don't think that has landed in a release
yet.

I can't find any talk around TCC, but that isn't exactly surprising.

Haven't heard anything about MSVC, and can't find anything, but that's not
surprising as I don't think Windows is using retpoline in their mitigations,
but I really have no idea.

[0] [https://www.phoronix.com/scan.php?page=news_item&px=Clear-Li...](https://www.phoronix.com/scan.php?page=news_item&px=Clear-Linux-Retpoline-KPTI)

[1] [https://reviews.llvm.org/D41723](https://reviews.llvm.org/D41723)

~~~
taspeotis
I'm not sure how close this is to the retpoline mitigation but ...
[https://blogs.msdn.microsoft.com/vcblog/2018/01/15/spectre-m...](https://blogs.msdn.microsoft.com/vcblog/2018/01/15/spectre-mitigations-in-msvc/)

 _Our tests show the performance impact of /Qspectre to be negligible. We have
built all of Windows with /Qspectre enabled and did not notice any performance
regressions of concern. Performance gains from speculative execution are lost
where the mitigation is applied but the mitigation was needed in a relatively
small number of instances across the large codebases that we recompiled._

~~~
shakna
Thanks! I think that and the inner-linked post tell us what we need to know.

From Terry's post:

> For context, on newer CPUs such as on Skylake and beyond, Intel has refined
> the instructions used to disable branch speculation to be more specific to
> indirect branches, reducing the overall performance penalty of the Spectre
> mitigation.

And your link:

> /d2guardspecload, that is currently equivalent to /Qspectre.

It seems equivalent to retpoline, at least from a high-level perspective.

So MSVC is probably protected, but only optionally. They're leaving it up to
developers to use the /Qspectre flag, it isn't something automatically enabled
when you use an /O(x) optimise flag.

Edit: I should add that /Qspectre is currently being added to a lot of
Windows, but they're assessing where before doing so, because not everywhere
should need it.

~~~
cesarb
From what I understood from that link, /Qspectre is for variant 1, while
retpoline is for variant 2, so it's not equivalent.

------
lovelearning
Can somebody here explain why Intel and kernel devs are still debating about
solutions in late January? I haven't followed this stuff closely, but I was
under the impression that Intel and others had been informed about these
vulnerabilities 6 months ago and everybody had fixes ready by early January.

~~~
gtirloni
Botched disclosure process.

More details at the end here:
[http://www.daemonology.net/blog/2018-01-17-some-thoughts-on-...](http://www.daemonology.net/blog/2018-01-17-some-thoughts-on-spectre-and-meltdown.html)

~~~
kbenson
Whether it was botched or not (from the position of this article) is based on
whether you think it was important that FreeBSD be notified ahead of
time. Obviously the FreeBSD devs think they should have been notified. Since
this basically affects _everyone_ that runs x86 code, and not _everyone_ could
be notified ahead of time and still keep any form of secrecy, some groups will
need to be notified and others not.

I think a case can be made that as operating systems go, FreeBSD is at a much
lower threat level than Linux, MacOS/iOS, Windows (i.e. desktops that run
unvetted code) and the VPS platforms that run host operating systems that need
to worry about guests breaking isolation (Google, Amazon, others).

Given all that, while it wasn't done perfectly (it was leaked a few days early
and some cloud providers had little warning, especially Joyent which also runs
a different OS), I'm not sure I would call it botched entirely since it is
also likely the largest exploit in history. Others disagree with some or all
of this.

~~~
qubex
I read somewhere (didn't verify, wasn't _obviously_ substantiated) that the
OpenBSD folks were beside themselves with anger, and that in turn others shot
back that they had demonstrated a very poor track record with respecting
embargoes in the past and had let slip “seven of the last five” major
vulnerabilities they had been pre-warned of.

EDIT: For reference, consider some of the comments in this thread:
[https://news.ycombinator.com/item?id=16110750&ref=hvper.com](https://news.ycombinator.com/item?id=16110750&ref=hvper.com)

~~~
jasonkostempski
Seven of the last five? Sounds like they have some out-of-order execution
issues of their own.

~~~
qubex
It’s a common British idiom, to opine against eternal doomsayers. :)

~~~
knight17
First time seeing this usage. Can you tell me what this is called exactly, in
grammatical terms? I tried to Google it and can't seem to get anywhere.

------
DSingularity
4,000 cycles for a speculative execution barrier is just horrible. Since
everyone needs a patched kernel, might as well take the time to recompile the
kernel with a retpoline compiler and remove all offending instructions in the
first place.

~~~
kuschku
Except, thanks to modern speculative execution, Skylake and later also
speculate through retpoline.

So you still need some solution there.

~~~
trishume
I'm pretty sure it's not that it speculates through retpoline, IIRC retpoline
is coded with trickery to avoid that, but it only replaces indirect branches.
It's that it speculates through normal `ret` instructions in functions, in a
way that can be exploited.

~~~
BeeOnRope
It's that prior to Skylake, ret instruction prediction only ever used the
"return stack buffer" (RSB), but since Skylake the indirect predictor is used
as a backup. The RSB is a prediction mechanism specifically for call and ret
pairs: each ret is predicted to return to the instruction following the
corresponding call.

This works great for the common pattern of matched pairs, but the stack used
to track outstanding calls has a limited size (32 in Skylake), let's call it
N. If you have N + M calls followed by N + M rets (or any other pattern where
the call chain gets that deep), you will predict correctly the first N rets,
but then the stack is exhausted and the last M rets won't be predicted by the
RSB.

Prior to Skylake, those last M rets just wouldn't get predicted at all
(probably instruction fetch would just fall through to the instructions
following the ret), but in Skylake the indirect branch-predictor, which is
usually used to predict jmp or call instructions to variable locations, is
used as a fallback instead.

So the concern is that people could train the indirect predictor prior to a
kernel call, and if the 32-deep RSB was ever exhausted, the indirect predictor
could kick in causing a Spectre-like vulnerability.
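
The N + M argument above can be sketched as a small simulation. This is only a toy model in Python, not a description of any real microarchitecture: `predict_returns`, its parameters and the result keys are hypothetical names, the default size of 32 is the figure given in this comment, and `fallback_to_indirect` is a switch between the "pre-Skylake" and "Skylake" behaviours as described here.

```python
from collections import deque


def predict_returns(call_depth, rsb_size=32, fallback_to_indirect=True):
    """Count how each ret is predicted after `call_depth` nested calls.

    Toy model: the RSB is a fixed-size stack that silently drops its
    oldest entry once more than `rsb_size` calls are outstanding.
    """
    rsb = deque(maxlen=rsb_size)
    for return_address in range(call_depth):
        rsb.append(return_address)  # each call pushes its return address

    counts = {"rsb": 0, "indirect": 0, "none": 0}
    for _ in range(call_depth):
        if rsb:
            rsb.pop()  # matched call/ret pair, predicted by the RSB
            counts["rsb"] += 1
        elif fallback_to_indirect:
            counts["indirect"] += 1  # behaviour attributed to Skylake above
        else:
            counts["none"] += 1  # behaviour attributed to older parts above
    return counts
```

With 40 nested calls and a 32-entry RSB, the first 32 rets are predicted by the RSB and the last 8 are not. `predict_returns(40)` reports those 8 under `"indirect"`, while `predict_returns(40, fallback_to_indirect=False)` reports them under `"none"`. Those 8 are the rets the concern in the previous paragraph is about.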

------
davidlt
On ARM (A64, A32, T32) we get the CSDB barrier, but it's a hint instruction
instead of going via MSR registers.

From the whitepaper, CSDB is 1101_0101_0000_0011_0010_0010_100_11111

Here are some snippets from the ARM manuals:

HINT instruction:

1101 0101 0000 0011 0010 0010 100 11111 CRm = 0010 op2 = 100

Some encodings described here are not allocated in this revision of the
architecture, and behave as NOPs. (This is important)

Hints 18 to 23 variant: applies when CRm == 0010 && op2 != 00x.

    HINT #<imm>

The hint is encoded in the CRm:op2 pair. Existing similar encodings:

    0010:000 ESB        // Error Synchronization Barrier
    0010:001 PSB CSYNC  // Profiling Synchronization Barrier

Thus in assembler this is written:

hint #0x14

Which is a NOP if the SoC does not understand this hint. It's being used here:
[http://lkml.iu.edu/hypermail/linux/kernel/1801.0/04191.html](http://lkml.iu.edu/hypermail/linux/kernel/1801.0/04191.html)

and also here: [https://github.com/ARM-software/speculation-barrier/blob/mas...](https://github.com/ARM-software/speculation-barrier/blob/master/speculation_barrier.h)
(which is being upstreamed to compilers in cross-platform generic form IIRC)

ARM whitepaper states that conditional selection/conditional move is enough on
most ARM implementations. If it's not the case then the new CSDB solves the
problem. On older CPUs it's still a NOP.
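
For anyone who wants to check the bit pattern above, here is a small Python sketch of the A64 encoding arithmetic. The helper names `hint_imm` and `encode_hint` are made up for illustration. `HINT_BASE` is the A64 encoding of `HINT #0` (plain `NOP`), which comes from the ARM architecture manual rather than from this comment. The 7-bit immediate is CRm:op2 and sits in bits [11:5].

```python
# A64 encoding of HINT #0 (NOP); CRm:op2 occupy bits [11:5].
HINT_BASE = 0xD503201F


def hint_imm(crm: int, op2: int) -> int:
    """Combine the 4-bit CRm and 3-bit op2 fields into the HINT immediate."""
    if not (0 <= crm < 16 and 0 <= op2 < 8):
        raise ValueError("CRm is 4 bits and op2 is 3 bits")
    return (crm << 3) | op2


def encode_hint(imm: int) -> int:
    """Return the 32-bit A64 instruction word for HINT #imm."""
    if not 0 <= imm < 128:
        raise ValueError("HINT immediate is 7 bits")
    return HINT_BASE | (imm << 5)


csdb_imm = hint_imm(0b0010, 0b100)  # 20, i.e. hint #0x14
csdb_word = encode_hint(csdb_imm)   # 0xD503229F
print(f"hint #{csdb_imm:#x} -> {csdb_word:#010x} -> {csdb_word:032b}")
```

The printed binary string matches the 1101_0101_0000_0011_0010_0010_100_11111 pattern quoted from the whitepaper. The same helpers give 16 and 17 for the ESB and PSB CSYNC pairs listed above.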

X-Gene disables branch prediction:
[http://lkml.iu.edu/hypermail/linux/kernel/1801.2/06482.html](http://lkml.iu.edu/hypermail/linux/kernel/1801.2/06482.html)

ThunderX2 branch prediction hardening:
[https://patchwork.kernel.org/patch/10151975/](https://patchwork.kernel.org/patch/10151975/)

~~~
bonzini
CSDB is for Spectre variant 1. The post is about variant 2.

------
6f666579
This might be fundamentally irrelevant to this thread, but it got me wondering
and I would like some assistance/guidance:

Whenever I read the mailing list, or threads related to it, I don't understand
99% of the stuff. Do I need to know C _very well_? Do I need to be familiar
with kernel? If so, how do I go about doing that? At the moment I'm reading
"Code: The Hidden Language of Computer Hardware and Software".

~~~
isido
Generally, you need to know how operating systems and hardware work together.
And then depending on the details, you might need to know C _and_ assembly. In
this case, you need to have a good understanding of how modern processors do
performance optimizations.

There probably isn't any good single resource for learning all this, but
perhaps getting some good textbook on computer hardware architecture might be
a good next step after reading "Code". Unfortunately I don't have any
immediate suggestions on the current crop - I remember learning from William
Stallings' books some 15-20 years back - not sure if they are the best choice
nowadays.

~~~
ctw
The course "Computer Systems Architecture" at Queen's University uses
"Computer Organization and Design: The Hardware/Software Interface" by
Patterson and Hennessy. It's a 400-level course in the computer and electrical
engineering department. I can recommend it. Most chapters in it have a "Real
Stuff" section where they look at a real world CPU and compare it with the
contents of the book to see how the theory actually ends up in practice.

~~~
SAI_Peregrinus
I'd also recommend the text Structured Computer Organization by Tanenbaum[1].

[1] [https://www.pearson.com/us/higher-education/program/Tanenbau...](https://www.pearson.com/us/higher-education/program/Tanenbaum-Structured-Computer-Organization-6th-Edition/PGM200985.html)

------
thomastjeffery
Since all these CPUs are turning out to be so inherently insecure, maybe we
should work harder to run only trusted/verified code.

~~~
MontagFTB
There’s a limit at which improving the car compensates for the degradation of
the road. At some point, repaving is the only correct fix.

~~~
Beltiras
Is retpoline the equivalent of picking another road in this analogy?

~~~
SolarNet
More like picking special tires that can run on the shitty road. The roads are
the hardware itself and their vendors (I would assume).

~~~
Beltiras
I can see that making sense. Still see the retpoline reducing the forks in the
road, forcing the "proper route".

~~~
SolarNet
No, you still misunderstand. The road in the analogy is the hardware itself.
And the car is the OS. The OP's point was that we have to replace the
hardware, because at some point you are hacking up your car to an absurd degree
just to be able to drive on the road of the hardware. It has nothing to do
with forks in the road and the analogy to branching.

He could be arguing for anywhere from, we need to replace all our hardware
now, to we need to start over with how we design chips.

------
RobLach
_The new microcode from Intel and AMD adds three new features._

 _The second (STIBP) protects a hyperthread sibling from following branch
predictions which were learned on another sibling._

Is this about threads in the context of SMT?

I guess what I'm asking is: is "hyperthread" a thing outside of Intel marketing
speak that warrants mentioning AMD?

~~~
theevilsharpie
Hyperthreading is Intel's implementation of SMT. However, since Intel was the
only one doing SMT on x86 until very recently, hyperthreading has become the
de facto generic name for it.

~~~
RobLach
Ah that makes sense. My last exposure to this space was IBM Power stuff and we
were calling them sibling threads or side threads.

------
tinus_hn
> You _might_ want this when running unrelated processes in userspace, for
> example.

When are you not? Why have security and process isolation if all processes are
considered related?

~~~
tomalpha
You might perhaps have a setup where trusted users are running trusted code,
can guarantee* a lack of malicious intent, and primarily want the user-and-
process-segregation system to prevent a single failure of a single process
from affecting anyone else.

You might conceivably have this in a corporate environment (I do).

For the vast majority of systems that _do_ run untrusted code, including web
browsers, this mitigation is probably appropriate.

Perhaps the word “might” here could be upgraded to “probably”?

*for some arbitrary value of “guarantee”

~~~
tinus_hn
So you run everything as root, or your setup would work just as well if you
did?

~~~
sambe
How is that comparable? Running everything as root is a) asking for accidents;
b) a different class of problem.

------
cthalupa
So, the interesting thing here, to me, is this bit:

> So now we _mostly_ don't need IBRS. We build with retpoline, use IBPB on
> context switches/vmexit (which is in the first part of this patch series
> before IBRS is added), and we're safe. We even refactored the patch series to
> put retpoline first.

Unless I'm reading this wrong, it looks like retpoline is not enough, and you
still need the microcode updates from Intel or AMD that offer IBPB.

If this is the case, why has Google been saying that retpoline is all we need?

~~~
gavindean90
From what I can see, the retpoline works on pre-Skylake processors. I am
guessing that Google might not be using Skylake+ processors and so their
comments are not wrong for their equipment.

~~~
cthalupa
No, it's pretty specifically not about the Skylake issue - The retpoline not
working on Skylake is fixed by IBRS, and is due to ret calls falling back to
indirect prediction in deep stacks.

vmexits and context switches are a very different thing than the deep stack
ret calls.

------
summiwap
I have always felt it, but presumed it was "cross talk" in the optical nerves
being interpreted as "movement" in the outer ear.

~~~
Operyl
I think you might’ve commented on the wrong post somehow there.

~~~
el_benhameen
What's very odd is that it's a copy of the first sentence of this comment:
[https://news.ycombinator.com/item?id=16218608](https://news.ycombinator.com/item?id=16218608)

~~~
nitrogen
Accidental copy/paste through an interface for the visually impaired?

 _> about: Visually Handicapped but can handle computer online work_

~~~
el_benhameen
Huh, that's a good point. Apologies if my comment came off as accusatory.

~~~
nitrogen
I think your comment was fine. I would have had the same kind of thought --
possible bot trying to age an account with content lifted from other legit
posts.

------
compuguy
I could have sworn that Intel had recalled the microcode updates because of
issues (I've not had any of note)?

[https://www.computerworld.com/article/3250297/microsoft-wind...](https://www.computerworld.com/article/3250297/microsoft-windows/let-the-biosuefi-firmware-recall-begin.html)

------
axaxs
My 2 unasked for pennies - the kernel is already too bloated with support.
These are hacks and should not be mainlined. Let them fix their hardware and
give arduous instructions to patch...they deserve it. Don't further pollute
the kernel. Imagine if a tiny hardware vendor tried the same...ubiquity is not
an excuse.

~~~
Someone
_”Let them fix their hardware”_

That could (and, I guess, would) mean a much, much, larger performance hit
than a combined microcode/kernel fix would give. I doubt people would be happy
about that.

Given that we don’t want to give up all speculated instruction execution, it
will be difficult to find the spot where we get safety at minimal performance
impact.

~~~
gaius
Right, but why should OS devs and vendors scramble to save Intel’s share
price? The right outcome here is for the devs to _do nothing_ and the world
switch to another CPU vendor on the next hardware refresh cycle.

~~~
mhandley
Which CPU vendor would you switch to? Although Meltdown is (mostly) an Intel
bug, all the main CPU vendors, including AMD and ARM, are affected by Spectre,
which is what this particular discussion is about.

~~~
gaius
Agreed, but it almost doesn’t matter: what matters is sending the signal that
shenanigans will be punished by the market.

If devs do anything it should be learning to write software that doesn’t
depend on speculative execution for performance

~~~
coldtea
> _Agreed, but it almost doesn’t matter: what matters is sending the signal
> that shenanigans will be punished by the market._

No, what matters is understanding the issue. And the issue has no
"shenanigans". All vendors suffer from Spectre (and some non-Intel models from
Meltdown too), but more importantly, none of these was done "on purpose" which
is the definition of a shenanigan.

> _If devs do anything it should be learning to write software that doesn’t
> depend on speculative execution for performance_

Whatever.

~~~
gaius
_none of these was done "on purpose" which is the definition of a shenanigan._

The “shenanigans” are that it has become acceptable practice to cut corners on
security to improve benchmarks.

Maybe speculative execution is just a thing that shouldn’t be done? People
computed perfectly well before it. Itanium isn’t impacted by Meltdown so Intel
is capable of it.

~~~
coldtea
> _The “shenanigans” are that it has become acceptable practice to cut corners
> on security to improve benchmarks._

Nobody "cut corners". Those were unexpected bugs that took more than a decade
for someone to even discover, not some deliberate decision to sacrifice
security for performance.

> _Maybe speculative execution is just a thing that shouldn’t be done?_

No, it should 100% be done.

> _People computed perfectly well before it._

Perfectly slowly too.

> _Itanium isn’t impacted by Meltdown so Intel is capable of it._

Itanium tanked.

~~~
gaius
_Nobody "cut corners"._

What do you call accessing memory before checking permissions then? AMD do the
check!

 _Perfectly slowly too._

A simple predictor that assumes backwards branches will be taken (loops) and
forward branches (most likely exceptions) will not is really all you need, if
you insist.
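
The static rule described above (often called "backward taken, forward not taken") can be written down in a few lines. This is only a sketch in Python with invented names (`btfn_predict`, `accuracy`) and a hand-made trace of `(branch_pc, target_pc, taken)` tuples. It shows how the rule behaves on a single loop and says nothing about how it compares with a modern dynamic predictor.

```python
def btfn_predict(branch_pc: int, target_pc: int) -> bool:
    """Static rule: predict taken only when the branch jumps backwards."""
    return target_pc < branch_pc


def accuracy(trace) -> float:
    """Fraction of (branch_pc, target_pc, taken) entries predicted correctly."""
    hits = sum(btfn_predict(pc, target) == taken for pc, target, taken in trace)
    return hits / len(trace)


# Hypothetical trace: a loop branch at 0x40 jumping back to 0x10,
# taken nine times and then falling through once on loop exit.
loop_trace = [(0x40, 0x10, True)] * 9 + [(0x40, 0x10, False)]

# Hypothetical trace: a forward error-check branch that is never taken.
error_check_trace = [(0x20, 0x80, False)] * 10
```

On these two traces the rule is right 9 times out of 10 for the loop (it always misses the exit) and every time for the never-taken forward branch. Branches whose direction depends on data are where a static rule has nothing to go on.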

~~~
lorenzhs
You appear to be confusing Meltdown and Spectre. This is about Spectre Variant
2. You're describing Meltdown. Also, the fault is _queued_ and not discarded.
If the branch ends up being taken, you'll get a fault for accessing a
forbidden part of memory. If not, the CPU "undoes" the speculatively executed
part as if it never happened. The Meltdown issue is that it doesn't undo the
cache effects, which nobody thought to be a big deal, and which are very hard
to undo.

Also, you severely underestimate the speedups gained by a better branch
predictor. There's a nice writeup at
[https://danluu.com/branch-prediction/](https://danluu.com/branch-prediction/)
if you're interested.

------
paulie_a
That is one of the most mobile unfriendly sites I have ever seen

------
nimbius
"since the peanut gallery is paying lots of attention it's probably worth
explaining it a little more for their benefit."

what a ridiculous sentiment to defend a patch from Intel that's effectively
disabled by default for one of the most egregious bugs in microcode since
F00F.

"The new microcode from Intel and AMD adds three new features." No one's
talking about AMD, David, not Linus and certainly not Hacker News or other
outlets. Spectre patching will happen regardless of this fiasco, but this is
NOT SPECTRE. This is a tailor-made shit sandwich that's been warned about by
researchers and lurking on Intel's plate for more than a decade. Intel chose
speed over security, and Intel lost. Now they're trying to find a way --any
way-- to avoid having to eat an 8-generation recall or worse.

It's been said on HN before, but this is now more of a 'when, not if' for
Intel's loss of dominance in cloud. Customers no longer care about the fastest
chip in the West if it's going to get neutered to half speed due to reckless
design that allows for a complete security disaster. They'll buy twice as many
AMD, at half the price of Intel, and make up the difference in sheer volume.

~~~
phonon
David Woodhouse works for AWS, not Intel. So maybe take it down a notch?

~~~
paulie_a
While the OP may have mistaken the author, the point is valid on numerous
levels, including attempting to slander AMD because Intel has bad designs.

~~~
phonon
Umm, no. This is about Spectre variant 2, which AMD is 100% exposed to.

"GPZ Variant 2 (Branch Target Injection or Spectre) is applicable to AMD
processors."

[https://www.amd.com/en/corporate/speculative-execution](https://www.amd.com/en/corporate/speculative-execution)

