
LLVM patch to fix half of Spectre attack - Kristine1975
https://reviews.llvm.org/D41723
======
tptacek
Page was down when I tried to read it, but it's archived here:
[http://archive.is/s831k](http://archive.is/s831k).

It's hard to get your head around how big a deal this is. This vulnerability is
so bad they killed x86 indirect jump instructions. It's so bad compilers ---
all of them --- have to know about this bug, and use an incantation that hacks
ret like an exploit developer would. It's so bad that to restore the original
performance of a predictable indirect jump you might have to change the way
you write high-level language code.

It's glorious.

~~~
rayiner
Is glorious the right word for it? We’re going back to the stone ages where
processors couldn’t predict the targets of indirect jumps. More generally,
this seems to me like an attempt to patch out of what is really a class of
attacks leveraging fundamental assumptions about high-performance CPU design.
Before, OOO just had to preserve correctness and (some of) the order of
exceptions and memory operations. Now, it has to preserve (some of) the timing
of in-order execution too? Where does this path end?

~~~
simias
Legitimate question: on any non-shared non-virtualized system is there any
reason to enable these workarounds _besides_ running sandboxed applications
such as javascript in a web browser (or flash/java applets/Active X, but those
are not really super popular nowadays)?

For any other non-sandboxed application you pretty much have to trust the code
anyway. Privilege escalation is always a bad thing of course, but for single
user desktop machines getting user shell access as an attacker means that you
can do pretty much anything you want.

As far as I can see the only surface of attack for my current machine would be
a website running untrusted JS. For all other applications running on my
machine if one of them is actually hostile then I'm already screwed.

Frankly I'm more annoyed at the ridiculous over-engineering of the Web than at
CPU vendors. Because in 2017 you need to enable a turing complete language
interpreter in your browser in order to display text and pictures on many
(most?) websites.

Gopher should've won.

~~~
panarky
> on any non-shared non-virtualized system is there any reason to enable these
> workarounds

Does the non-shared non-virtualized system have any encryption keys in memory
that you want to protect?

Do you use full-disk encryption or ssh to other machines or use a
cryptocurrency wallet?

~~~
gmueckl
These questions are only relevant if you're not controlling and trusting all
the code you're running on that system. For a consumer system this is true if
(and basically only if) you're running a web browser on that system.

If you're confident in the software you're running on a non-shared hardware,
both Meltdown and Spectre are non-issues requiring no mitigation. This is a
narrow class of systems, but it exists.

~~~
sigstoat
> If you're confident in the software you're running on a non-shared
> hardware...

and that you won't be hit by a remote execution vulnerability.

~~~
gmueckl
That's what being confident in the software means.

------
dzdt
_When using these patches on statically linked applications, especially C++
applications, you should expect to see a much more dramatic performance hit.
For microbenchmarks that are switch, indirect-, or virtual-call heavy we have
seen overheads ranging from 10% to 50%._

Ouch! This is independent of other performance hurts, like from the kernel
syscall overhead that was the hot topic yesterday. This is pretty crazy.

~~~
jerf
That's bad. A single 5% hit might not be the end of the world, but 5% here and
10% there and another 5% over there in the common case adds up badly enough.
Doubly-pathological cases (indirect calling-heavy code calling lots of
syscalls)... a 50% slowdown and a 30% slowdown combine to a 60% total
slowdown. Yeowch.
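
The exact combined figure depends on how each percentage is defined. A small sketch of the two usual readings follows; the function names are hypothetical, and it assumes the two penalties are independent and simply multiply:

```c
/* Reading 1: each figure is the fraction of throughput lost.
 * Stacking the two leaves (1 - a) * (1 - b) of the original throughput. */
static double combined_throughput_loss(double a, double b)
{
    return 1.0 - (1.0 - a) * (1.0 - b);
}

/* Reading 2: each figure is extra run time on top of the baseline.
 * Stacking the two multiplies the run time by (1 + a) * (1 + b). */
static double combined_extra_time(double a, double b)
{
    return (1.0 + a) * (1.0 + b) - 1.0;
}
```

With a = 0.5 and b = 0.3, the first reading gives a 65% loss of throughput and the second gives 95% extra run time, so the rough figure above is in the right neighborhood under the first reading.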

Will be intrigued to see how processor manufacturers respond to this. If they
were even slightly relaxed about it prior to disclosure I expect there's going
to be some very hurried attempts to engineer some solutions pronto. This is
the sort of thing where it might even be worth throwing away all of your
future roadmap plans and just getting a revision of the current chips out
there ASAP, whatever that may do to the rest of your roadmap.

~~~
ant6n
Sounds like it could be great for processor manufacturers. In the age where
CPUs don't get faster, there's finally a reason for customers to buy new CPUs
again!

~~~
CmdDot
Not really, once a program is compiled with -mretpoline, new hardware won't
bring back reliable branch prediction.

I'd hope maybe, just maybe, this would be enough to put a focus on compilers
producing code that ends up using processor-optimized paths chosen at runtime,
to avoid "overheads ranging from 10% to 50%".

Though, in this case, that would essentially mean making the entire executable
region writable for some window of time, which is clearly too dangerous, so I
guess the 0.1% speedups from compiling undefined behavior in new and
interesting ways, will continue taking priority.

I mean, it's a compiler flag right, obviously whoever's going to run a program
on an unaffected platform will take the effort to recompile everything with
the flag removed.

Just the same way every serious application currently provides different
executables for running on systems where SSE2, SSE4.1, or AVX2 is present.

~~~
mike_hearn
Not quite - lots of "serious" applications these days are written to target
JIT compilers, which would be capable of switching retpoline on and off
depending on need.

~~~
CmdDot
Funnily enough, I ended up not including a PS starting with "A sufficiently
smart JIT, however..." ;)

------
AaronFriel
This is brutal for all interpreted/JITed languages and all statically compiled
languages with dynamic dispatch. I can hardly imagine worse news for
performance oriented engineers. And what's worse is that dynamic libraries
will probably need to be rebuilt with these mitigations in mind, so nearly
everyone will pay the cost even if they don't need it.

I feel bad for all of the engineers currently working on performance sensitive
applications in these languages. There's a whole lot of Java, .NET, and
JavaScript that's about to get slower[1]. Enterprise-y, abstract class heavy
(i.e.: vtable using) C++ will get slower. Rust trait objects get slower.
Haskell type classes that don't optimize out get slower.

What a mess.

[1] These mitigations will need to be implemented for interpreters, and JITs
will want to switch to emitting "retpoline" code for dynamic dispatch. There's
no world in which I don't expect the JVM, V8, and others to switch to these by
default soon.

------
rntz
This mitigates spectre variant #2, branch target injection. We also have a
mitigation for meltdown, namely KPTI. Is there a known mitigation for spectre
variant #1, bounds check bypass?

Maybe I'm being naive, but would a simple modulo instruction work? Consider
the example code from [https://googleprojectzero.blogspot.com/2018/01/reading-privi...](https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html):

    
    
        unsigned long untrusted_offset_from_caller = ...;
        if (untrusted_offset_from_caller < arr1->length) {
         unsigned char value = arr1->data[untrusted_offset_from_caller];
         ...
        }
    

If instead we did:

    
    
        unsigned char value = arr1->data[untrusted_offset_from_caller % arr1->length];
    

Would this produce a data dependency that prevents speculative execution from
reading an out-of-bounds memory address? (Ignore for the moment that a
sufficiently smart compiler might "optimize" out the modulo here.)
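
For comparison, another way to create that kind of data dependency is to mask the index instead of taking a modulo. What follows is a minimal sketch, not a vetted mitigation: the helper names are hypothetical, it assumes `length` is non-zero and that both values fit in a signed `long`, and whether it actually stops the speculative out-of-bounds read on a given CPU and compiler is not something the snippet itself establishes.

```c
#include <limits.h>

/* Hypothetical helper: returns all-ones when index < size, zero otherwise,
 * without using a conditional branch in the source.
 * Assumes index and size fit in a signed long, and that >> on a negative
 * long is an arithmetic shift (implementation-defined in ISO C, but the
 * usual behavior on mainstream compilers). */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
    long v = (long)(index | (size - 1UL - index));
    return (unsigned long)(~v >> (sizeof(long) * CHAR_BIT - 1));
}

/* Hypothetical accessor: an out-of-range offset is forced to 0 rather than
 * rejected by a branch, so the load address depends on the mask value.
 * Assumes length > 0, since offset 0 is read in the out-of-range case. */
static unsigned char load_masked(const unsigned char *data,
                                 unsigned long length,
                                 unsigned long untrusted_offset)
{
    unsigned long safe = untrusted_offset & index_mask(untrusted_offset, length);
    return data[safe];
}
```

Unlike the modulo version, an out-of-range offset always maps to element 0, and no division is needed. The same caveat applies: a compiler is free to turn this back into a branch.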

------
jzl
A new thing that's going to become a standard part of systems engineering:
deciding whether any given system needs to run with or without these kinds of
protections. Do you want the speed of speculative execution or do you want
Meltdown/Spectre protection? In some cases lack of protection is fine. But
figuring out the answer for any given system is often going to take expert-
level security knowledge. Security is all about multiple layers of protection,
and even a non-public facing machine might benefit from these layers depending
on the context.

~~~
s4vi0r
Spectre relies on tricking the CPU into branch predicting its way into
accessing protected memory, no? Is it not possible that we can keep most of
the performance benefits of speculative execution by somehow having a built in
"Hey, never _ever_ speculate that I'll want to access this region of memory"
sort of thing?

~~~
lorenzq
I read an Ars Technica article saying that this would be a possible solution,
but it isn’t right now because the hardware to check access rights isn’t fast
enough yet.

------
leni536
It has an interesting performance impact on calls to dynamic libraries. One
alternative approach would be to avoid the indirect calls through not using
'-fPIC --shared' when building shared libraries but '-mcmodel=large --shared'.
This causes the relocations to happen at the direct calls and not through a
GOT.

The obvious drawback is that it effectively disables sharing code in memory; it
would still allow sharing code on disk, though. So it would be a middle ground
between the current state of dynamic and static linking.

[https://www.technovelty.org/c/position-independent-code-and-...](https://www.technovelty.org/c/position-independent-code-and-x86-64-libraries.html)

------
ealexhudson
This patch apparently implements this mitigation:
[https://support.google.com/faqs/answer/7625886](https://support.google.com/faqs/answer/7625886)

~~~
JdeBP
And once one knows the technical background, one is better positioned to
consider the response of Linus Torvalds to the idea that the entire Linux
kernel be recompiled for all x86 CPUs with a compiler that implements this.

* [https://lkml.org/lkml/2018/1/3/797](https://lkml.org/lkml/2018/1/3/797) ([https://news.ycombinator.com/item?id=16066968](https://news.ycombinator.com/item?id=16066968))

~~~
tptacek
This would be more interesting if the attack that the compiler mitigations were
designed for wasn't cross-vendor, cross-architecture.

------
badrequest
I, for one, am eternally grateful for the incredibly bright people who take
the time to patch this sort of stuff.

~~~
ben_jones
And the people who invented computers, programming languages, the internet,
and all the learning resources, that allow me to get a paycheck writing
extremely high level application code that feels like a coloring book in
comparison. Truly the shoulders of giants.

------
vfaronov
I have a hunch that the era of side-channel attacks is only now dawning, and
that we should expect many more painful exploits and cumbersome mitigations in
the coming years.

What do people more knowledgeable in the field think about this?

~~~
xigency
What about users who only execute trusted code?

All of these attacks assume you are running something you don't trust on your
CPU, whether it is another user's program, a non-root executable, or a
JavaScript program from a website.

When do we stop hacking processors, kernels, and compilers and revisit our
assumptions of what we can and can't do securely?

~~~
pjc50
Define "trusted"? Who do you trust to do your verification, and how much does
it cost?

~~~
xigency
Well, critical applications, like flight systems, run on a different ecosystem
and are verified. (And it costs a lot.)

But my usecase might be a physical computer that isn't networked which does
data science with some programs and prints out results.

These patches are aimed at Amazon and cloud providers that are in the
business of running separate individuals' applications on the same machine.

In the consumer world, the slope would be browser scripts and user
applications that aren't running as super. But even then, do you download and
run software that you expect might steal information or damage your computer?

These are fundamental security questions. Creating rings and sandboxes is
what creates the assumptions of privacy and security.

------
phkahler
RISC-V impact? With all the reports of these attacks, I have not seen mention
of risc-v. Since they are in the process of finalizing a lot of specs
including memory model and privileged instructions, I wonder if there will be
last minute changes to mitigate these vulnerabilities.

~~~
leoc
At the risk of being a HN self-parody, I’ve also been wondering what this
means for the Mill...

[https://millcomputing.com/docs/prediction/](https://millcomputing.com/docs/prediction/)

~~~
gpderetta
The only speculation done on the mill currently is on whether it will ever be
released, so I think they'll be safe.

------
coldcode
I remember doing tricks like this in 6502 assembly and in other early
processors. Amazing that to stop these attacks you have to come up with clever
tricks again. Back in the 80's I would have never imagined this type of attack
being something to worry about.

~~~
FLUX-YOU
>early processors

Early processors had speculative execution? I thought this had been added to
Intel/AMD/ARM about 20 years ago?

~~~
dzdt
I guess he means the retpoline. On the 6502 there is no indirect jump
instruction, so you need such tricks just to achieve an indirect jump at all.

~~~
pubby
There's an indirect jump instruction. It's not very good though, and has a
notorious bug with addresses ending in 0xFF.
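
A minimal sketch of that bug as it is usually described for the original NMOS 6502: when the pointer's low byte is 0xFF, the high byte of the target is fetched from the start of the same page rather than from the next page. The function name and the flat 64 KiB memory array are assumptions made for illustration.

```c
#include <stdint.h>

/* Emulates the target address computed by JMP (ptr) on an NMOS 6502.
 * mem must point to a 64 KiB array (hypothetical flat memory model). */
static uint16_t jmp_indirect_target(const uint8_t *mem, uint16_t ptr)
{
    uint8_t lo = mem[ptr];
    /* Only the low byte of the pointer is incremented, so 0x12FF wraps to
     * 0x1200 instead of advancing to 0x1300. */
    uint16_t hi_addr = (uint16_t)((ptr & 0xFF00u) | ((ptr + 1u) & 0x00FFu));
    uint8_t hi = mem[hi_addr];
    return (uint16_t)(lo | (hi << 8));
}
```

For a pointer that does not end in 0xFF, the function reads two consecutive bytes as expected; only the page-boundary case differs.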

~~~
dzdt
Gah, you're right. Guess my memory is fading. There is indirect JMP, but no
indirect JSR or indirect branches. And the indirect JMP as you say is not very
useful.

------
peapicker
This brings to mind Ken Thompson's "Reflections on Trusting Trust"[1] -- after
all, all I have to do to write code with the exploit is be able to remove the
patch and rebuild the compiler and build some executables.

Trusting in a compiler you hope was used to build all the executables on your
system isn't trustworthy enough to be the final solution.

[1]
[https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html](https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html)

~~~
pwg
Every modern compiler usually has extensions that allow for bits of assembly
to be inserted alongside the usual C or C++ code.

Unless the compiler is also patched to either disallow inserted assembly, or
to modify the inserted assembly (this being both hard and dangerous), someone
who wants to exploit the bug will just add their own inserted assembly code
that exploits the bug, and a patched compiler won't help one bit in that case.

------
cws125
Just as an FYI, according to:

* [https://lkml.org/lkml/2018/1/4/432](https://lkml.org/lkml/2018/1/4/432)
* [http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a...](http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=blob;f=xen/arch/x86/spec_ctrl.c;h=79aedf774a390293dfd564ce978500085344e305;hb=refs/heads/sp2-mitigations-v6.5#l168)

It appears that Skylake and later can actually predict retpolines? Some
hardware features called IBRS, IBPB, STIBP (not a lot of details on this are
out there) are supposedly coming in a microcode update.

------
jgowdy
The problem I see with this concept is ROP mitigations like Intel’s control
flow enforcement don’t seem compatible with intentionally using tweaked
addresses with ret. The address they inject won’t match the shadow stack and
the program will be terminated.

~~~
DannyBee
This is true, and so far, nobody has a better idea. (I.e., I would expect
that unless someone comes up with one, hardware CFE in its current form dies
and won't happen for Intel until the processors are changed in a way that
makes the mitigation unnecessary.)

------
teilo
Isn't it the case that the Itanium architecture would not be vulnerable to
Spectre because it moves the onus of branch prediction from the CPU to the
compiler?

~~~
als0
Assuming the compiler knows what it's doing :)

~~~
teilo
That was always the problem with the Itanium compilers. They were crap because
they couldn't benefit from the years of tuning traditional architectures
enjoyed.

~~~
acdha
Also the compiler had to be absolutely brilliant to rewrite the serial
branching code most programmers wrote to work with the EPIC model. They had
some good results with optimized math-heavy code, but the general purpose code
ended up with too many nops waiting on results.

------
nathell
I can't help thinking of how the early-ITS approach to security (not only was
there none, but looking at other users' work was a deliberate feature) was
embraced by its users. I'm way too young to remember, but it rings a bell
somewhere down my heart.

There's a lot of prominence being given to all kinds of damage malicious users
might inflict, and ways to prevent or mitigate, but little to the _malice_
itself. Whence does it arise? What emotions drive those users? What unmet
needs?

Meanwhile, when these slowing-down patches for Spectre and Meltdown arrive, I
intend to _not_ run them, to the extent possible. I intend to keep aside a VM
with patches for critical stuff, like banking or others' data entrusted to me.
But I don't want my machine to be slowed down just because someone, sometime,
might invest effort in targeting these attacks at it. Given how transparent I
want to be with my life, that's a risk I'm willing to take.

~~~
fwip
Most attacks aren't targeted at specific people. Hackers don't want to read
your emails, they want your credit-card information, digital account
passwords, or to compromise your computer to use in their botnet.

Sure, you might not have anything you want to hide in your life, but the
drive-by javascript doesn't care about your secrets - it'll hack you anyway.
Best-case scenario, you lose access to a bunch of accounts you used to use and
need to create new identities from scratch. Worst-case, they clean you out
financially, steal your identity, etc.

------
fooker
retpoline seems to be a novel concept. Can anyone ELI5?

Also, any insight about performance impact here?

~~~
tptacek
An indirect jump is when your program asks the CPU to transfer control to a
location that your code itself computes: "jmp %register". Compare to a direct
jump, where the destination of the jump is hardcoded into the jump instruction
itself: "jmp $0x100".

Most programs have indirect jumps somewhere. Higher-level languages with
virtual function calls have lots of indirect jumps, because they parameterize
functions: to get the "length" of the variable "foo", the function "bar" has
to call one of 30 different functions, depending on the type of "foo"; the
function to call is read out of a table at some offset from the base address
of "foo". Or, another example is switch statements, which can compile down to
jump tables.
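
A minimal C sketch of the first case, with hypothetical type and function names: the function to call is read out of the object itself, so the compiler cannot know the target at build time and will typically emit an indirect call for it.

```c
#include <stddef.h>

struct shape;
typedef size_t (*length_fn)(const struct shape *);

/* Hypothetical "object" carrying its own method pointer, the C analogue of
 * a one-entry vtable. */
struct shape {
    length_fn length;   /* which function computes the length of this object */
    size_t a, b;
};

static size_t line_length(const struct shape *s) { return s->a; }
static size_t rect_length(const struct shape *s) { return 2 * (s->a + s->b); }

/* The call target is loaded from memory at run time, so this is the kind of
 * call site that compilers usually lower to an indirect call (for example
 * a call through a register on x86-64). */
static size_t length_of(const struct shape *s)
{
    return s->length(s);
}
```

Calling `length_of` on a `struct shape` whose `length` field is `line_length` or `rect_length` goes through the same call site either way, which is exactly why the CPU has to predict where it lands.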

What we want, to mitigate Spectre, is to be able to disable speculative
execution for indirect jumps. The CPU doesn't provide a clean way to do that
directly.

So we just stop using the indirect jump instructions. Instead, we abuse the
fact that "ret" is an indirect jump.

"Call" and "ret" are how CPUs support function calls. When you "call" a
function, the CPU pushes the return address --- the next instruction address
after the "call" --- to the stack. When you return from a function, you pop
the return address and jump to it. There's a sort of "jmp %register" hidden in
"ret".

You abuse "ret" by replacing indirect jumps with a sequence of call/mov/jump,
where the mov does a switcheroo on the saved return address.

The obvious next question to ask here is, "why don't CPUs predict and
speculatively execute rets?" And, they do. So the retpoline mitigates this:
instead of just "call/pop/jump", it does "call/...pause/jmp.../mov/jmp", where
the middle sequence of instructions set off in "..." is _jumped over_ and not
executed, but captures the speculative execution that the CPU does --- the CPU
expects the "ret" to return to the original "call", and does not know how to
predict around the fact that we did the switcheroo on the return address.

How'd I do?

~~~
StavrosK
Pretty well, thanks. What I'm wondering is: The attack is using the data
fetched into the cache from a speculative indirect jump to do a timing attack
and discover what's in the former, correct? Why can't the CPU mark the cache
area it fetched in the speculative jump as "stale" and discard it? Why
wouldn't that fix the problem?

~~~
ahh
I don't know any way to leave enough breadcrumbs to do that in four clock
cycles, do you?

------
contrarian_
Note for a true fix to the BTB poisoning attack you would additionally have to
disable SMT/HT.

See here:
[https://news.ycombinator.com/item?id=16070304](https://news.ycombinator.com/item?id=16070304)

------
Pelam
Maybe some future architecture will allow software to tell CPU which regions
it considers to be secret from the point of view of each other region.

Something like that could allow the CPU to speculate aggressively while
preventing information leak exploits.

~~~
pwg
The CPU hardware already has that feature. It is the VM paging system and the
permissions assigned thereto.

The bug here is that the CPU is not aborting the speculation when fetches
occur to addresses marked as "access denied". Instead the fetch happens and a
line of normally inaccessible memory is put into cache by code that should not
be able to get it read into the cache normally.

One hardware fix would be to plug that hole. Speculative reads get blocked
when they encounter permission denied errors from the paging system and do not
change the cache state. That blocks the Meltdown attack, but not the Spectre
attack.

~~~
Pelam
I thought about that too... AFAIK the paging system is currently not generally
accessible to userland programs like browsers. They would need some way to
set up different contexts for untrusted javascript code and the internal
services that the javascript can call.

Also maybe the context switching would need to be made faster, because you
would need to do that whenever e.g. javascript calls browser interfaces.

------
userbinator
This is horrible, really _really_ horrible. And I'm not talking about the bug
itself, but the mitigation --- which is basically "stop using indirect jump
and call instructions and recompile all your software". The latter is beyond
unrealistic.

It also sets a very bad precedent: I understand people want to mitigate/fix as
much as possible, but this is basically giving an implicit message to the
hardware designers: "it doesn't matter if our instructions are broken,
regardless of how widespread in use they already are --- they'll just fix it
in the software."

~~~
hn_throwaway_99
> it doesn't matter if our instructions are broken, regardless of how
> widespread in use they already are --- they'll just fix it in the software.

What other options are there? It's _hardware_, which cannot be patched. Of
course they will change chip designs going forward, but what else do you
suggest folks do with the billions of chips that exhibit this problem?

------
sempron64
It's noted in the patch that one would have to recompile linked libraries,
which seems impractical, unless a distro decides to build everything with this
flag.

~~~
jacquesm
Not just linked binaries, also the whole underlying OS, and, critically, the
compiler itself. Otherwise you could replace the 'proofed' construct with one
that is not proofed against the bug.

~~~
JDevlieghere
Why would you need to recompile the compiler? Both variants only provide read
access.

~~~
jacquesm
Ah right, of course. Sorry, in the midst of doing a pile of stuff I should not
be commenting on this without studying it further, I figured that the first
level read access would allow you to dig up the secrets required to give you
write access which would then allow you the free run of the whole system, but
if you are still on the other side of a virtual machine then that won't do any
good unless that virtual machine can be escaped as well.

------
strongholdmedia
As Alex Ionescu has put it:

> We built multi-tenant cloud computing on top of processors and chipsets that
> were designed and hyper-optimized for single-tenant use. We crossed our
> fingers that it would be OK and it would all turn out great and we would all
> profit.

> In 2018, reality has come back to bite us.

This is the root of all the problems.

------
crb002
This was the fix I was going to suggest. Especially with AVX leakage.

Right now many function calls don't safely wipe registers and the new side
channel caches found in Spectre. There really needs to be two kinds of
function calls. Maybe a C PRAGMA?

The compiler has parent function call wiping as a flag; the code has pragmas
that over-ride the flag.

------
okneil
The site is down for me. HN hug of death?

~~~
hultner
It was a bit slow but eventually loaded for me.

~~~
mayoralito
Yeah, same thing happened to me... slow as hell but I guess it's common due
to the severity of the issue. Everyone wants to see this at the same time.

------
lousken
what about performance impact after new CPU architecture arrives? how is that
going to work?

------
eptcyka
Mill can't come soon enough.

~~~
mike_hearn
What makes you think the Mill would be immune to these issues?

~~~
eptcyka
Mill has no speculative execution.

~~~
phs2501
Uh, yes it does. It has no /out of order/ execution but it certainly has
speculative execution. They mention quite often in talks how they predict EBB
exits and not each branch. They even go so far as to follow that EBB exit
chain to speculatively load code from DRAM several calls ahead, which is much
more speculation than current CPUs are capable of.

You basically can't make a deeply-pipelined processor fast without speculative
execution.

------
silimike
If this were 15 years ago, I'd say the site was SlashDotted.

------
andrewmcwatters
In other news, Intel has found that by not using a computer at all, though
performance overheads increase 100%, this counter-measure does secure any
previously available attack vectors.

