>> These attacks leak information through micro-architectural side-channels which we show are not mere bugs, but in fact lie at the foundation of optimization.
So we'll need to have non-speculative execution for cloud CPUs and stronger efforts to keep untrusted code off our high performance CPUs. This may even lead to chips with performance cores and trusted cores.
No. The paper notes that Spectre can, and will in the future be able to defeat all programming language level techniques of isolation. With properly designed OoO, Spectre cannot defeat process isolation. The fundamental lesson that everyone must take to heart is that in the future, any code running on a system, including things like very high-level languages running on interpreters, always have full read access to the address space of the process they run in. Process isolation is the most granular security boundary that actually works.
Or in other words, running javascript interpreters in the same address space as where you manage crypto is not something that can be done. Running code from two different privilege levels in the same VM is not something that can be done. Whenever you need to run untrusted code, you need to spin up a new OS-managed process for it.
So this is something that I've never gotten a full answer to: what is the difference between a "thread" and a "process" in this model?
This isn't a facetious question. A thread is just, at its core, a process that shares memory with another process. (In fact, this is how threads are implemented on Linux.) But all, or virtually all, processes also share memory with other processes. Text pages of DLLs are shared between processes. Browser processes have shared memory buffers, needed for graphics among other things.
What separates processes that share memory from threads that share memory regarding Spectre? Is it the TLB flush when switching between processes that doesn't occur between threads? Or something else?
For meltdown (spectre v3, iirc) It's not so much sharing memory as sharing address space. Processes have different page tables. Threads within a process share page tables.
For spectre v1 and v2, right now (on existing hardware) mostly nothing separates threads from processes. In the future, process isolation is a good candidate for designing hardware + system software such that different processes are isolated (via partitioning the caches, etc).
You probably still want threads within a process to share cache hits.
So, if that's true, why is Chrome considered to have solved Spectre? Browser content processes from different domains share some memory. Moreover, if process boundaries don't have any effect on the branch predictor on current hardware, then why is process separation relevant at all? Doesn't all this mean Spectre is still an issue?
I guess I jumped the gun a bit in my comment above.
In terms of the possibility of exploit, as I understand there isn't at this point any isolation between processes.
In terms of the ease of exploit, being able to run untrusted code in the same process as the victim helps quite a bit. Otherwise, you have to find a gadget (i.e. qualifying bounds check for v1, indirect branch for v2) in the victim process that you can exploit from the attacker process. Possible, but quite a bit harder than making your own gadget.
This all ignores the forward looking reasons process isolation is a good idea. I can't keep track of the latest mitigations in Linux, but they pretty much all will only help between processes by flushing various hardware data structures. And hopefully someday we will have hardware actually designed to restore the guarantees of isolation between processes.
I'm pretty sure this is accurate, but I'm just a random guy on the internet so don't trust my word for it too much.
It's not really about process isolation then, but the amount of control untrusted code can have over a process. Which means if everything that code can do is masked to some part of the process, it should be able to achieve the same isolation between such subprocesses but within the OS process boundaries. Although the paper claims this is too hard.
Chromium does not fully solved Spectre. It is still too expensive to run one process per domain, so many unrelated pages run in the same process. But Chromium contains a few mitigations that makes exploiting Spectre from JS much harder.
The threat model is: A code triggering spectre v1 gets read access to the entire address space currently mapped in. ^*
Since process boundaries are enforced by not mapping any ram not usable by the process, this means they don't get violated by spectre v1. If you have two threads which only share part of their address space, the unshared part is protected. Any executable or library mapped into multiple processes is readable from any of them.
^*: With modern cpus, multiple processes can be mapped in simultaneously using ASIDs, however this doesn't matter because they work as they should and properly isolate the processes. You can just assume the model "only one process is mapped at a time".
Your description implies the existence of another mitigation. Namely: When you enter untrusted code you mprotect() all sensitive areas and remove PROT_READ. When exiting the untrusted code you add the permissions back.
Are you sure that works? As I understand it, the issue with Spectre is the branch predictor, not the memory mappings. The reason why process isolation works is that branch prediction gets reset on context switch (or that this will happen on newer generations of hardware in the future).
Mprotect should in fact work, but it is likely more expensive than actual process separation. Resurrecting segments or using virtualization hardware in userspace (see libdune) might be workable solutions.
The issue is that speculation allows bypassing software enforced bound checking, but, discounting meltdown, the hope is that hardware can still enforced them.
mprotect does not issue a memory barrier (mfence), so whilst theoretically protected it is practically delayed and can still be read from the cache via sidechannels.
Same issue with the unsafe bzero call. A compiler barrier is not safe enough to delete secrets.
Mprotect should work because even under speculation the CPU shouldn't allow a read to an invalid address to be executed. Meltdown shows that some CPUs speculate even this sort of checks, but it seems that it is not inherently required for an high performance implementation.
Most modern architectures make extensive use of PC-relstive instructions for branches and load/store. That means when rebasing a binary you just need to modify the pointers in the data segment (things like GOT entries, etc) and can leave the text untouched.
> With properly designed OoO, Spectre cannot defeat process isolation
It's worth noting that no existing or announced common hardware is "properly designed" according to this condition. Even the "fixed" Intel hardware that's been announced is still vulnerable to spectre v1 across process boundaries.
> It's worth noting that no existing or announced common hardware is "properly designed" according to this condition. Even the "fixed" Intel hardware that's been announced is still vulnerable to spectre v1 across process boundaries.
AMD Zen is.
Spectre v1 (bounds check bypass) only works inside processes. All it allows you to do is to read any memory location currently mapped into your address space, and so it gives anything that can execute code complete read access to the address space of the process it's running in. On Intel CPUs, this also allows reading the kernel address space, unless kpti is used. Eventually, the ability to read kernel memory will be removed, and so kpti becomes unneccessary.
On all AMD post-BD cpus, spectre v1 cannot be used to read kernel address space.
All the rest of spectre (and meltdown) can eventually be fixed, but it is effectively impossible to make a cpu that is both fast and doesn't exhibit spectre v1.
Regardless of the accuracy of your claims regarding spectre v1, I'd like to see a source saying that spectre cannot defeat process isolation on AMD Zen. I've found a lot of sources that don't support that, and none that do. The closest thing I've read is a statement that Zen 2 will have some mitigations for spectre.
So, I went and read the whole lkml threads you linked and if I understood correctly, regarding spectre v1, the kernel is only expected vulnerable to bpf based attacks or similar. As far as I understand, the speculation barriers are used to protect arrays directly accessible by bpf programs.
There is a mention of out of process attacks to other userspace programs, but no details.
By carefully crafting inputs, I'm ready to admit that it might be theoretical ly possible to attack some exploitable branches, but the big deal with spectre is the high bandwidth that can be attained by directly running code in process.
Do you have any pointer to any description of an even remotely practical out of process spectre v1 attack that doesn't involve executing code in process? Repurposing an interface that is not meant to be used to run code (I e. Build your own VM) is fair game.
> If you read the papers you need a very specific construct in order to not
only cause a speculative load of an address you choose but also to then
manage to cause a second operation that in some way reveals bits of data
or allows you to ask questions.
> BPF allows you to construct those sequences relatively easily and it's
the one case where a user space application can fairly easily place code
it wants to execute in the kernel. Without BPF you have to find the right
construct in the kernel, prime all the right predictions and measure the
result without getting killed off. There are places you can do that but
they are not so easy and we don't (at this point) think there are that
many.
> The same situation occurs in user space with interpreters and JITs,hence
the paper talking about javascript. Any JIT with the ability to do timing
is particularly vulnerable to versions of this specific attack because
the attacker gets to create the code pattern rather than have to find it.
---
> big deal with spectre is the high bandwidth that can be attained by directly running code in process
That depends on your perspective. If you are an OS developer who strives to guarantee process isolation, than it is a pretty big deal that spectre v1 allows you to read memory from the kernel or from other processes, even if it might be tricky to do so. If you write a JS JIT, then yeah you are probably most concerned about the single-process case.
> remotely practical
IMO, most spectre attacks are not remotely practical. No, I don't have a pointer. The only actual demonstrations of spectre I've seen is the one included with the original paper (single process).
I'm not aware of a spectre v1 attack for the altair 8800, or indeed any of the CPU's of that era, Z80, 6502, 8086/8....
But then, things moved on, standard ways to add more cache with multiple cores was lapped up and later on we found a design flaw that echoed back for a decade or more upon all these multi-core CPU's.
Though in fairness and to put some context upon all this, CPU design is more complex than writing TAX laws. Yet we have exploits for TAX laws appearing and used all the time by large corporations. Whilst the comparison is not ideal and some would say, unfair. It does highlight that nothing is perfect and what we may class as perfect today (or darn close), could and may very well be classed as swiss cheese in the future. It gets down to how far away that future is.
After all, we still have encryption utilised that we have (on paper) shown to be flawed to future quantum CPU's!
But in a World that was aware of Y2k decades before the event, the penchant of business to drive everything to the last minute for profit will always be a factor in advancements. After all, if CPU cores had isolated caches instead of sharing, then that would mitigate so many issues, yet it would cost more to make and most consumers would not appreciate the extra cost for what to them is little value in return above and beyond the cheaper solution. That's business for you and CPU's are made by them for profit.
>> The paper notes that Spectre can, and will in the future be able to defeat all programming language level techniques of isolation.
That's why I said we need trusted cores - i.e. ones that don't implement speculative execution or share cache with other cores. Untrusted code needs to be run in physical isolation, not just virtual isolation.
But the real solution to all of this is not to run untrusted code at all. This raises the question of how we come to trust the code we run. The simplest and most obvious thing we need to do is disable javascript. I mean how can you possibly trust code that came in a 3rd party payload used for advertising? How can you trust anything from Facebook? Or any of them? The answer is that you can't and in may cases should not.
What about languages that prevent the program from reading the current time or determining whether some local computation finished before or after some external event? (I'm thinking of something like Haskell code that's executing outside of the IO monad.)
Perhaps we just need to have a more restricted idea about what untrusted code is allowed to do.
In concept this could work. In practice, you bassicly need to design the language (or at least compiler and runtime) with this in mind from the beggining, otherwise you might forget about some backdoor.
Eg, if you do Haskell and just verify the function you are running is not in the IO monad, you might miss some usage of UnsafePerformIO. Even if you check their code, if you let them specify dependencies they might manages to sneak a buggy use of UnsafePerformIO into a library they submitted to hackage.
Plus, your restriction is essentially: no clock, no contact with the outside world, no threading, and a carefully considered interface to the host program to prevent time leaks.
By "something like Haskell code executing outside the IO monad", I meant something like Haskell with the obvious backdoors turned off. (Ghc has a "safe haskell" option to disallow things like unsafePerformIO.)
Disallowing direct access to the outside world is a big restriction, but it may be that a lot of the things you'd want to do inside a sandboxed application that aren't safe could be delegated to trusted code through an appropriate interface.
Threading isn't necessarily a problem; the Haskell Par monad for instance should be fine as there is no program-visible way to know which of two sub-tasks executed in parallel finished first.
I looked that up in my copy of "Parallel and Concurrent Programming in Haskell" by Simon Marlow, and indeed you're right -- the runPar function shouldn't let you return an IVar, but it does. This is planned to be "fixed in a future release", but the current online documents say about the same thing.
Presumably, this could be fixed easily by using the phantom type trick (same as ST) but it would make the type signatures ugly and possibly break existing code. (Maybe there's a more modern alternative to phantom types?) So, yeah, you might not want to use the Par monad as it's currently implemented in ghc as your secure parallel sandbox.
The online docs suggest using lvish if you want a safer Par monad interface, which I'm not familiar with (though the lvish docs say that it's not referentially transparent if you cheat and use Eq or Ord instances that lie).
The general idea seems sound, though -- it should be possible to have parallelism in a sandbox environment without allowing the sandboxed program to conditionally execute code based on which of several threads finished some task first.
I point isn't that it is impossible in theory. It is just that, in practice, we do not have a good track record of locking down pre existing languages. As much as Haskell is one of the more idealogically pure languages, its ecosystem is still written under the general assumption that its programmers are not malicious geniuses.
I think the problem is that runPar is currently allowed to return an IVar to the code that called it, which could then pass that IVar into a different runPar invocation. That's not something the runtime system expects, or the type system should allow.
Putting aside the fact that I don't think you're actually disagreeing with the parent, I don't think it's really accurate to say that all programming level techniques of isolation are defeatable. The paper enumerates a specific set of vulnerable language features (e.g. "Indexed data structures with dynamic bounds checks"). The paper doesn't say much about languages lacking these features. Also, I believe retpoline is a (nearly?) perfect software fix for Intel's microarchitecture.
>> Processor affinity would become a security-influenced decision with a CPU like that.
Yes it would.
>> That would allow for a whole new class of bugs.
I think it's necessary, but not easy.
Actually what I think is necessary is for people to stop running code from random places - or even common places. Google could work without running stuff on my machine.
I hope they still offer CPUs optimised for the non Cloud scenario, where all the programs running are trusted and so these sorts of attacks (which requires local access) are not applicable.
Outside of some very niche scenarios, does this usecase even exist? Certainly nothing running a javascript enabled browser, electron app, or in general any VM of any sort qualifies.
All the VMs run on my employer's servers are running code we trust. None of them run arbitrary code from some third party, because we're not a cloud provider, nor are they used to browse the web. I don't want to slow them down to mitigate vulnerabilities that just aren't a serious risk or even applicable.
Sure. I’m comfortable describing HPC and on-premises fully trusted computing (another response) as “very niche scenarios” though (compared to the much much much larger markets of large cloud farms on the server side, and consumer devices), to the point where I have to wonder whether or not it’s worth it for CPU vendors to cater specific SKUs to them without the silicon mitigations.
How about CPUs without speculative execution and simultaneous multithreading (SMT / Hyper-Threading, which has similar issues)? We would, of course, need other optimizations to claw back the performance loss--an engineering problem I feel we can solve.
I've wondered if the solution is more, simpler cores. We concentrate on smaller, faster cores, and the programming to utilize them better. Perhaps advances in memory architectures as well. Hardware isn't my specialty, so I'm just brainstorming here.
Perhaps this is where ARM and even RISC-V based systems can step in.
But I'm a software guy, so what do I know? I just know I'd feel more comfortable with systems based on simpler CPUs that just cannot be exploited by the recent side-channel attacks discovered, rather than trying playing whack-a-mole with patches, along with trying to reason when it might be safe to use CPUs with these optimizations.
A few things: ARM and RISC-V definitely have specEx baked in (though you can not include SpecEx module on RISC-V). There are interesting alternatives to SpecEx. DSPs use delay slots, and I've seen delay slots used quite well in a GP-CPU. Getting high instruction saturation on a CPU with delay slots is a "hard compiler problem", but I have a few things to say about that:
Despite jokes about "better compilers", compilers are getting better (e.g. polyhedral optimization). One way to think of what OOOex/SpecEx is that it's figuratively the CPU JITting your code on the fly. The most popular programming language JITs aggresively anyways so one wonders if there isn't some reduplication going on.
Furthermore, the most popular programming language isn't entirely the most raw-power performant, and it's pretty clear that in our current ecosystem just pushing operations through the FPU (which is what x86 optimizes for) isn't necessarily the most important thing in the world; uptime, reliability, fault-tolerance, safe paralellization, distribution, and power conservation might be more important moving forward.
HM, oops, apparently RISC-V has OOOEx, not SpecEx.
I understand this is nitpicking, but it's not accurate to say "RISC-V has speculative execution" or "ARM has OoO execution" and that they therefore suffer from spectre and friends.
RISC-V/ARM are specifications of instruction sets, for which there exists an enormous domain of possible implementations. Spectre/Meltdown are not inherent features of Instruction set architectures. They are emergent properties of certain implementations of those instruction set architectures.
For example, the BOOM implementation of RISC-V does out of order execution. The Rocket chip implementation does not. Both implement the RISC-V architecture.
I'm not replying to you specifically. But I see this sort of thing on HN all the time and I feel like it's an important distinction to make.
Thank you, I should have been more careful. Spectre and meltdown are in fact specific interactions that happen because OOO and specex are hard and it's easy to mess up given the high level of statefulness and complexity in contemporary chip designs (in this case - memory caching). But ooo and specex make chip architectures difficult to reason about and I'm sure more errors will emerge.
> Despite jokes about "better compilers", compilers are getting better
The compiler has to make static decisions. The hardware knows what is actually happening. There is an inherent information asymmetry at work that a "sufficiently smart" compiler seems unlikely to overcome.
My intuition says software can't beat the speed of a superscalar OOO CPU anymore than a GP CPU can beat a roughly equivalent DSP for algorithms suitable to run on the DSP, but I have no proof for that.
I'll also note that we've been promised "smarter compilers" for decades. Intel has tried that route several times. No one has ever made it work.
> My intuition says software can't beat the speed of a superscalar OOO CPU anymore
How good is good enough? I mean we have distributed tensor flow which is basically on the fly compilation that can reorganize your computational graph around nodes with gpus separated by network latency, or Julia where you can drop in a GPUarray as a datatype and move computation to the GPU without changing your code.
If we go to something a bit more baroque, java is within 1.5 of c/c++ these days
Could you hand roll a better solution? Probably. Would it be worth it? Doubtful.
I think it's definitely worth exploring this angle because modern JIT compilers have become very advanced, and there's still a lot of juice left to squeeze there. Look at some of the things Graal is doing and it looks a lot like what OOO speculation is doing - it'll recompile branches on the fly based on profiling information and things like that.
It is a natural alternative. The "simpler but more cores" project works on paper (i.e. potential instruction throughput). In reality it falls apart for variety of reasons. The most fundamental is because of
the difficulty of exploiting thread-level parallelism. The complex Out-of-Order cores do a really good job of improving throughput by finding independent instructions to execute in parallel. The path-to-parallelism is much easier at the granularity of instructions than at the granularity of cores. Parallel programming is hard. Amadahls law cannot be avoided except through ----- speculation, so we are back to complexity again.
> In reality it falls apart for variety of reasons.
On the contrary, the "many wimpy cores" approach has been very successful on GPUs (and other vector processors such as TPUs, etc.) What it hasn't been successful at is running existing software unmodified.
The best solution (the one we use today, in fact) seems to be a hybrid approach. We have a few powerful cores to run sequential code and many weaker units (often vector units; e.g. GPUs and SIMD) to run parallel code. Not all algorithms can be parallelized, so we'll likely always need a few fast sequential cores. But a lot of code can.
Amdahl's law for sure. We all know most software sucks, so in this vision (simpler cores), that reduces us to taking the performance hit and optimizing elsewhere. For example, older ARM-based systems do not use speculative execution, and can handle encryption and video transcoding though co-processors.
Side-channel attacks are some of the most terrifying to me, because fundamentally your code works fine, it's just some detail in the timing that gives away information. At least with buffer overflows there are automated ways to try and find them, like fuzz testing. It's not perfect, but at least it's something to try and validate you are doing the right thing. Is there some sort of fuzz testing that could potentially find side-channel timing attacks? Maybe some sort of statistical analysis on the timing results? I found this https://arxiv.org/pdf/1811.07005.pdf but it seems to be fairly recent, and I don't know how mature this area is.
One of the interesting aspects of quantum computing is that it upgrades these issues into actual bugs. Bit flips are errors in the Z basis, and phase flips (information leaks) are errors in the X basis. [1][2]
Trying to build large scale quantum computers could lead to styles of optimization that provably don't have side channel issues.
Has there been an in-the-wild exploit found for many of these micro-arch side channel attacks?
I still have reservations that Specter can be actually exploited, and certainly not by JavaScript running in a browser it seems (even without the timing fixes -- just too much noise in the system to get a real world exploit). All the proof of concepts I saw effectively needed a running start (and a much lower bar to clear). I'm still not sure if we are just making too much out of many of these attacks.
Also, many of these exploits require running native code to really even be possible, and a JIT offers a large degress of protection. Most of the machines I use, if you are already able to run native code, the battle has already been lost.
I think the security community can sometimes have Chicken Little syndrome. I most definitely wouldn't want performance enhancing techniques that might have side-channel vulnerabilities to not be produced or used because they might be exploitable in a use case (eg, web serving and rendering) that doesn't fit most others (eg, internal servers).
> I still have reservations that Specter can be actually exploited
It's not easy, and many systems are more easily broken into by other means (e.g., unpatched known-vulnerable programs). But people are already demonstrating exploits; it's not theoretical.
> and certainly not by JavaScript running in a browser it seems (even without the timing fixes -- just too much noise in the system to get a real world exploit).
Your faith is unwarranted. The paper itself evaluated JavaScript as implemented by v8, which is used by Chrome: "as part of our defensive work, we implemented a number of the described mitigations in the V8 JavaScript virtual machine and evaluated their performance penalties. As we’ve noted, none of these mitigations provide comprehensive protection against Spectre, and so the mitigation space is a frustrating performance / protection trade-off."
Also, modern statistics is pretty amazing at removing noise. Don't make claims about "too much noise" until you've tried them.
> Most of the machines I use, if you are already able to run native code, the battle has already been lost.
Perhaps, but many others are in a different circumstance.
We've moved to a world where people run on clouds (shared machines with your enemy) and automatically run JavaScript provided by malicious parties. This isn't hopeless. You can run on clouds but run on temporarily-isolated CPUs so there's no leak. A lot of applications can be easily written to work without JavaScript (though many can't, and many developers have no idea that this might be desirable). For many, providing confidentiality when the CPUs always leak is a real problem.
> I think the security community can sometimes have Chicken Little syndrome.
That's certainly true. These are not easy to exploit today, especially given some mitigations. But think of these as warning shots - these are problems for which there is NO practical full mitigation today. It would be unwise to ignore the problem. Attackers are likely to gradually get better at exploiting this over time, and if that happens, the only solution available is to NOT share CPUs running programs (disable JavaScript and use more-expensive isolated-CPU clouds) OR replace most CPUs with CPUs that do not yet exist.
> I most definitely wouldn't want performance enhancing techniques that might have side-channel vulnerabilities to not be produced or used because they might be exploitable in a use case (eg, web serving and rendering) that doesn't fit most others (eg, internal servers).
Understandable, and that may be just fine for your situation. However, most systems try to use least privilege to reduce damage (e.g., if you break into one component, you don't get everything). This attack makes it easier to breach such isolation - so even in that case, you might have reason to be concerned.
The sky is not falling, at least not today. But there are reasons to be concerned.
As part of our offensive work, we developed proofs of concept in C++, JavaScript, and WebAssembly for all the reported vulnerabilities. We were able to leak over 1KB/s from variant 1 gadgets in C++ using rdtsc with 99.99% accuracy and over 10B/s from JavaScript using a low resolution timer.
What are you basing this unearned confidence on? We've seen this same pattern repeat thousands of times:
1. This might be a vulnerability in theory but there's no POC
2. Well it's just a POC, no one has actually weaponized this.
3. Sure someone weaponized this but it is very difficult to pull off.
4. Oh the script-kiddie toolkits can build exploits for this automatically. Who could have seen it coming?!?
(Step #2 is probably where the various governmental spying agencies start deploying the exploit in targeted ways)
Perhaps this time things are different but it seems unlikely.
Reading 10B/sec from JS, which is under ideal conditions, does not sound like a fast path to exploitation. I agree it is a concern, but I think it is overblown, similar to pcwalton below.
There has not been an instance of a real world exploit being run (e.g, you don't know what memory address you are attacking and the system is running under a real world load), and I saw a release by a con sec company (can't remember which one) that said they haven't seen any indication anybody was trying this in the wild either.
There is so much that needs to line up to get Specter to work in Javascript that I'm at the point that I don't think it is possible.
Everybody keep saying that the POC shows it is possible and that a real world exploit is easy after that. But Specter has been around for a year now, and nobody has yet to even get close to make a working exploit that didn't require some large leaps of faith.
And every time you bring this up, you get voted into the floor in HN. I get it. Specter is very cool, almost magical, but coolness doesn't make it any more likely to work.
I agree that Spectre is somewhat overblown. It's something that we should be working on mitigating, to be sure, but it is very difficult to exploit. Memory safety issues, in practice, are significantly easier to find, so that's what attackers go for. We're not in a "drop everything, all hands on deck" situation.
So we'll need to have non-speculative execution for cloud CPUs and stronger efforts to keep untrusted code off our high performance CPUs. This may even lead to chips with performance cores and trusted cores.