What about devices with older processors? I'm still running a Sandy Bridge rig and it works fine, except for the side-channel vulnerabilities. It's probably not going to be patched. I also have a cheaper computer with a Skylake processor, which is newer yet still vulnerable!
It's only a matter of time until something really nasty comes along, making all these PCs dangerous to use. What then? Lawsuits?
My questions are only partially rhetorical.
The important thing to realize is that speculation and caching and such were invented for performance reasons, and without them, modern computers would be 10x-100x slower. There's a fundamental tradeoff where the CPU could wait for all TLB/permissions checks (increased load latency!), deterministically return data with the same latency for all loads (no caching!), never execute past a branch (no branch prediction!), etc., but it historically has done all these things because the realistic possibility of side-channel attacks never occurred to most microarchitects. Everyone considered designs correct because the architectural result obeyed the restrictions (the final architectural state contained no trace of the bad speculation). Spectre/Meltdown, which leak the speculative information via the cache side-channel, completely blindsided the architecture community; it wasn't just one incompetent company.
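To make the cache side channel concrete, here's a minimal timing sketch (my own illustration, not from any of the papers; assumes x86-64 with GCC/Clang intrinsics). The entire channel is the latency gap it prints: a load that hits the cache takes tens of cycles, one that was just flushed takes hundreds.

    /* Minimal cache-timing sketch: time one load when the line is
       cached vs. just after clflush. Build with: cc -O1 timing.c */
    #include <stdio.h>
    #include <stdint.h>
    #include <x86intrin.h>

    static uint64_t time_access(volatile uint8_t *p) {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);  /* timestamp before the load */
        (void)*p;                      /* the load being timed */
        uint64_t t1 = __rdtscp(&aux);  /* timestamp after the load */
        return t1 - t0;
    }

    int main(void) {
        static uint8_t buf[64];
        buf[0] = 1;                    /* touch it: line is now cached */
        printf("cached:   %llu cycles\n",
               (unsigned long long)time_access(buf));

        _mm_clflush(buf);              /* evict the line from the caches */
        _mm_mfence();                  /* make sure the flush completed */
        printf("uncached: %llu cycles\n",
               (unsigned long long)time_access(buf));
        return 0;
    }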
The safest bet now for the best security is probably to stick to in-order CPUs (e.g., older ARM SoCs) -- then there's still a side-channel via cache interference, but this is less bad than all the intra-core side channels.
These vulnerabilities and Meltdown allow untrusted code to speculatively access data that it shouldn't be allowed to access at all, and to use that speculative access to leak the data itself. Unlike Spectre, this can be (and to some extent has to be) fixed at the hardware level, because the hardware itself is failing to protect sensitive data. This class of vulnerability seems to have been mostly Intel-exclusive so far (with the main exception being one unreleased ARM chip that was vulnerable to Meltdown). There's nothing inherent about modern high-performance CPUs that requires them to be designed this way.
Edit: This slipped my mind, but Foreshadow / Level 1 Terminal Fault was yet another similar Intel-only processor vulnerability that allowed speculative access to data the current process should not be able to access. It's definitely a pattern.
Assuming by "designed this way" you mean: to speculatively execute past security checks, I'd disagree.
I'd say the relevant performance measure for CPUs (as opposed to other kinds of processor) is the speed at which they can execute serial operations. As electronic performance improvements offer increasingly marginal gains, we need to resort to improved parallelism. When operations are needfully serial due to dependency, as are security checks, the only way to accelerate that beyond the limits of the electronics is to make assumptions (speculations).
It's not inherently wrong to do this, but it requires that speculations never have effects outside of their assumptive execution context.
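For concreteness, this is the shape of the classic Spectre v1 "bounds check bypass" gadget from the public write-ups (the array1/array2 naming follows the Spectre paper; this is just the gadget shape, not a full PoC). The branch is the security check, and with a mistrained predictor the two dependent loads run anyway, leaving a secret-dependent line in the cache even though the architectural result is discarded.

    #include <stdint.h>
    #include <stddef.h>

    uint8_t array1[16];
    size_t  array1_size = 16;
    uint8_t array2[256 * 4096];  /* probe array: one page per byte value */
    volatile uint8_t temp;       /* sink, keeps the load from being elided */

    void victim(size_t x) {
        if (x < array1_size) {            /* the check speculated past */
            uint8_t secret = array1[x];   /* speculative out-of-bounds load */
            temp = array2[secret * 4096]; /* cache footprint encodes secret */
        }
    }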
Also, no need for older SoCs. High end ARM chips have both in-order and out-of-order cores but the cheaper ones have in-order ones only. A Snapdragon 450 is pretty modern and doesn't speculate deeply enough to be vulnerable to Spectre.
In the x86 space, Meltdown absolutely was down to one company apparently deciding to over-optimize for performance.
I can't find it now, but I remember reading a thread from (I think) the OpenBSD devs about how the Intel MMU documentation described fairly sane behaviours and how far the reality deviated from the documentation.
Serious weasel wording: outside the x86 space, every other high-end architecture also had Meltdown issues, including ARM and IBM's POWER and mainframe designs.
ARM is actually a good target: a number of their newest designs use out-of-order speculative execution with Spectre vulnerabilities (one of the newest is even vulnerable to Meltdown), and they own the mobile space outside of notebooks, though the significant headline-worthy instances tend to come in much more locked-down devices. Speed is also an issue: everything else being equal, the faster the chip, the faster data exfiltration proofs of concept will work.
That said, some newer Intel LLCs are non-inclusive, and AMD changed its cache relations as well in Ryzen.
Whether you run security checks in sequence or in parallel with a data access is pretty fundamental to the design of a core. Doing the latter has performance advantages, but it's really hard to verify that there won't be any architectural leaks as a result, let alone the micro-architectural leaks that nobody was thinking about.
I wondered aloud if it wouldn't be better for us to embrace NUMA and make the bigger caches directly addressable as working memory instead of using them as cache.
Per Wikipedia, "The Pentium has two datapaths (pipelines) that allow it to complete two instructions per clock cycle in many cases. The main pipe (U) can handle any instruction, while the other (V) can handle the most common simple instructions." (https://en.wikipedia.org/wiki/P5_(microarchitecture) )
HT uses two instruction decoders to keep a set of execution engines busier. Suppose you had the above set of two execution datapaths (which may be improved on these Atoms) and one instruction stream couldn't do two operations at a time; the other stream could then use the unused datapath.
out-of-order execution != speculative execution.
It is possible to have OoO without speculative execution. On the other hand, they do tend to come as a pair, since they both utilise multiple execution units; for instance, the Intel Pentium in 1993 was the first x86 to have OoO or branch prediction (the 486 and those before were scalar CPUs).
That's technically correct (the best type of correct!), but without speculation, the CPU can't see beyond a branch, so the window in which the machine can reorder instructions is very small. (For x86 code, the usual rule of thumb is that branches occur once per 5 instructions.) So in practical terms, an OoO design (in the restricted-dataflow paradigm, i.e., all modern general-purpose OoO cores) always at least does branch prediction, to keep the machine adequately full of ops.
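A toy way to see how much the predictor matters (my own sketch; exact numbers vary by core): the same loop over the same data runs several times faster once the data is sorted, purely because the branch becomes predictable.

    /* Same work, same branch: unpredictable when the data is random,
       nearly free once the data is sorted. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static int cmp(const void *a, const void *b) {
        return *(const int *)a - *(const int *)b;
    }

    int main(void) {
        enum { N = 1 << 20 };
        static int data[N];
        for (int i = 0; i < N; i++) data[i] = rand() % 256;

        for (int pass = 0; pass < 2; pass++) {
            if (pass == 1) qsort(data, N, sizeof data[0], cmp);
            long long sum = 0;
            clock_t t0 = clock();
            for (int rep = 0; rep < 100; rep++)
                for (int i = 0; i < N; i++)
                    if (data[i] >= 128)  /* ~50/50 branch until sorted */
                        sum += data[i];
            printf("%s: %.2fs (sum=%lld)\n",
                   pass ? "sorted" : "unsorted",
                   (double)(clock() - t0) / CLOCKS_PER_SEC, sum);
        }
        return 0;
    }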
An interesting question, though, given that we're going to do speculation, is what types of speculation the core can do. The practical minimum is just branch prediction, but modern cores do all sorts of other speculation in memory disambiguation, instruction scheduling, and the like, some of which has enabled real side-channel attacks (e.g., Meltdown).
Also, FWIW, Pentium was 2-way superscalar, but not OoO in the fully general dataflow-driven sense; the Pentium Pro (aka P6), in 1995, was the first OoO. (Superscalar can technically be seen as OoO, I suppose, in that the machine checks dependencies and can execute independent instructions simultaneously, but it's not usually considered as such, as the "reordering" window is just 2 instructions.)
OTOH, speculation without OoO is not only possible but in fact very common: for example, the majority of non-ancient in-order CPUs.
In fact the original Pentium, contrary to your statement, was an in-order design (the Pentium Pro was the first Intel OoO design). Also, I believe the 486 already had branch prediction. The Pentium's claims to fame were being superscalar (i.e., it could in some cases execute two instructions per cycle), a better FPU, and a better-pipelined ALU (I think it had a fully pipelined multiply, for example).
To the extent I've looked at it (without reading the original documents), the original OoO design that current systems are based on, the IBM System/360 Models 91 and 95's floating point unit using Tomasulo's algorithm (https://en.wikipedia.org/wiki/Tomasulo_algorithm), didn't extend to speculative execution.
No doubt because gates were dear, implemented with discrete transistors, and that processor was a vanity project of Tom Watson Jr.'s. And memory was slow: core, except for NASA's two Model 95s with 2 MiB of thin-film memory at the bottom of the hierarchy, and cache was still a couple of years out, introduced with the 360/Model 85. OoO becomes compelling when you have a fast and unpredictable memory hierarchy, as we started to have in a big way in the 1990s when we returned to this technique.
I can't find any evidence to support that 
I suppose it's technically possible to have branch prediction on a scalar processor, but I imagine it would not be hugely beneficial.
Edit: the Pentium did have a significantly more sophisticated predictor, of course, although not without flaws.
The problem is that single-thread performance is really important for a lot of workloads, because (i) parallelization is hard, (ii) even for parallelized workloads, serial bottlenecks (critical sections, etc.) still exist, and (iii) latency is often important too (one web request on one core of a server, or compiling one straggler extra-large file in a parallel build, for example).
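Point (ii) is essentially Amdahl's law. A quick sketch (the 10% serial fraction is an assumption for illustration, not a measurement) shows how hard that ceiling is: with 10% serial work, 64 cores buy you less than 9x, and no number of cores gets past 10x.

    /* Amdahl's law: speedup = 1 / (s + (1 - s)/n) for serial fraction s
       and n cores. The 0.10 below is an illustrative assumption. */
    #include <stdio.h>

    static double amdahl(double serial_fraction, int cores) {
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores);
    }

    int main(void) {
        for (int n = 1; n <= 64; n *= 2)
            printf("%2d cores: %.1fx speedup\n", n, amdahl(0.10, n));
        /* limit as cores -> infinity is 1/0.10 = 10x */
        return 0;
    }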
So it would be safest to execute them on separate CPUs not sharing a common cache, e.g. pinning them to different CPU sockets on a multi-socket machine, or to different physical machines altogether.
This may still be faster than running on old ARMs.
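If you do go the pinning route on Linux, sched_setaffinity is the mechanism. A minimal sketch (the CPU number 0 is illustrative; map CPU numbers to sockets with lscpu on the actual machine):

    /* Pin this process to one CPU so an untrusted workload stays on
       one socket's caches, away from sensitive workloads elsewhere. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    static int pin_to_cpu(int cpu) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        /* pid 0 = this process; returns -1 and sets errno on failure */
        return sched_setaffinity(0, sizeof set, &set);
    }

    int main(void) {
        if (pin_to_cpu(0) != 0) { perror("sched_setaffinity"); return 1; }
        puts("pinned to CPU 0");
        return 0;
    }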
I wonder if dedicated cloud boxes, where all VMs you spin up on a particular physical machine belong only to your account, will become available from major cloud providers any time soon. In such a setup, you don't need all the ZombieLoad / Meltdown / Spectre mitigations if you trust (have written) all the code you're running, so you can run faster.
I should add that AWS has "dedicated instances" which are exclusive to one customer. They are more expensive than standard instances but they are used by e.g. the better financial services companies for handling customer data.
Datasets, yes, but what about instructions from shared libraries?
I'll make a wild guess that this being seriously beneficial would be limited to uncommon server configurations...
- Same OS, server: you likely control all the processes in it anyway, no untrusted code.
- Different OSes in different VMs, server: they likely run different versions of Linux kernel, libc, and shared libraries anyway. They don't share the page caches for code they load from disk.
- Browser with multiple tabs, consumer device: they might share common JS library code at the browser cache level. The untrusted code must not be native anyway, and won't load native shared libraries from disk.
- Running untrusted native code on a consumer device: well, all bets are off if you run it under your own account; loading another copy of msvcrt.dll code is inconsequential compared to the threat of it being a trojan or a virus. If you fire up a VM, see above.
In any case, containers share OS kernel, OS page cache, etc. This can be beneficial even for a shared hosting as a way to offer a wide range of preinstalled software as ro-mounted into the container's file tree. Likely code pages of software started this way would also be shared.
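You can see that sharing directly: every process maps the same library text read-only executable, and the kernel backs those mappings with the same page cache pages. A quick sketch that just prints this process's executable mappings from /proc/self/maps:

    /* Print this process's executable mappings; the r-xp lines backed
       by files (libc etc.) are the pages shared across processes. */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        FILE *f = fopen("/proc/self/maps", "r");
        if (!f) { perror("maps"); return 1; }
        char line[512];
        while (fgets(line, sizeof line, f))
            if (strstr(line, "r-xp"))   /* executable mappings only */
                fputs(line, stdout);
        fclose(f);
        return 0;
    }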
The "elephant in the room" with all these attacks starting from Spectre/Meltdown is that an attacker has to run code on your machine to be able to exploit them at all.
To the average user, the biggest risk of all these side-channels is JS running in the browser, and that is quite effectively prevented by careful whitelisting.
As you can probably tell, I'm really not all that concerned about these sidechannels on my own machines, because I already don't download and run random executables (the consequences of doing that are already worse than this sidechannel would allow...), nor let every site I visit run JS (not even HN, in case you're wondering --- it doesn't need JS to be usable.)
In a dream world. It's become obvious that our general-purpose systems are far too complex to be proven secure, and trying to run untrusted code on such a foundation is going to turn up vulnerability after vulnerability after vulnerability. It is madness.
There is absolutely nothing to be done on our level about this.
I'm fairly convinced this is a systemic issue that can only be solved by almost entirely redesigning modern CPU and computer architecture.
I can draw a parallel to approximately all Intel CPUs, which are known to have a dedicated "mini CPU" (the Management Engine, which runs MINIX) that is an absolute black box and has been found to be vulnerable to a wide variety of attacks for nearly a decade...
Not only do we need to redesign computer and CPU architecture, but we desperately need to make that entire process and body of knowledge open source, available to all, and more transparent.
Today this knowledge is in the hands of a few gigantic corps, who are keeping it to themselves to ensure their monopolistic position.
Here's hoping OpenRISC takes off!
What exactly is "our level"?
Or, if that is too complex, there could just be a bunch of breadboarded ICs capable of networking. There actually are real-world examples of such machines exposed via telnet, e.g. http://www.homebrewcpu.com
And all the performance beasts, while surely indispensable, could just enjoy their air-gapped solitude. Are there any massively parallel supercomputing tasks whose results couldn't be summarized and reduced to mere text, which is not too hard to move over the airgap manually?
Video decoding? Games? You know, the major use cases for custom highly patented hardware design?
Hitting diminishing returns on caches with an ever-increasing gate budget is, I believe, what prompted the second generation starting in the 1990s, which adds speculative execution; the algorithm, with all those extra hidden registers, really invites you to do that.
Probably it's smart to see a computer not as a walled garden but more as a sieve.
None of that helps with your public cloud workloads.
But as an Intel consumer, I am not happy. My understanding is that more stuff can be fixed in microcode, but I suppose a bug could show up which was not practically fixable. If that happened, I would certainly sue or join a class-action lawsuit. Probably the class-action route, because even if I didn't get anything, I would be just mad enough at Intel to want them to suffer.
Of course, we do have consumer protection agencies; it is possible that they would step in if Intel had sold what would effectively be a defective product.
It doesn't have anything to do with how large Intel is. They have clearly made a more aggressive hardware design which has more corner cases to break. The designs are broken, and microcode can patch some variants of these side channels, but the overhead is becoming a problem.
In this case it's not certain whether microcode can address the problem, but if it can't, disabling SMT (hyperthreading) can impose a significant cost for some workloads (well above 10% for things that haven't been specifically tuned to avoid cache misses, which is most software in my experience).
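For what it's worth, reasonably recent Linux kernels expose SMT control at runtime through sysfs, so no BIOS visit is needed. A sketch that reads the current state (writing "off" to the same file as root is what disables hyperthreading; the path may be absent on older kernels):

    /* Read the kernel's runtime SMT state from sysfs. */
    #include <stdio.h>

    int main(void) {
        FILE *f = fopen("/sys/devices/system/cpu/smt/control", "r");
        if (!f) { perror("smt/control"); return 1; }
        char state[32];
        if (fgets(state, sizeof state, f))
            printf("SMT state: %s", state);  /* "on", "off", "forceoff", ... */
        fclose(f);
        return 0;
    }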
They are BY FAR the largest producer of CPUs for laptops, desktops, and servers. Note that on each of these platforms, arbitrary code execution is an issue.
Now for phones? Less so. Aggressively locked down software can help.
So, as a researcher, who are you going to research? AMD, who has negligible market share? Apple, who completely locks down their platform? Qualcomm? Well, that's an option, but Intel still makes them look small. Vulnerabilities in Intel CPUs affect the most people and the most money... You're naturally going to put more research into Intel.
It absolutely does have something to do with how large Intel is.
Like others pointed out, the portability of an attack is usually tried shortly after a successful attack is found. In this case, the attacks have not been found to work elsewhere yet. I won't count out that it's more effort, but we're looking at a research timeline that spans a year after the first reports were made to Intel, which is plenty of time to consider other chips. AMD's Spectre problems are very real but much narrower, while an entirely separate architecture like ARM shared a lot more of the attack surface with Intel, including Apple chips, which you cite as locked down.
Your logic makes sense but the actual historical log of exploits I’ve seen does not seem to line up to explain the result. It only guarantees that researchers will try things against Intel chips first, but nothing about the exclusion of other chips.