Intel was also planning to wait for at least another 6 months before bringing this to light if it wasn't for the researchers threatening to release the details in May.
Source in the Dutch interview: https://www.nrc.nl/nieuws/2019/05/14/hackers-mikken-op-het-i...
In this case the practice of responsible disclosure has been turned on its head. There should no longer be any responsible disclosure with Intel as long as they do not commit to changing their behavior.
Given the way Intel has been handling these security issues, I am going to avoid buying Intel whenever possible moving forward, regardless of whether they have slight performance or power gains over competitors. The way to push back against corporate governance in this case is to vote with my wallet.
Wtf does that mean exactly? Do the patches and microcode work or do they not? I expect the truth to come out as OSS maintainers come out of embargo and others analyze the patches. But it sure looks like VMs on your favorite cloud provider will still be vulnerable in some ways, because they're not turning off HT.
Wired has many details of your Dutch link in English.
Intel pressuring vendors not to recommend disabling hyperthreading? Apple has added the option to macOS, so presumably the mitigations are not completely effective:
Of course, until the legally agreed date when they can dump shares so there’s no obvious proof that it’s insider trading. Isn’t that what (then) Intel CEO Brian Krzanich did after Meltdown/Spectre?
What about devices with older processors? I'm still running a Sandy Bridge rig and it works fine, except for the side channel vulnerabilities. It's probably not going to be patched. I also have a cheaper computer with a Skylake processor, which is newer yet still vulnerable!
It's only a matter of time until something really nasty comes along, making all these PCs dangerous to use. What then? Lawsuits?
My questions are only partially rhetorical.
The important thing to realize is that speculation and caching and such were invented for performance reasons, and without them, modern computers would be 10x-100x slower. There's a fundamental tradeoff where the CPU could wait for all TLB/permissions checks (increased load latency!), deterministically return data with the same latency for all loads (no caching!), never execute past a branch (no branch prediction!), etc., but it historically has done all these things because the realistic possibility of side-channel attacks never occurred to most microarchitects. Everyone considered designs correct because the architectural result obeyed the restrictions (the final architectural state contained no trace of the bad speculation). Spectre/Meltdown, which leak the speculative information via the cache side-channel, completely blindsided the architecture community; it wasn't just one incompetent company.
The safest bet now for the best security is probably to stick to in-order CPUs (e.g., older ARM SoCs) -- then there's still a side-channel via cache interference, but this is less bad than all the intra-core side channels.
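To make the cache side-channel part concrete, here is a minimal flush+reload-style timing probe, as a toy sketch of my own (x86-specific; the buffer and threshold are made up): flush a line, let the potential victim run, then time a reload. A fast reload means somebody touched that line in the meantime.

    #include <stdint.h>
    #include <x86intrin.h>   /* _mm_clflush, _mm_lfence, __rdtscp */

    static uint8_t probe[4096];   /* stand-in for a line the victim may touch */

    /* Time one load of *addr in cycles; a fast reload right after a flush
     * suggests someone else brought the line back into the cache. */
    static uint64_t time_reload(volatile uint8_t *addr)
    {
        unsigned int aux;
        _mm_lfence();
        uint64_t start = __rdtscp(&aux);
        (void)*addr;                  /* the timed load */
        uint64_t end = __rdtscp(&aux);
        _mm_lfence();
        return end - start;
    }

    int main(void)
    {
        _mm_clflush(&probe[0]);       /* evict the line */
        /* ... let the victim run here ... */
        return time_reload(&probe[0]) < 100;  /* threshold is machine-dependent */
    }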
These vulnerabilities and Meltdown allow untrusted code to speculatively access data that it shouldn't be allowed to access at all, and use that speculative access to leak the data itself. Unlike Spectre this can be (and to some extent has to be) fixed at the hardware level, because the hardware itself is failing to protect sensitive data. This class of vulnerability seems to have been mostly Intel-exclusive so far (with the main exception being one unreleased ARM chip that was vulnerable to Meltdown). There's nothing inherent about modern high-performance CPUs that requires them to be designed this way.
Edit: This slipped my mind, but Foreshadow / Level 1 Terminal Fault was yet another similar Intel-only processor vulnerability that allowed speculative access to data the current process should not be able to access. It's definitely a pattern.
Assuming by "designed this way" you mean: to speculatively execute past security checks, I'd disagree.
I'd say the relevant performance measure for CPUs (as opposed to other kinds of processor) is the speed at which they can execute serial operations. As electronic performance improvements offer increasingly marginal gains, we need to resort to improved parallelism. When operations are needfully serial due to dependency, as are security checks, the only way to accelerate that beyond the limits of the electronics is to make assumptions (speculations).
It's not inherently wrong to do this, but it requires that speculations never have effects outside of their assumed execution context.
Also, no need for older SoCs. High end ARM chips have both in-order and out-of-order cores but the cheaper ones have in-order ones only. A Snapdragon 450 is pretty modern and doesn't speculate deeply enough to be vulnerable to Spectre.
In the x86 space, Meltdown absolutely was down to one company apparently deciding to over-optimize for performance.
I can't find it now, but I remember reading a thread from (I think) the OpenBSD devs about how the Intel MMU documentation described fairly sane behaviours and how far the reality deviated from the documentation.
Serious weasel wording: outside of the x86 space, every other high-end architecture also had Meltdown issues, including ARM and IBM's POWER and mainframe designs.
ARM is actually a good target, with a number of their newest designs using out-of-order speculative execution with Spectre vulnerabilities, and they own the mobile space outside of notebooks; one of the newest is even vulnerable to Meltdown. But the significant, headline-worthy instances tend to come on much more locked-down devices. Speed is also an issue: everything else being equal, the faster the chip, the faster data-exfiltration proofs of concept will work.
That said, some newer Intel LLCs are non-inclusive, and AMD changed its cache relations as well in Ryzen.
Whether you run security checks in sequence or in parallel with a data access is pretty fundamental to the design of a core. Doing the latter has performance advantages, but it's really hard to verify that there won't be any architectural leaks as a result, let alone these micro-architectural leaks that nobody was thinking about.
I wondered aloud whether it wouldn't be better for us to embrace NUMA and make the bigger caches directly addressable as working memory instead of using them as cache.
Per Wikipedia, "The Pentium has two datapaths (pipelines) that allow it to complete two instructions per clock cycle in many cases. The main pipe (U) can handle any instruction, while the other (V) can handle the most common simple instructions." (https://en.wikipedia.org/wiki/P5_(microarchitecture) )
HT uses two instruction decoders to keep a set of execution engines busier. Suppose you had the above set of two execution datapaths (which may be improved on these Atoms) and one instruction stream couldn't do two operations at a time; the other stream could then use the otherwise idle datapath.
out-of-order execution != speculative execution.
It is possible to have OoO without speculative execution. On the other hand they do tend to come as a pair since they both utilise multiple execution units, for instance the Intel Pentium in 1993 was the first x86 to have OoO or branch prediction (486 and those before were scalar CPUs).
That's technically correct (the best type of correct!), but without speculation, the CPU can't see beyond a branch, so the window in which the machine can reorder instructions is very small. (For x86 code, the usual rule of thumb is that branches occur once per 5 instructions.) So in practical terms, an OoO design (in the restricted-dataflow paradigm, i.e., all modern general-purpose OoO cores) always at least does branch prediction, to keep the machine adequately full of ops.
An interesting question, though, given that we're going to do speculation, is what types of speculation the core can do. The practical minimum is just branch prediction, but modern cores do all sorts of other speculation in memory disambiguation, instruction scheduling, and the like, some of which has enabled real side-channel attacks (e.g., Meltdown).
Also, FWIW, Pentium was 2-way superscalar, but not OoO in the fully general dataflow-driven sense; the Pentium Pro (aka P6), in 1995, was the first OoO. (Superscalar can technically be seen as OoO, I suppose, in that the machine checks dependencies and can execute independent instructions simultaneously, but it's not usually considered as such, as the "reordering" window is just 2 instructions.)
OTOH, speculation without OoO is not only possible but in fact very common: for example, the majority of non-ancient in-order CPUs.
In fact the original Pentium, contrary to your statement, was an in-order design (the Pentium Pro was the first Intel OoO design). Also, I believe the 486 already had branch prediction. The Pentium's claims to fame were being superscalar (i.e., in some cases it could execute two instructions per cycle), a better FPU, and a better-pipelined ALU (I think it had a fully pipelined multiply, for example).
To the extent I've looked at it (without reading original documents), the original OoO design that current systems are based on, the floating-point unit of the IBM System/360 Models 91 and 95 using Tomasulo's algorithm (https://en.wikipedia.org/wiki/Tomasulo_algorithm), didn't extend to speculative execution.
No doubt because gates were dear (implemented with discrete transistors) and that processor was a vanity project of Tom Watson Jr.'s. And memory was slow: core, except for NASA's two Model 95s with 2 MiB of thin-film memory at the bottom of the hierarchy, and cache was still a couple of years out, introduced with the 360 Model 85. OoO becomes compelling when you have a fast but unpredictable memory hierarchy, as we started to have in a big way in the 1990s when we returned to this technique.
I can't find any evidence to support that.
I suppose it's technically possible to have branch prediction on a scalar processor, but I imagine it would not be hugely beneficial.
Edit: the Pentium did have a significantly more sophisticated predictor, of course, although not without flaws.
The problem is that single-thread performance is really important for a lot of workloads, because (i) parallelization is hard, (ii) even for parallelized workloads, serial bottlenecks (critical sections, etc.) still exist, and (iii) latency is often important too (one web request on one core of a server, or compiling one straggler extra-large file in a parallel build, for example).
So it would be safest to execute them on separate CPUs not sharing a common cache, e.g. pinning them to different CPU sockets on a multi-socket machine, or to different physical machines altogether.
This may be still faster than running on old ARMs.
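As a minimal sketch of the pinning idea on Linux (the core numbers here are an assumption; check the actual topology with lscpu before copying this), each tenant's process can be restricted to one socket's cores:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);

        /* Assumption: logical CPUs 0-7 live on socket 0. */
        for (int cpu = 0; cpu < 8; cpu++)
            CPU_SET(cpu, &set);

        if (sched_setaffinity(0 /* this process */, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        /* ... run only the trusted (or only the untrusted) workload here ... */
        return 0;
    }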
I wonder if dedicated cloud boxes, where all VMs you spin on a particular physical machine belong only to your account, will become available from major cloud providers any time soon. In such a setup, you don't need all the ZombieLoad / Meltdown / Spectre mitigations if you trust (have written) all the code you're running, so you can run faster.
I should add that AWS has "dedicated instances" which are exclusive to one customer. They are more expensive than standard instances but they are used by e.g. the better financial services companies for handling customer data.
Datasets, yes, but what about instructions from shared libraries?
I'll make a wild guess that this being seriously beneficial would be limited to uncommon server configurations...
- Same OS, server: you likely control all the processes in it anyway, no untrusted code.
- Different OSes in different VMs, server: they likely run different versions of Linux kernel, libc, and shared libraries anyway. They don't share the page caches for code they load from disk.
- Browser with multiple tabs, consumer device: they might share common JS library code at the browser-cache level. The untrusted code must not be native anyway, and won't load native shared libraries from disk.
- Running untrusted native code on a consumer device: well, all bets are off if you run it under your own account; loading another copy of msvcrt.dll code is inconsequential compared to the threat of it being a trojan or a virus. If you fire up a VM, see above.
In any case, containers share OS kernel, OS page cache, etc. This can be beneficial even for a shared hosting as a way to offer a wide range of preinstalled software as ro-mounted into the container's file tree. Likely code pages of software started this way would also be shared.
The "elephant in the room" with all these attacks starting from Spectre/Meltdown is that an attacker has to run code on your machine to be able to exploit them at all.
To the average user, the biggest risk of all these side-channels is JS running in the browser, and that is quite effectively prevented by careful whitelisting.
As you can probably tell, I'm really not all that concerned about these sidechannels on my own machines, because I already don't download and run random executables (the consequences of doing that are already worse than this sidechannel would allow...), nor let every site I visit run JS (not even HN, in case you're wondering --- it doesn't need JS to be usable.)
In a dream world. It's become obvious that our general-purpose systems are far too complex to be proven secure, and trying to run untrusted code on such a foundation is going to turn up vulnerability after vulnerability after vulnerability. It is madness.
There is absolutely nothing to be done on our level about this.
I'm fairly convinced this is a systemic issue that can only be solved by almost entirely redesigning modern CPU and computer architecture.
I can draw a parallel to approximately all Intel CPUs, which are known to have a dedicated "mini CPU" (the Management Engine, which runs MINIX) that is an absolute black box and has been found vulnerable to a wide variety of attacks for years now...
Not only do we need to redesign computers and CPU architecture, but we desperately need to make that entire process and body of knowledge open source, available to all, and more transparent.
Today this knowledge is in the hands of a few gigantic corps, who are keeping it that way to ensure their monopolistic position.
Here's hoping OpenRISC takes off!
What exactly is "our level"?
Or, if that is too complex, there could just be a bunch of breadboarded ICs capable of networking. There actually are real-world examples of such machines exposed via telnet, e.g. http://www.homebrewcpu.com
And all the performance beasts, while surely indispensable, could just enjoy their air-gapped solitude. Are there any massively parallel supercomputing tasks whose results couldn't be summarized and reduced to mere text, which is not too hard to move over the airgap manually?
Video decoding? Games? You know, the major use cases for custom highly patented hardware design?
Hitting diminishing returns on caches with an ever-increasing gate budget is, I believe, what prompted this second generation starting in the 1990s, which adds speculative execution; the algorithm, with all those extra hidden registers, really invites you to do that.
Probably it's smart to see a computer not as a walled garden but more as a sieve.
None of that helps with your public cloud workloads.
But as an Intel consumer, I am not happy. My understanding is that more stuff can be fixed in microcode, but I suppose a bug could show up which was not practically fixable. If that happened, I would certainly sue or join a class-action lawsuit. Probably the class-action route, because even if I didn't get anything, I would be just mad enough at Intel to want them to suffer.
Of course, we do have consumer protection agencies; it is possible that they would step in if Intel had sold what would effectively be a defective product.
It doesn't have anything to do with how large Intel is. They have clearly made a more aggressive hardware design, which has more corner cases to break. The designs are broken, and microcode can patch some variants of these side channels, but the overhead is becoming a problem.
In this case it's not certain whether microcode can address the problem, but if it can't, disabling SMT (hyperthreading) carries a significant cost for some workloads (well above 10% for things that haven't been specifically tuned to avoid cache misses, which is most software in my experience).
They are BY FAR the largest producer of CPUs for laptops, desktops, and servers. Note that on each of these platforms, arbitrary code execution is an issue.
Now for phones? Less so. Aggressively locked down software can help.
So, as a researcher, who are you going to research? AMD, who has negligible market share? Apple, who completely locks down their platform? Qualcomm? Well, that's an option, but Intel still makes them look small. Vulnerabilities in Intel CPUs affect the most people and the most money... You're naturally going to put more research into Intel.
It absolutely does have something to do with how large Intel is.
Like others pointed out, the portability of an attack is usually tried shortly after a successful attack is found. In this case, the attacks have not been found to work elsewhere yet. I won't count it out that it's more effort, but we're looking at a timeline of research that spans a year after the first reports were made to Intel, which is plenty of time to consider other chips. AMD's Spectre problems are very real but much narrower, while an entirely separate architecture like ARM shared a lot more of the attack surface with Intel, including Apple chips, which you cite as locked down.
Your logic makes sense but the actual historical log of exploits I’ve seen does not seem to line up to explain the result. It only guarantees that researchers will try things against Intel chips first, but nothing about the exclusion of other chips.
* Core and Xeon CPUs affected, others apparently not.
* HT on or off, any kind of virtualization, and even SGX are penetrable.
* Not OS-specific, apparently.
* Sample code provided.
Essentially: Intel released a microcode update which makes the `verw` instruction now magically flush MDS-affected buffers. On vulnerable CPUs, this instruction now needs to be run on kernel exit; the microcode update won't do it automatically on `sysexit`, unfortunately.
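Roughly what the software side of that looks like (a sketch modeled on the public Linux mitigation; the selector value here is an assumption, and any valid writable data-segment selector works):

    #include <stdint.h>

    /* Assumption: 0x2b stands in for a valid data-segment selector; a kernel
     * would use its own DS here. */
    static const uint16_t mds_flush_selector = 0x2b;

    static inline void mds_flush_cpu_buffers(void)
    {
        /* With the new microcode, VERW also clears the store/fill/load-port
         * buffers that the MDS attacks sample; the kernel runs this on the
         * way back to user space. */
        __asm__ __volatile__("verw %0" : : "m"(mds_flush_selector) : "cc");
    }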
I do agree that this won't end soon, though. It appears to me that many of the methods CPUs use for better performance are fundamentally flawed in their security, and it's not like we can expect the millions of affected machines to be upgraded to mitigate this.
I don't think you're giving enough credit. The actual microarchitecture isn't documented much in those manuals, so looking hard at those wouldn't help without making a series of assumptions of how it all works. The authors of recent exploits have been diligently reverse engineering and making sensible guesses.
Your argument sounds like an argument against responsible disclosure totally.
Don't forget that Hope was also released when the box was opened.
(FYI, I've been a security researcher for 15+ years and work as the head of hacker education for HackerOne; I am very, very pro disclosure. :) )
My biggest worry is that all currently known classical "secure" data sets, including encrypted but recorded internet communication, will become an open book a few decades from now. What insights will the powers that be choose to draw from it then, and how will that impact our future society? Food for thought.
You can have unintentional exploits/vulnerabilities in free/open source software or hardware too.
All “secrets” are eventually revealed; security is about managing the risks and timing associated with these revelations.
"Obscurity" general refers to situations where "everything" is confidential. And when everything is confidential priority one, nothing is, since people can't work like that.
Cryptography attempts to sequester the confidential data into a small number of bytes that can be protected, leaving larger body of data (say, the algorithm) non-confidential.
Security _ONLY_ through obscurity is not security. Obscurity is a perfectly valid layer to add to a system to help improve your overall security.
If you choose not to disable HT, you stay vulnerable even with updated microcode, right?
In any case, Apple's stats are much more gruesome...
How will your "userland core" switch to other userland programs safely? A pointer dereference can hit an mmap'd file, so it's actually I/O. This will cause the userland program to enter kernel mode to interact with the hardware (yes, on code as simple as blah = (this->next)... the -> is a pointer dereference, potentially into an mmap'd region backed by a file).
So right there, you need to switch to kernel mode to complete the file-read (across pointer-dereference). So what, you have a semaphore and linearize it to the kernel?
So now you only have one core doing all system-level functions? That's grossly inefficient. Etc., etc. I don't think your design could work.
User PU would stall on the "outermost" return and wait for another dispatch by kernel PU; it would also stall during context switches.
Demand paging is another situation where a simple pointer dereference can suddenly turn into a filesystem (and therefore kernel-level / hardware-level) call.
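To illustrate the point with a toy example (the file name is made up): the plain dereference below can turn into a page fault, and therefore a disk read serviced by the kernel, even though the source looks like pure memory access.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDONLY);   /* hypothetical file */
        if (fd < 0) return 1;

        struct stat st;
        if (fstat(fd, &st) != 0) return 1;

        const char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) return 1;

        /* Looks like an ordinary read, but if the page isn't resident the CPU
         * faults and the kernel does file I/O before execution resumes. */
        char first = p[0];

        printf("%d\n", first);
        munmap((void *)p, st.st_size);
        close(fd);
        return 0;
    }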
I don't know enough about the IaaS market to know what the relative revenues of low-end compute vs. medium-to-high-end compute are for your average vendor, though. Is most of the profit in the low end?
I'm also curious about what the impact on margins would be if IaaS vendors decided to switch away from serving the low-end compute demand with "a few expensive high-power Intel cores per board, each multitasking many vCPUs" to serving it with "tons of cheap low-power ARM cores per board (per die?), with each core bound to one vCPU."
Sounds a lot like IBM's Cell architecture.
Just shows the hoops and tricks needed to keep making processors that are, on paper, faster year on year, without node shrinks to give headroom.
14nm++++ is played out.
To avoid reintroducing these spectre like bugs, you'd have to conservatively design the per-thread execution to avoid those covert channels. Not only synchronously enforcing all logical ISA guarantees for paging and other exception states, but also using more heavy-handed tagging methods to partition TLB, cache, etc. for separate protection domains.
On the other hand, GPUs are pretty much what you are describing: excellent for some specific workloads, terrible for others.
Isn't something like that done for GPUs? They have the advantage of having a massive number of threads to execute. For CPUs, the number of runnable threads tends to be lower.
So getting rid of speculation entirely, and stalling on every branch, would waste time equivalent to dozens of instructions. On typical code that has a branch every few instructions, this could slow down execution by several times.
What architecture would do that, and how?
add %rax, %rbx
add %rcx, %rdx
Have any of these bugs been completely based on speculation, or is it always speculating across privilege boundaries? (Although I feel like even the former isn't safe, e.g., if you're in some form of VM attempting to maintain privilege separations.)
Intel does more speculation, but you won't find anything beyond the tiniest embedded CPUs which don't do any.
E.g. this new one can only be reproduced on Intel, not on AMD and ARM.
If you want to ban speculative execution for everything, you need to make the case that it's a fundamental issue and not an implementation specific issue.
Right now, that's not the case for many of these vulnerabilities.
For example, the first version of Foreshadow went after the SGX enclave. Given how widespread Meltdown and Spectre bugs are, there's absolutely no reason to believe that the other vendors don't have similar unique problems.
AMD don't seem to have commented on ZombieLoad yet, presumably because it's much newer and they didn't have pre-announcement info about it, but they've commented on the other two vulnerabilities announced today and explained that the reason they're not vulnerable is because the corresponding units in their CPUs don't allow speculative data access unless the access checks pass and their whitepaper seems to suggest the same is true of ZombieLoad: https://www.amd.com/system/files/documents/security-whitepap...
SGX does make for an easier and flashier demo for Foreshadow, though, so it makes sense that the researchers went after that target. They managed to recover the top-level SGX keys that all SGX security and encryption on the system relies on, something that I don't think anyone had ever managed before.
Also, as I've said elsewhere, Intel seems to speculatively leak data that shouldn't be accessible pretty much everywhere in their designs where memory is accessed.
Sure there is. Just like the first round last year, Intel totally threw AMD under the bus to save face and stock price. That is the reason to mention AMD literally: to keep their stock price from crashing.
And it will be tough, as no compiler supports it; moreover, C/C++ were architected from the beginning not to bother with runtime information about object types/sizes.
I've noticed some Thinkpads with AMD CPUs but I feel like I'm on virgin ground when it comes to AMD and their integrated GPU offerings.
I believe other OEMs are developing similar offerings as well, but I can't find any quick links for newer SKUs like the Ryzen 7 3700U, which offers the improved Zen+ revisions that will specifically improve battery life and heat issues.
There is a much larger market in 2019 for AMD laptops, so you should be able to find something to suit your needs.
Next year, with the 7nm mobile chips, things will probably be much better.
The new 3700U model will probably be available on AliExpress next month or so. I would consider it, except Linux support is unknown and it has only 8 GB of RAM.
Realistically, the AMD laptops' lack of color accuracy compared to shit like the True Tone Intel Macs, and lack of display connectivity, especially compared to a Thunderbolt-outfitted X1 Extreme that can connect to four 4K displays, is a bigger problem. Especially since external displays actually need the resolution.
I looked at reviews of several laptops with a 1080 vs. a 1440 vs. a 4K option, and going from 1440 to 4K was the biggest drop in battery life.
Well... I went from a 1080p Thinkpad x260 to a 1440 Thinkpad Yoga X1 a year ago and honestly couldn't go back now.
It's mostly plugged into an external monitor and delivers 1440 resolution to it through DP, but even on a train I can work or 4-5 hours in 1440 resolution with no issue.
And it runs Fedora.
Spectre, on the other hand, is harder both to fix in hardware and to attack because the victim context is itself tricked into speculatively executing code using attacker-supplied data that leaks information - it uses inherent properties of speculative execution rather than any kind of hardware bug, but it's only exploitable if there's some victim code that does exactly the right kind of processing on attacker-supplied data.
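For reference, the "victim" pattern being described is essentially the bounds-check gadget from the original Spectre paper (the array names follow that paper's example; the sizes are illustrative):

    #include <stddef.h>
    #include <stdint.h>

    uint8_t array1[16];
    size_t  array1_size = 16;
    uint8_t array2[256 * 512];
    volatile uint8_t temp;

    /* x is attacker-controlled. The branch predictor can be trained so the body
     * runs speculatively even when x is out of bounds; the second load then
     * leaves a cache footprint indexed by the secret byte array1[x]. */
    void victim_function(size_t x)
    {
        if (x < array1_size)
            temp &= array2[array1[x] * 512];
    }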
No, AMD has been largely immune to bugs involving speculating past a page fault. Both Meltdown and L1TF involved speculating past page faults, and the ZombieLoad paper also mentions exploiting bad behavior during page faults (but, disclaimer, I haven't read in enough detail yet).
AMD was not immune to, for example, spectre variant 2, which very much did allow reading from other address spaces (even other VMs): https://www.amd.com/en/corporate/security-updates
In general it doesn't make sense to expect that any brand of processors might be vulnerable or invulnerable to all illegal memory access. There are many different components involved in handling memory access and many different ways they could go wrong.
All three of these latest vulnerabilities are like Meltdown and L1TF in that they just allow speculative execution to completely ignore hardware-level access protections and read data from a completely different process, and there's nothing that can be done about it at the software level. (Originally, the comment you're replying to was posted on a discussion about all three, if I remember rightly.) All of these affect Intel but not AMD. It's not like modern AMD chips are exactly poor performers either.
Not really. These attacks are completely different from Meltdown and L1TF in that they don't involve reading from memory at all. They involve hidden processor state that contains values that were recently used in another context. The attacker never explicitly specifies an address they are interested in -- they just get whatever's floating around.
A comparable (but much more obvious) bug would be if the OS failed to clear registers when switching contexts. Although the values in those registers may have at one point been read from memory, the attacker recovering those values isn't directly accessing the victim's memory.
Your comments seem to be arguing that AMD isn't affected by these bugs for the same reason they weren't affected by Meltdown. But these bugs operate in a totally different way and exploit totally different components. There doesn't appear to be any reason to believe they are related.
I think the reason AMD isn't affected likely has to do with the fact that these attacks are targeting specific implementation details of Intel processors, which AMD processors probably just happen to implement differently. (Indeed, Fallout appears to be attacking outright unintentional behavior -- it would be surprising if multiple CPUs had the same bug.) It seems likely to me that AMD has different bugs which haven't been found yet, perhaps mostly because researchers haven't focused on them.
(Disclosure: I own some AMD stock. I don't own Intel stock. I have no other affiliation with either company.)
If that's the case, my CPUs are likely pinned to my VM. I could still have evil userland apps spying on my own VM, but I would not expect this to allow other VMs to spy on mine.
Not to these vulnerabilities. These are attacking memory that is "in flight" within a processor.
This is the terminology that Intel itself uses in its documentation to describe its products, though. To be fair, they say "physical processor" and "logical processor", not "core".
Most of this seems to be behaving as intended; they just didn't foresee the side channels it opens up.
What are they saying here?
> ...said it works “just like” in PCs
The number of mistakes in the Techcrunch article is atrocious.
Keep in mind, this article was posted at 3am pacific, 6am eastern. Assuming the author is in North America, he was probably under a deadline and rather tired.
I have found similar typos from prominent writers. Sometimes they email me back, which I appreciate.
I found one in an article by Cory Doctorow on boingboing. I checked on builtwith.com, and they use WordPress/Jetpack. Jetpack has a feature that will warn you if you try and publish something with spelling mistakes, it is just not enabled by default.
I let Mr Doctorow know all this, in a very polite manner, and he responded with "many thanks". I'm not big on celebrity worship, but it still made my day.
“Service guarantees citizenship, want to know more?”
We merged that HN thread (https://news.ycombinator.com/item?id=19911465) into this one, via this other one (https://news.ycombinator.com/item?id=19911715).
I also note that the provided OSes are being updated with mitigations as well, so for complete mitigation of the issue you'll probably need to update your OS.
> Chip designers are under so much pressure to deliver ever-faster CPUs that they’ll risk changing the meaning of your program, and possibly break it, in order to make it run faster.
> applications will increasingly need to be concurrent if they want to fully exploit CPU throughput gains that have now started becoming available and will continue to materialize over the next several years. For example, Intel is talking about someday producing 100-core chips; a single-threaded application can exploit at most 1/100 of such a chip’s potential throughput.
It seems the trend in programming languages is towards better concurrency support. But why don't we yet see 100-core chips? If chip makers had to forego all speculative execution and similar tricks, would that push us toward the many-core future?
Hi. I'm trying to understand what you meant here. That both "verw" and "l1d_flush followed by an lfence" are faster than the "mfence" you implemented in safelibc?
If so, why didn't you use these faster options yourself? My understanding was that these faster options needed to be handled at the hypervisor/kernel level, rather than in libc. If so, how is the attitude of glibc maintainers relevant?
Safe libs need to do the right thing, not the fast thing, esp. crypto.
The attitude of libc and crypto maintainers is that you cannot trust them with security. All the memzeros are insecure, besides being overly complicated and slow.
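For anyone wondering what an "insecure memzero" means here: the usual problem is that a plain memset of a buffer that is never read again is a dead store the compiler may delete. One common workaround (just a sketch, not the poster's library) is to call memset through a volatile function pointer that the optimizer can't see through:

    #include <string.h>

    /* The compiler must reload a volatile pointer at every call, so it cannot
     * prove the call is memset and cannot elide it as a dead store. */
    static void *(*const volatile memset_keep)(void *, int, size_t) = memset;

    void secure_memzero(void *p, size_t len)
    {
        memset_keep(p, 0, len);
    }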
Linux is a bit better, but there are still an estimated 20,000 security-relevant bugs.
Is that a reflection of engineering differences or a statistical byproduct of the market share of Intel CPUs?
I run AMD not because of the security implications but because I feel every dollar that goes to Intel competition will push Intel and thus the entire industry forward.
So the cloud vendors are 97% minimum Intel, they're exquisitely vulnerable both technically and reputationally to these bugs, the stakes are existential for them and they have a lot of money they can throw at the problem, whereas the users of notebooks and desktops are a much more diffuse interest.
As I've mentioned many times in these discussions today, everyone had Spectre issues, and everyone but AMD has Meltdown ones. The more recent vulnerabilities are Intel only because they're using what was learned from those first two to attack Intel specific features like the SGX enclave.
> The safest workaround to prevent this extremely powerful attack is running trusted and untrusted applications on different physical machines.
> If this is not feasible in given contexts, disabling Hyperthreading completely represents the safest mitigation.
That sounds like a contradiction --- if you can already execute code, I'd say you're quite privileged. It's unfortunate that their demo doesn't itself run in the browser using JS (I don't know if it's possible), because that's closer to what people might think of as "unprivileged".
The attacker has no control over the address from which data is leaked, therefore it is necessary to know when the victim application handles the interesting data.
This is a very important point that all the Spectre/Meltdown-originated side-channels have in common, so I think it deserves more attention: there's a huge difference between being able to read some random data (theoretically, a leak) and it being actionable (practically, to exploit it); of course as mentioned in the article there are certain data which has patterns, but things like encryption keys tend to be pretty much random --- and then there's the question of what exactly that key is protecting. Let's say you did manage to correctly read a whole TLS session key --- what are you going to do with it? How are you going to get access to the network traffic it's protecting? You have just as much chance that this same exploit will leak the bytes of that before it's encrypted, so the ability to do something "attackful" is still rather limited.
Even the data which has patterns, like the mentioned credit card numbers, still needs some other associated data (cardholder name, PIN, etc.) in order to actually be usable.
The unpredictability of what you get, and the speed at which you can read (the demo shows 31 seconds to read 12 bytes), IMHO leads to a situation where getting all the pieces to line up just right for one specific victim is a huge effort, and because it's timing-based, any small change in the environment could easily "shift the sand" and result in reading something entirely different from what you had planned with all the careful setup you did.
Using ZombieLoad as a covert channel, two VMs could communicate with each other even in scenarios where they are configured in a way that forbids direct interaction between them.
IMHO that example is stretching things a bit, because it's already possible to "signal" between VMs by using indicators as crude as CPU or disk usage --- all one VM has to do to "write" is "pulse" the CPU or disk usage in whatever pattern it wants, modulating it with the data it wants to send, and the other one can "read" just by timing how long operations take. Anyone who has ever experienced things like "this machine is more responsive now, I guess the build I was doing in the background is finished" has seen this simple side-channel in action.
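As a toy illustration of how crude such a channel can be (everything below is made up): the "sender" VM modulates CPU load in fixed time slots, and a "receiver" on the same host times its own work per slot and thresholds the result to recover bits.

    #include <time.h>

    /* Burn CPU for roughly 'ms' milliseconds. */
    static void busy_ms(long ms)
    {
        struct timespec start, now;
        clock_gettime(CLOCK_MONOTONIC, &start);
        do {
            clock_gettime(CLOCK_MONOTONIC, &now);
        } while ((now.tv_sec - start.tv_sec) * 1000L +
                 (now.tv_nsec - start.tv_nsec) / 1000000L < ms);
    }

    /* Sender: one bit per 100 ms slot; '1' = hog the CPU, '0' = stay idle.
     * A receiver times a fixed chunk of work per slot; if it ran noticeably
     * slower than its baseline, it reads a '1'. */
    static void send_bit(int bit)
    {
        if (bit) {
            busy_ms(100);
        } else {
            struct timespec t = { 0, 100 * 1000000L };
            nanosleep(&t, NULL);
        }
    }

    int main(void)
    {
        const char *payload = "1011";          /* toy message */
        for (const char *c = payload; *c; c++)
            send_bit(*c == '1');
        return 0;
    }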
I always interpreted "privileged" to mean "superuser". I.e. unrestricted. Or possibly the case of one user and another user. Having a program that can determine the URL you are visiting in the browser from memory when running as the same user is a different class than something that can do the same when run as any non-root user on the system. There's a reason it's common to "drop privileges" in a daemon after any initial setup that requires those privileges (such as binding to a low port).
If you're in a VM, you have no privileges over the host CPU, you can't switch to another VM or to the host itself. That's what's meant by unprivileged here.
"""> Virtualization seems to have a lot of security benefits.
You've been smoking something really mind altering, and I think you
should share it.
x86 virtualization is about basically placing another nearly full
kernel, full of new bugs, on top of a nasty x86 architecture which
barely has correct page protection. Then running your operating
system on the other side of this brand new pile of shit.
You are absolutely deluded, if not stupid, if you think that a
worldwide collection of software engineers who can't write operating
systems or applications without security holes, can then turn around
and suddenly write virtualization layers without security holes.
You've seen something on the shelf, and it has all sorts of pretty
colours, and you've bought it.
That's all x86 virtualization is.
The only hardline view on security you'll encounter in the wild is "security is practical in our computational environments". Only half-joking here.
My reading of Theo's quote is merely "the combination of x86/IA32/AMD64 and virtualization gives little to no factual security benefit, and plenty of pitfalls".
I don't see Theo as being a hardliner about security, just meticulous about good engineering practices - as per OpenBSD's usual standards - and facing the problems & risks as they are.
 examples: "Rust/Java gives you security", "shortlisting the only allowed actions by end-user application gives you security", "hardcore firewalls give you security", "virtualization gives you security", "advanced architectures like Burroughs' give you security".
The new class of attacks we now see target any type of shared code execution environment. OpenBSD is as vulnerable to this as anything else.
That probably isn't the best source out there; I was in a bit of a rush to find it, but that is indeed the quote! Gotta either love or hate Theo, I guess!
1. There's no way in hell that a bunch of VMs running on one physical server is more secure than a bunch of different physical servers each running an OS. If there were architectural hooks for those VMs to provide additional security beyond what the host OS provides, then an OS like OpenBSD would already be making use of it.
2. Running a bunch of VMs on a single physical machine is certainly cheaper.
3. People who are in favor of the cost-cutting are claiming that there's a security benefit to sell more stuff.
Am I right?
If so, how does that stance jibe with the research that Qubes is based on?
I think security guarantees are better if you follow the practices of a little self-centered project such as OpenBSD (run only trusted code) than if you follow the practices of QubesOS (running whatever untrusted code you desire in Xen domains and relying on VM separation).
We will most likely see a continued divergence between "consumer silicon" which is designed for speed in a single tenant environment on your local desktop or laptop, and "cloud silicon" which is optimized to protect virtualization, be power efficient, etc. I'd predict that this will actually lead to increased efficiency and lower prices of cloud resources rather than the "death by a 1000 cuts" that you are proposing.
Even if there's no opportunity to switch away, eventually you bleed your customers dry and put them out of business. You will typically always aim to price your offering at the equilibrium point where loss of custom increases faster than the increase in profit, and vice versa.
One situation in which an increase in your costs can be good, is if the same increase applies more to your competition. But, in this case multi-tenant cloud is hit harder than the competing alternative of private infrastructure.
Am I totally misunderstanding this? Someone please correct me if I'm wrong.
I don't think you need to go this far. You can probably get away with circuit-switching small blocks of hardware and fully resetting them between handovers, although you'd have to ensure sufficient randomisation / granularity to destroy side channels in the switching logic.