> 8/14/18 – Updated: As in the case with Meltdown, we believe our processors are not susceptible to these new speculative execution attack variants: L1 Terminal Fault – SGX (also known as Foreshadow) CVE 2018-3615, L1 Terminal Fault – OS/SMM (also known as Foreshadow-NG) CVE 2018-3620, and L1 Terminal Fault – VMM (also known as Foreshadow-NG) CVE 2018-3646, due to our hardware paging architecture protections. We are advising customers running AMD EPYC™ processors in their data centers, including in virtualized environments, to not implement Foreshadow-related software mitigations for their AMD platforms.
For those on AMD platforms, how do you disable software mitigations for Foreshadow? Is this automatically done by browsers, operating systems and hypervisors?
I haven't really read anything on this most recent set of vulnerabilities and AMD.
Yes, Spectre-type issues affect many ranges of processors (including AMD's), however everything shown to date indicates that they're very difficult to exploit.
There are a large number of Intel-only issues, like Meltdown and the L1 cache attack, that are much more severe and much easier to exploit.
With 4 execution units, 4 processes run at a time, and they all get swapped out every scheduling tick (losing all their cached lines). OTOH each process gets a full measure of cache to use during its slice.
With 8 execution units, 8 processes "run" at a time, they interleave based on stalls and CPU resources, and the OS doesn't need to reschedule anything every tick (so they hopefully keep their cache lines hot). But each process gets a half measure of cache to use.
In reality, code tuned to use a full measure of cache will be better off matching the number of processes to the number of execution units available, so you'd run half the number of processes with HT disabled. And cache-tuned code tends to fall off a performance cliff when it exceeds cache available, so it may easily run more than twice as fast, depending on the work.
The win from HT depends on most code not being tuned to full measures of cache, and having a lot of memory stalls or other heterogeneous work that other work can fit into. And most code is like that. Cache tends to have a declining marginal return: you have to add exponentially more cache to avoid cache misses - https://en.wikipedia.org/wiki/Power_law_of_cache_misses
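The power law referenced above can be sketched numerically. The exponent used here is an assumed illustrative value (the classic "sqrt rule" uses roughly 0.5), not a measured one:

```shell
# Power law of cache misses (sketch): miss_rate ~ C^(-alpha).
# alpha = 0.5 is the classic "sqrt rule" and is an assumption here.
# Doubling the cache size then cuts the miss rate only by sqrt(2):
awk 'BEGIN { alpha = 0.5; printf "%.2f\n", 2 ^ alpha }'
# prints 1.41, i.e. only ~29% fewer misses for 2x the cache
```

This is why cache-tuned code falls off a cliff past its working set: each doubling of cache buys you less and less.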
¹ For example, in SPECvirt_sc2013 the patch reduces performance by 31% (https://www.servethehome.com/wp-content/uploads/2018/08/Inte...). A 31% hit is unheard of for a security patch! It's a difficult security-speed tradeoff that businesses must carefully consider.
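To put that 31% in capacity terms (simple arithmetic on the quoted figure, not a benchmark):

```shell
# If a patch removes 31% of throughput, the extra capacity needed to
# serve the same load is 1/(1-0.31) - 1, i.e. roughly 45% more hardware.
awk 'BEGIN { printf "%.0f%%\n", (1 / (1 - 0.31) - 1) * 100 }'
# prints 45%
```

That lines up with the upper end of the infrastructure-growth estimates mentioned elsewhere in the thread.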
Give Intel some time to cope with this newest set of vulns, see if they can find a way to re-enable hyperthreading safely, and if they can't and are still advertising hyperthreading, then start going after them.
It does not. The fundamental difference between AMD and Intel CPUs in all these faults is that AMD does all permissions checks eagerly before returning results from memory, while when speculating, Intel defers them until speculation is resolved.
This does not mean that AMD escaped all of it, because some of the attacks (eg, spectre variant 1) do not cross a protection boundary that permissions checks would catch.
While no AMD issues are known at present, OpenBSD will have it disabled by default. How about other operating systems, I wonder?
The major win (whether AMD intended it or not) is that AMD cpus don't speculate loads until page permissions have been verified. Intel fires off the speculation immediately. That is one of the primary side channels underlying many of the attacks.
Intel has been actively muddying the waters with FUD to get people to think AMD is just as vulnerable. Please don’t buy into that. Intel is far more exposed. The subsequent slowdowns are more severe.
Microarchitectural differences. They are, after all, the same architecture, i.e. x86-64.
One interesting thing is that to mitigate L1TF, hyperthreads only need to be disabled if you are running VMs; the userspace mitigations are effective regardless of HT status. There's a catch, though: you can leave hyperthreading enabled if you disable the Extended/Nested Page Table virtualization feature, but it is noted that this will result in a significant performance impact.
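On Linux, these choices map to concrete knobs. The paths and parameter values below come from the kernel's L1TF admin documentation (4.19-era); double-check them against your distribution before relying on them:

```shell
# Check what the kernel reports about L1TF on this machine:
cat /sys/devices/system/cpu/vulnerabilities/l1tf

# Boot-time options (kernel command line):
#   l1tf=full,force   - enable L1D flushing and force SMT off
#   kvm-intel.ept=0   - disable EPT instead, allowing SMT to stay on
#                       (with the significant performance cost noted above)
```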
However, this does not mean that HT with VMs is totally secure, as there may be more vulnerabilities relating to HT yet to be disclosed, as alluded to by Theo. (For context, see the previous discussion around the Lazy FPU switching vulnerability, where Theo made the decision to enable mitigations in OpenBSD prior to the public disclosure of the bug; Theo/OpenBSD was _not_ party to the embargo.)
Think and plan before you blanket-disable HT on all servers running Intel CPUs...
If an attacker is able to run any code on these private servers, I have bigger problems to deal with than HT as attack vector..
Cloud Providers are gonna have a bad time if this is true.
Scheduling different VMs to run on the same hyperthreaded core at once seems like it can't be good for either VM's performance, even if there were no security concerns. Hyperthreading is much more useful for running multiple threads of the same app, accessing similar instruction caches etc.
(There's also a question of safety within the VM, but a huge number of cloud users are running effectively one user within their VM.)
If the API is "SQLish queries", I have a hard time believing you are going to be able to trigger these kind of attacks. You need a tight loop of carefully constructed code to flip them, no?
If the only real solution is to turn off HT/SMT that, seen positively, should net us a lot faster VMs then...
you also doubled the cost of each VM (in terms of cpu), but you didn't double the performance of each VM, so it's a net negative.
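That can be made concrete with back-of-the-envelope arithmetic, assuming (purely for illustration) a 30% throughput gain from HT:

```shell
# A core with HT delivering 1.3x the throughput of the same core without
# HT means each of its two vCPUs is worth only ~0.65 of a full core.
awk 'BEGIN { printf "%.2f\n", 1.3 / 2 }'
# prints 0.65
```

So two HT vCPUs sold separately really are a net loss in per-vCPU terms, even before any security mitigations.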
VMware have disclaimers on the mitigation options that stop short of turning off HT, meaning: use at your own risk.
I am still waiting on a comment from Linode 
OpenStack have some knobs you can adjust, but it really depends on your workloads and what risk you are willing to accept.
AWS have their own custom hypervisor and are said to have worked around the issue.  Amazon had info on this before others. It appears they have a special relationship with Intel?
I have not found any hardware or OS vendors that are willing to say that you can leave HT enabled. It is a very heated topic because folks will have to increase their VM infrastructure anywhere from 5% to 50% depending on their workload profiles. For public clouds, you can't predict workload profiles.
Edit: Oops, I left out the main site for L1TF:
 - https://kb.vmware.com/s/article/55806
 - https://blog.linode.com/2018/08/16/intels-l1tf-cpu-vulnerabi...
 - https://access.redhat.com/articles/3569281
 - https://aws.amazon.com/security/security-bulletins/AWS-2018-...
 - https://foreshadowattack.eu/
(I want to...)
How does that work on Unix systems when processes are all forked from one process? Even if you get past that issue, how do you prevent less privileged processes that use other security mechanisms (cgroups, pledge, SELinux, chroot, sandboxing)?
It's another situation like what happened with WEP WiFi encryption ten years ago.
Imagine buying a car that says, "Save $5000 for a less safe version without airbags." Yes, I know airbags are a DOT requirement; I'm just trying to make a point.
Edit: I think people are missing my point. I am not saying they don't sell cheaper models that are less safe. My point is that they don't ACTIVELY market them as such. Point me to an advertisement that says "Save $5000 for a less safe car!". This is in the spirit of what the GP was talking about whether cloud providers can market as "Less secure but cheaper HT option".
Suppose you have $5k, you need a car in order to feed your family, and that only the following two options are available: You can buy the safe car for $10k or a less safe car for $5k.
In that situation, less safety can be a reasonable choice.
Indeed, there was a long period of time in which Volvos were demonstrably more safe than other lower-cost vehicles, yet people bought the lower-cost vehicles.
In the cloud-offering world, instead of marketing servers as "less-secure", they can simply offer "more-secure" options that run on non-HT hardware. HIPAA-compliant cloud-buyers will have to upgrade, and then the cloud vendors can slowly lower the prices on both, making the less-secure option lower cost than the present day.
Is there a hyperthreading joke here somewhere?
∞ Known to cause cancer in the state of California
• This statement not evaluated by the FDA
º Might spontaneously catch fire and explode in minor accidents
On second and third tier cloud providers, the vCPUs tend to be dynamically scheduled so that they may share cores with other VMs.
Say you have 100 adders: the processor tries to schedule as many instructions as it can on those 100 adders, but eventually it will run into data dependencies. The leftover units can go to a hyperthread.
My understanding (let me know if I'm wrong) is that Hyperthread aware OSes (which is like what, everything since WinXP/Linux kernel 2.4?) will schedule lower priority tasks to the logical cores and higher priority tasks to the real cores.
So when it comes to a hosted provider (a.k.a cloud provider, a.k.a somebody else's computer), what you get pretty much depends on the virtualisation layer they use: Vmware, KVM, Xen, Hyper-V, etc.
Do hypervisors typically peg VMs to a real physical core? I was always under the impression they over-provision on most hosts, so you're getting part of a core, and the vCPU count listed in the product documents just indicates your priority and how many vCPUs appear to your particular VM.
You understand it wrong, even though you're somewhat correct as to what the scheduler actually does. In a single physical core, both logical processors are equivalent, and neither one has higher internal priority over the other. A hyperthreading-aware scheduler will take extra care in this scenario, but not in the sense you describe: if you have 2 physical cores, and thus 4 logical processors, and 2 CPU-intensive tasks, the scheduler might attempt to schedule them on different physical cores, instead of stuffing them onto the two logical processors of a single physical core. It's not because one logical core is better than the other, but because the two tasks would compete with each other in a way they wouldn't if they were on physically separate cores.
That is not how I understand it. The OS sees two identical logical cores per physical core and the CPU manages which is which internally. Also it's not really high and low priority - it's two queues multiplexing between the available execution units. If one queue is using the FPU then the other is free to execute integer instructions, but a thousand cycles later they might have switched places. Or if one queue stalls while fetching from main memory, the other gets exclusive use of the execution units until it gets unstuck.
In my floating-point-heavy tests on an i7, however, there is still a small advantage in leaving HT on. The common wisdom is that if you are doing FP, HT is pointless and may actually harm performance, but that doesn't match my observations when your working set doesn't fit into L2 cache. YMMV.
A semi-modern OS will try to keep a process on the same physical core if it can, so it may be flipflopping between two logicals, but should still see the same cache. Disabling HT means the OS still sees logical cores, but half as many of them, with a 1:1 correspondence between logicals and physicals.
In this case, reconstructing 2 MP images on a quad-core E3 Skylake, the performance without HT was better, and it got better still after replacing some of the pathological data structures with B-trees and similar structures (iirc under an MIT/BSD license, using the same interface; it was just a typedef away). Also, they used size_t for the index of an image in your dataset, yet their software is far from scaling that far without a major performance fix; the cost/benefit of optimization leans towards a good couple of sessions with a profiler before spending money on the compute (unless the deadline precludes it).
The dataset still doesn't fit into L3, and even then there are ways to block the image similar to matrix multiplication.
perf stat -dd
works wonders. The Ubuntu package is perf-tools-unstable, iirc, and setting lbr for the call graph of perf top (if you run on Haswell or newer) gives you stack traces even for code compiled with -fomit-frame-pointer.
I benchmarked this myself using POV-Ray (which is extremely heavy on floating point) when I first got my i7-3770k (4 cores, 8 threads).
Using two rendering threads was double the speed of one, four was double the speed of two, but eight was only about 15% faster than four.
I don't think I've ever actually seen an example of real-world tasks that get slowed down by HT. Every example I've seen was contrived, built specifically to be slower with HT.
Turn it off or sell your Intel chips.
I thought the root of one of the Foreshadow problems was that caches are shared across cores, and therefore even with hyperthreading disabled, you still gain information about a process on another core. Am I misinterpreting it?
It does seem like the paranoid thing to do is that each socket gets to be used by only a single user. (I half-jokingly suggested at work that we replace our internal cloud with a Beowulf cluster of Raspberry Pis...)
It also seems like you could design OSes in a way which is more robust to this, e.g., certain cores are only for the kernel and processes running as root, and system calls are inter-processor interrupts, so privileged kernel (or userspace root) data doesn't go into untrusted caches at all.
Cache timing attacks are old hat in the timing side-channel business; the newer attacks are cooler because the memory maps are not checked and you can determine the caching status of memory not mapped into your process's address space. (AFAIK)
I'm pretty close to not caring anymore. I hope somebody figures out how to at least fix the security news infrastructure, if fixing security is still a ways off.
>>> We are having to do research by reading other operating systems.
So Intel cooperates with business partners like Apple/Windows and not with open source. Does that mean that Apple and Windows can claim to be more secure because they have access to the information needed to fix Intel's issues?
They can claim it, but I would trust a Linux or FreeBSD box over MacOS or Windows anytime even if they get some security info before the open source operating systems.
Note that I wrote and included a suggested diff for OpenBSD already, and that
at the time the tentative disclosure deadline was around the end of August. As
a compromise, I allowed them to silently patch the vulnerability.
Regardless of whether it is, you should expect the result of that to be that nobody trusts them with embargoes.
Which is in fact, what has happened.
Google broke an embargo early. https://security.googleblog.com/2018/01/todays-cpu-vulnerabi...
EFail embargo was broken. See https://twitter.com/seecurity/status/995964977461776385 and http://flaked.sockpuppet.org/2018/05/16/a-unified-timeline.h...
I don't think picking on OpenBSD is the right thing here.
Linus has "badmouthed" Intel in much harsher and more explicit terms. Linux is just too big for them to get away with trying to smear and slander so they ignore him and move on.
In the end, the conclusion is:
"Long story short, Hyper Threading is still very much relevant in 2018 with current-generation Intel CPUs. In the threaded workloads that could scale past a few threads, HT/SMT on this Core i7 8700K processor yielded about a 30% performance improvement in many of these real-world test cases."
(For those who aren’t familiar with Apple devices, Apple don’t expose settings like this to a user, which are usually available in the BIOS on a PC)
Here it is:
Title: Disable SMT/Hyperthreading in all Intel BIOSes
Posted by Theo de Raadt on Aug 23, 2018; 11:35am
Two recently disclosed hardware bugs affected Intel cpus:
- L1TF (the name "Foreshadow" refers to 1 of 3 aspects of this bug, more aspects are surely on the way)
Solving these bugs requires new cpu microcode, a coding workaround, AND the disabling of SMT / Hyperthreading.
There will be more hardware bugs and artifacts disclosed. Due to the way SMT interacts with speculative execution on Intel cpus, I expect SMT to exacerbate most of the future problems.
A few months back, I urged people to disable hyperthreading on all Intel cpus. I need to repeat that:
DISABLE HYPERTHREADING ON ALL YOUR INTEL MACHINES IN THE BIOS.
Also, update your BIOS firmware, if you can.
OpenBSD -current (and therefore 6.4) will not use hyperthreading if it is enabled, and will update the cpu microcode if possible.
But what about 6.2 and 6.3?
The situation is very complex, continually evolving, and is taking too much manpower away from other tasks. Furthermore, Intel isn't telling us what is coming next, and are doing a terrible job by not publicly documenting what operating systems must do to resolve the problems. We are having to do research by reading other operating systems. There is no time left to backport the changes -- we will not be issuing a complete set of errata and syspatches against 6.2 and 6.3 because it is turning into a distraction.
Rather than working on every required patch for 6.2/6.3, we will re-focus manpower and make sure 6.4 contains the best solutions possible.
So please try to take responsibility for your own machines: Disable SMT in the BIOS menu, and upgrade your BIOS if you can.
I'm going to spend my money at a more trustworthy vendor in the future.
active:  Tells whether SMT is active (enabled and siblings online)

control: Read/write interface to control SMT. Possible values:

         "on"           SMT is enabled
         "off"          SMT is disabled
         "forceoff"     SMT is force disabled. Cannot be changed.
         "notsupported" SMT is not supported by the CPU

If control status is "forceoff" or "notsupported", writes are rejected.
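Assuming the Linux sysfs interface described above (present in recent kernels, roughly 4.19 onward), it can be exercised like this; the write needs root:

```shell
# Query the current SMT state:
cat /sys/devices/system/cpu/smt/active
cat /sys/devices/system/cpu/smt/control

# Disable SMT at runtime (sibling threads go offline):
echo off > /sys/devices/system/cpu/smt/control

# Or keep it off from boot via the kernel command line:
#   nosmt        - disable SMT (can be re-enabled at runtime)
#   nosmt=force  - force-disable; control will then read "forceoff"
```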
Right now I'm looking at what making a UEFI application to disable HT before boot might involve... not sure if that's too late in the boot process or not.
I'm not sure that's true. For example, on an i7-4770 I get:
$ cat /sys/devices/system/cpu/cpu[0-3]/topology/thread_siblings_list
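As a hedged sketch of what that output typically looks like, and how you might act on it: on a 4-core/8-thread part, each line pairs a primary CPU with its HT sibling (e.g. "0,4"). The sample values below are illustrative, not taken from a real box; a real script would read them from sysfs and write 0 to /sys/devices/system/cpu/cpuN/online for each extra sibling:

```shell
# Sample thread_siblings_list contents (one line per logical CPU; each
# physical core's pair appears twice, once per sibling). The pipeline
# extracts the second sibling of every core, i.e. the CPUs to offline:
printf '0,4\n1,5\n2,6\n3,7\n0,4\n1,5\n2,6\n3,7\n' |
  sort -u | cut -d, -f2- | tr ',' '\n' | sort -un
# prints 4 5 6 7 (one per line)
```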
If you read https://www.openbsd.org/goals.html, the word "performance" does not appear.
I think, though, that if you'd be particularly willing to knowingly allow these kinds of vulnerabilities in exchange for some performance, OpenBSD probably isn't a good fit for you in the first place.
I disagree. You may have consciously picked OpenBSD because you believe that security is critical for your business. But if you're renting a server (shared or otherwise) to handle your website and paid for X number of cores, RAM, etc., you establish a baseline for what kind of performance you get out of that setup. If that performance suddenly nosedives 20% overnight because the new mitigation patches turned off hyperthreading, the rig you paid for may have gone from sustainably handling your workload to buckling, causing service degradation, outages, etc. I imagine it could be a real problem. It's not so much "Oh, we can't handle that performance hit, we'll run without it" so much as wanting to know the extent of the damage before they take the plunge.
How about an architecture more like Erlang's, where you have independent processes with their own CPU core, where each has their own memory, but where you have much faster communications supported at lower hardware levels? Why not have a multi-processor architecture designed for direct support of Hoare CSP-inspired languages?
Hypercube topology: http://web.eecs.umich.edu/~qstout/pap/IEEEM86.pdf
Parallelism is inherent in most problems but due to current programming models and architectures which have evolved from a sequential paradigm, the parallelism exploited is restricted. We believe that the most efficient parallel execution is achieved when applications are represented as graphs of operations and data, which can then be mapped for execution on a modular and scalable processing-in-memory architecture. In this paper, we present PHOENIX, a general-purpose architecture composed of many Processing Elements (PEs) with memory storage and efficient computational logic units interconnected with a mesh network-on-chip. A preliminary design of PHOENIX shows it is possible to include 10,000 PEs with a storage capacity of 0.6GByte on a 1.5cm2 chip using 14nm technology. PHOENIX may achieve 6TFLOPS with a power consumption of up to 42W, which results in a peak energy efficiency of at least 143GFLOPS/W. A simple estimate shows that for a 4K FFT, PHOENIX achieves 117GFLOPS/W which is more than double of what is achieved by state-of-the-art systems.
(PDF: https://science.raphael.poss.name/pub/poss.13.micpro.pdf )
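The abstract's headline efficiency number is at least internally consistent; quick arithmetic on the quoted figures:

```shell
# 6 TFLOPS at 42 W: 6000 GFLOPS / 42 W ~ 143 GFLOPS/W, matching the
# paper's claimed peak energy efficiency.
awk 'BEGIN { printf "%.0f\n", 6000 / 42 }'
# prints 143
```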
"The Apple-CORE project has co-designed a general machine model and concurrency control interface with dedicated hardware support for concurrency management across multiple cores. Its SVP interface combines dataflow synchronisation with imperative programming, towards the efficient use of parallelism in general-purpose workloads. Its implementation in hardware provides logic able to coordinate single-issue, in-order multi-threaded RISC cores into computation clusters on chip, called Microgrids. In contrast with the traditional “accelerator” approach, Microgrids are components in distributed systems on chip that consider both clusters of small cores and optional, larger sequential cores as system services shared between applications."
(PDF: https://science.raphael.poss.name/pub/poss.15.tpds.pdf )
"This article advocates the use of new architectural features commonly found in many-cores to replace the machine model underlying Unix-like operating systems. "
So how about silicon-level interconnect that looks like networking? As it is now, it seems almost designed to elicit badly non-optimal code.
multi-socket/distributed systems are not performant for latency-critical user applications... fabrication costs and energy efficiency would be significantly worsened.
I think there would be tremendous benefits if we started designing multi-socket/distributed system that could perform in those situations. For one thing, Intel has currently painted itself into a corner with regards to large wafer yields, and AMD is kicking their butts by combining smaller dies.