Disable SMT/Hyperthreading in all Intel BIOSes (marc.info)
384 points by carlesfe 5 months ago | 153 comments

Does this mean AMD hyperthreading has a performance + security advantage over currently shipping Intel processors?

Edit: https://www.amd.com/en/corporate/security-updates

> 8/14/18 – Updated: As in the case with Meltdown, we believe our processors are not susceptible to these new speculative execution attack variants: L1 Terminal Fault – SGX (also known as Foreshadow) CVE 2018-3615, L1 Terminal Fault – OS/SMM (also known as Foreshadow-NG) CVE 2018-3620, and L1 Terminal Fault – VMM (also known as Foreshadow-NG) CVE 2018-3646, due to our hardware paging architecture protections. We are advising customers running AMD EPYC™ processors in their data centers, including in virtualized environments, to not implement Foreshadow-related software mitigations for their AMD platforms.

For those on AMD platforms, how do you disable software mitigations for Foreshadow? Is this automatically done by browsers, operating systems and hypervisors?

They've not been susceptible to some of the more recent side-channel attacks, but others have affected everything from Intel to AMD to ARM.

I haven't really read anything on this most recent set of vulnerabilities and AMD.

There is a lot of false equivalence going on about the vulnerability of Intel vs everyone else.

Yes, Spectre-type issues affect many ranges of processors (including AMD), however everything shown to date indicates that they're very difficult to exploit.

There are a large number of Intel-only issues, like Meltdown and the L1 cache attack, that are much more severe and much easier to exploit.

With the newer AMD processors having as many real cores as they do, does the cost-benefit analysis of HT/SMT change? I read in a comment here a few weeks ago that turning it off on the newer AMD CPUs can yield better performance because of improved cache-coherency on some workloads (My memory of what I read might be totally wrong).

Yes, but it's often a case where deploying finer-grained explicit parallelism works in your favor, due to the much longer dependency chains that can be hidden by the second thread. There are architectures not impacted by this issue, mostly ones with explicit dependency tagging long enough to handle a dTLB fault, but you need close to double the registers, plus interleaving by the compiler, to get about the same performance as with SMT-2 (aka, hyperthreading).

I'll defer to whatever the benchmarks say of course, but I don't see why HT would affect cache coherency for normal workloads. If you disable HT you'd still have the same number of threads/processes running on the system, so you still have to schedule the same amount of work and do the same number of context switches.

Suppose you have 8 runnable processes and a 4 core / 8 thread system.

With 4 execution units, 4 processes run at a time, and they all get swapped out every scheduling tick (losing all their cached lines). OTOH each process gets a full measure of cache to use during its slice.

With 8 execution units, 8 processes "run" at a time, they interleave based on stalls and CPU resources, and the OS doesn't need to reschedule anything every tick (so they hopefully keep their cache lines hot). But each process gets a half measure of cache to use.

In reality, code tuned to use a full measure of cache will be better off matching the number of processes to the number of execution units available, so you'd run half the number of processes with HT disabled. And cache-tuned code tends to fall off a performance cliff when it exceeds cache available, so it may easily run more than twice as fast, depending on the work.

The win from HT depends on most code not being tuned to full measures of cache, and having a lot of memory stalls or other heterogeneous work that other work can fit into. And most code is like that. Cache tends to have a declining marginal return - you have to add exponentially more cache to avoid cache misses - https://en.wikipedia.org/wiki/Power_law_of_cache_misses .
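The cache-cliff argument above can be illustrated with a rough sketch that times pseudo-random accesses over growing working sets. This is a hypothetical micro-benchmark, not from the thread; Python's interpreter overhead dulls the effect considerably compared to tuned native code, and the sizes chosen are arbitrary:

```python
import random
import time

def access_time(working_set_kib, accesses=200_000):
    """Average seconds per pseudo-random access over a working set of
    the given size. Once the set stops fitting in a cache level, the
    per-access time should climb noticeably (the "performance cliff")."""
    n = working_set_kib * 1024 // 8          # treat each slot as ~8 bytes
    buf = list(range(n))
    rng = random.Random(0)                   # fixed seed for repeatability
    start = time.perf_counter()
    acc = 0
    for _ in range(accesses):
        acc += buf[rng.randrange(n)]
    return (time.perf_counter() - start) / accesses

if __name__ == "__main__":
    # Sizes straddling typical L1/L2/L3 capacities on a desktop part.
    for kib in (32, 256, 2048, 16384):
        print(f"{kib:>6} KiB: {access_time(kib) * 1e9:7.1f} ns/access")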

The working set at any point in time might be larger with HT and less likely to fit in cache.

Some of the heavily multithreaded applications I use at work see up to a ~27% loss in performance from disabling SMT; others don't see much of a loss at all (on AMD EPYC).

Maybe. There's no guarantee AMD won't have the same issue.

For hypervisors it's automatic. For operating systems, there's no performance penalty. Browsers need not do anything.

It's still the hypervisor (Hyper-V in that case) that has to do the work to mitigate the vulnerability, not the browser. The browser just uses whatever APIs the hypervisor exposes to create a virtual machine.


Where did you get that impression?

AMD claims their processors aren't affected at all by any of the 3 variants of Foreshadow (https://www.amd.com/en/corporate/security-updates) therefore SMT is safe to leave enabled. On the other hand, on Intel the only fully comprehensive workaround is to completely disable SMT, so given that disabling SMT almost halves the performance on some workloads,¹ AMD is bound to have a huge performance advantage over Intel, on these particular workloads.

¹ For example, in SPECvirt_sc2013 the patch reduces performance by 31% (https://www.servethehome.com/wp-content/uploads/2018/08/Inte...). 31% is unheard of for a security patch! It's a difficult security-speed tradeoff that businesses must carefully consider.

Should truth-in-advertising laws require OEMs to stop advertising hyper-threading, e.g. 4C/8T on new hardware, if the advertised feature is not fit for purpose?

My personal opinion is "not yet".

Give intel some time to cope with this newest set of vulns, see if they can find a way to re-enable hyper threading safely, and if they can't and are still advertising hyper threading, then start going after them.

I believe it may be because the email only mentioned Intel explicitly and had an "I won't be buying Intel in future" (paraphrased) comment. However, I personally wouldn't assume that AMD's HT implementation doesn't have similar issues.

> However, I personally wouldn't assume that AMD's HT implementation doesn't have similar issues.

It does not. The fundamental difference between AMD and Intel CPUs in all these faults is that AMD does all permissions checks eagerly before returning results from memory, while when speculating, Intel defers them until speculation is resolved.

This does not mean that AMD escaped all of it, because some of the attacks (eg, spectre variant 1) do not cross a protection boundary that permissions checks would catch.

HT is also disabled on AMD, for both CMT (FPU-shared) & SMT implementations.


While no AMD issues are known at present, OpenBSD will have it disabled by default. How about other operating systems, I wonder?

Possibly, but it's not uncommon to refer to x86 and x64 machines as intel architecture machines or as intel machines.

Due to architectural differences, AMD CPUs are immune or nearly immune to some of the speculative execution attacks (though not all of them). For example, AMD's branch predictor uses the full address and so is not vulnerable to branch predictor poisoning in the same way as Intel's.

The major win (whether AMD intended it or not) is that AMD cpus don't speculate loads until page permissions have been verified. Intel fires off the speculation immediately. That is one of the primary side channels underlying many of the attacks.

Intel has been actively muddying the waters with FUD to get people to think AMD is just as vulnerable. Please don’t buy into that. Intel is far more exposed. The subsequent slowdowns are more severe.

> Due to architectural differences

Microarchitectural differences. They are, after all, the same architecture, i.e. x86-64.


Thanks for the clarification!

For anyone (understandably) confused about the attacks and mitigations related to L1TF, I've found the linux kernel documentation on the mitigations[0] to be a great resource.

One interesting thing is that to mitigate L1TF, hyperthreads only need to be disabled if you are running VMs; the userspace mitigations are effective regardless of HT status. There's a catch, though: with VMs you can leave hyperthreading enabled if you disable the Extended/Nested Page Table virtualization feature, but it is noted that this results in a significant performance impact.

[0] https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html
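On Linux, the kernel reports its own assessment of these mitigations through sysfs. A minimal reader, as a sketch: the `/sys/devices/system/cpu/vulnerabilities` directory is standard on kernels recent enough to ship the L1TF docs above, but absent on older kernels and non-Linux systems, hence the `None` fallback:

```python
from pathlib import Path

def mitigation_status(name, sysfs_root="/sys/devices/system/cpu/vulnerabilities"):
    """Return the kernel's reported status for a named vulnerability
    (e.g. 'l1tf', 'meltdown'), or None if the sysfs entry is absent."""
    try:
        return (Path(sysfs_root) / name).read_text().strip()
    except OSError:
        return None

if __name__ == "__main__":
    for vuln in ("l1tf", "meltdown", "spectre_v1", "spectre_v2"):
        print(f"{vuln:>10}: {mitigation_status(vuln)}")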

However, this does not mean that HT with VMs is totally secure, as there may be more vulnerabilities relating to HT yet to be disclosed, as alluded to by Theo. (For context, see the previous discussion [1] around the Lazy FPU switching vulnerability, where Theo made the decision to enable mitigations in OpenBSD [2] prior to the public disclosure of the bug; Theo/OpenBSD was _not_ party to the embargo.)

[1] https://news.ycombinator.com/item?id=17304233

[2] https://marc.info/?l=openbsd-cvs&m=152818076013158&w=2

We have disabled Hyper-Threading (HT) on all public-facing servers (running OpenBSD). However, our compute nodes running the Linux kernel see roughly an 80% to near-100% boost for specific scientific workloads, so we run our INTERNAL NETWORK ONLY compute nodes with HT on. In places where security is not the primary concern, why not make use of HT for the extra efficiency?

Think and plan before you blanket disable HT on all servers running intel CPUs...

If an attacker is able to run any code on these private servers, I have bigger problems to deal with than HT as attack vector..

If you fully trust the software you're running, I see no reason to disable HT. At this point, I don't think I'd have it running on anything publicly facing, though. That said, I still have it enabled on my work PC & home PC.

Do you fully trust all the JavaScript you run?

Nope, which is why I generally have JS disabled.

Yes I think it is a more significant problem for Multitenant cloud providers..

Agree. A personal computer could probably even risk it as long as they don't run untrusted javascript (which they shouldn't do anyways, or only under sandboxed/careful conditions).

Very few people don't run untrusted Javascript these days.

Of course. Very few people maintain servers but the parent's point applies to those who do.

Does that mean hyperthreading is effectively unpatchably insecure?

Cloud Providers are gonna have a bad time if this is true.

You can give both hyperthreads in a physical core to the same tenant, no?

Scheduling different VMs to run on the same hyperthreaded core at once seems like it can't be good for either VM's performance, even if there were no security concerns. Hyperthreading is much more useful for running multiple threads of the same app, accessing similar instruction caches etc.

(There's also a question of safety within the VM, but a huge number of cloud users are running effectively one user within their VM.)

Yes, you can isolate hyperthread siblings to the same VM but you also need to ensure no host code (userspace or kernel) runs on that core, or the untrusted guest may be able to read values stored in L1 by that code. This is harder to do and likely would result in large performance drops for some workloads (because you are essentially disabling the advantage of locality for data that needs to be accessed from both guest and host environment).
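Sibling isolation like this ultimately comes down to CPU affinity. As a hypothetical sketch (Linux-only, since it relies on `os.sched_setaffinity`; the pinned set is a stand-in for one physical core's pair of hyperthreads, e.g. `{0, 4}` on a 4-core/8-thread box):

```python
import os

def pin_to_cpus(pid, cpus):
    """Restrict a process to the given CPU set and return the resulting
    affinity mask. A hypervisor isolating hyperthread siblings would pin
    each guest's vCPU threads to exactly one physical core's siblings."""
    os.sched_setaffinity(pid, cpus)
    return os.sched_getaffinity(pid)

if __name__ == "__main__":
    before = os.sched_getaffinity(0)         # 0 means "this process"
    print("before:", sorted(before))
    # Pin ourselves to a single CPU as a demo; restore afterwards.
    print("after :", sorted(pin_to_cpus(0, {min(before)})))
    os.sched_setaffinity(0, before)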

Other things require sandboxed multitenancy than just full-on VMs. Database queries against a "scale-free" database like BigQuery/Dynamo, for example, where two queries from different tenants might actually be touching the same data, with much the same operations, and therefore you'd (naively) want to schedule them onto the same CPU for cache-locality benefits.

Okay, so many tenants are on the same BigQuery/Dynamo machine sharing cores.

If the API is "SQLish queries", I have a hard time believing you are going to be able to trigger these kinds of attacks. You need a tight loop of carefully constructed code to mount them, no?

The latter question is very important indeed. If, for instance, you render websites in your VM, they can (if I understand correctly) potentially read secrets from other processes, like DB credentials and other stuff...

If the only real solution is to turn off HT/SMT that, seen positively, should net us a lot faster VMs then...

> If the only real solution is to turn off HT/SMT that, seen positively, should net us a lot faster VMs then...

you also doubled the cost of each VM (in terms of cpu), but you didn't double the performance of each VM, so it's a net negative.

It might be Intel in the end having to pay that cost...

If you render websites that run code in your VM (e.g., you're running a traditional shared hosting infrastructure where mutually-untrusted users can upload PHP scripts, or you're doing something serverless / FaaS / Cloudflare Workers / etc. where mutually-untrusted users can upload functions), then yes. If you're rendering websites in the sense of hosting WordPress for lots of people but not permitting custom plugins, then no.

I thought more about Rendering and executing their js for screenshotting purposes for example.

Probably in most cases.

VMware have disclaimers on the mitigation options that stop short of turning off HT, meaning: use at your own risk. [1]

I am still waiting on a comment from Linode [2]

Openstack have some knobs you can adjust, but it really depends on your workloads and what risk you are willing to accept. [3]

AWS have their own custom hypervisor and are said to have worked around the issue. [4] Amazon had info on this before others. It appears they have a special relationship with Intel?

I have not found any hardware or OS vendors that are willing to say that you can leave HT enabled. It is a very heated topic because folks will have to increase their VM infrastructure anywhere from 5% to 50% depending on their workload profiles. For public clouds, you can't predict workload profiles.

Edit: Oops I left out the main site for L1TF [5]

[1] - https://kb.vmware.com/s/article/55806

[2] - https://blog.linode.com/2018/08/16/intels-l1tf-cpu-vulnerabi...

[3] - https://access.redhat.com/articles/3569281

[4] - https://aws.amazon.com/security/security-bulletins/AWS-2018-...

[5] - https://foreshadowattack.eu/

AWS is able to get custom Intel Processors due to their size (c5 instances have a custom Intel processor).

Makes sense. I sure would like some custom processors. :-)

Well, fabbing using reasonably respectable processes only runs circa $2000/mm2 or so, and using crazy old process nodes like CMOS et al gets you down to $300/mm2, so you could very well make something.


(I want to...)

Microsoft has stated that you can leave HT enabled when using Hyper-V on Windows 2016. The same mitigations have allowed them to keep HT enabled in Azure.


Is it a feasible solution to enable hyperthreading only for threads or forks of the same process? Then they can use this ability, but other processes cannot do timing attacks on this process in this core... I think

> Is it a feasible solution to enable hyperthreading only for threads or forks of the same process?

How does that work on Unix systems, when processes are all forked from one process? Even if you get past that issue, how do you handle less-privileged processes that use other security mechanisms (cgroups, pledge, SELinux, chroot, sandboxing)?

I'm guessing someone at Amazon is looking at this right now.

I think EC2 has isolated cores (except t1/t2/t3) all along.

Note that the recently announced T3 instances all have an even number of vCPUs; I wonder if it's related to this issue.

You could allow processes that have ptrace rights on each other to run simultaneously which would cover most issues, but you’d still run into trouble with JavaScript engines running untrusted code.

Thinking about this, they're probably gonna introduce "insecure but cheap" instances for customers that don't mind the chance of data leaks and takeovers...

Which is going to be everyone except customers who already have issues with cloud and need special instances because of regulations. Then we'll see the occasional "30,000 credit cards stolen" hack every three years because of this issue, and that'll be it.

It's another situation like what happened with WEP WiFi encryption ten years ago.

That would be hard to market.

You could run the entire free tier there and disable it for paying customers - I'm sure there's a significant fraction of Amazon and Google's clouds, at least, that are on the free tier, and saving money on those would help everyone (they'd let people who aren't yet at a significant enough scale to care about security play around with things for free, and they'd let the cloud providers pack them very tightly).

Indeed, the idea of "Security" is not negotiable when marketing.

Imagine buying a car that says, "Save $5000 for a less safe version without airbags." Yes, I know airbags are a DOT requirement; I'm just trying to make a point.

Edit: I think people are missing my point. I am not saying they don't sell cheaper models that are less safe. My point is that they don't ACTIVELY market them as such. Point me to an advertisement that says "Save $5000 for a less safe car!". This is in the spirit of what the GP was talking about whether cloud providers can market as "Less secure but cheaper HT option".

There are absolutely situations in which a substantially cheaper but less-secure/safe solution to a problem can make economic sense.

Suppose you have $5k, you need a car in order to feed your family, and that only the following two options are available: You can buy the safe car for $10k or a less safe car for $5k.

In that situation, less safety can be a reasonable choice.

Indeed, there was a long period of time in which Volvos were demonstrably more safe than other lower-cost vehicles, yet people bought the lower-cost vehicles.

In the cloud-offering world, instead of marketing servers as "less-secure", they can simply offer "more-secure" options that run on non-HT hardware. HIPAA-compliant cloud-buyers will have to upgrade, and then the cloud vendors can slowly lower the prices on both, making the less-secure option lower cost than the present day.

Consumers make less safe but cheaper decisions all the time. My point wasn't about the choice. It was about the seller trying to market it as such.

With two $5k cars, you can guarantee safety by having a leader car clear the road while you follow at very low speed.

Is there a hyperthreading joke here somewhere?

The Ford Pinto Deluxe, a beautiful car for only $10k!∞•º

∞ Known to cause cancer in the state of California

• This statement not evaluated by the FDA

º Might spontaneously catch fire and explode in minor accidents

Car manufactures do exactly that all the time though. Things like auto emergency braking and side airbags are still options that you can pay extra for. Airbags, anti lock brakes, backup cameras, etc. were all available as optional upgrades for decades before they got mandated.

Yeah but they just wouldn't market it that way. It's easy enough to spin. You have the regular version, then you have the "enhanced security" version.

Good point. I can see how that would totally work. I don't know the target consumer savvyness for security but most people would just glean over and buy the cheaper option.

...and people still buy motorcycles (which are significantly cheaper), proving that "security" or "safety" is not an absolute, nor a must-have.

bold statement regarding the price of motorcycles..

For private databases, sure, but I don't need that security if I am running an isolated server that only hosts public data.

That's always been a fundamental part of the proposition of multi-tenant VM hosting, though.

Question: If I rent 4 core AWS instance, does it mean 4 physical cores or 4 hyper threaded cores? Is there a standard to this definition of “cores” across GCP, DO, Linode, etc. I don’t have the experience or knowledge about cloud computing but just have a DO instance running a web server. I’m curious.

A cloud "vCPU" is a hyperthread and in good providers (EC2/GCE) they are properly pinned to the hardware such that, for example, a 4-vCPU VM would be placed on two dedicated physical cores. This was probably done for performance originally but now it also has security benefits. You can get hints of this by running lstopo on VMs and similar bare metal servers.

On second and third tier cloud providers, the vCPUs tend to be dynamically scheduled so that they may share cores with other VMs.

Here's lstopo output on a 4-core AWS instance to illustrate your point: https://instaguide.io/info.html?type=c5.xlarge#tab=lstopo
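You can inspect the hyperthread sibling mapping yourself without lstopo. As a sketch, this parses the kernel's cpulist format; the `thread_siblings_list` sysfs path is standard on Linux but may be absent in some VMs or on other OSes, in which case nothing is printed:

```python
import glob

def parse_cpu_list(s):
    """Parse a kernel cpulist string like '0,4' or '0-3' into a list of ints."""
    cpus = []
    for part in s.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

if __name__ == "__main__":
    seen = set()
    for path in sorted(glob.glob(
            "/sys/devices/system/cpu/cpu*/topology/thread_siblings_list")):
        with open(path) as f:
            pair = tuple(parse_cpu_list(f.read()))
        if pair not in seen:     # both siblings report the same pair
            seen.add(pair)
            print("siblings:", pair)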

I asked this a while ago: a cloud core is always just a hyperthread.

You mean it's random then, right? I mean, let's talk about what a hyperthread really is: it's the leftover functional units (or execution units).

Say you have 100 adders; the processor tries to schedule as many instructions as it can on those 100 adders, but eventually it will run into data dependencies. The leftover units can go to a hyperthread.

My understanding (let me know if I'm wrong) is that Hyperthread aware OSes (which is like what, everything since WinXP/Linux kernel 2.4?) will schedule lower priority tasks to the logical cores and higher priority tasks to the real cores.

So when it comes to a hosted provider (a.k.a cloud provider, a.k.a somebody else's computer), what you get pretty much depends on the virtualisation layer they use: Vmware, KVM, Xen, Hyper-V, etc.

Do hypervisors typically peg VMs to a real physical core? I was always under the impression they over-provision on most hosts, so you're getting part of a core, and the vCPU count listed in the product documents just indicates your priority and how many vCPUs appear to your particular VM.

> My understanding (let me know if I'm wrong) is that Hyperthread aware OSes (which is like what, everything since WinXP/Linux kernel 2.4?) will schedule lower priority tasks to the logical cores and higher priority tasks to the real cores.

You understand it wrong, even though you’re somewhat correct as to what scheduler actually does. In a single physical core, both logical processors are equivalent, and neither one has higher internal priority over the other. The hyper threading aware scheduler will take extra care in scheduling in this scenario, but not in a sense you describe — if you have 2 physical cores, and thus 4 logical processors, and 2 CPU intensive tasks, the scheduler might attempt to schedule them on different physical cores, instead of stuffing them on the two logical processors of a single physical core. It’s not because one logical core is better than the other, but rather it’s because the two tasks would simply compete with each other in a way they wouldn’t if they were on physically separate cores.

> My understanding (let me know if I'm wrong) is that Hyperthread aware OSes (which is like what, everything since WinXP/Linux kernel 2.4?) will schedule lower priority tasks to the logical cores and higher priority tasks to the real cores.

That is not how I understand it. The OS sees two identical logical cores per physical core and the CPU manages which is which internally. Also it's not really high and low priority - it's two queues multiplexing between the available execution units. If one queue is using the FPU then the other is free to execute integer instructions, but a thousand cycles later they might have switched places. Or if one queue stalls while fetching from main memory, the other gets exclusive use of the execution units until it gets unstuck.

In my floating-point-heavy tests on an i7, however, there is still a small advantage in leaving HT on. The common wisdom is that if you are doing FP, HT is pointless and may actually harm performance, but that doesn't match my observations when the working set doesn't fit into L2 cache. YMMV.

A semi-modern OS will try to keep a process on the same physical core if it can, so it may be flipflopping between two logicals, but should still see the same cache. Disabling HT means the OS still sees logical cores, but half as many of them, with a 1:1 correspondence between logicals and physicals.

I have a handful of examples; https://github.com/simonfuhrmann/mve/tree/master/libs/dmreco... is one. They are coded without much respect for cache-efficient data structures; in fact, in C++ it's actually harder to not totally ignore the cache and handle data as whole cachelines. Note that in many cases the compiler could use more cache-respectful data structures with at least very similar performance, even when they don't spill out of cache.

In this case, reconstructing 2 MP images on a quad-core E3 Skylake, performance without HT was better, and better still after replacing some of the pathological uses with a B-tree and similar structures (IIRC under MIT/BSD licenses, with the same interface; it was just a typedef away). Also, they used size_t for the number of an image in your dataset, yet their software is far from scaling that far without a major performance fix, because the cost/benefit of optimization leans towards a good couple of sessions with a profiler before spending the money on the compute (unless the deadline precludes it).

The dataset still doesn't fit into L3, and even then there are ways to block the image similar to matrix multiplication.

perf stat -dd works wonders. The Ubuntu package is perf-tools-unstable, IIRC, and setting lbr as the call-graph mode for perf top, if you run on Haswell or newer, gives you stack traces even for code compiled with -fomit-frame-pointer.

> In my floating-point heavy tests on i7 however, there is still a small advantage in leaving HT on, the common wisdom is if you are doing FP, HT is pointless and may actually harm performance, but that doesn't match my observations if your working set doesn't fit into L2 cache. YMMV.

I benchmarked this myself using POV-Ray (which is extremely heavy on floating point) when I first got my i7-3770k (4 cores, 8 threads).

Using two rendering threads was double the speed of one, four was double the speed of two, but eight was only about 15% faster than four.

I don't think I've ever actually seen an example of real-world tasks that get slowed down by HT. Every example I've seen was contrived, built specifically to be slower with HT.

From my understanding, you can't (necessarily) even rely on your guest's CPUs mapping to the host's actual CPUs, which makes spending time twiddling NUMA actively useless. Assuming that's actually the case, I very much doubt the guest's scheduler has the ability to schedule tasks between logical and physical cores, based on priority.

No need to down vote him people. He is polite and on topic. Not completely correct, but none of us is all the time.

Is there any risk this also impacts browsers executing JS?

Yes. You are executing JavaScript, after all.

Hyperthreading is fine on its own, but yes in combination with other CPU features it is effectively impossible to secure.

Turn it off or sell your Intel chips.

> SMT is fundamentally broken because it shares resources between the two cpu instances and those shared resources lack security differentiators.

I thought the root of one of the Foreshadow problems was that caches are shared across cores, and therefore even with hyperthreading disabled, you still gain information about a process on another core. Am I misinterpreting it?

It does seem like the paranoid thing to do is that each socket gets to be used by only a single user. (I half-jokingly suggested at work that we replace our internal cloud with a Beowulf cluster of Raspberry Pis...)

It also seems like you could design OSes in a way which is more robust to this, e.g., certain cores are only for the kernel and processes running as root, and system calls are inter-processor interrupts, so privileged kernel (or userspace root) data doesn't go into untrusted caches at all.

Foreshadow is caused by the L1 cache which is not shared across cores. It may be only a matter of time before L3 attacks are discovered but I don't know of any today.

Oh - I forgot the L1 cache isn't shared across cores. That makes sense, thanks.

There are cache partitioning implementations to isolate cores from each other, but mainly to prevent noisy neighbors from bumping you out of the higher level caches.


Cache timing attacks are old hat in the timing side-channel business; the newer attacks are cooler because the memory maps are not checked, so you can determine the caching status of memory not mapped into your process's address space. (AFAIK)

It looks like CAT only does allocation of the last-level cache (i.e., L3). The literature claims this could prevent timing attacks, but I don't see how it could. Isn't there enough difference in speed between L3 and L1 that one should be able to extract timing information?

Scary looking headline on a discussion forum with instructions to perform a task that the average user would not really understand, with no explanation of attack vector or even consequences for any user who doesn't want to take the time (and energy, frankly at this point) to follow security news.

I'm pretty close to not caring anymore. I hope somebody figures out how to at least fix the security news infrastructure, if fixing security is still a ways off.

EDIT: Scratch that, I assume attack vector is a browser since they mentioned JavaScript.

This is not security news at all. This is Theo de Raadt's personal E-mail sent to the OpenBSD development mailing list for system developers. It is never intended for the consumption by the general public.

Hah, okay. I'm not familiar with OpenBSD so I didn't know who Theo was. Well, that would be good to know, let's say, on the HN headline.


Does anyone know the best way to disable hyperthreading in OSX? The only thing I found was this: https://www.whatroute.net/cpusetter.html

I'm not sure if you are running OSX on a multi user server, but is this something we need to worry about on our laptop? It's not clear to me what needs to be done in a single user situation.

The OpenBSD post references possible JavaScript (browser) attacks.

Maybe this is what finally gets me to upgrade from my ~2012 i7-3770. Not because of performance improvements, but to avoid performance degradation from all these security patches...

I'm not so sure I'd bother, more of these attacks keep coming out, odds are you'll just buy a CPU that'll be vulnerable to the next one. Maybe AMD would save you from some.

I'm in this exact scenario. I am thinking I might just go with AMD this time around, even if it is mostly an illusory short-term strength over Intel. In the long run I will undoubtedly have to refresh my hardware as new exploits come out, but at least I can take solace in the fact that I'm only worried about a single machine and not a datacenter.

I feel the same, maybe I will switch to AMD next time I upgrade. On the other end, I am happy with my budget dedicated server: its Intel Atom N2800 is so "rustic" that it did not get affected by any of Intel vulns (yet).


>>> We are having to do research by reading other operating systems.

So Intel cooperates with business partners like Apple/Windows and not with open source. Does that mean Apple and Windows can claim to be more secure because they have access to the information needed to fix Intel's issues?

A lot of these bugs seems to be found by the Google teams and since they are heavy Linux users, I'm sure in many cases they have a solution for Linux before Apple or Microsoft does.

> Does it mean that Apple and Windows can claim to be more secure because they have access to the information needed to fix Intel's issues ?

They can claim it, but I would trust a Linux or FreeBSD box over MacOS or Windows anytime even if they get some security info before the open source operating systems.

Intel cooperates with organizations that obey embargoes and don't badmouth their partners in public, like Red Hat, Canonical, and probably the Linux Foundation. Intel does not cooperate with OpenBSD.

OpenBSD has never disobeyed an embargo. Argued for them to be reduced, criticized them, but not disobeyed them.

I'm pretty sure this is not true. The most recent example I remember is: https://lobste.rs/s/dwzplh/krack_attacks_breaking_wpa2#c_pbh...

This is common misinformation. Even in this case, OpenBSD did not break the embargo. After protesting, they received the permission of the researcher to publish:

  Note that I wrote and included a suggested diff for OpenBSD already, and that
  at the time the tentative disclosure deadline was around the end of August. As
  a compromise, I allowed them to silently patch the vulnerability.

I think what you're insinuating is a side effect of the OpenBSD group being open, honest, and congruent to their ethics.

What he's insinuating is that they agreed to embargos and then repeatedly broke them, claiming it was "better for users".

Regardless of whether it is, you should expect the result of that to be that nobody trusts them with embargoes.

Which is in fact, what has happened.

The KRACK embargo expired as per agreement but I'll partially concede after reading about this OpenSSL accident: https://www.tedunangst.com/flak/post/regarding-embargoes

Google broke an embargo early. https://security.googleblog.com/2018/01/todays-cpu-vulnerabi...

EFail embargo was broken. See https://twitter.com/seecurity/status/995964977461776385 and http://flaked.sockpuppet.org/2018/05/16/a-unified-timeline.h...

I don't think picking on OpenBSD is the right thing here.

> badmouth

a.k.a. truth

Linus has "badmouthed" Intel in much harsher and more explicit terms. Linux is just too big for them to get away with trying to smear and slander so they ignore him and move on.

I think they mean Linux, not Apple/Windows.

There are some comments here talking about the possibility of getting better performance without HT. Here's an article from a test (on Intel only) of that theory: https://www.phoronix.com/scan.php?page=article&item=intel-ht...

In the end, the conclusion is: "Long story short, Hyper Threading is still very much relevant in 2018 with current-generation Intel CPUs. In the threaded workloads that could scale past a few threads, HT/SMT on this Core i7 8700K processor yielded about a 30% performance improvement in many of these real-world test cases."
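For anyone wanting to reproduce that kind of comparison, here's a rough sketch using the runtime SMT knob that came in with the L1TF fixes (assumes Linux 4.18+; toggling requires root, and `bench` is a hypothetical placeholder for your real workload):

```shell
# A/B harness: run the same workload with SMT on, then off.
# "bench" is a placeholder; substitute your real workload (e.g. a parallel build).
bench() { :; }

for mode in on off; do
    # Toggling needs root and a 4.18+ kernel; ignore failure otherwise.
    echo "$mode" > /sys/devices/system/cpu/smt/control 2>/dev/null || true
    echo "SMT=$mode, logical CPUs: $(nproc 2>/dev/null || echo '?')"
    bench
    last_mode=$mode
done
```

On a 6-core/12-thread part like the 8700K you'd expect the reported logical CPU count to halve in the "off" pass.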

Will be interesting to see what Apple does about this in their next software update. I can’t imagine many people will be happy if the next software update forcibly disables hyper threading.

(For those who aren’t familiar with Apple devices, Apple don’t expose settings like this to a user, which are usually available in the BIOS on a PC)

Will the loss of HT have apparent consequences on most Intel-based Apple hardware? Very few of those machines are throughput-oriented servers under constant multithreaded load. I suppose almost all MacBooks and most Mac Pros will not visibly slow down.

I would expect disabling HT on dual-core systems to have a noticeable performance impact on the desktop.

It's not clear that he's using "SMT" to refer to AMD specifically as he goes on to talk about "Intel CPUs" and disabling it in "Intel BIOS". Does the Zen architecture have the same issue?

I don't think AMD is impacted by Foreshadow

SMT is the generic name; Hyper-Threading is the name of Intel's 2-way SMT implementation. I don't think they call the Xeon Phi's 4-way SMT Hyper-Threading, but I could be wrong there.

It does not.

[ marc.info not responding for me, found post linked elsewhere. Edited due to markdown.

URL: http://openbsd-archive.7691.n7.nabble.com/Disable-SMT-Hypert...

Here it is: ]


Title: Disable SMT/Hyperthreading in all Intel BIOSes

Posted by Theo de Raadt on Aug 23, 2018; 11:35am

Two recently disclosed hardware bugs affected Intel cpus:

- TLBleed

- L1TF (the name "Foreshadow" refers to 1 of 3 aspects of this bug, more aspects are surely on the way)

Solving these bugs requires new cpu microcode, a coding workaround, AND the disabling of SMT / Hyperthreading.

SMT is fundamentally broken because it shares resources between the two cpu instances and those shared resources lack security differentiators. Some of these side channel attacks aren't trivial, but we can expect most of them to eventually work and leak kernel or cross-VM memory in common usage circumstances, even such as javascript directly in a browser.

There will be more hardware bugs and artifacts disclosed. Due to the way SMT interacts with speculative execution on Intel cpus, I expect SMT to exacerbate most of the future problems.

A few months back, I urged people to disable hyperthreading on all Intel cpus. I need to repeat that:

DISABLE HYPERTHREADING ON ALL YOUR INTEL MACHINES IN THE BIOS.

Also, update your BIOS firmware, if you can.

OpenBSD -current (and therefore 6.4) will not use hyperthreading if it is enabled, and will update the cpu microcode if possible.

But what about 6.2 and 6.3?

The situation is very complex, continually evolving, and is taking too much manpower away from other tasks. Furthermore, Intel isn't telling us what is coming next, and are doing a terrible job by not publicly documenting what operating systems must do to resolve the problems. We are having to do research by reading other operating systems. There is no time left to backport the changes -- we will not be issuing a complete set of errata and syspatches against 6.2 and 6.3 because it is turning into a distraction.

Rather than working on every required patch for 6.2/6.3, we will re-focus manpower and make sure 6.4 contains the best solutions possible.

So please try to take responsibility for your own machines: Disable SMT in the BIOS menu, and upgrade your BIOS if you can.

I'm going to spend my money at a more trustworthy vendor in the future.

Makes sense from the adage that more complexity means more exploits.

Does anyone know the best way to disable hyperthreading on Linux?

Okay, I found it. An SMT knob was added as part of the L1TF fixes, under /sys/devices/system/cpu/smt/:

    active:  Tells whether SMT is active (enabled and siblings online)
    control: Read/write interface to control SMT. Possible values:

    "on"		SMT is enabled
    "off"		SMT is disabled
    "forceoff"   	SMT is force disabled. Cannot be changed.
    "notsupported"	SMT is not supported by the CPU

    If control status is "forceoff" or "notsupported" writes are rejected.
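A quick sketch of using the knob (needs a 4.18+ kernel for the path to exist, and root to write to it):

```shell
# Read the current SMT state; fall back gracefully if the knob is absent.
smt=/sys/devices/system/cpu/smt
if [ -r "$smt/control" ]; then
    smt_state=$(cat "$smt/control")
else
    smt_state=notpresent   # pre-4.18 kernel or non-Linux
fi
echo "SMT control: $smt_state"

# To disable at runtime (as root):
#   echo off > /sys/devices/system/cpu/smt/control
# Or persistently, boot with the "nosmt" kernel parameter.
```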

In the bios?

I would love to but how in the world do I do this when I'm using Windows and Lenovo doesn't give me an option in the BIOS?

Find a way to embarrass Lenovo into updating the BIOS to give you that option.

One way to achieve something similar would be via a software tool which would set the process affinity to only run on real cores.

Or you could run Chrome (untrusted JavaScript) only on cores 2 and 3, and run the app that has your secrets on cores 0 and 1. (It is my understanding that cores 2k are real and 2k+1 is their matching "virtual" sibling.) This way you get both hyperthreading and security. I'm not a security expert though.
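On Linux you could sketch this idea with taskset. Rather than hard-coding the 2k/2k+1 assumption, this derives one logical CPU per physical core from sysfs, falling back to cpu0 when the topology files aren't available:

```shell
# Build a taskset-friendly list with one logical CPU per physical core,
# so a pinned process never shares an SMT sibling pair with another.
phys=""
for f in /sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list; do
    [ -r "$f" ] || continue
    first=$(cut -d, -f1 "$f" | cut -d- -f1)  # first sibling of "0,4" or "0-1"
    case ",$phys," in
        *",$first,"*) ;;                     # this core is already recorded
        *) phys="${phys:+$phys,}$first" ;;
    esac
done
phys=${phys:-0}  # no sysfs topology info: fall back to cpu0
echo "one thread per core: $phys"
# e.g.: taskset -c "$phys" ./app-with-secrets
```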


I'm not sure it would be that easy since I believe e.g. I/O can go through the System process (or other processes even), which has full affinity. We'd likely have to set thread affinities for all processes/threads. But then it would clash with manually-set affinities, and I'm also not sure if it would have worse performance than actually disabling hyper-threading or not.

Right now I'm looking at what making a UEFI application to disable HT before boot might involve... not sure if that's too late in the boot process or not.

> It is my understanding that 2k cores are real, and 2k+1 is their matching, "virtual" core

I'm not sure that's true. For example, on a i7-4770 I get:

  $ cat /sys/devices/system/cpu/cpu[0-3]/topology/thread_siblings_list
(Of course, that might just be Linux renumbering them)
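The quickest way to check on a given box is to dump the topology files. On many Linux/Intel desktops cpu0's list comes back as something like "0,4" rather than "0,1", i.e. the siblings are numbered after all the physical cores; treat any particular layout as an assumption to verify, not a rule:

```shell
# Show which logical CPUs share a physical core (Linux sysfs).
cpu_dir=/sys/devices/system/cpu
for f in "$cpu_dir"/cpu[0-9]*/topology/thread_siblings_list; do
    [ -r "$f" ] && printf '%s: %s\n' "${f#$cpu_dir/}" "$(cat "$f")"
done
```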

So, realistically, how much performance did your average OpenBSD server just lose from following this mitigation?

Performance is not a top priority for OpenBSD.

If you read https://www.openbsd.org/goals.html, the word "performance" does not appear.

I think this may be missing the point of the grandparent comment; rather than interpreting it as an accusation of OpenBSD sabotaging its users' performance, I think we're all just curious at the relative importance of hyperthreading for real-world workloads, on any OS, in grim anticipation of the potential worst-case scenario where hyperthreading's security woes continue to worsen and worsen.

I don't think you can answer that with a single question. It depends on how CPU-intensive the work that server was handling was.

I think, though, that if you'd be particularly willing to knowingly allow these kinds of vulnerabilities in exchange for some performance, OpenBSD probably isn't a good fit for you in the first place.

> I think, though, that if you'd be particularly willing to knowingly allow these kinds of vulnerabilities in exchange for some performance, OpenBSD probably isn't a good fit for you in the first place.

I disagree. You may have consciously picked OpenBSD because you believe that security is critical for your business. But if you're renting a server (shared or otherwise) to handle your website and paid for X number of cores, RAM, etc., you establish a baseline for what kind of performance you get out of that setup. If that performance suddenly nosedives 20% overnight because the new mitigation patches turned off hyperthreading, the rig you paid for may have gone from sustainably handling your workload to buckling, causing service degradation, outages, etc. I imagine it could be a real problem. It's not so much "Oh, we can't handle that performance hit, we'll run without it" so much as wanting to know the extent of the damage before they take the plunge.

Does anyone know if disabling SMT has had an effect on vmd(8) performance in -current?

Isn't it better just to disable JavaScript instead of cutting your CPU performance?

Some of us like using websites.

Many actually improve when you disable JS.

But a non-zero amount will completely break or not show anything at all (even though the actual content to be displayed is just static text and doesn't need any JS)

May as well disable all machine code too, since that is also susceptible.

This affects more than just JS.

Can I ask: is Intel going to sue us if we compare benchmarks with SMT to no SMT, like they mentioned regarding the microcode benchmarks? Intel is in trouble.

Eh, hyperthreading seems questionable anyway. On all but a few of the application benchmarks I've run, it slows things down.

Care to share or provide more specific info?

I've long felt that there's something less than half-baked about the multi CPU architecture we're currently using. The hacky contortions HFT coders have come up with to avoid things like False Sharing strike me as a big red flag.


How about an architecture more like Erlang's, where you have independent processes with their own CPU core, where each has their own memory, but where you have much faster communications supported at lower hardware levels? Why not have a multi-processor architecture designed for direct support of Hoare CSP-inspired languages?

Hypercube topology: http://web.eecs.umich.edu/~qstout/pap/IEEEM86.pdf

Something like this?

Parallelism is inherent in most problems but due to current programming models and architectures which have evolved from a sequential paradigm, the parallelism exploited is restricted. We believe that the most efficient parallel execution is achieved when applications are represented as graphs of operations and data, which can then be mapped for execution on a modular and scalable processing-in-memory architecture. In this paper, we present PHOENIX, a general-purpose architecture composed of many Processing Elements (PEs) with memory storage and efficient computational logic units interconnected with a mesh network-on-chip. A preliminary design of PHOENIX shows it is possible to include 10,000 PEs with a storage capacity of 0.6GByte on a 1.5cm2 chip using 14nm technology. PHOENIX may achieve 6TFLOPS with a power consumption of up to 42W, which results in a peak energy efficiency of at least 143GFLOPS/W. A simple estimate shows that for a 4K FFT, PHOENIX achieves 117GFLOPS/W which is more than double of what is achieved by state-of-the-art systems.


Something like this:

1) https://www.sciencedirect.com/science/article/pii/S014193311...

(PDF: https://science.raphael.poss.name/pub/poss.13.micpro.pdf )

"The Apple-CORE project has co-designed a general machine model and concurrency control interface with dedicated hardware support for concurrency management across multiple cores. Its SVP interface combines dataflow synchronisation with imperative programming, towards the efficient use of parallelism in general-purpose workloads. Its implementation in hardware provides logic able to coordinate single-issue, in-order multi-threaded RISC cores into computation clusters on chip, called Microgrids. In contrast with the traditional “accelerator” approach, Microgrids are components in distributed systems on chip that consider both clusters of small cores and optional, larger sequential cores as system services shared between applications.

2) https://ieeexplore.ieee.org/document/7300441/ (PDF: https://science.raphael.poss.name/pub/poss.15.tpds.pdf )

"This article advocates the use of new architectural features commonly found in many-cores to replace the machine model underlying Unix-like operating systems. "

That's like the Cell processor. For every year that you program for Cell you need at least two years of therapy.

If you're going to change the substrate or paradigm, then you need to do a dynamite job of supporting your users. Sony did not do that.

No networking can touch silicon-level interconnect between cores or within cores on a single chip, at least for latency. Erlang's model of computation doesn't have much to say about physical implementation, and multi-socket/distributed systems are not performant for latency-critical user applications. For servers and high-performance computing, sure, I guess in theory we could use tons of simple single-core chips, but fabrication costs and energy efficiency would be significantly worsened.

> No networking can touch silicon-level interconnect between cores or within cores on a single chip

So how about silicon-level interconnect that looks like networking? As it is now, it seems almost designed to elicit badly non-optimal code.

> multi-socket/distributed systems are not performant for latency-critical user applications...fabrication costs and energy efficiency would be significantly worsened.

I think there would be tremendous benefits if we started designing multi-socket/distributed system that could perform in those situations. For one thing, Intel has currently painted itself into a corner with regards to large wafer yields, and AMD is kicking their butts by combining smaller dies.

