
Does simultaneous multithreading still make sense? - wheresvic3
https://www.codeblueprint.co.uk/2019/11/05/does-smt-make-sense.html
======
xscott
I worked on a project where the (large) customer had some legacy requirements
about percentage of "CPU" our application was allowed to use. The requirement
was written back in the days when a single computer really only had one core,
and once things like that are written it's hard to get them unwritten.

For our application (heavily numeric, very well behaved cache access), turning
on hyperthreading only increased real performance by about 10% (measured as
work completed per unit of time). However, we settled on a metric where we
defined CPU use to be load average divided by number of cores. Doubling the
number of cores the system showed in top allowed us to meet the required
margin.

So from a bureaucratic point of view, hyperthreading was a 100% improvement.
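
A minimal sketch of that arithmetic, assuming a hypothetical 8-core machine and a load average of 6.0 that stays the same with hyperthreading on (both numbers are illustrative, not from the project):

```python
def cpu_use(load_average: float, logical_cpus: int) -> float:
    """CPU use as defined above: load average divided by the CPU count top shows."""
    return load_average / logical_cpus


# Hypothetical numbers, chosen only for illustration.
PHYSICAL_CORES = 8
LOAD_AVERAGE = 6.0

use_ht_off = cpu_use(LOAD_AVERAGE, PHYSICAL_CORES)      # 8 logical CPUs
use_ht_on = cpu_use(LOAD_AVERAGE, 2 * PHYSICAL_CORES)   # 16 logical CPUs

print(f"HT off: {use_ht_off:.1%} reported CPU use")
print(f"HT on:  {use_ht_on:.1%} reported CPU use")
```

Under those assumptions the reported figure halves, even though real throughput only grew by about 10%.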

------
xoa
Flagging this as it's an absurdly shallow article apparently combining about
10 minutes of "research" after hearing something on twitter with conflating
typical end user use-cases and an entire technology. The "tuning" and "oh noes
my VMs this is surely a new problem nobody doing virtualization has ever
thought of" section is too absurd to even bother with. But for the security
aspect it's worth pointing out that in many, if not _most_ [1], true
performance critical environments all code being run is trusted. The system or
cluster is dedicated to being given one specific job after another to crunch
on, exclusively by authorized users in authenticated ways and outputting
exclusively to a controlled channel going off-system. Even if it ever should
have a problem, it would merely result in possibly some corruption of data in
flight and some downtime as the whole thing was re-imaged, but nothing that
would be remotely worse than a 15-50% drop in performance (!). For root's sake.

----

1: where "most" means "in the raw amount of hardware $$$ spent".

~~~
zamadatix
There aren't really many "performance critical" multithreaded environments in
the world. For the most part you either have something that doesn't scale and
needs a really fast thread or you have a cost equation about how many servers
you need to buy/maintain. The main exception that comes to mind are extremely
large databases that heavily resist horizontal scaling due to poor design (of
either software or database).

I'd argue most large compute by total $ is actually shared at the host level
i.e. public/private cloud or user devices. Basically the only things that
aren't are dedicated clusters for specific applications and a few hundred
supercomputers while AWS alone has over 100x as many cores as the largest
supercomputer.

Also I don't think there is a high horse to be on about an article not
targeting the audience of the largest exacompute scale clusters, not
everyone/everything at HN need be at the forefront of the field to avoid being
flagged.

~~~
chaps
Er, there aren't many performance critical multithreaded environments? Latency
sensitive systems disagree, and those are all over the place.

~~~
zamadatix
Can you give examples of something that scales via threading but also requires
that a function take 500 microseconds instead of 600 microseconds in a single
thread, and that actually contradicts the claim that "most" systems aren't this
way?

~~~
jcelerier
Most professional audio software is like that - you can have e.g. a thread
per track to make it simple but you also have to ensure that each execution
cycle does not take more than 1 millisecond else you get audio glitches. And
there is no limit to how hard you have to improve - this is a central factor
for people buying your software (see dawbench) and artists really really don't
like limits - they will always try to add more effects, etc etc on each track.
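
A rough sketch of that deadline check, assuming one thread per track running in parallel so that a cycle takes as long as its slowest track. The buffer size, sample rate and per-track timings are made-up numbers:

```python
def buffer_period_ms(frames_per_buffer: int, sample_rate_hz: int) -> float:
    """Time one audio buffer covers, i.e. the deadline for one execution cycle."""
    return frames_per_buffer * 1000 / sample_rate_hz


def cycle_glitches(track_times_ms: list[float], deadline_ms: float) -> bool:
    """With one thread per track, the cycle is as slow as its slowest track."""
    return max(track_times_ms) > deadline_ms


# Hypothetical setup: 48 frames at 48 kHz gives a 1 ms deadline.
deadline = buffer_period_ms(frames_per_buffer=48, sample_rate_hz=48_000)
print(cycle_glitches([0.4, 0.7, 0.9], deadline))  # every track fits
print(cycle_glitches([0.4, 0.7, 1.2], deadline))  # one track overruns
```

Adding one more effect to a single track can be enough to push that track past the deadline, which is why per-thread speed still matters here.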

~~~
zamadatix
Agreed but I'd hardly call pure software professional real time audio setups a
disruptor of the vast majority of systems. Put all of these niche compute
heavy multithreaded real time use cases together and you have <1% of the CPU
market.

I.e. my claim was "There aren't really many 'performance critical'
multithreaded environments in the world." not that there aren't any.

~~~
jcelerier
> Agreed but I'd hardly call pure software professional real time audio setups
> a disruptor of the vast majority of systems.

I mean, there's still a few hundred thousand people registered on DAW-related
forums so certainly a fair bit more are using those. That is more than a dozen
European countries. Sure, it's not Angry Birds but I do not think that it is
relevant to cater to the lowest common denominator of software.

------
altmind
Of course SMT makes sense. Why would it not? The article says that's because
people only count the threads in their "cpuinfo" output and get the wrong
impression? Intel vulnerabilities are not SMT vulnerabilities per se, they are
side channel attacks on a specific SMT implementation.
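
For the thread-counting point, a small sketch of telling logical CPUs from physical cores on Linux. It assumes the sysfs files `/sys/devices/system/cpu/cpu*/topology/thread_siblings_list` are present; the helper names are made up for illustration:

```python
import glob


def parse_cpu_list(text: str) -> frozenset[int]:
    """Parse a sysfs CPU list such as "0,4" or "0-1" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        elif part:
            cpus.add(int(part))
    return frozenset(cpus)


def count_cores(sibling_lists: list[str]) -> tuple[int, int]:
    """Return (logical CPUs, physical cores) from one sibling list per logical CPU."""
    # Logical CPUs on the same core report the same sibling set.
    cores = {parse_cpu_list(s) for s in sibling_lists}
    return len(sibling_lists), len(cores)


def read_sibling_lists() -> list[str]:
    """Read the sibling lists from sysfs; returns [] where the files do not exist."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as f:
            lists.append(f.read())
    return lists


if __name__ == "__main__":
    logical, physical = count_cores(read_sibling_lists())
    print(f"{logical} logical CPUs on {physical} physical cores")
```

On a machine with SMT enabled the first number would be a multiple of the second; with SMT off or absent the two match.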

~~~
ljhsiung
Also want to add that to say "don't use SMT because it's insecure" is the same
as saying "don't use a cache because it's insecure" or "don't use speculative
execution". As a short-term fix, I would 100% agree that disabling SMT e.g.
OpenBSD's approach is awesome and shows their security-consciousness. But to
preach "disable SMT because it's too challenging" feels very lazy.

Additionally, as you've said, it's _still_ uArch dependent. For example-- the
Fallout vulnerability (one of the MDS attacks) only worked on Intel machines,
but not AMD+ARM, most likely due to the differences in how the two designs
handled store-to-load forwarding on the store queues/buffers.

The author seems to also value security over performance. I do as well. But
the balance between performance and security is a fickle one, and I feel that
"SMT is nonsensical" is a bit too much.

~~~
phkahler
>> Additionally, as you've said, it's still uArch dependent

Intel would love for everyone to disable SMT regardless of vendor. That would
help them with relative performance.

------
h2odragon
Given the mismatch between memory latency and how fast a cpu can actually run
when it does have data, SMT still does make sense, sometimes, for some kinds
of system. Bigger, better caches make it less useful, and security... well.
"Ownership" of one's computational environment is a metaphysical debate now,
this is just one more bullet point on the list.

------
clarry
In the linked article about ghk's talk, you find this tidbit: "If you're not
using a supported distro, or a stable long-term kernel, you have an insecure
system. It's that simple. All those embedded devices out there, that are not
updated, totally easy to break."

Is he still talking about SMT, or just poor security of Linux in general?

I'm wondering about this since "all those embedded devices out there" that I
can think of are not running CPUs with SMT.

~~~
esotericn
Tons of stuff like NUCs used as digital displays, kiosks, etc, everywhere.
I'd be surprised if even half of that was on a proper update path.

Embedded isn't just like, microcontrollers. Think about all the times you've
seen a BSOD on a billboard.

~~~
jacquesm
If it doesn't have a jtag connector it isn't embedded.

~~~
zamadatix
This comment is extremely ironic considering it was an Intel processor that
spurred the widespread use of JTAG and all the way up to Skylake Intel
products had traditional JTAG connectors. These days they do JTAG over a
physical USB port but I'm not sure how the shape of the port is supposed to
matter.

JTAG on the NUCs actually led to a CVE as well IIRC.

~~~
jacquesm
JTAG has nothing to do with Intel per se but everything to do with BGAs, which
made it super hard to get to certain signals.

~~~
zamadatix
It was (relatively) uncommon until Intel released the 80486, then it became
very popular and was found on basically every chip. Not that there weren't
devices before and after that used JTAG but none nearly as influential in its
growth.

------
toast0
In my mind, SMT made more sense when core counts were low. These days, desktop
use cases can more often run out of threads to run than places to run them.
Server use cases can often run more threads, but it might not be useful to run
32 cpu threads if your NICs can only properly run 16 queues.

~~~
Fronzie
For computational tasks, I've seen SMT give a roughly 50% performance increase
compared to not using SMT on the same machine.

Much of that depends on how 'regular' the executions are. A highly optimized
FFT or BLAS routine will benefit less than a sparse matrix computation, where
part of the time is spent in indexing, rather than floating point operations.

~~~
tyingq
Some SMT on/off benchmark comparisons on a Ryzen 3900x. Confirms your
"sometimes 50+% / sometimes nothing" experience.

[https://www.techpowerup.com/review/amd-ryzen-9-3900x-smt-off...](https://www.techpowerup.com/review/amd-ryzen-9-3900x-smt-off-vs-intel-9900k/3.html)

------
jmakov
Some benchmarks for orientation:
[https://www.anandtech.com/show/11544/intel-skylake-ep-vs-amd...](https://www.anandtech.com/show/11544/intel-skylake-ep-vs-amd-epyc-7000-cpu-battle-of-the-decade/15)

~~~
jmakov
Also for different workloads:
[https://www.phoronix.com/scan.php?page=news_item&px=AMD-Ryze...](https://www.phoronix.com/scan.php?page=news_item&px=AMD-Ryzen-9-3900X-SMT-Perf)

------
hrgiger
This makes me wonder how SMT is handled in the Linux kernel, especially for CPU
idle and scheduling. I found the articles below and am sharing them for those
who are also interested:

1- Rock and a hard place: How hard it is to be a CPU idle-time governor
[https://lwn.net/Articles/793372/](https://lwn.net/Articles/793372/)

2- Many uses for Core scheduling
[https://lwn.net/Articles/799454/](https://lwn.net/Articles/799454/)

------
CodeArtisan
I would say that one of the major performance boosts of Zen over Bulldozer is
the introduction of real SMT due to the expiration of the patents. Bulldozer
had CMT which is not the same technique.

CMT vs SMT (very simplified view):
[https://i.imgur.com/AcZnipK.png](https://i.imgur.com/AcZnipK.png)

As you can see, with CMT you have the same number of ALUs as with SMT, but a
single thread can only use its dedicated ALU, leaving the other one unused,
whereas SMT allows a single thread to use all ALUs.
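
A toy model of that very simplified picture, nothing more: it assumes two hardware threads per module (CMT) or core (SMT), an even split of ALUs under CMT, and a thread with enough independent work to fill whatever ALUs it can reach. The function name is made up for illustration:

```python
HW_THREADS = 2  # hardware threads per module (CMT) or core (SMT) in this toy model


def unreachable_alus(design: str, total_alus: int, active_threads: int) -> int:
    """ALUs that no running thread can use, in the simplified picture above."""
    if not 1 <= active_threads <= HW_THREADS:
        raise ValueError("active_threads must be 1 or 2")
    if design == "CMT":
        # Each thread is tied to its own dedicated share of the ALUs.
        per_thread = total_alus // HW_THREADS
        return total_alus - per_thread * active_threads
    if design == "SMT":
        # All ALUs are shared, so even a lone thread can reach every one.
        return 0
    raise ValueError(f"unknown design: {design}")


for design in ("CMT", "SMT"):
    for threads in (1, 2):
        idle = unreachable_alus(design, total_alus=2, active_threads=threads)
        print(f"{design}, {threads} thread(s): {idle} ALU(s) out of reach")
```

The only case that differs is a single running thread, which is exactly where the comment locates the gap between the two designs.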

~~~
Narishma
> due to the expiration of the patents

How do you know that's the reason?

------
tyingq
It's certainly good for Amazon, where they pawn off a thread as a "vCPU".

If SMT dies off, it would be a pretty big margin hit for them.

------
ridiculous_fish
How will SMT evolve with the frequency down-clocking required by AVX-512?
Might a thread be penalized because it happens to be executed concurrently
with an AVX-512 thread on the same core?

~~~
touisteur
I thought down-clocking was on the first generation of low-end almost-not-
Xeons with AVX-512? Will a 2018/19 Xeon Gold or Platinum really down-clock?

------
metaphor
FYSA, SMT in this context is _simultaneous multithreading_ a.k.a.
_hyperthreading_ , not _surface mount technology_.

Hardware folks can safely move on.

~~~
saagarjha
And not “Satisfiability modulo theories” either, it seems. I would never
recommend people to “move on” from an interesting article, though.

~~~
metaphor
Agreed.

At first glance, I genuinely thought this was going to be a pitch for yet
another fragile additive manufacturing toy with narrow use case, or a new
process that enables IPC-7092 designs on the cheap.

------
maweki
I wish that acronyms would be written out if they have multiple meanings in
the computer context. My first thought was "how can satisfiability modulo
theory ever not make sense?"

~~~
callmeal
I thought it was Surface Mount Technology and was wondering what kind of
replacement was being proposed.

------
rektide
Cores sharing some caches makes sense, but no, maybe SMT does not make sense.

~~~
philjohn
Or does SMT make sense because looking at instructions coming in and branch
predicting to execute some speculatively can only go so far, and sometimes
hints from the application that "hey, this can be run independently of that"
helps with overall throughput?

------
baybal2
Yes, it does. Instruction level vulnerabilities arise from execution of
insecure code.

If you have to do so, your security is already compromised. Shared hosting,
virtualisation, etc. are all insecure by definition.

------
rythie
Intel i5 desktop chips don’t have hyper-threading (SMT) and haven’t for the 10
years they’ve been available. Typically the i7 variant of the same CPU has
been about £100 more (roughly 50%). The point about only 5% extra die space
makes no difference to the consumer, as there is/was quite a high cost premium
on desktops for that feature. Now Intel has removed hyper-threading from most
of its i7 desktop chips, and you get 2 extra cores over the i5 version
instead.

~~~
tyingq
_" Intel i5 desktop chips don’t have hyper-threading (SMT) and haven’t for the
10 years they’ve been available."_

That's mostly true, though there have been a few desktop i5 processors with
hyperthreads.

Like:
[https://ark.intel.com/content/www/us/en/ark/products/43546/i...](https://ark.intel.com/content/www/us/en/ark/products/43546/intel-core-i5-650-processor-4m-cache-3-20-ghz.html)

~~~
rythie
I didn’t spot that one, though it was almost 10 years ago and I don’t see more
recent examples. My point was that a large number of users don’t actually have
hyper-threading on the desktop.

~~~
Narishma
What about i3 processors? Or laptop processors? AFAIK those all support HT.

