
SMT-COP: Defeating Side-Channel Attacks on Execution Units in SMT Processors - matt_d
http://www.cs.binghamton.edu/~dima/pact19.pdf
======
justincormack
The original idea for SMT was that a single application would use both threads
to do part of the same computation, using resources better. But we just used
it as "more CPUs", which has a different security profile.

~~~
ryacko
It would be interesting if logic units were uncoupled from cores, then
multiple cores could share multiple logic units, reducing contention pressure.
As it is, all designs treat CPUs as a series of chiplets.

~~~
zamadatix
AMD's Bulldozer was a "Clustered MultiThreading" chip somewhat like what you
describe, in that certain units were shared between what would normally have
been separate cores. It was a bit of a disaster, though, as it was more "cut
down and share what's left" than "keep and share".

I have a feeling it'd be hard to find something that is easier to route and
schedule over multiple cores than it is to just add that extra unit to a
single core. AMD's latest CPUs are a good example of this: the L3 cache isn't
even contiguous across all cores in the same core complex anymore (same access
pattern as if you went to a completely different chiplet).

~~~
ryacko
According to Intel’s manual (although I don’t think it still applies):

“Under heavy load, with multiple cores executing RDRAND in parallel, it is
possible, though unlikely, for the demand of random numbers by software
processes/threads to exceed the rate at which the random number generator
hardware can supply them.”
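
In practice that passage means software has to check whether RDRAND actually
delivered a value and retry if it did not. A minimal sketch of that retry
pattern follows. It is a simulation, not real hardware access: `fake_rdrand`
is a hypothetical stand-in for the instruction (which reports success through
the carry flag), and the failure rate and the default of 10 retries are
assumptions chosen for illustration.

```python
import random
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for the RDRAND instruction. It returns
# (carry_flag, value): carry_flag is True when a random value was delivered
# and False when the hardware generator could not supply one in time.
def fake_rdrand(failure_rate: float = 0.3) -> Tuple[bool, int]:
    if random.random() < failure_rate:
        return False, 0          # simulated underflow: no value available
    return True, random.getrandbits(64)

def rdrand_with_retry(
    step: Callable[[], Tuple[bool, int]] = fake_rdrand,
    retries: int = 10,           # assumed retry budget, for illustration only
) -> Optional[int]:
    """Call `step` up to `retries` times; return the first delivered value,
    or None if every attempt reported that no value was available."""
    for _ in range(retries):
        ok, value = step()
        if ok:
            return value
    return None

if __name__ == "__main__":
    result = rdrand_with_retry()
    print("no value available" if result is None else hex(result))
```

A caller that gets `None` back should treat it as an error or fall back to
another entropy source rather than use a default value.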

I don’t think it would be that hard to find something to route over multiple
cores. If certain operations were unrolled and buffered, it would be overall
more efficient.

~~~
zamadatix
Depends on the elements you're talking about. RDRAND takes no input other than
"I need a number" and produces output by feeding the constant output of a
hardware RNG to the AES components of a particular core.

Pretty much every other piece of the CPU needs to consume inputs and produce
outputs without race conditions or cache coherency problems. This is where it
becomes difficult to connect and schedule, and is the reason AMD ends up with
16x16 MB of L3 cache instead of 1x256 MB: trying to pump coherency instead of
just output over an interconnect comes with an ENORMOUS penalty.

~~~
ryacko
I wasn’t aware that RDRAND was implemented in microcode on the same core; I
thought it was implemented as a coprocessor with some microcode checks for
quality.

Avoiding speculation on the output of high latency instructions seems prudent.

------
SketchySeaBeast
This is largely a cloud/server issue, is it not? Any desktop machine that's
running malicious code is already compromised, isn't it? Or is this a concern
with drive-by JavaScript? I never know if these issues are likely to impact
the home user or not.

~~~
dijit
These issues can be triggered via JavaScript, yes.

https://github.com/terjanq/meltdown-spectre-javascript

------
verdverm
Awesome, love to see great work from my alma mater!

