SMT-COP: Defeating Side-Channel Attacks on Execution Units in SMT Processors (binghamton.edu)
41 points by matt_d 7 months ago | 13 comments

The original idea for SMT was that a single application would use both threads to do part of the same computation, using resources better. But we just used it as "more CPUs", which has a different security profile.

SMT would be amazing if there was a way to share data across “threads” without going through memory. GPUs have this and it works wonders for performance.

Caches effectively do this for some data. Certainly for read-only data.

I think the missing piece is keeping the two threads tightly synchronized. You can ensure the data is passed exactly when the adjacent thread will need it.

It would be interesting if logic units were uncoupled from cores, then multiple cores could share multiple logic units, reducing contention pressure. As it is, all designs treat CPUs as a series of chiplets.

The AMD Bulldozer was a "Clustered MultiThreading" chip somewhat as you describe, in that certain units were shared between what would normally have been separate cores. It was a bit of a disaster though, as it was more "cut down and share what's left" than "keep and share".

I have a feeling it'd be hard to find something that is easier to route and schedule over multiple cores than it is to just add that extra unit to a single core. AMD's latest CPUs are a good example of this: the L3 cache isn't even contiguous across all cores in the same core complex anymore (same access pattern as if you went to a completely different chiplet).

According to Intel’s manual (although I don’t think it still applies):

“Under heavy load, with multiple cores executing RDRAND in parallel, it is possible, though unlikely, for the demand of random numbers by software processes/threads to exceed the rate at which the random number generator hardware can supply them.”

I don’t think it would be that hard to find something to route over multiple cores. If certain operations were unrolled and buffered, it would be overall more efficient.

Depends on the elements you're talking about. RDRAND takes no input other than "I need a number" and produces output by feeding the constant output of a hardware RNG to the AES components of a particular core.

Pretty much every other piece of the CPU needs to consume inputs and produce outputs without race conditions or cache coherency problems. This is where it becomes difficult to connect and schedule, and it is the reason AMD ends up with 16x16 MB of L3 cache instead of 1x256 MB. Trying to pump coherency instead of just output over an interconnect comes with an ENORMOUS penalty.

I wasn’t aware that RDRAND was implemented in microcode on the same core. I thought it was implemented as a coprocessor with some microcode checks for quality.

Avoiding speculation on the output of high latency instructions seems prudent.

I think there have been some designs like this but maybe only on mobile cores.

This is largely a cloud/server issue, is it not? Any desktop machine that's running malicious code is already compromised, isn't it? Or is this a concern with drive-by JavaScript? I never know if these issues are likely to impact the home user or not.

These issues can be triggered via JavaScript, yes.


Awesome, love to see great work from my alma mater!
