
Inlined micro-instructions, at that, with a special flag to prevent the CPU from skipping over zero runs in multipliers.


I can't wait for some uarch to fuck everyone over and decide that, after tracking zeroing idioms in the renamer, the next escalation is to eliminate data dependencies whenever the renamer detects they can't affect the outcome: e.g. AND r1, r2, r3, where the renamer has inferred that one input is zero, so the other input isn't needed and the result must be zero. Or tracking identity elements for operations (0 for addition/subtraction, 1 for multiplication/division) to bypass the computation entirely.
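
For concreteness, here's a minimal sketch (my own example, hypothetical names, plain C) of the kind of mask-based constant-time select that such an optimization would quietly break: if the renamer noticed that mask is zero and dropped the AND's data dependency, the two values of the secret bit would no longer cost the same.

    #include <stdint.h>

    /* Hypothetical example, not from the thread: returns a if bit == 1,
       b if bit == 0, with no branch. */
    static uint64_t ct_select(uint64_t bit, uint64_t a, uint64_t b)
    {
        uint64_t mask = 0 - bit;          /* all-ones if bit == 1, zero if bit == 0 */
        return (a & mask) | (b & ~mask);  /* the ANDs a zero-aware renamer could short-circuit */
    }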


SIMD instructions are a good way to avoid this: even if you only use one lane of the operands, the fact that it's SIMD means the operation has to happen in lockstep across the execution units.


Do they? An arch with built-in lane predication (like AVX-512) could easily implement wide SIMD on top of a narrower ALU and skip the masked-out lanes. The actual runtime would then depend on the number of unmasked lanes.
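
A minimal sketch of what that looks like at the source level, using the real _mm512_mask_add_epi32 intrinsic (the wrapper name is mine; compile with -mavx512f). Whether an implementation skips the k==0 lanes, and therefore runs faster with sparser masks, is exactly the question.

    #include <immintrin.h>

    /* Hypothetical wrapper: lanes where k is 0 keep the value from a,
       lanes where k is 1 get a + b. Architecturally all 16 lanes exist,
       but nothing forbids a narrower implementation from skipping the
       masked-out ones. */
    static __m512i masked_add(__m512i a, __m512i b, __mmask16 k)
    {
        return _mm512_mask_add_epi32(a, k, a, b);
    }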

I'm not up to date on GPU architectures, but I wouldn't be surprised if they do this sort of stuff.


You [as a designer] could probably add latency synthetically and still benefit from avoiding a physical register allocation (although I guess that's only a workaround for leaking in the time domain).

edit: Anyway, if your threat model includes "attacker can discern differences in power at uop-granularity and make meaningful correlations", you are probably doomed at the outset and you should not have used an out-of-order machine in the first place.


Well, multiplication used to take variable time not long ago.

Anyway, IMO, CPU designers seem more aware of security implications than compiler developers. I expect more attention to those things in the future, not less.



