This is an area where hardware, run-time library, compiler, and cryptography developers have been pulling in multiple directions for a while, and it's nice that the hardware at least is explicitly providing options.
One thing I've always wanted but never seen is a way to be explicit to the compiler what time complexity I require for a specific piece of code. At the moment, we have in some cases library guarantees about how slow an algorithm is allowed to be (in C++ STL, I don't think I've seen guarantees anywhere else?) and compilers will try to manipulate code to make it faster if they can prove the result is the same. But that leaves cryptography developers trying to ensure that the optimiser doesn't do as good a job as it is trying to do.
What we've learned is that telling the compiler what you're trying to achieve is much more straightforward than only giving the compiler part of the information you have then relying on it happening to do the right thing. Being able to set a constant time annotation on source code and have it propagate all the way to the hardware would be really neat.
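To make that concrete, here's roughly the dance you end up doing today (a minimal Rust sketch; `black_box` is only a best-effort hint aimed at benchmarks, not a constant-time guarantee, which is exactly the problem):

```rust
use std::hint::black_box;

/// Branchless select: returns `a` when `choice` is 1, `b` when it is 0.
/// The mask trick avoids a secret-dependent branch, but nothing stops the
/// optimizer from recognizing the pattern and turning it back into a branch
/// or conditional move; black_box only discourages that, it doesn't forbid it.
fn ct_select_u32(choice: u8, a: u32, b: u32) -> u32 {
    debug_assert!(choice <= 1);
    // 0xFFFF_FFFF when choice == 1, 0x0000_0000 when choice == 0
    let mask = black_box((choice as u32).wrapping_neg());
    (a & mask) | (b & !mask)
}
```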
But what that also means is that Intel shipped Ice Lake quite a while ago, and this is only surfacing now as a “Surprise!” moment. Behavior changes like this deserve much more fanfare and, where necessary, development work in the kernel and related layers, ideally before the hardware becomes commercially available and relatively widely adopted.
Now that the thread is already a couple days old, has anyone managed to quantify the performance impact of setting the flag?
All the proposals for more fine-grained ways to set the flag only for code that needs it (as opposed to having it always enabled) come with some overhead, and it's hard to weigh that overhead against an unknown upside of leaving the flag off whenever possible.
Unless I am misunderstanding something, this flag only seems to affect about 10 instructions. Most of the ones you would expect to be involved with cryptography already have data-independent timing regardless of the value of the MSR.
The true/false on that page indicates whether the instruction has “MXCSR Configuration Dependent Timing.” This is separate from their sensitivity to the data-operand independent timing MSR bit.
My interpretation is that all of the instructions in that list require DOIT to be set for any guarantee of data independent timing:
> a new model specific register (MSR) control enables data operand independent timing for the listed data operand independent timing instructions.
More specifically, there are a few things I take away from this list:
* Vector integer multiply instructions may be data-dependent in some circumstances.
* Intel appears to be willing to commit to making most integer instructions constant-time.
* Some of the vector integer multiply instructions appear to be largely reusing the floating-point FMA units (VPMADD52* is a pretty strong clue there--52 bits covers the size of the mantissa in double-precision), and those units appear to have some non-constant-time behavior.
Division, remainder, and memory access are data-dependent; someone who didn't know better could easily use them in a cryptographic algorithm (or worse, a compiler could transform a program into using them).
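To illustrate the memory-access point: a table lookup indexed by a secret byte (think S-boxes) can leak the index through the cache, and the usual workaround is to touch the whole table and keep only the entry you want. A rough sketch of the idea, not a vetted implementation:

```rust
/// Naive lookup: the address depends on the secret index, so cache timing
/// can reveal which entry was touched.
fn lookup_naive(table: &[u8; 256], secret_index: u8) -> u8 {
    table[secret_index as usize]
}

/// Scan the whole table and select the matching entry with a mask,
/// so the memory access pattern no longer depends on the secret.
fn lookup_scan(table: &[u8; 256], secret_index: u8) -> u8 {
    let mut result = 0u8;
    for (i, &entry) in table.iter().enumerate() {
        let diff = (i as u8) ^ secret_index;
        // mask is 0xFF when diff == 0, 0x00 otherwise (no branch involved)
        let mask = ((diff as u16).wrapping_sub(1) >> 8) as u8;
        result |= entry & mask;
    }
    result
}
```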
The intro matter is a bit hard to parse, but my understanding is that everything on that page is supposed to be constant-time, and the entries with "True" in the final column are the ones that aren't constant-time unless you twiddle an MSR on some systems.
Most of the integer instructions you'd expect to be constant-time are in fact constant-time; it's a slice of vector integer multiply (and integer multiply-adds) that are inadvertently not constant-time.
No, that's the wrong way to parse it, as there are explicitly _two_ bits on the MSR (DOITM and MCDT), and this thread is about DOITM.
All the instructions in the list are no longer (guaranteed to be?) constant-time unless the DOITM bit is set.
> Specifically, when data operand independent timing mode [DOITM] is enabled, instructions within the data operand independent timing subset execute with timing (in terms of processor cycles) that is independent of the data values in the sources of the instruction
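For anyone who wants to poke at this on their own machine, here's a quick sketch of checking the relevant bits through Linux's /dev/cpu/*/msr interface (needs root and the msr module loaded). The MSR addresses and bit positions below are my reading of the Linux DOITM patches (IA32_ARCH_CAPABILITIES bit 12 to enumerate support, IA32_UARCH_MISC_CTL bit 0 to enable), so double-check them against Intel's documentation before relying on them:

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

// Addresses/bits as used in the Linux DOITM patches; verify against the SDM.
const IA32_ARCH_CAPABILITIES: u64 = 0x10a; // bit 12: DOITM enumerated
const IA32_UARCH_MISC_CTL: u64 = 0x1b01;   // bit 0: DOITM enabled

fn read_msr(cpu: u32, msr: u64) -> std::io::Result<u64> {
    // Requires root and `modprobe msr`; the msr device is read by seeking
    // to the register address and reading 8 bytes.
    let mut f = File::open(format!("/dev/cpu/{cpu}/msr"))?;
    f.seek(SeekFrom::Start(msr))?;
    let mut buf = [0u8; 8];
    f.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

fn main() -> std::io::Result<()> {
    let caps = read_msr(0, IA32_ARCH_CAPABILITIES)?;
    if caps & (1 << 12) == 0 {
        println!("DOITM not enumerated on this CPU");
        return Ok(());
    }
    let ctl = read_msr(0, IA32_UARCH_MISC_CTL)?;
    println!("DOITM {}", if ctl & 1 != 0 { "enabled" } else { "disabled" });
    Ok(())
}
```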
Because if the attacker can observe how long the software takes, that can open you up to timing side channel attacks.
The most obvious example: if you compare a user-supplied secret with a stored secret, the naive `for i in 0..min(len(a), len(b)) { if a[i] != b[i] { return false } } return len(a) == len(b)` takes longer the more of the leading characters you guessed correctly, because it bails out at the first mismatch. By averaging over many tries, even tiny differences can be measured over the internet.
Real crypto code takes some pains to ensure that you can't observe secrets through timing, but it can only do that at the level of assembly instructions. If the CPU introduces data-dependent timing below that level, that can open up vulnerabilities.
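The standard fix is to make the comparison do the same amount of work no matter where the inputs differ; a minimal sketch of the idea (real libraries layer optimization barriers and other tricks on top of this):

```rust
/// Compare two byte strings without an early exit: every byte is always
/// examined and differences are accumulated with OR, so there is no
/// secret-dependent branch inside the loop.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false; // lengths are normally public, so this branch is fine
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```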
The critical component is that the time is "data dependent". Cryptographic keys (and derived materials) are data, so a data dependent execution time could leak information about the keys.
As a simple example, consider a naive password check:
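Something along these lines (the exact language doesn't matter; the early return is the point):

```rust
/// Naive check: compares character by character and bails out at the
/// first mismatch, so the running time depends on how many leading
/// characters of the guess are correct.
fn check_password(guess: &[u8], stored: &[u8]) -> bool {
    if guess.len() != stored.len() {
        return false;
    }
    for i in 0..stored.len() {
        if guess[i] != stored[i] {
            return false;
        }
    }
    true
}
```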
You might think that this function only tells you whether the guess was correct or not. But it actually tells you which character is the first incorrect one. With enough tries and some statistical analysis, this sort of attack has been demonstrated even over the internet.
Among the most devastating attacks against cryptography ("crypto") are timing side channel attacks. If the attacker can learn how much time a given crypto routine takes for differing inputs on the system being attacked, they can often pry the locked crypto box wide open and access the protected data inside.
I'm curious what the *BSDs are planning on this, too, and whether it affects, or might expose to timing attacks, things like portable OpenSSH, constant-time string comparisons, etc. If the latter, this has the potential to become one of the worst CPU vulnerabilities in history.