
Constant-Time Mul - lelf
https://www.bearssl.org/ctmul.html
======
JoshTriplett
In an ideal world, we should teach _compilers_ how to do constant-time
operations and/or intrinsics, and then use those compilers to write
cryptographic code. The compiler could then use primitives that work in
constant time on the target platform (implementing them without hardware
support if necessary for a given target) and produce a compile-time error if
it can't produce constant-time code for a given target or for a given piece of
code.

~~~
zokier
Something like FaCT?
[https://github.com/PLSysSec/FaCT](https://github.com/PLSysSec/FaCT)

------
unwind
I'm not an expert at all, but from what I've read about the C standard and how
compilers interpret it, it seems dangerous to implement things like this
(constant time integer multiplication) as C macros.

It should probably be done in (inline) assembly, to keep the compiler from
optimizing away the extra "pointless" bits, for instance. Just because a
favorite compiler doesn't do that today doesn't mean it's safe.

I'm sure the actual authors of code like this have analyzed the resulting code
and know more than I do about the safety of the operations involved, but I
didn't see that covered in the article.
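
For concreteness, the kind of macro in question looks roughly like this (my
sketch of the trick the article describes, not the verbatim BearSSL code):

```c
#include <stdint.h>

/* Sketch (not verbatim BearSSL): force the high bit of both operands
 * so the multiplier never sees a "small" input, then subtract the
 * known excess. For x, y < 2^31:
 *   (x | 2^31) * (y | 2^31) = x*y + (x + y)*2^31 + 2^62
 */
#define MUL31(x, y) \
    ((uint64_t)((x) | (uint32_t)0x80000000) \
     * (uint64_t)((y) | (uint32_t)0x80000000) \
     - ((uint64_t)(x) << 31) - ((uint64_t)(y) << 31) \
     - ((uint64_t)1 << 62))
```

Nothing in the standard stops an optimizer from proving that the ORs and
subtractions cancel and emitting a plain multiply again, which is exactly the
worry.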

~~~
tprynn
The reason not to implement this in inline assembly is that you would have to
implement unique assembly for each architecture supported, which would mean
probably dozens of different implementations, especially considering that even
on a given architecture there are notable processor-family differences in
whether the same multiply instruction is constant time or not.
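
For example, pinning the instruction with inline asm is easy enough for a
single target (a sketch, GCC syntax, 32-bit ARM only), but it doesn't even
solve the problem there, since on a Cortex-M3 the umull instruction itself is
data-dependent:

```c
#include <stdint.h>

/* Sketch: per-architecture inline asm (GCC syntax, 32-bit ARM).
 * This pins the exact instruction, but on a Cortex-M3 umull itself
 * terminates early for some operands, so asm alone isn't a fix. */
static inline uint64_t
mul32x32_arm(uint32_t x, uint32_t y)
{
    uint32_t lo, hi;
    __asm__ ("umull %0, %1, %2, %3"
             : "=&r" (lo), "=&r" (hi)
             : "r" (x), "r" (y));
    return ((uint64_t)hi << 32) | lo;
}
```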

In any case, I think it's safe to say that 'pornin knows more about writing
safe C code (and dealing with strange compiler behavior) than almost anyone.

~~~
andrewla
But that's the core of the problem -- the code has to be architecture specific
because the constant-time nature of integer multiplication depends on the
processor architecture. Once you're writing a multiply for an XX-generation Intel
processor, you might as well write it as assembly, rather than have a MUL_XX
macro that will probably do the right thing.

------
_Nat_
Constant-time operations feel like such a hack.

I mean, sure, if we could ensure that all operations were always constant-
time, that'd be a rather simple, stress-free way to get time-invariance. But
the low-level implementation assumptions feel flimsy.

Instead, if we want time-invariance, seems like we should just have
dispatchers handle it -- that is, if we want the system to hide timing
information, we should explicitly specify this behavior rather than trying to
get it as a side-effect.

---

Note: We don't actually need strict time-invariance so much as for any time-
variance to not leak information about secrets. For example, if a dispatcher
takes an hour to resume a process, then sure an attacker might very easily
pick up on that time-variance introduced by the dispatcher, but so long as it
doesn't reflect on any secret information, who cares?

~~~
mikeash
One problem I see is in determining the amount of time before resuming. You
have to be sure that all possible inputs will take less time than what you
request, but you don’t want to wait forever. You also have to be robust
against disturbances due to things like system load. If you specify a maximum
time based on the no-load scenario, and an attacker discovers that they can
bog down your system by flooding it with packets while it does the
computation, you're back to leaking info.

~~~
_Nat_
Yeah, a dispatcher would have to be designed to guard its secrets. For
example, it'd have to ensure that it can do tasks in whatever window it needs
to, have logic for simulating CPU load if an attacker might be able to watch
the CPU's load, potentially force the CPU to burn more power if we're
trying to protect against power analysis, etc.

Logically, it'd seem like a more complex task, since we'd have to keep track of
what secrets we're guarding and plan an execution strategy to meet those
criteria. But it feels cleaner to acknowledge these criteria and ensure that
they're being addressed.

------
snops
It would be interesting to know what the ARM SecurCore SC300 processor [1]
does: whether it has a constant-time multiply, or whether they just expect you
to use the hardware crypto accelerators that are commonly integrated with it,
e.g. in the ST33 [2].

[1] [https://developer.arm.com/ip-products/processors/securcore/sc300-processor](https://developer.arm.com/ip-products/processors/securcore/sc300-processor)

[2] [https://www.st.com/en/secure-mcus/st33-arm-sc300.html](https://www.st.com/en/secure-mcus/st33-arm-sc300.html)

~~~
dmitrygr
I believe the SC300 is just a Cortex-M3, so timings would be the same as on a
Cortex-M3 (variable based on input). ARM even says on the SC300 site that the
programmer's model and all other M3 docs apply.

~~~
snops
Ah, thanks, that explains why I couldn't find the programmer's manual for it
directly.

------
kazinator
Isn't there some way to just pad up the variable time that some sensitive
operation takes, by spinning on a fine-grained clock register until it reaches
a predetermined value? This could be calibrated by timing a known worst case.
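
Something like this sketch (POSIX; assumes CLOCK_MONOTONIC is fine-grained
enough and that the bound really is a worst case with margin):

```c
#include <stdint.h>
#include <time.h>

/* Sketch: run the variable-time operation, then spin until a
 * precomputed worst-case deadline passes. WORST_CASE_NS would be
 * calibrated by timing a known worst case, plus margin. */
#define WORST_CASE_NS 500000u   /* assumed calibrated bound */

static uint64_t
now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

void
padded_op(void (*op)(void))
{
    uint64_t deadline = now_ns() + WORST_CASE_NS;
    op();                          /* the variable-time sensitive work */
    while (now_ns() < deadline)    /* pad out to the deadline */
        ;
}
```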

~~~
Someone
_”spinning on a fine-grained clock register until it reaches a predetermined
value”_ would require a second thread to do that counting, and isn’t
guaranteed to be constant time.

Also, even if it were, one probably could detect that the CPU was just
spinning from a side channel, for example by tracking CPU sleep state, cache
line pressure or contention for integer units in the CPU.

~~~
kazinator
Can't constant-time multiplication techniques somehow fall victim to that
level of visibility?

(If we can track details like contention for integer units in the CPU, why
don't we just peek at what bits are churning through the registers?)

~~~
Majromax
No; the core goal of side-channel-free algorithms (of which constant time
algorithms are a subset) is that the instructions executed by the CPU should
not depend on secret data.

Spinning until a fixed time is reached could conceivably leak information if
another thread can time math-unit bandwidth: it would be able to distinguish
the spinwait from bona fide mathematical computation.

However, no (known) method accessible to code running on the machine can
distinguish between an adder unit summing zeroes versus key data.
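
As an illustration (my example, not from the article), the basic building
block is a branchless select: the same instructions execute whether the secret
bit is 0 or 1, and only the data flowing through them differs.

```c
#include <stdint.h>

/* Illustrative constant-time select: returns a if bit == 1, b if
 * bit == 0, with no secret-dependent branch. The instruction stream
 * is identical either way; only register contents differ. */
static inline uint32_t
ct_select(uint32_t bit, uint32_t a, uint32_t b)
{
    uint32_t mask = 0u - (bit & 1u);   /* 0x00000000 or 0xFFFFFFFF */
    return (a & mask) | (b & ~mask);
}
```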

~~~
kazinator
Can't we throw a few math instructions into the spin loop?

~~~
naniwaduni
-> Can we throw _exactly the same number_ of math (micro)operations into the spin loop?

-> Can we just make sure the exact same sequence of operations occurs?

~~~
kazinator
The first idea seems way easier to achieve and maintain. It can be implemented
at a higher level around a complex, lumped operation, like calculating a
signature, without having to go into all of that operation's subroutines to
change everything to constant time. We could conceivably drop it into any
language, wrapping it around any crypto, while treating that as a black-box.

~~~
kazinator
Could also be faster. That is to say, variable-time calculations likely have a
worst-case time that is better than that of the equivalent fixed-time
calculations, since we just write the obvious code for the CPU using its
normal instructions and optimize as usual. If we can box the padded time
fairly tightly to the worst case, we are better off performance-wise than
with the fixed-time implementations.

------
bitslip
Unrelated - I'm browsing this page right now over a proxy that's part of a
project I'm working on in Haskell, using BearSSL for HTTPS. I can't of course
speak to the correctness of the crypto primitives, but the project is small,
self-contained, well written, and easy to navigate, such that writing the
Haskell wrapper for it was relatively straightforward. Great project.

~~~
loeg
Thomas Pornin (BearSSL) is incredibly talented.

------
mzs
Such a great post, full of gems like this:

"… Tanja Lange pointed me to this study by Wouter de Groot, who performed
actual measures on some Cortex-M3 devices, from which it appeared that a short
cycle count could occur not only in the documented case (operands fit in 16
bits) but also when one or both of the operands is zero or a power of 2. In
the case of MUL31(), this means that the underlying umull will complete in 5
cycles, except if one or both of the two operands are zero, in which case the
umull will complete in 4 cycles. …"

------
bronzeage
Why can't we solve the constant-time issue by just deciding, before the
operation, how long it should take in the worst case, and then waiting for
that time to pass before continuing?

~~~
guitarbill
edit: see other discussion below,
[https://news.ycombinator.com/item?id=20395393](https://news.ycombinator.com/item?id=20395393)

Presumably, that would make it very slow, since for every multiplication you'd
have to check a timer. You could maybe do it at a higher level, but then how
would you know how long a combination of operations would take on that
processor? But for a web API, adding random jitter seems like it'd force
attackers to use a lot more trials to gain any timing insight.
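
A crude sketch of the jitter idea (POSIX; my example, with the delay drawn
from the system CSPRNG so an attacker can't simply model it out):

```c
#include <fcntl.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Sketch: add up to ~1 ms of unpredictable delay after handling a
 * request. This only raises the number of trials an attacker needs;
 * averaging over enough requests can still recover the signal. */
static void
random_jitter(void)
{
    uint8_t buf[2];
    int fd = open("/dev/urandom", O_RDONLY);

    if (fd >= 0) {
        if (read(fd, buf, sizeof buf) == (ssize_t)sizeof buf) {
            struct timespec ts = { 0, (((long)buf[0] << 8) | buf[1]) * 15 };
            nanosleep(&ts, NULL);   /* 0 .. ~983 us */
        }
        close(fd);
    }
}
```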

