
Fast constant-time GCD algorithm and modular inversion - MrXOR
https://gcd.cr.yp.to/papers.html
======
waterhouse
It seems "constant-time" isn't used to mean "the time taken is O(1) regardless
of the size of the input n", but rather, "for a given input size, the
algorithm is carefully written to do the same amount of work no matter what
the specific bits of the input are, to defeat timing attacks". This took me a
bit of time to figure out.

To illustrate the kind of thing it's talking about, consider the naive
algorithm for computing a^n via exponentiation by squaring:

    
    
      total = 1
      while n > 0:
        if n % 2 == 1:   # n is odd
          total = total * a
          n = n - 1
        a = a * a
        n = n // 2       # integer halving
      return total
    

If n has k bits, and j of them are 1s, then there will be k-1 squarings of a,
and j multiplications of total by a. An attacker who can measure the total
time may be able to at least figure out the number of 1 bits in n. If they can
get fine-grained observations of power draw or something, then they might even
be able to tell which bits are 1.

Consider this alternative:

    
    
      total = 1
      while n > 0:
        maybe_total = total * a
        if n % 2 == 1:   # n is odd
          total = maybe_total
        a = a * a
        n = n >> 1
      return total
    

This will do the same number of multiplications for every k-bit exponent,
provided you can convince the compiler not to optimize the unused
multiplication away. Note that it still has a branch, though, which might
conceivably be detectable. To plug that hole, something like this might work:

    
    
      # if n is odd:
      #   total = maybe_total
      # becomes this:
        low_bit = n & 1     # 1 if n is odd, 0 if n is even
        mask = low_bit - 1  # 0 if n is odd, all 1s if n is even
        total = (total & mask) | (maybe_total & ~mask)
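
Putting the pieces together, here is a minimal runnable Python sketch (the
name ct_pow is mine, purely illustrative; the loop count still reveals the
bit-length of n, and Python's arbitrary-precision multiplication is itself
not constant-time, so a real implementation would loop a fixed number of
times over fixed-width words):

      def ct_pow(a, n):
          total = 1
          while n > 0:
              maybe_total = total * a
              low_bit = n & 1     # 1 if n is odd, 0 if n is even
              mask = low_bit - 1  # 0 if n is odd, all 1s if n is even
              total = (total & mask) | (maybe_total & ~mask)
              a = a * a
              n = n >> 1
          return total

      assert ct_pow(3, 10) == 3 ** 10   # 59049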

~~~
arghwhat
This is standard terminology within programming and cryptography. This is the
only practical definition to use, as cryptography commonly deals with
arbitrarily sized inputs, which can of course not all run in O(1).

A simple example of constant-time within this definition, as well as why
constant-time does not mean O(1) in this context, is that of a comparator:

The most common, and fastest, approach is to go through the data in the
largest chunks the CPU can handle, returning 1 immediately when a mismatch is
found, and 0 at the end, on the assumption that no early termination means no
mismatch. However, this reveals information about the values compared, as the
time the comparison takes reveals where the mismatch is located.

For constant time, your goal is to ensure that all data is processed equally
regardless of outcome. You do this by effectively making the comparison
a-b=c, or a^b=c, where a zero result means equality. In practice, this is
implemented by XOR'ing corresponding chunks (again, the largest the CPU can
handle) together, and OR'ing each result into an accumulator that starts at
0. This continues, without early termination, to the very end, where you
simply return the accumulator, which is 0 for equality and an arbitrary
nonzero value otherwise.

Both of course leak the length of the values compared, but the lengths in
these cases are commonly known by the adversary, making its protection less
relevant.
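
A rough Python sketch of that constant-time comparison (word-sized chunking
is elided; in real code you would reach for something like Python's
hmac.compare_digest rather than rolling your own):

      def ct_equal(a, b):
          # Assumes len(a) == len(b); the length is treated as public.
          acc = 0
          for x, y in zip(a, b):   # never exits early
              acc |= x ^ y         # accumulate every differing bit
          return acc == 0          # 0 only if all bytes matched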

~~~
caf
_This is the only practical definition to use, as cryptography commonly deals
with arbitrarily sized inputs, which can of course not all run in O(1)._

It doesn't necessarily follow that any algorithm on an arbitrarily-sized
input can't run in O(1) - for a trivial example, there is an algorithm for
_"determine if the input number is odd"_ which is O(1).

~~~
arghwhat
> for a trivial example there is an algorithm for "determine if the input
> number is odd" which is O(1).

I would argue that this particular example cannot be considered a case of
arbitrarily-sized input: There is only a single meaningful bit of information,
which is the single bit accessed.

However, if you must, we can expand to clarify that the input is meant to
include arbitrary _meaningful_ bits of information that require processing and
cannot simply be ignored.

~~~
throwawaymath
That's not how this works. What you're doing here is defining away the scope
of the problem to make O(1) complexity analysis redundant. A bound of O(1)
conventionally means that we can exploit some structural component of the
input and the given problem to find a solution independent of the input size.
If you redefine input size to the more narrow sense of inputs which don't have
some sort of structural feature like that, then yes of course nothing is
constant time. But then you're just shifting the difficulty of the problem
around without much of a gain.

~~~
arghwhat
I see how you may think that, but I would argue that I am not redefining
anything, as things turn nonsensical without this assumption.

It is not an input if it does not affect your output (I am sure a better
formal definition exists—I usually hear "meaningful", hence my use of that).
Without this restriction, any input could be arbitrarily padded, inflating
input sizes and making an algorithm's complexity appear lower.

I find the even/odd example to be an exceptionally clear case of an algorithm
that takes a fixed 1-bit input: Its implementation on a little-endian machine
does not know the bounds of its input, only the location of the least
significant byte (due to byte-addressing).

Without a defined bound, such an implementation must be assumed to either take
a single byte as input (here a byte read is an implementation detail, despite
the algorithm only needing one bit), or the full system memory from the
location specified as input regardless of contents (due to having used the
word "arbitrary", the input is allowed to not fit in registers).

However, I admit that I am now entering deeper theoretical waters, regarding
definitional details, than I am normally comfortable with. I do not, however,
find any other definition that lines up with practice.

~~~
throwawaymath
_> I see how you may think that, but I would argue that I am not redefining
anything, as things turn nonsensical without this assumption._

The point I'm making is that this statement reduces to the claim that defining
an algorithm's worst case time complexity as O(1) or constant time is
nonsensical.

Do you disagree that adding two numbers is a constant time operation?

~~~
arghwhat
> Do you disagree that adding two numbers is a constant time operation?

It is constant time by the crypto programming definition in both theory and
practice, and O(1) only in theory (bigints are a pain).

I did not try to claim that O(1) is nonsensical. Rather, that certain O(1)
variable-input algorithms are in fact simply fixed-input algorithms, because
they do not actually consider their input.

I also find these "nonsensical" O(1) algorithms to be outliers.

------
throwawaymath
It's pretty frustrating to see the discussion on this submission dominated by
people litigating the "constant time" terminology. The authors, Bernstein and
Yang, are using constant time in the conventional, complexity theoretic sense
of the word. Here is a quote from Section 2, "Organization of this paper":

 _> We start with the polynomial case. Section 3 defines division steps.
Section 5, relying on theorems in Section 4, states our main algorithm to
compute c coefficients of the nth iterate of divstep. This takes n(c + n)
simple operations. We also explain how “jumps” reduce the cost for large n to
(c + n)(log cn)^2+o(1) operations. All of these algorithms take constant time,
i.e., time independent of the input coefficients for any particular (n, c)._

In particular note that last sentence. The asymptotic runtime of the presented
algorithm does not depend on the inputs, _n_ and _c_. This algorithm analysis
is confirmed throughout the remainder of the paper, which walks through each
stage of the algorithm. Now let's look at a few canonical definitions of
"constant time", i.e. O(1).

From Skiena, we have:

 _Constant functions - f(n) = 1 - Such functions might measure the cost of
adding two numbers, printing out the "Star Spangled Banner", or the growth
realized by functions such as f(n) = min(n, 100). In the big picture, there is
no dependence on the parameter n._

Likewise from Sedgewick & Wayne:

 _Constant. A program whose running time's order of growth is constant
executes a fixed number of operations to finish its job; consequently its
running time does not depend on N. Most Java operations take constant time._

I'll update if I find a choice example from Knuth in TAOCP, but I think this
suffices. The discussion about whether or not the cryptographic use of the
term satisfies the complexity theoretic sense of the term is a red herring;
it's a distinction without a difference. Algorithm analysis focuses on
asymptotic behavior, which is definitionally given by tail behavior, or rate
of growth of a function. Among other things, this paper is _not_ about an
implementation methodology that ensures the GCD algorithm will take exactly
the same amount of time regardless of the input.

______________________

1. _The Algorithm Design Manual_, 2nd Edition, § 2.3.1 Dominance Relations,
Page 39

2. _Algorithms_, 4th Edition, § 1.4 Analysis of Algorithms, Page 187

~~~
ziedaniel1
I think you misread slightly.

 _> All of these algorithms take constant time, i.e., time independent of the
input coefficients for any particular (n, c)._

This means that once you have chosen a particular _n_ and _c_ , the time no
longer varies. However, if _n_ and _c_ vary, the running time is definitely
allowed to vary also (as the formulas _n(c + n)_ and _(c + n)(log cn)^2+o(1)_
clearly do).

~~~
throwawaymath
No, I didn't misread. You and I are in (apparently violent) agreement.
Constant time does not mean that running time cannot vary, in either
complexity theory or cryptography. There are misconceptions on both sides
here, with regard to what the terminology means in both complexity theory and
cryptography.

For precision, I'll start with a good definition[1] for what "constant time"
means in cryptography:

 _> Constant-time implementations are pieces of code that do not leak secret
information through timing analysis. This is one of the two main ways to
defeat timing attacks: since such attacks exploit differences in execution
time that depend on secret elements, make it so that execution time does not
depend on secret elements. Or, more precisely, that variations in execution
time are not correlated with secret elements: execution time may still vary,
but not in a way that can be traced back to any kind of value that you wish to
keep secret, in particular (but not only) cryptographic keys._

Secure, constant time cryptographic algorithms need not have unvarying
execution time. Now going back to complexity theory, it is also an
extraordinarily common misconception that "constant time" means "the algorithm
has the same execution time regardless of the size of the input." This is not
the case. Big O notation doesn't even care about what always happens; it
cares about what happens in the worst case. When we use "constant time" in
the O(1)
sense of the word, we are not precluding the possibility of an algorithm
having variable execution time. Again, for precision, we are simply saying
that the execution time (number of operations, etc) has an asymptotic upper
bound which is independent of the input. The execution time may vary with the
input, and generally speaking it will.

_________________________

1. Thomas Pornin, _Why constant time crypto?_
[https://bearssl.org/constanttime.html](https://bearssl.org/constanttime.html)

~~~
infinity0
> Constant time does not mean that running time cannot vary, in either
> complexity theory or cryptography.

Again, overloading of the term "constant time" causes pointless
misunderstanding and arguments. Your statement here is wrong.

In complexity theory the term "constant time" does indeed mean the running
time is bounded even as the input grows without bound, although it can vary
within this bound.

In cryptography the term "constant time" is sometimes used to mean a different
concept, that the operation actually takes constant non-varying time, so that
an attacker can't exploit this as a side channel to figure out the input
values.

The paper seems to be using the latter meaning.

~~~
throwawaymath
_> In cryptography the term "constant time" is sometimes used to mean a
different concept, that the operation actually takes constant non-varying
time, so that an attacker can't exploit this as a side channel to figure out
the input values._

Note that I cited Thomas Pornin, who is a cryptographer in both theory and
implementation, for my definition of constant time in cryptography. It is
emphatically _not_ necessary for software to run with unvarying execution time
in order for it to be "constant time" according to the cryptographic sense of
the term. This will be a poor hill for you to die on, but I invite you to
provide literature supporting _your_ alternative definition.

------
esjeon
If I understood correctly, this paper is _NOT_ about a new blazing fast
security-breaking CS-history-changing algorithm. This paper suggests an
algorithm that takes the same amount of time to compute GCD(6, 9) and
GCD(123456789, 987654321), to prevent leaking hints on its inputs through
side-channels. That is, this thing is basically less efficient, but still runs
the same number of instructions no matter the input.

(EDIT: ... as long as inputs have the same bit-length. Any 32-bit inputs will
be handled faster than 1024-bit inputs, but any 1024-bit inputs will consume
the same amount of time no matter their actual values. That is, 0x0001 and
0x000000001 are handled differently by the algorithm.)

The paper does mention this:

> However, in cryptography, these algorithms are dangerous. The central
> problem is that these algorithms have conditional branches that depend on
> the inputs. Often these inputs are secret, and the branches leak information
> to the attacker through cache timing, branch timing, etc.

So, yeah, this is a security-centered cryptography paper. The term
"constant-time" is used in a different context here.

~~~
throwawaymath
The term "constant-time" is used in the complexity theoretic sense. Can you
explain to me, concretely, how what you've said here

 _> as long as inputs have the same bit-length. Any 32-bit inputs will be
handled faster than 1024-bit inputs, but any 1024-bit inputs will consume the
same amount of time no matter their actual values. That is, 0x0001 and
0x000000001 are handled differently by the algorithm_

indicates the algorithm is constant-time in one sense but not the other?

~~~
waterhouse
The problem is: "Constant—with respect to what?" "Remains constant under what
conditions?" The function f(x,y) = x^2 is constant under varying y, and is not
constant under varying x. The adjective "constant", by itself, is
incomplete—unless it's truly a mathematical constant, like 2 or e, that
depends on nothing else—which is what leads people to the "O(1)"
interpretation of the phrase "constant time".

So if it's not used to mean "constant (no matter what you vary)", then it
means "constant (if you vary certain parameters and I'm not specifying which
ones)". When you use a phrase with something left out and implied, then the
audience has to fill it in somehow. If the audience shares your background,
perhaps has been reading similar papers recently in which "constant with
respect to xyz" had the xyz spelled out explicitly, this may go well; if not,
it may not. In this case, people's interpretations of "the xyz we're varying"
appear to range over "the entire space of integer-tuple inputs", "the size of
the integers", "the bits of the integers after the leading 1", "the parts of
the inputs that are considered 'secret'", and more.

So, if you say something with an implicit part left unspecified, and people
fill it in with something different than what you intended... the first time
this happens, I might consider it an unfortunate accident. If it happens
repeatedly, it may be worth being more explicit or choosing another term.
(Suggested terms: "secret-hiding", "secret-blind". "[something]-oblivious"
might be another good word-formation—precedent exists in "cache-oblivious"
algorithms.)

This is not the worst terminological mess we have in CS[1].

[1] My (least) favorite example is the term "dynamic programming", whose name
appears to have been chosen because it sounded good and was vague enough to
cover what the author wanted: _"Thus, I thought dynamic programming was a
good name. It was something not even a Congressman could object to. So I used
it as an umbrella for my activities."_
[https://en.wikipedia.org/wiki/Dynamic_programming#History](https://en.wikipedia.org/wiki/Dynamic_programming#History)

------
b-3-n
I was a bit disappointed to see that "constant time" was click bait. It
should be "fixed time" - or similar - instead.

~~~
minitech
It’s not click bait. It’s standard terminology.

~~~
OskarS
To clarify: it's not "constant time" in the sense of having O(1) time
complexity with regards to the size of the inputs, which is what most people
mean by "constant time" (which is obviously not possible in this case: there's
never going to be a GCD algorithm that can work as fast on 100-bit integers as
on 1,000,000,000-bit integers).

It's "constant time" in the cryptographic sense, that the time to run it can't
be used as a side-channel to figure out what the inputs are. A great result to
be sure, but the terminology is undoubtedly confusing.

~~~
throwawaymath
_> it's not "constant time" in the sense of having O(1) time complexity with
regards to the size of the inputs_

Yes it is. The presented algorithm is constant time in the exponent, i.e. 2 +
O(1), where this exponent is not impacted by the size of the inputs _n_ and
_c_. Much like any other complexity analysis, an algorithm is O(1) as long as
O(1) is asymptotically the "largest part" of the running time. As the size of
n increases, the exponent 2+O(1) increasingly dominates execution time.

~~~
gjm11
"Constant time in the exponent" is nonsense, I'm afraid.

A bounded exponent of _n_ would be the same thing as "polynomial time", but
the thing that's bounded (and indeed arbitrarily close to 2 for large _n_ ) is
the exponent of log _n_.

The running time of the algorithm presented in this paper is _n_ (log _n_
)^(2+ _o_ (1)). This ...

... is not constant; it increases with _n_ , a bit faster than linearly.

... has _o_ (1), not _O_ (1), in the exponent; the two mean different things.
_O_ (1) means "bounded", o(1) means "tends to zero". The claim isn't that the
running time is <= _n_ times polynomial(log _n_ ) but that it's <= _n_ times
"at most approximately a quadratic polynomial in log _n_ ".

... doesn't in fact depend mostly on that exponent; the most important factor
is the _n_ , not the (log _n_ )^(2+o(1)). If that 2 were a 100, the _n_ factor
would still (asymptotically) matter more.

For instance, suppose _n_ =2^100 and our logs are to base 2. Then the running
time of this algorithm is approximately some constant times 2^100 (that's the
_n_ factor) times 100^2 (that's the factor with log _n_ in it). 2^100 is much,
much, much bigger than 100^2.
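
To make that concrete, a quick check in Python for n = 2^100 (base-2 logs,
constant factors ignored):

      n = 2 ** 100            # the linear factor
      log_factor = 100 ** 2   # (log2 n)^2, since log2 n = 100
      print(n)                # 1267650600228229401496703205376
      print(log_factor)       # 10000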

------
ComputerGuru
I presume you came across this researching the zero-day DoS in Windows 10 (and
others?) caused by an infinite loop in Microsoft's modular inversion code?
Thanks for sharing!

~~~
AnaniasAnanas
OP probably saw DJB's tweet
[https://twitter.com/hashbreaker/status/1139008213570007040](https://twitter.com/hashbreaker/status/1139008213570007040)

------
londons_explore
Isn't constant time GCD a problem for factoring big primes?

~~~
nullc
I have an O(N) algorithm for factoring any prime: Read each digit of the prime
from the input tape and write it to the output tape. :P

(Perhaps you meant factoring large semi-primes? :) )

~~~
hoseja
I have an O(1) then!

~~~
ColinWright
Given that you have to read and then write each digit of the input I find it
hard to believe that you have an O(1) algorithm - can you tell us what it is?

~~~
Yajirobe
Take a photo of the input on the tape and print the photo.

~~~
mratsim
Depending on the size of the prime you might need to take multiple photos.

