
Montgomery Multiplication (2012) [pdf] - Cieplak
http://www.hackersdelight.org/MontgomeryMultiplication.pdf
======
crispweed
> The computation [...] is multiplying two 64-bit unsigned integers, giving a
> 128-bit product. Some machines have an instruction for that.

And compilers often let you call these instructions fairly directly, with
compiler intrinsics.

With Visual Studio on Windows x64, for example, you can implement the
mulul64() function with _umul128:
[https://docs.microsoft.com/en-us/cpp/intrinsics/umul128](https://docs.microsoft.com/en-us/cpp/intrinsics/umul128)
(and expect quite a good speedup, since the intrinsic compiles to a single
64-bit MUL instead of several 32-bit multiplies in a portable version)
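Here is a minimal sketch of such a mulul64(). It assumes the helper takes the two 64-bit operands and returns the high and low halves of the 128-bit product through pointers. That exact signature is an assumption. The `unsigned __int128` branch for GCC/Clang and the portable `mulul64_portable()` fallback (a hypothetical name) are added for comparison and do not come from the paper or the linked docs:

```c
#include <stdint.h>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>   /* declares _umul128 */
#endif

/* Hypothetical portable fallback: schoolbook multiplication on 32-bit
   halves, storing the high 64 bits in *whi and the low 64 bits in *wlo. */
void mulul64_portable(uint64_t u, uint64_t v, uint64_t *whi, uint64_t *wlo)
{
    uint64_t u0 = u & 0xFFFFFFFFu, u1 = u >> 32;
    uint64_t v0 = v & 0xFFFFFFFFu, v1 = v >> 32;

    /* Four partial products, each fits in 64 bits. */
    uint64_t p00 = u0 * v0;
    uint64_t p01 = u0 * v1;
    uint64_t p10 = u1 * v0;
    uint64_t p11 = u1 * v1;

    /* Middle column: three values below 2^32 each, so no overflow. */
    uint64_t mid = (p00 >> 32) + (p01 & 0xFFFFFFFFu) + (p10 & 0xFFFFFFFFu);

    *wlo = (mid << 32) | (p00 & 0xFFFFFFFFu);
    *whi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Assumed signature: full 128-bit product of u and v, split into halves. */
void mulul64(uint64_t u, uint64_t v, uint64_t *whi, uint64_t *wlo)
{
#if defined(_MSC_VER) && defined(_M_X64)
    /* MSVC x64: _umul128 returns the low half and writes the high half. */
    *wlo = _umul128(u, v, whi);
#elif defined(__SIZEOF_INT128__)
    /* GCC/Clang on 64-bit targets: typically compiles to one MUL. */
    unsigned __int128 p = (unsigned __int128)u * v;
    *whi = (uint64_t)(p >> 64);
    *wlo = (uint64_t)p;
#else
    /* No wide multiply available: fall back to 32-bit halves. */
    mulul64_portable(u, v, whi, wlo);
#endif
}
```

For example, `mulul64(UINT64_MAX, UINT64_MAX, &hi, &lo)` should give `hi == 0xFFFFFFFFFFFFFFFE` and `lo == 1`, since (2^64 - 1)^2 = 2^128 - 2^65 + 1. Keeping the portable version around also makes it easy to cross-check the intrinsic path.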

