
Calculating integer factorials in constant time, taking advantage of overflow - DmitryNovikov
https://blogs.msdn.microsoft.com/oldnewthing/20151214-00/?p=92621
======
tantalor
If you enforce n<=12 (to prevent undefined behavior) in the original "naive"
algorithm, then it too is constant time, i.e., O(12).

~~~
Dylan16807
_Constant_ time goes beyond O(1): it means the lower bound equals the upper
bound, so computing 10! can't be faster than computing 12!

~~~
tantalor
No, I don't think so. O(1) is "constant" time. See
[https://en.wikipedia.org/wiki/Time_complexity](https://en.wikipedia.org/wiki/Time_complexity)

~~~
Dylan16807
O(1) is a constant time _bound_. But when people talk about constant time they
often mean literally constant, such as to avoid timing attacks.

~~~
dragonwriter
It's true that "constant time" is sometimes used in that way, though in
discussions of computational complexity it usually means O(1).

The narrower use is more like Θ(1) with the upper and lower constants
identical.

------
arbre
You can also compute the factorials at compile time with templates:

    template<int64_t n> int64_t fact() { return n * fact<n-1>(); }
    template<> int64_t fact<0>() { return 1; }

The book "Modern C++ Design" by Alexandrescu covers that kind of thing
extensively.

There might even be a way to use a loop and constexprs.
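Indeed, since C++14's relaxed constexpr rules, a plain loop works at compile
time with no template recursion; a sketch:

```cpp
#include <cstdint>

// C++14 relaxed constexpr: an ordinary loop, evaluated at compile time.
constexpr int64_t fact(int n) {
    int64_t r = 1;
    for (int i = 2; i <= n; ++i)
        r *= i;
    return r;
}

// Checked entirely by the compiler:
static_assert(fact(0) == 1, "0! == 1");
static_assert(fact(12) == 479001600, "12! still fits in 32 bits");
```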

~~~
wyager
Won't most modern compiled languages statically evaluate constant numerical
expressions?

------
hellofunk
I found this rather strange. Use of the word "calculate" is a bit of a stretch
when all the work is a table lookup. Worth reading, however.

~~~
mzl
The point is that the lookup table contains results for _all_ inputs to the
function that have a defined result (that is, inputs that do not trigger
overflow and thus undefined behaviour). In that sense, it has the same
semantics as the original function, although most people would not suspect
that.
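
For concreteness, a sketch of that lookup-table approach for 64-bit results
(21! already overflows an unsigned 64-bit integer, so the table only needs 21
entries):

```cpp
#include <cstdint>
#include <stdexcept>

// All factorials that fit in a uint64_t: fact(0) through fact(20).
constexpr uint64_t kFact[21] = {
    1ULL, 1ULL, 2ULL, 6ULL, 24ULL, 120ULL, 720ULL, 5040ULL, 40320ULL,
    362880ULL, 3628800ULL, 39916800ULL, 479001600ULL, 6227020800ULL,
    87178291200ULL, 1307674368000ULL, 20922789888000ULL,
    355687428096000ULL, 6402373705728000ULL, 121645100408832000ULL,
    2432902008176640000ULL};

// Same semantics as the naive loop for every input with a defined result.
uint64_t factorial(int n) {
    if (n < 0 || n > 20)
        throw std::out_of_range("factorial overflows uint64_t");
    return kFact[n];
}
```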

------
dvh
... or get real:
[https://en.wikipedia.org/wiki/Stirling%27s_approximation](https://en.wikipedia.org/wiki/Stirling%27s_approximation)
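
I.e. n! ≈ √(2πn)·(n/e)^n; a quick sketch:

```cpp
#include <cmath>

// Stirling's approximation: n! ~ sqrt(2*pi*n) * (n/e)^n.
// Relative error is about 1/(12n), so it improves as n grows.
double stirling(double n) {
    const double pi = 3.14159265358979323846;
    const double e  = 2.71828182845904523536;
    return std::sqrt(2.0 * pi * n) * std::pow(n / e, n);
}
```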

~~~
uxcn
I was expecting to see a variant of this.

------
kleiba
Interestingly, I was just doing some calculations over the weekend involving
large factorials. Naturally, I used a "big integer" library to overcome the
limitations of the hardware architecture. That anyone outside a class-room
would be willing to restrict themselves to 32 or 64 bit for the largest
integers thus seems contrived: of course, dealing with factorials will quickly
get you past your system's word size, but that's what arbitrary-length integer
libs are for.

~~~
a1k0n
Depends what you need them for... In statistics, factorials appear in
normalizing constants for various distributions (e.g., Poisson) and often
you're working with floating point log-probabilities, in which case you can
use the log-gamma function (built into most math libraries) to replace the log
factorial. And that's a 64-bit float, no bignums needed.
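
Concretely, ln(n!) = lgamma(n + 1), so e.g. a Poisson log-pmf needs no
factorial at all; a sketch:

```cpp
#include <cmath>

// ln(n!) == lgamma(n + 1); works even for n far beyond 64-bit range.
double log_factorial(double n) {
    return std::lgamma(n + 1.0);
}

// Poisson log-probability: ln P(k; lambda) = k*ln(lambda) - lambda - ln(k!).
double poisson_log_pmf(int k, double lambda) {
    return k * std::log(lambda) - lambda - log_factorial(k);
}
```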

~~~
jeffwass
But in what situation would you actually need an arbitrary-precision library
for general statistics? I.e., in what situations would a reasonable
approximation not suffice for real-world precision?

E.g., Stirling's approximation for ln(n!) should serve fairly well for
moderate and larger values of n. And knowing your statistical distribution at
large n, you can likely replace a discrete distribution requiring n! with a
decent continuous one, e.g. a Gaussian.

~~~
a1k0n
You wouldn't, for statistics. That's my point.

Only in number theory problems do you actually need bignums for factorials,
and those are usually toy problems or modulo some big prime.

I don't know of any actual applications where you need to compute the bignum
factorial. I'm curious what the GP needed it for.

~~~
kleiba
I needed them for a toy problem as well. Tried to experimentally verify the
equality of two formulas involving large factorials. It was a counting
problem.

------
WhitneyLand
I wonder if old school developers are more likely to think of these types of
solutions.

When I wrote games for older hardware, every frickin' thing possible used
lookup tables to maximize speed.

Coincidentally I received this type of problem later when interviewing at MS,
but it wasn't Raymond Chen's group.

~~~
noobermin
What do you mean? They asked you to compute something but expected you to use
a lookup table instead?

~~~
WhitneyLand
Yes, it was the only way to speed up an algorithm's implementation and they
wanted to see if you could realize that.

------
scott_s
I've used a similar technique for base-2 logarithms, when I knew the context
in which it was used would only require handling powers-of-2 up to 16384:
[https://github.com/scotts/streamflow/blob/master/streamflow....](https://github.com/scotts/streamflow/blob/master/streamflow.c#L269)

~~~
drfuchs
Dude, your case statement will compile into a bunch of test and branch
instructions, implementing a binary search if you're lucky, and a linear
search if you're not. Better to use __builtin_clz() or similar, so your
function will just take a few cycles with no (expensive) branching. Extra
bonus: with care, it can work for all inputs, giving a truncated or rounded
result.
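
For the curious, a sketch of that (using the GCC/Clang builtin; it's undefined
for an input of 0, so that case still needs a guard):

```cpp
#include <cstdint>

// floor(log2(x)) via count-leading-zeros; on x86 this typically becomes
// a single bsr/lzcnt instruction, with no data-dependent branching.
// Precondition: x != 0 (__builtin_clz(0) is undefined).
unsigned floor_log2(uint32_t x) {
    return 31u - static_cast<unsigned>(__builtin_clz(x));
}
```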

~~~
uxcn
On x86, it could also compile down to just a _jmp_. A _bsr_ would still be
faster, but it would probably be worth checking that __builtin_clz actually
compiles down to _bsr_.

For portability reasons, though, it might be better to just use bit tricks and
let the compiler optimize them. Clang and GCC do pretty well here.

------
ape4
I like the idea of a hybrid. Since each factorial builds on the previous one,
cache fact(n) for a selection of n's, e.g. n = 10, 100, 1000, 10000, ..., and
multiply up from the nearest cached value.
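
A sketch of that hybrid (milestones kept small here so the values still fit in
64 bits; real use at n = 1000 and beyond would need a bignum type):

```cpp
#include <cstdint>
#include <map>

// Cache fact(n) at a few milestone n's; answer queries by multiplying
// up from the nearest milestone at or below the requested n.
const std::map<int, uint64_t> kMilestones = {
    {0, 1ULL}, {5, 120ULL}, {10, 3628800ULL}, {15, 1307674368000ULL}};

uint64_t factorial(int n) {  // valid for 0 <= n <= 20 (uint64_t range)
    auto it = kMilestones.upper_bound(n);  // first milestone above n
    --it;                                  // nearest milestone <= n
    uint64_t r = it->second;
    for (int i = it->first + 1; i <= n; ++i)
        r *= static_cast<uint64_t>(i);
    return r;
}
```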

------
tromp
Meanwhile, here's an artistic rendering of the factorial function:

    
    
        ┬────────────────
        ┼─────────────┬──
        │ ──┬──────── ┼ ┬
        │ ┬─┼─┬────── │ │
        │ │ │ ┼─┬─┬── │ │
        │ │ │ ┼─┼─┼─┬ │ │
        │ │ │ └─┤ ├─┘ │ │
        │ │ │   ├─┘   │ │
        │ │ ├───┘     │ │
        │ ├─┘         │ │
        └─┤           │ │
          └───────────┤ │
                      └─┘
    

(a diagram of factorial on Church numerals)

------
NelsonMinar
There's a similar algorithm for O(1) primality checking. It does require
access to Google though.

~~~
mikeash
This also works well for reversing many cryptographic hashes.

------
banku_brougham
I like this. It clearly shows that for all the computing power available, some
algebra or calculus can be better for discovering the general form of a
solution.

------
yangshun
Clickbait much.

