x*y = (x)(y)
// fragment into binomials
= (x/m*m + x%m)(y/m*m + y%m) // uses integer division
// distribute binomials
= (x/m*m)(y/m*m) + (x/m*m)(y%m) + (x%m)(y/m*m) + (x%m)(y%m)
// extract modulus from each term
= [(x/m)(y/m)]m^2 + [(x/m)(y%m) + (x%m)(y/m)]m + [(x%m)(y%m)]
// rename terms
= [A]m^2 + [B+C]m + [D]
// where [A] is what nayuki calls (a),
// [B+C] is what nayuki calls (a-b-c), and
// [D] is what nayuki calls (c).
1) taking the area of two binomials;
2) shrinking Quadrant A by a factor of the square of the modulus (e.g. 3^2);
3) shrinking Quadrant B and Quadrant C by a factor of the modulus itself (e.g. 3); and
4) leaving Quadrant D the same.
[B B B D D] scaled by modulus [B D D]
[A A A C C] ----------------> [A C C]
[A A A C C] where m = 3
[A A A C C]
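A quick sanity check of that decomposition in Python (my own sketch; Python's // plays the role of the integer division written as / above):

def quadrants(x, y, m):
    # split each factor at the modulus m: x == (x // m) * m + x % m
    A = (x // m) * (y // m)    # high * high
    B = (x // m) * (y % m)     # high * low
    C = (x % m) * (y // m)     # low * high
    D = (x % m) * (y % m)      # low * low
    return A * m * m + (B + C) * m + D

assert quadrants(12345, 6789, 100) == 12345 * 6789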
I wouldn't call that "lost", myself. Good job. :-)
There is also Strassen's algorithm for matrix multiplication. Something to keep in mind if you ever need to multiply really gigantic matrices.
Why are we able to get away with only three recursive multiplications instead of four? The observation here is that in the product (Em + F)(Gm + H) = EGm^2 + (EH + FG)m + FH, after the first and last coefficients EG and FH have already been calculated, the middle coefficient (EH + FG) can be calculated using only the one new multiplication in (E + F)(G + H) - EG - FH, rather than the naive two in its definition.
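A short Python sketch of that identity (E, F, G, H as above; the splitting point m and the function name are mine):

def karatsuba_step(x, y, m):
    # one level of the trick: three multiplications instead of four
    E, F = divmod(x, m)
    G, H = divmod(y, m)
    eg = E * G                          # first coefficient
    fh = F * H                          # last coefficient
    mid = (E + F) * (G + H) - eg - fh   # EH + FG, via one new multiplication
    return eg * m * m + mid * m + fh

assert karatsuba_step(987654, 123456, 1000) == 987654 * 123456

(A real Karatsuba implementation would recurse on the three sub-products rather than multiplying them directly.)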
Thus, our clever binomial trick is the observation that multiplying two polynomials of degree 1 yields a polynomial of degree 2, so we need to pick 3 points for our evaluation. If we pick the points to be 0, 1, and "infinity" (in the sense of a value such that P("infinity") equals the leading coefficient of P, just as P(0) equals the degree-zero coefficient of P), then we get our Karatsuba trick: we break (Ex + F) down into F, E + F, and E, and similarly for (Gx + H), then multiply the corresponding values together (our three multiplications), and from those results reconstitute the polynomial product we are interested in. (And we can apply this recursively to break down the multiplication of polynomials of degree greater than 1.)
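Concretely (my own toy illustration of the evaluation-at-points view): evaluate both degree-1 polynomials at 0, 1, and "infinity", multiply pointwise, and read the product's three coefficients back off:

def mul_deg1(E, F, G, H):
    # (Ex + F)(Gx + H), evaluated at x = 0, 1, "infinity"
    p0 = F * H                  # value at 0        -> constant coefficient
    p1 = (E + F) * (G + H)      # value at 1        -> sum of all coefficients
    pinf = E * G                # value at infinity -> leading coefficient
    return (pinf, p1 - pinf - p0, p0)   # coefficients of x^2, x^1, x^0

assert mul_deg1(3, 4, 5, 6) == (15, 38, 24)   # (3x + 4)(5x + 6) = 15x^2 + 38x + 24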
An even more clever approach notes that we can evaluate our polynomials at various roots of unity all at once very efficiently with the Fast Fourier Transform, which is what leads to the most efficient multiplication algorithms known.
In other words, always choosing the asymptotically fastest (and often more complex - with larger constants) algorithms is a bad decision if most of your inputs never get large enough; simpler algorithms are good, even if they're not the theoretically fastest, because they also tend to be easier to optimise.
However, the recursive algorithm being faster than slow-DP until n=5 is a little odd... some other effect could be responsible for that.
> If we wanted to compute a single term in the sequence (e.g. F(n)), there are a couple of algorithms to do so, but some algorithms are much faster than others.
If we wanted only a single term in the sequence, we would just use the closed-form equation, which takes O(1) bigint arithmetic operations,
F(n) = (phi^n - (-phi)^-n) / sqrt(5)
(Where phi is the golden ratio.)
No, the well-known Binet formula involving phi is unsuitable because it uses real numbers. See these detailed comments about handling quadratic surds properly, which is effectively the same as the fast doubling algorithm:
- Chinjut: https://news.ycombinator.com/item?id=9316574
- xyzzyz: https://news.ycombinator.com/item?id=9316144
Regarding the computation of F(0) through F(n):
No, this is not true either. The matrix method and fast doubling only compute O(log(n)) terms of the sequence, not the complete sequence from 0 to n.
Doing it with floats/doubles is a little more reasonable for an O(1) approach, but even that approach breaks down for large enough n.
It's a fun recreational mathematics thing, but unfortunately I've never come across a problem where I need to solve Fibonacci insanely quickly. However I still keep my hopes up!
>>> fib = lambda n: 5**-.5*((.5+.5*5**.5)**n - (.5-.5*5**.5)**n)
While the explicit formula for the n-th Fibonacci number is indeed F_n = 1/sqrt5 * [ ((1+sqrt5)/2)^n - ((1-sqrt5)/2)^n ]
or, numerically, F_n = (1.6180^n - (-0.6180)^n)/2.2361,
you can notice that the ((1-sqrt5)/2)^n = (-0.6180)^n term tends to 0, and is in fact smaller than 0.5 in absolute value even for n=2, which gives us an even simpler formula: F_n = round(1.6180^n/2.2361), rounded to the nearest integer.
This would also take care of the limited precision of floating-point numbers.
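A one-liner along those lines (my own sketch; the reply below notes where the double-precision version starts to go wrong):

from math import sqrt

def fib_rounded(n):
    # round(phi^n / sqrt(5)); only exact while double precision holds out
    phi = (1 + sqrt(5)) / 2
    return round(phi ** n / sqrt(5))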
n = [1, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17]
Although, wouldn't variable overflow be an issue for any algorithm, since already the 100th or so Fibonacci number is larger than 2^64?
And if you were to use the full float precision of sqrt5 (and not just 4 digits like I did), you would run into 64-bit float overflow way before that algorithm stops being accurate
If you were dealing with fixed-length integers, just use a lookup table.
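For instance, every Fibonacci number that fits in an unsigned 64-bit integer (F(0) through F(93)) can be precomputed once; a sketch of building such a table:

def fib_table_u64():
    # all Fibonacci numbers that fit in an unsigned 64-bit integer
    table, a, b = [], 0, 1
    while a < 2 ** 64:
        table.append(a)
        a, b = b, a + b
    return table

TABLE = fib_table_u64()   # TABLE[n] == F(n) for n = 0 .. 93
assert len(TABLE) == 94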
Also, you're wrong regarding floating point: on my machine, fib(101) is off from your approximation by ~2 million (!), well before hitting the limits of a 64-bit float. It starts erroring at n=71 and only gets worse from there; worse, the relative delta grows as n increases.
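For reference, a small test along the lines of what the parent presumably ran (my own sketch, comparing the rounded float formula against exact integer iteration):

from math import sqrt

def first_float_error():
    # find the first n where the double-precision formula disagrees with exact iteration
    phi = (1 + sqrt(5)) / 2
    a, b, n = 0, 1, 0
    while round(phi ** n / sqrt(5)) == a:
        a, b, n = b, a + b, n + 1
    return n   # the parent reports this first fails at n = 71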
Thanks for actually testing it
[Note that, by representing a + bφ in terms of its individual a and b components, we can just directly extract the components when we need them, instead of having to extract the coefficient of φ in x as (x - conj(x))/(φ - conj(φ)) [where conj(a + bφ) = a + b(1 - φ); note that this conjugation operator distributes across multiplication], which is what Binet's formula does to extract the coefficient of φ in φ^n]
Edit: implemented binary squaring:
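A minimal sketch of what that might look like (identifiers are mine, not the commenter's): keep exact integer pairs (a, b) standing for a + b*phi, multiply them using phi^2 = phi + 1, and raise phi to the n-th power by repeated squaring; the phi-coefficient of phi^n is then F(n):

def mul(p, q):
    # (a + b*phi)(c + d*phi), reduced with phi^2 = phi + 1
    a, b = p
    c, d = q
    return (a * c + b * d, a * d + b * c + b * d)

def fib(n):
    # phi^n = x + y*phi; the coefficient y is F(n)
    result = (1, 0)   # 1
    base = (0, 1)     # phi
    while n:
        if n & 1:
            result = mul(result, base)
        base = mul(base, base)
        n >>= 1
    return result[1]

assert [fib(i) for i in range(8)] == [0, 1, 1, 2, 3, 5, 8, 13]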
I mean, the nth Fibonacci number requires O(n) bits to store.
To see why, note that each step of the matrix powering or fast doubling algorithm 1) costs a constant C number of multiplications, and 2) roughly doubles the number of bits.
F(n) has b = O(n) bits, so the cost is essentially C M(b) + C M(b/2) + C M(b/4) + ... <= 2 C M(b) = O(M(n)), using the natural assumption that M(2x) >= 2M(x).
The author is saying that the naive recursive algorithm is incredibly slow. That would be an implementation like this (in Python):
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
As a side note, the Java and C# implementations of the "fast doubling" algorithm actually used a fixed number of loop iterations instead of the mathematical recursive formula.
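In Python, that bit-driven loop would look something like this (a sketch of the general shape, not the actual Java/C# code); it walks the bits of n from most significant to least, doubling at every step and advancing by one when the bit is set:

def fib_doubling_iter(n):
    # invariant before each doubling step: (a, b) == (F(k), F(k+1))
    a, b = 0, 1
    for bit in bin(n)[2:]:
        c = a * (2 * b - a)    # F(2k)   = F(k) * (2*F(k+1) - F(k))
        d = a * a + b * b      # F(2k+1) = F(k)^2 + F(k+1)^2
        a, b = c, d
        if bit == '1':
            a, b = b, a + b    # advance by one: k -> k + 1
    return a

assert [fib_doubling_iter(i) for i in range(10)] == [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]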
To use Abelson's precise language:
Every iterative procedure can be expressed as
a recursive *definition.*
The fast doubling algorithm has a recursive definition and produces a logarithmic procedure.
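For instance, a recursive rendering of the fast doubling definition (a sketch using the same doubling identities):

def fib_pair(n):
    # returns (F(n), F(n+1)) with O(log n) recursive calls
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n // 2)
    c = a * (2 * b - a)    # F(2k)
    d = a * a + b * b      # F(2k+1)
    if n % 2 == 0:
        return (c, d)
    return (d, c + d)

assert [fib_pair(i)[0] for i in range(10)] == [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]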
The classic implementation based on the mathematical definition:
def F(n):
    if n == 0: return 0
    elif n == 1: return 1
    else: return F(n-1) + F(n-2)
[node 15]: https://mitpress.mit.edu/sicp/full-text/sicp/book/node15.htm...
But why only take n up to around 4 million? Not too long ago, I was able to compute the billionth Fibonacci number on a pretty ordinary machine, using a Haskell implementation of the fast doubling method. (The number took 20 minutes to print out in my console window.)
By the way, I did a Python implementation, too. And I learned that GHC's large-integer multiplication is a lot faster than CPython's large-integer multiplication. I wonder why. (No, it has nothing to do with dynamic typing; almost all of the time is going to be spent executing internal integer-multiplication routines written in C.)
I'm still waiting for a candidate to break out Binet's formula.