

Fast Fibonacci - brudgers
http://www.nayuki.io/page/fast-fibonacci-algorithms

======
Double_Cast
The Fast Fibonacci article links to an article on an algorithm called
"Karatsuba multiplication". Oi - I just lost two hours of my life to reverse-
engineering it. (But it was worth it!) In case anyone else was wondering,
here's what I got out of it.

    
    
      x*y =  (x)(y) 
          // fragment into binomials
          =  (x/m*m + x%m)(y/m*m + y%m)     // uses integer division
          // distribute binomials
          =  (x/m*m)(y/m*m)  + (x/m*m)(y%m) + (x%m)(y/m*m) + (x%m)(y%m)
          // extract modulus from each term
          =  [(x/m)(y/m)]m^2 + [(x/m)(y%m)  + (x%m)(y/m)]m + [(x%m)(y%m)]
          // rename terms
          =  [A]m^2 + [B+C]m + [D]
      // where [A]   is what nayuki calls (a),
      //       [B+C] is what nayuki calls (b-a-c), and
      //       [D]   is what nayuki calls (c).
      Q.E.D.
    

Geometrically, what we're doing is:

1) taking the area of two binomials;

2) shrinking Quadrant A by a factor of the square of the modulus (e.g. 3^2);

3) shrinking Quadrants B and C by a factor of the modulus itself (e.g. 3); and

4) leaving Quadrant D the same.

    
    
      [B B B D D]  scaled by modulus  [B D D]
      [A A A C C]  ---------------->  [A C C]
      [A A A C C]  where m = 3
      [A A A C C]
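
For anyone who wants to run it, here's a minimal recursive Python sketch of
the algorithm this decomposition leads to (variable names are mine; it splits
at a power of ten for readability):

    
    
      def karatsuba(x, y):
          # Base case: single-digit factors are multiplied directly.
          if x < 10 or y < 10:
              return x * y
          # Split both numbers around m = 10**half (the "modulus" above).
          half = max(len(str(x)), len(str(y))) // 2
          m = 10 ** half
          a = karatsuba(x // m, y // m)   # quadrant A
          d = karatsuba(x % m, y % m)     # quadrant D
          # One extra multiplication recovers B + C (see Chinjut's reply below).
          bc = karatsuba(x // m + x % m, y // m + y % m) - a - d
          return a * m * m + bc * m + d
    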

~~~
Chinjut
You've left out the key observation that makes Karatsuba multiplication faster
than naive multiplication: after having computed [A] and [D], we can compute
the combined [B + C] with one multiplication rather than two. That is, to
multiply two binomials of a given size, we only need three multiplications of
numbers of half that size, rather than the naive four. This is what improves
the runtime of Karatsuba multiplication to Θ(n^(log_2 3)) from the ordinary
Θ(n^2).

Why are we able to get away with only three recursive multiplications instead
of four? The observation here is that in the product (Em + F)(Gm + H) = EGm^2
+ (EH + FG)m + FH, after the first and last coefficients EG and FH have
already been calculated, the middle coefficient (EH + FG) can be calculated
using only the one new multiplication in (E + F)(G + H) - EG - FH, rather than
the naive two in its definition.
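
A quick sanity check of that identity in Python, multiplying 34 by 56 with
m = 10 (so E, F, G, H = 3, 4, 5, 6):

    
    
      >>> E, F, G, H = 3, 4, 5, 6
      >>> EG, FH = E*G, F*H            # the two outer products: 15 and 24
      >>> (E + F)*(G + H) - EG - FH    # one new multiplication gives EH + FG
      38
      >>> EG*100 + 38*10 + FH          # EG*m^2 + (EH + FG)*m + FH = 34*56
      1904
    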

~~~
Chinjut
We can also view Karatsuba multiplication as a special case of a general class
of polynomial multiplication algorithms (after all, numbers written in a
particular base are just polynomials evaluated at that base). Naively,
multiplying two polynomials of degree n takes (n + 1)^2 multiplications of
coefficients, but a more clever approach would evaluate the two polynomials at
a number of points, then multiply those values together, then fit a polynomial
to the results. This involves only (2n + 1) multiplications (the number of
values needed to determine the resulting polynomial).

Thus, our clever binomial trick is the observation that multiplying two
polynomials of degree 1 yields a polynomial of degree 2, so we need to pick 3
points for our evaluation; if we pick the points to be 0, 1, and "infinity"
(in the sense of a value such that P("infinity") = the leading coefficient of
P, just as P(0) equals the degree zero coefficient of P), then we get our
Karatsuba trick: we break (Ex + F) down into F, E + F, and E, and similarly
for (Gx + H), then multiply the corresponding values together (our three
multiplications), and from those results reconstitute the polynomial product
we are interested in. (And we can apply this recursively to break down the
multiplication of polynomials of degree greater than 1.)
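
In Python, that evaluate-multiply-interpolate view of the degree-1 case looks
something like this (a sketch; the names are mine):

    
    
      def poly_mul_deg1(E, F, G, H):
          # Evaluate Ex + F and Gx + H at the points 0, 1, and "infinity".
          v0, v1, vinf = F * H, (E + F) * (G + H), E * G
          # Interpolate: (Ex + F)(Gx + H) = EG x^2 + (EH + FG) x + FH.
          return (vinf, v1 - v0 - vinf, v0)   # coefficients of x^2, x, 1
    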

An even more clever approach notes that we can evaluate our polynomials at
various roots of unity all at once very efficiently with the Fast Fourier
Transform, which is what leads to the most efficient multiplication algorithms
known.

------
userbinator
Interesting to see that "slow dynamic programming", which is the classic
exchange-and-add algorithm, is the fastest up to around n=126, at which point
fast doubling with naive multiplication takes the lead, and Karatsuba
multiplication needs n over 5000 before it becomes faster than the other
methods. But, assuming that you can much more easily get an order-of-magnitude
improvement out of the "simple and slow" method than out of the more complex
ones, the break-even point moves up to n=1260.
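
For reference, the "simple and slow" method here is presumably just the
linear loop over bigints, as in this Python sketch:

    
    
      def fib_linear(n):
          # Advance the pair (F(i), F(i+1)) one step at a time.
          a, b = 0, 1
          for _ in range(n):
              a, b = b, a + b
          return a
    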

In other words, always choosing the asymptotically fastest (and often more
complex - with larger constants) algorithms is a bad decision if most of your
inputs never get large enough; simpler algorithms are good, even if they're
not the theoretically fastest, because they also tend to be easier to
optimise.

However, the recursive algorithm being faster than slow-DP until n=5 is a
little odd... some other effect could be responsible for that.

~~~
brudgers
Optimizing on Big-O is always about working with interestingly sized data. If
we're going to be doing lots of calculations over a limited range of possible
values, memoization makes each query asymptotically O(1).
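
In Python, for example, that's a one-line change to the naive recursive
definition (functools.lru_cache is the standard-library memoizer):

    
    
      from functools import lru_cache

      @lru_cache(maxsize=None)
      def fib(n):
          # Each distinct n is computed once; repeat calls are cache lookups.
          return n if n < 2 else fib(n - 1) + fib(n - 2)
    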

------
_delirium
A nitpick:

> If we wanted to compute a single term in the sequence (e.g. F(n)), there are
> a couple of algorithms to do so, but some algorithms are much faster than
> others.

If we wanted only a _single_ term in the sequence, we would just use the
closed-form equation, which takes O(1) bigint arithmetic operations:

    
    
        F(n) = (phi^n - (-phi)^-n) / sqrt(5)
        (Where phi is the golden ratio.)
    

But if what's needed is the entire sequence F(0) up through a chosen F(n),
then the algorithms in this article come in handy.

~~~
nayuki
Regarding the computation of a single term:

No, the well-known Binet formula involving phi is unsuitable because it uses
real numbers. See these detailed comments about handling quadratic surds
properly, which is effectively the same as the fast doubling algorithm:

- Chinjut: https://news.ycombinator.com/item?id=9316574

- xyzzyz: https://news.ycombinator.com/item?id=9316144
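
To make that concrete, here is a sketch of exact surd arithmetic: pairs
(a, b) representing a + b*phi, multiplied using phi^2 = phi + 1 and raised to
the n-th power by squaring. Since phi^n = F(n-1) + F(n)*phi, the
phi-coefficient of the result is F(n), with no real-number rounding anywhere:

    
    
      def fib_surd(n):
          def mul(x, y):
              # (a + b*phi)(c + d*phi) = (ac + bd) + (ad + bc + bd)*phi,
              # using phi^2 = phi + 1.
              a, b = x
              c, d = y
              return (a*c + b*d, a*d + b*c + b*d)
          result, base = (1, 0), (0, 1)   # the numbers 1 and phi
          while n:
              if n & 1:
                  result = mul(result, base)
              base = mul(base, base)
              n >>= 1
          return result[1]
    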

Regarding the computation of F(0) through F(n):

No, this is not true either. The matrix method and fast doubling only compute
O(log(n)) terms of the sequence, not the complete sequence from 0 to n.

~~~
_delirium
Ah, right, I am wrong on both counts. Thanks for the pointers!

------
cabinpark
Good stuff! I love Fibonacci algorithms, and if you dig deeper there are even
faster algorithms that exploit still more identities, like fast doubling
combined with look-up tables.

It's a fun recreational-mathematics thing, but unfortunately I've never come
across a problem where I need to compute Fibonacci numbers insanely quickly.
However, I still keep my hopes up!

~~~
xtagon
Just a guess, but it might be useful for fractal rendering. That's the only
real application I can think of.

~~~
eru
The techniques developed are useful in other areas. E.g. Double_Cast talks
about fast (Karatsuba) multiplication above; that's certainly a useful
technique elsewhere.

------
arh68
Great writeup! I only wish the author had benchmarked the approximate
algorithm with phi and floating point too, because it's often 'good enough':

    
    
        >>> fib = lambda n: 5**-.5*((.5+.5*5**.5)**n - (.5-.5*5**.5)**n)
        >>> int(round(fib(12)))
        144
        >>> fib(250)
        7.896325826131797e+51
    

EDIT: Laughing, don't know why I called it factorial at first. Whoops.

~~~
osoba
There is, actually, an even simpler formula.

The explicit formula for the n-th Fibonacci number is indeed F_n = 1/sqrt5 * [
((1+sqrt5)/2)^n - ((1-sqrt5)/2)^n ], i.e. F_n = (1.6180^n -
(-0.6180)^n)/2.2361. But you can notice that the ((1-sqrt5)/2)^n term tends to
0, and is in fact smaller than 0.5 in absolute value even for n=2, which gives
us an even simpler formula: F_n = ceil(1.6180^n/2.2361).

This would also take care of the limited precision of floating-point numbers.

~~~
pavpanchekha
What do you mean, take care of the limited precision? The formula you gave is
only correct for

    
    
        n = [1, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17]
    

The reason the explicit floating-point formula doesn't get used is that you
need as many bits of floating point as you expect to get bits of answer, which
means that you get the wrong answer pretty quickly. If the goal were to
calculate an approximation to the Fibonacci numbers, sure, but then you
wouldn't need the ceil() call.

~~~
osoba
Oh yeah, you're right.

Although, wouldn't overflow be a problem for any algorithm, since already the
100th or so Fibonacci number takes more space than 2^64?

And if you were to use the full float precision of sqrt5 (and not just 4
digits like I did), you would run into 64-bit float overflow way before the
algorithm stops being accurate.

~~~
TheLoneWolfling
Everything else here is talking about bigints.

If you were dealing with fixed-length integers, you'd just use a lookup table.

Also, you're wrong regarding floating point: on my machine, fib(101) is off
from your approximation by ~2 million (!), well before hitting the limits of a
64-bit float. It starts erroring at n=71 and only gets worse from there.
Worse, the relative delta grows as n increases.

~~~
osoba
Hmm, OK.

Thanks for actually testing it.

~~~
TheLoneWolfling
Python is wonderful for that sort of thing.
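
Indeed, the whole test fits in a few lines. A sketch, comparing the float
formula against exact bigint values:

    
    
      def fib_exact(n):
          a, b = 0, 1
          for _ in range(n):
              a, b = b, a + b
          return a

      binet = lambda n: 5**-.5 * ((.5 + .5*5**.5)**n - (.5 - .5*5**.5)**n)

      # Find the first n where rounding the float formula goes wrong.
      n = 1
      while int(round(binet(n))) == fib_exact(n):
          n += 1
      print(n)   # 71 on a typical machine with 64-bit floats
    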

------
TheLoneWolfling
It would be interesting to see what the asymptotic complexity would be in
terms of the number of (fixed-size) word operations, as opposed to the number
of bigint operations required.

I mean, the nth Fibonacci number requires O(n) bits to store.

~~~
fdej
It's O(M(n)) where M(n) is the complexity of the algorithm used for
multiplying n-bit integers.

To see why, note that each step of the matrix powering or fast doubling
algorithm 1) costs a constant number C of multiplications, and 2) roughly
doubles the number of bits.

F(n) has b = O(n) bits, so the cost is essentially C M(b) + C M(b/2) + C
M(b/4) + ... <= 2 C M(b) = O(M(n)), using the natural assumption that M(2x) >=
2M(x).

~~~
TheLoneWolfling
So then: is there any better solution? Or is that optimal?

~~~
fdej
It's almost certainly optimal, but proving that is likely an open problem.

------
spoonman1
Maybe someone can help me understand a bit better. The author talks about how
slow the recursive algorithm is, but the "fast doubling" implementation itself
uses recursion. Wouldn't the same slowdown affect this implementation, due to
its use of recursion?

~~~
brudgers
To use the language of Abelson and Sussman in _Structure and Interpretation of
Computer Programs_ [node 15]:

The fast doubling algorithm has a recursive _definition_ and produces a
logarithmic _procedure._
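
In code, the doubling step looks like this (a sketch, using the identities
F(2k) = F(k)*(2*F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2 from the
article):

    
    
      def fib_doubling(n):
          # Returns the pair (F(n), F(n+1)); recursion depth is O(log n).
          if n == 0:
              return (0, 1)
          a, b = fib_doubling(n // 2)   # a = F(k), b = F(k+1), k = n // 2
          c = a * (2 * b - a)           # F(2k)
          d = a * a + b * b             # F(2k+1)
          return (c, d) if n % 2 == 0 else (d, c + d)
    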

The classic implementation based on the mathematical definition:

    
    
      def F(n):
        if n == 0: return 0
        elif n == 1: return 1
        else: return F(n-1)+F(n-2)
    

Has a recursive _definition_ and produces a recursive _procedure_ [even worse,
the recursive _procedure_ runs in exponential time]. It is also possible to
create a recursive _definition_ of a _procedure_ that runs in linear time.

[node 15]: https://mitpress.mit.edu/sicp/full-text/sicp/book/node15.html

~~~
bhrgunatha
In fact SICP covers the fast doubling Fibonacci algorithm directly in exercise
1.19 [1] (although in a slightly obfuscated form).

[1] http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-11.html#%_thm_1.19

------
ggchappell
Fun stuff.

But why only take n up to around 4 million? Not too long ago, I was able to
compute the billionth Fibonacci number on a pretty ordinary machine, using a
Haskell implementation of the fast doubling method. (The number took 20
minutes to print out in my console window.)

By the way, I did a Python implementation, too. And I learned that GHC large-
integer multiplication is a lot faster than CPython large-integer
multiplication. I wonder why. (No, it has nothing to do with dynamic typing;
almost all of the time is going to be spent executing internal integer-
multiplication routines written in C.)

------
balls187
One of my warm-up questions is to ask candidates to code Fib. It's easy enough
to explain, and it leads to interesting questions about performance.

I'm still waiting for a candidate to break out Binet's formula.

~~~
brudgers
Interesting. Still runs in O(log n). Link:
http://mathworld.wolfram.com/BinetsFibonacciNumberFormula.html

------
ksjjsk
you know its actually an O(logn) operation with fast exponentiation through
squaring.....

[http://en.wikipedia.org/wiki/Fibonacci_number#Relation_to_th...](http://en.wikipedia.org/wiki/Fibonacci_number#Relation_to_the_golden_ratio)
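
One standard way to realize that O(log n) bound is repeated squaring of the
Fibonacci Q-matrix; a sketch:

    
    
      def fib_matrix(n):
          def mat_mul(A, B):
              return [[A[0][0]*B[0][0] + A[0][1]*B[1][0],
                       A[0][0]*B[0][1] + A[0][1]*B[1][1]],
                      [A[1][0]*B[0][0] + A[1][1]*B[1][0],
                       A[1][0]*B[0][1] + A[1][1]*B[1][1]]]
          result = [[1, 0], [0, 1]]   # identity
          base = [[1, 1], [1, 0]]     # Q = [[1, 1], [1, 0]]
          while n:                    # exponentiation by squaring
              if n & 1:
                  result = mat_mul(result, base)
              base = mat_mul(base, base)
              n >>= 1
          return result[0][1]         # Q^n = [[F(n+1), F(n)], [F(n), F(n-1)]]
    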
