Hacker News new | past | comments | ask | show | jobs | submit login
Fast Fibonacci (nayuki.io)
69 points by brudgers on Apr 3, 2015 | hide | past | web | favorite | 49 comments

The Fast Fibonacci article links to an article on an algorithm called "karatsuba-multiplication". Oi - I just lost two hours of my life to reverse-engineering it. (But it was worth!). In case anyone else was wondering, here's what I got out of it.

  x*y =  (x)(y) 
      // fragment into binomials
      =  (x/m*m + x%m)(y/m*m + y%m)     // uses integer division
      // distribute binomials
      =  (x/m*m)(y/m*m)  + (x/m*m)(y%m) + (x%m)(y/m*m) + (x%m)(y%m)
      // extract modulus from each term
      =  [(x/m)(y/m)]m^2 + [(x/m)(y%m)  + (x%m)(y/m)]m + [(x%m)(y%m)]
      // rename terms
      =  [A]m^2 + [B+C]m + [D]
      // where [A]   is what nayuki calls (a),
      //       [B+C] is what nayuki calls (a-b-c), and
      //       [D]   is what nayuki calls (c).
Geometrically, what we're doing is:

1) taking the area of two binomials;

2) shrinking Quadrant A by a factor of the modulus's perfect-square (e.g. 3^2);

3) shrinking Quadrant B and Quadrant C by a factor of the modulus per se (e.g. 3); and

4) leaving Quadrant D the same.

  [B B B D D]  scaled by modulus  [B D D]
  [A A A C C]  ---------------->  [A C C]
  [A A A C C]  where m = 3
  [A A A C C]

> I just lost two hours of my life to reverse-engineering it.

I wouldn't call that "lost", myself. Good job. :-)

There is also Strassen's algorithm for matrix multiplication. Something to keep in mind if you ever need to multiply really gigantic matrices.


You've left out the key observation that makes Karatsuba multiplication faster than naive multiplication: after having computed [A] and [D], we can compute the combined [B + C] with one multiplication rather than two. That is, to multiply two binomials of a given size, we only need three multiplications of binomials of half that size, rather than the naive four. This is what improves the runtime of Karatsuba multiplication to Ө(n^(log_2(3))) from the ordinary Ө(n^2).

Why are we able to get away with only three recursive multiplications instead of four? The observation here is that in the product (Em + F)(Gm + H) = EGm^2 + (EH + FG)m + FH, after the first and last coefficients EG and FH have already been calculated, the middle coefficient (EH + FG) can be calculated using only the one new multiplication in (E + F)(G + H) - EG - FH, rather than the naive two in its definition.

We can also view Karatsuba multiplication as a special case of a general class of polynomial multiplication algorithms (after all, numbers written in a particular base are just polynomials evaluated at that base). Naively, multiplying two polynomials of degree n takes (n + 1)^2 multiplications of coefficients, but a more clever approach would evaluate the two polynomials at a number of points, then multiply those values together, then fit a polynomial to the results. This involves only (2n + 1) multiplications (the number of values needed to determine the resulting polynomial).

Thus, our clever binomial trick is the observation that multiplying two polynomials of degree 1 yields a polynomial of degree 2, so we need to pick 3 points for our evaluation; if we pick the points to be 0, 1, and "infinity" (in the sense of a value such that P("infinity") = the leading coefficient of P, just as P(0) equals the degree zero coefficient of P), then we get our Karatsuba trick: we break (Ex + F) down into F, E + F, and E, and similarly for (Gx + H), then multiply the corresponding values together (our three multiplications), and from those results reconstitute the polynomial product we are interested in. (And we can apply this recursively to break down the multiplication of polynomials of greater degree than 1)

An even more clever approach notes that we can evaluate our polynomials at various roots of unity all at once very efficiently with the Fast Fourier Transform, which is what leads to the most efficient multiplication algorithms known.

Interesting to see that "slow dynamic programming" which is the classic exchange-and-add algorithm is the fastest up to around n=126, at which point fast doubling with naive multiplication takes the lead, and Karatsuba multiplication needs n over 5000 before it becomes faster than the other methods. But, assuming that you can much more easily get an order of magnitude improvement out of the "simple and slow" method than the more complex ones, then the break-even point moves up to n=1260.

In other words, always choosing the asymptotically fastest (and often more complex - with larger constants) algorithms is a bad decision if most of your inputs never get large enough; simpler algorithms are good, even if they're not the theoretically fastest, because they also tend to be easier to optimise.

However, the recursive algorithm being faster than slow-DP until n=5 is a little odd... some other effect could be responsible for that.

Optimizing on Big-O is always about working with interestingly sized data. If we're going to be doing lots of calculations and only talking about a limited range of possible values, memoization is asymptotically O(1) time.

A nitpick:

> If we wanted to compute a single term in the sequence (e.g. F(n)), there are a couple of algorithms to do so, but some algorithms are much faster than others.

If we wanted only a single term in the sequence, we would just use the closed-form equation, which takes O(1) bigint arithmetic operations,

    F(n) = (phi^n - (-phi)^-n) / sqrt(5)
    (Where phi is the golden ratio.)
But if what's needed is the entire sequence F(0) up through a chosen F(n), then the algorithms in this article come in handy.

Regarding the computation of a single term:

No, the well-known Binet formula involving phi is unsuitable because it uses real numbers. See these detailed comments about handling quadratic surds properly, which is effectively the same as the fast doubling algorithm:

- Chinjut: https://news.ycombinator.com/item?id=9316574

- xyzzyz: https://news.ycombinator.com/item?id=9316144

Regarding the computation of F(0) through F(n):

No, this is not true either. The matrix method and fast doubling only compute O(log(n)) terms of the sequence, not the complete sequence from 0 to n.

Ah, right, I am wrong on both counts. Thanks for the pointers!

As andrewla notes, big int operations aren't O(1) in an apples-to-apples comparison. In fact, if you work out all of the math for a Z(5) implementation, this approach is essentially the same as the fast matrix exponentiation approach from the article. As I recall, they differ only by an application of some Fibbonaci identity (because of these identities, there are many equivalent recurrence relations).

To pick a little further at this nit; bigint is no good, right? The only way to compute phi^n with bigints is to do an O(log n) operations in the algebraic integers Z(5) (similar to exponentiating complex integers).

Doing it with floats/doubles is a little more reasonable for an O(1) approach, but even that approach breaks down for large enough n.

That quickly fails due to the nature of floating point arithmetic.

Good stuff! I love Fibonacci algorithms and if you dig deeper, there are even faster algorithms that exploit even more identities like the fast doubling and look-up tables.

It's a fun recreational mathematics thing, but unfortunately I've never come across a problem where I need to solve Fibonacci insanely quickly. However I still keep my hopes up!

Fibonacci numbers are so fun that Fibonacci Quarterly has been around for 52 years.


Just a guess, but it might be useful for fractal rendering. That's the only real application I can think of.

The techniques developed are useful in other areas. Eg Double_Cast talks about fast matrix multiplication above---certainly a useful technique elsewhere.

Great writeup! I only wish the author had benchmarked the approximate algorithm with phi and floating points too, because it's often 'good enough'

    >>> fib = lambda n: 5**-.5*((.5+.5*5**.5)**n - (.5-.5**.5)**n)
    >>> fib(12)
    >>> fib(250)
EDIT: Laughing, don't know why I called it factorial at first. Whoops

There is, actually, an even simpler formula.

While the explicit formula for the n-th Fibbonacci number is indeed F_n = 1/sqrt5 * [ ((1+sqrt5)/2)^n - ((1-sqrt5)/2)^n ] or F_n = (1.6180^n + 0.6180^n)/2.2361 you can notice that the -((1-sqrt5)/2)^n = 0.6180^n term tends to 0, and is in fact smaller than 0.5 even for n=2, which gives us an even simpler formula for F_n = ceil(1.6180^n/2.2361).

This would also take care of the limited precission of the float point numbers.

What do you mean, take care of the limited precision? The formula you gave is only correct for

    n = [1, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17]
The reason the explicit floating point formula doesn't get used is because you need as many bits of floating point as you expect to get bits of answer. Which means that you get the wrong answer pretty quickly. If the goal was to calculate an approximation to the fibonacci numbers, sure, but then you don't need the (ceil) call.

Oh yeah you're right

Although, wouldn't variable overflow be the case for any algorithm, since already the 100th or so Fibonacci number takes more space than 2^64

And if you were to use the full float precision of sqr5 (and not just 4 digits like I did), you would run into 64bit float overflow way before that algorithm stops being accurate

Everything else is talking about bigints.

If you were dealing with fixed-length integers, just use a lookup table.

Also, you're wrong regarding floating-point. On my machine, fib(101) is off from your approximation by ~2 million (!). Well before hitting the limits of a 64-bit float. It starts erroring at n=71, and only gets worse from there. Worse, the relative delta grows as n increases.

Hmm ok

Thanks for actually testing it

Python is wonderful for that sort of thing.

I go into this a liitle bit at http://lee-phillips.org/lispmath/

That only works for the first 20 or so Fibonacci numbers due to limited floating point precision, so then you could also use a 20 entry lookup table.

Agreed, a benchmark would be cool. From what I've read before it ends up taking longer than the integer methods anyway because you still need to keep log n bits of precision in the floating point values around to get the correct answer. Perhaps if you had a use case for approximate values of very large fibonacci numbers it would be useful? I'd love to hear that use case.

It would be incomparable because the 4 algorithms described on the page produce exact integer results, whereas a floating-point algorithm will not have enough precision to represent larger numbers.

The Binet formula operates on noninteger numbers, but it does not need to be implemented using floating points. One can operate in a field Q(sqrt(5)), and represent elements using two rational numbers, obtaining exact results. See example implementation: https://github.com/xyzzyz/FibBinet

The "fast doubling" approach IS the natural calculation in Q(sqrt(5)). That is, let φ^2 = φ + 1 (so φ = (1 + sqrt(5))/2). Then (a + bφ)^2 = (a^2 + b^2) + (2ab + b^2)φ. Furthermore, (a + bφ) * φ = b + (a + b)φ. Finally, we have that φ^n = F(n - 1) + F(n)φ. Thus, we can extract the n-th Fibonacci number from the n-th power of φ, which we can compute in the form a + bφ by "repeated squaring" using the above rules, which is precisely what the "repeated doubling" approach of the linked-to page is effectively doing.

[Note that, by representing a + bφ in terms of its individual a and b components, we can just directly extract the components when we need them, instead of having to extract the coefficient of φ in x as (x - conj(x))/(φ - conj(φ)) [where conj(a + bφ) = a + b(1 - φ); note that this conjugation operator distributes across multiplication], which is what Binet's formula does to extract the coefficient of φ in φ^n]

and the same thing in Python:


Edit: implemented binary squaring:


factorial(n) is an odd choice of name for the nth Fibonacci number. You should at the very least provide a doc string! :)

Thank you. The 'code' is bad enough, would never pass review ;)

It would be interesting to see what the asymtotic complexity would be in terms of number of (fixed-size) operations as opposed to number of bigint operations required.

I mean, the nth Fibonacci number requires O(n) bits to store.

It's O(M(n)) where M(n) is the complexity of the algorithm used for multiplying n-bit integers.

To see why, note that each step of the matrix powering or fast doubling algorithm 1) costs a constant C number of multiplications, and 2) roughly doubles the number of bits.

F(n) has b = O(n) bits, so the cost is essentially C M(b) + C M(b/2) + C M(b/4) + ... <= 2 C M(b) = O(M(n)), using the natural assumption that M(2x) >= 2M(x).

So then: is there any better solution? Or is that the optimal?

It's almost certainly optimal, but proving that is likely an open problem.

Maybe someone can help me understand a bit better. The author talks about how slow a recursive algorithm is, but the "fast doubling" implementation itself uses recursion. Would not the same slower effect be felt by this implementation due to its use of recursion?

The author is not saying "every algorithm which uses recursion is slow".

The author is saying that the naive recursive algorithm is incredibly slow. That would be an implementation like this (in Python):

    def fib(n):
        if n < 2:
            return n
            return fib(n-1) + fib(n-2)
And that implementation is slow. It's exponentially slow.

Agreed, this response is exactly correct. I didn't realize that my wording of "slow recursion" could be misread as "every recursion is slow", which is not true - especially because every iterative procedure can be expressed as a recursive procedure with the same time complexity.

As a side note, the Java and C# implementations of the "fast doubling" algorithm actually used a fixed number of loop iterations instead of the mathematical recursive formula.

every iterative procedure can be expressed as a recursive procedure.

To use Ableson's precise language:

   Every iterative procedure can be expressed as
   a recursive *definition.*

SICP is written to carefully avoid overloading terms: "Function" is reserved for mathematical functions and "closure" is reserved for algebraic closure.

Very Sussman-y. His distate for ambiguity is very coherent.

To use the language of Ableson and Sussman in Structure and Interpretation of Computer Programs [node 15]:

The fast doubling algorithm has a recursive definition and produces a logarithmic procedure.

The classic implementation based on the mathematical definition:

  def F(n):
    if n == 0: return 0
    elif n == 1: return 1
    else: return F(n-1)+F(n-2)
Has a recursive definition and produces a recursive procedure [even worse the recursive procedure runs in exponential time]. It is also possible to create a recursive definition of a procedure that runs in linear time.

[node 15]: https://mitpress.mit.edu/sicp/full-text/sicp/book/node15.htm...

In fact SICP covers the fast doubling fibonacci algorithm directly in exercise 1.19[1] (although in a slightly obfuscated form)

[1] http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-11.html...

Much appreciated, thank you.

Fun stuff.

But why only take n up to around 4 million? Not too long ago, I was able to compute the billionth Fibonacci number on a pretty ordinary machine, using a Haskell implementation of the fast doubling method. (The number took 20 minutes to print out in my console window.)

By the way, I did a Python implementation, too. And I learned that GHC large-integer multiplication is a lot faster than CPython large-integer multiplication. I wonder why. (No, it has nothing to do with dynamic typing; almost all of the time is going to be spent executing internal integer-multiplication routines written in C.)

One of my warm up questions is to ask to code Fib. It's easy enough to explain and leads to interesting questions about performance.

I'm still waiting for a candidate to break out Binet's formula.

Interesting. Still runs in O(log n). Link: http://mathworld.wolfram.com/BinetsFibonacciNumberFormula.ht...

you know its actually an O(logn) operation with fast exponentiation through squaring.....



Applications are open for YC Winter 2020

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact