Hacker News new | past | comments | ask | show | jobs | submit login
A linear algebra trick for computing Fibonacci numbers fast (codeconfessions.substack.com)
158 points by todsacerdoti 3 months ago | hide | past | favorite | 73 comments

I think the proof for the closed-form version is accessible to people with a background in linear algebra. The matrix is diagonalizable (not all matrixes are diagonalizable, but this one is):

  M = P Δ P^-1
Here, P is some invertible matrix and Δ is a diagonal matrix. The powers of M reveal how useful this is:

  M^N = (P Δ P^-1) * … * (P Δ P^-1)
If you adjust the parentheses, you’ll see N-1 terms of (P^-1 P), which can be removed, giving:

  M^N = P Δ^N P^-1
The powers of a diagonal matrix is done by taking powers of the entries on the diagonal.

You see the φ values (1±√5)/2 in the matrix Δ.

Diagonalization of a 2x2 matrix is simple to do on paper. The diagonal of Δ contains the eigenvalues of M, which can be found using the quadratic formula. Proving that this is a correct way to diagonalize any diagonalizable matrix is more of a chore, but for this specific 2x2 matrix, you can just show that that you’ve found the values for P and Δ.

This is very elegant mathematically, but I would not use the closed-form solution if I wanted the exact answer, because you’d need a lot of precision, and that’s inconvenient.

(Author of the article here) Thanks for writing this up, it's much much more intuitive than what I've read everywhere else. I primarily looked up Knuth (TAOCP Vol 1) for this part, and he showed a longish proof using generating functions which looked too long to show in the article, and I would not have done a better job than him.

The thirty three miniatures book showed a more abstract proof, which would have been accessible to people deeply familiar with the concept of basis vectors.

My own personal experience is that, in linear algebra, there are a lot of different ways to prove the same thing. When you search for a proof of something, you may find a proof that is overly basic and verbose, or a proof that is overly advanced and terse.

Finding an explanation of a linear algebra concept that strikes the right balance can be time-consuming or difficult. This is why you see, for example, a hojillion different articles explaining what quaternions are. Everybody is most comfortable at a different place on the complexity spectrum—from “a vector is (x,y,z), here are rules for how that works” to “a vector space is an abelian group V, field F, and a map in Hom(F,End(V))”.

I agree. In fact finding an explanation of pretty much any significant mathematical concept that strikes that balance (across the broad range of individual talents and backgrounds) is time-consuming or difficult.

Which is why a significant chunk of mathematics research effort and funding should go to making such explanations less time-consuming and difficult. IMHO. It may be as simple as a vetted compilation of explanations for a single idea so the individual can find the ones that work for them, or formal development of new explanations. What’s more significant: a new incremental result in some obscure corner of math, or 100M more people understanding what an eigenvalue actually means? Or a new explanation that makes a concept more relevant in an entire application area?

That's quite true. And so many books to teach it.

I am a software engineer and don't have a major in math beyond studying discrete math in college. So I really have to work hard to understand these proofs. And I certainly value proofs explained in simple terms. :)

The generating function proof is also really beautiful! I think I maybe like it even more than the linear algebra proof.

The generating function approach is the standard way of solving recurrences. In that sense, it isn't beautiful, just routine. But yes the first time I'd seen it I felt my body shudder at that sorcery.

If anyone wants to read this in book form, Linear Algebra Done Right includes it as an exercise at the end of a very short and readable chapter 5.C on eigenspaces and diagonal matricies.

The treatment in thirty-three miniatures describes the steps you take but doesn't mention what you are actually doing (finding eigenvalues) or leave you with any intuition for why this was a natural thing to have tried

Yes, that book (thirty three miniatures) has great content, but a hard read. Basically someone like me needs to go back, read other sources, and spend time on paper to get it.

Yes! This is close the explanation I first encountered in Hubbard & Hubbard. One of the nicest math textbooks and going from zero up to differential forms.


what if you use the closed-form answer, but without any floating-point or other approximations

i mean it all comes out to an integer in the end, all the radicals cancel

i see tromp already suggested a form of this

> what if you use the closed-form answer, but without any floating-point or other approximations

You can use a field extension here, ℚ(√5), to represent your numbers. The numbers would be represented as pairs, (a,b), which represents the number a+b√5. You can see for yourself that this set is closed under addition and multiplication.

But when you try to implement code that computes exponents with this representation, you’re probably going to fall back to the old “repeated squaring” trick. But that’s how you calculate exponents with your matrix representation anyway—and you can note that the matrix powers (for this matrix) always take the form

  | a b |
  | 0 a |
So you can represent these matrixes as (a,b) pairs anyway. We are right back where we started—our state is represented as a pair of numbers, and we use repeated squaring to calculate an exponent.

Sometimes these different ideas for representing the problem give us new insight or new algorithms, and sometimes it turns out to be a new perspective on the same piece of code.

But that seems to be computationally same amount of work as the matrix form, so we get similar performance?

Yes, I believe so, up to some multiplicative factor.

You can carry out the exponentiations in the field Q[sqrt(5)], which is two-dimentional over Q. The interesting thing here is that diagonalization is trading one 2d vector space for another -- one with a very well-behaved multiplication operation (it's distributive, commutative, associative, and invertible).

There's no need to do this to quickly compute fibonacci numbers, but it's pretty neat.

I did this years ago to demonstrate to my students that the "exact solution" can still be written in code. There are implementations in Ruby and Python, with some benchmarking code: https://github.com/jfarmer/fib-bench/

Code winds up looking like:

  def fib_phi(n)
    ((PhiRational(0,1)**n - PhiRational(1,-1)**n)/PhiRational(-1, 2)).a.to_i
The exponentiation operation uses basic exponentiation by squaring for performance: https://en.wikipedia.org/wiki/Exponentiation_by_squaring

(Technically implemented arithmetic over Q(ϕ) not Q(√5) because it simplifies the code a bit.)

Nice! Thank you for sharing.

hmm, maybe? i haven't tried it so i can't be sure, but it's plausible

The comparison with the closed form is weird to me: since it uses sqrt(5), I suppose the author intended to use floating point numbers to compute it. But when doing so:

- You get a constant time algorithm, since the power function is constant time on most mathematical librarie. Without entering into details on how the pow function is actually computed, on real number you can compute it with: pow(x,y) = exp(y * log(x)). It is not visible on the performance graph probably because the matrix-based algo and the closed formula are dwarfed by the linear time algo.

- You gen an incorrect result very quickly: not even considering the fact that sqrt(5) can not be represented exactly by a floating point number, the result will be out of the range of floating point numbers with n proportional to the number of bits your floating point number holds, so probably between n=50 and n=100.

Author here:

At the time of writing, I was not aware that the matrix form and the closed form are same, because you can derive the golden ratio based formula from the matrix formula. Full disclaimer: I am not a math major, and I can make such mistakes :)

Although, I am curious to learn about exponentiation of real numbers. While writing this article, I was investigating how different libraries/languages implement exponentiation. While for matrix exponentiation, numpy used the algorithm I showed in the article. I also looked at the pow() function in libm of FreeBSD and it used the form you provided. I've been trying to look at an explanation behind this. Please share a pointer if you have any references.

The main issue with floating point number is that it doesn't act like a ring (https://en.wikipedia.org/wiki/Ring_(mathematics)) because of rounding errors. For example the equality x*(y*z) = (x*y)*z doesn't hold because each multiplication introduce small rounding errors.

And this kind of equations are required for the exponentiation-by-squaring method (consider for example how x**5 is computed: x**5 = (((x*x)*x)*x)*x) using a naive exponentiation approach, but also x**5 = x*(x*x)*(x*x) = x*(x*x)**2 using the exponentiation-by-squaring method).

Integer operations and matrix operations are both rings, so the exponentiation-by-squaring approach works. For floating points, it doesn't give an exact result, but could still be used in practice to get an approximate result.

The pow function doesn't use this approach however, because both arguments of the pow function are floating points numbers, and can thus compute things like pow(5, 0.5) = sqrt(5), or even 534.2**12.7. The exponentiation-by-squaring approach works only when the exponent is an integer, so we need other solutions.

How exactly the pow function (or other math function like exp, log, cos, ...) are computed is a vast research subject in itself, but if you are interested, the Handbook of Floating-Point Arithmetic (https://perso.ens-lyon.fr/jean-michel.muller/Handbook.html) is both a good introduction to the domain (and the problems associated with the finite precision of floating point number), and provides a lot of techniques to work around or minimize the rounding errors that happens when manipulating floating point numbers.

Thank you so much, both for providing more background on exponentiation, and for the reference. :-)

You can also use exact arithmetic. This is not as bad as it seems, all values you'll encounter are of the form a + sqrt(5) b, with a and b rational (and not even all rationals, but I can't be bothered).

You can even make this somewhat more concrete by using the matrix representation* of a + sqrt(5) b:

    [[a, 5b],
     [b,  a]]
Of course this reduces the whole problem to a matrix exponential again, so it's somewhat pointless.

*: The matrix representation is what you get if you view each value a + sqrt(5)b as a vector, and notice that multiplication by a constant is a linear map. The matrices are these linear maps, and it can be shown that matrix addition and multiplication preserve the ring structure. You can forget about the vectors at this point.

That's true. Integers of this form a+b*sqrt(5) are called quadratic integers (https://en.wikipedia.org/wiki/Quadratic_integer), and define a ring for which you can define addition and multiplication rules:

(a+b*sqrt(5)) + (c+d*sqrt(5)) = (a+c) + (b+d)*sqrt(5)

(a+b*sqrt(5)) * (c+d*sqrt(5)) = (ac+5bd) + (ad+bc)*sqrt(5)

Note that seeing the operations this way reduce a bit the number of operations compared to the matrix representation you provided: a multiplication using the formula above require 5 integer multiplications instead of 8 with the matrix [[a, 5b], [b, a]].

But this is only a constant factor improvement: the algorithm complexity is still log(n).

And the same result can be achieved by using the special structure of the matrix: for your representation, the matrix are all toeplitz matrices, and multiplying those kind of matrices require less operations than full generic matrix multiplication. For the representation in the original article, If I remember correctly, the matrices are triangular, and again you can multiply them more efficiently using this specificity.

For the initiated, this is SICP[1] Exercise 1.19. That exercise walks you through a proof without explicitly using linear algebra. I remember having a blast solving this back in the day.

[1] https://web.mit.edu/6.001/6.037/sicp.pdf => page 61

I just read the exercise. That's very clever.

Makes me want to sit and go through the book.

I skimmed your substack, and it seems to me that you would like SICP. Give it a try.

Thank you. I will do it.

Interesting, TIL.

Thanks for sharing. :)

The only logarithmic algorithm here is fib_closed_form because floating-point multiplication is a fixed-cost operation (and yields a progressively imprecise result, as other commenters have pointed out).

The other approaches use arbitrary precision integers, as you've noted, so additions and multiplications are not actually fixed-cost, but scale with the number of digits you're operating on (addition scales linearly; multiplication in Python uses Karatsuba for sufficiently large N, so it scales like n^1.58).

In fact, you can't get a sublinear result with a lossless approach, since the number of digits in Fib(n) scales linearly with n, and so you need O(n) operations to just write out the digits.

This is pretty straightforward to explain with the closed-form solution: Fib(n) is roughly 1.618^n / 2.236; by taking the binary logarithm, you can see this requires about 0.7n bits to represent. (Equivalently, 0.2n decimal digits.)

> Another interesting thing to note here is the performance of fib_fast and fib_closed_form algorithms. Both of these algorithms essentially compute the nth power of an expression and therefore their complexities in terms of n are same. However, the golden ratio based algorithm is computing the nth power of a scalar value which can be done much faster than computing the nth power of a 2X2 matrix, thus the difference in their actual run-time performance.

Computing powers of the golden ratio should not be done as scalars, since as other posters noted that requires a lot of precision and makes for a messy computation. It's more simply done on two dimensional numbers of the form a + sqr(5)*b with a and b integers, analogous to complex numbers. Then the computational effort can be seen to equal that of the matrix powers.

To clarify, you mean treat sqrt(5) as a symbolic constant, and then expand out repeated multiplications of "a + sqrt(5)*b" to arrive at an expression with addition and multplications involving powers of a, b, and sqrt(5)?

This is correct. All fibonacci numbers are integers, thus all instances of sqrt(5) must eventually cancel out.

We essentially implemented this matrix version in Lean/mathlib to both compute the fibonacci number and generate an efficient proof for the calculation.


In practice this isn't very useful (the definition of Nat.fib unfolds quick enough and concrete large fibonacci numbers don't often appear in proofs) but still it shaves a bit of time off the calculation and the proof verification.

The closed form solution as implemented will use floats with some fixed number of bits, right? So it cannot possible compute the numbers precisely except in a finite number of initial cases.

Computing the matrix power by squaring means the sizes of the integers are small until the final step, so that final step dominates the run time.

All of this gets much easier if you notice that the smaller eigenvalue is less than one. So its contribution becomes exponentially smaller at higher F_i. Hence, you can just take the corresponding power of phi, multiply by some constant I forgot, and round to the nearest integer. No need for an exact formula.

I mean if you want to have the value as a float I reckon the closed form will suit you just fine.

You can try to do it with arbitrary precision integers, but obviously the runtime can't be faster than the size of the answer, which technically makes it linear again.

It can be quite fast though. I especially like Julia for this. 1) because you can just tell it to use arbitrary precision ints and 2) because you can write a power function that works for everything, floats, ints, matrices with arbitrary precision integers. Which gives one of my favourite one-liners:

    fib(n) = ((BigInt [1 1; 1 0])^n)[2,1]
Last time I checked a manual implementation of exponentiation through squaring is just as fast as the native one. If I recall correctly the 10^9th fibonacci number is a few hundred megabytes, beyond that points things get a bit tricky.

> You can try to do it with arbitrary precision integers, but obviously the runtime can't be faster than the size of the answer, which technically makes it linear again.

The time will be dominated by the time to multiply integers in the final stage of the repeated squaring, which is superlinear in the number of bits. (1)

(1) for fixed word size. One might instead argue it's a function of the number of words (# of bits/word size) and that the word size must eventually increase as integers get larger, asymptotically.

> If I recall correctly the 10^9th fibonacci number is a few hundred megabytes,

It’s easy to prove that

  fib(n) < 2^n
(If you need a hint: use induction), so it should be less than 10⁹ bits, or 128 megabytes.

And that bound isn’t tight. A tighter one is 2log(phi) bits per step. That’s slightly less than 0.7, so that would make it less than 90 megabytes.

I may have remembered its size as an ASCII string.

I think you're right, and the article says "in some rare cases the method may produce incorrect result due to approximation errors".

If by "rare cases" one means "for all but a small number of cases".

What's "small" in this case? I only have 32GB of RAM, which is only enough for a vanishingly small set of Fibonacci numbers.

79. There are only 79 Fibonacci numbers (starting at 0, counting the 1 twice) that are exactly represented by a double-precision float. That's a lot less than 32GB of RAM.

Yup, I implemented all these, and only naive Python array based approach is the most practical, Python being a GC language.

For all other things that I tried, I either get integer overflows (for linear algebra based implementations) or floating point errors (for the closed form).

It's easier to treat these as linear difference equations with constant coefficients and use the ansatz of λ^{n}. This is morally the same thing but without the need for matrix manipulation.

Knuth, Oren and Patashnik's Concrete Mathematics is full of these and is required reading for discrete math.

I came here to say the same: difference equations, z-transform, recurrence relations, are (or used to be) standard undergrad eng curriculum (typically via some required Signals & Systems course). Maybe not so popular in the CS world though?

Beautiful expo on the subject: https://eee.guc.edu.eg/Courses/Communications/COMM401%20Sign... (Chapter 10, etc.)

Note that “Oren and Patashnik” is one person named Oren Patashnik.

Thanks! The correct authors are Graham, Knuth, and Patashnik.

Reminds me of similar post with some nice visualisations



That's a work of art. Amazing

Well, I am also in the reading group, and I was very happy to see this pop up here.

But, all these method, in practicality, leads to integer overflows.

Works very well for smaller Fibonacci numbers, but not for, say, the 1500th Fibonacci number.

That is very easy and practical to do instead with the naive formula using an array based approach.

    fibs = [0, 1]
    for i in range(2,n):
        fibs.append(fibs[-2] + fibs[-1])
That's the core.

I have been meaning to write it up, but dealing with a bad flu.

The closed form leads to approximation errors.

I tried Julia, Numpy, Numba, Jax, Rust, but integer overflows are there in any parallelized approach.

The naive implementation with LRU caching is the fastest and most reliable so far.

I am surely missing something, you can actually use some things to overcome integer overflows, right?

I hoped to see those in this article, but they were simply not there.

Can anyone else help?

Even UInt128 is not enough for 1500th Fibonacci number. The upside of Rust, is that it doesn't fail silently. The compiler is really helpful.

Jax, Julia, Numpy confidently give wrong answers. Only way to get the feel is to eyeball a bunch of entries and check for negative numbers. Very easy to spot overflows when you are at, say, 1600th fibonacci, but one can feel confident on wrong answers on smaller numbers.

Python has arbitrary precision integers, so you could do it without running into overflows. Although if implementing floating point based technique, you may want to use np.float128 to handle values upto 10^4932.

> you may want to use np.float128 to handle values upto 10^4932.

I'm not sure the maximum value matters, since I think it only actually gives you 33 decimal digits precision and the 1500th number has over 300 digits.

Although it looks like you'd need to be more careful with it too:


> np.float96 and np.float128 are provided for users who want specific padding. In spite of the names, np.float96 and np.float128 provide only as much precision as np.longdouble, that is, 80 bits on most x86 machines and 64 bits in standard Windows builds.

My bad. I should have looked into the docs. Thanks for pointing out!

There are "bignum" implementations for every language. Though I never tested the performance impact on closed-form Fibonacci when a defined double precision >64bit is used.


Your code has an accidentally quadratic runtime (instead of linear). Since the array is appended to, the code regularly increases the memory region and has to move all the previous data over.

You could pre-allocate the memory as n is known ahead. Also you don't need all that memory. You only need to store the last two entries.

Calculation of such large numbers are usually handled by BigInt libraries that are more or less only limited by memory. In Rust that would be the uint crate.

So I implemented this in Python and did not realize that it could so easily handle very large outputs! Very cool Python. I did have to add the lines listed below to be able to print really big results. It was when setting the input to 1000000 that I could really notice the difference between the elementary algorithm and the matrix mulitplication based version. I thought I would try numPy too but yes, it can only go up to around 92 as an input due to the 64 bit integer limit. I also tried a version in CuPy just for fun. It could use up to 64 bits as well (floats). Interesting that regular Python takes a win here.

import sys


Please do look at the results of taking that matrix to different powers: https://www.wolframalpha.com/input?i=%5B%5B1%2C+1%5D%2C+%5B1...

What is this matrix doing? When you left multiply it, you're summing the elements of the left column and putting them in the top left (next fib number). Then you're moving the current number to bottom left (and top right). Incidentally, you move the previous to bottom right.

That's interesting!! It's very simple

Yes, indeed. If you see the thirty three miniatures book, it gives the matrix exponentiation formula for computing Fibonacci numbers, without any explanation behind it. To write the article, I tried to figure out ways it can be explained. And this pattern you suggested shows up nicely even when you express successive Fibonacci numbers in terms of the base cases F0 and F1. Which ultimately lead me to realize, the coefficients are the elements of the matrix.

I really like this analysis, which also discusses the “fast doubling” method: https://extratricky.com/blog/fibonacci-complexity

it points out that the bit complexity of the traditional algorithm is actually quadratic. and apparently, restricted to fixed width integers, there is a constant time method.

There's less to the "constant time method" thing than meets the eye.

If you work modulo M for some fixed M, then the Fibonacci sequence is periodic with period at most M^2 (because if two consecutive numbers are known, so is the whole rest of the sequence), which is a constant. Therefore, for any particular M, you can just write down the at-most-M^2 repeating sequence and what the actual period is (call that p), and then for any n the algorithm goes: reduce n mod p, and then look up the result in your big lookup table.

(Actually, this isn't constant-time, because if n is large then reducing it mod p isn't constant-time. For that matter, no algorithm can possibly be constant-time, because unless you pick a modulus for which the period is a power of 2 you always need to look at all the bits of n.)

Very cool. Thanks for sharing.

Nit: if number n is given in binary representation in the input, then an O(n)-algorithm runs in exponential time as n is an exponential function of log(n). Hence, log(n)^O(1)-time algorithms for the Fibonacci number are reasonable to exist, unless the decision version of the Fibonacci number is NP-hard.

This is also a nice way to show that the growth of fib numbers is exponentially bounded.

Any solution to a linear recurrence relation is exponentially bounded by the root(s) of the characteristic polynomial.

Yes, that's a good point.

Why is there no mention of using a generating function approach to calculate the analytical function?

Knuth shows that technique in TAOCP, vol-1. It was on the longer side and I didn't want to reproduce it in the interest of space and time. I was more focused on the matrix form.

Why is there no mention of a generating function approach to derive the analytical form?

Hm, no mention of Binet or Bernoulli?

loved this article thanks !

do folks actually read knuth ? i thought his tomes were meant only for bookshelves -lol- -sarcasm-

Thank you (author here).

I don't read TAOCP. I have read first few chapters of volume-1 so I am familiar with the notation and some MIX syntax.

But while writing this article, I just opened up specific topics, such as computing Fibonacci numbers in volume-1, or evaluating powers in volume-2 and I managed to understand it. Sometimes you may find back references, such as when discussing Fibonacci numbers, he references Euclid's algorithm and you may have to go back and check it out, or ignore it (depends on the context). So I believe you don't necessarily need to read from cover to the end; you can browse it based on your interest.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact