Hacker News new | comments | show | ask | jobs | submit login
Don’t invert that matrix (2010) (johndcook.com)
133 points by egjerlow 380 days ago | hide | past | web | 42 comments | favorite

That article is quite right regarding the take-home message to try to avoid inverting a matrix when other, more direct methods for solving the particular problem at hand exist.

On page 2 of http://www.ti3.tuhh.de/paper/rump/Ru08a.pdf there is an innocent-looking matrix for which the numerically computed inverse is off from the true inverse by 47 orders of magnitude.

However, one of the reasons he's given is not correct: Druinsky and Toledo have shown (http://arxiv.org/abs/1201.6035) that -- despite the very widespread belief to the contrary -- solving a linear system be calculating the inverse can be as accurate (though not nearly as efficient) as solving it directly.

They even speculate as to how this myth was able to sustain it for so long. Namely, there is a certain error bound for the solution as calculated by employing the inverse which is quite simple to deduce and which predicts a pretty large error. However, this bound is too conservative. There is a better bound, which ensures that the error is of the same kind as for the direct method. However, this bound is much harder to derive.

I came here to cite the same paper, but your comment about efficiency is only true in a limited context. Applying an explicitly-computed inverse exposes more parallelism than solving with factors. State of the art parallel linear algebra packages like Elemental (http://libelemental.org) use selective inversion of diagonal blocks to significantly improve their strong scalability.

Thank you for sharing that insight! Didn't know that.

I get that inverting a matrix can be computationally intensive, but what's the alternative?

"What if you have to solve Ax = b for a lot of different b‘s? Surely then it’s worthwhile to find A-1. No. The first time you solve Ax = b, you factor A and save that factorization. Then when you solve for the next b, the answer comes much faster."

I don't get it, although I'm not a numerics expert so maybe I'm missing something. He says, "the first time you solve Ax = b, you factor A and save that factorization."

Great. But don't you then use the factorization to determine the inverse of A (LAPACK example: call dgetrf to get the factorization, then call dgetri to compute the inverse using the factorization)? Maybe what he's talking about is going over my head. How else would you solve Ax = b for x, if not by computing the inverse of A (assuming A is a 3x3 or 4x4)?

"How else would you solve Ax = b for x, if not by computing the inverse of A (assuming A is a 3x3 or 4x4)?"

For instance, there a fast iterative solvers. They start with an initial guess for the solution (which can be an arbitrary vector) and successively generate more accurate approximations.

These iterative solvers don't need access to the inverse of A. Furthermore, many of those solvers don't even need access to A: They only need access to a subroutine which, given a vector u, returns Au.

Many matrices arising in applications (for instance one often reduces the problem of solving partial differential equations to the problem of solving a (very huge) linear system of equations) are sparse (have only few non-zero entries). For these it can make sense to write specialized code which specifically calculates Au.

If you want to read more about this, then a good starting point is https://en.wikipedia.org/wiki/Kaczmarz_method.

You calculate the factorization (QR, LU, or similar), but you never calculate the inverse of the factorization:

    A x = b
    A => QR  # QR decomposition
    Q R x = b
    R x = Q' b  # Q is inverted by transposing
    x = backsubstitute(R, Q' b)
Q was the transpose of it's inverse, and so you don't get any rounding error when you transpose it. R didn't need to be inverted because it was in a form where back substitution lets you apply the inverse of R to be without needing the inverse of R.

MATLAB's linsolve uses your strategy! (http://in.mathworks.com/help/matlab/ref/linsolve.html)

Linear models in R are also solved by QR factorization.

For example, you factorize A into LU, so that you have LUx = b.

Now you can let Ux = y, solve the system Ly = b for y, then solve Ux = y for x.

Solving each triangular system is fast: O(n^2).

If you need to do this with a different b, you can do it again, and you already have A factorized, which was the expensive part, i.e. O(n^3).

(Depending on the type of problem, you might want to factorize the original matrix in a different way, e.g. QR or Cholesky or whatever.)

IIRC, the HP-15C used LU decomposition in its matrix math routines.

Sure, assuming that A is full-rank and small, then its perfectly reasonable to invert A with adjoint(A)/discriminant(A). But this is very expensive, and sensitive to roundoff for larger A.

Lets say you are using QR factorization. Then R\b is easy to compute by back-substitution, because R is triangular. Q\b is also easy, because Q is orthogonal, and Q's inverse is its transpose, so you just calculate transpose(Q)*b.

Same thing for LDL^T factorization, LU factorization, etc. Its easy because back-substitution with a triangular matrix is easy. In the case of LDL^T factorization, D is diagonal, and again D\b is easy (its a bunch of rank-1 division operations).

>Great. But don't you then use the factorization to determine the inverse of A (LAPACK example: call dgetrf to get the factorization, then call dgetri to compute the inverse using the factorization)?


>How else would you solve Ax = b for x?

Call dgetrs instead of dgetri.

You use the factorisation directly (not the inverse) to get x from b. This is done by calling dgetrf to get the factorisation, then calling dgetrs to get the solution. You don't need to call dgetri.

I'd do an example, but there are other replies to your comment that go through the mathematics.

In PDEs, we typically use something called a Krylov subspace iterative solver.

These solvers construct a solution by adding up terms of the form c_n * A^n * y and truncating the series when the residuals are small enough. This is especially efficient in PDEs, where problem matrices are sparse so matrix-vector products are efficient (i.e. O(n)), and in practice only a few terms of the series are necessary.

I think that the idea is not that you don't take steps that, taken together, amount to inverting `A`, but rather that you don't just try to invert `A`. Thus, I think that the idea is to do something like "Compute `A = LU`, and then invert `L` and `U`"—which of course can be used to invert `A`, but which isn't exactly the same as "Compute `A^{-1}`" (using Cramer's rule, or some other absurdity).

Incidentally, while solving `Ax = b` in general is definitely equivalent to inverting `A` (simply by taking `b` to be basis vectors, so that the solutions `x` are the columns of the inverse matrix in that basis), it is certainly possible to solve it in particular cases without inverting `A`; for example, the solution to `Ax = b`, where `b` is the first column of `A` in a particular basis, is the first vector `x` of that basis—no inversion required (nor even computation of `b`)!

In LAPACK you would use dgetrs to solve for x directly with the LU factorization.


Numerical linear algebra has a long and vast history. You really do need specialized information that your undergrad linear algebra classes would not have imparted.

For instance, the condition number is important: https://en.wikipedia.org/wiki/Condition_number

You would solve e.g.

min_x* ( ||A x* - b|| (+\lamba regularization(x*) )

Use a Pseudoinverse or a decomposition instead. A great book to read is Matrix Computations by Golub/van Loan.

For those looking for more background on the topic, these are the fundamentals: http://stanford.edu/class/ee364a/lectures/num-lin-alg.pdf

I also strongly recommend the course Convex Optimization. Might help some people avoid neural nets when they don't need 'em ;)

In R there is a little syntactical nudge in this direction in that the matrix inversion function is actually called `solve`. `solve(A, B)` gives you the solution to AX = B, and B's default argument is the identity, so `solve(A)` just gives you the inverse, and any instances of `solve(A) %*% B` in your code should stand out as red flags. As rcxdude mentions this is something that you tend to care more about in statistical applications, and that's what R is for.

Is there more context around what exactly he's getting at? In games we use matrix inverses all the time(and since they're usually a specialized affine transform there's cheap shortcuts to calculate them) for stiff like hit detection or animation.

More talking about applications where the matrices are large, like statistics, machine learning (but I repeat myself), signal processing, etc. In games pretty much all matrices are around 4x4, instead of millions by millions.

There's an neat technique for fitting RBF representations to dense point clouds. Instead of the impractical matrix inversion, the trick is to solve the linear system with a fast multipole method, similar to the Barnes-Hut algorithm for the n-body problem.

Carr, J.C. et al, "Reconstruction and Representation of 3D Objects with Radial Basis Functions", Siggraph 2001


Another relevant difference is that precision is not nearly as critical in games as in most numerical applications. Errors of a few percent will likely go unnoticed. Graphics implementations therefore usually only use single precision.

Ah, that's the context I was missing, thanks!

In games we can usually just transpose the inner 3x3 R+S, invert scale and handle translation similarly to get a semi-quick inverse.

This is the difference between matrices that are SO(3) (which have trivial inverses) and matrices that are just GL(n) (which: don't).

This. Or you know how to calculate the closed-form inverse a priori like with projection matrices (e.g. gluUnProject).

In games you're probably applying many small inverse matrices to many small vectors. In partial differential equations, you solve a single large linear system where the unknown vector represents a field defined at every point on a spatial mesh.

In my research, specifically, that spatial mesh can have as many as 10^7 grid points. For linear systems of this size, any exact solution at all is impractical - we exclusively employ iterative approximations built from terms of the form (A^n) b since matrix-vector products are about the only things we can compute in a reasonable amount of time.

And not only are the matrices small, they're probably known (or assumed) to be well-conditioned.

Inverting even a small hilbert matrix will probably go boom.

Not just well conditioned, but special orthogonal.

In general there are no shortcuts, and when the matrix A is not small (e.g. m x n with m,n << 100), you don't want to invert A, especially if you are only interested in x (from A x = b). Instead, use numerical minimization schemes like conjugated gradients and variants thereof or Quasi-Newton methods (BFGS). Combined with preconditioning as well as regularization or denoising this usually yields good results. Compressive sampling methods (total variation regularization) have received lots of attention in the last ten years.

This comment seems to be a diversion from the question asked. The main point of OP is that solving A x = b is commonly done with factorizations of A, not by explicit computation of inv(A).

Resorting to a numerical minimization (like MINRES) would be unusual unless the dimension is much bigger than hundreds. Certainly it's not relevant in a graphics computation with dimension of 3, 4, or 6. My laptop inverts a 4Kx4K general matrix in 1.5s. It solves a 4K linear system in 0.5s.

It's funny because the whole discussion only exists because there is no division operator in "matrix math". For scalars we can simply write a / b (instead of a^-1 * b) and everyone knows how to calculate it. But for some reason there's no such thing when dealing with matrices.

Well it turns out there is something like division for matrices and it's called factorization (followed by back-substitution) and there are actually many ways to do it, not just one (each having different numerical properties and complexity).

So whenever I see A^-1 * b in an equation I view it as something like b / A and I know there's the choice of "division methods" to use for this.

It's a little more complicated than: """It's funny because the whole discussion only exists because there is no division operator in "matrix math"."""

Matrices in general (insert caveats on size here) form a ring, but not a division ring (https://en.wikipedia.org/wiki/Division_ring). This means that not all (square) matrices will have a multiplicative inverse. This is why we often restrict to things like GL(n), the group of non-singular square matrices of size n.

Furthermore when you say "scalars", they have multiplicative inverses because they're elements of a field (and sometimes the inverse might be a pain like if you're in a number field or something).

This isn't even getting into the Moore-Penrose pseudoinverse (https://en.wikipedia.org/wiki/Moore–Penrose_pseudoinverse) or linear operators or so on, but I'll let someone else shout about algebra.

I think you meant "b / a" in "a / b (instead of a^-1 * b)".

By b / A he is implying a horizontal fraction bar with b above and A below. Of course, since matrix multiplication is not commutative, it’s probably better to write something like /A * b (or Matlab uses the cute notation A \ b) instead of b / A, but the intention is fairly clear from context.

(In general, most of our mathematical notation conventions are designed for the real number system, and start falling over when we have “number-like objects” which behave a bit differently.)

Sparse matrix factorization is a black art and is best done with some existing out-of-the-box solution.

The problem is that they're all these FORTRAN packages that one person wrote, and then everybody, in the name of human progress, reused and reused ever since -- akin to bootstrapping human intelligence.

Nobody ever wants to write the Lanczos bidiagonalization algorithm with partial reorthogonalization from scratch, and when I considered writing my own C++ implementation to bring human progress to the 21st century, my academic advisor told me "just don't."

EDIT: The real reason is that FORTRAN is less apt to copy and more apt to reference than C++.

I am not a mathematician, but simply based on the fact that beautiful/simpler answers to questions seem to get closer to the ultimate truth, the current state of large-dimension matrix inversion is far from the ideal possible solution, because it is ugly as hell.

Arguably LU / LDLT factorizations before solving the direct system are the more elegant / beautiful / simpler answers if you take the algorithm at face value. However, large-dimension matrix inversion has problems outside of just beauty. Part of the problem is that we don't have fast, perfect representations of real or complex numbers. At best we have double precision floating points, which means that when you take the adjoint over the determinant, you're looking at a lot of error.

That's not to say that the implementations of these have to be ugly or that our current BLAS / LAPACK / whatever can't be improved. There's a lot of room for improvement in this space, but the fact that they're 1. fast and 2. well tested and robust means that improving these sorts of interfaces come with baggage. People complain about the wheel being reinvented every time somebody invents a new Javascript framework, but could you imagine if this same problem applied to BLAS and other high performance math libraries? (Arguably, at least then we'd have more choices in math space)

> Part of the problem is that we don't have fast, perfect representations of real or complex numbers.

We'll probably never have perfect representation of real numbers, but Gustafson's unums seem to be the next best thing that can actually make a difference. His examples of how unums can solve the trickiest numerical challenges are impressive.



I avoid floating point wherever possible (especially after reading "What Every Programmer Needs To Know About Floating Point") and just use large integers. Not sure if the particular math in question here can be modified to use large integers...

It's not the current state; it is inherent in the problem, so the situation will never get better.

Applied math is like a writing a program for customers; the result is more important than the beauty of the road towards it.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact