What Is a Hessenberg Matrix? (nhigham.com)
39 points by jjgreen 5 months ago | 5 comments



Not to be confused with the Hessian Matrix.


This is neat, but why would I want this?


In general, if you convert a matrix to Hessenberg form, a lot of what you might be trying to do with it becomes cheaper to compute. The QR algorithm is a common example of this.
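For a tiny illustration of that point (just a sketch, using the built-in hess function in MATLAB/Octave): reducing A to Hessenberg form first means each QR step stays within Hessenberg matrices, which is what keeps the iterations cheap.

  A  = randn(6);
  H  = hess(A);          % upper Hessenberg, similar to A (same eigenvalues)
  [Q,R] = qr(H);         % QR factorization of a Hessenberg matrix
  H1 = R*Q;              % one unshifted QR-algorithm step
  norm(tril(H1,-2))      % H1 is still Hessenberg: this is ~0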

If you mean "why would I want to put so much effort into simplifying linear algebra computations", then you're missing out on the chance to work with some of the most useful stuff in comp sci (in my opinion)!


In a vacuum, you shouldn't care. Broadly, Hessenberg matrices arise in a variety of contexts, so understanding their properties helps the algorithms built around them proceed more effectively.

For example, upper Hessenberg matrices arise during an Arnoldi iteration, which is itself a key building block in Krylov methods, which are highly effective sparse linear system solvers, which are necessary for solving certain kinds of mechanics problems needed for engineering. That is to say, it's important, but somewhat removed from the end application. They arise in this context thanks to a clever trick.

Say, for example, we want to solve the linear system Ax=b. Take the right-hand side, b, and label it as our first vector, v. Save its norm in a new matrix T, T(1,1) = norm(v), and then normalize v and save it, V(:,1) = v/norm(v). Next, we add a new vector to this batch. Let v = A*V(:,1). Orthogonalize v against the previous vectors in V using any of a variety of algorithms, such as Gram-Schmidt or Householder reflectors. Save the orthogonalization coefficients in the second column of T; hence, T(1,2) = inner(v,V(:,1)). Then save the norm of this orthogonalized vector, T(2,2) = norm(v), as well as the normalized vector itself, V(:,2) = v/norm(v). Here is code in MATLAB/Octave that does this simply:

  randn('seed',1);             % fix the random seed for reproducibility
  m = 5;
  A = randn(m);                % the matrix from Ax = b
  b = randn(m,1);              % the right-hand side
  V = zeros(m,0);              % orthonormal basis, grown one column at a time
  T = zeros(1,0);              % orthogonalization coefficients
  for i = 1:m
      if i == 1
          v = b;               % first vector: the right-hand side itself
      else
          v = A*V(:,end);      % later vectors: A times the newest basis vector
      end
      for j = 1:i-1
          T(j,i) = V(:,j)'*v;        % coefficient against an earlier vector
          v = v - T(j,i)*V(:,j);     % classical Gram-Schmidt step
      end
      T(i,i) = norm(v);        % length of whatever is left
      V(:,i) = v/T(i,i);       % store the normalized vector
  end
  H = T(:,2:end);              % drop the first column: upper Hessenberg
If we cut off the first column of T, we obtain an upper Hessenberg matrix H. This matrix has the property that A*V(:,1:end-1) = V*H. This is the Arnoldi iteration.
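As a quick sanity check (a sketch, assuming A, V, and H are still in the workspace from the snippet above), you can verify both the Arnoldi relation and the Hessenberg structure numerically:

  norm(A*V(:,1:end-1) - V*H)   % Arnoldi relation: should be ~0
  norm(V'*V - eye(m))          % columns of V are orthonormal: ~0
  norm(tril(H,-2))             % nothing below the first subdiagonal: ~0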

Why we care in this context goes back to the original linear system we spoke of, Ax=b. Say we want to find a least squares solution, min norm(Ax-b). If we look for x within the subspace spanned by the columns of V, x = Vy, then we can rewrite this as min norm(AVy-b). If we generate V using the Arnoldi iteration, we have min norm(VHy-b). And, note, most of the time we don't need a V that spans the whole space (m vectors for an m-dimensional space); a smaller number will do, which means V has a small number of columns.

Since the first column of V is b/norm(b), we have min norm(V(Hy - norm(b)e1)), where e1 is the vector with 1 in the first element and zeros underneath. Since V is orthonormal, it does not affect the norm, so we have min norm(Hy - norm(b)e1). Since H is upper Hessenberg, we can transform it into an upper triangular matrix using Givens rotations, which are alluded to in the linked article; this gives us, essentially, a QR factorization of H, cheaply. That leaves min norm(QRy - norm(b)e1). Since Q is orthonormal, it doesn't affect the norm either, so we have min norm(Ry - norm(b)Q'e1). Since R is upper triangular, solving this linear system is cheap.

And here we have an incomplete description of the algorithm GMRES. We go through this trouble because in many important contexts it is vastly cheaper than computing a factorization of the original matrix, like LU, or performing Gaussian elimination. This is one of many places where upper Hessenberg matrices are found; hence, the article above.
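To make that concrete, here is a minimal sketch of the last few steps in MATLAB/Octave, reusing the A, b, V, and H built above. For brevity it solves the small least-squares problem with a single call to qr rather than the running Givens-rotation update a real GMRES implementation would use, and with only m-1 Arnoldi steps the residual need not be small:

  e1 = zeros(m,1); e1(1) = 1;      % first coordinate vector
  [Q,R] = qr(H,0);                 % thin QR of the m x (m-1) Hessenberg H
  y = R \ (Q'*(norm(b)*e1));       % minimizes norm(H*y - norm(b)*e1)
  x = V(:,1:end-1)*y;              % pull the solution back from the basis
  norm(A*x - b)                    % residual after m-1 steps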

By the way, if anyone enjoys this kind of trickery, these are the kinds of algorithms studied in more traditional applied mathematics degrees, as opposed to pure math or statistics. It doesn't mean that they're not taught elsewhere, but it's kind of their bread and butter. A better description of the above algorithm for GMRES can be found in Saad's book, "Iterative Methods for Sparse Linear Systems", which is free to download.


Please enjoy this upvote; it's nice to see the full forest of applications after struggling with something like LU decomps by hand.




