There's also the Moore-Penrose pseudoinverse, which also has plenty of applications and works on non-square matrices.
This leads to the question: for a non-symmetric square diagonalizable matrix, what is the difference between the eigenvalues and the singular values? A good heuristic for applications is that the eigenvalues capture the asymptotic behavior of repeating the map, i.e. the behavior of A^k as k goes to infinity, while the singular values capture the transient behavior of applying the map once.
For example, consider a highly non-normal matrix like the one in the sketch below.
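A minimal numpy sketch of that heuristic; the matrix here is just an illustrative non-normal example:

    import numpy as np

    # Both eigenvalues are 0.5, so A^k -> 0 as k -> infinity,
    # but the largest singular value is ~100, so a single application
    # of A can stretch some vectors enormously before the decay kicks in.
    A = np.array([[0.5, 100.0],
                  [0.0,   0.5]])

    print(np.linalg.eigvals(A))                    # [0.5 0.5]
    print(np.linalg.svd(A, compute_uv=False))      # approx [100.0, 0.0025]

    v = np.array([0.0, 1.0])
    for k in (1, 2, 10, 50):                       # transient growth, then decay
        print(k, np.linalg.norm(np.linalg.matrix_power(A, k) @ v))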
A good book on these topics is "Spectra and Pseudospectra" by Trefethen and Embree.
However, they're still an extension in the sense that the singular values do correspond to the (square roots of the) eigenvalues of the square matrices A A^T and A^T A.
I find it fascinating that A = UΣV^T works on ANY matrix A, with U and V made of orthonormal vectors (the left and right singular vectors of A)... it's like saying that all matrices are scalings and projections when viewed with appropriate bases for the input and output spaces (V and U).
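A quick numpy check of that on an arbitrary non-square matrix (the shape here is arbitrary):

    import numpy as np

    A = np.random.randn(4, 7)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    print(np.allclose(U.T @ U, np.eye(U.shape[1])))      # True: U has orthonormal columns
    print(np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0])))   # True: rows of V^T are orthonormal
    print(np.allclose(U @ np.diag(s) @ Vt, A))           # True: A = U Σ V^T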
> we can find the left and right singular values
Interestingly, the (nonzero subsets of the) left and right singular values happen to be the same... I don't have a useful intuition about how to explain this. Does anyone know why A A^T and A^T A have the same eigenvalues?
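A quick numpy check, with the usual argument left as a comment:

    import numpy as np

    # If (A^T A) v = λ v with λ != 0, multiply both sides by A:
    # (A A^T)(A v) = λ (A v), and A v != 0, so λ is also an eigenvalue of A A^T.
    # The same works the other way, so the nonzero spectra coincide;
    # the two products differ only in how many extra zero eigenvalues they carry.
    A = np.random.randn(3, 5)
    ev_small = np.linalg.eigvalsh(A @ A.T)      # 3 eigenvalues
    ev_big   = np.linalg.eigvalsh(A.T @ A)      # 5 eigenvalues (2 of them ~0)
    print(np.allclose(ev_small, ev_big[-3:]))   # True (eigvalsh returns them sorted)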
It makes sense to talk about the eigenvalues of an operator on any vector space. But in order for singular values to make sense the vector space needs to have an inner product.
So eigenvalues are more general because they make sense even when you haven't chosen an inner product on your vector space.
From an HN perspective this kind of consideration is relevant when working on a data set in which there are some arbitrary units. You don't want your model predicting house prices to work differently depending on whether you choose to measure floor size in m^2 or ft^2.
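To make that concrete, a small sketch (the column meanings and scale factor are made up) showing that SVD-based quantities change when one feature is re-expressed in different units:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))      # pretend columns: floor size in m^2, room count

    X_ft = X.copy()
    X_ft[:, 0] *= 10.7639              # same floor sizes, now expressed in ft^2

    # Different singular values (and directions), so anything built on them
    # silently depends on the arbitrary unit choice unless you normalize.
    print(np.linalg.svd(X,    compute_uv=False))
    print(np.linalg.svd(X_ft, compute_uv=False))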
(1) A rotation in the source space, (2) an axis-aligned non-uniform scaling (if necessary adding/removing dimensions to match the destination space), and (3) another rotation in the destination space.
Rotation is only a meaningful concept when you have a metrical structure (think of the full geometry of Euclid’s Elements including circles, perpendicularity, lengths, angles), so if you don’t have a metrical structure the SVD is likewise not really meaningful.
Often spaces we deal with using linear algebra only have an affine structure (parallelism is well defined, and lengths can be compared when they are along parallel lines), but do not have a metrical structure (so there is no meaningful way to compare lengths pointed in different directions or measure angles). For example: there is no meaningful concept of the angle between the directions of 5 miles/gallon and 10 miles/gallon, and if you arbitrarily defined one, it would change when you switched from gallons to milliliters or from miles to meters.
In practice people still often arbitrarily impose a metrical structure, sometimes based on some heuristic analysis of the data involved, and then use tools like the SVD based on that.
But the definition of A^T or A^* depends on a particular choice of basis, and is not a coordinate-invariant concept. If one has an inner product <,>, the adjoint can be defined in a coordinate-independent way using that inner product: A^* is the operator defined by
<Au,v> = <u,A^*v>
for all vectors u and v. (One can check that A^* is uniquely defined.) For a different choice of inner product, one gets a different A^*.
Note that the operator A^TA comes up naturally when one considers what the action of A does to the length of vectors:
|u|^2 = <u,u>
|Au|^2 = <Au,Au> = <u,A^*Au>
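A quick numeric sanity check of that identity, assuming the standard dot product (so A^* is just A^T):

    import numpy as np

    A = np.random.randn(3, 3)
    u = np.random.randn(3)

    lhs = np.dot(A @ u, A @ u)       # |Au|^2
    rhs = np.dot(u, A.T @ A @ u)     # <u, A^T A u>
    print(np.isclose(lhs, rhs))      # True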
Geometrically, the SVD tells us how a linear transformation dilates or contracts space in different directions, e.g., think about what a linear transformation does to a sphere. But all these concepts -- length of vectors, spheres, ellipsoids (images of spheres under linear transformations) -- depend on a choice of inner product.
The norm is defined in terms of the inner product.
On the other hand, eigenvalues are defined using only the transform and scalar multiplication, which does not require an inner product.
of which the "Low-rank matrix approximation" is the most important one (it's like looking inside the matrix, seeing its significant components, and zeroing out the remaining ones to save space). See also PCA in statistics.
2. 1976 video about SVD https://www.youtube.com/watch?v=R9UoFyqJca8
that shows a visualization of an algorithm for computing it.
3. Good two-part blog post series
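For the low-rank approximation mentioned above, a minimal numpy sketch (matrix size and rank are arbitrary):

    import numpy as np

    A = np.random.randn(50, 40)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 5                                           # keep only the top-k components
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]     # best rank-k approximation (Eckart-Young)

    # Storage drops from 50*40 numbers to k*(50 + 40 + 1).
    print(np.linalg.norm(A - A_k) / np.linalg.norm(A))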
It's such an important operation that I'd say understanding it changes how you understand a lot of linear algebra. That Kun post is also good.
You can use it to search for words and find related texts even though those texts do not contain the actual words you searched for. Or you can use it to find similar texts, even though important words may differ.
Not sure how relevant LSI is these days, not my field at all, but mapping words to vector spaces and using SVD like this kinda blew my mind a bit when I stumbled upon it many years ago.
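Roughly, the trick looks like this; the term-document matrix below is a made-up toy example:

    import numpy as np

    # Rows = terms, columns = documents, entries = term counts (toy data).
    X = np.array([[2., 1., 0., 0.],    # car
                  [1., 0., 0., 0.],    # engine
                  [1., 2., 0., 0.],    # road
                  [0., 0., 2., 1.],    # flower
                  [0., 0., 1., 2.]])   # garden

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = 2
    doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T      # documents in the k-dim latent space

    # A query for "engine" alone still matches document 1 strongly,
    # even though "engine" never appears in it, because it shares the
    # car/road "topic" captured by the latent directions.
    q = np.array([0., 1., 0., 0., 0.])
    q_vec = q @ U[:, :k]                           # fold the query into the same space
    sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
    print(sims)                                    # approx [1, 1, 0, 0]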
- Principal component analysis
- Fitting a plane to a set of points
- Linear least squares
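The plane-fitting item, as a minimal numpy sketch (the points are synthetic):

    import numpy as np

    rng = np.random.default_rng(1)
    xy = rng.normal(size=(200, 2))
    z = 0.3 * xy[:, 0] - 0.2 * xy[:, 1] + 5 + 0.01 * rng.normal(size=200)
    pts = np.column_stack([xy, z])             # noisy points near a plane

    centroid = pts.mean(axis=0)
    _, _, Vt = np.linalg.svd(pts - centroid, full_matrices=False)
    normal = Vt[-1]                            # direction of least variance = plane normal

    print(centroid, normal)                    # plane: normal . (p - centroid) = 0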
This has been criticised for decades. Competently and popularly presented criticism goes back at least as far as 1957 (Artin's Geometric Algebra; see the discussion of determinants near the beginning), but linear algebra is still often presented decoupled from geometry.
I wonder, though, if there's a purely algebraic approach to matrices that explains as much as (or more than) the geometric one. Maybe approaching the algebra of matrices consistently as an example of a category algebra could be illuminating.
I don't think it was pedagogically a problem, except I couldn't bring myself to care about matrices when I was learning them... It was very easy to take my abstract knowledge and apply it, and for me it might have been harder the other way around.
In retrospect, a hybrid applied/theoretical topic (like Reed-Solomon encoding and recovery) might have perked me up. But I might have also been a strange case.
It is pedagogically a problem for people who ask questions like the one in the topic. But it can also be a problem when one encounters a bilinear form and a linear operator in practice but can't distinguish between the two. I can't think of a specific example, but I was asked once about some problem (in electrical engineering, iirc) where the source of confusion was this; some transformations of a (square) matrix were natural while others were not.
Some people feel strongly about the topic—mostly those with “pure math” inclinations.
As such, we have more rules and theorems around them, making them suitable for practical usage. For example, physics deeply utilizes Hermitian matrices, which are a special subclass of square matrices. Spectral theory is incredibly important in all sorts of applications.
However, the person actually asked why it is important for matrices to be square for most of the theorems they are learning. The actual answers to this are interesting, but the presence of this question likely implies that they are being taught abstract theory before building much intuition.
I was taught this way as well in my advanced pure math course. It was all super abstract until I was studying for the final exam and then had this eureka moment where suddenly everything made sense (a matrix is just a numerical way of describing a linear transform! And computing eigenvectors is like factoring!). Sadly, this happened again the next term where we derived SVD - except this time it never made sense to me until a later course where we needed to use SVD for some application.
As linear transformations of a space into itself, a very frequent operation, are described by square matrices, those matrices do show up more frequently. But map X into a Y of a different dimension and you get a non-square matrix.
Axler, and a few others, say this straight away in the intro. This makes things much simpler to understand instead of developing linear algebra historically from systems of linear equations.
there's a fantastic historiography of linear algebra and why we're stuck in this situation here: https://www.youtube.com/watch?v=C2RO34b_oPM
Calculus is also a bit confusing, as for historical reasons we use a lot of notation from the Newton-Leibniz informal infinitesimal approach, but we try to teach Bolzano-Weierstrass formal analysis, ending up with a confusing potpourri.
This is true, but it's also something that is arguably impossible on the one hand and "unwise" on the other.
You can easily map a higher-dimensional space into a lower-dimensional space, but you will irretrievably lose a lot of information when you do so.
And in the other direction, you can't actually map a lower-dimensional space into a higher-dimensional space with a matrix. The image of X can never have more dimensions than X does -- the choice to represent it with Y > X dimensions is just that, a representational choice. This idea is only meaningful in terms of the semantics behind the representation.
If I understand "unwise" to refer to a linear map into a lower-dimensional space: of course that's something you'll often want to do! Suppose, for example, that most of the interesting structure of your data is close to lying on a n-dimensional subspace of R^m, with n<m. Constructing a clever linear map from R^m to R^n can be very wise and useful!
> And in the other direction, you can't actually map a lower-dimensional space into a higher-dimensional space with a matrix. The image of X can never have more dimensions than X does -- the choice to represent it with Y > X dimensions is just that, a representational choice.
Any matrix involves a choice (of basis), so that complaint is moot. For sure you can have a linear map whose codomain is higher-dimensional. You are correct that the image can't be higher-dimensional, but that doesn't prevent the existence of such a map.
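A tiny numpy illustration of that distinction between codomain and image:

    import numpy as np

    A = np.random.randn(3, 2)             # a linear map from R^2 into R^3
    print(A.shape)                         # (3, 2): the codomain is 3-dimensional
    print(np.linalg.matrix_rank(A))        # 2: the image is only a plane inside R^3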
For example, an x-ray computed tomography (CT) image volume in 3D may be projected into various 2D synthetic planar x-ray projections (digitally reconstructed radiographs).
There are countless situations where projections are very useful.
The third dimension is discretized while the other two are continuous; the reconstruction consists of smoothing out the third dimension.
If you're solving a linear system of equations, for example, you need a square matrix. If you don't have one, either your system is underdetermined or it has redundancies or contradictions in it.
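A small numpy illustration of that split; fittingly, the non-square case falls back to least squares, which numpy computes via an SVD under the hood:

    import numpy as np

    # Square, invertible: a unique exact solution exists.
    A = np.array([[2., 1.],
                  [1., 3.]])
    b = np.array([3., 5.])
    print(np.linalg.solve(A, b))

    # Overdetermined (3 equations, 2 unknowns): generally no exact solution,
    # so take the least-squares one instead.
    A_tall = np.array([[1., 0.],
                       [0., 1.],
                       [1., 1.]])
    b_tall = np.array([1., 2., 2.5])
    x, residuals, rank, sv = np.linalg.lstsq(A_tall, b_tall, rcond=None)
    print(x, sv)                            # sv: the singular values lstsq used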
(And one could say algebra with numbers is a special case of a 1x1 matrix, but anyway...)
So worry about square matrices and about 1xN and Nx1 matrices (which are really just vectors); the weird shapes are weird shapes.
Your explanation is bound to confuse people far more than help, as it seems to mystify something that is very mundane.
Think of it as type casting in programming languages: you only do it when you need to, but the real work is done processing elements of the same type.
If that wasn't the case, then non-square matrices would have all the "cool" properties of the square ones.
Yes, I might be grasping at straws here, and I agree it might confuse people even more, but this weirdness of matrices hasn't escaped me since I learned about it. And then when you get to vectors it all makes sense. Hum...
Of course it's all they "can do". They're merely a way of writing down linear maps for a given choice of bases.
My point is, the algebra of square matrices (of a given size) supports more operations than you get when mixing matrices of different sizes.