
Singular Value Decomposition Part 2: Theorem, Proof, Algorithm - alanfranzoni
https://jeremykun.com/2016/05/16/singular-value-decomposition-part-2-theorem-proof-algorithm/
======
rrmm
An excellent algorithm to know backwards and forwards.

It's one of those tools you want to reach for first whenever you can assume a
roughly linear approximation will work.

------
thearn4
The fact that the SVD algorithm is backward stable is still somehow
surprising to me, despite being very familiar with the proof. Very few
algorithms that produce more data than they take as input have this property.
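
(A quick numpy sanity check, not a proof: backward stability predicts the
reconstruction residual sits at the machine-epsilon scale no matter what
matrix you feed in.)

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 100))

    # Reconstruct A from its computed SVD factors
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    residual = np.linalg.norm(A - U @ np.diag(s) @ Vt)

    # Backward stability predicts a relative residual of roughly
    # machine epsilon, regardless of A.
    print(residual / np.linalg.norm(A))  # ~1e-15
    print(np.finfo(A.dtype).eps)         # ~2.2e-16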

~~~
coherentpony
> Very few algorithms that produce more data than they take as input have this
> property.

Although it's a useful metric, measuring only the amount of input and output
misses other useful characteristics.

The SVD produces more output than input, but the output is very heavily
constrained.

~~~
amelius
This got me thinking. How many numbers do you need to describe any unitary
matrix of size N?

~~~
mathpepe
In the case of real coefficients, these matrices constitute the famous
orthogonal group. The orthogonal group is a Lie group of dimension n*(n-1)/2.
(For complex unitary matrices, the unitary group U(n) has real dimension
n^2.) A table of Lie groups with their dimensions is here:
[https://en.wikipedia.org/wiki/Table_of_Lie_groups](https://en.wikipedia.org/wiki/Table_of_Lie_groups)

Every Lie group has an associated vector space (its Lie algebra) of the same
dimension, so many questions about Lie groups can be addressed by studying
the associated vector space.
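
A concrete way to see the count (a sketch using numpy/scipy, not from the
article): the Lie algebra of the orthogonal group is the space of
skew-symmetric matrices, which has exactly n*(n-1)/2 free entries, and the
matrix exponential maps any of them back into the group.

    import numpy as np
    from scipy.linalg import expm

    n = 4
    rng = np.random.default_rng(1)
    # n*(n-1)/2 free parameters fill the strict upper triangle
    params = rng.standard_normal(n * (n - 1) // 2)

    # Build a skew-symmetric matrix, i.e. an element of the Lie algebra so(n)
    S = np.zeros((n, n))
    S[np.triu_indices(n, k=1)] = params
    S = S - S.T

    # The matrix exponential lands in the (special) orthogonal group
    Q = expm(S)
    print(np.allclose(Q.T @ Q, np.eye(n)))    # True
    print(np.isclose(np.linalg.det(Q), 1.0))  # True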

To suggest an application of Lie groups and their associated vector spaces:
in machine learning, when a group acts on a space (rotations and the like),
you can reduce the number of parameters and obtain new features that let you
train your model more efficiently.

------
RaptorJ
This might be a far-out question, but is there a connection/analogy between
the SVD approximation and a Fourier transform + low-pass filter?

~~~
thearn4
It's true that a 2D discrete FFT can also be used to produce an orthogonal
expansion of an input matrix, but a truncated reconstruction using it (like a
band-pass filtering) won't have the rank properties (e.g. optimal low-rank
approximation) of a truncated SVD-based approximation.

They are related in many ways if you are interested in data filtering and
analysis (viewing matrices as data structures rather than as operators on a
function space). Generally the SVD is nicer to have, but the FFT is easier
to compute (O(n^3) vs. O(n^2 log n)).

Another way to think about them:

An FFT will describe your data in terms of a fixed orthonormal basis: sine and
cosine functions of varying frequency.

An SVD will describe your data using a basis that is also orthonormal, but is
entirely based on your data.
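
A little numpy experiment along these lines (my own sketch): reconstruct a
matrix from its top k singular triplets and, separately, from its k*n
largest 2D-FFT coefficients, then compare errors. Eckart-Young says the SVD
side is the best possible rank-k approximation.

    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 64, 5
    # A hypothetical test matrix: low-rank signal plus a little noise
    A = rng.standard_normal((n, k)) @ rng.standard_normal((k, n)) \
        + 0.1 * rng.standard_normal((n, n))

    # Truncated SVD: the optimal rank-k approximation (Eckart-Young)
    U, s, Vt = np.linalg.svd(A)
    A_svd = (U[:, :k] * s[:k]) @ Vt[:k]

    # Truncated FFT: keep the k*n largest coefficients of a *fixed* basis
    # (roughly the same storage as the rank-k SVD factors)
    F = np.fft.fft2(A)
    cutoff = np.sort(np.abs(F).ravel())[-k * n]
    A_fft = np.real(np.fft.ifft2(np.where(np.abs(F) >= cutoff, F, 0)))

    print(np.linalg.norm(A - A_svd))  # small: close to the noise floor
    print(np.linalg.norm(A - A_fft))  # noticeably larger here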

~~~
rrmm
If you are interested in this stuff, check out Strang's linear algebra
course lectures (MIT OpenCourseWare, IIRC).

He goes through SVD and FFT in terms of matrices. Really illuminating course.

~~~
thomasahle
This one? [http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/](http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/)
Do you know if there is a written version? Videos are sometimes a bit harder
to follow at your own pace, I find.

~~~
rrmm
Chapters from his textbook are available online.

------
mathheaven
Just a little nitpick: reading the first lines, I would have preferred the
author to explain the Pythagorean theorem in R^n like this (it's a very
simple proof):

Since v . v = |v|^2, if u and v are orthogonal (u . v = 0), then |u+v|^2 =
(u+v) . (u+v) = u.u + 2 u.v + v.v = |u|^2 + |v|^2.

A nice example of a projection: if v = (1, 1, ..., 1), then the projection
of u onto v is mean(u) * v.
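
Both claims are easy to check numerically (a throwaway numpy check; w is
just a random vector orthogonalized against u):

    import numpy as np

    rng = np.random.default_rng(3)
    u = rng.standard_normal(5)

    # Make w orthogonal to u via one Gram-Schmidt step
    w = rng.standard_normal(5)
    w -= (w @ u) / (u @ u) * u

    # Pythagorean theorem: |u+w|^2 = |u|^2 + |w|^2 when u . w = 0
    print(np.isclose(u @ w, 0.0))                        # True
    print(np.isclose((u + w) @ (u + w), u @ u + w @ w))  # True

    # Projection of u onto v = (1, ..., 1) is mean(u) * v
    v = np.ones(5)
    proj = (u @ v) / (v @ v) * v
    print(np.allclose(proj, u.mean() * v))               # True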

------
shriphani
What an incredibly well written tutorial! Do you plan to cover some of the
work around thin/skinny SVDs?

------
thomasahle
Is there an analog of SVD where you optimize the L1 / sum of absolute errors,
rather than the squares?

