

A Beginner’s Guide to Eigenvectors, Covariance, PCA and Entropy - vonnik
http://deeplearning4j.org/eigenvector

======
alephone
"square matrices have as many eigenvectors as they have linearly independent
dimensions; i.e. a 2 x 2 matrix would have two eigenvectors, a 3 x 3 matrix
three, and an n x n matrix would have n eigenvectors, each one representing
its line of action in one dimension."

This is not quite right. First of all, if v is an eigenvector of A, then a*v
for any non-zero a is also an eigenvector of A, so it makes no sense to say
"there are n eigenvectors". Perhaps you meant to say n linearly independent
eigenvectors?
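(A two-line check with numpy, if it helps; the matrix here is just a made-up
diagonal example:)

    import numpy as np

    A = np.array([[2.0, 0.0],
                  [0.0, 3.0]])
    v = np.array([1.0, 0.0])   # eigenvector of A with eigenvalue 2

    # Any nonzero scalar multiple of v is still an eigenvector, eigenvalue 2:
    print(A @ (5 * v))         # [10.  0.], i.e. 2 * (5 * v)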

But this is also not correct. An n x n matrix need not have n linearly
independent eigenvectors; consider for instance A = [1, 1; 0, 1], which has
dim N(A - I) = 1 (hence one linearly independent eigenvector). Basically, not
every matrix is diagonalizable. This example also shows that rank(A) is not
necessarily equal to the number of linearly independent eigenvectors (is that
what you meant by linearly independent dimensions?).
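If you want to check this numerically, here is a quick sketch with numpy
(np.linalg.eig returns one column per eigenvalue, so for a defective matrix
the columns come out linearly dependent):

    import numpy as np

    # A = [1, 1; 0, 1]: eigenvalue 1 with algebraic multiplicity 2,
    # but the eigenspace N(A - I) is only one-dimensional.
    A = np.array([[1.0, 1.0],
                  [0.0, 1.0]])

    vals, vecs = np.linalg.eig(A)
    print(vals)                         # [1. 1.]
    print(np.linalg.matrix_rank(vecs))  # 1: the columns are (numerically) parallel
    print(np.linalg.matrix_rank(A))     # 2: rank(A) != number of independent eigenvectors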

A more correct statement would be that every n x n symmetric matrix has n
linearly independent eigenvectors (and its eigenvectors can be chosen to form
an orthonormal basis for R^n).
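And the symmetric case for contrast (np.linalg.eigh is numpy's routine for
symmetric/Hermitian matrices; it returns real eigenvalues and orthonormal
eigenvectors):

    import numpy as np

    S = np.array([[2.0, 1.0],
                  [1.0, 2.0]])   # symmetric

    vals, vecs = np.linalg.eigh(S)
    print(vals)           # [1. 3.]: n real eigenvalues
    print(vecs.T @ vecs)  # the identity matrix: the eigenvectors are orthonormal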

~~~
vonnik
Thanks for making that point. The statement "square matrices have as many
eigenvectors as they have linearly independent dimensions" is meant to inform
the statements that follow.

~~~
jordigh
This statement makes no sense. Independence is not a property of dimensions. A
set of vectors is said to be independent if the only linear combination of
them that is zero is the trivial linear combination. Otherwise, they are
dependent.

Or do you have a definition of independence for dimensions? That would first
require a different definition of dimension. Ordinarily, dimension is just the
cardinality of an independent, spanning set (we call this a basis). It is a
single number. A linear space does not have a plurality of dimensions, nor do
matrices have an associated plurality of dimensions.

Please don't use technical words without defining them. In other areas of
life, humans get along fine by using very imprecise, poetic, ambiguous, and
metaphoric language. This is not how mathematics works. Everything you say
must have a precise definition. :-(

~~~
vonnik
Hi Jordi - I'm sure you're right. You have an amazing resume. Please help us
make ND4J/ND4S better.
[https://github.com/deeplearning4j/nd4j](https://github.com/deeplearning4j/nd4j)

NASA JPL is using the framework for climate modeling, so maybe we can help
save the planet together. :)

One response I would have to both your comment and @alephone's rewording,
which is perfectly correct, is that they are incomprehensible to most people.
So the question is: How do you bridge the gap to make non-mathematicians
understand?

The other would be: It's hard to talk about math. There's always slippage when
you try to speak in an everyday way about technical things. I'm not sure how
to get around that. Too much precision, in this case, undoes the goal of
reaching another audience.

~~~
tanderson92
> One response I would have to both your comment and @alephone's rewording,
> which is perfectly correct, is that they are incomprehensible to most
> people.

They are incomprehensible to those who do not understand the mathematics
being discussed.

> So the question is: How do you bridge the gap to make non-mathematicians
> understand?

By teaching them mathematics.

> Too much precision, in this case, undoes the goal of reaching another
> audience.

No. In this case what failed was the author's understanding of when a full
set of eigenvectors is guaranteed: the operator being normal or self-adjoint
(depending on the underlying field). It isn't hard to add a caveat saying "in
some cases operators may not have a full set of eigenvectors, so it is only
appropriate to say a matrix can have at most as many linearly independent
eigenvectors as its order", provided the author realizes that this is indeed
the case.

Precision is important because there are phenomenally important cases where
operators have no real eigenvectors at all! Consider a rotation in R^2
through an angle that is not a multiple of pi (fun fact: in odd dimensions
every real operator has a real eigenvalue, why?); clearly, such a rotation
sends no nonzero vector to a multiple of itself.
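A quick numerical check (a numpy sketch; the eigenvalues come back purely
imaginary, so there is no real eigenvector):

    import numpy as np

    # Rotation by 90 degrees in R^2.
    R = np.array([[0.0, -1.0],
                  [1.0,  0.0]])

    print(np.linalg.eigvals(R))  # [0.+1.j 0.-1.j]: no real eigenvalues at all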

~~~
vonnik
I've added a caveat along the lines of your suggestion.

> By teaching them mathematics.

The terse complacence of this statement, and your easy use of technical terms,
make me doubt whether you know what it means to teach mathematics, to
empathize with those who lack knowledge, and ultimately to help math exit the
ghetto of jargon.

~~~
tanderson92
It wasn't meant to be complacently terse; it was meant to represent the basic
tautology. Mathematics is a language as well as a way of thinking, and one
needs both since things are truly complicated under the hood. The language is
actually our friend here: it allows people to speak clearly and have near
universal understanding of what is meant. With this in mind, the correct
answer to 'how do we communicate maths to non-mathematicians' (as opposed
merely to teaching its importance, which is advocacy) is to teach them the
maths. That's how I see it anyway.

~~~
vonnik
I understand the tautology. It's inherent in every field that requires
specialization. And the point I'm trying to make is that the best teaching
relies on ideas or analogies already familiar to the student. It's not enough
to say math is a language. So is French. But to learn French, we first learn
to map our native language to the new one. The analogy isn't perfect, but math
often describes things in the world, and we can use those things to provide
windows to the math itself.

While the language of math is a useful tool for the initiated, like all
linguistic barriers it keeps out newcomers. So yes, we must teach people math,
but we must do it at first, as much as possible, in a language that allows
them to _enter_ into a new way of thinking. It's about unfurling a series of
ideas one at a time instead of all at once. (I don't pretend to do that
perfectly, but that was the intent. My understanding is as defective as some
matrices. ;)

------
tsumnia
You linked it at the bottom of the page, but how do you think the article
compares with
[http://www.cs.otago.ac.nz/cosc453/student_tutorials/principa...](http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf)
?

I'm planning to read through it a little more this weekend, but I remember
using the PDF and a whiteboard back in the day to understand PCA.

~~~
vonnik
The PDF at that link is excellent. Two thumbs up.

------
madvoid
I understand the concept of eigenvectors/eigenvalues, and I understand the
concept of the covariance matrix, but I'm having a hard time understanding why
the eigenvectors of the covariance matrix represent the principal components.
Does anyone have an intuitive explanation they would like to share? Maybe I
don't understand eigenvectors as well as I thought I did...

~~~
tanderson92
The eigenvectors of any diagonalizable matrix (which any covariance matrix is,
see elsewhere on this page for a discussion of why this is the case) form a
basis for the space in which the matrix operates. The eigenvectors
corresponding to the "largest" eigenvalues are called principal components
because they are the directions in which the matrix principally acts.

Consider a matrix A which has two eigenvectors u, v with eigenvalues a, b, and
a >> b. Then A(u + v) = au + bv. But since a >> b the operator is stretching u
more so than v, and as a result we think of the operator as acting
'principally' in direction u rather than v; hence the eigenvectors are
considered the 'principal components'.

I exaggerated the case by letting a >> b, but the same holds true when a is
similar in magnitude to b: by considering repeated applications of the matrix
you see the same effect, since A^n (u + v) = a^n u + b^n v.
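A rough numerical illustration of that effect (the eigenvalues 10 and 0.5 are
made up for the example):

    import numpy as np

    # Diagonal for simplicity, so the eigenvectors are the coordinate axes.
    A = np.array([[10.0, 0.0],
                  [0.0,  0.5]])
    u = np.array([1.0, 0.0])  # eigenvalue 10
    v = np.array([0.0, 1.0])  # eigenvalue 0.5

    x = u + v
    for _ in range(3):
        x = A @ x                 # apply A repeatedly
    print(x / np.linalg.norm(x))  # ~[1. 0.]: A^n (u + v) collapses toward u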

Covariance matrices are nothing special in this regard.
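And to connect this back to PCA, a toy sketch (the data and the 0.1 noise
scale are invented for illustration): build correlated 2-D data, take its
covariance matrix, and check that the top eigenvector points along the
direction of greatest spread.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data spread mostly along the (1, 1) direction.
    t = rng.normal(size=1000)
    noise = rng.normal(scale=0.1, size=1000)
    X = np.column_stack([t + noise, t - noise])  # shape (1000, 2)

    C = np.cov(X, rowvar=False)     # 2 x 2 covariance matrix (symmetric)
    vals, vecs = np.linalg.eigh(C)  # eigenvalues in ascending order

    print(vals)         # ~[0.02, 2.0]: small variance, then large variance
    print(vecs[:, -1])  # ~[0.71, 0.71] (up to sign): the first principal component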

------
vonnik
DL4J co-creator here: we did a tutorial on restricted Boltzmann machines last
week that people might find helpful as well:
[http://deeplearning4j.org/restrictedboltzmannmachine.html](http://deeplearning4j.org/restrictedboltzmannmachine.html)

------
bigger_cheese
Thank you for posting this. It was fortuitously timed: I am reading through a
thesis at the moment where the author has used PCA along with something called
PLS (Projection to Latent Structure) to perform a multivariate analysis. I was
hoping I could use this technique on my own dataset.

I was in the process of googling the acronyms when I stumbled on this link on
HN.

------
tptacek
This is extraordinarily well written.

------
jcadam
Nice. I was futzing with covariance matrices in Scala today at work and
realized I really need a refresher on some of this stuff (college was over a
decade ago).

Fortunately, we have a couple of scientists on staff I can usually bug for
assistance when I get confused.

------
bcheung
Very simple to understand and straightforward. Thanks for writing this up.

------
stared
This "entropy" part is missing - no pictures, no formulae... Delete it or
elaborate carefully. (Other parts are great!)

------
abhisheks91
Very well written!! Thank you!

