
Principal Component Analysis for Dummies - jackkinsella
http://georgemdallas.wordpress.com/2013/10/30/principal-component-analysis-4-dummies-eigenvectors-eigenvalues-and-dimension-reduction/
======
nroman
When I was studying this in college I always found the "Eigenfaces" example
very enlightening
([http://en.wikipedia.org/wiki/Eigenface](http://en.wikipedia.org/wiki/Eigenface)).

In case you're not familiar with them, the basic idea is treating an image of a
face as a very high dimensional vector, and then doing what amounts to PCA on
a collection of them. I'm leaving off a few steps, but the resulting
eigenvectors converted back into images helped me grasp what was going on in a
much more intuitive fashion.
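
Here's a minimal sketch of the idea using scikit-learn (the dataset and component count are my own illustrative choices, and fetching LFW downloads data on first use):

    # Eigenfaces sketch: flatten face images into vectors, run PCA, and
    # reshape the principal components back into face-shaped images.
    from sklearn.datasets import fetch_lfw_people
    from sklearn.decomposition import PCA

    faces = fetch_lfw_people(min_faces_per_person=50)
    X = faces.data                        # each row: one face, flattened
    h, w = faces.images.shape[1:]

    pca = PCA(n_components=16, whiten=True).fit(X)

    # Each principal component is itself a face-shaped vector: an "eigenface".
    eigenfaces = pca.components_.reshape(-1, h, w)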

~~~
Homunculiheaded
About a year ago I repeated this experiment (well a very simplified version)
with MegaMan sprites, including a quick write up of the process for anyone
interested:
[http://willkurt.github.io/EigenMan/](http://willkurt.github.io/EigenMan/)

------
thearn4
Another angle: PCA is given by computing the SVD (a more general analog of the
eigenvalue/eigenvector decomposition) of a whitened representation of the
data. Some idiosyncrasies of PCA then become obvious: we can't determine
whether a computed result is actually the sought result or its
reflection/negative, because the SVD is only unique up to sign.

This is also closer to its actual implementation: while it's true that you do
technically need the eigenbasis of the covariance matrix, you should not
actually form the covariance matrix to get there...
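
A minimal sketch of that route in NumPy (the random data is just for illustration):

    # PCA via the SVD of the centered data matrix, never forming the
    # covariance matrix explicitly.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))             # 200 samples, 5 features

    Xc = X - X.mean(axis=0)                   # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    components = Vt                           # rows are the principal directions
    explained_variance = s**2 / (len(X) - 1)  # eigenvalues of the covariance
    scores = Xc @ Vt.T                        # projected data (equals U * s)

    # Sign ambiguity: flipping the signs of any U[:, i] / Vt[i] pair leaves
    # the factorization valid, so each component's sign is arbitrary.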

~~~
Radim

      you should not actually form the covariance matrix to get there...
    

In case anyone's wondering why: it's not only because it takes extra time. The
main reason is that forming X*X^T can introduce numerical instability in cases
where a direct SVD(X) works just fine.
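
A quick way to see it is the classic Läuchli example (my illustration, in NumPy): the smallest singular value of X survives a direct SVD but is wiped out by forming X*X^T in double precision.

    import numpy as np

    eps = 1e-8
    X = np.array([[1.0, 1.0],
                  [eps, 0.0],
                  [0.0, eps]])

    # Direct SVD recovers the tiny singular value just fine.
    print(np.linalg.svd(X, compute_uv=False))   # ~[1.41421356, 1e-08]

    # Forming the Gram matrix squares the condition number: 1 + eps**2
    # rounds to exactly 1, so the computed matrix is exactly singular.
    G = X.T @ X
    evals = np.linalg.eigvalsh(G)
    print(np.sqrt(np.clip(evals, 0, None)))     # tiny value comes back as 0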

~~~
neumann
I remember this being elegantly demonstrated here:
[http://math.stackexchange.com/questions/3869/what-is-the-int...](http://math.stackexchange.com/questions/3869/what-is-the-intuitive-relationship-between-svd-and-pca)

------
avn2109
It seems to me that explanations of technical topics in natural, everyday
language are very valuable. People who would be turned off by a formalized
explanation and dense symbolic manipulation can still get a lot out of this.
Bravo.

------
adamnemecek
Does anyone else have an extremely hard time trying to read it due to being
blinded by the bright yellow?

~~~
blueblob
Definitely. I don't think it would be that bad at 1024x768, but on a
widescreen it's really bright.

EDIT: By the way, you can make it readable with:

    document.getElementById("wrapper").style.backgroundColor='white'

------
neltnerb
I like the goal, thanks for the presentation. Personally, I'd have preferred
an example on high-dimensional data, like curve fitting, where this is
actually most important, but that's because I'm a nerd and perhaps I like
graphs.

FTIR data analysis is a fantastic example for PCA -- each principal component
ends up (probably) being the spectrum of one of the major real physical
components. But this is maybe too abstract?

A less abstract one might be a distribution of test scores. Your actual
dataset is "number" versus "score", and you could show two Gaussians, one at a
low number and one at a high number. Then you could show that across three
exams, you always see the same scores, but with different intensities. That
would let you compute that the principal components are those two Gaussians.
Then you can hypothesize that each group is a collection of students that
study together, and so they get similar scores. Or something like that.
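
Something like this sketch, say (all numbers invented on the spot):

    # Each exam's score histogram is a mix of the same two Gaussian bumps
    # with different weights; PCA on the histograms recovers (combinations
    # of) those two bumps.
    import numpy as np
    from scipy.stats import norm

    scores = np.linspace(0, 100, 101)
    low = norm.pdf(scores, loc=40, scale=8)     # the "low" study group
    high = norm.pdf(scores, loc=80, scale=5)    # the "high" study group

    # Three exams: same bumps, different intensities.
    weights = [(0.7, 0.3), (0.5, 0.5), (0.3, 0.7)]
    exams = np.array([a * low + b * high for a, b in weights])

    Xc = exams - exams.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # The leading rows of Vt span (combinations of) the two Gaussian shapes.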

Anyway, no intent to be a wet blanket. It's a nice writeup, and it is nice of
you to share.

------
jmdeldin
I found Lindsay I. Smith's "A tutorial on Principal Components Analysis" [1]
really useful because it covers the mathematics behind PCA but gives enough
linear algebra background for it to be understandable by those with distant or
weak math backgrounds (e.g., me).

[1]
[http://www.cs.otago.ac.nz/cosc453/student_tutorials/principa...](http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf)

~~~
Nick_C
Thanks, that is incredibly interesting, especially the example at the end
relating to facial recognition.

------
micro_cam
PCA is cool, but I find it maddening when people convert sparse data (like
counts of how many words are shared between documents) into dense distance
data to use it.

You can shortcut the whole process by finding the smallest nonzero
eigenvalue/eigenvector pairs of the graph Laplacian (Fiedler vectors). You
need to use a sparse solver that can find the smallest values/vectors instead
of the largest (like LOBPCG), but that is faster anyway.
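
In SciPy terms, something like this sketch (the random graph and solver parameters are just placeholders):

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.csgraph import laplacian
    from scipy.sparse.linalg import lobpcg

    # A: sparse symmetric similarity matrix (e.g. shared-word counts).
    A = sp.random(500, 500, density=0.01, random_state=0)
    A = A + A.T                                    # symmetrize

    L = laplacian(A, normed=True).tocsr()

    rng = np.random.default_rng(0)
    X0 = rng.normal(size=(L.shape[0], 4))          # random initial block
    vals, vecs = lobpcg(L, X0, largest=False, tol=1e-8, maxiter=500)

    # Skip the trivial ~0 eigenpair; the next vectors (Fiedler vectors)
    # give the low-dimensional embedding.
    embedding = vecs[:, 1:]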

------
ksk
Wow, this brings back memories. I remember using PCA for feature extraction
from image data to be used in SVM-based image classification. Though as I
recall, PCA added a huge tax on the processing time and provided, in
comparison, only a small boost in accuracy. (IIRC we split the data 4:1 into
training & classification.)
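
In today's scikit-learn terms, the pipeline would look roughly like this (dataset, component count, and kernel are stand-ins, not what we actually used):

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)
    # 4:1 split into training and classification sets.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=0)

    # PCA as a feature extractor in front of an SVM classifier.
    clf = make_pipeline(PCA(n_components=30), SVC(kernel="rbf"))
    clf.fit(X_tr, y_tr)
    print(clf.score(X_te, y_te))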

------
hcarvalhoalves
Thank you for this primer; it's related to something I'm studying right now
and is much easier to understand.

------
dnautics
Doesn't the spread-out-ness then depend on the units? If your data have unit X
on one axis and unit Y on another, then how can you say that the "maximal
spread-out-ness" is in any given direction, when you can merely adjust the
scale on one axis and alter how numerically spread out it looks?
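
A quick sketch of what I mean (numbers made up): rescaling one column, i.e. changing its units, redirects the leading principal component.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.column_stack([rng.normal(scale=1.0, size=300),   # unit X
                         rng.normal(scale=2.0, size=300)])  # unit Y

    def first_pc(X):
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Vt[0]

    print(first_pc(X))     # dominated by the second axis
    X[:, 1] /= 100.0       # same data, expressed in different units
    print(first_pc(X))     # now dominated by the first axis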

~~~
Bill_Dimm
Yes. The author of "Information Theory, Inference and Learning Algorithms" had
this to say about it:

[http://www.amazon.com/review/R16RJ2PT63DZ3Q/ref=cm_cr_rev_de...](http://www.amazon.com/review/R16RJ2PT63DZ3Q/ref=cm_cr_rev_detmd_pl?ie=UTF8&asin=0521642981&cdForum=Fx37214P6NH2KSB&cdMsgID=Mx19BAIRVRARPZV&cdMsgNo=1&cdPage=1&cdSort=oldest&cdThread=Tx2OVZUUHW9MMJ9&store=books#Mx19BAIRVRARPZV)

~~~
joe_the_user
An interesting argument.

It seems like PCA would already be a method that only means something if it's
applied to comparable dimensions. What would transformed, dimensioned
variables mean anyway? Chart A = mass - 3*charge against B = mass + 2*charge.
What could a correlation mean?

------
therobot24
Yes, there are hundreds of these online -- though I must admit that the images
are well done to convey the main point of dimensionality reduction; those who
don't necessarily understand eigenvectors or covariance matrices will be able
to see what's happening between the lines.

------
bsaul
That's brilliant. I studied the maths behind all this years ago, and only now
do I find an intuitive explanation of it. Many thanks.

------
AsymetricCom
These intuitive guides are great for those of us who've built an intuitive
understanding of higher math through practical application on computers
instead of formal, academic means.

