
A tutorial on Principal Component Analysis - tehnokv
https://tkv.io/posts/tutorial-on-pca
======
presscast
Relatedly, from a colleague for whom I have _a lot_ of respect: [http://gael-varoquaux.info/science/ica_vs_pca.html](http://gael-varoquaux.info/science/ica_vs_pca.html)

~~~
mayankkaizen
Your colleague is actually quite famous in ML space. You are lucky to have him
as your colleague.

~~~
presscast
Oh, I know!

(To be fair, I should have said _former_ colleague, back when I worked at
NeuroSpin in 2010-2012).

------
rundigen12
"By defining \mathbb{E}\left[\mathbf{x}\right]=\mu, ...and using the
linearity of the expectation operator \mathbb{E}, we easily arive [sic] to
the following conclusion..."

Yikes. You don't define that \mathbb{E} is an 'expectation operator', or what
an expectation operator even does, or the fact that it's linear. The v's
somehow disappeared from inside the square brackets -- maybe you meant
\mathbb{E}[\mathbf{v}]=\mu?

So far this "tutorial" isn't defining its terms very well. I'm lost and it's
only the very beginning.

~~~
IngoBlechschmid
"E[foo]" is notation for the expected value of the random variable foo,
roughly speaking its mean value. (For instance, the expected value of a die
roll is 3.5. The terminology is slightly suboptimal, since we will never
expect a die to come up 3.5.)

Hence the "E" itself is called an "operator". It can be applied to a random
variable in order to yield its expected value. You can read up on it here:
[https://en.wikipedia.org/wiki/Expected_value](https://en.wikipedia.org/wiki/Expected_value)

The definition "E[x] = mu" is correct, though I would write it the other way,
as "mu = E[x]", as it's the variable mu which is being defined.

The v's disappear because of a suppressed calculation:

sigma^2
  = E[ (v^T x - E[v^T x])^2 ]
  = E[ (v^T x - E[v^T x]) (v^T x - E[v^T x]) ]
  = E[ v^T x v^T x - 2 v^T x E[v^T x] + E[v^T x] E[v^T x] ]
  = E[ v^T x x^T v - 2 v^T x v^T E[x] + v^T E[x] v^T E[x] ]
  = E[ v^T x x^T v - 2 v^T x E[x]^T v + v^T E[x] E[x]^T v ]
  = E[ v^T (x x^T - 2 x E[x]^T + E[x] E[x]^T) v ]
  = v^T E[ x x^T - 2 x E[x]^T + E[x] E[x]^T ] v
  = v^T E[ (x - E[x]) (x - E[x])^T ] v
  = v^T E[ (x - mu) (x - mu)^T ] v
  = v^T Sigma v.
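The identity at the end of that chain is easy to sanity-check numerically; here's a minimal numpy sketch (the data and the direction v are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: correlated 3-D samples and a random direction v.
x = rng.normal(size=(10000, 3)) @ rng.normal(size=(3, 3))
v = rng.normal(size=3)

# Left-hand side: variance of the 1-D projection v^T x.
proj = x @ v
lhs = proj.var()

# Right-hand side: v^T Sigma v, with Sigma = E[(x - mu)(x - mu)^T].
Sigma = np.cov(x, rowvar=False, ddof=0)
rhs = v @ Sigma @ v

print(abs(lhs - rhs))  # agrees up to floating-point error
```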

~~~
bunderbunder
Also, the "E[foo]" notation is something you'd pick up in an introductory
statistics course. Which, IMO, means it's perfectly appropriate to use it
without further explanation in this sort of context.

It's not really reasonable to expect technical subjects like this to always be
presented in a way that's easily digestible to people who lack any background
in the subject area. This article is clearly aimed at people who are studying
machine learning, and anyone who is studying machine learning should already
have a good command of basic statistics and linear algebra.

------
IngoBlechschmid
An important application of principal component analysis is "low-rank
approximation of matrices"; very roughly, this can be used for "dimension
reduction": replacing a system of hundreds of thousands of equations with a
system of only hundreds.

This is neatly illustrated by applying the technique to compress images, even
though the usual compression formats actually employ vastly different
methods. An interactive demo is here: [https://timbaumann.info/svd-image-compression-demo/](https://timbaumann.info/svd-image-compression-demo/)
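The idea fits in a few lines of numpy; here's a rough sketch using a synthetic nearly-low-rank matrix as a stand-in for an image:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for an image: a 100x100 matrix that is nearly rank 5.
A = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 100))
A += 0.01 * rng.normal(size=(100, 100))  # small noise

# Truncated SVD: keep only the k largest singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 5
A_k = U[:, :k] * s[:k] @ Vt[:k]  # best rank-k approximation (Eckart-Young)

# Storage: k*(m + n + 1) numbers instead of m*n.
err = np.linalg.norm(A - A_k) / np.linalg.norm(A)
print(err)  # small: most of the energy sits in the top k components
```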

------
mitchtbaum
What are your other favorite forms of analysis? Mine are:

[https://en.wikipedia.org/wiki/Formal_concept_analysis](https://en.wikipedia.org/wiki/Formal_concept_analysis)

[https://en.wikipedia.org/wiki/Latent_semantic_analysis](https://en.wikipedia.org/wiki/Latent_semantic_analysis)

[https://en.wikipedia.org/wiki/Root_cause_analysis](https://en.wikipedia.org/wiki/Root_cause_analysis)

~~~
bunderbunder
FWIW, latent semantic analysis is just a particular application of principal
component analysis.
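For the curious, a tiny sketch of that connection: LSA is essentially a truncated SVD of a term-document matrix (the toy counts below are made up):

```python
import numpy as np

# Toy term-document count matrix (terms x docs); hypothetical data.
# docs:             d1  d2  d3  d4
X = np.array([[3., 1., 0., 0.],   # "car"
              [1., 3., 0., 0.],   # "engine"
              [0., 0., 2., 1.],   # "pasta"
              [0., 0., 1., 2.]])  # "sauce"

# LSA: truncated SVD of X; each document gets a k-dim latent vector.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_vecs = (s[:k, None] * Vt[:k]).T  # one k-dim vector per document

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# d1 and d2 share the "car" topic; d3 belongs to the other topic.
print(cos(doc_vecs[0], doc_vecs[1]))  # near 1
print(cos(doc_vecs[0], doc_vecs[2]))  # near 0
```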

~~~
closed
And principal component analysis is a special case of factor analysis, which
definitely makes all this stuff fit together nicely!

[https://www.microsoft.com/en-us/research/wp-content/uploads/...](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/bishop-ppca-jrss.pdf)

~~~
kurthr
This PPCA is really clever... allowing MLE for missing data/variance in a PCA
is genius.
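For anyone who wants to poke at it, here's a rough numpy sketch of the closed-form PPCA maximum-likelihood solution from the Tipping & Bishop paper (complete-data case only; handling missing data needs the EM variant), on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: n samples in d dims generated from a q-dim latent space.
d, q, n = 5, 2, 500
W_true = rng.normal(size=(d, q))
X = rng.normal(size=(n, q)) @ W_true.T + 0.1 * rng.normal(size=(n, d))

# Closed-form PPCA MLE: eigendecompose the sample covariance.
S = np.cov(X, rowvar=False)
lam, U = np.linalg.eigh(S)       # ascending eigenvalues
lam, U = lam[::-1], U[:, ::-1]   # reorder to descending

sigma2 = lam[q:].mean()                    # noise variance: mean of discarded eigenvalues
W = U[:, :q] * np.sqrt(lam[:q] - sigma2)   # loading matrix, up to rotation

# The fitted model covariance W W^T + sigma2*I should approximate S.
C = W @ W.T + sigma2 * np.eye(d)
rel_err = np.linalg.norm(C - S) / np.linalg.norm(S)
print(rel_err)  # small
```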

