
Principal Component Analysis - olooney
http://www.oranlooney.com/post/ml-from-scratch-part-6-pca/
======
Der_Einzige
BTW, for real-world use: if you want to do PCA but want a better solution than
an algorithm which makes linearity assumptions, there are two really hot
algorithms for dimensionality reduction right now:

UMAP - topology manifold learning based method

Ivis - Siamese triplet network autoencoder

Both of them will blow PCA out of the water on basically all datasets. PCA's
only advantages are speed and interpretability (it is easy to see the explained
variance).
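
For anyone who wants to try it, here's a minimal UMAP sketch (assuming the umap-learn package, with scikit-learn's digits dataset standing in for your data; ivis is not shown):

    import umap                                # pip install umap-learn
    from sklearn.datasets import load_digits

    X, _ = load_digits(return_X_y=True)        # any (n_samples, n_features) array works
    # Embed into 2 dimensions; n_neighbors and min_dist are the main knobs to tune
    embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2).fit_transform(X)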

~~~
amrrs
I'm just wondering if leaving out t-SNE was intentional? I'm a big fan of UMAP
too. Just created this [1] Kaggle Kernel a couple of days back to show how UMAP
works on Kannada MNIST data.

I also think PCA still rules in a lot of cases, like when people try to create
a new index representative of a bunch of numeric variables (a rough sketch of
that use is below the link).

[1] https://www.kaggle.com/nulldata/umap-dim-reduction-viz-on-kannada-mnist
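
A hedged sketch of that "index" use case, with random data standing in for the numeric variables: the first principal component of the standardized variables serves as the composite index.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    data = rng.normal(size=(200, 5))                      # stand-in for five numeric variables
    z = StandardScaler().fit_transform(data)              # standardize so no variable dominates
    index = PCA(n_components=1).fit_transform(z).ravel()  # first PC = one composite index per row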

~~~
nestorD
I am not the OP, but I would say that UMAP is a clear superset of t-SNE: it can
do the same thing, but faster and with better preservation of large-scale
distances.

------
tomkat0789
PCA is great. I like this paper where it holds its own against all the fancy
nonlinear techniques:

[https://lvdmaaten.github.io/publications/papers/TR_Dimension...](https://lvdmaaten.github.io/publications/papers/TR_Dimensionality_Reduction_Review_2009.pdf)

When have you needed something stronger than PCA? Anybody have good stories?

~~~
qmmmur
Yes. I do some creative work in audio and I wanted to cluster a fairly large
database of audio. I needed a dimensionality reduction technique to take a set
of audio descriptors down to two values (to use for coordinates) in order to
create a 2D explorable space. I actually ended up using MFCCs[1] rather than
typical audio descriptors[2] for my analysis data, as there are lots of issues
with making the numbers meaningful to begin with. I ended up munging to get
around 140 numbers for each audio file by taking the MFCCs and computing
statistics over 3 derivatives, so that the data somewhat represented change
over time. I tried out a number of reduction techniques and PCA was one of
them. Perceptually, the groupings it produced were weak; the techniques I found
useful were Isomap[3], t-SNE[4] and lately UMAP[5]. [4] and [5] have given me
the best perceptual groupings of the audio files.

You can see _some_ of the code here on GitHub, although a lot of it depends on
having some audio to test, among other closed-source projects (sorry, I have no
control over that). A rough sketch of that kind of pipeline is below the links.

[https://github.com/jamesb93/data_bending](https://github.com/jamesb93/data_bending)

[1] - https://en.wikipedia.org/wiki/Mel-frequency_cepstrum

[2] - http://recherche.ircam.fr/anasyn/peeters/ARTICLES/Herrera_1999_ICMC_MPEG7.pdf

[3] - https://scikit-learn.org/stable/modules/generated/sklearn.manifold.Isomap.html

[4] - https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html

[5] - https://umap-learn.readthedocs.io/en/latest/
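
A rough sketch of that kind of pipeline, assuming librosa for the MFCCs and umap-learn for the reduction (the descriptors, statistics and feature count here are illustrative, not the exact ones used in the repo above):

    import numpy as np
    import librosa          # pip install librosa
    import umap             # pip install umap-learn

    def describe(path, n_mfcc=14):
        y, sr = librosa.load(path, sr=None, mono=True)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
        d1 = librosa.feature.delta(mfcc)                   # first derivative over time
        d2 = librosa.feature.delta(mfcc, order=2)          # second derivative
        stats = [f(m, axis=1) for m in (mfcc, d1, d2) for f in (np.mean, np.std)]
        return np.concatenate(stats)                       # one feature vector per file

    paths = ["example_01.wav", "example_02.wav"]           # replace with your audio files
    X = np.stack([describe(p) for p in paths])
    coords = umap.UMAP(n_components=2).fit_transform(X)    # 2-D coordinates for the explorable space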

~~~
rotexo
Ooh, corpus based concatenative synthesis! I look forward to reading through
these links. The only features that I have had luck with are spectral centroid
(for ‘brightness’) and some measure of loudness, but I have always wanted
more. Thanks for posting this stuff!

~~~
jcims
I'm not sure if I'm parsing this conversation correctly, but 'corpus based
concatenative synthesis' reminds me of Scrambled? Hackz! -
[https://www.youtube.com/watch?v=eRlhKaxcKpA](https://www.youtube.com/watch?v=eRlhKaxcKpA)

------
objektif
Can someone knowledgeable please give us examples of real-life uses of PCA? Not
"could be used here, could be used there" kinds of toy examples, but actual use.

~~~
anthony_doan
If you want to do inference and hypothesis testing.

You need to save your degrees of freedom. You want roughly 10 to 20
observations per predictor, so you can use PCA to collapse a subset of
predictors and keep the predictors you are inferring about. This will help the
sensitivity of your test.

Another case is linear regression, or any modeling where multicollinearity is a
problem. This problem is where predictors are confounded or affect each other.
PCA changes the basis so that the new predictors are orthogonal to each other,
getting rid of the multicollinearity problem.

A toy example is:

student GPA, SAT score, math grade, height, hours of study

where student GPA is the response, or what you want to predict.

If you apply PCA to SAT score (x1), math grade (x2), height (x3), and hours of
study (x4), then it'll give you new predictors that are linear combinations of
those predictors. Some statistics books refer to this as a form of principal
component regression.

Anyway, you may get new variables such as:

new_x1 = 0.4 * sat_score + 1.2 * math_grade

new_x2 = 0.1 * height + 0.5 * hours_of_study

These new predictors are orthogonal to each other, so they don't suffer from
multicollinearity. You can now do linear regression using these predictors.

The problem is interpretation: sometimes you get groupings like height + hours
of study that are hard to explain.

Actually just look here for example:
[https://www.whitman.edu/Documents/Academics/Mathematics/2017...](https://www.whitman.edu/Documents/Academics/Mathematics/2017/Perez.pdf)

Under "6.4 Example: Principal Component Analysis and Linear Regression"

~~~
asplake
I use it for exactly that kind of purpose - highlighting interesting relative
strengths and weaknesses in a 42-point assessment. So much better than
benchmarking against some average, with the added advantage that it will keep
finding interesting points even as scores improve.

Amazingly little code too. NumPy and SciPy are awesome :-)
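
It really is little code; a bare-bones version in plain NumPy (eigendecomposition of the covariance matrix) looks roughly like this:

    import numpy as np

    def pca_scores(X, k=2):
        Xc = X - X.mean(axis=0)                 # de-mean each column
        cov = np.cov(Xc, rowvar=False)          # covariance of the assessment items
        vals, vecs = np.linalg.eigh(cov)        # eigh: symmetric matrix, eigenvalues ascending
        order = np.argsort(vals)[::-1][:k]      # keep the k largest components
        return Xc @ vecs[:, order]              # per-respondent scores on those components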

------
zmmmmm
Am I missing something or is equation (9) incorrect unless the mean of the
random variable x is zero (which is never specified)?

~~~
clircle
You are correct, eqn 9 is technically wrong. But it is traditional in PCA to
de-mean the columns of X.
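
A quick numerical check of that point, assuming (9) is the usual Σ = XᵀX / n form: the uncentered product only matches the covariance matrix once the columns are de-meaned.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(loc=5.0, size=(1000, 3))          # deliberately non-zero mean
    raw = X.T @ X / len(X)                           # without centering
    Xc = X - X.mean(axis=0)                          # de-meaned columns
    centered = Xc.T @ Xc / len(Xc)
    cov = np.cov(X, rowvar=False, bias=True)
    print(np.allclose(raw, cov))                     # False: biased by the non-zero mean
    print(np.allclose(centered, cov))                # True once the columns are de-meaned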

------
SubiculumCode
I've been using a somewhat related technique in my research: Principal
Coordinate Analysis (PCoA), also called Multidimensional Scaling (MDS), which
works on a dissimilarity matrix. See [1] for the differences.

[1] http://occamstypewriter.org/boboh/2012/01/17/pca_and_pcoa_explained/
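
For reference, classical PCoA is only a few lines: double-center the squared dissimilarities and take the top eigenvectors. A sketch, with an arbitrary Bray-Curtis dissimilarity matrix as input:

    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    def pcoa(D, k=2):
        """Classical PCoA / metric MDS on an (n x n) dissimilarity matrix D."""
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
        B = -0.5 * J @ (D ** 2) @ J                # double-centered Gower matrix
        vals, vecs = np.linalg.eigh(B)
        order = np.argsort(vals)[::-1][:k]         # largest eigenvalues first
        return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

    X = np.random.default_rng(0).random((50, 10))  # non-negative toy data
    coords = pcoa(squareform(pdist(X, metric="braycurtis")))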

------
alexcnwy
Excellent explanation!

I find this visualization really helpful when I need to explain PCA:
http://setosa.io/ev/principal-component-analysis/

------
Patient0
This was a great article - I'd been wanting to understand PCA. Particularly
liked the digression on Lagrange multipliers.

------
ris
Some nice intuitive explanations in there.

------
platz
Anyone want to comment on how to choose between PCA and ICA?

(Some of the hardest parts of ML, imho, are in selection.)

~~~
ska
From a practical point of view, it's really in the name: Independence. PCA is
great for finding a lower dimensional representation capturing most of what is
going on (the basis vectors will be uncorrelated but can be hard to
interpret). ICA is great for finding independent contributions you might want
to pull out or analyze separately (the basis vectors are helpful in
themselves).

PCA is very practical for dimensionality reduction, ICA for blind source
separation.

You wouldn't usually use ICA for dimensionality reduction unless you have a
known contribution you want to get rid of but, for some reason, have difficulty
identifying it.
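
A small illustration of that difference, with two synthetic sources mixed together: FastICA roughly recovers the original independent signals (up to order and scale), while PCA just gives uncorrelated but still-mixed components.

    import numpy as np
    from sklearn.decomposition import FastICA, PCA

    t = np.linspace(0, 8, 2000)
    S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]       # two independent sources
    X = S @ np.array([[1.0, 0.5], [0.5, 2.0]])             # observed mixtures of the sources

    sources = FastICA(n_components=2, random_state=0).fit_transform(X)   # ~ S, reordered/rescaled
    components = PCA(n_components=2).fit_transform(X)                    # uncorrelated, still mixed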

