
Principal Component Analysis Explained Visually (2015) - espiii
http://setosa.io/ev/principal-component-analysis/
======
aqsalose
Nice visualization! This gives me an opportunity to go on a random tangent
on PCA:

The post considers PCA from a visualization perspective, but exactly the same
thing can also be viewed as a method for reducing the number of dimensions in
the original dataset. [1] Now, one of the interesting questions in a
dimensionality reduction task is: how do you pick the number of dimensions
(principal components)? A _good_ number, in a principled way, instead of just
computing the next component and the next and the one after that until you
get bored? (That works for visualizations, where you often want only the first
two or three components anyway, but suppose we want more information than
plots.)

I recently learned that there's a _fascinating_ way to do this, presented in
Bishop's paper [2] from 1999. In short: the question can be answered by
recasting PCA as a Bayesian latent variable model with a hierarchical prior.
(Yes, it is a bit of a mouthful to say. Yes, it is fairly mathematical, unlike
the visualization.)

[1]
[https://en.wikipedia.org/wiki/Dimensionality_reduction](https://en.wikipedia.org/wiki/Dimensionality_reduction)

[2] [https://www.microsoft.com/en-
us/research/publication/bayesia...](https://www.microsoft.com/en-
us/research/publication/bayesian-pca/)
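
For the curious: scikit-learn ships a closely related, evidence-based
criterion (Minka's MLE choice of dimensionality, which grew out of the same
probabilistic-PCA line of work). A minimal sketch on toy data, not tied to
anything in the post:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 500 samples lying (approximately) on a 3-dimensional subspace
# of a 10-dimensional space, plus a little isotropic noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 10))

# Minka's MLE picks the number of components by maximizing an approximation
# to the model evidence of a probabilistic PCA model.
pca = PCA(n_components='mle', svd_solver='full').fit(X)
print(pca.n_components_)  # typically recovers 3 here
```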

~~~
autokad
you can look at it in terms of reconstruction error
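
For concreteness, a rough sketch of that with NumPy, assuming a data matrix X
with samples in rows (X here is hypothetical, not anything from the post):

```python
import numpy as np

def reconstruction_error(X, k):
    """Mean squared error of rebuilding X from its top-k principal components."""
    Xc = X - X.mean(axis=0)                        # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Xk = (Xc @ Vt[:k].T) @ Vt[:k]                  # project onto top-k PCs and back
    return np.mean((Xc - Xk) ** 2)

# errors = [reconstruction_error(X, k) for k in range(1, X.shape[1] + 1)]
```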

~~~
alexcnwy
Yeah, PCA will give you the eigenvalues of the PCs in descending order of
variance explained, so summing those and dividing by the total tells you that
the first 3 PCs explain, say, 93% of the variance in the data.
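
A minimal sketch of that calculation with scikit-learn, assuming some data
matrix X with samples in rows:

```python
import numpy as np
from sklearn.decomposition import PCA

pca = PCA().fit(X)  # X is assumed to be defined elsewhere
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(cumulative[:3])  # e.g. fraction of variance explained by the first 3 PCs
```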

~~~
Houshalter
But the question is when exactly to stop, because the reconstruction error is
always going to get lower. The Bayesian solution, IIUC, is something like
"stop when the information required to store the new PCs is more than the
information gained from the reduced reconstruction error."

The only issue with this is that if you get tons of data, there will be less
uncertainty in the principal components, and so it will recommend keeping as
many as possible, even if they only decrease the reconstruction error a tiny
bit.

------
vicapow
Hello, HN. Co-author here. Surprised to see this pop up again!

You can find the source code here: [https://github.com/vicapow/explained-
visually](https://github.com/vicapow/explained-visually)

Wish I had the free time to work on these more.

~~~
theglider
I really love these visual explanations, and wish I had more at degree level.
Do you think these will ever permeate traditional higher education?

------
lewis500
Hey HN. Co-author here. Crazy to see this up here again. Anyway, just letting
y'all know I finished my PhD and am done teaching for now, so there will be a
lot more visualizations like this coming out this summer. Will get Vicapow
back in the game for one last score, too.

1. PDE's
2. Lorenz attractor with a waterwheel in threejs
3. Macroscopic fundamental diagram theory of traffic flow in cities
4. ???
5. Profit

~~~
lxe
I am especially excited to see more great content from setosa.io

------
gabrielgoh
I love this visualization - but I think there's a very different intuition you
get from PCA in high dimensions.

I prefer to think of the singular vectors in PCA as an ordering of "prototype
signals", some linear combination of which best reconstructs the data. That
explains, for example, why the largest singular vectors on natural time series
data give Fourier-like coefficients, and why the largest singular vectors on
aligned faces give variations in lighting.
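
A toy sketch of that view, assuming nothing beyond NumPy and some synthetic
"time series" built from a few smooth waveforms:

```python
import numpy as np

# Each row of Vt is a "prototype signal"; every (centered) sample is a
# linear combination of the first k of them, with coefficients given by
# the projection scores.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
waveforms = np.vstack([np.sin(2 * np.pi * t), np.sin(4 * np.pi * t), np.cos(2 * np.pi * t)])
X = rng.normal(size=(300, 3)) @ waveforms   # 300 samples, 200 time points

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
prototypes = Vt[:3]                  # top 3 prototype signals (smooth waveforms here)
coeffs = Xc @ prototypes.T           # each sample's coordinates in that basis
approx = coeffs @ prototypes         # rank-3 reconstruction of the centered data
print(np.allclose(Xc, approx))       # True: the toy data really is rank 3
```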

------
holografix
Very nice, simple post. PCA is one of the topics that always made me scratch
my head, as it wasn't directly applicable to training a model in Andrew Ng's
Machine Learning course.

------
omginternets
I remember PCA really "clicked" when I saw this intuitive explanation:
[https://youtu.be/BfTMmoDFXyE](https://youtu.be/BfTMmoDFXyE)

------
dang
Previously at
[https://news.ycombinator.com/item?id=9040266](https://news.ycombinator.com/item?id=9040266).

------
pen2l
As someone who knows nothing about machine learning and nothing about PCA
(well, until now :)), can someone please explain how the two relate to each
other? Is one of them a subset of the other, or what?

~~~
Longwelwind
Machine learning (and more specifically here, supervised learning) is about
predicting a specific attribute of a new sample, based on the attributes of
the samples that you've already acquired. For example, if you have access to
a bank's database of clients, containing attributes such as their income,
their age, their occupation, and whether or not the bank agreed to give them
a loan, you may want to create a system that, based on this database, can
predict whether a new person will get a loan or not.

It turns out that having too many features is not necessarily a good thing, a
phenomenon called the curse of dimensionality.

Due to this, we are interested in trying to reduce the number of attributes
our algorithm will process. There are two big categories of methods to do
that: feature selection and feature extraction.

In feature selection, you try to select the original attributes that are
"best" at predicting your value: for example, by computing the statistical
correlation between each attribute and the value you want to predict, and
choosing those with the highest correlations.

In feature extraction, you create new attributes that are linear combinations
of the original attributes. PCA is a feature extraction algorithm.
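
A minimal sketch of the contrast with scikit-learn, using a made-up stand-in
for the bank data (the attributes here are just random numbers, not real
client features):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

# Hypothetical stand-in: X holds client attributes, y is whether a loan
# was granted (0/1).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)

# Feature selection: keep the 3 original attributes that score best
# against the target (here via an ANOVA F-test).
X_selected = SelectKBest(f_classif, k=3).fit_transform(X, y)

# Feature extraction: build 3 new attributes as linear combinations of
# all the original ones (the first 3 principal components).
X_extracted = PCA(n_components=3).fit_transform(X)

print(X_selected.shape, X_extracted.shape)  # (1000, 3) (1000, 3)
```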

~~~
pen2l
Beautiful explanation, thank you.

------
KhalilK
PCA can also help you study the relationships between variables.

I recently used it for a class project to explore how certain French cities
are distributed with respect to socio-economic variables.

[http://khalil.kacem.xyz:3838/#section-
variables](http://khalil.kacem.xyz:3838/#section-variables)

You can see that Security and Economic Activity are opposite for example.
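
That kind of reading comes from the component loadings: variables whose
loadings on a component have opposite signs pull in opposite directions along
it. A minimal sketch, assuming a standardized DataFrame `df` of city-level
variables (the column names are hypothetical):

```python
import pandas as pd
from sklearn.decomposition import PCA

# df is assumed to hold one row per city and standardized socio-economic
# columns (e.g. hypothetical "security", "economic_activity", ...).
pca = PCA(n_components=2).fit(df)
loadings = pd.DataFrame(pca.components_.T,
                        index=df.columns,
                        columns=['PC1', 'PC2'])
print(loadings.sort_values('PC1'))  # opposite-signed loadings = opposite variables
```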

------
Quasimoto3000
This is great, thanks for sharing.

