

Principal Component Analysis - karamazov
https://datanitro.com/blog/2012/08/17/PCA/

======
lbarrow
The author is basically using a linear algebra tool for creating orthogonal
basis vectors from a matrix of stock prices. (PCA is like eigenvector
decomposition, but, computed via the SVD, it works on rectangular matrices
too. In fact, unlike many operations, it's very fast on unbalanced
rectangular matrices!) Since these vectors are, by construction,
uncorrelated, they can be very useful in building CAPM-balanced stock
portfolios.
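
(A quick numpy sketch of the "uncorrelated by construction" point -- the data
and variable names here are made up, not from the article: project centered
data onto the right singular vectors, and the covariance of the scores comes
out diagonal.)

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 250 daily observations of 6 correlated "stocks"
# (one shared factor plus noise).
base = rng.normal(size=(250, 1))
X = base + 0.5 * rng.normal(size=(250, 6))

Xc = X - X.mean(axis=0)                   # center each column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                        # projections onto the PCs

# Off-diagonal covariances of the scores are zero up to roundoff.
C = np.cov(scores, rowvar=False)
off_diag = C - np.diag(np.diag(C))
print(np.max(np.abs(off_diag)))           # ~ machine precision
```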

Using the PCA is great in this situation, but people often run into traps when
using these sorts of spectral-decomposition methods on real world data.

The most obvious is that they try to interpret what the vectors "represent".
Sometimes this is reasonable -- if you did a similar experiment on the stock
price of energy companies, the strongest vector probably really would be
closely correlated with the price of oil. But aside from unusual situations
like that, interpreting the "meaning" of spectral vectors is a fool's errand.

~~~
mturmon
It's often true that you can figure out what the first handful (say, 3 to 6)
PCA components mean, in a large problem.

The first is usually the mean of the quantities. It is typical in practice to
compute PCA by using the SVD of the data itself; if you subtract the mean
first, then of course it will not appear as the first PCA component. In
matlab, this is literally a one-liner using the svd of the original data --
not even forming a covariance matrix.

No, the people who do this don't care if you know what the Karhunen-Loeve
decomposition is, they just use the one-liner:

    
    
      [U,S,V] = svd(X)
    

Anyway, after the mean, you get the varying components. It's smart to plot
these somehow to interpret their meaning.
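
(The claim above -- that the mean shows up as the first component if you
don't subtract it first -- is easy to check in numpy with made-up data: the
first right singular vector of the raw matrix lines up with the mean row.)

```python
import numpy as np

rng = np.random.default_rng(1)
mean = np.array([100.0, 50.0, 25.0, 10.0])       # hypothetical price levels
X = mean + rng.normal(scale=1.0, size=(500, 4))  # small fluctuations on top

# SVD of the *uncentered* data, as in the MATLAB one-liner above.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]                                        # first PCA component

cosine = abs(v1 @ mean) / np.linalg.norm(mean)    # |cos angle| with the mean
print(cosine)                                     # very close to 1
```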

The post should have plotted the time history of the 6 stocks together with
the time history of each PC, then some pattern might have suggested itself.
The first PC could be as simple as "GOOG,AMZN,AAPL,AKAM going up, MSFT steady,
and FB going down", given the stocks mentioned and their weighting.

The classic example is (as mentioned elsewhere on the thread) eigenfaces (on
Wikipedia), where PCA was applied to face images: various features like eyes,
foreheads, and mouths are emphasized, plus "second-order" features like edges
_around_ the eyes, noses, and mouths. If you try it yourself, what you find
is that adding more of these second-order features to a face (literally
adding, as in:

    
    
      new_image = old_image + alpha * second_order_feature
    

where alpha is a small scalar) will shift the nose left or right, or make the
mouth bigger.

People have done the same thing with natural images, and out pop things like
2d wavelets (the Gabor filters, <http://en.wikipedia.org/wiki/Gabor_filter>).
It's somewhat magical, because you went in with no information, and out pops
this structure, which also characterizes (surprise!) the human visual cortex.

Other classic examples are in atmosphere/weather analysis, where ENSO ("el
nino") will pop out of analysis of temperature and pressure fields in the
Pacific ocean.

~~~
psb217
FYI, Gabor-like filters pop out from doing ICA (i.e. Independent Components
Analysis), not PCA. While PCA looks for orthogonal vectors onto which the
data's projection has maximal variance (among other properties), ICA, roughly
speaking, looks for a set of vectors (not necessarily orthogonal) onto which
the data's projection has maximal kurtosis (among other properties).

It is the kurtosis-maximization of ICA that tends to produce filters mimicking
those found in (early layers of) visual cortex. Hence, the production of such
filters by techniques like "sparse coding" and "sparse autoencoders", which
explicitly pursue highly-kurtotic representations of the training data. PCA,
on the other hand, tends to produce checkerboard (i.e. 2d sinusoidal) filters
of various frequencies when trained on "natural image patches".

See: "The 'independent components' of natural scenes are edge filters" by Bell
and Sejnowski, 1997.

~~~
mturmon
Thanks for the reminder. I was thinking of this 1991 paper, which I ran into a
long time ago:

[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.1...](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.192)

They used a (linear) "neural network" with gradient descent training that
implemented PCA (kind of an iterative Gram-Schmidt process), and got Gabor-
like filters. I think a lot of people have done similar experiments, with
varying results.

~~~
psb217
I hadn't seen that paper before; thanks for the reference. I read through it
and saw that they were reweighting the sampled image patches with a Gaussian
mask prior to learning, which explains how they got Gabor-like filters. The
masking effectively forced the learned filters to have localized receptive
fields, and locality vs. nonlocality is generally one of the (visually)
clearest differences between filters learned with ICA and with PCA.

In other words, the Gaussian-modulated part of Gaussian-modulated sinusoids
was built into their learning process, rather than appearing as an emergent
property. I also chuckled a bit when they described how computing eigenvectors
for 4096x4096 matrices was "beyond reasonable computation".

------
btilly
PCA is a very useful tool in lots of places. But be warned that when you use
it on stocks, you'll find correlations, make your investment, then discover
that during a financial crisis all sorts of things that were not previously
correlated, now are. Thus your analysis falls apart at exactly the moment you
would least want it to do so.

Incidentally, if you take answers to a wide variety of questions that are
meant to test intelligence, your score on the first component of a PCA
analysis should be fairly well correlated with IQ or your SAT score. The
second component should be reasonably well correlated with the difference
between your math and verbal scores on the SAT. And people have much less
variability on the third component than on the first two.

~~~
agentq
In financial practice, asset-level PCA isn't as common, especially in systems
where covariance estimation is fraught with misspecification errors. Instead,
individual securities are first condensed into factors (e.g., for equities,
some examples are book/price, momentum, large vs. small cap, etc.).

~~~
azmenthe
Yes. The fund I work for has a very successful track record and we take all
PCA (on factors) with a huge grain of salt.

Also any strategy that has more than a 20% thesis alignment on PCA (on
factors!) is most likely laughed at.

------
robert00700
Nice to see PCA in an HN article, it's a very powerful tool.

For those struggling to get the example in this article, I find PCA easier to
understand with visual examples, and in fewer dimensions (try
<http://en.wikipedia.org/wiki/File:GaussianScatterPCA.png>)

Note how this dataset is two dimensional in nature, and PCA yields two
vectors. The first gives the direction of greatest variation, and the second
gives the direction of greatest remaining variation, orthogonal to the first.
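
(The linked 2-D picture, as a numpy sketch with a synthetic cloud of my own:
the first PC recovers the long axis of a correlated Gaussian scatter, and the
second PC is orthogonal to it.)

```python
import numpy as np

rng = np.random.default_rng(2)
# Stretch a standard Gaussian along a 30-degree axis, squash the other.
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = (rng.normal(size=(2000, 2)) * np.array([3.0, 0.5])) @ R.T

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

print(Vt[0])          # ~ ±(cos 30°, sin 30°): the long axis, up to sign
print(Vt[0] @ Vt[1])  # ~0: the two PCs are orthogonal
```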

An awesome use of PCA is for facial recognition, a method called 'Eigenfaces'
<http://en.wikipedia.org/wiki/Eigenface>

~~~
j2kun
I wrote a blog post with more detail and lots of intuitive examples. See
<http://jeremykun.wordpress.com/2011/07/27/eigenfaces/>

------
tel
PCA goes far deeper than meets the eye. For instance, it's a well-known
phenomenon that too much dimensionality can actually drive predictor
performance toward chance, but PCA can mitigate that. It's basically the
bread and butter of practical unsupervised learning.

~~~
mturmon
"bread and butter of practical unsupervised learning" -- true, although I
might have said "exploratory data analysis".

If you can make a vector out of it somehow, it can't hurt to try PCA. Because
you don't have to figure out some fancy tailored model, or really (cough,
cough) understand much about the data at all. (It sounds like I'm being
sarcastic, but I'm serious -- sometimes all you want is a quick look.)

Unsupervised clustering is a similar technique.

~~~
tel
I find that more often than expected, PCA (or maybe MDS) gets a majority of
the performance of any kind of unsupervised method. If you're really
interested in exploring the data and methodologies, then PCA is a poor
stopping point... but if you just want something that works, it's surprising
how often PCA's tradeoffs turn out to be good ones.

All the obvious caveats apply to that whole line of thought, though.

------
misiti3780
PCA can also be used for compression

[http://www.willamette.edu/~gorr/classes/cs449/Unsupervised/p...](http://www.willamette.edu/~gorr/classes/cs449/Unsupervised/pca.html)

Also worth noting: Apache Mahout supports PCA -- you can perform this type of
analysis on large matrices pretty easily these days
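
(A minimal sketch of PCA-as-compression, on made-up low-rank-ish data: keep
the top k components and store only k basis vectors plus k scores per row,
and the reconstruction error stays small.)

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: 200 samples in 50 dimensions, mostly rank 3.
W = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 50))
X = W + 0.01 * rng.normal(size=(200, 50))

mu = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)

k = 3                                       # components kept
X_hat = mu + (U[:, :k] * s[:k]) @ Vt[:k]    # rank-k reconstruction

rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(rel_err)   # small: the top 3 PCs capture almost everything
```

Storage drops from 200x50 numbers to roughly k x (200 + 50), at the cost of
that reconstruction error.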

------
mturmon
This expository post lined up the 6 stocks and computed the SVD of the time
history of all 6 together. This shows how the 6 stocks correlate.

You can do it another way. Run a sliding window across one single stock, line
up all the resulting vectors, and then take the SVD of (err...apply PCA to)
that. That is, if you started with a single-stock time history:

    
    
      x1, x2, x3...
    

then form:

    
    
      z1 = [x1 x2 x3]
      z2 = [x2 x3 x4]
      z3 = [x3 x4 x5]
    

etc., and use PCA on the z's instead of the x's. (In practice, you'd make the
z's much longer.)

This will extract seasonal variability (on all kinds of scales -- not just
annual). One name for it is Singular Spectrum Analysis
(<http://en.wikipedia.org/wiki/Singular_spectrum_analysis>)
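
(The sliding-window construction above -- SSA's "trajectory matrix" -- in
numpy, using a toy periodic series in place of a real stock history: a pure
sinusoid yields a trajectory matrix of rank 2, so only two singular values
survive.)

```python
import numpy as np

# Toy "seasonal" single-stock series: a sinusoid with period 25.
x = np.sin(2 * np.pi * np.arange(300) / 25)
L = 50   # window length (the "much longer" z's)

# Rows are z1 = x[0:L], z2 = x[1:L+1], ... -- a Hankel matrix.
Z = np.lib.stride_tricks.sliding_window_view(x, L)

U, s, Vt = np.linalg.svd(Z, full_matrices=False)
# A pure sinusoid lives in the span of a sin and a cos template,
# so everything past the first pair of singular values is ~0.
print(s[2] / s[0])   # ~0
```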

------
eykanal
For what it's worth, the best PCA tutorial I've seen online is this blog post,
which uses plots to describe the technique:

<http://stats.stackexchange.com/a/2700/2019>

PCA is nothing more than a "basis shift", or changing where the x and y axes
are placed. This image-based tutorial makes the idea very intuitive.
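
(The "basis shift" made literal, with a synthetic cloud: projecting onto the
PCs is just a rotation of the axes, so pairwise distances are unchanged and
the covariance becomes diagonal in the new coordinates.)

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 2)) @ np.array([[2.0, 1.0], [0.0, 1.0]])
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Y = Xc @ Vt.T                      # same points, new (rotated) axes

# Rotation: pairwise distances are preserved...
d_old = np.linalg.norm(Xc[0] - Xc[1])
d_new = np.linalg.norm(Y[0] - Y[1])
print(abs(d_old - d_new))          # ~0

# ...and the covariance is diagonal in the new basis.
C = np.cov(Y, rowvar=False)
print(abs(C[0, 1]))                # ~0
```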

