Principal Component Projection Without Principal Component Analysis (arxiv.org)
117 points by beefman on Feb 24, 2016 | 16 comments

I don't really see a compelling case for this method. The empirical convergence rates are clearly polynomial, while subspace iteration methods (e.g., Krylov) are "in theory" exponential and in practice pretty good. There is a bit of hand-waving about why Krylov methods are not as good (the theory is supposedly not robust to noise), but the techniques for robustifying Krylov-type subspace iterations, such as restarts, are pretty mature.

More than that, the method actually appears to be an iterative "prox" method. These things are very well studied in the convex analysis literature. I wouldn't be surprised if this already appears as a special case of an algorithm in the literature somewhere.
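For readers unfamiliar with the subspace iteration mentioned above, here is a minimal sketch of the plain block power (orthogonal) iteration. It is not a Krylov method such as Lanczos, and not the paper's algorithm. The function name `subspace_iteration` and the synthetic matrix are assumptions made for illustration. For a symmetric positive semidefinite matrix with a gap between the k-th and (k+1)-th eigenvalues, the subspace error shrinks geometrically in the ratio of those eigenvalues, which is the "exponential in theory" behavior.

```python
import numpy as np

def subspace_iteration(A, k, n_iter=100, seed=0):
    """Approximate the top-k eigenspace of a symmetric PSD matrix A.

    Hypothetical helper for illustration: plain orthogonal iteration.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal starting basis.
    Q, _ = np.linalg.qr(rng.normal(size=(A.shape[0], k)))
    for _ in range(n_iter):
        # Multiply by A, then re-orthonormalize to keep the basis stable.
        Q, _ = np.linalg.qr(A @ Q)
    return Q

# Synthetic test matrix (an assumption): three large eigenvalues,
# then a clear gap down to the remaining 47.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.normal(size=(50, 50)))
eigvals = np.concatenate([[10.0, 9.0, 8.0], np.linspace(1.0, 0.1, 47)])
A_demo = (U * eigvals) @ U.T

Q_demo = subspace_iteration(A_demo, k=3)

# Compare projectors, since the basis itself is only unique up to rotation.
P_true = U[:, :3] @ U[:, :3].T
err = np.linalg.norm(Q_demo @ Q_demo.T - P_true)
```

With a smaller eigenvalue gap the same loop needs many more iterations, which is where Krylov methods and restarts usually come in.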

A bit of a meta question, but what is the current thinking regarding the use of PCA in understanding data(sets)? My recollection is that PCA was somewhat en vogue in astronomy 8–10 years ago. I saw it applied to various mid-infrared data, but it seemed difficult to actually translate the principal components into useful physical knowledge about the datasets or the astronomical objects. Since then, I rarely see astronomy papers with PCA analysis, and even then, the PCA analysis doesn't seem to contribute much to the physical understanding of the objects being studied.

Is this just a case of PCA being ill-suited to the analysis of these datasets in astronomy? Or is it a more general problem that PCA can reduce datasets to arbitrary component vectors but those vectors may not contain easily quantifiable physical information (but might contain predictive power, if an understanding of the underlying physical system is not the goal)?

The thing with PCA for dimensionality reduction in regression problems, specifically, is that it carries an assumption: that small variations in your predictor variables will not account for large variation in your dependent variable. This may or may not be justified. If you regard small variations as noise then by all means throw them away; if you don't, then you risk throwing out a minor component X that has a major influence on Y.

Regularized regression such as ridge regression can achieve the same thing without the risk of throwing away small but significant variation. So I think a current trend is to replace PCA+regression with ridge regression. This paper seems to take that trend a step further by replacing PCA itself with ridge regression even when the desired outcome is PCA, which is neat.
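A small synthetic sketch of the risk described above, under made-up assumptions: two predictors, where the low-variance one drives the response entirely. PCA + regression that keeps only the top component discards that direction, while ridge regression shrinks it but keeps it. The data, the penalty `lam = 0.1`, and the helper `r_squared` are all illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two predictors: one with high variance, one with low variance.
x_big = rng.normal(scale=10.0, size=n)
x_small = rng.normal(scale=0.1, size=n)
X = np.column_stack([x_big, x_small])

# Assumption for illustration: y depends only on the low-variance predictor.
y = 50.0 * x_small + rng.normal(scale=0.1, size=n)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

def r_squared(y_true, y_pred):
    """Fraction of variance explained (y_true is already centered here)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# PCA + regression: project onto the top principal component, then fit.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:1].T
beta_pcr, *_ = np.linalg.lstsq(Z, yc, rcond=None)
r2_pcr = r_squared(yc, Z @ beta_pcr)

# Ridge regression: shrinks every direction (low-variance ones the most)
# but discards none of them.
lam = 0.1
beta_ridge = np.linalg.solve(Xc.T @ Xc + lam * np.eye(2), Xc.T @ yc)
r2_ridge = r_squared(yc, Xc @ beta_ridge)

print(f"PCA(1) + regression R^2: {r2_pcr:.3f}")
print(f"ridge regression R^2:    {r2_ridge:.3f}")
```

In this setup the one-component fit explains almost none of the variance in `y`, while the ridge fit explains nearly all of it. A large enough `lam` would eventually suppress the small direction too, so the penalty still has to be chosen with care.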

PCA and similar techniques are quite useful for exploratory data analysis in bioinformatics, in particular determining which experimental factors or batch effects have the largest influence on the data. Generally, if you can't identify your effect of interest in the first few principal components of a gene expression dataset, you're going to find few, if any, significant genes.

It's used in many quantitative trading models [1]. However, the time it takes to compute the eigenvectors is irrelevant to the field, because the stock correlation matrix to which the PCA is applied is generally a long-term correlation matrix that doesn't need to be updated too frequently.

[1] https://www.math.nyu.edu/faculty/avellane/AvellanedaLeeStatA...

PCA is useful as a pre-processing step to reduce the size of a data set to be used as the input to a machine learning algorithm.

But then do you have to PCA all your input test data in the future too?

the output of a PCA process is a system of linear combinations of the data. so yes, you'd take the principal components of your future inputs, but the process would be super fast (it's a linear transformation with known parameters) since you don't need to rerun anything.

you can also do online PCA (updating the model as you get new data) but i'm not sure about runtime/computation requirements.

I think the paper disagrees w/ you: "Computing principal components can be an expensive task..."

The result of PCA is a linear projection matrix. Applying that projection is fast.

You learn the projection from an initial dataset. That is "slow". You apply the projection to new data. That is "fast".
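A minimal NumPy sketch of that split, assuming rows are samples. The function names `fit_pca` and `transform` are made up for illustration and are not any particular library's API.

```python
import numpy as np

def fit_pca(X, k):
    """Learn a k-dimensional PCA projection from X (rows are samples)."""
    mean = X.mean(axis=0)
    # The "slow" step: an SVD of the centered training data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    # Columns of the returned matrix are the top-k principal directions.
    return mean, Vt[:k].T

def transform(X_new, mean, components):
    """The "fast" step: center with the stored mean, then one matrix product."""
    return (X_new - mean) @ components

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 20))
X_test = rng.normal(size=(5, 20))

mean, components = fit_pca(X_train, k=3)   # done once
Z_test = transform(X_test, mean, components)  # done for every new batch
print(Z_test.shape)  # (5, 3)
```

Note that new data is centered with the training mean, not its own mean, so that it lands in the same coordinate system as the training data.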

thanks for explaining

I think you misunderstood the OP. Once you have the PCs (and loadings) for a dataset, converting new data into the same axis system is just a linear conversion.

One alternative to look at for some datasets is nonnegative matrix factorization. It's similar to PCA, but every element of each basis vector has to be nonnegative. This tends to make the results a lot easier to interpret.
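As a rough sketch of what that factorization looks like in practice, here is one common way to compute it: multiplicative updates for the squared-error objective (the Lee and Seung scheme). The function name `nmf`, the iteration count, and the synthetic data are assumptions for illustration; production libraries use more careful initialization and stopping rules.

```python
import numpy as np

def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (n x m) as W @ H with W, H >= 0.

    Hypothetical helper: multiplicative updates for ||V - W H||_F^2.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so entries
        # that start nonnegative stay nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic nonnegative data with an exact rank-3 structure (an assumption).
rng = np.random.default_rng(1)
V = rng.random((30, 3)) @ rng.random((3, 12))

W, H = nmf(V, k=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the rows of `H` can only add to each other and never cancel, they tend to look like parts of the data, which is where the interpretability comes from. Unlike PCA, the result depends on the initialization and is not unique.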

Basically, the purpose of PCA is to reduce the dimensions of the dataset by finding components that most influence the data. I think your last sentence is quite accurate to describe what PCA is.

Some background on classic PCA I just wrote up: http://www.eggie5.com/69-dimensionality-reduction-using-pca Interesting read in light of this...

NIPALS has been available for decades to do this. I don't see it referenced in this manuscript.
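For context, a minimal sketch of NIPALS as it is usually described: extract one component at a time with a power-style iteration on the data matrix, then deflate and repeat. The function name `nipals_pca`, the tolerance, and the synthetic data are assumptions for illustration. This is the classic algorithm, not the method from the paper.

```python
import numpy as np

def nipals_pca(X, k, n_iter=500, tol=1e-10):
    """Return (scores, loadings) for the first k principal components.

    Hypothetical helper: textbook NIPALS with deflation.
    """
    X = X - X.mean(axis=0)  # work on a centered copy
    scores, loadings = [], []
    for _comp in range(k):
        # Start from the column with the largest remaining variance.
        t = X[:, np.argmax(X.var(axis=0))].copy()
        for _it in range(n_iter):
            p = X.T @ t / (t @ t)      # regress columns of X on the score
            p /= np.linalg.norm(p)     # normalize the loading vector
            t_new = X @ p              # updated score vector
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        scores.append(t)
        loadings.append(p)
        X = X - np.outer(t, p)         # deflate: remove this component
    return np.column_stack(scores), np.column_stack(loadings)

# Synthetic data with well-separated variances (an assumption).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2])

T, P = nipals_pca(X_demo, k=2)
```

The loadings should match the leading right singular vectors of the centered data up to sign. Convergence slows down when consecutive singular values are close together.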
