I often wonder what society would be like if we treated engineers and scientists the way we do sports and entertainment stars, and what you said I think falls loosely into that realm of thinking.
[Scene: Outside Conference Centre. Scientists from all over the world arrive. Photographers take photos of them and fans in the crowd wave papers for them to sign.]
Woman #2: [shouting] I love you!
[Farnsworth steps out of a limo. Joan Rivers' Head is commentating at the star-studded event.]
This is a strange comment, since my primary use of PCA/SVD is as a first step in understanding the latent factors that are driving the data. Latent factors typically cover all of the important things that anyone running a business or deciding policy cares about: customer engagement, patient well-being, employee happiness, etc. all represent latent factors.
If you have ever wanted to perform data analysis and gain some exciting insight into user behavior, PCA/SVD will get you there pretty quickly. It is one of the most powerful tools in my arsenal when I'm working on a project that requires interpretability.
The "loadings" in PC and the V matrix in SVD both contain information about how the original feature space correlates with the new projection. This can easily show thing things like "User's who do X,Y and NOT Z are more likely to purchase".
Likewise, LSA (Latent Semantic Analysis/Indexing) on a term-frequency matrix gives you a first pass at a semantic embedding. You'll notice, for example, that "dog" and "cat" project onto a common component in the new space, which can be interpreted as "pets".
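A quick illustration on a toy corpus, assuming scikit-learn's `TruncatedSVD` as the LSA step (its `components_` matrix plays the role of V^T):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "my dog likes food",
    "my cat likes food",
    "dog and cat are pets",
    "stocks fell sharply today",
    "markets and stocks rallied",
]

# term-frequency matrix: rows = documents, columns = terms
vec = CountVectorizer()
tf = vec.fit_transform(docs)

# LSA = truncated SVD of the term-frequency matrix
lsa = TruncatedSVD(n_components=2, random_state=0).fit(tf)

# columns of components_ give each term's coordinates in the latent
# space; "dog" and "cat" should land near each other, "stocks" elsewhere
terms = list(vec.get_feature_names_out())
for term in ("dog", "cat", "stocks"):
    print(term, lsa.components_[:, terms.index(term)].round(2))
```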
> I've never seen it result in a more accurate model. I could see it potentially being useful in situations where you're forced to use a linear/logistic model
PCA and SVD are linear transformations of the data and shouldn't give you any accuracy increase on a linear model. However, they can be very helpful in transforming extremely high-dimensional, sparse vectors into lower-dimensional, dense representations, which can provide significant storage and performance benefits.
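A sketch of that trade-off on a made-up sparse user-item matrix, using scikit-learn's `TruncatedSVD` (which accepts sparse input directly):

```python
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD

# hypothetical sparse interaction matrix: 10k users x 50k items, ~0.1% filled
X = sp.random(10_000, 50_000, density=0.001, format="csr", random_state=0)

svd = TruncatedSVD(n_components=64, random_state=0)
Z = svd.fit_transform(X)  # dense 10k x 64 representation

print(X.shape, "->", Z.shape)
# downstream models now see 64 dense features per user instead of a
# 50k-dimensional sparse vector; being a linear map, this can't add
# information a linear model couldn't already extract, but it's far
# cheaper to store and compute with
```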
> NNs, etc. are all able to tease out pretty complicated relationships among features on their own.
PCA is literally identical to an autoencoder with no non-linear layers trained to minimize MSE. It is a very good first step toward understanding what your NN will eventually do. After all, an NN is a stack of non-linear transformations whose job is to make the final vector space linearly separable.
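To make the equivalence concrete, here's a toy numpy sketch (sizes, learning rate, and iteration count are arbitrary choices for this demo): a tied-weight linear autoencoder trained by gradient descent on MSE should end up at the same reconstruction error as projecting onto the top principal components:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))  # correlated data
X -= X.mean(axis=0)                                          # centre, as PCA does

k = 3
# PCA reconstruction error via SVD
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_pca = X @ Vt[:k].T @ Vt[:k]
pca_mse = np.mean((X - X_pca) ** 2)

# linear autoencoder: encode z = X W, decode X_hat = z W^T, no nonlinearity
W = rng.normal(scale=0.1, size=(10, k))
lr = 1e-3  # step size tuned for this toy problem
for _ in range(5000):
    R = X - X @ W @ W.T                   # reconstruction residual
    grad = -2 * (X.T @ R + R.T @ X) @ W   # gradient of ||R||_F^2 w.r.t. W
    W -= lr * grad / len(X)

ae_mse = np.mean((X - X @ W @ W.T) ** 2)
print(f"PCA MSE: {pca_mse:.4f}  autoencoder MSE: {ae_mse:.4f}")
# the two errors should agree: the trained W spans the same subspace
# as the top principal components (up to rotation)
```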
If you like PCA and find it works in your particular domains, all the more power to you. I just don't find it practically useful for fitting better models, and I'm generally suspicious of the insights drawn from it and other unsupervised techniques, especially given how much of the meaning of the results gets imparted by the observer, who often has a particular story they'd like to tell.
We use PCA quite a lot at my quant firm to do something similar to clustering in high-dimensional spaces. A simple use case would be arranging stocks so that stocks that move similarly to one another are grouped close together.
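A minimal sketch with simulated returns driven by two hidden sectors (a real pipeline would be far more involved): each stock's loadings on the top components become its coordinates, and stocks that move together land close to each other:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# simulated returns matrix: rows = trading days, columns = stocks,
# with two hidden "sectors" driving comovement
rng = np.random.default_rng(0)
days, per_sector = 500, 5
sector_a = rng.normal(size=(days, 1))
sector_b = rng.normal(size=(days, 1))
returns = np.hstack([
    sector_a + 0.3 * rng.normal(size=(days, per_sector)),
    sector_b + 0.3 * rng.normal(size=(days, per_sector)),
])

# each stock's loadings on the top components act as its coordinates
pca = PCA(n_components=2).fit(returns)
coords = pca.components_.T  # shape: (n_stocks, 2)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)
print(labels)  # the two sectors should separate cleanly
```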
Another use case for PCA is breaking stocks down into constituent components, for example being able to express the price of a stock as a linear combination of factors: MSFT = 5% oil + 10% interest rates + 40% tech sector + ...
You can also do this for things like ETFs: in principle an ETF might be made up of 100 stocks, but in practice only 10 of those stocks really determine the price, so if you're engaged in ETF market making you can hold a neutral portfolio by carrying the ETF long and a small handful of stocks short.
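Here's a rough sketch of the decomposition step on simulated returns. One caveat: PCA gives you statistical factors, so mapping them onto named drivers like "oil" or "interest rates" is an interpretive step done afterwards by inspecting the loadings:

```python
import numpy as np

# simulated universe of returns: rows = days, columns = stocks,
# with one shared market factor layered on top of idiosyncratic noise
rng = np.random.default_rng(1)
returns = rng.normal(size=(500, 20))
returns += rng.normal(size=(500, 1))  # common market factor

# factor time series = projections onto the top principal components
X = returns - returns.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
factors = X @ Vt[:k].T  # (days, k) factor returns

# regress one stock on the factors to estimate its factor exposures,
# i.e. the coefficients in "stock = a * factor1 + b * factor2 + ..."
target = X[:, 0]
betas, *_ = np.linalg.lstsq(factors, target, rcond=None)
print("factor exposures:", betas.round(3))
```

Exposures like these are also what let you size the hedge: long the ETF, short the handful of names that carry most of the loading.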
if you know that your data comes from a stationary distribution, you can use it as a compression technique that reduces the computational demands on your model. sure, computing the initial svd or covariance matrix is expensive, but once you have it, the projection is just a matrix multiply and a vector subtraction (with the reverse being the same).
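a minimal sketch of that split between the one-time expensive fit and the cheap per-vector projection (made-up sizes, scikit-learn's PCA):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 256))

# the expensive part happens once, offline
pca = PCA(n_components=32).fit(X_train)
mu, C = pca.mean_, pca.components_  # C has shape (32, 256)

# at runtime, projecting a new vector is a subtraction and a matmul...
x = rng.normal(size=256)
z = (x - mu) @ C.T

# ...and the reverse is the same operations in the other order
x_hat = z @ C + mu
print(z.shape, np.mean((x - x_hat) ** 2))
```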
if you have some high dimensional data and you just want to look at it, it's a pretty good start. not only does it give you a sense of whether the higher dimensions are just noise (by looking at the eigenspectrum), it also makes low dimensional plots possible.
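for example, with simulated data where 3 dimensions of signal are buried in 50:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 3 real dimensions of signal embedded in 50 noisy ones
signal = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 50))
X = signal + 0.1 * rng.normal(size=(1000, 50))

pca = PCA().fit(X)
print(pca.explained_variance_ratio_[:6].round(3))
# a sharp drop after the 3rd value says the rest is mostly noise;
# pca.transform(X)[:, :2] then gives coordinates for a 2-d scatter plot
```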
pca, cca and ica have been around for a very long time. i doubt "their time has passed."
but who knows, maybe i'm wrong.
Accommodating as many people as possible is good, but you can never accommodate everyone. The same goes for all of these "things programmers believe about X" lists, be it names or whatever. You absolutely need to provide a working product for the majority of people first, and the majority of people have at least two names, or in this case 5 fingers.