PCA is not a panacea (2013) (danluu.com)
10 points by herecomethefuzz on March 13, 2024 | hide | past | favorite | 4 comments



Not discussing the trivial example (for any model there exists a distribution and a dataset on which that model performs poorly). Just a general thought. Intro to ML teaches us this: if we want to "learn" a hypothesis class reasonably well from a finite sample, the class shouldn't be too complex; otherwise we lose precision and/or any guarantees. This implies that for any DL algorithm A(S) on sample S, there exists a data transformation g such that B(g(S)) will attain a lower* risk for some simpler algorithm B. The question is not whether linear models are good or bad, but how complex/expensive the transformation g is.
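A minimal sketch of the point about g: concentric-ring data defeats any linear separator in the raw coordinates, but after the simple transform g(x) = ||x||², a one-dimensional threshold (a very simple "algorithm B") classifies it perfectly. The data, radii, and threshold below are all illustrative choices, not from the comment or the article.

```python
import numpy as np

# Two concentric rings: inner ring (radius 1) = class 0, outer ring
# (radius 3) = class 1. No line in the plane separates them.
rng = np.random.default_rng(0)
n = 200
theta = rng.uniform(0, 2 * np.pi, n)
inner = np.c_[np.cos(theta[: n // 2]), np.sin(theta[: n // 2])] * 1.0
outer = np.c_[np.cos(theta[n // 2:]), np.sin(theta[n // 2:])] * 3.0
X = np.vstack([inner, outer])
y = np.r_[np.zeros(n // 2), np.ones(n // 2)]

# The transformation g: squared radius. In g-space a single threshold
# (here 4.0, between 1^2 and 3^2) is a perfect classifier.
g = (X ** 2).sum(axis=1)
pred = (g > 4.0).astype(float)
accuracy = (pred == y).mean()
print(accuracy)  # 1.0
```

Here g is cheap and obvious; the comment's point is that for real problems finding such a g can be the expensive part.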


Thanks for posting this.

I work in data science and I haven’t seen anybody claiming that PCA, an unsupervised technique, is a good classification (supervised) technique. I think this is the framing that the author is pursuing.

That said, some candidates might mention clustering because it's easy to understand and you can act on the result (e.g., this group scores high but they're also good customers, so they get special treatment).


I think PCA plus nearest neighbor was one of the earliest effective face-recognition algorithms, a.k.a. the eigenfaces method.
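A hedged sketch of the eigenfaces idea this comment mentions: project images onto the top principal components, then classify by nearest neighbor in that subspace. Synthetic vectors stand in for real face images here; the class count, noise level, and number of components are arbitrary illustrative choices.

```python
import numpy as np

# Synthetic stand-ins for face images: 3 "identities", 10 noisy
# 64-dimensional samples each.
rng = np.random.default_rng(1)
n_classes, per_class, d = 3, 10, 64
prototypes = rng.normal(size=(n_classes, d))
X = np.vstack([p + 0.1 * rng.normal(size=(per_class, d)) for p in prototypes])
y = np.repeat(np.arange(n_classes), per_class)

# PCA via SVD of the mean-centered data; the top k right singular
# vectors play the role of "eigenfaces".
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
k = 5
Z = (X - mean) @ Vt[:k].T

# 1-nearest-neighbor in the PCA subspace, evaluated leave-one-out.
correct = 0
for i in range(len(Z)):
    dists = np.linalg.norm(Z - Z[i], axis=1)
    dists[i] = np.inf  # exclude the query point itself
    correct += y[np.argmin(dists)] == y[i]
accuracy = correct / len(Z)
print(accuracy)
```

This is PCA doing exactly what it's good at, dimensionality reduction before a distance-based classifier, rather than being a classifier itself.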


Finding rotations that maximize variance is not that far from Euclidean-distance-based clustering.
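One way to see this, as a sketch with purely synthetic numbers: when two well-separated blobs dominate the data, the between-cluster direction carries most of the variance, so the top PCA direction roughly coincides with the axis a Euclidean clustering method like k-means would split along.

```python
import numpy as np

# Two Gaussian blobs separated along the x-axis (separation 5,
# within-cluster std 1) -- illustrative values only.
rng = np.random.default_rng(2)
sep = np.array([5.0, 0.0])
X = np.vstack([rng.normal(size=(100, 2)) - sep / 2,
               rng.normal(size=(100, 2)) + sep / 2])

# First principal component via SVD of the centered data.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
pc1 = Vt[0]  # unit vector

# |cos| of the angle between PC1 and the true between-cluster
# direction; close to 1 means they nearly coincide.
alignment = abs(pc1 @ sep) / np.linalg.norm(sep)
print(round(alignment, 2))
```

With less separation relative to within-cluster spread, the alignment degrades, which is one version of the article's caution about leaning on PCA.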




