Not discussing the trivial example (since for any model there exists a distribution and a dataset on which it will perform poorly); just a general thought. Intro to ML teaches us this: if we want to "learn" a hypothesis class reasonably well from a finite sample, the class shouldn't be too complex; otherwise we lose precision and/or any guarantees. This suggests that for any DL algorithm A trained on a sample S, there exists a data transformation g such that B(g(S)) achieves a lower* risk for some simpler algorithm B. The question is not whether linear models are good or bad, but how complex/expensive the transformation g is.
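To make that concrete, here's a toy sketch (mine, not from the comment above; assumes scikit-learn) on the classic concentric-circles dataset: a linear model on the raw features sits near chance, but a cheap transform g that appends squared features turns the circular boundary into a hyperplane, and the same linear model nearly solves it.

```python
# Toy illustration: a cheap transform g can make a simple linear model B
# competitive on data where the raw features fail.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_circles(n_samples=1000, noise=0.1, factor=0.4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Linear model on raw features: the classes are not linearly separable.
raw = LogisticRegression().fit(X_tr, y_tr)
print("raw accuracy: ", raw.score(X_te, y_te))  # roughly chance level

# g(S): append squared features; x1^2 + x2^2 = r^2 makes the circle
# boundary a hyperplane in the lifted feature space.
def g(X):
    return np.hstack([X, X**2])

lifted = LogisticRegression().fit(g(X_tr), y_tr)
print("g(S) accuracy:", lifted.score(g(X_te), y_te))  # near 1.0
```

Here g is trivial to write down; the interesting cases are exactly the ones where the g you'd need is as expensive to find as the deep model itself.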
I work in data science and I haven't seen anybody claim that PCA, an unsupervised technique, is a good technique for classification (a supervised problem). I think that's the framing the author is pursuing.
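Where PCA does legitimately show up next to classification is as a preprocessing step inside a supervised pipeline; a minimal sketch (mine, assuming scikit-learn):

```python
# PCA's usual role: dimensionality reduction *inside* a pipeline,
# not the classifier itself.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)

# PCA compresses 64 pixel features to 16 components and never sees the
# labels; the logistic regression on top does the actual classifying.
clf = make_pipeline(PCA(n_components=16), LogisticRegression(max_iter=1000))
print(cross_val_score(clf, X, y).mean())
```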
That said, some candidates might mention clustering because it's easy to understand and you can attach an action to each segment (e.g., "this group scores high but they're also good customers, so they get special treatment").
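For instance, a sketch of that kind of actionable segmentation (mine, with made-up customer data; assumes scikit-learn and NumPy):

```python
# Segment customers so that each cluster can be mapped to a treatment.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical features: monthly spend and support tickets filed.
customers = np.column_stack([
    rng.gamma(shape=2.0, scale=50.0, size=500),    # spend
    rng.poisson(lam=2.0, size=500).astype(float),  # tickets
])

X = StandardScaler().fit_transform(customers)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Inspect each cluster's mean profile to decide on a per-segment action.
for k in range(3):
    spend, tickets = customers[labels == k].mean(axis=0)
    print(f"cluster {k}: avg spend ${spend:.0f}, avg tickets {tickets:.1f}")
```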