I don't think anyone actually doing NLP research has thought that BERT is always better than simpler methods. Linear classifiers with n-grams, or, even better, large-margin linear classifiers, are well known to be competitive with things like BERT on a variety of tasks, with orders of magnitude better runtime.
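To make that concrete, here's roughly what such a baseline looks like; a sketch assuming scikit-learn, with toy texts and labels I made up:

  # N-gram linear baseline; a sketch assuming scikit-learn and toy data.
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.pipeline import make_pipeline
  from sklearn.svm import LinearSVC  # large-margin linear classifier

  texts = ["the game went to overtime", "parliament passed the bill",
           "a stunning goal in the final minute", "the election results are in"]
  labels = ["sports", "politics", "sports", "politics"]

  # Word unigrams + bigrams; character n-grams are another common choice.
  clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
  clf.fit(texts, labels)
  print(clf.predict(["a late goal decided the match"]))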

In contrast, this gzip technique is considered a cute application of information theory, but even in academia it is rarely included in studies, because there are simpler and better techniques for NLP.
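For context, the gzip method is essentially: use compressed length as a cheap stand-in for Kolmogorov complexity, define a normalized compression distance (NCD), and classify with k-NN over the training set. A toy sketch (1-NN, made-up data):

  # Sketch of the gzip trick: compressed length approximates Kolmogorov
  # complexity, NCD gives a distance, nearest neighbor gives a label.
  import gzip

  def clen(s):
      return len(gzip.compress(s.encode()))

  def ncd(a, b):
      ca, cb = clen(a), clen(b)
      return (clen(a + " " + b) - min(ca, cb)) / max(ca, cb)

  train = [("the team won the match in overtime", "sports"),
           ("voters head to the polls on tuesday", "politics")]

  def classify(x):
      # 1-NN under NCD; the published version uses k-NN with larger k.
      return min(train, key=lambda pair: ncd(x, pair[0]))[1]

  print(classify("a late goal won the game"))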

Yes, if you are chasing the ultimate accuracy, then using an LLM (not necessarily BERT, either) is going to be best. But for a practical system, trading some accuracy for vastly improved runtime is usually a very good trade-off. And again, it depends on your domain. Topic classification? Stick with a linear model. Sentiment analysis? OK, here an LLM actually gives substantially better results, so it's worth the extra cost if sentiment is crucial to your application.

I personally like the CW algorithm I mentioned because it's relatively easy to implement and has excellent qualities. But if I were a dev looking for a ready-to-go, already-implemented production system, I'd go with vowpal wabbit and move up to an LLM if I'm not getting the accuracy I need for my application.
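For anyone curious about CW (Confidence-Weighted learning, Dredze/Crammer/Pereira 2008): it keeps a Gaussian over the weights and updates the mean and a per-feature variance on each example, so rarely-seen features move more. The sketch below is my recollection of the diagonal "variance" update; the closed-form alpha is the part to double-check against the paper:

  # Rough sketch of Confidence-Weighted learning with diagonal covariance,
  # after Dredze, Crammer & Pereira (2008). The closed-form alpha is my
  # recollection of their "variance" update; verify against the paper.
  import numpy as np
  from statistics import NormalDist

  class CW:
      def __init__(self, dim, eta=0.9):
          self.mu = np.zeros(dim)               # mean of the weight distribution
          self.sigma = np.ones(dim)             # diagonal covariance
          self.phi = NormalDist().inv_cdf(eta)  # confidence parameter

      def predict(self, x):
          return 1 if self.mu @ x >= 0 else -1

      def update(self, x, y):                   # y in {-1, +1}, x a nonzero vector
          phi = self.phi
          m = y * (self.mu @ x)                 # signed margin
          v = (self.sigma * x) @ x              # variance of the margin
          # Lagrange multiplier; the discriminant equals (1-2*phi*m)^2 + 8*phi^2*v,
          # so it is never negative.
          disc = (1 + 2 * phi * m) ** 2 - 8 * phi * (m - phi * v)
          alpha = max(0.0, (-(1 + 2 * phi * m) + np.sqrt(disc)) / (4 * phi * v))
          if alpha > 0:
              self.mu += alpha * y * self.sigma * x
              self.sigma = 1 / (1 / self.sigma + 2 * alpha * phi * x ** 2)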
