I stand by it.
In an ideal world where apps are totally regular and load is equally balanced and every request is equally expensive and libraries don't spawn threads, sure. Maybe it's fine to use limits. My experience, on the other hand, says that most apps are NOT regular, load-balancers sometimes don't actually balance, and the real costs of queries are often unpredictable.
This is not to say that everyone should set their limits to `1m` and cross their fingers.
If you want to do it scientifically:
Benchmark your app under a load that represents the high end of reality. If you are preparing for BFCM, triple that.
For these benchmarks, set CPU request = limit.
Measure the critical indicators. Vary the CPU request (and limit) up or down until the indicators are where you want them (e.g. p95 latency < 100ms).
If you provision too much CPU you will waste it. Maybe nobody cares about p95 @50ms vs @100ms. If you provision too little CPU, you won't meet your SLO under load.
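To make the benchmark setup concrete, here's a minimal sketch of the container resources stanza for that run (the names and numbers are invented for illustration, not from the original post):

```yaml
# Hypothetical benchmark Deployment fragment: CPU request = limit,
# so the pod cannot burst and the benchmark measures exactly what
# the request buys you.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-bench        # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp-bench
  template:
    metadata:
      labels:
        app: myapp-bench
    spec:
      containers:
      - name: app
        image: registry.example.com/myapp:bench   # hypothetical image
        resources:
          requests:
            cpu: "2"       # vary this between runs...
          limits:
            cpu: "2"       # ...keeping the limit equal to the request
```

Between runs you change only the `cpu` values (keeping request = limit) and re-check your indicators, e.g. p95 latency.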
Now you can ask: How much do I trust that benchmark? The truth is that accurate benchmarking is DAMN hard. However hard you think it is, it's way harder than that. Even within Google we only have a few apps that we REALLY trust the benchmarks on.
This is where I say to remove (or boost) the CPU limit. It's not going to change the scheduling or feasibility. If you don't use it, it doesn't cost you anything. If you DO use it, it was either idle or you stole it from someone else who was borrowing it anyway.
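In spec terms, that advice boils down to something like this (a sketch, with invented numbers):

```yaml
# Hypothetical production resources stanza: request what the benchmark
# said you need, and set no CPU limit at all.
resources:
  requests:
    cpu: "2"     # what the scheduler reserves for this container
  # no cpu limit: under a spike the container can borrow
  # otherwise-idle cycles on the node instead of being throttled
```

The request still drives scheduling exactly as before; only the throttling ceiling is gone.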
When you take that unexpected spike - some query-of-doom or handling more load than expected or ... whatever - one of two things happens. Either you have extra CPU you can use, or you don't. When you set CPU limits you remove one of those options.
As for HPA and VPA - sure, great, use them. We use that a LOT inside Google. But those don't act instantly - certainly not on the timescale of seconds. Why do you want a "brick-wall" at the end of your runway?
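For reference, a typical HPA looks like this (hypothetical names; the utilization target is a percentage of the CPU *request*, which is one more reason the request needs to be honest):

```yaml
# Hypothetical HPA: scales the Deployment when average CPU usage
# across pods exceeds 70% of their CPU request. Scaling happens on
# the controller's sync period plus metrics lag - tens of seconds,
# not milliseconds - so burst headroom covers the gap.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```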
What's the flip-side of this? Well, if you are wildly off in your request, or if you don't re-run your benchmarks periodically, you can come to depend on the "extra". One day that extra won't be there, and your SLOs will be demolished.
Lastly, if you are REALLY sophisticated, you can collect stats and build a model of how much CPU is "idle" at any given time, on average. That's paid-for and not-used. You can statistically over-commit your machines by lowering requests, packing a bit more work onto the node, and relying on your stats to maintain your SLO. This works best when your various workloads are very un-correlated :)
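Concretely, statistical over-commit shows up in the spec as nothing more than a lowered request (all numbers here are invented for illustration):

```yaml
# Hypothetical over-committed resources stanza. Suppose your stats say
# this workload's fleet-wide p99 CPU usage is ~300m against a 500m
# request: you might lower the request so the scheduler packs more
# pods per node, and rely on the stats (and burst) to hold your SLO.
resources:
  requests:
    cpu: "350m"   # was 500m; measured p99 usage ~300m (invented numbers)
  # no cpu limit: rare spikes borrow idle cycles from neighbors,
  # which is safe exactly when the neighbors' spikes are uncorrelated
```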
TL;DR: burstable CPU is a safety net. It has risks and requires some discipline to use properly, but for most users (even at Google) it is better than the alternative. But don't take it for granted!