
I'm always a bit confused about the CPU limit (for the pod). Some guides (and tools) advise always setting one, but this one [0] doesn't. The ops people I've worked with almost always want to lower that limit, and I have to insist on raising it (there's no way they'd disable it). Is there an ultimate best practice for this?

[0] https://learnk8s.io/production-best-practices




CPU limits are harmful if they strand resources that could have been put to use. I usually skip them for batch deployments and use them for latency-sensitive services. Doesn't seem like a security topic, though.


They are actually even worse for latency-sensitive workloads, because CFS with its default 100ms period will cause crap tail latency (especially for multithreaded processes, such as most Go programs).
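To put rough numbers on that throttling effect, here is a simplified model in Python. It is a sketch, not the kernel's actual accounting: `wall_time_ms` is a made-up helper, and it assumes the work starts at a period boundary, the threads run perfectly in parallel on free cores, and the cgroup is throttled as soon as it has used `limit_cpus * period_ms` of CPU time within a period.

```python
def wall_time_ms(cpu_work_ms, threads, limit_cpus, period_ms=100.0):
    """Wall-clock ms to finish cpu_work_ms of CPU time split evenly across threads.

    Simplified model (assumption): work starts at a period boundary, all
    threads run in parallel on free cores, and the cgroup is throttled once
    it has consumed limit_cpus * period_ms of CPU time in the current period.
    """
    if threads <= 0 or limit_cpus <= 0:
        raise ValueError("threads and limit_cpus must be positive")

    quota_ms = limit_cpus * period_ms  # CPU time allowed per period
    elapsed = 0.0
    remaining = float(cpu_work_ms)
    while True:
        # CPU time the threads can burn in one period: capped by the quota,
        # and by what `threads` cores can physically do in period_ms.
        burnable = min(quota_ms, threads * period_ms)
        if remaining <= burnable:
            # The rest of the work fits in this period without throttling.
            return elapsed + remaining / threads
        # Quota exhausted: the cgroup sits throttled until the next period.
        remaining -= burnable
        elapsed += period_ms


if __name__ == "__main__":
    print(wall_time_ms(240, threads=8, limit_cpus=8))
    print(wall_time_ms(240, threads=8, limit_cpus=2))
```

In this model, a request that needs 240ms of CPU time across 8 threads finishes in 30ms with an 8-CPU limit, but takes 105ms with a 2-CPU limit: it burns the 200ms quota in 25ms, sits throttled for the remaining 75ms of the period, and then needs 5ms more.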


Interesting, that's my impression too. As I understand it, a CPU limit artificially throttles the CPU even when that isn't necessary, wasting CPU cycles I could use. (Java programs in my case, but I imagine they're comparable to Go ones.)

Do you recommend disabling CPU limits in the general case?


We don't set them anywhere in prod and generally haven't had any issues. We always set CPU requests and alert if those are exceeded for prolonged periods, and we always set memory request = limit.
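As a rough illustration of that policy, the sketch below builds the `resources` stanza of a container spec as a Python dict: a CPU request with no CPU limit, and a memory request equal to the memory limit. The helper name `container_resources` and the values `500m` and `512Mi` are placeholders for illustration, not anything from the setup described above.

```python
import json


def container_resources(cpu_request, memory):
    """Build a Kubernetes container `resources` stanza.

    Policy sketched here: set a CPU request but no CPU limit, and set the
    memory request equal to the memory limit.
    """
    return {
        "requests": {"cpu": cpu_request, "memory": memory},
        "limits": {"memory": memory},  # deliberately no "cpu" key
    }


if __name__ == "__main__":
    # Placeholder values; pick real ones from your own capacity planning.
    print(json.dumps({"resources": container_resources("500m", "512Mi")}, indent=2))
```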


Yes, but I put limits on latency-sensitive (LS) workloads because I expect them to have a capacity plan, stick to it, and not abusively starve out batch workloads.


I think this is backwards. How are you planning on "sticking to it" when you're serving unpredictable user traffic? If requests are set appropriately everywhere, then it won't really starve batch, as the kernel will just scale everything to its respective cpu.shares when the CPU is fully saturated. This lets you weather spiky load with minimal latency impact and minimize spend.
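The proportional-sharing argument can be sketched with an idealized weighted fair-share model in Python. This is not the actual CFS scheduler: `fair_split` is a hypothetical helper that uses each workload's CPU request as its weight (standing in for cpu.shares), and the workload names and numbers are made up.

```python
def fair_split(capacity, requests, demand):
    """Split `capacity` CPUs among workloads, weighted by their CPU requests.

    Idealized model (assumption): a workload never gets more than it
    demands, and CPU it leaves unused is redistributed to the others in
    proportion to their requests. Requests must be positive.
    """
    alloc = {name: 0.0 for name in requests}
    active = set(requests)
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_weight = sum(requests[n] for n in active)
        # Workloads whose demand fits inside their weighted share of what is left.
        satisfied = {
            n for n in active
            if demand[n] <= remaining * requests[n] / total_weight
        }
        if not satisfied:
            # Saturated: everyone still active wants more than their share,
            # so the remainder is split strictly by weight.
            for n in active:
                alloc[n] = remaining * requests[n] / total_weight
            break
        for n in satisfied:
            alloc[n] = demand[n]
            remaining -= demand[n]
        active -= satisfied
    return alloc


if __name__ == "__main__":
    requests = {"api": 4, "batch": 4}  # CPU requests on an 8-CPU node
    print(fair_split(8, requests, {"api": 6.0, "batch": 8.0}))  # both busy
    print(fair_split(8, requests, {"api": 6.0, "batch": 1.0}))  # api spike, batch idle
```

In this model, when both workloads want more than the node has, each gets its requested 4 CPUs, so batch is not starved. When batch is mostly idle, the API service can burst to 6 CPUs without any limit getting in the way.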


It's weird that you are apparently a Borg user from Google, according to other discussions we've had, yet you question the value of hard-capping for latency-sensitive processes.


Borg SRE even ;) (former), and yes, I do question them. For one, Borg isn't using a 100ms CFS period, and it wasn't even standard CFS if I recall, so yes, I do question that outside of the limited Borg use case.



