For most workloads, a system thrashing as described is worse than most of the alternatives, including the ones listed below. There are probably a few workloads where thrashing is better, since an operator may still be able to get in despite the thrashing and do the right thing.

a) killing the malfunctioning process (this is often hard)

b) killing the biggest process

c) kernel panic / locking up

d) killing more or less random processes until memory pressure is relieved
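To make the difference between (b) and (d) concrete, here is a toy sketch. It is not how any real kernel's OOM killer works; the process table, the memory numbers, and the function names are all made up for illustration.

```python
import random

def pick_biggest(procs):
    """Option (b): choose the process using the most memory."""
    return max(procs, key=procs.get)

def pick_random(procs, rng=random):
    """Option (d): choose a victim more or less at random."""
    return rng.choice(list(procs))

def kill_until_relieved(procs, limit, choose):
    """Kill victims picked by `choose` until total usage fits under `limit`.

    `procs` maps a hypothetical process name to its memory use (arbitrary units).
    Returns the list of killed process names, in order.
    """
    procs = dict(procs)  # work on a copy, leave the caller's table alone
    killed = []
    while procs and sum(procs.values()) > limit:
        victim = choose(procs)
        killed.append(victim)
        del procs[victim]
    return killed

# Hypothetical machine: 1000 units of memory, one leaky process.
table = {"db": 600, "web": 300, "cron": 50, "leaky": 900}
print(kill_until_relieved(table, 1000, pick_biggest))
print(kill_until_relieved(table, 1000, pick_random))
```

In this made-up table the biggest process happens to be the malfunctioning one, so (b) looks as good as (a). Swap the sizes of "db" and "leaky" and (b) takes out the database instead, which is why (a) is the hard one.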

Thrashing in a distributed environment is probably going to end up with partially failed health checks and all the nastiness that comes with flapping. If you're lucky, failing health checks will reduce load and memory usage and get you back to a happy place; if you're realistic, you probably got into the thrashing situation because a burst of latency or traffic slowed processing for a bit, and the resulting retries killed the system.
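A back-of-the-envelope way to see how retries pile on: a rough model in which every failed attempt is retried, up to a cap. The function name and parameters are invented for this sketch, and it assumes the failure rate stays fixed, which is optimistic, since in practice the extra load pushes the failure rate up further.

```python
def offered_load(base_rps, failure_rate, max_retries):
    """Requests per second the backend actually sees once retries are counted.

    Each attempt fails with probability `failure_rate`, and a failed attempt is
    retried until `max_retries` retries have been used, so the load is
    base * (1 + p + p^2 + ... + p^max_retries).
    """
    return base_rps * sum(failure_rate ** k for k in range(max_retries + 1))

for p in (0.0, 0.5, 1.0):
    print(p, offered_load(100, p, 3))
```

With three retries, a backend that is failing everything sees four times its normal traffic, right at the moment it can least afford it.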



