
This reminds me of a service I recently found that was routinely crashing out and being restarted automatically. I fixed the crash, but it turns out it had ALWAYS been crashing on a reliable schedule - and keeping the service alive longer created a plethora of other issues, memory leaks being just one of them.

That was a structural crash and I should not have addressed it.




How many memory leaks were only discovered during the winter code freeze, because there were no pushes being done and therefore no server restarts?


At Fastmail, the ops team ran failovers all the time, just to keep them so reliable that they worked no matter what. Only once in my tenure did a failover fail, and in that case a --yolo flag was involved.


At reddit we would randomly select one of the 10 or so processes on each machine to kill every 10 minutes, just so they would all get a restart in case we didn't do a deployment for a few days.
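A minimal sketch of that kind of random-restart loop (not reddit's actual tooling; the PID lookup, interval, and signal choice here are assumptions):

    # Hypothetical chaos loop: every 10 minutes, pick one worker at random
    # and send it SIGTERM so the supervisor brings up a fresh copy.
    import os
    import random
    import signal
    import time

    def chaos_loop(get_worker_pids, interval=600):
        # get_worker_pids is a placeholder callable returning the PIDs of
        # the ~10 app processes the supervisor keeps alive on this machine.
        while True:
            time.sleep(interval)
            pids = get_worker_pids()
            if pids:
                os.kill(random.choice(pids), signal.SIGTERM)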

At Amazon they schedule service bounces during code freeze for any service known to have memory leaks, because it's easier than finding the leak - which usually isn't an issue anyway, since the service gets redeployed (and therefore restarted) so often.


And as a nice bonus you get Chaos Monkey for free :)


Oooh, you’ve just reminded me of the email server at my first dev job. It would crash every few days and no one could work out why. In the end someone just wrote a cron job type thing to restart it once a day, problem solved!
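Something as simple as this would do it - a sketch only, since the original doesn't say which mail server or restart command was used:

    # Hypothetical daily-restart script, invoked from cron once a day, e.g.:
    #   15 4 * * * /usr/local/bin/restart_mail.py
    import subprocess

    # "mailserver" is a placeholder service name; substitute whatever the box ran.
    subprocess.run(["systemctl", "restart", "mailserver"], check=True)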



