
This reminds me of a service I recently found that was routinely crashing out and being restarted automatically. I fixed the crash, but it turns out it had ALWAYS been crashing on a reliable schedule - and keeping the service alive longer created a plethora of other issues, memory leaks being just one of them.

That was a structural crash and I should not have addressed it.




How many memory leaks were only discovered during the winter code freeze, because there were no pushes being done and therefore no server restarts?


At Fastmail, the ops team ran failovers all the time, just to keep them so reliable that they worked no matter what. Only once in my tenure did a failover fail, and in that case a --yolo flag was involved.


At reddit we would randomly select one of the 10 or so processes on each machine to kill every 10 minutes, just so they would all get a restart in case we didn't do a deployment for a few days.
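A minimal sketch of that kind of random-restart loop (not reddit's actual tooling; the PID lookup, interval, and signal choice here are assumptions):

    # Hypothetical chaos loop: every 10 minutes, pick one worker at random
    # and send it SIGTERM so the supervisor brings up a fresh copy.
    import os
    import random
    import signal
    import time

    def chaos_loop(get_worker_pids, interval=600):
        # get_worker_pids is a placeholder callable returning the PIDs of
        # the ~10 app processes the supervisor keeps alive on this machine.
        while True:
            time.sleep(interval)
            pids = get_worker_pids()
            if pids:
                os.kill(random.choice(pids), signal.SIGTERM)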

At Amazon they schedule service bounces during code freeze for any service known to have memory leaks, because it's easier than finding the leak - which usually isn't an issue anyway, since the service gets redeployed (and therefore restarted) so often.


And as a nice bonus you get Chaos Monkey for free :)


Oooh, you’ve just reminded me of the email server at my first dev job. It would crash every few days and no one could work out why. In the end someone just wrote a cron job type thing to restart it once a day, problem solved!
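Something as simple as this would do it - a sketch only, since the original doesn't say which mail server or restart command was used:

    # Hypothetical daily-restart script, invoked from cron once a day, e.g.:
    #   15 4 * * * /usr/local/bin/restart_mail.py
    import subprocess

    # "mailserver" is a placeholder service name; substitute whatever the box ran.
    subprocess.run(["systemctl", "restart", "mailserver"], check=True)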



