This reminds me of a service I recently found that was routinely crashing and being restarted automatically. I fixed the crash, but it turned out it had ALWAYS crashed on a reliable schedule, and keeping the service alive longer created a plethora of other issues, memory leaks being just one of them.
That was a structural crash and I should not have addressed it.
At Fastmail, we on the ops team ran failovers all the time, just to make them so reliable that they worked no matter what. Only once in my tenure did a failover fail, and in that case there was a --yolo flag involved.
At Reddit, every 10 minutes we would randomly pick one of the 10 or so processes on each machine and kill it, just so they would all get a restart in case we didn't do a deployment for a few days.
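Something in that spirit can be a very small loop. Here's a minimal sketch, assuming a process supervisor (systemd, supervisord, whatever) that restarts anything we kill; the service names and the pgrep-based lookup are made up for illustration, not what we actually ran:

```python
import os
import random
import signal
import subprocess
import time

# Hypothetical list of supervised services on this box; the supervisor
# is assumed to bring back anything we terminate.
MANAGED_SERVICES = [
    "app-worker-1", "app-worker-2", "app-worker-3",
    "api-frontend", "queue-consumer",
]

def pids_for(service):
    """Look up PIDs matching the service name (assumes pgrep is available)."""
    result = subprocess.run(
        ["pgrep", "-f", service], capture_output=True, text=True
    )
    return [int(pid) for pid in result.stdout.split()]

def kill_one_at_random():
    """Pick one managed service at random and send its process a SIGTERM."""
    service = random.choice(MANAGED_SERVICES)
    pids = pids_for(service)
    if not pids:
        print(f"{service}: no running process found, skipping")
        return
    victim = random.choice(pids)
    print(f"sending SIGTERM to {service} (pid {victim})")
    os.kill(victim, signal.SIGTERM)

if __name__ == "__main__":
    while True:
        kill_one_at_random()
        time.sleep(10 * 60)  # every 10 minutes, as in the anecdote
```

The point isn't the script itself, it's that every process gets exercised through a restart regularly, so "has this thing survived a restart lately?" never becomes an unknown.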
At Amazon, they schedule service bounces during code freezes for any service known to have memory leaks, because that's easier than finding the leak. It usually isn't an issue anyway, since the service gets deployed (and therefore restarted) so often.
Oooh, you’ve just reminded me of the email server at my first dev job. It would crash every few days and no one could work out why. In the end someone just wrote a cron-job-type thing to restart it once a day. Problem solved!