It's a fact of life that when you deal with complex, tightly coupled systems with many interactions between subsystems, you will routinely see accidents caused by improbable combinations of failures.
AWS could offer some machines at a lower cost with lower availability, for workloads where it doesn't affect you much if they go down, or for one-off usage.
I'm not sure how migrating machines between nodes happens in S3, or whether it's easy to do (maybe with some downtime).
It's the same as advice to routinely replace your live data from backups. It's not a real backup until you've tested that you can recover from it.
E.g., if you have a test that is 99% accurate and a treatment that harms 1% of the patients, and you screen a million people: how common does the disease have to be before you cure more people than you kill?
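To put rough numbers on that question, here is a minimal sketch in Python. It rests on assumptions the question doesn't spell out: "99% accurate" is taken to mean both 99% sensitivity and 99% specificity, everyone who tests positive gets treated, the treatment cures every true positive, and the 1% harm applies to everyone treated (sick or not). The function names are made up for illustration.

```python
def screening_outcome(prevalence, population=1_000_000,
                      accuracy=0.99, harm_rate=0.01):
    """Return (cured, harmed) for a screen-then-treat program.

    Assumes sensitivity == specificity == accuracy, and that every
    positive result leads to treatment.
    """
    sick = prevalence * population
    healthy = population - sick
    true_pos = accuracy * sick             # sick people correctly flagged
    false_pos = (1 - accuracy) * healthy   # healthy people wrongly flagged
    treated = true_pos + false_pos
    cured = true_pos                       # assumption: treatment always cures
    harmed = harm_rate * treated           # harm hits everyone who is treated
    return cured, harmed


def break_even_prevalence(accuracy=0.99, harm_rate=0.01):
    """Prevalence p at which cured == harmed.

    Solves a*p = h*(a*p + (1 - a)*(1 - p)) for p.
    """
    fp_harm = harm_rate * (1 - accuracy)
    return fp_harm / (accuracy * (1 - harm_rate) + fp_harm)


if __name__ == "__main__":
    p = break_even_prevalence()
    print(f"break-even prevalence: {p:.6%}")
    for prevalence in (1e-5, p, 1e-3):
        cured, harmed = screening_outcome(prevalence)
        print(f"prevalence {prevalence:.2e}: cured {cured:.1f}, harmed {harmed:.1f}")
```

Under those assumptions the break-even point is 0.0001 / 0.9802, roughly 1 case in 9,800 people (about 102 per million). For anything rarer than that, the false positives dominate and the screening program harms more people than it cures. Different readings of "99% accurate" or of who the treatment harms will shift the number, but not the shape of the result.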