I agree, it sounds like this could have been discovered had the two backup power systems been properly tested.

Note how in the case of backup power, "properly tested" doesn't mean 'Does the generator turn on? Are we getting electricity from it? Ok, pass!'. It means running the backup generator in a way that is consistent with what you would expect in an actual power failure - i.e., for more than just a few minutes.

Same thing with storage backups. Checking your backups isn't just 'was a backup file/image created?'; it means _actually trying to recover your systems from those backup files_.
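To make the restore point concrete, here is a rough Python sketch of what "actually trying to recover" could look like for a plain directory backup. It assumes the backup is a gzipped tar archive, and the helper names (`make_backup`, `verify_backup`, `file_digests`) are made up for illustration; a real system restore involves a lot more than comparing files.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def file_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def make_backup(source_dir: Path, archive: Path) -> None:
    """Write source_dir into a gzipped tar archive (hypothetical backup step).

    The archive should live outside source_dir so it does not back up itself.
    """
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." stores paths relative to source_dir.
        tar.add(source_dir, arcname=".")


def verify_backup(source_dir: Path, archive: Path) -> bool:
    """Restore the archive into a scratch directory and compare it to the source."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive) as tar:
            tar.extractall(scratch)
        # The check passes only if every restored file matches the original.
        return file_digests(Path(scratch)) == file_digests(source_dir)
```

The check restores into a throwaway directory and compares contents, so a backup file that exists but is truncated, stale, or missing files fails instead of passing.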

But what about parts that are going to fail 35 minutes into running, when you only run the test for 30? There are too many variables to account for here, I think.

