We throttle automatic terminations so that the system can't drop an entire cluster at once. It has yet to cause an outage, fingers crossed!
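To make the throttling idea concrete, here is a rough sketch of one way it could work: a per-cluster sliding-window cap on terminations. This is only an illustration, not our actual implementation. The `TerminationThrottle` class, its parameters, and the cluster names are all hypothetical.

```python
import time
from collections import deque


class TerminationThrottle:
    """Hypothetical sketch: allow at most `max_terminations` per cluster
    within any `window_seconds` span."""

    def __init__(self, max_terminations, window_seconds, clock=time.monotonic):
        self.max_terminations = max_terminations
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self._recent = {}   # cluster name -> deque of recent termination timestamps

    def try_terminate(self, cluster):
        """Return True if a termination is allowed right now, and record it."""
        now = self.clock()
        recent = self._recent.setdefault(cluster, deque())
        # Forget terminations that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_terminations:
            return False  # over budget: leave the instance for a later pass
        recent.append(now)
        return True
```

A caller would check `try_terminate("some-cluster")` before each automatic termination and skip the instance when it returns `False`, so a bad signal affecting a whole cluster only takes out a few instances per window.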
We generally run into two classes of errors: 1) Software bugs, which follow the process outlined above. 2) Issues with AWS, for example some virtual servers running on hardware that is experiencing a network issue. If those servers are terminated and replaced by the ASG, the new ones generally spin up on good hardware and we've avoided the issue. It's rare, but it does happen at our scale.