
Ask HN: How to determine the failure of an availability zone? - rahulskn86
I've been seeing a lot of downtime and degraded service from cloud providers lately.

The AWS Well-Architected Framework [1] suggests spreading a service across multiple AZs to reduce downtime.

If an AZ fails, is there an API that reports AZ status dynamically, or do we have to rely on our own health checks to decide on corrective action? Is the AWS Health API [2] useful for this? Can the approach be made cloud-agnostic?

[1] https://aws.amazon.com/architecture/well-architected/
[2] https://docs.aws.amazon.com/health/latest/ug/health-api.html
======
btown
There are lots of reasons that _your_ instances in an AZ or cloud could all go
down without the AZ's or cloud's operator ever self-reporting a failure. For
instance, suppose someone starts a rolling update in one AZ and, halfway
through, it saturates a limited resource in that AZ. The AZ as a whole is
fine, but your instances in it are not. Your own internal health checks and
monitoring should be what you use to determine your health and trigger
rebalancing.
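
As a rough illustration of that last point, here is a minimal Python sketch
that groups your own health-check results by AZ and derives traffic weights
from them. It is cloud-agnostic only because it calls no provider API at all.
The function names, the 0.5 threshold, and the (az, passed) input shape are
assumptions made for this example, not anything from AWS or another provider.

```python
from collections import defaultdict

# Hypothetical threshold: treat an AZ as unhealthy for *our* service when
# fewer than this fraction of our instances there pass their health check.
MIN_HEALTHY_FRACTION = 0.5


def healthy_fraction_by_az(checks):
    """Return {az: fraction of passing checks}.

    `checks` is an iterable of (az_name, passed) pairs produced by our own
    probes, not by the provider's status API.
    """
    totals = defaultdict(int)
    passed_counts = defaultdict(int)
    for az, passed in checks:
        totals[az] += 1
        if passed:
            passed_counts[az] += 1
    return {az: passed_counts[az] / totals[az] for az in totals}


def traffic_weights(checks, min_healthy=MIN_HEALTHY_FRACTION):
    """Spread traffic evenly over AZs that meet the threshold.

    If no AZ meets it, fail open and keep every AZ in rotation rather
    than routing traffic nowhere.
    """
    fractions = healthy_fraction_by_az(checks)
    usable = [az for az, frac in fractions.items() if frac >= min_healthy]
    if not usable:
        usable = list(fractions)
    return {az: (1 / len(usable) if az in usable else 0.0) for az in fractions}
```

Feeding it results where only one of three instances in a single AZ passes
gives that AZ a weight of 0.0 and splits traffic across the rest, whether or
not the provider has reported anything. A real setup would also want
hysteresis so that one flaky probe does not flip an AZ in and out of rotation.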

