
The Calculus of Service Availability - yarapavan
https://queue.acm.org/detail.cfm?id=3096459
======
yarapavan
Implication 1: Rule of the extra 9

A service cannot be more available than the intersection of all its critical
dependencies. If your service aims to offer 99.99 percent availability, then
all of your critical dependencies must be significantly more than 99.99
percent available.

Internally at Google, we use the following rule of thumb: critical
dependencies must offer one additional 9 relative to your service—in the
example case, 99.999 percent availability—because any service will have
several critical dependencies, as well as its own idiosyncratic problems. This
is called the "rule of the extra 9."

If you have a critical dependency that does not offer enough 9s (a relatively
common challenge!), you must employ mitigation to increase the effective
availability of your dependency (e.g., via a capacity cache, failing open,
graceful degradation in the face of errors, and so on.)

