
Ask HN: How to find 'error budget' as a DevOps Engineer? - FahadUddin92
I am trying to find an error budget for a site that has several outside APIs integrated in it for its core features. How can I find the error budget for it?
======
Juliate
Error budget = the actual downtime duration your site can still afford within
a given time frame.

If you know the SLAs of your outside APIs, you can compute your own maximum
possible SLO and derive your full error budget from it. That budget then
shrinks over the time frame as you use it.

Say your site depends on 3 external APIs, each with a 99% SLA. Your best
possible site SLO would be 99% x 99% x 99% = about 97% (your site is, at best,
as reliable as the product of the reliabilities of its dependencies).

That is, unless your site has some built-in tactics for the specific downtime
scenarios of these APIs (caching, retries, throttling, graceful degradation of
features, etc.).
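
As a rough sketch of the arithmetic above: the function name
`composite_availability` is made up for illustration, and the calculation
assumes every dependency is required for the site to work and that their
outages are independent of each other.

```python
from math import prod

def composite_availability(dependency_slas):
    """Best-case availability when every dependency must be up at the same time.

    Assumes independent failures and no mitigation (no caching, no retries).
    """
    return prod(dependency_slas)

# Three external APIs, each with a 99% SLA (the example from the comment above).
best_slo = composite_availability([0.99, 0.99, 0.99])
print(f"{best_slo:.2%}")  # 97.03%
```

Any mitigation that keeps the site usable while one API is down would push the real figure above this product.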

Should you pick a lower SLA than your SLO for your site then? Always. Things
happen.

Let's take a 95% SLA for simplicity.

For a 30-day time frame, your max error budget would be, as a formula:

    
    
      + total time frame (say, 30 days = 720 hours)
      - target availability (at 95% avail. that would be 684 hours)
      - total downtime you've had already within this time frame
      = 36 hours or less
    

That's a start. From there, track your own production indicators (your SLIs)
and adjust accordingly.
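
The budget formula above could be sketched like this, assuming the 95% target
and the 30-day (720 hour) time frame from this comment. The function name and
its parameters are hypothetical, chosen only for this example.

```python
def remaining_error_budget_hours(window_hours, target_availability,
                                 downtime_so_far_hours=0.0):
    """Hours of downtime still affordable within the time frame.

    A negative result means the budget is already overspent.
    """
    # Total downtime the target allows over the whole time frame.
    allowed_downtime = window_hours * (1.0 - target_availability)
    # Subtract what has already been consumed.
    return allowed_downtime - downtime_so_far_hours

# 30 days = 720 hours, 95% target, no downtime yet.
budget = remaining_error_budget_hours(720, 0.95)
print(round(budget, 2))  # 36.0
```

Feeding in the downtime measured so far (for example from your monitoring) gives the budget you have left for the rest of the time frame.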

Reminder: https://enqueuezero.com/the-difference-between-sli-slo-and-sla.html

