> "We have to upgrade right now because everything is on fire!", I don't know ex...

0xbadcafebee · on Oct 28, 2019

With DevOps you monitor [at least] four metrics: lead time for changes, deployment frequency, mean time to recover/restore service, and change failure rate. These indicators show whether you're improving, stagnating, or straining. You can also track more specific service level indicators, the ratio of toil to project work, and the size of your tech debt backlog versus your feature backlog.
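To make those four numbers concrete, here's a minimal Python sketch that computes them from a deploy log. The `Deployment` record, its fields, and `dora_metrics` are hypothetical names for illustration, not any real tool's API. It also assumes each failure is noticed at deploy time, so restore time is measured from `deployed_at` to `restored_at`, and it reports median rather than mean lead time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    # Hypothetical record: one production deploy of one change.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did this deploy cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: List[Deployment], period_days: float) -> dict:
    """Compute the four metrics over a window of `period_days` days."""
    if not deploys:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    # Assumption: failure starts at deploy time, so restore time = restored_at - deployed_at.
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at is not None]
    return {
        "lead_time_median": median(lead_times),
        "deploys_per_day": len(deploys) / period_days,
        "mttr": sum(restore_times, timedelta()) / len(restore_times) if restore_times else None,
        "change_failure_rate": len(failures) / len(deploys),
    }


# Usage: four deploys over four days, one of which broke production for an hour.
t0 = datetime(2019, 10, 1)
deploys = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=4),
               failed=True, restored_at=t0 + timedelta(days=1, hours=5)),
    Deployment(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=1)),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=3)),
]
m = dora_metrics(deploys, period_days=4)
print(m)
```

Tracked week over week, the trend matters more than any single value: rising lead time or failure rate is the early smoke before the fire.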

The "everything's on fire" metaphor also has a larger context. Sure, when people are getting woken up in the middle of the night because the site is down, shit's on fire. But also "we're constantly missing our deadlines" is shit's on fire, "our customers are not satisfied" is shit's on fire, "our overhead is way too high for what we're delivering" is shit's on fire. If you're out on the water and building a new boat because yours is on fire, it's a little late.

war1025 · on Oct 28, 2019

I think we're both saying basically the same thing. So I agree.

noobiemcfoob · on Oct 28, 2019

My experience echoes yours. We had 2 senior devs and 1 junior. We had one major event in 3 years: a long weekend where things were on fire, with code being updated every half hour. But after that, we had so many indicators warning us when things were even approaching that level that the job got almost boringly simple.