
Wouldn't defining a required set of key observables help reduce the variability across teams? Or is the issue you're describing that teams aren't required to demonstrate any?

In my experience (as a chief architect and engineering VP), laying out some baseline metrics closes much of the reliability variance across teams.

As the teams get more competent at baking in the basics (overall load, latency, resource utilization, error rates, error events posted to chat) you can ratchet up the competency to include higher order observables (scaling events, business transactions, circuit statuses, traces, anomalies).
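A rough sketch of how that ratchet could be checked, assuming each team reports the observables it emits as a set of names. The tier contents come from the lists above; the identifiers and function names are made up for illustration and don't belong to any real tool.

```python
# Hypothetical tier definitions, taken from the two lists in the comment above.
BASELINE = {
    "overall_load",
    "latency",
    "resource_utilization",
    "error_rates",
    "error_events_to_chat",
}
HIGHER_ORDER = {
    "scaling_events",
    "business_transactions",
    "circuit_statuses",
    "traces",
    "anomalies",
}


def missing_observables(reported, required):
    """Return the required observables a team does not report yet, sorted."""
    return sorted(set(required) - set(reported))


def current_bar(reported):
    """A team faces the higher-order bar only once the baseline is fully met."""
    if missing_observables(reported, BASELINE):
        return "baseline"
    return "higher_order"
```

The point of keeping the tiers as plain data is that the bar stays visible: a team can see exactly which names are missing instead of guessing what "good" means.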

Which is to say it's been straightforward (in my experience) for most teams to raise the bar once they know there _is_ a bar and once they can see the bar.




Teams aren't required to demonstrate any. Engineers who understand the importance of quality bake in time, and management is supportive of self-imposed controls. Our NPS numbers are trash, though. But it's a different kind of quality issue: we aren't fighting fires every week. The problem is our API response codes are often wrong or unpredictable. The original engineers clearly did not understand the HTTP protocol.
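To make "predictable" concrete, here is a minimal sketch of the kind of mapping I'd want in one place, assuming handlers return a named outcome. The outcome names are hypothetical and not from our codebase; the status codes are the standard HTTP ones from Python's `http.HTTPStatus`.

```python
from http import HTTPStatus

# Hypothetical outcome names mapped to standard HTTP status codes.
STATUS_FOR_OUTCOME = {
    "ok": HTTPStatus.OK,                          # 200
    "created": HTTPStatus.CREATED,                # 201
    "invalid_input": HTTPStatus.BAD_REQUEST,      # 400
    "unauthenticated": HTTPStatus.UNAUTHORIZED,   # 401
    "forbidden": HTTPStatus.FORBIDDEN,            # 403
    "not_found": HTTPStatus.NOT_FOUND,            # 404
    "conflict": HTTPStatus.CONFLICT,              # 409
}


def status_for(outcome):
    """Map an outcome to a status; unknown outcomes become 500, never a misleading 200."""
    return STATUS_FOR_OUTCOME.get(outcome, HTTPStatus.INTERNAL_SERVER_ERROR)
```

With a single table like this, a client sees the same code for the same kind of failure no matter which endpoint it hits.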



