Hacker News new | past | comments | ask | show | jobs | submit login

My company does a lot of things the dumbest way possible but they’re always been good at tracing service requests and telemetry. Any time correlation IDs were missing was considered a regression. And we are quick to agree to action items that increase visibility from a postmortem. The one complaint I have is we don’t move things out of Splunk into graphs as fast as I’m comfortable with.

But the thing I see we are all missing for telemetry is a sense of coverage. We changed backends last year and as we were wrapping that up I found big chunks of code that were missing coverage. Meanwhile we found a bunch of stats that nobody looks at, meaning the SNR goes down over time. There are only so many dashboards I can watch.

So some sense of code blocks missing telemetry, and data that is collected but has no queries, queries that exist but never run (that dashboard nobody remembers making and isn’t in the runbooks anywhere) would be a great goal for a next level of maturity.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: