Good article, thanks for sharing. I've been working on one part of this problem space for quite a while too. I want the ability to drill down directly into latency reasons and the wall-clock time of the underlying application threads, instead of having to correlate various system-wide utilization metrics and try to manually connect the dots.
I'm using eBPF-based dimensional data analysis, starting from the bottom (every system is a bunch of threads, including distributed systems) and moving up from there. This doesn't replace existing distributed tracing approaches for the end-to-end request view, but it gives you deep observability all the way down to each service's underlying threads' wall-clock time (where they're blocked or sleeping, why, etc).
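To make the idea concrete, here's a toy sketch of thread-state sampling. It polls /proc rather than using eBPF (so it's not how the xcapture collector actually works), and the sampling interval, sample count, and output format are just assumptions on my part, but it shows the kind of (comm, state, wait reason) dimensional samples I mean:

    #!/usr/bin/env python3
    # Toy illustration only: a /proc-based thread-state sampler, not the
    # eBPF xcapture collector. It samples every thread on the host and
    # groups the samples by (comm, state, wchan).
    import glob, time, collections

    INTERVAL = 1.0   # sampling interval in seconds (assumed)
    SAMPLES  = 10    # number of samples to take (assumed)

    def read(path):
        try:
            with open(path) as f:
                return f.read().strip()
        except OSError:
            return ""

    counts = collections.Counter()
    for _ in range(SAMPLES):
        for task in glob.glob("/proc/[0-9]*/task/[0-9]*"):
            stat = read(task + "/stat")
            if not stat:
                continue  # thread exited between listing and reading
            comm  = stat[stat.find("(") + 1:stat.rfind(")")]
            state = stat[stat.rfind(")") + 2:].split()[0]  # R, S, D, ...
            wchan = read(task + "/wchan") or "-"           # kernel wait location
            counts[(comm, state, wchan)] += 1
        time.sleep(INTERVAL)

    # Each sample approximates INTERVAL seconds of wall-clock time in that state.
    for (comm, state, wchan), n in counts.most_common(20):
        print(f"{n * INTERVAL:8.1f}s  {state}  {comm:<20} {wchan}")

Summing samples grouped by those dimensions gives a breakdown of where threads actually spend their wall-clock time, without any instrumentation in the application itself.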
At this year's P99CONF I will launch the first GA release of my (open source) 0x.tools xcapture eBPF collectors, along with a reference TUI tool (xtop) that demonstrates dimensional performance modeling on these new thread sampling signals.
After a decade building large-scale systems at Google, Datadog, and Meta, I’ve noticed the same pattern repeat: observability keeps getting louder, costlier, and less useful.
We’re drowning in telemetry but starving for insight. The industry incentives are misaligned: they reward ingestion and storage, not intelligence.
I recently started an open, collective movement called omji.ai to explore a fundamental shift: measuring insight per dollar of telemetry stored. We need to push vendors and internal teams toward intelligence, not ingestion.
I’m curious to hear from folks facing this pain - how do we fix it? We need practical, non-obvious ideas.
1. What technical or economic levers would actually shift the industry's focus from volume to intelligence?
2. Has anyone in a large organization tried benchmarking observability systems based on insight (e.g., MTTR impact) vs. telemetry cost?
3. How could open collaboration (tools, standards, benchmarks) make this practical for every engineering team?