> The data collected from these streams is sent to several vendors including Datadog (for application logs and metrics), Honeycomb (for traces), and Google Cloud Logging (for infrastructure logs).
It sounds like they were in a place that a lot of companies are in, where they don't have a single pane of glass for observability. One of, if not the, main benefits I've gotten out of Datadog is having everything in Datadog, so that it's all connected and I can easily jump from a trace to logs, for instance.
One of the terrible mistakes I see companies make with this tooling is fragmenting like this. Everyone has their own personal preference for tooling, and ultimately the collective experience ends up significantly less than the sum of its parts.
I feel we hold up a single observability solution as the Holy Grail, and I can see the argument for it: one place to understand the health of your services.
But I've also been in terrible vendor lock-in situations, bent over a barrel because switching to a better solution is so damn expensive.
At least now with OTel you have an open standard that makes switching easier, but even then I'd rather have two solutions that meet my exact observability requirements than a single solution that does everything OK-ish.
I'm biased as a founder in the space [1], but I think that with OpenTelemetry plus OSS extensible observability tooling, the holy grail of one tool is more realizable than ever.
Vendor lock-in with OTel is hopefully a thing of the past. And now that more observability solutions are going open source, it's hopefully no longer true that one tool has to be mediocre across all use cases: Datadog and the like are inherently limited by their own engineering teams, whereas OSS products can take community and customer contributions that grow the surface area over time on top of the core maintainers' work.
I think that OpenTelemetry will solve this problem of vendor lock-in. I am a founder building in this space [1], and we see many of our users switching to OpenTelemetry because it provides an easy way to change backends if needed in the future.
At SigNoz, we have metrics, traces, and logs in a single application, which helps you correlate across signals much more easily; being natively based on OpenTelemetry makes this correlation easier still, since it leverages the standard data format.
This might take some time, though, as many teams have proprietary SDKs in their code, which are not easy to rip out. OpenTelemetry auto-instrumentation [2] makes it much easier, and I think that's the path people will follow to get started.
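To give a feel for it, here's roughly what zero-code auto-instrumentation looks like in Python (a sketch; the service name, app.py, and the collector endpoint are placeholders):

    pip install opentelemetry-distro opentelemetry-exporter-otlp
    # Detect installed libraries and pull in matching instrumentation packages
    opentelemetry-bootstrap -a install
    # Run the app unchanged; telemetry goes out over OTLP
    OTEL_SERVICE_NAME=my-app \
    OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
    opentelemetry-instrument python app.py

No proprietary SDK calls in the code itself, which is what makes it easy to walk away from a backend later.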
You can switch the backend destination of metrics/traces/logs, but all your dashboards, alerts, and potentially legacy data still need to be migrated. Drastically better than before, when instrumentation and agents were custom for each backend, but there are still hurdles.
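To be concrete about the part that is easy: if an OpenTelemetry Collector sits between your apps and the vendor, switching the destination is a small config change. A sketch (the endpoint and header are Honeycomb's documented OTLP ingest; the API key is a placeholder):

    exporters:
      otlp/honeycomb:
        endpoint: api.honeycomb.io:443
        headers:
          x-honeycomb-team: ${env:HONEYCOMB_API_KEY}

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          # Swap this exporter to repoint the pipeline at another vendor
          exporters: [otlp/honeycomb]

The dashboards and alerts built on top of the old backend are the part that doesn't move with you.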
Depending on your usage, it can be prohibitively expensive to use Datadog for everything like that. We have it for just our prod env, because what it brings to the table just isn't worth the cost of putting all of our logs into it.
I've spent a small amount of time in Datadog, lots in Grafana, and somewhere in between in Honeycomb. Our applications are designed to emit traces, and comparing Honeycomb with tracing to a traditional app with metrics and logs, I would choose tracing every time.
It annoys me that logs are overlooked in Honeycomb (and metrics are... fine). But given the choice between a single pane of glass in Grafana, or doing logs (and sometimes metrics) in CloudWatch while spending 95% of my time in Honeycomb, I'd pick Honeycomb every time.
Agreed: Honeycomb has been a boon, though some improvements to metric displays and the ability to set the default "board" used on the home page would be very welcome. I'd also be pretty happy if there were a way to drop events on the Honeycomb side as a dynamic filter, e.g. "don't even bother storing this trace if it has an http.status_code < 400". This is surprisingly painful to implement on the application side (at least in Rust).
Hopefully someone that works there is reading this.
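For what it's worth, I believe you can get this today by putting an OpenTelemetry Collector in front of Honeycomb and using the tail_sampling processor from collector-contrib, rather than doing it in the app or in Honeycomb itself. Something like this keeps only traces containing an error-status span (a sketch; the policy name and 10s decision window are illustrative, and I've kept your http.status_code attribute key):

    processors:
      tail_sampling:
        # Buffer spans until the trace is complete enough to judge
        decision_wait: 10s
        policies:
          - name: keep-errors-only
            type: numeric_attribute
            numeric_attribute:
              key: http.status_code
              min_value: 400
              max_value: 599

Traces matching no policy are dropped before export, which is exactly the "don't even bother storing this" behavior.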
I think Honeycomb is perfect for one kind of user, who's entirely concerned with traces and very long retention. For a more general OpenTelemetry-native solution, check out SigNoz.
It seems to miss some aggregation stuff, but it's also improving every time I check. I wonder if anyone's used it in anger yet, and how far it is from replacing Datadog or Honeycomb.
Tempo still feels very much like: look at a trace that you found somewhere else (like logs).
With so much information in traces, and the sheer volume of them, aggregation really is the key to getting actionable info out of a tracing setup if it's going to be the primary entry point.
I've not. Honestly, I'm not in the market for tool shopping at the moment; I'd need another Honeycomb-style "this is incredible" moment to start looking again. I think it would take "Honeycomb, but we handle metric rollups and do logs" right now.
You can also check out SigNoz - https://github.com/SigNoz/signoz. It has logs, metrics, and traces under a single pane. If you're using the OTel libraries and the OTel Collector, you can do a lot of correlation between your logs and traces. I am a maintainer, and we have seen a lot of our users adopt SigNoz for the ease of having all three signals in a single pane.
> It sounds like they were in a place that a lot of companies are in where they don't have a single pane of glass for observability.
One of the biggest features of AWS, and one that is very easy to take for granted, is Amazon CloudWatch. It supports metrics, logging, alarms, metrics from alarms, alarms from alarms, querying historical logs, triggering actions, etc., and it covers every single service provided by AWS, including metaservices like AWS Config and CloudTrail.
And you barely notice it. It's just there, and you can see everything.
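"Alarms from alarms," for instance, is CloudWatch composite alarms, e.g. with the AWS CLI (the alarm names and SNS topic here are made up):

    aws cloudwatch put-composite-alarm \
      --alarm-name checkout-degraded \
      --alarm-rule "ALARM(high-5xx-count) OR ALARM(p99-latency-breach)" \
      --alarm-actions arn:aws:sns:us-east-1:123456789012:oncall

One console for every service, and the building blocks compose, which is a big part of why it fades into the background.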
> One of the terrible mistakes I see companies make with this tooling is fragmenting like this.
So much this. It's not fun at all to have to go through logs and metrics on any application, and much less so if for some reason their maintainers scattered their metrics emission to the four winds. However, with AWS all roads lead to CloudWatch, and everything is so much better.
> ...with AWS all roads lead to CloudWatch, and everything is so much better.
Most of my clients are not in the product-market fit for AWS CloudWatch, because most of their developers don't have the development, testing, and operational maturity/discipline to use CloudWatch cost-effectively (this is at root an organizational problem, but let's not go off onto that giant tangent). So the only realistic tracing strategy we've converged on recommending for them is "grab everything, and retain it up to the point in time where we won't be blamed for not knowing root cause" (which in some specific cases can be years!), while we undertake the long journey with them to upskill their teams.
This would make using CloudWatch everywhere rapidly climb into the top three largest line items in the AWS bill, easily justifying spinning up that tracing functionality in-house. So we wind up opting into self-managed tooling like Elastic Observability, or Honeycomb, where the pricing is friendlier to teams in unfortunate situations that need to start with everything for CYA, much as I would like to stay within CloudWatch.
Has anyone found a better solution for these use cases where the development maturity level is more prosaic, or is this really the best local maximum at the industry's current SOTA?