We've used VictoriaMetrics at work since 2019 or so, I think. The tl;dr is that it works great.
When we migrated to it we replaced 6 Prometheus servers with a single VictoriaMetrics server. It was crazy efficient.
We ran with a single node for all our metrics for 3 years while the number of data points observed kept growing. Today we run a 7-node cluster.
I have no idea if Prometheus could serve our workload as efficiently today, but for reference, a provider wanted to quote us something like 300k€ per year to serve the same metrics; our setup doesn't cost anywhere near that.
Take a look at VictoriaLogs then [1]. It is ready to accept and query large volumes of logs with no tuning needed to get high efficiency (e.g. low RAM usage and low disk space usage). See how it compares to ClickHouse for logs [2].
The article specifically talks about the metrics part of Otel. Having read a bit of the script, I agree, it is very bloated. Statsd is a very mature and lean protocol, even with the Datadog extensions that add tagging. It is a de facto open standard that predates Otel. It may well be that people who want a lean open standard will stick with statsd or its extended versions for collecting metrics.
The Tracing component is fine. It looks a little bloated, but it also supports distributed tracing, and it defines schemas and semantics for trace attributes. I can look that up instead of figuring that stuff out on my own.
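To show how lean the wire format is, here is a minimal sketch of building a single statsd line with the Datadog-style tag extension. The function name and the example metric are made up for illustration; the layout (`name:value|type`, optional `|@rate`, optional `|#tags`) is the commonly documented one, and a real client would also handle escaping and sending over UDP.

```python
def format_statsd_line(name, value, metric_type, sample_rate=None, tags=None):
    """Build one statsd payload line (hypothetical helper, not a library API).

    metric_type is the usual short code, e.g. "c" (counter), "g" (gauge),
    "ms" (timer). Tags use the Datadog extension: "|#key:value,key:value".
    """
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        # Optional sample rate, e.g. 0.5 means "sent half of the time".
        line += f"|@{sample_rate}"
    if tags:
        # Datadog-style tags, appended after the base statsd fields.
        line += "|#" + ",".join(f"{k}:{v}" for k, v in tags.items())
    return line


# Example metric name and tags are invented for this sketch.
print(format_statsd_line("page.views", 1, "c", tags={"env": "prod", "region": "eu"}))
```

The whole metric fits in one short text line, which is a big part of why people find it easy to emit from any language.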
As for the rumors about Datadog bloating it up to make it fail … their official Node.js SDK library pulls in Otel tracing under the hood, and the library interface was reimplemented to incorporate that. The agent you install will accept Otel traces. It seems to me that this is a path towards letting people transition to Otel.
Datadog’s strategy, at least to an outside observer like me, builds a moat that relies on synergies and products that are more useful when you can ingest from more sources. The kinds of features they have been adding in the past two years (since the 1.0 release of Otel) include appsec scanning, infrastructure anomaly scanning, security anomaly scanning, SQL database analysis (what pganalyze does for just Postgres), streaming data pipeline (such as Kafka) tracing and analysis, CI/CD integration, and cloud cost monitoring and analysis. Those all cost money.
The lock-in happens when you sign an annual contract because the on-demand costs are a noticeable percentage of your cloud costs. Looking at their whole platform and their business model, Otel looks to me like the way they can reduce switching costs and lock you in with their other products after you switch.
I don't think Datadog did much in terms of specification work; other maintainers were quite active in bringing in the feature creep. AFAICT, the underlying assumption was that Google's practices apply to the rest of the world.
Anyways, complicated protocols benefit only the vendors, since they make it harder for other vendors (in this case VictoriaMetrics) to come in. Datadog is still popular because of their instrumentation libraries, which are much better than most in OTel, not because of the protocol, which I suspect is now OTel anyways.
I have a lot of respect for Aliaksandr Valialkin[1], we use his FastHTTP and Quicktemplate libraries every day on dozens of projects and they purr.
[1]: https://github.com/valyala