Telemetry seems like it would be a great candidate for columnar storage formats like Parquet or Arrow. In particular, I expect that it would compress very well, which could reduce telemetry bandwidth consumption or allow for a higher sample rate.
Does anybody have any experience with the intersection of these technologies?
The Parquet backend helped unlock traces search for large clusters (>400MB/s data ingestion) and over longer periods of time (>24h). It also helped unlock TraceQL (a query language for traces similar to PromQL/LogQL). There are more details in this blog post: https://grafana.com/blog/2023/02/01/new-in-grafana-tempo-2.0...
I don't have the exact CPU/bandwidth numbers on me right now, but CPU usage has gone up by about 50% on our "Ingester" and "Compactor" components (you can read up about the architecture here - https://grafana.com/docs/tempo/latest/operations/architectur...). But this is optimising for read performance, which improved significantly.
Oh nice, thank you (and also solumos) for the links! It looks like oteps/pull/171 (merged June 2023) expanded and superseded the opentelemetry-proto/pull/346 PR (closed Jul 2022) [0]. The former resulted in merging OpenTelemetry Enhancement Proposal 156 [1], with some interesting results especially for 'Phase 2' where they implemented columnar storage end-to-end (see the Validation section [2]):
* For univariate time series, OTel Arrow is 2 to 2.5 [times] better in terms of bandwidth reduction ... and the end-to-end speed is 3.1 to 11.2 times faster
* For multivariate time series, OTel Arrow is 3 to 7 times better in terms of bandwidth reduction ... Phase 2 has [not yet] been ... estimated but similar results are expected.
* For logs, OTel Arrow is 1.6 to 2 times better in terms of bandwidth reduction ... and the end-to-end speed is 2.3 to 4.86 times faster
* For traces, OTel Arrow is 1.7 to 2.8 times better in terms of bandwidth reduction ... and the end-to-end speed is 3.37 to 6.16 times faster
Pretty exciting results! The OTel Arrow adapter has subsequently been donated to the OTel community; here's a comment that does a good job of summarizing the results and the recommendations that came out of the test [3].
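For anyone wondering why a columnar layout tends to shrink telemetry on the wire, here's a toy sketch (not OTel Arrow's or Parquet's actual encoding). It packs the same synthetic gauge samples two ways, row by row and as per-field arrays with delta encoding on the numeric columns, then runs both through zlib. The data, the field layout and the helper names (make_points, encode_rows, encode_columns, delta) are all made up for illustration.

```python
import random
import struct
import zlib


def make_points(n, seed=0):
    """Synthetic gauge samples: (timestamp_ms, service_id, value). Hypothetical data."""
    rng = random.Random(seed)
    ts, value = 1_700_000_000_000, 500
    points = []
    for _ in range(n):
        ts += 1000  # assumed fixed 1 s scrape interval
        value = max(0, value + rng.randint(-5, 5))  # slow random walk
        points.append((ts, rng.randrange(4), value))
    return points


def encode_rows(points):
    """Row-oriented: each record's fields are stored next to each other."""
    return b"".join(struct.pack("<qHq", *p) for p in points)


def delta(values):
    """Keep the first value, then store differences between neighbours."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def encode_columns(points):
    """Column-oriented: one contiguous array per field, numeric fields delta-encoded."""
    ts, service, value = zip(*points)
    n = len(points)
    return (
        struct.pack(f"<{n}q", *delta(ts))
        + struct.pack(f"<{n}H", *service)
        + struct.pack(f"<{n}q", *delta(value))
    )


points = make_points(10_000)
row_size = len(zlib.compress(encode_rows(points)))
col_size = len(zlib.compress(encode_columns(points)))
print(f"row-oriented: {row_size} bytes, columnar + delta: {col_size} bytes")
```

Both encodings are the same size before compression (18 bytes per sample); the difference only shows up once similar values sit next to each other and the compressor can exploit the repetition. Real telemetry is messier than this, so treat the gap as a direction, not a benchmark.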