
Telemetry seems like it would be a great candidate for columnar storage formats like Parquet or Arrow. In particular, I expect that it would compress very well, which could reduce telemetry bandwidth consumption or allow for a higher sample rate.
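To make the compression intuition concrete, here is a minimal sketch using only the Python standard library. It does not use Parquet or Arrow: it serializes the same synthetic span-like records once row by row and once column by column (with timestamps stored as deltas, similar in spirit to the per-column encodings that columnar formats offer), then compresses both with zlib. The field names, value ranges, and helper functions are all made up for illustration, and real telemetry may behave differently.

```python
import json
import random
import zlib


def make_spans(n, seed=0):
    """Generate n >= 1 synthetic span-like records (field names are hypothetical)."""
    rng = random.Random(seed)
    services = ["checkout", "cart", "auth", "search"]
    ts = 1_700_000_000_000
    spans = []
    for _ in range(n):
        ts += rng.randint(1, 20)  # timestamps increase by small steps
        spans.append({
            "timestamp_ms": ts,
            "service": rng.choice(services),
            "status": rng.choice([200, 200, 200, 404, 500]),
            "duration_ms": rng.randint(1, 250),
        })
    return spans


def encode_rows(spans):
    """Row layout: one JSON object per line, keys repeated in every record."""
    lines = (json.dumps(s, separators=(",", ":")) for s in spans)
    return "\n".join(lines).encode()


def encode_columns(spans):
    """Column layout: one array per field, timestamps stored as deltas."""
    ts = [s["timestamp_ms"] for s in spans]
    columns = {
        "timestamp_first": ts[0],
        "timestamp_deltas": [b - a for a, b in zip(ts, ts[1:])],
        "service": [s["service"] for s in spans],
        "status": [s["status"] for s in spans],
        "duration_ms": [s["duration_ms"] for s in spans],
    }
    return json.dumps(columns, separators=(",", ":")).encode()


def decode_columns(blob):
    """Rebuild the original records from the column layout."""
    cols = json.loads(blob)
    ts = [cols["timestamp_first"]]
    for delta in cols["timestamp_deltas"]:
        ts.append(ts[-1] + delta)
    return [
        {"timestamp_ms": t, "service": sv, "status": st, "duration_ms": du}
        for t, sv, st, du in zip(
            ts, cols["service"], cols["status"], cols["duration_ms"]
        )
    ]


def compressed_sizes(spans):
    """Return (row_bytes, column_bytes) after zlib compression."""
    row = len(zlib.compress(encode_rows(spans)))
    col = len(zlib.compress(encode_columns(spans)))
    return row, col


if __name__ == "__main__":
    row, col = compressed_sizes(make_spans(5000))
    print(f"row: {row} B, columnar: {col} B, ratio: {row / col:.2f}x")
```

If the ratio printed for your own data is around r, then at a fixed bandwidth budget you could in principle ship roughly r times as many records, which is the "higher sample rate" trade-off mentioned above. Treat the number from this toy data as illustrative only.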

Does anybody have any experience with the intersection of these technologies?




Honeycomb built their own columnar database[0] to support their product.

[0] - https://www.honeycomb.io/resources/why-we-built-our-own-dist...


Grafana Tempo also switched from a Protobuf storage format to Apache Parquet last year. It's fully open source, and the proposal (from April 2022) is here: https://github.com/grafana/tempo/blob/main/docs/design-propo...

The relevant code for the Parquet storage backend can be found here: https://github.com/grafana/tempo/tree/main/tempodb/encoding

disclosure: I work for Grafana!


Cool, thanks for sharing. Can you say something about how it's worked out? Has it reduced bandwidth or CPU usage?


The Parquet backend helped unlock trace search for large clusters (>400MB/s data ingestion) and over longer periods of time (>24h). It also helped unlock TraceQL (a query language for traces similar to PromQL/LogQL). There are more details in this blog post: https://grafana.com/blog/2023/02/01/new-in-grafana-tempo-2.0...

I don't have the exact CPU/bandwidth numbers on me right now, but CPU usage has gone up by about 50% on our "Ingester" and "Compactor" components (you can read up on the architecture here: https://grafana.com/docs/tempo/latest/operations/architectur...). But this is optimising for read performance, which improved significantly.


Someone from F5 worked on this with OpenTelemetry for Arrow [0]. Another effort was made for Parquet, but it was dropped [1].

[0]: https://github.com/open-telemetry/oteps/pull/171

[1]: https://github.com/open-telemetry/opentelemetry-proto/pull/3...


Oh nice, thank you (and also solumos) for the links! It looks like oteps/pull/171 (merged June 2023) expanded and superseded the opentelemetry-proto/pull/346 PR (closed Jul 2022) [0]. The former resulted in merging OpenTelemetry Enhancement Proposal 156 [1], with some interesting results especially for 'Phase 2' where they implemented columnar storage end-to-end (see the Validation section [2]):

* For univariate time series, OTel Arrow is 2 to 2.5 better in terms of bandwidth reduction ... and the end-to-end speed is 3.1 to 11.2 times faster

* For multivariate time series, OTel Arrow is 3 to 7 times better in terms of bandwidth reduction ... Phase 2 has [not yet] been .. estimated but similar results are expected.

* For logs, OTel Arrow is 1.6 to 2 times better in terms of bandwidth reduction ... and the end-to-end speed is 2.3 to 4.86 times faster

* For traces, OTel Arrow is 1.7 to 2.8 times better in terms of bandwidth reduction ... and the end-to-end speed is 3.37 to 6.16 times faster

Pretty exciting results! The OTel Arrow adapter has subsequently been donated to the OpenTelemetry community; here's a comment that does a good job of summarizing the results and the recommendations that came out of the test [3].

[0]: https://github.com/open-telemetry/opentelemetry-proto/pull/3...

[1]: https://github.com/open-telemetry/oteps/blob/main/text/0156-...

[2]: https://github.com/open-telemetry/oteps/blob/main/text/0156-...

[3]: https://github.com/open-telemetry/community/issues/1332#issu...


We're big users of ClickHouse at https://highlight.io. Some more details here if you're interested: https://www.highlight.io/blog/how-we-built-logging-with-clic...


A lot of companies roll their own, like Honeycomb and Kentik (network telemetry). ClickHouse is a very good open option.



