Hi HN, Mike and Warren here! We're excited to share some (early) work towards our next major version of HyperDX. HyperDX makes it easy to visualize and search logs & traces on top of ClickHouse (so incident & bug investigations hopefully go a little easier). For example, if a team is thinking of migrating to ClickHouse for their observability data warehouse [1][2][3], usually for cost or data privacy reasons, they can throw HyperDX on top as the UI layer for analysis and dashboarding in a dev-friendly way (aka not needing to type paragraphs of SQL to find some logs).
Over the past year we've seen a ton of excitement from companies adopting ClickHouse-based observability stacks - but one of the biggest challenges we've seen is that the UI layer on top of ClickHouse is either clunky for observability use cases (ex. BI tools) or too tied to a specific ingestion architecture to scale to every use case (we used to be in this category!). For companies that need more flexibility in how their data is ingested and stored (usually because they run at a large scale), there are really no good options for a developer experience (DX) focused observability layer on top of ClickHouse (Shopify spent 3 years building it in-house!).
Our current release works completely in the browser - it builds on top of ClickHouse's HTTP interface, which our React app can talk to directly. This means you can actually try HyperDX in your browser on your own ClickHouse with no installation! Fortunately this was easy for us to accomplish because we're full stack TypeScript, which makes it simple to move code between server and client. On top of this we've been spending time baking in performance optimizations so that HyperDX keeps using ClickHouse efficiently at larger data volumes. We do a few tricks, like fetching only the columns needed for the current search and re-querying to expand the entire row when needed, which takes full advantage of ClickHouse's columnar nature (40% faster, ymmv!) - or rewriting queries to use materialized columns to speed up Map column access when available (10x faster!).
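To make the two tricks above a bit more concrete, here's a rough sketch of what they could look like as plain query builders. This is not HyperDX's actual code: the table name `otel_logs`, the columns `Timestamp`, `Body` and `RowId`, and all function names are made up for illustration, and the Map rewrite is a naive string replacement rather than a real SQL rewrite.

```typescript
// Hypothetical table name; swap in whatever your own schema uses.
const TABLE = "otel_logs";

// Escape a value for use inside a single-quoted ClickHouse string literal.
function quoteLiteral(value: string): string {
  return "'" + value.replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

// Trick 1a: the results list only reads the columns it actually displays,
// so ClickHouse does not have to touch the other (possibly wide) columns.
function buildListQuery(columns: string[], where: string, limit: number): string {
  return `SELECT ${columns.join(", ")} FROM ${TABLE} WHERE ${where} ORDER BY Timestamp DESC LIMIT ${limit} FORMAT JSONEachRow`;
}

// Trick 1b: the full row is fetched only when a single result is expanded.
// `RowId` is a hypothetical unique key; a real schema may need several columns.
function buildExpandQuery(rowId: string): string {
  return `SELECT * FROM ${TABLE} WHERE RowId = ${quoteLiteral(rowId)} LIMIT 1 FORMAT JSONEachRow`;
}

// Trick 2: if a Map lookup has a materialized column, use that column instead.
// `materialized` maps the Map expression text to the materialized column name.
function useMaterializedColumns(sql: string, materialized: Record<string, string>): string {
  let out = sql;
  for (const mapExpr of Object.keys(materialized)) {
    out = out.split(mapExpr).join(materialized[mapExpr]);
  }
  return out;
}
```

The resulting strings can be sent as the body of a POST request to ClickHouse's HTTP interface (port 8123 by default), which is what makes this workable straight from a browser app.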
On the DX side: we support querying using both Lucene (ex. `fullText property:value`) and SQL syntax. We've found the former to be our favorite for how concise it is. Similarly for charts, our chart builder has been upgraded to accept SQL expressions as well, so you can leverage the full power of SQL, while avoiding typing paragraphs of boilerplate SQL for time series data. We also make it easy to switch between UTC/local timestamps! Lastly, we've added high cardinality outlier analysis by charting the delta between outlier and inlier events (a la bubble up) - which we've found really helpful in narrowing down causes of regressions/anomalies in our traces.
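As a toy illustration of why the Lucene-style syntax is so much shorter than the SQL it stands for, here's a minimal sketch that turns `fullText property:value` into a WHERE clause. It is not our real parser: it only handles whitespace-separated terms (no quoting, ranges, OR/NOT), it assumes a hypothetical `Body` column for full-text terms, and `luceneToWhere` is a made-up name.

```typescript
// Escape a value for use inside a single-quoted ClickHouse string literal.
function sqlString(value: string): string {
  return "'" + value.replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

// Translate a tiny subset of Lucene-style syntax into a SQL WHERE expression.
// Bare terms become case-insensitive substring matches on the full-text column;
// `property:value` terms become equality checks on that column.
function luceneToWhere(query: string, fullTextColumn = "Body"): string {
  const terms = query.trim().split(/\s+/).filter((t) => t.length > 0);
  const clauses = terms.map((term) => {
    const sep = term.indexOf(":");
    if (sep > 0 && sep < term.length - 1) {
      const property = term.slice(0, sep);
      const value = term.slice(sep + 1);
      // Only allow plain identifiers so the property can be used as a column name.
      if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(property)) {
        throw new Error(`Unsupported property name: ${property}`);
      }
      return `${property} = ${sqlString(value)}`;
    }
    return `positionCaseInsensitive(${fullTextColumn}, ${sqlString(term)}) > 0`;
  });
  // An empty search matches everything.
  return clauses.length > 0 ? clauses.join(" AND ") : "1 = 1";
}
```

For example, `luceneToWhere("fullText property:value")` yields `positionCaseInsensitive(Body, 'fullText') > 0 AND property = 'value'`, which is already a fair bit more typing than the original search.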
We have a lot more planned for the full release - but wanted to get this out early to hear your feedback and opinions!
In Browser Live Demo: https://play.hyperdx.io/search
GitHub Repo: https://github.com/hyperdxio/hyperdx/tree/v2
Landing Page: https://hyperdx.io/v2
[1]: https://www.uber.com/blog/logging/
[2]: https://blog.cloudflare.com/log-analytics-using-clickhouse/
[3]: https://www.youtube.com/watch?v=LDj3_jMsCXg&list=PLvQF73bM4-...
The docs at https://www.hyperdx.io/docs/search don't seem to talk about this key design decision.
I have a few hundred GB to a few TB of logs (all from `journald` or JSON lines), and I just want to store them forever and get results fast when searching for arbitrary substrings.
Loki does not use an index, so it's pretty slow at finding results in TB-sized logs (does not return results within a few seconds, so it's not interactive).
https://quickwit.io is one thing I'm looking at integrating; it could cover much of the index-based log search.
(Note I'm not super familiar with the capabilities of ClickHouse itself regarding indexed full-text search.)