Show HN: GraphJSON – Easily log and analyze events using ClickHouse (graphjson.com)
128 points by flurly on Aug 17, 2022 | 35 comments
Hi HN,

My name is JR and I had a need for a simple analytics solution that allowed me to store (timestamp, json) logs and run SQL over them.

It was hard to find the right solution. Solutions like Mixpanel and Amplitude were optimized for particular report types, whereas solutions like Snowflake, BigQuery, etc. required a lot of setup.

I built GraphJSON to fit in the middle. I strived for the ease of use of tools like Mixpanel and Amplitude, but wanted to ensure affordances were built to support use cases that big data warehouses enable.

Under the hood, GraphJSON is powered by ClickHouse. This enables really efficient disk compression and fast queries. In many ways, you can think of GraphJSON as an easy way to explore ClickHouse without having to run and maintain your own clusters.

I'd love for you to give it a try. You can generally start logging your data in under a minute. From there, you can either use the UI tooling to create graphs in a no-code way or, if you're more advanced, use the SQL editor to run any query you can think of!
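To make the (timestamp, json) idea concrete, here is a minimal Python sketch of what one logged event might look like. The field names (`api_key`, `collection`, `timestamp`, `json`) and the `build_event` helper are assumptions for illustration, not the documented GraphJSON API; the curl snippets in the product show the real request shape.

```python
import json
import time


def build_event(api_key, collection, properties, timestamp=None):
    """Build a hypothetical (timestamp, json) event payload.

    Every field name here is an assumption, not the documented API.
    """
    return {
        "api_key": api_key,
        "collection": collection,
        # Unix seconds; fall back to "now" when no timestamp is given.
        "timestamp": int(time.time()) if timestamp is None else int(timestamp),
        # The event body travels as a JSON string, matching the (timestamp, json) shape.
        "json": json.dumps(properties, sort_keys=True),
    }


event = build_event("demo-key", "signups", {"plan": "free", "user_id": 42}, timestamp=1660694400)
print(json.dumps(event, indent=2))
```

The payload is only built and printed here; sending it would be a plain HTTP POST to whatever endpoint the docs specify.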




I used GraphJSON to build myself a slightly ridiculous dashboard (callum.run) and it's been great to build on. And JR has been super helpful with getting all my wacky ideas working :) Everything on that site is a GraphJSON visualisation, some displaying the logged data directly and some displaying results of SQL queries over it. Being able to combine them is really nice, the SQL gives a lot of power. It was a cool excuse to learn some ClickHouse functions too, which are awesome!

I also have a GraphJSON importer in pyground (pyground.vercel.app), which lets you load a GraphJSON collection into Python in-browser and use e.g. pandas/matplotlib to poke at it too.


Thank you Callum!


GraphJSON has some nice APIs for embedding charts inside your own apps, much like how Keen.io operated back in the day (but without the scammy business practices).

So, if you're a developer and you want a Mixpanel-style API to send events, and then a secure way to query them for individual customer accounts, and embed dashboards in your SPA, you can do all this, with some nice data visualization / chart building tools inside GraphJSON.


Thank you for the kind words James!!


We use GraphJSON for some internal analytical data at Cronitor and couldn’t be happier. JR is responsive and the service does exactly as advertised.


Thank you Shane!!


We're a small startup that switched from Mixpanel to GraphJSON for our internal analytics dashboards.

Amplitude and Mixpanel were the go-to's I had heard of when starting my company. Amplitude makes you schedule a sales call, so Mixpanel it was. One of our engineers filed a bug affecting our dashboard in a significant way, and after many months there was no movement. Between that and some other persistent bugs, I was having a bad time.

I was able to migrate everything in one call with JR. He's responded to all my emails in O(hours). I'm paying way less than I was, and have 100% parity with what I was using Mixpanel for. I'm sure there exist features on Mixpanel that GraphJSON doesn't have; we just haven't hit them yet. Highly recommend.


Thank you Alexander!!


Is there more behind the scenes than just the `timestamp | json` table? From what I understand, any ClickHouse query against that table involving a filter would require a full table scan.


Yes, behind the scenes we have a few additional columns like uid, collection and insert_timestamp to optimize queries and support migrations. I just use timestamp/json columns as examples to illustrate the core idea behind GraphJSON.
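As a rough illustration of why an extra column such as collection helps, here is a toy in-memory Python sketch. It is not how ClickHouse or GraphJSON stores data; the `rows` list, the `by_collection` index and both scan functions are hypothetical stand-ins that only count how many JSON blobs each approach has to parse.

```python
import json
from collections import defaultdict

# Toy stand-in for the events table (layout is an assumption for illustration).
rows = [
    {"collection": "signups", "timestamp": 1, "json": '{"plan": "free"}'},
    {"collection": "signups", "timestamp": 2, "json": '{"plan": "pro"}'},
    {"collection": "pageviews", "timestamp": 3, "json": '{"path": "/"}'},
]


def full_scan(rows, collection, key, value):
    """Parse every row's JSON, including rows from unrelated collections."""
    hits, parsed = [], 0
    for row in rows:
        body = json.loads(row["json"])
        parsed += 1
        if row["collection"] == collection and body.get(key) == value:
            hits.append(row["timestamp"])
    return hits, parsed


# Group rows by the dedicated collection column, loosely mimicking a sort key.
by_collection = defaultdict(list)
for row in rows:
    by_collection[row["collection"]].append(row)


def pruned_scan(by_collection, collection, key, value):
    """Only parse JSON for rows in the requested collection."""
    hits, parsed = [], 0
    for row in by_collection.get(collection, []):
        body = json.loads(row["json"])
        parsed += 1
        if body.get(key) == value:
            hits.append(row["timestamp"])
    return hits, parsed


print(full_scan(rows, "signups", "plan", "pro"))
print(pruned_scan(by_collection, "signups", "plan", "pro"))
```

Both calls find the same event, but the pruned version never touches the pageviews row.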


I'd extend the API to allow for querying via JSON similar to Keen rather than merely focusing on visualizations.

Honestly as a dev all I want is a simple/fast/cheap Keen alternative that I can dump events into and do ad-hoc analytical queries with programmatically.


It's not documented, but you can actually already do this. Feel free to shoot me an email at hi@graphjson.com and I'll get you set up.


Small note: there's a typo in the FAQ under "What am I paying for?": "entrepeneur".

I’m not sure I understand the pricing: the FAQ talks about a "free tier" but the Pricing page shows only a $12/mo tier.


Pushing typo fix now. Thanks!

The pricing page could definitely use some work. To clarify - there is a free forever tier of 5k events. If you go over, then the only available tier is the $12/mo per million events tier.
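A small Python sketch of that pricing, for anyone who wants to estimate a bill. The 5k free tier and $12/mo per million come from the comment above; the rounding rule (each started million is billed in full) is an assumption, since the thread does not say whether usage is prorated.

```python
FREE_EVENTS = 5_000       # free forever tier, per the comment above
PRICE_PER_MILLION = 12    # USD per month, per the comment above


def monthly_cost(events):
    """Estimate the monthly bill in USD for a given event count."""
    if events <= FREE_EVENTS:
        return 0
    # Assumption: every started block of one million events is billed in full.
    millions = -(-events // 1_000_000)  # integer ceiling division
    return PRICE_PER_MILLION * millions


for n in (4_000, 250_000, 2_500_000):
    print(n, monthly_cost(n))
```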


Thanks!


Awesome! Congrats on the Show HN launch.

Curious to know if you are using the newly introduced JSON type?

At work we run a small CH cluster (10 TB+ of data), and the functions for working with JSON consume too much CPU.


Thank you!

Regarding the JSON column type - we're waiting for it to become more battle tested before introducing it in prod. So far we've been scaling by adding more compute since CH scales pretty well horizontally. One nice thing about multi-tenancy is the queries per user are generally spiky and rare. So all GraphJSON users get to share one beefy cluster instead of every user buying their own.


I don't have experience with ClickHouse, but if you are not using the JSON column type, isn't the querying quite inefficient?


Looks great - recently there have been many logging tools built around ClickHouse and JSON. What differentiates you from the existing services?


Thank you! I'm actually not familiar with these recently built services. Can you link a few?



That was an incredibly well-presented < 1 min intro video. Makes me want to try it straight away (and I shall)! Well done!


Thank you so much for the kind words!


GraphJSON looks nice. Do you mind sharing the ClickHouse infrastructure that's powering it?


I'm not sure what you mean. ClickHouse is an open source project. You can check it out here https://github.com/ClickHouse/ClickHouse


I believe they are referring to how you architected your ClickHouse deployment(s).

Are you cloud based? Using container orchestration?

Is it one cluster per customer? Multi-tenant? Can I use all standard available ClickHouse integrations?

How do you manage scaling?

It's not the easiest system to manage at scale, so getting some insight here would be a good way to help your customers see some competency here!


Ah I see. Let me answer your questions one by one.

> Are you cloud based? Using container orchestration?

Yes it's all in the cloud. The infrastructure is spun up using terraform and automated using ansible.

> Is it one cluster per customer? Multi-tenant?

Multi tenant with rate limits to ensure one customer doesn't take down the entire cluster.
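One common way to implement per-tenant rate limits is a token bucket per customer. The Python sketch below is a generic illustration of that technique, not GraphJSON's actual limiter; the `TokenBucket` and `TenantLimiter` names and the rate and capacity values are all hypothetical.

```python
import time


class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill based on elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantLimiter:
    """Keeps one independent bucket per tenant so a noisy tenant only throttles itself."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, tenant):
        bucket = self.buckets.get(tenant)
        if bucket is None:
            bucket = self.buckets[tenant] = TokenBucket(self.rate, self.capacity, self.clock)
        return bucket.allow()


limiter = TenantLimiter(rate=5.0, capacity=10)
print(limiter.allow("customer-a"))
```

The injectable `clock` argument is there only to make the limiter easy to test without sleeping.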

> Can I use all standard available ClickHouse integrations?

Most of them are enabled by default - for instance window functions are available. If there is one that isn't available, feel free to email me and I'll most likely enable it.

> How do you manage scaling?

Unfortunately ClickHouse doesn't have a notion of consistent hashing, so currently we scale by simply adding more nodes and reindexing the whole table. That being said, ClickHouse is incredibly space efficient, so we haven't had to do this very often.
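To see why adding a node without consistent hashing tends to mean rewriting most of the table, here is a small Python sketch. It assumes plain modulo sharding on a hash of the row key, which is a simplification and not necessarily how this deployment shards; `shard_for` and the key format are hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    """Assign a key to a shard by hashing it and taking the remainder."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def fraction_moved(keys, old_shards, new_shards):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)


keys = [f"event-{i}" for i in range(10_000)]
# Going from 4 to 5 shards: in expectation about 80% of keys land on a different shard.
print(fraction_moved(keys, 4, 5))
```

With consistent hashing, only around one fifth of the keys would need to move in the same 4 to 5 step, which is the property the comment says is missing.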


Thanks. That's exactly what I meant.


Infra meaning hardware?


Why is it called "Graph"? Sounds like a format for storing graph data.


The original thinking was it is a product that turns json events into graphs, thus GraphJSON!


That's an excellent explainer video. Good luck!


Thank you!!


Looks cool. Do you plan to support Java?


Yes, eventually! It's worth noting that we have a REST API. If you look at the curl code snippets, it should be pretty easy to reverse engineer them into a Java code snippet.





