- Built-in replication without the need to run Zookeeper, with multi-tier sharding support. They rewrote the ZK protocol using multi-paxos within CH itself. It's great.
- Built-in batch inserts via the HTTP protocol... sorry, TCP :(. Previously you'd have to batch using buffer tables, proxies, or in-memory buffering within your client apps. This is no longer needed!
- Better support for external data formats (Avro, Parquet)
It's just... so good.
AFAIK this description is kind of misleading. When they say that they got rid of Zookeeper, people expect that they can just connect ClickHouse nodes to each other and replication will work. But that is not how things work: you still have to run an external service called clickhouse-keeper. Basically, what they did is rewrite Zookeeper in C++.
Very welcome. We used to do that with a dedicated app.
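For anyone who hasn't had to build one: the client-side in-memory buffering mentioned above is roughly the following. This is a minimal sketch, not anyone's actual app. `InsertBuffer`, `flush_fn` and `max_rows` are hypothetical names, and `flush_fn` stands in for whatever would issue one bulk INSERT per batch.

```python
from typing import Any, Callable, List


class InsertBuffer:
    """Collects rows in memory and hands them to `flush_fn` in batches.

    `flush_fn` is a hypothetical callback; in a real client it would
    send one bulk INSERT per batch instead of one INSERT per row.
    """

    def __init__(self, flush_fn: Callable[[List[Any]], None], max_rows: int = 1000):
        if max_rows < 1:
            raise ValueError("max_rows must be at least 1")
        self._flush_fn = flush_fn
        self._max_rows = max_rows
        self._rows: List[Any] = []

    def add(self, row: Any) -> None:
        """Buffer one row and flush once the size threshold is reached."""
        self._rows.append(row)
        if len(self._rows) >= self._max_rows:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as a single batch (no-op when empty)."""
        if not self._rows:
            return
        batch, self._rows = self._rows, []
        self._flush_fn(batch)


def demo() -> List[List[int]]:
    """Push 7 rows through a buffer of size 3 and record the batches."""
    batches: List[List[int]] = []
    buf = InsertBuffer(batches.append, max_rows=3)
    for i in range(7):
        buf.add(i)
    buf.flush()  # send the leftover partial batch
    return batches
```

A real version would also flush on a timer and on shutdown so that a slow trickle of rows doesn't sit in memory forever; that is left out here to keep the sketch short.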
> It's just... so good.
What Postgres is for RDBMS and SQLite for embedded, ClickHouse is for time series. Tastefully designed and driven by engineering excellence. I wish them all the best.
It appears to be implemented with Raft, not Paxos (per https://presentations.clickhouse.com/meetup54/keeper.pdf, slide 21).
* Zookeeper's wire protocol is emulated in CH-Keeper. Nice! So all clients are compatible, etc.
* Zookeeper uses a distributed consensus algorithm called ZAB, which is not Paxos, though many believe it is. CH-Keeper uses Raft, and it can do so because the consensus algorithm is not exposed directly: it is an internal property hidden behind the API and, obviously, the wire protocol.
As much as it sucks to have been an OSS-ish product Amazon has taken a chunk out of, the game is now known and can be proactively neutered with good planning.
When we did a PoC, both the operational side and the performance of ClickHouse were severely lacking compared to Druid. ClickHouse had more resources at its disposal than Druid during this PoC.
If they could improve the operational side and introduce sensible defaults so that users don't have to go through 10000 configuration options to work with data in ClickHouse, I am sure I would give it a go for some other use case. It is simple on the surface, but the devil is in the details. Druid is much simpler and saner at the scale I need to operate.
A ClickHouse cluster quite simply doesn't support elastic rebalancing. Avoid CH if that is a hard requirement for your setup.
Clickhouse is significantly easier to operate than Druid in my experience.
Nowhere in this short article does it say that Druid is 8 times faster than ClickHouse. They claimed: "Druid is simply 2 times faster than ClickHouse" (actually, by total runtime it's only 1.5 times faster).
There is also a newer ClickHouse benchmark with a total runtime of 0.779s.
This is almost the same number as in Imply's statement.
It really is a super power.
I then use this to kick off materialized views to automagically pluck out relevant JSON fields into views
Similar to this
We do the exact same thing at GraphJSON https://www.graphjson.com/guides/about
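To make the raw-JSON-plus-materialized-view pattern described above concrete, here is a rough plain-Python simulation of the idea. It is not ClickHouse SQL and not anyone's production setup: `RawEvents`, `make_field_view` and the field names are all made up for illustration. The one property it mirrors is that a ClickHouse materialized view fires on each insert into the source table and processes only the newly inserted block.

```python
import json
from typing import Any, Callable, Dict, List


class RawEvents:
    """Stand-in for a table that stores each event as one raw JSON string."""

    def __init__(self) -> None:
        self.rows: List[str] = []
        self._views: List[Callable[[List[str]], None]] = []

    def attach_view(self, on_insert: Callable[[List[str]], None]) -> None:
        """Register a callback that plays the role of a materialized view."""
        self._views.append(on_insert)

    def insert(self, block: List[str]) -> None:
        """Store the block, then pass only this new block to every view."""
        self.rows.extend(block)
        for view in self._views:
            view(block)


def make_field_view(
    fields: List[str], target: List[Dict[str, Any]]
) -> Callable[[List[str]], None]:
    """Build a view that plucks `fields` out of each JSON row into `target`."""

    def on_insert(block: List[str]) -> None:
        for raw in block:
            doc = json.loads(raw)
            # Missing fields become None rather than raising.
            target.append({name: doc.get(name) for name in fields})

    return on_insert


def demo() -> List[Dict[str, Any]]:
    """Insert two raw events and return what the view extracted."""
    raw = RawEvents()
    extracted: List[Dict[str, Any]] = []
    raw.attach_view(make_field_view(["user", "ms"], extracted))
    raw.insert(['{"user": "a", "ms": 12, "extra": true}'])
    raw.insert(['{"user": "b"}'])
    return extracted
```

The raw table keeps every field, so a new view over different fields can be added later without changing how events are ingested.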
- Snowflake is SaaS, Clickhouse isn’t yet
- Clickhouse is open source, Snowflake is proprietary
- Snowflake has the virtual warehouse concept and ability to scale compute up and down with a single SQL statement. Clickhouse is a bit more traditional in architecture.
- Snowflake is hella expensive
- Snowflake is a bit more of a traditional data warehouse, whereas Clickhouse is philosophically about powering through big datasets such as denormalised click stream or logs
Both great products for their respective use cases
Disclaimer: I work on Altinity.Cloud.
Disclaimer: I work at Altinity.
Just forget it - primitive, performance wasn't great even compared to classic RDBMS, and so on => cannot be used in a real-world scenario, but interesting as an experiment.
> By default, ClickHouse Keeper provides the same guarantees as ZooKeeper (linearizable writes, non-linearizable reads). It has a compatible client-server protocol, so any standard ZooKeeper client can be used to interact with ClickHouse Keeper. Snapshots and logs have an incompatible format with ZooKeeper, but clickhouse-keeper-converter tool allows to convert ZooKeeper data to ClickHouse Keeper snapshot. Interserver protocol in ClickHouse Keeper is also incompatible with ZooKeeper so mixed ZooKeeper / ClickHouse Keeper cluster is impossible.
So I guess yes?