
ClickHouse, a column-oriented DBMS to generate analytical reports in real time - jinqueeny
https://github.com/yandex/ClickHouse
======
lykr0n
ClickHouse has some weird quirks when you think of it as a SQL database, but
it's astounding to use. It's faster than one would think, it can do some
really cool data modeling, and it provides a wealth of features for the average
user out of the box.

The most important thing, and the thing that makes it attractive to me, is that
it is almost stupidly simple to set up and get running. It's quite simple (once
you wrap your head around it) to do sharding or replication and scale up. The
ZooKeeper stuff takes a bit more effort, but most of that is due to ZooKeeper
and not ClickHouse.
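For context on the sharding part: ClickHouse's Distributed table engine picks the shard for an inserted row by taking the value of a sharding-key expression modulo the total weight of all shards, then mapping that remainder onto consecutive per-shard weight ranges. The Python below is a minimal sketch of that selection rule only, assuming integer sharding-key values; `pick_shard` and the shard weights are hypothetical illustrations, not ClickHouse code.

```python
from bisect import bisect_right
from itertools import accumulate


def pick_shard(sharding_key: int, weights: list[int]) -> int:
    """Return the index of the shard that receives a row (illustrative sketch).

    The remainder of key / total_weight is mapped onto consecutive ranges,
    one range per shard, each as wide as that shard's weight.
    """
    total = sum(weights)
    remainder = sharding_key % total
    # Cumulative upper bounds, e.g. weights [1, 2] -> [1, 3]:
    # remainder 0 -> shard 0, remainders 1..2 -> shard 1.
    bounds = list(accumulate(weights))
    return bisect_right(bounds, remainder)


if __name__ == "__main__":
    weights = [1, 2]  # hypothetical: shard 1 should get twice the rows of shard 0
    counts = [0] * len(weights)
    for user_id in range(9):
        counts[pick_shard(user_id, weights)] += 1
    print(counts)  # [3, 6]
```

Weighting by ranges rather than plain `key % shard_count` lets a bigger machine take a proportionally larger slice of the data.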

~~~
gary__
A look through the slides below does highlight some of its differences from a
standard SQL database.

[https://www.slideshare.net/Altinity/migration-to-
clickhouse-...](https://www.slideshare.net/Altinity/migration-to-clickhouse-
practical-guide-by-alexander-zaitsev)

That was a year ago, so perhaps things have changed.

------
drej
I remember playing with it a while ago and here are my 2c:

\- It's ridiculously fast, like, I didn't know where the performance was
coming from.

\- Getting it up and running was a bit clunky (Docker saved me), but I hear
it's better now.

\- It has non-standard (I mean, everyone is non-standard, but this is way off
ANSI) and case-sensitive (???) SQL syntax. This annoyed me again and again.

\- It seemed (and still seems) like a project that lives and dies with one
developer, and no matter how brilliant he may be, I'm not willing to invest
in a technology that carries this risk and is so hard to migrate off of (because
of the non-standard SQL).

I'm sad about the last two points, because the database is rather brilliant
otherwise.

~~~
dschuler
By one developer, do you mean Yandex, or that most commits are made by a couple
of users? Being backed by a large company (the Russian Google, apparently) that
has an independent revenue stream seems like a large plus, but maybe not
enough to cancel that out.

I'm wary of investing effort into a potentially unsupported project as well,
but I wonder if ClickHouse only seems "out there" because we're not aware of
the Russian tech ecosystem (at least I'm not).

People don't seem concerned about building anything with Firebase, even though
Google has a poor track record when it comes to changing its mind about
priorities or service pricing.

What would you recommend instead for a column-oriented db that you can self-
host (commercial or open source)?

~~~
drej
It's in the Yandex namespace and it is used by said company, which is a huge
plus. But if you looked at the development history just a while ago, it was
highly dependent on Alexey.

It reminds me of Grumpy
([https://github.com/google/grumpy](https://github.com/google/grumpy)), which
was released by Google, but was later basically abandoned when the lead left
Google.

That being said, the situation is better than the last time I checked; there
is a handful of somewhat active developers.
[https://github.com/yandex/ClickHouse/graphs/contributors](https://github.com/yandex/ClickHouse/graphs/contributors)

------
dang
Discussion from a couple years ago:
[https://news.ycombinator.com/item?id=11908254](https://news.ycombinator.com/item?id=11908254)

------
georgewfraser
The basic techniques for implementing a fast column-store data warehouse have
been well known for 10 years. There are several excellent commercial and
open-source implementations of these techniques:

- BigQuery
- Snowflake
- Redshift
- Presto

ClickHouse is not one of them. It doesn't have:

- Transactions
- Distributed joins
- Separate compute from storage
- UPDATE
- User management

I don't mean to be a jerk, I'm just trying to save people some time. Columnar
DBs are well-trodden territory, and ClickHouse is way behind.
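The core idea behind every column store mentioned in this thread is the storage layout: an aggregate over one column only has to read that column's values, whereas a row layout carries every field of every record along. The toy Python sketch below contrasts the two layouts on a hypothetical three-column events table; real engines add compression, vectorized execution and on-disk formats that this deliberately ignores.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical event record, used only for illustration.
    user_id: int
    url: str
    bytes_sent: int


# Row-oriented: each record's fields are stored together.
rows = [
    Row(1, "/a", 100),
    Row(2, "/b", 250),
    Row(1, "/c", 50),
]

# Column-oriented: each field is stored as its own contiguous list.
columns = {
    "user_id": [r.user_id for r in rows],
    "url": [r.url for r in rows],
    "bytes_sent": [r.bytes_sent for r in rows],
}


def total_bytes_row_store(rows: list[Row]) -> int:
    # Visits every record object even though only one field is needed.
    return sum(r.bytes_sent for r in rows)


def total_bytes_column_store(columns: dict[str, list]) -> int:
    # Touches only the single column the query asks for.
    return sum(columns["bytes_sent"])


if __name__ == "__main__":
    print(total_bytes_row_store(rows), total_bytes_column_store(columns))  # 400 400
```

Both functions return the same answer; the difference is how much data each one has to read, which is where column stores get most of their speed on wide tables.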

~~~
manigandham
Redshift doesn't separate compute from storage either unless you're using
Spectrum. Presto isn't a database at all and can read from many data stores.
The rest are all cloud-hosted with lots of moving parts. MemSQL, Vertica,
Actian, Greenplum, and SQL Server are better comparisons.

ClickHouse _is_ a column-oriented db and actually one of the most advanced,
focusing on performance at all costs with lots of table storage engines that
provide flexibility for your exact use-case. It also supports distributed
joins and deletes but has some limitations they are working on.

It could definitely use better tooling and compatibility, though, but that's the
tradeoff the core team made, and it seems to be working well for the companies
that can afford the time and talent.

~~~
georgewfraser
My point is there’s just no advantage to ClickHouse. The things that make it
fast are in _every_ column store. There are other options that do everything it
does and more.

~~~
manigandham
Speed is the advantage. It's very fast for a self-hosted system and probably
only beaten by BigQuery for throughput.

You're right, though, that most people don't need it and can get 90% of the
speed, with better usability, from the other options.

------
PeterZaitsev
ClickHouse indeed does not "Separate Compute from Storage", but that is an
architectural decision, not a feature gap. Running ClickHouse with directly
attached storage and built-in replication can be super fast and cost
efficient. It works best for stable workloads.

------
tuananh
CloudFlare is using ClickHouse. That does say something.

[https://blog.cloudflare.com/http-analytics-
for-6m-requests-p...](https://blog.cloudflare.com/http-analytics-
for-6m-requests-per-second-using-clickhouse/)

------
dorfsmay
Can it run on a cluster? Or single server only?

Can you update specific rows? How fast are updates?

How does it compare to MonetDB?

~~~
sin7
You can run it on a cluster or a single server. It's pretty easy to set up
either way.

No updates. Fast inserts.

You can only join two tables at a time, but the joins can be chained to deal
with this limitation.
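"Chaining" here just means joining the first two tables, then joining that intermediate result with the third, and so on, so each step is an ordinary two-table join. Below is a minimal Python sketch of the pattern using an in-memory hash join over hypothetical `users`, `orders` and `items` tables; it illustrates the idea only and is not how ClickHouse executes joins internally.

```python
def hash_join(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Inner-join two lists of dicts on `key` with a simple hash join."""
    # Build phase: index the right-hand table by the join key.
    index: dict = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    # Probe phase: emit one merged row for each matching pair.
    out = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            out.append({**lrow, **rrow})
    return out


# Hypothetical tables for illustration.
users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
orders = [{"user_id": 1, "order_id": 10}, {"user_id": 2, "order_id": 11}]
items = [{"order_id": 10, "sku": "x"}, {"order_id": 10, "sku": "y"}]

# Only two tables per join: users with orders first,
# then that intermediate result with items.
step1 = hash_join(users, orders, "user_id")
result = hash_join(step1, items, "order_id")

if __name__ == "__main__":
    for row in result:
        print(row)
```

Order 11 has no items, so the inner join drops Bob's row and only Ann's two item rows survive the second step.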

I tried Monet. It wasn't very stable for me. I didn't stick with it long
enough to judge it. ClickHouse has backing of Yandex. I think that makes a
huge difference.

I have used ClickHouse for the past year. I've thrown 3,000-column by
120-million-row tables at it. It worked where PostgreSQL came to a halt.
Different use cases, really.

It fits my use case perfectly: large amounts of data with no updates and tons
of aggregations. It's lightning fast.

~~~
manigandham
It does support updates and deletes now, but still limited.

