
ClickHouse – high-performance open-source distributed column-oriented DBMS - mechmind
https://clickhouse.yandex/reference_en.html
======
buremba
This is huge. It seems similar to BigQuery, but it has many features that I
haven't seen in other databases.

AggregatingMergeTree is one of them in particular: it allows incremental
aggregation, which is a huge gain for analytics services.

Also, it provides many table engines for different use cases. You don't even
need a commit log such as Apache Kafka in front of ClickHouse; you can just
push the data to a TinyLog table and move it in micro-batches to a more
efficient column-oriented table that uses a different table engine.
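The idea behind incremental aggregation can be sketched in plain Python: instead of keeping raw rows, keep a mergeable partial state per batch. Everything here (the `AvgState` class and its fields) is a hypothetical illustration of the concept, not ClickHouse's actual API.

```python
# Sketch of incremental aggregation: instead of storing raw rows,
# store mergeable partial states (here: count and sum for an average).
class AvgState:
    def __init__(self, count=0, total=0.0):
        self.count = count
        self.total = total

    def add(self, value):           # absorb one raw event
        self.count += 1
        self.total += value

    def merge(self, other):        # combine two partial states
        self.count += other.count
        self.total += other.total

    def finalize(self):            # produce the final aggregate
        return self.total / self.count if self.count else None

# Each micro-batch produces its own partial state...
batch1, batch2 = AvgState(), AvgState()
for v in [10, 20, 30]:
    batch1.add(v)
for v in [40, 50]:
    batch2.add(v)

# ...and merging states gives the same answer as aggregating all raw rows.
batch1.merge(batch2)
print(batch1.finalize())  # → 30.0
```

The point is that merged states never need the raw rows again, which is what makes aggregation incremental.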

------
gopalv
This looked interesting at first sight (considering I work on ORC & Hive):
the single-machine configuration with TinyLog and the aggregate indexes look
good [1].

But the more I look into the distributed storage section, the more corners I
see cut on data consistency.

"There are no quorum writes. You can't write data with confirmation that it
was received by more than one replica. If you write a batch of data to one
replica and the server with this data ceases to exist before the data has time
to get to the other replicas, this data will be lost."

"The reason for this is in case of network failures when the client
application doesn't know if the data was written to the DB, so the INSERT
query can simply be repeated. It doesn't matter which replica INSERTs were
sent to with identical data - INSERTs are idempotent. This only works for the
last 100 blocks inserted in a table."
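A minimal Python sketch of the deduplication behaviour described above: the server remembers identifiers of the last N inserted blocks and silently drops retried duplicates, which is what makes the retry idempotent. The hashing scheme and class names here are assumptions for illustration, not ClickHouse internals.

```python
import hashlib
from collections import deque

class Table:
    """Toy table that deduplicates retried inserts, keeping hashes
    of only the last `window` blocks (the docs say the last 100)."""
    def __init__(self, window=100):
        self.rows = []
        self.recent = deque(maxlen=window)  # hashes of recent blocks

    def insert_block(self, rows):
        digest = hashlib.sha256(repr(rows).encode()).hexdigest()
        if digest in self.recent:
            return False                    # duplicate retry: ignored
        self.rows.extend(rows)
        self.recent.append(digest)
        return True

t = Table(window=100)
block = [("2016-06-15", "click", 1)]
t.insert_block(block)   # first attempt succeeds
t.insert_block(block)   # network blip, client retries: deduplicated
print(len(t.rows))      # → 1
```

Note the flip side the docs admit: once a block's hash falls out of the window, a late retry would be inserted twice.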

"Failover is automatic (for small differences in data) or semi-automatic (when
data differs too much, which may indicate a configuration error)."

"When the server starts (or establishes a new session with ZooKeeper), it only
checks the quantity and sizes of all files. If the file sizes match but bytes
have been changed somewhere in the middle, this is not detected immediately,
but only when attempting to read the data for a SELECT query."
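The lazy corruption detection described in that quote can be sketched as a cheap size-only check at startup, with full checksum verification deferred to read time. The class and method names are hypothetical; this only illustrates why mid-file bit flips go unnoticed until a SELECT touches them.

```python
import hashlib

class Part:
    """Toy data part: stores bytes plus a checksum taken at write time."""
    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.size = len(data)
        self.checksum = hashlib.md5(data).hexdigest()

    def startup_check(self):
        # Cheap check at startup / new ZooKeeper session: size only.
        return len(self.data) == self.size

    def read(self):
        # Full verification happens only on an actual read (SELECT).
        if hashlib.md5(bytes(self.data)).hexdigest() != self.checksum:
            raise IOError("checksum mismatch detected on read")
        return bytes(self.data)

p = Part(b"columnar data")
p.data[3] ^= 0xFF                 # flip a byte in the middle (same size)
print(p.startup_check())          # → True: size matches, corruption unseen
try:
    p.read()
except IOError as e:
    print("read failed:", e)      # detected only when reading
```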

These sorts of network blips and bad-disk problems happen at least once a week
on a big cluster, and something like Hadoop wastes a lot of IO doing bit-rot
checks on cold data all the time.

The nature of clickstream data makes it somewhat okay to lose a few chunks in
transit - I can imagine at least a few of the beacons will get dropped purely
over the HTTP mechanism which pumps data into the system.

At some point, data consistency costs money, slows down inserts, and creates
all sorts of limitations on how recovery would play out.

But as a general purpose replicated DBMS which serves as a system of record
against fraud allegations (for instance), I can't see this comparing well.

[1] - 200 million rows/sec for join+aggregation is ~2x as good as Hive 2.0
single node LLAP on a hot run
([https://people.apache.org/~gopalv/LLAP.gif](https://people.apache.org/~gopalv/LLAP.gif))

------
grogenaut
For those who are wondering: it doesn't build under OS X, though the issues
don't initially seem insurmountable; it's more that it hasn't been tried under
Clang.

    [ 3%] Building CXX object contrib/libpoco/Foundation/CMakeFiles/PocoFoundation.dir/src/AbstractObserver.cpp.o
    error: unknown warning option '-Wno-unused-local-typedef' [-Werror,-Wunknown-warning-option]
    error: unknown warning option '-Wno-for-loop-analysis'; did you mean '-Wno-loop-analysis'? [-Werror,-Wunknown-warning-option]
    make[2]: *** [contrib/libpoco/Foundation/CMakeFiles/PocoFoundation.dir/src/AbstractObserver.cpp.o] Error 1
    make[1]: *** [contrib/libpoco/Foundation/CMakeFiles/PocoFoundation.dir/all] Error 2

A bit of googling shows this is likely because Clang's -Wall overrides any
other flags set earlier, which is apparently different from GCC. This makes me
think they aren't exaggerating when they say it supports Linux: they likely
haven't tried building it on a Mac much.

That said, it doesn't seem to be using any crazy deps that don't support
multiple platforms. Poco, above, is fully cross-platform.

Note, per the apparent comments below: lots of people develop on or use Macs,
so they'd be interested to know whether they'd need a VM or another option to
use this. Since the readme is super thin and just says "only Linux xxx", I
felt they didn't have much info. I'm used to the days when people built
projects that compiled everywhere but didn't build packages for them for some
reason.

~~~
fsaintjacques
It supports Linux x86-64 with gcc >= 5.0.

~~~
grogenaut
I can read as well. Neither your comment nor their website says how far it
might be from running on another platform. My comment was merely an initial
analysis of how far it might be from working on a Mac, which is important to a
lot of people here. Seeing that it fails due to Clang issues suggests they
likely haven't tried it on a Mac much.

~~~
inlineint
Did you try to build it on Mac with GCC from Homebrew?

~~~
grogenaut
No, just the default build.

------
waynenilsen
I found this [1] much more informative in terms of what this is good for.

[1]
[https://clickhouse.yandex/reference_en.html](https://clickhouse.yandex/reference_en.html)

~~~
dang
Ok, we changed the URL to that from
[https://clickhouse.yandex/](https://clickhouse.yandex/).

------
threeseed
This looked really interesting, and then I saw this: "This is not a cross-
platform system. It requires Linux Ubuntu Precise (12.04) or newer, x86_64
architecture with SSE 4.2 instruction set."

That pretty much rules out most enterprise deployments or big data appliances.

Anyone know the difficulty of getting something like this running on Red Hat
distros? Surely it's nothing major.

~~~
tyingq
This page seems to cover most of the prerequisites, and some other advice:
[https://github.com/yandex/ClickHouse/blob/master/doc/build.m...](https://github.com/yandex/ClickHouse/blob/master/doc/build.md)

And it does say this: _" With appropriate changes, build should work on any
other Linux distribution...Only x86_64 with SSE 4.2 is supported"_

~~~
threeseed
Thanks so much for that. It definitely looks like the documentation might be
out of sync as there are also some references to ODBC support in the code:

[https://github.com/yandex/ClickHouse/blob/32057cf2afa965033c...](https://github.com/yandex/ClickHouse/blob/32057cf2afa965033cf850a081e7a9dfa306594d/dbms/src/ODBC/info.cpp)

Going to look into building a Spark data source for this so we can see how
well it compares to databases like Cassandra.

------
nfa_backward
Looks really interesting and not another SQL on Hadoop solution. The
benchmarks look impressive, but all of the queries were aggregations of a
single table. I did not see any joins. I wonder how mature the optimizer is.

~~~
Lethalman
I think it's because the docs say, "If there isn't enough memory, you can't
run a JOIN," whereas SQL-on-Hadoop solutions still work without enough RAM by
spilling to disk. I don't think a comparison with JOINs would be fair in this
case.

------
rubiquity
Yandex has their own TLD? :/

~~~
negus
Are you really surprised that the largest European internet company has its
own TLD, when even Suzuki, BMW, and the Mormons have one?
[https://nic.yandex](https://nic.yandex)

~~~
rubiquity
Oh they are bigger than I realized. Thanks.

------
cven714
Interested in how this differs from kdb+. I see it isn't part of the
performance comparison chart, but it seems to have some feature overlap.

~~~
nickpeterson
Knowing the kdb community, someone will post a benchmark full of nonsensical
values in a plain text file available on an ftp site that shows kdb winning by
100000000 of some unknown unit...

~~~
StreamBright
In my experience the kdb community does exactly the opposite: they show a
real-world use case (high-frequency trading, for example) using real-world
data and demonstrate that their platform is significantly faster than the
competitors. Is it worth the price? Not so sure, but in certain use cases it
certainly is.

~~~
SEJeff
Yeah, kdb is built so that the entire core of the DB fits in the L1 CPU
cache. It really is an impressive feat of engineering and optimization.

I did hate it when we used it, however. It required more or less a team of
people doing querying on it full-time. Have you seen the queries in Q? It
looks like someone set the baud rate wrong on a serial connection:

From Wikipedia's page on Q (the query language of kdb), the factorial function
can be implemented directly as

    {prd 1+til x}

or recursively as

    {$[x=0;1;x*.z.s[x-1]]}

~~~
PeCaN
Makes sense to me, assuming 'til' is 'iota' and 'prd' is
'fold-with-multiplication'.

I don't know Q, but I'd guess you read it right-to-left like J/K/APL. `x` is
the right argument (in J it's `y`), so if we were to call `factorial 5`:

1. Create an array of [0,x) | 0, 1, 2, 3, 4

2. Add 1 to each element of the array | 1, 2, 3, 4, 5

3. Fold the array with multiplication | 120

Array languages are super elegant and fun once you use them a bit.
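The three steps above map almost directly onto Python, if one assumes `til` behaves like `range` and `prd` is a multiplication fold:

```python
from functools import reduce
from operator import mul

def factorial(x):
    # {prd 1+til x}: til x → [0, x), add 1 to each, fold with multiplication
    return reduce(mul, [i + 1 for i in range(x)], 1)

print(factorial(5))  # → 120
```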

------
moondowner
I guess the community will step in and, for starters: add support for other
Linux distros, and add comments in English alongside the Russian ones in the
source code.

------
biokoda
This looks impressive as hell.

~~~
etatoby
Wow, just wow.

I guess it's time to rewrite some backends...

~~~
slagfart
Actually, the benchmarks aren't against alternative column-store DBs, which
would really back up their claims. Would love to see a comparison with
Greenplum or Redshift.

For those who haven't worked with them before: column-store databases are
clever, but they also do as much processing as they can at ingestion time so
that they can post good SELECT benchmarks later.

~~~
RMarcus
Vertica is a pretty large commercial column store with a lot of customers.
Although I'm not sure if the Yandex folks got permission to publish benchmarks
about it...

(MonetDB is a classic research columnar database, and I believe MemSQL has a
columnar mode. Greenplum is missing, for some reason.)

~~~
fsaintjacques
That's what I want to know: how fast is it against Vertica?

~~~
RMarcus
Well, they have benchmarks! Against Vertica! Which is surprising, because I
imagine Vertica's EULA forbids that.

[https://clickhouse.yandex/benchmark.html#["1000000000",["Cli...](https://clickhouse.yandex/benchmark.html#\["1000000000",\["ClickHouse","Vertica"\],\["0"\])]

~~~
overcast
Wait, Vertica forbids publishing benchmarks? What exactly are they going to
do?

~~~
EwanToo
Clicked one of the vote buttons while trying to hit reply, sorry if it was a
down...

Most DB vendors forbid benchmarks (Oracle, IBM DB2, etc.); they claim they can
revoke your right to run the software if you publish them.

------
alexsderkach
Another summary with examples: [http://bit.ly/1Uc5uqb](http://bit.ly/1Uc5uqb)

~~~
uniacid
Awesome, thanks for sharing!

------
betimsl
Are they planning to release any libraries? The HTTP interface is quite good,
but I think it'd be better to avoid the HTTP overhead and have a direct
interface.

------
StreamBright
Without support for other operating systems, it is hard to consider this an
alternative to anything. We have several clusters using an operating system
other than the one ClickHouse supports. Unfortunately, few customers are going
to invest in a different platform to try out something new like this. The
lower the entry bar for new tech, the better.

~~~
pnathan
> ClickHouse manages extremely large volumes of data in a stable and
> sustainable manner. It currently powers Yandex.Metrica, world’s second
> largest web analytics platform, with over 13 trillion database records and
> over 20 billion events a day, generating customized reports on-the-fly,
> directly from non-aggregated data. This system was successfully implemented
> at CERN’s LHCb experiment to store and process metadata on 10bn events with
> over 1000 attributes per event registered in 2011.

Might want to rethink that consideration. :-)

~~~
StreamBright
I have systems with 100 trillion records and 100 billion events/day using a
technology that runs on CentOS, Ubuntu, FreeBSD, Windows Server, and Solaris.
What am I doing wrong? Why should I vet a technology that does not run on the
platform our company has chosen (one that is approved by security and
compliant with all the corporate requirements), when the only platform
ClickHouse supports is not?

~~~
flr_null
What open source columnar storage do you use for 100 trillion records? It's
very impressive.

~~~
StreamBright
Hive/Tez + ORC on HDFS and on S3. I do not think it is very impressive judging
by the numbers mentioned so far; the performance of data access, like how long
an average query runs, is much more interesting. You can easily store 100T
lines, but it is much harder to make them queryable. There is a lot of
interesting work being done in that area; many in-memory engines are out there
to provide access to a subset of the entire dataset at much faster speeds.

~~~
flr_null
Of course, I'm asking in the context of massively parallel real-time queries
from customers, as in Yandex Metrica or Google Analytics. And yes, 100T lines
with Hive+ORC is not an issue, but that's another league.

I assume that only Kudu is a competitor to ClickHouse right now. Or maybe
Greenplum.

Thank you for your answer.

------
tkyjonathan
_Mind Blown_ /puts hadoop down and reads this.

------
k__
When to use such a DB?

~~~
RyanHamilton
Imagine the layout on disk for both a row-based and a column-based
database... or see this link:

[http://www.timestored.com/time-series-data/what-is-a-column-oriented-database#why-columns](http://www.timestored.com/time-series-data/what-is-a-column-oriented-database#why-columns)

Now imagine which areas need to be read when you perform a query like "average
price" for all dates. In row-oriented databases we have to read over large
areas; in column-oriented databases the prices are stored as one sequential
region, and we can read just that region. Column-oriented databases are
therefore extremely quick at aggregate queries (sum, average, min, max, etc.).

Why are most databases row-oriented, I hear you ask. Imagine we want to add
one row somewhere in the middle of our data for 2011-02-26: in a row-oriented
database, no problem; in a column-oriented one, we would have to move almost
all the data! Luckily, since we mostly deal with time series, new data only
appends to the end of our table.
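A toy Python illustration of that access-pattern difference (a real engine reads disk pages, not Python lists, and the data here is made up):

```python
# Same three trades stored two ways.
rows = [                         # row-oriented: fields interleaved
    ("2011-02-24", "AAPL", 350.1),
    ("2011-02-25", "AAPL", 348.9),
    ("2011-02-26", "AAPL", 351.0),
]
columns = {                      # column-oriented: one region per field
    "date":  ["2011-02-24", "2011-02-25", "2011-02-26"],
    "sym":   ["AAPL", "AAPL", "AAPL"],
    "price": [350.1, 348.9, 351.0],
}

# Row store: every row (all of its fields) is touched to get the prices.
avg_row = sum(r[2] for r in rows) / len(rows)

# Column store: only the contiguous "price" region is read.
prices = columns["price"]
avg_col = sum(prices) / len(prices)

assert avg_row == avg_col
print(round(avg_col, 2))  # → 350.0
```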

~~~
k__
I see.

So if my data consists of time series, I should use column stores?

Is that the only use case? The page talks about games.

------
vegabook
So this is basically a Russian Cassandra? Written in C++, not Java. Enough for
me; I will test it.

~~~
flr_null
No, it's actually not.

Cassandra is not a columnar DB. That means if you need all the user_id values
from your table, Cassandra will scan all your data (e.g. 10PB) on disk,
whereas ClickHouse will only scan one column file (e.g. 10GB).

ClickHouse is a Russian Kudu.

