I wonder if ClickBench needs more data or something.
I just casually compared ClickHouse, Databend and Doris on a c6a.4xlarge machine. For some queries there are big differences, like ClickHouse being an order of magnitude faster than the others in Q28 and two orders of magnitude slower in Q29. That sounds useful, because it shows that some of the databases do something differently than the others and points at areas for improvement.
But for most of the queries, the comparison is like 0.02s vs 0.03s vs 0.03s. That doesn't sound very meaningful, and I also wonder how precise the measurement is: when we're looking at differences of milliseconds, it's much easier for randomness to sneak in than with measurements in minutes.
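To make the jitter point concrete, here's a minimal sketch (not ClickBench's actual methodology, just a stand-in workload I made up) that times the same millisecond-scale operation repeatedly and reports the spread:

```python
# Illustration of timing jitter: repeat the same cheap, query-like
# operation and look at the spread of the wall-clock measurements.
import statistics
import time

def timed(fn, runs=20):
    """Return a list of wall-clock durations (in seconds) for repeated runs of fn."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples

# A stand-in for a ~millisecond query: summing a range of integers.
work = lambda: sum(range(100_000))

samples = timed(work)
mean = statistics.mean(samples)
spread = statistics.stdev(samples)
print(f"mean = {mean * 1e3:.2f} ms, stdev = {spread * 1e3:.2f} ms")
```

On a busy machine the stdev can be a sizable fraction of the mean, which is exactly the problem when the published results differ by only 0.01s.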
While I can read the results as saying that modern columnar databases are superfast and awesome, I wonder what they would look like with one or two orders of magnitude more data.
The benchmark can be applied to a larger dataset, see https://github.com/ClickHouse/ClickBench/tree/main/clickhous... (100 billion records), but so far only ClickHouse has been tested at this volume - it takes too much time and money to load this data into every DBMS.
(I work at ClickHouse)