ClickHouse cost-efficiency in action: analyzing 500B rows on an Intel NUC (altinity.com)
216 points by hodgesrm on Jan 6, 2020 | 86 comments



Hey, Ofek from Datadog here!

I recently implemented our ClickHouse integration [1], so if any of you would like to try it out we would appreciate feedback. I really enjoyed learning about this database, and it has excellent docs :)

Oh fun fact, speaking of docs, this was the first integration of ours that we scrape docs for as part of the test suite. So when a new built-in metric is added it will fail our CI until we support it [2]. We just did this again for Apache Airflow [3].

[1]: https://github.com/DataDog/integrations-core/pull/4957

[2]: https://github.com/DataDog/integrations-core/pull/5233

[3]: https://github.com/DataDog/integrations-core/pull/5311


So at Datadog, after aggregation with Spark and storage into Parquet, what is used for serving queries over all the aggregated Datadog telemetry data (logs, APM, and infra telemetry) to consumers?

(interestingly, we have a nearly identical data ingestion/ETL stack running on spot instances and saving to parquet/s3)


Hi there! We actually talked quite a bit about that in a recent podcast https://www.dataengineeringpodcast.com/datadog-timeseries-da...


We use ClickHouse extensively and it's been great: https://blog.cloudflare.com/http-analytics-for-6m-requests-p...


You've lost a Russian IT meme in translation.

We use the verb "to brake" to say that something runs slowly. There was a long history of threads about Java performance, and the meme solidified after news from the 2005 DARPA Grand Challenge (a racing competition for autonomous, self-driving robotic cars): a car named Tommy by the Jefferson Team, which was running Java under Linux, didn't use its brakes before a turn and crashed into the wall at 70 mph. Hence "Java runs fast" and "Java doesn't use brakes" were both described perfectly by the same sentence.

Yandex used the meme to advertise Clickhouse for engineers.

edit: formatting

edit2: brakes instead of breaks, wtf


I think this is a poor translation. In Russian, "Тормозить" may mean "to use the brakes" when applied to a car, or just "to be slow" when applied to a program (or a person). "MySQL сегодня тормозит" would mean "MySQL is acting slow today", not that it is using brakes. So I think that meme is best translated as "ClickHouse is never slow" or "ClickHouse never acts slow".


I think you mean "brakes," as I thought you meant "breaks" like in an iterator. Not that familiar with Java though, just assuming based on the reference to a vehicle failing to brake before a turn.

I thought maybe it meant using break statements in your loops slows things down or something.


Yeah, my bad.


Not exactly a good comparison if you don't generate the data the same way for the test setup. Your generated data is more compressible by ClickHouse, which skews the comparison. It would have been better not to change the test data if you wanted to do a comparison.


I bet results would be roughly the same even for the exact same dataset - Scylla and other K/V data stores can't compete with columnar databases that are purpose-built for complex analytics queries. The orders-of-magnitude differences in query performance (not to mention storage and compute overhead) show it well enough.

It was kind of a crummy use case for Scylla anyway (it’s a transactional write store, not an analytics engine)


The important difference is that we used a more realistic temperature profile, which as you say does affect compression for that column. Schema design (including sort order, compression, and codecs) for the remaining columns is just good ClickHouse practice. Much of the storage and I/O savings is in the date, time, and sensor_id columns.

It's also useful to note that the materialized view results would be essentially the same no matter how you generate and store data because the materialized view down-samples temperature max/min to daily aggregates. The data are vastly smaller no matter how you generate them.
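
For readers who haven't seen the pattern, a down-sampling view of that shape looks roughly like this. This is a sketch with assumed table and column names, not the article's exact DDL:

  CREATE MATERIALIZED VIEW readings_daily
  ENGINE = AggregatingMergeTree()
  ORDER BY (sensor_id, date)
  AS SELECT
      sensor_id,
      toDate(time) AS date,
      minState(temperature) AS temp_min,
      maxState(temperature) AS temp_max
  FROM readings
  GROUP BY sensor_id, date;

Queries against the view then finish the aggregation with the matching -Merge combinators, e.g. SELECT sensor_id, date, minMerge(temp_min), maxMerge(temp_max) ... GROUP BY sensor_id, date.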

The article illustrates that if you really had such an IoT app and designed it properly you could run analytics with surprisingly few resources. I think that's a significant point.


That's what you wanted to show, but what you ended up showing is that if you have different data, then the query performance can be quite good.

I get the desire to critique the temperature profile, but completely changing it makes the comparison worthless. From a data perspective it's like saying "if all the sensors just report 1 for temperature every reading, computing the min, max, and average is super fast". No shit, that wasn't the task though.


But they didn't set the temperature reading to anything that would advantage their tests. Without access to the original data, they simply generated a dataset as close to the original dataset and volume as possible. The fact that they spent a few sentences talking about the temperature doesn't invalidate the test.

Looking at this your way - Scylla used an INT, Altinity used a Decimal type with specialized compression (T64). I can tell you that this would have hampered ClickHouse and advantaged Scylla. It's the opposite of what you're saying. They actually performed this benchmark with one arm tied behind their back.

It's a funny benchmark anyway because the two systems have very different use cases but it doesn't invalidate the result.


Then you should provide results for both test datasets to make the point about using a more realistic approach. Materialized views are not news, nor are properly designed analytics applications. For me, what matters is how ClickHouse is better and why.


A column store will be orders of magnitude faster at analytical queries than any row-store system. This is fundamental architecture, and the data used makes little to no difference. You could use the exact ScyllaDB dataset duplicated to trillions of rows and still arrive at the same relative performance figures.


It doesn't matter. ScyllaDB is a Cassandra clone, an advanced nested key/value database that stores data per-row and requires slow iteration to scan through an entire table.

Column-oriented databases will always be much faster at analytical queries because of the difference in physical layout and vectorized processing. Scylla has very impressive OLTP performance but really shouldn't be compared to OLAP databases at all. That original 1B rows/sec blog post by them is kind of a strange benchmark to begin with.


The problem is: what use cases are strictly OLTP? At this point, I'd consider Scylla/C* to be usable for a write-only workload with single-row lookups, or a single-column range lookup.

Same question has to be raised: do you have enough rows to justify a distributed Scylla/C* or could you have used MySQL or Postgres on a giant box?


Plenty of OLTP scenarios that need distributed scale and/or high availability of C* - we use it for user profiles/session storage, counters and some high-volume logging that needs access to individual events.


The compression is a property of the table and done on the fly, transparently to the user. If the difference was compressing/decompressing as part of the user task, I'd agree. But this is something that comes for free by a few extra characters in the schema.
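
For context, the "few extra characters" are per-column codec declarations in the CREATE TABLE statement. Based on what's described upthread (a Decimal temperature compressed with T64), the table DDL ends up looking roughly like this; a sketch with assumed names and assumed codec choices for the other columns:

  CREATE TABLE readings (
      sensor_id Int32 CODEC(DoubleDelta, LZ4),
      time DateTime CODEC(DoubleDelta, LZ4),
      temperature Decimal(5, 2) CODEC(T64, LZ4)
  ) ENGINE = MergeTree
  ORDER BY (sensor_id, time);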


At Sematext we replaced our HBase based metrics datastore with Clickhouse. We are happy with the performance gain and flexibility. We also added support for Clickhouse monitoring - https://sematext.com/blog/clickhouse-monitoring-sematext/


We use ClickHouse extensively at work, and boy is it better than anything I have used among column-oriented databases so far. The documentation is good, and the HTTP query interface and features such as built-in URL parsing are amazing. We also tested Druid and found ClickHouse to be better; it is easier to set up and maintain as well.


Curious about what Google does in this area?

https://blog.acolyer.org/2019/09/11/procella/


VictoriaMetrics core developer here.

The performance numbers from ClickHouse running on Intel NUC are impressive! We are going to publish VictoriaMetrics performance numbers for the original Billy benchmark from ScyllaDB [1] running on the same hardware from packet.com . Initial results are quite promising [2], [3].

[1] https://www.scylladb.com/2019/12/12/how-scylla-scaled-to-one...

[2] https://mobile.twitter.com/MetricsVictoria/status/1209116702...

[3] https://mobile.twitter.com/MetricsVictoria/status/1209186575...


Note: not your standard $250 NUC; it's a Canyon-something with much more oomph.


It does have more oomph than most NUCs, but it's not a Canyon-something NUC; as stated in the article, the CPU model is an Intel i7-6770HQ, which was released in Q1 2016 on Intel's 14nm node, so it's Skylake.

https://ark.intel.com/content/www/us/en/ark/products/93341/i...


No, it is a Skull Canyon NUC from 2016 (with an i7-6770HQ), see https://www.techradar.com/reviews/pc-mac/pc-mac-desktops/int... for a description (I think this https://www.intel.com/content/www/us/en/products/boards-kits... is the Intel product page).

Intel uses the "Canyon" name for its range of powerful NUCs (Skull Canyon, Hades Canyon, etc., and the upcoming Ghost Canyon).


One thing I haven't seen anyone note about ClickHouse, though, which would be really important to many for data durability, is that it does not use fsync anywhere at all.


I can't find anything about this in the docs except[1]. I also can't find any issues in their bug tracker related to clickhouse not using fsync[2].

I can however find code that actually calls fsync [3][4]. To be fair, I haven't read enough to determine how this does (or doesn't) affect durability. Nevertheless, I'm wondering: do you have a source for this claim?

  [1]: https://clickhouse.yandex/docs/en/operations/settings/settings/#fsync-metadata
  [2]: https://github.com/ClickHouse/ClickHouse/search?q=fsync&type=Issues
  [3]: https://github.com/ClickHouse/ClickHouse/blob/355b1e5594119e036a2d62988bfa42bc8b1a1687/dbms/src/IO/WriteBufferFromFileDescriptor.cpp#L113
  [4]: https://github.com/ClickHouse/ClickHouse/blob/e765733a26cfc4cecc13c981686560338256a6b1/dbms/src/IO/WriteBufferAIO.cpp#L98


As I mentioned, there's only one place where it says anything about fsync, and on that page, it says it is only for creating .sql files.

https://groups.google.com/d/msg/clickhouse/cjJ6v8uzu0Q/jGV59...

> The reason is because CH does not use fsync (for performance)

https://www.linkedin.com/in/dzhuravlev/


It's pretty clearly laid out in the docs. Hopefully anyone seriously considering using Clickhouse reads the docs thoroughly and understands what they're implementing.


What do you mean clearly laid out? This is the only mention of fsync I could find through google or their own search function.

https://clickhouse.yandex/docs/en/operations/settings/settin...


The title of the page might be a little snarky, but it's in the introduction that transactional queries are not supported:

https://clickhouse.yandex/docs/en/introduction/features_cons...

Sure it's not specifically about `fsync` but presumably this is what the consumer of the database actually wants to know.


Fsync is not synonymous with transactions. Not using fsync anywhere means there's a wide window, which can be over 10 minutes long, during which data can be lost if a server gets an unplanned shutdown.


Btw, it should be easy to add fsync to ClickHouse. For instance, VictoriaMetrics uses a similar file format to ClickHouse, and it issues proper fsyncs at least every second, so it may lose only the last second of data on an unclean shutdown such as an OOM, a hardware reset, or `kill -9` [1], [2].

[1] https://medium.com/@valyala/wal-usage-looks-broken-in-modern...

[2] https://medium.com/@valyala/how-victoriametrics-makes-instan...


I’m not understanding the implication of this, could you explain?


When you write to a file, you generally don't write to physical storage. Instead the writes get buffered in memory and written to physical storage in batches. This substantially improves performance but creates a risk: If there is some sort of outage before the data is flushed to disk, you might lose data.

In order to address that risk, you can explicitly force data to be written to disk by calling fsync. Databases generally do this to ensure durability and only signal success after fsync succeeded and the data is safely stored.

So ClickHouse not calling fsync implies that it might lose data in case of a power outage or a similar event.


Most ClickHouse installations run replication for availability and read scaling. If you do get corrupted data for some reason, you can read it back from another replica. That's much more efficient than trying to fsync transactions, especially on HDD. The performance penalty for fsyncs can be substantial and most users seem to be pleased with the trade-off to get more speed.

This would obviously be a poor trade-off for handling financial transactions or storing complex objects that depend on referential integrity to function correctly. But people don't use ClickHouse to solve those problems. It's mostly append-only datasets for analytic applications.


Is this really that important, though, since all servers are fed power from an uninterruptible power supply and most data centers have multiple power sources?


It’s a significant deviation from what I would expect from a disk oriented database. So I would definitely expect it to be well documented, along with the reason for it, why the developers believe it is a reasonable (or even safe) choice, what assumptions went into that (such as availability of uninterruptible power supply) etc.

Additionally keep in mind that with EBS most people probably use network attached storage and fsync involves the network. Outage doesn’t just mean power outage, it could also be a network issue.


The implication is that clickhouse can't easily support transactional queries. That's why it's an OLAP not OLTP database. (On-Line Analytics Processing vs On-Line Transaction Processing).


This is not the implication at all.

Clickhouse can easily add fsync, they just choose not to do it.

Mongodb also did not use fsync and was ridiculed for it, yet no one mentions this about clickhouse.


> Mongodb also did not use fsync and was ridiculed for it, yet no one mentions this about clickhouse.

MongoDB claimed to be a replacement for RDBMS-es (which includes OLTP). ClickHouse is explicit about being OLAP-only. MongoDB also hid the fact that they weren't doing fsync, especially when showing off "benchmarks" against OLTP RDBMS-es, while ClickHouse has not tried to show themselves as a replacement for OLTP RDBMS-es.

> Clickhouse can easily add fsync, they just choose not to do it.

For good reason. It's not a simple matter of choosing one of two options. The choice has consequences: performance.


I can't find any evidence showing that OLAP means it is okay to lose data from unexpected shutdowns. How can you have correct analytics without a complete set of data?

> For good reason. It's not a simple matter of choosing one of two options. The choice has consequences: performance.

It is a simple matter, though. They can choose to sacrifice performance for data durability, and I suspect performance would not be impacted very much since ClickHouse acts like an append log. It just seems that Yandex doesn't care much for durability since they are just using the database to store people's web traffic. They wouldn't care if some of that data is lost, so they don't use fsync.


> I can't find any evidence showing that OLAP means it is okay to ...

OLAP also doesn't mean "be the source of truth of the data". You can have a separate source of truth of the "complete set of data" outside of your OLAP engine and load (and reload) data into your OLAP engine any time you're not sure if you have the "complete set of data" in it.

The important difference lies in how often one finds themselves in that situation. In OLAP, the vast majority of the time is spent querying (i.e., reading) data rather than loading (i.e., writing) data and waiting for it to be durably saved (i.e., fsync-ed). Because of this imbalance, it makes sense to prioritise one scenario and handle the other sub-optimally.

> They wouldn't care if some of that data is lost so they don't use fsync.

Or, they can still care about data correctness and simply re-load data they suspect may not be consistent in the rare case of an improper shutdown. It's not like they use ClickHouse as their primary data store.


To add to pritambaral's comments:

The top commercial high-performance timeseries databases (which ClickHouse can usually best), used by banks to make decisions about your money, also don't use fsync. You can literally quit the software and watch your transaction data be written out 5 seconds later.

Edit: a word


Oh that’s not too bad, they’re very explicit about not having transaction support, thanks for explaining.


Yes, I have the same model. I think I bought it for between $650 and $750.


We have been using ClickHouse in production for some months already and we find it a real game changer. We're running queries over 3B rows for business intelligence purposes.

As a side project, a group of friends and I are working on a simple web analytics project powered by ClickHouse (which is what it was originally built for). If anyone wants to contribute, just let me know.

Cheers


Is ClickHouse good for event data when you want to do rollups? For example, say all my events are of the form:

{event: "viewedArticle", article_id: 63534, user_id: 42, topic: "news", time: "2020-01-06"}

I want to be able to build aggregations that show the number of "viewedArticle" events grouped by hour and by topic, counting unique user_ids within each bucket.

Or let's say I want the top K articles viewed each day, filtered by a topic.

That's something that's trivial with Elasticsearch, which has a hierarchical aggregation DSL. Is ClickHouse good at this?

Whenever I see time-series databases such as InfluxDB mentioned, they look like they're focused on measurements, not discrete rows. You can attach the event data as "labels", but this isn't efficient when the cardinality of each column is very high (e.g. article IDs or user IDs in the above example).


Yes. Clickhouse is a column-oriented relational database among many others like MemSQL, Vertica, Redshift, BigQuery, Snowflake, Greenplum, etc. They're all focused on analytical queries over very large datasets using SQL.

An aggregation with several `group by` statements is no challenge and all of these databases also support approximate counting via HyperLogLog for faster results.

Clickhouse has some unique features where each table can have a separate 'engine' including some that automatically apply aggregations. Start with a normal table though since it'll be plenty fast enough for most use cases.
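
For instance, the hourly rollup described in the question above is an ordinary GROUP BY. A sketch, assuming a table named events with those fields and time stored as a DateTime (uniq() is ClickHouse's approximate distinct count; uniqExact() is the exact version):

  SELECT
      topic,
      toStartOfHour(time) AS hour,
      count() AS views,
      uniq(user_id) AS unique_users
  FROM events
  WHERE event = 'viewedArticle'
  GROUP BY topic, hour
  ORDER BY hour, views DESC

The "top K articles per day, filtered by topic" query maps naturally onto ClickHouse's LIMIT n BY expr clause.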


Thanks! Looks like the only downside is that, as it returns rows as results, you end up getting a lot of duplicate column data back and need to build the nested buckets yourself.

For example, a result like:

  topic;time;count
  news;2020-01-01;44
  news;2020-01-02;31

Now you have "news" repeated, and to group this into buckets for rendering summary tables and such (with subtotals at each level), you need to iterate through the flattened results and generate nested structures. This is something Elasticsearch gives you out of the box.

Last I looked at Clickhouse, it had master/slave replication only, and if you want shards of data distributed across a cluster it's something you need to manually manage?


> Now you have "news" repeated, and to group this into buckets for rendering summary tables and such (with sub totals at each level), you need to iterate through the flattened results and generate nested structures. This is something Elasticsearch gives you out of the box.

ClickHouse has a number of optimizations for solving the 'visitor' problems you describe. Assuming you just want to group in different ways, an idiomatic ClickHouse solution is to construct a materialized view that aggregates counts (e.g., of unique users via uniq(user)), roughly as in the sketch below. You can then select from the materialized view and further aggregate into larger buckets. ClickHouse can also compute single-level totals using the WITH TOTALS modifier.
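
A minimal sketch of that pattern, with assumed table and column names (uniqState stores the partial aggregation state for the approximate distinct count on insert; uniqMerge finalizes it at query time):

  CREATE MATERIALIZED VIEW article_views_hourly
  ENGINE = AggregatingMergeTree()
  ORDER BY (topic, hour)
  AS SELECT
      topic,
      toStartOfHour(time) AS hour,
      uniqState(user_id) AS users
  FROM events
  GROUP BY topic, hour;

  -- re-aggregate the hourly state into daily buckets at query time
  SELECT topic, toDate(hour) AS day, uniqMerge(users) AS unique_users
  FROM article_views_hourly
  GROUP BY topic, day WITH TOTALS;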

If you need cascading sub-totals within the same listing, then as far as I know you'll have to compute the totals yourself. (That feature actually might be an interesting pull request, since ClickHouse generates JSON output.)

> Last I looked at Clickhouse, it had master/slave replication only, and if you want shards of data distributed across a cluster it's something you need to manually manage?

ClickHouse replication is multi-master. The model is eventually consistent. Also, ClickHouse can automatically shard INSERTs across a cluster using distributed tables. That said, many users insert directly to local nodes because it's faster and uses fewer resources.


Sounds great, thank you!


Right, relational databases only return flat tabular results, but that seems minor compared to the performance increase you gain.

Clickhouse is fast but not as operationally friendly as the others. It's much more work once you go beyond a single node, so I'd suggest looking at those other options if you want something easier to operate, or use one of the cloud data warehouses like BigQuery or Snowflake to eliminate ops entirely.


ClickHouse achieved good (great) results. However, it's a bad comparison. Clickhouse is an analytics DB while Scylla is a realtime, random access one. /me am Scylla co-founder.

We could stack 100k rows in a single partition and be 1000x faster in this use case than the performance we demonstrate but we wanted to keep it real. Actually the use case we wanted to show is a single row per partition which would require more machines but surprisingly we couldn't provision that many on AWS.

The use case presented by ClickHouse is 100x slower on writes (8M rows/s), as they report. It doesn't matter, since it's just a completely different use case. Use ClickHouse for analytics (I wonder why they stop at SSE and don't go all the way to the GPU like SqreamDB) and use Scylla for OLTP.


The ScyllaDB one is a bit funny anyway, as it doesn't really target analytical workloads. On SSE/GPUs: the ClickHouse guys don't use GPUs today (GPUs are on the roadmap for next year) as their workloads target volumes greater than GPU memory. If your hot dataset sits totally in GPU memory then it makes sense for some things; otherwise, they found the cost/performance ratio doesn't add up once you have to paginate data in and out. I don't doubt GPU-based DB perf numbers, but cost is the main factor.

Now just to clarify - you're saying Scylla writes are 100x faster on the same hardware as ClickHouse (so 800M row/s on a NUC). Using the same code that Altinity used I manage around 25M rows/s on my home PC (8 cores/16HT) and elsewhere in this thread the guys from VictoriaMetrics pulled in 53M rows/s on a single node with 28 cores/56 threads (probably doable with ClickHouse on similar hardware I'd suspect).

I'm going to test this with Scylla on my home PC to validate your 800M row/s claim and I'll post about it - I should be able to hit around 2.5 billion rows/s with Scylla if what you've said is true. I've had CH write 300M row/s on my 8 core box using memory buffered tables but that was only at burst.


I love stuff like this, but I am massively put off by any database system that I need to "load" into. There are so many amazing file formats now that can be queried directly as files - a 17h load time simply isn't feasible for much of this work.

Does anyone know what format it stores data in internally? Is there a way to simply have it query in place?


I've worked with MPP DBs, Hadoop, Spark, ElasticSearch, Druid, kdb, DolphinDB and now ClickHouse and performance wise it's all true - in our case ClickHouse was 10-20x faster than Spark and used 4x less memory. I've seen it outperform the fastest commercial timeseries stores by 2x.

This will make me unpopular, but my conclusion is that the file-based data lake, splitting data from compute, is not the right approach in many (not all) cases, and that Spark was not really that revolutionary. I would go as far as to say that the direction data has taken has been a failure, and that ClickHouse and such come closer to solving the real problem of 'BigData'.

So two things here about 'loading'...

1) ClickHouse table/data files are completely portable (like Parquet) and can be moved from one server to another, copied, cloned, etc. There is even a mechanism to allow remote execution, or to pull just the files from a remote server or an S3 store. Just because the CH native file format isn't spoken about in the same circles as Parquet and ORC doesn't mean it can't be treated the same way, if that's your thing. The CH native format is far more performant/compressible than Parquet or ORC, and the specification is open source. Someone could implement a CH native file format SerDe for Hive, for example.

2) In this instance they were generating the data, so it's no different from running Spark, writing to a Parquet file, and running analytics on it later. Spark can't generate this amount of data in this amount of time on these resources and write out / compress it to Parquet or whatever other preferred format. I've tried.

ClickHouse isn't perfect and I'm not affiliated with the Altinity guys but I can tell you this is the real deal.


I would like to see comparisons between CH files and other formats; I would specifically challenge their compressibility vs. ORC, which pretty much maxes out current compression techniques.

As soon as I see the CH format become widespread enough to interact with the multitude of other tools that are available, I would consider getting on board. For now, a "loadable" data warehouse does little for the kind of workflows we deal with, as the loading would take longer than the processing.

With regards to item two: we use a standard consumer GPU (GTX 1060) to handle the conversion from CSV to ORC/Parquet, and it is much, much faster and cheaper than a 20+ node Spark cluster - hence the preference to work on files.

As everything else runs off these files it is kind of integral to our workload


You can also use ClickHouse to query files directly - with the clickhouse-local tool.

Example: https://www.altinity.com/blog/2019/6/11/clickhouse-local-the...
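
A quick sketch of what that looks like, with an assumed file name and column layout (clickhouse-local runs the full query engine over local data piped in from stdin):

  cat readings.csv | clickhouse-local \
      --structure "sensor_id UInt32, time DateTime, temperature Decimal(5,2)" \
      --input-format CSV \
      --query "SELECT sensor_id, min(temperature), max(temperature) FROM table GROUP BY sensor_id"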


https://github.com/ClickHouse/ClickHouse/pull/8430

Just to answer my own question - this looks good - I might have to try it out!


This is going to be your next cloud agnostic data warehouse.


For what it’s worth, I’ve used Clickhouse and Snowflake and I strongly prefer Clickhouse.

Performance was superior, the client libraries and built-in HTTP interface were a godsend, and it supported geospatial queries. I had perpetual issues getting Snowflake to properly escape strings in CSV and handle JSON in anything approaching a sensible way; there are claims that it integrates properly with Kafka as a consumer, but it most certainly does not. The UX is horrible to boot.


What were the issues with JSON? Snowflake is the best cloud data warehouse when it comes to support for unstructured data and far better than Redshift, Bigquery or the others. Snowflake also has geo support.

It doesn't support streaming data though so things like Kafka aren't a good fit yet. They have a connector but it's basically a little app that automates consuming from Kafka, writing files to S3, then loading them into your tables.


Getting it to export JSON was needlessly complicated, those Variant type columns are the worst.

Does it have geospatial support now? Because they definitely didn’t have it when I used it.

Ultimately, though, I don't see enough benefits of SF over CH: the feature sets overlap but with no real winning features, observed performance was worse, the client libraries are few and far between and not great, there's no HTTP interface, and no real streaming support. On top of being hideously expensive and closed source.


Yes, it has geospatial support. Variant columns are better than what every other database offers so far. Redshift and BigQuery just have a text field and require far more verbose commands to operate and cast. It sounds like most of your issues are with importing and exporting data rather than querying it?

Snowflake is basically EC2 servers reading files from S3, so you get more bandwidth with a larger warehouse size, but that's a fundamental limit and it will have much higher latency compared to running on a local SSD with ClickHouse. Lack of streaming is a known problem. They actually do have an HTTP interface; you just don't see it, but that's how all the ODBC drivers are implemented (with HTTP calls and JSON data in the background).

If your data fits or you don't mind the operational overhead of running your own data warehouse then it's almost always a cheaper and faster option.


Push-down operator algebra, taken to an extreme...

https://www.yellowbrick.com/


What’s the tldr on why it is fast?


Heavily optimized column store, including use of SSE instructions, etc. Moreover, some architectural tradeoffs are mentioned at https://clickhouse.yandex/docs/en/introduction/features_cons...


I'd say the materialized view is the main thing:

> Thus, we add the following materialized view ... At the end we should have 1440 times fewer rows in the aggregate than the source table.

The cost of populating that view is amortized over the 17.5 hours it took to load the data.


Mat views are great as the article showed. I use them to get query response down to milliseconds, as they vastly reduce the amount of data ClickHouse must scan.

That said, there are a lot of other tools: column storage, vectorwise query, efficient compression including column codecs, and skip indexes to name a few. If you only have a few billion rows it's still possible to get sub-second query results using brute force scans.

Disclaimer: I work for Altinity, who wrote this article.


p.s. Loading the view is low-cost compared to loading the source data. On the NUC it's 40-60 minutes, so worst case it's something like 1h / 17.5h = 5.71%. Also, you can still query the source data. That is fast for individual sensors as the examples showed.


Yeah, I was confused: I couldn't tell what was precomputed stats (col min/max/count), what was view calcs, and what's actual perf -- even legacy SQL vendors do all those. That's apples/oranges, more of a statement against the other db than for ClickHouse. Likewise, the db comparison I'd like to see is against _other_columnar_stores_.

I know some folks running one of the larger clickhouse instances out there... but this article made me trust the community less, not more.


Because it front-loaded all operations so that they happen outside of the benchmark. Depending on what you want to do that makes sense, but the original intention of the benchmark was a brute-force query benchmark.

Have you seen the sticker on the NUC? 116 billion rows per second at 233.61GB/s. If you spend even a single second thinking about how absurd that number is, you start to see that the two benchmarks measure completely different things. Even with a quad-channel Xeon CPU you won't see significantly more than 100GB/s of memory bandwidth. Those 116 billion queries didn't actually happen; it's just a synthetic number. The result of the query was calculated during insertion of the temperature records, before the benchmark had even started, and then they just calculated the theoretical number of queries you would have to do for an equivalent result and slapped that fictional number on their NUC.


That's a sticker from a ClickHouse community event, not related to the benchmark. We tend to stick them on anything flat. My ancient Dell XPS-13 has one. It's definitely not that fast.

That said, the sticker is from a real performance test. I assume it was a cluster but don't have details. ClickHouse query performance is outstanding--it's not hard to scan billions of rows per second on relatively modest hosts. These are brute force queries on source data, no optimization using materialized views or indexes.

For instance, I have an Amazon md5.2xlarge with 8 vcpus, 32 GB of RAM, and EBS GP2 storage rated at 100 iops. I can compute average passengers on the benchmark NYC taxi cab dataset [1] in .551 seconds using direct I/O. The throughput is 2.37B rows/sec.

ClickHouse is so fast on raw scans that many production users don't even use materialized views. I mostly use them to get responses down to small numbers of milliseconds for demos.

[1] https://tech.marksblogg.com/benchmarks.html


> 116 billion rows per second at 233.61GB/s

The numbers on the sticker are from a cluster of 400 servers, and 200 of those servers participated in query execution.


I'd argue not that ClickHouse is even that fast (compared to comparable technology like Snowflake, Redshift, or BigQuery), but rather that the ScyllaDB example is completely misleading. Scylla is probably one of the fastest OLTP datastores, yet they're benchmarking an analytics query, which is pretty easy to crack for any columnar datastore.

The actual point here is that you can execute millions of (different!) individual queries per second on ScyllaDB, which beats any columnar datastore hands down. ClickHouse "cheated" here by translating the (unfortunate) benchmark setup into a single query that's extremely heavily optimized under the hood.


Actually, while ClickHouse does not have all the features of Redshift, BigQuery, etc., it is usually much faster than them. It can be slower than GPU-powered systems on some workloads, when all data fits in GPU memory, but that is not the use case it targets.

ScyllaDB is amazing when it comes to OLTP performance, but not for analytical workloads.

I think they took pretty mediocre analytical workload results and shared them as something outstanding.


The restriction to a tiny GPU workload is increasingly wrong for assessments.

GPU compute stacks are increasingly geared towards multi-gpu/multi-node & streaming, esp. given the crazy bandwidth they're now built for (2TB/s for a dgx2 node?). Likewise, per-GPU memory and per-GPU-node memory is going up nicely each year (16-24GB/GPU, and 100GB-512GB/node with TBs connected same-node). Network is more likely to become the bottleneck if you saturate that, not your DB :)

Though I like to do mostly single gpu streaming in practice b/c I like not having to think about multinode and they're pretty cheap now :)


Why do you say CH isn’t as fast as Snowflake? Because my experience is the opposite.

Also, Snowflake is so painful to use and has a bunch of weird edge cases. It’s also so expensive.


There is a video named "The Secrets of ClickHouse Performance Optimizations":

https://youtu.be/ZOZQCQEtrz8


Precomputed materialized views


Yeah, this post could have been titled "Why Materialized Views are Awesome"


Clickhouse is also crazy fast without materialized views - I've only done some PoC's against it, but in loading a largish data set of raw invoice CSVs, I was very impressed with the performance compared to our standard RDBMS.


That sounds like a non-canonical use of ClickHouse. Wouldn't a good RDBMS be a better fit for invoice data? That's just on the surface, of course; I'm really interested in what this invoice data is like and what queries you are trying to run on it.


Tightly coupled storage and execution engines, along with an opinionated view on the actual storage format.



