ClickHouse, Inc. (github.com/clickhouse)
519 points by zX41ZdbW on Sept 20, 2021 | 159 comments



I'd like to thank the creators of ClickHouse, as I hope they are reading here. We've been using it since 2019 in a single-server setup with billions of rows. No problems at all. And query speeds that seem unreal compared to MySQL and pg.

As we did not want to go into the HA/backup/restore details at that time we created a solution that can be quickly recreated from data in other databases.

Interesting presentation from Alexey about Features and Roadmap from May 2021:

https://www.youtube.com/watch?v=t7mA1aOx3tM


I have similar first-hand experience with ClickHouse. In the past I moved a custom analytics solution I had built on HBase to a solution running on a single-node ClickHouse and had no issues whatsoever. In my current startup I am again using ClickHouse, with great success. It's mind-bogglingly fast. Thanks, ClickHouse team, for building such an amazing system and for making it open source.


That's exactly the use case I meant below. Do you use any BI tool to visualize CH queries?


I use Grafana for that. At this point, we have developed entire internal products based on ClickHouse + Grafana.


There’s a community connector for metabase https://github.com/enqueue/metabase-clickhouse-driver


No, the results are embedded in a web app.


Haven't used any of these yet, but how does ClickHouse compare to Postgres extensions like TimescaleDB and Citus (which recently launched a columnar feature)? I remember reading in the ClickHouse docs some time ago that it does not have DELETE functionality. Does this pose any problems with GDPR and data deletion requests?


I benchmarked ClickHouse vs. Timescale, Citus, Greenplum and Elasticsearch for a real-time analytics application. With a couple hours of learning for each (although I've used Postgres extensively, so the Postgres-backed databases had a bit of an advantage), ClickHouse's performance was easily an order of magnitude or two better than anything except ES. ES had its own downsides with respect to the queries we could run (which is why we were leaving ES in the first place).


Cloudflare’s analytics have been powered by Clickhouse for a long time. And I was an early investor in Timescale. They’re both excellent products.


Yeah, I was a little disappointed because I was rooting for Timescale myself. And, maybe if I had spent more time optimizing, I could have made Timescale work but, between our experiments and Cloudflare’s blog post about migrating from Citus to ClickHouse, ClickHouse seemed like the one most likely to hold up for our production workloads.


What did you end up going with?


ClickHouse deployed to EKS with the ClickHouse operator.


I know of a database that claims to perform even faster than that one. It is commercial, though. It's basically for very massive data in a time-series setup.

https://www.hydrolix.io/


From the docs it seems they use forked ClickHouse code.


As far as I understand, they use part of it as a frontend, not as the full engine for everything.


In a nutshell, my extremely subjective and biased take on it:

* Citus has a great clustering story, and a small data warehousing story, afaik no timeseries story;

* TimescaleDB has a great timeseries story, and an average data warehousing story;

* Clickhouse has a great data warehousing story, an average timeseries story, and a bit meh clustering story (YMMV).

(Disclaimer: I work for a competitor)


[Timescale co-founder]

This is a really great comparison. I might borrow it in the future :-)

But yes, if you have classic OLAP-style queries (e.g., queries that need to touch every database row), Clickhouse is likely the better option.

For anything time-series related, and/or if you like/love Postgres, that is where TimescaleDB shines. (But please make sure you turn on compression!)
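Roughly, enabling compression looks like this (hypertable and column names here are illustrative; check the compression docs for your version):

    -- mark the hypertable as compressible, segmenting by a high-cardinality key
    ALTER TABLE metrics SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id'
    );

    -- automatically compress chunks once they are older than 7 days
    SELECT add_compression_policy('metrics', INTERVAL '7 days');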

TimescaleDB also has a good clustering story, which is also improving over time. [0][1]

[0] https://news.ycombinator.com/item?id=23272992

[1] https://news.ycombinator.com/item?id=24931994


I like how you always chime in with time series stuff. Just watching you interact here boosts my confidence in your product.


Thanks for the kind words. Our first company value is, "Help first." :-)


I can vouch for this. Ajay replies to my emails on a weekend. I'm nobody, not even a customer. He doesn't need to do that at all. I imagine he does it because he's a genuinely good and helpful type of person.


Thanks! It's not just me, it's what we all try to do at Timescale :-)


I thought Pinot and Druid were the direct competitors.


> Disclaimer: I work for a competitor

What competitor btw? I tried to open a link from your profile but it does not work.


It appears the competitor is QuasarDB


Excellent comment archeology, this is correct. :)


actually grabbed it from Keybase -> Googled name -> LinkedIn :)


ClickHouse wins on licensing--Apache.

The TimeScale licensing approach, the way it is written, perhaps accidentally, has lots of hidden landmines. The TimeScale license slants toward cloud giant defense to the extent that normal use is perilous.

For example, Timescale can be used for normal data (plain Postgres tables) as well, so any rules seem to apply to all your data in the database. The free license is only available if:

the customer is prohibited, either contractually or technically, from defining, redefining, or modifying the database schema or other structural aspects of database objects, such as through use of the Timescale Data Definition Interfaces, in a Timescale Database utilized by such Value Added Products or Services.

My read is that if you let a customer do anything that adds a custom field, or table, or database, or trigger, or anything that is "structural" (even in the regular relational stuff) anywhere in your database (metrics or not), you are in violation. There doesn't seem to be a distinction about whether this is "direct" control or not, or whether a setting indirectly adds a trigger. I don't want to be in a courtroom debating whether a new metric is a "structural change!"

Now, none of that might be the intent of the license, but you have to go by what it says, not intentions.

The sad part of that is, I, and I'm sure many folks, have no interest in starting a database company, but we can't rally around Timescale because of legal risk. Looks awesome otherwise, though.


[Timescale co-founder here]

Hi Eric, thanks for taking a close look at our license.

I'd like to dispel some misconceptions:

The core of TimescaleDB is Apache2. Advanced features are under the Timescale License.

Regarding this:

  the customer is prohibited, either contractually or technically, from defining, redefining, or modifying the database schema or other structural aspects of database objects, such as through use of the Timescale Data Definition Interfaces, in a Timescale Database utilized by such Value Added Products or Services.

  My read is that if you let a customer do anything that adds a custom field, or table, or database, or trigger, or anything that is "structural" (even in the regular relational stuff) anywhere in your database (metrics or not), you are in violation. There doesn't seem to be a distinction about whether this is "direct" control or not, or whether a setting indirectly adds a trigger. I don't want to be in a courtroom debating whether a new metric is a "structural change!"
That's not correct, and we took pains to clarify that in the license:

  3.5 "Timescale Data Definition Interfaces" means SQL commands and other interfaces of the Timescale Software that can be used to define or modify the database schema and other structural aspects of database objects in a Timescale Database, including Data Definition Language (DDL) commands such as CREATE, DROP, ALTER, TRUNCATE, COMMENT, and RENAME. [0]

Strictly speaking, if you provide Data Definition Interfaces (DDL) to customers via a SaaS service (i.e. you are running a TimescaleDBaaS - which applies to < 0.000001% of all possible users), you are in violation of the license. But otherwise you are fine.

If you are looking for more votes of confidence: today there are literally millions of active TimescaleDB instances, including at large companies like Walmart, Comcast, IBM, Cisco, Electronic Arts, Bosch, Samsung, and many, many smaller ones. [2]

If you have any other questions, I'm happy to answer them here, or offline (ajay at timescale dot com).

[0] https://www.timescale.com/legal/licenses#section-3-5-timesca...

[2] https://www.timescale.com/


Are you sure?

The phrasing "such as" in "such as through use of the Timescale Data Definition Interfaces" looks to me like it can be interpreted as saying "including but not limited to".


Yes :-)

It was a little cumbersome to list every DDL SQL command, which is why it uses that language. But that is the intent.

If you have a specific question, happy to answer it here (or offline)

Also, we used this language deliberately to provide more clarity. DDL vs DML is a pretty clear line to most developers who use TimescaleDB (vs some other companies who use language like, "you can't compete with us" etc).


The issue isn't the DDL vs. DML distinction.

The issue is the 'such as', which reads to me as indicating that providing an API endpoint that can add fields would also be a way of letting the customer modify the database schema and therefore covered.

Which means that building a SaaS app backed by timescale that has any level of customisation exposed to the user appears to be prohibited.

This seems a rather stronger level of prohibition than stopping people directly competing with you, and would suggest that if it's intended it would help to make it more explicit, and if it isn't then an explicit statement of that would be worth adding.


Practically speaking, there are already 1000s of SaaS apps built on TimescaleDB (and using the Timescale License features).

But I appreciate the feedback on how we could make our language clearer. Will share with the team!


The issue is likely something like this:

a) Oracle buys TimescaleDB

b) Oracle sues any of the 1000+ SaaS apps that have decent revenue and that they can identify

c) The other 900 suddenly have massive due diligence issues even if not sued


I mean, that’s cool, but maybe a bit problematic if the language of the license means they’re all in violation of it.

I understand from your perspective this whole discussion probably looks silly, because you know what’s in your mind.

There are just too many stories of this blowing up in someone’s face (mostly due to legal, not due to actual enforcement) :/


Right, and we're subject to what's in his mind, which we can't know, and what's in his acquirer's mind, which even he can't know!


Right, I'm on board with your stated intent, and the lovely database, and even sympathetic on the cloud provider threat.

My interpretation doesn't hinge on the DDL/DML question. If what you said is your intent, the legal language used is wrong and you should fix it. Consider this a bug report. Here's a source!

https://www.law.cornell.edu/definitions/uscode.php?width=840...

In order for the license to be usable, you need to be limitative here.


Thanks for the feedback, will pass along :-)


So this is a very relevant question, and coincidentally I have been trying to figure out over the last 2 weeks if I need to migrate off TimescaleDB asap. (We're pre-production anyway, so it's the time to do so!) Doing so has been super low priority, but since you're here .... :)

If I have a table that records timeseries data and then another table that has a customer-provided extensible set of metadata where a customer can define columns and other related tabular data, would that violate the license? The customer doesn't have a direct, like, psql level of access but the API intentionally provides a very similar level of interaction.

Does this qualify as providing Data Definition Interfaces? If none of those additional columns and such appear on tables set up as timescale tables does that make any difference?


One clarification to the above discussion -- about whether these restrictions also get applied to "even the regular relational stuff" -- which I think is relevant to you (and parent):

The Timescale License only covers the TimescaleDB code. Postgres code continues to be covered by the OSS PostgreSQL License [0].

So putting aside the question about API vs. psql level (again, the Timescale License was drafted to enable this for "Value Added Services", e.g., where "such value-added products or services are not primarily database storage or operations products or services"), this license wouldn't apply to non-Timescale code.

[Timescale co-founder here]

[0] https://github.com/timescale/timescaledb/blob/master/NOTICE


Ok thank you!

I appreciate that from your POV this probably looks silly, but for us it's very helpful to have explicit clarity, since investors ask questions and demand certainty. I once had to rewrite HTTPS handshakes because a lawyer thought export compliance laws would be violated, so hopefully you understand my trepidation :-)


Just keep these comments handy in case you ever need them in court :P


Not the parent, but great to hear that clarification! By the way, I doubt I would complain if Prometheus didn't look impressive. :-)

One other bug report on the license language front, this language could be construed to prohibit uses like Prometheus--unless there's a definition of operations products I missed (possible).

> are not primarily database storage or operations products

Suggested edit:

> are not primarily database storage or *database operations products*


Oh, English parsing =)

Yes, database is a modifier to both storage and operations: database (storage or operations) products.

It's a good edit; will keep in mind.

(For good reason, we generally just don't like to "update" the text of the license too much, even for minor nits.)

Thanks!


Oh sorry, said Prometheus above, meant TimeScale is impressive!

Understood. If you see the other thread regarding the "such as" language, there is a serious edit there that you can batch up with this one to make the text reflect your intentions.

The "such as" language retains your right to sue anyone for license violations if their API allows any customer action that causes structural changes indirectly, via the DDL, even under the hood (materialized views, too, presumably). That's way, way more use cases than just repackaging TSDB as a service. That's a landmine which, when people compare and choose databases, they'd just as soon avoid, even if they're otherwise comfortable with a cloud-protective license. Making this clearer and less onerous will probably pay for itself with a wider top-of-funnel for the product, with more people more confident in the license.

The "We Clarified It In a Thread on Hacker News Public License" is probably not as ideal as updating the places that need clarification. :-P


> ClickHouse wins on licensing--Apache

How so? An end user should prefer a database under a license that protects the developer and users from cloudification/proprietization/SaaS.


On top of that, Yandex requires a very aggressive CLA to be signed by contributors: https://yandex.ru/legal/cla/?lang=en

This worries me and makes me wonder if they are going for the open-source-only-by-name model.

[Please reply instead of giving silent downvotes.]


We don't require the Yandex CLA:

> As an alternative, you can provide DCO instead of CLA. You can find the text of DCO here: https://developercertificate.org/ It is enough to read and copy it verbatim to your pull request.

> If you don't agree with the CLA and don't want to provide DCO, you still can open a pull request to provide your contributions.

https://github.com/ClickHouse/ClickHouse/blob/master/CONTRIB...

Anyway, the Yandex CLA will be removed in the upcoming days (it should already be removed).


ClickHouse competes with OLAP stores like Druid or Pinot.

I don't know about ClickHouse, but the other two use bitmap indexes to make storing petabytes of data affordable.

Row oriented databases would struggle to compete against ClickHouse. They are easily an order of magnitude slower.


ClickHouse uses skip indexes. They basically answer the question "is the value I'm seeking not in this block?"

For example, there are a couple varieties of Bloom filters, which allow you to test for presence of string sequences in blocks. This allows ClickHouse to skip reading and uncompressing blocks (actually called granules) unnecessarily.
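For example, a token-based Bloom filter index is declared in the table DDL. A minimal sketch (the schema is illustrative):

    CREATE TABLE logs
    (
        ts DateTime,
        message String,
        -- tokenbf_v1(filter_size_in_bytes, number_of_hash_functions, seed)
        INDEX message_tokens message TYPE tokenbf_v1(8192, 3, 0) GRANULARITY 4
    )
    ENGINE = MergeTree
    ORDER BY ts;

    -- a query like this can skip whole granules whose filter rules out the token
    SELECT count() FROM logs WHERE hasToken(message, 'timeout');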


Sentry.io settled on Clickhouse for error and transaction data after reviewing several options including Citus and Elastic. We've been happy with both the performance and how well it scales from Open Source installs to our SaaS clusters.


There are many independent comparisons of ClickHouse vs TimescaleDB:

By Splitbee: https://github.com/ClickHouse/ClickHouse/issues/22398#issuec...

By GitLab: https://github.com/ClickHouse/ClickHouse/issues/22398#issuec...

And others:

https://github.com/ClickHouse/ClickHouse/issues/22398#issuec...

https://github.com/ClickHouse/ClickHouse/issues/22398#issuec...

If you find more, please post them there.

TimescaleDB can work pretty well in time series scenarios but does not shine on analytical queries. For most time series queries it is below ClickHouse in terms of performance, but for small (point) queries it can be better.

The main advantage of TimescaleDB is that it better integrates with Postgres (for obvious reasons).

There are also many comparisons of ClickHouse vs Citus. The most notable is here: https://blog.cloudflare.com/http-analytics-for-6m-requests-p...

ClickHouse can do batch DELETE operations for data cleanup. https://clickhouse.com/docs/en/sql-reference/statements/alte... It is not for frequent single-record deletions, but it can fulfill the needs of data cleanup, retention, and GDPR requirements.

Also you can tune TTL rules in ClickHouse, per table or per column (say, replace all IP addresses with zeros after three months).
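Concretely, those operations look something like this (table and column names are illustrative):

    -- batch deletion: an asynchronous "mutation", not a transactional DELETE
    ALTER TABLE events DELETE WHERE user_id = 42;

    -- table-level TTL: drop rows entirely after 12 months
    ALTER TABLE events MODIFY TTL ts + INTERVAL 12 MONTH;

    -- column-level TTL: reset the column to its type default (zeros for IPv4) after 3 months
    ALTER TABLE events MODIFY COLUMN ip_address IPv4 TTL ts + INTERVAL 3 MONTH;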


[Timescale DevRel here]

@zX41ZdbW - Thanks for pointing out the various benchmarks that have been run by other companies between Clickhouse and TimescaleDB using TSBS [1]. As we mentioned, we'll dig deeper into a similar benchmark with much more detail than any of those examples in an upcoming blog post.

One notable omission in all of the benchmarks that we've seen is that none of them enable TimescaleDB compression (which also transforms row-oriented data into a columnar-type format). In our detailed benchmarking, compressed columnar data in Timescale outperformed Clickhouse on most queries, particularly as cardinality increases, often by 5x or more. And with compression of 90% or more, storage is often comparable. (Again, blog post coming soon - we are just making sure our results are accurate before rushing to publish.)

The beauty of TimescaleDB's columnar compression model is that it allows the user to decide when their workload can benefit from deep/narrow queries of data that doesn't change often (although it can still be modified just like regular row data), versus shallow/wide queries for things like inserting data and near-time queries.

It's a hybrid model that provides a lot of flexibility for users AND significantly improves the performance of historical queries. So yes, we do agree that columnar storage is a huge performance win for many types of queries.

And of course, with TimescaleDB, one also gets all of the benefits of PostgreSQL and its vibrant ecosystem.

Can't wait to share the details in the coming weeks!

[1]: https://github.com/timescale/tsbs


I have a related question, in case anyone knows: We want to store typical analytics data somewhere (currently in BigQuery) to analyze with Looker. Things like "CI run started", "CI run finished" and then calculate analytics over average CI runtimes.

Which database would be a good fit for this? There isn't too much data, maybe tens of thousands of rows eventually. Would Timescale be a good fit? I'd prefer that, due to existing familiarity with Postgres, but if ClickHouse is better, that's good too.


Postgres has a much more featureful query language, and at tens of thousands of rows the performance difference is irrelevant. The story becomes different when answering a query has to touch millions of records and the answer is needed in milliseconds.


Thanks!


Why move off BigQuery?


If BigQuery is good, there's no reason to. I just assumed a time-series database would be better suited to the workload.


> (although it can still be modified just like regular row data)

But it can't be updated or deleted, so what do you mean by this?


That's a great catch @xdanger and you're right, my comment wasn't accurate. Honestly I rewrote the response a few times and this part wasn't cleaned up which is totally on me.

The overall concept that I was intending to highlight is that you can benefit from both row & columnar store in TimescaleDB. Chunks that are not yet compressed (row store data) can be modified (INSERT/UPDATE/DELETE) as usual and it's transactional - so you're assured it's been completed.

As of TimescaleDB 2.3, compressed chunks (columnar) do allow INSERTS but UPDATES/DELETES on compressed chunks are not yet supported natively. You _can_ decompress any chunk and modify the data (again, transactionally) as needed and recompress.
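For anyone curious, the decompress/modify/recompress cycle is roughly this (hypertable name and predicates are illustrative):

    -- decompress the affected chunks
    SELECT decompress_chunk(c, if_compressed => true)
    FROM show_chunks('metrics', older_than => INTERVAL '30 days') AS c;

    -- modify the now row-oriented data transactionally
    UPDATE metrics SET value = 0 WHERE device_id = 42;

    -- recompress when done
    SELECT compress_chunk(c, if_not_compressed => true)
    FROM show_chunks('metrics', older_than => INTERVAL '30 days') AS c;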


Thank you! Looking forward to the blog post. We need more reference comparisons to help optimize ClickHouse performance.


ClickHouse can delete rows, but deletes work as batch/async operations: https://clickhouse.com/docs/en/faq/operations/delete-old-dat...


Correct + wanted to mention that "lightweight/point-deletes" might come as a new feature.

Initial discussion: https://github.com/ClickHouse/ClickHouse/issues/19627

Being implemented: https://github.com/ClickHouse/ClickHouse/pull/24755


> I remember reading in the ClickHouse docs some time ago that it does not have DELETE functionality. Does this pose any problems with GDPR and data deletion requests?

Clickhouse has ALTER ... DELETE and ALTER ... UPDATE functionality now! (and TTLs)


(Timescale DevRel here)

We've recently been working through a detailed benchmark of TimescaleDB and Clickhouse. The DELETE/UPDATE question has been an intriguing story to follow - and I honestly hadn't considered the GDPR angle.

ATM, Clickhouse is still OLAP focused and their MergeTree implementation does not allow direct DELETE (or UPDATE) of any data. All DELETE/UPDATE requests are applied asynchronously by (essentially) re-writing/merging the table data (it's referred to as a "mutation") without whatever data was referenced in the DELETE/UPDATE. [1]

[1]: https://clickhouse.com/docs/en/sql-reference/statements/alte...


We are using Clickhouse and handle GDPR data deletion requests with it. We store the user-ids in a separate system and run the ALTER/DELETE statements once per week. Works pretty smoothly, though I would prefer some more automation within Clickhouse for them.

Data for inactive users gets deleted because our Clickhouse retention policy is shorter than the inactive-user timeout.


ClickHouse does allow delete and update operations. They are just asynchronous.

I use them every now and then, but I prefer working with partition strategies when I have to do these programmatically.


You are correct, the proper way to do deletions in ClickHouse is to use partitions, and drop partitions. That is probably good enough for most analytical use cases, but YMMV.
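For example, with a table partitioned by month, dropping a whole month is cheap because it just removes the underlying parts on disk (names are illustrative):

    -- assuming the table was created with PARTITION BY toYYYYMM(ts)
    ALTER TABLE events DROP PARTITION '202109';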


This question is becoming critical right now, as nonrecoverable deletes are required within 30 days for both GDPR and CCPA.

Most products do the asynchronous rewrite, especially if they're based on immutable storage. That's fine, but it should be tested to verify that it's not triggering on every delete, for example, and that it's resource-efficient.


The other option is to use row level encryption and throw away the keys when a user requests a delete.
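A rough sketch of that approach with ClickHouse's encrypt/decrypt functions (column names, keys, and IVs are illustrative; real per-user keys would live in a separate key store):

    -- write PII encrypted with a per-user key
    INSERT INTO events (user_id, email_enc) VALUES
        (42, encrypt('aes-256-cbc', 'user@example.com',
                     'keykeykeykeykeykeykeykeykeykey12',  -- 32-byte per-user key
                     'iviviviviviviviv'));                -- 16-byte IV

    -- reads need the key; destroy the key and the rows become unrecoverable
    SELECT decrypt('aes-256-cbc', email_enc,
                   'keykeykeykeykeykeykeykeykeykey12',
                   'iviviviviviviviv')
    FROM events
    WHERE user_id = 42;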


> I remember reading in the ClickHouse docs some time ago that it does not have DELETE functionality. Does this pose any problems with GDPR and data deletion requests?

Altinity is fixing this. The project is called Lightweight Delete and it's for exactly the GDPR reason cited. The idea is that there will be a SQL DELETE command that causes rows to disappear instantly. What actually will happen is that they will be marked as deleted, then garbage collected on the next merge.

Disclaimer: I work for Altinity.


There's some really great technology coming out of Russia in the information retrieval/database world: ClickHouse, a bunch of Postgres stuff that Yandex is working on, 2gis.ru (a super detailed vector map on a completely different stack to Google/MapBox), etc.


Definitely! Do you have any further info about what Postgres stuff Yandex is working on?


There is a company in Russia called Postgres Pro https://postgrespro.ru/ and they are the people who added JSON functionality to Postgres. As far as I know, they are working on full text search for Postgres now.


There's a bunch of stuff scattered around on mailing lists and conferences: I think it's the main data source for Yandex's email offering (gmail equivalent). They've got an async C++ library for postgres called Ozo, and they're quite active in the community!


> Most other database management systems don’t even permit benchmarks (through the infamous "DeWitt clause"). But we don’t fear benchmarks; we collect them.

I love the confidence here.


I don't have firsthand experience, but everyone I know with second- (colleague) and third- (acquaintance) -hand experience says the performance promises hold up.


Of course it does - it’s purpose built for a narrow use case. However it’s an extremely popular use case.

Clickhouse optimizes for the 2 most important things for OLAP - minimal disk space, due to the compression benefits of columnar storage, and minimal compute for the same reason - and it is therefore fast.

However it isn't flexible when you want to expand the use case. You can't do any sort of text search, no complex joins (there are no foreign keys), and you need to order your tables the way you want to sort them.

For certain things it's perfect. It was built to solve a problem Yandex had, and that's notable. But it doesn't have anywhere near the flexibility of Elasticsearch, for example.

But yes it’s purpose built to be extremely fast and minimize storage for the types of use cases it is built for.


> However it isn't flexible when you want to expand the use case. You can't do any sort of text search, no complex joins (there are no foreign keys), and you need to order your tables the way you want to sort them.

That is false. I have built a large scale system that does tons of text searches, complex joins and even queries on top of JSON objects with performance that rivals BigQuery and surpasses it in terms of cost.

Edit: actually much better performance when you account for BigQuery’s cold start scenario.
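For reference, some of the ClickHouse functions that cover these cases (queries and schema are illustrative):

    -- substring and multi-needle text search
    SELECT count() FROM logs WHERE position(message, 'timeout') > 0;
    SELECT count() FROM logs WHERE multiSearchAny(message, ['error', 'fatal']);

    -- querying fields inside stored JSON objects
    SELECT JSONExtractString(payload, 'country') AS country, count()
    FROM logs
    GROUP BY country;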


BigQuery isn’t really OLTP and if I needed real time results for things like search I don’t think Clickhouse is the solution at even limited scale. I guess everyone’s mileage varies though.


I don't think it's accurate to say the ClickHouse use case is narrow. ClickHouse is extremely good at loading and querying events arriving in near-real time from event streams like Kafka. It can also load very efficiently from data lakes. Like Druid it can offer low latency response even as the data set size scales into trillions of rows.

ClickHouse is used for everything from log management to managing CDN delivery to real-time marketing and many other applications. It's gone far beyond the web analytics use case for which it was originally developed at Yandex.

Edit: clarification


Can you explain more about how it's used for real-time marketing and other things? I've only used it for OLAP, which, although it covers a lot of different things, is still more or less the same thing abstractly. Would love to hear about these other uses.


For example, you can use ClickHouse queries to dynamically change the shape of pages based on user behavior across multiple sites (aka retargeting). You can also use ClickHouse to manage CDN downloads in real time. Here are a couple of talks that illustrate both use cases.

We still call this OLAP, but it's quite different from traditional uses. In particular, the core data come from event streams.

https://altinity.com/presentations/2020/06/16/big-data-in-re...

https://altinity.com/webinarspage/2020/6/23/big-data-and-bea...


Interesting news indeed! I very much wonder what it means long term in terms of licensing. I would imagine a much better future if ClickHouse became a foundation-driven project, which gives good protection from license changes (though I'm biased here). Currently, ClickHouse being fully under the Apache 2.0 license may look too good to be true compared to where many successful VC-funded projects took the licenses of their projects (think Elastic, MongoDB, Redis).

In any case, I expect a lot of growth in the ClickHouse community now, and investment in both engineering and, most importantly, marketing - I think ClickHouse technology has a lot more adoption potential than it currently has.


I'd much rather they relicensed early if they can, to set expectations and to ensure talented people actually get to sustain its development, rather than parasitic jobsworth FAANG types who will inevitably drive development at Amazon. Free software in this context is very dead; let's not pretend the network and channel effects of AWS were ever envisaged in the 1980s when most "contemporary" free software licenses were designed.


Canonical link: https://clickhouse.com/blog/en/2021/clickhouse-inc/

But I presume the GitHub link (https://github.com/ClickHouse/ClickHouse/blob/master/website...) has been submitted because clickhouse.com is going to be blocked for a large fraction of HN users (Peter Lowe’s Ad and tracking server list, which I think uBlock Origin has enabled by default, includes ||clickhouse.com^). I’m actually a bit curious why clickhouse.com (or more likely a subdomain?) would be being used this way; I’d have thought that they’d separate any such uses to a different domain so as not to hinder their main domain which is about the software and nothing to do with ads or tracking at all (even if that’s probably the main end use of such an OLAP DBMS).


Someone just reported this to me and I've removed the entry from my blocklist.

This was a very old entry - it was added on Fri, 06 Jun 2003 19:53:00. Back then it was a marketing company that served ads.

I pride myself on knowing the entries in my list very well, but I have to admit I forgot about this one, which is ironic because I use Clickhouse at my job these days.


I'm surprised and impressed that you would remember what's on the list.

Thank you for your hard work. Every day it makes my experience of the internet 100x better.


Great! That makes much more sense.

Thank you for maintaining such lists. You and a few others like you save me much time and aggravation.


Thank you so much for the hard work you do! It makes the web sooo much more usable.


Thank you for this service!


Thanks! This worked instantly just now as I purged cache and updated all lists in uBlock Origin.

I don't look forward to when Chrome enforces Manifest v3 when I'll probably have to wait for a whole extension to be updated instead of just a list file.


FWIW, clickhouse.com is also blocked by "Malvertising filter list by Disconnect"


It's also blocked on my Pi-Hole install network-wide apparently.


I assume that this is mostly because of Ukraine. There is a blacklist for any Russian products. There are also people who try to extend this internal-to-UA list to various ad lists.


I'm betting we'll see a "Clickhouse Cloud" product announcement in the next 12 months. I'm curious to see if they can provide enough add-on value to their open source product to be profitable. But I'm certainly rooting for them!


Worth keeping in mind that Yandex and Russian technology companies in general are used to running lean and profitable operations, in a way unfathomable in the land of 0% interest rates and VC money on tap. If they continue as they are now (15 people) and convert customers like Cloudflare into paying engagements, there is nothing stopping them from being profitable.


How exactly do they get to operate that lean? It seems the most straightforward way is by paying employees less.


No scrum masters, no in-house chef, few or no dedicated product managers, no middle manager(s) between eng and C-Suite.


So theoretically really just more work per dev and less compensation. I guess that makes sense.


Get out of the valley. One engineer can do a lot. Even in Texas we run slimmer. I'm the sole frontend dev. I created and maintain our iOS, Android, and Web app for a largish tech company.

When you don't have VC money and you HAVE to profit, you really learn to optimize the workflow.


> One engineer can do a lot. Even in Texas we run slimmer. I'm the sole frontend dev. I created and maintain our iOS, Android, and Web app for a largish tech company.

What do you mean by "do a lot". Can you deliver as quickly as a team? If so, do you work more hours or are you just better? If you're just better, why do you decide to stay with your largish tech company when we're acknowledging SV pays more? A remote role would increase your salary, no?

Breaking this down:

Russia has a great mathematics and engineering education system. Many graduates, unable to leave, take jobs with Russian tech companies. Russian tech companies pay less than US tech companies.

That's why the situation may be as is with ClickHouse. You're not in Russia.

Texas isn't particularly known for running lean. Every Big Tech has a presence in Austin. Dallas is filled with legacy financial companies burning money on IT.


Yeah, I use RN to do all the platforms. I spit out features pretty fast. They apply to all platforms since the codebase is 95% shared. I work normal 40 hours, never overtime. I could make more in SV possibly, but I like working where I'm at and I have a big influence (I decide the tech, style guide, some features etc.) Plus I don't like working with a team or too many meetings etc. I already work remote and I make a good salary. Cost of living would be much higher in SV and most companies would cut pay for remote work over there so it would be similar.


> Cost of living would be much higher in SV and most companies would cut pay for remote work over there so it would be similar.

I would validate this by getting an offer. Even adjusting for location, my guess is it's still significantly higher than what you're making locally. The adjustment isn't, say, you live in SF so you make 400k, you live in Houston you make 150k. It's ~15-20% for most places, at most.

> Yeah, I use RN to do all the platforms.

I want to point out: those bloated, non-lean teams from SV? They made React Native. They made Flutter too, if you were thinking of swapping.


400k is for FAANG, I doubt I'd get that nor do I really want to compete for it. Seems like a lot. I'm really happy where I am, even if I'm not making top dollar. Money isn't everything, I want to be comfortable, happy, and stress-free. I also just don't like California, I prefer small towns in Texas.

I know Google and FB made Flutter and RN respectively, and I thank them for that. That doesn't mean other SV companies aren't bloated, FAANG has a lot of money from their spigots (Google & FB have their ad money streams) and live on another level, not VC funding.

Cross platform has come a long way and enables small companies to do a lot more with less. Flutter is nice but has flaws, RN is the sweet spot. I could talk for days about the pros and cons of both.

A previous comment of mine regarding of Flutter vs RN: https://news.ycombinator.com/item?id=28394396


Just want to point out that 400k isn’t just FAANGs, there are also public (Uber, DoorDash) and private (stripe) companies that offer that (and higher) compensation for senior IC roles.


Is your theory here that money is the main motivator for job choices? Some people are surely like that, but for a lot of people, developers very much included, other things matter more.

I was just talking with a friend who recently left Google. He's now trying to figure out what to work on next and has spent the months since reading widely. As we were talking, he gestured at the wall of bookcases behind him and said, "I'm not really concerned with maximizing income. I can already buy all the books I can read."

And personally, I'm at a not-for-profit because I want impact. I could make a lot more money elsewhere, and I certainly have in the past. But when I look back on that stuff, a lot of it just looks like a waste of time to me. The financial traders I worked for took in money that would otherwise have been hoovered up by other traders. The excellent code base that never got any users because the business side was kinda fucked up. The enterprise system that limped along a while longer thanks to our stress and overtime. Life's too short.

And I really get hunterb123's perspective here. I'd rather be part of a small team getting shit done rather than a highly paid developer on a vast effort to shift some ad-revenue metric by 0.2% over the next quarter. Some people like that and it's fine. But in interviews I've asked enough former FAANG developers, "So why did you leave?" that I know it's not for me.


Velocity always goes down as team size goes up. Bigger teams are not about delivering faster, it's about dealing with bigger scope.


Eastern European talent, probably. They are fantastic and their salaries are cheap thanks to conversion rates.


Not picking on your comment, per se, but it is a pet peeve of mine that lower pay in societies with much lower costs of living is often couched as either exploitation or (reverse) unfairness to those in high cost-of-living settings. Consider a metro area in Eastern Europe or Asia. Suppose there are very few jobs available paying the equivalent of US$12 per hour [0]. Now someone starts a software shop or consultancy and offers $13, $15, or $18 (equivalent) per hour. Within that society, such opportunities can easily represent anywhere from a 200% to infinity% (for the unemployed) wage advantage.

tldr; Please consider contextual economic realities before applying reductive backhanded comments.

[0] I'm shooting from the hip, here. Apologies if my example numbers are way off for part or most of the regions I'm mentioning. Even if so, there was a day -- not very long ago -- when they were quite close.


It's all about culture.


There's definitely a market for a managed Clickhouse 1p product. It remains to be seen if the product is substantial enough to challenge the incumbents. The engineering pedigree is ample, so that's already 50% of the way there. With money in the bank, it is all about how they suit it up with their sales and marketing. Interesting times ahead for them.


Aiven will offer a managed Clickhouse service too: https://landing.aiven.io/2020-upcoming-aiven-services-webina...

And Altinity is also a trusted partner with great know-how about Clickhouse internals. They have started to offer managed instances in AWS: https://altinity.com/altinity-cloud-test-drive/

Lastly, Alibaba Cloud has an offering as well: https://www.alibabacloud.com/product/clickhouse

Are there any other ones?



To be clear, I meant 1p as in Confluent -> Kafka; not AWS -> Managed Kafka.

Despite Yandex (who originally built Clickhouse) offering a managed solution, a substantial investment outlay from the VCs does come off as a huge vote of confidence in the founders.


The only value they need to add is that you no longer need to run database clusters yourself. I would hope they don’t try to add any special paid features.


I'm guessing that's the entire purpose here. Build Snowflake with Clickhouse branding.


I wonder how this will play out with https://altinity.com who have been doing enterprise support for quite some time..


I run Altinity. We think it's great. This is going to help grow adoption which benefits everyone. Watch our blog for a post in a couple hours.

BTW congrats to Alexey on the new company.


As a sidenote, I saw your talk on Clickhouse to the CMU database group [1] back when and was extremely impressed with your deep technical knowledge yet down-to-earth presentation. Still haven't had an opportunity to use Clickhouse for production work, but would welcome it.

[1] https://www.youtube.com/watch?v=fGG9dApIhDU


Thank you!! That was the most fun I've ever had on a tech talk. Andy Pavlo is a one-man army when it comes to fun questions, and there were more like him in the audience. The whole series of quarantine talks was great.


We recently set up Clickhouse on GKE using the Altinity operator (and signed up for Altinity support).

There have been so many queries where I've thought 'that's going to need a join and aggregation across tens of billions of rows, no way!' - and then Clickhouse spits back a query result in 10 seconds...


We are using Altinity too. Great support up to now. We are about to go live with it. For us (see my bio for a company link), having a company manage the cluster was paramount. We just want to use the data and API, not manage the machines/VMs and k8s clustering stuff.


Cool! Thank you so much for posting. We get a huge kick when projects go live. (Being a manager has not beaten it out of me.)


Thank you! This is an important milestone for ClickHouse and will benefit the entire ecosystem.


I think it's similar to other situations, e.g. Starburst with Presto/Trino. There really are a limited number of devs pushing along the core projects and a lot of people needing support. Each startup in the space can likely grow the pie for support and adoption, and a few big enterprises will still hire in-house devs.


We're using Clickhouse to power our in-product analytics. It's awesome but would love a managed service b/c it definitely requires a bit of management overhead. Super excited about this announcement!


Check out Altinity.Cloud. It's managed and works today.

Disclaimer: I work for Altinity, which operates the Altinity.Cloud service.


So forgive my very basic question here, I'm coming from the mobile dev world. But can I:

1) Use Clickhouse as infrastructure to build a product similar to MixPanel / Amplitude?

2) If I wanted a basic MVP of the above, can anyone point me to the steps (like 1., 2., 3., etc.) of what I would need to do to have a basic MVP ready? (Note: I am already very familiar with Docker, Kubernetes and writing REST APIs.)

Would greatly appreciate this since it would clear up a lot of questions I have.


1) Essentially yes! You'll have to write the SQL queries yourself.

2) You'd want some way of sending events (a simple API) to a Kafka cluster, which would be read by Clickhouse; then you'd be able to query the data using Metabase or DataGrip. A minimal sketch of the Clickhouse side is below.
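To make the "read by Clickhouse" part concrete: ClickHouse ships a Kafka table engine, usually paired with a materialized view that moves rows into a MergeTree table. A minimal sketch, with all names and settings illustrative:

    -- 1) a Kafka engine table that consumes the topic
    CREATE TABLE events_queue (
        ts DateTime,
        event String,
        properties String
    ) ENGINE = Kafka
    SETTINGS kafka_broker_list = 'kafka:9092',
             kafka_topic_list = 'events',
             kafka_group_name = 'clickhouse',
             kafka_format = 'JSONEachRow';

    -- 2) a MergeTree table for storage
    CREATE TABLE events (
        ts DateTime,
        event String,
        properties String
    ) ENGINE = MergeTree ORDER BY (event, ts);

    -- 3) a materialized view that continuously moves rows from the queue into storage
    CREATE MATERIALIZED VIEW events_mv TO events AS
    SELECT ts, event, properties FROM events_queue;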

(Or you can use PostHog, which has essentially done all this for you and has all the functionality that Mixpanel/Amplitude has, but you're able to self host it!)


You are a godsend, thank you.

Can you explain this in a little more detail:

"Which would be read by Clickhouse" - are you talking about something like a Kafka connector? Or some KSQL-type query?


I wish the best to them. They forged game-changing software. Hopefully, they will also offer a competitive SaaS model (ideally at bare-metal-based prices).


I looked into Clickhouse for OLAP. Our main database would be PostgreSQL; unfortunately their MaterializedPostgreSQL does not support TOAST, which is a major downside considering we are TEXT/JSONB-heavy users.

Edit: I tested it, but for some reason either the docs are strange or wrong - TOAST tables are actually replicated?! Or at least I see the data?
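For context, the setup in question is roughly this (the engine was experimental at the time; host, database, and credentials are illustrative):

    SET allow_experimental_database_materialized_postgresql = 1;

    -- replicates the listed tables from Postgres via logical replication
    CREATE DATABASE pg_replica
    ENGINE = MaterializedPostgreSQL('postgres-host:5432', 'mydb', 'replica_user', 'secret')
    SETTINGS materialized_postgresql_tables_list = 'events,users';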


TEXT/JSONB are stored inline IIRC unless they hit a certain size limit, at which point they'll be put into TOAST - so you'll see data in ClickHouse, but some big values might be missing.


Do you know if it will in the future? Or is this a limit of ClickHouse's string data type?


ClickHouse has no limit on string data length. The MaterializedPostgreSQL engine is just a very recent feature without a large adoption rate. I believe that if the community uses it frequently, it will become more bulletproof and more edge cases will be supported. TOAST in the replication protocol is just not trivial to implement.


Might make more sense to link to the blog post, instead of its underlying markdown in GitHub?

https://clickhouse.com/blog/en/2021/clickhouse-inc/


Most of the audience here blocks ads with either uBlock or something else and they all include that domain in the blacklist.


It's great to see this spin-off. ClickHouse is fast, but certain use cases are not ideal, and with the spin-off the team can focus more on the community and what it needs instead of just the use cases Yandex had. Cheers to the team and good luck!



People often think that ClickHouse is useful only for TBs of data. That's wrong! ClickHouse works perfectly on a single server, and it works like a charm as a data source for self-service BI tools - and maintenance of this setup remains very simple compared to a CH cluster. Very powerful stuff for getting operational analytics with minimal investment.


There is a video (in Russian) [1] with the idea: "if someone claims something works faster than ClickHouse, it means I haven't optimised this specific query yet."

[1] https://www.youtube.com/watch?v=MJJfWoWJq0o


Awesome! Really excited for CH.


For those who don't know it:

ClickHouse is a columnar analytic store - close to a DBMS, but not a full-fledged one. The "100x-1000x faster" is compared to row stores. Last time I checked, it was mostly single-table-oriented.


We just did months of testing on a bunch of DBs for a time-series workload, and whilst we really liked the story and devs behind Clickhouse, the ops burden of not separating storage and compute ended up being a turn-off. Good to see it is progressing and will hopefully get more investment, although I wonder what this means for companies like Altinity.

We compared Timescale, Clickhouse, Snowflake and Firebolt. Ended up really liking Firebolt: some amazing tech with a few rough edges (it's pretty new); basically Clickhouse speed meets Snowflake simplicity. Definitely one to watch.

https://www.firebolt.io/


reads like marketing copy


Nope, just an interesting test we ran, thought I'd share. We actually did all our testing and then came across Firebolt very late, so I thought others might not have seen it either.


Are you planning on sharing the results of your comparison?


kdb+ would probably wipe the floor with all of those


Data syncing in golang for ClickHouse: https://github.com/tal-tech/cds


Does anyone happen to know which country the new company is incorporated in? I'm still looking for a chance to use ClickHouse because it sounds so excellent!


I'm excited for the future of ClickHouse! I'm hopeful that this move will help smooth out the rough-edges of ClickHouse, mainly around clustering.


I learned of Clickhouse in an unpleasant way. It is a dependency of Sentry. I was tasked with trying to install a self-hosted Sentry on an OpenShift cluster, which failed on account of Clickhouse not running in unprivileged containers. No, I was not permitted to change the privileges or use a plain VM.


Just got started with clickhouse. Super cool software.


Why would my ad blocker (uBlock Origin) be blocking the domain of a "open-source column-oriented database management system"?

I guess this is used heavily in advertising?



Totally unrelated, but this part of the readme says

    Yandex N.V. is the largest 
    internet company in Europe 
    and employs over 14,000 people
I’m quite sure there are larger “internet companies” in the EU such as Booking, Zalando, etc.


Booking is 5K employees.


Last time I checked it had 17k employees total (source: I worked there).


Well, Zalando employs more people (16,000), but far fewer work on tech, so I would still say that their readme holds true.



