Hacker News | ople's comments

"Point at any ClickHouse table – no schema migrations/no OTEL requirement"

I haven't tried it yet, but I definitely will because of this. Pretty much every other CH observability product is opinionated about its schema, which may not be optimal for every use case and doesn't account for folks who already have a massive CH datastore. That creates a big hurdle for adoption.


This is a cool and underrated new feature: it basically enables efficient joins of large and dynamically updated tables. That has been a bit of an Achilles heel of ClickHouse, at least for me, so it's exciting to see.

That said, it's not a solution for all kinds of joins, but probably covers the most common use case for most people.


Why not just use SQL? With LLMs evolving to do sophisticated text-to-SQL, the case for a custom language for the sake of simplicity is diminishing.

I think that expressiveness, performance, and the fluency of base language models (i.e. how many examples are in the training set) are the key differentiators for query languages going forward. SQL ticks all those boxes.


You are right. SQL is the best language, but it likely needs some extensions. See SQL with pipe syntax: read the Google paper or try it out in BigQuery.

There are a lot of fundamental operations in observability, but they are very verbose in SQL:

- a rate operator, which translates an absolute counter value into a rate; possible with SQL window functions, but it takes many lines of code

- pivot, where you'd like to see, over time, the top 5 error counts for the microservices hit hardest by errors, plus everything else bucketed as "other"

- sampling, which is frequent in observability and will be useful for LLMs; it is a one-liner in SQL with pipe syntax
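To make the verbosity of the rate case concrete, here is a minimal sketch using Python's stdlib sqlite3 (the table, column names, and sample data are made up; ClickHouse would use the same LAG-style window functions). Even this toy case needs a subquery plus NULL handling for the first sample:

```python
import sqlite3

# Hypothetical counter samples: (ts_seconds, cumulative_total).
rows = [(0, 0), (10, 500), (20, 1500), (30, 1800)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (ts INTEGER, total INTEGER)")
conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)

# A "rate" operator in plain SQL: LAG() fetches the previous sample,
# then we divide the delta in the counter by the elapsed time.
rates = conn.execute("""
    SELECT ts,
           (total - prev_total) * 1.0 / (ts - prev_ts) AS rate
    FROM (
        SELECT ts, total,
               LAG(ts)    OVER (ORDER BY ts) AS prev_ts,
               LAG(total) OVER (ORDER BY ts) AS prev_total
        FROM samples
    )
    WHERE prev_ts IS NOT NULL
""").fetchall()

print(rates)  # [(10, 50.0), (20, 100.0), (30, 30.0)]
```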

I actually believe generative LLMs play extremely well with pipe syntax. It lets us feed partial results (and samples) to the LLM, and show how the LLM evolves queries over time. SQL troubleshooting is not a single query but a series of them.

Still, SQL with pipe syntax is just syntactic sugar on top of SQL. It lets you use all SQL features and compiles down to SQL.
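As an illustrative sketch of the pipe syntax described in the Google paper and available in BigQuery (the table and column names here are made up), a top-5-errors query reads top to bottom, and each step can be run on its own to inspect partial results:

```sql
-- Each |> step is a self-contained operator applied to the rows
-- flowing out of the previous step.
FROM logs
|> WHERE severity = 'ERROR'
|> AGGREGATE COUNT(*) AS errors GROUP BY service
|> ORDER BY errors DESC
|> LIMIT 5
```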


This is one of the things I appreciate about ClickHouse: contributors are willing and able to dive in and performance-tune even low-level core stuff like this, and their changes get merged quickly and efficiently, even when they are not affiliated with ClickHouse Inc.

Lovely write-up as well!


Recently there seems to be a bunch of SPL (Splunk)-like query languages popping up: PQL, PRQL, Grafana Explore Logs syntax, Kusto, probably others as well. Does yet another similar-but-slightly-different language make sense? Why not leverage an existing one?


Diversity is great! Let's see which query language for logs will survive.


True. My bet is that arbitrary text queries converted to SQL via an LLM will become an increasingly popular alternative. Not sure it will completely replace the custom query languages, though.


Very interesting observations! Merge performance tuning often seems overlooked, even though it's a key aspect of sustained ClickHouse performance.

I also like that the blog is quite compact and gets the points across without getting too much into the weeds.

One thing I've also noticed is that bloom filter index types can be quite costly to merge. In many cases that's acceptable due to the massive benefit they provide for text queries; one just has to be mindful of the overhead when adding them.
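For reference, a hedged sketch of adding such an index in ClickHouse (the table and column names are hypothetical; the tokenbf_v1 parameters are the bloom filter size in bytes, the number of hash functions, and a hash seed):

```sql
-- Token bloom filter skip index for text search on a hypothetical
-- logs table. Every part merge has to rebuild this index, which is
-- where the merge-time cost mentioned above comes from.
ALTER TABLE logs
    ADD INDEX message_bf message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4;
```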


Exploring bloom filter index merges would be an interesting addition. I do wish it were easier to profile merge performance to break down where most of the CPU time is being spent.


I have had the same experience. I was constantly bumping into unexpected limitations. Moving to CH felt like the opposite, with many more "Wow, I didn't expect this to be possible, but it is" experiences.

There is a place for BQ but it is good to set expectations correctly and also look at the constraints. They are sometimes not obvious. The docs do helpfully outline the limitations, for example:

- Materialized views: https://cloud.google.com/bigquery/docs/materialized-views-in...

- Indexes: https://cloud.google.com/bigquery/docs/search-intro#limitati...


I have pretty much exactly the same experience.

However, I do feel they are really trying to do the right thing with the new 3.0 architecture: addressing the deficiencies (most importantly performance and full-fledged SQL) while keeping the stuff that works (InfluxQL for simple and legacy queries). Leveraging open-source projects and contributing upstream is a plus as well. So I'm hoping they succeed in delivering on that promise.


Agreed, embracing battle-hardened open source tech will be a win for them and for customers.

However, once your storage layer is Parquet and your query layer is SQL, well... DuckDB is also basically Parquet+SQL, and it won't be long before there's a nice Postgres wire protocol adapter in front of it. What's the advantage of continuing to use InfluxDB if you don't need clustering or HA?


Unsurprisingly, IBM pioneered this with "Capacity on Demand": https://www.ibm.com/docs/en/power9?topic=environment-capacit...

I'm guessing it has had some popularity among customers, as it has been around for at least a decade.


It has been around far longer than that, since this was one of the main ingredients of mainframe contracts (Variable Workload License Charges).

CoD was just the expansion to Power-based systems.


Even before that.

The System/360 machines were typically leased, not sold. And the leases had different tiers for different clock speeds. If you decided you needed more power, you'd call up IBM, accept a more expensive contract, their service rep would show up, unlock a padlock on the machine, open a panel, turn a knob, close and lock the machine again, and leave.


Indeed, it feels like that: Intel makes only one part, sells it at one price, and, if you want to enable the dark silicon, you pay Intel for a key and the feature is enabled.

It's better than what they used to do, when they killed parts of the CPU to match the SKUs they needed to sell. At least you can pay Intel to enable things.


> when they killed parts of the CPU to match the SKUs they needed to sell

Or that part of the CPU was dead to begin with; every wafer of silicon has some defects, but if these defects hit a part of the chip they can disable, they can avoid wasting a whole chip. Of course, if there's high enough demand for the SKU without that part, and low enough demand for the SKU with that part working, they might kill that part even when it's defect-free.

(Of course, if the part can be enabled "on demand", it means that the disabled part must be working, so they cannot be reusing partially defective chips; it smells like a cash grab.)


OTOH, if all I need is a powerful CPU and I have no use for any of the accelerators right now, I can pay a lower price than I otherwise could, without Intel needing to make a different part for me.

For Intel, it also helps fine-tune the product lineup by gathering more detailed usage information.


I'm pretty excited to buy an entry level server for the price of a high end one, and pay the difference again to get the high end server I paid for.

This is the future I was waiting for.


> to buy an entry level server for the price of a high end one

You pay base price for a mid-range server. You pay the difference to enable the accelerators, if you need them.

What is the problem with that?


> What is the problem with that?

Because, I guess, Intel will charge a premium for these SKUs anyway, since they have unlockable features inside.

So, you'll pay twice for the privilege.


Price elasticity will play a role here, but, all things being equal, you'll get a slightly better part than the one you'd get if the units were permanently disabled.


Hehe... In retrospect the whole team was in fairly good spirits, even though the situation was stressful. A lot of this was due to top management giving the specialists the time and space to do their thing, and to the very understanding response from the customers once we explained the situation.

