My friend Vitaly Ludvichenko and I ran an experiment: combine the ClickHouse server and client into a self-contained program that runs the database engine and processes data without a server:
https://github.com/ClickHouse/ClickHouse/pull/150
Development has continued over the past 6 years, and clickhouse-local has now become a Swiss Army knife for data processing. Think "ffmpeg" for datasets, and more.
It is similar to textql, octosql, dsq, duckdb, trdsql, q, datafusion-cli, and spyql, but has better capabilities and performance.
Here is a tutorial: https://clickhouse.com/blog/extracting-converting-querying-l...
It is "serverless" in the same sense as here: https://www.sqlite.org/serverless.html
and also in the more common sense: clickhouse-local can be packaged into AWS Lambda and serve queries on a per-request basis, as here: https://github.com/ClickHouse/ClickHouse/issues/43589
It's great to see more tools in this area (querying data from various sources in-place) and the Lambda use case is a really cool idea!
I've recently done a bunch of benchmarking including ClickHouse Local, and the usage was straightforward, with everything working as it's supposed to.
One comment on performance, though: one area where I think ClickHouse could still improve, at least compared to OctoSQL[0], is the JSON datasource, which seems slower, especially when only a small part of each JSON object is used. If only one field out of many is used, OctoSQL lazily parses just that field and skips the others, which yields non-trivial performance gains on big JSON files with small queries.
Basically, for a query like `SELECT COUNT(*), AVG(overall) FROM books.json` with the Amazon Review Dataset (10GB), OctoSQL is twice as fast (3s vs 6s). That's a minor thing though (OctoSQL will slow down for more complicated queries, while for ClickHouse decoding the input is and remains the bottleneck, with the processing itself being ridiculously fast).
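To make the lazy-parsing idea concrete, here is a minimal Python sketch. It is not OctoSQL's or ClickHouse's actual code, just an illustration under a few assumptions: the input is newline-delimited JSON with one object per line, keys contain no escape sequences, and the helper names (`lazy_field`, `count_and_avg`) are made up for this example. The scanner walks the top-level keys, skips values it doesn't need without building them, and only decodes the requested field.

```python
import json

_DECODER = json.JSONDecoder()
_WS = " \t\r\n"


def _skip_ws(s, i):
    while i < len(s) and s[i] in _WS:
        i += 1
    return i


def _skip_string(s, i):
    # s[i] is the opening quote; return the index just past the closing quote.
    i += 1
    while s[i] != '"':
        i += 2 if s[i] == "\\" else 1
    return i + 1


def _skip_value(s, i):
    # Advance past one JSON value without materializing it.
    c = s[i]
    if c == '"':
        return _skip_string(s, i)
    if c in "{[":
        depth = 0
        while True:
            c = s[i]
            if c == '"':
                i = _skip_string(s, i)
                continue
            if c in "{[":
                depth += 1
            elif c in "}]":
                depth -= 1
            i += 1
            if depth == 0:
                return i
    # Number, true, false, or null: runs until a delimiter.
    while i < len(s) and s[i] not in ",}] \t\r\n":
        i += 1
    return i


def lazy_field(line, wanted):
    """Return the top-level field `wanted`, or None if it is absent."""
    i = _skip_ws(line, 0)
    if i >= len(line) or line[i] != "{":
        raise ValueError("expected a JSON object")
    i += 1
    while True:
        i = _skip_ws(line, i)
        if line[i] == "}":
            return None
        end = _skip_string(line, i)
        key = line[i + 1:end - 1]  # assumption: no escapes in keys
        i = _skip_ws(line, end)    # now at ':'
        i = _skip_ws(line, i + 1)  # now at the start of the value
        if key == wanted:
            value, _ = _DECODER.raw_decode(line, i)
            return value
        i = _skip_ws(line, _skip_value(line, i))
        if line[i] == ",":
            i += 1


def count_and_avg(lines, extract):
    # Rough equivalent of SELECT COUNT(*), AVG(overall) over the lines.
    count, total, seen = 0, 0.0, 0
    for line in lines:
        count += 1
        value = extract(line)
        if value is not None:
            total += value
            seen += 1
    return count, (total / seen if seen else None)


sample = [
    '{"reviewText": "great \\"book\\", {really}", "helpful": [1, 2], "overall": 5.0}',
    '{"overall": 3.0, "meta": {"overall": 1.0}}',
    '{"meta": {"overall": 1.0}, "overall": 4.0}',
]
lazy = count_and_avg(sample, lambda l: lazy_field(l, "overall"))
full = count_and_avg(sample, lambda l: json.loads(l).get("overall"))
print(lazy, full)
```

Note that in CPython the built-in `json.loads` is implemented in C, so this pure-Python scanner is unlikely to be faster in practice; the point is only to show which work a lazy parser gets to skip.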
Godspeed with the future evolution of ClickHouse!
[0]: https://github.com/cube2222/octosql