On 2, I understand your thinking around purpose-built — but you're retrofitting an analytical database into a transactional database without fully supporting all the features (both in terms of functionality and performance) of either. It's really hard to be truly purpose-built this way. As a result, users might not get the best of both worlds.
PeerDB is different. We keep Postgres and ClickHouse separate and just move data reliably between them. Users get to query Postgres and ClickHouse in isolation and make the best of each of them.
Anyway, keep up the good work! Just wanted to share some challenges we've seen before when building an analytics extension (Citus), particularly around chasing both Postgres compatibility and performance.
Yep, what I want to say is that the line between the two designs is indeed very blurry.
Logical replication with mooncake will try to create a columnar version of a Postgres heap table that can be read within Postgres (using pg_mooncake), or outside Postgres (similar to PeerDB + ClickHouse) with other engines like DuckDB, StarRocks, Trino, and possibly ClickHouse.
But since we can purpose-build the columnstore storage engine with Postgres CDC in mind, we can replicate real-time updates/deletes (especially in cases where a traditional OLAP system won't keep up).
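To make the update/delete point concrete, here is a toy sketch of applying a CDC event stream to a columnar copy. Everything here is illustrative, not mooncake's actual code or file format: the table layout, the `id` primary key, and the tombstone approach to deletes (a common trick for making deletes cheap in column-oriented storage) are all assumptions for the example.

```python
# Toy sketch (not mooncake's actual implementation): applying a Postgres CDC
# stream of insert/update/delete events to a columnar copy. Deletes are
# recorded as tombstones instead of rewriting the column arrays.

class ColumnStore:
    def __init__(self, columns):
        self.columns = {c: [] for c in columns}  # column name -> value list
        self.row_of = {}       # primary key -> row position
        self.deleted = set()   # tombstoned row positions

    def apply(self, op, row):
        pk = row["id"]
        if op == "insert":
            self.row_of[pk] = len(self.columns["id"])
            for c, vals in self.columns.items():
                vals.append(row[c])
        elif op == "update":
            pos = self.row_of[pk]
            for c, vals in self.columns.items():
                vals[pos] = row[c]
        elif op == "delete":
            # Mark the row dead; physical cleanup can happen lazily.
            self.deleted.add(self.row_of.pop(pk))

    def scan(self, col):
        # Visible values of one column, skipping tombstoned rows.
        return [v for i, v in enumerate(self.columns[col])
                if i not in self.deleted]

store = ColumnStore(["id", "status"])
store.apply("insert", {"id": 1, "status": "new"})
store.apply("insert", {"id": 2, "status": "new"})
store.apply("update", {"id": 1, "status": "shipped"})
store.apply("delete", {"id": 2})
print(store.scan("status"))  # -> ['shipped']
```

The tombstone set is what lets updates/deletes arrive at replication speed: an append-only OLAP system that can only bulk-rewrite partitions is exactly the "won't keep up" case mentioned above.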
I understand. In that scenario, why can't users just use these other query engines directly instead of the extension? You're heavily relying on DuckDB within your extension, but you may not be able to unleash its full power since you're embedding it within Postgres and operating within the constraints of the Postgres extension framework and interface.
The focus of mooncake is to be a columnar storage engine that natively integrates with Postgres: allowing writing from pg, replicating from pg, and reading from pg using pg_mooncake.
We want people to use other engines to read from mooncake, and in that setup they are effectively stateless engines, which are much easier to manage and avoid all the data ETL problems.
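A rough sketch of the "stateless reader" idea: the storage engine owns the files, and any number of engines open those same files directly, holding no state of their own. The JSON-file-per-column layout below is made up purely for illustration (real systems would use something like Parquet).

```python
# Rough sketch: several "engines" reading the same on-disk columnar data
# directly. Each reader holds no state of its own, so there is no per-engine
# copy of the data to load, sync, or back up -- which is the ETL problem
# being avoided. The JSON-per-column layout is purely illustrative.
import json
import tempfile
from pathlib import Path

def write_columns(dirpath, table):
    # Writer (the storage engine's job): one file per column.
    for name, values in table.items():
        (Path(dirpath) / f"{name}.json").write_text(json.dumps(values))

def stateless_reader(dirpath, column):
    # Reader (any engine's job): open the shared files, compute, keep nothing.
    return json.loads((Path(dirpath) / f"{column}.json").read_text())

with tempfile.TemporaryDirectory() as d:
    write_columns(d, {"id": [1, 2, 3], "amount": [10, 20, 30]})
    # Two independent "engines" query the same files; neither copied the data.
    print(sum(stateless_reader(d, "amount")))  # -> 60
    print(max(stateless_reader(d, "id")))      # -> 3
```

Because the readers never own a copy, swapping one engine for another (DuckDB today, something else tomorrow) doesn't require moving data again.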
Sounds good. I'm still a bit confused, but will wait for your next version. :) ETL problems still aren't avoided — replicating from Postgres sources using logical replication is still ETL. One topic we didn't chat much about: be careful what you're signing up for with logical replication — we built an entire company just to solve the logical replication/decoding problem. ;)
1, makes sense.