We (Aleksandar and Georg) want to share our new blog post, "Dagster, dbt, DuckDB as new local MDS".
It reintroduces the local environment to improve developer productivity for data pipelines by bringing back software engineering best practices.
We suggest that PaaS platforms should become an implementation detail, and we refine the new local stack with a great data consumer experience by combining the best of both worlds.
Hey Georg, thanks for posting. I've been working on building a thing with almost the same stack lately and the Dagster integration is my next large-ish step. In the last couple of weeks I've kicked around DuckDB and ClickHouse for the backend. It's been a lot of fun. Ultimately I'd like something that can be run locally or easily installed and run on a PaaS.
The only trick with local is that data sets of any appreciable size take ages to pull down, at least here in the US with our terrible internet (in the average Italian mountain village this would probably work great).
Well, local can mean your laptop, but it could also mean your local server, or even a VM on your cloud provider of choice colocated with the object store.
In that case the network transfer is often almost irrelevant, at least for datasets up to medium-ish size.
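To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The dataset size and link speeds are hypothetical values picked for illustration, not measurements, and the calculation ignores protocol overhead and object-store throttling.

```python
def transfer_seconds(size_gb: float, link_mbit_per_s: float) -> float:
    """Seconds to move size_gb (decimal gigabytes) over a link of the given speed."""
    megabits = size_gb * 8 * 1000  # 1 GB = 8,000 megabits
    return megabits / link_mbit_per_s


# Hypothetical numbers for illustration only, not measurements.
DATASET_GB = 50
LINKS = {
    "home connection (100 Mbit/s)": 100,
    "cloud VM next to the object store (10 Gbit/s)": 10_000,
}

for name, speed in LINKS.items():
    minutes = transfer_seconds(DATASET_GB, speed) / 60
    print(f"{name}: {minutes:.1f} min")
```

Under these assumptions the same pull that takes over an hour at home finishes in under a minute on the colocated VM, which is the gap I mean by "almost irrelevant".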
Please take a read here: https://georgheiler.com/2023/12/11/dagster-dbt-duckdb-as-new...
We (Georg Heiler and Aleksandar Milicevic) are keen to discuss the proposed new stack with you. Do not hesitate to reach out.