
Dagster: The Data Orchestrator - schrockn
https://medium.com/dagster-io/dagster-the-data-orchestrator-5fe5cadb0dfb
======
meigetsu
For anyone who has spent significant time with data and ML pipelines: read the
posts about Dagster. Even if you don't end up using it in your pipeline, they
are full of well-articulated and extremely valuable lessons about how to
manage the problems and complexity of data applications. I can't recommend
reading about Dagster enough!

The software itself has matured significantly in the last year as well -
highly recommend taking a look at this if you're building new pipelines or
have a need to upgrade existing ones.

------
bserial
After playing around with this tool a bit in the past, I still can't
understand what problem it's trying to solve. It comes across as a side
project that VCs happened to throw money at.

~~~
natekupp
hey bserial, I'm part of the team working on Dagster.

While there are many things we're working on, there are 3 goals that got me
excited about working on this system:

1. Local development: most modern workflow orchestration systems don't have a
good local development story. We want to provide a seamless end-to-end dev
experience from your laptop to CI to dev to prod for authoring data workflows.

2. Complexity: the Airflow deployments I've worked on or otherwise
encountered have hundreds of DAGs and thousands of tasks scheduled on an
hourly or daily cadence. We aim to provide abstractions to better support
managing and wrangling that complexity.

3. Testability: most modern data platforms are poorly tested. Many
orchestration systems, like Airflow, tend to hardcode deployment concerns
into the business logic, e.g. EmrAddStepsOperator. With Dagster, we aim to
separate the business logic from environmental concerns so it's easy to swap
out an external resource implementation for a mock, a dev version, etc.
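To make point 3 concrete, here's the underlying pattern in plain Python (not Dagster's actual API, which has its own resource abstractions and has evolved over time); all class and function names here are illustrative:

```python
# Sketch of the separation natekupp describes: business logic depends on
# an abstract "resource" interface, so prod can inject a real client
# while tests inject a mock. No real AWS/EMR calls are made here.

class EmrResource:
    """Interface for an external compute service (illustrative)."""
    def add_step(self, step: dict) -> str:
        raise NotImplementedError

class MockEmr(EmrResource):
    """Test double: records steps instead of calling a cluster."""
    def __init__(self):
        self.steps = []

    def add_step(self, step: dict) -> str:
        self.steps.append(step)
        return f"step-{len(self.steps)}"

def run_transform(emr: EmrResource, table: str) -> str:
    # Business logic: knows *what* work to schedule, not *where* it runs.
    return emr.add_step({"name": f"transform-{table}"})

# In a test, swap in the mock -- no cluster or credentials required:
emr = MockEmr()
step_id = run_transform(emr, "events")
assert step_id == "step-1"
assert emr.steps[0]["name"] == "transform-events"
```

The point is that `run_transform` never imports anything deployment-specific; the same function runs against a mock on a laptop, a dev cluster in CI, or the real service in prod, depending on which resource is passed in.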

Hope that makes sense!

