
A workflow language is only as good as its engine.

Nextflow was mentioned. I think what most people want is probably closer to Airflow, although it takes some time getting it up to production in a cloud environment (there is astronomer.io and a GCP product).

HTCondor via DAGMan has existed a long time, and there are even engines built on that (Pegasus, Wings).

There’s Swift (http://swift-lang.org/main/) and its successor Parsl. Cray has Chapel. These are a bit different, in that they are more like a distributed computer program. Of course, so is Julia, but built into these languages is the assumption that the computing resources you are using may be unreliable in some way. Makeflow and GNU Parallel are closer to this category too.

Then there’s Beam, but that’s dataflow.

The crappy thing about this is it’s hard to understand when to use a solution and when to not use a solution. Why are there so many solutions? Because there’s a ton of different needs, and a lot of these focus on a few in particular:


Scalability of workers

Dynamic Scalability of workers

Integration with existing Schedulers

Workflow Code Management (container support)

Maintainability of very large DAGs

Testability of DAGs/Development support

Execution Management support/Web APIs

Error recovery (especially for long running workflows)

Re-execution capabilities

Provenance tracking

Domain Specificity

Data Management (next to data processing)

... the list goes on.
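To make the "error recovery" and "re-execution" items concrete, here is a minimal sketch of resuming a failed run without redoing finished tasks. It is not taken from any of the engines named above; `run_workflow`, the `done` checkpoint set, and the three task names are all hypothetical, and a real engine would persist the checkpoint somewhere durable.

```python
from graphlib import TopologicalSorter


def run_workflow(dag, actions, done=None):
    """Run tasks in dependency order, skipping any already in `done`.

    dag:     maps each task name to the set of tasks it depends on.
    actions: maps each task name to a zero-argument callable.
    done:    task names completed by an earlier attempt (hypothetical checkpoint).
    Returns (done, failed_task), where failed_task is None on success.
    """
    done = set() if done is None else set(done)
    for task in TopologicalSorter(dag).static_order():
        if task in done:
            continue  # finished in an earlier attempt, so do not redo it
        try:
            actions[task]()
        except Exception:
            return done, task  # stop here; the caller can resume later
        done.add(task)
    return done, None


# Hypothetical three-step pipeline where "clean" fails on its first attempt.
dag = {"fetch": set(), "clean": {"fetch"}, "report": {"clean"}}
calls = []
attempts = {"clean": 0}


def fetch():
    calls.append("fetch")


def clean():
    attempts["clean"] += 1
    if attempts["clean"] == 1:
        raise RuntimeError("transient failure")
    calls.append("clean")


def report():
    calls.append("report")


actions = {"fetch": fetch, "clean": clean, "report": report}

first_done, first_failed = run_workflow(dag, actions)
final_done, final_failed = run_workflow(dag, actions, first_done)
```

The second call picks up at "clean" because "fetch" is already in the checkpoint, so "fetch" runs only once across both attempts.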

Just regarding Airflow: unless Google has done a lot of work upgrading the internals since embracing Airflow as a supported cloud product, I would think twice about using it.

It's amazing it works at all in my opinion.

This file [0] contains much of the complexity as a messy, stateful, monolithic block of Python. Having had to chase down deep bugs / limitations in this software, I'm now convinced that Python, with its GIL, weak typing, lack of concurrency primitives, and generally OOP / imperative style, is just the wrong tool for the job.

[0]: https://github.com/apache/airflow/blob/master/airflow/jobs/s...

I don't know if Python is the best tool for the job, but with modern tooling it is leagues better for complex applications than old Python.

https://trio.readthedocs.io is an extremely good Python concurrency library based on the model of Structured Concurrency (https://vorpus.org/blog/notes-on-structured-concurrency-or-g...).

The typing issues are far improved in current Python with annotations and attrs/dataclasses.
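As a rough illustration of what annotations plus dataclasses buy you, here is a small sketch using only the standard library. `TaskSpec`, its fields, and `describe` are hypothetical names invented for this example, not anything from Airflow. Note that annotations are not enforced at runtime; a static checker such as mypy is what flags a mismatch.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Tuple


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical description of one workflow task."""

    name: str
    command: str
    retries: int = 0
    upstream: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Runtime validation has to be written by hand.
        if self.retries < 0:
            raise ValueError("retries must be non-negative")


def describe(task: TaskSpec) -> str:
    """Return a one-line summary of a task and what it waits on."""
    deps = ", ".join(task.upstream) or "nothing"
    return f"{task.name} (retries={task.retries}) waits on {deps}"


load = TaskSpec(name="load", command="python load.py", retries=2, upstream=("extract",))
summary = describe(load)

# frozen=True turns accidental mutation into an exception.
try:
    load.retries = 5
    mutated = True
except FrozenInstanceError:
    mutated = False
```

The frozen dataclass gives immutable, comparable records with a generated `__init__` and `__repr__`, which removes a fair amount of the hand-written stateful boilerplate.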

I'm using Airflow for a lot of critical tasks and it works really well. But I agree that Python may not be the best language to implement a workflow engine.

It's fine for moderate workflows. We ran into several hard limits when scaling up, and considered patching some of the limitations. I think it's got a number of edge cases / scalability issues that will be very hard for them to fix without a full rewrite of the internals.

>Because there’s a ton of different needs, and a lot of these focus on a few in particular:

Indeed! I am working on Cylc [1] right now, which is a cyclic workflow system, where users need more than a DAG.

It was created to automate weather forecast operations, but now there are a few cases of users trying to use it for cyclic graphs for more general problems.

[1] https://github.com/cylc/cylc-flow
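For readers unfamiliar with the idea, one way to picture a cycling workflow is as a small graph that repeats per cycle, with some tasks also depending on the previous cycle. The sketch below is a generic illustration only: it is not Cylc's syntax or API, and `unroll`, the cycle labels, and the task names are assumptions made up for this example.

```python
from graphlib import TopologicalSorter


def unroll(cycles, intra, inter):
    """Expand a repeating graph into one DAG of (task, cycle) nodes.

    cycles: ordered list of cycle labels.
    intra:  task -> set of tasks it needs from the same cycle.
    inter:  task -> set of tasks it needs from the previous cycle.
    """
    dag = {}
    for i, cycle in enumerate(cycles):
        for task, deps in intra.items():
            needs = {(dep, cycle) for dep in deps}
            if i > 0:
                # Link back to the previous cycle where requested.
                needs |= {(dep, cycles[i - 1]) for dep in inter.get(task, ())}
            dag[(task, cycle)] = needs
    return dag


# Hypothetical forecast that needs fresh observations and the previous forecast.
cycles = ["00Z", "06Z"]
intra = {"obs": set(), "forecast": {"obs"}}
inter = {"forecast": {"forecast"}}

dag = unroll(cycles, intra, inter)
order = list(TopologicalSorter(dag).static_order())
```

Unrolling only works for a known, finite list of cycles; a system meant to run indefinitely has to generate upcoming cycles as it goes, which is part of why a plain DAG engine is an awkward fit.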
