
How does this differ from something like Airflow?



One way to contrast them in broad terms is that tools like Airflow are generally focused on continually running well-defined production workflows, while tools like Snakemake are mostly focused on creating a reproducible pipeline for one particular analysis, or on making exploratory analysis easy, e.g. exploring the effect of different input data and parameters.

One way this focus shows up is that Snakemake, for example, puts much more emphasis on naming files so that you can tell which input data and parameters were used to produce them, making it easier to compare results from variations on the main analysis.
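
As a rough sketch of what that looks like in a Snakefile (the rule and the fit_model tool are invented for illustration), wildcard values are baked into the output path, so a result file records how it was made:

    rule train:
        input:
            "data/{dataset}.csv"
        output:
            # k and seed become part of the filename, so e.g.
            # results/iris.k5.seed42.model is self-describing
            "results/{dataset}.k{k}.seed{seed}.model"
        shell:
            "fit_model --clusters {wildcards.k} --seed {wildcards.seed} "
            "--in {input} --out {output}"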


If you are interested in knowing more about the differences between a pull-style workflow engine like Snakemake, which is geared towards bioinformatics problems, and a push-style workflow engine geared towards data engineering, you might find our write-up helpful: https://insitro.github.io/redun/design.html#influences
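
To make the pull/push distinction concrete, here is a very rough sketch (script names are invented, and the Airflow snippet assumes the 2.x API): in pull mode you ask for a file and the engine works backwards to the rule that can produce it; in push mode you declare tasks and edges and the scheduler drives work forward through the graph.

    # pull-style (Snakemake): requesting clean/data.csv triggers this rule
    rule clean:
        input: "raw/data.csv"
        output: "clean/data.csv"
        shell: "clean_data.sh {input} {output}"

    # push-style (Airflow): tasks and their ordering are declared explicitly
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG("etl", start_date=datetime(2024, 1, 1), schedule=None) as dag:
        extract = BashOperator(task_id="extract",
                               bash_command="fetch_data.sh raw/data.csv")
        clean = BashOperator(task_id="clean",
                             bash_command="clean_data.sh raw/data.csv clean/data.csv")
        extract >> clean  # explicit edge: clean runs after extract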

There are other important dimensions on which workflow engines differ, such as reactivity, file staging, and dynamism.


There are a lot of differences.

By design, Airflow needs a centralized server/daemon, whereas Snakemake is just a command-line tool, like make/cmake. A central daemon can be an issue in HPC environments.
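
Concretely, the entire "deployment" is a single command run from a shell, which is why it works fine on an HPC login node (a sketch; the directory layout is illustrative):

    $ pip install snakemake      # or: conda install -c bioconda snakemake
    $ cd my_analysis/            # directory containing a Snakefile
    $ snakemake --cores 8        # runs the DAG locally; no server to stand up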

In Airflow the workflow is assumed to be executed repeatedly, whereas in Snakemake it is usually run once and only re-run when inputs change (just as you only recompile a program when its source files change).
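
You can see the make-like behavior directly: a second invocation finds all outputs up to date and does nothing (the exact message varies by version):

    $ snakemake --cores 4             # first run: executes all needed rules
    $ snakemake --cores 4 --dry-run   # afterwards: reports nothing to be done
                                      # until an input or a rule changes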

Airflow has a queuing system, whereas Snakemake is meant to be used in conjunction with existing job schedulers and task management systems.
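
For example, with pre-v8 Snakemake you hand jobs to a scheduler like Slurm via a submit-command template (newer releases do this through executor plugins instead):

    $ snakemake --jobs 100 \
          --cluster "sbatch --cpus-per-task={threads} --mem={resources.mem_mb}"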

In Airflow, shell scripting always feels like a second-class citizen, whereas Snakemake has good support for shell scripting, which enables easy integration with tools written in other programming languages.
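
A sketch of that shell support (bwa and samtools stand in for any command-line tools): the shell directive is a first-class part of a rule, and Snakemake substitutes {input}, {output}, and {threads} before execution:

    rule align:
        input:
            reads="reads/{sample}.fastq.gz",
            ref="ref/genome.fa"
        output:
            "aligned/{sample}.bam"
        threads: 4
        shell:
            "bwa mem -t {threads} {input.ref} {input.reads} "
            "| samtools sort -o {output} -"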





