Do you or the parent or others take steps to abstract away the distributed computing steps so that others may run the pipelines in their distributed computing environments? More specifically, I use Condor (http://research.cs.wisc.edu/htcondor/) but other batch systems like PBS are also popular. Ideally my pipeline would support both systems (and many others).
Ultimately though, I rely on some common environment (such as a particular binary existing in PATH) that lives outside the build system and would need to be recreated by whoever who would like to reproduce the calculations. I never looked into abstracting that away with something like Docker.
(I plan to document the build system once it's at least roughly stable and I finish my Phd.)
However, I also have in mind a pipeline of scripts where one script may be a prerequisite to the other. Condor has some nice abstractions for this by organizing the scripts/jobs as a directed acyclic graph according to their dependencies. I was thinking other batch systems might support this as well. But some of my challenge comes in learning how each batch system would run these DAGs. Each one will have some commands to launch jobs, to wait for a job to finish before running some other job, to check if jobs failed, to rerun failed jobs in the case of hardware failure, etc.
It seems like the DAG representation would contain enough detail for any batch system but there may be some nuances. For example, I tend to think of these jobs as a command to run, the arguments to give that command, and somewhere to put stdout and stderr. But Condor also will report information about the job execution in some other log files. Cases like this illustrate where my DAG representation (or at least the data tracked in nodes) might break down, but I haven't used these other systems like PBS enough to know for sure.