
The impact of Docker containers on the performance of genomic pipelines - michaelhoffman
https://peerj.com/articles/1273/
======
jamesblonde
This paper misses the bigger picture that genomics is a Big Data problem.
Setting up pipelines to put together perl, bash, python, and C++ programs is
not where the field will be in a few years' time.

~~~
csirac2
I think you underestimate the diversity of genome research activities,
technologies and methods out there :) It's such an incredibly fragmented
field; sure, many ad-hoc pipelines eventually become productized beyond a pile
of scripts and a dozen or so users, and there are definitely plenty of
applications which demand HPC and "big data" techniques - but that describes a
tiny fraction of all the research projects out there.

In any case, many parts of the field simply don't have the software
engineering discipline to pull off proper "big data" workflows. Advances in
commodity hardware, stronger programming tools for ad-hoc work, and
"cloudification" toolchains will probably delay the maturing of a lot of work
that used to require proper engineering effort.

Not to mention there's plenty of fertile ground in solving problems that can
by now be answered with merely "annoyingly non-small" rather than "big data"
techniques.

~~~
heuermh
(Minor) co-author on the paper here, just wanted to second your experience in
the field.

The big win I've found with Nextflow is that once you've written a workflow,
you have a lot of flexibility in the execution environment: Have all the tools
already installed on your workstation or large compute instance? Use the local
executor to saturate the box with concurrently running jobs. Don't have or
want all those tools installed? Use the local executor with Docker images.
Have access to a traditional compute cluster (e.g. LSF, SGE, Torque, etc.)?
Use the cluster executor with Docker images.
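To make the "write the workflow once, pick the executor later" idea concrete, here is a minimal Python sketch. It is not how Nextflow works internally; `Task`, `local_executor`, `docker_executor`, `plan`, and the `example/...` image names are all hypothetical. It only shows how one workflow definition can be rendered either as plain local commands or as `docker run` invocations, without touching the workflow itself.

```python
import shlex
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    """One workflow step: a shell command and, optionally, the image providing its tools."""
    name: str
    command: str
    image: Optional[str] = None  # hypothetical image name, used only by the Docker executor


def local_executor(task: Task, workdir: str) -> List[str]:
    # Tools are already installed on the host: run the command as-is inside workdir.
    return ["sh", "-c", f"cd {shlex.quote(workdir)} && {task.command}"]


def docker_executor(task: Task, workdir: str) -> List[str]:
    # Tools live in an image: bind-mount workdir at the same path and run there.
    if task.image is None:
        raise ValueError(f"task {task.name!r} needs an image for the Docker executor")
    return ["docker", "run", "--rm",
            "-v", f"{workdir}:{workdir}", "-w", workdir,
            task.image, "sh", "-c", task.command]


def plan(workflow: List[Task],
         executor: Callable[[Task, str], List[str]],
         workdir: str) -> List[List[str]]:
    # The workflow definition never changes; only the executor that renders it does.
    return [executor(task, workdir) for task in workflow]


if __name__ == "__main__":
    workflow = [
        Task("align", "bwa mem ref.fa reads.fq > aln.sam", image="example/bwa"),
        Task("sort", "samtools sort -o aln.bam aln.sam", image="example/samtools"),
    ]
    # Swap docker_executor for local_executor to run with host-installed tools instead.
    for argv in plan(workflow, docker_executor, "/data/run1"):
        print(shlex.join(argv))
```

A cluster executor would follow the same pattern: it would take the rendered command line and hand it to the scheduler (LSF, SGE, Torque, etc.) for submission instead of running it directly.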

A couple other resources worth checking out:

Toil workflow engine
[https://github.com/BD2KGenomics/toil](https://github.com/BD2KGenomics/toil)

Common Workflow Language (CWL) specification
[https://github.com/common-workflow-language/common-workflow-language](https://github.com/common-workflow-language/common-workflow-language)

~~~
csirac2
That sounds fantastic. I no longer work in bioinformatics but regularly keep
in touch with some of my old colleagues. Definitely going to speak in person
about this with them.

------
alexchamberlain
This seems to be fixing the wrong problem. Packaging software is not hard, but
it does need to be learnt and tutorials are scarce.

