
Guix Workflow Language - smartmic
https://www.guixwl.org/
======
batbomb
A workflow language is only as good as its engine.

Nextflow was mentioned. I think what most people want is probably closer to
Airflow, although it takes some time getting it up to production in a cloud
environment (there is astronomer.io and a GCP product).

HTCondor via DAGMan has existed a long time, and there’s even engines built on
that (Pegasus, Wings).

There’s Swift ([http://swift-lang.org/main/](http://swift-lang.org/main/)) and
its successor Parsl. Cray has Chapel. These are a bit different, in that they
are more like languages for writing distributed programs. Of course, so is
Julia, but these languages build in the assumption that the computing
resources you run on may be unreliable in some way. Makeflow and GNU Parallel
are closer to this category too.

Then there’s Beam, but that’s dataflow.

The crappy thing about this is it’s hard to understand when to use a solution
and when to not use a solution. Why are there so many solutions? Because
there’s a ton of different needs, and a lot of these focus on a few in
particular:

Latency

Scalability of workers

Dynamic Scalability of workers

Throughput

Polyglot

Integration with existing Schedulers

Workflow Code Management (container support)

Maintainability of very large DAGs

Testability of DAGs/Development support

Execution Management support/Web APIs

Error recovery (especially for long running workflows)

Re-execution capabilities

Provenance tracking

Domain Specificity

Data Management (next to data processing)

... the list goes on.

~~~
djtriptych
Just regarding Airflow: unless Google has done a lot of work upgrading the
internals since embracing Airflow as a supported cloud provider, I would think
twice about using it.

It's amazing it works at all in my opinion.

This file [0] contains much of the complexity as a messy, stateful, monolithic
block of Python. Having had to chase down deep bugs / limitations in this
software, I'm now convinced that Python, with its GIL, weak typing, lack of
concurrency primitives, and generally OOP / imperative style is just the wrong
tool for the job.

[0]:
[https://github.com/apache/airflow/blob/master/airflow/jobs/s...](https://github.com/apache/airflow/blob/master/airflow/jobs/scheduler_job.py)

~~~
thesorrow
I'm using Airflow for a lot of critical tasks and it works really well. But I
agree that Python may not be the best language to implement a workflow engine.

~~~
djtriptych
It's fine for moderate workflows. We ran into several hard limits when scaling
up, and thought to try to patch some limitations. I think it's got a number of
edge cases / scalability issues that will be very hard for them to fix without
a full rewrite of the internals.

------
arianvanp
I wonder why it is not in Guile? I thought one of the selling points of Guix
was one config language to rule them all. Or is it some syntactic sugar on top
of Guile? It's not clear. The page doesn't really explain the syntax clearly
anywhere so I'm a bit confused.

~~~
leethargo
I think it is actually in Guile. Even though there are no parens in sight,
this is a whitespace-based syntax for Scheme.

I don't have a reference at hand now, but I vaguely remember such a comment
from the FOSDEM presentation.

~~~
serhart
It is implemented in Guile. The language is just syntactic sugar via macros
and utilizes WISP.

[https://git.savannah.gnu.org/cgit/gwl.git/tree/gwl/sugar.scm](https://git.savannah.gnu.org/cgit/gwl.git/tree/gwl/sugar.scm)
[https://srfi.schemers.org/srfi-119/srfi-119.html](https://srfi.schemers.org/srfi-119/srfi-119.html)

------
danielecook
For bioinformatics, take a look at Nextflow. I personally think it is miles
ahead of the competition having reviewed about a dozen options out there.

This looks useful, but can it submit jobs to cloud compute clusters or HPC
systems and operate locally? Maybe I’m missing the point in terms of the
purpose.

~~~
snackematician
Nextflow is indeed the best option today. It's sad that it's based on Groovy
though, which seems past its heyday as a language/community.

I'm very excited that this is based on Scheme -- I've long wanted a lispy
workflow language!

I'm a little concerned about the coverage of the Guix package manager though.
I guess instead of writing Dockerfiles the user would have to learn to write
Guix packages.

The "Getting Started" example uses samtools so I guess this is oriented
towards a similar bioinformatics audience. However without HPC/Cloud support
it's probably not too practical, yet.

Addendum: Listened to the FOSDEM 2019 talk, seems like it does support Docker
and HPC. However I need AWS Batch support for it to be really useful to me,
hopefully that will be implemented at some point.

~~~
zekrioca
If there is Docker/HPC/AWS support in the system through its local commands,
there will be support. Check the cluster mode setup in the Guix documentation
([https://www.gnu.org/software/guix/manual/en/guix.html](https://www.gnu.org/software/guix/manual/en/guix.html))

Edit: adding gnu guix manual link

------
svd4anything
I’ve found that combining Nix with Luigi provides a solution to managing
complex reproducible workflows.

[https://luigi.readthedocs.io/en/stable/](https://luigi.readthedocs.io/en/stable/)

“Conceptually, Luigi is similar to GNU Make where you have certain tasks and
these tasks in turn may have dependencies on other tasks.”

Having this type of functionality integrated directly into Guix (an
alternative to Nix) looks very interesting. I’d encourage the Guix workflow
developers to study Luigi and SciLuigi for inspiration on design ideas.

I will be sure to follow this effort and see how it progresses.

------
dkimbel
For the curious, 'Guix' is pronounced the same way as 'geeks' [0].

0:
[https://www.gnu.org/software/guix/manual/en/html_node/Introd...](https://www.gnu.org/software/guix/manual/en/html_node/Introduction.html#DOCF1)

~~~
yarrel
Nobody is going to do that.

It's goo-icks.

~~~
lfam
Debian, Ubuntu, UNIX, Linux — none of these have an obvious pronunciation to a
native English speaker from the USA.

Why take such a defeatist attitude? I'm sure you can get the pronunciation
right with a little effort.

~~~
michaelmrose
I also have a hard time saying gaaaa nome or matey like all aboard matey with
a straight face or hey you should use the gimp.

------
vector_spaces
What is a workflow language exactly? What benefits do they bring vs using say
Python?

~~~
danielecook
The two main things you want out of a workflow language are re-entrancy and a
DAG of job dependencies. Re-entrancy is the ability for the workflow to pick
up where it left off if something crashes (basically via caching or detecting the
presence of expected output files). The DAG is a directed acyclic graph of job
dependencies: First do A, then B, then C. The DAG is worked out by the
workflow manager, and jobs can be managed accordingly.

A good workflow manager builds on these ideas further by managing
environments, job submission, parallelization, cloud/cluster submission, and
other options that make processing large amounts of data a lot easier and more
efficient.
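
A minimal Python sketch of those two ideas, assuming nothing about how GWL or any other engine actually does it. The names `run_workflow`, `make_job`, `JOBS` and `DEPS` are made up for illustration, and "a job is done" is assumed to mean "its output file exists":

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from pathlib import Path
import tempfile


def run_workflow(jobs, deps, workdir):
    """Run jobs in dependency order, skipping jobs whose output file exists.

    jobs: dict mapping job name -> function(output_path) that writes the file
    deps: dict mapping job name -> set of job names it depends on
    Returns the names of the jobs that were actually executed, in order.
    """
    executed = []
    # The DAG is worked out from the declared dependencies.
    for name in TopologicalSorter(deps).static_order():
        output = Path(workdir) / f"{name}.out"
        if output.exists():
            continue  # re-entrancy: an earlier run already produced this
        jobs[name](output)
        executed.append(name)
    return executed


def make_job(text):
    """Hypothetical job factory: the job just writes `text` to its output."""
    def job(output):
        output.write_text(text)
    return job


JOBS = {"A": make_job("a"), "B": make_job("b"), "C": make_job("c")}
DEPS = {"A": set(), "B": {"A"}, "C": {"B"}}  # first A, then B, then C


def demo():
    """Run once, delete C's output to mimic a crash, then run again."""
    with tempfile.TemporaryDirectory() as workdir:
        first = run_workflow(JOBS, DEPS, workdir)
        (Path(workdir) / "C.out").unlink()
        second = run_workflow(JOBS, DEPS, workdir)
    return first, second
```

In `demo()` the second run only re-executes C, because the outputs of A and B are still on disk. A real engine would also have to cope with half-written outputs and changed inputs, which this sketch ignores.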

------
TeMPOraL
I see a surprising lack of parentheses for something related to Guix. What's
the story behind it?

~~~
chriswarbo
Lisp doesn't need to use s-expressions (i.e. parentheses); that just so
happens to be the most popular serialisation format. These examples look more
like I-expressions to me (
[https://srfi.schemers.org/srfi-49/srfi-49.html](https://srfi.schemers.org/srfi-49/srfi-49.html)
); i-expressions and s-expressions are equivalent and can be converted back
and forth trivially.

Lisps can also support arbitrary input formats using reader macros, so it
might be using that (I haven't looked at the implementation yet).
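
To give a feel for how mechanical the indentation-to-parentheses direction is, here is a deliberately simplified Python sketch. It is not an implementation of SRFI-49 or of Wisp (SRFI-119): the rule it assumes is only "a line plus its more-indented lines form one list, and a lone token with nothing under it is an atom", and it does not handle strings containing spaces, comments, or tabs. The function names are made up for illustration.

```python
def parse(lines):
    """Build a tree of (tokens, children) nodes from space-indented lines."""
    root = []
    stack = [(-1, root)]  # pairs of (indent level, list that receives children)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip(" "))
        node = (line.split(), [])
        # Close every block that is indented at least as deeply as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1].append(node)
        stack.append((indent, node[1]))
    return root


def render(node):
    """Turn one node into s-expression text."""
    tokens, children = node
    if len(tokens) == 1 and not children:
        return tokens[0]  # a lone token is an atom, not a one-element list
    return "(" + " ".join(tokens + [render(child) for child in children]) + ")"


def to_sexp(text):
    """Convert indentation-structured text to one s-expression per top-level form."""
    return "\n".join(render(node) for node in parse(text.splitlines()))


EXAMPLE = "define\n  square x\n  * x x"
```

Under these assumed rules, `to_sexp(EXAMPLE)` yields `(define (square x) (* x x))`.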

~~~
blunte
But with sexps, if your code suffers some formatting catastrophe (such as all
instances of whitespace being reduced to a single space), the sexp code is
trivially recoverable/reformattable.

Depending on whitespace and indentation creates such fragile code that I can't
understand why the trade-off would be made.

~~~
chriswarbo
If that's a concern then serialise using s-expressions. You can still edit
using i-expressions or something equivalent if you like (it's trivial to
convert, after all). Code on disk doesn't need to be the same as code in the
editor (for example, syntax highlighting isn't saved to disk either).

------
lenkite
[https://www.commonwl.org/](https://www.commonwl.org/) is pretty good and
supported by several workflow engines

------
eterps
What are common use cases for this?

------
lkirk
It would be nice if this language could be extended to run on various compute
cluster managers. From what I can tell, these workflows only run on one
machine. I like the bioinformatics tool examples though... you can tell who
their target market is ;P

------
stilley2
I've recently been using nipype [0] for workflows, which is fairly domain
specific, but pretty nice.

0:
[https://nipype.readthedocs.io/en/latest/](https://nipype.readthedocs.io/en/latest/)

------
gravypod
Would something like this make sense for defining machine learning/data
science processes? Like obtain, clean, reformat, split (train/test) datasets.

------
1-6
I love workflow languages. I would really want to make a visual one with input
output nodes (in a true GUI fashion). See Autodesk Dynamo for a close concept.

~~~
jkh1
There are already a few visual workflow composers, e.g. Rabix, KNIME.

------
jkh1
Q: How many workflow languages/engines does the world need? A: As many as
possible:
[https://github.com/common-workflow-language/common-workflow-...](https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems)

~~~
heavenlyhash
Is it really so surprising that people continue to iterate and explore the
space of possible DSLs -- literally, _domain specific_ languages -- especially
when people are solving problems from many different _specific domains_?

~~~
jkh1
But many, if not most, are not domain-specific.

~~~
zmmmmm
Depends what you mean by that. For example, in the bioinformatics space it's
super common to parallelise a workflow over genomic regions and then merge the
results. So I use a tool that has a top level construct for that, literally
language syntax which makes that both utterly trivial and extremely robust
(for example, deals with the annoying problems of edge effects, overlapping
regions, trying not to create breaks in important regions, etc). You can argue
all of that is basic parallelism and not domain specific, but in practice it's
extremely useful to have these constructs at the language level.
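
For readers outside bioinformatics, the bare "split over regions, run in parallel, merge" pattern looks roughly like the Python sketch below. It is only the generic skeleton, not the syntax of any particular tool: `split_regions`, `count_gc` and `scatter_gather` are hypothetical names, the per-region job is a toy (counting G/C bases), and none of the hard parts mentioned above (edge effects, overlaps, avoiding breaks in important regions) are handled.

```python
from concurrent.futures import ThreadPoolExecutor


def split_regions(chrom, length, size):
    """Split [0, length) into half-open, non-overlapping regions of at most `size`."""
    return [(chrom, start, min(start + size, length))
            for start in range(0, length, size)]


def count_gc(sequence, region):
    """Toy per-region job: count G and C bases inside the region."""
    _, start, end = region
    return sum(1 for base in sequence[start:end] if base in "GC")


def scatter_gather(sequence, chrom, size, workers=4):
    """Scatter the job over regions, run the pieces in parallel, merge by summing."""
    regions = split_regions(chrom, len(sequence), size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda region: count_gc(sequence, region), regions))
    return sum(partials)
```

Half-open intervals keep a base from being counted twice at region boundaries, which is the simplest possible answer to the boundary problem. Anything that looks at neighbouring positions needs more care, and that is where having the construct built into the language pays off.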

~~~
heavenlyhash
That's a really excellent example -- crossing the regionality information of
genomics with an otherwise-basic parallelization problem definitely makes it
nontrivial. Thank you :D

