
Introduction to Simple Workflow Service (SWF) - adrianancona
https://ncona.com/2020/07/introduction-to-aws-simple-workflow-service/
======
mabbo
There is almost nothing that SWF can do that AWS Step Functions don't do
better by a large margin.

> Using SWF turned out to be a lot more complicated than I expected. Having to
> parse all the events to figure out which activity goes next seems error
> prone and makes the code confusing. The documentation is also not very clear
> on how this should be done, so hopefully this example helps people
> interested.

That's the only point in this article you need to know. SWF was a great
stepping stone towards something better, but wow I wouldn't dare build
anything new on it today.

~~~
gtsteve
The main thing I liked about SWF was that the entire workflow was represented
in my own code. We ended up cloning the parts of SWF we used when we had to
deploy outside AWS. The decider/actor pattern is a great pattern and I'd
definitely use it again.

However, as you say, the 'Simple' part is definitely a misnomer.
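
The decider half of that pattern can be sketched as a pure function from the accumulated event history to the next decision. This is a hypothetical minimal illustration (event and decision names are invented, not SWF's real types):

```java
import java.util.List;

// Hypothetical sketch of an SWF-style decider: given everything that has
// happened so far, decide what to do next. Names are invented.
class Decider {
    static String decide(List<String> history) {
        if (!history.contains("DownloadCompleted")) return "ScheduleDownloadActivity";
        if (!history.contains("ProcessCompleted"))  return "ScheduleProcessActivity";
        return "CompleteWorkflowExecution";
    }
}
```

Because the decider only reads history and returns a decision, it is trivially unit-testable and can be re-run from scratch at any point, which is part of what makes the pattern portable outside AWS.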

~~~
mfateev
Look at temporal.io. I believe we managed to get rid of the original SWF rough
edges while maintaining the "workflow as code" approach. And the best part is
that it is open source and you can run it anywhere yourself.

~~~
lukerohde
What is the relationship between this project and uber/cadence? I see you have
worked on both and they seem very similar/the same.

~~~
mfateev
[https://stackoverflow.com/questions/61157400/temporal-workflow-vs-cadence-workflow](https://stackoverflow.com/questions/61157400/temporal-workflow-vs-cadence-workflow)

------
nmyk
I'm gonna go out on a limb and say any purported workflow tool that comes with
a data model you have to memorize (i.e. "Before we start building a workflow,
let’s learn a little about the components of an SWF") is too complex to be
effective.

My problem with tools like these is that I already know the components of an
"SWF" or whatever—these are the tasks I have that need to be run/managed. When
a tool starts telling me what the architecture needs to look like, then it
stops being a helpful tool and starts being a little know-it-all.

My favorite workflow tool is actually two pieces of software: cron and
postgres. Cron schedules tasks and postgres handles shared state. It's easy
enough to whip up an ACID-compliant task queue in SQL that has whatever bells
and whistles you want, and all cron wants is a command to run and a schedule.
No need to read a bunch of documentation about what a "task" is supposed to be
vs. an "activity" vs. an "execution" or anything like that.
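
For the curious, one common shape for such a queue in Postgres (table and column names here are invented for illustration) relies on `FOR UPDATE SKIP LOCKED` so that several cron-launched workers can claim tasks concurrently without double-processing:

```sql
-- Hypothetical sketch; names are illustrative, not from any real system.
CREATE TABLE tasks (
    id      bigserial PRIMARY KEY,
    command text NOT NULL,
    state   text NOT NULL DEFAULT 'pending',  -- pending | running | done | failed
    run_at  timestamptz NOT NULL DEFAULT now()
);

-- Each worker atomically claims one due task; SKIP LOCKED makes
-- concurrent workers skip rows another worker has already locked.
UPDATE tasks
SET state = 'running'
WHERE id = (
    SELECT id FROM tasks
    WHERE state = 'pending' AND run_at <= now()
    ORDER BY run_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, command;
```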

Of course, what my setup does not do is provide common functionality out of
the box like "just gimme a way to kick off a series of FS-dependent tasks
every day and record errors/halt if anything fails." I don't mind. It's not
like Apache Airflow (just to give another example) has saved me from having to
think about and express my system's dependencies and failure modes—it has only
put a lot of unnecessary and unhelpful constraints on how I am able to express
them.

~~~
teraku
Can you specify what you mean by Airflow introducing unnecessary and
unhelpful constraints? I'm very interested.

I'm currently working on a standard format for defining such workflows [0] and
my own scheduling engine, which aims to be as non-imposing as possible. It's
supposed to be a "cron" for task scheduling with a dependency graph. The only
added thing is that you can specify what kind of environment you want to run
your tasks on.

So I would appreciate it if you could tell me which things annoyed you in
particular. I use Airflow at work and could list a million myself, but I don't
know exactly what you meant by that sentence.

[0]
[https://github.com/OpenWorkflow/OpenWorkflow](https://github.com/OpenWorkflow/OpenWorkflow)

~~~
nmyk
Sure! To start, just fundamentally—why assume workflows are DAG-shaped? Why no
cycles? Lots of real-world processes contain unscheduled repetition that
arises at "runtime."

Or what if I can only find out what the rest of the workflow looks like once
I'm halfway through it? Why must workflow definitions be static? No "decision"
elements as in a flowchart?

Someone might read these complaints and think I'm asking for a programming
environment rather than a workflow tool, and that's kinda my point :P

The "unnecessary" side is typically project-specific, but I tend not to need a
separate notion of `backfill`, or any of the `Executor` functionality for
distributed execution. I suppose if I needed to run stuff on multiple nodes I
would just schedule jobs on Kubernetes directly.
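
To make the "decision elements" point concrete, here is a hypothetical toy in plain code (names invented): a cycle whose exit condition is only known at runtime, which a static DAG definition cannot express directly:

```java
// Hypothetical sketch: a workflow with a runtime loop and decision.
// pollStatus stands in for some external check; here it succeeds on
// the third attempt so the example is self-contained.
class RetryUntilDone {
    static boolean pollStatus(int attempt) { return attempt >= 3; }

    static int run() {
        int attempt = 0;
        while (true) {                                // a cycle, not a DAG edge
            attempt++;
            if (pollStatus(attempt)) return attempt;  // runtime decision
            if (attempt > 10) throw new IllegalStateException("giving up");
        }
    }
}
```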

~~~
teraku
The concept of a cyclic workflow sounds interesting to me! I also intend mine
to be acyclic, but I guess a possible infinite loop is not bad in itself.
Maybe I should reconsider my underlying data structure.

Why would a workflow need to be static? I would say that a specific instance
of a workflow should be static. The main reason one would want to switch to
such a system is because they want their jobs to do (almost) the same thing,
but at different points in time. If your task is also changing while it runs,
maybe this logic should be within the task, not the workflow. But if your
workflow changes over time, that is a very valid point. One which I'm also
trying to incorporate with incremental versioning.

Backfills as a separate notion I also find weird.

Lastly, I view the executor part as very important. Imagine you want to run
different processes across an organisation inside your scheduling engine, and
some are written in a Python environment while others are compiled code. Sure,
you can schedule it all directly on k8s, but then you lose the advantage of
bundling all your workflows into one system built for that purpose. You
basically go back to your "cron" example, where you deploy directly on
infrastructure, meaning you never intended to use a workflow engine in the
first place :P

Thanks for the input!

------
mfateev
I'm the original tech lead of SWF and later of the Cadence Workflow
(cadenceworkflow.io) and Temporal Workflow (temporal.io) open source projects.

AMA.

~~~
juancampa
Very cool projects. In general, across all projects, what is the best approach
to store/checkpoint the state of the program? I'd imagine something like CRIU,
but I'd love to hear your thoughts. When do you take snapshots? Do you hook
into the VM's event loop being drained? How do you store and replicate these
snapshots?

~~~
mfateev
Temporal/Cadence/SWF operate as libraries that any application can include to
implement workflow and activity logic, so hooking into a low-level VM event
loop was not an option. Instead, they rely purely on event sourcing to
re-execute the program code from the beginning, assuming that the workflow
code is deterministic. The library provides various API wrappers to execute
multithreaded code deterministically using cooperative multithreading. In the
future, WebAssembly can be used as a container, as determinism is one of its
core features.
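
The replay idea can be sketched in a few lines. This is a hypothetical illustration of the mechanism, not Temporal's actual API: side-effect results are recorded on first execution, and on replay the recorded results are returned instead of re-running the effects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of event-sourced replay (names are invented).
class ReplayContext {
    private final List<Object> history;              // results from a prior run
    private final List<Object> recorded = new ArrayList<>();
    private int cursor = 0;

    ReplayContext(List<Object> history) { this.history = history; }

    // Run a side effect once; when replaying, return the recorded result
    // instead of executing the effect again.
    @SuppressWarnings("unchecked")
    <T> T sideEffect(Supplier<T> effect) {
        T result = cursor < history.size()
                ? (T) history.get(cursor)            // replaying: reuse history
                : effect.get();                      // first run: execute for real
        cursor++;
        recorded.add(result);
        return result;
    }

    List<Object> history() { return recorded; }
}
```

As long as the workflow code between side effects is deterministic, re-executing it against the recorded history reconstructs exactly the same in-memory state, which is why no VM-level snapshotting is needed.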

------
teraku
If you do not have all your stuff in AWS, and are not sure if you want to pay
high amounts for managed or "serverless" solutions: I am currently developing
a standard[0] for workflows and a distributed scheduler compliant with that
standard. It's still WIP, but I've used my fair share of managed solutions,
"serverless", open source (Airflow/Luigi) and BI engines, enough to wonder why
there is no general standard just for defining workflows, so that moving
between systems involves much less friction...

[0]
[https://github.com/OpenWorkflow/OpenWorkflow](https://github.com/OpenWorkflow/OpenWorkflow)

~~~
brightball
Isn’t that what BPMN is supposed to offer?

~~~
teraku
Yes and no. It's the same in the sense that I define a format for these
workflows. This means the workflow authoring tool is completely independent of
the scheduler/execution engine, as long as both are compliant with the
standard. BPMN tries to do a lot more, and I think it's overkill for most
use-cases.

The major difference is OpenWorkflow defines tasks very loosely: you specify
the environment you seek, and what to do on it. You can incorporate it into
your current set of jobs without much friction.

I don't think BPMN is a good comparison, as the companies that seek it are
mostly huge and have slowly changing processes. This is more for cases where
the world around you might be ever-changing: incorporating APIs, shuffling
files around or moving data for analytics/model training.

------
MrUssek
This reminds me of a POC I did to serialize Kotlin continuation state in order
to achieve something similar with Kotlin coroutines. The main issue with that
approach is that the continuation context can end up closing over a huge
variety of types (since it has to serialize the entire asynchronous stack).
There was some discussion in this issue
([https://github.com/Kotlin/kotlinx.coroutines/issues/76](https://github.com/Kotlin/kotlinx.coroutines/issues/76))
about annotating suspend functions so that their continuation only closes over
objects compatible with Kotlin serialization. This approach removes the
requirement for all non-determinism in the code to go through some kind of
framework, which seems like a plus.

I think in the future what will happen is someone will build a language with
CSP style concurrency that deals with the "update-in-flight" and "persistence"
problems. Essentially extending the notion of a programming language runtime
from "once, here, on this computer, right now", to something much broader.

~~~
mfateev
In the future WebAssembly will do it out of the box.

------
7ArcticSealz
I used SWF for an auto-failover scenario many years back and found it very
effective. I always liked the fact that worker machines could be on-premise if
needed. I think Step Functions can also interoperate with on-premise
resources/code, though I have yet to work with Step Functions myself.

------
setheron
At first I thought the Java framework Flow was off-putting because it uses
AspectJ to modify the bytecode, but after having used it, I find it kind of
elegant.

I like how it re-runs through your decider each time with the state
information filling up Promises as it goes.

It helps make sure your decider is written in a deterministic fashion with
respect to the workflow events.

~~~
guitarbill
In my experience, Flow is truly awful. It bloats build times, and prevents
incremental rebuilds. You'll curse AspectJ and Flow as your codebase grows.
Stack traces become atrocious (even more atrocious than standard Java).

> make sure your decider is written in a deterministic fashion

You're right, but this is easier said than done. And if it isn't, it will
error at runtime.

~~~
mfateev
Try temporal.io. It is just a Java library without any code generation or
AspectJ. It also allows you to write synchronous code, while Flow was
asynchronous only.

~~~
guitarbill
No thanks mate, the endless `Impl` of interfaces is giving me bad flashbacks.
At least it's open source I guess. Not to mention the deployment hassle that
the different workflow and activity workers represent, or how you roll back
such a beast. Or even what the testing strategy looks like.

~~~
mfateev
I agree that Flow, being a POC, had all the above issues. Temporal listened to
its users and solved all of them:

1\. No code generation. You define one interface for activities and use it
both for calling activities synchronously from the workflow and for
implementing them. Here is how you would do it in Temporal:

    
    
      @ActivityInterface
      public interface GreetingActivities {
        String composeGreeting(String greeting, String name);
      }
    

Workflow code:

    
    
      GreetingActivities activities = Workflow.newActivityStub(GreetingActivities.class);
      // This is a blocking call that returns only after the activity has completed.
      String greeting = activities.composeGreeting("Hello", name);
    

The full source at [https://github.com/temporalio/java-samples/blob/master/src/main/java/io/temporal/samples/hello/HelloActivity.java](https://github.com/temporalio/java-samples/blob/master/src/main/java/io/temporal/samples/hello/HelloActivity.java)

2\. Neither SWF nor Temporal drives the deployment strategy. You can run all
of the workflows and activities as a monolith in a single process or break
them into multiple services; it is purely your choice.

3\. SWF couldn't be run locally. Temporal fully supports unit testing of
long-running workflows with automatic time skipping, and local integration
testing using the service running in docker-compose.

------
ttsda
I've been using temporal.io, which afaik is developed by some of the people
who built SWF (and Cadence at Uber), and I think it's great. I'll probably use
it for a ton of stuff in the future.

~~~
teraku
This looks awesome, but I don't understand what this has that Airflow, Prefect
or Luigi don't.

~~~
ttsda
In SWF/Durable Functions/Cadence/Temporal, the entire workflow is defined as
(deterministic) code, which is orchestrated by the engine.

~~~
teraku
As opposed to... having nondeterministic code? Workflow as code is also the
theme of Airflow, Luigi and Prefect.

~~~
firdaus
With Temporal/Cadence you write (mostly) plain code to implement your business
logic rather than using code to define DAGs.

This discussion has more on Airflow vs Cadence:
[https://news.ycombinator.com/item?id=19732447](https://news.ycombinator.com/item?id=19732447)

