
Cadence: Uber's Workflow Orchestration Engine - vruiz
https://github.com/uber/cadence
======
jbbarth
Cadence looks like an OSS version of the Amazon Simple Workflow (SWF) service.
The author used to work on SWF at AWS, afaik.

I'm a heavy SWF user at work for managing complex data pipelines. SWF requires
a significant conceptual and tooling investment up front, but that investment
pays off if you use it a lot.

As for other comments mentioning Airflow: the programming model is quite
different, since Airflow, as far as I understand, forces you to provide a DAG
of tasks upfront. SWF (and Cadence?) doesn't; it coordinates the work of
Deciders and Activity Workers and only acts as a source of truth for the state
of the workflow (plus distributing each task to exactly one of many
long-polling workers). As a result you don't declare anything upfront, and
deciders can make dynamic decisions along the way, which is really nice when
you want very dynamic logic in your workflows (e.g. dynamic partitioning of
tasks, decisions depending on external factors, etc.).

I'd love to have Maxim's insights on how Cadence compares to SWF, and what the
reasons/challenges would be for SWF users migrating from SWF to Cadence
(other than SWF being basically stale for 4+ years and riddled with arbitrary
limits).

~~~
mfateev
Cadence vs SWF

Cadence was conceived and is still led by the original tech leads of SWF.

SWF has had no new features added in the last 5 years. Cadence is open sourced
and is under active development.

Cadence was initially based on the SWF public API. It uses Thrift and TChannel
for communication, while SWF uses the AWS flavor of REST. Currently the API is
not compatible with SWF, as Cadence has added a large number of new features
and deprecated a few problematic ones. We are planning to migrate to gRPC
later this year.

Cadence can potentially run on any database that supports single shard multi-
row transactions as a backend. Currently it supports Cassandra and MySQL.

SWF has pretty tight throttling limits. Cadence scales very well with use
cases in production that require 100s of millions of open workflows and
thousands of events per second.

SWF has pretty tight limits on individual payloads and the number of events.
For example, the maximum activity input size is 32k, while Cadence currently
has a 256k limit. The SWF history size limit is 10k events, while the Cadence
limit is 200k. All other limits are also higher.

Cadence has no limit on the activity and workflow execution duration.

Cadence, through archival, supports unlimited retention after a workflow
closes.

SWF has Java and Ruby client libraries. Cadence has Java and Go client
libraries.

The SWF Java library is fully asynchronous and relies on both code generation
(through an annotation processor) and AspectJ. It is hard to set up, doesn't
play well with IDEs and has a very steep learning curve. The Cadence Java
library (as well as the Go one) allows writing workflows as synchronous
programs, which greatly simplifies the programming model. It is also just a
library, without any need for code generation, AspectJ or similar intrusive
technologies.
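
To make that concrete, here is a minimal sketch in the Go client's style (the
workflow, activity and argument names are hypothetical, not taken from the
Cadence samples):

    package sample

    import (
        "context"
        "time"

        "go.uber.org/cadence/workflow"
    )

    // ChargeCustomer is a hypothetical activity; activities are ordinary
    // functions that Cadence invokes on workers.
    func ChargeCustomer(ctx context.Context, orderID string) (string, error) {
        return "charge-" + orderID, nil
    }

    // OrderWorkflow reads top to bottom like normal synchronous code.
    // ExecuteActivity(...).Get(...) blocks until the activity completes;
    // Cadence persists progress behind the scenes, so no callbacks, code
    // generation or AspectJ are needed.
    func OrderWorkflow(ctx workflow.Context, orderID string) error {
        ao := workflow.ActivityOptions{
            ScheduleToStartTimeout: time.Minute,
            StartToCloseTimeout:    5 * time.Minute,
        }
        ctx = workflow.WithActivityOptions(ctx, ao)

        var chargeID string
        return workflow.ExecuteActivity(ctx, ChargeCustomer, orderID).
            Get(ctx, &chargeID)
    }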

Cadence client side libraries have much better unit testing support. For
example the Java library utilizes an in-memory implementation of the Cadence
service.

Cadence features that SWF doesn't have:

Workflow stickiness. SWF replays the whole workflow history on every decision,
which means a workflow's resource usage is O(n^2) in the number of events in
the history. Cadence caches workflows on a worker and delivers only new events
to them. The whole history is replayed only when a worker goes down or the
workflow falls out of the cache, so Cadence workflow resource usage is O(n) in
the number of events. For large workflows this makes a huge difference, and it
also enables higher per-workflow scale. For example, it is not recommended to
have workflows that execute over a hundred activities in SWF; Cadence
routinely executes workflows that have over a thousand activities or child
workflows.

Query workflow execution. This allows synchronously getting any information
out of a workflow. An example of a built-in query is the stack trace of a
running workflow.
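
A rough sketch of a custom query in the Go client (the query name and state
variable here are hypothetical):

    package sample

    import "go.uber.org/cadence/workflow"

    // StatusWorkflow registers a hypothetical "status" query. While the
    // workflow runs, a client or the CLI can synchronously fetch the
    // current value without affecting the execution.
    func StatusWorkflow(ctx workflow.Context) error {
        status := "started"
        if err := workflow.SetQueryHandler(ctx, "status",
            func() (string, error) { return status, nil },
        ); err != nil {
            return err
        }
        status = "processing"
        // ... activities would run here, updating status as they go ...
        status = "done"
        return nil
    }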

Cross region (in AWS terminology) replication. SWF in each region is fully
independent and if the regional SWF is down all workflows in the region are
stuck. Cadence supports asynchronous replication across regions. So even in
the event of a complete loss of a region the workflows continue execution
without interruption.

Server-side retry is the ability to retry an activity or a workflow according
to an exponential retry policy without growing the history size.
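
A sketch of what this looks like in the Go client, assuming the RetryPolicy
option on activity options (the timeout and retry values are illustrative):

    package sample

    import (
        "time"

        "go.uber.org/cadence"
        "go.uber.org/cadence/workflow"
    )

    // withServerSideRetries attaches an exponential retry policy to all
    // activities scheduled through the returned context. The retries are
    // performed by the service without adding events to the history.
    func withServerSideRetries(ctx workflow.Context) workflow.Context {
        return workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
            ScheduleToStartTimeout: time.Minute,
            StartToCloseTimeout:    time.Minute,
            RetryPolicy: &cadence.RetryPolicy{
                InitialInterval:    time.Second,
                BackoffCoefficient: 2.0,
                MaximumInterval:    time.Minute,
                ExpirationInterval: 24 * time.Hour, // give up after a day
                MaximumAttempts:    10,
            },
        })
    }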

Reset is the ability to restart a workflow from any point of its execution by
creating a new run and copying part of the history. For example, reset is used
to automatically roll back workflows to the point before a bad deployment that
has since been rolled back.

Cron is the ability to schedule periodic workflow executions by passing a cron
string to the start method.
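
In the Go client this is a single field on the start options; a sketch with an
illustrative ID, task list and schedule:

    package sample

    import (
        "context"
        "time"

        "go.uber.org/cadence/client"
    )

    // startNightly schedules a hypothetical report workflow to run every
    // day at 02:00. c is an already-connected Cadence client.
    func startNightly(c client.Client, reportWorkflow interface{}) error {
        _, err := c.StartWorkflow(context.Background(),
            client.StartWorkflowOptions{
                ID:                           "nightly-report",
                TaskList:                     "reports",
                ExecutionStartToCloseTimeout: time.Hour,
                CronSchedule:                 "0 2 * * *",
            },
            reportWorkflow)
        return err
    }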

Local activity is a short activity that is executed in the context of a
decision. It uses 6x fewer DB operations than a normal activity execution.

Long poll on history allows efficiently watching for new history events and is
also used to efficiently wait for a workflow to complete.
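
On the client side this surfaces as a blocking Get on a workflow handle; a
minimal sketch (the workflow ID is hypothetical, and an empty run ID means
"the latest run"):

    package sample

    import (
        "context"

        "go.uber.org/cadence/client"
    )

    // awaitResult blocks until the given workflow closes and decodes its
    // return value. Under the hood the client long-polls the history
    // rather than busy-polling for state changes.
    func awaitResult(c client.Client, workflowID string) (string, error) {
        var result string
        err := c.GetWorkflow(context.Background(), workflowID, "").
            Get(context.Background(), &result)
        return result, err
    }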

Cadence uses Elasticsearch for visibility. Soon it is going to support complex
searches across multiple customer-defined columns, which is far superior to
the tag-based search SWF supports.

If a decider constantly fails during a decision, SWF records a few events on
every failure, eventually growing the history beyond the limit and terminating
the workflow. Cadence supports a transient decision feature that doesn't grow
the history on such failures. This allows workflows to continue without
problems once a fix to the workflow code is deployed.

Cadence provides a command line interface.

Cadence Web is open sourced and is much nicer than the SWF console.

Cadence supports local development through unit testing as well as via a local
Docker container that contains the full implementation of the Cadence service
and the UI.

Cadence doesn’t yet have activity and workflow type registration. The
advantage is that changes to activity or workflow scheduling options do not
require version bumps that affect clients.

~~~
jbbarth
Wow thanks for the completeness of the answer.

I found myself nodding on all the conceptual limits and features you added,
I'm sold, gonna try Cadence asap :)

The tooling around SWF (web console and ability to get insights about tasks,
failures, etc.) is definitely a big one from an operational perspective. The
SWF console is indeed absolutely terrible (with basic bugs not fixed for
years, like broken pagination), so we ended up developing our own here at
Botify, along with a python based client lib that mimics most of RubyFlow
principles. I'm curious if all this can be integrated with Cadence, will have
a look. I can keep you informed if you feel it's valuable for the Cadence
project.

~~~
mfateev
Besides the UI Cadence provides a CLI that supports most of the API features.

The core API is almost the same, so porting an existing python client should
not be a very large task.

------
redact207
I've been working on a similar workflow engine for node at
[https://www.npmjs.com/package/@node-ts/bus-
workflow](https://www.npmjs.com/package/@node-ts/bus-workflow)

The main objective of workflows is to manage long running processes. By
processes I mean business processes like coordinating the activities of
fulfilling a customer order (settling charges, picking inventory, packing,
dispatching, email receipts etc). It's a way to keep all those individual
commands decoupled but coordinate them at a higher level.

This isn't a new concept by any means, and is often paired with Domain Driven
Design and message based systems. Doing so gives you a library of events,
emitted every time something happens in your system, that can be reacted to in
a workflow.

If you've ever dealt with microservices, or even a monolith where two internal
services are incorrectly coupled together then this approach may be worth
looking into.

~~~
wetpaste
Thank you. I feel kind of silly about this but I feel like I've had a hard
time understanding when an org should, or could use something like this. I
have seen them mentioned but every time it's explained it's explained with
more abstract language on top of it that confuses me. I keep hearing "it
manages business processes" but then it fails to mention if this means like, a
human being's process within an org, or something coupled with an application
of some sort that has business processes in the application? Does this type of
thing replace sort of what Jira does, make a ticket and then pass it off to
the next team or whatever? Do you ship it with the app for on-premise
deployments of a software product? I have a hard time seeing the big picture
with things like this sometimes. Then I hear workflow orchestrator and I
think, oh okay so like ansible, but for, work...flows? But what is a workflow
really exactly?

~~~
raxxorrax
This could also be used to kill off systems like SharePoint in many businesses
and that would be great.

Seriously, its workflow engine has race conditions, randomly fails and has no
transaction management. But there are few alternatives. I don't know why there
hasn't been any real contender. You would need a full suite to challenge it
though.

~~~
Angostura
Speaking as someone who has just implemented a complex multi-step business
process workflow in SharePoint 2016 - I concur.

------
fokinsean
Sounds cool, but even after poking through the repos I still don't fully
understand what it does.

~~~
maxmcd
The linked talk provides context and use cases within the first 10 minutes or
so: [https://atscaleconference.com/videos/cadence-microservice-
ar...](https://atscaleconference.com/videos/cadence-microservice-architecture-
beyond-requestreply/)

------
bozoUser
Looking at a few comments here, can someone explain the nuances between
managing data workflows vs service workflows?

------
formalsystem
Does anyone know what the best OSS orchestration engines are? I'm wondering
what I should be comparing this to.

~~~
manyxcxi
I don’t know about best, as there are lots of trade offs between various
projects. I’ve reviewed and worked with a lot of Java based ones, top of mind:

- Airflow

- jBPM

- Enhydra Shark

- Activiti

- Netflix Conductor

Most of those are more workflow engine than pure orchestration.

A few years back, we searched high and low and I was generally unhappy with
the commercial offerings so we wound up building around Netflix Conductor
(after a disastrous run with Joget, which is built on Enhydra Shark).

Since then I’ve been pretty happy with Conductor, submitted numerous PRs and
accidentally became one of the people keeping the MySQL backend implementation
going forward.

~~~
mfateev
AFAIK Netflix Conductor was inspired by the AWS Simple Workflow engine.
Cadence is a direct evolution of SWF in the open source world. I was the tech
lead for both of them :).

------
iblaine
Workflow engine = can support hundreds of parallel workflows. These include
Airflow, Luigi, Dagster and Appworx; they are used to manage data, and their
processes typically run for minutes to hours. Orchestration engine = can
support millions of parallel workflows. These include Uber Cadence and Netflix
Conductor; they are used to manage services, and their processes typically run
for microseconds to minutes.

~~~
mfateev
Cadence does support processes that run for unlimited time. We have workflows
in production that are always running. For example, there are services at Uber
that keep an always-open Cadence workflow per rider.

~~~
iblaine
Very interesting. Can you give an example where you'd want to track a process
that takes days or months to execute?

~~~
mfateev
For example, the Uber loyalty program needs to accumulate points (similar to
airline points). A customer workflow receives trip completion events and
updates its state accordingly. When a certain number of points is reached,
some actions (mostly calls to downstream services) are executed.
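
A rough sketch of such an always-open workflow in Go (the signal name, event
type and activity are hypothetical, and a production version would
periodically call continue-as-new to keep the history bounded):

    package sample

    import (
        "context"
        "time"

        "go.uber.org/cadence/workflow"
    )

    // TripEvent is a hypothetical payload signaled on each completed trip.
    type TripEvent struct{ Points int }

    const rewardThreshold = 1000

    // GrantReward is a hypothetical activity calling downstream services.
    func GrantReward(ctx context.Context) error { return nil }

    // LoyaltyWorkflow stays open for the lifetime of a rider, reacting to
    // trip-completion signals and firing actions when enough points
    // accumulate.
    func LoyaltyWorkflow(ctx workflow.Context) error {
        ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
            ScheduleToStartTimeout: time.Minute,
            StartToCloseTimeout:    time.Minute,
        })
        points := 0
        signals := workflow.GetSignalChannel(ctx, "trip-completed")
        for {
            var trip TripEvent
            signals.Receive(ctx, &trip) // may block for days or months
            points += trip.Points
            if points >= rewardThreshold {
                if err := workflow.ExecuteActivity(ctx, GrantReward).
                    Get(ctx, nil); err != nil {
                    return err
                }
                points -= rewardThreshold
            }
        }
    }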

~~~
chrischen
For this use case, could it be solved with an asynchronous task system such as
Celery? What advantages does Cadence offer for something like this, which
seems to repurpose Cadence's scheduling system to process instantaneous
events?

Also, how does it listen for events? By polling?

~~~
mfateev
It could be solved with Celery, but that would also require a database, and
Celery doesn't scale that well unless it runs on top of Redis, which is not
really fault tolerant. Also, invoking actions with guarantees and exponential
retries is not trivial with Celery.

Actually, at Uber a large number of services are being migrated from
Python/Celery to Go/Cadence.

------
ajbosco
I thought they used Piper (based on Airflow) for workflows at Uber.
[https://eng.uber.com/managing-data-workflows-at-
scale/](https://eng.uber.com/managing-data-workflows-at-scale/)

~~~
thor24
That is for data workflows (your ETL jobs, basically). This is for services.

~~~
dgladkov
If you already have a service that requires ETL, Cadence might be a good
choice as well. You will need more boilerplate and setup compared to
Airflow/Piper, but you can share data structures between ETLs and the services
that use them, and you have more control over execution and deployment since
you own your workers.

------
mleonard
Hi, I watched the Cadence talks and read through the Go code a while back and
love what you're doing with Cadence. Really glad to see you're moving from
Thrift/TChannel to protobuf/gRPC - that was a blocker before.

If anyone could help me understand the following I'd appreciate it...

I understand that the event history is cached at worker nodes, that the whole
history of events is only delivered to workers if needed (i.e. if out of
cache), and that normally Cadence manages to deliver events for the same
workflow to the same worker.

My question relates to what exactly happens within a _single_ worker process.

On each new event, does the worker process loop through all events in the
history - starting from the beginning - in order to recover its internal
state, _then_ process the new event, and then shut down, ready to repeat this
for the next event?

Or... does Cadence keep the internal state around by keeping the goroutine
alive, blocked on a channel midway through its workflow logic, waiting for the
next event to continue execution?

Thanks

~~~
mfateev
It is the latter. The workflow state object is cached, including the
goroutines the workflow code is blocked on, and new events are applied to the
cached object.

------
timbray
If you want to write your workflow in your own procedural code, SWF (and
presumably Cadence) is a good choice. If you want to use a dependency graph,
Airflow is for you (but I hear operating it is kind of tricky). If you like a
state-machine/flowchart kind of approach, AWS Step Functions.

AWS customers these days seem to mostly like Step Functions, although SWF
isn't going away, and lots of EC2 instances are running Airflow. Obviously,
some people want a managed service and others want OSS that they can control &
fine-tune. Nothing wrong with either choice.

Most of the engineering cycles these days are going into Step Functions, keep
an eye on that space.

~~~
timbray
(Oh, should disclose, I helped design & build Step Functions.)

~~~
mfateev
If only SWF were extended to run deciders on AWS Lambda. Without this, the
main advantage of Step Functions is hosting. I personally would rather see an
integrated system where Step Functions are a natural extension of SWF, not a
completely separate system.

This is the direction Cadence is going. We are planning to add support for
integrating custom DSLs easily, while maintaining the core code-based
libraries.

BTW: if anyone is interested in running the Step Functions DSL on top of
Cadence, contact the Cadence team. We could work together to get it
implemented.

------
onionking
Hello, I am very interested in an open source version of SWF. I am a heavy SWF
user and I watched all the Cadence videos, especially the architecture one. I
wonder what the recovery mechanism is for a single shard on one host. Let's
say one host goes down: how are the shards on that host recovered on the next
host? I heard the presenter mention consistent hashing / RingPop, so I am
thinking all shards would be migrated to the next available host or hosts?

------
jontro
How does this compare to Netflix Conductor? I've just started experimenting
with it, and somehow I missed this during my evaluation.

~~~
mfateev
I'm from Cadence team, so I'm obviously biased :).

Besides very different implementation backends, the main difference is that
Conductor defines workflows through a JSON-based DSL while Cadence defines
workflows as code. Because of that, it is possible to extend Cadence to
interpret the Conductor DSL, but the reverse is not possible.

I believe that any non-trivial workflow that has state management requirements
is more easily expressed as code. Any attempt to come up with a JSON or XML or
YAML or whatever language for workflows will always be inferior to existing
programming languages like Go, Python or Java.

~~~
sandGorgon
How does this work? Do you store the code for the workflow in the database?

~~~
mfateev
Cadence is a service. The workflow and activity code lives outside of it.
Think about Cadence workflow and activity code the same way you think about a
queue consumer, which is external to the queueing service.
~~~
sandGorgon
No - I'm wondering about the abstractions that allow for specific workflow
specifications to happen in code versus a DSL/JSON.

There are two extremes here - I can slap Celery in and run a bunch of custom
code as workflows. On the other hand, I can use a workflow system with a
built-in DSL that abstracts some of the underlying behaviour away.

Cadence seems to fall in the middle - and I'm wondering how it works. Why
doesn't it degenerate into the same mess that Celery + a bunch of custom
Python code becomes?

~~~
mfateev
Cadence sits above both :).

It allows you to:

Integrate any DSL without modifying the core service. Internally at Uber there
are at least half a dozen DSLs running on top of it.

Write code that hides all the complexity that leads to the mess of queue + DB
implementations. The beginning of this talk explains the idea:
[https://youtu.be/BJwFxqdSx4Y](https://youtu.be/BJwFxqdSx4Y)

The gist of it is that you write just your business logic without thinking
about callbacks and storage.

I recommend looking at the Cadence samples to get a taste of it. Join the
Cadence Slack channel if you have any specific questions:
[https://join.slack.com/t/uber-
cadence/shared_invite/enQtNDcz...](https://join.slack.com/t/uber-
cadence/shared_invite/enQtNDczNTgxMjYxNDEzLTI5Yzc5ODYwMjg1ZmI3NmRmMTU1MjQ0YzQyZDc5NzMwMmM0NjkzNDE5MmM0NzU5YTlhMmI4NzIzMDhiNzFjMDM)

~~~
sandGorgon
Thanks for a detailed reply. Really appreciate it. We have an internal
workflow engine that's nice, but still has numerous quirks. We are trying to
learn better.

I'm wondering why you didn't create a DSL in the first place... if it's ending
up in DSLs all over the place. To choose an example - why not go the
Kubernetes way with YAML? Sure it's verbose as hell... but there aren't
multiple forms of YAML that could bitrot.

Why do language primitives matter?

Let me also ask you another related question - suppose you wanted to build a
GUI that builds workflows for your business teams... are you saying you'd
generate language code? Or would you generate YAML/JSON and then interpret it
using your worker code? Again, the same question: wouldn't it have been better
to have a uniform declarative JSON?

~~~
mfateev
DSL stands for DOMAIN-specific language, but it is a common mistake to call a
GENERIC workflow definition language a DSL.

My opinion is that domain-specific languages are awesome when they are used
for a specific narrow domain. For example, an AWS CloudFormation template is a
DSL for cloud deployment. If This Then That, also known as IFTTT, is another
good example of a narrow workflow definition.

At the same time, a generic Turing-complete language in JSON/YAML/XML always
starts simple but ends up as a complete mess. See
[https://mikehadlow.blogspot.com/2012/05/configuration-
comple...](https://mikehadlow.blogspot.com/2012/05/configuration-complexity-
clock.html). Any programming language is much better for writing complex
programs. The problem is that most existing workflow/orchestration systems
force developers to use unnatural programming patterns and libraries to make
the code fault tolerant. Cadence is an attempt (pioneered by my team at AWS
Simple Workflow and later picked up by Azure Durable Functions) to implement
workflows as natural programs without much boilerplate.

Think about why nobody tries to write complex backend programs in JSON. The
reason is that programming languages have well-defined ways to deal with
complexity. JSON-based languages are good for limited domains, but anything
complex makes them unusable. I've seen it hundreds of times already: the DSL
gets abused and developers hate it.

So most Cadence workflows are written directly as Go/Java code. But when a DSL
is appropriate, it can be added trivially by having the worker code interpret
it.

>Again - the same question: wouldn't it have been better to have a uniform
declarative JSON?

Again, declarative JSON is declarative only in narrow domains. A generic
workflow definition language in JSON works only for very simple scenarios and
is harder to write and debug than Go/Java code.

------
chrischen
Is this like Celery + RabbitMQ but with a GUI? On a high level it sounds like
that, but could someone be so kind as to give an example use case? How is it
different enough that it's not just described as a distributed task queue?

~~~
mfateev
The main difference is that a workflow has state, and tasks can be very long
running. A workflow can also react to external asynchronous events. Visibility
into overall progress is also a very important feature; when using a queueing
system it is hard to answer questions about the current state of the business
process.

For example, implementing service deployment to a public cloud using Celery +
RabbitMQ is very non-trivial and error prone. It is a pretty straightforward
Cadence workflow (sketched below).
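
For illustration only, a hedged sketch of what that could look like as a
Cadence Go workflow (the zones, activities and timeouts are made up):

    package sample

    import (
        "context"
        "time"

        "go.uber.org/cadence/workflow"
    )

    // Hypothetical activities; each can be retried by the service.
    func DeployToZone(ctx context.Context, build, zone string) error { return nil }
    func VerifyHealth(ctx context.Context, zone string) error        { return nil }
    func RollBack(ctx context.Context, build, zone string) error     { return nil }

    // DeployWorkflow rolls a build out zone by zone. If a worker crashes
    // mid-rollout, the workflow resumes from the last completed step
    // instead of starting over -- the part that is hard to get right
    // with a bare queue + DB.
    func DeployWorkflow(ctx workflow.Context, build string, zones []string) error {
        ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
            ScheduleToStartTimeout: time.Minute,
            StartToCloseTimeout:    30 * time.Minute,
        })
        for _, zone := range zones {
            if err := workflow.ExecuteActivity(ctx, DeployToZone, build, zone).
                Get(ctx, nil); err != nil {
                return err
            }
            if err := workflow.ExecuteActivity(ctx, VerifyHealth, zone).
                Get(ctx, nil); err != nil {
                // Unhealthy zone: undo and stop the rollout.
                return workflow.ExecuteActivity(ctx, RollBack, build, zone).
                    Get(ctx, nil)
            }
        }
        return nil
    }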

------
amelius
Cadence is also the name of an electronics design automation company.

[https://www.cadence.com/](https://www.cadence.com/)

------
cle
What does the workflow versioning story look like? This is one of the most
frustrating parts of SWF that Step Functions + Lambda has effectively solved
for us.

~~~
mfateev
Cadence doesn't require workflow and activity type registrations, which
eliminates most of the problems with SWF versioning. Cadence supports
versioning of the workflow code out of the box: any change has to be protected
with a version condition. It works even with shared libraries and very long
running workflows.
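
A minimal sketch of such a version condition in the Go client (the change ID
and activities are hypothetical):

    package sample

    import (
        "context"
        "time"

        "go.uber.org/cadence/workflow"
    )

    // Hypothetical old and new activity implementations.
    func ChargeV1(ctx context.Context) error { return nil }
    func ChargeV2(ctx context.Context) error { return nil }

    // PaymentWorkflow guards a code change with GetVersion. Histories
    // recorded before the change replay the old branch deterministically;
    // new executions take the new branch.
    func PaymentWorkflow(ctx workflow.Context) error {
        ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
            ScheduleToStartTimeout: time.Minute,
            StartToCloseTimeout:    time.Minute,
        })
        v := workflow.GetVersion(ctx, "switch-to-charge-v2",
            workflow.DefaultVersion, 1)
        if v == workflow.DefaultVersion {
            return workflow.ExecuteActivity(ctx, ChargeV1).Get(ctx, nil)
        }
        return workflow.ExecuteActivity(ctx, ChargeV2).Get(ctx, nil)
    }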

------
viswabharathi
Can somebody from Uber clarify? Uber could have used Cadence instead of
Piper/Airflow, couldn't it? Please correct me if I'm wrong.

~~~
mfateev
Yes, it is technically possible to use Cadence instead of Piper/Airflow. The
reason Piper/Airflow is used is mostly historical: it has been around for
quite a while, and Cadence is a relatively new project.

------
vxa_victor
How does it compare to a BPM solution like Camunda?

~~~
mfateev
Cadence allows writing workflows as code. Think of it as a virtual machine for
OO code that makes that code fully fault tolerant to process failures. As it
is code, it can be used to implement any business logic. Camunda is a BPMN
engine which interprets BPMN workflow definitions. It is possible to implement
a Cadence workflow that interprets BPMN without changing the core Cadence
service.

~~~
_ph_
And we are talking about a software company and a software package. Sounds
like a strong overlap to me - or do you think you can call a new software
library "microsoft"?

------
dlphn___xyz
how does this compare to Luigi or Airflow?

~~~
mfateev
The biggest difference besides the programming model is scale. Luigi and
Airflow target data pipelines that don't require much scale. Cadence is built
to support business level transactions. It can handle tens of thousands of
events per second and hundreds of millions of open workflows. Obviously it is
also a good fit for low scale use cases. Cadence is also so generic that it
can be used to implement practically any workflow definition language. For
example it is possible to create an extension to run Airflow pipelines on
Cadence.

I'm from the Cadence team.

~~~
davis_m
> Obviously it is also a good fit for low scale use cases.

Is this a given? Just because something is necessary at scale doesn't mean it
is a good fit for low scale use cases. I would expect the opposite is actually
true.

~~~
mfateev
We have both very low scale use cases, such as a single distributed cron, as
well as very high scale use cases in production at Uber.

------
mshockwave
just a random comment: I thought Cadence, the EDA company, owns the trademark
for the name

~~~
jacques_chester
Trademarks are partitioned by subject matter:
[https://www.uspto.gov/web/patents/classification/selectnumwi...](https://www.uspto.gov/web/patents/classification/selectnumwithtitle.htm)

------
bhouston
How does this compare to Argo?

~~~
mfateev
Argo workflows are DAGs written in YAML. The types of workflows you can create
using this syntax are very limited. Cadence gives you the full power of a
programming language like Java or Go to implement workflow logic. It is
possible to implement support for the Argo DSL on top of Cadence; the reverse
is not possible.

Cadence is also more scalable, supporting tens of thousands of events per
second and hundreds of millions of open workflows.

------
hcnews
Does Cadence work for low-latency scenarios, e.g. web serving?

~~~
kbuckner
We use Cadence to run low-latency workflows for routing customer support
tickets. It handles our use case very well. [https://eng.uber.com/customer-
obsession-ticket-routing-workf...](https://eng.uber.com/customer-obsession-
ticket-routing-workflow-and-orchestration-engine/)

------
mleonard
The way I think about workflow engines is as follows. Please comment and
correct me. Keen to discuss.

Workflow engines like Cadence essentially work by letting you write regular-
looking procedural logic for your workflow. This looks and feels very much
like writing an async function with async-await in JavaScript or C#.

The state in your workflow is then implicit in your code instead of explicit.
Here's what I mean by that:

Usually you would _explicitly_ serialise your state between each incoming
event and do an atomic compare-and-set operation on an external database to
store the new state.

For a new incoming event: (1) fetch the state, (2) unmarshal it into an object
in your programming language (i.e. a Java class or Go struct), (3) given the
current state, process the event and perform any external actions like sending
an email (these external actions need to be ok with at-least-once semantics),
then update the state object ready for the next incoming event, (4) serialise
the state, (5) store the state in the database (atomically, with a
compare-and-set operation), (6) repeat on each event. Do everything with
at-least-once, repeat-on-failure semantics. A sketch of this pattern follows
below.
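
A compressed sketch of that explicit-state loop in Go (the store interface and
event type are hypothetical, not any particular library):

    package sample

    import "encoding/json"

    // Event is a hypothetical incoming event.
    type Event struct{ Kind string }

    // OrderState is the explicit state object being round-tripped.
    type OrderState struct{ EmailsSent int }

    // Store is a hypothetical database with optimistic concurrency.
    type Store interface {
        Get(id string) (state []byte, version int64, err error)
        CompareAndSet(id string, state []byte, version int64) error
    }

    // HandleEvent is steps (1)-(5): fetch, unmarshal, apply, marshal,
    // compare-and-set. The caller retries on CAS conflicts or crashes,
    // so applying the event must be at-least-once safe.
    func HandleEvent(db Store, id string, ev Event) error {
        raw, ver, err := db.Get(id)
        if err != nil {
            return err
        }
        var st OrderState
        if len(raw) > 0 {
            if err := json.Unmarshal(raw, &st); err != nil {
                return err
            }
        }
        if ev.Kind == "send-email" { // perform side effect, update state
            st.EmailsSent++
        }
        out, err := json.Marshal(st)
        if err != nil {
            return err
        }
        return db.CompareAndSet(id, out, ver)
    }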

In a workflow engine like Cadence, what is persisted to the database is the
entire history of events instead of the single state object described above.

In Cadence, the code you write looks very much like async-await style code in
languages like JavaScript or C#. The workflow logic is in some sense an async
function that pauses at await statements and picks up where it left off when
the next event comes in.

Remember that Cadence stores the entire history of events for a workflow. It
does this so that it can rerun the workflow from the beginning, this time with
the new incoming event appended to the history.

Notice that you need to be careful about your workflow being deterministic.

Optimisations: (1) it knows when it is replaying already-seen-events and
doesn't redo external events such as sending emails. (2) it tries to resend
events to the same worker node each time. It caches events at worker nodes.
(3) at the macro level everything is highly-available and repeat-on-failure-
with-backoff to ensure progress and at-least-once-semantics. (4) it supports
repeating workflows and child workflows (5) monitoring, tracing, other things
you'd expect (6) etc

Importantly: notice that there is still, in some sense, a single state object.
The state is just implicit: it is deep down in the internal state of the
language runtime you wrote the workflow function in (Java, Go). Instead of
serialising state such as 'time-since-last-email' into a state object in a
database... you have 'time-since-last-email' as a local variable in the scope
of the workflow function. Similarly, your programming language is tracking the
call stack and current execution position of the function... normally you'd
keep track of progress through the workflow in the state object and condition
on this state when receiving a new incoming event.
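
And a sketch of the implicit-state alternative in Cadence's Go style, where
the "state" is just local variables and the loop position (names are
hypothetical):

    package sample

    import (
        "context"
        "time"

        "go.uber.org/cadence/workflow"
    )

    // SendReminder is a hypothetical activity.
    func SendReminder(ctx context.Context, userID string) error { return nil }

    // ReminderWorkflow: `sent` and the position in the loop ARE the
    // workflow state. They are never serialised by hand; Cadence
    // reconstructs them by replaying the event history (or finds them
    // intact in the cached goroutine).
    func ReminderWorkflow(ctx workflow.Context, userID string) error {
        ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
            ScheduleToStartTimeout: time.Minute,
            StartToCloseTimeout:    time.Minute,
        })
        for sent := 0; sent < 3; sent++ {
            if err := workflow.Sleep(ctx, 24*time.Hour); err != nil {
                return err
            }
            if err := workflow.ExecuteActivity(ctx, SendReminder, userID).
                Get(ctx, nil); err != nil {
                return err
            }
        }
        return nil
    }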

Thinking about state as explicit (state-object approach) versus implicit
(replay-history approach) helps me when thinking about Cadence and similar
workflow engines.

Thanks for reading so far. I'd love to hear from users of Cadence at Uber or
elsewhere:

(1) why do you choose to write workflows with implicit state (by replaying
history) instead of storing the state explicitly as a serialised state object
in the database?

My guess: developer productivity of writing and maintaining the workflows.
Having a common approach and single observable system for many different
workflows.

(2) how do you reason about long-running workflows where the business logic
needs to be updated? Would this not be much easier if the state object (say a
serialised protobuf) was stored explicitly in the database?

(3) wouldn't non-determinism be much easier as well if you stored the state
explicitly?

~~~
mfateev
(1) It simplifies the programming model. There is no way to serialize the
state of the call stack through a library in most programming languages. You
mentioned that Cadence is similar to C# async/await; the SWF Flow library is,
but Cadence workflow code is fully synchronous, not requiring callbacks unless
the business logic needs them. Applying new events to the cached workflow is
also more efficient for large states.

(2) It depends. Nothing prevents a workflow writer from checkpointing the
state explicitly (by calling continue-as-new); infinitely running workflows do
this periodically. But having the event history is awesome for rollbacks. For
example, in Cadence it is possible to roll back a bad change and automatically
roll back the state of all your workflows to the good state. In the database
world, a change that corrupted the state is much harder to deal with.

(3) Experience shows that the determinism requirement, while it takes some
learning, is not that hard to deal with. And users like the superior
programming model it enables.

