Conductor's JSON DSL was a bit of a nightmare to work with, in my opinion, but otherwise it did the job OK-ish. It felt more akin to AWS Step Functions.
Arguably, Zeebe was the easiest to get started with once you get past the initial hurdle of BPMN. Their model of job processing is very simple, and because of that, it is very easy to write an SDK for it in any language. The biggest downside is that it is far from production-ready: there are ongoing complaints in their Slack about its lack of stability and relatively poor performance. Zeebe does not require external storage since workflow state is transient, replicated using its own RocksDB and Raft set-up. You need to export and index the workflows if you want to keep a history of them, or even just to manage them. It is very eventually consistent.
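To illustrate how thin that job-worker contract is, here is a rough sketch of a Zeebe worker in Go using the official Go client. The job type and handler are illustrative names, and the exact import paths and signatures vary across client versions:

```go
package main

import (
	"context"
	"log"

	"github.com/camunda/zeebe/clients/go/v8/pkg/entities"
	"github.com/camunda/zeebe/clients/go/v8/pkg/worker"
	"github.com/camunda/zeebe/clients/go/v8/pkg/zbc"
)

// handleJob is a hypothetical handler: do the work, then complete the job
// (or fail it so the broker retries). All orchestration state lives in the broker.
func handleJob(client worker.JobClient, job entities.Job) {
	ctx := context.Background()
	// ... do the actual work here ...
	if _, err := client.NewCompleteJobCommand().JobKey(job.GetKey()).Send(ctx); err != nil {
		log.Println("failed to complete job:", err)
	}
}

func main() {
	client, err := zbc.NewClient(&zbc.ClientConfig{
		GatewayAddress:         "localhost:26500", // default gateway port
		UsePlaintextConnection: true,
	})
	if err != nil {
		log.Fatal(err)
	}
	// Poll the broker for jobs of type "charge-card" and dispatch them to the handler.
	w := client.NewJobWorker().JobType("charge-card").Handler(handleJob).Open()
	defer w.Close()
	w.AwaitClose()
}
```

Under the hood this is just the gateway's ActivateJobs / CompleteJob / FailJob gRPC calls, which is why writing an SDK for another language is straightforward.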
With both Conductor and Zeebe, however, if you have a complex enough online workflow, it starts getting very difficult to model in their respective DSLs, especially if the workflow is dynamic. That complexity can translate into bugs at the orchestration level which you do not catch unless you run through the different scenarios.
Cadence (Temporal) handles this very well. You essentially write the workflow in the programming language itself, with appropriate wrappers / decorators and helpers, so there is no need to learn a new DSL per se. But, as a result, building an SDK for it in a given programming language is a non-trivial exercise, and currently the stable implementations are in Java and Go. Performance- and reliability-wise, it is great (it relies on Cassandra; there are SQL adapters, though they are not mature yet).
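For a flavour of what "workflow as code" looks like, here is a minimal Temporal sketch in Go. The order workflow and the ChargeCard / ShipOrder activities are hypothetical names for illustration:

```go
package app

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// Hypothetical activities; in Temporal these are plain functions registered
// on a worker, and the SDK invokes them with retries and timeouts.
func ChargeCard(orderID string) error { return nil }
func ShipOrder(orderID string) error  { return nil }

// OrderWorkflow is ordinary Go: the control flow *is* the workflow definition.
// Temporal records each step in the workflow history, so the function can be
// replayed deterministically after a worker crash or restart.
func OrderWorkflow(ctx workflow.Context, orderID string) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
	})

	// Completed activities are not re-executed on replay.
	if err := workflow.ExecuteActivity(ctx, ChargeCard, orderID).Get(ctx, nil); err != nil {
		return err
	}
	// Durable timers are just code -- this can sleep for days without
	// holding a thread or a process alive.
	if err := workflow.Sleep(ctx, 24*time.Hour); err != nil {
		return err
	}
	return workflow.ExecuteActivity(ctx, ShipOrder, orderID).Get(ctx, nil)
}
```

Branching, loops, and dynamic fan-out are just Go control flow here, which is exactly the complexity that becomes painful in a JSON or BPMN DSL.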
We have somewhat settled on Temporal now, having worked with the other two for quite some time. We also explored Lyft's Flyte, but it seemed more appropriate for data engineering and offline processing.
As mentioned elsewhere in this thread, we also use Argo, but I do not think it falls in the same space as the workflow engines above: they handle the orchestration of complex business logic a lot better, whereas Argo suits simpler pipelines like CI / CD or ETL.
Also worth mentioning: we went with a workflow engine to reduce the boilerplate and the time / effort needed to write orchestration logic / glue code. You end up writing this in lots of projects without realising it. We definitely feel we have succeeded in that goal, and I feel this is an exciting space.
The concept of having business users able to review (or even, holy grail, edit/author) workflows was one of the potentially appealing aspects of the BPMN products; did you get a signal on whether there were any benefits? "the initial hurdle of BPMN" sounds like maybe this isn't as good as it seems on the face of it?
Also, how do you go about testing long-lived workflows? Do any of these orchestrators have tools/environments that help with system-testing (or even just doing isolated simulations of) your flows? I've not found anything off-the-shelf for this yet.
1. It was good for communicating what was happening in the engine room
I remember demo'ing the workflows within my team, and to non-technical stakeholders. It was very easy to demonstrate what was happening, and to provide a live view into the state of things. From there, it was easy to get conversations going, e.g. about how certain business processes can be extended for more complex use-cases.
2. It empowered others to communicate their intent
Zeebe comes with a modeller which is simple enough even for non-technical users to stitch together a rough workflow. The problem is that the end result often requires a lot of changes to be production-ready. But I have found that this still helps communicate ideas and intent.
You do not really need BPMN for this, but if it becomes standard practice, you now have a way of talking on the same wavelength. In my case, we were productionising ML pipelines, so data scientists who were not deeply attuned to data engineering practices and limitations slowly opened up to them, and as a data engineer, it became clearer to me what the requirements were.
On the point about testing, the test framework in Zeebe is still a bit immature. There are quite a few tools / libraries in Java, but not really in other languages. The way we approached it was lots of semi-automated / manual QA, plus fixing live in production (Zeebe provides several mechanisms for essentially rescuing broken workflows).
Testing in Cadence / Temporal is definitely more mature, but you do not have the same level of simplicity as Zeebe. That said, the way I compare them: you could build something like Zeebe or even Conductor on top of Cadence / Temporal, but not vice versa.
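As a concrete example of that maturity, Temporal's Go SDK ships a replay-based test environment that runs workflows in-process, mocks out activities, and auto-skips durable timers (so a 24-hour sleep finishes instantly). A rough sketch, assuming a hypothetical OrderWorkflow with ChargeCard / ShipOrder activities (those names are not from the thread):

```go
package app

import (
	"testing"

	"github.com/stretchr/testify/mock"
	"github.com/stretchr/testify/require"
	"go.temporal.io/sdk/testsuite"
)

func TestOrderWorkflow(t *testing.T) {
	var ts testsuite.WorkflowTestSuite
	env := ts.NewTestWorkflowEnvironment()

	// Mock the activities so the test exercises only the orchestration logic;
	// no cards are charged and no orders are shipped.
	env.OnActivity(ChargeCard, mock.Anything, "order-1").Return(nil)
	env.OnActivity(ShipOrder, mock.Anything, "order-1").Return(nil)

	// Runs the full workflow in-process; durable timers are fast-forwarded.
	env.ExecuteWorkflow(OrderWorkflow, "order-1")

	require.True(t, env.IsWorkflowCompleted())
	require.NoError(t, env.GetWorkflowError())
}
```

This kind of deterministic, server-free workflow test is what I mean by "more mature" compared to Zeebe's semi-manual QA story.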