I can't speak to their solution, but when solving an equivalent problem within Gazette, where you desire a distributed transaction that includes both a) published downstream messages, and b) state mutations in a DB, the solution is to 1) write downstream messages marked as pending a future ACK, and 2) encode the ACK you _intend_ to write into the checkpoint itself.
Commit the checkpoint alongside state mutations in a single store transaction. Only then do you publish ACKs to all of the downstream streams.
Of course, you can fail immediately after commit but before you get around to publishing all of those ACKS. So, on recovery, the first thing a task assignment does is publish (or re-publish) the ACKs encoded in the recovered checkpoint. This will either 1) provide a first notification that a commit occurred, or 2) be an effective no-op because the ACK was already observed, or 3) roll-back pending messages of a partial, failed transaction.
Commit the checkpoint alongside state mutations in a single store transaction. Only then do you publish ACKs to all of the downstream streams.
Of course, you can fail immediately after commit but before you get around to publishing all of those ACKS. So, on recovery, the first thing a task assignment does is publish (or re-publish) the ACKs encoded in the recovered checkpoint. This will either 1) provide a first notification that a commit occurred, or 2) be an effective no-op because the ACK was already observed, or 3) roll-back pending messages of a partial, failed transaction.
More details: https://gazette.readthedocs.io/en/latest/architecture-exactl...